linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo
@ 2018-12-16 13:16 Lianbo Jiang
  2018-12-16 13:16 ` [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation Lianbo Jiang
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Lianbo Jiang @ 2018-12-16 13:16 UTC
  To: linux-kernel; +Cc: kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

This patchset did two things:
a. add a new document for vmcoreinfo

This document lists some variables that are exported to vmcoreinfo, and
briefly describes what these variables indicate. It should be instructive
for many people who do not know the vmcoreinfo, and it would normalize the
exported variables as a standard ABI between kernel and user-space.

b. export the value of sme mask to vmcoreinfo

On AMD machines with the SME feature, makedumpfile needs to know whether
the crash kernel was encrypted or not. If SME is enabled in the first
kernel, the crash kernel's page table entries (pgd/pud/pmd/pte) contain
the memory encryption mask, so the SME mask must be removed to obtain the
true physical address.
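
As a rough user-space sketch of the step described above (this is not
makedumpfile's actual code): once the exported sme_mask is known, a tool
would clear that bit from a page table entry before extracting the
physical address. The helper name pte_to_phys() and the PTE_ADDR_MASK
constant are assumptions made for illustration only.

```c
#include <stdint.h>

/*
 * Assumed for this sketch: bits 12..51 of an x86_64 page table entry
 * hold the physical address of the page frame.
 */
#define PTE_ADDR_MASK 0x000ffffffffff000ULL

/*
 * Hypothetical helper: return the physical page address stored in a
 * page table entry, ignoring the SME encryption bit.
 * 'sme_mask' is the value a tool read from vmcoreinfo; it is 0 when
 * SME is not active, in which case the entry is left untouched.
 */
static uint64_t pte_to_phys(uint64_t pte, uint64_t sme_mask)
{
	return (pte & ~sme_mask) & PTE_ADDR_MASK;
}
```

With sme_mask set to bit 47, an entry that carries the encryption bit and
the same entry without it resolve to the same physical address.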

Changes since v1:
1. No need to export a kernel-internal mask to userspace, so copy the
value of sme_me_mask to a local variable 'sme_mask' and write the value
of sme_mask to vmcoreinfo.
2. Add comment for the code.
3. Improve the patch log.
4. Add the vmcoreinfo documentation.

Changes since v2:
1. Improve the vmcoreinfo document, add more descriptions for the
exported variables.
2. Fix spelling errors in the document.

Lianbo Jiang (2):
  kdump: add the vmcoreinfo documentation
  kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

 Documentation/kdump/vmcoreinfo.txt | 456 +++++++++++++++++++++++++++++
 arch/x86/kernel/machine_kexec_64.c |  14 +
 2 files changed, 470 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

-- 
2.17.1



* [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation
  2018-12-16 13:16 [PATCH 0/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo Lianbo Jiang
@ 2018-12-16 13:16 ` Lianbo Jiang
  2018-12-17 11:52   ` Borislav Petkov
                     ` (2 more replies)
  2018-12-16 13:16 ` [PATCH 2/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo Lianbo Jiang
  2018-12-17 11:54 ` [PATCH 0/2 " Borislav Petkov
  2 siblings, 3 replies; 14+ messages in thread
From: Lianbo Jiang @ 2018-12-16 13:16 UTC
  To: linux-kernel; +Cc: kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

This document lists some variables that are exported to vmcoreinfo, and
briefly describes what these variables indicate. It should be instructive
for many people who do not know the vmcoreinfo, and it would normalize the
exported variables as a standard ABI between kernel and user-space.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 Documentation/kdump/vmcoreinfo.txt | 456 +++++++++++++++++++++++++++++
 1 file changed, 456 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index 000000000000..d71260bf383a
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,456 @@
+================================================================
+		Documentation for VMCOREINFO
+================================================================
+
+=======================
+What is the VMCOREINFO?
+=======================
+It is a special ELF note section. The VMCOREINFO contains the first
+kernel's various information, for example, structure size, page size,
+symbol values and field offset, etc. These data are packed into an ELF
+note section, and these data will also help user-space tools(e.g. crash
+makedumpfile) analyze the first kernel's memory usage.
+
+In general, makedumpfile can dump the VMCOREINFO contents from vmlinux
+in the first kernel. For example:
+# makedumpfile -g VMCOREINFO -x vmlinux
+
+================
+Common variables
+================
+
+init_uts_ns.name.release
+========================
+The number of OS release. Based on this version number, people can find
+the source code for the corresponding version. When analyzing the vmcore,
+people must read the source code to find the reason why the kernel crashed.
+
+PAGE_SIZE
+=========
+The size of a page. It is the smallest unit of data for memory management
+in kernel. It is usually 4k bytes and the page is aligned in 4k bytes,
+which is very important for computing address.
+
+init_uts_ns
+===========
+This is the UTS namespace, which is used to isolate two specific elements
+of the system that relate to the uname system call. The UTS namespace is
+named after the data structure used to store information returned by the
+uname system call.
+
+User-space tools can get the kernel name, host name, kernel release number,
+kernel version, architecture name and OS type from the 'init_uts_ns'.
+
+node_online_map
+===============
+It is a macro definition, actually it is an array node_states[N_ONLINE],
+and it represents the set of online node in a system, one bit position
+per node number.
+
+This is used to keep track of which nodes are in the system and online.
+
+swapper_pg_dir
+=============
+It generally indicates the pgd for the kernel. When mmu is enabled in
+config file, the 'swapper_pg_dir' is valid.
+
+The 'swapper_pg_dir' helps to translate the virtual address to a physical
+address.
+
+_stext
+======
+It is an assembly symbol that defines the beginning of the text section.
+In general, the '_stext' indicates the kernel start address. This is used
+to convert a virtual address to a physical address when the virtual address
+does not belong to the 'vmalloc' address.
+
+vmap_area_list
+==============
+It stores the virtual area list. Makedumpfile can get the vmalloc start
+value from this variable. This value is necessary for vmalloc translation.
+
+mem_map
+=======
+Physical addresses are translated to struct pages by treating them as an
+index into the mem_map array. Shifting a physical address PAGE_SHIFT bits
+to the right will treat it as a PFN from physical address 0, which is also
+an index within the mem_map array.
+
+In short, it can map the address to struct page.
+
+contig_page_data
+================
+Makedumpfile can get the pglist_data structure from this symbol
+'contig_page_data'. The pglist_data structure is used to describe the
+memory layout.
+
+User-space tools can use this symbol for excluding free pages.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+==========================================================================
+Export the address of 'mem_section' array, and its length, structure size,
+and the 'section_mem_map' offset.
+
+It exists in the sparse memory mapping model, and it is also somewhat
+similar to the mem_map variable, both of them will help to translate
+the address.
+
+page
+====
+The size of a 'page' structure. In kernel, the page is an important data
+structure, it is widely used to compute the continuous memory.
+
+pglist_data
+===========
+The size of a 'pglist_data' structure. This value will be used to check if
+the 'pglist_data' structure is valid. It is also one of the conditions for
+checking the memory type.
+
+zone
+====
+The size of a 'zone' structure. This value is often used to check if the
+'zone' structure is found. It is a necessary structure for excluding free
+pages.
+
+free_area
+=========
+The size of a 'free_area' structure. It indicates whether the 'free_area'
+structure is valid or not. This is useful for excluding free pages.
+
+list_head
+=========
+The size of a 'list_head' structure. It depends on this value when
+iterating the free list.
+
+nodemask_t
+==========
+The size of a 'nodemask_t' type. This value is used to compute the number
+of online nodes.
+
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
+       compound_order|compound_head)
+===================================================================
+User-space tools can compute their values based on the offset of these
+variables. The variables are helpful to exclude unnecessary pages.
+
+(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|
+              node_spanned_pages|node_id)
+===================================================================
+On NUMA machines, each NUMA node has a pg_data_t to describe its memory
+layout. On UMA machines there is a single pglist_data which describes the
+whole memory.
+
+These values are used to check the memory type, and they are also helpful
+to compute the virtual address for memory map.
+
+(zone, free_area|vm_stat|spanned_pages)
+=======================================
+Each node is divided up into a number of blocks called zones which
+represent ranges within memory. A zone is described by a structure zone.
+Each zone type is suitable for a different type of usage.
+
+User-space tools can compute their values based on the offset of these
+variables.
+
+(free_area, free_list)
+======================
+Offset of the free_list's member. This value is used to compute the number
+of free pages.
+
+Each zone has a free_area structure array called free_area[MAX_ORDER].
+The fields in this structure are simple, the free_list represents a linked
+list of free page blocks.
+
+(list_head, next|prev)
+======================
+Offsets of the list_head's members. In general, the list_head is used to
+define a circular linked list. User-space tools often need to traverse
+the lists to get specific pages.
+
+(vmap_area, va_start|list)
+==========================
+Offsets of the vmap_area's members. They indicate the vmalloc layer
+information. Makedumpfile can get the start address of vmalloc region.
+
+(zone.free_area, MAX_ORDER)
+===========================
+It indicates the maximum number of the array free_area. This macro is
+used by the zone buddy allocator. User-space tools use this value to
+iterate the free_area.
+
+log_buf
+=======
+In general, console output is written to the ring buffer 'log_buf' at
+index 'log_first_idx'. It can get kernel log from the log_buf.
+
+log_buf_len
+===========
+Length of a 'log_buf'. Makedumpfile can read the number of strings
+from the log_buf.
+
+log_first_idx
+=============
+Index of the first record stored in the buffer 'log_buf'. This value
+tells the user-space tools the place where to read the strings in the
+log_buf.
+
+clear_idx
+=========
+The index of the next printk record to read after the last 'clear'
+command. It indicates the first record after the last
+SYSLOG_ACTION_CLEAR, such as one issued by 'dmesg -c'.
+
+log_next_idx
+============
+The index of the next record to store in the buffer 'log_buf'. It helps
+to compute the index of current strings position.
+
+printk_log
+==========
+The size of a structure 'printk_log'. It helps to compute the size of
+messages, and extract dmesg log.
+
+(printk_log, ts_nsec|len|text_len|dict_len)
+===========================================
+It represents these field offsets in the structure 'printk_log'. User
+space tools can parse it and detect any changes to structure down the
+line.
+
+(free_area.free_list, MIGRATE_TYPES)
+====================================
+The number of migrate types for pages. The free_list is divided into
+the array, it needs to know the number of the array.
+
+NR_FREE_PAGES
+=============
+On linux-2.6.21 or later, the number of free_pages is in
+vm_stat[NR_FREE_PAGES]. It can get the number of free pages from the
+array.
+
+PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|
+PG_hwpoison|PG_head_mask
+=====================================================
+It means the attribute of a page. These flags will be used to filter
+the free pages.
+
+PAGE_BUDDY_MAPCOUNT_VALUE or ~PG_buddy
+======================================
+The 'PG_buddy' flag indicates that the page is free and in the buddy
+system. Makedumpfile can exclude the free pages managed by a buddy.
+
+HUGETLB_PAGE_DTOR
+=================
+The 'HUGETLB_PAGE_DTOR' flag indicates the hugetlbfs pages. Makedumpfile
+will exclude these pages.
+
+================
+x86_64 variables
+================
+
+phys_base
+=========
+In x86_64, the 'phys_base' is necessary to convert virtual address of
+exported kernel symbol to physical address.
+
+init_top_pgt
+============
+The 'init_top_pgt' is used to walk through the whole page table and convert
+virtual address to physical address.
+
+pgtable_l5_enabled
+==================
+User-space tools need to know whether the crash kernel was in 5-level
+paging mode or not.
+
+node_data
+=========
+This is a struct 'pglist_data' array, it stores all numa nodes information.
+In general, Makedumpfile can get the pglist_data structure from symbol
+'node_data'.
+
+(node_data, MAX_NUMNODES)
+=========================
+The number of this 'node_data' array. It means the maximum number of the
+nodes in system.
+
+KERNELOFFSET
+============
+Randomize the address of the kernel image. This is the offset of KASLR in
+VMCOREINFO ELF notes. It is used to compute the page offset in x86_64. If
+KASLR is disabled, this value is zero.
+
+KERNEL_IMAGE_SIZE
+=================
+The size of 'KERNEL_IMAGE_SIZE', currently unused.
+
+The old MODULES_VADDR needed to be determined by KERNEL_IMAGE_SIZE when
+KASLR was enabled. Now MODULES_VADDR is not needed any more since Pratyush
+made all VA to PA conversion go through page table lookup.
+
+PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
+========================================
+The value of 'PG_offline' flag can be used for marking pages as logically
+offline. Makedumpfile can directly skip pages that are logically offline.
+
+sme_mask
+========
+On AMD machines with the SME feature, it indicates the secure memory
+encryption mask. Makedumpfile needs to know whether the crash kernel was
+encrypted or not. If SME is enabled in the first kernel, the crash kernel's
+page table entries (pgd/pud/pmd/pte) contain the memory encryption mask, so
+the SME mask must be removed to obtain the true physical address.
+
+=============
+x86 variables
+=============
+
+X86_PAE
+=======
+It means the physical address extension. It has the cost of more
+page table lookup overhead, and also consumes more page table space
+per process. This flag will be used to check whether the PAE was
+enabled in crash kernel or not when converting virtual address to
+physical address.
+
+==============
+ia64 variables
+==============
+
+pgdat_list|(pgdat_list, MAX_NUMNODES)
+=====================================
+This is a struct 'pg_data_t' array, it stores all numa nodes information.
+And the 'MAX_NUMNODES' indicates the number of the nodes.
+
+node_memblk|(node_memblk, NR_NODE_MEMBLKS)
+==========================================
+List of node memory chunks. Filled when parsing SRAT table to obtain
+information about memory nodes. The 'NR_NODE_MEMBLKS' indicates the number
+of node memory chunks.
+
+These values are used to compute the number of nodes in crash kernel.
+
+node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size)
+================================================================
+The size of a struct 'node_memblk_s', and the offsets of the
+node_memblk_s's members. It helps to compute the number of nodes.
+
+PGTABLE_3|PGTABLE_4
+===================
+User-space tools need to know whether the crash kernel was in 3-level or
+4-level paging mode. This flag can help to distinguish the page table.
+
+===============
+arm64 variables
+===============
+
+VA_BITS
+=======
+The maximum number of bits for virtual addresses. This value helps to
+compute the virtual memory ranges.
+
+kimage_voffset
+==============
+The offset between the kernel virtual and physical mappings. This value
+helps to translate virtual address to physical address.
+
+PHYS_OFFSET
+===========
+It indicates the physical address of the start of memory. It is similar
+to kimage_voffset, which is used to translate virtual address to
+physical address.
+
+KERNELOFFSET
+============
+It is similar to x86_64.
+
+=============
+arm variables
+=============
+
+ARM_LPAE
+========
+It indicates whether the crash kernel supports the large physical address
+extension. This value will tell you how to translate virtual address to
+physical address.
+
+==============
+s390 variables
+==============
+
+lowcore_ptr
+===========
+An array with a pointer to the lowcore of every CPU. This value
+helps to print the psw and all registers information.
+
+high_memory
+===========
+It can get the vmalloc_start address from the high_memory symbol.
+
+(lowcore_ptr, NR_CPUS)
+======================
+The maximum number of cpus.
+
+TODO.
+
+powerpc variables
+=================
+
+node_data|(node_data, MAX_NUMNODES)
+===================================
+Please refer to common variables.
+
+contig_page_data
+================
+Please refer to common variables.
+
+vmemmap_list
+============
+The 'vmemmap_list' maintains the entire vmemmap physical mapping. It
+can be used to get the vmemmap list count and populate vmemmap region
+info. If the vmemmap address translation information is stored in the
+crash kernel, it helps to translate vmemmap kernel virtual addresses.
+
+mmu_vmemmap_psize
+=================
+The size of a page. The kernel tries to use this page size for vmemmap
+if it is supported. This value helps to translate virtual addresses to
+physical addresses.
+
+mmu_psize_defs
+==============
+It stores definitions for a variety of page sizes, such as 4k, 64k, or 16M.
+
+It depends on this value when making vtop translations.
+
+vmemmap_backing|(vmemmap_backing, list)|(vmemmap_backing, phys)|
+(vmemmap_backing, virt_addr)
+================================================================
+The vmemmap virtual address space management does not have a traditional
+page table to track which virtual struct pages are backed by physical
+mapping. The virtual to physical mappings are tracked in a simple linked
+list format.
+
+And user-space tools need to know the offset of 'list', 'phys' and
+'virt_addr'. It depends on these values when computing the count of
+vmemmap regions.
+
+mmu_psize_def|(mmu_psize_def, shift)
+====================================
+The size of a struct 'mmu_psize_def', and the offset of mmu_psize_def's
+member.
+
+These values help to make the vtop translations.
+
+============
+sh variables
+============
+
+node_data|(node_data, MAX_NUMNODES)
+===================================
+It is similar to X86_64, please refer to above description.
+
+X2TLB
+=====
+It indicates whether the crash kernel enables the extended mode of the SH.
+
+TODO.
-- 
2.17.1
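
The mem_map entry in the document above describes an index computation
that can be sketched in a few lines of user-space C. This is only an
illustration for the flat memory model; the SKETCH_* constants and both
helper names are assumptions, and a real tool would take the page size
and the size of 'struct page' from the vmcoreinfo data instead of
hard-coding them.

```c
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages, 64-byte struct page. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_STRUCT_SIZE	64

/* Physical address -> page frame number, i.e. the index into mem_map. */
static uint64_t phys_to_pfn(uint64_t paddr)
{
	return paddr >> SKETCH_PAGE_SHIFT;
}

/*
 * Address of the 'struct page' describing 'paddr', given the address of
 * a flat mem_map array that starts at physical address 0.
 */
static uint64_t phys_to_page_addr(uint64_t mem_map, uint64_t paddr)
{
	return mem_map + phys_to_pfn(paddr) * SKETCH_PAGE_STRUCT_SIZE;
}
```

For example, physical address 0x3000 is PFN 3, so its struct page sits
3 * 64 bytes past the start of mem_map under these assumptions.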



* [PATCH 2/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo
  2018-12-16 13:16 [PATCH 0/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo Lianbo Jiang
  2018-12-16 13:16 ` [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation Lianbo Jiang
@ 2018-12-16 13:16 ` Lianbo Jiang
  2018-12-17 13:01   ` Borislav Petkov
  2018-12-17 11:54 ` [PATCH 0/2 " Borislav Petkov
  2 siblings, 1 reply; 14+ messages in thread
From: Lianbo Jiang @ 2018-12-16 13:16 UTC
  To: linux-kernel; +Cc: kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

On AMD machines with the SME feature, makedumpfile needs to know
whether the crash kernel was encrypted or not. If SME is enabled
in the first kernel, the crash kernel's page table entries
(pgd/pud/pmd/pte) contain the memory encryption mask, so the SME
mask must be removed to obtain the true physical address.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 arch/x86/kernel/machine_kexec_64.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 4c8acdfdc5a7..1860fe24117d 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -352,10 +352,24 @@ void machine_kexec(struct kimage *image)
 
 void arch_crash_save_vmcoreinfo(void)
 {
+	u64 sme_mask = sme_me_mask;
+
 	VMCOREINFO_NUMBER(phys_base);
 	VMCOREINFO_SYMBOL(init_top_pgt);
 	vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
 			pgtable_l5_enabled());
+	/*
+	 * Currently, the local variable 'sme_mask' stores the value of
+	 * sme_me_mask (bit 47), and that value is written to the
+	 * vmcoreinfo.
+	 * If needed, the bit (sme_mask) might be redefined in the future,
+	 * but 'bit63' will be reserved.
+	 * For example:
+	 * [ misc	   ][ enc bit  ][ other misc SME info       ]
+	 * 0000_0000_0000_0000_1000_0000_0000_0000_0000_0000_..._0000
+	 * 63   59   55   51   47   43   39   35   31   27   ... 3
+	 */
+	VMCOREINFO_NUMBER(sme_mask);
 
 #ifdef CONFIG_NUMA
 	VMCOREINFO_SYMBOL(node_data);
-- 
2.17.1
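
On the consumer side, the value exported by the patch above shows up as
a "NUMBER(sme_mask)=..." line in the vmcoreinfo text. A minimal sketch
of how a user-space tool might read it follows. The function name
vmcoreinfo_read_number() is hypothetical, and the sketch assumes the
value is printed in decimal, one "NUMBER(name)=value" entry per line.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look up "NUMBER(<name>)=<value>" in a buffer
 * holding vmcoreinfo text and store the parsed value in *out.
 * Returns 0 on success, -1 if the entry is missing or malformed.
 */
static int vmcoreinfo_read_number(const char *text, const char *name,
				  int64_t *out)
{
	char key[128];
	const char *line = text;
	size_t klen;
	int n;

	/* Build the key we are looking for, e.g. "NUMBER(sme_mask)=". */
	n = snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	if (n < 0 || (size_t)n >= sizeof(key))
		return -1;
	klen = (size_t)n;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0) {
			char *end;
			long long value = strtoll(line + klen, &end, 10);

			if (end == line + klen)
				return -1;	/* no digits after '=' */
			*out = (int64_t)value;
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A caller would cast the result to an unsigned 64-bit value and treat a
missing entry as "no SME mask", which keeps older kernels working.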



* Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation
  2018-12-16 13:16 ` [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation Lianbo Jiang
@ 2018-12-17 11:52   ` Borislav Petkov
  2018-12-17 12:12   ` Borislav Petkov
  2018-12-17 13:00   ` Borislav Petkov
  2 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2018-12-17 11:52 UTC
  To: Lianbo Jiang
  Cc: linux-kernel, kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

On Sun, Dec 16, 2018 at 09:16:16PM +0800, Lianbo Jiang wrote:
> +================
> +Common variables
> +================
> +
> +init_uts_ns.name.release
> +========================
> +The number of OS release. Based on this version number, people can find
> +the source code for the corresponding version. When analyzing the vmcore,
> +people must read the source code to find the reason why the kernel crashed.
> +

> +
> +init_uts_ns
> +===========
> +This is the UTS namespace, which is used to isolate two specific elements
> +of the system that relate to the uname system call. The UTS namespace is
> +named after the data structure used to store information returned by the
> +uname system call.
> +
> +User-space tools can get the kernel name, host name, kernel release number,
> +kernel version, architecture name and OS type from the 'init_uts_ns'.

And this document already fulfills its purpose - those two vmcoreinfo
exports are redundant and the first one can be removed.

And now that we agreed that VMCOREINFO is not an ABI and is very tightly
coupled to the kernel version, init_uts_ns.name.release can be removed,
yes?

Or is there anything speaking against that?

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 0/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo
  2018-12-16 13:16 [PATCH 0/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo Lianbo Jiang
  2018-12-16 13:16 ` [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation Lianbo Jiang
  2018-12-16 13:16 ` [PATCH 2/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo Lianbo Jiang
@ 2018-12-17 11:54 ` Borislav Petkov
  2 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2018-12-17 11:54 UTC
  To: Lianbo Jiang
  Cc: linux-kernel, kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

On Sun, Dec 16, 2018 at 09:16:15PM +0800, Lianbo Jiang wrote:
> This patchset did two things:
> a. add a new document for vmcoreinfo
> 
> This document lists some variables that are exported to vmcoreinfo, and
> briefly describes what these variables indicate. It should be instructive
> for many people who do not know the vmcoreinfo, and it would normalize the
> exported variables as a standard ABI between kernel and user-space.
> 
> b. export the value of sme mask to vmcoreinfo
> 
> On AMD machines with the SME feature, makedumpfile needs to know whether
> the crash kernel was encrypted or not. If SME is enabled in the first
> kernel, the crash kernel's page table entries (pgd/pud/pmd/pte) contain
> the memory encryption mask, so the SME mask must be removed to obtain the
> true physical address.
> 
> Changes since v1:
> 1. No need to export a kernel-internal mask to userspace, so copy the
> value of sme_me_mask to a local variable 'sme_mask' and write the value
> of sme_mask to vmcoreinfo.
> 2. Add comment for the code.
> 3. Improve the patch log.
> 4. Add the vmcoreinfo documentation.
> 
> Changes since v2:
> 1. Improve the vmcoreinfo document, add more descriptions for the
> exported variables.
> 2. Fix spelling errors in the document.

Yes, it is starting to look better.

The last thing that's missing is a checkpatch.pl check which verifies
whether a new VMCOREINFO export is not being documented and warn if so.
But you can do that later.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation
  2018-12-16 13:16 ` [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation Lianbo Jiang
  2018-12-17 11:52   ` Borislav Petkov
@ 2018-12-17 12:12   ` Borislav Petkov
  2018-12-17 13:00   ` Borislav Petkov
  2 siblings, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2018-12-17 12:12 UTC
  To: Lianbo Jiang
  Cc: linux-kernel, kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

On Sun, Dec 16, 2018 at 09:16:16PM +0800, Lianbo Jiang wrote:

This...

> +node_online_map
> +===============
> +It is a macro definition, actually it is an array node_states[N_ONLINE],
> +and it represents the set of online node in a system, one bit position
> +per node number.
> +
> +This is used to keep track of which nodes are in the system and online.

... and this...

> +nodemask_t
> +==========
> +The size of a 'nodemask_t' type. This value is used to compute the number
> +of online nodes.

sound redundant too?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation
  2018-12-16 13:16 ` [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation Lianbo Jiang
  2018-12-17 11:52   ` Borislav Petkov
  2018-12-17 12:12   ` Borislav Petkov
@ 2018-12-17 13:00   ` Borislav Petkov
  2018-12-18  7:31     ` lijiang
  2 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2018-12-17 13:00 UTC
  To: Lianbo Jiang
  Cc: linux-kernel, kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

On Sun, Dec 16, 2018 at 09:16:16PM +0800, Lianbo Jiang wrote:
> +clear_idx
> +=========
> +The index of the next printk record to read after the last 'clear'
> +command. It indicates the first record after the last
> +SYSLOG_ACTION_CLEAR, such as one issued by 'dmesg -c'.

What is that used for by the userspace tools?

> +
> +log_next_idx
> +============
> +The index of the next record to store in the buffer 'log_buf'. It helps
> +to compute the index of current strings position.
> +
> +printk_log
> +==========
> +The size of a structure 'printk_log'. It helps to compute the size of
> +messages, and extract dmesg log.

What is the difference between that and log_buf?



> +
> +(printk_log, ts_nsec|len|text_len|dict_len)
> +===========================================
> +It represents these field offsets in the structure 'printk_log'. User
> +space tools can parse it and detect any changes to structure down the
> +line.

What does that mean? "any changes down the line"?

> +
> +(free_area.free_list, MIGRATE_TYPES)
> +====================================
> +The number of migrate types for pages. The free_list is divided into
> +the array, it needs to know the number of the array.

... for?

> +
> +NR_FREE_PAGES
> +=============
> +On linux-2.6.21 or later, the number of free_pages is in
> +vm_stat[NR_FREE_PAGES]. It can get the number of free pages from the
> +array.
> +
> +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|
> +PG_hwpoison|PG_head_mask
> +=====================================================
> +It means the attribute of a page. These flags will be used to filter
> +the free pages.
> +
> +PAGE_BUDDY_MAPCOUNT_VALUE or ~PG_buddy
> +======================================
> +The 'PG_buddy' flag indicates that the page is free and in the buddy
> +system. Makedumpfile can exclude the free pages managed by a buddy.

That text belongs with the one above?

> +
> +HUGETLB_PAGE_DTOR
> +=================
> +The 'HUGETLB_PAGE_DTOR' flag indicates the hugetlbfs pages. Makedumpfile
> +will exclude these pages.
> +
> +================
> +x86_64 variables
> +================
> +
> +phys_base
> +=========
> +In x86_64, the 'phys_base' is necessary to convert virtual address of
> +exported kernel symbol to physical address.
> +
> +init_top_pgt
> +============
> +The 'init_top_pgt' is used to walk through the whole page table and convert
> +virtual address to physical address.

This is the same as swapper_pg_dir?

> +
> +pgtable_l5_enabled
> +==================
> +User-space tools need to know whether the crash kernel was in 5-level
> +paging mode or not.
> +
> +node_data
> +=========
> +This is a struct 'pglist_data' array, it stores all numa nodes information.
> +In general, Makedumpfile can get the pglist_data structure from symbol
> +'node_data'.
> +
> +(node_data, MAX_NUMNODES)
> +=========================
> +The number of this 'node_data' array. It means the maximum number of the
> +nodes in system.
> +
> +KERNELOFFSET
> +============
> +Randomize the address of the kernel image. This is the offset of KASLR in
> +VMCOREINFO ELF notes. It is used to compute the page offset in x86_64. If
> +KASLR is disabled, this value is zero.
> +
> +KERNEL_IMAGE_SIZE
> +=================
> +The size of 'KERNEL_IMAGE_SIZE', currently unused.

So remove?

> +
> +The old MODULES_VADDR needed to be determined by KERNEL_IMAGE_SIZE when
> +KASLR was enabled. Now MODULES_VADDR is not needed any more since Pratyush
> +made all VA to PA conversion go through page table lookup.

Also, I did clean this up considerably - please include in your next
version:

---
diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt
index d71260bf383a..2ce34d952bfd 100644
--- a/Documentation/kdump/vmcoreinfo.txt
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -1,18 +1,19 @@
 ================================================================
-		Documentation for VMCOREINFO
+			VMCOREINFO
 ================================================================
 
 =======================
 What is the VMCOREINFO?
 =======================
-It is a special ELF note section. The VMCOREINFO contains the first
-kernel's various information, for example, structure size, page size,
-symbol values and field offset, etc. These data are packed into an ELF
-note section, and these data will also help user-space tools(e.g. crash
-makedumpfile) analyze the first kernel's memory usage.
-
-In general, makedumpfile can dump the VMCOREINFO contents from vmlinux
-in the first kernel. For example:
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+To dump the VMCOREINFO contents, one can do:
+
 # makedumpfile -g VMCOREINFO -x vmlinux
 
 ================
@@ -20,123 +21,132 @@ Common variables
 ================
 
 init_uts_ns.name.release
-========================
-The number of OS release. Based on this version number, people can find
-the source code for the corresponding version. When analyzing the vmcore,
-people must read the source code to find the reason why the kernel crashed.
+------------------------
+
+The version of the Linux kernel. Used to find the corresponding source
+code from which the kernel has been built.
 
 PAGE_SIZE
-=========
-The size of a page. It is the smallest unit of data for memory management
-in kernel. It is usually 4k bytes and the page is aligned in 4k bytes,
-which is very important for computing address.
+---------
+
+The size of a page. It is the smallest unit of data for memory
+management in kernel. It is usually 4096 bytes and a page is aligned on
+4096 bytes. Used for computing page addresses.
 
 init_uts_ns
-===========
-This is the UTS namespace, which is used to isolate two specific elements
-of the system that relate to the uname system call. The UTS namespace is
-named after the data structure used to store information returned by the
-uname system call.
+-----------
+
+This is the UTS namespace, which is used to isolate two specific
+elements of the system that relate to the uname(2) system call. The UTS
+namespace is named after the data structure used to store information
+returned by the uname(2) system call.
 
-User-space tools can get the kernel name, host name, kernel release number,
-kernel version, architecture name and OS type from the 'init_uts_ns'.
+User-space tools can get the kernel name, host name, kernel release
+number, kernel version, architecture name and OS type from it.
 
 node_online_map
-===============
-It is a macro definition, actually it is an array node_states[N_ONLINE],
-and it represents the set of online node in a system, one bit position
-per node number.
+---------------
 
-This is used to keep track of which nodes are in the system and online.
+An array node_states[N_ONLINE] which represents the set of online node
+in a system, one bit position per node number. Used to keep track of
+which nodes are in the system and online.
 
 swapper_pg_dir
-=============
-It generally indicates the pgd for the kernel. When mmu is enabled in
-config file, the 'swapper_pg_dir' is valid.
+-------------
 
-The 'swapper_pg_dir' helps to translate the virtual address to a physical
-address.
+The global page directory pointer of the kernel. Used to translate
+virtual to physical addresses.
 
 _stext
-======
-It is an assemble symbol that defines the beginning of the text section.
-In general, the '_stext' indicates the kernel start address. This is used
-to convert a virtual address to a physical address when the virtual address
-does not belong to the 'vmalloc' address.
+------
+
+Defines the beginning of the text section. In general, _stext indicates
+the kernel start address. Used to convert a virtual address from the
+direct kernel map to a physical address.
 
 vmap_area_list
-==============
-It stores the virtual area list, makedumpfile can get the vmalloc start
+--------------
+
+Stores the virtual area list. makedumpfile can get the vmalloc start
 value from this variable. This value is necessary for vmalloc translation.
 
 mem_map
-=======
-Physical addresses are translated to struct pages by treating them as an
-index into the mem_map array. Shifting a physical address PAGE_SHIFT bits
-to the right will treat it as a PFN from physical address 0, which is also
-an index within the mem_map array.
+-------
+
+Physical addresses are translated to struct pages by treating them as
+an index into the mem_map array. Right-shifting a physical address
+PAGE_SHIFT bits converts it into a page frame number which is an index
+into that mem_map array.
 
-In short, it can map the address to struct page.
+Used to map an address to the corresponding struct page.
 
 contig_page_data
-================
-Makedumpfile can get the pglist_data structure from this symbol
-'contig_page_data'. The pglist_data structure is used to describe the
-memory layout.
+----------------
 
-User-space tools can use this symbols for excluding free pages.
+Makedumpfile can get the pglist_data structure from this symbol, which
+is used to describe the memory layout.
+
+User-space tools use this to exclude free pages when dumping memory.
 
 mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
-==========================================================================
-Export the address of 'mem_section' array, and it's length, structure size,
-and the 'section_mem_map' offset.
+--------------------------------------------------------------------------
+
+The address of the mem_section array, its length, structure size, and
+the section_mem_map offset.
 
 It exists in the sparse memory mapping model, and it is also somewhat
-similar to the mem_map variable, both of them will help to translate
-the address.
+similar to the mem_map variable, both of them are used to translate an
+address.
 
 page
-====
-The size of a 'page' structure. In kernel, the page is an important data
-structure, it is widely used to compute the continuous memory.
+----
+
+The size of a page structure. struct page is an important data structure
+and it is widely used to compute the contiguous memory.
 
 pglist_data
-===========
-The size of a 'pglist_data' structure. This value will be used to check if
-the 'pglist_data' structure is valid. It is also one of the conditions for
-checking the memory type.
+-----------
+
+The size of a pglist_data structure. This value will be used to check
+if the pglist_data structure is valid. It is also used for checking the
+memory type.
 
 zone
-====
-The size of a 'zone' structure. This value is often used to check if the
-'zone' structure is found. It is necessary structures for excluding free
-pages.
+----
+
+The size of a zone structure. This value is often used to check if the
+zone structure has been found. It is also used for excluding free pages.
 
 free_area
-=========
-The size of a 'free_area' structure. It indicates whether the 'free_area'
-structure is valid or not. This is useful for excluding free pages.
+---------
+
+The size of a free_area structure. It indicates whether the free_area
+structure is valid or not. Useful for excluding free pages.
 
 list_head
-=========
-The size of a 'list_head' structure. It depends on this value when
-iterating the free list.
+---------
+
+The size of a list_head structure. Used when iterating lists in a
+post-mortem analysis session.
 
 nodemask_t
-==========
-The size of a 'nodemask_t' type. This value is used to compute the number
+----------
+
+The size of a nodemask_t type. This value is used to compute the number
 of online nodes.
 
 (page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
        compound_order|compound_head)
-===================================================================
+-------------------------------------------------------------------
+
 User-space tools can compute their values based on the offset of these
 variables. The variables are helpful to exclude unnecessary pages.
 
 (pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|node_
               spanned_pages|node_id)
-===================================================================
-On NUMA machines, each NUMA node has a pg_data_t to describe it's memory
+-------------------------------------------------------------------
+
+On NUMA machines, each NUMA node has a pg_data_t to describe its memory
 layout. On UMA machines there is a single pglist_data which describes the
 whole memory.
 
@@ -144,16 +154,18 @@ These values are used to check the memory type, and they are also helpful
 to compute the virtual address for memory map.
 
 (zone, free_area|vm_stat|spanned_pages)
-=======================================
-Each node is divided up into a number of blocks called zones which
+---------------------------------------
+
+Each node is divided into a number of blocks called zones which
 represent ranges within memory. A zone is described by a structure zone.
 Each zone type is suitable for a different type of usage.
 
-User-space tools can compute their values based on the offset of these
+User-space tools can compute required values based on the offset of these
 variables.
 
 (free_area, free_list)
-======================
+----------------------
+
 Offset of the free_list's member. This value is used to compute the number
 of free pages.
 
@@ -162,295 +174,325 @@ The fields in this structure are simple, the free_list represents a linked
 list of free page blocks.
 
 (list_head, next|prev)
-======================
-Offsets of the list_head's members. In general, the list_head is used to
-define a circular linked list. User-space tools often need to traverse
-the lists to get specific pages.
+----------------------
+
+Offsets of the list_head's members. list_head is used to define a
+circular linked list. User-space tools need these in order to traverse
+lists.
 
 (vmap_area, va_start|list)
-==========================
+--------------------------
+
 Offsets of the vmap_area's members. They indicate the vmalloc layer
-information. Makedumpfile can get the start address of vmalloc region.
+information. Makedumpfile gets the start address of the vmalloc region.
 
 (zone.free_area, MAX_ORDER)
-===========================
+---------------------------
+
 It indicates the maximum number of the array free_area. This macro is
-used to the zone buddy allocator. User-space tools use this value to
+used by the zone buddy allocator. User-space tools use this value to
 iterate the free_area.
 
 log_buf
-=======
-In general, console output is written to the ring buffer 'log_buf' at
-index 'log_first_idx'. It can get kernel log from the log_buf.
+-------
+
+Console output is written to the ring buffer log_buf at index
+log_first_idx. Used to get the kernel log.
 
 log_buf_len
-===========
-Length of a 'log_buf'. Makedumpfile can read the number of strings
-from the log_buf.
+-----------
 
-log_first_idx
-=============
-Index of the first record stored in the buffer 'log_buf'. This value
-tells the user-space tools the place where to read the strings in the
+Length of a log_buf. Used to read the number of strings from the
 log_buf.
 
+log_first_idx
+-------------
+
+Index of the first record stored in the buffer log_buf. Used by
+user-space tools to read the strings in the log_buf.
+
 clear_idx
-=========
-The index that the next printk record to read after the last 'clear'
+---------
+
+The index that the next printk() record to read after the last clear
 command. It indicates the first record after the last SYSLOG_ACTION
 _CLEAR, like issued by 'dmesg -c'.
 
 log_next_idx
-============
-The index of the next record to store in the buffer 'log_buf'. It helps
-to compute the index of current strings position.
+------------
+
+The index of the next record to store in the buffer log_buf. Used to
+compute the index of the current string position.
 
 printk_log
-==========
-The size of a structure 'printk_log'. It helps to compute the size of
+----------
+
+The size of a structure printk_log. Used to compute the size of
 messages, and extract dmesg log.
 
 (printk_log, ts_nsec|len|text_len|dict_len)
-===========================================
-It represents these field offsets in the structure 'printk_log'. User
-space tools can parse it and detect any changes to structure down the
-line.
+-------------------------------------------
+
+It represents field offsets in struct printk_log. User space tools can
+parse it and detect any changes to structure down the line.
 
 (free_area.free_list, MIGRATE_TYPES)
-====================================
+------------------------------------
+
 The number of migrate types for pages. The free_list is divided into
 the array, it needs to know the number of the array.
 
 NR_FREE_PAGES
-=============
+-------------
+
 On linux-2.6.21 or later, the number of free_pages is in
-vm_stat[NR_FREE_PAGES]. It can get the number of free pages from the
-array.
+vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
 
 PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|
 PG_hwpoision|PG_head_mask
-=====================================================
-It means the attribute of a page. These flags will be used to filter
-the free pages.
+-----------------------------------------------------
+
+Page attributes. These flags are used to filter free pages.
 
 PAGE_BUDDY_MAPCOUNT_VALUE or ~PG_buddy
-======================================
-The 'PG_buddy' flag indicates that the page is free and in the buddy
+--------------------------------------
+
+The PG_buddy flag indicates that the page is free and in the buddy
 system. Makedumpfile can exclude the free pages managed by a buddy.
 
 HUGETLB_PAGE_DTOR
-=================
-The 'HUGETLB_PAGE_DTOR' flag indicates the hugetlbfs pages. Makedumpfile
-will exclude these pages.
+-----------------
 
-================
-x86_64 variables
-================
+The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
+excludes these pages.
+
+======
+x86_64
+======
 
 phys_base
-=========
-In x86_64, the 'phys_base' is necessary to convert virtual address of
-exported kernel symbol to physical address.
+---------
+
+Used to convert the virtual address of an exported kernel symbol to its
+physical address.
 
 init_top_pgt
-============
-The 'init_top_pgt' used to walk through the whole page table and convert
-virtual address to physical address.
+------------
+
+Used to walk through the whole page table and convert virtual addresses
+to physical addresses.
 
 pgtable_l5_enabled
-==================
+------------------
+
 User-space tools need to know whether the crash kernel was in 5-level
-paging mode or not.
+paging mode.
 
 node_data
-=========
-This is a struct 'pglist_data' array, it stores all numa nodes information.
-In general, Makedumpfile can get the pglist_data structure from symbol
-'node_data'.
+---------
+
+This is a struct pglist_data array and stores all numa nodes
+information. Makedumpfile gets the pglist_data structure from it.
 
 (node_data, MAX_NUMNODES)
-=========================
-The number of this 'node_data' array. It means the maximum number of the
-nodes in system.
+-------------------------
+
+The maximum number of the nodes in system.
 
 KERNELOFFSET
-============
-Randomize the address of the kernel image. This is the offset of KASLR in
-VMCOREINFO ELF notes. It is used to compute the page offset in x86_64. If
-KASLE is disabled, this value is zero.
+------------
+
+The kernel randomization offset. Used to compute the page offset. If
+KASLR is disabled, this value is zero.
 
 KERNEL_IMAGE_SIZE
-=================
-The size of 'KERNEL_IMAGE_SIZE', currently unused.
+-----------------
 
-The old MODULES_VADDR need be decided by KERNEL_IMAGE_SIZE when kaslr
-enabled. Now MODULES_VADDR is not needed any more since Pratyush makes
-all VA to PA converting done by page table lookup.
+Currently unused.
 
 PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
-========================================
-The value of 'PG_offline' flag can be used for marking pages as logically
-offline. Makedumpfile can directly skip pages that are logically offline.
+----------------------------------------
+
+The value of PG_offline flag can be used for marking pages as logically
+offline. Makedumpfile skips pages that are logically offline.
 
 sme_mask
-========
-For AMD machine with SME feature, it indicates the secure memory encryption
-mask. Makedumpfile tools need to know whether the crash kernel was encrypted
-or not. If SME is enabled in the first kernel, the crash kernel's page
-table(pgd/pud/pmd/pte) contains the memory encryption mask, so need to
-remove the sme mask to obtain the true physical address.
+--------
 
-=============
-x86 variables
-=============
+For AMD machine with SME feature, it indicates the secure memory
+encryption mask. Makedumpfile tools need to know whether the crash
+kernel was encrypted. If SME is enabled in the first kernel, the crash
+kernel's page table (pgd/pud/pmd/pte) contains the memory encryption
+mask and this is used to remove the SME mask to obtain the true physical
+address.
+
+======
+x86_32
+======
 
 X86_PAE
-=======
-It means the physical address extension. It has the cost of more
-page table lookup overhead, and also consumes more page table space
-per process. This flag will be used to check whether the PAE was
-enabled in crash kernel or not when converting virtual address to
-physical address.
+-------
 
-==============
-ia64 variables
-==============
+Denotes whether physical address extensions are enabled. It has the cost
+of more page table lookup overhead, and also consumes more page table
+space per process. Used to check whether PAE was enabled in the crash
+kernel when converting virtual addresses to physical addresses.
+
+====
+ia64
+====
 
 pgdat_list|(pgdat_list, MAX_NUMNODES)
-=====================================
-This is a struct 'pg_data_t' array, it stores all numa nodes information.
-And the 'MAX_NUMNODES' indicates the number of the nodes.
+-------------------------------------
+
+pg_data_t array storing all numa nodes information. MAX_NUMNODES
+indicates the number of the nodes.
 
 node_memblk|(node_memblk, NR_NODE_MEMBLKS)
-==========================================
+------------------------------------------
+
 List of node memory chunks. Filled when parsing SRAT table to obtain
-information about memory nodes. The 'NR_NODE_MEMBLKS' indicates the number
+information about memory nodes. NR_NODE_MEMBLKS indicates the number
 of node memory chunks.
 
-These values are used to compute the number of nodes in crash kernel.
+These values are used to compute the number of nodes in the crash kernel.
 
 node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size)
-================================================================
-The size of a struct 'node_memblk_s', and the offsets of the
-node_memblk_s's members. It helps to compute the number of nodes.
+----------------------------------------------------------------
+
+The size of a struct node_memblk_s and the offsets of the
+node_memblk_s's members. Used to compute the number of nodes.
 
 PGTABLE_3|PGTABLE_4
-===================
+-------------------
+
 User-space tools need to know whether the crash kernel was in 3-level or
-4-level paging mode. This flag can help to distinguish the page table.
+4-level paging mode. Used to distinguish the page table.
 
-===============
-arm64 variables
-===============
+=====
+ARM64
+=====
 
 VA_BITS
-=======
-The maximum number of bits for virtual addresses. This value helps to
-compute the virtual memory ranges.
+-------
+
+The maximum number of bits for virtual addresses. Used to compute the
+virtual memory ranges.
 
 kimage_voffset
-==============
-The offset between the kernel virtual and physical mappings. This value
-helps to translate virtual address to physical address.
+--------------
+
+The offset between the kernel virtual and physical mappings. Used to
+translate virtual to physical addresses.
 
 PHYS_OFFSET
-===========
-It indicates the physical address of the start of memory. It is similar
-with the kimage_voffset, which is used to translate virtual address to
-physical address.
+-----------
+
+Indicates the physical address of the start of memory. Similar to
+kimage_voffset, which is used to translate virtual address to physical
+address.
 
 KERNELOFFSET
-============
-It is similar to x86_64.
+------------
+
+The kernel randomization offset. Used to compute the page offset. If
+KASLR is disabled, this value is zero.
 
 =============
 arm variables
 =============
 
 ARM_LPAE
-========
-It indicates whether the crash kernel support the large physical address
-extension. This value will tell you how to translate virtual address to
-physical address.
+--------
 
-==============
-s390 variables
-==============
+It indicates whether the crash kernel supports large physical address
+extensions. Used to translate virtual address to physical address.
+
+====
+s390
+====
 
 lowcore_ptr
-==========
-An array with a pointer to the lowcore of every CPU. This value
-helps to print the psw and all registers information.
+----------
+
+An array with a pointer to the lowcore of every CPU. Used to print the
+psw and all registers information.
 
 high_memory
-===========
-It can get the vmalloc_start address from the high_memory symbol.
+-----------
+
+Used to get the vmalloc_start address from the high_memory symbol.
 
 (lowcore_ptr, NR_CPUS)
-======================
-The maximum number of cpus.
+----------------------
 
-TODO.
+The maximum number of CPUs.
+
+=======
+powerpc
+=======
 
-powerpc variables
-=================
 
 node_data|(node_data, MAX_NUMNODES)
-===================================
-Please refer to common variables.
+-----------------------------------
+
+See above.
 
 contig_page_data
-================
-Please refer to common variables.
+----------------
+
+See above.
 
 vmemmap_list
-============
-The 'vmemmap_list' maintains the entire vmemmap physical mapping. It
-can get vmemmap list count and populate vmemmap regions info. If the
-vmemmap address translation information is stored in crash kernel,
-which helps to translate vmemmap kernel virtual addresses.
+------------
+
+The vmemmap_list maintains the entire vmemmap physical mapping. It can
+get vmemmap list count and populate vmemmap regions info. If the vmemmap
+address translation information is stored in the crash kernel, it helps
+to translate vmemmap kernel virtual addresses.
 
 mmu_vmemmap_psize
-=================
-The size of a page. It will try to use this page sizes for vmemmap if
-support. This value helps to translate virtual address to physical
-address.
+-----------------
+
+The size of a page. Used to translate address to physical addresses.
 
 mmu_psize_defs
-==============
-It stores a variety of pages, such as the page size is 4k, 64k, or 16M.
+--------------
 
-It depends on this value when making vtop translations.
+Page size definitions, i.e. 4k, 64k, or 16M.
+
+Used to make vtop translations.
 
 vmemmap_backing|(vmemmap_backing, list)|(vmemmap_backing, phys)|
 (vmemmap_backing, virt_addr)
-================================================================
+----------------------------------------------------------------
+
 The vmemmap virtual address space management does not have a traditional
 page table to track which virtual struct pages are backed by physical
 mapping. The virtual to physical mappings are tracked in a simple linked
 list format.
 
-And user-space tools need to know the offset of 'list', 'phys' and
-'virt_addr'. It depends on these values when computing the count of
-vmemmap regions.
+User-space tools need to know the offset of list, phys and virt_addr
+when computing the count of vmemmap regions.
 
 mmu_psize_def|(mmu_psize_def, shift)
-====================================
-The size of a struct 'mmu_psize_def', and the offset of mmu_psize_def's
+------------------------------------
+
+The size of a struct mmu_psize_def and the offset of mmu_psize_def's
 member.
 
-These values help to make the vtop translations.
+Used in vtop translations.
 
-============
-sh variables
-============
+==
+sh
+==
 
 node_data|(node_data, MAX_NUMNODES)
-===================================
-It is similar to X86_64, please refer to above description.
+-----------------------------------
+
+See above.
 
 X2TLB
-=====
-It indicates whether the crash kernel enables the extended mode of the SH.
+-----
 
-TODO.
+Indicates whether the crash kernel enables SH extended mode.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

* Re: [PATCH 2/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo
  2018-12-16 13:16 ` [PATCH 2/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo Lianbo Jiang
@ 2018-12-17 13:01   ` Borislav Petkov
  2018-12-18  7:34     ` lijiang
  0 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2018-12-17 13:01 UTC (permalink / raw)
  To: Lianbo Jiang
  Cc: linux-kernel, kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

On Sun, Dec 16, 2018 at 09:16:17PM +0800, Lianbo Jiang wrote:
> For AMD machine with SME feature, makedumpfile tools need to know
> whether the crash kernel was encrypted or not. If SME is enabled
> in the first kernel, the crash kernel's page table(pgd/pud/pmd/pte)
> contains the memory encryption mask, so need to remove the sme mask
> to obtain the true physical address.
> 
> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
> ---
>  arch/x86/kernel/machine_kexec_64.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 4c8acdfdc5a7..1860fe24117d 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -352,10 +352,24 @@ void machine_kexec(struct kimage *image)
>  
>  void arch_crash_save_vmcoreinfo(void)
>  {
> +	u64 sme_mask = sme_me_mask;
> +
>  	VMCOREINFO_NUMBER(phys_base);
>  	VMCOREINFO_SYMBOL(init_top_pgt);
>  	vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
>  			pgtable_l5_enabled());
> +	/*
> +	 * Currently, the local variable 'sme_mask' stores the value of
> +	 * sme_me_mask(bit 47), and also write the value of sme_mask to
> +	 * the vmcoreinfo.
> +	 * If need, the bit(sme_mask) might be redefined in the future,
> +	 * but the 'bit63' will be reserved.
> +	 * For example:
> +	 * [ misc	   ][ enc bit  ][ other misc SME info       ]
> +	 * 0000_0000_0000_0000_1000_0000_0000_0000_0000_0000_..._0000
> +	 * 63   59   55   51   47   43   39   35   31   27   ... 3
> +	 */

This text belongs into the document.
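
A minimal sketch of what a user-space consumer could do with the exported
value, assuming the mask is a single bit (bit 47 in the comment above) and
that the physical address occupies bits 12..51 of a page-table entry. The
helper name and the PFN mask constant are hypothetical and not taken from
makedumpfile:

```c
#include <stdint.h>

/* Assumed x86_64 layout: physical address in bits 12..51 (hypothetical name). */
#define EXAMPLE_PTE_PFN_MASK 0x000ffffffffff000ULL

/*
 * Strip the SME encryption bit from a page-table entry, then drop the
 * flag bits, leaving the physical address of the page it points to.
 * sme_mask is the value read from vmcoreinfo; 0 means SME was disabled.
 */
static uint64_t example_pte_to_paddr(uint64_t pte, uint64_t sme_mask)
{
	return (pte & ~sme_mask) & EXAMPLE_PTE_PFN_MASK;
}
```

With sme_mask = 1ULL << 47, an entry of 0x0000800012345067 yields the
physical address 0x12345000. With sme_mask = 0 only the flag bits are
stripped, so bit 47 stays part of the address.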

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

* Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation
  2018-12-17 13:00   ` Borislav Petkov
@ 2018-12-18  7:31     ` lijiang
  2018-12-18 11:41       ` Borislav Petkov
  2018-12-26  3:24       ` Dave Young
  0 siblings, 2 replies; 14+ messages in thread
From: lijiang @ 2018-12-18  7:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

On 2018-12-17 21:00, Borislav Petkov wrote:
> On Sun, Dec 16, 2018 at 09:16:16PM +0800, Lianbo Jiang wrote:
>> +clear_idx
>> +=========
>> +The index that the next printk record to read after the last 'clear'
>> +command. It indicates the first record after the last SYSLOG_ACTION
>> +_CLEAR, like issued by 'dmesg -c'.
> 
> What is that used for by the userspace tools?
> 

The clear_idx is used when dumping the dmesg log.

>> +
>> +log_next_idx
>> +============
>> +The index of the next record to store in the buffer 'log_buf'. It helps
>> +to compute the index of current strings position.
>> +
>> +printk_log
>> +==========
>> +The size of a structure 'printk_log'. It helps to compute the size of
>> +messages, and extract dmesg log.
> 
> What is the difference between that and log_buf?
> 

The printk_log is used to output human-readable text. It encapsulates the
header information for each record in log_buf, such as the timestamp, syslog
level, etc.
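
As a rough illustration of how the record header is used, the sketch below
steps from one record to the next via the len field, which is, as far as I
can tell, what the kernel's log_next() does. The function name is
hypothetical, len is assumed to be a 16-bit field located at
OFFSET(printk_log.len), and the dump is assumed to have the same endianness
as the host:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the index of the record following the one at idx.
 * log_buf points at a copy of the ring buffer and off_len is the
 * offset of the len field inside struct printk_log.
 * A record with len == 0 marks the wrap-around point: the next
 * record is then the one following the first record in the buffer.
 */
static uint32_t example_log_next(const uint8_t *log_buf, uint32_t idx,
				 size_t off_len)
{
	uint16_t len;

	memcpy(&len, log_buf + idx + off_len, sizeof(len));
	if (len == 0) {
		/* Wrapped: read the length of the record at offset 0. */
		memcpy(&len, log_buf + off_len, sizeof(len));
		return len;
	}
	return idx + len;
}
```

A tool would start at log_first_idx and repeat this until it reaches
log_next_idx.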

> 
> 
>> +
>> +(printk_log, ts_nsec|len|text_len|dict_len)
>> +===========================================
>> +It represents these field offsets in the structure 'printk_log'. User
>> +space tools can parse it and detect any changes to structure down the
>> +line.
> 
> What does that mean? "any changes down the line"?
> 

User-space tools can parse these offsets and check whether the layout of
printk_log's members has changed.

I will improve it in patch v4.


>> +
>> +(free_area.free_list, MIGRATE_TYPES)
>> +====================================
>> +The number of migrate types for pages. The free_list is divided into
>> +the array, it needs to know the number of the array.
> 
> ... for?
> 

Makedumpfile needs to know the number of elements in the array when it
computes the number of free pages.
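
For illustration, the exported sizes, offsets and array lengths are enough to
locate each free list head of a zone. This is a rough sketch only: the struct
and function names are made up for the example and are not makedumpfile code.

```c
#include <stdint.h>

/* Values a tool would parse from VMCOREINFO; the struct is hypothetical. */
struct example_layout {
	uint64_t offset_zone_free_area;      /* OFFSET(zone.free_area) */
	uint64_t size_free_area;             /* SIZE(free_area) */
	uint64_t offset_free_area_free_list; /* OFFSET(free_area.free_list) */
	uint64_t size_list_head;             /* SIZE(list_head) */
	unsigned int max_order;              /* LENGTH(zone.free_area) */
	unsigned int migrate_types;          /* LENGTH(free_area.free_list) */
};

/*
 * Address of zone->free_area[order].free_list[mt] in the dumped kernel,
 * or 0 if one of the indices is out of range.
 */
static uint64_t example_free_list_addr(const struct example_layout *l,
				       uint64_t zone_addr,
				       unsigned int order, unsigned int mt)
{
	if (order >= l->max_order || mt >= l->migrate_types)
		return 0;

	return zone_addr + l->offset_zone_free_area
	     + (uint64_t)order * l->size_free_area
	     + l->offset_free_area_free_list
	     + (uint64_t)mt * l->size_list_head;
}
```

Iterating order from 0 to MAX_ORDER - 1 and mt from 0 to MIGRATE_TYPES - 1
visits every list of free page blocks in the zone.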

>> +
>> +NR_FREE_PAGES
>> +=============
>> +On linux-2.6.21 or later, the number of free_pages is in
>> +vm_stat[NR_FREE_PAGES]. It can get the number of free pages from the
>> +array.
>> +
>> +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|
>> +PG_hwpoision|PG_head_mask
>> +=====================================================
>> +It means the attribute of a page. These flags will be used to filter
>> +the free pages.
>> +
>> +PAGE_BUDDY_MAPCOUNT_VALUE or ~PG_buddy
>> +======================================
>> +The 'PG_buddy' flag indicates that the page is free and in the buddy
>> +system. Makedumpfile can exclude the free pages managed by a buddy.
> 
> That text belongs with the one above?
> 

It exports the value of (~PG_buddy), so it is listed separately here.

>> +
>> +HUGETLB_PAGE_DTOR
>> +=================
>> +The 'HUGETLB_PAGE_DTOR' flag indicates the hugetlbfs pages. Makedumpfile
>> +will exclude these pages.
>> +
>> +================
>> +x86_64 variables
>> +================
>> +
>> +phys_base
>> +=========
>> +In x86_64, the 'phys_base' is necessary to convert virtual address of
>> +exported kernel symbol to physical address.
>> +
>> +init_top_pgt
>> +============
>> +The 'init_top_pgt' used to walk through the whole page table and convert
>> +virtual address to physical address.
> 
> This is the same as swapper_pg_dir?
> 

These two variables are somewhat similar, but they are used in different scenarios.

>> +
>> +pgtable_l5_enabled
>> +==================
>> +User-space tools need to know whether the crash kernel was in 5-level
>> +paging mode or not.
>> +
>> +node_data
>> +=========
>> +This is a struct 'pglist_data' array, it stores all numa nodes information.
>> +In general, Makedumpfile can get the pglist_data structure from symbol
>> +'node_data'.
>> +
>> +(node_data, MAX_NUMNODES)
>> +=========================
>> +The number of this 'node_data' array. It means the maximum number of the
>> +nodes in system.
>> +
>> +KERNELOFFSET
>> +============
>> +Randomize the address of the kernel image. This is the offset of KASLR in
>> +VMCOREINFO ELF notes. It is used to compute the page offset in x86_64. If
>> +KASLE is disabled, this value is zero.
>> +
>> +KERNEL_IMAGE_SIZE
>> +=================
>> +The size of 'KERNEL_IMAGE_SIZE', currently unused.
> 
> So remove?
> 

I'm not sure whether it should be removed, so I kept it.

>> +
>> +The old MODULES_VADDR need be decided by KERNEL_IMAGE_SIZE when kaslr
>> +enabled. Now MODULES_VADDR is not needed any more since Pratyush makes
>> +all VA to PA converting done by page table lookup.
> 
> Also, I did clean this up considerably - please include in your next
> version:
> 

Great, thanks for your help. I will post v4 later.

Regards,
Lianbo

> ---
> diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt
> index d71260bf383a..2ce34d952bfd 100644
> --- a/Documentation/kdump/vmcoreinfo.txt
> +++ b/Documentation/kdump/vmcoreinfo.txt
> @@ -1,18 +1,19 @@
>  ================================================================
> -		Documentation for VMCOREINFO
> +			VMCOREINFO
>  ================================================================
>  
>  =======================
>  What is the VMCOREINFO?
>  =======================
> -It is a special ELF note section. The VMCOREINFO contains the first
> -kernel's various information, for example, structure size, page size,
> -symbol values and field offset, etc. These data are packed into an ELF
> -note section, and these data will also help user-space tools(e.g. crash
> -makedumpfile) analyze the first kernel's memory usage.
> -
> -In general, makedumpfile can dump the VMCOREINFO contents from vmlinux
> -in the first kernel. For example:
> +
> +VMCOREINFO is a special ELF note section. It contains various
> +information from the kernel like structure size, page size, symbol
> +values, field offsets, etc. These data are packed into an ELF note
> +section and used by user-space tools like crash and makedumpfile to
> +analyze a kernel's memory layout.
> +
> +To dump the VMCOREINFO contents, one can do:
> +
>  # makedumpfile -g VMCOREINFO -x vmlinux
>  
>  ================
> @@ -20,123 +21,132 @@ Common variables
>  ================
>  
>  init_uts_ns.name.release
> -========================
> -The number of OS release. Based on this version number, people can find
> -the source code for the corresponding version. When analyzing the vmcore,
> -people must read the source code to find the reason why the kernel crashed.
> +------------------------
> +
> +The version of the Linux kernel. Used to find the corresponding source
> +code from which the kernel has been built.
>  
>  PAGE_SIZE
> -=========
> -The size of a page. It is the smallest unit of data for memory management
> -in kernel. It is usually 4k bytes and the page is aligned in 4k bytes,
> -which is very important for computing address.
> +---------
> +
> +The size of a page. It is the smallest unit of data for memory
> +management in kernel. It is usually 4096 bytes and a page is aligned on
> +4096 bytes. Used for computing page addresses.
>  
>  init_uts_ns
> -===========
> -This is the UTS namespace, which is used to isolate two specific elements
> -of the system that relate to the uname system call. The UTS namespace is
> -named after the data structure used to store information returned by the
> -uname system call.
> +-----------
> +
> +This is the UTS namespace, which is used to isolate two specific
> +elements of the system that relate to the uname(2) system call. The UTS
> +namespace is named after the data structure used to store information
> +returned by the uname(2) system call.
>  
> -User-space tools can get the kernel name, host name, kernel release number,
> -kernel version, architecture name and OS type from the 'init_uts_ns'.
> +User-space tools can get the kernel name, host name, kernel release
> +number, kernel version, architecture name and OS type from it.
>  
>  node_online_map
> -===============
> -It is a macro definition, actually it is an array node_states[N_ONLINE],
> -and it represents the set of online node in a system, one bit position
> -per node number.
> +---------------
>  
> -This is used to keep track of which nodes are in the system and online.
> +An array node_states[N_ONLINE] which represents the set of online nodes
> +in a system, one bit position per node number. Used to keep track of
> +which nodes are in the system and online.
>  
>  swapper_pg_dir
> -=============
> -It generally indicates the pgd for the kernel. When mmu is enabled in
> -config file, the 'swapper_pg_dir' is valid.
> +-------------
>  
> -The 'swapper_pg_dir' helps to translate the virtual address to a physical
> -address.
> +The global page directory pointer of the kernel. Used to translate
> +virtual to physical addresses.
>  
>  _stext
> -======
> -It is an assemble symbol that defines the beginning of the text section.
> -In general, the '_stext' indicates the kernel start address. This is used
> -to convert a virtual address to a physical address when the virtual address
> -does not belong to the 'vmalloc' address.
> +------
> +
> +Defines the beginning of the text section. In general, _stext indicates
> +the kernel start address. Used to convert a virtual address from the
> +direct kernel map to a physical address.
>  
>  vmap_area_list
> -==============
> -It stores the virtual area list, makedumpfile can get the vmalloc start
> +--------------
> +
> +Stores the virtual area list. makedumpfile can get the vmalloc start
>  value from this variable. This value is necessary for vmalloc translation.
>  
>  mem_map
> -=======
> -Physical addresses are translated to struct pages by treating them as an
> -index into the mem_map array. Shifting a physical address PAGE_SHIFT bits
> -to the right will treat it as a PFN from physical address 0, which is also
> -an index within the mem_map array.
> +-------
> +
> +Physical addresses are translated to struct pages by treating them as
> +an index into the mem_map array. Right-shifting a physical address
> +PAGE_SHIFT bits converts it into a page frame number which is an index
> +into that mem_map array.
>  
> -In short, it can map the address to struct page.
> +Used to map an address to the corresponding struct page.
>  
>  contig_page_data
> -================
> -Makedumpfile can get the pglist_data structure from this symbol
> -'contig_page_data'. The pglist_data structure is used to describe the
> -memory layout.
> +----------------
>  
> -User-space tools can use this symbols for excluding free pages.
> +Makedumpfile can get the pglist_data structure from this symbol, which
> +is used to describe the memory layout.
> +
> +User-space tools use this to exclude free pages when dumping memory.
>  
>  mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
> -==========================================================================
> -Export the address of 'mem_section' array, and it's length, structure size,
> -and the 'section_mem_map' offset.
> +--------------------------------------------------------------------------
> +
> +The address of the mem_section array, its length, structure size, and
> +the section_mem_map offset.
>  
>  It exists in the sparse memory mapping model, and it is also somewhat
> -similar to the mem_map variable, both of them will help to translate
> -the address.
> +similar to the mem_map variable, both of them are used to translate an
> +address.
>  
>  page
> -====
> -The size of a 'page' structure. In kernel, the page is an important data
> -structure, it is widely used to compute the continuous memory.
> +----
> +
> +The size of a page structure. struct page is an important data structure
> +and it is widely used to compute the contiguous memory.
>  
>  pglist_data
> -===========
> -The size of a 'pglist_data' structure. This value will be used to check if
> -the 'pglist_data' structure is valid. It is also one of the conditions for
> -checking the memory type.
> +-----------
> +
> +The size of a pglist_data structure. This value will be used to check
> +if the pglist_data structure is valid. It is also used for checking the
> +memory type.
>  
>  zone
> -====
> -The size of a 'zone' structure. This value is often used to check if the
> -'zone' structure is found. It is necessary structures for excluding free
> -pages.
> +----
> +
> +The size of a zone structure. This value is often used to check if the
> +zone structure has been found. It is also used for excluding free pages.
>  
>  free_area
> -=========
> -The size of a 'free_area' structure. It indicates whether the 'free_area'
> -structure is valid or not. This is useful for excluding free pages.
> +---------
> +
> +The size of a free_area structure. It indicates whether the free_area
> +structure is valid or not. Useful for excluding free pages.
>  
>  list_head
> -=========
> -The size of a 'list_head' structure. It depends on this value when
> -iterating the free list.
> +---------
> +
> +The size of a list_head structure. Used when iterating lists in a
> +post-mortem analysis session.
>  
>  nodemask_t
> -==========
> -The size of a 'nodemask_t' type. This value is used to compute the number
> +----------
> +
> +The size of a nodemask_t type. This value is used to compute the number
>  of online nodes.
>  
>  (page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
>         compound_order|compound_head)
> -===================================================================
> +-------------------------------------------------------------------
> +
>  User-space tools can compute their values based on the offset of these
>  variables. The variables are helpful to exclude unnecessary pages.
>  
>  (pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|node_
>                spanned_pages|node_id)
> -===================================================================
> -On NUMA machines, each NUMA node has a pg_data_t to describe it's memory
> +-------------------------------------------------------------------
> +
> +On NUMA machines, each NUMA node has a pg_data_t to describe its memory
>  layout. On UMA machines there is a single pglist_data which describes the
>  whole memory.
>  
> @@ -144,16 +154,18 @@ These values are used to check the memory type, and they are also helpful
>  to compute the virtual address for memory map.
>  
>  (zone, free_area|vm_stat|spanned_pages)
> -=======================================
> -Each node is divided up into a number of blocks called zones which
> +---------------------------------------
> +
> +Each node is divided into a number of blocks called zones which
>  represent ranges within memory. A zone is described by a structure zone.
>  Each zone type is suitable for a different type of usage.
>  
> -User-space tools can compute their values based on the offset of these
> +User-space tools can compute required values based on the offset of these
>  variables.
>  
>  (free_area, free_list)
> -======================
> +----------------------
> +
>  Offset of the free_list's member. This value is used to compute the number
>  of free pages.
>  
> @@ -162,295 +174,325 @@ The fields in this structure are simple, the free_list represents a linked
>  list of free page blocks.
>  
>  (list_head, next|prev)
> -======================
> -Offsets of the list_head's members. In general, the list_head is used to
> -define a circular linked list. User-space tools often need to traverse
> -the lists to get specific pages.
> +----------------------
> +
> +Offsets of the list_head's members. list_head is used to define a
> +circular linked list. User-space tools need these in order to traverse
> +lists.
>  
>  (vmap_area, va_start|list)
> -==========================
> +--------------------------
> +
>  Offsets of the vmap_area's members. They indicate the vmalloc layer
> -information. Makedumpfile can get the start address of vmalloc region.
> +information. Makedumpfile gets the start address of the vmalloc region.
>  
>  (zone.free_area, MAX_ORDER)
> -===========================
> +---------------------------
> +
>  It indicates the maximum number of the array free_area. This macro is
> -used to the zone buddy allocator. User-space tools use this value to
> +used by the zone buddy allocator. User-space tools use this value to
>  iterate the free_area.
>  
>  log_buf
> -=======
> -In general, console output is written to the ring buffer 'log_buf' at
> -index 'log_first_idx'. It can get kernel log from the log_buf.
> +-------
> +
> +Console output is written to the ring buffer log_buf at index
> +log_first_idx. Used to get the kernel log.
>  
>  log_buf_len
> -===========
> -Length of a 'log_buf'. Makedumpfile can read the number of strings
> -from the log_buf.
> +-----------
>  
> -log_first_idx
> -=============
> -Index of the first record stored in the buffer 'log_buf'. This value
> -tells the user-space tools the place where to read the strings in the
> +Length of a log_buf. Used to read the number of strings from the
>  log_buf.
>  
> +log_first_idx
> +-------------
> +
> +Index of the first record stored in the buffer log_buf. Used by
> +user-space tools to read the strings in the log_buf.
> +
>  clear_idx
> -=========
> -The index that the next printk record to read after the last 'clear'
> +---------
> +
> +The index of the next printk() record to read after the last clear
>  command. It indicates the first record after the last SYSLOG_ACTION
>  _CLEAR, like issued by 'dmesg -c'.
>  
>  log_next_idx
> -============
> -The index of the next record to store in the buffer 'log_buf'. It helps
> -to compute the index of current strings position.
> +------------
> +
> +The index of the next record to store in the buffer log_buf. Used to
> +compute the index of the current string position.
>  
>  printk_log
> -==========
> -The size of a structure 'printk_log'. It helps to compute the size of
> +----------
> +
> +The size of a structure printk_log. Used to compute the size of
>  messages, and extract dmesg log.
>  
>  (printk_log, ts_nsec|len|text_len|dict_len)
> -===========================================
> -It represents these field offsets in the structure 'printk_log'. User
> -space tools can parse it and detect any changes to structure down the
> -line.
> +-------------------------------------------
> +
> +It represents field offsets in struct printk_log. User space tools can
> +parse it and detect any changes to structure down the line.
>  
>  (free_area.free_list, MIGRATE_TYPES)
> -====================================
> +------------------------------------
> +
>  The number of migrate types for pages. The free_list is divided into
>  the array, it needs to know the number of the array.
>  
>  NR_FREE_PAGES
> -=============
> +-------------
> +
>  On linux-2.6.21 or later, the number of free_pages is in
> -vm_stat[NR_FREE_PAGES]. It can get the number of free pages from the
> -array.
> +vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
>  
>  PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|
>  PG_hwpoision|PG_head_mask
> -=====================================================
> -It means the attribute of a page. These flags will be used to filter
> -the free pages.
> +-----------------------------------------------------
> +
> +Page attributes. These flags are used to filter free pages.
>  
>  PAGE_BUDDY_MAPCOUNT_VALUE or ~PG_buddy
> -======================================
> -The 'PG_buddy' flag indicates that the page is free and in the buddy
> +--------------------------------------
> +
> +The PG_buddy flag indicates that the page is free and in the buddy
>  system. Makedumpfile can exclude the free pages managed by a buddy.
>  
>  HUGETLB_PAGE_DTOR
> -=================
> -The 'HUGETLB_PAGE_DTOR' flag indicates the hugetlbfs pages. Makedumpfile
> -will exclude these pages.
> +-----------------
>  
> -================
> -x86_64 variables
> -================
> +The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
> +excludes these pages.
> +
> +======
> +x86_64
> +======
>  
>  phys_base
> -=========
> -In x86_64, the 'phys_base' is necessary to convert virtual address of
> -exported kernel symbol to physical address.
> +---------
> +
> +Used to convert the virtual address of an exported kernel symbol to its
> +physical address.
>  
>  init_top_pgt
> -============
> -The 'init_top_pgt' used to walk through the whole page table and convert
> -virtual address to physical address.
> +------------
> +
> +Used to walk through the whole page table and convert virtual addresses
> +to physical addresses.
>  
>  pgtable_l5_enabled
> -==================
> +------------------
> +
>  User-space tools need to know whether the crash kernel was in 5-level
> -paging mode or not.
> +paging mode.
>  
>  node_data
> -=========
> -This is a struct 'pglist_data' array, it stores all numa nodes information.
> -In general, Makedumpfile can get the pglist_data structure from symbol
> -'node_data'.
> +---------
> +
> +This is a struct pglist_data array and stores all numa nodes
> +information. Makedumpfile gets the pglist_data structure from it.
>  
>  (node_data, MAX_NUMNODES)
> -=========================
> -The number of this 'node_data' array. It means the maximum number of the
> -nodes in system.
> +-------------------------
> +
> +The maximum number of the nodes in system.
>  
>  KERNELOFFSET
> -============
> -Randomize the address of the kernel image. This is the offset of KASLR in
> -VMCOREINFO ELF notes. It is used to compute the page offset in x86_64. If
> -KASLE is disabled, this value is zero.
> +------------
> +
> +The kernel randomization offset. Used to compute the page offset. If
> +KASLR is disabled, this value is zero.
>  
>  KERNEL_IMAGE_SIZE
> -=================
> -The size of 'KERNEL_IMAGE_SIZE', currently unused.
> +-----------------
>  
> -The old MODULES_VADDR need be decided by KERNEL_IMAGE_SIZE when kaslr
> -enabled. Now MODULES_VADDR is not needed any more since Pratyush makes
> -all VA to PA converting done by page table lookup.
> +Currently unused.
>  
>  PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
> -========================================
> -The value of 'PG_offline' flag can be used for marking pages as logically
> -offline. Makedumpfile can directly skip pages that are logically offline.
> +----------------------------------------
> +
> +The value of PG_offline flag can be used for marking pages as logically
> +offline. Makedumpfile skips pages that are logically offline.
>  
>  sme_mask
> -========
> -For AMD machine with SME feature, it indicates the secure memory encryption
> -mask. Makedumpfile tools need to know whether the crash kernel was encrypted
> -or not. If SME is enabled in the first kernel, the crash kernel's page
> -table(pgd/pud/pmd/pte) contains the memory encryption mask, so need to
> -remove the sme mask to obtain the true physical address.
> +--------
>  
> -=============
> -x86 variables
> -=============
> +For AMD machine with SME feature, it indicates the secure memory
> +encryption mask. Makedumpfile tools need to know whether the crash
> +kernel was encrypted. If SME is enabled in the first kernel, the crash
> +kernel's page table (pgd/pud/pmd/pte) contains the memory encryption
> +mask and this is used to remove the SME mask to obtain the true physical
> +address.
> +
> +======
> +x86_32
> +======
>  
>  X86_PAE
> -=======
> -It means the physical address extension. It has the cost of more
> -page table lookup overhead, and also consumes more page table space
> -per process. This flag will be used to check whether the PAE was
> -enabled in crash kernel or not when converting virtual address to
> -physical address.
> +-------
>  
> -==============
> -ia64 variables
> -==============
> +Denotes whether physical address extensions are enabled. It has the cost
> +of more page table lookup overhead, and also consumes more page table
> +space per process. Used to check whether PAE was enabled in the crash
> +kernel when converting virtual addresses to physical addresses.
> +
> +====
> +ia64
> +====
>  
>  pgdat_list|(pgdat_list, MAX_NUMNODES)
> -=====================================
> -This is a struct 'pg_data_t' array, it stores all numa nodes information.
> -And the 'MAX_NUMNODES' indicates the number of the nodes.
> +-------------------------------------
> +
> +pg_data_t array storing all numa nodes information. MAX_NUMNODES
> +indicates the number of the nodes.
>  
>  node_memblk|(node_memblk, NR_NODE_MEMBLKS)
> -==========================================
> +------------------------------------------
> +
>  List of node memory chunks. Filled when parsing SRAT table to obtain
> -information about memory nodes. The 'NR_NODE_MEMBLKS' indicates the number
> +information about memory nodes. NR_NODE_MEMBLKS indicates the number
>  of node memory chunks.
>  
> -These values are used to compute the number of nodes in crash kernel.
> +These values are used to compute the number of nodes in the crash kernel.
>  
>  node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size)
> -================================================================
> -The size of a struct 'node_memblk_s', and the offsets of the
> -node_memblk_s's members. It helps to compute the number of nodes.
> +----------------------------------------------------------------
> +
> +The size of a struct node_memblk_s and the offsets of the
> +node_memblk_s's members. Used to compute the number of nodes.
>  
>  PGTABLE_3|PGTABLE_4
> -===================
> +-------------------
> +
>  User-space tools need to know whether the crash kernel was in 3-level or
> -4-level paging mode. This flag can help to distinguish the page table.
> +4-level paging mode. Used to distinguish the page table.
>  
> -===============
> -arm64 variables
> -===============
> +=====
> +ARM64
> +=====
>  
>  VA_BITS
> -=======
> -The maximum number of bits for virtual addresses. This value helps to
> -compute the virtual memory ranges.
> +-------
> +
> +The maximum number of bits for virtual addresses. Used to compute the
> +virtual memory ranges.
>  
>  kimage_voffset
> -==============
> -The offset between the kernel virtual and physical mappings. This value
> -helps to translate virtual address to physical address.
> +--------------
> +
> +The offset between the kernel virtual and physical mappings. Used to
> +translate virtual to physical addresses.
>  
>  PHYS_OFFSET
> -===========
> -It indicates the physical address of the start of memory. It is similar
> -with the kimage_voffset, which is used to translate virtual address to
> -physical address.
> +-----------
> +
> +Indicates the physical address of the start of memory. Similar to
> +kimage_voffset, which is used to translate virtual address to physical
> +address.
>  
>  KERNELOFFSET
> -============
> -It is similar to x86_64.
> +------------
> +
> +The kernel randomization offset. Used to compute the page offset. If
> +KASLR is disabled, this value is zero.
>  
>  =============
>  arm variables
>  =============
>  
>  ARM_LPAE
> -========
> -It indicates whether the crash kernel support the large physical address
> -extension. This value will tell you how to translate virtual address to
> -physical address.
> +--------
>  
> -==============
> -s390 variables
> -==============
> +It indicates whether the crash kernel supports large physical address
> +extensions. Used to translate virtual address to physical address.
> +
> +====
> +s390
> +====
>  
>  lowcore_ptr
> -==========
> -An array with a pointer to the lowcore of every CPU. This value
> -helps to print the psw and all registers information.
> +----------
> +
> +An array with a pointer to the lowcore of every CPU. Used to print the
> +psw and all registers information.
>  
>  high_memory
> -===========
> -It can get the vmalloc_start address from the high_memory symbol.
> +-----------
> +
> +Used to get the vmalloc_start address from the high_memory symbol.
>  
>  (lowcore_ptr, NR_CPUS)
> -======================
> -The maximum number of cpus.
> +----------------------
>  
> -TODO.
> +The maximum number of CPUs.
> +
> +=======
> +powerpc
> +=======
>  
> -powerpc variables
> -=================
>  
>  node_data|(node_data, MAX_NUMNODES)
> -===================================
> -Please refer to common variables.
> +-----------------------------------
> +
> +See above.
>  
>  contig_page_data
> -================
> -Please refer to common variables.
> +----------------
> +
> +See above.
>  
>  vmemmap_list
> -============
> -The 'vmemmap_list' maintains the entire vmemmap physical mapping. It
> -can get vmemmap list count and populate vmemmap regions info. If the
> -vmemmap address translation information is stored in crash kernel,
> -which helps to translate vmemmap kernel virtual addresses.
> +------------
> +
> +The vmemmap_list maintains the entire vmemmap physical mapping. It can
> +get vmemmap list count and populate vmemmap regions info. If the vmemmap
> +address translation information is stored in the crash kernel, it helps
> +to translate vmemmap kernel virtual addresses.
>  
>  mmu_vmemmap_psize
> -=================
> -The size of a page. It will try to use this page sizes for vmemmap if
> -support. This value helps to translate virtual address to physical
> -address.
> +-----------------
> +
> +The size of a page. Used to translate virtual addresses to physical
> +addresses.
>  
>  mmu_psize_defs
> -==============
> -It stores a variety of pages, such as the page size is 4k, 64k, or 16M.
> +--------------
>  
> -It depends on this value when making vtop translations.
> +Page size definitions, i.e. 4k, 64k, or 16M.
> +
> +Used to make vtop translations.
>  
>  vmemmap_backing|(vmemmap_backing, list)|(vmemmap_backing, phys)|
>  (vmemmap_backing, virt_addr)
> -================================================================
> +----------------------------------------------------------------
> +
>  The vmemmap virtual address space management does not have a traditional
>  page table to track which virtual struct pages are backed by physical
>  mapping. The virtual to physical mappings are tracked in a simple linked
>  list format.
>  
> -And user-space tools need to know the offset of 'list', 'phys' and
> -'virt_addr'. It depends on these values when computing the count of
> -vmemmap regions.
> +User-space tools need to know the offset of list, phys and virt_addr
> +when computing the count of vmemmap regions.
>  
>  mmu_psize_def|(mmu_psize_def, shift)
> -====================================
> -The size of a struct 'mmu_psize_def', and the offset of mmu_psize_def's
> +------------------------------------
> +
> +The size of a struct mmu_psize_def and the offset of mmu_psize_def's
>  member.
>  
> -These values help to make the vtop translations.
> +Used in vtop translations.
>  
> -============
> -sh variables
> -============
> +==
> +sh
> +==
>  
>  node_data|(node_data, MAX_NUMNODES)
> -===================================
> -It is similar to X86_64, please refer to above description.
> +-----------------------------------
> +
> +See above.
>  
>  X2TLB
> -=====
> -It indicates whether the crash kernel enables the extended mode of the SH.
> +-----
>  
> -TODO.
> +Indicates whether the crash kernel enables SH extended mode.
> 
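
For readers following the mem_map entry in the patch above: the description
says a physical address is right-shifted by PAGE_SHIFT to obtain a page frame
number, which then indexes mem_map. Below is a minimal user-space sketch of
that arithmetic. It assumes a flat memory model in which mem_map[0] describes
PFN 0, and that the page size and the size of struct page have already been
read from VMCOREINFO. The helper names page_shift(), paddr_to_pfn() and
pfn_to_page_vaddr() are hypothetical and are not taken from makedumpfile or
crash.

```c
#include <stdint.h>

/* Derive PAGE_SHIFT from the exported page size (assumed a power of two). */
static unsigned int page_shift(uint64_t page_size)
{
	unsigned int shift = 0;

	while (page_size > 1) {
		page_size >>= 1;
		shift++;
	}
	return shift;
}

/* Right-shift a physical address by PAGE_SHIFT to get its PFN. */
static uint64_t paddr_to_pfn(uint64_t paddr, uint64_t page_size)
{
	return paddr >> page_shift(page_size);
}

/*
 * Virtual address of the struct page for a PFN, assuming a flat mem_map
 * whose first element describes PFN 0. size_of_page is the exported size
 * of struct page.
 */
static uint64_t pfn_to_page_vaddr(uint64_t mem_map, uint64_t pfn,
				  uint64_t size_of_page)
{
	return mem_map + pfn * size_of_page;
}
```

With a 4096-byte page, physical address 0x12345678 maps to PFN 0x12345. The
sparse memory model (mem_section) needs an extra lookup step that this sketch
does not cover.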


* Re: [PATCH 2/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo
  2018-12-17 13:01   ` Borislav Petkov
@ 2018-12-18  7:34     ` lijiang
  0 siblings, 0 replies; 14+ messages in thread
From: lijiang @ 2018-12-18  7:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

On 2018-12-17 21:01, Borislav Petkov wrote:
> On Sun, Dec 16, 2018 at 09:16:17PM +0800, Lianbo Jiang wrote:
>> For AMD machine with SME feature, makedumpfile tools need to know
>> whether the crash kernel was encrypted or not. If SME is enabled
>> in the first kernel, the crash kernel's page table(pgd/pud/pmd/pte)
>> contains the memory encryption mask, so need to remove the sme mask
>> to obtain the true physical address.
>>
>> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
>> ---
>>  arch/x86/kernel/machine_kexec_64.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 4c8acdfdc5a7..1860fe24117d 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -352,10 +352,24 @@ void machine_kexec(struct kimage *image)
>>  
>>  void arch_crash_save_vmcoreinfo(void)
>>  {
>> +	u64 sme_mask = sme_me_mask;
>> +
>>  	VMCOREINFO_NUMBER(phys_base);
>>  	VMCOREINFO_SYMBOL(init_top_pgt);
>>  	vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
>>  			pgtable_l5_enabled());
>> +	/*
>> +	 * Currently, the local variable 'sme_mask' stores the value of
>> +	 * sme_me_mask(bit 47), and also write the value of sme_mask to
>> +	 * the vmcoreinfo.
>> +	 * If need, the bit(sme_mask) might be redefined in the future,
>> +	 * but the 'bit63' will be reserved.
>> +	 * For example:
>> +	 * [ misc	   ][ enc bit  ][ other misc SME info       ]
>> +	 * 0000_0000_0000_0000_1000_0000_0000_0000_0000_0000_..._0000
>> +	 * 63   59   55   51   47   43   39   35   31   27   ... 3
>> +	 */
> 
> This text belongs into the document.
> 
OK, I will move it into the VMCOREINFO document.

Thanks.
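
To illustrate why user space needs the mask: when SME is active, page table
entries carry the encryption bit inside the physical address field, so a tool
has to clear it before using the address. The following is only a sketch under
stated assumptions, not the makedumpfile implementation. It assumes x86_64
page table entries whose physical address occupies bits 12..51, and a
sme_mask value already read from VMCOREINFO (bit 47 in the comment quoted
above). PTE_PFN_MASK and pte_to_paddr() are illustrative names.

```c
#include <stdint.h>

/* Assumption: bits 12..51 of an x86_64 page table entry hold the address. */
#define PTE_PFN_MASK 0x000ffffffffff000ULL

/*
 * Strip the flag bits and the SME encryption bit from a page table entry
 * to obtain the physical address. With sme_mask == 0 (SME disabled) only
 * the flag bits are removed.
 */
static uint64_t pte_to_paddr(uint64_t pte, uint64_t sme_mask)
{
	return pte & PTE_PFN_MASK & ~sme_mask;
}
```

For example, with sme_mask equal to 1ULL << 47, the entry 0x0000800001234063
yields the physical address 0x1234000.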


* Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation
  2018-12-18  7:31     ` lijiang
@ 2018-12-18 11:41       ` Borislav Petkov
  2018-12-26  3:24       ` Dave Young
  1 sibling, 0 replies; 14+ messages in thread
From: Borislav Petkov @ 2018-12-18 11:41 UTC (permalink / raw)
  To: lijiang
  Cc: linux-kernel, kexec, tglx, mingo, x86, akpm, bhe, dyoung, linux-doc

On Tue, Dec 18, 2018 at 03:31:32PM +0800, lijiang wrote:
> The printk_log is used to output human readable text, it will encapsulate header
> information for log_buf, such as timestamp, syslog level, etc.

Me asking those questions is supposed to hint that the explanations need
improvement. But you get the idea...

> >> +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|
> >> +PG_hwpoision|PG_head_mask
> >> +=====================================================
> >> +It means the attribute of a page. These flags will be used to filter
> >> +the free pages.
> >> +
> >> +PAGE_BUDDY_MAPCOUNT_VALUE or ~PG_buddy
> >> +======================================
> >> +The 'PG_buddy' flag indicates that the page is free and in the buddy
> >> +system. Makedumpfile can exclude the free pages managed by a buddy.
> > 
> > That text belongs with the one above?
> > 
> It exported the value of (~PG_buddy), so it is placed here independently.

Then make that obvious in the description. The one above talks about the
PG flags and this one should talk about PAGE_BUDDY_MAPCOUNT_VALUE and
what it is used for. The fact that it is computed by negating PG_buddy
is an implementation detail.

> These two variables are somewhat similar, but they are used in
> different scenarios.

Those different scenarios need to be part of the description.

> >> +KERNEL_IMAGE_SIZE
> >> +=================
> >> +The size of 'KERNEL_IMAGE_SIZE', currently unused.
> > 
> > So remove?
> > 
> 
> I'm not sure whether it should be removed, so i keep it.

If it is unused, it should be removed as a VMCOREINFO export and from
the docs. But that can be done later, as a separate patch.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation
  2018-12-18  7:31     ` lijiang
  2018-12-18 11:41       ` Borislav Petkov
@ 2018-12-26  3:24       ` Dave Young
  2018-12-26  3:36         ` Dave Young
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Young @ 2018-12-26  3:24 UTC (permalink / raw)
  To: lijiang
  Cc: Borislav Petkov, bhe, linux-doc, x86, kexec, linux-kernel, mingo,
	tglx, akpm

> >> +
> >> +KERNEL_IMAGE_SIZE
> >> +=================
> >> +The size of 'KERNEL_IMAGE_SIZE', currently unused.
> > 
> > So remove?
> > 
> 
> I'm not sure whether it should be removed, so i keep it.

Just remove it. It was added by Baoquan for KASLR issues; later,
makedumpfile reverted the userspace part and added another implementation.

If an old makedumpfile does not support a new kernel, it has a list of
supported kernel versions in its code, so there is no need to worry about
compatibility.

Thanks
Dave


* Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation
  2018-12-26  3:24       ` Dave Young
@ 2018-12-26  3:36         ` Dave Young
  2018-12-26  6:14           ` lijiang
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Young @ 2018-12-26  3:36 UTC (permalink / raw)
  To: lijiang
  Cc: Borislav Petkov, bhe, linux-doc, x86, kexec, linux-kernel, mingo,
	tglx, akpm, anderson, Kazuhito Hagio

On 12/26/18 at 11:24am, Dave Young wrote:
> > >> +
> > >> +KERNEL_IMAGE_SIZE
> > >> +=================
> > >> +The size of 'KERNEL_IMAGE_SIZE', currently unused.
> > > 
> > > So remove?
> > > 
> > 
> > I'm not sure whether it should be removed, so i keep it.
> 
> Just remove it.  It was added by Baoquan for KASLR issues, later
> makedumpfile reverted the userspace part and added other implementation.
> 
> In case old makedumpfile does not support new kernel, it has some kernel
> versions support list in code, thus no worry about the compatibility
> issue.

Ah, it is actually not unused. In a clone of the crash tool git repository:
$ git grep KERNEL_IMAGE_SIZE
x86_64.c:               if ((string = pc->read_vmcoreinfo("NUMBER(KERNEL_IMAGE_SIZE)"))) {

So in the documentation, the use cases of crash tool should also be
covered.
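
As a rough illustration of what such a consumer does, here is a minimal
sketch of looking up a "NUMBER(name)=value" line in the VMCOREINFO text. The
line format follows the vmcoreinfo_append_str() call quoted earlier in this
thread. The helper vmcoreinfo_number() is hypothetical; it is not the crash
tool's read_vmcoreinfo() and makes no claim about how crash parses the note.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find "NUMBER(<name>)=<decimal>" at the start of a
 * line in a VMCOREINFO text blob. Returns 0 and stores the value in *out
 * on success, -1 if the key is missing or has no numeric value.
 */
static int vmcoreinfo_number(const char *text, const char *name,
			     long long *out)
{
	char key[128];
	const char *p = text;
	size_t klen;
	int n;

	n = snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	if (n < 0 || (size_t)n >= sizeof(key))
		return -1;
	klen = (size_t)n;

	while (p && *p) {
		/* Only match the key at the beginning of a line. */
		if (strncmp(p, key, klen) == 0) {
			char *end;
			long long v = strtoll(p + klen, &end, 10);

			if (end == p + klen)
				return -1;	/* no digits after '=' */
			*out = v;
			return 0;
		}
		/* Advance to the character after the next newline. */
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}
```

A caller would pass the note contents and the name, for example
vmcoreinfo_number(info, "KERNEL_IMAGE_SIZE", &size), and fall back to a
built-in default when the function returns -1.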

Lianbo, it would be good to cc Dave and Kazu on these patches. Could
you cc them in your next post?

> 
> Thanks
> Dave

Thanks
Dave

* Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation
  2018-12-26  3:36         ` Dave Young
@ 2018-12-26  6:14           ` lijiang
  0 siblings, 0 replies; 14+ messages in thread
From: lijiang @ 2018-12-26  6:14 UTC (permalink / raw)
  To: Dave Young
  Cc: Borislav Petkov, bhe, linux-doc, x86, kexec, linux-kernel, mingo,
	tglx, akpm, anderson, Kazuhito Hagio

On 2018-12-26 11:36, Dave Young wrote:
> On 12/26/18 at 11:24am, Dave Young wrote:
>>>>> +
>>>>> +KERNEL_IMAGE_SIZE
>>>>> +=================
>>>>> +The size of 'KERNEL_IMAGE_SIZE', currently unused.
>>>>
>>>> So remove?
>>>>
>>>
>>> I'm not sure whether it should be removed, so i keep it.
>>
>> Just remove it.  It was added by Baoquan for KASLR issues, later
>> makedumpfile reverted the userspace part and added other implementation.
>>
>> In case old makedumpfile does not support new kernel, it has some kernel
>> versions support list in code, thus no worry about the compatibility
>> issue.
> 
> Ah, it is not unused actually, clone crash tool git:
> $ git grep KERNEL_IMAGE_SIZE
> x86_64.c:               if ((string = pc->read_vmcoreinfo("NUMBER(KERNEL_IMAGE_SIZE)"))) {
> 
> So in the documentation, the use cases of crash tool should also be
> covered.
> 

Sure, perhaps this is the only one that was overlooked.

I will improve the description of this variable in the documentation.

> Lianbo, it would be good to cc Dave and Kazu for these patches, could
> you cc them in your next post?
> 

Yes, I will add Dave and Kazu, and also resend the patch as v4.

Thanks.

>>
>> Thanks
>> Dave
> 
> Thanks
> Dave
> 

end of thread, other threads:[~2018-12-26  6:14 UTC | newest]

Thread overview: 14+ messages
2018-12-16 13:16 [PATCH 0/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo Lianbo Jiang
2018-12-16 13:16 ` [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation Lianbo Jiang
2018-12-17 11:52   ` Borislav Petkov
2018-12-17 12:12   ` Borislav Petkov
2018-12-17 13:00   ` Borislav Petkov
2018-12-18  7:31     ` lijiang
2018-12-18 11:41       ` Borislav Petkov
2018-12-26  3:24       ` Dave Young
2018-12-26  3:36         ` Dave Young
2018-12-26  6:14           ` lijiang
2018-12-16 13:16 ` [PATCH 2/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo Lianbo Jiang
2018-12-17 13:01   ` Borislav Petkov
2018-12-18  7:34     ` lijiang
2018-12-17 11:54 ` [PATCH 0/2 " Borislav Petkov
