linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lianbo Jiang <lijiang@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: kexec@lists.infradead.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, akpm@linux-foundation.org,
	bhe@redhat.com, dyoung@redhat.com, k-hagio@ab.jp.nec.com,
	anderson@redhat.com, linux-doc@vger.kernel.org
Subject: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation
Date: Thu, 10 Jan 2019 20:19:43 +0800	[thread overview]
Message-ID: <20190110121944.6050-2-lijiang@redhat.com> (raw)
In-Reply-To: <20190110121944.6050-1-lijiang@redhat.com>

This document lists some variables that export to vmcoreinfo, and briefly
describles what these variables indicate. It should be instructive for
many people who do not know the vmcoreinfo.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 Documentation/kdump/vmcoreinfo.txt | 500 +++++++++++++++++++++++++++++
 1 file changed, 500 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index 000000000000..8e444586b87b
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,500 @@
+================================================================
+			VMCOREINFO
+================================================================
+
+=======================
+What is the VMCOREINFO?
+=======================
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+================
+Common variables
+================
+
+init_uts_ns.name.release
+------------------------
+
+The version of the Linux kernel. Used to find the corresponding source
+code from which the kernel has been built.
+
+PAGE_SIZE
+---------
+
+The size of a page. It is the smallest unit of data for memory
+management in kernel. It is usually 4096 bytes and a page is aligned
+on 4096 bytes. Used for computing page addresses.
+
+init_uts_ns
+-----------
+
+This is the UTS namespace, which is used to isolate two specific
+elements of the system that relate to the uname(2) system call. The UTS
+namespace is named after the data structure used to store information
+returned by the uname(2) system call.
+
+User-space tools can get the kernel name, host name, kernel release
+number, kernel version, architecture name and OS type from it.
+
+node_online_map
+---------------
+
+An array node_states[N_ONLINE] which represents the set of online node
+in a system, one bit position per node number. Used to keep track of
+which nodes are in the system and online.
+
+swapper_pg_dir
+-------------
+
+The global page directory pointer of the kernel. Used to translate
+virtual to physical addresses.
+
+_stext
+------
+
+Defines the beginning of the text section. In general, _stext indicates
+the kernel start address. Used to convert a virtual address from the
+direct kernel map to a physical address.
+
+vmap_area_list
+--------------
+
+Stores the virtual area list. makedumpfile can get the vmalloc start
+value from this variable. This value is necessary for vmalloc translation.
+
+mem_map
+-------
+
+Physical addresses are translated to struct pages by treating them as
+an index into the mem_map array. Right-shifting a physical address
+PAGE_SHIFT bits converts it into a page frame number which is an index
+into that mem_map array.
+
+Used to map an address to the corresponding struct page.
+
+contig_page_data
+----------------
+
+Makedumpfile can get the pglist_data structure from this symbol, which
+is used to describe the memory layout.
+
+User-space tools use this to exclude free pages when dumping memory.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+--------------------------------------------------------------------------
+
+The address of the mem_section array, its length, structure size, and
+the section_mem_map offset.
+
+It exists in the sparse memory mapping model, and it is also somewhat
+similar to the mem_map variable, both of them are used to translate an
+address.
+
+page
+----
+
+The size of a page structure. struct page is an important data structure
+and it is widely used to compute the contiguous memory.
+
+pglist_data
+-----------
+
+The size of a pglist_data structure. This value will be used to check
+if the pglist_data structure is valid. It is also used for checking the
+memory type.
+
+zone
+----
+
+The size of a zone structure. This value is often used to check if the
+zone structure has been found. It is also used for excluding free pages.
+
+free_area
+---------
+
+The size of a free_area structure. It indicates whether the free_area
+structure is valid or not. Useful for excluding free pages.
+
+list_head
+---------
+
+The size of a list_head structure. Used when iterating lists in a
+post-mortem analysis session.
+
+nodemask_t
+----------
+
+The size of a nodemask_t type. Used to compute the number of online
+nodes.
+
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
+       compound_order|compound_head)
+-------------------------------------------------------------------
+
+User-space tools can compute their values based on the offset of these
+variables. The variables are helpful to exclude unnecessary pages.
+
+(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|node_
+              spanned_pages|node_id)
+-------------------------------------------------------------------
+
+On NUMA machines, each NUMA node has a pg_data_t to describe its memory
+layout. On UMA machines there is a single pglist_data which describes the
+whole memory.
+
+These values are used to check the memory type, and they are also helpful
+to compute the virtual address for memory map.
+
+(zone, free_area|vm_stat|spanned_pages)
+---------------------------------------
+
+Each node is divided into a number of blocks called zones which
+represent ranges within memory. A zone is described by a structure zone.
+Each zone type is suitable for a different type of usage.
+
+User-space tools can compute required values based on the offset of these
+variables.
+
+(free_area, free_list)
+----------------------
+
+Offset of the free_list's member. This value is used to compute the number
+of free pages.
+
+Each zone has a free_area structure array called free_area[MAX_ORDER].
+The fields in this structure are simple, the free_list represents a linked
+list of free page blocks.
+
+(list_head, next|prev)
+----------------------
+
+Offsets of the list_head's members. list_head is used to define a
+circular linked list. User-space tools need these in order to traverse
+lists.
+
+(vmap_area, va_start|list)
+--------------------------
+
+Offsets of the vmap_area's members. They indicate the vmalloc layer
+information. Makedumpfile gets the start address of the vmalloc region.
+
+(zone.free_area, MAX_ORDER)
+---------------------------
+
+It indicates the maximum number of the array free_area. This macro is
+used by the zone buddy allocator. User-space tools use this value to
+iterate the free_area.
+
+log_buf
+-------
+
+Console output is written to the ring buffer log_buf at index
+log_first_idx. Used to get the kernel log.
+
+log_buf_len
+-----------
+
+Length of a log_buf. Used to read the number of strings from the
+log_buf.
+
+log_first_idx
+-------------
+
+Index of the first record stored in the buffer log_buf. Used by
+user-space tools to read the strings in the log_buf.
+
+clear_idx
+---------
+
+The index that the next printk() record to read after the last clear
+command. It indicates the first record after the last SYSLOG_ACTION
+_CLEAR, like issued by 'dmesg -c'. Used by user-space tools to dump
+the dmesg log.
+
+log_next_idx
+------------
+
+The index of the next record to store in the buffer log_buf. Used to
+compute the index of the current string position.
+
+printk_log
+----------
+
+The size of a structure printk_log. Used to compute the size of
+messages, and extract dmesg log. It can output human readable text.
+Encapsulate header information for log_buf, such as timestamp, syslog
+level, etc.
+
+(printk_log, ts_nsec|len|text_len|dict_len)
+-------------------------------------------
+
+It represents field offsets in struct printk_log. User space tools can
+parse it and check whether the values of printk_log's members have been
+changed.
+
+(free_area.free_list, MIGRATE_TYPES)
+------------------------------------
+
+The number of migrate types for pages. The free_list is divided into
+the array, it needs to know the number of the array when makedumpfile
+computes the number of free pages.
+
+NR_FREE_PAGES
+-------------
+
+On linux-2.6.21 or later, the number of free_pages is in
+vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
+
+PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision
+|PG_head_mask|PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy)
+|PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
+-----------------------------------------------------------------
+
+Page attributes. These flags are used to filter various unnecessary
+pages.
+
+HUGETLB_PAGE_DTOR
+-----------------
+
+The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
+excludes these pages.
+
+======
+x86_64
+======
+
+phys_base
+---------
+
+Used to convert the virtual address of an exported kernel symbol to its
+physical address.
+
+init_top_pgt
+------------
+
+Used to walk through the whole page table and convert virtual addresses
+to physical addresses. The init_top_pgt is somewhat similar to the
+swapper_pg_dir, but it is only used in x86_64.
+
+pgtable_l5_enabled
+------------------
+
+User-space tools need to know whether the crash kernel was in 5-level
+paging mode.
+
+node_data
+---------
+
+This is a struct pglist_data array and stores all numa nodes
+information. Makedumpfile gets the pglist_data structure from it.
+
+(node_data, MAX_NUMNODES)
+-------------------------
+
+The maximum number of the nodes in system.
+
+KERNELOFFSET
+------------
+
+The kernel randomization offset. Used to compute the page offset. If
+KASLR is disabled, this value is zero.
+
+KERNEL_IMAGE_SIZE
+-----------------
+
+Currently unused by Makedumpfile. Used to compute the module virtual
+address by Crash.
+
+sme_mask
+--------
+
+For AMD machine with SME feature, it indicates the secure memory
+encryption mask. Makedumpfile tools need to know whether the crash
+kernel was encrypted. If SME is enabled in the first kernel, the crash
+kernel's page table (pgd/pud/pmd/pte) contains the memory encryption
+mask and this is used to remove the SME mask to obtain the true physical
+address.
+
+Currently, the sme_mask stores the value of sme_me_mask(bit 47). If need,
+the bit(sme_mask) might be redefined in the future, but the bit 63 will
+be reserved.
+
+For example:
+[ misc	        ][ enc bit  ][ other misc SME info       ]
+0000_0000_0000_0000_1000_0000_0000_0000_0000_0000_..._0000
+63   59   55   51   47   43   39   35   31   27   ... 3
+
+======
+x86_32
+======
+
+X86_PAE
+-------
+
+Denotes whether physical address extensions are enabled. It has the cost
+of more page table lookup overhead, and also consumes more page table
+space per process. Used to check whether PAE was enabled in the crash
+kernel when converting virtual addresses to physical addresses.
+
+====
+ia64
+====
+
+pgdat_list|(pgdat_list, MAX_NUMNODES)
+-------------------------------------
+
+pg_data_t array storing all numa nodes information. MAX_NUMNODES
+indicates the number of the nodes.
+
+node_memblk|(node_memblk, NR_NODE_MEMBLKS)
+------------------------------------------
+
+List of node memory chunks. Filled when parsing SRAT table to obtain
+information about memory nodes. NR_NODE_MEMBLKS indicates the number
+of node memory chunks.
+
+These values are used to compute the number of nodes in the crash kernel.
+
+node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size)
+----------------------------------------------------------------
+
+The size of a struct node_memblk_s and the offsets of the
+node_memblk_s's members. Used to compute the number of nodes.
+
+PGTABLE_3|PGTABLE_4
+-------------------
+
+User-space tools need to know whether the crash kernel was in 3-level or
+4-level paging mode. Used to distinguish the page table.
+
+=====
+ARM64
+=====
+
+VA_BITS
+-------
+
+The maximum number of bits for virtual addresses. Used to compute the
+virtual memory ranges.
+
+kimage_voffset
+--------------
+
+The offset between the kernel virtual and physical mappings. Used to
+translate virtual to physical addresses.
+
+PHYS_OFFSET
+-----------
+
+Indicates the physical address of the start of memory. Similar to
+kimage_voffset, which is used to translate virtual address to physical
+address.
+
+KERNELOFFSET
+------------
+
+The kernel randomization offset. Used to compute the page offset. If
+KASLR is disabled, this value is zero.
+
+====
+arm
+====
+
+ARM_LPAE
+--------
+
+It indicates whether the crash kernel supports large physical address
+extensions. Used to translate virtual address to physical address.
+
+====
+s390
+====
+
+lowcore_ptr
+----------
+
+An array with a pointer to the lowcore of every CPU. Used to print the
+psw and all registers information.
+
+high_memory
+-----------
+
+Used to get the vmalloc_start address from the high_memory symbol.
+
+(lowcore_ptr, NR_CPUS)
+----------------------
+
+The maximum number of CPUs.
+
+=======
+powerpc
+=======
+
+
+node_data|(node_data, MAX_NUMNODES)
+-----------------------------------
+
+See above.
+
+contig_page_data
+----------------
+
+See above.
+
+vmemmap_list
+------------
+
+The vmemmap_list maintains the entire vmemmap physical mapping. It can
+get vmemmap list count and populate vmemmap regions info. If the vmemmap
+address translation information is stored in the crash kernel, it helps
+to translate vmemmap kernel virtual addresses.
+
+mmu_vmemmap_psize
+-----------------
+
+The size of a page. Used to translate address to physical addresses.
+
+mmu_psize_defs
+--------------
+
+Page size definitions, i.e. 4k, 64k, or 16M.
+
+Used to make vtop translations.
+
+vmemmap_backing|(vmemmap_backing, list)|(vmemmap_backing, phys)|
+(vmemmap_backing, virt_addr)
+----------------------------------------------------------------
+
+The vmemmap virtual address space management does not have a traditional
+page table to track which virtual struct pages are backed by physical
+mapping. The virtual to physical mappings are tracked in a simple linked
+list format.
+
+User-space tools need to know the offset of list, phys and virt_addr
+when computing the count of vmemmap regions.
+
+mmu_psize_def|(mmu_psize_def, shift)
+------------------------------------
+
+The size of a struct mmu_psize_def and the offset of mmu_psize_def's
+member.
+
+Used in vtop translations.
+
+==
+sh
+==
+
+node_data|(node_data, MAX_NUMNODES)
+-----------------------------------
+
+See above.
+
+X2TLB
+-----
+
+Indicates whether the crash kernel enables SH extended mode.
-- 
2.17.1


  reply	other threads:[~2019-01-10 12:20 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-10 12:19 [PATCH 0/2 v6] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo Lianbo Jiang
2019-01-10 12:19 ` Lianbo Jiang [this message]
2019-01-11 12:33   ` [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation Borislav Petkov
2019-01-14  1:52     ` lijiang
2019-01-14  9:10       ` Borislav Petkov
2019-01-14 17:48     ` Kazuhito Hagio
2019-01-14 18:01       ` Borislav Petkov
2019-01-14 18:58         ` Dave Anderson
2019-01-14 19:21           ` Borislav Petkov
2019-01-14 19:36             ` Dave Anderson
2019-01-14 19:59               ` Borislav Petkov
2019-01-14 20:07                 ` Dave Anderson
2019-01-14 20:18                   ` Borislav Petkov
2019-01-14 20:26                     ` Dave Anderson
2019-01-14 20:35                       ` Borislav Petkov
2019-01-14 20:49                         ` Dave Anderson
2019-01-15 10:06                           ` Borislav Petkov
2019-01-15 11:00                           ` lijiang
2019-01-11 14:56   ` Borislav Petkov
2019-01-14  5:30     ` lijiang
2019-01-14  9:15       ` Borislav Petkov
2019-01-15  9:41         ` lijiang
2019-01-15 10:09   ` [tip:x86/kdump] kdump: Document kernel data exported in the vmcoreinfo note tip-bot for Lianbo Jiang
2019-01-10 12:19 ` [PATCH 2/2 v6] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo Lianbo Jiang
2019-01-11 15:15   ` [tip:x86/kdump] x86/kdump: Export the SME " tip-bot for Lianbo Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190110121944.6050-2-lijiang@redhat.com \
    --to=lijiang@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=anderson@redhat.com \
    --cc=bhe@redhat.com \
    --cc=bp@alien8.de \
    --cc=dyoung@redhat.com \
    --cc=k-hagio@ab.jp.nec.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).