All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/10] fadump: Firmware-assisted dump support for Powerpc.
@ 2011-07-13 18:06 ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

Hi All,

Please find the series of the patchset that implements firmware-assisted
dump mechanism to capture kernel crash dump for Powerpc architecture. The
firmware-assisted dump is a robust mechanism to get reliable kernel crash
dump with assistance from firmware. This approach does not use kexec, instead
firmware assists in booting the kdump kernel while preserving memory contents.

The most of the code implementation has been adapted from phyp assisted dump
implementation written by Linas Vepstas and Manish Ahuja.

The first patch is a documentation that talks about firmware-assisted dump
mechanism, implementation details and TODO list.

One of the important item from TODO list where I am looking forward for more
ideas/suggestions is regarding fadump crash info structure in the scratch
area before the ELF core header (see patch 4/10 and 5/10). The idea of
introducing this structure is to pass some important crash info data to the
second kernel which will help second kernel to populate ELF core header with
correct data before it gets exported through /proc/vmcore. The current design
implementation does not address the possibility of introducing additional
fields (in future) to this structure without affecting compatibility.
Following are the possible approaches I have in mind:
    1. Introduce version field for version tracking, bump up the version
       whenever a new field is added to the structure in future. The version
       field can be used to find out what fields are valid for the current
       version of the structure.
    2. Reserve the area of predefined size (say PAGE_SIZE) for this
       structure and have unused area as reserved (initialized to zero)
       for future field additions.
The advantage of the approach 1 over 2 is, we don't need to reserve extra
space.
Please let me know if there is better solution available.

I have tested the patches on following system configuration:
1. LPAR on Power6 with 4GB RAM and 8 CPUs
2. LPAR on Power7 with 2GB RAM and 20 CPUs
3. LPAR on Power7 with 1TB RAM and 896 CPUs

These patches cleanly apply on commit 8d86e5f91440aa56a5df516bf58fe3883552ad56
in linux-2.6 git tree.

Please review the patchset and let me know your comments.

Thanks,
-Mahesh.
---

Mahesh Salgaonkar (10):
      fadump: Add documentation for firmware-assisted dump.
      fadump: Reserve the memory for firmware assisted dump.
      fadump: Register for firmware assisted dump.
      fadump: Initialize elfcore header and add PT_LOAD program headers.
      fadump: Convert firmware-assisted cpu state dump data into elf notes.
      fadump: Add PT_NOTE program header for vmcoreinfo
      fadump: Introduce cleanup routine to invalidate /proc/vmcore.
      fadump: Invalidate registration and release reserved memory for general use.
      fadump: Invalidate the fadump registration during machine shutdown.
      fadump: Introduce config option for firmware assisted dump feature


 Documentation/powerpc/firmware-assisted-dump.txt |  231 ++++
 arch/powerpc/Kconfig                             |   13 
 arch/powerpc/include/asm/fadump.h                |  194 +++
 arch/powerpc/kernel/Makefile                     |    1 
 arch/powerpc/kernel/fadump.c                     | 1228 ++++++++++++++++++++++
 arch/powerpc/kernel/prom.c                       |   15 
 arch/powerpc/kernel/setup-common.c               |    9 
 arch/powerpc/kernel/setup_64.c                   |    8 
 arch/powerpc/kernel/traps.c                      |    5 
 arch/powerpc/mm/hash_utils_64.c                  |   11 
 fs/proc/vmcore.c                                 |   20 
 include/linux/crash_dump.h                       |    1 
 include/linux/memblock.h                         |    1 
 kernel/crash_dump.c                              |   33 +
 kernel/panic.c                                   |   16 
 15 files changed, 1785 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/powerpc/firmware-assisted-dump.txt
 create mode 100644 arch/powerpc/include/asm/fadump.h
 create mode 100644 arch/powerpc/kernel/fadump.c

-- 
Signature

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC PATCH 00/10] fadump: Firmware-assisted dump support for Powerpc.
@ 2011-07-13 18:06 ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

Hi All,

Please find the series of the patchset that implements firmware-assisted
dump mechanism to capture kernel crash dump for Powerpc architecture. The
firmware-assisted dump is a robust mechanism to get reliable kernel crash
dump with assistance from firmware. This approach does not use kexec, instead
firmware assists in booting the kdump kernel while preserving memory contents.

The most of the code implementation has been adapted from phyp assisted dump
implementation written by Linas Vepstas and Manish Ahuja.

The first patch is a documentation that talks about firmware-assisted dump
mechanism, implementation details and TODO list.

One of the important item from TODO list where I am looking forward for more
ideas/suggestions is regarding fadump crash info structure in the scratch
area before the ELF core header (see patch 4/10 and 5/10). The idea of
introducing this structure is to pass some important crash info data to the
second kernel which will help second kernel to populate ELF core header with
correct data before it gets exported through /proc/vmcore. The current design
implementation does not address the possibility of introducing additional
fields (in future) to this structure without affecting compatibility.
Following are the possible approaches I have in mind:
    1. Introduce version field for version tracking, bump up the version
       whenever a new field is added to the structure in future. The version
       field can be used to find out what fields are valid for the current
       version of the structure.
    2. Reserve the area of predefined size (say PAGE_SIZE) for this
       structure and have unused area as reserved (initialized to zero)
       for future field additions.
The advantage of the approach 1 over 2 is, we don't need to reserve extra
space.
Please let me know if there is better solution available.

I have tested the patches on following system configuration:
1. LPAR on Power6 with 4GB RAM and 8 CPUs
2. LPAR on Power7 with 2GB RAM and 20 CPUs
3. LPAR on Power7 with 1TB RAM and 896 CPUs

These patches cleanly apply on commit 8d86e5f91440aa56a5df516bf58fe3883552ad56
in linux-2.6 git tree.

Please review the patchset and let me know your comments.

Thanks,
-Mahesh.
---

Mahesh Salgaonkar (10):
      fadump: Add documentation for firmware-assisted dump.
      fadump: Reserve the memory for firmware assisted dump.
      fadump: Register for firmware assisted dump.
      fadump: Initialize elfcore header and add PT_LOAD program headers.
      fadump: Convert firmware-assisted cpu state dump data into elf notes.
      fadump: Add PT_NOTE program header for vmcoreinfo
      fadump: Introduce cleanup routine to invalidate /proc/vmcore.
      fadump: Invalidate registration and release reserved memory for general use.
      fadump: Invalidate the fadump registration during machine shutdown.
      fadump: Introduce config option for firmware assisted dump feature


 Documentation/powerpc/firmware-assisted-dump.txt |  231 ++++
 arch/powerpc/Kconfig                             |   13 
 arch/powerpc/include/asm/fadump.h                |  194 +++
 arch/powerpc/kernel/Makefile                     |    1 
 arch/powerpc/kernel/fadump.c                     | 1228 ++++++++++++++++++++++
 arch/powerpc/kernel/prom.c                       |   15 
 arch/powerpc/kernel/setup-common.c               |    9 
 arch/powerpc/kernel/setup_64.c                   |    8 
 arch/powerpc/kernel/traps.c                      |    5 
 arch/powerpc/mm/hash_utils_64.c                  |   11 
 fs/proc/vmcore.c                                 |   20 
 include/linux/crash_dump.h                       |    1 
 include/linux/memblock.h                         |    1 
 kernel/crash_dump.c                              |   33 +
 kernel/panic.c                                   |   16 
 15 files changed, 1785 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/powerpc/firmware-assisted-dump.txt
 create mode 100644 arch/powerpc/include/asm/fadump.h
 create mode 100644 arch/powerpc/kernel/fadump.c

-- 
Signature

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC PATCH 01/10] fadump: Add documentation for firmware-assisted dump.
  2011-07-13 18:06 ` Mahesh J Salgaonkar
@ 2011-07-13 18:06   ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Documentation for firmware-assisted dump. This document is based on the
original documentation written for phyp assisted dump by Linas Vepstas
and Manish Ahuja, with few changes to reflect the current implementation.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.txt |  231 ++++++++++++++++++++++
 1 files changed, 231 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/powerpc/firmware-assisted-dump.txt

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
new file mode 100644
index 0000000..c8160e6
--- /dev/null
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -0,0 +1,231 @@
+
+                   Firmware-Assisted Dump
+                   ------------------------
+                       July 2011
+
+The goal of firmware-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, firmware-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- Once the dump is copied out, the memory that held the dump
+   is immediately available to the running kernel. A further
+   reboot isn't required.
+
+The above can only be accomplished by coordination with,
+and assistance from the Power firmware. The procedure is
+as follows:
+
+-- The first kernel registers the sections of memory with the
+   Power firmware for dump preservation during OS initialization.
+   This registered sections of memory is reserved by the first
+   kernel during early boot.
+
+-- When a system crashes, the Power firmware will save
+   the low memory (boot memory of size larger of 5% of system RAM
+   or 256MB) of RAM to a previously registered save region. It
+   will also save system registers, and hardware PTE's.
+
+   NOTE: The term 'boot memory' means size of the low memory chunk
+         that is required for a kernel to boot successfully when
+         booted with restricted memory.
+
+-- After the low memory (boot memory) area has been saved, the
+   firmware will reset PCI and other hardware state.  It will
+   *not* clear the RAM. It will then launch the bootloader, as
+   normal.
+
+-- The freshly booted kernel will notice that there is a new
+   node (ibm,dump-kernel) in the device tree, indicating that
+   there is crash data available from a previous boot. During
+   the early boot OS will reserve rest of the memory above
+   boot memory size effectively booting with restricted memory
+   size. This will make sure that the second kernel will not
+   touch any of the dump memory area.
+
+-- Userspace tools will read /proc/vmcore to obtain the contents
+   of memory, which holds the previous crashed kernel dump in ELF
+   format. The userspace tools may copy this info to disk, or
+   network, nas, san, iscsi, etc. as desired.
+
+-- Once the userspace tool is done saving dump, it will echo
+   '1' to /sys/kernel/fadump_release_mem to release the reserved
+   memory back to general use, except the memory required for
+   next firmware-assisted dump registration.
+
+   e.g.
+     # echo 1 > /sys/kernel/fadump_release_mem
+
+Please note that the firmware-assisted dump feature
+is only available on Power6 and above systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on that particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes
+then everything but boot memory size of RAM is reserved during
+early boot (See Fig. 2). This area is released once we collect a
+dump from user land scripts (kdump scripts) that are run. If
+there is dump data, then the /sys/kernel/fadump_release_mem
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then only the memory required
+to hold CPU state, HPTE region, boot memory dump and elfcore
+header, is reserved at the top of memory (see Fig. 1). This area
+is *not* released: this region will be kept permanently reserved,
+so that it can act as a receptacle for a copy of the boot memory
+content in addition to CPU state and HPTE region, in the case a
+crash does occur.
+
+  o Memory Reservation during first kernel
+
+  Low memory                                        Top of memory
+  0      boot memory size                                       |
+  |           |                       |<--Reserved dump area -->|
+  V           V                       |   Permanent Reservation V
+  +-----------+----------/ /----------+---+----+-----------+----+
+  |           |                       |CPU|HPTE|  DUMP     |ELF |
+  +-----------+----------/ /----------+---+----+-----------+----+
+        |                                           ^
+        |                                           |
+        \                                           /
+         -------------------------------------------
+          Boot memory content gets transferred to
+          reserved area by firmware at the time of
+          crash
+                   Fig. 1
+
+  o Memory Reservation during second kernel after crash
+
+  Low memory                                        Top of memory
+  0      boot memory size                                       |
+  |           |<------------- Reserved dump area ----------- -->|
+  V           V                                                 V
+  +-----------+----------/ /----------+---+----+-----------+----+
+  |           |                       |CPU|HPTE|  DUMP     |ELF |
+  +-----------+----------/ /----------+---+----+-----------+----+
+        |                                                    |
+        V                                                    V
+   Used by second                                    /proc/vmcore
+   kernel to boot
+                   Fig. 2
+
+Currently the dump will be copied from /proc/vmcore to a
+a new file upon user intervention. The dump data available through
+/proc/vmcore will be in ELF format. Hence the existing kdump
+infrastructure (kdump scripts) to save the dump works fine
+with minor modifications. The kdump script requires following
+modifications:
+-- During service kdump start if /proc/vmcore entry is not present,
+   look for the existence of /sys/kernel/fadump_enabled and read
+   value exported by it. If value is set to '1' then print
+   success otherwise fallback to existing kexec based kdump.
+
+-- During service kdump start if /proc/vmcore entry is present,
+   execute the existing routine to save the dump. Once the dump
+   is saved, echo 1 > /sys/kernel/fadump_release_mem (if the
+   file exists) to release the reserved memory for general use
+   and continue without rebooting. At this point the memory
+   reservation map will look like as shown in Fig. 1. If the file
+   /sys/kernel/fadump_release_mem is not present then follow
+   the existing routine to reboot into new kernel.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+How to enable firmware-assisted dump (fadump):
+-------------------------------------
+
+1. Set config option CONFIG_FA_DUMP=y and build kernel.
+2. Boot into linux kernel with 'fadump=1' kernel cmdline option.
+
+NOTE: If firmware-assisted dump fails to reserve memory then it will
+   fallback to existing kdump mechanism if 'crashkernel=' option
+   is set at kernel cmdline.
+
+Sysfs files:
+------------
+
+Firmware-assisted dump feature uses sysfs file system to hold
+the control files as well as the files to display memory reserved
+region.
+
+Here is the list of files under kernel sysfs:
+
+ /sys/kernel/fadump_enabled
+
+    This is used to display the fadump status.
+    0 = fadump is disabled
+    1 = fadump is enabled
+
+ /sys/kernel/fadump_region
+
+    This file shows the reserved memory regions if fadump is
+    enabled otherwise this file is empty. The output format
+    is:
+    <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>
+
+    e.g.
+    Contents when fadump is registered during first kernel
+
+    # cat /sys/kernel/fadump_region
+    CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
+    HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
+    DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0
+
+    Contents when fadump is active during second kernel
+
+    # cat /sys/kernel/fadump_region
+    CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020
+    HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000
+    DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000
+        : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000
+
+ /sys/kernel/fadump_release_mem
+
+    This file is available only when fadump is active during
+    second kernel. This is used to release the reserved memory
+    region that are held for saving crash dump. To release the
+    reserved memory echo 1 to it:
+
+    echo 1  > /sys/kernel/fadump_release_mem
+
+    After echo 1, the content of the /sys/kernel/fadump_region
+    file will change to reflect the new memory reservations.
+
+TODO:
+-----
+ o Need to come up with the better approach to find out more
+   accurate boot memory size that is required for a kernel to
+   boot successfully when booted with restricted memory.
+ o The fadump implementation introduces a fadump crash info structure
+   in the scratch area before the ELF core header. The idea of introducing
+   this structure is to pass some important crash info data to the second
+   kernel which will help second kernel to populate ELF core header with
+   correct data before it gets exported through /proc/vmcore. The current
+   design implementation does not address a possibility of introducing
+   additional fields (in future) to this structure without affecting
+   compatibility. Need to come up with the better approach to address this.
+   The possible approaches are:
+	1. Introduce version field for version tracking, bump up the version
+	whenever a new field is added to the structure in future. The version
+	field can be used to find out what fields are valid for the current
+	version of the structure.
+	2. Reserve the area of predefined size (say PAGE_SIZE) for this
+	structure and have unused area as reserved (initialized to zero)
+	for future field additions.
+   The advantage of approach 1 over 2 is we don't need to reserve extra space.
+---
+Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+This document is based on the original documentation written for phyp
+assisted dump by Linas Vepstas and Manish Ahuja.


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 01/10] fadump: Add documentation for firmware-assisted dump.
@ 2011-07-13 18:06   ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Documentation for firmware-assisted dump. This document is based on the
original documentation written for phyp assisted dump by Linas Vepstas
and Manish Ahuja, with few changes to reflect the current implementation.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 Documentation/powerpc/firmware-assisted-dump.txt |  231 ++++++++++++++++++++++
 1 files changed, 231 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/powerpc/firmware-assisted-dump.txt

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
new file mode 100644
index 0000000..c8160e6
--- /dev/null
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -0,0 +1,231 @@
+
+                   Firmware-Assisted Dump
+                   ------------------------
+                       July 2011
+
+The goal of firmware-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, firmware-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- Once the dump is copied out, the memory that held the dump
+   is immediately available to the running kernel. A further
+   reboot isn't required.
+
+The above can only be accomplished by coordination with,
+and assistance from the Power firmware. The procedure is
+as follows:
+
+-- The first kernel registers the sections of memory with the
+   Power firmware for dump preservation during OS initialization.
+   This registered sections of memory is reserved by the first
+   kernel during early boot.
+
+-- When a system crashes, the Power firmware will save
+   the low memory (boot memory of size larger of 5% of system RAM
+   or 256MB) of RAM to a previously registered save region. It
+   will also save system registers, and hardware PTE's.
+
+   NOTE: The term 'boot memory' means size of the low memory chunk
+         that is required for a kernel to boot successfully when
+         booted with restricted memory.
+
+-- After the low memory (boot memory) area has been saved, the
+   firmware will reset PCI and other hardware state.  It will
+   *not* clear the RAM. It will then launch the bootloader, as
+   normal.
+
+-- The freshly booted kernel will notice that there is a new
+   node (ibm,dump-kernel) in the device tree, indicating that
+   there is crash data available from a previous boot. During
+   the early boot OS will reserve rest of the memory above
+   boot memory size effectively booting with restricted memory
+   size. This will make sure that the second kernel will not
+   touch any of the dump memory area.
+
+-- Userspace tools will read /proc/vmcore to obtain the contents
+   of memory, which holds the previous crashed kernel dump in ELF
+   format. The userspace tools may copy this info to disk, or
+   network, nas, san, iscsi, etc. as desired.
+
+-- Once the userspace tool is done saving dump, it will echo
+   '1' to /sys/kernel/fadump_release_mem to release the reserved
+   memory back to general use, except the memory required for
+   next firmware-assisted dump registration.
+
+   e.g.
+     # echo 1 > /sys/kernel/fadump_release_mem
+
+Please note that the firmware-assisted dump feature
+is only available on Power6 and above systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on that particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes
+then everything but boot memory size of RAM is reserved during
+early boot (See Fig. 2). This area is released once we collect a
+dump from user land scripts (kdump scripts) that are run. If
+there is dump data, then the /sys/kernel/fadump_release_mem
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then only the memory required
+to hold CPU state, HPTE region, boot memory dump and elfcore
+header, is reserved at the top of memory (see Fig. 1). This area
+is *not* released: this region will be kept permanently reserved,
+so that it can act as a receptacle for a copy of the boot memory
+content in addition to CPU state and HPTE region, in the case a
+crash does occur.
+
+  o Memory Reservation during first kernel
+
+  Low memory                                        Top of memory
+  0      boot memory size                                       |
+  |           |                       |<--Reserved dump area -->|
+  V           V                       |   Permanent Reservation V
+  +-----------+----------/ /----------+---+----+-----------+----+
+  |           |                       |CPU|HPTE|  DUMP     |ELF |
+  +-----------+----------/ /----------+---+----+-----------+----+
+        |                                           ^
+        |                                           |
+        \                                           /
+         -------------------------------------------
+          Boot memory content gets transferred to
+          reserved area by firmware at the time of
+          crash
+                   Fig. 1
+
+  o Memory Reservation during second kernel after crash
+
+  Low memory                                        Top of memory
+  0      boot memory size                                       |
+  |           |<------------- Reserved dump area ----------- -->|
+  V           V                                                 V
+  +-----------+----------/ /----------+---+----+-----------+----+
+  |           |                       |CPU|HPTE|  DUMP     |ELF |
+  +-----------+----------/ /----------+---+----+-----------+----+
+        |                                                    |
+        V                                                    V
+   Used by second                                    /proc/vmcore
+   kernel to boot
+                   Fig. 2
+
+Currently the dump will be copied from /proc/vmcore to a
+a new file upon user intervention. The dump data available through
+/proc/vmcore will be in ELF format. Hence the existing kdump
+infrastructure (kdump scripts) to save the dump works fine
+with minor modifications. The kdump script requires following
+modifications:
+-- During service kdump start if /proc/vmcore entry is not present,
+   look for the existence of /sys/kernel/fadump_enabled and read
+   value exported by it. If value is set to '1' then print
+   success otherwise fallback to existing kexec based kdump.
+
+-- During service kdump start if /proc/vmcore entry is present,
+   execute the existing routine to save the dump. Once the dump
+   is saved, echo 1 > /sys/kernel/fadump_release_mem (if the
+   file exists) to release the reserved memory for general use
+   and continue without rebooting. At this point the memory
+   reservation map will look like as shown in Fig. 1. If the file
+   /sys/kernel/fadump_release_mem is not present then follow
+   the existing routine to reboot into new kernel.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+How to enable firmware-assisted dump (fadump):
+-------------------------------------
+
+1. Set config option CONFIG_FA_DUMP=y and build kernel.
+2. Boot into linux kernel with 'fadump=1' kernel cmdline option.
+
+NOTE: If firmware-assisted dump fails to reserve memory then it will
+   fallback to existing kdump mechanism if 'crashkernel=' option
+   is set at kernel cmdline.
+
+Sysfs files:
+------------
+
+Firmware-assisted dump feature uses sysfs file system to hold
+the control files as well as the files to display memory reserved
+region.
+
+Here is the list of files under kernel sysfs:
+
+ /sys/kernel/fadump_enabled
+
+    This is used to display the fadump status.
+    0 = fadump is disabled
+    1 = fadump is enabled
+
+ /sys/kernel/fadump_region
+
+    This file shows the reserved memory regions if fadump is
+    enabled otherwise this file is empty. The output format
+    is:
+    <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>
+
+    e.g.
+    Contents when fadump is registered during first kernel
+
+    # cat /sys/kernel/fadump_region
+    CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
+    HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
+    DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0
+
+    Contents when fadump is active during second kernel
+
+    # cat /sys/kernel/fadump_region
+    CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020
+    HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000
+    DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000
+        : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000
+
+ /sys/kernel/fadump_release_mem
+
+    This file is available only when fadump is active during
+    second kernel. This is used to release the reserved memory
+    region that are held for saving crash dump. To release the
+    reserved memory echo 1 to it:
+
+    echo 1  > /sys/kernel/fadump_release_mem
+
+    After echo 1, the content of the /sys/kernel/fadump_region
+    file will change to reflect the new memory reservations.
+
+TODO:
+-----
+ o Need to come up with the better approach to find out more
+   accurate boot memory size that is required for a kernel to
+   boot successfully when booted with restricted memory.
+ o The fadump implementation introduces a fadump crash info structure
+   in the scratch area before the ELF core header. The idea of introducing
+   this structure is to pass some important crash info data to the second
+   kernel which will help second kernel to populate ELF core header with
+   correct data before it gets exported through /proc/vmcore. The current
+   design implementation does not address a possibility of introducing
+   additional fields (in future) to this structure without affecting
+   compatibility. Need to come up with the better approach to address this.
+   The possible approaches are:
+	1. Introduce version field for version tracking, bump up the version
+	whenever a new field is added to the structure in future. The version
+	field can be used to find out what fields are valid for the current
+	version of the structure.
+	2. Reserve the area of predefined size (say PAGE_SIZE) for this
+	structure and have unused area as reserved (initialized to zero)
+	for future field additions.
+   The advantage of approach 1 over 2 is we don't need to reserve extra space.
+---
+Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+This document is based on the original documentation written for phyp
+assisted dump by Linas Vepstas and Manish Ahuja.

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 02/10] fadump: Reserve the memory for firmware assisted dump.
  2011-07-13 18:06 ` Mahesh J Salgaonkar
@ 2011-07-13 18:06   ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Reserve the memory during early boot to preserve CPU state data, HPTE region
and RMR region data in case of kernel crash. At the time of crash, powerpc
firmware will store CPU state data, HPTE region data and move RMR region
data to the reserved memory area.

If the firmware-assisted dump fails to reserve the memory, then fallback
to existing kexec-based kdump.

The most of the code implementation to reserve memory has been
adapted from phyp assisted dump implementation written by Linas Vepstas
and Manish Ahuja

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |   56 +++++++++
 arch/powerpc/kernel/Makefile      |    1 
 arch/powerpc/kernel/fadump.c      |  240 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/prom.c        |   15 ++
 4 files changed, 311 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/include/asm/fadump.h
 create mode 100644 arch/powerpc/kernel/fadump.c

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
new file mode 100644
index 0000000..08ef997
--- /dev/null
+++ b/arch/powerpc/include/asm/fadump.h
@@ -0,0 +1,56 @@
+/*
+ * Firmware Assisted dump header file.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright 2011 IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+ */
+
+#ifndef __PPC64_FA_DUMP_H__
+#define __PPC64_FA_DUMP_H__
+
+#ifdef CONFIG_FA_DUMP
+
+/*
+ * The RMR region will be saved for later dumping when kernel crashes.
+ * Set this to 256MB.
+ */
+#define RMR_START	0x0
+#define RMR_END		(0x1UL << 28)	/* 256 MB */
+
+/* Firmware provided dump sections */
+#define FADUMP_CPU_STATE_DATA	0x0001
+#define FADUMP_HPTE_REGION	0x0002
+#define FADUMP_REAL_MODE_REGION	0x0011
+
+struct fw_dump {
+	unsigned long	cpu_state_data_size;
+	unsigned long	hpte_region_size;
+	unsigned long	boot_memory_size;
+	unsigned long	reserve_dump_area_start;
+	unsigned long	reserve_dump_area_size;
+	int		ibm_configure_kernel_dump;
+
+	unsigned long	fadump_enabled:1;
+	unsigned long	fadump_supported:1;
+	unsigned long	dump_active:1;
+};
+
+extern int early_init_dt_scan_fw_dump(unsigned long node,
+		const char *uname, int depth, void *data);
+extern int fadump_reserve_mem(void);
+#endif
+#endif
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index e8b9818..47baff0 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_IBMVIO)		+= vio.o
 obj-$(CONFIG_IBMEBUS)           += ibmebus.o
 obj-$(CONFIG_GENERIC_TBSYNC)	+= smp-tbsync.o
 obj-$(CONFIG_CRASH_DUMP)	+= crash_dump.o
+obj-$(CONFIG_FA_DUMP)		+= fadump.o
 ifeq ($(CONFIG_PPC32),y)
 obj-$(CONFIG_E500)		+= idle_e500.o
 endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
new file mode 100644
index 0000000..446dcdc
--- /dev/null
+++ b/arch/powerpc/kernel/fadump.c
@@ -0,0 +1,240 @@
+/*
+ * Firmware Assisted dump: A robust mechanism to get reliable kernel crash
+ * dump with assistance from firmware. This approach does not use kexec,
+ * instead firmware assists in booting the kdump kernel while preserving
+ * memory contents. The most of the code implementation has been adapted
+ * from phyp assisted dump implementation written by Linas Vepstas and
+ * Manish Ahuja
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright 2011 IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+ */
+
+#undef DEBUG
+
+#include <linux/string.h>
+#include <linux/memblock.h>
+
+#include <asm/page.h>
+#include <asm/prom.h>
+#include <asm/rtas.h>
+#include <asm/fadump.h>
+
+#ifdef DEBUG
+#define PREFIX		"fadump: "
+#define DBG(fmt...)	printk(KERN_ERR PREFIX fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+/*
+ * The RTAS property "ibm,configure-kernel-dump-sizes" returns dump
+ * sizes for two of the firmware provided dump sections (cpu state data
+ * and hpte region).
+ */
+#define FW_DUMP_NUM_SECTIONS	2
+
+struct dump_section {
+	u32		dump_section;
+	unsigned long	section_size;
+} __packed;
+
+/* Global variable to hold firmware assisted dump configuration info. */
+static struct fw_dump fw_dump;
+
+/* Scan the Firmware Assisted dump configuration details. */
+int __init early_init_dt_scan_fw_dump(unsigned long node,
+			const char *uname, int depth, void *data)
+{
+	const struct dump_section *sections;
+	int i;
+	const int *token;
+
+	if (depth != 1 || strcmp(uname, "rtas") != 0)
+		return 0;
+
+	/*
+	 * Check if Firmware Assisted dump is supported. if yes, check
+	 * if dump has been initiated on last reboot.
+	 */
+	token = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL);
+	if (!token)
+		return 0;
+
+	fw_dump.fadump_supported = 1;
+	fw_dump.ibm_configure_kernel_dump = *token;
+
+	/*
+	 * The 'ibm,kernel-dump' rtas node is present only if there is
+	 * dump data waiting for us.
+	 */
+	if (of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL))
+		fw_dump.dump_active = 1;
+
+	/* Get the sizes required to store dump data for the firmware provided
+	 * dump sections.
+	 */
+	sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
+					NULL);
+
+	if (!sections)
+		return 0;
+
+	for (i = 0; i < FW_DUMP_NUM_SECTIONS; i++) {
+		switch (sections[i].dump_section) {
+		case FADUMP_CPU_STATE_DATA:
+			fw_dump.cpu_state_data_size = sections[i].section_size;
+			break;
+		case FADUMP_HPTE_REGION:
+			fw_dump.hpte_region_size = sections[i].section_size;
+			break;
+		}
+	}
+	return 1;
+}
+
+/**
+ * calculate_reserve_size() - reserve variable boot area 5% of System RAM
+ *
+ * Function to find the largest memory size we need to reserve during early
+ * boot process. This will be the size of the memory that is required for a
+ * kernel to boot successfully.
+ *
+ * This function has been taken from phyp-assisted dump feature implementation.
+ *
+ * returns larger of 256MB or 5% rounded down to multiples of 256MB.
+ *
+ * TODO: Come up with better approach to find out more accurate memory size
+ * that is required for a kernel to boot successfully.
+ *
+ */
+static inline unsigned long calculate_reserve_size(void)
+{
+	unsigned long size;
+
+	/* divide by 20 to get 5% of value */
+	size = memblock_end_of_DRAM();
+	do_div(size, 20);
+
+	/* round it down in multiples of 256 */
+	size = size & ~0x0FFFFFFFUL;
+
+	/* Truncate to memory_limit. We don't want to over reserve the memory.*/
+	if (memory_limit && size > memory_limit)
+		size = memory_limit;
+
+	return (size > RMR_END ? size : RMR_END);
+}
+
+/*
+ * Calculate the total memory size required to be reserved for
+ * firmware-assisted dump registration.
+ */
+static unsigned long get_dump_area_size(void)
+{
+	unsigned long size = 0;
+
+	size += fw_dump.cpu_state_data_size;
+	size += fw_dump.hpte_region_size;
+	size += fw_dump.boot_memory_size;
+
+	size = PAGE_ALIGN(size);
+	return size;
+}
+
+int __init fadump_reserve_mem(void)
+{
+	unsigned long base, size, memory_boundary;
+
+	if (!fw_dump.fadump_enabled)
+		return 0;
+
+	if (!fw_dump.fadump_supported) {
+		printk(KERN_ERR "Firmware-assisted dump is not supported on"
+				" this hardware\n");
+		fw_dump.fadump_enabled = 0;
+		return 0;
+	}
+	/* Initialize boot memory size */
+	fw_dump.boot_memory_size = calculate_reserve_size();
+
+	/*
+	 * Calculate the memory boundary.
+	 * If memory_limit is less than actual memory boundary then reserve
+	 * the memory for fadump beyond the memory_limit and adjust the
+	 * memory_limit accordingly, so that the running kernel can run with
+	 * specified memory_limit.
+	 */
+	if (memory_limit && memory_limit < memblock_end_of_DRAM()) {
+		size = get_dump_area_size();
+		if ((memory_limit + size) < memblock_end_of_DRAM())
+			memory_limit += size;
+		else
+			memory_limit = memblock_end_of_DRAM();
+		printk(KERN_INFO "Adjusted memory_limit for firmware-assisted"
+				" dump, now %#016llx\n",
+				(unsigned long long)memory_limit);
+	}
+	if (memory_limit)
+		memory_boundary = memory_limit;
+	else
+		memory_boundary = memblock_end_of_DRAM();
+
+	if (fw_dump.dump_active) {
+		printk(KERN_INFO "Firmware-assisted dump is active.\n");
+		/*
+		 * If last boot has crashed then reserve all the memory
+		 * above boot_memory_size so that we don't touch it until
+		 * dump is written to disk by userspace tool. This memory
+		 * will be released for general use once the dump is saved.
+		 */
+		base = fw_dump.boot_memory_size;
+		size = memory_boundary - base;
+		memblock_reserve(base, size);
+		printk(KERN_INFO "Reserved %ldMB of memory at %ldMB "
+				"for saving crash dump\n",
+				(unsigned long)(size >> 20),
+				(unsigned long)(base >> 20));
+	} else {
+		/* Reserve the memory at the top of memory. */
+		size = get_dump_area_size();
+		base = memory_boundary - size;
+		memblock_reserve(base, size);
+		printk(KERN_INFO "Reserved %ldMB of memory at %ldMB "
+				"for firmware-assisted dump\n",
+				(unsigned long)(size >> 20),
+				(unsigned long)(base >> 20));
+	}
+	fw_dump.reserve_dump_area_start = base;
+	fw_dump.reserve_dump_area_size = size;
+	return 1;
+}
+
+/* Look for fadump= cmdline option. */
+static int __init early_fadump_param(char *p)
+{
+	if (!p)
+		return 1;
+
+	if (p[0] == '1')
+		fw_dump.fadump_enabled = 1;
+	else if (p[0] == '0')
+		fw_dump.fadump_enabled = 0;
+
+	return 0;
+}
+early_param("fadump", early_fadump_param);
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 8c3112a..10e3de0 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -54,6 +54,7 @@
 #include <asm/pci-bridge.h>
 #include <asm/phyp_dump.h>
 #include <asm/kexec.h>
+#include <asm/fadump.h>
 #include <mm/mmu_decl.h>
 
 #ifdef DEBUG
@@ -711,6 +712,11 @@ void __init early_init_devtree(void *params)
 	of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL);
 #endif
 
+#ifdef CONFIG_FA_DUMP
+	/* scan tree to see if dump is active during last boot */
+	of_scan_flat_dt(early_init_dt_scan_fw_dump, NULL);
+#endif
+
 	/* Retrieve various informations from the /chosen node of the
 	 * device-tree, including the platform type, initrd location and
 	 * size, TCE reserve, and more ...
@@ -734,7 +740,14 @@ void __init early_init_devtree(void *params)
 	if (PHYSICAL_START > MEMORY_START)
 		memblock_reserve(MEMORY_START, 0x8000);
 	reserve_kdump_trampoline();
-	reserve_crashkernel();
+#ifdef CONFIG_FA_DUMP
+	/*
+	 * If we fail to reserve memory for firmware-assisted dump then
+	 * fallback to kexec based kdump.
+	 */
+	if (fadump_reserve_mem() == 0)
+#endif
+		reserve_crashkernel();
 	early_reserve_mem();
 	phyp_dump_reserve_mem();
 


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 02/10] fadump: Reserve the memory for firmware assisted dump.
@ 2011-07-13 18:06   ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Reserve the memory during early boot to preserve CPU state data, HPTE region
and RMR region data in case of kernel crash. At the time of crash, powerpc
firmware will store CPU state data, HPTE region data and move RMR region
data to the reserved memory area.

If the firmware-assisted dump fails to reserve the memory, then fallback
to existing kexec-based kdump.

The most of the code implementation to reserve memory has been
adapted from phyp assisted dump implementation written by Linas Vepstas
and Manish Ahuja

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |   56 +++++++++
 arch/powerpc/kernel/Makefile      |    1 
 arch/powerpc/kernel/fadump.c      |  240 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/prom.c        |   15 ++
 4 files changed, 311 insertions(+), 1 deletions(-)
 create mode 100644 arch/powerpc/include/asm/fadump.h
 create mode 100644 arch/powerpc/kernel/fadump.c

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
new file mode 100644
index 0000000..08ef997
--- /dev/null
+++ b/arch/powerpc/include/asm/fadump.h
@@ -0,0 +1,56 @@
+/*
+ * Firmware Assisted dump header file.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright 2011 IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+ */
+
+#ifndef __PPC64_FA_DUMP_H__
+#define __PPC64_FA_DUMP_H__
+
+#ifdef CONFIG_FA_DUMP
+
+/*
+ * The RMR region will be saved for later dumping when kernel crashes.
+ * Set this to 256MB.
+ */
+#define RMR_START	0x0
+#define RMR_END		(0x1UL << 28)	/* 256 MB */
+
+/* Firmware provided dump sections */
+#define FADUMP_CPU_STATE_DATA	0x0001
+#define FADUMP_HPTE_REGION	0x0002
+#define FADUMP_REAL_MODE_REGION	0x0011
+
+struct fw_dump {
+	unsigned long	cpu_state_data_size;
+	unsigned long	hpte_region_size;
+	unsigned long	boot_memory_size;
+	unsigned long	reserve_dump_area_start;
+	unsigned long	reserve_dump_area_size;
+	int		ibm_configure_kernel_dump;
+
+	unsigned long	fadump_enabled:1;
+	unsigned long	fadump_supported:1;
+	unsigned long	dump_active:1;
+};
+
+extern int early_init_dt_scan_fw_dump(unsigned long node,
+		const char *uname, int depth, void *data);
+extern int fadump_reserve_mem(void);
+#endif
+#endif
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index e8b9818..47baff0 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -60,6 +60,7 @@ obj-$(CONFIG_IBMVIO)		+= vio.o
 obj-$(CONFIG_IBMEBUS)           += ibmebus.o
 obj-$(CONFIG_GENERIC_TBSYNC)	+= smp-tbsync.o
 obj-$(CONFIG_CRASH_DUMP)	+= crash_dump.o
+obj-$(CONFIG_FA_DUMP)		+= fadump.o
 ifeq ($(CONFIG_PPC32),y)
 obj-$(CONFIG_E500)		+= idle_e500.o
 endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
new file mode 100644
index 0000000..446dcdc
--- /dev/null
+++ b/arch/powerpc/kernel/fadump.c
@@ -0,0 +1,240 @@
+/*
+ * Firmware Assisted dump: A robust mechanism to get reliable kernel crash
+ * dump with assistance from firmware. This approach does not use kexec,
+ * instead firmware assists in booting the kdump kernel while preserving
+ * memory contents. The most of the code implementation has been adapted
+ * from phyp assisted dump implementation written by Linas Vepstas and
+ * Manish Ahuja
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright 2011 IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+ */
+
+#undef DEBUG
+
+#include <linux/string.h>
+#include <linux/memblock.h>
+
+#include <asm/page.h>
+#include <asm/prom.h>
+#include <asm/rtas.h>
+#include <asm/fadump.h>
+
+#ifdef DEBUG
+#define PREFIX		"fadump: "
+#define DBG(fmt...)	printk(KERN_ERR PREFIX fmt)
+#else
+#define DBG(fmt...)
+#endif
+
+/*
+ * The RTAS property "ibm,configure-kernel-dump-sizes" returns dump
+ * sizes for two of the firmware provided dump sections (cpu state data
+ * and hpte region).
+ */
+#define FW_DUMP_NUM_SECTIONS	2
+
+struct dump_section {
+	u32		dump_section;
+	unsigned long	section_size;
+} __packed;
+
+/* Global variable to hold firmware assisted dump configuration info. */
+static struct fw_dump fw_dump;
+
+/* Scan the Firmware Assisted dump configuration details. */
+int __init early_init_dt_scan_fw_dump(unsigned long node,
+			const char *uname, int depth, void *data)
+{
+	const struct dump_section *sections;
+	int i;
+	const int *token;
+
+	if (depth != 1 || strcmp(uname, "rtas") != 0)
+		return 0;
+
+	/*
+	 * Check if Firmware Assisted dump is supported. if yes, check
+	 * if dump has been initiated on last reboot.
+	 */
+	token = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL);
+	if (!token)
+		return 0;
+
+	fw_dump.fadump_supported = 1;
+	fw_dump.ibm_configure_kernel_dump = *token;
+
+	/*
+	 * The 'ibm,kernel-dump' rtas node is present only if there is
+	 * dump data waiting for us.
+	 */
+	if (of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL))
+		fw_dump.dump_active = 1;
+
+	/* Get the sizes required to store dump data for the firmware provided
+	 * dump sections.
+	 */
+	sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
+					NULL);
+
+	if (!sections)
+		return 0;
+
+	for (i = 0; i < FW_DUMP_NUM_SECTIONS; i++) {
+		switch (sections[i].dump_section) {
+		case FADUMP_CPU_STATE_DATA:
+			fw_dump.cpu_state_data_size = sections[i].section_size;
+			break;
+		case FADUMP_HPTE_REGION:
+			fw_dump.hpte_region_size = sections[i].section_size;
+			break;
+		}
+	}
+	return 1;
+}
+
+/**
+ * calculate_reserve_size() - reserve variable boot area 5% of System RAM
+ *
+ * Function to find the largest memory size we need to reserve during early
+ * boot process. This will be the size of the memory that is required for a
+ * kernel to boot successfully.
+ *
+ * This function has been taken from phyp-assisted dump feature implementation.
+ *
+ * returns larger of 256MB or 5% rounded down to multiples of 256MB.
+ *
+ * TODO: Come up with better approach to find out more accurate memory size
+ * that is required for a kernel to boot successfully.
+ *
+ */
+static inline unsigned long calculate_reserve_size(void)
+{
+	unsigned long size;
+
+	/* divide by 20 to get 5% of value */
+	size = memblock_end_of_DRAM();
+	do_div(size, 20);
+
+	/* round it down in multiples of 256 */
+	size = size & ~0x0FFFFFFFUL;
+
+	/* Truncate to memory_limit. We don't want to over reserve the memory.*/
+	if (memory_limit && size > memory_limit)
+		size = memory_limit;
+
+	return (size > RMR_END ? size : RMR_END);
+}
+
+/*
+ * Calculate the total memory size required to be reserved for
+ * firmware-assisted dump registration.
+ */
+static unsigned long get_dump_area_size(void)
+{
+	unsigned long size = 0;
+
+	size += fw_dump.cpu_state_data_size;
+	size += fw_dump.hpte_region_size;
+	size += fw_dump.boot_memory_size;
+
+	size = PAGE_ALIGN(size);
+	return size;
+}
+
+int __init fadump_reserve_mem(void)
+{
+	unsigned long base, size, memory_boundary;
+
+	if (!fw_dump.fadump_enabled)
+		return 0;
+
+	if (!fw_dump.fadump_supported) {
+		printk(KERN_ERR "Firmware-assisted dump is not supported on"
+				" this hardware\n");
+		fw_dump.fadump_enabled = 0;
+		return 0;
+	}
+	/* Initialize boot memory size */
+	fw_dump.boot_memory_size = calculate_reserve_size();
+
+	/*
+	 * Calculate the memory boundary.
+	 * If memory_limit is less than actual memory boundary then reserve
+	 * the memory for fadump beyond the memory_limit and adjust the
+	 * memory_limit accordingly, so that the running kernel can run with
+	 * specified memory_limit.
+	 */
+	if (memory_limit && memory_limit < memblock_end_of_DRAM()) {
+		size = get_dump_area_size();
+		if ((memory_limit + size) < memblock_end_of_DRAM())
+			memory_limit += size;
+		else
+			memory_limit = memblock_end_of_DRAM();
+		printk(KERN_INFO "Adjusted memory_limit for firmware-assisted"
+				" dump, now %#016llx\n",
+				(unsigned long long)memory_limit);
+	}
+	if (memory_limit)
+		memory_boundary = memory_limit;
+	else
+		memory_boundary = memblock_end_of_DRAM();
+
+	if (fw_dump.dump_active) {
+		printk(KERN_INFO "Firmware-assisted dump is active.\n");
+		/*
+		 * If last boot has crashed then reserve all the memory
+		 * above boot_memory_size so that we don't touch it until
+		 * dump is written to disk by userspace tool. This memory
+		 * will be released for general use once the dump is saved.
+		 */
+		base = fw_dump.boot_memory_size;
+		size = memory_boundary - base;
+		memblock_reserve(base, size);
+		printk(KERN_INFO "Reserved %ldMB of memory at %ldMB "
+				"for saving crash dump\n",
+				(unsigned long)(size >> 20),
+				(unsigned long)(base >> 20));
+	} else {
+		/* Reserve the memory at the top of memory. */
+		size = get_dump_area_size();
+		base = memory_boundary - size;
+		memblock_reserve(base, size);
+		printk(KERN_INFO "Reserved %ldMB of memory at %ldMB "
+				"for firmware-assisted dump\n",
+				(unsigned long)(size >> 20),
+				(unsigned long)(base >> 20));
+	}
+	fw_dump.reserve_dump_area_start = base;
+	fw_dump.reserve_dump_area_size = size;
+	return 1;
+}
+
+/* Look for fadump= cmdline option. */
+static int __init early_fadump_param(char *p)
+{
+	if (!p)
+		return 1;
+
+	if (p[0] == '1')
+		fw_dump.fadump_enabled = 1;
+	else if (p[0] == '0')
+		fw_dump.fadump_enabled = 0;
+
+	return 0;
+}
+early_param("fadump", early_fadump_param);
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 8c3112a..10e3de0 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -54,6 +54,7 @@
 #include <asm/pci-bridge.h>
 #include <asm/phyp_dump.h>
 #include <asm/kexec.h>
+#include <asm/fadump.h>
 #include <mm/mmu_decl.h>
 
 #ifdef DEBUG
@@ -711,6 +712,11 @@ void __init early_init_devtree(void *params)
 	of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL);
 #endif
 
+#ifdef CONFIG_FA_DUMP
+	/* scan tree to see if dump is active during last boot */
+	of_scan_flat_dt(early_init_dt_scan_fw_dump, NULL);
+#endif
+
 	/* Retrieve various informations from the /chosen node of the
 	 * device-tree, including the platform type, initrd location and
 	 * size, TCE reserve, and more ...
@@ -734,7 +740,14 @@ void __init early_init_devtree(void *params)
 	if (PHYSICAL_START > MEMORY_START)
 		memblock_reserve(MEMORY_START, 0x8000);
 	reserve_kdump_trampoline();
-	reserve_crashkernel();
+#ifdef CONFIG_FA_DUMP
+	/*
+	 * If we fail to reserve memory for firmware-assisted dump then
+	 * fallback to kexec based kdump.
+	 */
+	if (fadump_reserve_mem() == 0)
+#endif
+		reserve_crashkernel();
 	early_reserve_mem();
 	phyp_dump_reserve_mem();
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 03/10] fadump: Register for firmware assisted dump.
  2011-07-13 18:06 ` Mahesh J Salgaonkar
@ 2011-07-13 18:07   ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch registers for firmware-assisted dump using rtas token
ibm,configure-kernel-dump. During registration firmware is informed about
the reserved area where it saves the CPU state data, HPTE table and contents
of RMR region at the time of kernel crash. Apart from this, firmware also
preserves the contents of entire partition memory even if it is not specified
during registration.

This patch also populates sysfs files under /sys/kernel to display
fadump status and reserved memory regions.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |   55 ++++++
 arch/powerpc/kernel/fadump.c      |  336 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/setup_64.c    |    8 +
 arch/powerpc/mm/hash_utils_64.c   |   11 +
 4 files changed, 407 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 08ef997..5568789 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -36,6 +36,58 @@
 #define FADUMP_HPTE_REGION	0x0002
 #define FADUMP_REAL_MODE_REGION	0x0011
 
+/* Dump request flag */
+#define FADUMP_REQUEST_FLAG	0x00000001
+
+/* FAD commands */
+#define FADUMP_REGISTER	1
+#define FADUMP_UNREGISTER	2
+#define FADUMP_INVALIDATE	3
+
+/* Kernel Dump section info */
+struct fadump_section {
+	u32	request_flag;
+	u16	source_data_type;
+	u16	error_flags;
+	u64	source_address;
+	u64	source_len;
+	u64	bytes_dumped;
+	u64	destination_address;
+};
+
+/* ibm,configure-kernel-dump header. */
+struct fadump_section_header {
+	u32	dump_format_version;
+	u16	dump_num_sections;
+	u16	dump_status_flag;
+	u32	offset_first_dump_section;
+
+	/* Fields for disk dump option. */
+	u32	dd_block_size;
+	u64	dd_block_offset;
+	u64	dd_num_blocks;
+	u32	dd_offset_disk_path;
+
+	/* Maximum time allowed to prevent an automatic dump-reboot. */
+	u32	max_time_auto;
+};
+
+/*
+ * Firmware Assisted dump memory structure. This structure is required for
+ * registering future kernel dump with power firmware through rtas call.
+ *
+ * No disk dump option. Hence disk dump path string section is not included.
+ */
+struct fadump_mem_struct {
+	struct fadump_section_header	header;
+
+	/* Kernel dump sections */
+	struct fadump_section		cpu_state_data;
+	struct fadump_section		hpte_region;
+	struct fadump_section		rmr_region;
+};
+
+/* Firmware-assisted dump configuration details. */
 struct fw_dump {
 	unsigned long	cpu_state_data_size;
 	unsigned long	hpte_region_size;
@@ -47,10 +99,13 @@ struct fw_dump {
 	unsigned long	fadump_enabled:1;
 	unsigned long	fadump_supported:1;
 	unsigned long	dump_active:1;
+	unsigned long	dump_registered:1;
 };
 
 extern int early_init_dt_scan_fw_dump(unsigned long node,
 		const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
+extern int setup_fadump(void);
+extern int is_fadump_active(void);
 #endif
 #endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 446dcdc..0130ed7 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -28,6 +28,7 @@
 
 #include <linux/string.h>
 #include <linux/memblock.h>
+#include <linux/delay.h>
 
 #include <asm/page.h>
 #include <asm/prom.h>
@@ -55,6 +56,8 @@ struct dump_section {
 
 /* Global variable to hold firmware assisted dump configuration info. */
 static struct fw_dump fw_dump;
+static struct fadump_mem_struct fdm;
+static const struct fadump_mem_struct *fdm_active;
 
 /* Scan the Firmware Assisted dump configuration details. */
 int __init early_init_dt_scan_fw_dump(unsigned long node,
@@ -82,7 +85,8 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 	 * The 'ibm,kernel-dump' rtas node is present only if there is
 	 * dump data waiting for us.
 	 */
-	if (of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL))
+	fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
+	if (fdm_active)
 		fw_dump.dump_active = 1;
 
 	/* Get the sizes required to store dump data for the firmware provided
@@ -107,6 +111,163 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 	return 1;
 }
 
+int is_fadump_active(void)
+{
+	return fw_dump.dump_active;
+}
+
+/* Print firmware assisted dump configurations for debugging purpose. */
+static void fadump_show_config(void)
+{
+	DBG("Support for firmware-assisted dump (fadump): %s\n",
+			(fw_dump.fadump_supported ? "present" : "no support"));
+
+	if (!fw_dump.fadump_supported)
+		return;
+
+	DBG("Fadump enabled    : %s\n",
+				(fw_dump.fadump_enabled ? "yes" : "no"));
+	DBG("Dump Active       : %s\n", (fw_dump.dump_active ? "yes" : "no"));
+	DBG("Dump section sizes:\n");
+	DBG("	CPU state data size: %lx\n", fw_dump.cpu_state_data_size);
+	DBG("	HPTE region size   : %lx\n", fw_dump.hpte_region_size);
+	DBG("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
+	DBG("Reserve area start: %lx\n", fw_dump.reserve_dump_area_start);
+	DBG("Reserve area size : %lx\n", fw_dump.reserve_dump_area_size);
+}
+
+static void show_fadump_mem_struct(const struct fadump_mem_struct *fdm)
+{
+	if (!fdm)
+		return;
+
+	DBG("--------Firmware-assisted dump memory structure---------\n");
+	DBG("header.dump_format_version        : 0x%08x\n",
+					fdm->header.dump_format_version);
+	DBG("header.dump_num_sections          : %d\n",
+					fdm->header.dump_num_sections);
+	DBG("header.dump_status_flag           : 0x%04x\n",
+					fdm->header.dump_status_flag);
+	DBG("header.offset_first_dump_section  : 0x%x\n",
+					fdm->header.offset_first_dump_section);
+
+	DBG("header.dd_block_size              : %d\n",
+					fdm->header.dd_block_size);
+	DBG("header.dd_block_offset            : 0x%Lx\n",
+					fdm->header.dd_block_offset);
+	DBG("header.dd_num_blocks              : %Lx\n",
+					fdm->header.dd_num_blocks);
+	DBG("header.dd_offset_disk_path        : 0x%x\n",
+					fdm->header.dd_offset_disk_path);
+
+	DBG("header.max_time_auto              : %d\n",
+					fdm->header.max_time_auto);
+
+	/* Kernel dump sections */
+	DBG("cpu_state_data.request_flag       : 0x%08x\n",
+					fdm->cpu_state_data.request_flag);
+	DBG("cpu_state_data.source_data_type   : 0x%04x\n",
+					fdm->cpu_state_data.source_data_type);
+	DBG("cpu_state_data.error_flags        : 0x%04x\n",
+					fdm->cpu_state_data.error_flags);
+	DBG("cpu_state_data.source_address     : 0x%016Lx\n",
+					fdm->cpu_state_data.source_address);
+	DBG("cpu_state_data.source_len         : 0x%Lx\n",
+					fdm->cpu_state_data.source_len);
+	DBG("cpu_state_data.bytes_dumped       : 0x%Lx\n",
+					fdm->cpu_state_data.bytes_dumped);
+	DBG("cpu_state_data.destination_address: 0x%016Lx\n",
+				fdm->cpu_state_data.destination_address);
+
+	DBG("hpte_region.request_flag          : 0x%08x\n",
+					fdm->hpte_region.request_flag);
+	DBG("hpte_region.source_data_type      : 0x%04x\n",
+					fdm->hpte_region.source_data_type);
+	DBG("hpte_region.error_flags           : 0x%04x\n",
+					fdm->hpte_region.error_flags);
+	DBG("hpte_region.source_address        : 0x%016Lx\n",
+					fdm->hpte_region.source_address);
+	DBG("hpte_region.source_len            : 0x%Lx\n",
+					fdm->hpte_region.source_len);
+	DBG("hpte_region.bytes_dumped          : 0x%Lx\n",
+					fdm->hpte_region.bytes_dumped);
+	DBG("hpte_region.destination_address   : 0x%016Lx\n",
+				fdm->hpte_region.destination_address);
+
+	DBG("rmr_region.request_flag           : 0x%08x\n",
+					fdm->rmr_region.request_flag);
+	DBG("rmr_region.source_data_type       : 0x%04x\n",
+					fdm->rmr_region.source_data_type);
+	DBG("rmr_region.error_flags            : 0x%04x\n",
+					fdm->rmr_region.error_flags);
+	DBG("rmr_region.source_address         : 0x%016Lx\n",
+					fdm->rmr_region.source_address);
+	DBG("rmr_region.source_len             : 0x%Lx\n",
+					fdm->rmr_region.source_len);
+	DBG("rmr_region.bytes_dumped           : 0x%Lx\n",
+					fdm->rmr_region.bytes_dumped);
+	DBG("rmr_region.destination_address    : 0x%016Lx\n",
+				fdm->rmr_region.destination_address);
+
+	DBG("--------Firmware-assisted dump memory structure---------\n");
+}
+
+static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
+				unsigned long addr)
+{
+	if (!fdm)
+		return 0;
+
+	memset(fdm, 0, sizeof(struct fadump_mem_struct));
+	addr = addr & PAGE_MASK;
+
+	fdm->header.dump_format_version = 0x00000001;
+	fdm->header.dump_num_sections = 3;
+	fdm->header.dump_status_flag = 0;
+	fdm->header.offset_first_dump_section =
+		(u32)offsetof(struct fadump_mem_struct, cpu_state_data);
+
+	/*
+	 * Fields for disk dump option.
+	 * We are not using disk dump option, hence set these fields to 0.
+	 */
+	fdm->header.dd_block_size = 0;
+	fdm->header.dd_block_offset = 0;
+	fdm->header.dd_num_blocks = 0;
+	fdm->header.dd_offset_disk_path = 0;
+
+	/* set 0 to disable an automatic dump-reboot. */
+	fdm->header.max_time_auto = 0;
+
+	/* Kernel dump sections */
+	/* cpu state data section. */
+	fdm->cpu_state_data.request_flag = FADUMP_REQUEST_FLAG;
+	fdm->cpu_state_data.source_data_type = FADUMP_CPU_STATE_DATA;
+	fdm->cpu_state_data.source_address = 0;
+	fdm->cpu_state_data.source_len = fw_dump.cpu_state_data_size;
+	fdm->cpu_state_data.destination_address = addr;
+	addr += fw_dump.cpu_state_data_size;
+
+	/* hpte region section */
+	fdm->hpte_region.request_flag = FADUMP_REQUEST_FLAG;
+	fdm->hpte_region.source_data_type = FADUMP_HPTE_REGION;
+	fdm->hpte_region.source_address = 0;
+	fdm->hpte_region.source_len = fw_dump.hpte_region_size;
+	fdm->hpte_region.destination_address = addr;
+	addr += fw_dump.hpte_region_size;
+
+	/* RMR region section */
+	fdm->rmr_region.request_flag = FADUMP_REQUEST_FLAG;
+	fdm->rmr_region.source_data_type = FADUMP_REAL_MODE_REGION;
+	fdm->rmr_region.source_address = RMR_START;
+	fdm->rmr_region.source_len = fw_dump.boot_memory_size;
+	fdm->rmr_region.destination_address = addr;
+	addr += fw_dump.boot_memory_size;
+
+	show_fadump_mem_struct(fdm);
+	return addr;
+}
+
 /**
  * calculate_reserve_size() - reserve variable boot area 5% of System RAM
  *
@@ -169,8 +330,15 @@ int __init fadump_reserve_mem(void)
 		fw_dump.fadump_enabled = 0;
 		return 0;
 	}
-	/* Initialize boot memory size */
-	fw_dump.boot_memory_size = calculate_reserve_size();
+	/*
+	 * Initialize boot memory size
+	 * If dump is active then we have already calculated the size during
+	 * first kernel.
+	 */
+	if (fdm_active)
+		fw_dump.boot_memory_size = fdm_active->rmr_region.source_len;
+	else
+		fw_dump.boot_memory_size = calculate_reserve_size();
 
 	/*
 	 * Calculate the memory boundary.
@@ -238,3 +406,165 @@ static int __init early_fadump_param(char *p)
 	return 0;
 }
 early_param("fadump", early_fadump_param);
+
+static void register_fw_dump(struct fadump_mem_struct *fdm)
+{
+	int rc;
+	unsigned int wait_time;
+
+	DBG("Registering for firmware-assisted kernel dump...\n");
+
+	/* TODO: Add upper time limit for the delay */
+	do {
+		rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL,
+			FADUMP_REGISTER, fdm,
+			sizeof(struct fadump_mem_struct));
+
+		wait_time = rtas_busy_delay_time(rc);
+		if (wait_time)
+			mdelay(wait_time);
+
+	} while (wait_time);
+
+	switch (rc) {
+	case -1:
+		printk(KERN_ERR "Failed to register firmware-assisted kernel"
+			" dump. Hardware Error(%d).\n", rc);
+		break;
+	case -3:
+		printk(KERN_ERR "Failed to register firmware-assisted kernel"
+			" dump. Parameter Error(%d).\n", rc);
+		break;
+	case -9:
+		printk(KERN_ERR "firmware-assisted kernel dump is already "
+			" registered.");
+		fw_dump.dump_registered = 1;
+		break;
+	case 0:
+		printk(KERN_INFO "firmware-assisted kernel dump registration"
+			" is successful\n");
+		fw_dump.dump_registered = 1;
+		break;
+	}
+}
+
+static void register_fadump(void)
+{
+	/*
+	 * If no memory is reserved then we can not register for firmware-
+	 * assisted dump.
+	 */
+	if (!fw_dump.reserve_dump_area_size)
+		return;
+
+	/* Initialize the kernel dump memory structure for FAD registration. */
+	init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start);
+
+	/* register the future kernel dump with firmware. */
+	register_fw_dump(&fdm);
+}
+
+static ssize_t fadump_enabled_show(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
+}
+
+static ssize_t fadump_region_show(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	const struct fadump_mem_struct *fdm_ptr;
+	ssize_t n = 0;
+
+	if (!fw_dump.fadump_enabled)
+		return n;
+
+	if (fdm_active)
+		fdm_ptr = fdm_active;
+	else
+		fdm_ptr = &fdm;
+
+	n += sprintf(buf,
+			"CPU : [%#016llx-%#016llx] %#llx bytes, "
+			"Dumped: %#llx\n",
+			fdm_ptr->cpu_state_data.destination_address,
+			fdm_ptr->cpu_state_data.destination_address +
+			fdm_ptr->cpu_state_data.source_len - 1,
+			fdm_ptr->cpu_state_data.source_len,
+			fdm_ptr->cpu_state_data.bytes_dumped);
+	n += sprintf(buf + n,
+			"HPTE: [%#016llx-%#016llx] %#llx bytes, "
+			"Dumped: %#llx\n",
+			fdm_ptr->hpte_region.destination_address,
+			fdm_ptr->hpte_region.destination_address +
+			fdm_ptr->hpte_region.source_len - 1,
+			fdm_ptr->hpte_region.source_len,
+			fdm_ptr->hpte_region.bytes_dumped);
+	n += sprintf(buf + n,
+			"DUMP: [%#016llx-%#016llx] %#llx bytes, "
+			"Dumped: %#llx\n",
+			fdm_ptr->rmr_region.destination_address,
+			fdm_ptr->rmr_region.destination_address +
+			fdm_ptr->rmr_region.source_len - 1,
+			fdm_ptr->rmr_region.source_len,
+			fdm_ptr->rmr_region.bytes_dumped);
+
+	if (!fdm_active ||
+		(fw_dump.reserve_dump_area_start ==
+		fdm_ptr->cpu_state_data.destination_address))
+		return n;
+
+	/* Dump is active. Show reserved memory region. */
+	n += sprintf(buf + n,
+			"    : [%#016llx-%#016llx] %#llx bytes, "
+			"Dumped: %#llx\n",
+			(unsigned long long)fw_dump.reserve_dump_area_start,
+			fdm_ptr->cpu_state_data.destination_address - 1,
+			fdm_ptr->cpu_state_data.destination_address -
+			fw_dump.reserve_dump_area_start,
+			fdm_ptr->cpu_state_data.destination_address -
+			fw_dump.reserve_dump_area_start);
+	return n;
+}
+
+static struct kobj_attribute fadump_attr = __ATTR(fadump_enabled,
+						0444, fadump_enabled_show,
+						NULL);
+static struct kobj_attribute fadump_region_attr = __ATTR(fadump_region,
+						0444, fadump_region_show, NULL);
+
+static int fadump_init_sysfs(void)
+{
+	int rc = 0;
+
+	rc = sysfs_create_file(kernel_kobj, &fadump_attr.attr);
+	if (rc)
+		printk(KERN_ERR "fadump: unable to create sysfs file"
+			" (%d)\n", rc);
+
+	rc = sysfs_create_file(kernel_kobj, &fadump_region_attr.attr);
+	if (rc)
+		printk(KERN_ERR "fadump: unable to create sysfs file"
+			" (%d)\n", rc);
+	return rc;
+}
+subsys_initcall(fadump_init_sysfs);
+
+/*
+ * Prepare for firmware-assisted dump.
+ */
+int __init setup_fadump(void)
+{
+	if (!fw_dump.fadump_supported) {
+		printk(KERN_ERR "Firmware-assisted dump is not supported on"
+			" this hardware\n");
+		return 0;
+	}
+
+	fadump_show_config();
+	register_fadump();
+
+	return 1;
+}
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index a88bf27..3031ea7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -63,6 +63,7 @@
 #include <asm/kexec.h>
 #include <asm/mmu_context.h>
 #include <asm/code-patching.h>
+#include <asm/fadump.h>
 
 #include "setup.h"
 
@@ -371,6 +372,13 @@ void __init setup_system(void)
 	rtas_initialize();
 #endif /* CONFIG_PPC_RTAS */
 
+#ifdef CONFIG_FA_DUMP
+	/*
+	 * Setup Firmware-assisted dump.
+	 */
+	setup_fadump();
+#endif
+
 	/*
 	 * Check if we have an initrd provided via the device-tree
 	 */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 26b2872..ba64f1a 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -54,6 +54,7 @@
 #include <asm/spu.h>
 #include <asm/udbg.h>
 #include <asm/code-patching.h>
+#include <asm/fadump.h>
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -627,6 +628,16 @@ static void __init htab_initialize(void)
 		/* Using a hypervisor which owns the htab */
 		htab_address = NULL;
 		_SDR1 = 0; 
+#ifdef CONFIG_FA_DUMP
+		/*
+		 * If firmware assisted dump is active firmware preserves
+		 * the contents of htab along with entire partition memory.
+		 * Clear the htab if firmware assisted dump is active so
+		 * that we dont end up using old mappings.
+		 */
+		if (is_fadump_active() && ppc_md.hpte_clear_all)
+			ppc_md.hpte_clear_all();
+#endif
 	} else {
 		/* Find storage for the HPT.  Must be contiguous in
 		 * the absolute address space. On cell we want it to be


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 03/10] fadump: Register for firmware assisted dump.
@ 2011-07-13 18:07   ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch registers for firmware-assisted dump using rtas token
ibm,configure-kernel-dump. During registration firmware is informed about
the reserved area where it saves the CPU state data, HPTE table and contents
of RMR region at the time of kernel crash. Apart from this, firmware also
preserves the contents of entire partition memory even if it is not specified
during registration.

This patch also populates sysfs files under /sys/kernel to display
fadump status and reserved memory regions.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |   55 ++++++
 arch/powerpc/kernel/fadump.c      |  336 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/setup_64.c    |    8 +
 arch/powerpc/mm/hash_utils_64.c   |   11 +
 4 files changed, 407 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 08ef997..5568789 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -36,6 +36,58 @@
 #define FADUMP_HPTE_REGION	0x0002
 #define FADUMP_REAL_MODE_REGION	0x0011
 
+/* Dump request flag */
+#define FADUMP_REQUEST_FLAG	0x00000001
+
+/* FAD commands */
+#define FADUMP_REGISTER	1
+#define FADUMP_UNREGISTER	2
+#define FADUMP_INVALIDATE	3
+
+/* Kernel Dump section info */
+struct fadump_section {
+	u32	request_flag;
+	u16	source_data_type;
+	u16	error_flags;
+	u64	source_address;
+	u64	source_len;
+	u64	bytes_dumped;
+	u64	destination_address;
+};
+
+/* ibm,configure-kernel-dump header. */
+struct fadump_section_header {
+	u32	dump_format_version;
+	u16	dump_num_sections;
+	u16	dump_status_flag;
+	u32	offset_first_dump_section;
+
+	/* Fields for disk dump option. */
+	u32	dd_block_size;
+	u64	dd_block_offset;
+	u64	dd_num_blocks;
+	u32	dd_offset_disk_path;
+
+	/* Maximum time allowed to prevent an automatic dump-reboot. */
+	u32	max_time_auto;
+};
+
+/*
+ * Firmware Assisted dump memory structure. This structure is required for
+ * registering future kernel dump with power firmware through rtas call.
+ *
+ * No disk dump option. Hence disk dump path string section is not included.
+ */
+struct fadump_mem_struct {
+	struct fadump_section_header	header;
+
+	/* Kernel dump sections */
+	struct fadump_section		cpu_state_data;
+	struct fadump_section		hpte_region;
+	struct fadump_section		rmr_region;
+};
+
+/* Firmware-assisted dump configuration details. */
 struct fw_dump {
 	unsigned long	cpu_state_data_size;
 	unsigned long	hpte_region_size;
@@ -47,10 +99,13 @@ struct fw_dump {
 	unsigned long	fadump_enabled:1;
 	unsigned long	fadump_supported:1;
 	unsigned long	dump_active:1;
+	unsigned long	dump_registered:1;
 };
 
 extern int early_init_dt_scan_fw_dump(unsigned long node,
 		const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
+extern int setup_fadump(void);
+extern int is_fadump_active(void);
 #endif
 #endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 446dcdc..0130ed7 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -28,6 +28,7 @@
 
 #include <linux/string.h>
 #include <linux/memblock.h>
+#include <linux/delay.h>
 
 #include <asm/page.h>
 #include <asm/prom.h>
@@ -55,6 +56,8 @@ struct dump_section {
 
 /* Global variable to hold firmware assisted dump configuration info. */
 static struct fw_dump fw_dump;
+static struct fadump_mem_struct fdm;
+static const struct fadump_mem_struct *fdm_active;
 
 /* Scan the Firmware Assisted dump configuration details. */
 int __init early_init_dt_scan_fw_dump(unsigned long node,
@@ -82,7 +85,8 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 	 * The 'ibm,kernel-dump' rtas node is present only if there is
 	 * dump data waiting for us.
 	 */
-	if (of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL))
+	fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
+	if (fdm_active)
 		fw_dump.dump_active = 1;
 
 	/* Get the sizes required to store dump data for the firmware provided
@@ -107,6 +111,163 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 	return 1;
 }
 
+int is_fadump_active(void)
+{
+	return fw_dump.dump_active;
+}
+
+/* Print firmware assisted dump configurations for debugging purpose. */
+static void fadump_show_config(void)
+{
+	DBG("Support for firmware-assisted dump (fadump): %s\n",
+			(fw_dump.fadump_supported ? "present" : "no support"));
+
+	if (!fw_dump.fadump_supported)
+		return;
+
+	DBG("Fadump enabled    : %s\n",
+				(fw_dump.fadump_enabled ? "yes" : "no"));
+	DBG("Dump Active       : %s\n", (fw_dump.dump_active ? "yes" : "no"));
+	DBG("Dump section sizes:\n");
+	DBG("	CPU state data size: %lx\n", fw_dump.cpu_state_data_size);
+	DBG("	HPTE region size   : %lx\n", fw_dump.hpte_region_size);
+	DBG("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
+	DBG("Reserve area start: %lx\n", fw_dump.reserve_dump_area_start);
+	DBG("Reserve area size : %lx\n", fw_dump.reserve_dump_area_size);
+}
+
+static void show_fadump_mem_struct(const struct fadump_mem_struct *fdm)
+{
+	if (!fdm)
+		return;
+
+	DBG("--------Firmware-assisted dump memory structure---------\n");
+	DBG("header.dump_format_version        : 0x%08x\n",
+					fdm->header.dump_format_version);
+	DBG("header.dump_num_sections          : %d\n",
+					fdm->header.dump_num_sections);
+	DBG("header.dump_status_flag           : 0x%04x\n",
+					fdm->header.dump_status_flag);
+	DBG("header.offset_first_dump_section  : 0x%x\n",
+					fdm->header.offset_first_dump_section);
+
+	DBG("header.dd_block_size              : %d\n",
+					fdm->header.dd_block_size);
+	DBG("header.dd_block_offset            : 0x%Lx\n",
+					fdm->header.dd_block_offset);
+	DBG("header.dd_num_blocks              : %Lx\n",
+					fdm->header.dd_num_blocks);
+	DBG("header.dd_offset_disk_path        : 0x%x\n",
+					fdm->header.dd_offset_disk_path);
+
+	DBG("header.max_time_auto              : %d\n",
+					fdm->header.max_time_auto);
+
+	/* Kernel dump sections */
+	DBG("cpu_state_data.request_flag       : 0x%08x\n",
+					fdm->cpu_state_data.request_flag);
+	DBG("cpu_state_data.source_data_type   : 0x%04x\n",
+					fdm->cpu_state_data.source_data_type);
+	DBG("cpu_state_data.error_flags        : 0x%04x\n",
+					fdm->cpu_state_data.error_flags);
+	DBG("cpu_state_data.source_address     : 0x%016Lx\n",
+					fdm->cpu_state_data.source_address);
+	DBG("cpu_state_data.source_len         : 0x%Lx\n",
+					fdm->cpu_state_data.source_len);
+	DBG("cpu_state_data.bytes_dumped       : 0x%Lx\n",
+					fdm->cpu_state_data.bytes_dumped);
+	DBG("cpu_state_data.destination_address: 0x%016Lx\n",
+				fdm->cpu_state_data.destination_address);
+
+	DBG("hpte_region.request_flag          : 0x%08x\n",
+					fdm->hpte_region.request_flag);
+	DBG("hpte_region.source_data_type      : 0x%04x\n",
+					fdm->hpte_region.source_data_type);
+	DBG("hpte_region.error_flags           : 0x%04x\n",
+					fdm->hpte_region.error_flags);
+	DBG("hpte_region.source_address        : 0x%016Lx\n",
+					fdm->hpte_region.source_address);
+	DBG("hpte_region.source_len            : 0x%Lx\n",
+					fdm->hpte_region.source_len);
+	DBG("hpte_region.bytes_dumped          : 0x%Lx\n",
+					fdm->hpte_region.bytes_dumped);
+	DBG("hpte_region.destination_address   : 0x%016Lx\n",
+				fdm->hpte_region.destination_address);
+
+	DBG("rmr_region.request_flag           : 0x%08x\n",
+					fdm->rmr_region.request_flag);
+	DBG("rmr_region.source_data_type       : 0x%04x\n",
+					fdm->rmr_region.source_data_type);
+	DBG("rmr_region.error_flags            : 0x%04x\n",
+					fdm->rmr_region.error_flags);
+	DBG("rmr_region.source_address         : 0x%016Lx\n",
+					fdm->rmr_region.source_address);
+	DBG("rmr_region.source_len             : 0x%Lx\n",
+					fdm->rmr_region.source_len);
+	DBG("rmr_region.bytes_dumped           : 0x%Lx\n",
+					fdm->rmr_region.bytes_dumped);
+	DBG("rmr_region.destination_address    : 0x%016Lx\n",
+				fdm->rmr_region.destination_address);
+
+	DBG("--------Firmware-assisted dump memory structure---------\n");
+}
+
+static unsigned long init_fadump_mem_struct(struct fadump_mem_struct *fdm,
+				unsigned long addr)
+{
+	if (!fdm)
+		return 0;
+
+	memset(fdm, 0, sizeof(struct fadump_mem_struct));
+	addr = addr & PAGE_MASK;
+
+	fdm->header.dump_format_version = 0x00000001;
+	fdm->header.dump_num_sections = 3;
+	fdm->header.dump_status_flag = 0;
+	fdm->header.offset_first_dump_section =
+		(u32)offsetof(struct fadump_mem_struct, cpu_state_data);
+
+	/*
+	 * Fields for disk dump option.
+	 * We are not using disk dump option, hence set these fields to 0.
+	 */
+	fdm->header.dd_block_size = 0;
+	fdm->header.dd_block_offset = 0;
+	fdm->header.dd_num_blocks = 0;
+	fdm->header.dd_offset_disk_path = 0;
+
+	/* set 0 to disable an automatic dump-reboot. */
+	fdm->header.max_time_auto = 0;
+
+	/* Kernel dump sections */
+	/* cpu state data section. */
+	fdm->cpu_state_data.request_flag = FADUMP_REQUEST_FLAG;
+	fdm->cpu_state_data.source_data_type = FADUMP_CPU_STATE_DATA;
+	fdm->cpu_state_data.source_address = 0;
+	fdm->cpu_state_data.source_len = fw_dump.cpu_state_data_size;
+	fdm->cpu_state_data.destination_address = addr;
+	addr += fw_dump.cpu_state_data_size;
+
+	/* hpte region section */
+	fdm->hpte_region.request_flag = FADUMP_REQUEST_FLAG;
+	fdm->hpte_region.source_data_type = FADUMP_HPTE_REGION;
+	fdm->hpte_region.source_address = 0;
+	fdm->hpte_region.source_len = fw_dump.hpte_region_size;
+	fdm->hpte_region.destination_address = addr;
+	addr += fw_dump.hpte_region_size;
+
+	/* RMR region section */
+	fdm->rmr_region.request_flag = FADUMP_REQUEST_FLAG;
+	fdm->rmr_region.source_data_type = FADUMP_REAL_MODE_REGION;
+	fdm->rmr_region.source_address = RMR_START;
+	fdm->rmr_region.source_len = fw_dump.boot_memory_size;
+	fdm->rmr_region.destination_address = addr;
+	addr += fw_dump.boot_memory_size;
+
+	show_fadump_mem_struct(fdm);
+	return addr;
+}
+
 /**
  * calculate_reserve_size() - reserve variable boot area 5% of System RAM
  *
@@ -169,8 +330,15 @@ int __init fadump_reserve_mem(void)
 		fw_dump.fadump_enabled = 0;
 		return 0;
 	}
-	/* Initialize boot memory size */
-	fw_dump.boot_memory_size = calculate_reserve_size();
+	/*
+	 * Initialize boot memory size
+	 * If dump is active then we have already calculated the size during
+	 * first kernel.
+	 */
+	if (fdm_active)
+		fw_dump.boot_memory_size = fdm_active->rmr_region.source_len;
+	else
+		fw_dump.boot_memory_size = calculate_reserve_size();
 
 	/*
 	 * Calculate the memory boundary.
@@ -238,3 +406,165 @@ static int __init early_fadump_param(char *p)
 	return 0;
 }
 early_param("fadump", early_fadump_param);
+
+static void register_fw_dump(struct fadump_mem_struct *fdm)
+{
+	int rc;
+	unsigned int wait_time;
+
+	DBG("Registering for firmware-assisted kernel dump...\n");
+
+	/* TODO: Add upper time limit for the delay */
+	do {
+		rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL,
+			FADUMP_REGISTER, fdm,
+			sizeof(struct fadump_mem_struct));
+
+		wait_time = rtas_busy_delay_time(rc);
+		if (wait_time)
+			mdelay(wait_time);
+
+	} while (wait_time);
+
+	switch (rc) {
+	case -1:
+		printk(KERN_ERR "Failed to register firmware-assisted kernel"
+			" dump. Hardware Error(%d).\n", rc);
+		break;
+	case -3:
+		printk(KERN_ERR "Failed to register firmware-assisted kernel"
+			" dump. Parameter Error(%d).\n", rc);
+		break;
+	case -9:
+		printk(KERN_ERR "firmware-assisted kernel dump is already "
+			" registered.");
+		fw_dump.dump_registered = 1;
+		break;
+	case 0:
+		printk(KERN_INFO "firmware-assisted kernel dump registration"
+			" is successful\n");
+		fw_dump.dump_registered = 1;
+		break;
+	}
+}
+
+static void register_fadump(void)
+{
+	/*
+	 * If no memory is reserved then we can not register for firmware-
+	 * assisted dump.
+	 */
+	if (!fw_dump.reserve_dump_area_size)
+		return;
+
+	/* Initialize the kernel dump memory structure for FAD registration. */
+	init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start);
+
+	/* register the future kernel dump with firmware. */
+	register_fw_dump(&fdm);
+}
+
+static ssize_t fadump_enabled_show(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
+}
+
+static ssize_t fadump_region_show(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	const struct fadump_mem_struct *fdm_ptr;
+	ssize_t n = 0;
+
+	if (!fw_dump.fadump_enabled)
+		return n;
+
+	if (fdm_active)
+		fdm_ptr = fdm_active;
+	else
+		fdm_ptr = &fdm;
+
+	n += sprintf(buf,
+			"CPU : [%#016llx-%#016llx] %#llx bytes, "
+			"Dumped: %#llx\n",
+			fdm_ptr->cpu_state_data.destination_address,
+			fdm_ptr->cpu_state_data.destination_address +
+			fdm_ptr->cpu_state_data.source_len - 1,
+			fdm_ptr->cpu_state_data.source_len,
+			fdm_ptr->cpu_state_data.bytes_dumped);
+	n += sprintf(buf + n,
+			"HPTE: [%#016llx-%#016llx] %#llx bytes, "
+			"Dumped: %#llx\n",
+			fdm_ptr->hpte_region.destination_address,
+			fdm_ptr->hpte_region.destination_address +
+			fdm_ptr->hpte_region.source_len - 1,
+			fdm_ptr->hpte_region.source_len,
+			fdm_ptr->hpte_region.bytes_dumped);
+	n += sprintf(buf + n,
+			"DUMP: [%#016llx-%#016llx] %#llx bytes, "
+			"Dumped: %#llx\n",
+			fdm_ptr->rmr_region.destination_address,
+			fdm_ptr->rmr_region.destination_address +
+			fdm_ptr->rmr_region.source_len - 1,
+			fdm_ptr->rmr_region.source_len,
+			fdm_ptr->rmr_region.bytes_dumped);
+
+	if (!fdm_active ||
+		(fw_dump.reserve_dump_area_start ==
+		fdm_ptr->cpu_state_data.destination_address))
+		return n;
+
+	/* Dump is active. Show reserved memory region. */
+	n += sprintf(buf + n,
+			"    : [%#016llx-%#016llx] %#llx bytes, "
+			"Dumped: %#llx\n",
+			(unsigned long long)fw_dump.reserve_dump_area_start,
+			fdm_ptr->cpu_state_data.destination_address - 1,
+			fdm_ptr->cpu_state_data.destination_address -
+			fw_dump.reserve_dump_area_start,
+			fdm_ptr->cpu_state_data.destination_address -
+			fw_dump.reserve_dump_area_start);
+	return n;
+}
+
+static struct kobj_attribute fadump_attr = __ATTR(fadump_enabled,
+						0444, fadump_enabled_show,
+						NULL);
+static struct kobj_attribute fadump_region_attr = __ATTR(fadump_region,
+						0444, fadump_region_show, NULL);
+
+static int fadump_init_sysfs(void)
+{
+	int rc = 0;
+
+	rc = sysfs_create_file(kernel_kobj, &fadump_attr.attr);
+	if (rc)
+		printk(KERN_ERR "fadump: unable to create sysfs file"
+			" (%d)\n", rc);
+
+	rc = sysfs_create_file(kernel_kobj, &fadump_region_attr.attr);
+	if (rc)
+		printk(KERN_ERR "fadump: unable to create sysfs file"
+			" (%d)\n", rc);
+	return rc;
+}
+subsys_initcall(fadump_init_sysfs);
+
+/*
+ * Prepare for firmware-assisted dump.
+ */
+int __init setup_fadump(void)
+{
+	if (!fw_dump.fadump_supported) {
+		printk(KERN_ERR "Firmware-assisted dump is not supported on"
+			" this hardware\n");
+		return 0;
+	}
+
+	fadump_show_config();
+	register_fadump();
+
+	return 1;
+}
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index a88bf27..3031ea7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -63,6 +63,7 @@
 #include <asm/kexec.h>
 #include <asm/mmu_context.h>
 #include <asm/code-patching.h>
+#include <asm/fadump.h>
 
 #include "setup.h"
 
@@ -371,6 +372,13 @@ void __init setup_system(void)
 	rtas_initialize();
 #endif /* CONFIG_PPC_RTAS */
 
+#ifdef CONFIG_FA_DUMP
+	/*
+	 * Setup Firmware-assisted dump.
+	 */
+	setup_fadump();
+#endif
+
 	/*
 	 * Check if we have an initrd provided via the device-tree
 	 */
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 26b2872..ba64f1a 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -54,6 +54,7 @@
 #include <asm/spu.h>
 #include <asm/udbg.h>
 #include <asm/code-patching.h>
+#include <asm/fadump.h>
 
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
@@ -627,6 +628,16 @@ static void __init htab_initialize(void)
 		/* Using a hypervisor which owns the htab */
 		htab_address = NULL;
 		_SDR1 = 0; 
+#ifdef CONFIG_FA_DUMP
+		/*
+		 * If firmware assisted dump is active firmware preserves
+		 * the contents of htab along with entire partition memory.
+		 * Clear the htab if firmware assisted dump is active so
+		 * that we dont end up using old mappings.
+		 */
+		if (is_fadump_active() && ppc_md.hpte_clear_all)
+			ppc_md.hpte_clear_all();
+#endif
 	} else {
 		/* Find storage for the HPT.  Must be contiguous in
 		 * the absolute address space. On cell we want it to be

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 04/10] fadump: Initialize elfcore header and add PT_LOAD program headers.
  2011-07-13 18:06 ` Mahesh J Salgaonkar
@ 2011-07-13 18:07   ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Build the crash memory range list by traversing through system memory during
the first kernel before we register for firmware-assisted dump. After the
successful dump registration, initialize the elfcore header and populate
PT_LOAD program headers with crash memory ranges. The elfcore header is
saved in the scratch area within the reserved memory. The scratch area starts
at the end of the memory reserved for saving RMR region contents. The
scratch area contains fadump crash info structure that contains magic number
for fadump validation and physical address where the eflcore header can be
found. This structure will also be used to pass some important crash info
data to the second kernel which will help second kernel to populate ELF core
header with correct data before it gets exported through /proc/vmcore. Since
the firmware preserves the entire partition memory at the time of crash the
contents of the scratch area will be preserved till second kernel boot.

NOTE: The current design implementation does not address a possibility of
introducing additional fields (in future) to this structure without affecting
compatibility. It's on TODO list to come up with better approach to
address this.

Reserved dump area start => +-------------------------------------+
                            |  CPU state dump data                |
                            +-------------------------------------+
                            |  HPTE region data                   |
                            +-------------------------------------+
                            |  RMR region data                    |
Scratch area start       => +-------------------------------------+
                            |  fadump crash info structure {      |
                            |     magic nummber                   |
                     +------|---- elfcorehdr_addr                 |
                     |      |  }                                  |
                     +----> +-------------------------------------+
                            |  ELF core header                    |
Reserved dump area end   => +-------------------------------------+

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |   37 +++++++
 arch/powerpc/kernel/fadump.c      |  209 +++++++++++++++++++++++++++++++++++++
 include/linux/crash_dump.h        |    1 
 include/linux/memblock.h          |    1 
 kernel/crash_dump.c               |   33 ++++++
 5 files changed, 280 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 5568789..4eba2d7 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -44,6 +44,9 @@
 #define FADUMP_UNREGISTER	2
 #define FADUMP_INVALIDATE	3
 
+/* Dump status flag */
+#define FADUMP_ERROR_FLAG	0x2000
+
 /* Kernel Dump section info */
 struct fadump_section {
 	u32	request_flag;
@@ -94,6 +97,7 @@ struct fw_dump {
 	unsigned long	boot_memory_size;
 	unsigned long	reserve_dump_area_start;
 	unsigned long	reserve_dump_area_size;
+	unsigned long	fadumphdr_addr;
 	int		ibm_configure_kernel_dump;
 
 	unsigned long	fadump_enabled:1;
@@ -102,6 +106,39 @@ struct fw_dump {
 	unsigned long	dump_registered:1;
 };
 
+/*
+ * Copy the ascii values for first 8 characters from a string into u64
+ * variable at their respective indexes.
+ * e.g.
+ *  The string "FADMPINF" will be converted into 0x4641444d50494e46
+ */
+static inline u64 str_to_u64(const char *str)
+{
+	u64 val = 0;
+	int i;
+
+	for (i = 0; i < sizeof(val); i++)
+		val = (*str) ? (val << 8) | *str++ : val << 8;
+	return val;
+}
+#define STR_TO_HEX(x)	str_to_u64(x)
+
+#define FADUMP_CRASH_INFO_MAGIC		STR_TO_HEX("FADMPINF")
+
+/* fadump crash info structure */
+struct fadump_crash_info_header {
+	u64		magic_number;
+	u64		elfcorehdr_addr;
+};
+
+/* Crash memory ranges */
+#define INIT_CRASHMEM_RANGES	(INIT_MEMBLOCK_REGIONS + 2)
+
+struct fad_crash_memory_ranges {
+	unsigned long long	base;
+	unsigned long long	size;
+};
+
 extern int early_init_dt_scan_fw_dump(unsigned long node,
 		const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 0130ed7..b6f4a8e 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -29,6 +29,7 @@
 #include <linux/string.h>
 #include <linux/memblock.h>
 #include <linux/delay.h>
+#include <linux/crash_dump.h>
 
 #include <asm/page.h>
 #include <asm/prom.h>
@@ -59,6 +60,9 @@ static struct fw_dump fw_dump;
 static struct fadump_mem_struct fdm;
 static const struct fadump_mem_struct *fdm_active;
 
+struct fad_crash_memory_ranges crash_memory_ranges[INIT_CRASHMEM_RANGES];
+int crash_mem_ranges;
+
 /* Scan the Firmware Assisted dump configuration details. */
 int __init early_init_dt_scan_fw_dump(unsigned long node,
 			const char *uname, int depth, void *data)
@@ -312,6 +316,10 @@ static unsigned long get_dump_area_size(void)
 	size += fw_dump.cpu_state_data_size;
 	size += fw_dump.hpte_region_size;
 	size += fw_dump.boot_memory_size;
+	size += sizeof(struct fadump_crash_info_header);
+	size += sizeof(struct elfhdr); /* ELF core header.*/
+	/* Program headers for crash memory regions. */
+	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
 
 	size = PAGE_ALIGN(size);
 	return size;
@@ -377,6 +385,11 @@ int __init fadump_reserve_mem(void)
 				"for saving crash dump\n",
 				(unsigned long)(size >> 20),
 				(unsigned long)(base >> 20));
+
+		fw_dump.fadumphdr_addr =
+				fdm_active->rmr_region.destination_address +
+				fdm_active->rmr_region.source_len;
+		DBG("fadumphdr_addr = %p\n", (void *) fw_dump.fadumphdr_addr);
 	} else {
 		/* Reserve the memory at the top of memory. */
 		size = get_dump_area_size();
@@ -448,8 +461,183 @@ static void register_fw_dump(struct fadump_mem_struct *fdm)
 	}
 }
 
+/*
+ * Validate and process the dump data stored by firmware before exporting
+ * it through '/proc/vmcore'.
+ */
+static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
+{
+	struct fadump_crash_info_header *fdh;
+
+	if (!fdm_active || !fw_dump.fadumphdr_addr)
+		return -EINVAL;
+
+	show_fadump_mem_struct(fdm_active);
+
+	/* Check if the dump data is valid. */
+	if ((fdm_active->header.dump_status_flag == FADUMP_ERROR_FLAG) ||
+			(fdm_active->rmr_region.error_flags != 0)) {
+		printk(KERN_ERR "Dump taken by platform is not valid\n");
+		return -EINVAL;
+	}
+	if (fdm_active->rmr_region.bytes_dumped !=
+			fdm_active->rmr_region.source_len) {
+		printk(KERN_ERR "Dump taken by platform is incomplete\n");
+		return -EINVAL;
+	}
+
+	/* Validate the fadump crash info header */
+	fdh = __va(fw_dump.fadumphdr_addr);
+	if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) {
+		printk(KERN_ERR "Crash info header is not valid.\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * We are done validating dump info and elfcore header is now ready
+	 * to be exported. set elfcorehdr_addr so that vmcore module will
+	 * export the elfcore header through '/proc/vmcore'.
+	 */
+	elfcorehdr_addr = fdh->elfcorehdr_addr;
+
+	return 0;
+}
+
+static inline void add_crash_memory(unsigned long long base,
+					unsigned long long end)
+{
+	if (base == end)
+		return;
+
+	DBG("crash_memory_range[%d] [%#016llx-%#016llx], %#llx bytes\n",
+		crash_mem_ranges, base, end - 1, (end - base));
+	crash_memory_ranges[crash_mem_ranges].base = base;
+	crash_memory_ranges[crash_mem_ranges].size = end - base;
+	crash_mem_ranges++;
+}
+
+static void exclude_reserved_area(unsigned long long start,
+					unsigned long long end)
+{
+	unsigned long long ra_start, ra_end;
+
+	ra_start = fw_dump.reserve_dump_area_start;
+	ra_end = ra_start + fw_dump.reserve_dump_area_size;
+
+	if ((ra_start < end) && (ra_end > start)) {
+		if ((start < ra_start) && (end > ra_end)) {
+			add_crash_memory(start, ra_start);
+			add_crash_memory(ra_end, end);
+		} else if (start < ra_start) {
+			add_crash_memory(start, ra_start);
+		} else if (ra_end < end) {
+			add_crash_memory(ra_end, end);
+		}
+	} else
+		add_crash_memory(start, end);
+}
+
+/*
+ * Traverse through memblock structure and setup crash memory ranges. These
+ * ranges will be used create PT_LOAD program headers in elfcore header.
+ */
+static void setup_crash_memory_ranges(void)
+{
+	struct memblock_region *reg;
+	unsigned long long start, end;
+
+	DBG("Setup crash memory ranges.\n");
+	crash_mem_ranges = 0;
+	/*
+	 * add the first memory chunk (RMR_START through boot_memory_size) as
+	 * a separate memory chunk. The reason is, at the time crash firmware
+	 * will move the content of this memory chunk to different location
+	 * specified during fadump registration. We need to create a separate
+	 * program header for this chunk with the correct offset.
+	 */
+	add_crash_memory(RMR_START, fw_dump.boot_memory_size);
+
+	for_each_memblock(memory, reg) {
+		start = (unsigned long long)reg->base;
+		end = start + (unsigned long long)reg->size;
+		if (start == RMR_START && end >= fw_dump.boot_memory_size)
+			start = fw_dump.boot_memory_size;
+
+		/* add this range excluding the reserved dump area. */
+		exclude_reserved_area(start, end);
+	}
+}
+
+static int create_elfcore_headers(char *bufp)
+{
+	struct elfhdr *elf;
+	struct elf_phdr *phdr;
+	int i;
+
+	init_elfcore_header(bufp);
+	elf = (struct elfhdr *)bufp;
+	bufp += sizeof(struct elfhdr);
+
+	/* setup PT_LOAD sections. */
+
+	for (i = 0; i < crash_mem_ranges; i++) {
+		unsigned long long mbase, msize;
+		mbase = crash_memory_ranges[i].base;
+		msize = crash_memory_ranges[i].size;
+
+		if (!msize)
+			continue;
+
+		phdr = (struct elf_phdr *)bufp;
+		bufp += sizeof(struct elf_phdr);
+		phdr->p_type	= PT_LOAD;
+		phdr->p_flags	= PF_R|PF_W|PF_X;
+		phdr->p_offset	= mbase;
+
+		if (mbase == RMR_START) {
+			/*
+			 * The entire RMR region will be moved by firmware
+			 * to the specified destination_address. Hence set
+			 * the correct offset.
+			 */
+			phdr->p_offset = fdm.rmr_region.destination_address;
+		}
+
+		phdr->p_paddr = mbase;
+		phdr->p_vaddr = (unsigned long)__va(mbase);
+		phdr->p_filesz = msize;
+		phdr->p_memsz = msize;
+		phdr->p_align = 0;
+
+		/* Increment number of program headers. */
+		(elf->e_phnum)++;
+	}
+	return 0;
+}
+
+static unsigned long init_fadump_header(unsigned long addr)
+{
+	struct fadump_crash_info_header *fdh;
+
+	if (!addr)
+		return 0;
+
+	fw_dump.fadumphdr_addr = addr;
+	fdh = __va(addr);
+	addr += sizeof(struct fadump_crash_info_header);
+
+	memset(fdh, 0, sizeof(struct fadump_crash_info_header));
+	fdh->magic_number = FADUMP_CRASH_INFO_MAGIC;
+	fdh->elfcorehdr_addr = addr;
+
+	return addr;
+}
+
 static void register_fadump(void)
 {
+	unsigned long addr;
+	void *vaddr;
+
 	/*
 	 * If no memory is reserved then we can not register for firmware-
 	 * assisted dump.
@@ -457,8 +645,17 @@ static void register_fadump(void)
 	if (!fw_dump.reserve_dump_area_size)
 		return;
 
+	setup_crash_memory_ranges();
+
 	/* Initialize the kernel dump memory structure for FAD registration. */
-	init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start);
+	addr = init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start);
+
+	/* Initialize fadump crash info header. */
+	addr = init_fadump_header(addr);
+	vaddr = __va(addr);
+
+	DBG("Creating ELF core headers at %#016lx\n", addr);
+	create_elfcore_headers(vaddr);
 
 	/* register the future kernel dump with firmware. */
 	register_fw_dump(&fdm);
@@ -564,6 +761,16 @@ int __init setup_fadump(void)
 	}
 
 	fadump_show_config();
+
+	/*
+	 * If dump data is available then see if it is valid and prepare for
+	 * saving it to the disk.
+	 */
+	if (fw_dump.dump_active) {
+		process_fadump(fdm_active);
+		return 1;
+	}
+
 	register_fadump();
 
 	return 1;
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 7405407..14627d4 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -13,6 +13,7 @@ extern unsigned long long elfcorehdr_addr;
 
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
 						unsigned long, int);
+extern int init_elfcore_header(char *);
 
 /* Architecture code defines this if there are other possible ELF
  * machine types, e.g. on bi-arch capable hardware. */
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 7525e38..63ae7a0 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -152,6 +152,7 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
 	     region < (memblock.memblock_type.regions + memblock.memblock_type.cnt);	\
 	     region++)
 
+#define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
 
 #ifdef ARCH_DISCARD_MEMBLOCK
 #define __init_memblock __init
diff --git a/kernel/crash_dump.c b/kernel/crash_dump.c
index 5f85690..ce93529 100644
--- a/kernel/crash_dump.c
+++ b/kernel/crash_dump.c
@@ -4,6 +4,10 @@
 #include <linux/errno.h>
 #include <linux/module.h>
 
+#ifndef ELF_CORE_EFLAGS
+#define ELF_CORE_EFLAGS 0
+#endif
+
 /*
  * If we have booted due to a crash, max_pfn will be a very low value. We need
  * to know the amount of memory that the previous kernel used.
@@ -32,3 +36,32 @@ static int __init setup_elfcorehdr(char *arg)
 	return end > arg ? 0 : -EINVAL;
 }
 early_param("elfcorehdr", setup_elfcorehdr);
+
+int init_elfcore_header(char *bufp)
+{
+	struct elfhdr *elf;
+
+	elf = (struct elfhdr *) bufp;
+	bufp += sizeof(struct elfhdr);
+	memcpy(elf->e_ident, ELFMAG, SELFMAG);
+	elf->e_ident[EI_CLASS]	= ELF_CLASS;
+	elf->e_ident[EI_DATA]	= ELF_DATA;
+	elf->e_ident[EI_VERSION] = EV_CURRENT;
+	elf->e_ident[EI_OSABI] = ELF_OSABI;
+	memset(elf->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
+	elf->e_type	= ET_CORE;
+	elf->e_machine	= ELF_ARCH;
+	elf->e_version	= EV_CURRENT;
+	elf->e_entry	= 0;
+	elf->e_phoff	= sizeof(struct elfhdr);
+	elf->e_shoff	= 0;
+	elf->e_flags	= ELF_CORE_EFLAGS;
+	elf->e_ehsize	= sizeof(struct elfhdr);
+	elf->e_phentsize = sizeof(struct elf_phdr);
+	elf->e_phnum	= 0;
+	elf->e_shentsize = 0;
+	elf->e_shnum	= 0;
+	elf->e_shstrndx	= 0;
+
+	return 0;
+}


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 04/10] fadump: Initialize elfcore header and add PT_LOAD program headers.
@ 2011-07-13 18:07   ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Build the crash memory range list by traversing through system memory during
the first kernel before we register for firmware-assisted dump. After the
successful dump registration, initialize the elfcore header and populate
PT_LOAD program headers with crash memory ranges. The elfcore header is
saved in the scratch area within the reserved memory. The scratch area starts
at the end of the memory reserved for saving RMR region contents. The
scratch area contains fadump crash info structure that contains magic number
for fadump validation and physical address where the eflcore header can be
found. This structure will also be used to pass some important crash info
data to the second kernel which will help second kernel to populate ELF core
header with correct data before it gets exported through /proc/vmcore. Since
the firmware preserves the entire partition memory at the time of crash the
contents of the scratch area will be preserved till second kernel boot.

NOTE: The current design implementation does not address a possibility of
introducing additional fields (in future) to this structure without affecting
compatibility. It's on TODO list to come up with better approach to
address this.

Reserved dump area start => +-------------------------------------+
                            |  CPU state dump data                |
                            +-------------------------------------+
                            |  HPTE region data                   |
                            +-------------------------------------+
                            |  RMR region data                    |
Scratch area start       => +-------------------------------------+
                            |  fadump crash info structure {      |
                            |     magic nummber                   |
                     +------|---- elfcorehdr_addr                 |
                     |      |  }                                  |
                     +----> +-------------------------------------+
                            |  ELF core header                    |
Reserved dump area end   => +-------------------------------------+

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |   37 +++++++
 arch/powerpc/kernel/fadump.c      |  209 +++++++++++++++++++++++++++++++++++++
 include/linux/crash_dump.h        |    1 
 include/linux/memblock.h          |    1 
 kernel/crash_dump.c               |   33 ++++++
 5 files changed, 280 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 5568789..4eba2d7 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -44,6 +44,9 @@
 #define FADUMP_UNREGISTER	2
 #define FADUMP_INVALIDATE	3
 
+/* Dump status flag */
+#define FADUMP_ERROR_FLAG	0x2000
+
 /* Kernel Dump section info */
 struct fadump_section {
 	u32	request_flag;
@@ -94,6 +97,7 @@ struct fw_dump {
 	unsigned long	boot_memory_size;
 	unsigned long	reserve_dump_area_start;
 	unsigned long	reserve_dump_area_size;
+	unsigned long	fadumphdr_addr;
 	int		ibm_configure_kernel_dump;
 
 	unsigned long	fadump_enabled:1;
@@ -102,6 +106,39 @@ struct fw_dump {
 	unsigned long	dump_registered:1;
 };
 
+/*
+ * Copy the ascii values for first 8 characters from a string into u64
+ * variable at their respective indexes.
+ * e.g.
+ *  The string "FADMPINF" will be converted into 0x4641444d50494e46
+ */
+static inline u64 str_to_u64(const char *str)
+{
+	u64 val = 0;
+	int i;
+
+	for (i = 0; i < sizeof(val); i++)
+		val = (*str) ? (val << 8) | *str++ : val << 8;
+	return val;
+}
+#define STR_TO_HEX(x)	str_to_u64(x)
+
+#define FADUMP_CRASH_INFO_MAGIC		STR_TO_HEX("FADMPINF")
+
+/* fadump crash info structure */
+struct fadump_crash_info_header {
+	u64		magic_number;
+	u64		elfcorehdr_addr;
+};
+
+/* Crash memory ranges */
+#define INIT_CRASHMEM_RANGES	(INIT_MEMBLOCK_REGIONS + 2)
+
+struct fad_crash_memory_ranges {
+	unsigned long long	base;
+	unsigned long long	size;
+};
+
 extern int early_init_dt_scan_fw_dump(unsigned long node,
 		const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 0130ed7..b6f4a8e 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -29,6 +29,7 @@
 #include <linux/string.h>
 #include <linux/memblock.h>
 #include <linux/delay.h>
+#include <linux/crash_dump.h>
 
 #include <asm/page.h>
 #include <asm/prom.h>
@@ -59,6 +60,9 @@ static struct fw_dump fw_dump;
 static struct fadump_mem_struct fdm;
 static const struct fadump_mem_struct *fdm_active;
 
+struct fad_crash_memory_ranges crash_memory_ranges[INIT_CRASHMEM_RANGES];
+int crash_mem_ranges;
+
 /* Scan the Firmware Assisted dump configuration details. */
 int __init early_init_dt_scan_fw_dump(unsigned long node,
 			const char *uname, int depth, void *data)
@@ -312,6 +316,10 @@ static unsigned long get_dump_area_size(void)
 	size += fw_dump.cpu_state_data_size;
 	size += fw_dump.hpte_region_size;
 	size += fw_dump.boot_memory_size;
+	size += sizeof(struct fadump_crash_info_header);
+	size += sizeof(struct elfhdr); /* ELF core header.*/
+	/* Program headers for crash memory regions. */
+	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
 
 	size = PAGE_ALIGN(size);
 	return size;
@@ -377,6 +385,11 @@ int __init fadump_reserve_mem(void)
 				"for saving crash dump\n",
 				(unsigned long)(size >> 20),
 				(unsigned long)(base >> 20));
+
+		fw_dump.fadumphdr_addr =
+				fdm_active->rmr_region.destination_address +
+				fdm_active->rmr_region.source_len;
+		DBG("fadumphdr_addr = %p\n", (void *) fw_dump.fadumphdr_addr);
 	} else {
 		/* Reserve the memory at the top of memory. */
 		size = get_dump_area_size();
@@ -448,8 +461,183 @@ static void register_fw_dump(struct fadump_mem_struct *fdm)
 	}
 }
 
+/*
+ * Validate and process the dump data stored by firmware before exporting
+ * it through '/proc/vmcore'.
+ */
+static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
+{
+	struct fadump_crash_info_header *fdh;
+
+	if (!fdm_active || !fw_dump.fadumphdr_addr)
+		return -EINVAL;
+
+	show_fadump_mem_struct(fdm_active);
+
+	/* Check if the dump data is valid. */
+	if ((fdm_active->header.dump_status_flag == FADUMP_ERROR_FLAG) ||
+			(fdm_active->rmr_region.error_flags != 0)) {
+		printk(KERN_ERR "Dump taken by platform is not valid\n");
+		return -EINVAL;
+	}
+	if (fdm_active->rmr_region.bytes_dumped !=
+			fdm_active->rmr_region.source_len) {
+		printk(KERN_ERR "Dump taken by platform is incomplete\n");
+		return -EINVAL;
+	}
+
+	/* Validate the fadump crash info header */
+	fdh = __va(fw_dump.fadumphdr_addr);
+	if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) {
+		printk(KERN_ERR "Crash info header is not valid.\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * We are done validating dump info and elfcore header is now ready
+	 * to be exported. set elfcorehdr_addr so that vmcore module will
+	 * export the elfcore header through '/proc/vmcore'.
+	 */
+	elfcorehdr_addr = fdh->elfcorehdr_addr;
+
+	return 0;
+}
+
+static inline void add_crash_memory(unsigned long long base,
+					unsigned long long end)
+{
+	if (base == end)
+		return;
+
+	DBG("crash_memory_range[%d] [%#016llx-%#016llx], %#llx bytes\n",
+		crash_mem_ranges, base, end - 1, (end - base));
+	crash_memory_ranges[crash_mem_ranges].base = base;
+	crash_memory_ranges[crash_mem_ranges].size = end - base;
+	crash_mem_ranges++;
+}
+
+static void exclude_reserved_area(unsigned long long start,
+					unsigned long long end)
+{
+	unsigned long long ra_start, ra_end;
+
+	ra_start = fw_dump.reserve_dump_area_start;
+	ra_end = ra_start + fw_dump.reserve_dump_area_size;
+
+	if ((ra_start < end) && (ra_end > start)) {
+		if ((start < ra_start) && (end > ra_end)) {
+			add_crash_memory(start, ra_start);
+			add_crash_memory(ra_end, end);
+		} else if (start < ra_start) {
+			add_crash_memory(start, ra_start);
+		} else if (ra_end < end) {
+			add_crash_memory(ra_end, end);
+		}
+	} else
+		add_crash_memory(start, end);
+}
+
+/*
+ * Traverse through memblock structure and setup crash memory ranges. These
+ * ranges will be used create PT_LOAD program headers in elfcore header.
+ */
+static void setup_crash_memory_ranges(void)
+{
+	struct memblock_region *reg;
+	unsigned long long start, end;
+
+	DBG("Setup crash memory ranges.\n");
+	crash_mem_ranges = 0;
+	/*
+	 * add the first memory chunk (RMR_START through boot_memory_size) as
+	 * a separate memory chunk. The reason is, at the time crash firmware
+	 * will move the content of this memory chunk to different location
+	 * specified during fadump registration. We need to create a separate
+	 * program header for this chunk with the correct offset.
+	 */
+	add_crash_memory(RMR_START, fw_dump.boot_memory_size);
+
+	for_each_memblock(memory, reg) {
+		start = (unsigned long long)reg->base;
+		end = start + (unsigned long long)reg->size;
+		if (start == RMR_START && end >= fw_dump.boot_memory_size)
+			start = fw_dump.boot_memory_size;
+
+		/* add this range excluding the reserved dump area. */
+		exclude_reserved_area(start, end);
+	}
+}
+
+static int create_elfcore_headers(char *bufp)
+{
+	struct elfhdr *elf;
+	struct elf_phdr *phdr;
+	int i;
+
+	init_elfcore_header(bufp);
+	elf = (struct elfhdr *)bufp;
+	bufp += sizeof(struct elfhdr);
+
+	/* setup PT_LOAD sections. */
+
+	for (i = 0; i < crash_mem_ranges; i++) {
+		unsigned long long mbase, msize;
+		mbase = crash_memory_ranges[i].base;
+		msize = crash_memory_ranges[i].size;
+
+		if (!msize)
+			continue;
+
+		phdr = (struct elf_phdr *)bufp;
+		bufp += sizeof(struct elf_phdr);
+		phdr->p_type	= PT_LOAD;
+		phdr->p_flags	= PF_R|PF_W|PF_X;
+		phdr->p_offset	= mbase;
+
+		if (mbase == RMR_START) {
+			/*
+			 * The entire RMR region will be moved by firmware
+			 * to the specified destination_address. Hence set
+			 * the correct offset.
+			 */
+			phdr->p_offset = fdm.rmr_region.destination_address;
+		}
+
+		phdr->p_paddr = mbase;
+		phdr->p_vaddr = (unsigned long)__va(mbase);
+		phdr->p_filesz = msize;
+		phdr->p_memsz = msize;
+		phdr->p_align = 0;
+
+		/* Increment number of program headers. */
+		(elf->e_phnum)++;
+	}
+	return 0;
+}
+
+static unsigned long init_fadump_header(unsigned long addr)
+{
+	struct fadump_crash_info_header *fdh;
+
+	if (!addr)
+		return 0;
+
+	fw_dump.fadumphdr_addr = addr;
+	fdh = __va(addr);
+	addr += sizeof(struct fadump_crash_info_header);
+
+	memset(fdh, 0, sizeof(struct fadump_crash_info_header));
+	fdh->magic_number = FADUMP_CRASH_INFO_MAGIC;
+	fdh->elfcorehdr_addr = addr;
+
+	return addr;
+}
+
 static void register_fadump(void)
 {
+	unsigned long addr;
+	void *vaddr;
+
 	/*
 	 * If no memory is reserved then we can not register for firmware-
 	 * assisted dump.
@@ -457,8 +645,17 @@ static void register_fadump(void)
 	if (!fw_dump.reserve_dump_area_size)
 		return;
 
+	setup_crash_memory_ranges();
+
 	/* Initialize the kernel dump memory structure for FAD registration. */
-	init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start);
+	addr = init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start);
+
+	/* Initialize fadump crash info header. */
+	addr = init_fadump_header(addr);
+	vaddr = __va(addr);
+
+	DBG("Creating ELF core headers at %#016lx\n", addr);
+	create_elfcore_headers(vaddr);
 
 	/* register the future kernel dump with firmware. */
 	register_fw_dump(&fdm);
@@ -564,6 +761,16 @@ int __init setup_fadump(void)
 	}
 
 	fadump_show_config();
+
+	/*
+	 * If dump data is available then see if it is valid and prepare for
+	 * saving it to the disk.
+	 */
+	if (fw_dump.dump_active) {
+		process_fadump(fdm_active);
+		return 1;
+	}
+
 	register_fadump();
 
 	return 1;
diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 7405407..14627d4 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -13,6 +13,7 @@ extern unsigned long long elfcorehdr_addr;
 
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
 						unsigned long, int);
+extern int init_elfcore_header(char *);
 
 /* Architecture code defines this if there are other possible ELF
  * machine types, e.g. on bi-arch capable hardware. */
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 7525e38..63ae7a0 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -152,6 +152,7 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
 	     region < (memblock.memblock_type.regions + memblock.memblock_type.cnt);	\
 	     region++)
 
+#define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
 
 #ifdef ARCH_DISCARD_MEMBLOCK
 #define __init_memblock __init
diff --git a/kernel/crash_dump.c b/kernel/crash_dump.c
index 5f85690..ce93529 100644
--- a/kernel/crash_dump.c
+++ b/kernel/crash_dump.c
@@ -4,6 +4,10 @@
 #include <linux/errno.h>
 #include <linux/module.h>
 
+#ifndef ELF_CORE_EFLAGS
+#define ELF_CORE_EFLAGS 0
+#endif
+
 /*
  * If we have booted due to a crash, max_pfn will be a very low value. We need
  * to know the amount of memory that the previous kernel used.
@@ -32,3 +36,32 @@ static int __init setup_elfcorehdr(char *arg)
 	return end > arg ? 0 : -EINVAL;
 }
 early_param("elfcorehdr", setup_elfcorehdr);
+
+int init_elfcore_header(char *bufp)
+{
+	struct elfhdr *elf;
+
+	elf = (struct elfhdr *) bufp;
+	bufp += sizeof(struct elfhdr);
+	memcpy(elf->e_ident, ELFMAG, SELFMAG);
+	elf->e_ident[EI_CLASS]	= ELF_CLASS;
+	elf->e_ident[EI_DATA]	= ELF_DATA;
+	elf->e_ident[EI_VERSION] = EV_CURRENT;
+	elf->e_ident[EI_OSABI] = ELF_OSABI;
+	memset(elf->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
+	elf->e_type	= ET_CORE;
+	elf->e_machine	= ELF_ARCH;
+	elf->e_version	= EV_CURRENT;
+	elf->e_entry	= 0;
+	elf->e_phoff	= sizeof(struct elfhdr);
+	elf->e_shoff	= 0;
+	elf->e_flags	= ELF_CORE_EFLAGS;
+	elf->e_ehsize	= sizeof(struct elfhdr);
+	elf->e_phentsize = sizeof(struct elf_phdr);
+	elf->e_phnum	= 0;
+	elf->e_shentsize = 0;
+	elf->e_shnum	= 0;
+	elf->e_shstrndx	= 0;
+
+	return 0;
+}

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 05/10] fadump: Convert firmware-assisted cpu state dump data into elf notes.
  2011-07-13 18:06 ` Mahesh J Salgaonkar
@ 2011-07-13 18:07   ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

When registered for firmware assisted dump on powerpc, firmware preserves
the registers for the active CPUs during a system crash. This patch reads
the cpu register data stored in Firmware-assisted dump format (except for
crashing cpu) and converts it into elf notes and updates the PT_NOTE program
header accordingly. The exact register state for crashing cpu is saved to
fadump crash info structure in scratch area during crash_fadump() and read
during second kernel boot.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |   43 ++++++
 arch/powerpc/kernel/fadump.c      |  277 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/traps.c       |    5 +
 kernel/panic.c                    |   16 ++
 4 files changed, 339 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 4eba2d7..1faa980 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -47,6 +47,18 @@
 /* Dump status flag */
 #define FADUMP_ERROR_FLAG	0x2000
 
+#define FADUMP_CPU_ID_MASK	((1UL << 32) - 1)
+
+#define CPU_UNKNOWN		(~((u32)0))
+
+/* Utility macros */
+#define SKIP_TO_NEXT_CPU(reg_entry)			\
+({							\
+	while (reg_entry->reg_id != REG_ID("CPUEND"))	\
+		reg_entry++;				\
+	reg_entry++;					\
+})
+
 /* Kernel Dump section info */
 struct fadump_section {
 	u32	request_flag;
@@ -98,6 +110,9 @@ struct fw_dump {
 	unsigned long	reserve_dump_area_start;
 	unsigned long	reserve_dump_area_size;
 	unsigned long	fadumphdr_addr;
+	unsigned long	cpu_notes_buf;
+	unsigned long	cpu_notes_buf_size;
+
 	int		ibm_configure_kernel_dump;
 
 	unsigned long	fadump_enabled:1;
@@ -122,13 +137,40 @@ static inline u64 str_to_u64(const char *str)
 	return val;
 }
 #define STR_TO_HEX(x)	str_to_u64(x)
+#define REG_ID(x)	str_to_u64(x)
 
 #define FADUMP_CRASH_INFO_MAGIC		STR_TO_HEX("FADMPINF")
+#define REGSAVE_AREA_MAGIC		STR_TO_HEX("REGSAVE")
+
+/* The firmware-assisted dump format.
+ *
+ * The register save area is an area in the partition's memory used to preserve
+ * the register contents (CPU state data) for the active CPUs during a firmware
+ * assisted dump. The dump format contains register save area header followed
+ * by register entries. Each list of registers for a CPU starts with
+ * "CPUSTRT" and ends with "CPUEND".
+ */
+
+/* Register save area header. */
+struct fadump_reg_save_area_header {
+	u64		magic_number;
+	u32		version;
+	u32		num_cpu_offset;
+};
+
+/* Register entry. */
+struct fadump_reg_entry {
+	u64		reg_id;
+	u64		reg_value;
+};
 
 /* fadump crash info structure */
 struct fadump_crash_info_header {
 	u64		magic_number;
 	u64		elfcorehdr_addr;
+	u32		crashing_cpu;
+	struct pt_regs	regs;
+	struct cpumask	cpu_online_mask;
 };
 
 /* Crash memory ranges */
@@ -144,5 +186,6 @@ extern int early_init_dt_scan_fw_dump(unsigned long node,
 extern int fadump_reserve_mem(void);
 extern int setup_fadump(void);
 extern int is_fadump_active(void);
+extern void crash_fadump(struct pt_regs *, const char *);
 #endif
 #endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b6f4a8e..51e645c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -318,6 +318,7 @@ static unsigned long get_dump_area_size(void)
 	size += fw_dump.boot_memory_size;
 	size += sizeof(struct fadump_crash_info_header);
 	size += sizeof(struct elfhdr); /* ELF core header.*/
+	size += sizeof(struct elf_phdr); /* place holder for cpu notes */
 	/* Program headers for crash memory regions. */
 	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
 
@@ -461,6 +462,247 @@ static void register_fw_dump(struct fadump_mem_struct *fdm)
 	}
 }
 
+void crash_fadump(struct pt_regs *regs, const char *str)
+{
+	struct fadump_crash_info_header *fdh = NULL;
+
+	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
+		return;
+
+	fdh = __va(fw_dump.fadumphdr_addr);
+	crashing_cpu = smp_processor_id();
+	fdh->crashing_cpu = crashing_cpu;
+	crash_save_vmcoreinfo();
+
+	if (regs)
+		fdh->regs = *regs;
+	else
+		ppc_save_regs(&fdh->regs);
+
+	fdh->cpu_online_mask = *cpu_online_mask;
+
+	/* Call ibm,os-term rtas call to trigger firmware assisted dump */
+	rtas_os_term((char *)str);
+}
+
+#define GPR_MASK	0xffffff0000000000
+static inline int gpr_index(u64 id)
+{
+	int i = -1;
+	char str[3];
+
+	if ((id & GPR_MASK) == REG_ID("GPR")) {
+		/* get the digits at the end */
+		id &= ~GPR_MASK;
+		id >>= 24;
+		str[2] = '\0';
+		str[1] = id & 0xff;
+		str[0] = (id >> 8) & 0xff;
+		sscanf(str, "%d", &i);
+		if (i > 31)
+			i = -1;
+	}
+	return i;
+}
+
+static inline void set_regval(struct pt_regs *regs, u64 reg_id, u64 reg_val)
+{
+	int i;
+
+	i = gpr_index(reg_id);
+	if (i >= 0)
+		regs->gpr[i] = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("NIA"))
+		regs->nip = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("MSR"))
+		regs->msr = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("CTR"))
+		regs->ctr = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("LR"))
+		regs->link = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("XER"))
+		regs->xer = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("CR"))
+		regs->ccr = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("DAR"))
+		regs->dar = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("DSISR"))
+		regs->dsisr = (unsigned long)reg_val;
+}
+
+static struct fadump_reg_entry*
+read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs)
+{
+	memset(regs, 0, sizeof(struct pt_regs));
+
+	while (reg_entry->reg_id != REG_ID("CPUEND")) {
+		set_regval(regs, reg_entry->reg_id, reg_entry->reg_value);
+		reg_entry++;
+	}
+	reg_entry++;
+	return reg_entry;
+}
+
+static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
+			    size_t data_len)
+{
+	struct elf_note note;
+
+	note.n_namesz = strlen(name) + 1;
+	note.n_descsz = data_len;
+	note.n_type   = type;
+	memcpy(buf, &note, sizeof(note));
+	buf += (sizeof(note) + 3)/4;
+	memcpy(buf, name, note.n_namesz);
+	buf += (note.n_namesz + 3)/4;
+	memcpy(buf, data, note.n_descsz);
+	buf += (note.n_descsz + 3)/4;
+
+	return buf;
+}
+
+static void final_note(u32 *buf)
+{
+	struct elf_note note;
+
+	note.n_namesz = 0;
+	note.n_descsz = 0;
+	note.n_type   = 0;
+	memcpy(buf, &note, sizeof(note));
+}
+
+static u32 *regs_to_elf_notes(u32 *buf, struct pt_regs *regs)
+{
+	struct elf_prstatus prstatus;
+
+	memset(&prstatus, 0, sizeof(prstatus));
+	/*
+	 * FIXME: How do i get PID? Do I really need it?
+	 * prstatus.pr_pid = ????
+	 */
+	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
+	buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS,
+				&prstatus, sizeof(prstatus));
+	return buf;
+}
+
+static void update_elfcore_header(char *bufp)
+{
+	struct elfhdr *elf;
+	struct elf_phdr *phdr;
+
+	elf = (struct elfhdr *)bufp;
+	bufp += sizeof(struct elfhdr);
+
+	/* First note is a place holder for cpu notes info. */
+	phdr = (struct elf_phdr *)bufp;
+
+	if (phdr->p_type == PT_NOTE) {
+		phdr->p_paddr = fw_dump.cpu_notes_buf;
+		phdr->p_offset	= phdr->p_paddr;
+		phdr->p_filesz	= fw_dump.cpu_notes_buf_size;
+		phdr->p_memsz = fw_dump.cpu_notes_buf_size;
+	}
+	return;
+}
+
+/*
+ * Read CPU state dump data and convert it into ELF notes.
+ * The CPU dump starts with magic number "REGSAVE". NumCpusOffset should be
+ * used to access the data to allow for additional fields to be added without
+ * affecting compatibility. Each list of registers for a CPU starts with
+ * "CPUSTRT" and ends with "CPUEND". Each register entry is of 16 bytes,
+ * 8 Byte ASCII identifier and 8 Byte register value. The register entry
+ * with identifier "CPUSTRT" and "CPUEND" contains 4 byte cpu id as part
+ * of register value. For more details refer to PAPR document.
+ *
+ * Only for the crashing cpu we ignore the CPU dump data and get exact
+ * state from fadump crash info structure populated by first kernel at the
+ * time of crash.
+ */
+static int __init build_cpu_notes(const struct fadump_mem_struct *fdm)
+{
+	struct fadump_reg_save_area_header *reg_header;
+	struct fadump_reg_entry *reg_entry;
+	struct fadump_crash_info_header *fdh = NULL;
+	void *vaddr;
+	unsigned long addr;
+	u32 num_cpus, *note_buf;
+	struct pt_regs regs;
+	int i, rc = 0, cpu = 0;
+
+	if (!fdm->cpu_state_data.bytes_dumped)
+		return -EINVAL;
+
+	addr = fdm->cpu_state_data.destination_address;
+	vaddr = __va(addr);
+
+	reg_header = vaddr;
+	if (reg_header->magic_number != REGSAVE_AREA_MAGIC) {
+		printk(KERN_ERR "Unable to read register save area.\n");
+		return -ENOENT;
+	}
+	DBG("--------CPU State Data------------\n");
+	DBG("Magic Number: %llx\n", reg_header->magic_number);
+	DBG("NumCpuOffset: %x\n", reg_header->num_cpu_offset);
+
+	vaddr += reg_header->num_cpu_offset;
+	num_cpus = *((u32 *)(vaddr));
+	DBG("NumCpus     : %u\n", num_cpus);
+	vaddr += sizeof(u32);
+	reg_entry = (struct fadump_reg_entry *)vaddr;
+
+	/* Allocate buffer to hold cpu crash notes. */
+	fw_dump.cpu_notes_buf_size = num_cpus * sizeof(note_buf_t);
+	fw_dump.cpu_notes_buf_size = PAGE_ALIGN(fw_dump.cpu_notes_buf_size);
+	addr = memblock_alloc(fw_dump.cpu_notes_buf_size, PAGE_SIZE);
+	fw_dump.cpu_notes_buf = addr;
+
+	note_buf = (u32 *)__va(addr);
+	DBG("Allocated buffer for cpu notes of size %ld at %p\n",
+			(num_cpus * sizeof(note_buf_t)), note_buf);
+
+	if (fw_dump.fadumphdr_addr)
+		fdh = __va(fw_dump.fadumphdr_addr);
+
+	for (i = 0; i < num_cpus; i++) {
+		if (reg_entry->reg_id != REG_ID("CPUSTRT")) {
+			printk(KERN_ERR "Unable to read CPU state data\n");
+			rc = -ENOENT;
+			goto error_out;
+		}
+		/* Lower 4 bytes of reg_value contains logical cpu id */
+		cpu = reg_entry->reg_value & FADUMP_CPU_ID_MASK;
+		if (!cpumask_test_cpu(cpu, &fdh->cpu_online_mask)) {
+			SKIP_TO_NEXT_CPU(reg_entry);
+			continue;
+		}
+		DBG("Reading register data for cpu %d...\n", cpu);
+		if (fdh && fdh->crashing_cpu == cpu) {
+			regs = fdh->regs;
+			note_buf = regs_to_elf_notes(note_buf, &regs);
+			SKIP_TO_NEXT_CPU(reg_entry);
+		} else {
+			reg_entry++;
+			reg_entry = read_registers(reg_entry, &regs);
+			note_buf = regs_to_elf_notes(note_buf, &regs);
+		}
+	}
+	final_note(note_buf);
+
+	DBG("Updating elfcore header (%llx) with cpu notes\n",
+							fdh->elfcorehdr_addr);
+	update_elfcore_header((char *)__va(fdh->elfcorehdr_addr));
+	return 0;
+
+error_out:
+	memblock_free(fw_dump.cpu_notes_buf, fw_dump.cpu_notes_buf_size);
+	fw_dump.cpu_notes_buf = 0;
+	fw_dump.cpu_notes_buf_size = 0;
+	return rc;
+
+}
+
 /*
  * Validate and process the dump data stored by firmware before exporting
  * it through '/proc/vmcore'.
@@ -468,6 +710,7 @@ static void register_fw_dump(struct fadump_mem_struct *fdm)
 static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
 {
 	struct fadump_crash_info_header *fdh;
+	int rc = 0;
 
 	if (!fdm_active || !fw_dump.fadumphdr_addr)
 		return -EINVAL;
@@ -476,12 +719,14 @@ static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
 
 	/* Check if the dump data is valid. */
 	if ((fdm_active->header.dump_status_flag == FADUMP_ERROR_FLAG) ||
+			(fdm_active->cpu_state_data.error_flags != 0) ||
 			(fdm_active->rmr_region.error_flags != 0)) {
 		printk(KERN_ERR "Dump taken by platform is not valid\n");
 		return -EINVAL;
 	}
-	if (fdm_active->rmr_region.bytes_dumped !=
-			fdm_active->rmr_region.source_len) {
+	if ((fdm_active->rmr_region.bytes_dumped !=
+			fdm_active->rmr_region.source_len) ||
+			!fdm_active->cpu_state_data.bytes_dumped) {
 		printk(KERN_ERR "Dump taken by platform is incomplete\n");
 		return -EINVAL;
 	}
@@ -493,6 +738,10 @@ static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
 		return -EINVAL;
 	}
 
+	rc = build_cpu_notes(fdm_active);
+	if (rc)
+		return rc;
+
 	/*
 	 * We are done validating dump info and elfcore header is now ready
 	 * to be exported. set elfcorehdr_addr so that vmcore module will
@@ -578,6 +827,28 @@ static int create_elfcore_headers(char *bufp)
 	elf = (struct elfhdr *)bufp;
 	bufp += sizeof(struct elfhdr);
 
+	/*
+	 * setup ELF PT_NOTE, place holder for cpu notes info. The notes info
+	 * will be populated during second kernel boot after crash. Hence
+	 * this PT_NOTE will always be the first elf note.
+	 *
+	 * NOTE: Any new ELF note addition should be placed after this note.
+	 */
+	phdr = (struct elf_phdr *)bufp;
+	bufp += sizeof(struct elf_phdr);
+	phdr->p_type	= PT_NOTE;
+	phdr->p_flags	= 0;
+	phdr->p_vaddr	= 0;
+	phdr->p_align	= 0;
+
+	phdr->p_offset	= 0;
+	phdr->p_paddr	= 0;
+	phdr->p_filesz	= 0;
+	phdr->p_memsz	= 0;
+
+	/* Increment number of program headers. */
+	(elf->e_phnum)++;
+
 	/* setup PT_LOAD sections. */
 
 	for (i = 0; i < crash_mem_ranges; i++) {
@@ -629,6 +900,8 @@ static unsigned long init_fadump_header(unsigned long addr)
 	memset(fdh, 0, sizeof(struct fadump_crash_info_header));
 	fdh->magic_number = FADUMP_CRASH_INFO_MAGIC;
 	fdh->elfcorehdr_addr = addr;
+	/* We will set the crashing cpu id in crash_fadump() during crash. */
+	fdh->crashing_cpu = CPU_UNKNOWN;
 
 	return addr;
 }
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1a01414..e9ad3c5 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -57,6 +57,7 @@
 #include <asm/kexec.h>
 #include <asm/ppc-opcode.h>
 #include <asm/rio.h>
+#include <asm/fadump.h>
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -160,6 +161,10 @@ int die(const char *str, struct pt_regs *regs, long err)
 	add_taint(TAINT_DIE);
 	raw_spin_unlock_irqrestore(&die.lock, flags);
 
+#ifdef CONFIG_FA_DUMP
+	crash_fadump(regs, str);
+#endif
+
 	if (kexec_should_crash(current) ||
 		kexec_sr_activated(smp_processor_id()))
 		crash_kexec(regs);
diff --git a/kernel/panic.c b/kernel/panic.c
index 6923167..1965b50 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -49,6 +49,15 @@ static long no_blink(int state)
 long (*panic_blink)(int state);
 EXPORT_SYMBOL(panic_blink);
 
+#ifdef CONFIG_FA_DUMP
+/*
+ * provide an empty default implementation here -- architecture
+ * code may override this
+ */
+void __attribute__ ((weak)) crash_fadump(struct pt_regs *regs, const char *str)
+{}
+#endif
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -81,6 +90,13 @@ NORET_TYPE void panic(const char * fmt, ...)
 	dump_stack();
 #endif
 
+#ifdef CONFIG_FA_DUMP
+	/*
+	 * If firmware-assisted dump has been registered then trigger
+	 * firmware-assisted dump and let firmware handle everything else.
+	 */
+	crash_fadump(NULL, buf);
+#endif
 	/*
 	 * If we have crashed and we have a crash kernel loaded let it handle
 	 * everything else.


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 05/10] fadump: Convert firmware-assisted cpu state dump data into elf notes.
@ 2011-07-13 18:07   ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

When registered for firmware assisted dump on powerpc, firmware preserves
the registers for the active CPUs during a system crash. This patch reads
the cpu register data stored in Firmware-assisted dump format (except for
crashing cpu) and converts it into elf notes and updates the PT_NOTE program
header accordingly. The exact register state for crashing cpu is saved to
fadump crash info structure in scratch area during crash_fadump() and read
during second kernel boot.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |   43 ++++++
 arch/powerpc/kernel/fadump.c      |  277 +++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/traps.c       |    5 +
 kernel/panic.c                    |   16 ++
 4 files changed, 339 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 4eba2d7..1faa980 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -47,6 +47,18 @@
 /* Dump status flag */
 #define FADUMP_ERROR_FLAG	0x2000
 
+#define FADUMP_CPU_ID_MASK	((1UL << 32) - 1)
+
+#define CPU_UNKNOWN		(~((u32)0))
+
+/* Utility macros */
+#define SKIP_TO_NEXT_CPU(reg_entry)			\
+({							\
+	while (reg_entry->reg_id != REG_ID("CPUEND"))	\
+		reg_entry++;				\
+	reg_entry++;					\
+})
+
 /* Kernel Dump section info */
 struct fadump_section {
 	u32	request_flag;
@@ -98,6 +110,9 @@ struct fw_dump {
 	unsigned long	reserve_dump_area_start;
 	unsigned long	reserve_dump_area_size;
 	unsigned long	fadumphdr_addr;
+	unsigned long	cpu_notes_buf;
+	unsigned long	cpu_notes_buf_size;
+
 	int		ibm_configure_kernel_dump;
 
 	unsigned long	fadump_enabled:1;
@@ -122,13 +137,40 @@ static inline u64 str_to_u64(const char *str)
 	return val;
 }
 #define STR_TO_HEX(x)	str_to_u64(x)
+#define REG_ID(x)	str_to_u64(x)
 
 #define FADUMP_CRASH_INFO_MAGIC		STR_TO_HEX("FADMPINF")
+#define REGSAVE_AREA_MAGIC		STR_TO_HEX("REGSAVE")
+
+/* The firmware-assisted dump format.
+ *
+ * The register save area is an area in the partition's memory used to preserve
+ * the register contents (CPU state data) for the active CPUs during a firmware
+ * assisted dump. The dump format contains register save area header followed
+ * by register entries. Each list of registers for a CPU starts with
+ * "CPUSTRT" and ends with "CPUEND".
+ */
+
+/* Register save area header. */
+struct fadump_reg_save_area_header {
+	u64		magic_number;
+	u32		version;
+	u32		num_cpu_offset;
+};
+
+/* Register entry. */
+struct fadump_reg_entry {
+	u64		reg_id;
+	u64		reg_value;
+};
 
 /* fadump crash info structure */
 struct fadump_crash_info_header {
 	u64		magic_number;
 	u64		elfcorehdr_addr;
+	u32		crashing_cpu;
+	struct pt_regs	regs;
+	struct cpumask	cpu_online_mask;
 };
 
 /* Crash memory ranges */
@@ -144,5 +186,6 @@ extern int early_init_dt_scan_fw_dump(unsigned long node,
 extern int fadump_reserve_mem(void);
 extern int setup_fadump(void);
 extern int is_fadump_active(void);
+extern void crash_fadump(struct pt_regs *, const char *);
 #endif
 #endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b6f4a8e..51e645c 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -318,6 +318,7 @@ static unsigned long get_dump_area_size(void)
 	size += fw_dump.boot_memory_size;
 	size += sizeof(struct fadump_crash_info_header);
 	size += sizeof(struct elfhdr); /* ELF core header.*/
+	size += sizeof(struct elf_phdr); /* place holder for cpu notes */
 	/* Program headers for crash memory regions. */
 	size += sizeof(struct elf_phdr) * (memblock_num_regions(memory) + 2);
 
@@ -461,6 +462,247 @@ static void register_fw_dump(struct fadump_mem_struct *fdm)
 	}
 }
 
+void crash_fadump(struct pt_regs *regs, const char *str)
+{
+	struct fadump_crash_info_header *fdh = NULL;
+
+	if (!fw_dump.dump_registered || !fw_dump.fadumphdr_addr)
+		return;
+
+	fdh = __va(fw_dump.fadumphdr_addr);
+	crashing_cpu = smp_processor_id();
+	fdh->crashing_cpu = crashing_cpu;
+	crash_save_vmcoreinfo();
+
+	if (regs)
+		fdh->regs = *regs;
+	else
+		ppc_save_regs(&fdh->regs);
+
+	fdh->cpu_online_mask = *cpu_online_mask;
+
+	/* Call ibm,os-term rtas call to trigger firmware assisted dump */
+	rtas_os_term((char *)str);
+}
+
+#define GPR_MASK	0xffffff0000000000
+static inline int gpr_index(u64 id)
+{
+	int i = -1;
+	char str[3];
+
+	if ((id & GPR_MASK) == REG_ID("GPR")) {
+		/* get the digits at the end */
+		id &= ~GPR_MASK;
+		id >>= 24;
+		str[2] = '\0';
+		str[1] = id & 0xff;
+		str[0] = (id >> 8) & 0xff;
+		sscanf(str, "%d", &i);
+		if (i > 31)
+			i = -1;
+	}
+	return i;
+}
+
+static inline void set_regval(struct pt_regs *regs, u64 reg_id, u64 reg_val)
+{
+	int i;
+
+	i = gpr_index(reg_id);
+	if (i >= 0)
+		regs->gpr[i] = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("NIA"))
+		regs->nip = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("MSR"))
+		regs->msr = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("CTR"))
+		regs->ctr = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("LR"))
+		regs->link = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("XER"))
+		regs->xer = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("CR"))
+		regs->ccr = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("DAR"))
+		regs->dar = (unsigned long)reg_val;
+	else if (reg_id == REG_ID("DSISR"))
+		regs->dsisr = (unsigned long)reg_val;
+}
+
+static struct fadump_reg_entry*
+read_registers(struct fadump_reg_entry *reg_entry, struct pt_regs *regs)
+{
+	memset(regs, 0, sizeof(struct pt_regs));
+
+	while (reg_entry->reg_id != REG_ID("CPUEND")) {
+		set_regval(regs, reg_entry->reg_id, reg_entry->reg_value);
+		reg_entry++;
+	}
+	reg_entry++;
+	return reg_entry;
+}
+
+static u32 *append_elf_note(u32 *buf, char *name, unsigned type, void *data,
+			    size_t data_len)
+{
+	struct elf_note note;
+
+	note.n_namesz = strlen(name) + 1;
+	note.n_descsz = data_len;
+	note.n_type   = type;
+	memcpy(buf, &note, sizeof(note));
+	buf += (sizeof(note) + 3)/4;
+	memcpy(buf, name, note.n_namesz);
+	buf += (note.n_namesz + 3)/4;
+	memcpy(buf, data, note.n_descsz);
+	buf += (note.n_descsz + 3)/4;
+
+	return buf;
+}
+
+static void final_note(u32 *buf)
+{
+	struct elf_note note;
+
+	note.n_namesz = 0;
+	note.n_descsz = 0;
+	note.n_type   = 0;
+	memcpy(buf, &note, sizeof(note));
+}
+
+static u32 *regs_to_elf_notes(u32 *buf, struct pt_regs *regs)
+{
+	struct elf_prstatus prstatus;
+
+	memset(&prstatus, 0, sizeof(prstatus));
+	/*
+	 * FIXME: How do i get PID? Do I really need it?
+	 * prstatus.pr_pid = ????
+	 */
+	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
+	buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_PRSTATUS,
+				&prstatus, sizeof(prstatus));
+	return buf;
+}
+
+static void update_elfcore_header(char *bufp)
+{
+	struct elfhdr *elf;
+	struct elf_phdr *phdr;
+
+	elf = (struct elfhdr *)bufp;
+	bufp += sizeof(struct elfhdr);
+
+	/* First note is a place holder for cpu notes info. */
+	phdr = (struct elf_phdr *)bufp;
+
+	if (phdr->p_type == PT_NOTE) {
+		phdr->p_paddr = fw_dump.cpu_notes_buf;
+		phdr->p_offset	= phdr->p_paddr;
+		phdr->p_filesz	= fw_dump.cpu_notes_buf_size;
+		phdr->p_memsz = fw_dump.cpu_notes_buf_size;
+	}
+	return;
+}
+
+/*
+ * Read CPU state dump data and convert it into ELF notes.
+ * The CPU dump starts with magic number "REGSAVE". NumCpusOffset should be
+ * used to access the data to allow for additional fields to be added without
+ * affecting compatibility. Each list of registers for a CPU starts with
+ * "CPUSTRT" and ends with "CPUEND". Each register entry is of 16 bytes,
+ * 8 Byte ASCII identifier and 8 Byte register value. The register entry
+ * with identifier "CPUSTRT" and "CPUEND" contains 4 byte cpu id as part
+ * of register value. For more details refer to PAPR document.
+ *
+ * Only for the crashing cpu we ignore the CPU dump data and get exact
+ * state from fadump crash info structure populated by first kernel at the
+ * time of crash.
+ */
+static int __init build_cpu_notes(const struct fadump_mem_struct *fdm)
+{
+	struct fadump_reg_save_area_header *reg_header;
+	struct fadump_reg_entry *reg_entry;
+	struct fadump_crash_info_header *fdh = NULL;
+	void *vaddr;
+	unsigned long addr;
+	u32 num_cpus, *note_buf;
+	struct pt_regs regs;
+	int i, rc = 0, cpu = 0;
+
+	if (!fdm->cpu_state_data.bytes_dumped)
+		return -EINVAL;
+
+	addr = fdm->cpu_state_data.destination_address;
+	vaddr = __va(addr);
+
+	reg_header = vaddr;
+	if (reg_header->magic_number != REGSAVE_AREA_MAGIC) {
+		printk(KERN_ERR "Unable to read register save area.\n");
+		return -ENOENT;
+	}
+	DBG("--------CPU State Data------------\n");
+	DBG("Magic Number: %llx\n", reg_header->magic_number);
+	DBG("NumCpuOffset: %x\n", reg_header->num_cpu_offset);
+
+	vaddr += reg_header->num_cpu_offset;
+	num_cpus = *((u32 *)(vaddr));
+	DBG("NumCpus     : %u\n", num_cpus);
+	vaddr += sizeof(u32);
+	reg_entry = (struct fadump_reg_entry *)vaddr;
+
+	/* Allocate buffer to hold cpu crash notes. */
+	fw_dump.cpu_notes_buf_size = num_cpus * sizeof(note_buf_t);
+	fw_dump.cpu_notes_buf_size = PAGE_ALIGN(fw_dump.cpu_notes_buf_size);
+	addr = memblock_alloc(fw_dump.cpu_notes_buf_size, PAGE_SIZE);
+	fw_dump.cpu_notes_buf = addr;
+
+	note_buf = (u32 *)__va(addr);
+	DBG("Allocated buffer for cpu notes of size %ld at %p\n",
+			(num_cpus * sizeof(note_buf_t)), note_buf);
+
+	if (fw_dump.fadumphdr_addr)
+		fdh = __va(fw_dump.fadumphdr_addr);
+
+	for (i = 0; i < num_cpus; i++) {
+		if (reg_entry->reg_id != REG_ID("CPUSTRT")) {
+			printk(KERN_ERR "Unable to read CPU state data\n");
+			rc = -ENOENT;
+			goto error_out;
+		}
+		/* Lower 4 bytes of reg_value contains logical cpu id */
+		cpu = reg_entry->reg_value & FADUMP_CPU_ID_MASK;
+		if (!cpumask_test_cpu(cpu, &fdh->cpu_online_mask)) {
+			SKIP_TO_NEXT_CPU(reg_entry);
+			continue;
+		}
+		DBG("Reading register data for cpu %d...\n", cpu);
+		if (fdh && fdh->crashing_cpu == cpu) {
+			regs = fdh->regs;
+			note_buf = regs_to_elf_notes(note_buf, &regs);
+			SKIP_TO_NEXT_CPU(reg_entry);
+		} else {
+			reg_entry++;
+			reg_entry = read_registers(reg_entry, &regs);
+			note_buf = regs_to_elf_notes(note_buf, &regs);
+		}
+	}
+	final_note(note_buf);
+
+	DBG("Updating elfcore header (%llx) with cpu notes\n",
+							fdh->elfcorehdr_addr);
+	update_elfcore_header((char *)__va(fdh->elfcorehdr_addr));
+	return 0;
+
+error_out:
+	memblock_free(fw_dump.cpu_notes_buf, fw_dump.cpu_notes_buf_size);
+	fw_dump.cpu_notes_buf = 0;
+	fw_dump.cpu_notes_buf_size = 0;
+	return rc;
+
+}
+
 /*
  * Validate and process the dump data stored by firmware before exporting
  * it through '/proc/vmcore'.
@@ -468,6 +710,7 @@ static void register_fw_dump(struct fadump_mem_struct *fdm)
 static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
 {
 	struct fadump_crash_info_header *fdh;
+	int rc = 0;
 
 	if (!fdm_active || !fw_dump.fadumphdr_addr)
 		return -EINVAL;
@@ -476,12 +719,14 @@ static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
 
 	/* Check if the dump data is valid. */
 	if ((fdm_active->header.dump_status_flag == FADUMP_ERROR_FLAG) ||
+			(fdm_active->cpu_state_data.error_flags != 0) ||
 			(fdm_active->rmr_region.error_flags != 0)) {
 		printk(KERN_ERR "Dump taken by platform is not valid\n");
 		return -EINVAL;
 	}
-	if (fdm_active->rmr_region.bytes_dumped !=
-			fdm_active->rmr_region.source_len) {
+	if ((fdm_active->rmr_region.bytes_dumped !=
+			fdm_active->rmr_region.source_len) ||
+			!fdm_active->cpu_state_data.bytes_dumped) {
 		printk(KERN_ERR "Dump taken by platform is incomplete\n");
 		return -EINVAL;
 	}
@@ -493,6 +738,10 @@ static int __init process_fadump(const struct fadump_mem_struct *fdm_active)
 		return -EINVAL;
 	}
 
+	rc = build_cpu_notes(fdm_active);
+	if (rc)
+		return rc;
+
 	/*
 	 * We are done validating dump info and elfcore header is now ready
 	 * to be exported. set elfcorehdr_addr so that vmcore module will
@@ -578,6 +827,28 @@ static int create_elfcore_headers(char *bufp)
 	elf = (struct elfhdr *)bufp;
 	bufp += sizeof(struct elfhdr);
 
+	/*
+	 * setup ELF PT_NOTE, place holder for cpu notes info. The notes info
+	 * will be populated during second kernel boot after crash. Hence
+	 * this PT_NOTE will always be the first elf note.
+	 *
+	 * NOTE: Any new ELF note addition should be placed after this note.
+	 */
+	phdr = (struct elf_phdr *)bufp;
+	bufp += sizeof(struct elf_phdr);
+	phdr->p_type	= PT_NOTE;
+	phdr->p_flags	= 0;
+	phdr->p_vaddr	= 0;
+	phdr->p_align	= 0;
+
+	phdr->p_offset	= 0;
+	phdr->p_paddr	= 0;
+	phdr->p_filesz	= 0;
+	phdr->p_memsz	= 0;
+
+	/* Increment number of program headers. */
+	(elf->e_phnum)++;
+
 	/* setup PT_LOAD sections. */
 
 	for (i = 0; i < crash_mem_ranges; i++) {
@@ -629,6 +900,8 @@ static unsigned long init_fadump_header(unsigned long addr)
 	memset(fdh, 0, sizeof(struct fadump_crash_info_header));
 	fdh->magic_number = FADUMP_CRASH_INFO_MAGIC;
 	fdh->elfcorehdr_addr = addr;
+	/* We will set the crashing cpu id in crash_fadump() during crash. */
+	fdh->crashing_cpu = CPU_UNKNOWN;
 
 	return addr;
 }
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1a01414..e9ad3c5 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -57,6 +57,7 @@
 #include <asm/kexec.h>
 #include <asm/ppc-opcode.h>
 #include <asm/rio.h>
+#include <asm/fadump.h>
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -160,6 +161,10 @@ int die(const char *str, struct pt_regs *regs, long err)
 	add_taint(TAINT_DIE);
 	raw_spin_unlock_irqrestore(&die.lock, flags);
 
+#ifdef CONFIG_FA_DUMP
+	crash_fadump(regs, str);
+#endif
+
 	if (kexec_should_crash(current) ||
 		kexec_sr_activated(smp_processor_id()))
 		crash_kexec(regs);
diff --git a/kernel/panic.c b/kernel/panic.c
index 6923167..1965b50 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -49,6 +49,15 @@ static long no_blink(int state)
 long (*panic_blink)(int state);
 EXPORT_SYMBOL(panic_blink);
 
+#ifdef CONFIG_FA_DUMP
+/*
+ * provide an empty default implementation here -- architecture
+ * code may override this
+ */
+void __attribute__ ((weak)) crash_fadump(struct pt_regs *regs, const char *str)
+{}
+#endif
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -81,6 +90,13 @@ NORET_TYPE void panic(const char * fmt, ...)
 	dump_stack();
 #endif
 
+#ifdef CONFIG_FA_DUMP
+	/*
+	 * If firmware-assisted dump has been registered then trigger
+	 * firmware-assisted dump and let firmware handle everything else.
+	 */
+	crash_fadump(NULL, buf);
+#endif
 	/*
 	 * If we have crashed and we have a crash kernel loaded let it handle
 	 * everything else.

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 06/10] fadump: Add PT_NOTE program header for vmcoreinfo
  2011-07-13 18:06 ` Mahesh J Salgaonkar
@ 2011-07-13 18:07   ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Introduce a PT_NOTE program header that points to physical address of
vmcoreinfo_note buffer declared in kernel/kexec.c. The vmcoreinfo
note buffer is populated during crash_fadump() at the time of system
crash.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 51e645c..fee132b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -817,6 +817,19 @@ static void setup_crash_memory_ranges(void)
 	}
 }
 
+/*
+ * If the given physical address falls within the boot memory region then
+ * return the relocated address that points to the dump region reserved
+ * for saving initial boot memory contents.
+ */
+static inline unsigned long relocate(unsigned long paddr)
+{
+	if (paddr > RMR_START && paddr < fw_dump.boot_memory_size)
+		return fdm.rmr_region.destination_address + paddr;
+	else
+		return paddr;
+}
+
 static int create_elfcore_headers(char *bufp)
 {
 	struct elfhdr *elf;
@@ -849,6 +862,22 @@ static int create_elfcore_headers(char *bufp)
 	/* Increment number of program headers. */
 	(elf->e_phnum)++;
 
+	/* setup ELF PT_NOTE for vmcoreinfo */
+	phdr = (struct elf_phdr *)bufp;
+	bufp += sizeof(struct elf_phdr);
+	phdr->p_type	= PT_NOTE;
+	phdr->p_flags	= 0;
+	phdr->p_vaddr	= 0;
+	phdr->p_align	= 0;
+
+	phdr->p_paddr	= relocate(paddr_vmcoreinfo_note());
+	phdr->p_offset	= phdr->p_paddr;
+	phdr->p_memsz	= vmcoreinfo_max_size;
+	phdr->p_filesz	= vmcoreinfo_max_size;
+
+	/* Increment number of program headers. */
+	(elf->e_phnum)++;
+
 	/* setup PT_LOAD sections. */
 
 	for (i = 0; i < crash_mem_ranges; i++) {


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 06/10] fadump: Add PT_NOTE program header for vmcoreinfo
@ 2011-07-13 18:07   ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Introduce a PT_NOTE program header that points to physical address of
vmcoreinfo_note buffer declared in kernel/kexec.c. The vmcoreinfo
note buffer is populated during crash_fadump() at the time of system
crash.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/fadump.c |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 51e645c..fee132b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -817,6 +817,19 @@ static void setup_crash_memory_ranges(void)
 	}
 }
 
+/*
+ * If the given physical address falls within the boot memory region then
+ * return the relocated address that points to the dump region reserved
+ * for saving initial boot memory contents.
+ */
+static inline unsigned long relocate(unsigned long paddr)
+{
+	if (paddr > RMR_START && paddr < fw_dump.boot_memory_size)
+		return fdm.rmr_region.destination_address + paddr;
+	else
+		return paddr;
+}
+
 static int create_elfcore_headers(char *bufp)
 {
 	struct elfhdr *elf;
@@ -849,6 +862,22 @@ static int create_elfcore_headers(char *bufp)
 	/* Increment number of program headers. */
 	(elf->e_phnum)++;
 
+	/* setup ELF PT_NOTE for vmcoreinfo */
+	phdr = (struct elf_phdr *)bufp;
+	bufp += sizeof(struct elf_phdr);
+	phdr->p_type	= PT_NOTE;
+	phdr->p_flags	= 0;
+	phdr->p_vaddr	= 0;
+	phdr->p_align	= 0;
+
+	phdr->p_paddr	= relocate(paddr_vmcoreinfo_note());
+	phdr->p_offset	= phdr->p_paddr;
+	phdr->p_memsz	= vmcoreinfo_max_size;
+	phdr->p_filesz	= vmcoreinfo_max_size;
+
+	/* Increment number of program headers. */
+	(elf->e_phnum)++;
+
 	/* setup PT_LOAD sections. */
 
 	for (i = 0; i < crash_mem_ranges; i++) {

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 07/10] fadump: Introduce cleanup routine to invalidate /proc/vmcore.
  2011-07-13 18:06 ` Mahesh J Salgaonkar
@ 2011-07-13 18:08   ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

With the firmware-assisted dump support we don't require a reboot when we
are in second kernel after crash. The second kernel after crash is a normal
kernel boot and has knowledge about entire system RAM with the page tables
initialized for entire system RAM. Hence once the dump is saved to disk, we
can just release the reserved memory area for general use and continue
with second kernel as production kernel.

Hence when we release the reserved memory that contains dump data, the
'/proc/vmcore' will not be valid anymore. Hence this patch introduces
a cleanup routine that invalidates and removes the /proc/vmcore file. This
routine will be invoked before we release the reserved dump memory area.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 fs/proc/vmcore.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cd99bf5..1aa3d7b 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -699,3 +699,23 @@ static int __init vmcore_init(void)
 	return 0;
 }
 module_init(vmcore_init)
+
+/* Cleanup function for vmcore module. */
+void vmcore_cleanup(void)
+{
+	struct list_head *pos, *next;
+
+	if (proc_vmcore)
+		remove_proc_entry(proc_vmcore->name, proc_vmcore->parent);
+
+	/* clear the vmcore list. */
+	list_for_each_safe(pos, next, &vmcore_list) {
+		struct vmcore *m;
+
+		m = list_entry(pos, struct vmcore, list);
+		list_del(&m->list);
+		kfree(m);
+	}
+	kfree(elfcorebuf);
+}
+EXPORT_SYMBOL_GPL(vmcore_cleanup);


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 07/10] fadump: Introduce cleanup routine to invalidate /proc/vmcore.
@ 2011-07-13 18:08   ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

With the firmware-assisted dump support we don't require a reboot when we
are in second kernel after crash. The second kernel after crash is a normal
kernel boot and has knowledge about entire system RAM with the page tables
initialized for entire system RAM. Hence once the dump is saved to disk, we
can just release the reserved memory area for general use and continue
with second kernel as production kernel.

Hence when we release the reserved memory that contains dump data, the
'/proc/vmcore' will not be valid anymore. Hence this patch introduces
a cleanup routine that invalidates and removes the /proc/vmcore file. This
routine will be invoked before we release the reserved dump memory area.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 fs/proc/vmcore.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cd99bf5..1aa3d7b 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -699,3 +699,23 @@ static int __init vmcore_init(void)
 	return 0;
 }
 module_init(vmcore_init)
+
+/* Cleanup function for vmcore module. */
+void vmcore_cleanup(void)
+{
+	struct list_head *pos, *next;
+
+	if (proc_vmcore)
+		remove_proc_entry(proc_vmcore->name, proc_vmcore->parent);
+
+	/* clear the vmcore list. */
+	list_for_each_safe(pos, next, &vmcore_list) {
+		struct vmcore *m;
+
+		m = list_entry(pos, struct vmcore, list);
+		list_del(&m->list);
+		kfree(m);
+	}
+	kfree(elfcorebuf);
+}
+EXPORT_SYMBOL_GPL(vmcore_cleanup);

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 08/10] fadump: Invalidate registration and release reserved memory for general use.
  2011-07-13 18:06 ` Mahesh J Salgaonkar
@ 2011-07-13 18:08   ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch introduces an sysfs interface '/sys/kernel/fadump_release_mem' to
invalidate the last fadump registration, invalidate '/proc/vmcore', release
the reserved memory for general use and re-register for future kernel dump.
Once the dump is copied to the disk, the userspace tool will echo 1 to
'/sys/kernel/fadump_release_mem'.

Release the reserved memory region excluding the size of the memory required
for future kernel dump registration.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |    3 +
 arch/powerpc/kernel/fadump.c      |  153 +++++++++++++++++++++++++++++++++++++
 2 files changed, 154 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 1faa980..52fa1db 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -187,5 +187,8 @@ extern int fadump_reserve_mem(void);
 extern int setup_fadump(void);
 extern int is_fadump_active(void);
 extern void crash_fadump(struct pt_regs *, const char *);
+extern void fadump_cleanup(void);
+
+extern void vmcore_cleanup(void);
 #endif
 #endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index fee132b..22c72cb 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -30,6 +30,8 @@
 #include <linux/memblock.h>
 #include <linux/delay.h>
 #include <linux/crash_dump.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/prom.h>
@@ -963,6 +965,137 @@ static void register_fadump(void)
 	register_fw_dump(&fdm);
 }
 
+static int fadump_invalidate_dump(struct fadump_mem_struct *fdm)
+{
+	int rc = 0;
+	unsigned int wait_time;
+
+	DBG("Invalidating firmware-assisted dump registration\n");
+
+	/* TODO: Add upper time limit for the delay */
+	do {
+		rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL,
+			FADUMP_INVALIDATE, fdm,
+			sizeof(struct fadump_mem_struct));
+
+		wait_time = rtas_busy_delay_time(rc);
+		if (wait_time)
+			mdelay(wait_time);
+	} while (wait_time);
+
+	if (rc) {
+		printk(KERN_ERR "Failed to invalidate firmware-assisted dump "
+			"rgistration. unexpected error(%d).\n", rc);
+		return rc;
+	}
+	fw_dump.dump_active = 0;
+	fdm_active = NULL;
+	return 0;
+}
+
+void fadump_cleanup(void)
+{
+	/* Invalidate the registration only if dump is active. */
+	if (fw_dump.dump_active) {
+		init_fadump_mem_struct(&fdm,
+			fdm_active->cpu_state_data.destination_address);
+		fadump_invalidate_dump(&fdm);
+	}
+}
+
+/*
+ * Release the memory that was reserved in early boot to preserve the memory
+ * contents. The released memory will be available for general use.
+ */
+static void fadump_release_memory(unsigned long begin, unsigned long end,
+							int early_boot)
+{
+	unsigned long addr;
+	unsigned long ra_start, ra_end;
+
+	ra_start = fw_dump.reserve_dump_area_start;
+	ra_end = ra_start + fw_dump.reserve_dump_area_size;
+
+	for (addr = begin; addr < end; addr += PAGE_SIZE) {
+		/*
+		 * exclude the dump reserve area. Will reuse it for next
+		 * fadump registration.
+		 */
+		if (addr <= ra_end && ((addr + PAGE_SIZE) > ra_start))
+			continue;
+
+		if (early_boot) {
+			memblock_free(addr, PAGE_SIZE);
+			continue;
+		}
+
+		ClearPageReserved(pfn_to_page(addr >> PAGE_SHIFT));
+		init_page_count(pfn_to_page(addr >> PAGE_SHIFT));
+		free_page((unsigned long)__va(addr));
+		totalram_pages++;
+	}
+}
+
+static void fadump_invalidate_release_mem(int early_boot)
+{
+	unsigned long reserved_area_start, reserved_area_end;
+
+	if (!fw_dump.dump_active)
+		return;
+
+	/*
+	 * Save the current reserved memory bounds we will require them
+	 * later for releasing the memory for general use.
+	 */
+	reserved_area_start = fw_dump.reserve_dump_area_start;
+	reserved_area_end = reserved_area_start +
+			fw_dump.reserve_dump_area_size;
+	/*
+	 * Setup reserve_dump_area_start and its size so that we can
+	 * reuse this reserved memory for Re-registration.
+	 */
+	fw_dump.reserve_dump_area_start =
+			fdm_active->cpu_state_data.destination_address;
+	fw_dump.reserve_dump_area_size = get_dump_area_size();
+
+	fadump_cleanup();
+	fadump_release_memory(reserved_area_start, reserved_area_end,
+							early_boot);
+	if (fw_dump.cpu_notes_buf) {
+		fadump_release_memory(fw_dump.cpu_notes_buf,
+					fw_dump.cpu_notes_buf_size,
+					early_boot);
+		fw_dump.cpu_notes_buf = 0;
+		fw_dump.cpu_notes_buf_size = 0;
+	}
+}
+
+static ssize_t fadump_release_memory_store(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					const char *buf, size_t count)
+{
+	if (!fw_dump.dump_active)
+		return -EPERM;
+
+	if (buf[0] == '1') {
+		/*
+		 * Take away the '/proc/vmcore'. We are releasing the dump
+		 * memory, hence it will not be valid anymore.
+		 */
+		vmcore_cleanup();
+		fadump_invalidate_release_mem(0);
+
+		/*
+		 * We are done saving the dump and have release the memory
+		 * for general use. Now Re-register for firmware-assisted dump
+		 * for future kernel dump.
+		 */
+		register_fadump();
+	} else
+		return -EINVAL;
+	return count;
+}
+
 static ssize_t fadump_enabled_show(struct kobject *kobj,
 					struct kobj_attribute *attr,
 					char *buf)
@@ -1028,6 +1161,9 @@ static ssize_t fadump_region_show(struct kobject *kobj,
 	return n;
 }
 
+static struct kobj_attribute fadump_release_attr = __ATTR(fadump_release_mem,
+						0200, NULL,
+						fadump_release_memory_store);
 static struct kobj_attribute fadump_attr = __ATTR(fadump_enabled,
 						0444, fadump_enabled_show,
 						NULL);
@@ -1047,6 +1183,12 @@ static int fadump_init_sysfs(void)
 	if (rc)
 		printk(KERN_ERR "fadump: unable to create sysfs file"
 			" (%d)\n", rc);
+	if (fw_dump.dump_active) {
+		rc = sysfs_create_file(kernel_kobj, &fadump_release_attr.attr);
+		if (rc)
+			printk(KERN_ERR "fadump: unable to create sysfs file"
+				" (%d)\n", rc);
+	}
 	return rc;
 }
 subsys_initcall(fadump_init_sysfs);
@@ -1069,8 +1211,15 @@ int __init setup_fadump(void)
 	 * saving it to the disk.
 	 */
 	if (fw_dump.dump_active) {
-		process_fadump(fdm_active);
-		return 1;
+		/*
+		 * if dump process fails then invalidate the registration
+		 * and release memory before proceeding for re-registration.
+		 */
+		if (process_fadump(fdm_active) < 0) {
+			fadump_invalidate_release_mem(1);
+			/* fall through for Re-registration. */
+		} else
+			return 1;
 	}
 
 	register_fadump();


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 08/10] fadump: Invalidate registration and release reserved memory for general use.
@ 2011-07-13 18:08   ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch introduces an sysfs interface '/sys/kernel/fadump_release_mem' to
invalidate the last fadump registration, invalidate '/proc/vmcore', release
the reserved memory for general use and re-register for future kernel dump.
Once the dump is copied to the disk, the userspace tool will echo 1 to
'/sys/kernel/fadump_release_mem'.

Release the reserved memory region excluding the size of the memory required
for future kernel dump registration.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/fadump.h |    3 +
 arch/powerpc/kernel/fadump.c      |  153 +++++++++++++++++++++++++++++++++++++
 2 files changed, 154 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 1faa980..52fa1db 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -187,5 +187,8 @@ extern int fadump_reserve_mem(void);
 extern int setup_fadump(void);
 extern int is_fadump_active(void);
 extern void crash_fadump(struct pt_regs *, const char *);
+extern void fadump_cleanup(void);
+
+extern void vmcore_cleanup(void);
 #endif
 #endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index fee132b..22c72cb 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -30,6 +30,8 @@
 #include <linux/memblock.h>
 #include <linux/delay.h>
 #include <linux/crash_dump.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/prom.h>
@@ -963,6 +965,137 @@ static void register_fadump(void)
 	register_fw_dump(&fdm);
 }
 
+static int fadump_invalidate_dump(struct fadump_mem_struct *fdm)
+{
+	int rc = 0;
+	unsigned int wait_time;
+
+	DBG("Invalidating firmware-assisted dump registration\n");
+
+	/* TODO: Add upper time limit for the delay */
+	do {
+		rc = rtas_call(fw_dump.ibm_configure_kernel_dump, 3, 1, NULL,
+			FADUMP_INVALIDATE, fdm,
+			sizeof(struct fadump_mem_struct));
+
+		wait_time = rtas_busy_delay_time(rc);
+		if (wait_time)
+			mdelay(wait_time);
+	} while (wait_time);
+
+	if (rc) {
+		printk(KERN_ERR "Failed to invalidate firmware-assisted dump "
+			"rgistration. unexpected error(%d).\n", rc);
+		return rc;
+	}
+	fw_dump.dump_active = 0;
+	fdm_active = NULL;
+	return 0;
+}
+
+void fadump_cleanup(void)
+{
+	/* Invalidate the registration only if dump is active. */
+	if (fw_dump.dump_active) {
+		init_fadump_mem_struct(&fdm,
+			fdm_active->cpu_state_data.destination_address);
+		fadump_invalidate_dump(&fdm);
+	}
+}
+
+/*
+ * Release the memory that was reserved in early boot to preserve the memory
+ * contents. The released memory will be available for general use.
+ */
+static void fadump_release_memory(unsigned long begin, unsigned long end,
+							int early_boot)
+{
+	unsigned long addr;
+	unsigned long ra_start, ra_end;
+
+	ra_start = fw_dump.reserve_dump_area_start;
+	ra_end = ra_start + fw_dump.reserve_dump_area_size;
+
+	for (addr = begin; addr < end; addr += PAGE_SIZE) {
+		/*
+		 * exclude the dump reserve area. Will reuse it for next
+		 * fadump registration.
+		 */
+		if (addr <= ra_end && ((addr + PAGE_SIZE) > ra_start))
+			continue;
+
+		if (early_boot) {
+			memblock_free(addr, PAGE_SIZE);
+			continue;
+		}
+
+		ClearPageReserved(pfn_to_page(addr >> PAGE_SHIFT));
+		init_page_count(pfn_to_page(addr >> PAGE_SHIFT));
+		free_page((unsigned long)__va(addr));
+		totalram_pages++;
+	}
+}
+
+static void fadump_invalidate_release_mem(int early_boot)
+{
+	unsigned long reserved_area_start, reserved_area_end;
+
+	if (!fw_dump.dump_active)
+		return;
+
+	/*
+	 * Save the current reserved memory bounds we will require them
+	 * later for releasing the memory for general use.
+	 */
+	reserved_area_start = fw_dump.reserve_dump_area_start;
+	reserved_area_end = reserved_area_start +
+			fw_dump.reserve_dump_area_size;
+	/*
+	 * Setup reserve_dump_area_start and its size so that we can
+	 * reuse this reserved memory for Re-registration.
+	 */
+	fw_dump.reserve_dump_area_start =
+			fdm_active->cpu_state_data.destination_address;
+	fw_dump.reserve_dump_area_size = get_dump_area_size();
+
+	fadump_cleanup();
+	fadump_release_memory(reserved_area_start, reserved_area_end,
+							early_boot);
+	if (fw_dump.cpu_notes_buf) {
+		fadump_release_memory(fw_dump.cpu_notes_buf,
+					fw_dump.cpu_notes_buf_size,
+					early_boot);
+		fw_dump.cpu_notes_buf = 0;
+		fw_dump.cpu_notes_buf_size = 0;
+	}
+}
+
+static ssize_t fadump_release_memory_store(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					const char *buf, size_t count)
+{
+	if (!fw_dump.dump_active)
+		return -EPERM;
+
+	if (buf[0] == '1') {
+		/*
+		 * Take away the '/proc/vmcore'. We are releasing the dump
+		 * memory, hence it will not be valid anymore.
+		 */
+		vmcore_cleanup();
+		fadump_invalidate_release_mem(0);
+
+		/*
+		 * We are done saving the dump and have release the memory
+		 * for general use. Now Re-register for firmware-assisted dump
+		 * for future kernel dump.
+		 */
+		register_fadump();
+	} else
+		return -EINVAL;
+	return count;
+}
+
 static ssize_t fadump_enabled_show(struct kobject *kobj,
 					struct kobj_attribute *attr,
 					char *buf)
@@ -1028,6 +1161,9 @@ static ssize_t fadump_region_show(struct kobject *kobj,
 	return n;
 }
 
+static struct kobj_attribute fadump_release_attr = __ATTR(fadump_release_mem,
+						0200, NULL,
+						fadump_release_memory_store);
 static struct kobj_attribute fadump_attr = __ATTR(fadump_enabled,
 						0444, fadump_enabled_show,
 						NULL);
@@ -1047,6 +1183,12 @@ static int fadump_init_sysfs(void)
 	if (rc)
 		printk(KERN_ERR "fadump: unable to create sysfs file"
 			" (%d)\n", rc);
+	if (fw_dump.dump_active) {
+		rc = sysfs_create_file(kernel_kobj, &fadump_release_attr.attr);
+		if (rc)
+			printk(KERN_ERR "fadump: unable to create sysfs file"
+				" (%d)\n", rc);
+	}
 	return rc;
 }
 subsys_initcall(fadump_init_sysfs);
@@ -1069,8 +1211,15 @@ int __init setup_fadump(void)
 	 * saving it to the disk.
 	 */
 	if (fw_dump.dump_active) {
-		process_fadump(fdm_active);
-		return 1;
+		/*
+		 * if dump process fails then invalidate the registration
+		 * and release memory before proceeding for re-registration.
+		 */
+		if (process_fadump(fdm_active) < 0) {
+			fadump_invalidate_release_mem(1);
+			/* fall through for Re-registration. */
+		} else
+			return 1;
 	}
 
 	register_fadump();

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 09/10] fadump: Invalidate the fadump registration during machine shutdown.
  2011-07-13 18:06 ` Mahesh J Salgaonkar
@ 2011-07-13 18:08   ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

If dump is active during system reboot, shutdown or halt then invalidate
the fadump registration as it does not get invalidated automatically.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/setup-common.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 79fca26..5683661 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -61,6 +61,7 @@
 #include <asm/xmon.h>
 #include <asm/cputhreads.h>
 #include <mm/mmu_decl.h>
+#include <asm/fadump.h>
 
 #include "setup.h"
 
@@ -109,6 +110,14 @@ EXPORT_SYMBOL(ppc_do_canonicalize_irqs);
 /* also used by kexec */
 void machine_shutdown(void)
 {
+#ifdef CONFIG_FA_DUMP
+	/*
+	 * if fadump is active, cleanup the fadump registration before we
+	 * shutdown.
+	 */
+	fadump_cleanup();
+#endif
+
 	if (ppc_md.machine_shutdown)
 		ppc_md.machine_shutdown();
 }


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 09/10] fadump: Invalidate the fadump registration during machine shutdown.
@ 2011-07-13 18:08   ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

If dump is active during system reboot, shutdown or halt then invalidate
the fadump registration as it does not get invalidated automatically.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/setup-common.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 79fca26..5683661 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -61,6 +61,7 @@
 #include <asm/xmon.h>
 #include <asm/cputhreads.h>
 #include <mm/mmu_decl.h>
+#include <asm/fadump.h>
 
 #include "setup.h"
 
@@ -109,6 +110,14 @@ EXPORT_SYMBOL(ppc_do_canonicalize_irqs);
 /* also used by kexec */
 void machine_shutdown(void)
 {
+#ifdef CONFIG_FA_DUMP
+	/*
+	 * if fadump is active, cleanup the fadump registration before we
+	 * shutdown.
+	 */
+	fadump_cleanup();
+#endif
+
 	if (ppc_md.machine_shutdown)
 		ppc_md.machine_shutdown();
 }

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 10/10] fadump: Introduce config option for firmware assisted dump feature
  2011-07-13 18:06 ` Mahesh J Salgaonkar
@ 2011-07-13 18:08   ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch introduces a new config option CONFIG_FA_DUMP for firmware
assisted dump feature on Powerpc (ppc64) architecture.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/Kconfig |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..a424ad3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -376,6 +376,19 @@ config PHYP_DUMP
 
 	  If unsure, say "N"
 
+config FA_DUMP
+	bool "Firmware-assisted dump"
+	depends on PPC64 || PPC_RTAS || CRASH_DUMP
+	help
+	  A robust mechanism to get reliable kernel crash dump with
+	  assistance from firmware. This approach does not use kexec,
+	  instead firmware assists in booting the kdump kernel
+	  while preserving memory contents. Firmware-assisted dump
+	  is meant to be a kdump replacement offering robustness and
+	  speed not possible without system firmware assistance.
+
+	  If unsure, say "N"
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 10/10] fadump: Introduce config option for firmware assisted dump feature
@ 2011-07-13 18:08   ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-07-13 18:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel
  Cc: Michael Ellerman, Anton Blanchard, Milton Miller, Eric W. Biederman

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This patch introduces a new config option CONFIG_FA_DUMP for firmware
assisted dump feature on Powerpc (ppc64) architecture.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/Kconfig |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2729c66..a424ad3 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -376,6 +376,19 @@ config PHYP_DUMP
 
 	  If unsure, say "N"
 
+config FA_DUMP
+	bool "Firmware-assisted dump"
+	depends on PPC64 || PPC_RTAS || CRASH_DUMP
+	help
+	  A robust mechanism to get reliable kernel crash dump with
+	  assistance from firmware. This approach does not use kexec,
+	  instead firmware assists in booting the kdump kernel
+	  while preserving memory contents. Firmware-assisted dump
+	  is meant to be a kdump replacement offering robustness and
+	  speed not possible without system firmware assistance.
+
+	  If unsure, say "N"
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 02/10] fadump: Reserve the memory for firmware assisted dump.
  2011-07-13 18:06   ` Mahesh J Salgaonkar
@ 2011-08-31  4:11     ` Anton Blanchard
  -1 siblings, 0 replies; 34+ messages in thread
From: Anton Blanchard @ 2011-08-31  4:11 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel,
	Michael Ellerman, Milton Miller, Eric W. Biederman


Hi Mahesh,

Just a few comments.

> +#define RMR_START	0x0
> +#define RMR_END		(0x1UL << 28)	/* 256 MB */

What if the RMO is bigger than 256MB? Should we be using ppc64_rma_size?

> +#ifdef DEBUG
> +#define PREFIX		"fadump: "
> +#define DBG(fmt...)	printk(KERN_ERR PREFIX fmt)
> +#else
> +#define DBG(fmt...)
> +#endif

We should use the standard debug macros (pr_debug etc).

> +/* Global variable to hold firmware assisted dump configuration info. */
> +static struct fw_dump fw_dump;

You can remove this comment, especially because the variable isn't global :)

> +	sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
> +					NULL);
> +
> +	if (!sections)
> +		return 0;
> +
> +	for (i = 0; i < FW_DUMP_NUM_SECTIONS; i++) {
> +		switch (sections[i].dump_section) {
> +		case FADUMP_CPU_STATE_DATA:
> +			fw_dump.cpu_state_data_size =
> sections[i].section_size;
> +			break;
> +		case FADUMP_HPTE_REGION:
> +			fw_dump.hpte_region_size =
> sections[i].section_size;
> +			break;
> +		}
> +	}
> +	return 1;
> +}

This makes me a bit nervous. We should really get the size of the property
and use it to iterate through the array. I saw no requirement in the PAPR
that the array had to be 2 entries long.

> +static inline unsigned long calculate_reserve_size(void)
> +{
> +	unsigned long size;
> +
> +	/* divide by 20 to get 5% of value */
> +	size = memblock_end_of_DRAM();
> +	do_div(size, 20);
> +
> +	/* round it down in multiples of 256 */
> +	size = size & ~0x0FFFFFFFUL;
> +
> +	/* Truncate to memory_limit. We don't want to over reserve
> the memory.*/
> +	if (memory_limit && size > memory_limit)
> +		size = memory_limit;
> +
> +	return (size > RMR_END ? size : RMR_END);
> +}

5% is pretty aribitrary, that's 400GB on an 8TB box. Also our experience
with kdump is that 256MB is too small. Is there any reason to scale it
with memory size? Can we do what kdump does and set it to a single
value (eg 512MB)?

We could override the default with a boot option, which is similar to
how kdump specifies the region to reserve.

Anton

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 02/10] fadump: Reserve the memory for firmware assisted dump.
@ 2011-08-31  4:11     ` Anton Blanchard
  0 siblings, 0 replies; 34+ messages in thread
From: Anton Blanchard @ 2011-08-31  4:11 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: Linux Kernel, Milton Miller, Michael Ellerman, Eric W. Biederman,
	linuxppc-dev


Hi Mahesh,

Just a few comments.

> +#define RMR_START	0x0
> +#define RMR_END		(0x1UL << 28)	/* 256 MB */

What if the RMO is bigger than 256MB? Should we be using ppc64_rma_size?

> +#ifdef DEBUG
> +#define PREFIX		"fadump: "
> +#define DBG(fmt...)	printk(KERN_ERR PREFIX fmt)
> +#else
> +#define DBG(fmt...)
> +#endif

We should use the standard debug macros (pr_debug etc).

> +/* Global variable to hold firmware assisted dump configuration info. */
> +static struct fw_dump fw_dump;

You can remove this comment, especially because the variable isn't global :)

> +	sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
> +					NULL);
> +
> +	if (!sections)
> +		return 0;
> +
> +	for (i = 0; i < FW_DUMP_NUM_SECTIONS; i++) {
> +		switch (sections[i].dump_section) {
> +		case FADUMP_CPU_STATE_DATA:
> +			fw_dump.cpu_state_data_size =
> sections[i].section_size;
> +			break;
> +		case FADUMP_HPTE_REGION:
> +			fw_dump.hpte_region_size =
> sections[i].section_size;
> +			break;
> +		}
> +	}
> +	return 1;
> +}

This makes me a bit nervous. We should really get the size of the property
and use it to iterate through the array. I saw no requirement in the PAPR
that the array had to be 2 entries long.

> +static inline unsigned long calculate_reserve_size(void)
> +{
> +	unsigned long size;
> +
> +	/* divide by 20 to get 5% of value */
> +	size = memblock_end_of_DRAM();
> +	do_div(size, 20);
> +
> +	/* round it down in multiples of 256 */
> +	size = size & ~0x0FFFFFFFUL;
> +
> +	/* Truncate to memory_limit. We don't want to over reserve
> the memory.*/
> +	if (memory_limit && size > memory_limit)
> +		size = memory_limit;
> +
> +	return (size > RMR_END ? size : RMR_END);
> +}

5% is pretty aribitrary, that's 400GB on an 8TB box. Also our experience
with kdump is that 256MB is too small. Is there any reason to scale it
with memory size? Can we do what kdump does and set it to a single
value (eg 512MB)?

We could override the default with a boot option, which is similar to
how kdump specifies the region to reserve.

Anton

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 03/10] fadump: Register for firmware assisted dump.
  2011-07-13 18:07   ` Mahesh J Salgaonkar
@ 2011-08-31  4:20     ` Anton Blanchard
  -1 siblings, 0 replies; 34+ messages in thread
From: Anton Blanchard @ 2011-08-31  4:20 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel,
	Michael Ellerman, Milton Miller, Eric W. Biederman


Hi,

> +static void fadump_show_config(void)
> +{
> +	DBG("Support for firmware-assisted dump (fadump): %s\n",
> +			(fw_dump.fadump_supported ? "present" : "no support"));
> +
> +	if (!fw_dump.fadump_supported)
> +		return;
> +
> +	DBG("Fadump enabled    : %s\n",
> +				(fw_dump.fadump_enabled ? "yes" : "no"));
> +	DBG("Dump Active       : %s\n", (fw_dump.dump_active ? "yes" : "no"));
> +	DBG("Dump section sizes:\n");
> +	DBG("	CPU state data size: %lx\n", fw_dump.cpu_state_data_size);
> +	DBG("	HPTE region size   : %lx\n", fw_dump.hpte_region_size);
> +	DBG("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
> +	DBG("Reserve area start: %lx\n", fw_dump.reserve_dump_area_start);
> +	DBG("Reserve area size : %lx\n", fw_dump.reserve_dump_area_size);
> +}
> +
> +static void show_fadump_mem_struct(const struct fadump_mem_struct *fdm)
> +{
> +	if (!fdm)
> +		return;
> +
> +	DBG("--------Firmware-assisted dump memory structure---------\n");
> +	DBG("header.dump_format_version        : 0x%08x\n",
> +					fdm->header.dump_format_version);
> +	DBG("header.dump_num_sections          : %d\n",
> +					fdm->header.dump_num_sections);
> +	DBG("header.dump_status_flag           : 0x%04x\n",
> +					fdm->header.dump_status_flag);
> +	DBG("header.offset_first_dump_section  : 0x%x\n",
> +					fdm->header.offset_first_dump_section);
> +
> +	DBG("header.dd_block_size              : %d\n",
> +					fdm->header.dd_block_size);
> +	DBG("header.dd_block_offset            : 0x%Lx\n",
> +					fdm->header.dd_block_offset);
> +	DBG("header.dd_num_blocks              : %Lx\n",
> +					fdm->header.dd_num_blocks);
> +	DBG("header.dd_offset_disk_path        : 0x%x\n",
> +					fdm->header.dd_offset_disk_path);
> +
> +	DBG("header.max_time_auto              : %d\n",
> +					fdm->header.max_time_auto);
> +
> +	/* Kernel dump sections */
> +	DBG("cpu_state_data.request_flag       : 0x%08x\n",
> +					fdm->cpu_state_data.request_flag);
> +	DBG("cpu_state_data.source_data_type   : 0x%04x\n",
> +					fdm->cpu_state_data.source_data_type);
> +	DBG("cpu_state_data.error_flags        : 0x%04x\n",
> +					fdm->cpu_state_data.error_flags);
> +	DBG("cpu_state_data.source_address     : 0x%016Lx\n",
> +					fdm->cpu_state_data.source_address);
> +	DBG("cpu_state_data.source_len         : 0x%Lx\n",
> +					fdm->cpu_state_data.source_len);
> +	DBG("cpu_state_data.bytes_dumped       : 0x%Lx\n",
> +					fdm->cpu_state_data.bytes_dumped);
> +	DBG("cpu_state_data.destination_address: 0x%016Lx\n",
> +				fdm->cpu_state_data.destination_address);
> +
> +	DBG("hpte_region.request_flag          : 0x%08x\n",
> +					fdm->hpte_region.request_flag);
> +	DBG("hpte_region.source_data_type      : 0x%04x\n",
> +					fdm->hpte_region.source_data_type);
> +	DBG("hpte_region.error_flags           : 0x%04x\n",
> +					fdm->hpte_region.error_flags);
> +	DBG("hpte_region.source_address        : 0x%016Lx\n",
> +					fdm->hpte_region.source_address);
> +	DBG("hpte_region.source_len            : 0x%Lx\n",
> +					fdm->hpte_region.source_len);
> +	DBG("hpte_region.bytes_dumped          : 0x%Lx\n",
> +					fdm->hpte_region.bytes_dumped);
> +	DBG("hpte_region.destination_address   : 0x%016Lx\n",
> +				fdm->hpte_region.destination_address);
> +
> +	DBG("rmr_region.request_flag           : 0x%08x\n",
> +					fdm->rmr_region.request_flag);
> +	DBG("rmr_region.source_data_type       : 0x%04x\n",
> +					fdm->rmr_region.source_data_type);
> +	DBG("rmr_region.error_flags            : 0x%04x\n",
> +					fdm->rmr_region.error_flags);
> +	DBG("rmr_region.source_address         : 0x%016Lx\n",
> +					fdm->rmr_region.source_address);
> +	DBG("rmr_region.source_len             : 0x%Lx\n",
> +					fdm->rmr_region.source_len);
> +	DBG("rmr_region.bytes_dumped           : 0x%Lx\n",
> +					fdm->rmr_region.bytes_dumped);
> +	DBG("rmr_region.destination_address    : 0x%016Lx\n",
> +				fdm->rmr_region.destination_address);
> +
> +	DBG("--------Firmware-assisted dump memory structure---------\n");
> +}
> +

That's an awful lot of debug information. I don't think we need to carry
this around in the kernel once the feature is working.

> +static ssize_t fadump_enabled_show(struct kobject *kobj,
> +					struct kobj_attribute *attr,
> +					char *buf)
> +{
> +	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
> +}
> +

> +static ssize_t fadump_region_show(struct kobject *kobj,
> +					struct kobj_attribute *attr,
> +					char *buf)
> +{
> +	const struct fadump_mem_struct *fdm_ptr;
> +	ssize_t n = 0;
> +
> +	if (!fw_dump.fadump_enabled)
> +		return n;
> +
> +	if (fdm_active)
> +		fdm_ptr = fdm_active;
> +	else
> +		fdm_ptr = &fdm;
> +
> +	n += sprintf(buf,
> +			"CPU : [%#016llx-%#016llx] %#llx bytes, "
> +			"Dumped: %#llx\n",
> +			fdm_ptr->cpu_state_data.destination_address,
> +			fdm_ptr->cpu_state_data.destination_address +
> +			fdm_ptr->cpu_state_data.source_len - 1,
> +			fdm_ptr->cpu_state_data.source_len,
> +			fdm_ptr->cpu_state_data.bytes_dumped);
> +	n += sprintf(buf + n,
> +			"HPTE: [%#016llx-%#016llx] %#llx bytes, "
> +			"Dumped: %#llx\n",
> +			fdm_ptr->hpte_region.destination_address,
> +			fdm_ptr->hpte_region.destination_address +
> +			fdm_ptr->hpte_region.source_len - 1,
> +			fdm_ptr->hpte_region.source_len,
> +			fdm_ptr->hpte_region.bytes_dumped);
> +	n += sprintf(buf + n,
> +			"DUMP: [%#016llx-%#016llx] %#llx bytes, "
> +			"Dumped: %#llx\n",
> +			fdm_ptr->rmr_region.destination_address,
> +			fdm_ptr->rmr_region.destination_address +
> +			fdm_ptr->rmr_region.source_len - 1,
> +			fdm_ptr->rmr_region.source_len,
> +			fdm_ptr->rmr_region.bytes_dumped);
> +
> +	if (!fdm_active ||
> +		(fw_dump.reserve_dump_area_start ==
> +		fdm_ptr->cpu_state_data.destination_address))
> +		return n;
> +
> +	/* Dump is active. Show reserved memory region. */
> +	n += sprintf(buf + n,
> +			"    : [%#016llx-%#016llx] %#llx bytes, "
> +			"Dumped: %#llx\n",
> +			(unsigned long long)fw_dump.reserve_dump_area_start,
> +			fdm_ptr->cpu_state_data.destination_address - 1,
> +			fdm_ptr->cpu_state_data.destination_address -
> +			fw_dump.reserve_dump_area_start,
> +			fdm_ptr->cpu_state_data.destination_address -
> +			fw_dump.reserve_dump_area_start);
> +	return n;
> +}
> +
> +static struct kobj_attribute fadump_attr = __ATTR(fadump_enabled,
> +						0444, fadump_enabled_show,
> +						NULL);
> +static struct kobj_attribute fadump_region_attr = __ATTR(fadump_region,
> +						0444, fadump_region_show, NULL);
> +
> +static int fadump_init_sysfs(void)
> +{
> +	int rc = 0;
> +
> +	rc = sysfs_create_file(kernel_kobj, &fadump_attr.attr);
> +	if (rc)
> +		printk(KERN_ERR "fadump: unable to create sysfs file"
> +			" (%d)\n", rc);
> +
> +	rc = sysfs_create_file(kernel_kobj, &fadump_region_attr.attr);
> +	if (rc)
> +		printk(KERN_ERR "fadump: unable to create sysfs file"
> +			" (%d)\n", rc);
> +	return rc;
> +}
> +subsys_initcall(fadump_init_sysfs);

Do we need to dump this all out via sysfs? Will tools depend on this,
or is it just for debug? It might be better to place in debugfs.

Anton

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 03/10] fadump: Register for firmware assisted dump.
@ 2011-08-31  4:20     ` Anton Blanchard
  0 siblings, 0 replies; 34+ messages in thread
From: Anton Blanchard @ 2011-08-31  4:20 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: Linux Kernel, Milton Miller, Michael Ellerman, Eric W. Biederman,
	linuxppc-dev


Hi,

> +static void fadump_show_config(void)
> +{
> +	DBG("Support for firmware-assisted dump (fadump): %s\n",
> +			(fw_dump.fadump_supported ? "present" : "no support"));
> +
> +	if (!fw_dump.fadump_supported)
> +		return;
> +
> +	DBG("Fadump enabled    : %s\n",
> +				(fw_dump.fadump_enabled ? "yes" : "no"));
> +	DBG("Dump Active       : %s\n", (fw_dump.dump_active ? "yes" : "no"));
> +	DBG("Dump section sizes:\n");
> +	DBG("	CPU state data size: %lx\n", fw_dump.cpu_state_data_size);
> +	DBG("	HPTE region size   : %lx\n", fw_dump.hpte_region_size);
> +	DBG("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
> +	DBG("Reserve area start: %lx\n", fw_dump.reserve_dump_area_start);
> +	DBG("Reserve area size : %lx\n", fw_dump.reserve_dump_area_size);
> +}
> +
> +static void show_fadump_mem_struct(const struct fadump_mem_struct *fdm)
> +{
> +	if (!fdm)
> +		return;
> +
> +	DBG("--------Firmware-assisted dump memory structure---------\n");
> +	DBG("header.dump_format_version        : 0x%08x\n",
> +					fdm->header.dump_format_version);
> +	DBG("header.dump_num_sections          : %d\n",
> +					fdm->header.dump_num_sections);
> +	DBG("header.dump_status_flag           : 0x%04x\n",
> +					fdm->header.dump_status_flag);
> +	DBG("header.offset_first_dump_section  : 0x%x\n",
> +					fdm->header.offset_first_dump_section);
> +
> +	DBG("header.dd_block_size              : %d\n",
> +					fdm->header.dd_block_size);
> +	DBG("header.dd_block_offset            : 0x%Lx\n",
> +					fdm->header.dd_block_offset);
> +	DBG("header.dd_num_blocks              : %Lx\n",
> +					fdm->header.dd_num_blocks);
> +	DBG("header.dd_offset_disk_path        : 0x%x\n",
> +					fdm->header.dd_offset_disk_path);
> +
> +	DBG("header.max_time_auto              : %d\n",
> +					fdm->header.max_time_auto);
> +
> +	/* Kernel dump sections */
> +	DBG("cpu_state_data.request_flag       : 0x%08x\n",
> +					fdm->cpu_state_data.request_flag);
> +	DBG("cpu_state_data.source_data_type   : 0x%04x\n",
> +					fdm->cpu_state_data.source_data_type);
> +	DBG("cpu_state_data.error_flags        : 0x%04x\n",
> +					fdm->cpu_state_data.error_flags);
> +	DBG("cpu_state_data.source_address     : 0x%016Lx\n",
> +					fdm->cpu_state_data.source_address);
> +	DBG("cpu_state_data.source_len         : 0x%Lx\n",
> +					fdm->cpu_state_data.source_len);
> +	DBG("cpu_state_data.bytes_dumped       : 0x%Lx\n",
> +					fdm->cpu_state_data.bytes_dumped);
> +	DBG("cpu_state_data.destination_address: 0x%016Lx\n",
> +				fdm->cpu_state_data.destination_address);
> +
> +	DBG("hpte_region.request_flag          : 0x%08x\n",
> +					fdm->hpte_region.request_flag);
> +	DBG("hpte_region.source_data_type      : 0x%04x\n",
> +					fdm->hpte_region.source_data_type);
> +	DBG("hpte_region.error_flags           : 0x%04x\n",
> +					fdm->hpte_region.error_flags);
> +	DBG("hpte_region.source_address        : 0x%016Lx\n",
> +					fdm->hpte_region.source_address);
> +	DBG("hpte_region.source_len            : 0x%Lx\n",
> +					fdm->hpte_region.source_len);
> +	DBG("hpte_region.bytes_dumped          : 0x%Lx\n",
> +					fdm->hpte_region.bytes_dumped);
> +	DBG("hpte_region.destination_address   : 0x%016Lx\n",
> +				fdm->hpte_region.destination_address);
> +
> +	DBG("rmr_region.request_flag           : 0x%08x\n",
> +					fdm->rmr_region.request_flag);
> +	DBG("rmr_region.source_data_type       : 0x%04x\n",
> +					fdm->rmr_region.source_data_type);
> +	DBG("rmr_region.error_flags            : 0x%04x\n",
> +					fdm->rmr_region.error_flags);
> +	DBG("rmr_region.source_address         : 0x%016Lx\n",
> +					fdm->rmr_region.source_address);
> +	DBG("rmr_region.source_len             : 0x%Lx\n",
> +					fdm->rmr_region.source_len);
> +	DBG("rmr_region.bytes_dumped           : 0x%Lx\n",
> +					fdm->rmr_region.bytes_dumped);
> +	DBG("rmr_region.destination_address    : 0x%016Lx\n",
> +				fdm->rmr_region.destination_address);
> +
> +	DBG("--------Firmware-assisted dump memory structure---------\n");
> +}
> +

That's an awful lot of debug information. I don't think we need to carry
this around in the kernel once the feature is working.

> +static ssize_t fadump_enabled_show(struct kobject *kobj,
> +					struct kobj_attribute *attr,
> +					char *buf)
> +{
> +	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
> +}
> +

> +static ssize_t fadump_region_show(struct kobject *kobj,
> +					struct kobj_attribute *attr,
> +					char *buf)
> +{
> +	const struct fadump_mem_struct *fdm_ptr;
> +	ssize_t n = 0;
> +
> +	if (!fw_dump.fadump_enabled)
> +		return n;
> +
> +	if (fdm_active)
> +		fdm_ptr = fdm_active;
> +	else
> +		fdm_ptr = &fdm;
> +
> +	n += sprintf(buf,
> +			"CPU : [%#016llx-%#016llx] %#llx bytes, "
> +			"Dumped: %#llx\n",
> +			fdm_ptr->cpu_state_data.destination_address,
> +			fdm_ptr->cpu_state_data.destination_address +
> +			fdm_ptr->cpu_state_data.source_len - 1,
> +			fdm_ptr->cpu_state_data.source_len,
> +			fdm_ptr->cpu_state_data.bytes_dumped);
> +	n += sprintf(buf + n,
> +			"HPTE: [%#016llx-%#016llx] %#llx bytes, "
> +			"Dumped: %#llx\n",
> +			fdm_ptr->hpte_region.destination_address,
> +			fdm_ptr->hpte_region.destination_address +
> +			fdm_ptr->hpte_region.source_len - 1,
> +			fdm_ptr->hpte_region.source_len,
> +			fdm_ptr->hpte_region.bytes_dumped);
> +	n += sprintf(buf + n,
> +			"DUMP: [%#016llx-%#016llx] %#llx bytes, "
> +			"Dumped: %#llx\n",
> +			fdm_ptr->rmr_region.destination_address,
> +			fdm_ptr->rmr_region.destination_address +
> +			fdm_ptr->rmr_region.source_len - 1,
> +			fdm_ptr->rmr_region.source_len,
> +			fdm_ptr->rmr_region.bytes_dumped);
> +
> +	if (!fdm_active ||
> +		(fw_dump.reserve_dump_area_start ==
> +		fdm_ptr->cpu_state_data.destination_address))
> +		return n;
> +
> +	/* Dump is active. Show reserved memory region. */
> +	n += sprintf(buf + n,
> +			"    : [%#016llx-%#016llx] %#llx bytes, "
> +			"Dumped: %#llx\n",
> +			(unsigned long long)fw_dump.reserve_dump_area_start,
> +			fdm_ptr->cpu_state_data.destination_address - 1,
> +			fdm_ptr->cpu_state_data.destination_address -
> +			fw_dump.reserve_dump_area_start,
> +			fdm_ptr->cpu_state_data.destination_address -
> +			fw_dump.reserve_dump_area_start);
> +	return n;
> +}
> +
> +static struct kobj_attribute fadump_attr = __ATTR(fadump_enabled,
> +						0444, fadump_enabled_show,
> +						NULL);
> +static struct kobj_attribute fadump_region_attr = __ATTR(fadump_region,
> +						0444, fadump_region_show, NULL);
> +
> +static int fadump_init_sysfs(void)
> +{
> +	int rc = 0;
> +
> +	rc = sysfs_create_file(kernel_kobj, &fadump_attr.attr);
> +	if (rc)
> +		printk(KERN_ERR "fadump: unable to create sysfs file"
> +			" (%d)\n", rc);
> +
> +	rc = sysfs_create_file(kernel_kobj, &fadump_region_attr.attr);
> +	if (rc)
> +		printk(KERN_ERR "fadump: unable to create sysfs file"
> +			" (%d)\n", rc);
> +	return rc;
> +}
> +subsys_initcall(fadump_init_sysfs);

Do we need to dump this all out via sysfs? Will tools depend on this,
or is it just for debug? It might be better to place in debugfs.

Anton

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 05/10] fadump: Convert firmware-assisted cpu state dump data into elf notes.
  2011-07-13 18:07   ` Mahesh J Salgaonkar
@ 2011-08-31  4:23     ` Anton Blanchard
  -1 siblings, 0 replies; 34+ messages in thread
From: Anton Blanchard @ 2011-08-31  4:23 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel,
	Michael Ellerman, Milton Miller, Eric W. Biederman


> diff --git a/kernel/panic.c b/kernel/panic.c
> index 6923167..1965b50 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -49,6 +49,15 @@ static long no_blink(int state)
>  long (*panic_blink)(int state);
>  EXPORT_SYMBOL(panic_blink);
>  
> +#ifdef CONFIG_FA_DUMP
> +/*
> + * provide an empty default implementation here -- architecture
> + * code may override this
> + */
> +void __attribute__ ((weak)) crash_fadump(struct pt_regs *regs, const char *str)
> +{}
> +#endif
> +
>  /**
>   *	panic - halt the system
>   *	@fmt: The text string to print
> @@ -81,6 +90,13 @@ NORET_TYPE void panic(const char * fmt, ...)
>  	dump_stack();
>  #endif
>  
> +#ifdef CONFIG_FA_DUMP
> +	/*
> +	 * If firmware-assisted dump has been registered then trigger
> +	 * firmware-assisted dump and let firmware handle everything else.
> +	 */
> +	crash_fadump(NULL, buf);
> +#endif
>  	/*
>  	 * If we have crashed and we have a crash kernel loaded let it handle
>  	 * everything else.

Firmware assisted dump is an IBM POWER Systems specific feature and it
shouldn't leak into a common file like this. Isn't there an existing
hook we can catch like the panic notifier?

Anton

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 05/10] fadump: Convert firmware-assisted cpu state dump data into elf notes.
@ 2011-08-31  4:23     ` Anton Blanchard
  0 siblings, 0 replies; 34+ messages in thread
From: Anton Blanchard @ 2011-08-31  4:23 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: Linux Kernel, Milton Miller, Michael Ellerman, Eric W. Biederman,
	linuxppc-dev


> diff --git a/kernel/panic.c b/kernel/panic.c
> index 6923167..1965b50 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -49,6 +49,15 @@ static long no_blink(int state)
>  long (*panic_blink)(int state);
>  EXPORT_SYMBOL(panic_blink);
>  
> +#ifdef CONFIG_FA_DUMP
> +/*
> + * provide an empty default implementation here -- architecture
> + * code may override this
> + */
> +void __attribute__ ((weak)) crash_fadump(struct pt_regs *regs, const char *str)
> +{}
> +#endif
> +
>  /**
>   *	panic - halt the system
>   *	@fmt: The text string to print
> @@ -81,6 +90,13 @@ NORET_TYPE void panic(const char * fmt, ...)
>  	dump_stack();
>  #endif
>  
> +#ifdef CONFIG_FA_DUMP
> +	/*
> +	 * If firmware-assisted dump has been registered then trigger
> +	 * firmware-assisted dump and let firmware handle everything else.
> +	 */
> +	crash_fadump(NULL, buf);
> +#endif
>  	/*
>  	 * If we have crashed and we have a crash kernel loaded let it handle
>  	 * everything else.

Firmware assisted dump is an IBM POWER Systems specific feature and it
shouldn't leak into a common file like this. Isn't there an existing
hook we can catch like the panic notifier?

Anton

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 02/10] fadump: Reserve the memory for firmware assisted dump.
  2011-08-31  4:11     ` Anton Blanchard
@ 2011-09-06 11:59       ` Mahesh Jagannath Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2011-09-06 11:59 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel,
	Michael Ellerman, Milton Miller, Eric W. Biederman

Hi Anton,

On 08/31/2011 09:41 AM, Anton Blanchard wrote:
> 
> Hi Mahesh,
> 
> Just a few comments.
> 
>> +#define RMR_START	0x0
>> +#define RMR_END		(0x1UL << 28)	/* 256 MB */
> 
> What if the RMO is bigger than 256MB? Should we be using ppc64_rma_size?

The idea was to have a minimum memory threshold that requires for a
kernel to boot successfully. On some Power systems where RMO is 128MB,
it still requires minimum of 256MB for kernel to boot successfully.

I think we can rename above #defines as BOOT_MEM_START and BOOT_MEM_END
respectively and have BOOT_MEM_END defined as below:

#define BOOT_MEM_END 	((ppc64_rma_size < (0x1UL << 28)) ? \
			(0x1UL << 28) : ppc64_rma_size)

What do you think?

> 
>> +#ifdef DEBUG
>> +#define PREFIX		"fadump: "
>> +#define DBG(fmt...)	printk(KERN_ERR PREFIX fmt)
>> +#else
>> +#define DBG(fmt...)
>> +#endif
> 
> We should use the standard debug macros (pr_debug etc).

Sure will do that.

> 
>> +/* Global variable to hold firmware assisted dump configuration info. */
>> +static struct fw_dump fw_dump;
> 
> You can remove this comment, especially because the variable isn't global :)

Agree.

> 
>> +	sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
>> +					NULL);
>> +
>> +	if (!sections)
>> +		return 0;
>> +
>> +	for (i = 0; i < FW_DUMP_NUM_SECTIONS; i++) {
>> +		switch (sections[i].dump_section) {
>> +		case FADUMP_CPU_STATE_DATA:
>> +			fw_dump.cpu_state_data_size =
>> sections[i].section_size;
>> +			break;
>> +		case FADUMP_HPTE_REGION:
>> +			fw_dump.hpte_region_size =
>> sections[i].section_size;
>> +			break;
>> +		}
>> +	}
>> +	return 1;
>> +}
> 
> This makes me a bit nervous. We should really get the size of the property
> and use it to iterate through the array. I saw no requirement in the PAPR
> that the array had to be 2 entries long.
> 

Agree. Will make the change.

>> +static inline unsigned long calculate_reserve_size(void)
>> +{
>> +	unsigned long size;
>> +
>> +	/* divide by 20 to get 5% of value */
>> +	size = memblock_end_of_DRAM();
>> +	do_div(size, 20);
>> +
>> +	/* round it down in multiples of 256 */
>> +	size = size & ~0x0FFFFFFFUL;
>> +
>> +	/* Truncate to memory_limit. We don't want to over reserve
>> the memory.*/
>> +	if (memory_limit && size > memory_limit)
>> +		size = memory_limit;
>> +
>> +	return (size > RMR_END ? size : RMR_END);
>> +}
> 
> 5% is pretty aribitrary, that's 400GB on an 8TB box. Also our experience
> with kdump is that 256MB is too small. Is there any reason to scale it
> with memory size? Can we do what kdump does and set it to a single
> value (eg 512MB)?

I have picked up this heuristic from the phyp-assisted dump code. I am
yet to figure out a fool-proof method to calculate the minimum memory
needed for any Power box to successfully boot. Till then, I presume we
can use this heuristic based approach?

While testing these patches on huge power system with 1TB RAM and 896
CPUs, I found that even 512MB is small. Hence setting it to a single
value may not work for all system configuration.

> 
> We could override the default with a boot option, which is similar to
> how kdump specifies the region to reserve.

Agree, will work on the change.

Thanks,
-Mahesh.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 02/10] fadump: Reserve the memory for firmware assisted dump.
@ 2011-09-06 11:59       ` Mahesh Jagannath Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh Jagannath Salgaonkar @ 2011-09-06 11:59 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Linux Kernel, Milton Miller, Michael Ellerman, Eric W. Biederman,
	linuxppc-dev

Hi Anton,

On 08/31/2011 09:41 AM, Anton Blanchard wrote:
> 
> Hi Mahesh,
> 
> Just a few comments.
> 
>> +#define RMR_START	0x0
>> +#define RMR_END		(0x1UL << 28)	/* 256 MB */
> 
> What if the RMO is bigger than 256MB? Should we be using ppc64_rma_size?

The idea was to have a minimum memory threshold that requires for a
kernel to boot successfully. On some Power systems where RMO is 128MB,
it still requires minimum of 256MB for kernel to boot successfully.

I think we can rename above #defines as BOOT_MEM_START and BOOT_MEM_END
respectively and have BOOT_MEM_END defined as below:

#define BOOT_MEM_END 	((ppc64_rma_size < (0x1UL << 28)) ? \
			(0x1UL << 28) : ppc64_rma_size)

What do you think?

> 
>> +#ifdef DEBUG
>> +#define PREFIX		"fadump: "
>> +#define DBG(fmt...)	printk(KERN_ERR PREFIX fmt)
>> +#else
>> +#define DBG(fmt...)
>> +#endif
> 
> We should use the standard debug macros (pr_debug etc).

Sure will do that.

> 
>> +/* Global variable to hold firmware assisted dump configuration info. */
>> +static struct fw_dump fw_dump;
> 
> You can remove this comment, especially because the variable isn't global :)

Agree.

> 
>> +	sections = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
>> +					NULL);
>> +
>> +	if (!sections)
>> +		return 0;
>> +
>> +	for (i = 0; i < FW_DUMP_NUM_SECTIONS; i++) {
>> +		switch (sections[i].dump_section) {
>> +		case FADUMP_CPU_STATE_DATA:
>> +			fw_dump.cpu_state_data_size =
>> sections[i].section_size;
>> +			break;
>> +		case FADUMP_HPTE_REGION:
>> +			fw_dump.hpte_region_size =
>> sections[i].section_size;
>> +			break;
>> +		}
>> +	}
>> +	return 1;
>> +}
> 
> This makes me a bit nervous. We should really get the size of the property
> and use it to iterate through the array. I saw no requirement in the PAPR
> that the array had to be 2 entries long.
> 

Agree. Will make the change.

>> +static inline unsigned long calculate_reserve_size(void)
>> +{
>> +	unsigned long size;
>> +
>> +	/* divide by 20 to get 5% of value */
>> +	size = memblock_end_of_DRAM();
>> +	do_div(size, 20);
>> +
>> +	/* round it down in multiples of 256 */
>> +	size = size & ~0x0FFFFFFFUL;
>> +
>> +	/* Truncate to memory_limit. We don't want to over reserve
>> the memory.*/
>> +	if (memory_limit && size > memory_limit)
>> +		size = memory_limit;
>> +
>> +	return (size > RMR_END ? size : RMR_END);
>> +}
> 
> 5% is pretty aribitrary, that's 400GB on an 8TB box. Also our experience
> with kdump is that 256MB is too small. Is there any reason to scale it
> with memory size? Can we do what kdump does and set it to a single
> value (eg 512MB)?

I have picked up this heuristic from the phyp-assisted dump code. I am
yet to figure out a fool-proof method to calculate the minimum memory
needed for any Power box to successfully boot. Till then, I presume we
can use this heuristic based approach?

While testing these patches on huge power system with 1TB RAM and 896
CPUs, I found that even 512MB is small. Hence setting it to a single
value may not work for all system configuration.

> 
> We could override the default with a boot option, which is similar to
> how kdump specifies the region to reserve.

Agree, will work on the change.

Thanks,
-Mahesh.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 03/10] fadump: Register for firmware assisted dump.
  2011-08-31  4:20     ` Anton Blanchard
@ 2011-09-07  7:20       ` Mahesh J Salgaonkar
  -1 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-09-07  7:20 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel,
	Michael Ellerman, Milton Miller, Eric W. Biederman

Hi Anton,

On 2011-08-31 14:20:49 Wed, Anton Blanchard wrote:
> 
> Hi,
> 
> > +static void fadump_show_config(void)
> > +{
> > +	DBG("Support for firmware-assisted dump (fadump): %s\n",
> > +			(fw_dump.fadump_supported ? "present" : "no support"));
> > +
> > +	if (!fw_dump.fadump_supported)
> > +		return;
> > +
> > +	DBG("Fadump enabled    : %s\n",
> > +				(fw_dump.fadump_enabled ? "yes" : "no"));
> > +	DBG("Dump Active       : %s\n", (fw_dump.dump_active ? "yes" : "no"));
> > +	DBG("Dump section sizes:\n");
> > +	DBG("	CPU state data size: %lx\n", fw_dump.cpu_state_data_size);
> > +	DBG("	HPTE region size   : %lx\n", fw_dump.hpte_region_size);
> > +	DBG("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
> > +	DBG("Reserve area start: %lx\n", fw_dump.reserve_dump_area_start);
> > +	DBG("Reserve area size : %lx\n", fw_dump.reserve_dump_area_size);
> > +}
> > +
> > +static void show_fadump_mem_struct(const struct fadump_mem_struct *fdm)
> > +{
> > +	if (!fdm)
> > +		return;
> > +
> > +	DBG("--------Firmware-assisted dump memory structure---------\n");
> > +	DBG("header.dump_format_version        : 0x%08x\n",
> > +					fdm->header.dump_format_version);
> > +	DBG("header.dump_num_sections          : %d\n",
> > +					fdm->header.dump_num_sections);
> > +	DBG("header.dump_status_flag           : 0x%04x\n",
> > +					fdm->header.dump_status_flag);
> > +	DBG("header.offset_first_dump_section  : 0x%x\n",
> > +					fdm->header.offset_first_dump_section);
> > +
> > +	DBG("header.dd_block_size              : %d\n",
> > +					fdm->header.dd_block_size);
> > +	DBG("header.dd_block_offset            : 0x%Lx\n",
> > +					fdm->header.dd_block_offset);
> > +	DBG("header.dd_num_blocks              : %Lx\n",
> > +					fdm->header.dd_num_blocks);
> > +	DBG("header.dd_offset_disk_path        : 0x%x\n",
> > +					fdm->header.dd_offset_disk_path);
> > +
> > +	DBG("header.max_time_auto              : %d\n",
> > +					fdm->header.max_time_auto);
> > +
> > +	/* Kernel dump sections */
> > +	DBG("cpu_state_data.request_flag       : 0x%08x\n",
> > +					fdm->cpu_state_data.request_flag);
> > +	DBG("cpu_state_data.source_data_type   : 0x%04x\n",
> > +					fdm->cpu_state_data.source_data_type);
> > +	DBG("cpu_state_data.error_flags        : 0x%04x\n",
> > +					fdm->cpu_state_data.error_flags);
> > +	DBG("cpu_state_data.source_address     : 0x%016Lx\n",
> > +					fdm->cpu_state_data.source_address);
> > +	DBG("cpu_state_data.source_len         : 0x%Lx\n",
> > +					fdm->cpu_state_data.source_len);
> > +	DBG("cpu_state_data.bytes_dumped       : 0x%Lx\n",
> > +					fdm->cpu_state_data.bytes_dumped);
> > +	DBG("cpu_state_data.destination_address: 0x%016Lx\n",
> > +				fdm->cpu_state_data.destination_address);
> > +
> > +	DBG("hpte_region.request_flag          : 0x%08x\n",
> > +					fdm->hpte_region.request_flag);
> > +	DBG("hpte_region.source_data_type      : 0x%04x\n",
> > +					fdm->hpte_region.source_data_type);
> > +	DBG("hpte_region.error_flags           : 0x%04x\n",
> > +					fdm->hpte_region.error_flags);
> > +	DBG("hpte_region.source_address        : 0x%016Lx\n",
> > +					fdm->hpte_region.source_address);
> > +	DBG("hpte_region.source_len            : 0x%Lx\n",
> > +					fdm->hpte_region.source_len);
> > +	DBG("hpte_region.bytes_dumped          : 0x%Lx\n",
> > +					fdm->hpte_region.bytes_dumped);
> > +	DBG("hpte_region.destination_address   : 0x%016Lx\n",
> > +				fdm->hpte_region.destination_address);
> > +
> > +	DBG("rmr_region.request_flag           : 0x%08x\n",
> > +					fdm->rmr_region.request_flag);
> > +	DBG("rmr_region.source_data_type       : 0x%04x\n",
> > +					fdm->rmr_region.source_data_type);
> > +	DBG("rmr_region.error_flags            : 0x%04x\n",
> > +					fdm->rmr_region.error_flags);
> > +	DBG("rmr_region.source_address         : 0x%016Lx\n",
> > +					fdm->rmr_region.source_address);
> > +	DBG("rmr_region.source_len             : 0x%Lx\n",
> > +					fdm->rmr_region.source_len);
> > +	DBG("rmr_region.bytes_dumped           : 0x%Lx\n",
> > +					fdm->rmr_region.bytes_dumped);
> > +	DBG("rmr_region.destination_address    : 0x%016Lx\n",
> > +				fdm->rmr_region.destination_address);
> > +
> > +	DBG("--------Firmware-assisted dump memory structure---------\n");
> > +}
> > +
> 
> That's an awful lot of debug information. I don't think we need to carry
> this around in the kernel once the feature is working.

Sure, will remove them.

> 
> > +static ssize_t fadump_enabled_show(struct kobject *kobj,
> > +					struct kobj_attribute *attr,
> > +					char *buf)
> > +{
> > +	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
> > +}
> > +
> 
> > +static ssize_t fadump_region_show(struct kobject *kobj,
> > +					struct kobj_attribute *attr,
> > +					char *buf)
> > +{
> > +	const struct fadump_mem_struct *fdm_ptr;
> > +	ssize_t n = 0;
> > +
> > +	if (!fw_dump.fadump_enabled)
> > +		return n;
> > +
> > +	if (fdm_active)
> > +		fdm_ptr = fdm_active;
> > +	else
> > +		fdm_ptr = &fdm;
> > +
> > +	n += sprintf(buf,
> > +			"CPU : [%#016llx-%#016llx] %#llx bytes, "
> > +			"Dumped: %#llx\n",
> > +			fdm_ptr->cpu_state_data.destination_address,
> > +			fdm_ptr->cpu_state_data.destination_address +
> > +			fdm_ptr->cpu_state_data.source_len - 1,
> > +			fdm_ptr->cpu_state_data.source_len,
> > +			fdm_ptr->cpu_state_data.bytes_dumped);
> > +	n += sprintf(buf + n,
> > +			"HPTE: [%#016llx-%#016llx] %#llx bytes, "
> > +			"Dumped: %#llx\n",
> > +			fdm_ptr->hpte_region.destination_address,
> > +			fdm_ptr->hpte_region.destination_address +
> > +			fdm_ptr->hpte_region.source_len - 1,
> > +			fdm_ptr->hpte_region.source_len,
> > +			fdm_ptr->hpte_region.bytes_dumped);
> > +	n += sprintf(buf + n,
> > +			"DUMP: [%#016llx-%#016llx] %#llx bytes, "
> > +			"Dumped: %#llx\n",
> > +			fdm_ptr->rmr_region.destination_address,
> > +			fdm_ptr->rmr_region.destination_address +
> > +			fdm_ptr->rmr_region.source_len - 1,
> > +			fdm_ptr->rmr_region.source_len,
> > +			fdm_ptr->rmr_region.bytes_dumped);
> > +
> > +	if (!fdm_active ||
> > +		(fw_dump.reserve_dump_area_start ==
> > +		fdm_ptr->cpu_state_data.destination_address))
> > +		return n;
> > +
> > +	/* Dump is active. Show reserved memory region. */
> > +	n += sprintf(buf + n,
> > +			"    : [%#016llx-%#016llx] %#llx bytes, "
> > +			"Dumped: %#llx\n",
> > +			(unsigned long long)fw_dump.reserve_dump_area_start,
> > +			fdm_ptr->cpu_state_data.destination_address - 1,
> > +			fdm_ptr->cpu_state_data.destination_address -
> > +			fw_dump.reserve_dump_area_start,
> > +			fdm_ptr->cpu_state_data.destination_address -
> > +			fw_dump.reserve_dump_area_start);
> > +	return n;
> > +}
> > +
> > +static struct kobj_attribute fadump_attr = __ATTR(fadump_enabled,
> > +						0444, fadump_enabled_show,
> > +						NULL);
> > +static struct kobj_attribute fadump_region_attr = __ATTR(fadump_region,
> > +						0444, fadump_region_show, NULL);
> > +
> > +static int fadump_init_sysfs(void)
> > +{
> > +	int rc = 0;
> > +
> > +	rc = sysfs_create_file(kernel_kobj, &fadump_attr.attr);
> > +	if (rc)
> > +		printk(KERN_ERR "fadump: unable to create sysfs file"
> > +			" (%d)\n", rc);
> > +
> > +	rc = sysfs_create_file(kernel_kobj, &fadump_region_attr.attr);
> > +	if (rc)
> > +		printk(KERN_ERR "fadump: unable to create sysfs file"
> > +			" (%d)\n", rc);
> > +	return rc;
> > +}
> > +subsys_initcall(fadump_init_sysfs);
> 
> Do we need to dump this all out via sysfs? Will tools depend on this,
> or is it just for debug? It might be better to place in debugfs.

The 'fadump_enabled' sysfs attribute will be used by tool (e.g. kdump
init script) to find out whether fadump is enabled or not and act accordingly.
I will place 'fadump_region' under debugfs as it only shows the fadump memory
reservation map information.

Thanks,
-Mahesh.

-- 
Mahesh J Salgaonkar


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 03/10] fadump: Register for firmware assisted dump.
@ 2011-09-07  7:20       ` Mahesh J Salgaonkar
  0 siblings, 0 replies; 34+ messages in thread
From: Mahesh J Salgaonkar @ 2011-09-07  7:20 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Linux Kernel, Milton Miller, Michael Ellerman, Eric W. Biederman,
	linuxppc-dev

Hi Anton,

On 2011-08-31 14:20:49 Wed, Anton Blanchard wrote:
> 
> Hi,
> 
> > +static void fadump_show_config(void)
> > +{
> > +	DBG("Support for firmware-assisted dump (fadump): %s\n",
> > +			(fw_dump.fadump_supported ? "present" : "no support"));
> > +
> > +	if (!fw_dump.fadump_supported)
> > +		return;
> > +
> > +	DBG("Fadump enabled    : %s\n",
> > +				(fw_dump.fadump_enabled ? "yes" : "no"));
> > +	DBG("Dump Active       : %s\n", (fw_dump.dump_active ? "yes" : "no"));
> > +	DBG("Dump section sizes:\n");
> > +	DBG("	CPU state data size: %lx\n", fw_dump.cpu_state_data_size);
> > +	DBG("	HPTE region size   : %lx\n", fw_dump.hpte_region_size);
> > +	DBG("Boot memory size  : %lx\n", fw_dump.boot_memory_size);
> > +	DBG("Reserve area start: %lx\n", fw_dump.reserve_dump_area_start);
> > +	DBG("Reserve area size : %lx\n", fw_dump.reserve_dump_area_size);
> > +}
> > +
> > +static void show_fadump_mem_struct(const struct fadump_mem_struct *fdm)
> > +{
> > +	if (!fdm)
> > +		return;
> > +
> > +	DBG("--------Firmware-assisted dump memory structure---------\n");
> > +	DBG("header.dump_format_version        : 0x%08x\n",
> > +					fdm->header.dump_format_version);
> > +	DBG("header.dump_num_sections          : %d\n",
> > +					fdm->header.dump_num_sections);
> > +	DBG("header.dump_status_flag           : 0x%04x\n",
> > +					fdm->header.dump_status_flag);
> > +	DBG("header.offset_first_dump_section  : 0x%x\n",
> > +					fdm->header.offset_first_dump_section);
> > +
> > +	DBG("header.dd_block_size              : %d\n",
> > +					fdm->header.dd_block_size);
> > +	DBG("header.dd_block_offset            : 0x%Lx\n",
> > +					fdm->header.dd_block_offset);
> > +	DBG("header.dd_num_blocks              : %Lx\n",
> > +					fdm->header.dd_num_blocks);
> > +	DBG("header.dd_offset_disk_path        : 0x%x\n",
> > +					fdm->header.dd_offset_disk_path);
> > +
> > +	DBG("header.max_time_auto              : %d\n",
> > +					fdm->header.max_time_auto);
> > +
> > +	/* Kernel dump sections */
> > +	DBG("cpu_state_data.request_flag       : 0x%08x\n",
> > +					fdm->cpu_state_data.request_flag);
> > +	DBG("cpu_state_data.source_data_type   : 0x%04x\n",
> > +					fdm->cpu_state_data.source_data_type);
> > +	DBG("cpu_state_data.error_flags        : 0x%04x\n",
> > +					fdm->cpu_state_data.error_flags);
> > +	DBG("cpu_state_data.source_address     : 0x%016Lx\n",
> > +					fdm->cpu_state_data.source_address);
> > +	DBG("cpu_state_data.source_len         : 0x%Lx\n",
> > +					fdm->cpu_state_data.source_len);
> > +	DBG("cpu_state_data.bytes_dumped       : 0x%Lx\n",
> > +					fdm->cpu_state_data.bytes_dumped);
> > +	DBG("cpu_state_data.destination_address: 0x%016Lx\n",
> > +				fdm->cpu_state_data.destination_address);
> > +
> > +	DBG("hpte_region.request_flag          : 0x%08x\n",
> > +					fdm->hpte_region.request_flag);
> > +	DBG("hpte_region.source_data_type      : 0x%04x\n",
> > +					fdm->hpte_region.source_data_type);
> > +	DBG("hpte_region.error_flags           : 0x%04x\n",
> > +					fdm->hpte_region.error_flags);
> > +	DBG("hpte_region.source_address        : 0x%016Lx\n",
> > +					fdm->hpte_region.source_address);
> > +	DBG("hpte_region.source_len            : 0x%Lx\n",
> > +					fdm->hpte_region.source_len);
> > +	DBG("hpte_region.bytes_dumped          : 0x%Lx\n",
> > +					fdm->hpte_region.bytes_dumped);
> > +	DBG("hpte_region.destination_address   : 0x%016Lx\n",
> > +				fdm->hpte_region.destination_address);
> > +
> > +	DBG("rmr_region.request_flag           : 0x%08x\n",
> > +					fdm->rmr_region.request_flag);
> > +	DBG("rmr_region.source_data_type       : 0x%04x\n",
> > +					fdm->rmr_region.source_data_type);
> > +	DBG("rmr_region.error_flags            : 0x%04x\n",
> > +					fdm->rmr_region.error_flags);
> > +	DBG("rmr_region.source_address         : 0x%016Lx\n",
> > +					fdm->rmr_region.source_address);
> > +	DBG("rmr_region.source_len             : 0x%Lx\n",
> > +					fdm->rmr_region.source_len);
> > +	DBG("rmr_region.bytes_dumped           : 0x%Lx\n",
> > +					fdm->rmr_region.bytes_dumped);
> > +	DBG("rmr_region.destination_address    : 0x%016Lx\n",
> > +				fdm->rmr_region.destination_address);
> > +
> > +	DBG("--------Firmware-assisted dump memory structure---------\n");
> > +}
> > +
> 
> That's an awful lot of debug information. I don't think we need to carry
> this around in the kernel once the feature is working.

Sure, will remove them.

> 
> > +static ssize_t fadump_enabled_show(struct kobject *kobj,
> > +					struct kobj_attribute *attr,
> > +					char *buf)
> > +{
> > +	return sprintf(buf, "%d\n", fw_dump.fadump_enabled);
> > +}
> > +
> 
> > +static ssize_t fadump_region_show(struct kobject *kobj,
> > +					struct kobj_attribute *attr,
> > +					char *buf)
> > +{
> > +	const struct fadump_mem_struct *fdm_ptr;
> > +	ssize_t n = 0;
> > +
> > +	if (!fw_dump.fadump_enabled)
> > +		return n;
> > +
> > +	if (fdm_active)
> > +		fdm_ptr = fdm_active;
> > +	else
> > +		fdm_ptr = &fdm;
> > +
> > +	n += sprintf(buf,
> > +			"CPU : [%#016llx-%#016llx] %#llx bytes, "
> > +			"Dumped: %#llx\n",
> > +			fdm_ptr->cpu_state_data.destination_address,
> > +			fdm_ptr->cpu_state_data.destination_address +
> > +			fdm_ptr->cpu_state_data.source_len - 1,
> > +			fdm_ptr->cpu_state_data.source_len,
> > +			fdm_ptr->cpu_state_data.bytes_dumped);
> > +	n += sprintf(buf + n,
> > +			"HPTE: [%#016llx-%#016llx] %#llx bytes, "
> > +			"Dumped: %#llx\n",
> > +			fdm_ptr->hpte_region.destination_address,
> > +			fdm_ptr->hpte_region.destination_address +
> > +			fdm_ptr->hpte_region.source_len - 1,
> > +			fdm_ptr->hpte_region.source_len,
> > +			fdm_ptr->hpte_region.bytes_dumped);
> > +	n += sprintf(buf + n,
> > +			"DUMP: [%#016llx-%#016llx] %#llx bytes, "
> > +			"Dumped: %#llx\n",
> > +			fdm_ptr->rmr_region.destination_address,
> > +			fdm_ptr->rmr_region.destination_address +
> > +			fdm_ptr->rmr_region.source_len - 1,
> > +			fdm_ptr->rmr_region.source_len,
> > +			fdm_ptr->rmr_region.bytes_dumped);
> > +
> > +	if (!fdm_active ||
> > +		(fw_dump.reserve_dump_area_start ==
> > +		fdm_ptr->cpu_state_data.destination_address))
> > +		return n;
> > +
> > +	/* Dump is active. Show reserved memory region. */
> > +	n += sprintf(buf + n,
> > +			"    : [%#016llx-%#016llx] %#llx bytes, "
> > +			"Dumped: %#llx\n",
> > +			(unsigned long long)fw_dump.reserve_dump_area_start,
> > +			fdm_ptr->cpu_state_data.destination_address - 1,
> > +			fdm_ptr->cpu_state_data.destination_address -
> > +			fw_dump.reserve_dump_area_start,
> > +			fdm_ptr->cpu_state_data.destination_address -
> > +			fw_dump.reserve_dump_area_start);
> > +	return n;
> > +}
> > +
> > +static struct kobj_attribute fadump_attr = __ATTR(fadump_enabled,
> > +						0444, fadump_enabled_show,
> > +						NULL);
> > +static struct kobj_attribute fadump_region_attr = __ATTR(fadump_region,
> > +						0444, fadump_region_show, NULL);
> > +
> > +static int fadump_init_sysfs(void)
> > +{
> > +	int rc = 0;
> > +
> > +	rc = sysfs_create_file(kernel_kobj, &fadump_attr.attr);
> > +	if (rc)
> > +		printk(KERN_ERR "fadump: unable to create sysfs file"
> > +			" (%d)\n", rc);
> > +
> > +	rc = sysfs_create_file(kernel_kobj, &fadump_region_attr.attr);
> > +	if (rc)
> > +		printk(KERN_ERR "fadump: unable to create sysfs file"
> > +			" (%d)\n", rc);
> > +	return rc;
> > +}
> > +subsys_initcall(fadump_init_sysfs);
> 
> Do we need to dump this all out via sysfs? Will tools depend on this,
> or is it just for debug? It might be better to place in debugfs.

The 'fadump_enabled' sysfs attribute will be used by tool (e.g. kdump
init script) to find out whether fadump is enabled or not and act accordingly.
I will place 'fadump_region' under debugfs as it only shows the fadump memory
reservation map information.

Thanks,
-Mahesh.

-- 
Mahesh J Salgaonkar

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 03/10] fadump: Register for firmware assisted dump.
  2011-07-13 18:07   ` Mahesh J Salgaonkar
@ 2011-09-08 18:34     ` Kumar Gala
  -1 siblings, 0 replies; 34+ messages in thread
From: Kumar Gala @ 2011-09-08 18:34 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: Benjamin Herrenschmidt, linuxppc-dev, Linux Kernel,
	Michael Ellerman, Haren Myneni, Anton Blanchard, Milton Miller,
	Eric W. Biederman

> 
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index a88bf27..3031ea7 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -63,6 +63,7 @@
> #include <asm/kexec.h>
> #include <asm/mmu_context.h>
> #include <asm/code-patching.h>
> +#include <asm/fadump.h>
> 
> #include "setup.h"
> 
> @@ -371,6 +372,13 @@ void __init setup_system(void)
> 	rtas_initialize();
> #endif /* CONFIG_PPC_RTAS */
> 
> +#ifdef CONFIG_FA_DUMP
> +	/*
> +	 * Setup Firmware-assisted dump.
> +	 */
> +	setup_fadump();

Is there a reason this has to be done here?  Can it be an initcall or called from platform init code?

> +#endif
> +

- k




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 03/10] fadump: Register for firmware assisted dump.
@ 2011-09-08 18:34     ` Kumar Gala
  0 siblings, 0 replies; 34+ messages in thread
From: Kumar Gala @ 2011-09-08 18:34 UTC (permalink / raw)
  To: Mahesh J Salgaonkar
  Cc: Linux Kernel, Milton Miller, Michael Ellerman, Anton Blanchard,
	linuxppc-dev, Eric W. Biederman

>=20
> diff --git a/arch/powerpc/kernel/setup_64.c =
b/arch/powerpc/kernel/setup_64.c
> index a88bf27..3031ea7 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -63,6 +63,7 @@
> #include <asm/kexec.h>
> #include <asm/mmu_context.h>
> #include <asm/code-patching.h>
> +#include <asm/fadump.h>
>=20
> #include "setup.h"
>=20
> @@ -371,6 +372,13 @@ void __init setup_system(void)
> 	rtas_initialize();
> #endif /* CONFIG_PPC_RTAS */
>=20
> +#ifdef CONFIG_FA_DUMP
> +	/*
> +	 * Setup Firmware-assisted dump.
> +	 */
> +	setup_fadump();

Is there a reason this has to be done here?  Can it be an initcall or =
called from platform init code?

> +#endif
> +

- k

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2011-09-08 23:00 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-13 18:06 [RFC PATCH 00/10] fadump: Firmware-assisted dump support for Powerpc Mahesh J Salgaonkar
2011-07-13 18:06 ` Mahesh J Salgaonkar
2011-07-13 18:06 ` [RFC PATCH 01/10] fadump: Add documentation for firmware-assisted dump Mahesh J Salgaonkar
2011-07-13 18:06   ` Mahesh J Salgaonkar
2011-07-13 18:06 ` [RFC PATCH 02/10] fadump: Reserve the memory for firmware assisted dump Mahesh J Salgaonkar
2011-07-13 18:06   ` Mahesh J Salgaonkar
2011-08-31  4:11   ` Anton Blanchard
2011-08-31  4:11     ` Anton Blanchard
2011-09-06 11:59     ` Mahesh Jagannath Salgaonkar
2011-09-06 11:59       ` Mahesh Jagannath Salgaonkar
2011-07-13 18:07 ` [RFC PATCH 03/10] fadump: Register " Mahesh J Salgaonkar
2011-07-13 18:07   ` Mahesh J Salgaonkar
2011-08-31  4:20   ` Anton Blanchard
2011-08-31  4:20     ` Anton Blanchard
2011-09-07  7:20     ` Mahesh J Salgaonkar
2011-09-07  7:20       ` Mahesh J Salgaonkar
2011-09-08 18:34   ` Kumar Gala
2011-09-08 18:34     ` Kumar Gala
2011-07-13 18:07 ` [RFC PATCH 04/10] fadump: Initialize elfcore header and add PT_LOAD program headers Mahesh J Salgaonkar
2011-07-13 18:07   ` Mahesh J Salgaonkar
2011-07-13 18:07 ` [RFC PATCH 05/10] fadump: Convert firmware-assisted cpu state dump data into elf notes Mahesh J Salgaonkar
2011-07-13 18:07   ` Mahesh J Salgaonkar
2011-08-31  4:23   ` Anton Blanchard
2011-08-31  4:23     ` Anton Blanchard
2011-07-13 18:07 ` [RFC PATCH 06/10] fadump: Add PT_NOTE program header for vmcoreinfo Mahesh J Salgaonkar
2011-07-13 18:07   ` Mahesh J Salgaonkar
2011-07-13 18:08 ` [RFC PATCH 07/10] fadump: Introduce cleanup routine to invalidate /proc/vmcore Mahesh J Salgaonkar
2011-07-13 18:08   ` Mahesh J Salgaonkar
2011-07-13 18:08 ` [RFC PATCH 08/10] fadump: Invalidate registration and release reserved memory for general use Mahesh J Salgaonkar
2011-07-13 18:08   ` Mahesh J Salgaonkar
2011-07-13 18:08 ` [RFC PATCH 09/10] fadump: Invalidate the fadump registration during machine shutdown Mahesh J Salgaonkar
2011-07-13 18:08   ` Mahesh J Salgaonkar
2011-07-13 18:08 ` [RFC PATCH 10/10] fadump: Introduce config option for firmware assisted dump feature Mahesh J Salgaonkar
2011-07-13 18:08   ` Mahesh J Salgaonkar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.