From: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
To: Avi Kivity <avi@redhat.com>
Cc: mtosatti@redhat.com, ebiederm@xmission.com, luto@mit.edu,
	Joerg Roedel <joerg.roedel@amd.com>,
	dzickus@redhat.com, paul.gortmaker@windriver.com,
	ludwig.nussel@suse.de, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, kexec@lists.infradead.org,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [PATCH v5 0/3] Export offsets of VMCS fields as note information for kdump
Date: Mon, 30 Jul 2012 10:53:43 +0800
Message-ID: <5015F737.7090508@cn.fujitsu.com>
In-Reply-To: <4FFE9EDE.8080107@cn.fujitsu.com>

Hello Avi,

Do you have any comments about this version of the patch set?

On 2012-07-12 17:54, Zhang Yanfei wrote:
> This patch set exports offsets of VMCS fields as note information for
> kdump. We call it VMCSINFO. The purpose of VMCSINFO is to retrieve the
> runtime state of a guest machine image, such as registers, from the host
> machine's crash dump, where it is stored in VMCS format. The problem is
> that the internal layout of the VMCS is not disclosed in Intel's
> specification. So, we solve this problem by reverse engineering, which
> is implemented in this patch set. The VMCSINFO is exported via sysfs
> (/sys/devices/system/cpu/vmcs/) to kexec-tools.
> 
> Here are two use cases for the two features that we want.
> 
> 1) Create guest machine's crash dumpfile from host machine's crash dumpfile
> 
> In general, we want to use this feature for failure analysis on systems
> where the processing depends on communication between host and guest
> machines, so that we can look into the system from both machines'
> viewpoints.
> 
> As a concrete situation, consider a heartbeat monitoring feature on
> the guest machine's side, where we need to determine on which machine's
> side the cause of a heartbeat stop lies. In our actual experiments, we
> encountered such a situation and found that the cause of the bug was in
> the host's process scheduler: the guest machine's vcpu was stopped for
> a long time, which led to the heartbeat stop.
> 
> The module that judges the heartbeat stop is on the guest machine, so
> we need to debug the guest machine's data. But if the cause lies on the
> host machine's side, we need to look into the host machine's crash dump.
> 
> Without this feature, we first create the guest machine's dump and then
> create the host machine's, but there is a time gap between the two
> dumps, and it is unlikely that the buggy situation still remains when
> the second dump is taken.
> 
> So, we think the feature is useful for debugging both the guest
> machine's and the host machine's sides at the same time, and we expect
> it to make failure analysis more efficient.
> 
> Of course, we believe this feature is generally useful in situations
> where the guest machine doesn't work well because of a problem on the
> host machine.
> 
> 2) Get offsets of VMCS information on the CPU running on the host machine
> 
> If kdump doesn't work well, we cannot use the kvm API to get the
> register values of the guest machine, and they are still left in its
> vmcs region. In that case, we use a crash dump mechanism running outside
> of the Linux kernel, such as sadump, a firmware-based crash dump. The
> VMCS information is then necessary.
> 
> TODO:
>   1. In kexec-tools, get VMCSINFO via sysfs and dump it as note information
>      into vmcore.
>   2. Dump VMCS region of each guest vcpu and VMCSINFO into qemu-process
>      core file. To do this, we will modify kernel core dumper, gdb gcore
>      and crash gcore.
>   3. Dump guest image from the qemu-process core file into a vmcore.
> 
> Changelog from v4 to v5:
> 1. The VMCSINFO is stored in a two-dimensional array filled with each
>    field's encoding and corresponding offset. So the size of VMCSINFO
>    is much smaller.
> 2. vmcs sysfs file /sys/devices/system/cpu/vmcs_id is moved to
>    /sys/devices/system/cpu/vmcs/id.
> 3. Rewrite the ABI entry for vmcs interface and remove the KernelVersion
>    line.
> 
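A userspace consumer of the v5 layout could be sketched roughly as follows. This is only an illustration under assumptions: the function name `read_vmcs_sysfs` is hypothetical, each file under the vmcs directory is assumed to be named after a field encoding in hex and to contain that field's offset as hexadecimal text, and `id` is assumed to hold the revision identifier in the same form. The demo runs against a temporary directory filled with placeholder values, not a real /sys tree.

```python
import os
import tempfile


def read_vmcs_sysfs(base_dir):
    """Return (revision_id, {field_encoding: offset}) from a vmcs-style directory."""
    revision = None
    offsets = {}
    for name in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, name)
        if not os.path.isfile(path):
            continue  # skip any subdirectories
        with open(path) as f:
            text = f.read().strip()
        if name == "id":
            revision = int(text, 16)  # assumption: hex text
            continue
        try:
            encoding = int(name, 16)
        except ValueError:
            continue  # not named after a field encoding
        offsets[encoding] = int(text, 16)  # assumption: hex text
    return revision, offsets


# Demo with placeholder values (not measured on real hardware).
demo_dir = tempfile.mkdtemp()
for name, text in {"id": "e", "0800": "540", "4000": "550"}.items():
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write(text + "\n")

revision, offsets = read_vmcs_sysfs(demo_dir)
print(hex(revision), {hex(k): hex(v) for k, v in offsets.items()})
```

On a host with the module loaded, the same function would presumably be pointed at /sys/devices/system/cpu/vmcs instead of the temporary directory.
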
> Changelog from v3 to v4:
> 1. All the variables and functions are moved to vmcsinfo-intel module.
> 2. Add a new sysfs interface /sys/devices/system/cpu/vmcs_id to export
>    the vmcs revision identifier. The original sysfs interface is moved
>    from /sys/devices/cpu/vmcs to /sys/devices/system/cpu/vmcs. Thanks to
>    Greg KH for his helpful comments about sysfs.
> 
> Changelog from v2 to v3:
> 1. New VMCSINFO format.
>    Now the VMCSINFO is mainly made up of an array that contains all vmcs
>    fields' offsets. The offsets aren't encoded because we decode them in
>    the module itself. If some field doesn't exist or its offset cannot be
>    decoded correctly, the offset in the array is just set to zero.
> 2. New sysfs interface and Documentation/ABI entry. 
>    We expose the actual fields in /sys/devices/cpu/vmcs instead of just
>    exporting the address of VMCSINFO in /sys/kernel/vmcsinfo.
>    For example, /sys/devices/cpu/vmcs/0800 contains the offset of
>    GUEST_DS_SELECTOR. 0800 is the encoding of GUEST_DS_SELECTOR.
>    Accordingly, ABI entry in Documentation is changed from sysfs-kernel-vmcsinfo
>    to sysfs-devices-cpu-vmcs.
> 
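Once the per-field offsets are known, pulling a guest value out of a raw VMCS region in a dump might look like the sketch below. The names `field_width` and `read_vmcs_field` are hypothetical, the region is assumed to be little-endian, natural-width fields are assumed to be 8 bytes (a 64-bit host), and the offset used in the demo is a placeholder. The width is taken from bits 14:13 of the field encoding, as defined in the Intel SDM; an offset of zero is treated as "field not available", matching the v3 convention described above.

```python
# Bits 14:13 of a VMCS field encoding select the width:
# 0 = 16-bit, 1 = 64-bit, 2 = 32-bit, 3 = natural width (assumed 8 bytes here).
WIDTH_BYTES = {0: 2, 1: 8, 2: 4, 3: 8}

GUEST_DS_SELECTOR = 0x0800  # encoding quoted in the changelog


def field_width(encoding):
    """Width in bytes of the field with this encoding."""
    return WIDTH_BYTES[(encoding >> 13) & 0x3]


def read_vmcs_field(region, offsets, encoding):
    """Read one field from a raw VMCS region, or None if its offset is unknown."""
    offset = offsets.get(encoding, 0)
    if offset == 0:
        return None  # zero means the field is absent or could not be decoded
    width = field_width(encoding)
    return int.from_bytes(region[offset:offset + width], "little")


# Demo: a fake 4 KiB region with a selector value planted at a placeholder offset.
region = bytearray(4096)
demo_offsets = {GUEST_DS_SELECTOR: 0x540}
region[0x540:0x542] = (0x2B).to_bytes(2, "little")

value = read_vmcs_field(region, demo_offsets, GUEST_DS_SELECTOR)
missing = read_vmcs_field(region, demo_offsets, 0x4000)
print(hex(value), missing)
```
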
> Changelog from v1 to v2:
> 1. The VMCSINFO now has a simple binary <field><encoded offset> format,
>    as below:
>      +-------------+--------------------------+
>      | Byte offset | Contents                 |
>      +-------------+--------------------------+
>      | 0           | VMCS revision identifier |
>      +-------------+--------------------------+
>      | 4           | <field><encoded offset>  |
>      +-------------+--------------------------+
>      | 16          | <field><encoded offset>  |
>      +-------------+--------------------------+
>      ......
>   
>    The first 32 bits of VMCSINFO contain the VMCS revision identifier.
>    The remainder of VMCSINFO is used for <field><encoded offset> sets.
>    Each set takes 12 bytes: the field occupies 4 bytes and its
>    corresponding encoded offset occupies 8 bytes.
> 
>    Encoded offsets are raw values read by vmcs_read{16, 64, 32, l}, and
>    they are all zero-extended to 8 bytes so that each
>    <field><encoded offset> set has the same size.
>    We do not decode offsets here. The decoding work is deferred to
>    userspace tools for more flexible handling.
>    
>    And here are two examples of the new VMCSINFO:
>    Processor: Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
>    VMCSINFO contains:
>      <0000000d>                   --> VMCS revision id = 0xd
>      <00004000><0000000001840180> --> OFFSET(PIN_BASED_VM_EXEC_CONTROL) = 0x01840180
>      <00004002><0000000001940190> --> OFFSET(CPU_BASED_VM_EXEC_CONTROL) = 0x01940190
>      <0000401e><000000000fe40fe0> --> OFFSET(SECONDARY_VM_EXEC_CONTROL) = 0x0fe40fe0
>      <0000400c><0000000001e401e0> --> OFFSET(VM_EXIT_CONTROLS) = 0x01e401e0
>      ......
> 
>    Processor: Intel(R) Xeon(R) CPU           E7540  @ 2.00GHz (24 cores)
>    VMCSINFO contains:
>      <0000000e>                   --> VMCS revision id = 0xe 
>      <00004000><0000000005540550> --> OFFSET(PIN_BASED_VM_EXEC_CONTROL) = 0x05540550
>      <00004002><0000000005440540> --> OFFSET(CPU_BASED_VM_EXEC_CONTROL) = 0x05440540
>      <0000401e><00000000054c0548> --> OFFSET(SECONDARY_VM_EXEC_CONTROL) = 0x054c0548
>      <0000400c><00000000057c0578> --> OFFSET(VM_EXIT_CONTROLS) = 0x057c0578
>      ......
> 
> 2. Add a new kernel module *vmcsinfo-intel* for filling VMCSINFO instead
>    of putting it in module kvm-intel. The new module is auto-loaded
>    when the vmx cpufeature is detected and it depends on module kvm-intel.
>    *Loading and unloading this module will have no side effect on the
>    running guests.*
> 3. The sysfs file vmcsinfo is split into 2 files:
>    /sys/kernel/vmcsinfo: shows physical address of VMCSINFO note information.
>    /sys/kernel/vmcsinfo_maxsize: shows max size of VMCSINFO.
> 4. A new Documentation/ABI entry is added for vmcsinfo and vmcsinfo_maxsize.
> 5. Do not update the VMCSINFO note when the kernel has panicked.
> 
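The v2 binary layout described in item 1 above (a 4-byte revision identifier followed by 12-byte <field><encoded offset> sets) can be walked with a few lines of userspace code. This is a minimal sketch, not the format's reference parser: the name `parse_vmcsinfo_v2` is hypothetical, little-endian byte order is an assumption, and the encoded offsets are returned raw, without decoding. The sample blob is built from the Core 2 Duo values listed in the changelog.

```python
import struct


def parse_vmcsinfo_v2(blob):
    """Split a v2 VMCSINFO blob into (revision_id, [(field, encoded_offset), ...])."""
    (revision,) = struct.unpack_from("<I", blob, 0)  # assumption: little-endian
    entries = []
    pos = 4
    while pos + 12 <= len(blob):
        # 4-byte field encoding followed by an 8-byte raw (encoded) offset
        field, encoded = struct.unpack_from("<IQ", blob, pos)
        entries.append((field, encoded))
        pos += 12
    return revision, entries


# Sample built from the Core 2 Duo example in the changelog.
sample_sets = [
    (0x4000, 0x01840180),  # PIN_BASED_VM_EXEC_CONTROL
    (0x4002, 0x01940190),  # CPU_BASED_VM_EXEC_CONTROL
    (0x401E, 0x0FE40FE0),  # SECONDARY_VM_EXEC_CONTROL
    (0x400C, 0x01E401E0),  # VM_EXIT_CONTROLS
]
sample = struct.pack("<I", 0xD) + b"".join(
    struct.pack("<IQ", field, encoded) for field, encoded in sample_sets
)

revision_v2, entries_v2 = parse_vmcsinfo_v2(sample)
for field, encoded in entries_v2:
    print(f"<{field:08x}><{encoded:016x}>")
```
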
> zhangyanfei (3):
>   KVM: Export symbols for module vmcsinfo-intel
>   KVM-INTEL: Add new module vmcsinfo-intel to fill VMCSINFO
>   Documentation: Add ABI entry for vmcs sysfs interface.
> 
>  Documentation/ABI/testing/sysfs-devices-system-cpu |   20 +
>  arch/x86/include/asm/vmx.h                         |   73 ++
>  arch/x86/kvm/Kconfig                               |   11 +
>  arch/x86/kvm/Makefile                              |    3 +
>  arch/x86/kvm/vmcsinfo.c                            |  714 ++++++++++++++++++++
>  arch/x86/kvm/vmx.c                                 |   81 +--
>  include/linux/kvm_host.h                           |    3 +
>  virt/kvm/kvm_main.c                                |    8 +-
>  8 files changed, 841 insertions(+), 72 deletions(-)
>  create mode 100644 arch/x86/kvm/vmcsinfo.c



