All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/7] KASLR kernel dump support
@ 2017-06-29 13:23 Marc-André Lureau
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack Marc-André Lureau
                   ` (6 more replies)
  0 siblings, 7 replies; 31+ messages in thread
From: Marc-André Lureau @ 2017-06-29 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, anderson, berrange, ehabkost, lersek, Marc-André Lureau

Recent linux kernels enable KASLR to randomize phys/virt memory
addresses. This series aims to provide enough information in qemu
dumps so that crash utility can work with randomized kernel too (it
hasn't been tested on other archs than x86 though, help welcome).

The vmcoreinfo device is an emulated ACPI device that exposes a 4k
memory range to the guest to store various informations useful to
debug the guest OS. (it is greatly inspired by the VMGENID device
implementation). The version field with value 0 is meant to give
paddr/size of the VMCOREINFO ELF PT_NOTE, other values can be used for
different purposes or OSes. (note: some wanted to see pvpanic somehow
merged with this device, I have no clear idea how to do that, nor do I
think this is a good idea since the devices are quite different, used
at different time for different purposes. And it can be done as a
future iteration if it is appropriate, feel free to send patches)

Crash 7.1.9 will parse the "phys_base" value from the VMCOREINFO note,
and thus will work with KASLR-dump produced by this series.

By priority, VMCOREINFO "phys_base" value is the most accurate. If not
available, qemu will keep the current guessed value.

The series implements the VMCOREINFO note addition in qemu ELF/kdump,
as well as the python scripts/dump-guest-memory.py.

To test:

Compile and run a guest kernel with CONFIG_RANDOMIZE_BASE=y.

Run qemu with -device vmcoreinfo.

Load the experimental vmcoreinfo module in guest
https://github.com/elmarco/vmgenid-test/blob/master/qemuvmci-test.c.

Produce an ELF dump:
{ "execute": "dump-guest-memory", "arguments": { "protocol": "file:dump", "paging": false } }

Produce a kdump:
{ "execute": "dump-guest-memory", "arguments": { "protocol": "file:dump", "paging": false, "format": "kdump-zlib" } }

Or with (gdb) dump-guest-memory, with scripts/dump-guest-memory.py script.

Analyze with crash >= 7.1.9
$ crash vmlinux dump

Marc-André Lureau (7):
  vmgenid: replace x-write-pointer-available hack
  acpi: add vmcoreinfo device
  tests: add simple vmcoreinfo test
  dump: add vmcoreinfo ELF note
  kdump: add vmcoreinfo ELF note
  scripts/dump-guest-memory.py: add vmcoreinfo
  MAINTAINERS: add Dump maintainers

 scripts/dump-guest-memory.py         |  32 ++++++
 include/hw/acpi/aml-build.h          |   1 +
 include/hw/acpi/bios-linker-loader.h |   2 +
 include/hw/acpi/vmcoreinfo.h         |  37 ++++++
 include/hw/compat.h                  |   4 -
 include/sysemu/dump.h                |   2 +
 dump.c                               | 165 ++++++++++++++++++++++++++-
 hw/acpi/aml-build.c                  |   2 +
 hw/acpi/bios-linker-loader.c         |   6 +
 hw/acpi/vmcoreinfo.c                 | 214 +++++++++++++++++++++++++++++++++++
 hw/acpi/vmgenid.c                    |   9 +-
 hw/i386/acpi-build.c                 |  14 +++
 tests/vmcoreinfo-test.c              | 130 +++++++++++++++++++++
 MAINTAINERS                          |   7 ++
 default-configs/arm-softmmu.mak      |   1 +
 default-configs/i386-softmmu.mak     |   1 +
 default-configs/x86_64-softmmu.mak   |   1 +
 docs/specs/vmcoreinfo.txt            | 138 ++++++++++++++++++++++
 hw/acpi/Makefile.objs                |   1 +
 tests/Makefile.include               |   2 +
 20 files changed, 753 insertions(+), 16 deletions(-)
 create mode 100644 include/hw/acpi/vmcoreinfo.h
 create mode 100644 hw/acpi/vmcoreinfo.c
 create mode 100644 tests/vmcoreinfo-test.c
 create mode 100644 docs/specs/vmcoreinfo.txt

-- 
2.13.1.395.gf7b71de06

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-06-29 13:23 [Qemu-devel] [PATCH 0/7] KASLR kernel dump support Marc-André Lureau
@ 2017-06-29 13:23 ` Marc-André Lureau
  2017-06-29 14:11   ` Michael S. Tsirkin
                     ` (5 more replies)
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 2/7] acpi: add vmcoreinfo device Marc-André Lureau
                   ` (5 subsequent siblings)
  6 siblings, 6 replies; 31+ messages in thread
From: Marc-André Lureau @ 2017-06-29 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, anderson, berrange, ehabkost, lersek,
	Marc-André Lureau, Michael S. Tsirkin, Ben Warren

This compat property sole function is to prevent the device from being
instantiated. Instead of requiring an extra compat property, check if
fw_cfg has DMA enabled.

This has the additional benefit of handling other cases properly, like:

  $ qemu-system-x86_64 -device vmgenid -machine none
  qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
  $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
  qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
  $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
  [boots normally]

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 include/hw/acpi/bios-linker-loader.h | 2 ++
 include/hw/compat.h                  | 4 ----
 hw/acpi/bios-linker-loader.c         | 6 ++++++
 hw/acpi/vmgenid.c                    | 9 +--------
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
index efe17b0b9c..a711dbced8 100644
--- a/include/hw/acpi/bios-linker-loader.h
+++ b/include/hw/acpi/bios-linker-loader.h
@@ -7,6 +7,8 @@ typedef struct BIOSLinker {
     GArray *file_list;
 } BIOSLinker;
 
+bool bios_linker_loader_can_write_pointer(void);
+
 BIOSLinker *bios_linker_loader_init(void);
 
 void bios_linker_loader_alloc(BIOSLinker *linker,
diff --git a/include/hw/compat.h b/include/hw/compat.h
index 26cd5851a5..36f02179ac 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -150,10 +150,6 @@
         .driver   = "fw_cfg_io",\
         .property = "dma_enabled",\
         .value    = "off",\
-    },{\
-        .driver   = "vmgenid",\
-        .property = "x-write-pointer-available",\
-        .value    = "off",\
     },
 
 #define HW_COMPAT_2_3 \
diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
index 046183a0f1..587d62cb93 100644
--- a/hw/acpi/bios-linker-loader.c
+++ b/hw/acpi/bios-linker-loader.c
@@ -168,6 +168,12 @@ bios_linker_find_file(const BIOSLinker *linker, const char *name)
     return NULL;
 }
 
+bool bios_linker_loader_can_write_pointer(void)
+{
+    FWCfgState *fw_cfg = fw_cfg_find();
+    return fw_cfg && fw_cfg_dma_enabled(fw_cfg);
+}
+
 /*
  * bios_linker_loader_alloc: ask guest to load file into guest memory.
  *
diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
index a32b847fe0..ab5da293fd 100644
--- a/hw/acpi/vmgenid.c
+++ b/hw/acpi/vmgenid.c
@@ -205,17 +205,11 @@ static void vmgenid_handle_reset(void *opaque)
     memset(vms->vmgenid_addr_le, 0, ARRAY_SIZE(vms->vmgenid_addr_le));
 }
 
-static Property vmgenid_properties[] = {
-    DEFINE_PROP_BOOL("x-write-pointer-available", VmGenIdState,
-                     write_pointer_available, true),
-    DEFINE_PROP_END_OF_LIST(),
-};
-
 static void vmgenid_realize(DeviceState *dev, Error **errp)
 {
     VmGenIdState *vms = VMGENID(dev);
 
-    if (!vms->write_pointer_available) {
+    if (!bios_linker_loader_can_write_pointer()) {
         error_setg(errp, "%s requires DMA write support in fw_cfg, "
                    "which this machine type does not provide", VMGENID_DEVICE);
         return;
@@ -239,7 +233,6 @@ static void vmgenid_device_class_init(ObjectClass *klass, void *data)
     dc->vmsd = &vmstate_vmgenid;
     dc->realize = vmgenid_realize;
     dc->hotpluggable = false;
-    dc->props = vmgenid_properties;
 
     object_class_property_add_str(klass, VMGENID_GUID, NULL,
                                   vmgenid_set_guid, NULL);
-- 
2.13.1.395.gf7b71de06

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 2/7] acpi: add vmcoreinfo device
  2017-06-29 13:23 [Qemu-devel] [PATCH 0/7] KASLR kernel dump support Marc-André Lureau
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack Marc-André Lureau
@ 2017-06-29 13:23 ` Marc-André Lureau
  2017-07-04 22:07   ` Laszlo Ersek
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 3/7] tests: add simple vmcoreinfo test Marc-André Lureau
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 31+ messages in thread
From: Marc-André Lureau @ 2017-06-29 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, anderson, berrange, ehabkost, lersek,
	Marc-André Lureau, Michael S. Tsirkin, Paolo Bonzini,
	Richard Henderson

The VM coreinfo (vmcoreinfo) device is an emulated device which
exposes a 4k memory range to the guest to store various informations
useful to debug the guest OS. (it is greatly inspired by the VMGENID
device implementation)

This is an early-boot alternative to the qemu-ga VMDUMP_INFO event
proposed in "[PATCH 00/21] WIP: dump: add kaslr support".

A proof-of-concept kernel module:
https://github.com/elmarco/vmgenid-test/blob/master/qemuvmci-test.c

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 include/hw/acpi/aml-build.h        |   1 +
 include/hw/acpi/vmcoreinfo.h       |  36 +++++++
 hw/acpi/aml-build.c                |   2 +
 hw/acpi/vmcoreinfo.c               | 198 +++++++++++++++++++++++++++++++++++++
 hw/i386/acpi-build.c               |  14 +++
 default-configs/arm-softmmu.mak    |   1 +
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 docs/specs/vmcoreinfo.txt          | 138 ++++++++++++++++++++++++++
 hw/acpi/Makefile.objs              |   1 +
 10 files changed, 393 insertions(+)
 create mode 100644 include/hw/acpi/vmcoreinfo.h
 create mode 100644 hw/acpi/vmcoreinfo.c
 create mode 100644 docs/specs/vmcoreinfo.txt

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 88d0738d76..cf781bcd34 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -211,6 +211,7 @@ struct AcpiBuildTables {
     GArray *rsdp;
     GArray *tcpalog;
     GArray *vmgenid;
+    GArray *vmcoreinfo;
     BIOSLinker *linker;
 } AcpiBuildTables;
 
diff --git a/include/hw/acpi/vmcoreinfo.h b/include/hw/acpi/vmcoreinfo.h
new file mode 100644
index 0000000000..40fe99c3ed
--- /dev/null
+++ b/include/hw/acpi/vmcoreinfo.h
@@ -0,0 +1,36 @@
+#ifndef ACPI_VMCOREINFO_H
+#define ACPI_VMCOREINFO_H
+
+#include "hw/acpi/bios-linker-loader.h"
+#include "hw/qdev.h"
+
+#define VMCOREINFO_DEVICE           "vmcoreinfo"
+#define VMCOREINFO_FW_CFG_FILE      "etc/vmcoreinfo"
+#define VMCOREINFO_ADDR_FW_CFG_FILE "etc/vmcoreinfo-addr"
+
+#define VMCOREINFO_FW_CFG_SIZE      4096 /* Occupy a page of memory */
+#define VMCOREINFO_OFFSET           40   /* allow space for
+                                          * OVMF SDT Header Probe Supressor
+                                          */
+
+#define VMCOREINFO(obj) OBJECT_CHECK(VmcoreinfoState, (obj), VMCOREINFO_DEVICE)
+
+typedef struct VmcoreinfoState {
+    DeviceClass parent_obj;
+    uint8_t vmcoreinfo_addr_le[8];   /* Address of memory region */
+    bool write_pointer_available;
+} VmcoreinfoState;
+
+/* returns NULL unless there is exactly one device */
+static inline Object *find_vmcoreinfo_dev(void)
+{
+    return object_resolve_path_type("", VMCOREINFO_DEVICE, NULL);
+}
+
+void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray *table_data,
+                           GArray *vmci, BIOSLinker *linker);
+void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray *vmci);
+bool vmcoreinfo_get(VmcoreinfoState *vis, uint64_t *paddr, uint64_t *size,
+                    Error **errp);
+
+#endif
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 36a6cc450e..47043ade4a 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1561,6 +1561,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
     tables->table_data = g_array_new(false, true /* clear */, 1);
     tables->tcpalog = g_array_new(false, true /* clear */, 1);
     tables->vmgenid = g_array_new(false, true /* clear */, 1);
+    tables->vmcoreinfo = g_array_new(false, true /* clear */, 1);
     tables->linker = bios_linker_loader_init();
 }
 
@@ -1571,6 +1572,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
     g_array_free(tables->table_data, true);
     g_array_free(tables->tcpalog, mfre);
     g_array_free(tables->vmgenid, mfre);
+    g_array_free(tables->vmcoreinfo, mfre);
 }
 
 /* Build rsdt table */
diff --git a/hw/acpi/vmcoreinfo.c b/hw/acpi/vmcoreinfo.c
new file mode 100644
index 0000000000..216e0bb83a
--- /dev/null
+++ b/hw/acpi/vmcoreinfo.c
@@ -0,0 +1,198 @@
+/*
+ *  Virtual Machine coreinfo device
+ *  (based on Virtual Machine Generation ID Device)
+ *
+ *  Copyright (C) 2017 Red Hat, Inc.
+ *  Copyright (C) 2017 Skyport Systems.
+ *
+ *  Authors: Marc-André Lureau <marcandre.lureau@redhat.com>
+ *           Ben Warren <ben@skyportsystems.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#include "qemu/osdep.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/vmcoreinfo.h"
+#include "hw/nvram/fw_cfg.h"
+#include "sysemu/sysemu.h"
+#include "qapi/error.h"
+
+void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray *table_data,
+                           GArray *vmci, BIOSLinker *linker)
+{
+    Aml *ssdt, *dev, *scope, *method, *addr, *if_ctx;
+    uint32_t vgia_offset;
+
+    g_array_set_size(vmci, VMCOREINFO_FW_CFG_SIZE);
+
+    /* Put this in a separate SSDT table */
+    ssdt = init_aml_allocator();
+
+    /* Reserve space for header */
+    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
+
+    /* Storage address */
+    vgia_offset = table_data->len +
+        build_append_named_dword(ssdt->buf, "VCIA");
+    scope = aml_scope("\\_SB");
+    dev = aml_device("VMCI");
+    aml_append(dev, aml_name_decl("_HID", aml_string("QEMUVMCI")));
+
+    /* Simple status method to check that address is linked and non-zero */
+    method = aml_method("_STA", 0, AML_NOTSERIALIZED);
+    addr = aml_local(0);
+    aml_append(method, aml_store(aml_int(0xf), addr));
+    if_ctx = aml_if(aml_equal(aml_name("VCIA"), aml_int(0)));
+    aml_append(if_ctx, aml_store(aml_int(0), addr));
+    aml_append(method, if_ctx);
+    aml_append(method, aml_return(addr));
+    aml_append(dev, method);
+
+    /* the ADDR method returns two 32-bit words representing the lower and
+     * upper halves of the physical address of the vmcoreinfo area
+     */
+    method = aml_method("ADDR", 0, AML_NOTSERIALIZED);
+
+    addr = aml_local(0);
+    aml_append(method, aml_store(aml_package(2), addr));
+
+    aml_append(method, aml_store(aml_add(aml_name("VCIA"),
+                                         aml_int(VMCOREINFO_OFFSET), NULL),
+                                 aml_index(addr, aml_int(0))));
+    aml_append(method, aml_store(aml_int(0), aml_index(addr, aml_int(1))));
+    aml_append(method, aml_return(addr));
+
+    aml_append(dev, method);
+    aml_append(scope, dev);
+    aml_append(ssdt, scope);
+
+    g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
+
+    /* Allocate guest memory */
+    bios_linker_loader_alloc(linker, VMCOREINFO_FW_CFG_FILE, vmci, 4096,
+                             false /* page boundary, high memory */);
+
+    /* Patch address of vmcoreinfo fw_cfg blob into the ADDR fw_cfg
+     * blob so QEMU can read the info from there.  The address is
+     * expected to be < 4GB, but write 64 bits anyway.
+     * The address that is patched in is offset in order to implement
+     * the "OVMF SDT Header probe suppressor"
+     * see docs/specs/vmcoreinfo.txt for more details.
+     */
+    bios_linker_loader_write_pointer(linker,
+        VMCOREINFO_ADDR_FW_CFG_FILE, 0, sizeof(uint64_t),
+        VMCOREINFO_FW_CFG_FILE, VMCOREINFO_OFFSET);
+
+    /* Patch address of vmcoreinfo into the AML so OSPM can retrieve
+     * and read it.  Note that while we provide storage for 64 bits, only
+     * the least-signficant 32 get patched into AML.
+     */
+    bios_linker_loader_add_pointer(linker,
+        ACPI_BUILD_TABLE_FILE, vgia_offset, sizeof(uint32_t),
+        VMCOREINFO_FW_CFG_FILE, 0);
+
+    build_header(linker, table_data,
+        (void *)(table_data->data + table_data->len - ssdt->buf->len),
+        "SSDT", ssdt->buf->len, 1, NULL, "VMCOREIN");
+    free_aml_allocator();
+}
+
+void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray *vmci)
+{
+    /* Create a read-only fw_cfg file for vmcoreinfo allocation */
+    /* XXX: linker could learn to allocate without backing fw_cfg? */
+    fw_cfg_add_file(s, VMCOREINFO_FW_CFG_FILE, vmci->data,
+                    VMCOREINFO_FW_CFG_SIZE);
+    /* Create a read-write fw_cfg file for Address */
+    fw_cfg_add_file_callback(s, VMCOREINFO_ADDR_FW_CFG_FILE, NULL, NULL,
+                             vis->vmcoreinfo_addr_le,
+                             ARRAY_SIZE(vis->vmcoreinfo_addr_le), false);
+}
+
+bool vmcoreinfo_get(VmcoreinfoState *vis,
+                    uint64_t *paddr, uint64_t *size,
+                    Error **errp)
+{
+    uint32_t vmcoreinfo_addr;
+    uint32_t version;
+
+    assert(vis);
+    assert(paddr);
+    assert(size);
+
+    memcpy(&vmcoreinfo_addr, vis->vmcoreinfo_addr_le, sizeof(vmcoreinfo_addr));
+    vmcoreinfo_addr = le32_to_cpu(vmcoreinfo_addr);
+    if (!vmcoreinfo_addr) {
+        error_setg(errp, "BIOS has not yet written the address of %s",
+                   VMCOREINFO_DEVICE);
+        return false;
+    }
+
+    cpu_physical_memory_read(vmcoreinfo_addr, &version, sizeof(version));
+    if (version != 0) {
+        error_setg(errp, "Unknown %s memory version", VMCOREINFO_DEVICE);
+        return false;
+    }
+
+    cpu_physical_memory_read(vmcoreinfo_addr + 4, paddr, sizeof(paddr));
+    *paddr = le64_to_cpu(*paddr);
+    cpu_physical_memory_read(vmcoreinfo_addr + 12, size, sizeof(size));
+    *size = le32_to_cpu(*size);
+
+    return true;
+}
+
+static const VMStateDescription vmstate_vmcoreinfo = {
+    .name = "vmcoreinfo",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8_ARRAY(vmcoreinfo_addr_le, VmcoreinfoState, sizeof(uint64_t)),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static void vmcoreinfo_realize(DeviceState *dev, Error **errp)
+{
+    if (!bios_linker_loader_can_write_pointer()) {
+        error_setg(errp, "%s requires DMA write support in fw_cfg, "
+                   "which this machine type does not provide",
+                   VMCOREINFO_DEVICE);
+        return;
+    }
+
+    /* Given that this function is executing, there is at least one VMCOREINFO
+     * device. Check if there are several.
+     */
+    if (!find_vmcoreinfo_dev()) {
+        error_setg(errp, "at most one %s device is permitted",
+                   VMCOREINFO_DEVICE);
+        return;
+    }
+}
+
+static void vmcoreinfo_device_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->vmsd = &vmstate_vmcoreinfo;
+    dc->realize = vmcoreinfo_realize;
+    dc->hotpluggable = false;
+}
+
+static const TypeInfo vmcoreinfo_device_info = {
+    .name          = VMCOREINFO_DEVICE,
+    .parent        = TYPE_DEVICE,
+    .instance_size = sizeof(VmcoreinfoState),
+    .class_init    = vmcoreinfo_device_class_init,
+};
+
+static void vmcoreinfo_register_types(void)
+{
+    type_register_static(&vmcoreinfo_device_info);
+}
+
+type_init(vmcoreinfo_register_types)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 0b8bc62b99..f8a6758841 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -43,6 +43,7 @@
 #include "sysemu/tpm.h"
 #include "hw/acpi/tpm.h"
 #include "hw/acpi/vmgenid.h"
+#include "hw/acpi/vmcoreinfo.h"
 #include "sysemu/tpm_backend.h"
 #include "hw/timer/mc146818rtc_regs.h"
 #include "sysemu/numa.h"
@@ -2641,6 +2642,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
     GArray *tables_blob = tables->table_data;
     AcpiSlicOem slic_oem = { .id = NULL, .table_id = NULL };
     Object *vmgenid_dev;
+    Object *vmcoreinfo_dev;
 
     acpi_get_pm_info(&pm);
     acpi_get_misc_info(&misc);
@@ -2690,6 +2692,12 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
         vmgenid_build_acpi(VMGENID(vmgenid_dev), tables_blob,
                            tables->vmgenid, tables->linker);
     }
+    vmcoreinfo_dev = find_vmcoreinfo_dev();
+    if (vmcoreinfo_dev) {
+        acpi_add_table(table_offsets, tables_blob);
+        vmcoreinfo_build_acpi(VMCOREINFO(vmcoreinfo_dev), tables_blob,
+                              tables->vmcoreinfo, tables->linker);
+    }
 
     if (misc.has_hpet) {
         acpi_add_table(table_offsets, tables_blob);
@@ -2866,6 +2874,7 @@ void acpi_setup(void)
     AcpiBuildTables tables;
     AcpiBuildState *build_state;
     Object *vmgenid_dev;
+    Object *vmcoreinfo_dev;
 
     if (!pcms->fw_cfg) {
         ACPI_BUILD_DPRINTF("No fw cfg. Bailing out.\n");
@@ -2907,6 +2916,11 @@ void acpi_setup(void)
         vmgenid_add_fw_cfg(VMGENID(vmgenid_dev), pcms->fw_cfg,
                            tables.vmgenid);
     }
+    vmcoreinfo_dev = find_vmcoreinfo_dev();
+    if (vmcoreinfo_dev) {
+        vmcoreinfo_add_fw_cfg(VMCOREINFO(vmcoreinfo_dev), pcms->fw_cfg,
+                              tables.vmcoreinfo);
+    }
 
     if (!pcmc->rsdp_in_ram) {
         /*
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 93e995d318..320dd3680d 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -120,6 +120,7 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_ACPI=y
+CONFIG_ACPI_VMCOREINFO=y
 CONFIG_SMBIOS=y
 CONFIG_ASPEED_SOC=y
 CONFIG_GPIO_KEY=y
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index d2ab2f6655..df68628895 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -59,3 +59,4 @@ CONFIG_SMBIOS=y
 CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
 CONFIG_PXB=y
 CONFIG_ACPI_VMGENID=y
+CONFIG_ACPI_VMCOREINFO=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 9bde2f1c4b..e39ad5a680 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -59,3 +59,4 @@ CONFIG_SMBIOS=y
 CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
 CONFIG_PXB=y
 CONFIG_ACPI_VMGENID=y
+CONFIG_ACPI_VMCOREINFO=y
diff --git a/docs/specs/vmcoreinfo.txt b/docs/specs/vmcoreinfo.txt
new file mode 100644
index 0000000000..70d9716fe0
--- /dev/null
+++ b/docs/specs/vmcoreinfo.txt
@@ -0,0 +1,138 @@
+VIRTUAL MACHINE COREINFO DEVICE
+===============================
+
+Copyright (C) 2017 Red Hat, Inc.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+===
+
+The VM coreinfo (vmcoreinfo) device is an emulated device which
+exposes a 4k memory range to the guest to store various informations
+useful to debug the guest OS.
+
+QEMU Implementation
+-------------------
+
+The vmcoreinfo device is put in its own ACPI descriptor table, in a
+Secondary System Description Table, or SSDT.
+
+The following is a dump of the contents from a running system:
+
+# iasl -p ./SSDT -d /sys/firmware/acpi/tables/SSDT
+/*
+ * Intel ACPI Component Architecture
+ * AML/ASL+ Disassembler version 20160831-64
+ * Copyright (c) 2000 - 2016 Intel Corporation
+ *
+ * Disassembling to symbolic ASL+ operators
+ *
+ * Disassembly of /sys/firmware/acpi/tables/SSDT, Mon Apr 24 15:59:53 2017
+ *
+ * Original Table Header:
+ *     Signature        "SSDT"
+ *     Length           0x00000086 (134)
+ *     Revision         0x01
+ *     Checksum         0x5C
+ *     OEM ID           "BOCHS "
+ *     OEM Table ID     "VMCOREIN"
+ *     OEM Revision     0x00000001 (1)
+ *     Compiler ID      "BXPC"
+ *     Compiler Version 0x00000001 (1)
+ */
+DefinitionBlock ("", "SSDT", 1, "BOCHS ", "VMCOREIN", 0x00000001)
+{
+    Name (VCIA, 0x3FFFF000)
+    Scope (\_SB)
+    {
+        Device (VMCI)
+        {
+            Name (_HID, "QEMUVMCI")  // _HID: Hardware ID
+            Method (_STA, 0, NotSerialized)  // _STA: Status
+            {
+                Local0 = 0x0F
+                If (VCIA == Zero)
+                {
+                    Local0 = Zero
+                }
+
+                Return (Local0)
+            }
+
+            Method (ADDR, 0, NotSerialized)
+            {
+                Local0 = Package (0x02) {}
+                Local0 [Zero] = (VCIA + 0x28)
+                Local0 [One] = Zero
+                Return (Local0)
+            }
+        }
+    }
+}
+
+
+Design Details:
+---------------
+
+QEMU must be able to read the contents of the device memory,
+specifically when starting a memory dump.  In order to do this, QEMU
+must know the address that has been allocated.
+
+The mechanism chosen for this memory sharing is writeable fw_cfg blobs.
+These are data object that are visible to both QEMU and guests, and are
+addressable as sequential files.
+
+More information about fw_cfg can be found in "docs/specs/fw_cfg.txt"
+
+Two fw_cfg blobs are used in this case:
+
+/etc/vmcoreinfo      - used to allocate memory range, read-only to the guest
+/etc/vmcoreinfo-addr - contains the address of the allocated range
+                     - writeable by the guest
+
+
+QEMU sends the following commands to the guest at startup:
+
+1. Allocate memory for vmcoreinfo fw_cfg blob.
+2. Write the address of vmcoreinfo into the SSDT (VCIA ACPI variable as
+   shown above in the iasl dump).  Note that this change is not propagated
+   back to QEMU.
+3. Write the address of vmcoreinfo back to QEMU's copy of vmcoreinfo-addr
+   via the fw_cfg DMA interface.
+
+After step 3, QEMU is able to read the contents of vmcoreinfo.
+
+The value of VCIA is persisted via the VMState mechanism.
+
+
+Storage Format:
+---------------
+
+The content is expected to use little-endian format.
+
+In order to implement an OVMF "SDT Header Probe Suppressor", the contents of
+the vmcoreinfo blob has 40 bytes of padding:
+
++-----------------------------------+
+| SSDT with OEM Table ID = VMCOREIN |
++-----------------------------------+
+| ...                               |       TOP OF PAGE
+| VCIA dword object ----------------|-----> +---------------------------+
+| ...                               |       | fw-allocated array for    |
+| _STA method referring to VCIA     |       | "etc/vmcoreinfo"          |
+| ...                               |       +---------------------------+
+| ADDR method referring to VCIA     |       |  0: OVMF SDT Header probe |
+| ...                               |       |     suppressor            |
++-----------------------------------+       | 40: uint32 version field  |
+                                            | 44: info contents         |
+                                            |     ....                  |
+                                            +---------------------------+
+                                            END OF PAGE
+
+Version 0 content:
+
+ uint64 paddr:
+  Physical address of the Linux vmcoreinfo ELF note.
+ uint32 size:
+  Size of the vmcoreinfo ELF note.
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 11c35bcb44..9623078f95 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
+common-obj-$(CONFIG_ACPI_VMCOREINFO) += vmcoreinfo.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
-- 
2.13.1.395.gf7b71de06

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 3/7] tests: add simple vmcoreinfo test
  2017-06-29 13:23 [Qemu-devel] [PATCH 0/7] KASLR kernel dump support Marc-André Lureau
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack Marc-André Lureau
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 2/7] acpi: add vmcoreinfo device Marc-André Lureau
@ 2017-06-29 13:23 ` Marc-André Lureau
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 4/7] dump: add vmcoreinfo ELF note Marc-André Lureau
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 31+ messages in thread
From: Marc-André Lureau @ 2017-06-29 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, anderson, berrange, ehabkost, lersek, Marc-André Lureau

This test is based off vmgenid test from Ben Warren
<ben@skyportsystems.com>. It simply checks the vmcoreinfo ACPI device
is present and that the memory region associated can be read.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 tests/vmcoreinfo-test.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++
 tests/Makefile.include  |   2 +
 2 files changed, 132 insertions(+)
 create mode 100644 tests/vmcoreinfo-test.c

diff --git a/tests/vmcoreinfo-test.c b/tests/vmcoreinfo-test.c
new file mode 100644
index 0000000000..c8b073366e
--- /dev/null
+++ b/tests/vmcoreinfo-test.c
@@ -0,0 +1,130 @@
+/*
+ * QTest testcase for VM coreinfo device
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ * Copyright (c) 2017 Skyport Systems
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <string.h>
+#include <unistd.h>
+#include "qemu/osdep.h"
+#include "qemu/bitmap.h"
+#include "qemu/uuid.h"
+#include "hw/acpi/acpi-defs.h"
+#include "acpi-utils.h"
+#include "libqtest.h"
+
+#define VMCOREINFO_OFFSET 40     /* allow space for
+                                  * OVMF SDT Header Probe Supressor
+                                  */
+#define RSDP_ADDR_INVALID 0x100000 /* RSDP must be below this address */
+#define RSDP_SLEEP_US     100000   /* Sleep for 100ms between tries */
+#define RSDP_TRIES_MAX    100      /* Max total time is 10 seconds */
+
+typedef struct {
+    AcpiTableHeader header;
+    gchar name_op;
+    gchar vcia[4];
+    gchar val_op;
+    uint32_t vcia_val;
+} QEMU_PACKED VgidTable;
+
+static uint32_t acpi_find_vcia(void)
+{
+    uint32_t off;
+    AcpiRsdpDescriptor rsdp_table;
+    uint32_t rsdt;
+    AcpiRsdtDescriptorRev1 rsdt_table;
+    int tables_nr;
+    uint32_t *tables;
+    AcpiTableHeader ssdt_table;
+    VgidTable vgid_table;
+    int i;
+
+    /* Tables may take a short time to be set up by the guest */
+    for (i = 0; i < RSDP_TRIES_MAX; i++) {
+        off = acpi_find_rsdp_address();
+        if (off < RSDP_ADDR_INVALID) {
+            break;
+        }
+        g_usleep(RSDP_SLEEP_US);
+    }
+    g_assert_cmphex(off, <, RSDP_ADDR_INVALID);
+
+    acpi_parse_rsdp_table(off, &rsdp_table);
+
+    rsdt = rsdp_table.rsdt_physical_address;
+    /* read the header */
+    ACPI_READ_TABLE_HEADER(&rsdt_table, rsdt);
+    ACPI_ASSERT_CMP(rsdt_table.signature, "RSDT");
+
+    /* compute the table entries in rsdt */
+    tables_nr = (rsdt_table.length - sizeof(AcpiRsdtDescriptorRev1)) /
+                sizeof(uint32_t);
+    g_assert_cmpint(tables_nr, >, 0);
+
+    /* get the addresses of the tables pointed by rsdt */
+    tables = g_new0(uint32_t, tables_nr);
+    ACPI_READ_ARRAY_PTR(tables, tables_nr, rsdt);
+
+    for (i = 0; i < tables_nr; i++) {
+        ACPI_READ_TABLE_HEADER(&ssdt_table, tables[i]);
+        if (!strncmp((char *)ssdt_table.oem_table_id, "VMCOREIN", 8)) {
+            /* the first entry in the table should be VCIA
+             * That's all we need
+             */
+            ACPI_READ_FIELD(vgid_table.name_op, tables[i]);
+            g_assert(vgid_table.name_op == 0x08);  /* name */
+            ACPI_READ_ARRAY(vgid_table.vcia, tables[i]);
+            g_assert(memcmp(vgid_table.vcia, "VCIA", 4) == 0);
+            ACPI_READ_FIELD(vgid_table.val_op, tables[i]);
+            g_assert(vgid_table.val_op == 0x0C);  /* dword */
+            ACPI_READ_FIELD(vgid_table.vcia_val, tables[i]);
+            /* The GUID is written at a fixed offset into the fw_cfg file
+             * in order to implement the "OVMF SDT Header probe suppressor"
+             * see docs/specs/vmgenid.txt for more details
+             */
+            g_free(tables);
+            return vgid_table.vcia_val;
+        }
+    }
+    g_free(tables);
+    return 0;
+}
+
+static void vmcoreinfo_test(void)
+{
+    gchar *cmd;
+    uint32_t vmci_addr;
+    int i;
+
+    cmd = g_strdup_printf("-machine accel=tcg -device vmcoreinfo,id=vmci");
+    qtest_start(cmd);
+
+    vmci_addr = acpi_find_vcia();
+    g_assert(vmci_addr);
+
+    for (i = 0; i < 4096; i++) {
+        /* check the memory region can be read */
+        readb(vmci_addr + i);
+    }
+
+    qtest_quit(global_qtest);
+    g_free(cmd);
+}
+
+int main(int argc, char **argv)
+{
+    int ret;
+
+    g_test_init(&argc, &argv, NULL);
+
+    qtest_add_func("/vmcoreinfo/test", vmcoreinfo_test);
+    ret = g_test_run();
+
+    return ret;
+}
diff --git a/tests/Makefile.include b/tests/Makefile.include
index ae889cae02..67ddac40d8 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -250,6 +250,7 @@ check-qtest-i386-y += tests/usb-hcd-xhci-test$(EXESUF)
 gcov-files-i386-y += hw/usb/hcd-xhci.c
 check-qtest-i386-y += tests/pc-cpu-test$(EXESUF)
 check-qtest-i386-y += tests/q35-test$(EXESUF)
+check-qtest-i386-y += tests/vmcoreinfo-test$(EXESUF)
 gcov-files-i386-y += hw/pci-host/q35.c
 check-qtest-i386-$(CONFIG_VHOST_NET_TEST_i386) += tests/vhost-user-test$(EXESUF)
 ifeq ($(CONFIG_VHOST_NET_TEST_i386),)
@@ -760,6 +761,7 @@ tests/test-uuid$(EXESUF): tests/test-uuid.o $(test-util-obj-y)
 tests/test-arm-mptimer$(EXESUF): tests/test-arm-mptimer.o
 tests/test-qapi-util$(EXESUF): tests/test-qapi-util.o $(test-util-obj-y)
 tests/numa-test$(EXESUF): tests/numa-test.o
+tests/vmcoreinfo-test$(EXESUF): tests/vmcoreinfo-test.o tests/acpi-utils.o
 
 tests/migration/stress$(EXESUF): tests/migration/stress.o
 	$(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< ,"LINK","$(TARGET_DIR)$@")
-- 
2.13.1.395.gf7b71de06

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 4/7] dump: add vmcoreinfo ELF note
  2017-06-29 13:23 [Qemu-devel] [PATCH 0/7] KASLR kernel dump support Marc-André Lureau
                   ` (2 preceding siblings ...)
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 3/7] tests: add simple vmcoreinfo test Marc-André Lureau
@ 2017-06-29 13:23 ` Marc-André Lureau
  2017-07-04 23:48   ` Laszlo Ersek
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 5/7] kdump: " Marc-André Lureau
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 31+ messages in thread
From: Marc-André Lureau @ 2017-06-29 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, anderson, berrange, ehabkost, lersek, Marc-André Lureau

Read the vmcoreinfo ELF PT_NOTE from guest memory when vmcoreinfo
device provides the location, and write it as an ELF note in the dump.

There are now 2 possible sources of phys_base information.

(1) arch guessed value from arch_dump_info_get()
(2) vmcoreinfo ELF note NUMBER(phys_base)= field

    NUMBER(phys_base) in vmcoreinfo has only been recently introduced
    in Linux 4.10 (401721ecd1dc "kexec: export the value of phys_base
    instead of symbol address").

Since (2) has better chances to be accurate, the guessed value is
replaced by the value from the vmcoreinfo ELF note.

The phys_base value is stored in the same dump field locations as
before, and may duplicate the information available in the vmcoreinfo
ELF PT_NOTE. Crash tools should be prepared to handle this case.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 include/sysemu/dump.h |   2 +
 dump.c                | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 119 insertions(+)

diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 2672a15f8b..111a7dcaa4 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -192,6 +192,8 @@ typedef struct DumpState {
                                   * this could be used to calculate
                                   * how much work we have
                                   * finished. */
+    uint8_t *vmcoreinfo;         /* ELF note content */
+    size_t vmcoreinfo_size;
 } DumpState;
 
 uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
diff --git a/dump.c b/dump.c
index d9090a24cc..8fda5cc1ed 100644
--- a/dump.c
+++ b/dump.c
@@ -26,6 +26,8 @@
 #include "qapi/qmp/qerror.h"
 #include "qmp-commands.h"
 #include "qapi-event.h"
+#include "qemu/error-report.h"
+#include "hw/acpi/vmcoreinfo.h"
 
 #include <zlib.h>
 #ifdef CONFIG_LZO
@@ -38,6 +40,11 @@
 #define ELF_MACHINE_UNAME "Unknown"
 #endif
 
+#define ELF_NOTE_SIZE(hdr_size, name_size, desc_size)   \
+    ((DIV_ROUND_UP((hdr_size), 4)                       \
+      + DIV_ROUND_UP((name_size), 4)                    \
+      + DIV_ROUND_UP((desc_size), 4)) * 4)
+
 uint16_t cpu_to_dump16(DumpState *s, uint16_t val)
 {
     if (s->dump_info.d_endian == ELFDATA2LSB) {
@@ -76,6 +83,8 @@ static int dump_cleanup(DumpState *s)
     guest_phys_blocks_free(&s->guest_phys_blocks);
     memory_mapping_list_free(&s->list);
     close(s->fd);
+    g_free(s->vmcoreinfo);
+    s->vmcoreinfo = NULL;
     if (s->resume) {
         if (s->detached) {
             qemu_mutex_lock_iothread();
@@ -235,6 +244,19 @@ static inline int cpu_index(CPUState *cpu)
     return cpu->cpu_index + 1;
 }
 
+static void write_vmcoreinfo_note(WriteCoreDumpFunction f, DumpState *s,
+                                  Error **errp)
+{
+    int ret;
+
+    if (s->vmcoreinfo) {
+        ret = f(s->vmcoreinfo, s->vmcoreinfo_size, s);
+        if (ret < 0) {
+            error_setg(errp, "dump: failed to write vmcoreinfo");
+        }
+    }
+}
+
 static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
                               Error **errp)
 {
@@ -258,6 +280,8 @@ static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
             return;
         }
     }
+
+    write_vmcoreinfo_note(f, s, errp);
 }
 
 static void write_elf32_note(DumpState *s, Error **errp)
@@ -303,6 +327,8 @@ static void write_elf32_notes(WriteCoreDumpFunction f, DumpState *s,
             return;
         }
     }
+
+    write_vmcoreinfo_note(f, s, errp);
 }
 
 static void write_elf_section(DumpState *s, int type, Error **errp)
@@ -714,6 +740,44 @@ static int buf_write_note(const void *buf, size_t size, void *opaque)
     return 0;
 }
 
+/*
+ * This function retrieves various sizes from an elf header.
+ *
+ * @note has to be a valid ELF note. The return sizes are unmodified
+ * (not padded or rounded up to be multiple of 4).
+ */
+static void get_note_sizes(DumpState *s, const void *note,
+                           uint64_t *note_head_size,
+                           uint64_t *name_size,
+                           uint64_t *desc_size)
+{
+    uint64_t note_head_sz;
+    uint64_t name_sz;
+    uint64_t desc_sz;
+
+    if (s->dump_info.d_class == ELFCLASS64) {
+        const Elf64_Nhdr *hdr = note;
+        note_head_sz = sizeof(Elf64_Nhdr);
+        name_sz = hdr->n_namesz;
+        desc_sz = hdr->n_descsz;
+    } else {
+        const Elf32_Nhdr *hdr = note;
+        note_head_sz = sizeof(Elf32_Nhdr);
+        name_sz = hdr->n_namesz;
+        desc_sz = hdr->n_descsz;
+    }
+
+    if (note_head_size) {
+        *note_head_size = note_head_sz;
+    }
+    if (name_size) {
+        *name_size = name_sz;
+    }
+    if (desc_size) {
+        *desc_size = desc_sz;
+    }
+}
+
 /* write common header, sub header and elf note to vmcore */
 static void create_header32(DumpState *s, Error **errp)
 {
@@ -1488,10 +1552,38 @@ static int64_t dump_calculate_size(DumpState *s)
     return total;
 }
 
+static void vmcoreinfo_update_phys_base(DumpState *s)
+{
+    uint64_t size, note_head_size, name_size;
+    char **lines;
+    uint8_t *vmci;
+    size_t i;
+
+    get_note_sizes(s, s->vmcoreinfo, &note_head_size, &name_size, &size);
+    note_head_size = ROUND_UP(note_head_size, 4);
+    name_size = ROUND_UP(name_size, 4);
+    vmci = s->vmcoreinfo + note_head_size + name_size;
+    *(vmci + size) = '\0';
+
+    lines = g_strsplit((char *)vmci, "\n", -1);
+    for (i = 0; lines[i]; i++) {
+        if (g_str_has_prefix(lines[i], "NUMBER(phys_base)=")) {
+            if (qemu_strtou64(lines[i] + 18, NULL, 16,
+                              &s->dump_info.phys_base) < 0) {
+                error_report("Failed to read NUMBER(phys_base)=");
+            }
+            break;
+        }
+    }
+
+    g_strfreev(lines);
+}
+
 static void dump_init(DumpState *s, int fd, bool has_format,
                       DumpGuestMemoryFormat format, bool paging, bool has_filter,
                       int64_t begin, int64_t length, Error **errp)
 {
+    Object *vmcoreinfo_dev = find_vmcoreinfo_dev();
     CPUState *cpu;
     int nr_cpus;
     Error *err = NULL;
@@ -1563,6 +1655,31 @@ static void dump_init(DumpState *s, int fd, bool has_format,
         goto cleanup;
     }
 
+    if (vmcoreinfo_dev) {
+        uint64_t addr, size, note_head_size, name_size, desc_size;
+
+        if (!vmcoreinfo_get(VMCOREINFO(vmcoreinfo_dev),
+                            &addr, &size, &err)) {
+            error_report_err(err);
+        } else {
+            s->vmcoreinfo = g_malloc(size);
+            cpu_physical_memory_read(addr, s->vmcoreinfo, size);
+
+            get_note_sizes(s, s->vmcoreinfo,
+                           &note_head_size, &name_size, &desc_size);
+            s->vmcoreinfo_size = ELF_NOTE_SIZE(note_head_size, name_size,
+                                               desc_size);
+            if (s->vmcoreinfo_size > size) {
+                error_report("Invalid vmcoreinfo header, size mismatch");
+                g_free(s->vmcoreinfo);
+                s->vmcoreinfo = NULL;
+            } else {
+                vmcoreinfo_update_phys_base(s);
+                s->note_size += s->vmcoreinfo_size;
+            }
+        }
+    }
+
     /* get memory mapping */
     if (paging) {
         qemu_get_guest_memory_mapping(&s->list, &s->guest_phys_blocks, &err);
-- 
2.13.1.395.gf7b71de06

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 5/7] kdump: add vmcoreinfo ELF note
  2017-06-29 13:23 [Qemu-devel] [PATCH 0/7] KASLR kernel dump support Marc-André Lureau
                   ` (3 preceding siblings ...)
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 4/7] dump: add vmcoreinfo ELF note Marc-André Lureau
@ 2017-06-29 13:23 ` Marc-André Lureau
  2017-07-05  0:07   ` Laszlo Ersek
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 6/7] scripts/dump-guest-memory.py: add vmcoreinfo Marc-André Lureau
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 7/7] MAINTAINERS: add Dump maintainers Marc-André Lureau
  6 siblings, 1 reply; 31+ messages in thread
From: Marc-André Lureau @ 2017-06-29 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, anderson, berrange, ehabkost, lersek, Marc-André Lureau

kdump header provides offset and size of the vmcoreinfo ELF note,
append it if available.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 dump.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/dump.c b/dump.c
index 8fda5cc1ed..b78bc1fda7 100644
--- a/dump.c
+++ b/dump.c
@@ -788,8 +788,9 @@ static void create_header32(DumpState *s, Error **errp)
     uint32_t sub_hdr_size;
     uint32_t bitmap_blocks;
     uint32_t status = 0;
-    uint64_t offset_note;
+    uint64_t offset_note, offset_vmcoreinfo, size_vmcoreinfo = 0;
     Error *local_err = NULL;
+    uint8_t *vmcoreinfo = NULL;
 
     /* write common header, the version of kdump-compressed format is 6th */
     size = sizeof(DiskDumpHeader32);
@@ -838,7 +839,18 @@ static void create_header32(DumpState *s, Error **errp)
     kh->phys_base = cpu_to_dump32(s, s->dump_info.phys_base);
     kh->dump_level = cpu_to_dump32(s, DUMP_LEVEL);
 
-    offset_note = DISKDUMP_HEADER_BLOCKS * block_size + size;
+    offset_vmcoreinfo = DISKDUMP_HEADER_BLOCKS * block_size + size;
+    if (s->vmcoreinfo) {
+        uint64_t hsize, name_size;
+
+        get_note_sizes(s, s->vmcoreinfo, &hsize, &name_size, &size_vmcoreinfo);
+        vmcoreinfo =
+            s->vmcoreinfo + (DIV_ROUND_UP(hsize, 4) + DIV_ROUND_UP(name_size, 4)) * 4;
+        kh->offset_vmcoreinfo = cpu_to_dump64(s, offset_vmcoreinfo);
+        kh->size_vmcoreinfo = cpu_to_dump32(s, size_vmcoreinfo);
+    }
+
+    offset_note = offset_vmcoreinfo + size_vmcoreinfo;
     kh->offset_note = cpu_to_dump64(s, offset_note);
     kh->note_size = cpu_to_dump32(s, s->note_size);
 
@@ -848,6 +860,14 @@ static void create_header32(DumpState *s, Error **errp)
         goto out;
     }
 
+    if (vmcoreinfo) {
+        if (write_buffer(s->fd, offset_vmcoreinfo, vmcoreinfo,
+                         size_vmcoreinfo) < 0) {
+            error_setg(errp, "dump: failed to vmcoreinfo");
+            goto out;
+        }
+    }
+
     /* write note */
     s->note_buf = g_malloc0(s->note_size);
     s->note_buf_offset = 0;
@@ -888,8 +908,9 @@ static void create_header64(DumpState *s, Error **errp)
     uint32_t sub_hdr_size;
     uint32_t bitmap_blocks;
     uint32_t status = 0;
-    uint64_t offset_note;
+    uint64_t offset_note, offset_vmcoreinfo, size_vmcoreinfo = 0;
     Error *local_err = NULL;
+    uint8_t *vmcoreinfo = NULL;
 
     /* write common header, the version of kdump-compressed format is 6th */
     size = sizeof(DiskDumpHeader64);
@@ -938,7 +959,18 @@ static void create_header64(DumpState *s, Error **errp)
     kh->phys_base = cpu_to_dump64(s, s->dump_info.phys_base);
     kh->dump_level = cpu_to_dump32(s, DUMP_LEVEL);
 
-    offset_note = DISKDUMP_HEADER_BLOCKS * block_size + size;
+    offset_vmcoreinfo = DISKDUMP_HEADER_BLOCKS * block_size + size;
+    if (s->vmcoreinfo) {
+        uint64_t hsize, name_size;
+
+        get_note_sizes(s, s->vmcoreinfo, &hsize, &name_size, &size_vmcoreinfo);
+        vmcoreinfo =
+            s->vmcoreinfo + (DIV_ROUND_UP(hsize, 4) + DIV_ROUND_UP(name_size, 4)) * 4;
+        kh->offset_vmcoreinfo = cpu_to_dump64(s, offset_vmcoreinfo);
+        kh->size_vmcoreinfo = cpu_to_dump64(s, size_vmcoreinfo);
+    }
+
+    offset_note = offset_vmcoreinfo + size_vmcoreinfo;
     kh->offset_note = cpu_to_dump64(s, offset_note);
     kh->note_size = cpu_to_dump64(s, s->note_size);
 
@@ -948,6 +980,14 @@ static void create_header64(DumpState *s, Error **errp)
         goto out;
     }
 
+    if (vmcoreinfo) {
+        if (write_buffer(s->fd, offset_vmcoreinfo, vmcoreinfo,
+                         size_vmcoreinfo) < 0) {
+            error_setg(errp, "dump: failed to vmcoreinfo");
+            goto out;
+        }
+    }
+
     /* write note */
     s->note_buf = g_malloc0(s->note_size);
     s->note_buf_offset = 0;
-- 
2.13.1.395.gf7b71de06

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 6/7] scripts/dump-guest-memory.py: add vmcoreinfo
  2017-06-29 13:23 [Qemu-devel] [PATCH 0/7] KASLR kernel dump support Marc-André Lureau
                   ` (4 preceding siblings ...)
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 5/7] kdump: " Marc-André Lureau
@ 2017-06-29 13:23 ` Marc-André Lureau
  2017-07-05  0:22   ` Laszlo Ersek
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 7/7] MAINTAINERS: add Dump maintainers Marc-André Lureau
  6 siblings, 1 reply; 31+ messages in thread
From: Marc-André Lureau @ 2017-06-29 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, anderson, berrange, ehabkost, lersek,
	Marc-André Lureau, Michael S. Tsirkin

Add vmcoreinfo ELF note if vmcoreinfo device is ready.

To help the python script, add a little global vmcoreinfo_gdb
structure, that is populated with vmcoreinfo_gdb_update().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 scripts/dump-guest-memory.py | 32 ++++++++++++++++++++++++++++++++
 include/hw/acpi/vmcoreinfo.h |  1 +
 hw/acpi/vmcoreinfo.c         | 16 ++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
index f7c6635f15..16c3d7cb10 100644
--- a/scripts/dump-guest-memory.py
+++ b/scripts/dump-guest-memory.py
@@ -120,6 +120,20 @@ class ELF(object):
         self.segments[0].p_filesz += ctypes.sizeof(note)
         self.segments[0].p_memsz += ctypes.sizeof(note)
 
+
+    def add_vmcoreinfo_note(self, vmcoreinfo):
+        """Adds a vmcoreinfo note to the ELF dump."""
+        chead = type(get_arch_note(self.endianness, 0, 0))
+        header = chead.from_buffer_copy(vmcoreinfo[0:ctypes.sizeof(chead)])
+        note = get_arch_note(self.endianness,
+                             header.n_namesz - 1, header.n_descsz)
+        ctypes.memmove(ctypes.pointer(note), vmcoreinfo, ctypes.sizeof(note))
+        header_size = ctypes.sizeof(note) - header.n_descsz
+
+        self.notes.append(note)
+        self.segments[0].p_filesz += ctypes.sizeof(note)
+        self.segments[0].p_memsz += ctypes.sizeof(note)
+
     def add_segment(self, p_type, p_paddr, p_size):
         """Adds a segment to the elf."""
 
@@ -505,6 +519,23 @@ shape and this command should mostly work."""
                 cur += chunk_size
                 left -= chunk_size
 
+    def add_vmcoreinfo(self):
+        qemu_core = gdb.inferiors()[0]
+
+        gdb.execute("call vmcoreinfo_gdb_update()")
+        avail = gdb.parse_and_eval("vmcoreinfo_gdb.available")
+        if not avail:
+            return;
+
+        addr = gdb.parse_and_eval("vmcoreinfo_gdb.paddr")
+        size = gdb.parse_and_eval("vmcoreinfo_gdb.size")
+        for block in self.guest_phys_blocks:
+            if block["target_start"] <= addr < block["target_end"]:
+                haddr = block["host_addr"] + (addr - block["target_start"])
+                vmcoreinfo = qemu_core.read_memory(haddr, size)
+                self.elf.add_vmcoreinfo_note(vmcoreinfo.tobytes())
+                return
+
     def invoke(self, args, from_tty):
         """Handles command invocation from gdb."""
 
@@ -518,6 +549,7 @@ shape and this command should mostly work."""
 
         self.elf = ELF(argv[1])
         self.guest_phys_blocks = get_guest_phys_blocks()
+        self.add_vmcoreinfo()
 
         with open(argv[0], "wb") as vmcore:
             self.dump_init(vmcore)
diff --git a/include/hw/acpi/vmcoreinfo.h b/include/hw/acpi/vmcoreinfo.h
index 40fe99c3ed..4efa678237 100644
--- a/include/hw/acpi/vmcoreinfo.h
+++ b/include/hw/acpi/vmcoreinfo.h
@@ -32,5 +32,6 @@ void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray *table_data,
 void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray *vmci);
 bool vmcoreinfo_get(VmcoreinfoState *vis, uint64_t *paddr, uint64_t *size,
                     Error **errp);
+void vmcoreinfo_gdb_update(void);
 
 #endif
diff --git a/hw/acpi/vmcoreinfo.c b/hw/acpi/vmcoreinfo.c
index 216e0bb83a..75e3330813 100644
--- a/hw/acpi/vmcoreinfo.c
+++ b/hw/acpi/vmcoreinfo.c
@@ -145,6 +145,22 @@ bool vmcoreinfo_get(VmcoreinfoState *vis,
     return true;
 }
 
+struct vmcoreinfo_gdb {
+    bool available;
+    uint64_t paddr;
+    uint64_t size;
+} vmcoreinfo_gdb;
+
+void vmcoreinfo_gdb_update(void)
+{
+    Object *vmci = find_vmcoreinfo_dev();
+
+    vmcoreinfo_gdb.available = vmci ?
+        vmcoreinfo_get(VMCOREINFO(vmci),
+                       &vmcoreinfo_gdb.paddr, &vmcoreinfo_gdb.size, NULL)
+        : false;
+}
+
 static const VMStateDescription vmstate_vmcoreinfo = {
     .name = "vmcoreinfo",
     .version_id = 1,
-- 
2.13.1.395.gf7b71de06

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Qemu-devel] [PATCH 7/7] MAINTAINERS: add Dump maintainers
  2017-06-29 13:23 [Qemu-devel] [PATCH 0/7] KASLR kernel dump support Marc-André Lureau
                   ` (5 preceding siblings ...)
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 6/7] scripts/dump-guest-memory.py: add vmcoreinfo Marc-André Lureau
@ 2017-06-29 13:23 ` Marc-André Lureau
  2017-07-05  0:26   ` Laszlo Ersek
  6 siblings, 1 reply; 31+ messages in thread
From: Marc-André Lureau @ 2017-06-29 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: imammedo, anderson, berrange, ehabkost, lersek, Marc-André Lureau

Proposing myself, since I have some familiarity with the code now.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 839f7ca063..45a0eb4cb0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1272,6 +1272,13 @@ S: Maintained
 F: device_tree.c
 F: include/sysemu/device_tree.h
 
+Dump
+S: Supported
+M: Marc-André Lureau <marcandre.lureau@redhat.com>
+F: dump.c
+F: include/sysemu/dump.h
+F: include/sysemu/dump-arch.h
+
 Error reporting
 M: Markus Armbruster <armbru@redhat.com>
 S: Supported
-- 
2.13.1.395.gf7b71de06

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack Marc-André Lureau
@ 2017-06-29 14:11   ` Michael S. Tsirkin
  2017-07-02  3:09   ` Ben Warren
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2017-06-29 14:11 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: qemu-devel, imammedo, anderson, berrange, ehabkost, lersek, Ben Warren

On Thu, Jun 29, 2017 at 03:23:04PM +0200, Marc-André Lureau wrote:
> This compat property sole function is to prevent the device from being
> instantiated. Instead of requiring an extra compat property, check if
> fw_cfg has DMA enabled.
> 
> This has the additional benefit of handling other cases properly, like:
> 
>   $ qemu-system-x86_64 -device vmgenid -machine none
>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
>   [boots normally]
> 
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>

I like this

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

Which tree would you like to merge this through?

> ---
>  include/hw/acpi/bios-linker-loader.h | 2 ++
>  include/hw/compat.h                  | 4 ----
>  hw/acpi/bios-linker-loader.c         | 6 ++++++
>  hw/acpi/vmgenid.c                    | 9 +--------
>  4 files changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
> index efe17b0b9c..a711dbced8 100644
> --- a/include/hw/acpi/bios-linker-loader.h
> +++ b/include/hw/acpi/bios-linker-loader.h
> @@ -7,6 +7,8 @@ typedef struct BIOSLinker {
>      GArray *file_list;
>  } BIOSLinker;
>  
> +bool bios_linker_loader_can_write_pointer(void);
> +
>  BIOSLinker *bios_linker_loader_init(void);
>  
>  void bios_linker_loader_alloc(BIOSLinker *linker,
> diff --git a/include/hw/compat.h b/include/hw/compat.h
> index 26cd5851a5..36f02179ac 100644
> --- a/include/hw/compat.h
> +++ b/include/hw/compat.h
> @@ -150,10 +150,6 @@
>          .driver   = "fw_cfg_io",\
>          .property = "dma_enabled",\
>          .value    = "off",\
> -    },{\
> -        .driver   = "vmgenid",\
> -        .property = "x-write-pointer-available",\
> -        .value    = "off",\
>      },
>  
>  #define HW_COMPAT_2_3 \
> diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
> index 046183a0f1..587d62cb93 100644
> --- a/hw/acpi/bios-linker-loader.c
> +++ b/hw/acpi/bios-linker-loader.c
> @@ -168,6 +168,12 @@ bios_linker_find_file(const BIOSLinker *linker, const char *name)
>      return NULL;
>  }
>  
> +bool bios_linker_loader_can_write_pointer(void)
> +{
> +    FWCfgState *fw_cfg = fw_cfg_find();
> +    return fw_cfg && fw_cfg_dma_enabled(fw_cfg);
> +}
> +
>  /*
>   * bios_linker_loader_alloc: ask guest to load file into guest memory.
>   *
> diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
> index a32b847fe0..ab5da293fd 100644
> --- a/hw/acpi/vmgenid.c
> +++ b/hw/acpi/vmgenid.c
> @@ -205,17 +205,11 @@ static void vmgenid_handle_reset(void *opaque)
>      memset(vms->vmgenid_addr_le, 0, ARRAY_SIZE(vms->vmgenid_addr_le));
>  }
>  
> -static Property vmgenid_properties[] = {
> -    DEFINE_PROP_BOOL("x-write-pointer-available", VmGenIdState,
> -                     write_pointer_available, true),
> -    DEFINE_PROP_END_OF_LIST(),
> -};
> -
>  static void vmgenid_realize(DeviceState *dev, Error **errp)
>  {
>      VmGenIdState *vms = VMGENID(dev);
>  
> -    if (!vms->write_pointer_available) {
> +    if (!bios_linker_loader_can_write_pointer()) {
>          error_setg(errp, "%s requires DMA write support in fw_cfg, "
>                     "which this machine type does not provide", VMGENID_DEVICE);
>          return;
> @@ -239,7 +233,6 @@ static void vmgenid_device_class_init(ObjectClass *klass, void *data)
>      dc->vmsd = &vmstate_vmgenid;
>      dc->realize = vmgenid_realize;
>      dc->hotpluggable = false;
> -    dc->props = vmgenid_properties;
>  
>      object_class_property_add_str(klass, VMGENID_GUID, NULL,
>                                    vmgenid_set_guid, NULL);
> -- 
> 2.13.1.395.gf7b71de06

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack Marc-André Lureau
  2017-06-29 14:11   ` Michael S. Tsirkin
@ 2017-07-02  3:09   ` Ben Warren
  2017-07-03 14:48   ` Eduardo Habkost
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Ben Warren @ 2017-07-02  3:09 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: qemu-devel, imammedo, anderson, berrange, ehabkost, lersek,
	Michael S. Tsirkin

Nice improvement!
> On Jun 29, 2017, at 9:23 AM, Marc-André Lureau <marcandre.lureau@redhat.com> wrote:
> 
> This compat property sole function is to prevent the device from being
> instantiated. Instead of requiring an extra compat property, check if
> fw_cfg has DMA enabled.
> 
> This has the additional benefit of handling other cases properly, like:
> 
>  $ qemu-system-x86_64 -device vmgenid -machine none
>  qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>  $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
>  qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>  $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
>  [boots normally]
> 
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Ben Warren <ben@skyportsystems.com <mailto:mst@redhat.com>>
> ---
> include/hw/acpi/bios-linker-loader.h | 2 ++
> include/hw/compat.h                  | 4 ----
> hw/acpi/bios-linker-loader.c         | 6 ++++++
> hw/acpi/vmgenid.c                    | 9 +--------
> 4 files changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
> index efe17b0b9c..a711dbced8 100644
> --- a/include/hw/acpi/bios-linker-loader.h
> +++ b/include/hw/acpi/bios-linker-loader.h
> @@ -7,6 +7,8 @@ typedef struct BIOSLinker {
>     GArray *file_list;
> } BIOSLinker;
> 
> +bool bios_linker_loader_can_write_pointer(void);
> +
> BIOSLinker *bios_linker_loader_init(void);
> 
> void bios_linker_loader_alloc(BIOSLinker *linker,
> diff --git a/include/hw/compat.h b/include/hw/compat.h
> index 26cd5851a5..36f02179ac 100644
> --- a/include/hw/compat.h
> +++ b/include/hw/compat.h
> @@ -150,10 +150,6 @@
>         .driver   = "fw_cfg_io",\
>         .property = "dma_enabled",\
>         .value    = "off",\
> -    },{\
> -        .driver   = "vmgenid",\
> -        .property = "x-write-pointer-available",\
> -        .value    = "off",\
>     },
> 
> #define HW_COMPAT_2_3 \
> diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
> index 046183a0f1..587d62cb93 100644
> --- a/hw/acpi/bios-linker-loader.c
> +++ b/hw/acpi/bios-linker-loader.c
> @@ -168,6 +168,12 @@ bios_linker_find_file(const BIOSLinker *linker, const char *name)
>     return NULL;
> }
> 
> +bool bios_linker_loader_can_write_pointer(void)
> +{
> +    FWCfgState *fw_cfg = fw_cfg_find();
> +    return fw_cfg && fw_cfg_dma_enabled(fw_cfg);
> +}
> +
> /*
>  * bios_linker_loader_alloc: ask guest to load file into guest memory.
>  *
> diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
> index a32b847fe0..ab5da293fd 100644
> --- a/hw/acpi/vmgenid.c
> +++ b/hw/acpi/vmgenid.c
> @@ -205,17 +205,11 @@ static void vmgenid_handle_reset(void *opaque)
>     memset(vms->vmgenid_addr_le, 0, ARRAY_SIZE(vms->vmgenid_addr_le));
> }
> 
> -static Property vmgenid_properties[] = {
> -    DEFINE_PROP_BOOL("x-write-pointer-available", VmGenIdState,
> -                     write_pointer_available, true),
> -    DEFINE_PROP_END_OF_LIST(),
> -};
> -
> static void vmgenid_realize(DeviceState *dev, Error **errp)
> {
>     VmGenIdState *vms = VMGENID(dev);
> 
> -    if (!vms->write_pointer_available) {
> +    if (!bios_linker_loader_can_write_pointer()) {
>         error_setg(errp, "%s requires DMA write support in fw_cfg, "
>                    "which this machine type does not provide", VMGENID_DEVICE);
>         return;
> @@ -239,7 +233,6 @@ static void vmgenid_device_class_init(ObjectClass *klass, void *data)
>     dc->vmsd = &vmstate_vmgenid;
>     dc->realize = vmgenid_realize;
>     dc->hotpluggable = false;
> -    dc->props = vmgenid_properties;
> 
>     object_class_property_add_str(klass, VMGENID_GUID, NULL,
>                                   vmgenid_set_guid, NULL);
> -- 
> 2.13.1.395.gf7b71de06
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack Marc-André Lureau
  2017-06-29 14:11   ` Michael S. Tsirkin
  2017-07-02  3:09   ` Ben Warren
@ 2017-07-03 14:48   ` Eduardo Habkost
  2017-07-03 18:06   ` Laszlo Ersek
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 31+ messages in thread
From: Eduardo Habkost @ 2017-07-03 14:48 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: qemu-devel, imammedo, anderson, berrange, lersek,
	Michael S. Tsirkin, Ben Warren

On Thu, Jun 29, 2017 at 03:23:04PM +0200, Marc-André Lureau wrote:
> This compat property sole function is to prevent the device from being
> instantiated. Instead of requiring an extra compat property, check if
> fw_cfg has DMA enabled.
> 
> This has the additional benefit of handling other cases properly, like:
> 
>   $ qemu-system-x86_64 -device vmgenid -machine none
>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
>   [boots normally]
> 
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack Marc-André Lureau
                     ` (2 preceding siblings ...)
  2017-07-03 14:48   ` Eduardo Habkost
@ 2017-07-03 18:06   ` Laszlo Ersek
  2017-07-03 18:27     ` Eduardo Habkost
  2017-07-03 18:21   ` Michael S. Tsirkin
  2017-07-03 18:38   ` Michael S. Tsirkin
  5 siblings, 1 reply; 31+ messages in thread
From: Laszlo Ersek @ 2017-07-03 18:06 UTC (permalink / raw)
  To: Marc-André Lureau, qemu-devel
  Cc: ehabkost, Ben Warren, Michael S. Tsirkin, anderson, imammedo

On 06/29/17 15:23, Marc-André Lureau wrote:
> This compat property sole function is to prevent the device from being
> instantiated. Instead of requiring an extra compat property, check if
> fw_cfg has DMA enabled.
> 
> This has the additional benefit of handling other cases properly, like:
> 
>   $ qemu-system-x86_64 -device vmgenid -machine none
>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
>   [boots normally]
> 
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  include/hw/acpi/bios-linker-loader.h | 2 ++
>  include/hw/compat.h                  | 4 ----
>  hw/acpi/bios-linker-loader.c         | 6 ++++++
>  hw/acpi/vmgenid.c                    | 9 +--------
>  4 files changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
> index efe17b0b9c..a711dbced8 100644
> --- a/include/hw/acpi/bios-linker-loader.h
> +++ b/include/hw/acpi/bios-linker-loader.h
> @@ -7,6 +7,8 @@ typedef struct BIOSLinker {
>      GArray *file_list;
>  } BIOSLinker;
>  
> +bool bios_linker_loader_can_write_pointer(void);
> +
>  BIOSLinker *bios_linker_loader_init(void);
>  
>  void bios_linker_loader_alloc(BIOSLinker *linker,
> diff --git a/include/hw/compat.h b/include/hw/compat.h
> index 26cd5851a5..36f02179ac 100644
> --- a/include/hw/compat.h
> +++ b/include/hw/compat.h
> @@ -150,10 +150,6 @@
>          .driver   = "fw_cfg_io",\
>          .property = "dma_enabled",\
>          .value    = "off",\
> -    },{\
> -        .driver   = "vmgenid",\
> -        .property = "x-write-pointer-available",\
> -        .value    = "off",\
>      },
>  
>  #define HW_COMPAT_2_3 \
> diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
> index 046183a0f1..587d62cb93 100644
> --- a/hw/acpi/bios-linker-loader.c
> +++ b/hw/acpi/bios-linker-loader.c
> @@ -168,6 +168,12 @@ bios_linker_find_file(const BIOSLinker *linker, const char *name)
>      return NULL;
>  }
>  
> +bool bios_linker_loader_can_write_pointer(void)
> +{
> +    FWCfgState *fw_cfg = fw_cfg_find();
> +    return fw_cfg && fw_cfg_dma_enabled(fw_cfg);
> +}
> +
>  /*
>   * bios_linker_loader_alloc: ask guest to load file into guest memory.
>   *
> diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
> index a32b847fe0..ab5da293fd 100644
> --- a/hw/acpi/vmgenid.c
> +++ b/hw/acpi/vmgenid.c
> @@ -205,17 +205,11 @@ static void vmgenid_handle_reset(void *opaque)
>      memset(vms->vmgenid_addr_le, 0, ARRAY_SIZE(vms->vmgenid_addr_le));
>  }
>  
> -static Property vmgenid_properties[] = {
> -    DEFINE_PROP_BOOL("x-write-pointer-available", VmGenIdState,
> -                     write_pointer_available, true),
> -    DEFINE_PROP_END_OF_LIST(),
> -};
> -
>  static void vmgenid_realize(DeviceState *dev, Error **errp)
>  {
>      VmGenIdState *vms = VMGENID(dev);
>  
> -    if (!vms->write_pointer_available) {
> +    if (!bios_linker_loader_can_write_pointer()) {
>          error_setg(errp, "%s requires DMA write support in fw_cfg, "
>                     "which this machine type does not provide", VMGENID_DEVICE);
>          return;
> @@ -239,7 +233,6 @@ static void vmgenid_device_class_init(ObjectClass *klass, void *data)
>      dc->vmsd = &vmstate_vmgenid;
>      dc->realize = vmgenid_realize;
>      dc->hotpluggable = false;
> -    dc->props = vmgenid_properties;
>  
>      object_class_property_add_str(klass, VMGENID_GUID, NULL,
>                                    vmgenid_set_guid, NULL);
> 

I believe we discussed this approach back then (but I can't find the
relevant messages, of course).

What guarantees that, by the time you call fw_cfg_find() from
vmgenid_realize() -- that is, from the realize function of an
independent device --, the fw_cfg device will have been realized (with
its properties having taken their final values)? I don't see how the
ordering is guaranteed here; please explain (preferably in the commit
message).

Thanks,
Laszlo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack Marc-André Lureau
                     ` (3 preceding siblings ...)
  2017-07-03 18:06   ` Laszlo Ersek
@ 2017-07-03 18:21   ` Michael S. Tsirkin
  2017-07-03 18:38   ` Michael S. Tsirkin
  5 siblings, 0 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2017-07-03 18:21 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: qemu-devel, imammedo, anderson, berrange, ehabkost, lersek, Ben Warren

On Thu, Jun 29, 2017 at 03:23:04PM +0200, Marc-André Lureau wrote:
> This compat property sole function is to prevent the device from being
> instantiated. Instead of requiring an extra compat property, check if
> fw_cfg has DMA enabled.
> 
> This has the additional benefit of handling other cases properly, like:
> 
>   $ qemu-system-x86_64 -device vmgenid -machine none
>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
>   [boots normally]
> 
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Looks like this can go in as a stand-alone patch.
I wasn't copied on rest of patchset so I'm not sure
who's merging that, but I think this one can be split out.


> ---
>  include/hw/acpi/bios-linker-loader.h | 2 ++
>  include/hw/compat.h                  | 4 ----
>  hw/acpi/bios-linker-loader.c         | 6 ++++++
>  hw/acpi/vmgenid.c                    | 9 +--------
>  4 files changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
> index efe17b0b9c..a711dbced8 100644
> --- a/include/hw/acpi/bios-linker-loader.h
> +++ b/include/hw/acpi/bios-linker-loader.h
> @@ -7,6 +7,8 @@ typedef struct BIOSLinker {
>      GArray *file_list;
>  } BIOSLinker;
>  
> +bool bios_linker_loader_can_write_pointer(void);
> +
>  BIOSLinker *bios_linker_loader_init(void);
>  
>  void bios_linker_loader_alloc(BIOSLinker *linker,
> diff --git a/include/hw/compat.h b/include/hw/compat.h
> index 26cd5851a5..36f02179ac 100644
> --- a/include/hw/compat.h
> +++ b/include/hw/compat.h
> @@ -150,10 +150,6 @@
>          .driver   = "fw_cfg_io",\
>          .property = "dma_enabled",\
>          .value    = "off",\
> -    },{\
> -        .driver   = "vmgenid",\
> -        .property = "x-write-pointer-available",\
> -        .value    = "off",\
>      },
>  
>  #define HW_COMPAT_2_3 \
> diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
> index 046183a0f1..587d62cb93 100644
> --- a/hw/acpi/bios-linker-loader.c
> +++ b/hw/acpi/bios-linker-loader.c
> @@ -168,6 +168,12 @@ bios_linker_find_file(const BIOSLinker *linker, const char *name)
>      return NULL;
>  }
>  
> +bool bios_linker_loader_can_write_pointer(void)
> +{
> +    FWCfgState *fw_cfg = fw_cfg_find();
> +    return fw_cfg && fw_cfg_dma_enabled(fw_cfg);
> +}
> +
>  /*
>   * bios_linker_loader_alloc: ask guest to load file into guest memory.
>   *
> diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
> index a32b847fe0..ab5da293fd 100644
> --- a/hw/acpi/vmgenid.c
> +++ b/hw/acpi/vmgenid.c
> @@ -205,17 +205,11 @@ static void vmgenid_handle_reset(void *opaque)
>      memset(vms->vmgenid_addr_le, 0, ARRAY_SIZE(vms->vmgenid_addr_le));
>  }
>  
> -static Property vmgenid_properties[] = {
> -    DEFINE_PROP_BOOL("x-write-pointer-available", VmGenIdState,
> -                     write_pointer_available, true),
> -    DEFINE_PROP_END_OF_LIST(),
> -};
> -
>  static void vmgenid_realize(DeviceState *dev, Error **errp)
>  {
>      VmGenIdState *vms = VMGENID(dev);
>  
> -    if (!vms->write_pointer_available) {
> +    if (!bios_linker_loader_can_write_pointer()) {
>          error_setg(errp, "%s requires DMA write support in fw_cfg, "
>                     "which this machine type does not provide", VMGENID_DEVICE);
>          return;
> @@ -239,7 +233,6 @@ static void vmgenid_device_class_init(ObjectClass *klass, void *data)
>      dc->vmsd = &vmstate_vmgenid;
>      dc->realize = vmgenid_realize;
>      dc->hotpluggable = false;
> -    dc->props = vmgenid_properties;
>  
>      object_class_property_add_str(klass, VMGENID_GUID, NULL,
>                                    vmgenid_set_guid, NULL);
> -- 
> 2.13.1.395.gf7b71de06

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-07-03 18:06   ` Laszlo Ersek
@ 2017-07-03 18:27     ` Eduardo Habkost
  2017-07-03 18:35       ` Laszlo Ersek
  0 siblings, 1 reply; 31+ messages in thread
From: Eduardo Habkost @ 2017-07-03 18:27 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Marc-André Lureau, qemu-devel, Ben Warren,
	Michael S. Tsirkin, anderson, imammedo

On Mon, Jul 03, 2017 at 08:06:33PM +0200, Laszlo Ersek wrote:
> On 06/29/17 15:23, Marc-André Lureau wrote:
> > This compat property sole function is to prevent the device from being
> > instantiated. Instead of requiring an extra compat property, check if
> > fw_cfg has DMA enabled.
> > 
> > This has the additional benefit of handling other cases properly, like:
> > 
> >   $ qemu-system-x86_64 -device vmgenid -machine none
> >   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
> >   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
> >   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
> >   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
> >   [boots normally]
> > 
> > Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > ---
> >  include/hw/acpi/bios-linker-loader.h | 2 ++
> >  include/hw/compat.h                  | 4 ----
> >  hw/acpi/bios-linker-loader.c         | 6 ++++++
> >  hw/acpi/vmgenid.c                    | 9 +--------
> >  4 files changed, 9 insertions(+), 12 deletions(-)
> > 
> > diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
> > index efe17b0b9c..a711dbced8 100644
> > --- a/include/hw/acpi/bios-linker-loader.h
> > +++ b/include/hw/acpi/bios-linker-loader.h
> > @@ -7,6 +7,8 @@ typedef struct BIOSLinker {
> >      GArray *file_list;
> >  } BIOSLinker;
> >  
> > +bool bios_linker_loader_can_write_pointer(void);
> > +
> >  BIOSLinker *bios_linker_loader_init(void);
> >  
> >  void bios_linker_loader_alloc(BIOSLinker *linker,
> > diff --git a/include/hw/compat.h b/include/hw/compat.h
> > index 26cd5851a5..36f02179ac 100644
> > --- a/include/hw/compat.h
> > +++ b/include/hw/compat.h
> > @@ -150,10 +150,6 @@
> >          .driver   = "fw_cfg_io",\
> >          .property = "dma_enabled",\
> >          .value    = "off",\
> > -    },{\
> > -        .driver   = "vmgenid",\
> > -        .property = "x-write-pointer-available",\
> > -        .value    = "off",\
> >      },
> >  
> >  #define HW_COMPAT_2_3 \
> > diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
> > index 046183a0f1..587d62cb93 100644
> > --- a/hw/acpi/bios-linker-loader.c
> > +++ b/hw/acpi/bios-linker-loader.c
> > @@ -168,6 +168,12 @@ bios_linker_find_file(const BIOSLinker *linker, const char *name)
> >      return NULL;
> >  }
> >  
> > +bool bios_linker_loader_can_write_pointer(void)
> > +{
> > +    FWCfgState *fw_cfg = fw_cfg_find();
> > +    return fw_cfg && fw_cfg_dma_enabled(fw_cfg);
> > +}
> > +
> >  /*
> >   * bios_linker_loader_alloc: ask guest to load file into guest memory.
> >   *
> > diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
> > index a32b847fe0..ab5da293fd 100644
> > --- a/hw/acpi/vmgenid.c
> > +++ b/hw/acpi/vmgenid.c
> > @@ -205,17 +205,11 @@ static void vmgenid_handle_reset(void *opaque)
> >      memset(vms->vmgenid_addr_le, 0, ARRAY_SIZE(vms->vmgenid_addr_le));
> >  }
> >  
> > -static Property vmgenid_properties[] = {
> > -    DEFINE_PROP_BOOL("x-write-pointer-available", VmGenIdState,
> > -                     write_pointer_available, true),
> > -    DEFINE_PROP_END_OF_LIST(),
> > -};
> > -
> >  static void vmgenid_realize(DeviceState *dev, Error **errp)
> >  {
> >      VmGenIdState *vms = VMGENID(dev);
> >  
> > -    if (!vms->write_pointer_available) {
> > +    if (!bios_linker_loader_can_write_pointer()) {
> >          error_setg(errp, "%s requires DMA write support in fw_cfg, "
> >                     "which this machine type does not provide", VMGENID_DEVICE);
> >          return;
> > @@ -239,7 +233,6 @@ static void vmgenid_device_class_init(ObjectClass *klass, void *data)
> >      dc->vmsd = &vmstate_vmgenid;
> >      dc->realize = vmgenid_realize;
> >      dc->hotpluggable = false;
> > -    dc->props = vmgenid_properties;
> >  
> >      object_class_property_add_str(klass, VMGENID_GUID, NULL,
> >                                    vmgenid_set_guid, NULL);
> > 
> 
> I believe we discussed this approach back then (but I can't find the
> relevant messages, of course).
> 
> What guarantees that, by the time you call fw_cfg_find() from
> vmgenid_realize() -- that is, from the realize function of an
> independent device --, the fw_cfg device will have been realized (with
> its properties having taken their final values)? I don't see how the
> ordering is guaranteed here; please explain (preferably in the commit
> message).

Good point.  What makes this work is the fact that fw_cfg is a built-in
device that is initialized very early by the machine init code.  We
could remove that requirement, but it would require reporting errors
from the machine_done notifier (in acpi_setup(), probably).  I'm not
sure it would be worth the extra complexity.

We have at least one other device that also assumes fw_cfg_find() can be
safely used on realize: pvpanic.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-07-03 18:27     ` Eduardo Habkost
@ 2017-07-03 18:35       ` Laszlo Ersek
  0 siblings, 0 replies; 31+ messages in thread
From: Laszlo Ersek @ 2017-07-03 18:35 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Marc-André Lureau, qemu-devel, Ben Warren,
	Michael S. Tsirkin, anderson, imammedo, Mark Cave-Ayland

On 07/03/17 20:27, Eduardo Habkost wrote:
> On Mon, Jul 03, 2017 at 08:06:33PM +0200, Laszlo Ersek wrote:
>> On 06/29/17 15:23, Marc-André Lureau wrote:
>>> This compat property sole function is to prevent the device from being
>>> instantiated. Instead of requiring an extra compat property, check if
>>> fw_cfg has DMA enabled.
>>>
>>> This has the additional benefit of handling other cases properly, like:
>>>
>>>   $ qemu-system-x86_64 -device vmgenid -machine none
>>>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>>>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
>>>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>>>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
>>>   [boots normally]
>>>
>>> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
>>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>>> ---
>>>  include/hw/acpi/bios-linker-loader.h | 2 ++
>>>  include/hw/compat.h                  | 4 ----
>>>  hw/acpi/bios-linker-loader.c         | 6 ++++++
>>>  hw/acpi/vmgenid.c                    | 9 +--------
>>>  4 files changed, 9 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
>>> index efe17b0b9c..a711dbced8 100644
>>> --- a/include/hw/acpi/bios-linker-loader.h
>>> +++ b/include/hw/acpi/bios-linker-loader.h
>>> @@ -7,6 +7,8 @@ typedef struct BIOSLinker {
>>>      GArray *file_list;
>>>  } BIOSLinker;
>>>  
>>> +bool bios_linker_loader_can_write_pointer(void);
>>> +
>>>  BIOSLinker *bios_linker_loader_init(void);
>>>  
>>>  void bios_linker_loader_alloc(BIOSLinker *linker,
>>> diff --git a/include/hw/compat.h b/include/hw/compat.h
>>> index 26cd5851a5..36f02179ac 100644
>>> --- a/include/hw/compat.h
>>> +++ b/include/hw/compat.h
>>> @@ -150,10 +150,6 @@
>>>          .driver   = "fw_cfg_io",\
>>>          .property = "dma_enabled",\
>>>          .value    = "off",\
>>> -    },{\
>>> -        .driver   = "vmgenid",\
>>> -        .property = "x-write-pointer-available",\
>>> -        .value    = "off",\
>>>      },
>>>  
>>>  #define HW_COMPAT_2_3 \
>>> diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
>>> index 046183a0f1..587d62cb93 100644
>>> --- a/hw/acpi/bios-linker-loader.c
>>> +++ b/hw/acpi/bios-linker-loader.c
>>> @@ -168,6 +168,12 @@ bios_linker_find_file(const BIOSLinker *linker, const char *name)
>>>      return NULL;
>>>  }
>>>  
>>> +bool bios_linker_loader_can_write_pointer(void)
>>> +{
>>> +    FWCfgState *fw_cfg = fw_cfg_find();
>>> +    return fw_cfg && fw_cfg_dma_enabled(fw_cfg);
>>> +}
>>> +
>>>  /*
>>>   * bios_linker_loader_alloc: ask guest to load file into guest memory.
>>>   *
>>> diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
>>> index a32b847fe0..ab5da293fd 100644
>>> --- a/hw/acpi/vmgenid.c
>>> +++ b/hw/acpi/vmgenid.c
>>> @@ -205,17 +205,11 @@ static void vmgenid_handle_reset(void *opaque)
>>>      memset(vms->vmgenid_addr_le, 0, ARRAY_SIZE(vms->vmgenid_addr_le));
>>>  }
>>>  
>>> -static Property vmgenid_properties[] = {
>>> -    DEFINE_PROP_BOOL("x-write-pointer-available", VmGenIdState,
>>> -                     write_pointer_available, true),
>>> -    DEFINE_PROP_END_OF_LIST(),
>>> -};
>>> -
>>>  static void vmgenid_realize(DeviceState *dev, Error **errp)
>>>  {
>>>      VmGenIdState *vms = VMGENID(dev);
>>>  
>>> -    if (!vms->write_pointer_available) {
>>> +    if (!bios_linker_loader_can_write_pointer()) {
>>>          error_setg(errp, "%s requires DMA write support in fw_cfg, "
>>>                     "which this machine type does not provide", VMGENID_DEVICE);
>>>          return;
>>> @@ -239,7 +233,6 @@ static void vmgenid_device_class_init(ObjectClass *klass, void *data)
>>>      dc->vmsd = &vmstate_vmgenid;
>>>      dc->realize = vmgenid_realize;
>>>      dc->hotpluggable = false;
>>> -    dc->props = vmgenid_properties;
>>>  
>>>      object_class_property_add_str(klass, VMGENID_GUID, NULL,
>>>                                    vmgenid_set_guid, NULL);
>>>
>>
>> I believe we discussed this approach back then (but I can't find the
>> relevant messages, of course).
>>
>> What guarantees that, by the time you call fw_cfg_find() from
>> vmgenid_realize() -- that is, from the realize function of an
>> independent device --, the fw_cfg device will have been realized (with
>> its properties having taken their final values)? I don't see how the
>> ordering is guaranteed here; please explain (preferably in the commit
>> message).
> 
> Good point.  What makes this work is the fact that fw_cfg is a built-in
> device that is initialized very early by the machine init code.  We
> could remove that requirement, but it would require reporting errors
> from the machine_done notifier (in acpi_setup(), probably).  I'm not
> sure it would be worth the extra complexity.

Thank you for the explanation -- I agree we shouldn't up-end the code
now, but I think two sets of documentation should be added:

- the commit message should capture your explanation

- the bios_linker_loader_can_write_pointer() function should have a
reminder that board code must realize fw_cfg first, as a fixed (sysbus?)
device, before another device realize function reaches
bios_linker_loader_can_write_pointer().

(CC'ing Mark Cave-Ayland for any necessary co-ordination.)

> We have at least one other device that also assumes fw_cfg_find() can be
> safely used on realize: pvpanic.

Good point.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack Marc-André Lureau
                     ` (4 preceding siblings ...)
  2017-07-03 18:21   ` Michael S. Tsirkin
@ 2017-07-03 18:38   ` Michael S. Tsirkin
  2017-07-03 18:50     ` Eduardo Habkost
  5 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2017-07-03 18:38 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: qemu-devel, imammedo, anderson, berrange, ehabkost, lersek, Ben Warren

On Thu, Jun 29, 2017 at 03:23:04PM +0200, Marc-André Lureau wrote:
> This compat property sole function is to prevent the device from being
> instantiated. Instead of requiring an extra compat property, check if
> fw_cfg has DMA enabled.
> 
> This has the additional benefit of handling other cases properly, like:
> 
>   $ qemu-system-x86_64 -device vmgenid -machine none
>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
>   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
>   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
>   [boots normally]
> 
> Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>

It's a nice cleanup, but I suspect we need to first implement
a framework for initialization ordering. I don't much like it
that we are adding more dependencies to the current bag of hacks.


> ---
>  include/hw/acpi/bios-linker-loader.h | 2 ++
>  include/hw/compat.h                  | 4 ----
>  hw/acpi/bios-linker-loader.c         | 6 ++++++
>  hw/acpi/vmgenid.c                    | 9 +--------
>  4 files changed, 9 insertions(+), 12 deletions(-)
> 
> diff --git a/include/hw/acpi/bios-linker-loader.h b/include/hw/acpi/bios-linker-loader.h
> index efe17b0b9c..a711dbced8 100644
> --- a/include/hw/acpi/bios-linker-loader.h
> +++ b/include/hw/acpi/bios-linker-loader.h
> @@ -7,6 +7,8 @@ typedef struct BIOSLinker {
>      GArray *file_list;
>  } BIOSLinker;
>  
> +bool bios_linker_loader_can_write_pointer(void);
> +
>  BIOSLinker *bios_linker_loader_init(void);
>  
>  void bios_linker_loader_alloc(BIOSLinker *linker,
> diff --git a/include/hw/compat.h b/include/hw/compat.h
> index 26cd5851a5..36f02179ac 100644
> --- a/include/hw/compat.h
> +++ b/include/hw/compat.h
> @@ -150,10 +150,6 @@
>          .driver   = "fw_cfg_io",\
>          .property = "dma_enabled",\
>          .value    = "off",\
> -    },{\
> -        .driver   = "vmgenid",\
> -        .property = "x-write-pointer-available",\
> -        .value    = "off",\
>      },
>  
>  #define HW_COMPAT_2_3 \
> diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
> index 046183a0f1..587d62cb93 100644
> --- a/hw/acpi/bios-linker-loader.c
> +++ b/hw/acpi/bios-linker-loader.c
> @@ -168,6 +168,12 @@ bios_linker_find_file(const BIOSLinker *linker, const char *name)
>      return NULL;
>  }
>  
> +bool bios_linker_loader_can_write_pointer(void)
> +{
> +    FWCfgState *fw_cfg = fw_cfg_find();
> +    return fw_cfg && fw_cfg_dma_enabled(fw_cfg);
> +}
> +
>  /*
>   * bios_linker_loader_alloc: ask guest to load file into guest memory.
>   *
> diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
> index a32b847fe0..ab5da293fd 100644
> --- a/hw/acpi/vmgenid.c
> +++ b/hw/acpi/vmgenid.c
> @@ -205,17 +205,11 @@ static void vmgenid_handle_reset(void *opaque)
>      memset(vms->vmgenid_addr_le, 0, ARRAY_SIZE(vms->vmgenid_addr_le));
>  }
>  
> -static Property vmgenid_properties[] = {
> -    DEFINE_PROP_BOOL("x-write-pointer-available", VmGenIdState,
> -                     write_pointer_available, true),
> -    DEFINE_PROP_END_OF_LIST(),
> -};
> -
>  static void vmgenid_realize(DeviceState *dev, Error **errp)
>  {
>      VmGenIdState *vms = VMGENID(dev);
>  
> -    if (!vms->write_pointer_available) {
> +    if (!bios_linker_loader_can_write_pointer()) {
>          error_setg(errp, "%s requires DMA write support in fw_cfg, "
>                     "which this machine type does not provide", VMGENID_DEVICE);
>          return;
> @@ -239,7 +233,6 @@ static void vmgenid_device_class_init(ObjectClass *klass, void *data)
>      dc->vmsd = &vmstate_vmgenid;
>      dc->realize = vmgenid_realize;
>      dc->hotpluggable = false;
> -    dc->props = vmgenid_properties;
>  
>      object_class_property_add_str(klass, VMGENID_GUID, NULL,
>                                    vmgenid_set_guid, NULL);
> -- 
> 2.13.1.395.gf7b71de06

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-07-03 18:38   ` Michael S. Tsirkin
@ 2017-07-03 18:50     ` Eduardo Habkost
  2017-07-03 19:51       ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Eduardo Habkost @ 2017-07-03 18:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, qemu-devel, imammedo, anderson, berrange,
	lersek, Ben Warren

On Mon, Jul 03, 2017 at 09:38:52PM +0300, Michael S. Tsirkin wrote:
> On Thu, Jun 29, 2017 at 03:23:04PM +0200, Marc-André Lureau wrote:
> > This compat property sole function is to prevent the device from being
> > instantiated. Instead of requiring an extra compat property, check if
> > fw_cfg has DMA enabled.
> > 
> > This has the additional benefit of handling other cases properly, like:
> > 
> >   $ qemu-system-x86_64 -device vmgenid -machine none
> >   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
> >   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
> >   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
> >   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
> >   [boots normally]
> > 
> > Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> It's a nice cleanup, but I suspect we need to first implement
> a framework for initialization ordering. I don't much like it
> that we are adding more dependencies to the current bag of hacks.

I agree we should address this, but in this case there's no need to
introduce new mechanisms for initialization ordering if we just check
the dependencies on machine_done notifier or acpi_setup() (which is
called by machine_done).

-- 
Eduardo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack
  2017-07-03 18:50     ` Eduardo Habkost
@ 2017-07-03 19:51       ` Michael S. Tsirkin
  0 siblings, 0 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2017-07-03 19:51 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Marc-André Lureau, qemu-devel, imammedo, anderson, berrange,
	lersek, Ben Warren

On Mon, Jul 03, 2017 at 03:50:52PM -0300, Eduardo Habkost wrote:
> On Mon, Jul 03, 2017 at 09:38:52PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Jun 29, 2017 at 03:23:04PM +0200, Marc-André Lureau wrote:
> > > This compat property sole function is to prevent the device from being
> > > instantiated. Instead of requiring an extra compat property, check if
> > > fw_cfg has DMA enabled.
> > > 
> > > This has the additional benefit of handling other cases properly, like:
> > > 
> > >   $ qemu-system-x86_64 -device vmgenid -machine none
> > >   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
> > >   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.9 -global fw_cfg.dma_enabled=off
> > >   qemu-system-x86_64: -device vmgenid: vmgenid requires DMA write support in fw_cfg, which this machine type does not provide
> > >   $ qemu-system-x86_64 -device vmgenid -machine pc-i440fx-2.6 -global fw_cfg.dma_enabled=on
> > >   [boots normally]
> > > 
> > > Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
> > > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > 
> > It's a nice cleanup, but I suspect we need to first implement
> > a framework for initialization ordering. I don't much like it
> > that we are adding more dependencies to the current bag of hacks.
> 
> I agree we should address this, but in this case there's no need to
> introduce new mechanisms for initialization ordering if we just check
> the dependencies on machine_done notifier or acpi_setup() (which is
> called by machine_done).

I guess what should fail is attempt to register a writeable blob.
This sounds reasonable.

> -- 
> Eduardo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] acpi: add vmcoreinfo device
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 2/7] acpi: add vmcoreinfo device Marc-André Lureau
@ 2017-07-04 22:07   ` Laszlo Ersek
  2017-07-05 13:54     ` Marc-André Lureau
  0 siblings, 1 reply; 31+ messages in thread
From: Laszlo Ersek @ 2017-07-04 22:07 UTC (permalink / raw)
  To: Marc-André Lureau, qemu-devel
  Cc: ehabkost, Michael S. Tsirkin, Paolo Bonzini, anderson, imammedo,
	Richard Henderson

comments below

On 06/29/17 15:23, Marc-André Lureau wrote:
> The VM coreinfo (vmcoreinfo) device is an emulated device which
> exposes a 4k memory range to the guest to store various informations
> useful to debug the guest OS. (it is greatly inspired by the VMGENID
> device implementation)
> 
> This is an early-boot alternative to the qemu-ga VMDUMP_INFO event
> proposed in "[PATCH 00/21] WIP: dump: add kaslr support".
> 
> A proof-of-concept kernel module:
> https://github.com/elmarco/vmgenid-test/blob/master/qemuvmci-test.c
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  include/hw/acpi/aml-build.h        |   1 +
>  include/hw/acpi/vmcoreinfo.h       |  36 +++++++
>  hw/acpi/aml-build.c                |   2 +
>  hw/acpi/vmcoreinfo.c               | 198 +++++++++++++++++++++++++++++++++++++
>  hw/i386/acpi-build.c               |  14 +++
>  default-configs/arm-softmmu.mak    |   1 +
>  default-configs/i386-softmmu.mak   |   1 +
>  default-configs/x86_64-softmmu.mak |   1 +
>  docs/specs/vmcoreinfo.txt          | 138 ++++++++++++++++++++++++++
>  hw/acpi/Makefile.objs              |   1 +
>  10 files changed, 393 insertions(+)
>  create mode 100644 include/hw/acpi/vmcoreinfo.h
>  create mode 100644 hw/acpi/vmcoreinfo.c
>  create mode 100644 docs/specs/vmcoreinfo.txt
>
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 88d0738d76..cf781bcd34 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -211,6 +211,7 @@ struct AcpiBuildTables {
>      GArray *rsdp;
>      GArray *tcpalog;
>      GArray *vmgenid;
> +    GArray *vmcoreinfo;
>      BIOSLinker *linker;
>  } AcpiBuildTables;
>  
> diff --git a/include/hw/acpi/vmcoreinfo.h b/include/hw/acpi/vmcoreinfo.h
> new file mode 100644
> index 0000000000..40fe99c3ed
> --- /dev/null
> +++ b/include/hw/acpi/vmcoreinfo.h
> @@ -0,0 +1,36 @@
> +#ifndef ACPI_VMCOREINFO_H
> +#define ACPI_VMCOREINFO_H
> +
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "hw/qdev.h"
> +
> +#define VMCOREINFO_DEVICE           "vmcoreinfo"
> +#define VMCOREINFO_FW_CFG_FILE      "etc/vmcoreinfo"
> +#define VMCOREINFO_ADDR_FW_CFG_FILE "etc/vmcoreinfo-addr"
> +
> +#define VMCOREINFO_FW_CFG_SIZE      4096 /* Occupy a page of memory */
> +#define VMCOREINFO_OFFSET           40   /* allow space for
> +                                          * OVMF SDT Header Probe Supressor
> +                                          */
> +
> +#define VMCOREINFO(obj) OBJECT_CHECK(VmcoreinfoState, (obj), VMCOREINFO_DEVICE)
> +
> +typedef struct VmcoreinfoState {

I think this should be spelled with a bit more camel-casing, like
VMCoreInfoState or some such.

> +    DeviceClass parent_obj;
> +    uint8_t vmcoreinfo_addr_le[8];   /* Address of memory region */
> +    bool write_pointer_available;
> +} VmcoreinfoState;
> +
> +/* returns NULL unless there is exactly one device */
> +static inline Object *find_vmcoreinfo_dev(void)
> +{
> +    return object_resolve_path_type("", VMCOREINFO_DEVICE, NULL);
> +}
> +
> +void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray *table_data,
> +                           GArray *vmci, BIOSLinker *linker);
> +void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray *vmci);
> +bool vmcoreinfo_get(VmcoreinfoState *vis, uint64_t *paddr, uint64_t *size,
> +                    Error **errp);
> +
> +#endif
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 36a6cc450e..47043ade4a 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1561,6 +1561,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
>      tables->table_data = g_array_new(false, true /* clear */, 1);
>      tables->tcpalog = g_array_new(false, true /* clear */, 1);
>      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> +    tables->vmcoreinfo = g_array_new(false, true /* clear */, 1);
>      tables->linker = bios_linker_loader_init();
>  }
>  
> @@ -1571,6 +1572,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
>      g_array_free(tables->table_data, true);
>      g_array_free(tables->tcpalog, mfre);
>      g_array_free(tables->vmgenid, mfre);
> +    g_array_free(tables->vmcoreinfo, mfre);
>  }
>  
>  /* Build rsdt table */
> diff --git a/hw/acpi/vmcoreinfo.c b/hw/acpi/vmcoreinfo.c
> new file mode 100644
> index 0000000000..216e0bb83a
> --- /dev/null
> +++ b/hw/acpi/vmcoreinfo.c
> @@ -0,0 +1,198 @@
> +/*
> + *  Virtual Machine coreinfo device
> + *  (based on Virtual Machine Generation ID Device)
> + *
> + *  Copyright (C) 2017 Red Hat, Inc.
> + *  Copyright (C) 2017 Skyport Systems.
> + *
> + *  Authors: Marc-André Lureau <marcandre.lureau@redhat.com>
> + *           Ben Warren <ben@skyportsystems.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +#include "qemu/osdep.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/vmcoreinfo.h"
> +#include "hw/nvram/fw_cfg.h"
> +#include "sysemu/sysemu.h"
> +#include "qapi/error.h"
> +
> +void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray *table_data,
> +                           GArray *vmci, BIOSLinker *linker)
> +{
> +    Aml *ssdt, *dev, *scope, *method, *addr, *if_ctx;
> +    uint32_t vgia_offset;

This should be called "vcia_offset".

> +
> +    g_array_set_size(vmci, VMCOREINFO_FW_CFG_SIZE);
> +
> +    /* Put this in a separate SSDT table */
> +    ssdt = init_aml_allocator();
> +
> +    /* Reserve space for header */
> +    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
> +
> +    /* Storage address */
> +    vgia_offset = table_data->len +
> +        build_append_named_dword(ssdt->buf, "VCIA");
> +    scope = aml_scope("\\_SB");
> +    dev = aml_device("VMCI");
> +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMUVMCI")));
> +
> +    /* Simple status method to check that address is linked and non-zero */
> +    method = aml_method("_STA", 0, AML_NOTSERIALIZED);
> +    addr = aml_local(0);
> +    aml_append(method, aml_store(aml_int(0xf), addr));
> +    if_ctx = aml_if(aml_equal(aml_name("VCIA"), aml_int(0)));
> +    aml_append(if_ctx, aml_store(aml_int(0), addr));
> +    aml_append(method, if_ctx);
> +    aml_append(method, aml_return(addr));
> +    aml_append(dev, method);
> +
> +    /* the ADDR method returns two 32-bit words representing the lower and
> +     * upper halves of the physical address of the vmcoreinfo area
> +     */
> +    method = aml_method("ADDR", 0, AML_NOTSERIALIZED);
> +
> +    addr = aml_local(0);
> +    aml_append(method, aml_store(aml_package(2), addr));
> +
> +    aml_append(method, aml_store(aml_add(aml_name("VCIA"),
> +                                         aml_int(VMCOREINFO_OFFSET), NULL),
> +                                 aml_index(addr, aml_int(0))));
> +    aml_append(method, aml_store(aml_int(0), aml_index(addr, aml_int(1))));
> +    aml_append(method, aml_return(addr));
> +
> +    aml_append(dev, method);
> +    aml_append(scope, dev);
> +    aml_append(ssdt, scope);
> +
> +    g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> +
> +    /* Allocate guest memory */
> +    bios_linker_loader_alloc(linker, VMCOREINFO_FW_CFG_FILE, vmci, 4096,
> +                             false /* page boundary, high memory */);
> +
> +    /* Patch address of vmcoreinfo fw_cfg blob into the ADDR fw_cfg
> +     * blob so QEMU can read the info from there.  The address is
> +     * expected to be < 4GB, but write 64 bits anyway.
> +     * The address that is patched in is offset in order to implement
> +     * the "OVMF SDT Header probe suppressor"
> +     * see docs/specs/vmcoreinfo.txt for more details.
> +     */
> +    bios_linker_loader_write_pointer(linker,
> +        VMCOREINFO_ADDR_FW_CFG_FILE, 0, sizeof(uint64_t),
> +        VMCOREINFO_FW_CFG_FILE, VMCOREINFO_OFFSET);
> +
> +    /* Patch address of vmcoreinfo into the AML so OSPM can retrieve
> +     * and read it.  Note that while we provide storage for 64 bits, only
> +     * the least-signficant 32 get patched into AML.
> +     */
> +    bios_linker_loader_add_pointer(linker,
> +        ACPI_BUILD_TABLE_FILE, vgia_offset, sizeof(uint32_t),
> +        VMCOREINFO_FW_CFG_FILE, 0);
> +
> +    build_header(linker, table_data,
> +        (void *)(table_data->data + table_data->len - ssdt->buf->len),
> +        "SSDT", ssdt->buf->len, 1, NULL, "VMCOREIN");
> +    free_aml_allocator();
> +}
> +
> +void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray *vmci)
> +{
> +    /* Create a read-only fw_cfg file for vmcoreinfo allocation */
> +    /* XXX: linker could learn to allocate without backing fw_cfg? */

Yes, and a number of other things, as I'm sure Igor and MST will point
out upon reading the patch :)

> +    fw_cfg_add_file(s, VMCOREINFO_FW_CFG_FILE, vmci->data,
> +                    VMCOREINFO_FW_CFG_SIZE);
> +    /* Create a read-write fw_cfg file for Address */
> +    fw_cfg_add_file_callback(s, VMCOREINFO_ADDR_FW_CFG_FILE, NULL, NULL,
> +                             vis->vmcoreinfo_addr_le,
> +                             ARRAY_SIZE(vis->vmcoreinfo_addr_le), false);
> +}
> +
> +bool vmcoreinfo_get(VmcoreinfoState *vis,
> +                    uint64_t *paddr, uint64_t *size,
> +                    Error **errp)
> +{
> +    uint32_t vmcoreinfo_addr;
> +    uint32_t version;
> +
> +    assert(vis);
> +    assert(paddr);
> +    assert(size);
> +
> +    memcpy(&vmcoreinfo_addr, vis->vmcoreinfo_addr_le, sizeof(vmcoreinfo_addr));
> +    vmcoreinfo_addr = le32_to_cpu(vmcoreinfo_addr);
> +    if (!vmcoreinfo_addr) {
> +        error_setg(errp, "BIOS has not yet written the address of %s",
> +                   VMCOREINFO_DEVICE);
> +        return false;
> +    }
> +
> +    cpu_physical_memory_read(vmcoreinfo_addr, &version, sizeof(version));
> +    if (version != 0) {
> +        error_setg(errp, "Unknown %s memory version", VMCOREINFO_DEVICE);
> +        return false;
> +    }
> +
> +    cpu_physical_memory_read(vmcoreinfo_addr + 4, paddr, sizeof(paddr));

This is wrong, it should be sizeof(*paddr).

> +    *paddr = le64_to_cpu(*paddr);
> +    cpu_physical_memory_read(vmcoreinfo_addr + 12, size, sizeof(size));

This is wrong for two reasons:
- first, it should be sizeof(*size),
- second, sizeof(*size) is 8, however, according to the design, the size
  field is 4 bytes wide. I think the function prototype should be
  updated to fix this.

> +    *size = le32_to_cpu(*size);
> +
> +    return true;
> +}
> +
> +static const VMStateDescription vmstate_vmcoreinfo = {
> +    .name = "vmcoreinfo",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT8_ARRAY(vmcoreinfo_addr_le, VmcoreinfoState, sizeof(uint64_t)),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static void vmcoreinfo_realize(DeviceState *dev, Error **errp)
> +{
> +    if (!bios_linker_loader_can_write_pointer()) {
> +        error_setg(errp, "%s requires DMA write support in fw_cfg, "
> +                   "which this machine type does not provide",
> +                   VMCOREINFO_DEVICE);
> +        return;
> +    }
> +
> +    /* Given that this function is executing, there is at least one VMCOREINFO
> +     * device. Check if there are several.
> +     */
> +    if (!find_vmcoreinfo_dev()) {
> +        error_setg(errp, "at most one %s device is permitted",
> +                   VMCOREINFO_DEVICE);
> +        return;
> +    }
> +}

You didn't copy the reset logic from vmgenid, but that's wrong -- all
device models that produce WRITE_POINTER linker/loader commands must
forget about guest-returned GPAs upon reset. Think of it this way: when
the guest executes WRITE_POINTER, it creates a reference to guest-phys
memory in QEMU; and when the guest initiates a reboot, guest-phys memory
is fully "invalidated", so all such references must be dropped in QEMU.

(S3 resume is an exception, because guest memory is preserved, but both
SeaBIOS and OVMF handle that specially -- they stash the WRITE_POINTER
commands in a condensed format during normal boot, and then replay them
during S3 resume. Thus, they restore the QEMU references discussed
abovem at S3 resume.)

In more practical terms, assume the guest boots Linux, the address is
passed back fine, then the Linux guest reboots, and then some other
guest OS is launched from the guest firmware and/or bootloader (or we
just stay in the firmware / bootloader indefinitely). If a dump is
requested at this point, QEMU shouldn't go looking for the VMCOREINFO
ELF note at the previously communicated address.

The rest looks OK to me.

Thanks,
Laszlo

> +
> +static void vmcoreinfo_device_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->vmsd = &vmstate_vmcoreinfo;
> +    dc->realize = vmcoreinfo_realize;
> +    dc->hotpluggable = false;
> +}
> +
> +static const TypeInfo vmcoreinfo_device_info = {
> +    .name          = VMCOREINFO_DEVICE,
> +    .parent        = TYPE_DEVICE,
> +    .instance_size = sizeof(VmcoreinfoState),
> +    .class_init    = vmcoreinfo_device_class_init,
> +};
> +
> +static void vmcoreinfo_register_types(void)
> +{
> +    type_register_static(&vmcoreinfo_device_info);
> +}
> +
> +type_init(vmcoreinfo_register_types)
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 0b8bc62b99..f8a6758841 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -43,6 +43,7 @@
>  #include "sysemu/tpm.h"
>  #include "hw/acpi/tpm.h"
>  #include "hw/acpi/vmgenid.h"
> +#include "hw/acpi/vmcoreinfo.h"
>  #include "sysemu/tpm_backend.h"
>  #include "hw/timer/mc146818rtc_regs.h"
>  #include "sysemu/numa.h"
> @@ -2641,6 +2642,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>      GArray *tables_blob = tables->table_data;
>      AcpiSlicOem slic_oem = { .id = NULL, .table_id = NULL };
>      Object *vmgenid_dev;
> +    Object *vmcoreinfo_dev;
>  
>      acpi_get_pm_info(&pm);
>      acpi_get_misc_info(&misc);
> @@ -2690,6 +2692,12 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>          vmgenid_build_acpi(VMGENID(vmgenid_dev), tables_blob,
>                             tables->vmgenid, tables->linker);
>      }
> +    vmcoreinfo_dev = find_vmcoreinfo_dev();
> +    if (vmcoreinfo_dev) {
> +        acpi_add_table(table_offsets, tables_blob);
> +        vmcoreinfo_build_acpi(VMCOREINFO(vmcoreinfo_dev), tables_blob,
> +                              tables->vmcoreinfo, tables->linker);
> +    }
>  
>      if (misc.has_hpet) {
>          acpi_add_table(table_offsets, tables_blob);
> @@ -2866,6 +2874,7 @@ void acpi_setup(void)
>      AcpiBuildTables tables;
>      AcpiBuildState *build_state;
>      Object *vmgenid_dev;
> +    Object *vmcoreinfo_dev;
>  
>      if (!pcms->fw_cfg) {
>          ACPI_BUILD_DPRINTF("No fw cfg. Bailing out.\n");
> @@ -2907,6 +2916,11 @@ void acpi_setup(void)
>          vmgenid_add_fw_cfg(VMGENID(vmgenid_dev), pcms->fw_cfg,
>                             tables.vmgenid);
>      }
> +    vmcoreinfo_dev = find_vmcoreinfo_dev();
> +    if (vmcoreinfo_dev) {
> +        vmcoreinfo_add_fw_cfg(VMCOREINFO(vmcoreinfo_dev), pcms->fw_cfg,
> +                              tables.vmcoreinfo);
> +    }
>  
>      if (!pcmc->rsdp_in_ram) {
>          /*
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 93e995d318..320dd3680d 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -120,6 +120,7 @@ CONFIG_XIO3130=y
>  CONFIG_IOH3420=y
>  CONFIG_I82801B11=y
>  CONFIG_ACPI=y
> +CONFIG_ACPI_VMCOREINFO=y
>  CONFIG_SMBIOS=y
>  CONFIG_ASPEED_SOC=y
>  CONFIG_GPIO_KEY=y
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index d2ab2f6655..df68628895 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -59,3 +59,4 @@ CONFIG_SMBIOS=y
>  CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
>  CONFIG_PXB=y
>  CONFIG_ACPI_VMGENID=y
> +CONFIG_ACPI_VMCOREINFO=y
> diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> index 9bde2f1c4b..e39ad5a680 100644
> --- a/default-configs/x86_64-softmmu.mak
> +++ b/default-configs/x86_64-softmmu.mak
> @@ -59,3 +59,4 @@ CONFIG_SMBIOS=y
>  CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
>  CONFIG_PXB=y
>  CONFIG_ACPI_VMGENID=y
> +CONFIG_ACPI_VMCOREINFO=y
> diff --git a/docs/specs/vmcoreinfo.txt b/docs/specs/vmcoreinfo.txt
> new file mode 100644
> index 0000000000..70d9716fe0
> --- /dev/null
> +++ b/docs/specs/vmcoreinfo.txt
> @@ -0,0 +1,138 @@
> +VIRTUAL MACHINE COREINFO DEVICE
> +===============================
> +
> +Copyright (C) 2017 Red Hat, Inc.
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +===
> +
> +The VM coreinfo (vmcoreinfo) device is an emulated device which
> +exposes a 4k memory range to the guest to store various informations
> +useful to debug the guest OS.
> +
> +QEMU Implementation
> +-------------------
> +
> +The vmcoreinfo device is put in its own ACPI descriptor table, in a
> +Secondary System Description Table, or SSDT.
> +
> +The following is a dump of the contents from a running system:
> +
> +# iasl -p ./SSDT -d /sys/firmware/acpi/tables/SSDT
> +/*
> + * Intel ACPI Component Architecture
> + * AML/ASL+ Disassembler version 20160831-64
> + * Copyright (c) 2000 - 2016 Intel Corporation
> + *
> + * Disassembling to symbolic ASL+ operators
> + *
> + * Disassembly of /sys/firmware/acpi/tables/SSDT, Mon Apr 24 15:59:53 2017
> + *
> + * Original Table Header:
> + *     Signature        "SSDT"
> + *     Length           0x00000086 (134)
> + *     Revision         0x01
> + *     Checksum         0x5C
> + *     OEM ID           "BOCHS "
> + *     OEM Table ID     "VMCOREIN"
> + *     OEM Revision     0x00000001 (1)
> + *     Compiler ID      "BXPC"
> + *     Compiler Version 0x00000001 (1)
> + */
> +DefinitionBlock ("", "SSDT", 1, "BOCHS ", "VMCOREIN", 0x00000001)
> +{
> +    Name (VCIA, 0x3FFFF000)
> +    Scope (\_SB)
> +    {
> +        Device (VMCI)
> +        {
> +            Name (_HID, "QEMUVMCI")  // _HID: Hardware ID
> +            Method (_STA, 0, NotSerialized)  // _STA: Status
> +            {
> +                Local0 = 0x0F
> +                If (VCIA == Zero)
> +                {
> +                    Local0 = Zero
> +                }
> +
> +                Return (Local0)
> +            }
> +
> +            Method (ADDR, 0, NotSerialized)
> +            {
> +                Local0 = Package (0x02) {}
> +                Local0 [Zero] = (VCIA + 0x28)
> +                Local0 [One] = Zero
> +                Return (Local0)
> +            }
> +        }
> +    }
> +}
> +
> +
> +Design Details:
> +---------------
> +
> +QEMU must be able to read the contents of the device memory,
> +specifically when starting a memory dump.  In order to do this, QEMU
> +must know the address that has been allocated.
> +
> +The mechanism chosen for this memory sharing is writeable fw_cfg blobs.
> +These are data object that are visible to both QEMU and guests, and are
> +addressable as sequential files.
> +
> +More information about fw_cfg can be found in "docs/specs/fw_cfg.txt"
> +
> +Two fw_cfg blobs are used in this case:
> +
> +/etc/vmcoreinfo      - used to allocate memory range, read-only to the guest
> +/etc/vmcoreinfo-addr - contains the address of the allocated range
> +                     - writeable by the guest
> +
> +
> +QEMU sends the following commands to the guest at startup:
> +
> +1. Allocate memory for vmcoreinfo fw_cfg blob.
> +2. Write the address of vmcoreinfo into the SSDT (VCIA ACPI variable as
> +   shown above in the iasl dump).  Note that this change is not propagated
> +   back to QEMU.
> +3. Write the address of vmcoreinfo back to QEMU's copy of vmcoreinfo-addr
> +   via the fw_cfg DMA interface.
> +
> +After step 3, QEMU is able to read the contents of vmcoreinfo.
> +
> +The value of VCIA is persisted via the VMState mechanism.
> +
> +
> +Storage Format:
> +---------------
> +
> +The content is expected to use little-endian format.
> +
> +In order to implement an OVMF "SDT Header Probe Suppressor", the contents of
> +the vmcoreinfo blob has 40 bytes of padding:
> +
> ++-----------------------------------+
> +| SSDT with OEM Table ID = VMCOREIN |
> ++-----------------------------------+
> +| ...                               |       TOP OF PAGE
> +| VCIA dword object ----------------|-----> +---------------------------+
> +| ...                               |       | fw-allocated array for    |
> +| _STA method referring to VCIA     |       | "etc/vmcoreinfo"          |
> +| ...                               |       +---------------------------+
> +| ADDR method referring to VCIA     |       |  0: OVMF SDT Header probe |
> +| ...                               |       |     suppressor            |
> ++-----------------------------------+       | 40: uint32 version field  |
> +                                            | 44: info contents         |
> +                                            |     ....                  |
> +                                            +---------------------------+
> +                                            END OF PAGE
> +
> +Version 0 content:
> +
> + uint64 paddr:
> +  Physical address of the Linux vmcoreinfo ELF note.
> + uint32 size:
> +  Size of the vmcoreinfo ELF note.
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index 11c35bcb44..9623078f95 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
>  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
> +common-obj-$(CONFIG_ACPI_VMCOREINFO) += vmcoreinfo.o
>  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
>  
>  common-obj-y += acpi_interface.o
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/7] dump: add vmcoreinfo ELF note
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 4/7] dump: add vmcoreinfo ELF note Marc-André Lureau
@ 2017-07-04 23:48   ` Laszlo Ersek
  2017-07-05 21:52     ` Marc-André Lureau
  0 siblings, 1 reply; 31+ messages in thread
From: Laszlo Ersek @ 2017-07-04 23:48 UTC (permalink / raw)
  To: Marc-André Lureau, qemu-devel; +Cc: ehabkost, anderson, imammedo

On 06/29/17 15:23, Marc-André Lureau wrote:
> Read the vmcoreinfo ELF PT_NOTE from guest memory when vmcoreinfo
> device provides the location, and write it as an ELF note in the dump.
> 
> There are now 2 possible sources of phys_base information.
> 
> (1) arch guessed value from arch_dump_info_get()

The function is called cpu_get_dump_info().

> (2) vmcoreinfo ELF note NUMBER(phys_base)= field
> 
>     NUMBER(phys_base) in vmcoreinfo has only been recently introduced
>     in Linux 4.10 (401721ecd1dc "kexec: export the value of phys_base
>     instead of symbol address").
> 
> Since (2) has better chances to be accurate, the guessed value is
> replaced by the value from the vmcoreinfo ELF note.
> 
> The phys_base value is stored in the same dump field locations as
> before, and may duplicate the information available in the vmcoreinfo
> ELF PT_NOTE. Crash tools should be prepared to handle this case.
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  include/sysemu/dump.h |   2 +
>  dump.c                | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 119 insertions(+)
> 
> diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
> index 2672a15f8b..111a7dcaa4 100644
> --- a/include/sysemu/dump.h
> +++ b/include/sysemu/dump.h
> @@ -192,6 +192,8 @@ typedef struct DumpState {
>                                    * this could be used to calculate
>                                    * how much work we have
>                                    * finished. */
> +    uint8_t *vmcoreinfo;         /* ELF note content */
> +    size_t vmcoreinfo_size;
>  } DumpState;
>  
>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
> diff --git a/dump.c b/dump.c
> index d9090a24cc..8fda5cc1ed 100644
> --- a/dump.c
> +++ b/dump.c
> @@ -26,6 +26,8 @@
>  #include "qapi/qmp/qerror.h"
>  #include "qmp-commands.h"
>  #include "qapi-event.h"
> +#include "qemu/error-report.h"
> +#include "hw/acpi/vmcoreinfo.h"
>  
>  #include <zlib.h>
>  #ifdef CONFIG_LZO
> @@ -38,6 +40,11 @@
>  #define ELF_MACHINE_UNAME "Unknown"
>  #endif
>  
> +#define ELF_NOTE_SIZE(hdr_size, name_size, desc_size)   \
> +    ((DIV_ROUND_UP((hdr_size), 4)                       \
> +      + DIV_ROUND_UP((name_size), 4)                    \
> +      + DIV_ROUND_UP((desc_size), 4)) * 4)
> +

This looks really useful to me, but (I think?) we generally leave the
operator hanging at the end of the line:

#define ELF_NOTE_SIZE(hdr_size, name_size, desc_size) \
    ((DIV_ROUND_UP((hdr_size), 4) +                   \
      DIV_ROUND_UP((name_size), 4) +                  \
      DIV_ROUND_UP((desc_size), 4)) * 4)

>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val)
>  {
>      if (s->dump_info.d_endian == ELFDATA2LSB) {
> @@ -76,6 +83,8 @@ static int dump_cleanup(DumpState *s)
>      guest_phys_blocks_free(&s->guest_phys_blocks);
>      memory_mapping_list_free(&s->list);
>      close(s->fd);
> +    g_free(s->vmcoreinfo);
> +    s->vmcoreinfo = NULL;
>      if (s->resume) {
>          if (s->detached) {
>              qemu_mutex_lock_iothread();
> @@ -235,6 +244,19 @@ static inline int cpu_index(CPUState *cpu)
>      return cpu->cpu_index + 1;
>  }
>  
> +static void write_vmcoreinfo_note(WriteCoreDumpFunction f, DumpState *s,
> +                                  Error **errp)
> +{
> +    int ret;
> +
> +    if (s->vmcoreinfo) {
> +        ret = f(s->vmcoreinfo, s->vmcoreinfo_size, s);
> +        if (ret < 0) {
> +            error_setg(errp, "dump: failed to write vmcoreinfo");
> +        }
> +    }
> +}
> +
>  static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
>                                Error **errp)
>  {
> @@ -258,6 +280,8 @@ static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
>              return;
>          }
>      }
> +
> +    write_vmcoreinfo_note(f, s, errp);
>  }
>  
>  static void write_elf32_note(DumpState *s, Error **errp)
> @@ -303,6 +327,8 @@ static void write_elf32_notes(WriteCoreDumpFunction f, DumpState *s,
>              return;
>          }
>      }
> +
> +    write_vmcoreinfo_note(f, s, errp);
>  }
>  

Wait, I'm confused again. You explained why it was OK to hook this logic
into the kdump handling too, but I don't think I understand your
explanation, so let me repeat my confusion below :)

In the ELF case, this code works fine, I think. As long as the guest
provided us with a well-formed note, a well-formed note will be appended
to the ELF dump.

But, this code is also invoked in the kdump case, and I don't understand
why that's a good thing. If I understand the next patch correctly, the
kdump format already provides crash with a (trimmed) copy of the guest
kernels vmcoreinfo note. So in the kdump case, why do we have to create
yet another copy of the guest kernel's vmcoreinfo note?

Thus, my confusion persists, and I can only think (again) that
write_vmcoreinfo_note() should be called from dump_begin() only (at the
end). (And the s->note_size adjustment should take that into account.)

Alternatively, we should keep this logic, and drop patch #5.

>  static void write_elf_section(DumpState *s, int type, Error **errp)
> @@ -714,6 +740,44 @@ static int buf_write_note(const void *buf, size_t size, void *opaque)
>      return 0;
>  }
>  
> +/*
> + * This function retrieves various sizes from an elf header.
> + *
> + * @note has to be a valid ELF note. The return sizes are unmodified
> + * (not padded or rounded up to be multiple of 4).
> + */
> +static void get_note_sizes(DumpState *s, const void *note,
> +                           uint64_t *note_head_size,
> +                           uint64_t *name_size,
> +                           uint64_t *desc_size)
> +{
> +    uint64_t note_head_sz;
> +    uint64_t name_sz;
> +    uint64_t desc_sz;
> +
> +    if (s->dump_info.d_class == ELFCLASS64) {
> +        const Elf64_Nhdr *hdr = note;
> +        note_head_sz = sizeof(Elf64_Nhdr);
> +        name_sz = hdr->n_namesz;
> +        desc_sz = hdr->n_descsz;
> +    } else {
> +        const Elf32_Nhdr *hdr = note;
> +        note_head_sz = sizeof(Elf32_Nhdr);
> +        name_sz = hdr->n_namesz;
> +        desc_sz = hdr->n_descsz;
> +    }
> +
> +    if (note_head_size) {
> +        *note_head_size = note_head_sz;
> +    }
> +    if (name_size) {
> +        *name_size = name_sz;
> +    }
> +    if (desc_size) {
> +        *desc_size = desc_sz;
> +    }
> +}

Should we do any endianness conversions here? (Because I think we're
going to parse the guest kernel's vmcoreinfo note with this function.)

> +
>  /* write common header, sub header and elf note to vmcore */
>  static void create_header32(DumpState *s, Error **errp)
>  {
> @@ -1488,10 +1552,38 @@ static int64_t dump_calculate_size(DumpState *s)
>      return total;
>  }
>  
> +static void vmcoreinfo_update_phys_base(DumpState *s)
> +{
> +    uint64_t size, note_head_size, name_size;
> +    char **lines;
> +    uint8_t *vmci;
> +    size_t i;
> +
> +    get_note_sizes(s, s->vmcoreinfo, &note_head_size, &name_size, &size);
> +    note_head_size = ROUND_UP(note_head_size, 4);
> +    name_size = ROUND_UP(name_size, 4);
> +    vmci = s->vmcoreinfo + note_head_size + name_size;
> +    *(vmci + size) = '\0';

I think this is a bit cavalier; you write to QEMU process memory based
on what the guest kernel tells you through its vmcoreinfo note. For
this, we need strict validation; namely:

(1) Via vmcoreinfo_get() we learn how large the vmcoreinfo note is,
    according to the guest kernel.

    In dump_init(), you run a g_malloc() on this value, which is a bad
    idea without some sanity upper bound.

(2) Assuming we managed to copy the guest kernel's vmcoreinfo note out
    of guest RAM, we parse it. We expect it to be an ElfXX_Nhdr
    (according to arch).

    In dump_init(), you pass the read data to get_note_sizes(), without
    checking whether the read data are at least as big as required by
    ElfXX_Nhdr (according to arch). We could be reading n_namesz and
    n_descsz from outside of the read buffer.

(3) Assuming "s->vmcoreinfo" was large enough and we fetched (and
    possibly converted the endianness of?) n_namesz and n_descsz,
    we must round up these values to integral multiples of 4.

    None of the individual rounding-up steps must overflow uintXX_t,
    but that is not checked in ELF_NOTE_SIZE().

(4) After we rounded up n_namesz and n_descsz, we calculate the sum

      sizeof(ElfXX_Nhdr) + round_up(n_namesz) + round_up(n_descsz)

    This sum must not overflow uintXX_t, but this is not checked in
    ELF_NOTE_SIZE().

(5) Assuming the sum can be represented fine, we should validate whether
    it matches the full note size advertized by the guest kernel
    (currently that means your experimental guest kernel module).

    In dump_init() you enforce

      s->vmcoreinfo_size <= size

    This looks safe, but the above NUL-termination of s->vmcoreinfo,
    for the purposes of g_strsplit(), actually writes one byte past
    the end of s->vmcoreinfo, *if* (s->vmcoreinfo_size == size) *and*
    desc_size is already a multiple of 4.

    So, for NUL-termination, you likely have to increase the initial
    allocation by one byte. (Any overflow in the addition can be
    prevented by the sanity upper bound from (1).)

(6) If the note parses fine, we add its size to s->note_size, in
    dump_init(). I think there's another (theoretical) chance for
    overflow here, which can also be prevented by the sanity upper bound
    in step (1).

> +
> +    lines = g_strsplit((char *)vmci, "\n", -1);
> +    for (i = 0; lines[i]; i++) {
> +        if (g_str_has_prefix(lines[i], "NUMBER(phys_base)=")) {
> +            if (qemu_strtou64(lines[i] + 18, NULL, 16,
> +                              &s->dump_info.phys_base) < 0) {

qemu_strtou64() can modify *result even on error, which will garble our
previously guessed phys_base value, from cpu_get_dump_info(). I think
you should use a local variable here.

> +                error_report("Failed to read NUMBER(phys_base)=");

I understand that this error is not supposed to fail the entire monitor
command, but I'm nonetheless unsure if this is the best way to report
warnings. Does this error message have to go to the QMP monitor as well?

> +            }
> +            break;
> +        }
> +    }
> +
> +    g_strfreev(lines);
> +}
> +
>  static void dump_init(DumpState *s, int fd, bool has_format,
>                        DumpGuestMemoryFormat format, bool paging, bool has_filter,
>                        int64_t begin, int64_t length, Error **errp)
>  {
> +    Object *vmcoreinfo_dev = find_vmcoreinfo_dev();
>      CPUState *cpu;
>      int nr_cpus;
>      Error *err = NULL;
> @@ -1563,6 +1655,31 @@ static void dump_init(DumpState *s, int fd, bool has_format,
>          goto cleanup;
>      }
>  
> +    if (vmcoreinfo_dev) {
> +        uint64_t addr, size, note_head_size, name_size, desc_size;
> +
> +        if (!vmcoreinfo_get(VMCOREINFO(vmcoreinfo_dev),
> +                            &addr, &size, &err)) {
> +            error_report_err(err);
> +        } else {
> +            s->vmcoreinfo = g_malloc(size);
> +            cpu_physical_memory_read(addr, s->vmcoreinfo, size);
> +
> +            get_note_sizes(s, s->vmcoreinfo,
> +                           &note_head_size, &name_size, &desc_size);
> +            s->vmcoreinfo_size = ELF_NOTE_SIZE(note_head_size, name_size,
> +                                               desc_size);
> +            if (s->vmcoreinfo_size > size) {
> +                error_report("Invalid vmcoreinfo header, size mismatch");
> +                g_free(s->vmcoreinfo);
> +                s->vmcoreinfo = NULL;
> +            } else {
> +                vmcoreinfo_update_phys_base(s);
> +                s->note_size += s->vmcoreinfo_size;
> +            }
> +        }
> +    }
> +
>      /* get memory mapping */
>      if (paging) {
>          qemu_get_guest_memory_mapping(&s->list, &s->guest_phys_blocks, &err);
> 

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] kdump: add vmcoreinfo ELF note
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 5/7] kdump: " Marc-André Lureau
@ 2017-07-05  0:07   ` Laszlo Ersek
  2017-07-06 10:05     ` Marc-André Lureau
  0 siblings, 1 reply; 31+ messages in thread
From: Laszlo Ersek @ 2017-07-05  0:07 UTC (permalink / raw)
  To: Marc-André Lureau, qemu-devel; +Cc: ehabkost, anderson, imammedo

On 06/29/17 15:23, Marc-André Lureau wrote:
> kdump header provides offset and size of the vmcoreinfo ELF note,
> append it if available.
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  dump.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/dump.c b/dump.c
> index 8fda5cc1ed..b78bc1fda7 100644
> --- a/dump.c
> +++ b/dump.c
> @@ -788,8 +788,9 @@ static void create_header32(DumpState *s, Error **errp)
>      uint32_t sub_hdr_size;
>      uint32_t bitmap_blocks;
>      uint32_t status = 0;
> -    uint64_t offset_note;
> +    uint64_t offset_note, offset_vmcoreinfo, size_vmcoreinfo = 0;
>      Error *local_err = NULL;
> +    uint8_t *vmcoreinfo = NULL;
>  
>      /* write common header, the version of kdump-compressed format is 6th */
>      size = sizeof(DiskDumpHeader32);
> @@ -838,7 +839,18 @@ static void create_header32(DumpState *s, Error **errp)
>      kh->phys_base = cpu_to_dump32(s, s->dump_info.phys_base);
>      kh->dump_level = cpu_to_dump32(s, DUMP_LEVEL);
>  
> -    offset_note = DISKDUMP_HEADER_BLOCKS * block_size + size;
> +    offset_vmcoreinfo = DISKDUMP_HEADER_BLOCKS * block_size + size;
> +    if (s->vmcoreinfo) {
> +        uint64_t hsize, name_size;
> +
> +        get_note_sizes(s, s->vmcoreinfo, &hsize, &name_size, &size_vmcoreinfo);

Should we round up "size_vmcoreinfo" as well? (Without the rounding,
offset_note might become unaligned, plus I simply don't know what
alignment is expected from kh->size_vmcoreinfo.)

> +        vmcoreinfo =
> +            s->vmcoreinfo + (DIV_ROUND_UP(hsize, 4) + DIV_ROUND_UP(name_size, 4)) * 4;
> +        kh->offset_vmcoreinfo = cpu_to_dump64(s, offset_vmcoreinfo);
> +        kh->size_vmcoreinfo = cpu_to_dump32(s, size_vmcoreinfo);
> +    }
> +
> +    offset_note = offset_vmcoreinfo + size_vmcoreinfo;
>      kh->offset_note = cpu_to_dump64(s, offset_note);
>      kh->note_size = cpu_to_dump32(s, s->note_size);
>  
> @@ -848,6 +860,14 @@ static void create_header32(DumpState *s, Error **errp)
>          goto out;
>      }
>  
> +    if (vmcoreinfo) {
> +        if (write_buffer(s->fd, offset_vmcoreinfo, vmcoreinfo,
> +                         size_vmcoreinfo) < 0) {
> +            error_setg(errp, "dump: failed to vmcoreinfo");

The verb "write" is missing from the message.

Same comments for create_header64() below.

Thanks
Laszlo

> +            goto out;
> +        }
> +    }
> +
>      /* write note */
>      s->note_buf = g_malloc0(s->note_size);
>      s->note_buf_offset = 0;
> @@ -888,8 +908,9 @@ static void create_header64(DumpState *s, Error **errp)
>      uint32_t sub_hdr_size;
>      uint32_t bitmap_blocks;
>      uint32_t status = 0;
> -    uint64_t offset_note;
> +    uint64_t offset_note, offset_vmcoreinfo, size_vmcoreinfo = 0;
>      Error *local_err = NULL;
> +    uint8_t *vmcoreinfo = NULL;
>  
>      /* write common header, the version of kdump-compressed format is 6th */
>      size = sizeof(DiskDumpHeader64);
> @@ -938,7 +959,18 @@ static void create_header64(DumpState *s, Error **errp)
>      kh->phys_base = cpu_to_dump64(s, s->dump_info.phys_base);
>      kh->dump_level = cpu_to_dump32(s, DUMP_LEVEL);
>  
> -    offset_note = DISKDUMP_HEADER_BLOCKS * block_size + size;
> +    offset_vmcoreinfo = DISKDUMP_HEADER_BLOCKS * block_size + size;
> +    if (s->vmcoreinfo) {
> +        uint64_t hsize, name_size;
> +
> +        get_note_sizes(s, s->vmcoreinfo, &hsize, &name_size, &size_vmcoreinfo);
> +        vmcoreinfo =
> +            s->vmcoreinfo + (DIV_ROUND_UP(hsize, 4) + DIV_ROUND_UP(name_size, 4)) * 4;
> +        kh->offset_vmcoreinfo = cpu_to_dump64(s, offset_vmcoreinfo);
> +        kh->size_vmcoreinfo = cpu_to_dump64(s, size_vmcoreinfo);
> +    }
> +
> +    offset_note = offset_vmcoreinfo + size_vmcoreinfo;
>      kh->offset_note = cpu_to_dump64(s, offset_note);
>      kh->note_size = cpu_to_dump64(s, s->note_size);
>  
> @@ -948,6 +980,14 @@ static void create_header64(DumpState *s, Error **errp)
>          goto out;
>      }
>  
> +    if (vmcoreinfo) {
> +        if (write_buffer(s->fd, offset_vmcoreinfo, vmcoreinfo,
> +                         size_vmcoreinfo) < 0) {
> +            error_setg(errp, "dump: failed to vmcoreinfo");
> +            goto out;
> +        }
> +    }
> +
>      /* write note */
>      s->note_buf = g_malloc0(s->note_size);
>      s->note_buf_offset = 0;
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 6/7] scripts/dump-guest-memory.py: add vmcoreinfo
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 6/7] scripts/dump-guest-memory.py: add vmcoreinfo Marc-André Lureau
@ 2017-07-05  0:22   ` Laszlo Ersek
  2017-07-05  9:58     ` Marc-André Lureau
  0 siblings, 1 reply; 31+ messages in thread
From: Laszlo Ersek @ 2017-07-05  0:22 UTC (permalink / raw)
  To: Marc-André Lureau, qemu-devel
  Cc: ehabkost, Michael S. Tsirkin, anderson, imammedo

On 06/29/17 15:23, Marc-André Lureau wrote:
> Add vmcoreinfo ELF note if vmcoreinfo device is ready.
> 
> To help the python script, add a little global vmcoreinfo_gdb
> structure, that is populated with vmcoreinfo_gdb_update().
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  scripts/dump-guest-memory.py | 32 ++++++++++++++++++++++++++++++++
>  include/hw/acpi/vmcoreinfo.h |  1 +
>  hw/acpi/vmcoreinfo.c         | 16 ++++++++++++++++
>  3 files changed, 49 insertions(+)
> 
> diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
> index f7c6635f15..16c3d7cb10 100644
> --- a/scripts/dump-guest-memory.py
> +++ b/scripts/dump-guest-memory.py
> @@ -120,6 +120,20 @@ class ELF(object):
>          self.segments[0].p_filesz += ctypes.sizeof(note)
>          self.segments[0].p_memsz += ctypes.sizeof(note)
>  
> +
> +    def add_vmcoreinfo_note(self, vmcoreinfo):
> +        """Adds a vmcoreinfo note to the ELF dump."""
> +        chead = type(get_arch_note(self.endianness, 0, 0))
> +        header = chead.from_buffer_copy(vmcoreinfo[0:ctypes.sizeof(chead)])
> +        note = get_arch_note(self.endianness,
> +                             header.n_namesz - 1, header.n_descsz)
> +        ctypes.memmove(ctypes.pointer(note), vmcoreinfo, ctypes.sizeof(note))
> +        header_size = ctypes.sizeof(note) - header.n_descsz
> +
> +        self.notes.append(note)
> +        self.segments[0].p_filesz += ctypes.sizeof(note)
> +        self.segments[0].p_memsz += ctypes.sizeof(note)
> +
>      def add_segment(self, p_type, p_paddr, p_size):
>          """Adds a segment to the elf."""
>  
> @@ -505,6 +519,23 @@ shape and this command should mostly work."""
>                  cur += chunk_size
>                  left -= chunk_size
>  
> +    def add_vmcoreinfo(self):
> +        qemu_core = gdb.inferiors()[0]
> +
> +        gdb.execute("call vmcoreinfo_gdb_update()")

I think it's a bad idea to call a function from a process that's just
crashed.

If this feature is so important, maybe we can simply set a global
pointer variable at the end of vmcoreinfo_realize(); something like:

static void vmcoreinfo_realize(DeviceState *dev, Error **errp)
{
    static VmcoreinfoState * volatile vmcoreinfo_gdb_helper;
    [...]
    vmcoreinfo_gdb_helper = VMCOREINFO(dev);
}

- vmcoreinfo_gdb_helper has function scope, so no other code can abuse
  it
- it has static storage duration so gdb can access it at any time
- the pointer (not the pointed-to object) is qualified volatile, so gcc
  cannot optimize out the pointer assignment (which it might be tempted
  to do otherwise, due to the pointer never being read within QEMU)

Then you can use "vmcoreinfo_gdb_helper->vmcoreinfo_addr_le" to
implement all the logic in "dump-guest-memory.py".

Just my two cents, of course.

Thanks
Laszlo

> +        avail = gdb.parse_and_eval("vmcoreinfo_gdb.available")
> +        if not avail:
> +            return;
> +
> +        addr = gdb.parse_and_eval("vmcoreinfo_gdb.paddr")
> +        size = gdb.parse_and_eval("vmcoreinfo_gdb.size")
> +        for block in self.guest_phys_blocks:
> +            if block["target_start"] <= addr < block["target_end"]:
> +                haddr = block["host_addr"] + (addr - block["target_start"])
> +                vmcoreinfo = qemu_core.read_memory(haddr, size)
> +                self.elf.add_vmcoreinfo_note(vmcoreinfo.tobytes())
> +                return
> +
>      def invoke(self, args, from_tty):
>          """Handles command invocation from gdb."""
>  
> @@ -518,6 +549,7 @@ shape and this command should mostly work."""
>  
>          self.elf = ELF(argv[1])
>          self.guest_phys_blocks = get_guest_phys_blocks()
> +        self.add_vmcoreinfo()
>  
>          with open(argv[0], "wb") as vmcore:
>              self.dump_init(vmcore)
> diff --git a/include/hw/acpi/vmcoreinfo.h b/include/hw/acpi/vmcoreinfo.h
> index 40fe99c3ed..4efa678237 100644
> --- a/include/hw/acpi/vmcoreinfo.h
> +++ b/include/hw/acpi/vmcoreinfo.h
> @@ -32,5 +32,6 @@ void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray *table_data,
>  void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray *vmci);
>  bool vmcoreinfo_get(VmcoreinfoState *vis, uint64_t *paddr, uint64_t *size,
>                      Error **errp);
> +void vmcoreinfo_gdb_update(void);
>  
>  #endif
> diff --git a/hw/acpi/vmcoreinfo.c b/hw/acpi/vmcoreinfo.c
> index 216e0bb83a..75e3330813 100644
> --- a/hw/acpi/vmcoreinfo.c
> +++ b/hw/acpi/vmcoreinfo.c
> @@ -145,6 +145,22 @@ bool vmcoreinfo_get(VmcoreinfoState *vis,
>      return true;
>  }
>  
> +struct vmcoreinfo_gdb {
> +    bool available;
> +    uint64_t paddr;
> +    uint64_t size;
> +} vmcoreinfo_gdb;
> +
> +void vmcoreinfo_gdb_update(void)
> +{
> +    Object *vmci = find_vmcoreinfo_dev();
> +
> +    vmcoreinfo_gdb.available = vmci ?
> +        vmcoreinfo_get(VMCOREINFO(vmci),
> +                       &vmcoreinfo_gdb.paddr, &vmcoreinfo_gdb.size, NULL)
> +        : false;
> +}
> +
>  static const VMStateDescription vmstate_vmcoreinfo = {
>      .name = "vmcoreinfo",
>      .version_id = 1,
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] MAINTAINERS: add Dump maintainers
  2017-06-29 13:23 ` [Qemu-devel] [PATCH 7/7] MAINTAINERS: add Dump maintainers Marc-André Lureau
@ 2017-07-05  0:26   ` Laszlo Ersek
  2017-07-06  9:54     ` Marc-André Lureau
  0 siblings, 1 reply; 31+ messages in thread
From: Laszlo Ersek @ 2017-07-05  0:26 UTC (permalink / raw)
  To: Marc-André Lureau, qemu-devel; +Cc: ehabkost, anderson, imammedo

On 06/29/17 15:23, Marc-André Lureau wrote:
> Proposing myself, since I have some familiarity with the code now.
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  MAINTAINERS | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 839f7ca063..45a0eb4cb0 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1272,6 +1272,13 @@ S: Maintained
>  F: device_tree.c
>  F: include/sysemu/device_tree.h
>  
> +Dump
> +S: Supported
> +M: Marc-André Lureau <marcandre.lureau@redhat.com>
> +F: dump.c
> +F: include/sysemu/dump.h
> +F: include/sysemu/dump-arch.h
> +
>  Error reporting
>  M: Markus Armbruster <armbru@redhat.com>
>  S: Supported
> 

That's very kind of you, thanks!

Do you have suggestions for the following files?

scripts/dump-guest-memory.py
stubs/dump.c
target/arm/arch_dump.c
target/i386/arch_dump.c
target/ppc/arch_dump.c
target/s390x/arch_dump.c

(If not, I'm OK with that too.)

Thanks!
Laszlo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 6/7] scripts/dump-guest-memory.py: add vmcoreinfo
  2017-07-05  0:22   ` Laszlo Ersek
@ 2017-07-05  9:58     ` Marc-André Lureau
  2017-07-05 11:05       ` Laszlo Ersek
  0 siblings, 1 reply; 31+ messages in thread
From: Marc-André Lureau @ 2017-07-05  9:58 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: QEMU, Igor Mammedov, Dave Anderson, Eduardo Habkost, Michael S. Tsirkin

Hi

On Wed, Jul 5, 2017 at 2:22 AM, Laszlo Ersek <lersek@redhat.com> wrote:
> On 06/29/17 15:23, Marc-André Lureau wrote:
>> Add vmcoreinfo ELF note if vmcoreinfo device is ready.
>>
>> To help the python script, add a little global vmcoreinfo_gdb
>> structure, that is populated with vmcoreinfo_gdb_update().
>>
>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>> ---
>>  scripts/dump-guest-memory.py | 32 ++++++++++++++++++++++++++++++++
>>  include/hw/acpi/vmcoreinfo.h |  1 +
>>  hw/acpi/vmcoreinfo.c         | 16 ++++++++++++++++
>>  3 files changed, 49 insertions(+)
>>
>> diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
>> index f7c6635f15..16c3d7cb10 100644
>> --- a/scripts/dump-guest-memory.py
>> +++ b/scripts/dump-guest-memory.py
>> @@ -120,6 +120,20 @@ class ELF(object):
>>          self.segments[0].p_filesz += ctypes.sizeof(note)
>>          self.segments[0].p_memsz += ctypes.sizeof(note)
>>
>> +
>> +    def add_vmcoreinfo_note(self, vmcoreinfo):
>> +        """Adds a vmcoreinfo note to the ELF dump."""
>> +        chead = type(get_arch_note(self.endianness, 0, 0))
>> +        header = chead.from_buffer_copy(vmcoreinfo[0:ctypes.sizeof(chead)])
>> +        note = get_arch_note(self.endianness,
>> +                             header.n_namesz - 1, header.n_descsz)
>> +        ctypes.memmove(ctypes.pointer(note), vmcoreinfo, ctypes.sizeof(note))
>> +        header_size = ctypes.sizeof(note) - header.n_descsz
>> +
>> +        self.notes.append(note)
>> +        self.segments[0].p_filesz += ctypes.sizeof(note)
>> +        self.segments[0].p_memsz += ctypes.sizeof(note)
>> +
>>      def add_segment(self, p_type, p_paddr, p_size):
>>          """Adds a segment to the elf."""
>>
>> @@ -505,6 +519,23 @@ shape and this command should mostly work."""
>>                  cur += chunk_size
>>                  left -= chunk_size
>>
>> +    def add_vmcoreinfo(self):
>> +        qemu_core = gdb.inferiors()[0]
>> +
>> +        gdb.execute("call vmcoreinfo_gdb_update()")
>
> I think it's a bad idea to call a function from a process that's just
> crashed.

Yeah, if qemu crashed you can't use that script. But we are talking
about dump of guest kernel, so qemu didn't crash :)

>
> If this feature is so important, maybe we can simply set a global
> pointer variable at the end of vmcoreinfo_realize(); something like:
>
> static void vmcoreinfo_realize(DeviceState *dev, Error **errp)
> {
>     static VmcoreinfoState * volatile vmcoreinfo_gdb_helper;
>     [...]
>     vmcoreinfo_gdb_helper = VMCOREINFO(dev);
> }
>
> - vmcoreinfo_gdb_helper has function scope, so no other code can abuse
>   it
> - it has static storage duration so gdb can access it at any time
> - the pointer (not the pointed-to object) is qualified volatile, so gcc
>   cannot optimize out the pointer assignment (which it might be tempted
>   to do otherwise, due to the pointer never being read within QEMU)
>
> Then you can use "vmcoreinfo_gdb_helper->vmcoreinfo_addr_le" to
> implement all the logic in "dump-guest-memory.py".

If necessary, I can try that.

>
> Just my two cents, of course.
>
> Thanks
> Laszlo
>
>> +        avail = gdb.parse_and_eval("vmcoreinfo_gdb.available")
>> +        if not avail:
>> +            return;
>> +
>> +        addr = gdb.parse_and_eval("vmcoreinfo_gdb.paddr")
>> +        size = gdb.parse_and_eval("vmcoreinfo_gdb.size")
>> +        for block in self.guest_phys_blocks:
>> +            if block["target_start"] <= addr < block["target_end"]:
>> +                haddr = block["host_addr"] + (addr - block["target_start"])
>> +                vmcoreinfo = qemu_core.read_memory(haddr, size)
>> +                self.elf.add_vmcoreinfo_note(vmcoreinfo.tobytes())
>> +                return
>> +
>>      def invoke(self, args, from_tty):
>>          """Handles command invocation from gdb."""
>>
>> @@ -518,6 +549,7 @@ shape and this command should mostly work."""
>>
>>          self.elf = ELF(argv[1])
>>          self.guest_phys_blocks = get_guest_phys_blocks()
>> +        self.add_vmcoreinfo()
>>
>>          with open(argv[0], "wb") as vmcore:
>>              self.dump_init(vmcore)
>> diff --git a/include/hw/acpi/vmcoreinfo.h b/include/hw/acpi/vmcoreinfo.h
>> index 40fe99c3ed..4efa678237 100644
>> --- a/include/hw/acpi/vmcoreinfo.h
>> +++ b/include/hw/acpi/vmcoreinfo.h
>> @@ -32,5 +32,6 @@ void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray *table_data,
>>  void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray *vmci);
>>  bool vmcoreinfo_get(VmcoreinfoState *vis, uint64_t *paddr, uint64_t *size,
>>                      Error **errp);
>> +void vmcoreinfo_gdb_update(void);
>>
>>  #endif
>> diff --git a/hw/acpi/vmcoreinfo.c b/hw/acpi/vmcoreinfo.c
>> index 216e0bb83a..75e3330813 100644
>> --- a/hw/acpi/vmcoreinfo.c
>> +++ b/hw/acpi/vmcoreinfo.c
>> @@ -145,6 +145,22 @@ bool vmcoreinfo_get(VmcoreinfoState *vis,
>>      return true;
>>  }
>>
>> +struct vmcoreinfo_gdb {
>> +    bool available;
>> +    uint64_t paddr;
>> +    uint64_t size;
>> +} vmcoreinfo_gdb;
>> +
>> +void vmcoreinfo_gdb_update(void)
>> +{
>> +    Object *vmci = find_vmcoreinfo_dev();
>> +
>> +    vmcoreinfo_gdb.available = vmci ?
>> +        vmcoreinfo_get(VMCOREINFO(vmci),
>> +                       &vmcoreinfo_gdb.paddr, &vmcoreinfo_gdb.size, NULL)
>> +        : false;
>> +}
>> +
>>  static const VMStateDescription vmstate_vmcoreinfo = {
>>      .name = "vmcoreinfo",
>>      .version_id = 1,
>>
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 6/7] scripts/dump-guest-memory.py: add vmcoreinfo
  2017-07-05  9:58     ` Marc-André Lureau
@ 2017-07-05 11:05       ` Laszlo Ersek
  0 siblings, 0 replies; 31+ messages in thread
From: Laszlo Ersek @ 2017-07-05 11:05 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: QEMU, Igor Mammedov, Dave Anderson, Eduardo Habkost, Michael S. Tsirkin

On 07/05/17 11:58, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 5, 2017 at 2:22 AM, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 06/29/17 15:23, Marc-André Lureau wrote:
>>> Add vmcoreinfo ELF note if vmcoreinfo device is ready.
>>>
>>> To help the python script, add a little global vmcoreinfo_gdb
>>> structure, that is populated with vmcoreinfo_gdb_update().
>>>
>>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>>> ---
>>>  scripts/dump-guest-memory.py | 32 ++++++++++++++++++++++++++++++++
>>>  include/hw/acpi/vmcoreinfo.h |  1 +
>>>  hw/acpi/vmcoreinfo.c         | 16 ++++++++++++++++
>>>  3 files changed, 49 insertions(+)
>>>
>>> diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
>>> index f7c6635f15..16c3d7cb10 100644
>>> --- a/scripts/dump-guest-memory.py
>>> +++ b/scripts/dump-guest-memory.py
>>> @@ -120,6 +120,20 @@ class ELF(object):
>>>          self.segments[0].p_filesz += ctypes.sizeof(note)
>>>          self.segments[0].p_memsz += ctypes.sizeof(note)
>>>
>>> +
>>> +    def add_vmcoreinfo_note(self, vmcoreinfo):
>>> +        """Adds a vmcoreinfo note to the ELF dump."""
>>> +        chead = type(get_arch_note(self.endianness, 0, 0))
>>> +        header = chead.from_buffer_copy(vmcoreinfo[0:ctypes.sizeof(chead)])
>>> +        note = get_arch_note(self.endianness,
>>> +                             header.n_namesz - 1, header.n_descsz)
>>> +        ctypes.memmove(ctypes.pointer(note), vmcoreinfo, ctypes.sizeof(note))
>>> +        header_size = ctypes.sizeof(note) - header.n_descsz
>>> +
>>> +        self.notes.append(note)
>>> +        self.segments[0].p_filesz += ctypes.sizeof(note)
>>> +        self.segments[0].p_memsz += ctypes.sizeof(note)
>>> +
>>>      def add_segment(self, p_type, p_paddr, p_size):
>>>          """Adds a segment to the elf."""
>>>
>>> @@ -505,6 +519,23 @@ shape and this command should mostly work."""
>>>                  cur += chunk_size
>>>                  left -= chunk_size
>>>
>>> +    def add_vmcoreinfo(self):
>>> +        qemu_core = gdb.inferiors()[0]
>>> +
>>> +        gdb.execute("call vmcoreinfo_gdb_update()")
>>
>> I think it's a bad idea to call a function from a process that's just
>> crashed.
> 
> Yeah, if qemu crashed you can't use that script. But we are talking
> about dump of guest kernel, so qemu didn't crash :)

I think we have a misunderstanding here. Extracting the guest kernel
core from the core dump of a crashed QEMU process is the *only* purpose
of the GDB extension implemented in "dump-guest-memory.py".

In other words, if you are loading "dump-guest-memory.py" into gdb, then
QEMU crashed *by definition*. Because otherwise you'd dump the guest
kernel core using the live monitor commands (HMP or QMP).

So, "dump-guest-memory.py" is really a last resort utility, for the case
when the guest kernel does something "interesting" that crashes QEMU
with hopefully localized damage, and most of the data structures
hopefully remain usable. It is not guaranteed at all that
"dump-guest-memory.py" will produce anything useful, dependent on how
corrupt the QEMU process memory is at the time of the SIGSEGV or SIGABRT
(or another fatal signal).

Please see the message on original QEMU commit 3e16d14fd93c
("Python-lang gdb script to extract x86_64 guest vmcore from qemu
coredump", 2013-12-17).

See also the original RFE -- I apologize to non-Red Hatters, the RHBZ is
private because it was filed by a customer --:

https://bugzilla.redhat.com/show_bug.cgi?id=826266

In my opinion, poking at possibly corrupt data structures with the
python script is OK, while executing code directly from the crashed
image is too much. But, again, that's just my opinion.

> 
>>
>> If this feature is so important, maybe we can simply set a global
>> pointer variable at the end of vmcoreinfo_realize(); something like:
>>
>> static void vmcoreinfo_realize(DeviceState *dev, Error **errp)
>> {
>>     static VmcoreinfoState * volatile vmcoreinfo_gdb_helper;
>>     [...]
>>     vmcoreinfo_gdb_helper = VMCOREINFO(dev);
>> }
>>
>> - vmcoreinfo_gdb_helper has function scope, so no other code can abuse
>>   it
>> - it has static storage duration so gdb can access it at any time
>> - the pointer (not the pointed-to object) is qualified volatile, so gcc
>>   cannot optimize out the pointer assignment (which it might be tempted
>>   to do otherwise, due to the pointer never being read within QEMU)
>>
>> Then you can use "vmcoreinfo_gdb_helper->vmcoreinfo_addr_le" to
>> implement all the logic in "dump-guest-memory.py".
> 
> If necessary, I can try that.

Well, I can't claim it is "objectively necessary"; it's just that this
method would satisfy my preferences above (i.e., poking at process data
from python: OK; running code from the process: not OK).

Thanks,
Laszlo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] acpi: add vmcoreinfo device
  2017-07-04 22:07   ` Laszlo Ersek
@ 2017-07-05 13:54     ` Marc-André Lureau
  0 siblings, 0 replies; 31+ messages in thread
From: Marc-André Lureau @ 2017-07-05 13:54 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: qemu-devel, ehabkost, Michael S. Tsirkin, Paolo Bonzini,
	anderson, imammedo, Richard Henderson

Hi

----- Original Message -----
> comments below
> 
> On 06/29/17 15:23, Marc-André Lureau wrote:
> > The VM coreinfo (vmcoreinfo) device is an emulated device which
> > exposes a 4k memory range to the guest to store various informations
> > useful to debug the guest OS. (it is greatly inspired by the VMGENID
> > device implementation)
> > 
> > This is an early-boot alternative to the qemu-ga VMDUMP_INFO event
> > proposed in "[PATCH 00/21] WIP: dump: add kaslr support".
> > 
> > A proof-of-concept kernel module:
> > https://github.com/elmarco/vmgenid-test/blob/master/qemuvmci-test.c
> > 
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > ---
> >  include/hw/acpi/aml-build.h        |   1 +
> >  include/hw/acpi/vmcoreinfo.h       |  36 +++++++
> >  hw/acpi/aml-build.c                |   2 +
> >  hw/acpi/vmcoreinfo.c               | 198
> >  +++++++++++++++++++++++++++++++++++++
> >  hw/i386/acpi-build.c               |  14 +++
> >  default-configs/arm-softmmu.mak    |   1 +
> >  default-configs/i386-softmmu.mak   |   1 +
> >  default-configs/x86_64-softmmu.mak |   1 +
> >  docs/specs/vmcoreinfo.txt          | 138 ++++++++++++++++++++++++++
> >  hw/acpi/Makefile.objs              |   1 +
> >  10 files changed, 393 insertions(+)
> >  create mode 100644 include/hw/acpi/vmcoreinfo.h
> >  create mode 100644 hw/acpi/vmcoreinfo.c
> >  create mode 100644 docs/specs/vmcoreinfo.txt
> >
> > diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> > index 88d0738d76..cf781bcd34 100644
> > --- a/include/hw/acpi/aml-build.h
> > +++ b/include/hw/acpi/aml-build.h
> > @@ -211,6 +211,7 @@ struct AcpiBuildTables {
> >      GArray *rsdp;
> >      GArray *tcpalog;
> >      GArray *vmgenid;
> > +    GArray *vmcoreinfo;
> >      BIOSLinker *linker;
> >  } AcpiBuildTables;
> >  
> > diff --git a/include/hw/acpi/vmcoreinfo.h b/include/hw/acpi/vmcoreinfo.h
> > new file mode 100644
> > index 0000000000..40fe99c3ed
> > --- /dev/null
> > +++ b/include/hw/acpi/vmcoreinfo.h
> > @@ -0,0 +1,36 @@
> > +#ifndef ACPI_VMCOREINFO_H
> > +#define ACPI_VMCOREINFO_H
> > +
> > +#include "hw/acpi/bios-linker-loader.h"
> > +#include "hw/qdev.h"
> > +
> > +#define VMCOREINFO_DEVICE           "vmcoreinfo"
> > +#define VMCOREINFO_FW_CFG_FILE      "etc/vmcoreinfo"
> > +#define VMCOREINFO_ADDR_FW_CFG_FILE "etc/vmcoreinfo-addr"
> > +
> > +#define VMCOREINFO_FW_CFG_SIZE      4096 /* Occupy a page of memory */
> > +#define VMCOREINFO_OFFSET           40   /* allow space for
> > +                                          * OVMF SDT Header Probe
> > Supressor
> > +                                          */
> > +
> > +#define VMCOREINFO(obj) OBJECT_CHECK(VmcoreinfoState, (obj),
> > VMCOREINFO_DEVICE)
> > +
> > +typedef struct VmcoreinfoState {
> 
> I think this should be spelled with a bit more camel-casing, like
> VMCoreInfoState or some such.
> 

ok

> > +    DeviceClass parent_obj;
> > +    uint8_t vmcoreinfo_addr_le[8];   /* Address of memory region */
> > +    bool write_pointer_available;
> > +} VmcoreinfoState;
> > +
> > +/* returns NULL unless there is exactly one device */
> > +static inline Object *find_vmcoreinfo_dev(void)
> > +{
> > +    return object_resolve_path_type("", VMCOREINFO_DEVICE, NULL);
> > +}
> > +
> > +void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray *table_data,
> > +                           GArray *vmci, BIOSLinker *linker);
> > +void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray
> > *vmci);
> > +bool vmcoreinfo_get(VmcoreinfoState *vis, uint64_t *paddr, uint64_t *size,
> > +                    Error **errp);
> > +
> > +#endif
> > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > index 36a6cc450e..47043ade4a 100644
> > --- a/hw/acpi/aml-build.c
> > +++ b/hw/acpi/aml-build.c
> > @@ -1561,6 +1561,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
> >      tables->table_data = g_array_new(false, true /* clear */, 1);
> >      tables->tcpalog = g_array_new(false, true /* clear */, 1);
> >      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> > +    tables->vmcoreinfo = g_array_new(false, true /* clear */, 1);
> >      tables->linker = bios_linker_loader_init();
> >  }
> >  
> > @@ -1571,6 +1572,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables
> > *tables, bool mfre)
> >      g_array_free(tables->table_data, true);
> >      g_array_free(tables->tcpalog, mfre);
> >      g_array_free(tables->vmgenid, mfre);
> > +    g_array_free(tables->vmcoreinfo, mfre);
> >  }
> >  
> >  /* Build rsdt table */
> > diff --git a/hw/acpi/vmcoreinfo.c b/hw/acpi/vmcoreinfo.c
> > new file mode 100644
> > index 0000000000..216e0bb83a
> > --- /dev/null
> > +++ b/hw/acpi/vmcoreinfo.c
> > @@ -0,0 +1,198 @@
> > +/*
> > + *  Virtual Machine coreinfo device
> > + *  (based on Virtual Machine Generation ID Device)
> > + *
> > + *  Copyright (C) 2017 Red Hat, Inc.
> > + *  Copyright (C) 2017 Skyport Systems.
> > + *
> > + *  Authors: Marc-André Lureau <marcandre.lureau@redhat.com>
> > + *           Ben Warren <ben@skyportsystems.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +#include "qemu/osdep.h"
> > +#include "hw/acpi/acpi.h"
> > +#include "hw/acpi/aml-build.h"
> > +#include "hw/acpi/vmcoreinfo.h"
> > +#include "hw/nvram/fw_cfg.h"
> > +#include "sysemu/sysemu.h"
> > +#include "qapi/error.h"
> > +
> > +void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray *table_data,
> > +                           GArray *vmci, BIOSLinker *linker)
> > +{
> > +    Aml *ssdt, *dev, *scope, *method, *addr, *if_ctx;
> > +    uint32_t vgia_offset;
> 
> This should be called "vcia_offset".

ok

> 
> > +
> > +    g_array_set_size(vmci, VMCOREINFO_FW_CFG_SIZE);
> > +
> > +    /* Put this in a separate SSDT table */
> > +    ssdt = init_aml_allocator();
> > +
> > +    /* Reserve space for header */
> > +    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
> > +
> > +    /* Storage address */
> > +    vgia_offset = table_data->len +
> > +        build_append_named_dword(ssdt->buf, "VCIA");
> > +    scope = aml_scope("\\_SB");
> > +    dev = aml_device("VMCI");
> > +    aml_append(dev, aml_name_decl("_HID", aml_string("QEMUVMCI")));
> > +
> > +    /* Simple status method to check that address is linked and non-zero
> > */
> > +    method = aml_method("_STA", 0, AML_NOTSERIALIZED);
> > +    addr = aml_local(0);
> > +    aml_append(method, aml_store(aml_int(0xf), addr));
> > +    if_ctx = aml_if(aml_equal(aml_name("VCIA"), aml_int(0)));
> > +    aml_append(if_ctx, aml_store(aml_int(0), addr));
> > +    aml_append(method, if_ctx);
> > +    aml_append(method, aml_return(addr));
> > +    aml_append(dev, method);
> > +
> > +    /* the ADDR method returns two 32-bit words representing the lower and
> > +     * upper halves of the physical address of the vmcoreinfo area
> > +     */
> > +    method = aml_method("ADDR", 0, AML_NOTSERIALIZED);
> > +
> > +    addr = aml_local(0);
> > +    aml_append(method, aml_store(aml_package(2), addr));
> > +
> > +    aml_append(method, aml_store(aml_add(aml_name("VCIA"),
> > +                                         aml_int(VMCOREINFO_OFFSET),
> > NULL),
> > +                                 aml_index(addr, aml_int(0))));
> > +    aml_append(method, aml_store(aml_int(0), aml_index(addr,
> > aml_int(1))));
> > +    aml_append(method, aml_return(addr));
> > +
> > +    aml_append(dev, method);
> > +    aml_append(scope, dev);
> > +    aml_append(ssdt, scope);
> > +
> > +    g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> > +
> > +    /* Allocate guest memory */
> > +    bios_linker_loader_alloc(linker, VMCOREINFO_FW_CFG_FILE, vmci, 4096,
> > +                             false /* page boundary, high memory */);
> > +
> > +    /* Patch address of vmcoreinfo fw_cfg blob into the ADDR fw_cfg
> > +     * blob so QEMU can read the info from there.  The address is
> > +     * expected to be < 4GB, but write 64 bits anyway.
> > +     * The address that is patched in is offset in order to implement
> > +     * the "OVMF SDT Header probe suppressor"
> > +     * see docs/specs/vmcoreinfo.txt for more details.
> > +     */
> > +    bios_linker_loader_write_pointer(linker,
> > +        VMCOREINFO_ADDR_FW_CFG_FILE, 0, sizeof(uint64_t),
> > +        VMCOREINFO_FW_CFG_FILE, VMCOREINFO_OFFSET);
> > +
> > +    /* Patch address of vmcoreinfo into the AML so OSPM can retrieve
> > +     * and read it.  Note that while we provide storage for 64 bits, only
> > +     * the least-signficant 32 get patched into AML.
> > +     */
> > +    bios_linker_loader_add_pointer(linker,
> > +        ACPI_BUILD_TABLE_FILE, vgia_offset, sizeof(uint32_t),
> > +        VMCOREINFO_FW_CFG_FILE, 0);
> > +
> > +    build_header(linker, table_data,
> > +        (void *)(table_data->data + table_data->len - ssdt->buf->len),
> > +        "SSDT", ssdt->buf->len, 1, NULL, "VMCOREIN");
> > +    free_aml_allocator();
> > +}
> > +
> > +void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray
> > *vmci)
> > +{
> > +    /* Create a read-only fw_cfg file for vmcoreinfo allocation */
> > +    /* XXX: linker could learn to allocate without backing fw_cfg? */
> 
> Yes, and a number of other things, as I'm sure Igor and MST will point
> out upon reading the patch :)
> 
> > +    fw_cfg_add_file(s, VMCOREINFO_FW_CFG_FILE, vmci->data,
> > +                    VMCOREINFO_FW_CFG_SIZE);
> > +    /* Create a read-write fw_cfg file for Address */
> > +    fw_cfg_add_file_callback(s, VMCOREINFO_ADDR_FW_CFG_FILE, NULL, NULL,
> > +                             vis->vmcoreinfo_addr_le,
> > +                             ARRAY_SIZE(vis->vmcoreinfo_addr_le), false);
> > +}
> > +
> > +bool vmcoreinfo_get(VmcoreinfoState *vis,
> > +                    uint64_t *paddr, uint64_t *size,
> > +                    Error **errp)
> > +{
> > +    uint32_t vmcoreinfo_addr;
> > +    uint32_t version;
> > +
> > +    assert(vis);
> > +    assert(paddr);
> > +    assert(size);
> > +
> > +    memcpy(&vmcoreinfo_addr, vis->vmcoreinfo_addr_le,
> > sizeof(vmcoreinfo_addr));
> > +    vmcoreinfo_addr = le32_to_cpu(vmcoreinfo_addr);
> > +    if (!vmcoreinfo_addr) {
> > +        error_setg(errp, "BIOS has not yet written the address of %s",
> > +                   VMCOREINFO_DEVICE);
> > +        return false;
> > +    }
> > +
> > +    cpu_physical_memory_read(vmcoreinfo_addr, &version, sizeof(version));
> > +    if (version != 0) {
> > +        error_setg(errp, "Unknown %s memory version", VMCOREINFO_DEVICE);
> > +        return false;
> > +    }
> > +
> > +    cpu_physical_memory_read(vmcoreinfo_addr + 4, paddr, sizeof(paddr));
> 
> This is wrong, it should be sizeof(*paddr).

right

> 
> > +    *paddr = le64_to_cpu(*paddr);
> > +    cpu_physical_memory_read(vmcoreinfo_addr + 12, size, sizeof(size));
> 
> This is wrong for two reasons:
> - first, it should be sizeof(*size),

ok

> - second, sizeof(*size) is 8, however, according to the design, the size
>   field is 4 bytes wide. I think the function prototype should be
>   updated to fix this.
> 

yes

> > +    *size = le32_to_cpu(*size);
> > +
> > +    return true;
> > +}
> > +
> > +static const VMStateDescription vmstate_vmcoreinfo = {
> > +    .name = "vmcoreinfo",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_UINT8_ARRAY(vmcoreinfo_addr_le, VmcoreinfoState,
> > sizeof(uint64_t)),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> > +
> > +static void vmcoreinfo_realize(DeviceState *dev, Error **errp)
> > +{
> > +    if (!bios_linker_loader_can_write_pointer()) {
> > +        error_setg(errp, "%s requires DMA write support in fw_cfg, "
> > +                   "which this machine type does not provide",
> > +                   VMCOREINFO_DEVICE);
> > +        return;
> > +    }
> > +
> > +    /* Given that this function is executing, there is at least one
> > VMCOREINFO
> > +     * device. Check if there are several.
> > +     */
> > +    if (!find_vmcoreinfo_dev()) {
> > +        error_setg(errp, "at most one %s device is permitted",
> > +                   VMCOREINFO_DEVICE);
> > +        return;
> > +    }
> > +}
> 
> You didn't copy the reset logic from vmgenid, but that's wrong -- all
> device models that produce WRITE_POINTER linker/loader commands must
> forget about guest-returned GPAs upon reset. Think of it this way: when
> the guest executes WRITE_POINTER, it creates a reference to guest-phys
> memory in QEMU; and when the guest initiates a reboot, guest-phys memory
> is fully "invalidated", so all such references must be dropped in QEMU.
> 
> (S3 resume is an exception, because guest memory is preserved, but both
> SeaBIOS and OVMF handle that specially -- they stash the WRITE_POINTER
> commands in a condensed format during normal boot, and then replay them
> during S3 resume. Thus, they restore the QEMU references discussed
> abovem at S3 resume.)
> 
> In more practical terms, assume the guest boots Linux, the address is
> passed back fine, then the Linux guest reboots, and then some other
> guest OS is launched from the guest firmware and/or bootloader (or we
> just stay in the firmware / bootloader indefinitely). If a dump is
> requested at this point, QEMU shouldn't go looking for the VMCOREINFO
> ELF note at the previously communicated address.
> 

Good point, I didn't realize it.


> The rest looks OK to me.

Thanks

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/7] dump: add vmcoreinfo ELF note
  2017-07-04 23:48   ` Laszlo Ersek
@ 2017-07-05 21:52     ` Marc-André Lureau
  2017-07-06 10:29       ` Laszlo Ersek
  0 siblings, 1 reply; 31+ messages in thread
From: Marc-André Lureau @ 2017-07-05 21:52 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: QEMU, Igor Mammedov, Dave Anderson, Eduardo Habkost

Hi

On Wed, Jul 5, 2017 at 1:48 AM, Laszlo Ersek <lersek@redhat.com> wrote:
> On 06/29/17 15:23, Marc-André Lureau wrote:
>> Read the vmcoreinfo ELF PT_NOTE from guest memory when vmcoreinfo
>> device provides the location, and write it as an ELF note in the dump.
>>
>> There are now 2 possible sources of phys_base information.
>>
>> (1) arch guessed value from arch_dump_info_get()
>
> The function is called cpu_get_dump_info().
>
>> (2) vmcoreinfo ELF note NUMBER(phys_base)= field
>>
>>     NUMBER(phys_base) in vmcoreinfo has only been recently introduced
>>     in Linux 4.10 (401721ecd1dc "kexec: export the value of phys_base
>>     instead of symbol address").
>>
>> Since (2) has better chances to be accurate, the guessed value is
>> replaced by the value from the vmcoreinfo ELF note.
>>
>> The phys_base value is stored in the same dump field locations as
>> before, and may duplicate the information available in the vmcoreinfo
>> ELF PT_NOTE. Crash tools should be prepared to handle this case.
>>
>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>> ---
>>  include/sysemu/dump.h |   2 +
>>  dump.c                | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 119 insertions(+)
>>
>> diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
>> index 2672a15f8b..111a7dcaa4 100644
>> --- a/include/sysemu/dump.h
>> +++ b/include/sysemu/dump.h
>> @@ -192,6 +192,8 @@ typedef struct DumpState {
>>                                    * this could be used to calculate
>>                                    * how much work we have
>>                                    * finished. */
>> +    uint8_t *vmcoreinfo;         /* ELF note content */
>> +    size_t vmcoreinfo_size;
>>  } DumpState;
>>
>>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
>> diff --git a/dump.c b/dump.c
>> index d9090a24cc..8fda5cc1ed 100644
>> --- a/dump.c
>> +++ b/dump.c
>> @@ -26,6 +26,8 @@
>>  #include "qapi/qmp/qerror.h"
>>  #include "qmp-commands.h"
>>  #include "qapi-event.h"
>> +#include "qemu/error-report.h"
>> +#include "hw/acpi/vmcoreinfo.h"
>>
>>  #include <zlib.h>
>>  #ifdef CONFIG_LZO
>> @@ -38,6 +40,11 @@
>>  #define ELF_MACHINE_UNAME "Unknown"
>>  #endif
>>
>> +#define ELF_NOTE_SIZE(hdr_size, name_size, desc_size)   \
>> +    ((DIV_ROUND_UP((hdr_size), 4)                       \
>> +      + DIV_ROUND_UP((name_size), 4)                    \
>> +      + DIV_ROUND_UP((desc_size), 4)) * 4)
>> +
>
> This looks really useful to me, but (I think?) we generally leave the
> operator hanging at the end of the line:
>
> #define ELF_NOTE_SIZE(hdr_size, name_size, desc_size) \
>     ((DIV_ROUND_UP((hdr_size), 4) +                   \
>       DIV_ROUND_UP((name_size), 4) +                  \
>       DIV_ROUND_UP((desc_size), 4)) * 4)

ok

>
>>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val)
>>  {
>>      if (s->dump_info.d_endian == ELFDATA2LSB) {
>> @@ -76,6 +83,8 @@ static int dump_cleanup(DumpState *s)
>>      guest_phys_blocks_free(&s->guest_phys_blocks);
>>      memory_mapping_list_free(&s->list);
>>      close(s->fd);
>> +    g_free(s->vmcoreinfo);
>> +    s->vmcoreinfo = NULL;
>>      if (s->resume) {
>>          if (s->detached) {
>>              qemu_mutex_lock_iothread();
>> @@ -235,6 +244,19 @@ static inline int cpu_index(CPUState *cpu)
>>      return cpu->cpu_index + 1;
>>  }
>>
>> +static void write_vmcoreinfo_note(WriteCoreDumpFunction f, DumpState *s,
>> +                                  Error **errp)
>> +{
>> +    int ret;
>> +
>> +    if (s->vmcoreinfo) {
>> +        ret = f(s->vmcoreinfo, s->vmcoreinfo_size, s);
>> +        if (ret < 0) {
>> +            error_setg(errp, "dump: failed to write vmcoreinfo");
>> +        }
>> +    }
>> +}
>> +
>>  static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
>>                                Error **errp)
>>  {
>> @@ -258,6 +280,8 @@ static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
>>              return;
>>          }
>>      }
>> +
>> +    write_vmcoreinfo_note(f, s, errp);
>>  }
>>
>>  static void write_elf32_note(DumpState *s, Error **errp)
>> @@ -303,6 +327,8 @@ static void write_elf32_notes(WriteCoreDumpFunction f, DumpState *s,
>>              return;
>>          }
>>      }
>> +
>> +    write_vmcoreinfo_note(f, s, errp);
>>  }
>>
>
> Wait, I'm confused again. You explained why it was OK to hook this logic
> into the kdump handling too, but I don't think I understand your
> explanation, so let me repeat my confusion below :)
>
> In the ELF case, this code works fine, I think. As long as the guest
> provided us with a well-formed note, a well-formed note will be appended
> to the ELF dump.
>
> But, this code is also invoked in the kdump case, and I don't understand
> why that's a good thing. If I understand the next patch correctly, the
> kdump format already provides crash with a (trimmed) copy of the guest
> kernels vmcoreinfo note. So in the kdump case, why do we have to create
> yet another copy of the guest kernel's vmcoreinfo note?
>
> Thus, my confusion persists, and I can only think (again) that
> write_vmcoreinfo_note() should be called from dump_begin() only (at the
> end). (And the s->note_size adjustment should take that into account.)
>
> Alternatively, we should keep this logic, and drop patch #5.

You are right, sorry for my misunderstanding, although crash is seems
fine as long as size/offsets are ok.

So, instead of duplicating the info, let's keep the complete ELF note
(with the header) in the dump, and simply adjust kdump header to point
to the adjusted offset/size. This has the advantage of not making the
code unnecessarily more complicated wrt s->note_size handling etc, and
is consistent with the rest of the elf notes.

>
>>  static void write_elf_section(DumpState *s, int type, Error **errp)
>> @@ -714,6 +740,44 @@ static int buf_write_note(const void *buf, size_t size, void *opaque)
>>      return 0;
>>  }
>>
>> +/*
>> + * This function retrieves various sizes from an elf header.
>> + *
>> + * @note has to be a valid ELF note. The return sizes are unmodified
>> + * (not padded or rounded up to be multiple of 4).
>> + */
>> +static void get_note_sizes(DumpState *s, const void *note,
>> +                           uint64_t *note_head_size,
>> +                           uint64_t *name_size,
>> +                           uint64_t *desc_size)
>> +{
>> +    uint64_t note_head_sz;
>> +    uint64_t name_sz;
>> +    uint64_t desc_sz;
>> +
>> +    if (s->dump_info.d_class == ELFCLASS64) {
>> +        const Elf64_Nhdr *hdr = note;
>> +        note_head_sz = sizeof(Elf64_Nhdr);
>> +        name_sz = hdr->n_namesz;
>> +        desc_sz = hdr->n_descsz;
>> +    } else {
>> +        const Elf32_Nhdr *hdr = note;
>> +        note_head_sz = sizeof(Elf32_Nhdr);
>> +        name_sz = hdr->n_namesz;
>> +        desc_sz = hdr->n_descsz;
>> +    }
>> +
>> +    if (note_head_size) {
>> +        *note_head_size = note_head_sz;
>> +    }
>> +    if (name_size) {
>> +        *name_size = name_sz;
>> +    }
>> +    if (desc_size) {
>> +        *desc_size = desc_sz;
>> +    }
>> +}
>
> Should we do any endianness conversions here? (Because I think we're
> going to parse the guest kernel's vmcoreinfo note with this function.)

Correct, it is in guest endianess, fixed

>
>> +
>>  /* write common header, sub header and elf note to vmcore */
>>  static void create_header32(DumpState *s, Error **errp)
>>  {
>> @@ -1488,10 +1552,38 @@ static int64_t dump_calculate_size(DumpState *s)
>>      return total;
>>  }
>>
>> +static void vmcoreinfo_update_phys_base(DumpState *s)
>> +{
>> +    uint64_t size, note_head_size, name_size;
>> +    char **lines;
>> +    uint8_t *vmci;
>> +    size_t i;
>> +
>> +    get_note_sizes(s, s->vmcoreinfo, &note_head_size, &name_size, &size);
>> +    note_head_size = ROUND_UP(note_head_size, 4);
>> +    name_size = ROUND_UP(name_size, 4);
>> +    vmci = s->vmcoreinfo + note_head_size + name_size;
>> +    *(vmci + size) = '\0';
>
> I think this is a bit cavalier; you write to QEMU process memory based
> on what the guest kernel tells you through its vmcoreinfo note. For
> this, we need strict validation; namely:
>
> (1) Via vmcoreinfo_get() we learn how large the vmcoreinfo note is,
>     according to the guest kernel.
>
>     In dump_init(), you run a g_malloc() on this value, which is a bad
>     idea without some sanity upper bound.
>

let's have 1MB limit

> (2) Assuming we managed to copy the guest kernel's vmcoreinfo note out
>     of guest RAM, we parse it. We expect it to be an ElfXX_Nhdr
>     (according to arch).
>
>     In dump_init(), you pass the read data to get_note_sizes(), without
>     checking whether the read data are at least as big as required by
>     ElfXX_Nhdr (according to arch). We could be reading n_namesz and
>     n_descsz from outside of the read buffer.

Ok, hdr size check added

>
> (3) Assuming "s->vmcoreinfo" was large enough and we fetched (and
>     possibly converted the endianness of?) n_namesz and n_descsz,
>     we must round up these values to integral multiples of 4.
>
>     None of the individual rounding-up steps must overflow uintXX_t,
>     but that is not checked in ELF_NOTE_SIZE().

How bad could an overflow be? If it occured, the result will likely be
smaller than expected size. But it shouldn't occur.

Would you have upper bound for namesz and descsz instead? Or how would
you deal with that?

>
> (4) After we rounded up n_namesz and n_descsz, we calculate the sum
>
>       sizeof(ElfXX_Nhdr) + round_up(n_namesz) + round_up(n_descsz)
>
>     This sum must not overflow uintXX_t, but this is not checked in
>     ELF_NOTE_SIZE().

same as above

>
> (5) Assuming the sum can be represented fine, we should validate whether
>     it matches the full note size advertized by the guest kernel
>     (currently that means your experimental guest kernel module).
>
>     In dump_init() you enforce
>
>       s->vmcoreinfo_size <= size
>
>     This looks safe, but the above NUL-termination of s->vmcoreinfo,
>     for the purposes of g_strsplit(), actually writes one byte past
>     the end of s->vmcoreinfo, *if* (s->vmcoreinfo_size == size) *and*
>     desc_size is already a multiple of 4.
>
>     So, for NUL-termination, you likely have to increase the initial
>     allocation by one byte. (Any overflow in the addition can be
>     prevented by the sanity upper bound from (1).)
>

ok

> (6) If the note parses fine, we add its size to s->note_size, in
>     dump_init(). I think there's another (theoretical) chance for
>     overflow here, which can also be prevented by the sanity upper bound
>     in step (1).
>
>> +
>> +    lines = g_strsplit((char *)vmci, "\n", -1);
>> +    for (i = 0; lines[i]; i++) {
>> +        if (g_str_has_prefix(lines[i], "NUMBER(phys_base)=")) {
>> +            if (qemu_strtou64(lines[i] + 18, NULL, 16,
>> +                              &s->dump_info.phys_base) < 0) {
>
> qemu_strtou64() can modify *result even on error, which will garble our
> previously guessed phys_base value, from cpu_get_dump_info(). I think
> you should use a local variable here.

ok

>
>> +                error_report("Failed to read NUMBER(phys_base)=");
>
> I understand that this error is not supposed to fail the entire monitor
> command, but I'm nonetheless unsure if this is the best way to report
> warnings. Does this error message have to go to the QMP monitor as well?
>

It doesn't go through QMP monitor afaik, only HMP or stderr.

>> +            }
>> +            break;
>> +        }
>> +    }
>> +
>> +    g_strfreev(lines);
>> +}
>> +
>>  static void dump_init(DumpState *s, int fd, bool has_format,
>>                        DumpGuestMemoryFormat format, bool paging, bool has_filter,
>>                        int64_t begin, int64_t length, Error **errp)
>>  {
>> +    Object *vmcoreinfo_dev = find_vmcoreinfo_dev();
>>      CPUState *cpu;
>>      int nr_cpus;
>>      Error *err = NULL;
>> @@ -1563,6 +1655,31 @@ static void dump_init(DumpState *s, int fd, bool has_format,
>>          goto cleanup;
>>      }
>>
>> +    if (vmcoreinfo_dev) {
>> +        uint64_t addr, size, note_head_size, name_size, desc_size;
>> +
>> +        if (!vmcoreinfo_get(VMCOREINFO(vmcoreinfo_dev),
>> +                            &addr, &size, &err)) {
>> +            error_report_err(err);
>> +        } else {
>> +            s->vmcoreinfo = g_malloc(size);
>> +            cpu_physical_memory_read(addr, s->vmcoreinfo, size);
>> +
>> +            get_note_sizes(s, s->vmcoreinfo,
>> +                           &note_head_size, &name_size, &desc_size);
>> +            s->vmcoreinfo_size = ELF_NOTE_SIZE(note_head_size, name_size,
>> +                                               desc_size);
>> +            if (s->vmcoreinfo_size > size) {
>> +                error_report("Invalid vmcoreinfo header, size mismatch");
>> +                g_free(s->vmcoreinfo);
>> +                s->vmcoreinfo = NULL;
>> +            } else {
>> +                vmcoreinfo_update_phys_base(s);
>> +                s->note_size += s->vmcoreinfo_size;
>> +            }
>> +        }
>> +    }
>> +
>>      /* get memory mapping */
>>      if (paging) {
>>          qemu_get_guest_memory_mapping(&s->list, &s->guest_phys_blocks, &err);
>>

Thanks

-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] MAINTAINERS: add Dump maintainers
  2017-07-05  0:26   ` Laszlo Ersek
@ 2017-07-06  9:54     ` Marc-André Lureau
  2017-07-06 10:17       ` Laszlo Ersek
  0 siblings, 1 reply; 31+ messages in thread
From: Marc-André Lureau @ 2017-07-06  9:54 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: qemu-devel, ehabkost, anderson, imammedo

Hi

----- Original Message -----
> On 06/29/17 15:23, Marc-André Lureau wrote:
> > Proposing myself, since I have some familiarity with the code now.
> > 
> > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > ---
> >  MAINTAINERS | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 839f7ca063..45a0eb4cb0 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -1272,6 +1272,13 @@ S: Maintained
> >  F: device_tree.c
> >  F: include/sysemu/device_tree.h
> >  
> > +Dump
> > +S: Supported
> > +M: Marc-André Lureau <marcandre.lureau@redhat.com>
> > +F: dump.c
> > +F: include/sysemu/dump.h
> > +F: include/sysemu/dump-arch.h
> > +
> >  Error reporting
> >  M: Markus Armbruster <armbru@redhat.com>
> >  S: Supported
> > 
> 
> That's very kind of you, thanks!
> 
> Do you have suggestions for the following files?
> 
> scripts/dump-guest-memory.py

This one is yours, no? :) But I am ok to "support" it, meaning I'll take time to review the patches, and eventually make the pull-requests.
 
> stubs/dump.c

I can add that one, although it is also maintained by Paolo

> target/arm/arch_dump.c
> target/i386/arch_dump.c
> target/ppc/arch_dump.c
> target/s390x/arch_dump.c

I'd rather have those maintained by the respective arch maintainers, as they are. But I imagine it would make sense to also cover them with the rest of dump.

thanks

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 5/7] kdump: add vmcoreinfo ELF note
  2017-07-05  0:07   ` Laszlo Ersek
@ 2017-07-06 10:05     ` Marc-André Lureau
  0 siblings, 0 replies; 31+ messages in thread
From: Marc-André Lureau @ 2017-07-06 10:05 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: QEMU, Igor Mammedov, Dave Anderson, Eduardo Habkost

Hi

On Wed, Jul 5, 2017 at 2:07 AM, Laszlo Ersek <lersek@redhat.com> wrote:
> On 06/29/17 15:23, Marc-André Lureau wrote:
>> kdump header provides offset and size of the vmcoreinfo ELF note,
>> append it if available.
>>
>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>> ---
>>  dump.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 44 insertions(+), 4 deletions(-)
>>
>> diff --git a/dump.c b/dump.c
>> index 8fda5cc1ed..b78bc1fda7 100644
>> --- a/dump.c
>> +++ b/dump.c
>> @@ -788,8 +788,9 @@ static void create_header32(DumpState *s, Error **errp)
>>      uint32_t sub_hdr_size;
>>      uint32_t bitmap_blocks;
>>      uint32_t status = 0;
>> -    uint64_t offset_note;
>> +    uint64_t offset_note, offset_vmcoreinfo, size_vmcoreinfo = 0;
>>      Error *local_err = NULL;
>> +    uint8_t *vmcoreinfo = NULL;
>>
>>      /* write common header, the version of kdump-compressed format is 6th */
>>      size = sizeof(DiskDumpHeader32);
>> @@ -838,7 +839,18 @@ static void create_header32(DumpState *s, Error **errp)
>>      kh->phys_base = cpu_to_dump32(s, s->dump_info.phys_base);
>>      kh->dump_level = cpu_to_dump32(s, DUMP_LEVEL);
>>
>> -    offset_note = DISKDUMP_HEADER_BLOCKS * block_size + size;
>> +    offset_vmcoreinfo = DISKDUMP_HEADER_BLOCKS * block_size + size;
>> +    if (s->vmcoreinfo) {
>> +        uint64_t hsize, name_size;
>> +
>> +        get_note_sizes(s, s->vmcoreinfo, &hsize, &name_size, &size_vmcoreinfo);
>
> Should we round up "size_vmcoreinfo" as well? (Without the rounding,
> offset_note might become unaligned, plus I simply don't know what
> alignment is expected from kh->size_vmcoreinfo.)

In the last iteration, it only sets the kh->offset/size_vmcoreinfo. I
don't see any alignment requirement in crash.
Dave, any comment?

>
>> +        vmcoreinfo =
>> +            s->vmcoreinfo + (DIV_ROUND_UP(hsize, 4) + DIV_ROUND_UP(name_size, 4)) * 4;
>> +        kh->offset_vmcoreinfo = cpu_to_dump64(s, offset_vmcoreinfo);
>> +        kh->size_vmcoreinfo = cpu_to_dump32(s, size_vmcoreinfo);
>> +    }
>> +
>> +    offset_note = offset_vmcoreinfo + size_vmcoreinfo;
>>      kh->offset_note = cpu_to_dump64(s, offset_note);
>>      kh->note_size = cpu_to_dump32(s, s->note_size);
>>
>> @@ -848,6 +860,14 @@ static void create_header32(DumpState *s, Error **errp)
>>          goto out;
>>      }
>>
>> +    if (vmcoreinfo) {
>> +        if (write_buffer(s->fd, offset_vmcoreinfo, vmcoreinfo,
>> +                         size_vmcoreinfo) < 0) {
>> +            error_setg(errp, "dump: failed to vmcoreinfo");
>
> The verb "write" is missing from the message.
>
> Same comments for create_header64() below.
>

gone

thanks

-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 7/7] MAINTAINERS: add Dump maintainers
  2017-07-06  9:54     ` Marc-André Lureau
@ 2017-07-06 10:17       ` Laszlo Ersek
  0 siblings, 0 replies; 31+ messages in thread
From: Laszlo Ersek @ 2017-07-06 10:17 UTC (permalink / raw)
  To: Marc-André Lureau; +Cc: qemu-devel, ehabkost, anderson, imammedo

On 07/06/17 11:54, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
>> On 06/29/17 15:23, Marc-André Lureau wrote:
>>> Proposing myself, since I have some familiarity with the code now.
>>>
>>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>>> ---
>>>  MAINTAINERS | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index 839f7ca063..45a0eb4cb0 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -1272,6 +1272,13 @@ S: Maintained
>>>  F: device_tree.c
>>>  F: include/sysemu/device_tree.h
>>>  
>>> +Dump
>>> +S: Supported
>>> +M: Marc-André Lureau <marcandre.lureau@redhat.com>
>>> +F: dump.c
>>> +F: include/sysemu/dump.h
>>> +F: include/sysemu/dump-arch.h
>>> +
>>>  Error reporting
>>>  M: Markus Armbruster <armbru@redhat.com>
>>>  S: Supported
>>>
>>
>> That's very kind of you, thanks!
>>
>> Do you have suggestions for the following files?
>>
>> scripts/dump-guest-memory.py
> 
> This one is yours, no? :) But I am ok to "support" it, meaning I'll take time to review the patches, and eventually make the pull-requests.

It used to be "mine" until (a) it got rewritten in object-oriented
Python, and (b) it received multi-arch support ;)

>  
>> stubs/dump.c
> 
> I can add that one, although it is also maintained by Paolo
> 
>> target/arm/arch_dump.c
>> target/i386/arch_dump.c
>> target/ppc/arch_dump.c
>> target/s390x/arch_dump.c
> 
> I'd rather have those maintained by the respective arch maintainers, as they are. But I imagine it would make sense to also cover them with the rest of dump.

Yeah there's an overlap here -- I'm not suggesting that you take on
everything, just curious what you think. I'm fine with this patch as it is.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [PATCH 4/7] dump: add vmcoreinfo ELF note
  2017-07-05 21:52     ` Marc-André Lureau
@ 2017-07-06 10:29       ` Laszlo Ersek
  0 siblings, 0 replies; 31+ messages in thread
From: Laszlo Ersek @ 2017-07-06 10:29 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Igor Mammedov, Dave Anderson, QEMU, Eduardo Habkost

On 07/05/17 23:52, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 5, 2017 at 1:48 AM, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 06/29/17 15:23, Marc-André Lureau wrote:

>>> @@ -258,6 +280,8 @@ static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
>>>              return;
>>>          }
>>>      }
>>> +
>>> +    write_vmcoreinfo_note(f, s, errp);
>>>  }
>>>
>>>  static void write_elf32_note(DumpState *s, Error **errp)
>>> @@ -303,6 +327,8 @@ static void write_elf32_notes(WriteCoreDumpFunction f, DumpState *s,
>>>              return;
>>>          }
>>>      }
>>> +
>>> +    write_vmcoreinfo_note(f, s, errp);
>>>  }
>>>
>>
>> Wait, I'm confused again. You explained why it was OK to hook this logic
>> into the kdump handling too, but I don't think I understand your
>> explanation, so let me repeat my confusion below :)
>>
>> In the ELF case, this code works fine, I think. As long as the guest
>> provided us with a well-formed note, a well-formed note will be appended
>> to the ELF dump.
>>
>> But, this code is also invoked in the kdump case, and I don't understand
>> why that's a good thing. If I understand the next patch correctly, the
>> kdump format already provides crash with a (trimmed) copy of the guest
>> kernels vmcoreinfo note. So in the kdump case, why do we have to create
>> yet another copy of the guest kernel's vmcoreinfo note?
>>
>> Thus, my confusion persists, and I can only think (again) that
>> write_vmcoreinfo_note() should be called from dump_begin() only (at the
>> end). (And the s->note_size adjustment should take that into account.)
>>
>> Alternatively, we should keep this logic, and drop patch #5.
> 
> You are right, sorry for my misunderstanding, although crash is seems
> fine as long as size/offsets are ok.
> 
> So, instead of duplicating the info, let's keep the complete ELF note
> (with the header) in the dump, and simply adjust kdump header to point
> to the adjusted offset/size. This has the advantage of not making the
> code unnecessarily more complicated wrt s->note_size handling etc, and
> is consistent with the rest of the elf notes.

I guess that's OK, although I don't really see why even that is
necessary. The vmcoreinfo note should be possible to find in the kdump
output with the rest of the ELF notes, even without pointing some kdump
header fields into that specific note.

Snipping the rest because I'm going to see updates on those things in v2
anyway (which you've just posted).

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-07-06 10:29 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-29 13:23 [Qemu-devel] [PATCH 0/7] KASLR kernel dump support Marc-André Lureau
2017-06-29 13:23 ` [Qemu-devel] [PATCH 1/7] vmgenid: replace x-write-pointer-available hack Marc-André Lureau
2017-06-29 14:11   ` Michael S. Tsirkin
2017-07-02  3:09   ` Ben Warren
2017-07-03 14:48   ` Eduardo Habkost
2017-07-03 18:06   ` Laszlo Ersek
2017-07-03 18:27     ` Eduardo Habkost
2017-07-03 18:35       ` Laszlo Ersek
2017-07-03 18:21   ` Michael S. Tsirkin
2017-07-03 18:38   ` Michael S. Tsirkin
2017-07-03 18:50     ` Eduardo Habkost
2017-07-03 19:51       ` Michael S. Tsirkin
2017-06-29 13:23 ` [Qemu-devel] [PATCH 2/7] acpi: add vmcoreinfo device Marc-André Lureau
2017-07-04 22:07   ` Laszlo Ersek
2017-07-05 13:54     ` Marc-André Lureau
2017-06-29 13:23 ` [Qemu-devel] [PATCH 3/7] tests: add simple vmcoreinfo test Marc-André Lureau
2017-06-29 13:23 ` [Qemu-devel] [PATCH 4/7] dump: add vmcoreinfo ELF note Marc-André Lureau
2017-07-04 23:48   ` Laszlo Ersek
2017-07-05 21:52     ` Marc-André Lureau
2017-07-06 10:29       ` Laszlo Ersek
2017-06-29 13:23 ` [Qemu-devel] [PATCH 5/7] kdump: " Marc-André Lureau
2017-07-05  0:07   ` Laszlo Ersek
2017-07-06 10:05     ` Marc-André Lureau
2017-06-29 13:23 ` [Qemu-devel] [PATCH 6/7] scripts/dump-guest-memory.py: add vmcoreinfo Marc-André Lureau
2017-07-05  0:22   ` Laszlo Ersek
2017-07-05  9:58     ` Marc-André Lureau
2017-07-05 11:05       ` Laszlo Ersek
2017-06-29 13:23 ` [Qemu-devel] [PATCH 7/7] MAINTAINERS: add Dump maintainers Marc-André Lureau
2017-07-05  0:26   ` Laszlo Ersek
2017-07-06  9:54     ` Marc-André Lureau
2017-07-06 10:17       ` Laszlo Ersek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.