qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU
@ 2021-02-08 20:57 Eric DeVolder
  2021-02-08 20:57 ` [PATCH v2 1/7] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Eric DeVolder @ 2021-02-08 20:57 UTC (permalink / raw)
  To: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, qemu-devel
  Cc: boris.ostrovsky, kwilk

This patchset introduces support for the ACPI Error Record
Serialization Table, ERST.

Linux uses the persistent storage filesystem, pstore, to record
information (eg. dmesg tail) upon panics and shutdowns.  Pstore is
independent of, and runs before, kdump.  In certain scenarios (ie.
hosts/guests with root filesystems on NFS/iSCSI where networking
software and/or hardware fails), pstore may contain the only
information available for post-mortem debugging.

Two common storage backends for the pstore filesystem are ACPI ERST
and UEFI. Most BIOS implement ACPI ERST; however, ACPI ERST is not
currently supported in QEMU, and UEFI is not utilized in all guests.
By implementing ACPI ERST within QEMU, then the ACPI ERST becomes a
viable pstore storage backend for virtual machines (as it is now for
bare metal machines).

Enabling support for ACPI ERST facilitates a consistent method to
capture kernel panic information in a wide range of guests: from
resource- constrained microvms to very large guests, and in
particular, in direct-boot environments (which would lack UEFI
run-time services).

Note that Microsoft Windows also utilizes the ACPI ERST for certain
crash information, if available.

The ACPI ERST persistent storage is contained within a single backing
file, with a default size of 64KiB. The size and filename of the
backing file can be obtained from QEMU parameters.

The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
(APEI)", and specifically subsection "Error Serialization", outlines
a method for storing error records into persistent storage.

[1] "Advanced Configuration and Power Interface Specification",
    version 6.2, May 2017.
    https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf

[2] "Unified Extensible Firmware Interface Specification",
    version 2.8, March 2019.
    https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf

Suggested-by: Konrad Wilk <konrad.wilk@oracle.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>

---
v2: 8feb2021
 - Added qtest/smoke test per Paolo Bonzini
 - Split patch into smaller chunks, per Igo Mammedov
 - Did away with use of ACPI packed structures, per Igo Mammedov

v1: 26oct2020
 - initial post

---
Eric DeVolder (7):
  ACPI ERST: bios-tables-test.c steps 1 and 2
  ACPI ERST: header file for erst
  ACPI ERST: support for ACPI ERST feature
  ACPI ERST: build step for ACPI ERST
  ACPI ERST: support ERST for x86 guest
  ACPI ERST: qtest for ERST
  ACPI ERST: bios-tables-test.c step 5

 hw/acpi/erst.c               | 952 +++++++++++++++++++++++++++++++++++++++++++
 hw/acpi/meson.build          |   1 +
 hw/i386/acpi-build.c         |   4 +
 include/hw/acpi/erst.h       |  77 ++++
 tests/data/acpi/microvm/ERST |   0
 tests/data/acpi/pc/ERST      | Bin 0 -> 976 bytes
 tests/data/acpi/q35/ERST     | Bin 0 -> 976 bytes
 tests/qtest/erst-test.c      | 106 +++++
 tests/qtest/meson.build      |   2 +
 9 files changed, 1142 insertions(+)
 create mode 100644 hw/acpi/erst.c
 create mode 100644 include/hw/acpi/erst.h
 create mode 100644 tests/data/acpi/microvm/ERST
 create mode 100644 tests/data/acpi/pc/ERST
 create mode 100644 tests/data/acpi/q35/ERST
 create mode 100644 tests/qtest/erst-test.c

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/7] ACPI ERST: bios-tables-test.c steps 1 and 2
  2021-02-08 20:57 [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
@ 2021-02-08 20:57 ` Eric DeVolder
  2021-02-08 20:57 ` [PATCH v2 2/7] ACPI ERST: header file for erst Eric DeVolder
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Eric DeVolder @ 2021-02-08 20:57 UTC (permalink / raw)
  To: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, qemu-devel
  Cc: boris.ostrovsky, kwilk

Following the guidelines in tests/qtest/bios-tables-test.c, this
change adds empty placeholder files per step 1 for the new ERST
table, and excludes resulting changed files in bios-tables-test-allowed-diff.h
per step 2.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 tests/data/acpi/microvm/ERST                | 0
 tests/data/acpi/pc/ERST                     | 0
 tests/data/acpi/q35/ERST                    | 0
 tests/qtest/bios-tables-test-allowed-diff.h | 4 ++++
 4 files changed, 4 insertions(+)
 create mode 100644 tests/data/acpi/microvm/ERST
 create mode 100644 tests/data/acpi/pc/ERST
 create mode 100644 tests/data/acpi/q35/ERST

diff --git a/tests/data/acpi/microvm/ERST b/tests/data/acpi/microvm/ERST
new file mode 100644
index 0000000..e69de29
diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
new file mode 100644
index 0000000..e69de29
diff --git a/tests/data/acpi/q35/ERST b/tests/data/acpi/q35/ERST
new file mode 100644
index 0000000..e69de29
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523..e004c71 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,5 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/ERST",
+"tests/data/acpi/q35/ERST",
+"tests/data/acpi/microvm/ERST",
+
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/7] ACPI ERST: header file for erst
  2021-02-08 20:57 [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
  2021-02-08 20:57 ` [PATCH v2 1/7] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
@ 2021-02-08 20:57 ` Eric DeVolder
  2021-02-08 20:57 ` [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature Eric DeVolder
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Eric DeVolder @ 2021-02-08 20:57 UTC (permalink / raw)
  To: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, qemu-devel
  Cc: boris.ostrovsky, kwilk

This change introduces the defintions for ACPI ERST support.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 include/hw/acpi/erst.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
 create mode 100644 include/hw/acpi/erst.h

diff --git a/include/hw/acpi/erst.h b/include/hw/acpi/erst.h
new file mode 100644
index 0000000..be9b3fa
--- /dev/null
+++ b/include/hw/acpi/erst.h
@@ -0,0 +1,77 @@
+/*
+ * ACPI Error Record Serialization Table, ERST, Implementation
+ *
+ * Copyright (c) 2020 Oracle and/or its affiliates.
+ *
+ * See ACPI specification, "ACPI Platform Error Interfaces"
+ *  "Error Serialization"
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+#ifndef HW_ACPI_ERST_H
+#define HW_ACPI_ERST_H
+
+void build_erst(GArray *table_data, BIOSLinker *linker, hwaddr base);
+
+#define TYPE_ACPI_ERST "acpi-erst"
+
+#define ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION         0x0
+#define ACPI_ERST_ACTION_BEGIN_READ_OPERATION          0x1
+#define ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION         0x2
+#define ACPI_ERST_ACTION_END_OPERATION                 0x3
+#define ACPI_ERST_ACTION_SET_RECORD_OFFSET             0x4
+#define ACPI_ERST_ACTION_EXECUTE_OPERATION             0x5
+#define ACPI_ERST_ACTION_CHECK_BUSY_STATUS             0x6
+#define ACPI_ERST_ACTION_GET_COMMAND_STATUS            0x7
+#define ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER         0x8
+#define ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER         0x9
+#define ACPI_ERST_ACTION_GET_RECORD_COUNT              0xA
+#define ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION   0xB
+#define ACPI_ERST_ACTION_RESERVED                      0xC
+#define ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE   0xD
+#define ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_LENGTH  0xE
+#define ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES 0xF
+#define ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS 0x10
+#define ACPI_ERST_MAX_ACTIONS \
+    (ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS + 1)
+
+#define ACPI_ERST_STATUS_SUCCESS                0x00
+#define ACPI_ERST_STATUS_NOT_ENOUGH_SPACE       0x01
+#define ACPI_ERST_STATUS_HARDWARE_NOT_AVAILABLE 0x02
+#define ACPI_ERST_STATUS_FAILED                 0x03
+#define ACPI_ERST_STATUS_RECORD_STORE_EMPTY     0x04
+#define ACPI_ERST_STATUS_RECORD_NOT_FOUND       0x05
+
+#define ACPI_ERST_INST_READ_REGISTER                 0x00
+#define ACPI_ERST_INST_READ_REGISTER_VALUE           0x01
+#define ACPI_ERST_INST_WRITE_REGISTER                0x02
+#define ACPI_ERST_INST_WRITE_REGISTER_VALUE          0x03
+#define ACPI_ERST_INST_NOOP                          0x04
+#define ACPI_ERST_INST_LOAD_VAR1                     0x05
+#define ACPI_ERST_INST_LOAD_VAR2                     0x06
+#define ACPI_ERST_INST_STORE_VAR1                    0x07
+#define ACPI_ERST_INST_ADD                           0x08
+#define ACPI_ERST_INST_SUBTRACT                      0x09
+#define ACPI_ERST_INST_ADD_VALUE                     0x0A
+#define ACPI_ERST_INST_SUBTRACT_VALUE                0x0B
+#define ACPI_ERST_INST_STALL                         0x0C
+#define ACPI_ERST_INST_STALL_WHILE_TRUE              0x0D
+#define ACPI_ERST_INST_SKIP_NEXT_INSTRUCTION_IF_TRUE 0x0E
+#define ACPI_ERST_INST_GOTO                          0x0F
+#define ACPI_ERST_INST_SET_SRC_ADDRESS_BASE          0x10
+#define ACPI_ERST_INST_SET_DST_ADDRESS_BASE          0x11
+#define ACPI_ERST_INST_MOVE_DATA                     0x12
+
+#endif
+
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-02-08 20:57 [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
  2021-02-08 20:57 ` [PATCH v2 1/7] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
  2021-02-08 20:57 ` [PATCH v2 2/7] ACPI ERST: header file for erst Eric DeVolder
@ 2021-02-08 20:57 ` Eric DeVolder
  2021-04-06 19:31   ` Igor Mammedov
  2021-02-08 20:57 ` [PATCH v2 4/7] ACPI ERST: build step for ACPI ERST Eric DeVolder
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Eric DeVolder @ 2021-02-08 20:57 UTC (permalink / raw)
  To: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, qemu-devel
  Cc: boris.ostrovsky, kwilk

This change implements the support for the ACPI ERST feature[1,2].

The size of the ACPI ERST storage is declared via the QEMU
global parameter acpi-erst.size. The size can range from 64KiB
to to 64MiB. The default is 64KiB.

The location of the ACPI ERST storage backing file is delared
via the QEMU global parameter acpi-erst.filename. The default
is acpi-erst.backing.

[1] "Advanced Configuration and Power Interface Specification",
    version 6.2, May 2017.
    https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf

[2] "Unified Extensible Firmware Interface Specification",
    version 2.8, March 2019.
    https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 hw/acpi/erst.c | 952 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 952 insertions(+)
 create mode 100644 hw/acpi/erst.c

diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
new file mode 100644
index 0000000..3a342f9
--- /dev/null
+++ b/hw/acpi/erst.c
@@ -0,0 +1,952 @@
+/*
+ * ACPI Error Record Serialization Table, ERST, Implementation
+ *
+ * Copyright (c) 2020 Oracle and/or its affiliates.
+ *
+ * See ACPI specification,
+ * "ACPI Platform Error Interfaces" : "Error Serialization"
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/sysbus.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "migration/vmstate.h"
+#include "hw/qdev-properties.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/acpi-defs.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "exec/address-spaces.h"
+#include "hw/acpi/erst.h"
+
+#ifdef _ERST_DEBUG
+#define erst_debug(fmt, ...) \
+    do { fprintf(stderr, fmt, ## __VA_ARGS__); fflush(stderr); } while (0)
+#else
+#define erst_debug(fmt, ...) do { } while (0)
+#endif
+
+/* See UEFI spec, Appendix N Common Platform Error Record */
+/* UEFI CPER allows for an OSPM book keeping area in the record */
+#define UEFI_CPER_RECORD_MIN_SIZE 128U
+#define UEFI_CPER_SIZE_OFFSET 20U
+#define UEFI_CPER_RECORD_ID_OFFSET 96U
+#define IS_UEFI_CPER_RECORD(ptr) \
+    (((ptr)[0] == 'C') && \
+     ((ptr)[1] == 'P') && \
+     ((ptr)[2] == 'E') && \
+     ((ptr)[3] == 'R'))
+#define THE_UEFI_CPER_RECORD_ID(ptr) \
+    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
+
+#define ERST_INVALID_RECORD_ID (~0UL)
+#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
+#define ERST_CSR_ACTION (0UL << 3) /* action (cmd) */
+#define ERST_CSR_VALUE  (1UL << 3) /* argument/value (data) */
+
+/*
+ * As ERST_IOMEM_SIZE is used to map the ERST into the guest,
+ * it should/must be an integer multiple of PAGE_SIZE.
+ * NOTE that any change to this value will make any pre-
+ * existing backing files, not of the same ERST_IOMEM_SIZE,
+ * unusable to the guest.
+ */
+#define ERST_IOMEM_SIZE (2UL * 4096)
+
+/*
+ * This implementation is an ACTION (cmd) and VALUE (data)
+ * interface consisting of just two 64-bit registers.
+ */
+#define ERST_REG_LEN (2UL * sizeof(uint64_t))
+
+/*
+ * The space not utilized by the register interface is the
+ * buffer for exchanging ERST record contents.
+ */
+#define ERST_RECORD_SIZE (ERST_IOMEM_SIZE - ERST_REG_LEN)
+
+/*
+ * Mode to be used for backing file
+ */
+#define ERST_BACKING_FILE_MODE 0644 /* S_IRWXU|S_IRWXG */
+
+#define ACPIERST(obj) \
+    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
+#define ACPIERST_CLASS(oc) \
+    OBJECT_CLASS_CHECK(ERSTDeviceStateClass, (oc), TYPE_ACPI_ERST)
+#define ACPIERST_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(ERSTDeviceStateClass, (obj), TYPE_ACPI_ERST)
+
+static hwaddr erst_base;
+
+typedef struct {
+    SysBusDevice parent_obj;
+
+    MemoryRegion iomem;
+    uint32_t prop_size;
+    char *prop_filename;
+    hwaddr base;
+
+    uint8_t operation;
+    uint8_t busy_status;
+    uint8_t command_status;
+    uint32_t record_offset;
+    uint32_t record_count;
+    uint64_t reg_action;
+    uint64_t reg_value;
+    uint64_t record_identifier;
+
+    unsigned next_record_index;
+    uint8_t record[ERST_RECORD_SIZE]; /* read/written directly by guest */
+    uint8_t tmp_record[ERST_RECORD_SIZE]; /* intermediate manipulation buffer */
+    uint8_t *nvram; /* persistent storage, of length prop_size */
+
+} ERSTDeviceState;
+
+static void update_erst_backing_file(ERSTDeviceState *s,
+    off_t offset, const uint8_t *data, size_t length)
+{
+    int fd;
+
+    /* Bounds check */
+    if ((offset + length) > s->prop_size) {
+        error_report("update: off 0x%lx len 0x%lx > size 0x%lx out of range",
+            (long)offset, (long)length, (long)s->prop_size);
+        return;
+    }
+
+    fd = open(s->prop_filename, O_WRONLY | O_CREAT, ERST_BACKING_FILE_MODE);
+    if (fd > 0) {
+        off_t src;
+        size_t wrc = 0;
+        src = lseek(fd, offset, SEEK_SET);
+        if (offset == src) {
+            wrc = write(fd, data, length);
+        }
+        if ((offset != src) || (length != wrc)) {
+            error_report("ERST write failed: %d %d", (int)wrc, (int)length);
+        }
+        close(fd);
+    } else {
+        error_report("open failed: %s : %d", s->prop_filename, fd);
+    }
+}
+
+static unsigned copy_from_nvram_by_index(ERSTDeviceState *s, unsigned index)
+{
+    /* Read a nvram[] entry into tmp_record */
+    unsigned rc = ACPI_ERST_STATUS_FAILED;
+    off_t offset = (index * ERST_RECORD_SIZE);
+
+    if ((offset + ERST_RECORD_SIZE) <= s->prop_size) {
+        memcpy(s->tmp_record, &s->nvram[offset], ERST_RECORD_SIZE);
+        rc = ACPI_ERST_STATUS_SUCCESS;
+    }
+    return rc;
+}
+
+static unsigned copy_to_nvram_by_index(ERSTDeviceState *s, unsigned index)
+{
+    /* Write entry in tmp_record into nvram[], and backing file */
+    unsigned rc = ACPI_ERST_STATUS_FAILED;
+    off_t offset = (index * ERST_RECORD_SIZE);
+
+    if ((offset + ERST_RECORD_SIZE) <= s->prop_size) {
+        memcpy(&s->nvram[offset], s->tmp_record, ERST_RECORD_SIZE);
+        update_erst_backing_file(s, offset, s->tmp_record, ERST_RECORD_SIZE);
+        rc = ACPI_ERST_STATUS_SUCCESS;
+    }
+    return rc;
+}
+
+static int lookup_erst_record_by_identifier(ERSTDeviceState *s,
+    uint64_t record_identifier, bool *record_found, bool alloc_for_write)
+{
+    int rc = -1;
+    int empty_index = -1;
+    int index = 0;
+    unsigned rrc;
+
+    *record_found = 0;
+
+    do {
+        rrc = copy_from_nvram_by_index(s, (unsigned)index);
+        if (rrc == ACPI_ERST_STATUS_SUCCESS) {
+            uint64_t this_identifier;
+            this_identifier = THE_UEFI_CPER_RECORD_ID(s->tmp_record);
+            if (IS_UEFI_CPER_RECORD(s->tmp_record) &&
+                (this_identifier == record_identifier)) {
+                rc = index;
+                *record_found = 1;
+                break;
+            }
+            if ((this_identifier == ERST_INVALID_RECORD_ID) &&
+                (empty_index < 0)) {
+                empty_index = index; /* first available for write */
+            }
+        }
+        ++index;
+    } while (rrc == ACPI_ERST_STATUS_SUCCESS);
+
+    /* Record not found, allocate for writing */
+    if ((rc < 0) && alloc_for_write) {
+        rc = empty_index;
+    }
+
+    return rc;
+}
+
+static unsigned clear_erst_record(ERSTDeviceState *s)
+{
+    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
+    bool record_found;
+    int index;
+
+    index = lookup_erst_record_by_identifier(s,
+        s->record_identifier, &record_found, 0);
+    if (record_found) {
+        memset(s->tmp_record, 0xFF, ERST_RECORD_SIZE);
+        rc = copy_to_nvram_by_index(s, (unsigned)index);
+        if (rc == ACPI_ERST_STATUS_SUCCESS) {
+            s->record_count -= 1;
+        }
+    }
+
+    return rc;
+}
+
+static unsigned write_erst_record(ERSTDeviceState *s)
+{
+    unsigned rc = ACPI_ERST_STATUS_FAILED;
+
+    if (s->record_offset < (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
+        uint64_t record_identifier;
+        uint8_t *record = &s->record[s->record_offset];
+        bool record_found;
+        int index;
+
+        record_identifier = (s->record_identifier == ERST_INVALID_RECORD_ID)
+            ? THE_UEFI_CPER_RECORD_ID(record) : s->record_identifier;
+
+        index = lookup_erst_record_by_identifier(s,
+            record_identifier, &record_found, 1);
+        if (index < 0) {
+            rc = ACPI_ERST_STATUS_NOT_ENOUGH_SPACE;
+        } else {
+            if (0 != s->record_offset) {
+                memset(&s->tmp_record[ERST_RECORD_SIZE - s->record_offset],
+                    0xFF, s->record_offset);
+            }
+            memcpy(s->tmp_record, record, ERST_RECORD_SIZE - s->record_offset);
+            rc = copy_to_nvram_by_index(s, (unsigned)index);
+            if (rc == ACPI_ERST_STATUS_SUCCESS) {
+                if (!record_found) { /* not overwriting existing record */
+                    s->record_count += 1; /* writing new record */
+                }
+            }
+        }
+    }
+
+    return rc;
+}
+
+static unsigned next_erst_record(ERSTDeviceState *s,
+    uint64_t *record_identifier)
+{
+    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
+    unsigned index;
+    unsigned rrc;
+
+    *record_identifier = ERST_INVALID_RECORD_ID;
+
+    index = s->next_record_index;
+    do {
+        rrc = copy_from_nvram_by_index(s, (unsigned)index);
+        if (rrc == ACPI_ERST_STATUS_SUCCESS) {
+            if (IS_UEFI_CPER_RECORD(s->tmp_record)) {
+                s->next_record_index = index + 1; /* where to start next time */
+                *record_identifier = THE_UEFI_CPER_RECORD_ID(s->tmp_record);
+                rc = ACPI_ERST_STATUS_SUCCESS;
+                break;
+            }
+            ++index;
+        } else {
+            if (s->next_record_index == 0) {
+                rc = ACPI_ERST_STATUS_RECORD_STORE_EMPTY;
+            }
+            s->next_record_index = 0; /* at end, reset */
+        }
+    } while (rrc == ACPI_ERST_STATUS_SUCCESS);
+
+    return rc;
+}
+
+static unsigned read_erst_record(ERSTDeviceState *s)
+{
+    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
+    bool record_found;
+    int index;
+
+    index = lookup_erst_record_by_identifier(s,
+        s->record_identifier, &record_found, 0);
+    if (record_found) {
+        rc = copy_from_nvram_by_index(s, (unsigned)index);
+        if (rc == ACPI_ERST_STATUS_SUCCESS) {
+            if (s->record_offset < ERST_RECORD_SIZE) {
+                memcpy(&s->record[s->record_offset], s->tmp_record,
+                    ERST_RECORD_SIZE - s->record_offset);
+            }
+        }
+    }
+
+    return rc;
+}
+
+static unsigned get_erst_record_count(ERSTDeviceState *s)
+{
+    /* Compute record_count */
+    off_t offset;
+
+    s->record_count = 0;
+    offset = 0;
+    do {
+        uint8_t *ptr = &s->nvram[offset];
+        uint64_t record_identifier = THE_UEFI_CPER_RECORD_ID(ptr);
+        if (IS_UEFI_CPER_RECORD(ptr) &&
+            (ERST_INVALID_RECORD_ID != record_identifier)) {
+            s->record_count += 1;
+        }
+        offset += ERST_RECORD_SIZE;
+    } while (offset < (off_t)s->prop_size);
+
+    return s->record_count;
+}
+
+static void load_erst_backing_file(ERSTDeviceState *s)
+{
+    int fd;
+    struct stat statbuf;
+
+    erst_debug("+load_erst_backing_file()\n");
+
+    /* Allocate and initialize nvram[] */
+    s->nvram = g_malloc(s->prop_size);
+    memset(s->nvram, 0xFF, s->prop_size);
+
+    /* Ensure backing file at least same as prop_size */
+    if (stat(s->prop_filename, &statbuf) == 0) {
+        /* ensure prop_size at least matches file size */
+        if (statbuf.st_size < s->prop_size) {
+            /* Ensure records are ERST_INVALID_RECORD_ID */
+            memset(s->nvram, 0xFF, s->prop_size - statbuf.st_size);
+            update_erst_backing_file(s,
+                statbuf.st_size, s->nvram, s->prop_size - statbuf.st_size);
+        }
+    }
+
+    /* Pre-load nvram[] from backing file, if present */
+    fd = open(s->prop_filename, O_RDONLY, ERST_BACKING_FILE_MODE);
+    if (fd > 0) {
+        size_t rrc = read(fd, s->nvram, s->prop_size);
+        (void)rrc;
+        close(fd);
+        /*
+         * If existing file is smaller than prop_size, it will be resized
+         * accordingly upon subsequent record writes. If the file
+         * is larger than prop_size, only prop_size bytes are utilized,
+         * the extra bytes are untouched (though will be lost after
+         * a migration when the backing file is re-written as length
+         * of prop_size bytes).
+         */
+    } else {
+        /* Create empty backing file */
+        update_erst_backing_file(s, 0, s->nvram, s->prop_size);
+    }
+
+    /* Initialize record_count */
+    get_erst_record_count(s);
+
+    erst_debug("-load_erst_backing_file() %d\n", s->record_count);
+}
+
+static uint64_t erst_rd_reg64(hwaddr addr,
+    uint64_t reg, unsigned size)
+{
+    uint64_t rdval;
+    uint64_t mask;
+    unsigned shift;
+
+    if (size == sizeof(uint64_t)) {
+        /* 64b access */
+        mask = 0xFFFFFFFFFFFFFFFFUL;
+        shift = 0;
+    } else {
+        /* 32b access */
+        mask = 0x00000000FFFFFFFFUL;
+        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
+    }
+
+    rdval = reg;
+    rdval >>= shift;
+    rdval &= mask;
+
+    return rdval;
+}
+
+static uint64_t erst_wr_reg64(hwaddr addr,
+    uint64_t reg, uint64_t val, unsigned size)
+{
+    uint64_t wrval;
+    uint64_t mask;
+    unsigned shift;
+
+    if (size == sizeof(uint64_t)) {
+        /* 64b access */
+        mask = 0xFFFFFFFFFFFFFFFFUL;
+        shift = 0;
+    } else {
+        /* 32b access */
+        mask = 0x00000000FFFFFFFFUL;
+        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
+    }
+
+    val &= mask;
+    val <<= shift;
+    mask <<= shift;
+    wrval = reg;
+    wrval &= ~mask;
+    wrval |= val;
+
+    return wrval;
+}
+
+static void erst_write(void *opaque, hwaddr addr,
+    uint64_t val, unsigned size)
+{
+    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
+
+    if (addr < ERST_REG_LEN) {
+        /*
+         * NOTE: All actions/operations/side effects happen on the WRITE,
+         * by design. The READs simply return the reg_value contents.
+         */
+        erst_debug("ERST write %016lx %10s val %016lx sz %u",
+            addr, erst_reg(addr), val, size);
+        /* The REGISTER region */
+        switch (addr) {
+        case ERST_CSR_VALUE + 0:
+        case ERST_CSR_VALUE + 4:
+            s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
+            break;
+        case ERST_CSR_ACTION + 0:
+/*      case ERST_CSR_ACTION+4: as coded, not really a 64b register */
+            switch (val) {
+            case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
+            case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
+            case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
+            case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
+            case ACPI_ERST_ACTION_END_OPERATION:
+                s->operation = val;
+                break;
+            case ACPI_ERST_ACTION_SET_RECORD_OFFSET:
+                s->record_offset = s->reg_value;
+                break;
+            case ACPI_ERST_ACTION_EXECUTE_OPERATION:
+                if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
+                    s->busy_status = 1;
+                    switch (s->operation) {
+                    case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
+                        s->command_status = write_erst_record(s);
+                        break;
+                    case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
+                        s->command_status = read_erst_record(s);
+                        break;
+                    case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
+                        s->command_status = clear_erst_record(s);
+                        break;
+                    case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
+                        s->command_status = ACPI_ERST_STATUS_SUCCESS;
+                        break;
+                    case ACPI_ERST_ACTION_END_OPERATION:
+                        s->command_status = ACPI_ERST_STATUS_SUCCESS;
+                        break;
+                    default:
+                        s->command_status = ACPI_ERST_STATUS_FAILED;
+                        break;
+                    }
+                    s->record_identifier = ERST_INVALID_RECORD_ID;
+                    s->busy_status = 0;
+                }
+                break;
+            case ACPI_ERST_ACTION_CHECK_BUSY_STATUS:
+                s->reg_value = s->busy_status;
+                break;
+            case ACPI_ERST_ACTION_GET_COMMAND_STATUS:
+                s->reg_value = s->command_status;
+                break;
+            case ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER:
+                s->command_status = next_erst_record(s, &s->reg_value);
+                break;
+            case ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER:
+                s->record_identifier = s->reg_value;
+                break;
+            case ACPI_ERST_ACTION_GET_RECORD_COUNT:
+                s->reg_value = s->record_count;
+                break;
+            case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
+                s->reg_value = s->base + ERST_REG_LEN;
+                break;
+            case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
+                s->reg_value = ERST_RECORD_SIZE;
+                break;
+            case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
+                s->reg_value = 0; /* correct/intended value */
+                break;
+            case ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS:
+                /*
+                 * 100UL is max, 10UL is nominal
+                 */
+                s->reg_value = ((100UL << 32) | (10UL << 0));
+                break;
+            case ACPI_ERST_ACTION_RESERVED:
+            default:
+                /*
+                 * NOP
+                 */
+                break;
+            }
+            break;
+        default:
+            /*
+             * All other register writes are NO-OPs
+             */
+            break;
+        }
+    } else {
+        /* The RECORD region */
+        unsigned offset = addr - ERST_REG_LEN;
+        uint8_t *ptr = &s->record[offset];
+        switch (size) {
+        default:
+        case sizeof(uint8_t):
+            *(uint8_t *)ptr = (uint8_t)val;
+            break;
+        case sizeof(uint16_t):
+            *(uint16_t *)ptr = (uint16_t)val;
+            break;
+        case sizeof(uint32_t):
+            *(uint32_t *)ptr = (uint32_t)val;
+            break;
+        case sizeof(uint64_t):
+            *(uint64_t *)ptr = (uint64_t)val;
+            break;
+        }
+    }
+}
+
+static uint64_t erst_read(void *opaque, hwaddr addr,
+                                unsigned size)
+{
+    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
+    uint64_t val = 0;
+
+    if (addr < ERST_REG_LEN) {
+        switch (addr) {
+        case ERST_CSR_ACTION + 0:
+        case ERST_CSR_ACTION + 4:
+            val = erst_rd_reg64(addr, s->reg_action, size);
+            break;
+        case ERST_CSR_VALUE + 0:
+        case ERST_CSR_VALUE + 4:
+            val = erst_rd_reg64(addr, s->reg_value, size);
+            break;
+        default:
+            break;
+        }
+    erst_debug("ERST read  %016lx %10s val %016lx sz %u\n",
+        addr, erst_reg(addr), val, size);
+    } else {
+        /*
+         * The RECORD region
+         */
+        uint8_t *ptr = &s->record[addr - ERST_REG_LEN];
+        switch (size) {
+        default:
+        case sizeof(uint8_t):
+            val = *(uint8_t *)ptr;
+            break;
+        case sizeof(uint16_t):
+            val = *(uint16_t *)ptr;
+            break;
+        case sizeof(uint32_t):
+            val = *(uint32_t *)ptr;
+            break;
+        case sizeof(uint64_t):
+            val = *(uint64_t *)ptr;
+            break;
+        }
+    }
+    erst_debug("ERST read  %016lx %10s val %016lx sz %u\n",
+        addr, erst_reg(addr), val, size);
+    return val;
+}
+
+static size_t build_erst_action(GArray *table_data,
+    uint8_t serialization_action,
+    uint8_t instruction,
+    uint8_t flags,
+    uint8_t width,
+    uint64_t address,
+    uint64_t value,
+    uint64_t mask)
+{
+    /* See ACPI spec, Error Serialization */
+    uint8_t access_width = 0;
+    build_append_int_noprefix(table_data, serialization_action, 1);
+    build_append_int_noprefix(table_data, instruction         , 1);
+    build_append_int_noprefix(table_data, flags               , 1);
+    build_append_int_noprefix(table_data, 0                   , 1);
+    /* GAS space_id */
+    build_append_int_noprefix(table_data, AML_SYSTEM_MEMORY   , 1);
+    /* GAS bit_width */
+    build_append_int_noprefix(table_data, width               , 1);
+    /* GAS bit_offset */
+    build_append_int_noprefix(table_data, 0                   , 1);
+    /* GAS access_width */
+    switch (width) {
+    case 8:
+        access_width = 1;
+        break;
+    case 16:
+        access_width = 2;
+        break;
+    case 32:
+        access_width = 3;
+        break;
+    case 64:
+        access_width = 4;
+        break;
+    default:
+        access_width = 0;
+        break;
+    }
+    build_append_int_noprefix(table_data, access_width        , 1);
+    /* GAS address */
+    build_append_int_noprefix(table_data, address, 8);
+    /* value */
+    build_append_int_noprefix(table_data, value  , 8);
+    /* mask */
+    build_append_int_noprefix(table_data, mask   , 8);
+
+    return 1;
+}
+
+static const MemoryRegionOps erst_rw_ops = {
+    .read = erst_read,
+    .write = erst_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+void build_erst(GArray *table_data, BIOSLinker *linker, hwaddr base)
+{
+    unsigned action, insns = 0;
+    unsigned erst_start = table_data->len;
+    unsigned iec_offset = 0;
+
+    /* See ACPI spec, Error Serialization */
+    acpi_data_push(table_data, sizeof(AcpiTableHeader));
+    /* serialization_header_length */
+    build_append_int_noprefix(table_data, 48, 4);
+    /* reserved */
+    build_append_int_noprefix(table_data,  0, 4);
+    iec_offset = table_data->len;
+    /* instruction_entry_count (placeholder) */
+    build_append_int_noprefix(table_data,  0, 4);
+
+#define BEA(I, F, W, ADDR, VAL, MASK) \
+    build_erst_action(table_data, action, \
+        ACPI_ERST_INST_##I, F, W, base + ADDR, VAL, MASK)
+#define MASK8  0x00000000000000FFUL
+#define MASK16 0x000000000000FFFFUL
+#define MASK32 0x00000000FFFFFFFFUL
+#define MASK64 0xFFFFFFFFFFFFFFFFUL
+
+    for (action = 0; action < ACPI_ERST_MAX_ACTIONS; ++action) {
+        switch (action) {
+        case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            break;
+        case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            break;
+        case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            break;
+        case ACPI_ERST_ACTION_END_OPERATION:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            break;
+        case ACPI_ERST_ACTION_SET_RECORD_OFFSET:
+            insns += BEA(WRITE_REGISTER      , 0, 32,
+                ERST_CSR_VALUE , 0, MASK32);
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            break;
+        case ACPI_ERST_ACTION_EXECUTE_OPERATION:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_VALUE , ERST_EXECUTE_OPERATION_MAGIC, MASK8);
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            break;
+        case ACPI_ERST_ACTION_CHECK_BUSY_STATUS:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            insns += BEA(READ_REGISTER_VALUE , 0, 32,
+                ERST_CSR_VALUE, 0x01, MASK8);
+            break;
+        case ACPI_ERST_ACTION_GET_COMMAND_STATUS:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            insns += BEA(READ_REGISTER       , 0, 32,
+                ERST_CSR_VALUE, 0, MASK8);
+            break;
+        case ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            insns += BEA(READ_REGISTER       , 0, 64,
+                ERST_CSR_VALUE, 0, MASK64);
+            break;
+        case ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER:
+            insns += BEA(WRITE_REGISTER      , 0, 64,
+                ERST_CSR_VALUE , 0, MASK64);
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            break;
+        case ACPI_ERST_ACTION_GET_RECORD_COUNT:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            insns += BEA(READ_REGISTER       , 0, 32,
+                ERST_CSR_VALUE, 0, MASK32);
+            break;
+        case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            break;
+        case ACPI_ERST_ACTION_RESERVED:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            break;
+        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            insns += BEA(READ_REGISTER       , 0, 64,
+                ERST_CSR_VALUE, 0, MASK64);
+            break;
+        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            insns += BEA(READ_REGISTER       , 0, 64,
+                ERST_CSR_VALUE, 0, MASK32);
+            break;
+        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            insns += BEA(READ_REGISTER       , 0, 32,
+                ERST_CSR_VALUE, 0, MASK32);
+            break;
+        case ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS:
+            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
+                ERST_CSR_ACTION, action, MASK8);
+            insns += BEA(READ_REGISTER       , 0, 64,
+                ERST_CSR_VALUE, 0, MASK64);
+        default:
+            insns += BEA(NOOP, 0, 0, 0, action, 0);
+            break;
+        }
+    }
+
+    /* acpi_data_push() within BEA() can result in new GArray pointer */
+    *(uint32_t *)(table_data->data + iec_offset) = cpu_to_le32(insns);
+
+    build_header(linker, table_data,
+                 (void *)(table_data->data + erst_start),
+                 "ERST", table_data->len - erst_start,
+                 1, NULL, NULL);
+
+    if (erst_base == 0) {
+        /*
+         * This ACPI routine is invoked twice, but this code
+         * snippet needs to happen just once.
+         * And this code in erst_class_init() is too early.
+         */
+        DeviceState *dev;
+        SysBusDevice *s;
+
+        dev = qdev_new(TYPE_ACPI_ERST);
+        erst_debug("qdev_create dev %p\n", dev);
+        s = SYS_BUS_DEVICE(dev);
+        sysbus_realize_and_unref(s, &error_fatal);
+
+        ACPIERST(dev)->base = base;
+        sysbus_mmio_map(s, 0, base);
+        erst_base = base;
+        erst_debug("erst_base %lx\n", base);
+    }
+}
+
+/*******************************************************************/
+/*******************************************************************/
+static int erst_post_load(void *opaque, int version_id)
+{
+    ERSTDeviceState *s = opaque;
+    erst_debug("+erst_post_load(%d)\n", version_id);
+    /* Ensure nvram[] persists into backing file */
+    update_erst_backing_file(s, 0, s->nvram, s->prop_size);
+    erst_debug("-erst_post_load()\n");
+    return 0;
+}
+
+static const VMStateDescription erst_vmstate  = {
+    .name = "acpi-erst",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .post_load = erst_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8(operation, ERSTDeviceState),
+        VMSTATE_UINT8(busy_status, ERSTDeviceState),
+        VMSTATE_UINT8(command_status, ERSTDeviceState),
+        VMSTATE_UINT32(record_offset, ERSTDeviceState),
+        VMSTATE_UINT32(record_count, ERSTDeviceState),
+        VMSTATE_UINT64(reg_action, ERSTDeviceState),
+        VMSTATE_UINT64(reg_value, ERSTDeviceState),
+        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
+        VMSTATE_UINT8_ARRAY(record, ERSTDeviceState, ERST_RECORD_SIZE),
+        VMSTATE_UINT8_ARRAY(tmp_record, ERSTDeviceState, ERST_RECORD_SIZE),
+        VMSTATE_VARRAY_UINT32(nvram, ERSTDeviceState, prop_size, 0,
+            vmstate_info_uint8, uint8_t),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void erst_realizefn(DeviceState *dev, Error **errp)
+{
+    SysBusDevice *d = SYS_BUS_DEVICE(dev);
+    ERSTDeviceState *s = ACPIERST(dev);
+
+    erst_debug("+erst_realizefn()\n");
+    if (!s->prop_filename) {
+        s->prop_filename = (char *)(TYPE_ACPI_ERST ".backing");
+    }
+
+    if (!s->prop_filename) {
+        error_setg(errp, "'filename' property is not set");
+        return;
+    }
+
+    if (!(s->prop_size > ERST_RECORD_SIZE) &&
+        (s->prop_size <= 0x04000000)) {
+        error_setg(errp, "'size' property %d is not set properly",
+            s->prop_size);
+        return;
+    }
+
+    /* Convert prop_size to integer multiple of ERST_RECORD_SIZE */
+    s->prop_size -= (s->prop_size % ERST_RECORD_SIZE);
+
+    load_erst_backing_file(s);
+
+    erst_debug("filename %s\n", s->prop_filename);
+    erst_debug("size %x\n", s->prop_size);
+
+    memory_region_init_io(&s->iomem, OBJECT(s), &erst_rw_ops, s,
+                          TYPE_ACPI_ERST, ERST_IOMEM_SIZE);
+    sysbus_init_mmio(d, &s->iomem);
+    erst_debug("-erst_realizefn()\n");
+}
+
+static void erst_unrealizefn(DeviceState *dev)
+{
+    ERSTDeviceState *s = ACPIERST(dev);
+
+    erst_debug("+erst_unrealizefn()\n");
+    if (s->nvram) {
+        /* Ensure nvram[] persists into backing file */
+        update_erst_backing_file(s, 0, s->nvram, s->prop_size);
+        g_free(s->nvram);
+        s->nvram = NULL;
+    }
+    erst_debug("-erst_unrealizefn()\n");
+}
+
+static void erst_reset(DeviceState *dev)
+{
+    ERSTDeviceState *s = ACPIERST(dev);
+
+    erst_debug("+erst_reset(%p) %d\n", s, s->record_count);
+    s->operation = 0;
+    s->busy_status = 0;
+    s->command_status = ACPI_ERST_STATUS_SUCCESS;
+    /* indicate empty/no-more until further notice */
+    s->record_identifier = ERST_INVALID_RECORD_ID;
+    s->record_offset = 0;
+    s->next_record_index = 0;
+    /* NOTE: record_count and nvram[] are initialized elsewhere */
+    erst_debug("-erst_reset()\n");
+}
+
+static Property erst_properties[] = {
+    DEFINE_PROP_UINT32("size", ERSTDeviceState, prop_size, 0x00010000),
+    DEFINE_PROP_STRING("filename", ERSTDeviceState, prop_filename),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void erst_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    erst_debug("+erst_class_init()\n");
+    dc->realize = erst_realizefn;
+    dc->unrealize = erst_unrealizefn;
+    dc->reset = erst_reset;
+    dc->vmsd = &erst_vmstate;
+    device_class_set_props(dc, erst_properties);
+    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    erst_debug("-erst_class_init()\n");
+}
+
+static const TypeInfo erst_type_info = {
+    .name          = TYPE_ACPI_ERST,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .class_init    = erst_class_init,
+    .instance_size = sizeof(ERSTDeviceState),
+};
+
+static void erst_register_types(void)
+{
+    type_register_static(&erst_type_info);
+}
+
+type_init(erst_register_types)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/7] ACPI ERST: build step for ACPI ERST
  2021-02-08 20:57 [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (2 preceding siblings ...)
  2021-02-08 20:57 ` [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature Eric DeVolder
@ 2021-02-08 20:57 ` Eric DeVolder
  2021-04-06 19:17   ` Igor Mammedov
  2021-02-08 20:57 ` [PATCH v2 5/7] ACPI ERST: support ERST for x86 guest Eric DeVolder
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Eric DeVolder @ 2021-02-08 20:57 UTC (permalink / raw)
  To: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, qemu-devel
  Cc: boris.ostrovsky, kwilk

This change includes ERST in the build of ACPI support.

Signed-off-by: Eric DeVolder <eric.devolder@orace.com>
---
 hw/acpi/meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index dd69577..262a8ee 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -4,6 +4,7 @@ acpi_ss.add(files(
   'aml-build.c',
   'bios-linker-loader.c',
   'utils.c',
+  'erst.c',
 ))
 acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu_hotplug.c'))
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 5/7] ACPI ERST: support ERST for x86 guest
  2021-02-08 20:57 [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (3 preceding siblings ...)
  2021-02-08 20:57 ` [PATCH v2 4/7] ACPI ERST: build step for ACPI ERST Eric DeVolder
@ 2021-02-08 20:57 ` Eric DeVolder
  2021-02-08 20:57 ` [PATCH v2 6/7] ACPI ERST: qtest for ERST Eric DeVolder
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Eric DeVolder @ 2021-02-08 20:57 UTC (permalink / raw)
  To: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, qemu-devel
  Cc: boris.ostrovsky, kwilk

This change includes ERST support for x86 guests.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 hw/i386/acpi-build.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index f18b71d..2c0b27d 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -43,6 +43,7 @@
 #include "sysemu/tpm.h"
 #include "hw/acpi/tpm.h"
 #include "hw/acpi/vmgenid.h"
+#include "hw/acpi/erst.h"
 #include "hw/boards.h"
 #include "sysemu/tpm_backend.h"
 #include "hw/rtc/mc146818rtc_regs.h"
@@ -2196,6 +2197,9 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
     acpi_build_madt(tables_blob, tables->linker, x86ms,
                     ACPI_DEVICE_IF(x86ms->acpi_dev));
 
+    acpi_add_table(table_offsets, tables_blob);
+    build_erst(tables_blob, tables->linker, HPET_BASE + 0x10000UL);
+
     vmgenid_dev = find_vmgenid_dev();
     if (vmgenid_dev) {
         acpi_add_table(table_offsets, tables_blob);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 6/7] ACPI ERST: qtest for ERST
  2021-02-08 20:57 [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (4 preceding siblings ...)
  2021-02-08 20:57 ` [PATCH v2 5/7] ACPI ERST: support ERST for x86 guest Eric DeVolder
@ 2021-02-08 20:57 ` Eric DeVolder
  2021-02-08 20:57 ` [PATCH v2 7/7] ACPI ERST: bios-tables-test.c step 5 Eric DeVolder
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Eric DeVolder @ 2021-02-08 20:57 UTC (permalink / raw)
  To: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, qemu-devel
  Cc: boris.ostrovsky, kwilk

This change provides a qtest that locates and then does a simple
interrogation of the ERST feature within the guest.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 tests/qtest/erst-test.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++
 tests/qtest/meson.build |   2 +
 2 files changed, 108 insertions(+)
 create mode 100644 tests/qtest/erst-test.c

diff --git a/tests/qtest/erst-test.c b/tests/qtest/erst-test.c
new file mode 100644
index 0000000..1030e83
--- /dev/null
+++ b/tests/qtest/erst-test.c
@@ -0,0 +1,106 @@
+/*
+ * QTest testcase for ACPI ERST
+ *
+ * Copyright (c) 2021 Oracle
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/bitmap.h"
+#include "qemu/uuid.h"
+#include "hw/acpi/acpi-defs.h"
+#include "boot-sector.h"
+#include "acpi-utils.h"
+#include "libqos/libqtest.h"
+#include "qapi/qmp/qdict.h"
+
+#define RSDP_ADDR_INVALID 0x100000 /* RSDP must be below this address */
+
+static uint64_t acpi_find_erst(QTestState *qts)
+{
+    uint32_t rsdp_offset;
+    uint8_t rsdp_table[36 /* ACPI 2.0+ RSDP size */];
+    uint32_t rsdt_len, table_length;
+    uint8_t *rsdt, *ent;
+    uint64_t base = 0;
+
+    /* Wait for guest firmware to finish and start the payload. */
+    boot_sector_test(qts);
+
+    /* Tables should be initialized now. */
+    rsdp_offset = acpi_find_rsdp_address(qts);
+
+    g_assert_cmphex(rsdp_offset, <, RSDP_ADDR_INVALID);
+
+    acpi_fetch_rsdp_table(qts, rsdp_offset, rsdp_table);
+    acpi_fetch_table(qts, &rsdt, &rsdt_len, &rsdp_table[16 /* RsdtAddress */],
+                     4, "RSDT", true);
+
+    ACPI_FOREACH_RSDT_ENTRY(rsdt, rsdt_len, ent, 4 /* Entry size */) {
+        uint8_t *table_aml;
+        acpi_fetch_table(qts, &table_aml, &table_length, ent, 4, NULL, true);
+        if (!memcmp(table_aml + 16 /* OEM Table ID */, "BXPCERST", 8)) {
+            /*
+             * Picking up ERST base address from the Register Region
+             * specified as part of the first Serialization Instruction
+             * Action (which is a Begin Write Operation).
+             */
+            memcpy(&base, &table_aml[56], 8);
+            g_free(table_aml);
+            break;
+        }
+        g_free(table_aml);
+    }
+    g_free(rsdt);
+    return base;
+}
+
+static char disk[] = "tests/erst-test-disk-XXXXXX";
+
+#define ERST_CMD()                              \
+    "-accel kvm -accel tcg "                    \
+    "-drive id=hd0,if=none,file=%s,format=raw " \
+    "-device ide-hd,drive=hd0 ", disk
+
+static void erst_get_error_log_address_range(void)
+{
+    QTestState *qts;
+    uint64_t log_address_range = 0;
+
+    qts = qtest_initf(ERST_CMD());
+
+    uint64_t base = acpi_find_erst(qts);
+    g_assert(base != 0);
+
+    /* Issue GET_ERROR_LOG_ADDRESS_RANGE command */
+    qtest_writel(qts, base + 0, 0xD);
+    /* Read GET_ERROR_LOG_ADDRESS_RANGE result */
+    log_address_range = qtest_readq(qts, base + 8);\
+
+    /* Check addr_range is offset of base */
+    g_assert((base + 16) == log_address_range);
+
+    qtest_quit(qts);
+}
+
+int main(int argc, char **argv)
+{
+    int ret;
+
+    ret = boot_sector_init(disk);
+    if (ret) {
+        return ret;
+    }
+
+    g_test_init(&argc, &argv, NULL);
+
+    qtest_add_func("/erst/get-error-log-address-range",
+                   erst_get_error_log_address_range);
+
+    ret = g_test_run();
+    boot_sector_cleanup(disk);
+
+    return ret;
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 6a67c53..8409892 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -46,6 +46,7 @@ qtests_i386 = \
   (config_all_devices.has_key('CONFIG_TPM_TIS_ISA') ? ['tpm-tis-test'] : []) +              \
   (config_all_devices.has_key('CONFIG_TPM_TIS_ISA') ? ['tpm-tis-swtpm-test'] : []) +        \
   (config_all_devices.has_key('CONFIG_RTL8139_PCI') ? ['rtl8139-test'] : []) +              \
+  (config_all_devices.has_key('CONFIG_ACPI') ? ['erst-test'] : []) +                        \
   qtests_pci +                                                                              \
   ['fdc-test',
    'ide-test',
@@ -208,6 +209,7 @@ qtests = {
   'bios-tables-test': [io, 'boot-sector.c', 'acpi-utils.c', 'tpm-emu.c'],
   'cdrom-test': files('boot-sector.c'),
   'dbus-vmstate-test': files('migration-helpers.c') + dbus_vmstate1,
+  'erst-test': files('erst-test.c', 'boot-sector.c', 'acpi-utils.c'),
   'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
   'migration-test': files('migration-helpers.c'),
   'pxe-test': files('boot-sector.c'),
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 7/7] ACPI ERST: bios-tables-test.c step 5
  2021-02-08 20:57 [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (5 preceding siblings ...)
  2021-02-08 20:57 ` [PATCH v2 6/7] ACPI ERST: qtest for ERST Eric DeVolder
@ 2021-02-08 20:57 ` Eric DeVolder
  2021-03-01 19:04 ` [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric Devolder
  2021-05-14 13:57 ` Michael S. Tsirkin
  8 siblings, 0 replies; 24+ messages in thread
From: Eric DeVolder @ 2021-02-08 20:57 UTC (permalink / raw)
  To: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, qemu-devel
  Cc: boris.ostrovsky, kwilk

This change is step 5 of bios-tables-test.c. This change is the result
of the output of rebuild-expected-aml.sh.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 tests/data/acpi/pc/ERST                     | Bin 0 -> 976 bytes
 tests/data/acpi/q35/ERST                    | Bin 0 -> 976 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   4 ----
 3 files changed, 4 deletions(-)

diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..fcfec715478919a40b23aa20fe90b5dc74114f58 100644
GIT binary patch
literal 976
zcmaKqSq_3Q6h+HC4;wIH2`<4vNc_`?A1=;C>4wDK(pSli@Dhg0>1lbC@iyIGfl{8=
zUxxG4^^fZ?>Svx(3ir4k^?^Fzq{pfb=l2IuYPL5Xarh|VK5>7+jt9gMQR0UX^!h2U
zKhI`JNPcMSpC4H+{&Ry%e-ZJR=8u`9;nn+b{|WgC`6jP?i(UUI`6>A(pHwLQIbVMt
zREF=j$7%kI=Ff@e%#r`Szg`=tAiuyvU9!I=@p8eASHvsk$UFVj<k#ezyy0VLLw-ZP
a$vgeG<hSITys<v^uazI#{{Q0JY19|)NQVIc

literal 0
HcmV?d00001

diff --git a/tests/data/acpi/q35/ERST b/tests/data/acpi/q35/ERST
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..fcfec715478919a40b23aa20fe90b5dc74114f58 100644
GIT binary patch
literal 976
zcmaKqSq_3Q6h+HC4;wIH2`<4vNc_`?A1=;C>4wDK(pSli@Dhg0>1lbC@iyIGfl{8=
zUxxG4^^fZ?>Svx(3ir4k^?^Fzq{pfb=l2IuYPL5Xarh|VK5>7+jt9gMQR0UX^!h2U
zKhI`JNPcMSpC4H+{&Ry%e-ZJR=8u`9;nn+b{|WgC`6jP?i(UUI`6>A(pHwLQIbVMt
zREF=j$7%kI=Ff@e%#r`Szg`=tAiuyvU9!I=@p8eASHvsk$UFVj<k#ezyy0VLLw-ZP
a$vgeG<hSITys<v^uazI#{{Q0JY19|)NQVIc

literal 0
HcmV?d00001

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index e004c71..dfb8523 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,5 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/pc/ERST",
-"tests/data/acpi/q35/ERST",
-"tests/data/acpi/microvm/ERST",
-
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU
  2021-02-08 20:57 [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (6 preceding siblings ...)
  2021-02-08 20:57 ` [PATCH v2 7/7] ACPI ERST: bios-tables-test.c step 5 Eric DeVolder
@ 2021-03-01 19:04 ` Eric Devolder
  2021-04-01 13:44   ` Michael S. Tsirkin
  2021-05-14 13:57 ` Michael S. Tsirkin
  8 siblings, 1 reply; 24+ messages in thread
From: Eric Devolder @ 2021-03-01 19:04 UTC (permalink / raw)
  To: mst, imammedo, marcel.apfelbaum, pbonzini, rth, ehabkost, qemu-devel
  Cc: Boris Ostrovsky, kwilk

[-- Attachment #1: Type: text/plain, Size: 4029 bytes --]

Hi,
A friendly request to review and/or provide feedback on this patchset.
Thanks,
eric
________________________________
From: Eric Devolder <eric.devolder@oracle.com>
Sent: Monday, February 8, 2021 2:58 PM
To: mst@redhat.com <mst@redhat.com>; imammedo@redhat.com <imammedo@redhat.com>; marcel.apfelbaum@gmail.com <marcel.apfelbaum@gmail.com>; pbonzini@redhat.com <pbonzini@redhat.com>; rth@twiddle.net <rth@twiddle.net>; ehabkost@redhat.com <ehabkost@redhat.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>; kwilk@oracle.com <kwilk@oracle.com>
Subject: [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU

This patchset introduces support for the ACPI Error Record
Serialization Table, ERST.

Linux uses the persistent storage filesystem, pstore, to record
information (eg. dmesg tail) upon panics and shutdowns.  Pstore is
independent of, and runs before, kdump.  In certain scenarios (ie.
hosts/guests with root filesystems on NFS/iSCSI where networking
software and/or hardware fails), pstore may contain the only
information available for post-mortem debugging.

Two common storage backends for the pstore filesystem are ACPI ERST
and UEFI. Most BIOS implement ACPI ERST; however, ACPI ERST is not
currently supported in QEMU, and UEFI is not utilized in all guests.
By implementing ACPI ERST within QEMU, then the ACPI ERST becomes a
viable pstore storage backend for virtual machines (as it is now for
bare metal machines).

Enabling support for ACPI ERST facilitates a consistent method to
capture kernel panic information in a wide range of guests: from
resource- constrained microvms to very large guests, and in
particular, in direct-boot environments (which would lack UEFI
run-time services).

Note that Microsoft Windows also utilizes the ACPI ERST for certain
crash information, if available.

The ACPI ERST persistent storage is contained within a single backing
file, with a default size of 64KiB. The size and filename of the
backing file can be obtained from QEMU parameters.

The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
(APEI)", and specifically subsection "Error Serialization", outlines
a method for storing error records into persistent storage.

[1] "Advanced Configuration and Power Interface Specification",
    version 6.2, May 2017.
    https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf

[2] "Unified Extensible Firmware Interface Specification",
    version 2.8, March 2019.
    https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf

Suggested-by: Konrad Wilk <konrad.wilk@oracle.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>

---
v2: 8feb2021
 - Added qtest/smoke test per Paolo Bonzini
 - Split patch into smaller chunks, per Igo Mammedov
 - Did away with use of ACPI packed structures, per Igo Mammedov

v1: 26oct2020
 - initial post

---
Eric DeVolder (7):
  ACPI ERST: bios-tables-test.c steps 1 and 2
  ACPI ERST: header file for erst
  ACPI ERST: support for ACPI ERST feature
  ACPI ERST: build step for ACPI ERST
  ACPI ERST: support ERST for x86 guest
  ACPI ERST: qtest for ERST
  ACPI ERST: bios-tables-test.c step 5

 hw/acpi/erst.c               | 952 +++++++++++++++++++++++++++++++++++++++++++
 hw/acpi/meson.build          |   1 +
 hw/i386/acpi-build.c         |   4 +
 include/hw/acpi/erst.h       |  77 ++++
 tests/data/acpi/microvm/ERST |   0
 tests/data/acpi/pc/ERST      | Bin 0 -> 976 bytes
 tests/data/acpi/q35/ERST     | Bin 0 -> 976 bytes
 tests/qtest/erst-test.c      | 106 +++++
 tests/qtest/meson.build      |   2 +
 9 files changed, 1142 insertions(+)
 create mode 100644 hw/acpi/erst.c
 create mode 100644 include/hw/acpi/erst.h
 create mode 100644 tests/data/acpi/microvm/ERST
 create mode 100644 tests/data/acpi/pc/ERST
 create mode 100644 tests/data/acpi/q35/ERST
 create mode 100644 tests/qtest/erst-test.c

--
1.8.3.1


[-- Attachment #2: Type: text/html, Size: 6225 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU
  2021-03-01 19:04 ` [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric Devolder
@ 2021-04-01 13:44   ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-04-01 13:44 UTC (permalink / raw)
  To: Eric Devolder; +Cc: ehabkost, qemu-devel, kwilk, pbonzini, Boris Ostrovsky, rth

I tagged this for after the release. To help make sure I don't lose
this by mistake please ping after the QEMU release.

Thanks!

On Mon, Mar 01, 2021 at 07:04:50PM +0000, Eric Devolder wrote:
> Hi,
> A friendly request to review and/or provide feedback on this patchset.
> Thanks,
> eric
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> From: Eric Devolder <eric.devolder@oracle.com>
> Sent: Monday, February 8, 2021 2:58 PM
> To: mst@redhat.com <mst@redhat.com>; imammedo@redhat.com <imammedo@redhat.com>;
> marcel.apfelbaum@gmail.com <marcel.apfelbaum@gmail.com>; pbonzini@redhat.com
> <pbonzini@redhat.com>; rth@twiddle.net <rth@twiddle.net>; ehabkost@redhat.com
> <ehabkost@redhat.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>; kwilk@oracle.com
> <kwilk@oracle.com>
> Subject: [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support
> for QEMU
>  
> This patchset introduces support for the ACPI Error Record
> Serialization Table, ERST.
> 
> Linux uses the persistent storage filesystem, pstore, to record
> information (eg. dmesg tail) upon panics and shutdowns.  Pstore is
> independent of, and runs before, kdump.  In certain scenarios (ie.
> hosts/guests with root filesystems on NFS/iSCSI where networking
> software and/or hardware fails), pstore may contain the only
> information available for post-mortem debugging.
> 
> Two common storage backends for the pstore filesystem are ACPI ERST
> and UEFI. Most BIOS implement ACPI ERST; however, ACPI ERST is not
> currently supported in QEMU, and UEFI is not utilized in all guests.
> By implementing ACPI ERST within QEMU, then the ACPI ERST becomes a
> viable pstore storage backend for virtual machines (as it is now for
> bare metal machines).
> 
> Enabling support for ACPI ERST facilitates a consistent method to
> capture kernel panic information in a wide range of guests: from
> resource- constrained microvms to very large guests, and in
> particular, in direct-boot environments (which would lack UEFI
> run-time services).
> 
> Note that Microsoft Windows also utilizes the ACPI ERST for certain
> crash information, if available.
> 
> The ACPI ERST persistent storage is contained within a single backing
> file, with a default size of 64KiB. The size and filename of the
> backing file can be obtained from QEMU parameters.
> 
> The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
> (APEI)", and specifically subsection "Error Serialization", outlines
> a method for storing error records into persistent storage.
> 
> [1] "Advanced Configuration and Power Interface Specification",
>     version 6.2, May 2017.
>     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> 
> [2] "Unified Extensible Firmware Interface Specification",
>     version 2.8, March 2019.
>     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> 
> Suggested-by: Konrad Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> 
> ---
> v2: 8feb2021
>  - Added qtest/smoke test per Paolo Bonzini
>  - Split patch into smaller chunks, per Igo Mammedov
>  - Did away with use of ACPI packed structures, per Igo Mammedov
> 
> v1: 26oct2020
>  - initial post
> 
> ---
> Eric DeVolder (7):
>   ACPI ERST: bios-tables-test.c steps 1 and 2
>   ACPI ERST: header file for erst
>   ACPI ERST: support for ACPI ERST feature
>   ACPI ERST: build step for ACPI ERST
>   ACPI ERST: support ERST for x86 guest
>   ACPI ERST: qtest for ERST
>   ACPI ERST: bios-tables-test.c step 5
> 
>  hw/acpi/erst.c               | 952 +++++++++++++++++++++++++++++++++++++++++++
>  hw/acpi/meson.build          |   1 +
>  hw/i386/acpi-build.c         |   4 +
>  include/hw/acpi/erst.h       |  77 ++++
>  tests/data/acpi/microvm/ERST |   0
>  tests/data/acpi/pc/ERST      | Bin 0 -> 976 bytes
>  tests/data/acpi/q35/ERST     | Bin 0 -> 976 bytes
>  tests/qtest/erst-test.c      | 106 +++++
>  tests/qtest/meson.build      |   2 +
>  9 files changed, 1142 insertions(+)
>  create mode 100644 hw/acpi/erst.c
>  create mode 100644 include/hw/acpi/erst.h
>  create mode 100644 tests/data/acpi/microvm/ERST
>  create mode 100644 tests/data/acpi/pc/ERST
>  create mode 100644 tests/data/acpi/q35/ERST
>  create mode 100644 tests/qtest/erst-test.c
> 
> --
> 1.8.3.1
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/7] ACPI ERST: build step for ACPI ERST
  2021-02-08 20:57 ` [PATCH v2 4/7] ACPI ERST: build step for ACPI ERST Eric DeVolder
@ 2021-04-06 19:17   ` Igor Mammedov
  0 siblings, 0 replies; 24+ messages in thread
From: Igor Mammedov @ 2021-04-06 19:17 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, qemu-devel, kwilk, pbonzini, boris.ostrovsky, rth

On Mon,  8 Feb 2021 15:57:56 -0500
Eric DeVolder <eric.devolder@oracle.com> wrote:

> This change includes ERST in the build of ACPI support.
> 
> Signed-off-by: Eric DeVolder <eric.devolder@orace.com>
> ---
>  hw/acpi/meson.build | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> index dd69577..262a8ee 100644
> --- a/hw/acpi/meson.build
> +++ b/hw/acpi/meson.build
> @@ -4,6 +4,7 @@ acpi_ss.add(files(
>    'aml-build.c',
>    'bios-linker-loader.c',
>    'utils.c',
> +  'erst.c',
>  ))

this should be part of the patch that introduces erst.c

>  acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c'))
>  acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu_hotplug.c'))



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-02-08 20:57 ` [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature Eric DeVolder
@ 2021-04-06 19:31   ` Igor Mammedov
  2021-04-09 15:54     ` Eric DeVolder
  0 siblings, 1 reply; 24+ messages in thread
From: Igor Mammedov @ 2021-04-06 19:31 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, qemu-devel, kwilk, pbonzini, boris.ostrovsky, rth

On Mon,  8 Feb 2021 15:57:55 -0500
Eric DeVolder <eric.devolder@oracle.com> wrote:

> This change implements the support for the ACPI ERST feature[1,2].
> 
> The size of the ACPI ERST storage is declared via the QEMU
> global parameter acpi-erst.size. The size can range from 64KiB
> to to 64MiB. The default is 64KiB.
> 
> The location of the ACPI ERST storage backing file is delared
> via the QEMU global parameter acpi-erst.filename. The default
> is acpi-erst.backing.
> 
> [1] "Advanced Configuration and Power Interface Specification",
>     version 6.2, May 2017.
>     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> 
> [2] "Unified Extensible Firmware Interface Specification",
>     version 2.8, March 2019.
>     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> 
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>

items 2/4/5 from v1 review still need to be addressed.

> 
> 2. patch is too big to review, please split it up in smaller chunks.
> 
> EJD: Done.

(separating a header and a makefile rule doesn't make much sense)

it should be split at least on part that implements device model and ACPI parts


[...]
> 
> 4. Maybe instead of SYSBUS device, implement it as a PCI device and
>    use its BAR/control registers for pstore storage and control interface.
>    It could save you headache of picking address where to map it +
>    it would take care of migration part automatically, as firmware
>    would do it for you and then QEMU could pickup firmware programmed
>    address and put it into ERST table.
> EJD: Thanks for the idea. For now I've left it as a SYSBUS device; we can revisit as needed.

I would really prefer to see a PCI version (current way is just a hack)

> 5. instead of dealing with file for storage directly, reuse hostmem backend
>    to provide it to for your device. ex: pc-dimm. i.e. split device
>    on frontend and backend
> 
> EJD: I had looked into that prior to posting v1. The entire ERST storage is not memory mapped, just an exchange buffer. So the hostmem backend is not suitable for this purpose.

Is there a compelling reason why it can't be memory mapped?

> ---
>  hw/acpi/erst.c | 952 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 952 insertions(+)
>  create mode 100644 hw/acpi/erst.c
> 
> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
> new file mode 100644
> index 0000000..3a342f9
> --- /dev/null
> +++ b/hw/acpi/erst.c
> @@ -0,0 +1,952 @@
> +/*
> + * ACPI Error Record Serialization Table, ERST, Implementation
> + *
> + * Copyright (c) 2020 Oracle and/or its affiliates.
> + *
> + * See ACPI specification,
> + * "ACPI Platform Error Interfaces" : "Error Serialization"
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation;
> + * version 2 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/sysbus.h"
> +#include "qom/object_interfaces.h"
> +#include "qemu/error-report.h"
> +#include "migration/vmstate.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/acpi-defs.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "exec/address-spaces.h"
> +#include "hw/acpi/erst.h"
> +
> +#ifdef _ERST_DEBUG
> +#define erst_debug(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); fflush(stderr); } while (0)
> +#else
> +#define erst_debug(fmt, ...) do { } while (0)
> +#endif
> +
> +/* See UEFI spec, Appendix N Common Platform Error Record */
> +/* UEFI CPER allows for an OSPM book keeping area in the record */
> +#define UEFI_CPER_RECORD_MIN_SIZE 128U
> +#define UEFI_CPER_SIZE_OFFSET 20U
> +#define UEFI_CPER_RECORD_ID_OFFSET 96U
> +#define IS_UEFI_CPER_RECORD(ptr) \
> +    (((ptr)[0] == 'C') && \
> +     ((ptr)[1] == 'P') && \
> +     ((ptr)[2] == 'E') && \
> +     ((ptr)[3] == 'R'))
> +#define THE_UEFI_CPER_RECORD_ID(ptr) \
> +    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
> +
> +#define ERST_INVALID_RECORD_ID (~0UL)
> +#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
> +#define ERST_CSR_ACTION (0UL << 3) /* action (cmd) */
> +#define ERST_CSR_VALUE  (1UL << 3) /* argument/value (data) */
> +
> +/*
> + * As ERST_IOMEM_SIZE is used to map the ERST into the guest,
> + * it should/must be an integer multiple of PAGE_SIZE.
> + * NOTE that any change to this value will make any pre-
> + * existing backing files, not of the same ERST_IOMEM_SIZE,
> + * unusable to the guest.
> + */
> +#define ERST_IOMEM_SIZE (2UL * 4096)
> +
> +/*
> + * This implementation is an ACTION (cmd) and VALUE (data)
> + * interface consisting of just two 64-bit registers.
> + */
> +#define ERST_REG_LEN (2UL * sizeof(uint64_t))
> +
> +/*
> + * The space not utilized by the register interface is the
> + * buffer for exchanging ERST record contents.
> + */
> +#define ERST_RECORD_SIZE (ERST_IOMEM_SIZE - ERST_REG_LEN)
> +
> +/*
> + * Mode to be used for backing file
> + */
> +#define ERST_BACKING_FILE_MODE 0644 /* S_IRWXU|S_IRWXG */
> +
> +#define ACPIERST(obj) \
> +    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
> +#define ACPIERST_CLASS(oc) \
> +    OBJECT_CLASS_CHECK(ERSTDeviceStateClass, (oc), TYPE_ACPI_ERST)
> +#define ACPIERST_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(ERSTDeviceStateClass, (obj), TYPE_ACPI_ERST)
> +
> +static hwaddr erst_base;
> +
> +typedef struct {
> +    SysBusDevice parent_obj;
> +
> +    MemoryRegion iomem;
> +    uint32_t prop_size;
> +    char *prop_filename;
> +    hwaddr base;
> +
> +    uint8_t operation;
> +    uint8_t busy_status;
> +    uint8_t command_status;
> +    uint32_t record_offset;
> +    uint32_t record_count;
> +    uint64_t reg_action;
> +    uint64_t reg_value;
> +    uint64_t record_identifier;
> +
> +    unsigned next_record_index;
> +    uint8_t record[ERST_RECORD_SIZE]; /* read/written directly by guest */
> +    uint8_t tmp_record[ERST_RECORD_SIZE]; /* intermediate manipulation buffer */
> +    uint8_t *nvram; /* persistent storage, of length prop_size */
> +
> +} ERSTDeviceState;
> +
> +static void update_erst_backing_file(ERSTDeviceState *s,
> +    off_t offset, const uint8_t *data, size_t length)
> +{
> +    int fd;
> +
> +    /* Bounds check */
> +    if ((offset + length) > s->prop_size) {
> +        error_report("update: off 0x%lx len 0x%lx > size 0x%lx out of range",
> +            (long)offset, (long)length, (long)s->prop_size);
> +        return;
> +    }
> +
> +    fd = open(s->prop_filename, O_WRONLY | O_CREAT, ERST_BACKING_FILE_MODE);
> +    if (fd > 0) {
> +        off_t src;
> +        size_t wrc = 0;
> +        src = lseek(fd, offset, SEEK_SET);
> +        if (offset == src) {
> +            wrc = write(fd, data, length);
> +        }
> +        if ((offset != src) || (length != wrc)) {
> +            error_report("ERST write failed: %d %d", (int)wrc, (int)length);
> +        }
> +        close(fd);
> +    } else {
> +        error_report("open failed: %s : %d", s->prop_filename, fd);
> +    }
> +}
> +
> +static unsigned copy_from_nvram_by_index(ERSTDeviceState *s, unsigned index)
> +{
> +    /* Read a nvram[] entry into tmp_record */
> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
> +    off_t offset = (index * ERST_RECORD_SIZE);
> +
> +    if ((offset + ERST_RECORD_SIZE) <= s->prop_size) {
> +        memcpy(s->tmp_record, &s->nvram[offset], ERST_RECORD_SIZE);
> +        rc = ACPI_ERST_STATUS_SUCCESS;
> +    }
> +    return rc;
> +}
> +
> +static unsigned copy_to_nvram_by_index(ERSTDeviceState *s, unsigned index)
> +{
> +    /* Write entry in tmp_record into nvram[], and backing file */
> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
> +    off_t offset = (index * ERST_RECORD_SIZE);
> +
> +    if ((offset + ERST_RECORD_SIZE) <= s->prop_size) {
> +        memcpy(&s->nvram[offset], s->tmp_record, ERST_RECORD_SIZE);
> +        update_erst_backing_file(s, offset, s->tmp_record, ERST_RECORD_SIZE);
> +        rc = ACPI_ERST_STATUS_SUCCESS;
> +    }
> +    return rc;
> +}
> +
> +static int lookup_erst_record_by_identifier(ERSTDeviceState *s,
> +    uint64_t record_identifier, bool *record_found, bool alloc_for_write)
> +{
> +    int rc = -1;
> +    int empty_index = -1;
> +    int index = 0;
> +    unsigned rrc;
> +
> +    *record_found = 0;
> +
> +    do {
> +        rrc = copy_from_nvram_by_index(s, (unsigned)index);
> +        if (rrc == ACPI_ERST_STATUS_SUCCESS) {
> +            uint64_t this_identifier;
> +            this_identifier = THE_UEFI_CPER_RECORD_ID(s->tmp_record);
> +            if (IS_UEFI_CPER_RECORD(s->tmp_record) &&
> +                (this_identifier == record_identifier)) {
> +                rc = index;
> +                *record_found = 1;
> +                break;
> +            }
> +            if ((this_identifier == ERST_INVALID_RECORD_ID) &&
> +                (empty_index < 0)) {
> +                empty_index = index; /* first available for write */
> +            }
> +        }
> +        ++index;
> +    } while (rrc == ACPI_ERST_STATUS_SUCCESS);
> +
> +    /* Record not found, allocate for writing */
> +    if ((rc < 0) && alloc_for_write) {
> +        rc = empty_index;
> +    }
> +
> +    return rc;
> +}
> +
> +static unsigned clear_erst_record(ERSTDeviceState *s)
> +{
> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
> +    bool record_found;
> +    int index;
> +
> +    index = lookup_erst_record_by_identifier(s,
> +        s->record_identifier, &record_found, 0);
> +    if (record_found) {
> +        memset(s->tmp_record, 0xFF, ERST_RECORD_SIZE);
> +        rc = copy_to_nvram_by_index(s, (unsigned)index);
> +        if (rc == ACPI_ERST_STATUS_SUCCESS) {
> +            s->record_count -= 1;
> +        }
> +    }
> +
> +    return rc;
> +}
> +
> +static unsigned write_erst_record(ERSTDeviceState *s)
> +{
> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
> +
> +    if (s->record_offset < (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
> +        uint64_t record_identifier;
> +        uint8_t *record = &s->record[s->record_offset];
> +        bool record_found;
> +        int index;
> +
> +        record_identifier = (s->record_identifier == ERST_INVALID_RECORD_ID)
> +            ? THE_UEFI_CPER_RECORD_ID(record) : s->record_identifier;
> +
> +        index = lookup_erst_record_by_identifier(s,
> +            record_identifier, &record_found, 1);
> +        if (index < 0) {
> +            rc = ACPI_ERST_STATUS_NOT_ENOUGH_SPACE;
> +        } else {
> +            if (0 != s->record_offset) {
> +                memset(&s->tmp_record[ERST_RECORD_SIZE - s->record_offset],
> +                    0xFF, s->record_offset);
> +            }
> +            memcpy(s->tmp_record, record, ERST_RECORD_SIZE - s->record_offset);
> +            rc = copy_to_nvram_by_index(s, (unsigned)index);
> +            if (rc == ACPI_ERST_STATUS_SUCCESS) {
> +                if (!record_found) { /* not overwriting existing record */
> +                    s->record_count += 1; /* writing new record */
> +                }
> +            }
> +        }
> +    }
> +
> +    return rc;
> +}
> +
> +static unsigned next_erst_record(ERSTDeviceState *s,
> +    uint64_t *record_identifier)
> +{
> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
> +    unsigned index;
> +    unsigned rrc;
> +
> +    *record_identifier = ERST_INVALID_RECORD_ID;
> +
> +    index = s->next_record_index;
> +    do {
> +        rrc = copy_from_nvram_by_index(s, (unsigned)index);
> +        if (rrc == ACPI_ERST_STATUS_SUCCESS) {
> +            if (IS_UEFI_CPER_RECORD(s->tmp_record)) {
> +                s->next_record_index = index + 1; /* where to start next time */
> +                *record_identifier = THE_UEFI_CPER_RECORD_ID(s->tmp_record);
> +                rc = ACPI_ERST_STATUS_SUCCESS;
> +                break;
> +            }
> +            ++index;
> +        } else {
> +            if (s->next_record_index == 0) {
> +                rc = ACPI_ERST_STATUS_RECORD_STORE_EMPTY;
> +            }
> +            s->next_record_index = 0; /* at end, reset */
> +        }
> +    } while (rrc == ACPI_ERST_STATUS_SUCCESS);
> +
> +    return rc;
> +}
> +
> +static unsigned read_erst_record(ERSTDeviceState *s)
> +{
> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
> +    bool record_found;
> +    int index;
> +
> +    index = lookup_erst_record_by_identifier(s,
> +        s->record_identifier, &record_found, 0);
> +    if (record_found) {
> +        rc = copy_from_nvram_by_index(s, (unsigned)index);
> +        if (rc == ACPI_ERST_STATUS_SUCCESS) {
> +            if (s->record_offset < ERST_RECORD_SIZE) {
> +                memcpy(&s->record[s->record_offset], s->tmp_record,
> +                    ERST_RECORD_SIZE - s->record_offset);
> +            }
> +        }
> +    }
> +
> +    return rc;
> +}
> +
> +static unsigned get_erst_record_count(ERSTDeviceState *s)
> +{
> +    /* Compute record_count */
> +    off_t offset;
> +
> +    s->record_count = 0;
> +    offset = 0;
> +    do {
> +        uint8_t *ptr = &s->nvram[offset];
> +        uint64_t record_identifier = THE_UEFI_CPER_RECORD_ID(ptr);
> +        if (IS_UEFI_CPER_RECORD(ptr) &&
> +            (ERST_INVALID_RECORD_ID != record_identifier)) {
> +            s->record_count += 1;
> +        }
> +        offset += ERST_RECORD_SIZE;
> +    } while (offset < (off_t)s->prop_size);
> +
> +    return s->record_count;
> +}
> +
> +static void load_erst_backing_file(ERSTDeviceState *s)
> +{
> +    int fd;
> +    struct stat statbuf;
> +
> +    erst_debug("+load_erst_backing_file()\n");
> +
> +    /* Allocate and initialize nvram[] */
> +    s->nvram = g_malloc(s->prop_size);
> +    memset(s->nvram, 0xFF, s->prop_size);
> +
> +    /* Ensure backing file at least same as prop_size */
> +    if (stat(s->prop_filename, &statbuf) == 0) {
> +        /* ensure prop_size at least matches file size */
> +        if (statbuf.st_size < s->prop_size) {
> +            /* Ensure records are ERST_INVALID_RECORD_ID */
> +            memset(s->nvram, 0xFF, s->prop_size - statbuf.st_size);
> +            update_erst_backing_file(s,
> +                statbuf.st_size, s->nvram, s->prop_size - statbuf.st_size);
> +        }
> +    }
> +
> +    /* Pre-load nvram[] from backing file, if present */
> +    fd = open(s->prop_filename, O_RDONLY, ERST_BACKING_FILE_MODE);
> +    if (fd > 0) {
> +        size_t rrc = read(fd, s->nvram, s->prop_size);
> +        (void)rrc;
> +        close(fd);
> +        /*
> +         * If existing file is smaller than prop_size, it will be resized
> +         * accordingly upon subsequent record writes. If the file
> +         * is larger than prop_size, only prop_size bytes are utilized,
> +         * the extra bytes are untouched (though will be lost after
> +         * a migration when the backing file is re-written as length
> +         * of prop_size bytes).
> +         */
> +    } else {
> +        /* Create empty backing file */
> +        update_erst_backing_file(s, 0, s->nvram, s->prop_size);
> +    }
> +
> +    /* Initialize record_count */
> +    get_erst_record_count(s);
> +
> +    erst_debug("-load_erst_backing_file() %d\n", s->record_count);
> +}
> +
> +static uint64_t erst_rd_reg64(hwaddr addr,
> +    uint64_t reg, unsigned size)
> +{
> +    uint64_t rdval;
> +    uint64_t mask;
> +    unsigned shift;
> +
> +    if (size == sizeof(uint64_t)) {
> +        /* 64b access */
> +        mask = 0xFFFFFFFFFFFFFFFFUL;
> +        shift = 0;
> +    } else {
> +        /* 32b access */
> +        mask = 0x00000000FFFFFFFFUL;
> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
> +    }
> +
> +    rdval = reg;
> +    rdval >>= shift;
> +    rdval &= mask;
> +
> +    return rdval;
> +}
> +
> +static uint64_t erst_wr_reg64(hwaddr addr,
> +    uint64_t reg, uint64_t val, unsigned size)
> +{
> +    uint64_t wrval;
> +    uint64_t mask;
> +    unsigned shift;
> +
> +    if (size == sizeof(uint64_t)) {
> +        /* 64b access */
> +        mask = 0xFFFFFFFFFFFFFFFFUL;
> +        shift = 0;
> +    } else {
> +        /* 32b access */
> +        mask = 0x00000000FFFFFFFFUL;
> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
> +    }
> +
> +    val &= mask;
> +    val <<= shift;
> +    mask <<= shift;
> +    wrval = reg;
> +    wrval &= ~mask;
> +    wrval |= val;
> +
> +    return wrval;
> +}
> +
> +static void erst_write(void *opaque, hwaddr addr,
> +    uint64_t val, unsigned size)
> +{
> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> +
> +    if (addr < ERST_REG_LEN) {
> +        /*
> +         * NOTE: All actions/operations/side effects happen on the WRITE,
> +         * by design. The READs simply return the reg_value contents.
> +         */
> +        erst_debug("ERST write %016lx %10s val %016lx sz %u",
> +            addr, erst_reg(addr), val, size);
> +        /* The REGISTER region */
> +        switch (addr) {
> +        case ERST_CSR_VALUE + 0:
> +        case ERST_CSR_VALUE + 4:
> +            s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
> +            break;
> +        case ERST_CSR_ACTION + 0:
> +/*      case ERST_CSR_ACTION+4: as coded, not really a 64b register */
> +            switch (val) {
> +            case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
> +            case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
> +            case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
> +            case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> +            case ACPI_ERST_ACTION_END_OPERATION:
> +                s->operation = val;
> +                break;
> +            case ACPI_ERST_ACTION_SET_RECORD_OFFSET:
> +                s->record_offset = s->reg_value;
> +                break;
> +            case ACPI_ERST_ACTION_EXECUTE_OPERATION:
> +                if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
> +                    s->busy_status = 1;
> +                    switch (s->operation) {
> +                    case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
> +                        s->command_status = write_erst_record(s);
> +                        break;
> +                    case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
> +                        s->command_status = read_erst_record(s);
> +                        break;
> +                    case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
> +                        s->command_status = clear_erst_record(s);
> +                        break;
> +                    case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> +                        s->command_status = ACPI_ERST_STATUS_SUCCESS;
> +                        break;
> +                    case ACPI_ERST_ACTION_END_OPERATION:
> +                        s->command_status = ACPI_ERST_STATUS_SUCCESS;
> +                        break;
> +                    default:
> +                        s->command_status = ACPI_ERST_STATUS_FAILED;
> +                        break;
> +                    }
> +                    s->record_identifier = ERST_INVALID_RECORD_ID;
> +                    s->busy_status = 0;
> +                }
> +                break;
> +            case ACPI_ERST_ACTION_CHECK_BUSY_STATUS:
> +                s->reg_value = s->busy_status;
> +                break;
> +            case ACPI_ERST_ACTION_GET_COMMAND_STATUS:
> +                s->reg_value = s->command_status;
> +                break;
> +            case ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER:
> +                s->command_status = next_erst_record(s, &s->reg_value);
> +                break;
> +            case ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER:
> +                s->record_identifier = s->reg_value;
> +                break;
> +            case ACPI_ERST_ACTION_GET_RECORD_COUNT:
> +                s->reg_value = s->record_count;
> +                break;
> +            case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
> +                s->reg_value = s->base + ERST_REG_LEN;
> +                break;
> +            case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
> +                s->reg_value = ERST_RECORD_SIZE;
> +                break;
> +            case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
> +                s->reg_value = 0; /* correct/intended value */
> +                break;
> +            case ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS:
> +                /*
> +                 * 100UL is max, 10UL is nominal
> +                 */
> +                s->reg_value = ((100UL << 32) | (10UL << 0));
> +                break;
> +            case ACPI_ERST_ACTION_RESERVED:
> +            default:
> +                /*
> +                 * NOP
> +                 */
> +                break;
> +            }
> +            break;
> +        default:
> +            /*
> +             * All other register writes are NO-OPs
> +             */
> +            break;
> +        }
> +    } else {
> +        /* The RECORD region */
> +        unsigned offset = addr - ERST_REG_LEN;
> +        uint8_t *ptr = &s->record[offset];
> +        switch (size) {
> +        default:
> +        case sizeof(uint8_t):
> +            *(uint8_t *)ptr = (uint8_t)val;
> +            break;
> +        case sizeof(uint16_t):
> +            *(uint16_t *)ptr = (uint16_t)val;
> +            break;
> +        case sizeof(uint32_t):
> +            *(uint32_t *)ptr = (uint32_t)val;
> +            break;
> +        case sizeof(uint64_t):
> +            *(uint64_t *)ptr = (uint64_t)val;
> +            break;
> +        }
> +    }
> +}
> +
> +static uint64_t erst_read(void *opaque, hwaddr addr,
> +                                unsigned size)
> +{
> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> +    uint64_t val = 0;
> +
> +    if (addr < ERST_REG_LEN) {
> +        switch (addr) {
> +        case ERST_CSR_ACTION + 0:
> +        case ERST_CSR_ACTION + 4:
> +            val = erst_rd_reg64(addr, s->reg_action, size);
> +            break;
> +        case ERST_CSR_VALUE + 0:
> +        case ERST_CSR_VALUE + 4:
> +            val = erst_rd_reg64(addr, s->reg_value, size);
> +            break;
> +        default:
> +            break;
> +        }
> +    erst_debug("ERST read  %016lx %10s val %016lx sz %u\n",
> +        addr, erst_reg(addr), val, size);
> +    } else {
> +        /*
> +         * The RECORD region
> +         */
> +        uint8_t *ptr = &s->record[addr - ERST_REG_LEN];
> +        switch (size) {
> +        default:
> +        case sizeof(uint8_t):
> +            val = *(uint8_t *)ptr;
> +            break;
> +        case sizeof(uint16_t):
> +            val = *(uint16_t *)ptr;
> +            break;
> +        case sizeof(uint32_t):
> +            val = *(uint32_t *)ptr;
> +            break;
> +        case sizeof(uint64_t):
> +            val = *(uint64_t *)ptr;
> +            break;
> +        }
> +    }
> +    erst_debug("ERST read  %016lx %10s val %016lx sz %u\n",
> +        addr, erst_reg(addr), val, size);
> +    return val;
> +}
> +
> +static size_t build_erst_action(GArray *table_data,
> +    uint8_t serialization_action,
> +    uint8_t instruction,
> +    uint8_t flags,
> +    uint8_t width,
> +    uint64_t address,
> +    uint64_t value,
> +    uint64_t mask)
> +{
> +    /* See ACPI spec, Error Serialization */
> +    uint8_t access_width = 0;
> +    build_append_int_noprefix(table_data, serialization_action, 1);
> +    build_append_int_noprefix(table_data, instruction         , 1);
> +    build_append_int_noprefix(table_data, flags               , 1);
> +    build_append_int_noprefix(table_data, 0                   , 1);
> +    /* GAS space_id */
> +    build_append_int_noprefix(table_data, AML_SYSTEM_MEMORY   , 1);
> +    /* GAS bit_width */
> +    build_append_int_noprefix(table_data, width               , 1);
> +    /* GAS bit_offset */
> +    build_append_int_noprefix(table_data, 0                   , 1);
> +    /* GAS access_width */
> +    switch (width) {
> +    case 8:
> +        access_width = 1;
> +        break;
> +    case 16:
> +        access_width = 2;
> +        break;
> +    case 32:
> +        access_width = 3;
> +        break;
> +    case 64:
> +        access_width = 4;
> +        break;
> +    default:
> +        access_width = 0;
> +        break;
> +    }
> +    build_append_int_noprefix(table_data, access_width        , 1);
> +    /* GAS address */
> +    build_append_int_noprefix(table_data, address, 8);
> +    /* value */
> +    build_append_int_noprefix(table_data, value  , 8);
> +    /* mask */
> +    build_append_int_noprefix(table_data, mask   , 8);
> +
> +    return 1;
> +}
> +
> +static const MemoryRegionOps erst_rw_ops = {
> +    .read = erst_read,
> +    .write = erst_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
> +void build_erst(GArray *table_data, BIOSLinker *linker, hwaddr base)
> +{
> +    unsigned action, insns = 0;
> +    unsigned erst_start = table_data->len;
> +    unsigned iec_offset = 0;
> +
> +    /* See ACPI spec, Error Serialization */
> +    acpi_data_push(table_data, sizeof(AcpiTableHeader));
> +    /* serialization_header_length */
> +    build_append_int_noprefix(table_data, 48, 4);
> +    /* reserved */
> +    build_append_int_noprefix(table_data,  0, 4);
> +    iec_offset = table_data->len;
> +    /* instruction_entry_count (placeholder) */
> +    build_append_int_noprefix(table_data,  0, 4);
> +
> +#define BEA(I, F, W, ADDR, VAL, MASK) \
> +    build_erst_action(table_data, action, \
> +        ACPI_ERST_INST_##I, F, W, base + ADDR, VAL, MASK)
> +#define MASK8  0x00000000000000FFUL
> +#define MASK16 0x000000000000FFFFUL
> +#define MASK32 0x00000000FFFFFFFFUL
> +#define MASK64 0xFFFFFFFFFFFFFFFFUL
> +
> +    for (action = 0; action < ACPI_ERST_MAX_ACTIONS; ++action) {
> +        switch (action) {
> +        case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_END_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_SET_RECORD_OFFSET:
> +            insns += BEA(WRITE_REGISTER      , 0, 32,
> +                ERST_CSR_VALUE , 0, MASK32);
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_EXECUTE_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_VALUE , ERST_EXECUTE_OPERATION_MAGIC, MASK8);
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_CHECK_BUSY_STATUS:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER_VALUE , 0, 32,
> +                ERST_CSR_VALUE, 0x01, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_GET_COMMAND_STATUS:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 32,
> +                ERST_CSR_VALUE, 0, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 64,
> +                ERST_CSR_VALUE, 0, MASK64);
> +            break;
> +        case ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER:
> +            insns += BEA(WRITE_REGISTER      , 0, 64,
> +                ERST_CSR_VALUE , 0, MASK64);
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_GET_RECORD_COUNT:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 32,
> +                ERST_CSR_VALUE, 0, MASK32);
> +            break;
> +        case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_RESERVED:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 64,
> +                ERST_CSR_VALUE, 0, MASK64);
> +            break;
> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 64,
> +                ERST_CSR_VALUE, 0, MASK32);
> +            break;
> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 32,
> +                ERST_CSR_VALUE, 0, MASK32);
> +            break;
> +        case ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 64,
> +                ERST_CSR_VALUE, 0, MASK64);
> +        default:
> +            insns += BEA(NOOP, 0, 0, 0, action, 0);
> +            break;
> +        }
> +    }
> +
> +    /* acpi_data_push() within BEA() can result in new GArray pointer */
> +    *(uint32_t *)(table_data->data + iec_offset) = cpu_to_le32(insns);
> +
> +    build_header(linker, table_data,
> +                 (void *)(table_data->data + erst_start),
> +                 "ERST", table_data->len - erst_start,
> +                 1, NULL, NULL);
> +
> +    if (erst_base == 0) {
> +        /*
> +         * This ACPI routine is invoked twice, but this code
> +         * snippet needs to happen just once.
> +         * And this code in erst_class_init() is too early.
> +         */
> +        DeviceState *dev;
> +        SysBusDevice *s;
> +
> +        dev = qdev_new(TYPE_ACPI_ERST);
> +        erst_debug("qdev_create dev %p\n", dev);
> +        s = SYS_BUS_DEVICE(dev);
> +        sysbus_realize_and_unref(s, &error_fatal);
> +
> +        ACPIERST(dev)->base = base;
> +        sysbus_mmio_map(s, 0, base);
> +        erst_base = base;
> +        erst_debug("erst_base %lx\n", base);
> +    }
> +}
> +
> +/*******************************************************************/
> +/*******************************************************************/
> +static int erst_post_load(void *opaque, int version_id)
> +{
> +    ERSTDeviceState *s = opaque;
> +    erst_debug("+erst_post_load(%d)\n", version_id);
> +    /* Ensure nvram[] persists into backing file */
> +    update_erst_backing_file(s, 0, s->nvram, s->prop_size);
> +    erst_debug("-erst_post_load()\n");
> +    return 0;
> +}
> +
> +static const VMStateDescription erst_vmstate  = {
> +    .name = "acpi-erst",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .post_load = erst_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT8(operation, ERSTDeviceState),
> +        VMSTATE_UINT8(busy_status, ERSTDeviceState),
> +        VMSTATE_UINT8(command_status, ERSTDeviceState),
> +        VMSTATE_UINT32(record_offset, ERSTDeviceState),
> +        VMSTATE_UINT32(record_count, ERSTDeviceState),
> +        VMSTATE_UINT64(reg_action, ERSTDeviceState),
> +        VMSTATE_UINT64(reg_value, ERSTDeviceState),
> +        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
> +        VMSTATE_UINT8_ARRAY(record, ERSTDeviceState, ERST_RECORD_SIZE),
> +        VMSTATE_UINT8_ARRAY(tmp_record, ERSTDeviceState, ERST_RECORD_SIZE),
> +        VMSTATE_VARRAY_UINT32(nvram, ERSTDeviceState, prop_size, 0,
> +            vmstate_info_uint8, uint8_t),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static void erst_realizefn(DeviceState *dev, Error **errp)
> +{
> +    SysBusDevice *d = SYS_BUS_DEVICE(dev);
> +    ERSTDeviceState *s = ACPIERST(dev);
> +
> +    erst_debug("+erst_realizefn()\n");
> +    if (!s->prop_filename) {
> +        s->prop_filename = (char *)(TYPE_ACPI_ERST ".backing");
> +    }
> +
> +    if (!s->prop_filename) {
> +        error_setg(errp, "'filename' property is not set");
> +        return;
> +    }
> +
> +    if (!(s->prop_size > ERST_RECORD_SIZE) &&
> +        (s->prop_size <= 0x04000000)) {
> +        error_setg(errp, "'size' property %d is not set properly",
> +            s->prop_size);
> +        return;
> +    }
> +
> +    /* Convert prop_size to integer multiple of ERST_RECORD_SIZE */
> +    s->prop_size -= (s->prop_size % ERST_RECORD_SIZE);
> +
> +    load_erst_backing_file(s);
> +
> +    erst_debug("filename %s\n", s->prop_filename);
> +    erst_debug("size %x\n", s->prop_size);
> +
> +    memory_region_init_io(&s->iomem, OBJECT(s), &erst_rw_ops, s,
> +                          TYPE_ACPI_ERST, ERST_IOMEM_SIZE);
> +    sysbus_init_mmio(d, &s->iomem);
> +    erst_debug("-erst_realizefn()\n");
> +}
> +
> +static void erst_unrealizefn(DeviceState *dev)
> +{
> +    ERSTDeviceState *s = ACPIERST(dev);
> +
> +    erst_debug("+erst_unrealizefn()\n");
> +    if (s->nvram) {
> +        /* Ensure nvram[] persists into backing file */
> +        update_erst_backing_file(s, 0, s->nvram, s->prop_size);
> +        g_free(s->nvram);
> +        s->nvram = NULL;
> +    }
> +    erst_debug("-erst_unrealizefn()\n");
> +}
> +
> +static void erst_reset(DeviceState *dev)
> +{
> +    ERSTDeviceState *s = ACPIERST(dev);
> +
> +    erst_debug("+erst_reset(%p) %d\n", s, s->record_count);
> +    s->operation = 0;
> +    s->busy_status = 0;
> +    s->command_status = ACPI_ERST_STATUS_SUCCESS;
> +    /* indicate empty/no-more until further notice */
> +    s->record_identifier = ERST_INVALID_RECORD_ID;
> +    s->record_offset = 0;
> +    s->next_record_index = 0;
> +    /* NOTE: record_count and nvram[] are initialized elsewhere */
> +    erst_debug("-erst_reset()\n");
> +}
> +
> +static Property erst_properties[] = {
> +    DEFINE_PROP_UINT32("size", ERSTDeviceState, prop_size, 0x00010000),
> +    DEFINE_PROP_STRING("filename", ERSTDeviceState, prop_filename),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void erst_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    erst_debug("+erst_class_init()\n");
> +    dc->realize = erst_realizefn;
> +    dc->unrealize = erst_unrealizefn;
> +    dc->reset = erst_reset;
> +    dc->vmsd = &erst_vmstate;
> +    device_class_set_props(dc, erst_properties);
> +    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    erst_debug("-erst_class_init()\n");
> +}
> +
> +static const TypeInfo erst_type_info = {
> +    .name          = TYPE_ACPI_ERST,
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .class_init    = erst_class_init,
> +    .instance_size = sizeof(ERSTDeviceState),
> +};
> +
> +static void erst_register_types(void)
> +{
> +    type_register_static(&erst_type_info);
> +}
> +
> +type_init(erst_register_types)



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-04-06 19:31   ` Igor Mammedov
@ 2021-04-09 15:54     ` Eric DeVolder
  2021-04-14  9:17       ` Igor Mammedov
  0 siblings, 1 reply; 24+ messages in thread
From: Eric DeVolder @ 2021-04-09 15:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, mst, Konrad Wilk, qemu-devel, pbonzini, Boris Ostrovsky, rth

[-- Attachment #1: Type: text/plain, Size: 40097 bytes --]

Hi Igor,
Thank you for reviewing. I've responded inline below.
eric

________________________________
From: Igor Mammedov <imammedo@redhat.com>
Sent: Tuesday, April 6, 2021 2:31 PM
To: Eric DeVolder <eric.devolder@oracle.com>
Cc: mst@redhat.com <mst@redhat.com>; marcel.apfelbaum@gmail.com <marcel.apfelbaum@gmail.com>; pbonzini@redhat.com <pbonzini@redhat.com>; rth@twiddle.net <rth@twiddle.net>; ehabkost@redhat.com <ehabkost@redhat.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; kwilk@oracle.com <kwilk@oracle.com>
Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature

On Mon,  8 Feb 2021 15:57:55 -0500
Eric DeVolder <eric.devolder@oracle.com> wrote:

> This change implements the support for the ACPI ERST feature[1,2].
>
> The size of the ACPI ERST storage is declared via the QEMU
> global parameter acpi-erst.size. The size can range from 64KiB
> to to 64MiB. The default is 64KiB.
>
> The location of the ACPI ERST storage backing file is delared
> via the QEMU global parameter acpi-erst.filename. The default
> is acpi-erst.backing.
>
> [1] "Advanced Configuration and Power Interface Specification",
>     version 6.2, May 2017.
>     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
>
> [2] "Unified Extensible Firmware Interface Specification",
>     version 2.8, March 2019.
>     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
>
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>

items 2/4/5 from v1 review still need to be addressed.

>
> 2. patch is too big to review, please split it up in smaller chunks.
>
> EJD: Done.

(separating a header and a makefile rule doesn't make much sense)

it should be split at least on part that implements device model and ACPI parts

EJD: I'll rebase this patch set on qemu-6 and accommodate your suggestions with how to split/organize the patch set.

[...]
>
> 4. Maybe instead of SYSBUS device, implement it as a PCI device and
>    use its BAR/control registers for pstore storage and control interface.
>    It could save you headache of picking address where to map it +
>    it would take care of migration part automatically, as firmware
>    would do it for you and then QEMU could pickup firmware programmed
>    address and put it into ERST table.
> EJD: Thanks for the idea. For now I've left it as a SYSBUS device; we can revisit as needed.

I would really prefer to see a PCI version (current way is just a hack)

EJD: I understand, I don't like the base address problem either. Is there an example PCI device that gets its base address assigned during ACPI setup that I could reference and pattern this work after? I've been using SYSBUS as that most closely mimics the real hardware implementations I've studied in order to produce this code.
EJD: I thought my inexperience with authoring QEMU devices was the primary problem in establishing a solution for the base address. Otherwise, this thing only needs a single 4KiB page (for the 2 registers + exchange buffer) exposed.

> 5. instead of dealing with file for storage directly, reuse hostmem backend
>    to provide it to for your device. ex: pc-dimm. i.e. split device
>    on frontend and backend
>
> EJD: I had looked into that prior to posting v1. The entire ERST storage is not memory mapped, just an exchange buffer. So the hostmem backend is not suitable for this purpose.

Is there a compelling reason why it can't be memory mapped?

EJD: Well, this ERST device I've coded pretty much follows the ACPI ERST spec verbatim. As it stands today, the spec doesn't provide a way to report the total size of the persistent storage behind the interface; you know when storage is full only when you receive an Out Of Storage error code upon write. In a sense, that allows the size of the storage to vary greatly and be implemented in any way needed (ie actual hardware, this has tended to be in the 64KiB range when it is carved out of system parallel flash memory, but some hardware uses serial flash as well). In virtual environments, it can be of any size, and we at Oracle have intentions of heavily utilizing ACPI ERST to stuff all kinds of diagnostic information into it, thus wanting the storage to be very large. By not actually exposing/memory-mapping the storage, the issue of where to drop it in the memory map goes away (yes a PCI BAR could solve this).
EJD: But at the end of the day, could this storage be memory mapped? I suppose it could be, but then that rather circumvents the entire need for the ACPI ERST interface to start with. Linux and Windows both already know how to utilize ACPI ERST.




> ---
>  hw/acpi/erst.c | 952 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 952 insertions(+)
>  create mode 100644 hw/acpi/erst.c
>
> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
> new file mode 100644
> index 0000000..3a342f9
> --- /dev/null
> +++ b/hw/acpi/erst.c
> @@ -0,0 +1,952 @@
> +/*
> + * ACPI Error Record Serialization Table, ERST, Implementation
> + *
> + * Copyright (c) 2020 Oracle and/or its affiliates.
> + *
> + * See ACPI specification,
> + * "ACPI Platform Error Interfaces" : "Error Serialization"
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation;
> + * version 2 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/sysbus.h"
> +#include "qom/object_interfaces.h"
> +#include "qemu/error-report.h"
> +#include "migration/vmstate.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/acpi-defs.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "exec/address-spaces.h"
> +#include "hw/acpi/erst.h"
> +
> +#ifdef _ERST_DEBUG
> +#define erst_debug(fmt, ...) \
> +    do { fprintf(stderr, fmt, ## __VA_ARGS__); fflush(stderr); } while (0)
> +#else
> +#define erst_debug(fmt, ...) do { } while (0)
> +#endif
> +
> +/* See UEFI spec, Appendix N Common Platform Error Record */
> +/* UEFI CPER allows for an OSPM book keeping area in the record */
> +#define UEFI_CPER_RECORD_MIN_SIZE 128U
> +#define UEFI_CPER_SIZE_OFFSET 20U
> +#define UEFI_CPER_RECORD_ID_OFFSET 96U
> +#define IS_UEFI_CPER_RECORD(ptr) \
> +    (((ptr)[0] == 'C') && \
> +     ((ptr)[1] == 'P') && \
> +     ((ptr)[2] == 'E') && \
> +     ((ptr)[3] == 'R'))
> +#define THE_UEFI_CPER_RECORD_ID(ptr) \
> +    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
> +
> +#define ERST_INVALID_RECORD_ID (~0UL)
> +#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
> +#define ERST_CSR_ACTION (0UL << 3) /* action (cmd) */
> +#define ERST_CSR_VALUE  (1UL << 3) /* argument/value (data) */
> +
> +/*
> + * As ERST_IOMEM_SIZE is used to map the ERST into the guest,
> + * it should/must be an integer multiple of PAGE_SIZE.
> + * NOTE that any change to this value will make any pre-
> + * existing backing files, not of the same ERST_IOMEM_SIZE,
> + * unusable to the guest.
> + */
> +#define ERST_IOMEM_SIZE (2UL * 4096)
> +
> +/*
> + * This implementation is an ACTION (cmd) and VALUE (data)
> + * interface consisting of just two 64-bit registers.
> + */
> +#define ERST_REG_LEN (2UL * sizeof(uint64_t))
> +
> +/*
> + * The space not utilized by the register interface is the
> + * buffer for exchanging ERST record contents.
> + */
> +#define ERST_RECORD_SIZE (ERST_IOMEM_SIZE - ERST_REG_LEN)
> +
> +/*
> + * Mode to be used for backing file
> + */
> +#define ERST_BACKING_FILE_MODE 0644 /* S_IRWXU|S_IRWXG */
> +
> +#define ACPIERST(obj) \
> +    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
> +#define ACPIERST_CLASS(oc) \
> +    OBJECT_CLASS_CHECK(ERSTDeviceStateClass, (oc), TYPE_ACPI_ERST)
> +#define ACPIERST_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(ERSTDeviceStateClass, (obj), TYPE_ACPI_ERST)
> +
> +static hwaddr erst_base;
> +
> +typedef struct {
> +    SysBusDevice parent_obj;
> +
> +    MemoryRegion iomem;
> +    uint32_t prop_size;
> +    char *prop_filename;
> +    hwaddr base;
> +
> +    uint8_t operation;
> +    uint8_t busy_status;
> +    uint8_t command_status;
> +    uint32_t record_offset;
> +    uint32_t record_count;
> +    uint64_t reg_action;
> +    uint64_t reg_value;
> +    uint64_t record_identifier;
> +
> +    unsigned next_record_index;
> +    uint8_t record[ERST_RECORD_SIZE]; /* read/written directly by guest */
> +    uint8_t tmp_record[ERST_RECORD_SIZE]; /* intermediate manipulation buffer */
> +    uint8_t *nvram; /* persistent storage, of length prop_size */
> +
> +} ERSTDeviceState;
> +
> +static void update_erst_backing_file(ERSTDeviceState *s,
> +    off_t offset, const uint8_t *data, size_t length)
> +{
> +    int fd;
> +
> +    /* Bounds check */
> +    if ((offset + length) > s->prop_size) {
> +        error_report("update: off 0x%lx len 0x%lx > size 0x%lx out of range",
> +            (long)offset, (long)length, (long)s->prop_size);
> +        return;
> +    }
> +
> +    fd = open(s->prop_filename, O_WRONLY | O_CREAT, ERST_BACKING_FILE_MODE);
> +    if (fd > 0) {
> +        off_t src;
> +        size_t wrc = 0;
> +        src = lseek(fd, offset, SEEK_SET);
> +        if (offset == src) {
> +            wrc = write(fd, data, length);
> +        }
> +        if ((offset != src) || (length != wrc)) {
> +            error_report("ERST write failed: %d %d", (int)wrc, (int)length);
> +        }
> +        close(fd);
> +    } else {
> +        error_report("open failed: %s : %d", s->prop_filename, fd);
> +    }
> +}
> +
> +static unsigned copy_from_nvram_by_index(ERSTDeviceState *s, unsigned index)
> +{
> +    /* Read a nvram[] entry into tmp_record */
> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
> +    off_t offset = (index * ERST_RECORD_SIZE);
> +
> +    if ((offset + ERST_RECORD_SIZE) <= s->prop_size) {
> +        memcpy(s->tmp_record, &s->nvram[offset], ERST_RECORD_SIZE);
> +        rc = ACPI_ERST_STATUS_SUCCESS;
> +    }
> +    return rc;
> +}
> +
> +static unsigned copy_to_nvram_by_index(ERSTDeviceState *s, unsigned index)
> +{
> +    /* Write entry in tmp_record into nvram[], and backing file */
> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
> +    off_t offset = (index * ERST_RECORD_SIZE);
> +
> +    if ((offset + ERST_RECORD_SIZE) <= s->prop_size) {
> +        memcpy(&s->nvram[offset], s->tmp_record, ERST_RECORD_SIZE);
> +        update_erst_backing_file(s, offset, s->tmp_record, ERST_RECORD_SIZE);
> +        rc = ACPI_ERST_STATUS_SUCCESS;
> +    }
> +    return rc;
> +}
> +
> +static int lookup_erst_record_by_identifier(ERSTDeviceState *s,
> +    uint64_t record_identifier, bool *record_found, bool alloc_for_write)
> +{
> +    int rc = -1;
> +    int empty_index = -1;
> +    int index = 0;
> +    unsigned rrc;
> +
> +    *record_found = 0;
> +
> +    do {
> +        rrc = copy_from_nvram_by_index(s, (unsigned)index);
> +        if (rrc == ACPI_ERST_STATUS_SUCCESS) {
> +            uint64_t this_identifier;
> +            this_identifier = THE_UEFI_CPER_RECORD_ID(s->tmp_record);
> +            if (IS_UEFI_CPER_RECORD(s->tmp_record) &&
> +                (this_identifier == record_identifier)) {
> +                rc = index;
> +                *record_found = 1;
> +                break;
> +            }
> +            if ((this_identifier == ERST_INVALID_RECORD_ID) &&
> +                (empty_index < 0)) {
> +                empty_index = index; /* first available for write */
> +            }
> +        }
> +        ++index;
> +    } while (rrc == ACPI_ERST_STATUS_SUCCESS);
> +
> +    /* Record not found, allocate for writing */
> +    if ((rc < 0) && alloc_for_write) {
> +        rc = empty_index;
> +    }
> +
> +    return rc;
> +}
> +
> +static unsigned clear_erst_record(ERSTDeviceState *s)
> +{
> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
> +    bool record_found;
> +    int index;
> +
> +    index = lookup_erst_record_by_identifier(s,
> +        s->record_identifier, &record_found, 0);
> +    if (record_found) {
> +        memset(s->tmp_record, 0xFF, ERST_RECORD_SIZE);
> +        rc = copy_to_nvram_by_index(s, (unsigned)index);
> +        if (rc == ACPI_ERST_STATUS_SUCCESS) {
> +            s->record_count -= 1;
> +        }
> +    }
> +
> +    return rc;
> +}
> +
> +static unsigned write_erst_record(ERSTDeviceState *s)
> +{
> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
> +
> +    if (s->record_offset < (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
> +        uint64_t record_identifier;
> +        uint8_t *record = &s->record[s->record_offset];
> +        bool record_found;
> +        int index;
> +
> +        record_identifier = (s->record_identifier == ERST_INVALID_RECORD_ID)
> +            ? THE_UEFI_CPER_RECORD_ID(record) : s->record_identifier;
> +
> +        index = lookup_erst_record_by_identifier(s,
> +            record_identifier, &record_found, 1);
> +        if (index < 0) {
> +            rc = ACPI_ERST_STATUS_NOT_ENOUGH_SPACE;
> +        } else {
> +            if (0 != s->record_offset) {
> +                memset(&s->tmp_record[ERST_RECORD_SIZE - s->record_offset],
> +                    0xFF, s->record_offset);
> +            }
> +            memcpy(s->tmp_record, record, ERST_RECORD_SIZE - s->record_offset);
> +            rc = copy_to_nvram_by_index(s, (unsigned)index);
> +            if (rc == ACPI_ERST_STATUS_SUCCESS) {
> +                if (!record_found) { /* not overwriting existing record */
> +                    s->record_count += 1; /* writing new record */
> +                }
> +            }
> +        }
> +    }
> +
> +    return rc;
> +}
> +
> +static unsigned next_erst_record(ERSTDeviceState *s,
> +    uint64_t *record_identifier)
> +{
> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
> +    unsigned index;
> +    unsigned rrc;
> +
> +    *record_identifier = ERST_INVALID_RECORD_ID;
> +
> +    index = s->next_record_index;
> +    do {
> +        rrc = copy_from_nvram_by_index(s, (unsigned)index);
> +        if (rrc == ACPI_ERST_STATUS_SUCCESS) {
> +            if (IS_UEFI_CPER_RECORD(s->tmp_record)) {
> +                s->next_record_index = index + 1; /* where to start next time */
> +                *record_identifier = THE_UEFI_CPER_RECORD_ID(s->tmp_record);
> +                rc = ACPI_ERST_STATUS_SUCCESS;
> +                break;
> +            }
> +            ++index;
> +        } else {
> +            if (s->next_record_index == 0) {
> +                rc = ACPI_ERST_STATUS_RECORD_STORE_EMPTY;
> +            }
> +            s->next_record_index = 0; /* at end, reset */
> +        }
> +    } while (rrc == ACPI_ERST_STATUS_SUCCESS);
> +
> +    return rc;
> +}
> +
> +static unsigned read_erst_record(ERSTDeviceState *s)
> +{
> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
> +    bool record_found;
> +    int index;
> +
> +    index = lookup_erst_record_by_identifier(s,
> +        s->record_identifier, &record_found, 0);
> +    if (record_found) {
> +        rc = copy_from_nvram_by_index(s, (unsigned)index);
> +        if (rc == ACPI_ERST_STATUS_SUCCESS) {
> +            if (s->record_offset < ERST_RECORD_SIZE) {
> +                memcpy(&s->record[s->record_offset], s->tmp_record,
> +                    ERST_RECORD_SIZE - s->record_offset);
> +            }
> +        }
> +    }
> +
> +    return rc;
> +}
> +
> +static unsigned get_erst_record_count(ERSTDeviceState *s)
> +{
> +    /* Compute record_count */
> +    off_t offset;
> +
> +    s->record_count = 0;
> +    offset = 0;
> +    do {
> +        uint8_t *ptr = &s->nvram[offset];
> +        uint64_t record_identifier = THE_UEFI_CPER_RECORD_ID(ptr);
> +        if (IS_UEFI_CPER_RECORD(ptr) &&
> +            (ERST_INVALID_RECORD_ID != record_identifier)) {
> +            s->record_count += 1;
> +        }
> +        offset += ERST_RECORD_SIZE;
> +    } while (offset < (off_t)s->prop_size);
> +
> +    return s->record_count;
> +}
> +
> +static void load_erst_backing_file(ERSTDeviceState *s)
> +{
> +    int fd;
> +    struct stat statbuf;
> +
> +    erst_debug("+load_erst_backing_file()\n");
> +
> +    /* Allocate and initialize nvram[] */
> +    s->nvram = g_malloc(s->prop_size);
> +    memset(s->nvram, 0xFF, s->prop_size);
> +
> +    /* Ensure backing file at least same as prop_size */
> +    if (stat(s->prop_filename, &statbuf) == 0) {
> +        /* ensure prop_size at least matches file size */
> +        if (statbuf.st_size < s->prop_size) {
> +            /* Ensure records are ERST_INVALID_RECORD_ID */
> +            memset(s->nvram, 0xFF, s->prop_size - statbuf.st_size);
> +            update_erst_backing_file(s,
> +                statbuf.st_size, s->nvram, s->prop_size - statbuf.st_size);
> +        }
> +    }
> +
> +    /* Pre-load nvram[] from backing file, if present */
> +    fd = open(s->prop_filename, O_RDONLY, ERST_BACKING_FILE_MODE);
> +    if (fd > 0) {
> +        size_t rrc = read(fd, s->nvram, s->prop_size);
> +        (void)rrc;
> +        close(fd);
> +        /*
> +         * If existing file is smaller than prop_size, it will be resized
> +         * accordingly upon subsequent record writes. If the file
> +         * is larger than prop_size, only prop_size bytes are utilized,
> +         * the extra bytes are untouched (though will be lost after
> +         * a migration when the backing file is re-written as length
> +         * of prop_size bytes).
> +         */
> +    } else {
> +        /* Create empty backing file */
> +        update_erst_backing_file(s, 0, s->nvram, s->prop_size);
> +    }
> +
> +    /* Initialize record_count */
> +    get_erst_record_count(s);
> +
> +    erst_debug("-load_erst_backing_file() %d\n", s->record_count);
> +}
> +
> +static uint64_t erst_rd_reg64(hwaddr addr,
> +    uint64_t reg, unsigned size)
> +{
> +    uint64_t rdval;
> +    uint64_t mask;
> +    unsigned shift;
> +
> +    if (size == sizeof(uint64_t)) {
> +        /* 64b access */
> +        mask = 0xFFFFFFFFFFFFFFFFUL;
> +        shift = 0;
> +    } else {
> +        /* 32b access */
> +        mask = 0x00000000FFFFFFFFUL;
> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
> +    }
> +
> +    rdval = reg;
> +    rdval >>= shift;
> +    rdval &= mask;
> +
> +    return rdval;
> +}
> +
> +static uint64_t erst_wr_reg64(hwaddr addr,
> +    uint64_t reg, uint64_t val, unsigned size)
> +{
> +    uint64_t wrval;
> +    uint64_t mask;
> +    unsigned shift;
> +
> +    if (size == sizeof(uint64_t)) {
> +        /* 64b access */
> +        mask = 0xFFFFFFFFFFFFFFFFUL;
> +        shift = 0;
> +    } else {
> +        /* 32b access */
> +        mask = 0x00000000FFFFFFFFUL;
> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
> +    }
> +
> +    val &= mask;
> +    val <<= shift;
> +    mask <<= shift;
> +    wrval = reg;
> +    wrval &= ~mask;
> +    wrval |= val;
> +
> +    return wrval;
> +}
> +
> +static void erst_write(void *opaque, hwaddr addr,
> +    uint64_t val, unsigned size)
> +{
> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> +
> +    if (addr < ERST_REG_LEN) {
> +        /*
> +         * NOTE: All actions/operations/side effects happen on the WRITE,
> +         * by design. The READs simply return the reg_value contents.
> +         */
> +        erst_debug("ERST write %016lx %10s val %016lx sz %u",
> +            addr, erst_reg(addr), val, size);
> +        /* The REGISTER region */
> +        switch (addr) {
> +        case ERST_CSR_VALUE + 0:
> +        case ERST_CSR_VALUE + 4:
> +            s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
> +            break;
> +        case ERST_CSR_ACTION + 0:
> +/*      case ERST_CSR_ACTION+4: as coded, not really a 64b register */
> +            switch (val) {
> +            case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
> +            case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
> +            case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
> +            case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> +            case ACPI_ERST_ACTION_END_OPERATION:
> +                s->operation = val;
> +                break;
> +            case ACPI_ERST_ACTION_SET_RECORD_OFFSET:
> +                s->record_offset = s->reg_value;
> +                break;
> +            case ACPI_ERST_ACTION_EXECUTE_OPERATION:
> +                if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
> +                    s->busy_status = 1;
> +                    switch (s->operation) {
> +                    case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
> +                        s->command_status = write_erst_record(s);
> +                        break;
> +                    case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
> +                        s->command_status = read_erst_record(s);
> +                        break;
> +                    case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
> +                        s->command_status = clear_erst_record(s);
> +                        break;
> +                    case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> +                        s->command_status = ACPI_ERST_STATUS_SUCCESS;
> +                        break;
> +                    case ACPI_ERST_ACTION_END_OPERATION:
> +                        s->command_status = ACPI_ERST_STATUS_SUCCESS;
> +                        break;
> +                    default:
> +                        s->command_status = ACPI_ERST_STATUS_FAILED;
> +                        break;
> +                    }
> +                    s->record_identifier = ERST_INVALID_RECORD_ID;
> +                    s->busy_status = 0;
> +                }
> +                break;
> +            case ACPI_ERST_ACTION_CHECK_BUSY_STATUS:
> +                s->reg_value = s->busy_status;
> +                break;
> +            case ACPI_ERST_ACTION_GET_COMMAND_STATUS:
> +                s->reg_value = s->command_status;
> +                break;
> +            case ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER:
> +                s->command_status = next_erst_record(s, &s->reg_value);
> +                break;
> +            case ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER:
> +                s->record_identifier = s->reg_value;
> +                break;
> +            case ACPI_ERST_ACTION_GET_RECORD_COUNT:
> +                s->reg_value = s->record_count;
> +                break;
> +            case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
> +                s->reg_value = s->base + ERST_REG_LEN;
> +                break;
> +            case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
> +                s->reg_value = ERST_RECORD_SIZE;
> +                break;
> +            case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
> +                s->reg_value = 0; /* correct/intended value */
> +                break;
> +            case ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS:
> +                /*
> +                 * 100UL is max, 10UL is nominal
> +                 */
> +                s->reg_value = ((100UL << 32) | (10UL << 0));
> +                break;
> +            case ACPI_ERST_ACTION_RESERVED:
> +            default:
> +                /*
> +                 * NOP
> +                 */
> +                break;
> +            }
> +            break;
> +        default:
> +            /*
> +             * All other register writes are NO-OPs
> +             */
> +            break;
> +        }
> +    } else {
> +        /* The RECORD region */
> +        unsigned offset = addr - ERST_REG_LEN;
> +        uint8_t *ptr = &s->record[offset];
> +        switch (size) {
> +        default:
> +        case sizeof(uint8_t):
> +            *(uint8_t *)ptr = (uint8_t)val;
> +            break;
> +        case sizeof(uint16_t):
> +            *(uint16_t *)ptr = (uint16_t)val;
> +            break;
> +        case sizeof(uint32_t):
> +            *(uint32_t *)ptr = (uint32_t)val;
> +            break;
> +        case sizeof(uint64_t):
> +            *(uint64_t *)ptr = (uint64_t)val;
> +            break;
> +        }
> +    }
> +}
> +
> +static uint64_t erst_read(void *opaque, hwaddr addr,
> +                                unsigned size)
> +{
> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> +    uint64_t val = 0;
> +
> +    if (addr < ERST_REG_LEN) {
> +        switch (addr) {
> +        case ERST_CSR_ACTION + 0:
> +        case ERST_CSR_ACTION + 4:
> +            val = erst_rd_reg64(addr, s->reg_action, size);
> +            break;
> +        case ERST_CSR_VALUE + 0:
> +        case ERST_CSR_VALUE + 4:
> +            val = erst_rd_reg64(addr, s->reg_value, size);
> +            break;
> +        default:
> +            break;
> +        }
> +    erst_debug("ERST read  %016lx %10s val %016lx sz %u\n",
> +        addr, erst_reg(addr), val, size);
> +    } else {
> +        /*
> +         * The RECORD region
> +         */
> +        uint8_t *ptr = &s->record[addr - ERST_REG_LEN];
> +        switch (size) {
> +        default:
> +        case sizeof(uint8_t):
> +            val = *(uint8_t *)ptr;
> +            break;
> +        case sizeof(uint16_t):
> +            val = *(uint16_t *)ptr;
> +            break;
> +        case sizeof(uint32_t):
> +            val = *(uint32_t *)ptr;
> +            break;
> +        case sizeof(uint64_t):
> +            val = *(uint64_t *)ptr;
> +            break;
> +        }
> +    }
> +    erst_debug("ERST read  %016lx %10s val %016lx sz %u\n",
> +        addr, erst_reg(addr), val, size);
> +    return val;
> +}
> +
> +static size_t build_erst_action(GArray *table_data,
> +    uint8_t serialization_action,
> +    uint8_t instruction,
> +    uint8_t flags,
> +    uint8_t width,
> +    uint64_t address,
> +    uint64_t value,
> +    uint64_t mask)
> +{
> +    /* See ACPI spec, Error Serialization */
> +    uint8_t access_width = 0;
> +    build_append_int_noprefix(table_data, serialization_action, 1);
> +    build_append_int_noprefix(table_data, instruction         , 1);
> +    build_append_int_noprefix(table_data, flags               , 1);
> +    build_append_int_noprefix(table_data, 0                   , 1);
> +    /* GAS space_id */
> +    build_append_int_noprefix(table_data, AML_SYSTEM_MEMORY   , 1);
> +    /* GAS bit_width */
> +    build_append_int_noprefix(table_data, width               , 1);
> +    /* GAS bit_offset */
> +    build_append_int_noprefix(table_data, 0                   , 1);
> +    /* GAS access_width */
> +    switch (width) {
> +    case 8:
> +        access_width = 1;
> +        break;
> +    case 16:
> +        access_width = 2;
> +        break;
> +    case 32:
> +        access_width = 3;
> +        break;
> +    case 64:
> +        access_width = 4;
> +        break;
> +    default:
> +        access_width = 0;
> +        break;
> +    }
> +    build_append_int_noprefix(table_data, access_width        , 1);
> +    /* GAS address */
> +    build_append_int_noprefix(table_data, address, 8);
> +    /* value */
> +    build_append_int_noprefix(table_data, value  , 8);
> +    /* mask */
> +    build_append_int_noprefix(table_data, mask   , 8);
> +
> +    return 1;
> +}
> +
> +static const MemoryRegionOps erst_rw_ops = {
> +    .read = erst_read,
> +    .write = erst_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
> +void build_erst(GArray *table_data, BIOSLinker *linker, hwaddr base)
> +{
> +    unsigned action, insns = 0;
> +    unsigned erst_start = table_data->len;
> +    unsigned iec_offset = 0;
> +
> +    /* See ACPI spec, Error Serialization */
> +    acpi_data_push(table_data, sizeof(AcpiTableHeader));
> +    /* serialization_header_length */
> +    build_append_int_noprefix(table_data, 48, 4);
> +    /* reserved */
> +    build_append_int_noprefix(table_data,  0, 4);
> +    iec_offset = table_data->len;
> +    /* instruction_entry_count (placeholder) */
> +    build_append_int_noprefix(table_data,  0, 4);
> +
> +#define BEA(I, F, W, ADDR, VAL, MASK) \
> +    build_erst_action(table_data, action, \
> +        ACPI_ERST_INST_##I, F, W, base + ADDR, VAL, MASK)
> +#define MASK8  0x00000000000000FFUL
> +#define MASK16 0x000000000000FFFFUL
> +#define MASK32 0x00000000FFFFFFFFUL
> +#define MASK64 0xFFFFFFFFFFFFFFFFUL
> +
> +    for (action = 0; action < ACPI_ERST_MAX_ACTIONS; ++action) {
> +        switch (action) {
> +        case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_END_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_SET_RECORD_OFFSET:
> +            insns += BEA(WRITE_REGISTER      , 0, 32,
> +                ERST_CSR_VALUE , 0, MASK32);
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_EXECUTE_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_VALUE , ERST_EXECUTE_OPERATION_MAGIC, MASK8);
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_CHECK_BUSY_STATUS:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER_VALUE , 0, 32,
> +                ERST_CSR_VALUE, 0x01, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_GET_COMMAND_STATUS:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 32,
> +                ERST_CSR_VALUE, 0, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 64,
> +                ERST_CSR_VALUE, 0, MASK64);
> +            break;
> +        case ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER:
> +            insns += BEA(WRITE_REGISTER      , 0, 64,
> +                ERST_CSR_VALUE , 0, MASK64);
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_GET_RECORD_COUNT:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 32,
> +                ERST_CSR_VALUE, 0, MASK32);
> +            break;
> +        case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_RESERVED:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            break;
> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 64,
> +                ERST_CSR_VALUE, 0, MASK64);
> +            break;
> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 64,
> +                ERST_CSR_VALUE, 0, MASK32);
> +            break;
> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 32,
> +                ERST_CSR_VALUE, 0, MASK32);
> +            break;
> +        case ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS:
> +            insns += BEA(WRITE_REGISTER_VALUE, 0, 32,
> +                ERST_CSR_ACTION, action, MASK8);
> +            insns += BEA(READ_REGISTER       , 0, 64,
> +                ERST_CSR_VALUE, 0, MASK64);
> +        default:
> +            insns += BEA(NOOP, 0, 0, 0, action, 0);
> +            break;
> +        }
> +    }
> +
> +    /* acpi_data_push() within BEA() can result in new GArray pointer */
> +    *(uint32_t *)(table_data->data + iec_offset) = cpu_to_le32(insns);
> +
> +    build_header(linker, table_data,
> +                 (void *)(table_data->data + erst_start),
> +                 "ERST", table_data->len - erst_start,
> +                 1, NULL, NULL);
> +
> +    if (erst_base == 0) {
> +        /*
> +         * This ACPI routine is invoked twice, but this code
> +         * snippet needs to happen just once.
> +         * And this code in erst_class_init() is too early.
> +         */
> +        DeviceState *dev;
> +        SysBusDevice *s;
> +
> +        dev = qdev_new(TYPE_ACPI_ERST);
> +        erst_debug("qdev_create dev %p\n", dev);
> +        s = SYS_BUS_DEVICE(dev);
> +        sysbus_realize_and_unref(s, &error_fatal);
> +
> +        ACPIERST(dev)->base = base;
> +        sysbus_mmio_map(s, 0, base);
> +        erst_base = base;
> +        erst_debug("erst_base %lx\n", base);
> +    }
> +}
> +
> +/*******************************************************************/
> +/*******************************************************************/
> +static int erst_post_load(void *opaque, int version_id)
> +{
> +    ERSTDeviceState *s = opaque;
> +    erst_debug("+erst_post_load(%d)\n", version_id);
> +    /* Ensure nvram[] persists into backing file */
> +    update_erst_backing_file(s, 0, s->nvram, s->prop_size);
> +    erst_debug("-erst_post_load()\n");
> +    return 0;
> +}
> +
> +static const VMStateDescription erst_vmstate  = {
> +    .name = "acpi-erst",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .post_load = erst_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT8(operation, ERSTDeviceState),
> +        VMSTATE_UINT8(busy_status, ERSTDeviceState),
> +        VMSTATE_UINT8(command_status, ERSTDeviceState),
> +        VMSTATE_UINT32(record_offset, ERSTDeviceState),
> +        VMSTATE_UINT32(record_count, ERSTDeviceState),
> +        VMSTATE_UINT64(reg_action, ERSTDeviceState),
> +        VMSTATE_UINT64(reg_value, ERSTDeviceState),
> +        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
> +        VMSTATE_UINT8_ARRAY(record, ERSTDeviceState, ERST_RECORD_SIZE),
> +        VMSTATE_UINT8_ARRAY(tmp_record, ERSTDeviceState, ERST_RECORD_SIZE),
> +        VMSTATE_VARRAY_UINT32(nvram, ERSTDeviceState, prop_size, 0,
> +            vmstate_info_uint8, uint8_t),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static void erst_realizefn(DeviceState *dev, Error **errp)
> +{
> +    SysBusDevice *d = SYS_BUS_DEVICE(dev);
> +    ERSTDeviceState *s = ACPIERST(dev);
> +
> +    erst_debug("+erst_realizefn()\n");
> +    if (!s->prop_filename) {
> +        s->prop_filename = (char *)(TYPE_ACPI_ERST ".backing");
> +    }
> +
> +    if (!s->prop_filename) {
> +        error_setg(errp, "'filename' property is not set");
> +        return;
> +    }
> +
> +    if (!(s->prop_size > ERST_RECORD_SIZE) &&
> +        (s->prop_size <= 0x04000000)) {
> +        error_setg(errp, "'size' property %d is not set properly",
> +            s->prop_size);
> +        return;
> +    }
> +
> +    /* Convert prop_size to integer multiple of ERST_RECORD_SIZE */
> +    s->prop_size -= (s->prop_size % ERST_RECORD_SIZE);
> +
> +    load_erst_backing_file(s);
> +
> +    erst_debug("filename %s\n", s->prop_filename);
> +    erst_debug("size %x\n", s->prop_size);
> +
> +    memory_region_init_io(&s->iomem, OBJECT(s), &erst_rw_ops, s,
> +                          TYPE_ACPI_ERST, ERST_IOMEM_SIZE);
> +    sysbus_init_mmio(d, &s->iomem);
> +    erst_debug("-erst_realizefn()\n");
> +}
> +
> +static void erst_unrealizefn(DeviceState *dev)
> +{
> +    ERSTDeviceState *s = ACPIERST(dev);
> +
> +    erst_debug("+erst_unrealizefn()\n");
> +    if (s->nvram) {
> +        /* Ensure nvram[] persists into backing file */
> +        update_erst_backing_file(s, 0, s->nvram, s->prop_size);
> +        g_free(s->nvram);
> +        s->nvram = NULL;
> +    }
> +    erst_debug("-erst_unrealizefn()\n");
> +}
> +
> +static void erst_reset(DeviceState *dev)
> +{
> +    ERSTDeviceState *s = ACPIERST(dev);
> +
> +    erst_debug("+erst_reset(%p) %d\n", s, s->record_count);
> +    s->operation = 0;
> +    s->busy_status = 0;
> +    s->command_status = ACPI_ERST_STATUS_SUCCESS;
> +    /* indicate empty/no-more until further notice */
> +    s->record_identifier = ERST_INVALID_RECORD_ID;
> +    s->record_offset = 0;
> +    s->next_record_index = 0;
> +    /* NOTE: record_count and nvram[] are initialized elsewhere */
> +    erst_debug("-erst_reset()\n");
> +}
> +
> +static Property erst_properties[] = {
> +    DEFINE_PROP_UINT32("size", ERSTDeviceState, prop_size, 0x00010000),
> +    DEFINE_PROP_STRING("filename", ERSTDeviceState, prop_filename),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void erst_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    erst_debug("+erst_class_init()\n");
> +    dc->realize = erst_realizefn;
> +    dc->unrealize = erst_unrealizefn;
> +    dc->reset = erst_reset;
> +    dc->vmsd = &erst_vmstate;
> +    device_class_set_props(dc, erst_properties);
> +    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    erst_debug("-erst_class_init()\n");
> +}
> +
> +static const TypeInfo erst_type_info = {
> +    .name          = TYPE_ACPI_ERST,
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .class_init    = erst_class_init,
> +    .instance_size = sizeof(ERSTDeviceState),
> +};
> +
> +static void erst_register_types(void)
> +{
> +    type_register_static(&erst_type_info);
> +}
> +
> +type_init(erst_register_types)


[-- Attachment #2: Type: text/html, Size: 77896 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-04-09 15:54     ` Eric DeVolder
@ 2021-04-14  9:17       ` Igor Mammedov
  2021-05-03 15:49         ` Eric DeVolder
  0 siblings, 1 reply; 24+ messages in thread
From: Igor Mammedov @ 2021-04-14  9:17 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, Konrad Wilk, mst, jusual, qemu-devel, pbonzini,
	Boris Ostrovsky, rth

On Fri, 9 Apr 2021 15:54:47 +0000
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Hi Igor,
> Thank you for reviewing. I've responded inline below.
> eric
> 
> ________________________________
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Tuesday, April 6, 2021 2:31 PM
> To: Eric DeVolder <eric.devolder@oracle.com>
> Cc: mst@redhat.com <mst@redhat.com>; marcel.apfelbaum@gmail.com <marcel.apfelbaum@gmail.com>; pbonzini@redhat.com <pbonzini@redhat.com>; rth@twiddle.net <rth@twiddle.net>; ehabkost@redhat.com <ehabkost@redhat.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; kwilk@oracle.com <kwilk@oracle.com>
> Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
> 
> On Mon,  8 Feb 2021 15:57:55 -0500
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
> > This change implements the support for the ACPI ERST feature[1,2].
> >
> > The size of the ACPI ERST storage is declared via the QEMU
> > global parameter acpi-erst.size. The size can range from 64KiB
> > to to 64MiB. The default is 64KiB.
> >
> > The location of the ACPI ERST storage backing file is delared
> > via the QEMU global parameter acpi-erst.filename. The default
> > is acpi-erst.backing.
> >
> > [1] "Advanced Configuration and Power Interface Specification",
> >     version 6.2, May 2017.
> >     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> >
> > [2] "Unified Extensible Firmware Interface Specification",
> >     version 2.8, March 2019.
> >     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> >
> > Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>  
> 
> items 2/4/5 from v1 review still need to be addressed.
> 
> >
> > 2. patch is too big to review, please split it up in smaller chunks.
> >
> > EJD: Done.  
> 
> (separating a header and a makefile rule doesn't make much sense)
> 
> it should be split at least on part that implements device model and ACPI parts
> 
> EJD: I'll rebase this patch set on qemu-6 and accommodate your suggestions with how to split/organize the patch set.
> 
> [...]
> >
> > 4. Maybe instead of SYSBUS device, implement it as a PCI device and
> >    use its BAR/control registers for pstore storage and control interface.
> >    It could save you headache of picking address where to map it +
> >    it would take care of migration part automatically, as firmware
> >    would do it for you and then QEMU could pickup firmware programmed
> >    address and put it into ERST table.
> > EJD: Thanks for the idea. For now I've left it as a SYSBUS device; we can revisit as needed.  
> 
> I would really prefer to see a PCI version (current way is just a hack)
> 
> EJD: I understand, I don't like the base address problem either. Is there an example PCI device that gets its base address assigned during ACPI setup that I could reference and pattern this work after? I've been using SYSBUS as that most closely mimics the real hardware implementations I've studied in order to produce this code.
> EJD: I thought my inexperience with authoring QEMU devices was the primary problem in establishing a solution for the base address. Otherwise, this thing only needs a single 4KiB page (for the 2 registers + exchange buffer) exposed.

I don't recall if we merged example PCI device in QEMU, but someone worked on it before.
Google search yields following:
 https://github.com/grandemk/qemu_devices/commit/ba8d38a858ba63ef4d419a926f58328b9675fc98


> > 5. instead of dealing with file for storage directly, reuse hostmem backend
> >    to provide it to for your device. ex: pc-dimm. i.e. split device
> >    on frontend and backend
> >
> > EJD: I had looked into that prior to posting v1. The entire ERST storage is not memory mapped, just an exchange buffer. So the hostmem backend is not suitable for this purpose.  
> 
> Is there a compelling reason why it can't be memory mapped?
> 
> EJD: Well, this ERST device I've coded pretty much follows the ACPI ERST spec verbatim. As it stands today, the spec doesn't provide a way to report the total size of the persistent storage behind the interface; you know when storage is full only when you receive an Out Of Storage error code upon write. In a sense, that allows the size of the storage to vary greatly and be implemented in any way needed (ie actual hardware, this has tended to be in the 64KiB range when it is carved out of system parallel flash memory, but some hardware uses serial flash as well). In virtual environments, it can be of any size, and we at Oracle have intentions of heavily utilizing ACPI ERST to stuff all kinds of diagnostic information into it, thus wanting the storage to be very large. By not actually exposing/memory-mapping the storage, the issue of where to drop it in the memory map goes away (yes a PCI BAR could solve this).
> EJD: But at the end of the day, could this storage be memory mapped? I suppose it could be, but then that rather circumvents the entire need for the ACPI ERST interface to start with. Linux and Windows both already know how to utilize ACPI ERST.

Maybe I wasn't clear on it, I did not propose to map storage into guest.
Only use MemoryRegion provided by backend inside of your device.
This way you can drop all file related code from your patch,
and just work with read/store info from/to memory directly.

[...]



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-04-14  9:17       ` Igor Mammedov
@ 2021-05-03 15:49         ` Eric DeVolder
  2021-05-03 17:07           ` Igor Mammedov
  0 siblings, 1 reply; 24+ messages in thread
From: Eric DeVolder @ 2021-05-03 15:49 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Konrad Wilk, mst, jusual, qemu-devel, pbonzini,
	Boris Ostrovsky, rth

[-- Attachment #1: Type: text/plain, Size: 6614 bytes --]

Igor,
I've rebased the original patches on to qemu-v6.0.0-rc4, and finally have everything working as it previously did.
I've started now to work to incorporate the HostMemoryBackendFile; that is progressing.
My question for you today is with regard to placing ERST device on PCI. The PCI example provided is a template device, and while I do find that helpful, I still do not understand how the ERST Actions, which contain GAS for describing the register accesses, would be patched/linked when a PCI bar is assigned. Or is there perhaps another way of obtaining the PCI BAR using ACPI semantics?
Thanks,
eric

________________________________
From: Igor Mammedov <imammedo@redhat.com>
Sent: Wednesday, April 14, 2021 4:17 AM
To: Eric DeVolder <eric.devolder@oracle.com>
Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature

On Fri, 9 Apr 2021 15:54:47 +0000
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Hi Igor,
> Thank you for reviewing. I've responded inline below.
> eric
>
> ________________________________
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Tuesday, April 6, 2021 2:31 PM
> To: Eric DeVolder <eric.devolder@oracle.com>
> Cc: mst@redhat.com <mst@redhat.com>; marcel.apfelbaum@gmail.com <marcel.apfelbaum@gmail.com>; pbonzini@redhat.com <pbonzini@redhat.com>; rth@twiddle.net <rth@twiddle.net>; ehabkost@redhat.com <ehabkost@redhat.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; kwilk@oracle.com <kwilk@oracle.com>
> Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
>
> On Mon,  8 Feb 2021 15:57:55 -0500
> Eric DeVolder <eric.devolder@oracle.com> wrote:
>
> > This change implements the support for the ACPI ERST feature[1,2].
> >
> > The size of the ACPI ERST storage is declared via the QEMU
> > global parameter acpi-erst.size. The size can range from 64KiB
> > to to 64MiB. The default is 64KiB.
> >
> > The location of the ACPI ERST storage backing file is delared
> > via the QEMU global parameter acpi-erst.filename. The default
> > is acpi-erst.backing.
> >
> > [1] "Advanced Configuration and Power Interface Specification",
> >     version 6.2, May 2017.
> >     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> >
> > [2] "Unified Extensible Firmware Interface Specification",
> >     version 2.8, March 2019.
> >     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> >
> > Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>
> items 2/4/5 from v1 review still need to be addressed.
>
> >
> > 2. patch is too big to review, please split it up in smaller chunks.
> >
> > EJD: Done.
>
> (separating a header and a makefile rule doesn't make much sense)
>
> it should be split at least on part that implements device model and ACPI parts
>
> EJD: I'll rebase this patch set on qemu-6 and accommodate your suggestions with how to split/organize the patch set.
>
> [...]
> >
> > 4. Maybe instead of SYSBUS device, implement it as a PCI device and
> >    use its BAR/control registers for pstore storage and control interface.
> >    It could save you headache of picking address where to map it +
> >    it would take care of migration part automatically, as firmware
> >    would do it for you and then QEMU could pickup firmware programmed
> >    address and put it into ERST table.
> > EJD: Thanks for the idea. For now I've left it as a SYSBUS device; we can revisit as needed.
>
> I would really prefer to see a PCI version (current way is just a hack)
>
> EJD: I understand, I don't like the base address problem either. Is there an example PCI device that gets its base address assigned during ACPI setup that I could reference and pattern this work after? I've been using SYSBUS as that most closely mimics the real hardware implementations I've studied in order to produce this code.
> EJD: I thought my inexperience with authoring QEMU devices was the primary problem in establishing a solution for the base address. Otherwise, this thing only needs a single 4KiB page (for the 2 registers + exchange buffer) exposed.

I don't recall if we merged example PCI device in QEMU, but someone worked on it before.
Google search yields following:
 https://github.com/grandemk/qemu_devices/commit/ba8d38a858ba63ef4d419a926f58328b9675fc98


> > 5. instead of dealing with file for storage directly, reuse hostmem backend
> >    to provide it to for your device. ex: pc-dimm. i.e. split device
> >    on frontend and backend
> >
> > EJD: I had looked into that prior to posting v1. The entire ERST storage is not memory mapped, just an exchange buffer. So the hostmem backend is not suitable for this purpose.
>
> Is there a compelling reason why it can't be memory mapped?
>
> EJD: Well, this ERST device I've coded pretty much follows the ACPI ERST spec verbatim. As it stands today, the spec doesn't provide a way to report the total size of the persistent storage behind the interface; you know when storage is full only when you receive an Out Of Storage error code upon write. In a sense, that allows the size of the storage to vary greatly and be implemented in any way needed (ie actual hardware, this has tended to be in the 64KiB range when it is carved out of system parallel flash memory, but some hardware uses serial flash as well). In virtual environments, it can be of any size, and we at Oracle have intentions of heavily utilizing ACPI ERST to stuff all kinds of diagnostic information into it, thus wanting the storage to be very large. By not actually exposing/memory-mapping the storage, the issue of where to drop it in the memory map goes away (yes a PCI BAR could solve this).
> EJD: But at the end of the day, could this storage be memory mapped? I suppose it could be, but then that rather circumvents the entire need for the ACPI ERST interface to start with. Linux and Windows both already know how to utilize ACPI ERST.

Maybe I wasn't clear on it, I did not propose to map storage into guest.
Only use MemoryRegion provided by backend inside of your device.
This way you can drop all file related code from your patch,
and just work with read/store info from/to memory directly.

[...]


[-- Attachment #2: Type: text/html, Size: 9344 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-05-03 15:49         ` Eric DeVolder
@ 2021-05-03 17:07           ` Igor Mammedov
  2021-05-17 15:01             ` Eric DeVolder
  0 siblings, 1 reply; 24+ messages in thread
From: Igor Mammedov @ 2021-05-03 17:07 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, Konrad Wilk, mst, jusual, qemu-devel, pbonzini,
	Boris Ostrovsky, rth

On Mon, 3 May 2021 15:49:28 +0000
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Igor,
> I've rebased the original patches on to qemu-v6.0.0-rc4, and finally have everything working as it previously did.
> I've started now to work to incorporate the HostMemoryBackendFile; that is progressing.
> My question for you today is with regard to placing ERST device on PCI. The PCI example provided is a template device, and while I do find that helpful, I still do not understand how the ERST Actions, which contain GAS for describing the register accesses, would be patched/linked when a PCI bar is assigned. Or is there perhaps another way of obtaining the PCI BAR using ACPI semantics?

current order of initialization is,
 0. QEMU builds initial ACPI tables (unpatched, mainly used to gauge total size of ACPI tables) and starts guest
 1. guest firmware initializes PCI devices (including BARs)
 2. guest reads ACPI tables from QEMU(via fwcfg)
 2.1 reading ACPI tables traps into QEMU and QEMU rebuilds all ACPI tables (including ERST)
      at this time one can get info from PCI devices (probably pci_get_bar_addr() is what you are looking for)
      that were initialized by firmware and build tables using address.
      Maybe it will need dynamic tables patching but lets get to that only if rebuilding table won't be enough
        
        

> Thanks,
> eric
> 
> ________________________________
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Wednesday, April 14, 2021 4:17 AM
> To: Eric DeVolder <eric.devolder@oracle.com>
> Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
> Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
> 
> On Fri, 9 Apr 2021 15:54:47 +0000
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
> > Hi Igor,
> > Thank you for reviewing. I've responded inline below.
> > eric
> >
> > ________________________________
> > From: Igor Mammedov <imammedo@redhat.com>
> > Sent: Tuesday, April 6, 2021 2:31 PM
> > To: Eric DeVolder <eric.devolder@oracle.com>
> > Cc: mst@redhat.com <mst@redhat.com>; marcel.apfelbaum@gmail.com <marcel.apfelbaum@gmail.com>; pbonzini@redhat.com <pbonzini@redhat.com>; rth@twiddle.net <rth@twiddle.net>; ehabkost@redhat.com <ehabkost@redhat.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; kwilk@oracle.com <kwilk@oracle.com>
> > Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
> >
> > On Mon,  8 Feb 2021 15:57:55 -0500
> > Eric DeVolder <eric.devolder@oracle.com> wrote:
> >  
> > > This change implements the support for the ACPI ERST feature[1,2].
> > >
> > > The size of the ACPI ERST storage is declared via the QEMU
> > > global parameter acpi-erst.size. The size can range from 64KiB
> > > to to 64MiB. The default is 64KiB.
> > >
> > > The location of the ACPI ERST storage backing file is delared
> > > via the QEMU global parameter acpi-erst.filename. The default
> > > is acpi-erst.backing.
> > >
> > > [1] "Advanced Configuration and Power Interface Specification",
> > >     version 6.2, May 2017.
> > >     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> > >
> > > [2] "Unified Extensible Firmware Interface Specification",
> > >     version 2.8, March 2019.
> > >     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> > >
> > > Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>  
> >
> > items 2/4/5 from v1 review still need to be addressed.
> >  
> > >
> > > 2. patch is too big to review, please split it up in smaller chunks.
> > >
> > > EJD: Done.  
> >
> > (separating a header and a makefile rule doesn't make much sense)
> >
> > it should be split at least on part that implements device model and ACPI parts
> >
> > EJD: I'll rebase this patch set on qemu-6 and accommodate your suggestions with how to split/organize the patch set.
> >
> > [...]  
> > >
> > > 4. Maybe instead of SYSBUS device, implement it as a PCI device and
> > >    use its BAR/control registers for pstore storage and control interface.
> > >    It could save you headache of picking address where to map it +
> > >    it would take care of migration part automatically, as firmware
> > >    would do it for you and then QEMU could pickup firmware programmed
> > >    address and put it into ERST table.
> > > EJD: Thanks for the idea. For now I've left it as a SYSBUS device; we can revisit as needed.  
> >
> > I would really prefer to see a PCI version (current way is just a hack)
> >
> > EJD: I understand, I don't like the base address problem either. Is there an example PCI device that gets its base address assigned during ACPI setup that I could reference and pattern this work after? I've been using SYSBUS as that most closely mimics the real hardware implementations I've studied in order to produce this code.
> > EJD: I thought my inexperience with authoring QEMU devices was the primary problem in establishing a solution for the base address. Otherwise, this thing only needs a single 4KiB page (for the 2 registers + exchange buffer) exposed.  
> 
> I don't recall if we merged example PCI device in QEMU, but someone worked on it before.
> Google search yields following:
>  https://github.com/grandemk/qemu_devices/commit/ba8d38a858ba63ef4d419a926f58328b9675fc98
> 
> 
> > > 5. instead of dealing with file for storage directly, reuse hostmem backend
> > >    to provide it to for your device. ex: pc-dimm. i.e. split device
> > >    on frontend and backend
> > >
> > > EJD: I had looked into that prior to posting v1. The entire ERST storage is not memory mapped, just an exchange buffer. So the hostmem backend is not suitable for this purpose.  
> >
> > Is there a compelling reason why it can't be memory mapped?
> >
> > EJD: Well, this ERST device I've coded pretty much follows the ACPI ERST spec verbatim. As it stands today, the spec doesn't provide a way to report the total size of the persistent storage behind the interface; you know when storage is full only when you receive an Out Of Storage error code upon write. In a sense, that allows the size of the storage to vary greatly and be implemented in any way needed (ie actual hardware, this has tended to be in the 64KiB range when it is carved out of system parallel flash memory, but some hardware uses serial flash as well). In virtual environments, it can be of any size, and we at Oracle have intentions of heavily utilizing ACPI ERST to stuff all kinds of diagnostic information into it, thus wanting the storage to be very large. By not actually exposing/memory-mapping the storage, the issue of where to drop it in the memory map goes away (yes a PCI BAR could solve this).
> > EJD: But at the end of the day, could this storage be memory mapped? I suppose it could be, but then that rather circumvents the entire need for the ACPI ERST interface to start with. Linux and Windows both already know how to utilize ACPI ERST.  
> 
> Maybe I wasn't clear on it, I did not propose to map storage into guest.
> Only use MemoryRegion provided by backend inside of your device.
> This way you can drop all file related code from your patch,
> and just work with read/store info from/to memory directly.
> 
> [...]
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU
  2021-02-08 20:57 [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (7 preceding siblings ...)
  2021-03-01 19:04 ` [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric Devolder
@ 2021-05-14 13:57 ` Michael S. Tsirkin
  2021-05-14 14:12   ` Eric DeVolder
  8 siblings, 1 reply; 24+ messages in thread
From: Michael S. Tsirkin @ 2021-05-14 13:57 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, qemu-devel, kwilk, pbonzini, imammedo, boris.ostrovsky, rth

On Mon, Feb 08, 2021 at 03:57:52PM -0500, Eric DeVolder wrote:
> This patchset introduces support for the ACPI Error Record
> Serialization Table, ERST.

OK I'm expecting v3 I guess?


> Linux uses the persistent storage filesystem, pstore, to record
> information (eg. dmesg tail) upon panics and shutdowns.  Pstore is
> independent of, and runs before, kdump.  In certain scenarios (ie.
> hosts/guests with root filesystems on NFS/iSCSI where networking
> software and/or hardware fails), pstore may contain the only
> information available for post-mortem debugging.
> 
> Two common storage backends for the pstore filesystem are ACPI ERST
> and UEFI. Most BIOS implement ACPI ERST; however, ACPI ERST is not
> currently supported in QEMU, and UEFI is not utilized in all guests.
> By implementing ACPI ERST within QEMU, then the ACPI ERST becomes a
> viable pstore storage backend for virtual machines (as it is now for
> bare metal machines).
> 
> Enabling support for ACPI ERST facilitates a consistent method to
> capture kernel panic information in a wide range of guests: from
> resource- constrained microvms to very large guests, and in
> particular, in direct-boot environments (which would lack UEFI
> run-time services).
> 
> Note that Microsoft Windows also utilizes the ACPI ERST for certain
> crash information, if available.
> 
> The ACPI ERST persistent storage is contained within a single backing
> file, with a default size of 64KiB. The size and filename of the
> backing file can be obtained from QEMU parameters.
> 
> The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
> (APEI)", and specifically subsection "Error Serialization", outlines
> a method for storing error records into persistent storage.
> 
> [1] "Advanced Configuration and Power Interface Specification",
>     version 6.2, May 2017.
>     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> 
> [2] "Unified Extensible Firmware Interface Specification",
>     version 2.8, March 2019.
>     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> 
> Suggested-by: Konrad Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> 
> ---
> v2: 8feb2021
>  - Added qtest/smoke test per Paolo Bonzini
>  - Split patch into smaller chunks, per Igo Mammedov
>  - Did away with use of ACPI packed structures, per Igo Mammedov
> 
> v1: 26oct2020
>  - initial post
> 
> ---
> Eric DeVolder (7):
>   ACPI ERST: bios-tables-test.c steps 1 and 2
>   ACPI ERST: header file for erst
>   ACPI ERST: support for ACPI ERST feature
>   ACPI ERST: build step for ACPI ERST
>   ACPI ERST: support ERST for x86 guest
>   ACPI ERST: qtest for ERST
>   ACPI ERST: bios-tables-test.c step 5
> 
>  hw/acpi/erst.c               | 952 +++++++++++++++++++++++++++++++++++++++++++
>  hw/acpi/meson.build          |   1 +
>  hw/i386/acpi-build.c         |   4 +
>  include/hw/acpi/erst.h       |  77 ++++
>  tests/data/acpi/microvm/ERST |   0
>  tests/data/acpi/pc/ERST      | Bin 0 -> 976 bytes
>  tests/data/acpi/q35/ERST     | Bin 0 -> 976 bytes
>  tests/qtest/erst-test.c      | 106 +++++
>  tests/qtest/meson.build      |   2 +
>  9 files changed, 1142 insertions(+)
>  create mode 100644 hw/acpi/erst.c
>  create mode 100644 include/hw/acpi/erst.h
>  create mode 100644 tests/data/acpi/microvm/ERST
>  create mode 100644 tests/data/acpi/pc/ERST
>  create mode 100644 tests/data/acpi/q35/ERST
>  create mode 100644 tests/qtest/erst-test.c
> 
> -- 
> 1.8.3.1



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU
  2021-05-14 13:57 ` Michael S. Tsirkin
@ 2021-05-14 14:12   ` Eric DeVolder
  0 siblings, 0 replies; 24+ messages in thread
From: Eric DeVolder @ 2021-05-14 14:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: ehabkost, qemu-devel, kwilk, pbonzini, imammedo, Boris Ostrovsky, rth

[-- Attachment #1: Type: text/plain, Size: 4273 bytes --]

Michael,
Yes, I'm working on a v3 to accommodate items Igor has requested.
eric
________________________________
From: Michael S. Tsirkin <mst@redhat.com>
Sent: Friday, May 14, 2021 8:57 AM
To: Eric DeVolder <eric.devolder@oracle.com>
Cc: imammedo@redhat.com <imammedo@redhat.com>; marcel.apfelbaum@gmail.com <marcel.apfelbaum@gmail.com>; pbonzini@redhat.com <pbonzini@redhat.com>; rth@twiddle.net <rth@twiddle.net>; ehabkost@redhat.com <ehabkost@redhat.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; kwilk@oracle.com <kwilk@oracle.com>
Subject: Re: [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU

On Mon, Feb 08, 2021 at 03:57:52PM -0500, Eric DeVolder wrote:
> This patchset introduces support for the ACPI Error Record
> Serialization Table, ERST.

OK I'm expecting v3 I guess?


> Linux uses the persistent storage filesystem, pstore, to record
> information (eg. dmesg tail) upon panics and shutdowns.  Pstore is
> independent of, and runs before, kdump.  In certain scenarios (ie.
> hosts/guests with root filesystems on NFS/iSCSI where networking
> software and/or hardware fails), pstore may contain the only
> information available for post-mortem debugging.
>
> Two common storage backends for the pstore filesystem are ACPI ERST
> and UEFI. Most BIOS implement ACPI ERST; however, ACPI ERST is not
> currently supported in QEMU, and UEFI is not utilized in all guests.
> By implementing ACPI ERST within QEMU, then the ACPI ERST becomes a
> viable pstore storage backend for virtual machines (as it is now for
> bare metal machines).
>
> Enabling support for ACPI ERST facilitates a consistent method to
> capture kernel panic information in a wide range of guests: from
> resource- constrained microvms to very large guests, and in
> particular, in direct-boot environments (which would lack UEFI
> run-time services).
>
> Note that Microsoft Windows also utilizes the ACPI ERST for certain
> crash information, if available.
>
> The ACPI ERST persistent storage is contained within a single backing
> file, with a default size of 64KiB. The size and filename of the
> backing file can be obtained from QEMU parameters.
>
> The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
> (APEI)", and specifically subsection "Error Serialization", outlines
> a method for storing error records into persistent storage.
>
> [1] "Advanced Configuration and Power Interface Specification",
>     version 6.2, May 2017.
>     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
>
> [2] "Unified Extensible Firmware Interface Specification",
>     version 2.8, March 2019.
>     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
>
> Suggested-by: Konrad Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>
> ---
> v2: 8feb2021
>  - Added qtest/smoke test per Paolo Bonzini
>  - Split patch into smaller chunks, per Igo Mammedov
>  - Did away with use of ACPI packed structures, per Igo Mammedov
>
> v1: 26oct2020
>  - initial post
>
> ---
> Eric DeVolder (7):
>   ACPI ERST: bios-tables-test.c steps 1 and 2
>   ACPI ERST: header file for erst
>   ACPI ERST: support for ACPI ERST feature
>   ACPI ERST: build step for ACPI ERST
>   ACPI ERST: support ERST for x86 guest
>   ACPI ERST: qtest for ERST
>   ACPI ERST: bios-tables-test.c step 5
>
>  hw/acpi/erst.c               | 952 +++++++++++++++++++++++++++++++++++++++++++
>  hw/acpi/meson.build          |   1 +
>  hw/i386/acpi-build.c         |   4 +
>  include/hw/acpi/erst.h       |  77 ++++
>  tests/data/acpi/microvm/ERST |   0
>  tests/data/acpi/pc/ERST      | Bin 0 -> 976 bytes
>  tests/data/acpi/q35/ERST     | Bin 0 -> 976 bytes
>  tests/qtest/erst-test.c      | 106 +++++
>  tests/qtest/meson.build      |   2 +
>  9 files changed, 1142 insertions(+)
>  create mode 100644 hw/acpi/erst.c
>  create mode 100644 include/hw/acpi/erst.h
>  create mode 100644 tests/data/acpi/microvm/ERST
>  create mode 100644 tests/data/acpi/pc/ERST
>  create mode 100644 tests/data/acpi/q35/ERST
>  create mode 100644 tests/qtest/erst-test.c
>
> --
> 1.8.3.1


[-- Attachment #2: Type: text/html, Size: 6688 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-05-03 17:07           ` Igor Mammedov
@ 2021-05-17 15:01             ` Eric DeVolder
  2021-05-17 16:31               ` Igor Mammedov
  0 siblings, 1 reply; 24+ messages in thread
From: Eric DeVolder @ 2021-05-17 15:01 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Konrad Wilk, mst, jusual, qemu-devel, pbonzini,
	Boris Ostrovsky, rth

[-- Attachment #1: Type: text/plain, Size: 8771 bytes --]

Hi Igor,
I've been working to transition ERST to use the hostmem-file object as the backing store, as requested.

I have the backend-file object now in ERST, and I have a question for you. This hostmem-file initializes
itself from a file, but in looking at the code, I do not see that it ever writes back to the file!? Furthermore,
I don't see a "flush" type method to force writeback of data in the object back to file?

The original ERST code would flush/write to the backing file each record as it was created. I don't see
any equivalent way of doing that with hostmem-file?

Please point out where I am misunderstanding.

Thanks,
eric

________________________________
From: Igor Mammedov <imammedo@redhat.com>
Sent: Monday, May 3, 2021 12:07 PM
To: Eric DeVolder <eric.devolder@oracle.com>
Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature

On Mon, 3 May 2021 15:49:28 +0000
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Igor,
> I've rebased the original patches on to qemu-v6.0.0-rc4, and finally have everything working as it previously did.
> I've started now to work to incorporate the HostMemoryBackendFile; that is progressing.
> My question for you today is with regard to placing ERST device on PCI. The PCI example provided is a template device, and while I do find that helpful, I still do not understand how the ERST Actions, which contain GAS for describing the register accesses, would be patched/linked when a PCI bar is assigned. Or is there perhaps another way of obtaining the PCI BAR using ACPI semantics?

current order of initialization is,
 0. QEMU builds initial ACPI tables (unpatched, mainly used to gauge total size of ACPI tables) and starts guest
 1. guest firmware initializes PCI devices (including BARs)
 2. guest reads ACPI tables from QEMU(via fwcfg)
 2.1 reading ACPI tables traps into QEMU and QEMU rebuilds all ACPI tables (including ERST)
      at this time one can get info from PCI devices (probably pci_get_bar_addr() is what you are looking for)
      that were initialized by firmware and build tables using address.
      Maybe it will need dynamic tables patching but lets get to that only if rebuilding table won't be enough



> Thanks,
> eric
>
> ________________________________
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Wednesday, April 14, 2021 4:17 AM
> To: Eric DeVolder <eric.devolder@oracle.com>
> Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
> Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
>
> On Fri, 9 Apr 2021 15:54:47 +0000
> Eric DeVolder <eric.devolder@oracle.com> wrote:
>
> > Hi Igor,
> > Thank you for reviewing. I've responded inline below.
> > eric
> >
> > ________________________________
> > From: Igor Mammedov <imammedo@redhat.com>
> > Sent: Tuesday, April 6, 2021 2:31 PM
> > To: Eric DeVolder <eric.devolder@oracle.com>
> > Cc: mst@redhat.com <mst@redhat.com>; marcel.apfelbaum@gmail.com <marcel.apfelbaum@gmail.com>; pbonzini@redhat.com <pbonzini@redhat.com>; rth@twiddle.net <rth@twiddle.net>; ehabkost@redhat.com <ehabkost@redhat.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; kwilk@oracle.com <kwilk@oracle.com>
> > Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
> >
> > On Mon,  8 Feb 2021 15:57:55 -0500
> > Eric DeVolder <eric.devolder@oracle.com> wrote:
> >
> > > This change implements the support for the ACPI ERST feature[1,2].
> > >
> > > The size of the ACPI ERST storage is declared via the QEMU
> > > global parameter acpi-erst.size. The size can range from 64KiB
> > > to to 64MiB. The default is 64KiB.
> > >
> > > The location of the ACPI ERST storage backing file is delared
> > > via the QEMU global parameter acpi-erst.filename. The default
> > > is acpi-erst.backing.
> > >
> > > [1] "Advanced Configuration and Power Interface Specification",
> > >     version 6.2, May 2017.
> > >     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> > >
> > > [2] "Unified Extensible Firmware Interface Specification",
> > >     version 2.8, March 2019.
> > >     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> > >
> > > Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> >
> > items 2/4/5 from v1 review still need to be addressed.
> >
> > >
> > > 2. patch is too big to review, please split it up in smaller chunks.
> > >
> > > EJD: Done.
> >
> > (separating a header and a makefile rule doesn't make much sense)
> >
> > it should be split at least on part that implements device model and ACPI parts
> >
> > EJD: I'll rebase this patch set on qemu-6 and accommodate your suggestions with how to split/organize the patch set.
> >
> > [...]
> > >
> > > 4. Maybe instead of SYSBUS device, implement it as a PCI device and
> > >    use its BAR/control registers for pstore storage and control interface.
> > >    It could save you headache of picking address where to map it +
> > >    it would take care of migration part automatically, as firmware
> > >    would do it for you and then QEMU could pickup firmware programmed
> > >    address and put it into ERST table.
> > > EJD: Thanks for the idea. For now I've left it as a SYSBUS device; we can revisit as needed.
> >
> > I would really prefer to see a PCI version (current way is just a hack)
> >
> > EJD: I understand, I don't like the base address problem either. Is there an example PCI device that gets its base address assigned during ACPI setup that I could reference and pattern this work after? I've been using SYSBUS as that most closely mimics the real hardware implementations I've studied in order to produce this code.
> > EJD: I thought my inexperience with authoring QEMU devices was the primary problem in establishing a solution for the base address. Otherwise, this thing only needs a single 4KiB page (for the 2 registers + exchange buffer) exposed.
>
> I don't recall if we merged example PCI device in QEMU, but someone worked on it before.
> Google search yields following:
>  https://github.com/grandemk/qemu_devices/commit/ba8d38a858ba63ef4d419a926f58328b9675fc98
>
>
> > > 5. instead of dealing with file for storage directly, reuse hostmem backend
> > >    to provide it to for your device. ex: pc-dimm. i.e. split device
> > >    on frontend and backend
> > >
> > > EJD: I had looked into that prior to posting v1. The entire ERST storage is not memory mapped, just an exchange buffer. So the hostmem backend is not suitable for this purpose.
> >
> > Is there a compelling reason why it can't be memory mapped?
> >
> > EJD: Well, this ERST device I've coded pretty much follows the ACPI ERST spec verbatim. As it stands today, the spec doesn't provide a way to report the total size of the persistent storage behind the interface; you know when storage is full only when you receive an Out Of Storage error code upon write. In a sense, that allows the size of the storage to vary greatly and be implemented in any way needed (ie actual hardware, this has tended to be in the 64KiB range when it is carved out of system parallel flash memory, but some hardware uses serial flash as well). In virtual environments, it can be of any size, and we at Oracle have intentions of heavily utilizing ACPI ERST to stuff all kinds of diagnostic information into it, thus wanting the storage to be very large. By not actually exposing/memory-mapping the storage, the issue of where to drop it in the memory map goes away (yes a PCI BAR could solve this).
> > EJD: But at the end of the day, could this storage be memory mapped? I suppose it could be, but then that rather circumvents the entire need for the ACPI ERST interface to start with. Linux and Windows both already know how to utilize ACPI ERST.
>
> Maybe I wasn't clear on it, I did not propose to map storage into guest.
> Only use MemoryRegion provided by backend inside of your device.
> This way you can drop all file related code from your patch,
> and just work with read/store info from/to memory directly.
>
> [...]
>


[-- Attachment #2: Type: text/html, Size: 13139 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-05-17 15:01             ` Eric DeVolder
@ 2021-05-17 16:31               ` Igor Mammedov
  2021-05-18 17:08                 ` Eric DeVolder
  0 siblings, 1 reply; 24+ messages in thread
From: Igor Mammedov @ 2021-05-17 16:31 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, Konrad Wilk, mst, jusual, qemu-devel, pbonzini,
	Boris Ostrovsky, rth

On Mon, 17 May 2021 15:01:02 +0000
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Hi Igor,
> I've been working to transition ERST to use the hostmem-file object as the backing store, as requested.
> 
> I have the backend-file object now in ERST, and I have a question for you. This hostmem-file initializes
> itself from a file, but in looking at the code, I do not see that it ever writes back to the file!? Furthermore,
> I don't see a "flush" type method to force writeback of data in the object back to file?
> 
> The original ERST code would flush/write to the backing file each record as it was created. I don't see
> any equivalent way of doing that with hostmem-file?

To force flush you can use memory_region_msync() on MemoryRegion that you get from hostmem backend.
But question is what are you trying to achieve with sync
  1. data persistence in case of QEMU crash
  2. data persistence in case of host crash

for the former you do not need explicit sync as memory buffers should be flushed to disk by kernel
if you put backend on nvdimm, you should get 2 without sync as well (see pmem=on property)

just do not forget that sync is not free, so if #1 is acceptable I'd avoid explicit sync.


> Please point out where I am misunderstanding.
> 
> Thanks,
> eric
> 
> ________________________________
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Monday, May 3, 2021 12:07 PM
> To: Eric DeVolder <eric.devolder@oracle.com>
> Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
> Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
> 
> On Mon, 3 May 2021 15:49:28 +0000
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
> > Igor,
> > I've rebased the original patches on to qemu-v6.0.0-rc4, and finally have everything working as it previously did.
> > I've started now to work to incorporate the HostMemoryBackendFile; that is progressing.
> > My question for you today is with regard to placing ERST device on PCI. The PCI example provided is a template device, and while I do find that helpful, I still do not understand how the ERST Actions, which contain GAS for describing the register accesses, would be patched/linked when a PCI bar is assigned. Or is there perhaps another way of obtaining the PCI BAR using ACPI semantics?  
> 
> current order of initialization is,
>  0. QEMU builds initial ACPI tables (unpatched, mainly used to gauge total size of ACPI tables) and starts guest
>  1. guest firmware initializes PCI devices (including BARs)
>  2. guest reads ACPI tables from QEMU(via fwcfg)
>  2.1 reading ACPI tables traps into QEMU and QEMU rebuilds all ACPI tables (including ERST)
>       at this time one can get info from PCI devices (probably pci_get_bar_addr() is what you are looking for)
>       that were initialized by firmware and build tables using address.
>       Maybe it will need dynamic tables patching but lets get to that only if rebuilding table won't be enough
> 
> 
> 
> > Thanks,
> > eric
> >
> > ________________________________
> > From: Igor Mammedov <imammedo@redhat.com>
> > Sent: Wednesday, April 14, 2021 4:17 AM
> > To: Eric DeVolder <eric.devolder@oracle.com>
> > Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
> > Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
> >
> > On Fri, 9 Apr 2021 15:54:47 +0000
> > Eric DeVolder <eric.devolder@oracle.com> wrote:
> >  
> > > Hi Igor,
> > > Thank you for reviewing. I've responded inline below.
> > > eric
> > >
> > > ________________________________
> > > From: Igor Mammedov <imammedo@redhat.com>
> > > Sent: Tuesday, April 6, 2021 2:31 PM
> > > To: Eric DeVolder <eric.devolder@oracle.com>
> > > Cc: mst@redhat.com <mst@redhat.com>; marcel.apfelbaum@gmail.com <marcel.apfelbaum@gmail.com>; pbonzini@redhat.com <pbonzini@redhat.com>; rth@twiddle.net <rth@twiddle.net>; ehabkost@redhat.com <ehabkost@redhat.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; kwilk@oracle.com <kwilk@oracle.com>
> > > Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
> > >
> > > On Mon,  8 Feb 2021 15:57:55 -0500
> > > Eric DeVolder <eric.devolder@oracle.com> wrote:
> > >  
> > > > This change implements the support for the ACPI ERST feature[1,2].
> > > >
> > > > The size of the ACPI ERST storage is declared via the QEMU
> > > > global parameter acpi-erst.size. The size can range from 64KiB
> > > > to to 64MiB. The default is 64KiB.
> > > >
> > > > The location of the ACPI ERST storage backing file is delared
> > > > via the QEMU global parameter acpi-erst.filename. The default
> > > > is acpi-erst.backing.
> > > >
> > > > [1] "Advanced Configuration and Power Interface Specification",
> > > >     version 6.2, May 2017.
> > > >     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> > > >
> > > > [2] "Unified Extensible Firmware Interface Specification",
> > > >     version 2.8, March 2019.
> > > >     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> > > >
> > > > Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>  
> > >
> > > items 2/4/5 from v1 review still need to be addressed.
> > >  
> > > >
> > > > 2. patch is too big to review, please split it up in smaller chunks.
> > > >
> > > > EJD: Done.  
> > >
> > > (separating a header and a makefile rule doesn't make much sense)
> > >
> > > it should be split at least on part that implements device model and ACPI parts
> > >
> > > EJD: I'll rebase this patch set on qemu-6 and accommodate your suggestions with how to split/organize the patch set.
> > >
> > > [...]  
> > > >
> > > > 4. Maybe instead of SYSBUS device, implement it as a PCI device and
> > > >    use its BAR/control registers for pstore storage and control interface.
> > > >    It could save you headache of picking address where to map it +
> > > >    it would take care of migration part automatically, as firmware
> > > >    would do it for you and then QEMU could pickup firmware programmed
> > > >    address and put it into ERST table.
> > > > EJD: Thanks for the idea. For now I've left it as a SYSBUS device; we can revisit as needed.  
> > >
> > > I would really prefer to see a PCI version (current way is just a hack)
> > >
> > > EJD: I understand, I don't like the base address problem either. Is there an example PCI device that gets its base address assigned during ACPI setup that I could reference and pattern this work after? I've been using SYSBUS as that most closely mimics the real hardware implementations I've studied in order to produce this code.
> > > EJD: I thought my inexperience with authoring QEMU devices was the primary problem in establishing a solution for the base address. Otherwise, this thing only needs a single 4KiB page (for the 2 registers + exchange buffer) exposed.  
> >
> > I don't recall if we merged example PCI device in QEMU, but someone worked on it before.
> > Google search yields following:
> >  https://github.com/grandemk/qemu_devices/commit/ba8d38a858ba63ef4d419a926f58328b9675fc98
> >
> >  
> > > > 5. instead of dealing with file for storage directly, reuse hostmem backend
> > > >    to provide it to for your device. ex: pc-dimm. i.e. split device
> > > >    on frontend and backend
> > > >
> > > > EJD: I had looked into that prior to posting v1. The entire ERST storage is not memory mapped, just an exchange buffer. So the hostmem backend is not suitable for this purpose.  
> > >
> > > Is there a compelling reason why it can't be memory mapped?
> > >
> > > EJD: Well, this ERST device I've coded pretty much follows the ACPI ERST spec verbatim. As it stands today, the spec doesn't provide a way to report the total size of the persistent storage behind the interface; you know when storage is full only when you receive an Out Of Storage error code upon write. In a sense, that allows the size of the storage to vary greatly and be implemented in any way needed (ie actual hardware, this has tended to be in the 64KiB range when it is carved out of system parallel flash memory, but some hardware uses serial flash as well). In virtual environments, it can be of any size, and we at Oracle have intentions of heavily utilizing ACPI ERST to stuff all kinds of diagnostic information into it, thus wanting the storage to be very large. By not actually exposing/memory-mapping the storage, the issue of where to drop it in the memory map goes away (yes a PCI BAR could solve this).
> > > EJD: But at the end of the day, could this storage be memory mapped? I suppose it could be, but then that rather circumvents the entire need for the ACPI ERST interface to start with. Linux and Windows both already know how to utilize ACPI ERST.  
> >
> > Maybe I wasn't clear on it, I did not propose to map storage into guest.
> > Only use MemoryRegion provided by backend inside of your device.
> > This way you can drop all file related code from your patch,
> > and just work with read/store info from/to memory directly.
> >
> > [...]
> >  
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-05-17 16:31               ` Igor Mammedov
@ 2021-05-18 17:08                 ` Eric DeVolder
  2021-05-20 11:00                   ` Igor Mammedov
  0 siblings, 1 reply; 24+ messages in thread
From: Eric DeVolder @ 2021-05-18 17:08 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Konrad Wilk, mst, jusual, qemu-devel, pbonzini,
	Boris Ostrovsky, rth

[-- Attachment #1: Type: text/plain, Size: 11617 bytes --]

Hi Igor,
Thanks for the information. I am primarily interested in ensuring data persistence in the case of #1.
As it stands so far, I have yet to observe any kind of write back into the backing file. Just to summarize,
what I've done thus far is:

in erst_realizefn():
...
    s->hostmem_obj = object_new(TYPE_MEMORY_BACKEND_FILE);
    object_property_set_str(s->hostmem_obj, "mem-path", (const char *)(TYPE_ACPI_ERST ".hostmem"), &error_fatal);
    object_property_set_int(s->hostmem_obj, "size", s->prop_size, &error_fatal);
    user_creatable_complete(USER_CREATABLE(s->hostmem_obj), &error_fatal);
    s->hostmem = MEMORY_BACKEND(s->hostmem_obj);

and then in erst_update_backing_file(), which is called when records are created/updated:

...
    if ((mr = host_memory_backend_get_memory(s->hostmem))) {
        uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(mr);
        memcpy(p + offset, data, length);
        memory_region_msync(mr, 0, s->prop_size); /* for now, the whole thing */
}

I've instrumented this code, and I can see the records. I've instrumented memory_region_msync() all the way down
to qemu_msync() and it makes it into that code. But the end result has always been the same, the backing file is
never updated.

I'm not really sure what else I need to do to get the hostmem contents to be written back into the file.
Thanks,
eric


________________________________
From: Igor Mammedov <imammedo@redhat.com>
Sent: Monday, May 17, 2021 11:31 AM
To: Eric DeVolder <eric.devolder@oracle.com>
Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature

On Mon, 17 May 2021 15:01:02 +0000
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Hi Igor,
> I've been working to transition ERST to use the hostmem-file object as the backing store, as requested.
>
> I have the backend-file object now in ERST, and I have a question for you. This hostmem-file initializes
> itself from a file, but in looking at the code, I do not see that it ever writes back to the file!? Furthermore,
> I don't see a "flush" type method to force writeback of data in the object back to file?
>
> The original ERST code would flush/write to the backing file each record as it was created. I don't see
> any equivalent way of doing that with hostmem-file?

To force flush you can use memory_region_msync() on MemoryRegion that you get from hostmem backend.
But question is what are you trying to achieve with sync
  1. data persistence in case of QEMU crash
  2. data persistence in case of host crash

for the former you do not need explicit sync as memory buffers should be flushed to disk by kernel
if you put backend on nvdimm, you should get 2 without sync as well (see pmem=on property)

just do not forget that sync is not free, so if #1 is acceptable I'd avoid explicit sync.


> Please point out where I am misunderstanding.
>
> Thanks,
> eric
>
> ________________________________
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Monday, May 3, 2021 12:07 PM
> To: Eric DeVolder <eric.devolder@oracle.com>
> Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
> Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
>
> On Mon, 3 May 2021 15:49:28 +0000
> Eric DeVolder <eric.devolder@oracle.com> wrote:
>
> > Igor,
> > I've rebased the original patches on to qemu-v6.0.0-rc4, and finally have everything working as it previously did.
> > I've started now to work to incorporate the HostMemoryBackendFile; that is progressing.
> > My question for you today is with regard to placing ERST device on PCI. The PCI example provided is a template device, and while I do find that helpful, I still do not understand how the ERST Actions, which contain GAS for describing the register accesses, would be patched/linked when a PCI bar is assigned. Or is there perhaps another way of obtaining the PCI BAR using ACPI semantics?
>
> current order of initialization is,
>  0. QEMU builds initial ACPI tables (unpatched, mainly used to gauge total size of ACPI tables) and starts guest
>  1. guest firmware initializes PCI devices (including BARs)
>  2. guest reads ACPI tables from QEMU(via fwcfg)
>  2.1 reading ACPI tables traps into QEMU and QEMU rebuilds all ACPI tables (including ERST)
>       at this time one can get info from PCI devices (probably pci_get_bar_addr() is what you are looking for)
>       that were initialized by firmware and build tables using address.
>       Maybe it will need dynamic tables patching but lets get to that only if rebuilding table won't be enough
>
>
>
> > Thanks,
> > eric
> >
> > ________________________________
> > From: Igor Mammedov <imammedo@redhat.com>
> > Sent: Wednesday, April 14, 2021 4:17 AM
> > To: Eric DeVolder <eric.devolder@oracle.com>
> > Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
> > Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
> >
> > On Fri, 9 Apr 2021 15:54:47 +0000
> > Eric DeVolder <eric.devolder@oracle.com> wrote:
> >
> > > Hi Igor,
> > > Thank you for reviewing. I've responded inline below.
> > > eric
> > >
> > > ________________________________
> > > From: Igor Mammedov <imammedo@redhat.com>
> > > Sent: Tuesday, April 6, 2021 2:31 PM
> > > To: Eric DeVolder <eric.devolder@oracle.com>
> > > Cc: mst@redhat.com <mst@redhat.com>; marcel.apfelbaum@gmail.com <marcel.apfelbaum@gmail.com>; pbonzini@redhat.com <pbonzini@redhat.com>; rth@twiddle.net <rth@twiddle.net>; ehabkost@redhat.com <ehabkost@redhat.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; kwilk@oracle.com <kwilk@oracle.com>
> > > Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
> > >
> > > On Mon,  8 Feb 2021 15:57:55 -0500
> > > Eric DeVolder <eric.devolder@oracle.com> wrote:
> > >
> > > > This change implements the support for the ACPI ERST feature[1,2].
> > > >
> > > > The size of the ACPI ERST storage is declared via the QEMU
> > > > global parameter acpi-erst.size. The size can range from 64KiB
> > > > to to 64MiB. The default is 64KiB.
> > > >
> > > > The location of the ACPI ERST storage backing file is delared
> > > > via the QEMU global parameter acpi-erst.filename. The default
> > > > is acpi-erst.backing.
> > > >
> > > > [1] "Advanced Configuration and Power Interface Specification",
> > > >     version 6.2, May 2017.
> > > >     https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
> > > >
> > > > [2] "Unified Extensible Firmware Interface Specification",
> > > >     version 2.8, March 2019.
> > > >     https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> > > >
> > > > Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> > >
> > > items 2/4/5 from v1 review still need to be addressed.
> > >
> > > >
> > > > 2. patch is too big to review, please split it up in smaller chunks.
> > > >
> > > > EJD: Done.
> > >
> > > (separating a header and a makefile rule doesn't make much sense)
> > >
> > > it should be split at least on part that implements device model and ACPI parts
> > >
> > > EJD: I'll rebase this patch set on qemu-6 and accommodate your suggestions with how to split/organize the patch set.
> > >
> > > [...]
> > > >
> > > > 4. Maybe instead of SYSBUS device, implement it as a PCI device and
> > > >    use its BAR/control registers for pstore storage and control interface.
> > > >    It could save you headache of picking address where to map it +
> > > >    it would take care of migration part automatically, as firmware
> > > >    would do it for you and then QEMU could pickup firmware programmed
> > > >    address and put it into ERST table.
> > > > EJD: Thanks for the idea. For now I've left it as a SYSBUS device; we can revisit as needed.
> > >
> > > I would really prefer to see a PCI version (current way is just a hack)
> > >
> > > EJD: I understand, I don't like the base address problem either. Is there an example PCI device that gets its base address assigned during ACPI setup that I could reference and pattern this work after? I've been using SYSBUS as that most closely mimics the real hardware implementations I've studied in order to produce this code.
> > > EJD: I thought my inexperience with authoring QEMU devices was the primary problem in establishing a solution for the base address. Otherwise, this thing only needs a single 4KiB page (for the 2 registers + exchange buffer) exposed.
> >
> > I don't recall if we merged example PCI device in QEMU, but someone worked on it before.
> > Google search yields following:
> >  https://github.com/grandemk/qemu_devices/commit/ba8d38a858ba63ef4d419a926f58328b9675fc98
> >
> >
> > > > 5. instead of dealing with file for storage directly, reuse hostmem backend
> > > >    to provide it to for your device. ex: pc-dimm. i.e. split device
> > > >    on frontend and backend
> > > >
> > > > EJD: I had looked into that prior to posting v1. The entire ERST storage is not memory mapped, just an exchange buffer. So the hostmem backend is not suitable for this purpose.
> > >
> > > Is there a compelling reason why it can't be memory mapped?
> > >
> > > EJD: Well, this ERST device I've coded pretty much follows the ACPI ERST spec verbatim. As it stands today, the spec doesn't provide a way to report the total size of the persistent storage behind the interface; you know when storage is full only when you receive an Out Of Storage error code upon write. In a sense, that allows the size of the storage to vary greatly and be implemented in any way needed (ie actual hardware, this has tended to be in the 64KiB range when it is carved out of system parallel flash memory, but some hardware uses serial flash as well). In virtual environments, it can be of any size, and we at Oracle have intentions of heavily utilizing ACPI ERST to stuff all kinds of diagnostic information into it, thus wanting the storage to be very large. By not actually exposing/memory-mapping the storage, the issue of where to drop it in the memory map goes away (yes a PCI BAR could solve this).
> > > EJD: But at the end of the day, could this storage be memory mapped? I suppose it could be, but then that rather circumvents the entire need for the ACPI ERST interface to start with. Linux and Windows both already know how to utilize ACPI ERST.
> >
> > Maybe I wasn't clear on it, I did not propose to map storage into guest.
> > Only use MemoryRegion provided by backend inside of your device.
> > This way you can drop all file related code from your patch,
> > and just work with read/store info from/to memory directly.
> >
> > [...]
> >
>


[-- Attachment #2: Type: text/html, Size: 16245 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-05-18 17:08                 ` Eric DeVolder
@ 2021-05-20 11:00                   ` Igor Mammedov
  2021-05-25 20:22                     ` Eric DeVolder
  0 siblings, 1 reply; 24+ messages in thread
From: Igor Mammedov @ 2021-05-20 11:00 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, Konrad Wilk, mst, jusual, qemu-devel, pbonzini,
	Boris Ostrovsky, rth

On Tue, 18 May 2021 17:08:31 +0000
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Hi Igor,
> Thanks for the information. I am primarily interested in ensuring data persistence in the case of #1.
> As it stands so far, I have yet to observe any kind of write back into the backing file. Just to summarize,
> what I've done thus far is:
> 
> in erst_realizefn():
> ...
>     s->hostmem_obj = object_new(TYPE_MEMORY_BACKEND_FILE);
>     object_property_set_str(s->hostmem_obj, "mem-path", (const char *)(TYPE_ACPI_ERST ".hostmem"), &error_fatal);
>     object_property_set_int(s->hostmem_obj, "size", s->prop_size, &error_fatal);
>     user_creatable_complete(USER_CREATABLE(s->hostmem_obj), &error_fatal);
>     s->hostmem = MEMORY_BACKEND(s->hostmem_obj);

backend should be provided by user on CLI so all backend's properties are configured there
as user desires and frontend should access it via link property.
see how pc-dimm's memdev property is used.

> and then in erst_update_backing_file(), which is called when records are created/updated:
> 
> ...
>     if ((mr = host_memory_backend_get_memory(s->hostmem))) {
>         uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(mr);
>         memcpy(p + offset, data, length);
>         memory_region_msync(mr, 0, s->prop_size); /* for now, the whole thing */
> }
> 
> I've instrumented this code, and I can see the records. I've instrumented memory_region_msync() all the way down
> to qemu_msync() and it makes it into that code. But the end result has always been the same, the backing file is
> never updated.
> 
> I'm not really sure what else I need to do to get the hostmem contents to be written back into the file.

see "man mmap"
 in particular MAP_SHARED vs MAP_PRIVATE

and there is a corresponding property for the file backend to manage that.

in case #1 no explicit sync is needed, backing file should be updated on close at the latest
(whether it's graceful/or forced (i.e. crash))


> Thanks,
> eric
> 
> 
> ________________________________
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Monday, May 17, 2021 11:31 AM
> To: Eric DeVolder <eric.devolder@oracle.com>
> Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
> Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
> 
> On Mon, 17 May 2021 15:01:02 +0000
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
> > Hi Igor,
> > I've been working to transition ERST to use the hostmem-file object as the backing store, as requested.
> >
> > I have the backend-file object now in ERST, and I have a question for you. This hostmem-file initializes
> > itself from a file, but in looking at the code, I do not see that it ever writes back to the file!? Furthermore,
> > I don't see a "flush" type method to force writeback of data in the object back to file?
> >
> > The original ERST code would flush/write to the backing file each record as it was created. I don't see
> > any equivalent way of doing that with hostmem-file?  
> 
> To force flush you can use memory_region_msync() on MemoryRegion that you get from hostmem backend.
> But question is what are you trying to achieve with sync
>   1. data persistence in case of QEMU crash
>   2. data persistence in case of host crash
> 
> for the former you do not need explicit sync as memory buffers should be flushed to disk by kernel
> if you put backend on nvdimm, you should get 2 without sync as well (see pmem=on property)
> 
> just do not forget that sync is not free, so if #1 is acceptable I'd avoid explicit sync.
> 
> 
> > Please point out where I am misunderstanding.
> >
> > Thanks,
> > eric
[...]



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-05-20 11:00                   ` Igor Mammedov
@ 2021-05-25 20:22                     ` Eric DeVolder
  2021-05-25 21:43                       ` Igor Mammedov
  0 siblings, 1 reply; 24+ messages in thread
From: Eric DeVolder @ 2021-05-25 20:22 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Konrad Wilk, mst, jusual, qemu-devel, pbonzini,
	Boris Ostrovsky, rth

[-- Attachment #1: Type: text/plain, Size: 5018 bytes --]

Igor,
Thank you for the pointers, I've turned the corner on the use of hostmem-file and am able to use it now!
I do have one question regarding hostmem-file. In the VMStateDescription, I used to have this for the contents
of the nvram (backed by a file):

          VMSTATE_VARRAY_UINT32(nvram, ERSTDeviceState, prop_size, 0,
            vmstate_info_uint8, uint8_t),

I've searched and I do not locate an example of passing a HostMemoryBackend object; how do I do that?
Thanks,
eric



________________________________
From: Igor Mammedov <imammedo@redhat.com>
Sent: Thursday, May 20, 2021 6:00 AM
To: Eric DeVolder <eric.devolder@oracle.com>
Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature

On Tue, 18 May 2021 17:08:31 +0000
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Hi Igor,
> Thanks for the information. I am primarily interested in ensuring data persistence in the case of #1.
> As it stands so far, I have yet to observe any kind of write back into the backing file. Just to summarize,
> what I've done thus far is:
>
> in erst_realizefn():
> ...
>     s->hostmem_obj = object_new(TYPE_MEMORY_BACKEND_FILE);
>     object_property_set_str(s->hostmem_obj, "mem-path", (const char *)(TYPE_ACPI_ERST ".hostmem"), &error_fatal);
>     object_property_set_int(s->hostmem_obj, "size", s->prop_size, &error_fatal);
>     user_creatable_complete(USER_CREATABLE(s->hostmem_obj), &error_fatal);
>     s->hostmem = MEMORY_BACKEND(s->hostmem_obj);

backend should be provided by user on CLI so all backend's properties are configured there
as user desires and frontend should access it via link property.
see how pc-dimm's memdev property is used.

> and then in erst_update_backing_file(), which is called when records are created/updated:
>
> ...
>     if ((mr = host_memory_backend_get_memory(s->hostmem))) {
>         uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(mr);
>         memcpy(p + offset, data, length);
>         memory_region_msync(mr, 0, s->prop_size); /* for now, the whole thing */
> }
>
> I've instrumented this code, and I can see the records. I've instrumented memory_region_msync() all the way down
> to qemu_msync() and it makes it into that code. But the end result has always been the same, the backing file is
> never updated.
>
> I'm not really sure what else I need to do to get the hostmem contents to be written back into the file.

see "man mmap"
 in particular MAP_SHARED vs MAP_PRIVATE

and there is a corresponding property for the file backend to manage that.

in case #1 no explicit sync is needed, backing file should be updated on close at the latest
(whether it's graceful/or forced (i.e. crash))


> Thanks,
> eric
>
>
> ________________________________
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Monday, May 17, 2021 11:31 AM
> To: Eric DeVolder <eric.devolder@oracle.com>
> Cc: ehabkost@redhat.com <ehabkost@redhat.com>; mst@redhat.com <mst@redhat.com>; Konrad Wilk <konrad.wilk@oracle.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>; pbonzini@redhat.com <pbonzini@redhat.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>; rth@twiddle.net <rth@twiddle.net>; jusual@redhat.com <jusual@redhat.com>
> Subject: Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
>
> On Mon, 17 May 2021 15:01:02 +0000
> Eric DeVolder <eric.devolder@oracle.com> wrote:
>
> > Hi Igor,
> > I've been working to transition ERST to use the hostmem-file object as the backing store, as requested.
> >
> > I have the backend-file object now in ERST, and I have a question for you. This hostmem-file initializes
> > itself from a file, but in looking at the code, I do not see that it ever writes back to the file!? Furthermore,
> > I don't see a "flush" type method to force writeback of data in the object back to file?
> >
> > The original ERST code would flush/write to the backing file each record as it was created. I don't see
> > any equivalent way of doing that with hostmem-file?
>
> To force flush you can use memory_region_msync() on MemoryRegion that you get from hostmem backend.
> But question is what are you trying to achieve with sync
>   1. data persistence in case of QEMU crash
>   2. data persistence in case of host crash
>
> for the former you do not need explicit sync as memory buffers should be flushed to disk by kernel
> if you put backend on nvdimm, you should get 2 without sync as well (see pmem=on property)
>
> just do not forget that sync is not free, so if #1 is acceptable I'd avoid explicit sync.
>
>
> > Please point out where I am misunderstanding.
> >
> > Thanks,
> > eric
[...]


[-- Attachment #2: Type: text/html, Size: 8254 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature
  2021-05-25 20:22                     ` Eric DeVolder
@ 2021-05-25 21:43                       ` Igor Mammedov
  0 siblings, 0 replies; 24+ messages in thread
From: Igor Mammedov @ 2021-05-25 21:43 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, Konrad Wilk, mst, jusual, qemu-devel, dgilbert,
	pbonzini, Boris Ostrovsky, rth

On Tue, 25 May 2021 20:22:26 +0000
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Igor,
> Thank you for the pointers, I've turned the corner on the use of hostmem-file and am able to use it now!
> I do have one question regarding hostmem-file. In the VMStateDescription, I used to have this for the contents
> of the nvram (backed by a file):
> 
>           VMSTATE_VARRAY_UINT32(nvram, ERSTDeviceState, prop_size, 0,
>             vmstate_info_uint8, uint8_t),
> 
> I've searched and I do not locate an example of passing a HostMemoryBackend object; how do I do that?

if I'm not wrong, you do not need to do it, i.e. treat backing file as any other storage,
i.e. put it on shared storage were VM's hard disks are located so source and target host can access it.

alternatively you can mark MemoryRegion as migratable, which will make QEMU stream its content
over migration socket automatically. In that case file associated with backend doesn't need to
be shared between source and target hosts. I think this option is fine if file in question is small,
but others might think otherwise. (obvious drawback is that it increases migration time)

CCed David as he might have an opinion about it from migration point of view.


> Thanks,
> eric
> 
> 
[...]



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-05-25 21:44 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-08 20:57 [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
2021-02-08 20:57 ` [PATCH v2 1/7] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
2021-02-08 20:57 ` [PATCH v2 2/7] ACPI ERST: header file for erst Eric DeVolder
2021-02-08 20:57 ` [PATCH v2 3/7] ACPI ERST: support for ACPI ERST feature Eric DeVolder
2021-04-06 19:31   ` Igor Mammedov
2021-04-09 15:54     ` Eric DeVolder
2021-04-14  9:17       ` Igor Mammedov
2021-05-03 15:49         ` Eric DeVolder
2021-05-03 17:07           ` Igor Mammedov
2021-05-17 15:01             ` Eric DeVolder
2021-05-17 16:31               ` Igor Mammedov
2021-05-18 17:08                 ` Eric DeVolder
2021-05-20 11:00                   ` Igor Mammedov
2021-05-25 20:22                     ` Eric DeVolder
2021-05-25 21:43                       ` Igor Mammedov
2021-02-08 20:57 ` [PATCH v2 4/7] ACPI ERST: build step for ACPI ERST Eric DeVolder
2021-04-06 19:17   ` Igor Mammedov
2021-02-08 20:57 ` [PATCH v2 5/7] ACPI ERST: support ERST for x86 guest Eric DeVolder
2021-02-08 20:57 ` [PATCH v2 6/7] ACPI ERST: qtest for ERST Eric DeVolder
2021-02-08 20:57 ` [PATCH v2 7/7] ACPI ERST: bios-tables-test.c step 5 Eric DeVolder
2021-03-01 19:04 ` [PATCH v2 0/7] acpi: Error Record Serialization Table, ERST, support for QEMU Eric Devolder
2021-04-01 13:44   ` Michael S. Tsirkin
2021-05-14 13:57 ` Michael S. Tsirkin
2021-05-14 14:12   ` Eric DeVolder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).