* [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU
@ 2021-08-05 22:30 Eric DeVolder
  2021-08-05 22:30 ` [PATCH v6 01/10] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
                   ` (9 more replies)
  0 siblings, 10 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

This patchset introduces support for the ACPI Error Record
Serialization Table, ERST.

For background and implementation information, please see
docs/specs/acpi_erst.txt, which is patch 2/10.

Suggested-by: Konrad Wilk <konrad.wilk@oracle.com>
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>

---
v6: 5aug2021
 - Fixed compile warning/error, per Michael Tsirkin
 - Fixed mingw32 build error, per Michael
 - Converted exchange buffer to MemoryBackend, per Igor
 - Migrated test to PCI, per Igor
 - Significantly reduced amount of copying, per Igor
 - Corrections/enhancements to acpi_erst.txt, per Igor
 - Many misc/other small items, per Igor

v5: 30jun2021
 - Create docs/specs/acpi_erst.txt, per Igor
 - Separate PCI BARs for registers and memory, per Igor
 - Convert debugging to use trace infrastructure, per Igor
 - Various other fixups, per Igor

v4: 11jun2021
 - Converted to a PCI device, per Igor.
 - Updated qtest.
 - Rearranged patches, per Igor.

v3: 28may2021
 - Converted to using a TYPE_MEMORY_BACKEND_FILE object rather than
   internal array with explicit file operations, per Igor.
 - Changed the way the qdev and base address are handled, allowing
   ERST to be disabled at run-time. Also aligns better with other
   existing code.

v2: 8feb2021
 - Added qtest/smoke test per Paolo Bonzini
 - Split patch into smaller chunks, per Igor Mammedov
 - Did away with use of ACPI packed structures, per Igor Mammedov

v1: 26oct2020
 - initial post

---
Eric DeVolder (10):
  ACPI ERST: bios-tables-test.c steps 1 and 2
  ACPI ERST: specification for ERST support
  ACPI ERST: PCI device_id for ERST
  ACPI ERST: header file for ERST
  ACPI ERST: support for ACPI ERST feature
  ACPI ERST: build the ACPI ERST table
  ACPI ERST: create ACPI ERST table for pc/x86 machines
  ACPI ERST: qtest for ERST
  ACPI ERST: bios-tables-test testcase
  ACPI ERST: step 6 of bios-tables-test

 docs/specs/acpi_erst.txt          | 147 ++++++
 hw/acpi/erst.c                    | 989 ++++++++++++++++++++++++++++++++++++++
 hw/acpi/meson.build               |   1 +
 hw/acpi/trace-events              |  15 +
 hw/i386/acpi-build.c              |   9 +
 hw/i386/acpi-microvm.c            |   9 +
 include/hw/acpi/erst.h            |  24 +
 include/hw/pci/pci.h              |   1 +
 tests/data/acpi/microvm/ERST      |   0
 tests/data/acpi/microvm/ERST.pcie | Bin 0 -> 912 bytes
 tests/data/acpi/pc/DSDT           | Bin 6002 -> 6009 bytes
 tests/data/acpi/pc/ERST           | Bin 0 -> 912 bytes
 tests/data/acpi/q35/DSDT          | Bin 8289 -> 8306 bytes
 tests/data/acpi/q35/ERST          | Bin 0 -> 912 bytes
 tests/qtest/bios-tables-test.c    |  43 ++
 tests/qtest/erst-test.c           | 167 +++++++
 tests/qtest/meson.build           |   2 +
 17 files changed, 1407 insertions(+)
 create mode 100644 docs/specs/acpi_erst.txt
 create mode 100644 hw/acpi/erst.c
 create mode 100644 include/hw/acpi/erst.h
 create mode 100644 tests/data/acpi/microvm/ERST
 create mode 100644 tests/data/acpi/microvm/ERST.pcie
 create mode 100644 tests/data/acpi/pc/ERST
 create mode 100644 tests/data/acpi/q35/ERST
 create mode 100644 tests/qtest/erst-test.c

-- 
1.8.3.1




* [PATCH v6 01/10] ACPI ERST: bios-tables-test.c steps 1 and 2
  2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
@ 2021-08-05 22:30 ` Eric DeVolder
  2021-09-20 13:05   ` Igor Mammedov
  2021-08-05 22:30 ` [PATCH v6 02/10] ACPI ERST: specification for ERST support Eric DeVolder
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

Following the guidelines in tests/qtest/bios-tables-test.c, this
change adds empty placeholder files per step 1 for the new ERST
table, and excludes resulting changed files in bios-tables-test-allowed-diff.h
per step 2.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 tests/data/acpi/microvm/ERST                | 0
 tests/data/acpi/pc/ERST                     | 0
 tests/data/acpi/q35/ERST                    | 0
 tests/qtest/bios-tables-test-allowed-diff.h | 6 ++++++
 4 files changed, 6 insertions(+)
 create mode 100644 tests/data/acpi/microvm/ERST
 create mode 100644 tests/data/acpi/pc/ERST
 create mode 100644 tests/data/acpi/q35/ERST

diff --git a/tests/data/acpi/microvm/ERST b/tests/data/acpi/microvm/ERST
new file mode 100644
index 0000000..e69de29
diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
new file mode 100644
index 0000000..e69de29
diff --git a/tests/data/acpi/q35/ERST b/tests/data/acpi/q35/ERST
new file mode 100644
index 0000000..e69de29
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523..b3aaf76 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,7 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/ERST",
+"tests/data/acpi/q35/ERST",
+"tests/data/acpi/microvm/ERST",
+"tests/data/acpi/pc/DSDT",
+"tests/data/acpi/q35/DSDT",
+"tests/data/acpi/microvm/DSDT",
-- 
1.8.3.1




* [PATCH v6 02/10] ACPI ERST: specification for ERST support
  2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
  2021-08-05 22:30 ` [PATCH v6 01/10] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
@ 2021-08-05 22:30 ` Eric DeVolder
  2021-09-20 13:38   ` Igor Mammedov
                     ` (2 more replies)
  2021-08-05 22:30 ` [PATCH v6 03/10] ACPI ERST: PCI device_id for ERST Eric DeVolder
                   ` (7 subsequent siblings)
  9 siblings, 3 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

Information on the implementation of the ACPI ERST support.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 docs/specs/acpi_erst.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)
 create mode 100644 docs/specs/acpi_erst.txt

diff --git a/docs/specs/acpi_erst.txt b/docs/specs/acpi_erst.txt
new file mode 100644
index 0000000..7f7544f
--- /dev/null
+++ b/docs/specs/acpi_erst.txt
@@ -0,0 +1,147 @@
+ACPI ERST DEVICE
+================
+
+The ACPI ERST device is utilized to support the ACPI Error Record
+Serialization Table, ERST, functionality. This feature is designed for
+storing error records in persistent storage for future reference
+and/or debugging.
+
+The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
+(APEI)", and specifically subsection "Error Serialization", outlines a
+method for storing error records into persistent storage.
+
+The format of error records is described in the UEFI specification[2],
+in Appendix N "Common Platform Error Record".
+
+While the ACPI specification allows for an NVRAM "mode" (see
+GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES) where non-volatile RAM is
+directly exposed for direct access by the OS/guest, this device
+implements the non-NVRAM "mode". This non-NVRAM "mode" is what is
+implemented by most BIOS (since flash memory requires programming
+operations in order to update its contents). Furthermore, as of the
+time of this writing, Linux only supports the non-NVRAM "mode".
+
+
+Background/Motivation
+---------------------
+
+Linux uses the persistent storage filesystem, pstore, to record
+information (e.g. dmesg tail) upon panics and shutdowns.  Pstore is
+independent of, and runs before, kdump.  In certain scenarios (i.e.
+hosts/guests with root filesystems on NFS/iSCSI where networking
+software and/or hardware fails), pstore may contain information
+available for post-mortem debugging.
+
+Two common storage backends for the pstore filesystem are ACPI ERST
+and UEFI. Most BIOS implement ACPI ERST.  UEFI is not utilized in all
+guests. With QEMU supporting ACPI ERST, it becomes a viable pstore
+storage backend for virtual machines (as it is now for bare metal
+machines).
+
+Enabling support for ACPI ERST facilitates a consistent method to
+capture kernel panic information in a wide range of guests: from
+resource-constrained microvms to very large guests, and in particular,
+in direct-boot environments (which would lack UEFI run-time services).
+
+Note that Microsoft Windows also utilizes the ACPI ERST for certain
+crash information, if available[3].
+
+
+Configuration|Usage
+-------------------
+
+To use ACPI ERST, a memory-backend-file object and acpi-erst device
+can be created, for example:
+
+ qemu ... \
+ -object memory-backend-file,id=erstnvram,mem-path=acpi-erst.backing,size=0x10000,share=on \
+ -device acpi-erst,memdev=erstnvram
+
+For proper operation, the ACPI ERST device needs a memory-backend-file
+object with the following parameters:
+
+ - id: The id of the memory-backend-file object is used to associate
+   this memory with the acpi-erst device.
+ - size: The size of the ACPI ERST backing storage. This parameter is
+   required.
+ - mem-path: The location of the ACPI ERST backing storage file. This
+   parameter is also required.
+ - share: The share=on parameter is required so that updates to the
+   ERST backing store are written to the file.
+
+and the acpi-erst device needs the following parameter:
+
+ - memdev: The object id of the memory-backend-file object.
+
+
+PCI Interface
+-------------
+
+The ERST device is a PCI device with two BARs, one for accessing the
+programming registers, and the other for accessing the record exchange
+buffer.
+
+BAR0 contains the programming interface consisting of ACTION and VALUE
+64-bit registers.  All ERST actions/operations/side effects happen on
+the write to the ACTION, by design. Any data needed by the action must
+be placed into VALUE prior to writing ACTION.  Reading the VALUE
+simply returns the register contents, which can be updated by a
+previous ACTION.
+
+BAR1 contains the 8KiB record exchange buffer, which is the
+implemented maximum record size.
+
+
+Backend Storage Format
+----------------------
+
+The backend storage is divided into fixed size "slots", 8KiB in
+length, with each slot storing a single record.  Not all slots need to
+be occupied, and they need not be occupied in a contiguous fashion.
+The ability to clear/erase specific records allows for the formation
+of unoccupied slots.
+
+Slot 0 is reserved for a backend storage header that identifies the
+contents as ERST and also facilitates efficient access to the records.
+Depending upon the size of the backend storage, additional slots will
+be reserved to be a part of the slot 0 header. For example, at 8KiB,
+the slot 0 header can accommodate 1024 records. Thus a storage size
+above 8MiB (8KiB * 1024) requires an additional slot. In this
+scenario, slot 0 and slot 1 form the backend storage header, and
+records can be stored starting at slot 2.
+
+Below is an example layout of the backend storage format (for storage
+size less than 8MiB). The size of the storage is a multiple of 8KiB,
+and contains N number of slots to store records. The example below
+shows two records (in CPER format) in the backend storage, while the
+remaining slots are empty/available.
+
+ Slot   Record
+        +--------------------------------------------+
+    0   | reserved: storage header                   |
+        +--------------------------------------------+
+    1   | empty/available                            |
+        +--------------------------------------------+
+    2   | CPER                                       |
+        +--------------------------------------------+
+    3   | CPER                                       |
+        +--------------------------------------------+
+  ...   |                                            |
+        +--------------------------------------------+
+    N   | empty/available                            |
+        +--------------------------------------------+
+        <------------------ 8KiB -------------------->
+
+
+
+References
+----------
+
+[1] "Advanced Configuration and Power Interface Specification",
+    version 4.0, June 2009.
+
+[2] "Unified Extensible Firmware Interface Specification",
+    version 2.1, October 2008.
+
+[3] "Windows Hardware Error Architecture", specifically
+    "Error Record Persistence Mechanism".
-- 
1.8.3.1

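As an illustration of the BAR0 programming model described in
docs/specs/acpi_erst.txt above, the following is a minimal sketch of a
two-register ACTION/VALUE interface in which all side effects happen on
the write to ACTION. It is a standalone model, not the device code: the
names erst_model, model_write and model_read_value are hypothetical, and
only two of the actions (BEGIN_WRITE_OPERATION and SET_RECORD_OFFSET,
with the numeric values from the ACPI serialization actions table used
in this series) are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets within BAR0, matching the layout in the document. */
enum { REG_ACTION = 0, REG_VALUE = 8 };

/* Two of the ACPI ERST serialization actions, for illustration only. */
enum { ACT_BEGIN_WRITE = 0x0, ACT_SET_RECORD_OFFSET = 0x4 };

/* Hypothetical model state; the real device keeps more than this. */
struct erst_model {
    uint64_t value;          /* VALUE register contents */
    uint32_t record_offset;  /* latched by SET_RECORD_OFFSET */
    uint8_t operation;       /* latched by BEGIN_* actions */
};

/* Writes to VALUE only store data; writes to ACTION trigger the effect. */
static void model_write(struct erst_model *m, unsigned reg, uint64_t val)
{
    assert(reg == REG_ACTION || reg == REG_VALUE);

    if (reg == REG_VALUE) {
        m->value = val;
        return;
    }

    switch (val) {
    case ACT_BEGIN_WRITE:
        m->operation = (uint8_t)val;
        break;
    case ACT_SET_RECORD_OFFSET:
        /* The argument was placed into VALUE before ACTION was written. */
        m->record_offset = (uint32_t)m->value;
        break;
    default:
        /* Other actions are not modelled in this sketch. */
        break;
    }
}

/* Reading VALUE has no side effects; it returns the register contents. */
static uint64_t model_read_value(const struct erst_model *m)
{
    return m->value;
}
```

In this model, a caller that wants to set the record offset first
writes the offset to VALUE and then writes SET_RECORD_OFFSET to ACTION;
nothing changes in the device state until the second write.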


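The "Backend Storage Format" section of docs/specs/acpi_erst.txt above
says that larger backing stores reserve more than one slot for the
storage header. A rough sketch of that sizing, assuming the header
layout used in patch 05/10 (fixed fields followed by one 64-bit map
entry per slot), could look like this. The names sketch_header and
erst_header_slots are hypothetical, and the 24-byte size of the fixed
fields is an assumption that holds on common ABIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define SLOT_SIZE 8192u /* slot size in bytes, 8KiB as in the document */

/* Mirror of the storage header layout, for sizing purposes only. */
struct sketch_header {
    uint64_t magic;
    uint32_t record_size;
    uint32_t record_offset;
    uint16_t version;
    uint16_t reserved;
    uint32_t record_count;
    uint64_t map[]; /* one record identifier per slot */
};

/*
 * Number of slots reserved for the header, given the size of the
 * backing storage in bytes. The header plus its map is rounded up to
 * a whole number of slots.
 */
static uint32_t erst_header_slots(uint32_t storage_size)
{
    uint32_t map_bytes, header_bytes;

    assert(storage_size % SLOT_SIZE == 0);

    map_bytes = (storage_size / SLOT_SIZE) * (uint32_t)sizeof(uint64_t);
    header_bytes = (uint32_t)sizeof(struct sketch_header) + map_bytes;

    return (header_bytes + SLOT_SIZE - 1) / SLOT_SIZE;
}
```

With the 0x10000-byte backing file from the usage example, this yields
a single header slot, so records would start at slot 1. Under the same
arithmetic the fixed fields take 24 bytes of slot 0, so a one-slot
header holds 1021 map entries rather than a full 1024, and a store of
exactly 8MiB already needs a second header slot.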

* [PATCH v6 03/10] ACPI ERST: PCI device_id for ERST
  2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
  2021-08-05 22:30 ` [PATCH v6 01/10] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
  2021-08-05 22:30 ` [PATCH v6 02/10] ACPI ERST: specification for ERST support Eric DeVolder
@ 2021-08-05 22:30 ` Eric DeVolder
  2021-09-21 11:32   ` Igor Mammedov
  2021-08-05 22:30 ` [PATCH v6 04/10] ACPI ERST: header file " Eric DeVolder
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

This change reserves the PCI device_id for the new ACPI ERST
device.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 include/hw/pci/pci.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d0f4266..58101d8 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -108,6 +108,7 @@ extern bool pci_available;
 #define PCI_DEVICE_ID_REDHAT_MDPY        0x000f
 #define PCI_DEVICE_ID_REDHAT_NVME        0x0010
 #define PCI_DEVICE_ID_REDHAT_PVPANIC     0x0011
+#define PCI_DEVICE_ID_REDHAT_ACPI_ERST   0x0012
 #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
 
 #define FMT_PCIBUS                      PRIx64
-- 
1.8.3.1




* [PATCH v6 04/10] ACPI ERST: header file for ERST
  2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (2 preceding siblings ...)
  2021-08-05 22:30 ` [PATCH v6 03/10] ACPI ERST: PCI device_id for ERST Eric DeVolder
@ 2021-08-05 22:30 ` Eric DeVolder
  2021-08-05 22:30 ` [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature Eric DeVolder
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

This change introduces the public definitions for ACPI ERST.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 include/hw/acpi/erst.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 include/hw/acpi/erst.h

diff --git a/include/hw/acpi/erst.h b/include/hw/acpi/erst.h
new file mode 100644
index 0000000..9d63717
--- /dev/null
+++ b/include/hw/acpi/erst.h
@@ -0,0 +1,19 @@
+/*
+ * ACPI Error Record Serialization Table, ERST, Implementation
+ *
+ * ACPI ERST introduced in ACPI 4.0, June 16, 2009.
+ * ACPI Platform Error Interfaces : Error Serialization
+ *
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef HW_ACPI_ERST_H
+#define HW_ACPI_ERST_H
+
+void build_erst(GArray *table_data, BIOSLinker *linker, Object *erst_dev,
+                const char *oem_id, const char *oem_table_id);
+
+#define TYPE_ACPI_ERST "acpi-erst"
+
+#endif
-- 
1.8.3.1




* [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature
  2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (3 preceding siblings ...)
  2021-08-05 22:30 ` [PATCH v6 04/10] ACPI ERST: header file " Eric DeVolder
@ 2021-08-05 22:30 ` Eric DeVolder
  2021-09-21 15:30   ` Igor Mammedov
  2021-08-05 22:30 ` [PATCH v6 06/10] ACPI ERST: build the ACPI ERST table Eric DeVolder
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

This implements a PCI device for ACPI ERST. This implements the
non-NVRAM "mode" of operation for ERST as it is supported by
Linux and Windows.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 hw/acpi/erst.c       | 750 +++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/acpi/meson.build  |   1 +
 hw/acpi/trace-events |  15 ++
 3 files changed, 766 insertions(+)
 create mode 100644 hw/acpi/erst.c

diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
new file mode 100644
index 0000000..eb4ab34
--- /dev/null
+++ b/hw/acpi/erst.c
@@ -0,0 +1,750 @@
+/*
+ * ACPI Error Record Serialization Table, ERST, Implementation
+ *
+ * ACPI ERST introduced in ACPI 4.0, June 16, 2009.
+ * ACPI Platform Error Interfaces : Error Serialization
+ *
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/qdev-core.h"
+#include "exec/memory.h"
+#include "qom/object.h"
+#include "hw/pci/pci.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "migration/vmstate.h"
+#include "hw/qdev-properties.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/acpi-defs.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "exec/address-spaces.h"
+#include "sysemu/hostmem.h"
+#include "hw/acpi/erst.h"
+#include "trace.h"
+
+/* ACPI 4.0: Table 17-16 Serialization Actions */
+#define ACTION_BEGIN_WRITE_OPERATION         0x0
+#define ACTION_BEGIN_READ_OPERATION          0x1
+#define ACTION_BEGIN_CLEAR_OPERATION         0x2
+#define ACTION_END_OPERATION                 0x3
+#define ACTION_SET_RECORD_OFFSET             0x4
+#define ACTION_EXECUTE_OPERATION             0x5
+#define ACTION_CHECK_BUSY_STATUS             0x6
+#define ACTION_GET_COMMAND_STATUS            0x7
+#define ACTION_GET_RECORD_IDENTIFIER         0x8
+#define ACTION_SET_RECORD_IDENTIFIER         0x9
+#define ACTION_GET_RECORD_COUNT              0xA
+#define ACTION_BEGIN_DUMMY_WRITE_OPERATION   0xB
+#define ACTION_RESERVED                      0xC
+#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE   0xD
+#define ACTION_GET_ERROR_LOG_ADDRESS_LENGTH  0xE
+#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES 0xF
+#define ACTION_GET_EXECUTE_OPERATION_TIMINGS 0x10
+
+/* ACPI 4.0: Table 17-17 Command Status Definitions */
+#define STATUS_SUCCESS                0x00
+#define STATUS_NOT_ENOUGH_SPACE       0x01
+#define STATUS_HARDWARE_NOT_AVAILABLE 0x02
+#define STATUS_FAILED                 0x03
+#define STATUS_RECORD_STORE_EMPTY     0x04
+#define STATUS_RECORD_NOT_FOUND       0x05
+
+
+/* UEFI 2.1: Appendix N Common Platform Error Record */
+#define UEFI_CPER_RECORD_MIN_SIZE 128U
+#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U
+#define UEFI_CPER_RECORD_ID_OFFSET 96U
+#define IS_UEFI_CPER_RECORD(ptr) \
+    (((ptr)[0] == 'C') && \
+     ((ptr)[1] == 'P') && \
+     ((ptr)[2] == 'E') && \
+     ((ptr)[3] == 'R'))
+#define THE_UEFI_CPER_RECORD_ID(ptr) \
+    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
+
+/*
+ * This implementation is an ACTION (cmd) and VALUE (data)
+ * interface consisting of just two 64-bit registers.
+ */
+#define ERST_REG_SIZE (16UL)
+#define ERST_ACTION_OFFSET (0UL) /* action (cmd) */
+#define ERST_VALUE_OFFSET  (8UL) /* argument/value (data) */
+
+/*
+ * ERST_RECORD_SIZE is the buffer size for exchanging ERST
+ * record contents. Thus, it defines the maximum record size.
+ * As this is mapped through a PCI BAR, it must be a power of
+ * two and larger than UEFI_CPER_RECORD_MIN_SIZE.
+ * The backing storage is divided into fixed size "slots",
+ * each ERST_RECORD_SIZE in length, and each "slot"
+ * storing a single record. No attempt is made to optimize storage
+ * through compression, compaction, etc.
+ * NOTE that slot 0 is reserved for the backing storage header.
+ * Depending upon the size of the backing storage, additional
+ * slots will be part of the slot 0 header in order to account
+ * for a record_id for each available remaining slot.
+ */
+/* 8KiB records, not too small, not too big */
+#define ERST_RECORD_SIZE (8192UL)
+
+#define ACPI_ERST_MEMDEV_PROP "memdev"
+
+/*
+ * From the ACPI ERST spec sections:
+ * A record id of all 0s is used to indicate
+ * 'unspecified' record id.
+ * A record id of all 1s is used to indicate
+ * empty or end.
+ */
+#define ERST_UNSPECIFIED_RECORD_ID (0UL)
+#define ERST_EMPTY_END_RECORD_ID (~0UL)
+#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
+#define ERST_IS_VALID_RECORD_ID(rid) \
+    ((rid != ERST_UNSPECIFIED_RECORD_ID) && \
+     (rid != ERST_EMPTY_END_RECORD_ID))
+
+typedef struct erst_storage_header_s {
+#define ERST_STORE_MAGIC 0x524F545354535245UL
+    uint64_t magic;
+    uint32_t record_size;
+    uint32_t record_offset; /* offset to record storage beyond header */
+    uint16_t version;
+    uint16_t reserved;
+    uint32_t record_count;
+    uint64_t map[]; /* contains record_ids, and position indicates index */
+} erst_storage_header_t;
+
+/*
+ * Object cast macro
+ */
+#define ACPIERST(obj) \
+    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
+
+/*
+ * Main ERST device state structure
+ */
+typedef struct {
+    PCIDevice parent_obj;
+
+    /* Backend storage */
+    HostMemoryBackend *hostmem;
+    MemoryRegion *hostmem_mr;
+
+    /* Programming registers */
+    MemoryRegion iomem;
+
+    /* Exchange buffer */
+    Object *exchange_obj;
+    HostMemoryBackend *exchange;
+    MemoryRegion *exchange_mr;
+    uint32_t storage_size;
+
+    /* Interface state */
+    uint8_t operation;
+    uint8_t busy_status;
+    uint8_t command_status;
+    uint32_t record_offset;
+    uint64_t reg_action;
+    uint64_t reg_value;
+    uint64_t record_identifier;
+    erst_storage_header_t *header;
+    unsigned next_record_index;
+    unsigned first_record_index;
+    unsigned last_record_index;
+
+} ERSTDeviceState;
+
+/*******************************************************************/
+/*******************************************************************/
+
+static uint8_t *get_nvram_ptr_by_index(ERSTDeviceState *s, unsigned index)
+{
+    uint8_t *rc = NULL;
+    off_t offset = (index * ERST_RECORD_SIZE);
+    if ((offset + ERST_RECORD_SIZE) <= s->storage_size) {
+        if (s->hostmem_mr) {
+            uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(s->hostmem_mr);
+            rc = p + offset;
+        }
+    }
+    return rc;
+}
+
+static void make_erst_storage_header(ERSTDeviceState *s)
+{
+    erst_storage_header_t *header = s->header;
+    unsigned mapsz, headersz;
+
+    header->magic = ERST_STORE_MAGIC;
+    header->record_size = ERST_RECORD_SIZE;
+    header->version = 0x0101;
+    header->reserved = 0x0000;
+
+    /* Compute mapsize */
+    mapsz = s->storage_size / ERST_RECORD_SIZE;
+    mapsz *= sizeof(uint64_t);
+    /* Compute header+map size */
+    headersz = sizeof(erst_storage_header_t) + mapsz;
+    /* Round up to nearest integer multiple of ERST_RECORD_SIZE */
+    headersz += (ERST_RECORD_SIZE - 1);
+    headersz /= ERST_RECORD_SIZE;
+    headersz *= ERST_RECORD_SIZE;
+    header->record_offset = headersz;
+
+    /*
+     * The HostMemoryBackend initializes contents to zero,
+     * so all record_ids stashed in the map are zero'd.
+     * The record_count is likewise zero; nothing more to initialize.
+     */
+}
+
+static void check_erst_backend_storage(ERSTDeviceState *s, Error **errp)
+{
+    erst_storage_header_t *header;
+
+    header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);
+    s->header = header;
+
+    /* Check if header is uninitialized */
+    if (header->magic == 0UL) { /* HostMemoryBackend inits to 0 */
+        make_erst_storage_header(s);
+    }
+
+    if (!(
+        (header->magic == ERST_STORE_MAGIC) &&
+        (header->record_size == ERST_RECORD_SIZE) &&
+        ((header->record_offset % ERST_RECORD_SIZE) == 0) &&
+        (header->version == 0x0101) &&
+        (header->reserved == 0x0000)
+        )) {
+        error_setg(errp, "ERST backend storage header is invalid");
+    }
+
+    /* Compute offset of first and last record storage slot */
+    s->first_record_index = header->record_offset / ERST_RECORD_SIZE;
+    s->last_record_index = (s->storage_size / ERST_RECORD_SIZE);
+}
+
+static void set_erst_map_by_index(ERSTDeviceState *s, unsigned index,
+    uint64_t record_id)
+{
+    if (index < s->last_record_index) {
+        s->header->map[index] = record_id;
+    }
+}
+
+static unsigned lookup_erst_record(ERSTDeviceState *s,
+    uint64_t record_identifier)
+{
+    unsigned rc = 0; /* 0 not a valid index */
+    unsigned index = s->first_record_index;
+
+    /* Find the record_identifier in the map */
+    if (record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
+        /*
+         * Count number of valid records encountered, and
+         * short-circuit the loop if identifier not found
+         */
+        unsigned count = 0;
+        for (; index < s->last_record_index &&
+                count < s->header->record_count; ++index) {
+            uint64_t map_record_identifier = s->header->map[index];
+            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
+                ++count;
+            }
+            if (map_record_identifier == record_identifier) {
+                rc = index;
+                break;
+            }
+        }
+    } else {
+        /* Find first available unoccupied slot */
+        for (; index < s->last_record_index; ++index) {
+            if (s->header->map[index] == ERST_UNSPECIFIED_RECORD_ID) {
+                rc = index;
+                break;
+            }
+        }
+    }
+
+    return rc;
+}
+
+/* ACPI 4.0: 17.4.2.3 Operations - Clearing */
+static unsigned clear_erst_record(ERSTDeviceState *s)
+{
+    unsigned rc = STATUS_RECORD_NOT_FOUND;
+    unsigned index;
+
+    /* Check for valid record identifier */
+    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
+        return STATUS_FAILED;
+    }
+
+    index = lookup_erst_record(s, s->record_identifier);
+    if (index) {
+        /* No need to wipe record, just invalidate its map entry */
+        set_erst_map_by_index(s, index, ERST_UNSPECIFIED_RECORD_ID);
+        s->header->record_count -= 1;
+        rc = STATUS_SUCCESS;
+    }
+
+    return rc;
+}
+
+/* ACPI 4.0: 17.4.2.2 Operations - Reading */
+static unsigned read_erst_record(ERSTDeviceState *s)
+{
+    unsigned rc = STATUS_RECORD_NOT_FOUND;
+    unsigned index;
+
+    /* Check record boundary within exchange buffer */
+    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
+        return STATUS_FAILED;
+    }
+
+    /* Check for valid record identifier */
+    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
+        return STATUS_FAILED;
+    }
+
+    index = lookup_erst_record(s, s->record_identifier);
+    if (index) {
+        uint8_t *ptr;
+        uint8_t *record = ((uint8_t *)
+            memory_region_get_ram_ptr(s->exchange_mr) +
+            s->record_offset);
+        ptr = get_nvram_ptr_by_index(s, index);
+        memcpy(record, ptr, ERST_RECORD_SIZE - s->record_offset);
+        rc = STATUS_SUCCESS;
+    }
+
+    return rc;
+}
+
+/* ACPI 4.0: 17.4.2.1 Operations - Writing */
+static unsigned write_erst_record(ERSTDeviceState *s)
+{
+    unsigned rc = STATUS_FAILED;
+    unsigned index;
+    uint64_t record_identifier;
+    uint8_t *record;
+    uint8_t *ptr = NULL;
+    bool record_found = false;
+
+    /* Check record boundary within exchange buffer */
+    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
+        return STATUS_FAILED;
+    }
+
+    /* Extract record identifier */
+    record = ((uint8_t *)memory_region_get_ram_ptr(s->exchange_mr)
+        + s->record_offset);
+    record_identifier = THE_UEFI_CPER_RECORD_ID(record);
+
+    /* Check for valid record identifier */
+    if (!ERST_IS_VALID_RECORD_ID(record_identifier)) {
+        return STATUS_FAILED;
+    }
+
+    index = lookup_erst_record(s, record_identifier);
+    if (index) {
+        /* Record found, overwrite existing record */
+        ptr = get_nvram_ptr_by_index(s, index);
+        record_found = true;
+    } else {
+        /* Record not found, not an overwrite, allocate for write */
+        index = lookup_erst_record(s, ERST_UNSPECIFIED_RECORD_ID);
+        if (index) {
+            ptr = get_nvram_ptr_by_index(s, index);
+        } else {
+            rc = STATUS_NOT_ENOUGH_SPACE;
+        }
+    }
+    if (ptr) {
+        memcpy(ptr, record, ERST_RECORD_SIZE - s->record_offset);
+        if (0 != s->record_offset) {
+            memset(&ptr[ERST_RECORD_SIZE - s->record_offset],
+                0xFF, s->record_offset);
+        }
+        if (!record_found) {
+            s->header->record_count += 1; /* writing new record */
+        }
+        set_erst_map_by_index(s, index, record_identifier);
+        rc = STATUS_SUCCESS;
+    }
+
+    return rc;
+}
+
+/* ACPI 4.0: 17.4.2.2 Operations - Reading "During boot..." */
+static unsigned next_erst_record(ERSTDeviceState *s,
+    uint64_t *record_identifier)
+{
+    unsigned rc = STATUS_RECORD_NOT_FOUND;
+    unsigned index = s->next_record_index;
+
+    *record_identifier = ERST_EMPTY_END_RECORD_ID;
+
+    if (s->header->record_count) {
+        for (; index < s->last_record_index; ++index) {
+            uint64_t map_record_identifier;
+            map_record_identifier = s->header->map[index];
+            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
+                    /* where to start next time */
+                    s->next_record_index = index + 1;
+                    *record_identifier = map_record_identifier;
+                    rc = STATUS_SUCCESS;
+                    break;
+            }
+        }
+    }
+    if (rc != STATUS_SUCCESS) {
+        if (s->next_record_index == s->first_record_index) {
+            /*
+             * next_record_identifier is unchanged, no records found
+             * and *record_identifier contains EMPTY_END id
+             */
+            rc = STATUS_RECORD_STORE_EMPTY;
+        }
+        /* at end/scan complete, reset */
+        s->next_record_index = s->first_record_index;
+    }
+
+    return rc;
+}
+
+/*******************************************************************/
+
+static uint64_t erst_rd_reg64(hwaddr addr,
+    uint64_t reg, unsigned size)
+{
+    uint64_t rdval;
+    uint64_t mask;
+    unsigned shift;
+
+    if (size == sizeof(uint64_t)) {
+        /* 64b access */
+        mask = 0xFFFFFFFFFFFFFFFFUL;
+        shift = 0;
+    } else {
+        /* 32b access */
+        mask = 0x00000000FFFFFFFFUL;
+        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
+    }
+
+    rdval = reg;
+    rdval >>= shift;
+    rdval &= mask;
+
+    return rdval;
+}
+
+static uint64_t erst_wr_reg64(hwaddr addr,
+    uint64_t reg, uint64_t val, unsigned size)
+{
+    uint64_t wrval;
+    uint64_t mask;
+    unsigned shift;
+
+    if (size == sizeof(uint64_t)) {
+        /* 64b access */
+        mask = 0xFFFFFFFFFFFFFFFFUL;
+        shift = 0;
+    } else {
+        /* 32b access */
+        mask = 0x00000000FFFFFFFFUL;
+        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
+    }
+
+    val &= mask;
+    val <<= shift;
+    mask <<= shift;
+    wrval = reg;
+    wrval &= ~mask;
+    wrval |= val;
+
+    return wrval;
+}
+
+static void erst_reg_write(void *opaque, hwaddr addr,
+    uint64_t val, unsigned size)
+{
+    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
+
+    /*
+     * NOTE: All actions/operations/side effects happen on the WRITE,
+     * by design. The READs simply return the reg_value contents.
+     */
+    trace_acpi_erst_reg_write(addr, val, size);
+
+    switch (addr) {
+    case ERST_VALUE_OFFSET + 0:
+    case ERST_VALUE_OFFSET + 4:
+        s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
+        break;
+    case ERST_ACTION_OFFSET + 0:
+/*  case ERST_ACTION_OFFSET+4: as coded, not really a 64b register */
+        switch (val) {
+        case ACTION_BEGIN_WRITE_OPERATION:
+        case ACTION_BEGIN_READ_OPERATION:
+        case ACTION_BEGIN_CLEAR_OPERATION:
+        case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
+        case ACTION_END_OPERATION:
+            s->operation = val;
+            break;
+        case ACTION_SET_RECORD_OFFSET:
+            s->record_offset = s->reg_value;
+            break;
+        case ACTION_EXECUTE_OPERATION:
+            if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
+                s->busy_status = 1;
+                switch (s->operation) {
+                case ACTION_BEGIN_WRITE_OPERATION:
+                    s->command_status = write_erst_record(s);
+                    break;
+                case ACTION_BEGIN_READ_OPERATION:
+                    s->command_status = read_erst_record(s);
+                    break;
+                case ACTION_BEGIN_CLEAR_OPERATION:
+                    s->command_status = clear_erst_record(s);
+                    break;
+                case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
+                    s->command_status = STATUS_SUCCESS;
+                    break;
+                case ACTION_END_OPERATION:
+                    s->command_status = STATUS_SUCCESS;
+                    break;
+                default:
+                    s->command_status = STATUS_FAILED;
+                    break;
+                }
+                s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;
+                s->busy_status = 0;
+            }
+            break;
+        case ACTION_CHECK_BUSY_STATUS:
+            s->reg_value = s->busy_status;
+            break;
+        case ACTION_GET_COMMAND_STATUS:
+            s->reg_value = s->command_status;
+            break;
+        case ACTION_GET_RECORD_IDENTIFIER:
+            s->command_status = next_erst_record(s, &s->reg_value);
+            break;
+        case ACTION_SET_RECORD_IDENTIFIER:
+            s->record_identifier = s->reg_value;
+            break;
+        case ACTION_GET_RECORD_COUNT:
+            s->reg_value = s->header->record_count;
+            break;
+        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
+            s->reg_value = (hwaddr)pci_get_bar_addr(PCI_DEVICE(s), 1);
+            break;
+        case ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
+            s->reg_value = ERST_RECORD_SIZE;
+            break;
+        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
+            s->reg_value = 0x0; /* intentional, not NVRAM mode */
+            break;
+        case ACTION_GET_EXECUTE_OPERATION_TIMINGS:
+            s->reg_value =
+                (100ULL << 32) | /* 100us max time */
+                (10ULL  <<  0) ; /*  10us min time */
+            break;
+        default:
+            /* Unknown action/command, NOP */
+            break;
+        }
+        break;
+    default:
+        /* This should not happen, but if it does, NOP */
+        break;
+    }
+}
+
+static uint64_t erst_reg_read(void *opaque, hwaddr addr,
+                                unsigned size)
+{
+    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
+    uint64_t val = 0;
+
+    switch (addr) {
+    case ERST_ACTION_OFFSET + 0:
+    case ERST_ACTION_OFFSET + 4:
+        val = erst_rd_reg64(addr, s->reg_action, size);
+        break;
+    case ERST_VALUE_OFFSET + 0:
+    case ERST_VALUE_OFFSET + 4:
+        val = erst_rd_reg64(addr, s->reg_value, size);
+        break;
+    default:
+        break;
+    }
+    trace_acpi_erst_reg_read(addr, val, size);
+    return val;
+}
+
+static const MemoryRegionOps erst_reg_ops = {
+    .read = erst_reg_read,
+    .write = erst_reg_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+/*******************************************************************/
+/*******************************************************************/
+static int erst_post_load(void *opaque, int version_id)
+{
+    ERSTDeviceState *s = opaque;
+
+    /* Recompute pointer to header */
+    s->header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);
+    trace_acpi_erst_post_load(s->header);
+
+    return 0;
+}
+
+static const VMStateDescription erst_vmstate  = {
+    .name = "acpi-erst",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .post_load = erst_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(storage_size, ERSTDeviceState),
+        VMSTATE_UINT8(operation, ERSTDeviceState),
+        VMSTATE_UINT8(busy_status, ERSTDeviceState),
+        VMSTATE_UINT8(command_status, ERSTDeviceState),
+        VMSTATE_UINT32(record_offset, ERSTDeviceState),
+        VMSTATE_UINT64(reg_action, ERSTDeviceState),
+        VMSTATE_UINT64(reg_value, ERSTDeviceState),
+        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
+        VMSTATE_UINT32(next_record_index, ERSTDeviceState),
+        VMSTATE_UINT32(first_record_index, ERSTDeviceState),
+        VMSTATE_UINT32(last_record_index, ERSTDeviceState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void erst_realizefn(PCIDevice *pci_dev, Error **errp)
+{
+    ERSTDeviceState *s = ACPIERST(pci_dev);
+
+    trace_acpi_erst_realizefn_in();
+
+    if (!s->hostmem) {
+        error_setg(errp, "'" ACPI_ERST_MEMDEV_PROP "' property is not set");
+        return;
+    } else if (host_memory_backend_is_mapped(s->hostmem)) {
+        error_setg(errp, "can't use already busy memdev: %s",
+                   object_get_canonical_path_component(OBJECT(s->hostmem)));
+        return;
+    }
+
+    s->hostmem_mr = host_memory_backend_get_memory(s->hostmem);
+
+    /* HostMemoryBackend size will be multiple of PAGE_SIZE */
+    s->storage_size = object_property_get_int(OBJECT(s->hostmem), "size", errp);
+
+    /* Check storage_size against ERST_RECORD_SIZE */
+    if (((s->storage_size % ERST_RECORD_SIZE) != 0) ||
+         (ERST_RECORD_SIZE > s->storage_size)) {
+        error_setg(errp, "ACPI ERST requires size be multiple of record size");
+        return;
+    }
+
+    /* Initialize backend storage and record_count */
+    check_erst_backend_storage(s, errp);
+
+    /* BAR 0: Programming registers */
+    memory_region_init_io(&s->iomem, OBJECT(pci_dev), &erst_reg_ops, s,
+                          TYPE_ACPI_ERST, ERST_REG_SIZE);
+    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->iomem);
+
+    /* BAR 1: Exchange buffer memory */
+    /* Create a hostmem object to use as the exchange buffer */
+    s->exchange_obj = object_new(TYPE_MEMORY_BACKEND_RAM);
+    object_property_set_int(s->exchange_obj, "size", ERST_RECORD_SIZE, errp);
+    user_creatable_complete(USER_CREATABLE(s->exchange_obj), errp);
+    s->exchange = MEMORY_BACKEND(s->exchange_obj);
+    host_memory_backend_set_mapped(s->exchange, true);
+    s->exchange_mr = host_memory_backend_get_memory(s->exchange);
+    memory_region_init_resizeable_ram(s->exchange_mr, OBJECT(pci_dev),
+        TYPE_ACPI_ERST, ERST_RECORD_SIZE, ERST_RECORD_SIZE, NULL, errp);
+    pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, s->exchange_mr);
+    /* Include the exchange buffer in the migration stream */
+    vmstate_register_ram_global(s->exchange_mr);
+
+    /* Include the backend storage in the migration stream */
+    vmstate_register_ram_global(s->hostmem_mr);
+
+    trace_acpi_erst_realizefn_out(s->storage_size);
+}
+
+static void erst_reset(DeviceState *dev)
+{
+    ERSTDeviceState *s = ACPIERST(dev);
+
+    trace_acpi_erst_reset_in(s->header->record_count);
+    s->operation = 0;
+    s->busy_status = 0;
+    s->command_status = STATUS_SUCCESS;
+    s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;
+    s->record_offset = 0;
+    s->next_record_index = s->first_record_index;
+    /* NOTE: first/last_record_index are computed only once */
+    trace_acpi_erst_reset_out(s->header->record_count);
+}
+
+static Property erst_properties[] = {
+    DEFINE_PROP_LINK(ACPI_ERST_MEMDEV_PROP, ERSTDeviceState, hostmem,
+                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void erst_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    trace_acpi_erst_class_init_in();
+    k->realize = erst_realizefn;
+    k->vendor_id = PCI_VENDOR_ID_REDHAT;
+    k->device_id = PCI_DEVICE_ID_REDHAT_ACPI_ERST;
+    k->revision = 0x00;
+    k->class_id = PCI_CLASS_OTHERS;
+    dc->reset = erst_reset;
+    dc->vmsd = &erst_vmstate;
+    dc->user_creatable = true;
+    device_class_set_props(dc, erst_properties);
+    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    trace_acpi_erst_class_init_out();
+}
+
+static const TypeInfo erst_type_info = {
+    .name          = TYPE_ACPI_ERST,
+    .parent        = TYPE_PCI_DEVICE,
+    .class_init    = erst_class_init,
+    .instance_size = sizeof(ERSTDeviceState),
+    .interfaces = (InterfaceInfo[]) {
+        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
+        { }
+    }
+};
+
+static void erst_register_types(void)
+{
+    type_register_static(&erst_type_info);
+}
+
+type_init(erst_register_types)
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index 29f804d..401d0e5 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -5,6 +5,7 @@ acpi_ss.add(files(
   'bios-linker-loader.c',
   'core.c',
   'utils.c',
+  'erst.c',
 ))
 acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c'))
 acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu_hotplug.c'))
diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
index 974d770..3579768 100644
--- a/hw/acpi/trace-events
+++ b/hw/acpi/trace-events
@@ -55,3 +55,18 @@ piix4_gpe_writeb(uint64_t addr, unsigned width, uint64_t val) "addr: 0x%" PRIx64
 # tco.c
 tco_timer_reload(int ticks, int msec) "ticks=%d (%d ms)"
 tco_timer_expired(int timeouts_no, bool strap, bool no_reboot) "timeouts_no=%d no_reboot=%d/%d"
+
+# erst.c
+acpi_erst_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%04" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
+acpi_erst_reg_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%04" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
+acpi_erst_mem_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%06" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
+acpi_erst_mem_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%06" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
+acpi_erst_pci_bar_0(uint64_t addr) "BAR0: 0x%016" PRIx64
+acpi_erst_pci_bar_1(uint64_t addr) "BAR1: 0x%016" PRIx64
+acpi_erst_realizefn_in(void)
+acpi_erst_realizefn_out(unsigned size) "total nvram size %u bytes"
+acpi_erst_reset_in(unsigned record_count) "record_count %u"
+acpi_erst_reset_out(unsigned record_count) "record_count %u"
+acpi_erst_post_load(void *header) "header: %p"
+acpi_erst_class_init_in(void)
+acpi_erst_class_init_out(void)
-- 
1.8.3.1




* [PATCH v6 06/10] ACPI ERST: build the ACPI ERST table
  2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (4 preceding siblings ...)
  2021-08-05 22:30 ` [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature Eric DeVolder
@ 2021-08-05 22:30 ` Eric DeVolder
  2021-08-05 22:30 ` [PATCH v6 07/10] ACPI ERST: create ACPI ERST table for pc/x86 machines Eric DeVolder
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC (permalink / raw)
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

This builds the ACPI ERST table to inform OSPM how to communicate
with the acpi-erst device.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 hw/acpi/erst.c | 239 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 239 insertions(+)
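
For reference, the 32-byte entry layout that
build_serialization_instruction_entry() emits can be sketched standalone
as below. This is a minimal sketch and not code from the patch:
sketch_put_le() and sketch_pack_entry() are hypothetical names, and it
assumes the little-endian appends of build_append_int_noprefix(), a
12-byte Generic Address Structure, and space_id 0 for system memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ENTRY_SIZE 32 /* 4 header bytes + 12 (GAS) + 8 + 8 */

/* Store the low `size` bytes of `val` at buf[off..], little-endian. */
static size_t sketch_put_le(uint8_t *buf, size_t off, uint64_t val,
                            size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(val >> (8 * i));
    }
    return off + size;
}

/* Hypothetical stand-in for build_serialization_instruction_entry(). */
static size_t sketch_pack_entry(uint8_t *buf, uint8_t action,
                                uint8_t instruction, uint8_t flags,
                                uint8_t bit_width, uint64_t address,
                                uint64_t value, uint64_t mask)
{
    uint8_t access_width;
    size_t off = 0;

    /* Same bit-width to GAS access-size mapping as the patch */
    switch (bit_width) {
    case 8:  access_width = 1; break;
    case 16: access_width = 2; break;
    case 32: access_width = 3; break;
    case 64: access_width = 4; break;
    default: access_width = 0; break;
    }

    off = sketch_put_le(buf, off, action, 1);       /* Serialization Action */
    off = sketch_put_le(buf, off, instruction, 1);  /* Instruction */
    off = sketch_put_le(buf, off, flags, 1);        /* Flags */
    off = sketch_put_le(buf, off, 0, 1);            /* Reserved */
    off = sketch_put_le(buf, off, 0, 1);            /* GAS: system memory */
    off = sketch_put_le(buf, off, bit_width, 1);    /* GAS: bit width */
    off = sketch_put_le(buf, off, 0, 1);            /* GAS: bit offset */
    off = sketch_put_le(buf, off, access_width, 1); /* GAS: access size */
    off = sketch_put_le(buf, off, address, 8);      /* GAS: address */
    off = sketch_put_le(buf, off, value, 8);        /* Value */
    off = sketch_put_le(buf, off, mask, 8);         /* Mask */

    assert(off == SKETCH_ENTRY_SIZE);
    return off;
}
```

Because every entry has this fixed size, the Instruction Entry Count can
be derived as table_instruction_data->len / 32, which is what
build_erst() does.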

diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
index eb4ab34..ecf1533 100644
--- a/hw/acpi/erst.c
+++ b/hw/acpi/erst.c
@@ -601,6 +601,245 @@ static const MemoryRegionOps erst_reg_ops = {
     .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
+
+/*******************************************************************/
+/*******************************************************************/
+
+/* ACPI 4.0: Table 17-19 Serialization Instructions */
+#define INST_READ_REGISTER                 0x00
+#define INST_READ_REGISTER_VALUE           0x01
+#define INST_WRITE_REGISTER                0x02
+#define INST_WRITE_REGISTER_VALUE          0x03
+#define INST_NOOP                          0x04
+#define INST_LOAD_VAR1                     0x05
+#define INST_LOAD_VAR2                     0x06
+#define INST_STORE_VAR1                    0x07
+#define INST_ADD                           0x08
+#define INST_SUBTRACT                      0x09
+#define INST_ADD_VALUE                     0x0A
+#define INST_SUBTRACT_VALUE                0x0B
+#define INST_STALL                         0x0C
+#define INST_STALL_WHILE_TRUE              0x0D
+#define INST_SKIP_NEXT_INSTRUCTION_IF_TRUE 0x0E
+#define INST_GOTO                          0x0F
+#define INST_SET_SRC_ADDRESS_BASE          0x10
+#define INST_SET_DST_ADDRESS_BASE          0x11
+#define INST_MOVE_DATA                     0x12
+
+/* ACPI 4.0: 17.4.1.2 Serialization Instruction Entries */
+static void build_serialization_instruction_entry(GArray *table_data,
+    uint8_t serialization_action,
+    uint8_t instruction,
+    uint8_t flags,
+    uint8_t register_bit_width,
+    uint64_t register_address,
+    uint64_t value,
+    uint64_t mask)
+{
+    /* ACPI 4.0: Table 17-18 Serialization Instruction Entry */
+    struct AcpiGenericAddress gas;
+
+    /* Serialization Action */
+    build_append_int_noprefix(table_data, serialization_action, 1);
+    /* Instruction */
+    build_append_int_noprefix(table_data, instruction         , 1);
+    /* Flags */
+    build_append_int_noprefix(table_data, flags               , 1);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0                   , 1);
+    /* Register Region */
+    gas.space_id = AML_SYSTEM_MEMORY;
+    gas.bit_width = register_bit_width;
+    gas.bit_offset = 0;
+    switch (register_bit_width) {
+    case 8:
+        gas.access_width = 1;
+        break;
+    case 16:
+        gas.access_width = 2;
+        break;
+    case 32:
+        gas.access_width = 3;
+        break;
+    case 64:
+        gas.access_width = 4;
+        break;
+    default:
+        gas.access_width = 0;
+        break;
+    }
+    gas.address = register_address;
+    build_append_gas_from_struct(table_data, &gas);
+    /* Value */
+    build_append_int_noprefix(table_data, value  , 8);
+    /* Mask */
+    build_append_int_noprefix(table_data, mask   , 8);
+}
+
+/* ACPI 4.0: 17.4.1 Serialization Action Table */
+void build_erst(GArray *table_data, BIOSLinker *linker, Object *erst_dev,
+    const char *oem_id, const char *oem_table_id)
+{
+    GArray *table_instruction_data;
+    unsigned action;
+    unsigned erst_start = table_data->len;
+    hwaddr bar0, bar1;
+
+    bar0 = (hwaddr)pci_get_bar_addr(PCI_DEVICE(erst_dev), 0);
+    trace_acpi_erst_pci_bar_0(bar0);
+    bar1 = (hwaddr)pci_get_bar_addr(PCI_DEVICE(erst_dev), 1);
+    trace_acpi_erst_pci_bar_1(bar1);
+
+#define MASK8  0x00000000000000FFUL
+#define MASK16 0x000000000000FFFFUL
+#define MASK32 0x00000000FFFFFFFFUL
+#define MASK64 0xFFFFFFFFFFFFFFFFUL
+
+    /*
+     * Serialization Action Table
+     * The serialization action table must be generated first
+     * so that its size can be known in order to populate the
+     * Instruction Entry Count field.
+     */
+    table_instruction_data = g_array_new(FALSE, FALSE, sizeof(char));
+
+    /* Serialization Instruction Entries */
+    action = ACTION_BEGIN_WRITE_OPERATION;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+
+    action = ACTION_BEGIN_READ_OPERATION;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+
+    action = ACTION_BEGIN_CLEAR_OPERATION;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+
+    action = ACTION_END_OPERATION;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+
+    action = ACTION_SET_RECORD_OFFSET;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER      , 0, 32,
+        bar0 + ERST_VALUE_OFFSET , 0, MASK32);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+
+    action = ACTION_EXECUTE_OPERATION;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_VALUE_OFFSET , ERST_EXECUTE_OPERATION_MAGIC, MASK8);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+
+    action = ACTION_CHECK_BUSY_STATUS;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_READ_REGISTER_VALUE , 0, 32,
+        bar0 + ERST_VALUE_OFFSET, 0x01, MASK8);
+
+    action = ACTION_GET_COMMAND_STATUS;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_READ_REGISTER       , 0, 32,
+        bar0 + ERST_VALUE_OFFSET, 0, MASK8);
+
+    action = ACTION_GET_RECORD_IDENTIFIER;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_READ_REGISTER       , 0, 64,
+        bar0 + ERST_VALUE_OFFSET, 0, MASK64);
+
+    action = ACTION_SET_RECORD_IDENTIFIER;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER      , 0, 64,
+        bar0 + ERST_VALUE_OFFSET , 0, MASK64);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+
+    action = ACTION_GET_RECORD_COUNT;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_READ_REGISTER       , 0, 32,
+        bar0 + ERST_VALUE_OFFSET, 0, MASK32);
+
+    action = ACTION_BEGIN_DUMMY_WRITE_OPERATION;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+
+    action = ACTION_GET_ERROR_LOG_ADDRESS_RANGE;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_READ_REGISTER       , 0, 64,
+        bar0 + ERST_VALUE_OFFSET, 0, MASK64);
+
+    action = ACTION_GET_ERROR_LOG_ADDRESS_LENGTH;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_READ_REGISTER       , 0, 64,
+        bar0 + ERST_VALUE_OFFSET, 0, MASK32);
+
+    action = ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_READ_REGISTER       , 0, 32,
+        bar0 + ERST_VALUE_OFFSET, 0, MASK32);
+
+    action = ACTION_GET_EXECUTE_OPERATION_TIMINGS;
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_WRITE_REGISTER_VALUE, 0, 32,
+        bar0 + ERST_ACTION_OFFSET, action, MASK8);
+    build_serialization_instruction_entry(table_instruction_data,
+        action, INST_READ_REGISTER       , 0, 64,
+        bar0 + ERST_VALUE_OFFSET, 0, MASK64);
+
+    /* Serialization Header */
+    acpi_data_push(table_data, sizeof(AcpiTableHeader));
+    /* Serialization Header Size */
+    build_append_int_noprefix(table_data, 48, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data,  0, 4);
+    /*
+     * Instruction Entry Count
+     * Each instruction entry is 32 bytes
+     */
+    build_append_int_noprefix(table_data,
+        (table_instruction_data->len / 32), 4);
+    /* Serialization Instruction Entries */
+    g_array_append_vals(table_data, table_instruction_data->data,
+        table_instruction_data->len);
+    g_array_free(table_instruction_data, TRUE);
+
+    build_header(linker, table_data,
+                 (void *)(table_data->data + erst_start),
+                 "ERST", table_data->len - erst_start,
+                 1, oem_id, oem_table_id);
+}
+
 /*******************************************************************/
 /*******************************************************************/
 static int erst_post_load(void *opaque, int version_id)
-- 
1.8.3.1




* [PATCH v6 07/10] ACPI ERST: create ACPI ERST table for pc/x86 machines
  2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (5 preceding siblings ...)
  2021-08-05 22:30 ` [PATCH v6 06/10] ACPI ERST: build the ACPI ERST table Eric DeVolder
@ 2021-08-05 22:30 ` Eric DeVolder
  2021-08-05 22:30 ` [PATCH v6 08/10] ACPI ERST: qtest for ERST Eric DeVolder
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC (permalink / raw)
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

This change exposes ACPI ERST support for x86 guests.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 hw/i386/acpi-build.c   | 9 +++++++++
 hw/i386/acpi-microvm.c | 9 +++++++++
 include/hw/acpi/erst.h | 5 +++++
 3 files changed, 23 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a33ac8b..b55a548 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -43,6 +43,7 @@
 #include "sysemu/tpm.h"
 #include "hw/acpi/tpm.h"
 #include "hw/acpi/vmgenid.h"
+#include "hw/acpi/erst.h"
 #include "sysemu/tpm_backend.h"
 #include "hw/rtc/mc146818rtc_regs.h"
 #include "migration/vmstate.h"
@@ -2443,6 +2444,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
     GArray *tables_blob = tables->table_data;
     AcpiSlicOem slic_oem = { .id = NULL, .table_id = NULL };
     Object *vmgenid_dev;
+    Object *erst_dev;
     char *oem_id;
     char *oem_table_id;
 
@@ -2504,6 +2506,13 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
                     ACPI_DEVICE_IF(x86ms->acpi_dev), x86ms->oem_id,
                     x86ms->oem_table_id);
 
+    erst_dev = find_erst_dev();
+    if (erst_dev) {
+        acpi_add_table(table_offsets, tables_blob);
+        build_erst(tables_blob, tables->linker, erst_dev,
+                   x86ms->oem_id, x86ms->oem_table_id);
+    }
+
     vmgenid_dev = find_vmgenid_dev();
     if (vmgenid_dev) {
         acpi_add_table(table_offsets, tables_blob);
diff --git a/hw/i386/acpi-microvm.c b/hw/i386/acpi-microvm.c
index 1a0f77b..6578254 100644
--- a/hw/i386/acpi-microvm.c
+++ b/hw/i386/acpi-microvm.c
@@ -30,6 +30,7 @@
 #include "hw/acpi/bios-linker-loader.h"
 #include "hw/acpi/generic_event_device.h"
 #include "hw/acpi/utils.h"
+#include "hw/acpi/erst.h"
 #include "hw/i386/fw_cfg.h"
 #include "hw/i386/microvm.h"
 #include "hw/pci/pci.h"
@@ -159,6 +160,7 @@ static void acpi_build_microvm(AcpiBuildTables *tables,
     X86MachineState *x86ms = X86_MACHINE(mms);
     GArray *table_offsets;
     GArray *tables_blob = tables->table_data;
+    Object *erst_dev;
     unsigned dsdt, xsdt;
     AcpiFadtData pmfadt = {
         /* ACPI 5.0: 4.1 Hardware-Reduced ACPI */
@@ -208,6 +210,13 @@ static void acpi_build_microvm(AcpiBuildTables *tables,
                     ACPI_DEVICE_IF(x86ms->acpi_dev), x86ms->oem_id,
                     x86ms->oem_table_id);
 
+    erst_dev = find_erst_dev();
+    if (erst_dev) {
+        acpi_add_table(table_offsets, tables_blob);
+        build_erst(tables_blob, tables->linker, erst_dev,
+                   x86ms->oem_id, x86ms->oem_table_id);
+    }
+
     xsdt = tables_blob->len;
     build_xsdt(tables_blob, tables->linker, table_offsets, x86ms->oem_id,
                x86ms->oem_table_id);
diff --git a/include/hw/acpi/erst.h b/include/hw/acpi/erst.h
index 9d63717..b747fe7 100644
--- a/include/hw/acpi/erst.h
+++ b/include/hw/acpi/erst.h
@@ -16,4 +16,9 @@ void build_erst(GArray *table_data, BIOSLinker *linker, Object *erst_dev,
 
 #define TYPE_ACPI_ERST "acpi-erst"
 
+/* returns NULL unless there is exactly one device */
+static inline Object *find_erst_dev(void)
+{
+    return object_resolve_path_type("", TYPE_ACPI_ERST, NULL);
+}
 #endif
-- 
1.8.3.1




* [PATCH v6 08/10] ACPI ERST: qtest for ERST
  2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (6 preceding siblings ...)
  2021-08-05 22:30 ` [PATCH v6 07/10] ACPI ERST: create ACPI ERST table for pc/x86 machines Eric DeVolder
@ 2021-08-05 22:30 ` Eric DeVolder
  2021-08-05 22:30 ` [PATCH v6 09/10] ACPI ERST: bios-tables-test testcase Eric DeVolder
  2021-08-05 22:30 ` [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test Eric DeVolder
  9 siblings, 0 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC (permalink / raw)
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

This change provides a qtest that locates and then does a simple
interrogation of the ERST feature within the guest.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 tests/qtest/erst-test.c | 167 ++++++++++++++++++++++++++++++++++++++++++++++++
 tests/qtest/meson.build |   2 +
 2 files changed, 169 insertions(+)
 create mode 100644 tests/qtest/erst-test.c
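
The handshake this test exercises (write an action code to ACTION, then
read the result back from VALUE) can be sketched without QEMU as below.
This is a minimal sketch under stated assumptions, not the device model:
SketchErst, sketch_reg_write() and sketch_reg_read() are hypothetical
names, and the BAR1 address is whatever the caller stores in bar1_addr.
The action codes 0xD/0xE/0xF, the register offsets 0 and 8, and the
0x2000 exchange buffer size are the ones used in erst-test.c.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ACTION_OFFSET 0      /* ACTION in erst-test.c */
#define SKETCH_VALUE_OFFSET  8      /* VALUE in erst-test.c */
#define SKETCH_RECORD_SIZE   0x2000 /* BAR1 size asserted by the test */

typedef struct {
    uint64_t reg_value; /* result register, read back through VALUE */
    uint64_t bar1_addr; /* stand-in for pci_get_bar_addr(dev, 1) */
} SketchErst;

/* All side effects happen on the write to ACTION, as in erst_reg_write() */
static void sketch_reg_write(SketchErst *s, uint64_t addr, uint64_t val)
{
    if (addr != SKETCH_ACTION_OFFSET) {
        return;
    }
    switch (val) {
    case 0xD: /* GET_ERROR_LOG_ADDRESS_RANGE */
        s->reg_value = s->bar1_addr;
        break;
    case 0xE: /* GET_ERROR_LOG_ADDRESS_LENGTH */
        s->reg_value = SKETCH_RECORD_SIZE;
        break;
    case 0xF: /* GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES */
        s->reg_value = 0;
        break;
    default:  /* unknown action, NOP */
        break;
    }
}

/* Reads only return reg_value; 32-bit reads select the low or high half */
static uint64_t sketch_reg_read(const SketchErst *s, uint64_t addr,
                                unsigned size)
{
    uint64_t mask;
    unsigned shift;

    assert(size == 4 || size == 8);
    if (addr != SKETCH_VALUE_OFFSET && addr != SKETCH_VALUE_OFFSET + 4) {
        return 0;
    }
    if (size == 8) {
        mask = UINT64_MAX;
        shift = 0;
    } else {
        mask = 0xFFFFFFFFULL;
        shift = (addr & 0x4) ? 32 : 0;
    }
    return (s->reg_value >> shift) & mask;
}
```

A caller would mirror the test: sketch_reg_write(&dev, 0, 0xD) followed
by sketch_reg_read(&dev, 8, 8) yields the exchange buffer address, and
the 0xE action yields its length.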

diff --git a/tests/qtest/erst-test.c b/tests/qtest/erst-test.c
new file mode 100644
index 0000000..370c119
--- /dev/null
+++ b/tests/qtest/erst-test.c
@@ -0,0 +1,167 @@
+/*
+ * QTest testcase for acpi-erst
+ *
+ * Copyright (c) 2021 Oracle
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include <glib/gstdio.h>
+#include "libqos/libqos-pc.h"
+#include "libqos/libqtest.h"
+#include "qemu-common.h"
+
+static void save_fn(QPCIDevice *dev, int devfn, void *data)
+{
+    QPCIDevice **pdev = (QPCIDevice **) data;
+
+    *pdev = dev;
+}
+
+static QPCIDevice *get_device(QPCIBus *pcibus)
+{
+    QPCIDevice *dev;
+
+    dev = NULL;
+    qpci_device_foreach(pcibus, 0x1b36, 0x0012, save_fn, &dev);
+    g_assert(dev != NULL);
+
+    return dev;
+}
+
+typedef struct _ERSTState {
+    QOSState *qs;
+    QPCIBar reg_bar, mem_bar;
+    uint64_t reg_barsize, mem_barsize;
+    QPCIDevice *dev;
+} ERSTState;
+
+#define ACTION 0
+#define VALUE 8
+
+static const char *reg2str(unsigned reg)
+{
+    switch (reg) {
+    case 0:
+        return "ACTION";
+    case 8:
+        return "VALUE";
+    default:
+        return NULL;
+    }
+}
+
+static inline uint32_t in_reg32(ERSTState *s, unsigned reg)
+{
+    const char *name = reg2str(reg);
+    uint32_t res;
+
+    res = qpci_io_readl(s->dev, s->reg_bar, reg);
+    g_test_message("*%s -> %08x", name, res);
+
+    return res;
+}
+
+static inline uint64_t in_reg64(ERSTState *s, unsigned reg)
+{
+    const char *name = reg2str(reg);
+    uint64_t res;
+
+    res = qpci_io_readq(s->dev, s->reg_bar, reg);
+    g_test_message("*%s -> %016llx", name, (unsigned long long)res);
+
+    return res;
+}
+
+static inline void out_reg32(ERSTState *s, unsigned reg, uint32_t v)
+{
+    const char *name = reg2str(reg);
+
+    g_test_message("%08x -> *%s", v, name);
+    qpci_io_writel(s->dev, s->reg_bar, reg, v);
+}
+
+static inline void out_reg64(ERSTState *s, unsigned reg, uint64_t v)
+{
+    const char *name = reg2str(reg);
+
+    g_test_message("%016llx -> *%s", (unsigned long long)v, name);
+    qpci_io_writeq(s->dev, s->reg_bar, reg, v);
+}
+
+static void cleanup_vm(ERSTState *s)
+{
+    g_free(s->dev);
+    qtest_shutdown(s->qs);
+}
+
+static void setup_vm_cmd(ERSTState *s, const char *cmd)
+{
+    const char *arch = qtest_get_arch();
+
+    if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
+        s->qs = qtest_pc_boot(cmd);
+    } else {
+        g_printerr("erst-test tests are only available on x86\n");
+        exit(EXIT_FAILURE);
+    }
+    s->dev = get_device(s->qs->pcibus);
+
+    s->reg_bar = qpci_iomap(s->dev, 0, &s->reg_barsize);
+    g_assert_cmpuint(s->reg_barsize, ==, 16);
+
+    s->mem_bar = qpci_iomap(s->dev, 1, &s->mem_barsize);
+    g_assert_cmpuint(s->mem_barsize, ==, 0x2000);
+
+    qpci_device_enable(s->dev);
+}
+
+static void test_acpi_erst_basic(void)
+{
+    ERSTState state;
+    uint64_t log_address_range;
+    uint64_t log_address_length;
+    uint32_t log_address_attr;
+
+    setup_vm_cmd(&state,
+        "-object memory-backend-file,"
+            "mem-path=acpi-erst.XXXXXX,"
+            "size=64K,"
+            "share=on,"
+            "id=nvram "
+        "-device acpi-erst,"
+            "memdev=nvram");
+
+    out_reg32(&state, ACTION, 0xD);
+    log_address_range = in_reg64(&state, VALUE);
+    out_reg32(&state, ACTION, 0xE);
+    log_address_length = in_reg64(&state, VALUE);
+    out_reg32(&state, ACTION, 0xF);
+    log_address_attr = in_reg32(&state, VALUE);
+
+    /* Check log_address_range is not 0, ~0 or base */
+    g_assert_cmpuint(log_address_range, !=,  0ULL);
+    g_assert_cmpuint(log_address_range, !=, ~0ULL);
+    g_assert_cmpuint(log_address_range, !=, state.reg_bar.addr);
+    g_assert_cmpuint(log_address_range, ==, state.mem_bar.addr);
+
+    /* Check log_address_length is bar1_size */
+    g_assert_cmpuint(log_address_length, ==, state.mem_barsize);
+
+    /* Check log_address_attr is 0 */
+    g_assert_cmpuint(log_address_attr, ==, 0);
+
+    cleanup_vm(&state);
+}
+
+int main(int argc, char **argv)
+{
+    int ret;
+
+    g_test_init(&argc, &argv, NULL);
+    qtest_add_func("/acpi-erst/basic", test_acpi_erst_basic);
+    ret = g_test_run();
+    return ret;
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index e22a079..b69834d 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -68,6 +68,7 @@ qtests_i386 = \
   (config_all_devices.has_key('CONFIG_RTL8139_PCI') ? ['rtl8139-test'] : []) +              \
   (config_all_devices.has_key('CONFIG_E1000E_PCI_EXPRESS') ? ['fuzz-e1000e-test'] : []) +   \
   (config_all_devices.has_key('CONFIG_ESP_PCI') ? ['am53c974-test'] : []) +                 \
+  (config_all_devices.has_key('CONFIG_ACPI') ? ['erst-test'] : []) +                 \
   qtests_pci +                                                                              \
   ['fdc-test',
    'ide-test',
@@ -245,6 +246,7 @@ qtests = {
   'bios-tables-test': [io, 'boot-sector.c', 'acpi-utils.c', 'tpm-emu.c'],
   'cdrom-test': files('boot-sector.c'),
   'dbus-vmstate-test': files('migration-helpers.c') + dbus_vmstate1,
+  'erst-test': files('erst-test.c'),
   'ivshmem-test': [rt, '../../contrib/ivshmem-server/ivshmem-server.c'],
   'migration-test': files('migration-helpers.c'),
   'pxe-test': files('boot-sector.c'),
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 09/10] ACPI ERST: bios-tables-test testcase
  2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (7 preceding siblings ...)
  2021-08-05 22:30 ` [PATCH v6 08/10] ACPI ERST: qtest for ERST Eric DeVolder
@ 2021-08-05 22:30 ` Eric DeVolder
  2021-09-21 11:32   ` Igor Mammedov
  2021-08-05 22:30 ` [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test Eric DeVolder
  9 siblings, 1 reply; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC (permalink / raw)
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

This change implements the test suite checks for the ERST table.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 tests/qtest/bios-tables-test.c | 43 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 51d3a4e..6ee78ec 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -1378,6 +1378,45 @@ static void test_acpi_piix4_tcg_acpi_hmat(void)
     test_acpi_tcg_acpi_hmat(MACHINE_PC);
 }
 
+static void test_acpi_erst(const char *machine)
+{
+    test_data data;
+
+    memset(&data, 0, sizeof(data));
+    data.machine = machine;
+    /*data.variant = ".acpierst";*/
+    test_acpi_one(" -object memory-backend-file,id=erstnvram,"
+                    "mem-path=tests/acpi-erst.XXXXXX,size=0x10000,share=on"
+                    " -device acpi-erst,memdev=erstnvram",
+                  &data);
+    free_test_data(&data);
+}
+
+static void test_acpi_piix4_erst(void)
+{
+    test_acpi_erst(MACHINE_PC);
+}
+
+static void test_acpi_q35_erst(void)
+{
+    test_acpi_erst(MACHINE_Q35);
+}
+
+static void test_acpi_microvm_erst(void)
+{
+    test_data data;
+
+    test_acpi_microvm_prepare(&data);
+    data.variant = ".pcie";
+    data.tcg_only = true; /* need constant host-phys-bits */
+    test_acpi_one(" -machine microvm,acpi=on,ioapic2=off,rtc=off,pcie=on "
+                    "-object memory-backend-file,id=erstnvram,"
+                    "mem-path=tests/acpi-erst.XXXXXX,size=0x10000,share=on "
+                    "-device acpi-erst,memdev=erstnvram",
+                  &data);
+    free_test_data(&data);
+}
+
 static void test_acpi_virt_tcg(void)
 {
     test_data data = {
@@ -1560,7 +1599,11 @@ int main(int argc, char *argv[])
         qtest_add_func("acpi/microvm/oem-fields", test_acpi_oem_fields_microvm);
         if (strcmp(arch, "x86_64") == 0) {
             qtest_add_func("acpi/microvm/pcie", test_acpi_microvm_pcie_tcg);
+            qtest_add_func("acpi/microvm/acpierst", test_acpi_microvm_erst);
         }
+        qtest_add_func("acpi/piix4/acpierst", test_acpi_piix4_erst);
+        qtest_add_func("acpi/q35/acpierst", test_acpi_q35_erst);
+
     } else if (strcmp(arch, "aarch64") == 0) {
         qtest_add_func("acpi/virt", test_acpi_virt_tcg);
         qtest_add_func("acpi/virt/numamem", test_acpi_virt_tcg_numamem);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test
  2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
                   ` (8 preceding siblings ...)
  2021-08-05 22:30 ` [PATCH v6 09/10] ACPI ERST: bios-tables-test testcase Eric DeVolder
@ 2021-08-05 22:30 ` Eric DeVolder
  2021-08-06 17:16   ` Eric DeVolder
  2021-09-21 11:24   ` Igor Mammedov
  9 siblings, 2 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-08-05 22:30 UTC (permalink / raw)
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

Following the guidelines in tests/qtest/bios-tables-test.c, this
is step 6, the re-generated ACPI tables binary blobs.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 tests/data/acpi/microvm/ERST.pcie           | Bin 0 -> 912 bytes
 tests/data/acpi/pc/DSDT                     | Bin 6002 -> 6009 bytes
 tests/data/acpi/pc/ERST                     | Bin 0 -> 912 bytes
 tests/data/acpi/q35/DSDT                    | Bin 8289 -> 8306 bytes
 tests/data/acpi/q35/ERST                    | Bin 0 -> 912 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   6 ------
 6 files changed, 6 deletions(-)
 create mode 100644 tests/data/acpi/microvm/ERST.pcie

diff --git a/tests/data/acpi/microvm/ERST.pcie b/tests/data/acpi/microvm/ERST.pcie
new file mode 100644
index 0000000000000000000000000000000000000000..d9a2b3211ab5893a50751ad52be3782579e367f2
GIT binary patch
literal 912
zcmaKpO%8%E5QPUQ|KVrvh9h_c12J)@5f?5!k_Ygv*jGA8UW7?#`}+D#XXyDpKHiZ?
z@anI_W$gOrZRl(SB7!yMqx}#E4EC&a5=}m^g_!0^0`kEl)DOuIXM6D@@*xq*8vyqH
z)b0KTlmlgmH~xt7vG<k#Z1~z=OnyT76ZX;Ysy^;NC0^^$`kY?zKK;^vMtny1JAD$P
zc^BR{l;i*H`IJAW`~~?1`_TXD_wQ2@UlL!DU$GCpQ-4i-O}x_^JdQTRH^e)=(_c$`
LOT5z?_v4Aa+v(5&

literal 0
HcmV?d00001

diff --git a/tests/data/acpi/pc/DSDT b/tests/data/acpi/pc/DSDT
index cc1223773e9c459a8d2f20666c051a74338d40b7..bff625a25602fa954b5b395fea53e3c7dfaca485 100644
GIT binary patch
delta 85
zcmeyQ_fwC{CD<jTQk;Q-F=QiG057Ni!kGAAr+5MP$;rGe;+`zQh8FQ0@s2J*JPZuX
l3>=QZp?+M<lN)&@ggD~CY!RV&S1$v`0B2XP&5C@1oB+Xc6m$Rp

delta 65
zcmeyV_eqb-CD<jTNSuLzao$F*0A5ayg)#BLPVoW`laqN{#GF`y4K3n1;)6r|xR^QO
V9bJNW7#Nr*U*I#`Y|7`t2>@&@5ljF8

diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..f24fadd345c798ee5c17cdb66e0e1703bd1b2f26 100644
GIT binary patch
literal 912
zcmaKpOAdlC6h#XZC=fn#CoF*_7>J28jW}>wF2KFG3zs9lTPTnl;U#=7r>E_sr(1u2
z21<FK_R^jEx_w-`TFO&O;T_LLF4O@x8LMi!H}5Z^t6_Tah{H!Y?i2S%JoA7!BFgz1
zf~;?N{b8^}H2K=viyuzh`L7M``U{CiG=Ib#4X^gc{m10T<lDURCp`CW$T#HMd{o-?
zH~aE`PznCu9;f*enm;9;GDrTme_0zSBR|7ODR;g(@qEM!N8Z_gL4HBL%^N<3mgJY@
R+q~0XMSexT%^U0Ee0~)`g#iEn

literal 0
HcmV?d00001

diff --git a/tests/data/acpi/q35/DSDT b/tests/data/acpi/q35/DSDT
index 842533f53e6db40935c3cdecd1d182edba6c17d4..950c286b4c751f3c116a11d8892779942375e16b 100644
GIT binary patch
delta 59
zcmaFp@X3M8CD<jTNP&TYv2`OCrvjHhYfOBwQ@nsX>ttC4TZ!l<{$N9cc#e2SmmnSn
O1||j(wg6|p5C#C(xDBxY

delta 42
xcmez5@X&$FCD<h-QGtPh@##h`P6aMMmYDcpr+5K3mdUaTw(KHo0nUCQ3;+kH3ZMW0

diff --git a/tests/data/acpi/q35/ERST b/tests/data/acpi/q35/ERST
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..f24fadd345c798ee5c17cdb66e0e1703bd1b2f26 100644
GIT binary patch
literal 912
zcmaKpOAdlC6h#XZC=fn#CoF*_7>J28jW}>wF2KFG3zs9lTPTnl;U#=7r>E_sr(1u2
z21<FK_R^jEx_w-`TFO&O;T_LLF4O@x8LMi!H}5Z^t6_Tah{H!Y?i2S%JoA7!BFgz1
zf~;?N{b8^}H2K=viyuzh`L7M``U{CiG=Ib#4X^gc{m10T<lDURCp`CW$T#HMd{o-?
zH~aE`PznCu9;f*enm;9;GDrTme_0zSBR|7ODR;g(@qEM!N8Z_gL4HBL%^N<3mgJY@
R+q~0XMSexT%^U0Ee0~)`g#iEn

literal 0
HcmV?d00001

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index b3aaf76..dfb8523 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,7 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/pc/ERST",
-"tests/data/acpi/q35/ERST",
-"tests/data/acpi/microvm/ERST",
-"tests/data/acpi/pc/DSDT",
-"tests/data/acpi/q35/DSDT",
-"tests/data/acpi/microvm/DSDT",
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test
  2021-08-05 22:30 ` [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test Eric DeVolder
@ 2021-08-06 17:16   ` Eric DeVolder
  2021-08-27 21:45     ` Eric DeVolder
  2021-09-21 11:24   ` Igor Mammedov
  1 sibling, 1 reply; 34+ messages in thread
From: Eric DeVolder @ 2021-08-06 17:16 UTC (permalink / raw)
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

Well, I discovered today that running "make check" again resulted in
bios table mismatches. In looking into this further, I think I might
finally have an understanding as to how this is all to work. My
bios-tables-test-allowed-diff for step 1 now looks like:

"tests/data/acpi/pc/DSDT.acpierst",
"tests/data/acpi/pc/ERST",
"tests/data/acpi/q35/DSDT.acpierst",
"tests/data/acpi/q35/ERST",
"tests/data/acpi/microvm/ERST.pcie",

and with the corresponding empty files and by using the
  .variant = ".acpierst"
in bios-tables-test, I am able to run "make check" multiple times
now without failures.

So, that means patch 01/10 and 10/10 are wrong. I'm assuming there
will be other items to address, so I'll plan for these fixes in
v7.

My apologies,
eric




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test
  2021-08-06 17:16   ` Eric DeVolder
@ 2021-08-27 21:45     ` Eric DeVolder
  2021-09-02  6:34       ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Eric DeVolder @ 2021-08-27 21:45 UTC (permalink / raw)
  To: qemu-devel, imammedo
  Cc: ehabkost, konrad.wilk, mst, pbonzini, boris.ostrovsky, rth

Igor,
I'm not sure if I should post v7 with the correction to the tables,
or await your guidance/feedback on v6.
Thanks,
eric




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test
  2021-08-27 21:45     ` Eric DeVolder
@ 2021-09-02  6:34       ` Igor Mammedov
  0 siblings, 0 replies; 34+ messages in thread
From: Igor Mammedov @ 2021-09-02  6:34 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth

On Fri, 27 Aug 2021 16:45:15 -0500
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Igor,
> I'm not sure if I should post v7 with the correction to the tables,
> or await your guidance/feedback on v6.

Hopefully, I'll be back to reviewing patches (including yours) next week.

> Thanks,
> eric
> 
> 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 01/10] ACPI ERST: bios-tables-test.c steps 1 and 2
  2021-08-05 22:30 ` [PATCH v6 01/10] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
@ 2021-09-20 13:05   ` Igor Mammedov
  2021-10-04 20:37     ` Eric DeVolder
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2021-09-20 13:05 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth

On Thu,  5 Aug 2021 18:30:30 -0400
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Following the guidelines in tests/qtest/bios-tables-test.c, this
> change adds empty placeholder files per step 1 for the new ERST
> table, and excludes resulting changed files in bios-tables-test-allowed-diff.h
> per step 2.
> 

I'd move this right before 10/10

> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>

Acked-by: Igor Mammedov <imammedo@redhat.com>


> ---
>  tests/data/acpi/microvm/ERST                | 0
>  tests/data/acpi/pc/ERST                     | 0
>  tests/data/acpi/q35/ERST                    | 0
>  tests/qtest/bios-tables-test-allowed-diff.h | 6 ++++++
>  4 files changed, 6 insertions(+)
>  create mode 100644 tests/data/acpi/microvm/ERST
>  create mode 100644 tests/data/acpi/pc/ERST
>  create mode 100644 tests/data/acpi/q35/ERST
> 
> diff --git a/tests/data/acpi/microvm/ERST b/tests/data/acpi/microvm/ERST
> new file mode 100644
> index 0000000..e69de29
> diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
> new file mode 100644
> index 0000000..e69de29
> diff --git a/tests/data/acpi/q35/ERST b/tests/data/acpi/q35/ERST
> new file mode 100644
> index 0000000..e69de29
> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
> index dfb8523..b3aaf76 100644
> --- a/tests/qtest/bios-tables-test-allowed-diff.h
> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
> @@ -1 +1,7 @@
>  /* List of comma-separated changed AML files to ignore */
> +"tests/data/acpi/pc/ERST",
> +"tests/data/acpi/q35/ERST",
> +"tests/data/acpi/microvm/ERST",
> +"tests/data/acpi/pc/DSDT",
> +"tests/data/acpi/q35/DSDT",
> +"tests/data/acpi/microvm/DSDT",



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v6 02/10] ACPI ERST: specification for ERST support
  2021-08-05 22:30 ` [PATCH v6 02/10] ACPI ERST: specification for ERST support Eric DeVolder
@ 2021-09-20 13:38   ` Igor Mammedov
  2021-10-04 20:40     ` Eric DeVolder
  2021-10-06  6:58   ` [PATCH] " Ani Sinha
  2021-10-06  8:12   ` [PATCH v6 02/10] " Michael S. Tsirkin
  2 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2021-09-20 13:38 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: Daniel P. Berrangé,
	ehabkost, mst, konrad.wilk, qemu-devel, pbonzini,
	boris.ostrovsky, eblake, rth

On Thu,  5 Aug 2021 18:30:31 -0400
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Information on the implementation of the ACPI ERST support.
> 
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>

Modulo the missing parts, the documentation looks good to me, but
I'm tainted at this point (after so many reviews), so
libvirt folks (CCed) can take a look at it and see if
something needs to be changed here.

> ---
>  docs/specs/acpi_erst.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 147 insertions(+)
>  create mode 100644 docs/specs/acpi_erst.txt
> 
> diff --git a/docs/specs/acpi_erst.txt b/docs/specs/acpi_erst.txt
> new file mode 100644
> index 0000000..7f7544f
> --- /dev/null
> +++ b/docs/specs/acpi_erst.txt
> @@ -0,0 +1,147 @@
> +ACPI ERST DEVICE
> +================
> +
> +The ACPI ERST device is utilized to support the ACPI Error Record
> +Serialization Table, ERST, functionality. This feature is designed for
> +storing error records in persistent storage for future reference
> +and/or debugging.
> +
> +The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
> +(APEI)", and specifically subsection "Error Serialization", outlines a
> +method for storing error records into persistent storage.
> +
> +The format of error records is described in the UEFI specification[2],
> +in Appendix N "Common Platform Error Record".
> +
> +While the ACPI specification allows for an NVRAM "mode" (see
> +GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES) where non-volatile RAM is
> +directly exposed for direct access by the OS/guest, this device
> +implements the non-NVRAM "mode". This non-NVRAM "mode" is what is
> +implemented by most BIOS (since flash memory requires programming
> +operations in order to update its contents). Furthermore, as of the
> +time of this writing, Linux only supports the non-NVRAM "mode".
> +
> +
> +Background/Motivation
> +---------------------
> +
> +Linux uses the persistent storage filesystem, pstore, to record
> +information (eg. dmesg tail) upon panics and shutdowns.  Pstore is
> +independent of, and runs before, kdump.  In certain scenarios (ie.
> +hosts/guests with root filesystems on NFS/iSCSI where networking
> +software and/or hardware fails), pstore may contain information
> +available for post-mortem debugging.
> +
> +Two common storage backends for the pstore filesystem are ACPI ERST
> +and UEFI. Most BIOS implement ACPI ERST.  UEFI is not utilized in all
> +guests. With QEMU supporting ACPI ERST, it becomes a viable pstore
> +storage backend for virtual machines (as it is now for bare metal
> +machines).
> +
> +Enabling support for ACPI ERST facilitates a consistent method to
> +capture kernel panic information in a wide range of guests: from
> +resource-constrained microvms to very large guests, and in particular,
> +in direct-boot environments (which would lack UEFI run-time services).
> +
> +Note that Microsoft Windows also utilizes the ACPI ERST for certain
> +crash information, if available[3].
> +
> +
> +Configuration|Usage
> +-------------------
> +
> +To use ACPI ERST, a memory-backend-file object and acpi-erst device
> +can be created, for example:
> +
> + qemu ...
> + -object memory-backend-file,id=erstnvram,mem-path=acpi-erst.backing,size=0x10000,share=on \
> + -device acpi-erst,memdev=erstnvram
> +
> +For proper operation, the ACPI ERST device needs a memory-backend-file
> +object with the following parameters:
> +
> + - id: The id of the memory-backend-file object is used to associate
> +   this memory with the acpi-erst device.
> + - size: The size of the ACPI ERST backing storage. This parameter is
> +   required.
> + - mem-path: The location of the ACPI ERST backing storage file. This
> +   parameter is also required.
> + - share: The share=on parameter is required so that updates to the
> +   ERST backing store are written to the file.
> +
> +and ERST device:
> +
> + - memdev: Is the object id of the memory-backend-file.
> +
> +
> +PCI Interface
> +-------------
> +
> +The ERST device is a PCI device with two BARs, one for accessing the
> +programming registers, and the other for accessing the record exchange
> +buffer.
> +
> +BAR0 contains the programming interface consisting of ACTION and VALUE
> +64-bit registers.  All ERST actions/operations/side effects happen on
> +the write to the ACTION, by design. Any data needed by the action must
> +be placed into VALUE prior to writing ACTION.  Reading the VALUE
> +simply returns the register contents, which can be updated by a
> +previous ACTION.
> +
> +BAR1 contains the 8KiB record exchange buffer, which is the
> +implemented maximum record size.
> +
> +
> +Backend Storage Format
> +----------------------
> +
> +The backend storage is divided into fixed size "slots", 8KiB in
> +length, with each slot storing a single record.  Not all slots need to
> +be occupied, and they need not be occupied in a contiguous fashion.
> +The ability to clear/erase specific records allows for the formation
> +of unoccupied slots.
> +
> +Slot 0 is reserved for a backend storage header that identifies the
> +contents as ERST and also facilitates efficient access to the records.
> +Depending upon the size of the backend storage, additional slots will
> +be reserved to be a part of the slot 0 header. For example, at 8KiB,
> +the slot 0 header can accommodate 1024 records. Thus a storage size
> +above 8MiB (8KiB * 1024) requires an additional slot. In this
> +scenario, slot 0 and slot 1 form the backend storage header, and
> +records can be stored starting at slot 2.
> +
> +Below is an example layout of the backend storage format (for storage
> +size less than 8MiB). The size of the storage is a multiple of 8KiB,
> +and contains N number of slots to store records. The example below
> +shows two records (in CPER format) in the backend storage, while the
> +remaining slots are empty/available.
> +
> + Slot   Record
> +        +--------------------------------------------+
> +    0   | reserved: storage header                   |

typically reserved means 'not used', so I'd drop mentioning reserved
and leave it just as storage header.

Also, the header format should be described here.

> +        +--------------------------------------------+
> +    1   | empty/available                            |
> +        +--------------------------------------------+
> +    2   | CPER                                       |
> +        +--------------------------------------------+

how can one distinguish empty vs. used slots? (i.e. define 'empty' somewhere here)

> +    3   | CPER                                       |
> +        +--------------------------------------------+
> +  ...   |                                            |
> +        +--------------------------------------------+
> +    N   | empty/available                            |
> +        +--------------------------------------------+
> +        <------------------ 8KiB -------------------->
> +
> +
> +
> +References
> +----------
> +
> +[1] "Advanced Configuration and Power Interface Specification",
> +    version 4.0, June 2009.
> +
> +[2] "Unified Extensible Firmware Interface Specification",
> +    version 2.1, October 2008.
> +
> +[3] "Windows Hardware Error Architecture", specifically
> +    "Error Record Persistence Mechanism".




* Re: [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test
  2021-08-05 22:30 ` [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test Eric DeVolder
  2021-08-06 17:16   ` Eric DeVolder
@ 2021-09-21 11:24   ` Igor Mammedov
  2021-10-04 21:14     ` Eric DeVolder
  1 sibling, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2021-09-21 11:24 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth

On Thu,  5 Aug 2021 18:30:39 -0400
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Following the guidelines in tests/qtest/bios-tables-test.c, this
> is step 6, the re-generated ACPI tables binary blobs.


commit message should include ASL diff for new/changed content

for example see commit:
1aaef7d8092803 acpi: unit-test: Update WAET ACPI Table expected binaries

> 
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> ---
>  tests/data/acpi/microvm/ERST.pcie           | Bin 0 -> 912 bytes
>  tests/data/acpi/pc/DSDT                     | Bin 6002 -> 6009 bytes
>  tests/data/acpi/pc/ERST                     | Bin 0 -> 912 bytes
>  tests/data/acpi/q35/DSDT                    | Bin 8289 -> 8306 bytes
>  tests/data/acpi/q35/ERST                    | Bin 0 -> 912 bytes
>  tests/qtest/bios-tables-test-allowed-diff.h |   6 ------
>  6 files changed, 6 deletions(-)
>  create mode 100644 tests/data/acpi/microvm/ERST.pcie
> 
> diff --git a/tests/data/acpi/microvm/ERST.pcie b/tests/data/acpi/microvm/ERST.pcie
> new file mode 100644
> index 0000000000000000000000000000000000000000..d9a2b3211ab5893a50751ad52be3782579e367f2
> GIT binary patch
> literal 912
> zcmaKpO%8%E5QPUQ|KVrvh9h_c12J)@5f?5!k_Ygv*jGA8UW7?#`}+D#XXyDpKHiZ?
> z@anI_W$gOrZRl(SB7!yMqx}#E4EC&a5=}m^g_!0^0`kEl)DOuIXM6D@@*xq*8vyqH
> z)b0KTlmlgmH~xt7vG<k#Z1~z=OnyT76ZX;Ysy^;NC0^^$`kY?zKK;^vMtny1JAD$P
> zc^BR{l;i*H`IJAW`~~?1`_TXD_wQ2@UlL!DU$GCpQ-4i-O}x_^JdQTRH^e)=(_c$`
> LOT5z?_v4Aa+v(5&
> 
> literal 0
> HcmV?d00001
> 
> diff --git a/tests/data/acpi/pc/DSDT b/tests/data/acpi/pc/DSDT
> index cc1223773e9c459a8d2f20666c051a74338d40b7..bff625a25602fa954b5b395fea53e3c7dfaca485 100644
> GIT binary patch
> delta 85
> zcmeyQ_fwC{CD<jTQk;Q-F=QiG057Ni!kGAAr+5MP$;rGe;+`zQh8FQ0@s2J*JPZuX
> l3>=QZp?+M<lN)&@ggD~CY!RV&S1$v`0B2XP&5C@1oB+Xc6m$Rp  
> 
> delta 65
> zcmeyV_eqb-CD<jTNSuLzao$F*0A5ayg)#BLPVoW`laqN{#GF`y4K3n1;)6r|xR^QO
> V9bJNW7#Nr*U*I#`Y|7`t2>@&@5ljF8  
> 
> diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
> index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..f24fadd345c798ee5c17cdb66e0e1703bd1b2f26 100644
> GIT binary patch
> literal 912
> zcmaKpOAdlC6h#XZC=fn#CoF*_7>J28jW}>wF2KFG3zs9lTPTnl;U#=7r>E_sr(1u2  
> z21<FK_R^jEx_w-`TFO&O;T_LLF4O@x8LMi!H}5Z^t6_Tah{H!Y?i2S%JoA7!BFgz1
> zf~;?N{b8^}H2K=viyuzh`L7M``U{CiG=Ib#4X^gc{m10T<lDURCp`CW$T#HMd{o-?
> zH~aE`PznCu9;f*enm;9;GDrTme_0zSBR|7ODR;g(@qEM!N8Z_gL4HBL%^N<3mgJY@
> R+q~0XMSexT%^U0Ee0~)`g#iEn
> 
> literal 0
> HcmV?d00001
> 
> diff --git a/tests/data/acpi/q35/DSDT b/tests/data/acpi/q35/DSDT
> index 842533f53e6db40935c3cdecd1d182edba6c17d4..950c286b4c751f3c116a11d8892779942375e16b 100644
> GIT binary patch
> delta 59
> zcmaFp@X3M8CD<jTNP&TYv2`OCrvjHhYfOBwQ@nsX>ttC4TZ!l<{$N9cc#e2SmmnSn
> O1||j(wg6|p5C#C(xDBxY
> 
> delta 42
> xcmez5@X&$FCD<h-QGtPh@##h`P6aMMmYDcpr+5K3mdUaTw(KHo0nUCQ3;+kH3ZMW0
> 
> diff --git a/tests/data/acpi/q35/ERST b/tests/data/acpi/q35/ERST
> index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..f24fadd345c798ee5c17cdb66e0e1703bd1b2f26 100644
> GIT binary patch
> literal 912
> zcmaKpOAdlC6h#XZC=fn#CoF*_7>J28jW}>wF2KFG3zs9lTPTnl;U#=7r>E_sr(1u2  
> z21<FK_R^jEx_w-`TFO&O;T_LLF4O@x8LMi!H}5Z^t6_Tah{H!Y?i2S%JoA7!BFgz1
> zf~;?N{b8^}H2K=viyuzh`L7M``U{CiG=Ib#4X^gc{m10T<lDURCp`CW$T#HMd{o-?
> zH~aE`PznCu9;f*enm;9;GDrTme_0zSBR|7ODR;g(@qEM!N8Z_gL4HBL%^N<3mgJY@
> R+q~0XMSexT%^U0Ee0~)`g#iEn
> 
> literal 0
> HcmV?d00001
> 
> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
> index b3aaf76..dfb8523 100644
> --- a/tests/qtest/bios-tables-test-allowed-diff.h
> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
> @@ -1,7 +1 @@
>  /* List of comma-separated changed AML files to ignore */
> -"tests/data/acpi/pc/ERST",
> -"tests/data/acpi/q35/ERST",
> -"tests/data/acpi/microvm/ERST",
> -"tests/data/acpi/pc/DSDT",
> -"tests/data/acpi/q35/DSDT",
> -"tests/data/acpi/microvm/DSDT",




* Re: [PATCH v6 09/10] ACPI ERST: bios-tables-test testcase
  2021-08-05 22:30 ` [PATCH v6 09/10] ACPI ERST: bios-tables-test testcase Eric DeVolder
@ 2021-09-21 11:32   ` Igor Mammedov
  2021-10-04 21:13     ` Eric DeVolder
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2021-09-21 11:32 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth

On Thu,  5 Aug 2021 18:30:38 -0400
Eric DeVolder <eric.devolder@oracle.com> wrote:

> This change implements the test suite checks for the ERST table.
> 
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> ---
>  tests/qtest/bios-tables-test.c | 43 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
> index 51d3a4e..6ee78ec 100644
> --- a/tests/qtest/bios-tables-test.c
> +++ b/tests/qtest/bios-tables-test.c
> @@ -1378,6 +1378,45 @@ static void test_acpi_piix4_tcg_acpi_hmat(void)
>      test_acpi_tcg_acpi_hmat(MACHINE_PC);
>  }
>  
> +static void test_acpi_erst(const char *machine)
> +{
> +    test_data data;
> +
> +    memset(&data, 0, sizeof(data));
> +    data.machine = machine;
> +    /*data.variant = ".acpierst";*/
> +    test_acpi_one(" -object memory-backend-file,id=erstnvram,"
> +                    "mem-path=tests/acpi-erst.XXXXXX,size=0x10000,share=on"
> +                    " -device acpi-erst,memdev=erstnvram",
> +                  &data);
> +    free_test_data(&data);
> +}
> +
> +static void test_acpi_piix4_erst(void)
> +{
> +    test_acpi_erst(MACHINE_PC);
> +}
> +
> +static void test_acpi_q35_erst(void)
> +{
> +    test_acpi_erst(MACHINE_Q35);
> +}
> +
> +static void test_acpi_microvm_erst(void)
> +{
> +    test_data data;
> +
> +    test_acpi_microvm_prepare(&data);
> +    data.variant = ".pcie";
> +    data.tcg_only = true; /* need constant host-phys-bits */
> +    test_acpi_one(" -machine microvm,acpi=on,ioapic2=off,rtc=off,pcie=on "
> +                    "-object memory-backend-file,id=erstnvram,"
> +                    "mem-path=tests/acpi-erst.XXXXXX,size=0x10000,share=on "
                                 ^^^^
shouldn't the path be generated with g_dir_make_tmp() & co + corresponding cleanup

> +                    "-device acpi-erst,memdev=erstnvram",
> +                  &data);
> +    free_test_data(&data);
> +}
> +
>  static void test_acpi_virt_tcg(void)
>  {
>      test_data data = {
> @@ -1560,7 +1599,11 @@ int main(int argc, char *argv[])
>          qtest_add_func("acpi/microvm/oem-fields", test_acpi_oem_fields_microvm);
>          if (strcmp(arch, "x86_64") == 0) {
>              qtest_add_func("acpi/microvm/pcie", test_acpi_microvm_pcie_tcg);
> +            qtest_add_func("acpi/microvm/acpierst", test_acpi_microvm_erst);
>          }
> +        qtest_add_func("acpi/piix4/acpierst", test_acpi_piix4_erst);
> +        qtest_add_func("acpi/q35/acpierst", test_acpi_q35_erst);
> +
>      } else if (strcmp(arch, "aarch64") == 0) {
>          qtest_add_func("acpi/virt", test_acpi_virt_tcg);
>          qtest_add_func("acpi/virt/numamem", test_acpi_virt_tcg_numamem);




* Re: [PATCH v6 03/10] ACPI ERST: PCI device_id for ERST
  2021-08-05 22:30 ` [PATCH v6 03/10] ACPI ERST: PCI device_id for ERST Eric DeVolder
@ 2021-09-21 11:32   ` Igor Mammedov
  0 siblings, 0 replies; 34+ messages in thread
From: Igor Mammedov @ 2021-09-21 11:32 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth

On Thu,  5 Aug 2021 18:30:32 -0400
Eric DeVolder <eric.devolder@oracle.com> wrote:

> This change reserves the PCI device_id for the new ACPI ERST
> device.
> 
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>

Acked-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  include/hw/pci/pci.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index d0f4266..58101d8 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -108,6 +108,7 @@ extern bool pci_available;
>  #define PCI_DEVICE_ID_REDHAT_MDPY        0x000f
>  #define PCI_DEVICE_ID_REDHAT_NVME        0x0010
>  #define PCI_DEVICE_ID_REDHAT_PVPANIC     0x0011
> +#define PCI_DEVICE_ID_REDHAT_ACPI_ERST   0x0012
>  #define PCI_DEVICE_ID_REDHAT_QXL         0x0100
>  
>  #define FMT_PCIBUS                      PRIx64




* Re: [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature
  2021-08-05 22:30 ` [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature Eric DeVolder
@ 2021-09-21 15:30   ` Igor Mammedov
  2021-10-04 21:13     ` Eric DeVolder
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2021-09-21 15:30 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth

On Thu,  5 Aug 2021 18:30:34 -0400
Eric DeVolder <eric.devolder@oracle.com> wrote:

> This implements a PCI device for ACPI ERST. This implements the
> non-NVRAM "mode" of operation for ERST as it is supported by
> Linux and Windows.
> 
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> ---
>  hw/acpi/erst.c       | 750 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/acpi/meson.build  |   1 +
>  hw/acpi/trace-events |  15 ++
>  3 files changed, 766 insertions(+)
>  create mode 100644 hw/acpi/erst.c
> 
> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
> new file mode 100644
> index 0000000..eb4ab34
> --- /dev/null
> +++ b/hw/acpi/erst.c
> @@ -0,0 +1,750 @@
> +/*
> + * ACPI Error Record Serialization Table, ERST, Implementation
> + *
> + * ACPI ERST introduced in ACPI 4.0, June 16, 2009.
> + * ACPI Platform Error Interfaces : Error Serialization
> + *
> + * Copyright (c) 2021 Oracle and/or its affiliates.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/qdev-core.h"
> +#include "exec/memory.h"
> +#include "qom/object.h"
> +#include "hw/pci/pci.h"
> +#include "qom/object_interfaces.h"
> +#include "qemu/error-report.h"
> +#include "migration/vmstate.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/acpi-defs.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "exec/address-spaces.h"
> +#include "sysemu/hostmem.h"
> +#include "hw/acpi/erst.h"
> +#include "trace.h"
> +
> +/* ACPI 4.0: Table 17-16 Serialization Actions */
> +#define ACTION_BEGIN_WRITE_OPERATION         0x0
> +#define ACTION_BEGIN_READ_OPERATION          0x1
> +#define ACTION_BEGIN_CLEAR_OPERATION         0x2
> +#define ACTION_END_OPERATION                 0x3
> +#define ACTION_SET_RECORD_OFFSET             0x4
> +#define ACTION_EXECUTE_OPERATION             0x5
> +#define ACTION_CHECK_BUSY_STATUS             0x6
> +#define ACTION_GET_COMMAND_STATUS            0x7
> +#define ACTION_GET_RECORD_IDENTIFIER         0x8
> +#define ACTION_SET_RECORD_IDENTIFIER         0x9
> +#define ACTION_GET_RECORD_COUNT              0xA
> +#define ACTION_BEGIN_DUMMY_WRITE_OPERATION   0xB
> +#define ACTION_RESERVED                      0xC
> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE   0xD
> +#define ACTION_GET_ERROR_LOG_ADDRESS_LENGTH  0xE
> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES 0xF
> +#define ACTION_GET_EXECUTE_OPERATION_TIMINGS 0x10
> +
> +/* ACPI 4.0: Table 17-17 Command Status Definitions */
> +#define STATUS_SUCCESS                0x00
> +#define STATUS_NOT_ENOUGH_SPACE       0x01
> +#define STATUS_HARDWARE_NOT_AVAILABLE 0x02
> +#define STATUS_FAILED                 0x03
> +#define STATUS_RECORD_STORE_EMPTY     0x04
> +#define STATUS_RECORD_NOT_FOUND       0x05
> +
> +
> +/* UEFI 2.1: Appendix N Common Platform Error Record */
> +#define UEFI_CPER_RECORD_MIN_SIZE 128U
> +#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U
> +#define UEFI_CPER_RECORD_ID_OFFSET 96U
> +#define IS_UEFI_CPER_RECORD(ptr) \
> +    (((ptr)[0] == 'C') && \
> +     ((ptr)[1] == 'P') && \
> +     ((ptr)[2] == 'E') && \
> +     ((ptr)[3] == 'R'))
> +#define THE_UEFI_CPER_RECORD_ID(ptr) \
> +    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
> +
> +/*
> + * This implementation is an ACTION (cmd) and VALUE (data)
> + * interface consisting of just two 64-bit registers.
> + */
> +#define ERST_REG_SIZE (16UL)
> +#define ERST_ACTION_OFFSET (0UL) /* action (cmd) */
> +#define ERST_VALUE_OFFSET  (8UL) /* argument/value (data) */
> +
> +/*
> + * ERST_RECORD_SIZE is the buffer size for exchanging ERST
> + * record contents. Thus, it defines the maximum record size.
> + * As this is mapped through a PCI BAR, it must be a power of
> + * two and larger than UEFI_CPER_RECORD_MIN_SIZE.
> + * The backing storage is divided into fixed size "slots",
> + * each ERST_RECORD_SIZE in length, and each "slot"
> + * storing a single record. No attempt at optimizing storage
> + * through compression, compaction, etc is attempted.
> + * NOTE that slot 0 is reserved for the backing storage header.
> + * Depending upon the size of the backing storage, additional
> + * slots will be part of the slot 0 header in order to account
> + * for a record_id for each available remaining slot.
> + */
> +/* 8KiB records, not too small, not too big */
> +#define ERST_RECORD_SIZE (8192UL)
> +
> +#define ACPI_ERST_MEMDEV_PROP "memdev"
> +
> +/*
> + * From the ACPI ERST spec sections:
> + * A record id of all 0s is used to indicate
> + * 'unspecified' record id.
> + * A record id of all 1s is used to indicate
> + * empty or end.
> + */
> +#define ERST_UNSPECIFIED_RECORD_ID (0UL)
> +#define ERST_EMPTY_END_RECORD_ID (~0UL)
> +#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
> +#define ERST_IS_VALID_RECORD_ID(rid) \
> +    ((rid != ERST_UNSPECIFIED_RECORD_ID) && \
> +     (rid != ERST_EMPTY_END_RECORD_ID))
> +
> +typedef struct erst_storage_header_s {

> +#define ERST_STORE_MAGIC 0x524F545354535245UL

move it out of structure definition,
also where value comes from? (perhaps something starting
with ERST... would be more self-describing)
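
On where the value might come from: read least significant byte first (the order a little-endian host stores it), the constant appears to spell the ASCII string "ERSTSTOR". The sketch below only decodes the bytes; magic_to_le_string() is a hypothetical helper for illustration and is not part of the patch, and the intent behind the value is an inference, not something the patch states.

```c
#include <stddef.h>
#include <stdint.h>

#define ERST_STORE_MAGIC 0x524F545354535245ULL

/*
 * Copy the 8 bytes of 'magic' into out[], least significant byte
 * first (little-endian storage order), and NUL-terminate the result.
 */
static void magic_to_le_string(uint64_t magic, char out[9])
{
    for (size_t i = 0; i < 8; i++) {
        out[i] = (char)((magic >> (8 * i)) & 0xFF);
    }
    out[8] = '\0';
}
```

On a big-endian host the same constant would land in the file in the opposite byte order, which ties in with the endianness concern for the header as a whole.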

> +    uint64_t magic;
> +    uint32_t record_size;
> +    uint32_t record_offset; /* offset to record storage beyond header */
> +    uint16_t version;
> +    uint16_t reserved;
> +    uint32_t record_count;
> +    uint64_t map[]; /* contains record_ids, and position indicates index */
> +} erst_storage_header_t;
docs/devel/style.rst: Typedefs

also, given it's used as the header layout in storage,
set the packed attribute on the structure
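
A rough sketch of what both suggestions could look like together, assuming a GCC/Clang compiler. The type name ExERSTStorageHeader is made up for illustration; in-tree code would normally spell the attribute with QEMU's QEMU_PACKED macro instead of the raw __attribute__.

```c
#include <stddef.h>
#include <stdint.h>

#define ERST_STORE_MAGIC 0x524F545354535245ULL /* moved out of the struct */

/* Hypothetical CamelCase name; fields are copied from the patch. */
typedef struct ExERSTStorageHeader {
    uint64_t magic;
    uint32_t record_size;
    uint32_t record_offset; /* offset to record storage beyond header */
    uint16_t version;
    uint16_t reserved;
    uint32_t record_count;
    uint64_t map[]; /* contains record_ids, and position indicates index */
} __attribute__((packed)) ExERSTStorageHeader;

/* The fixed fields add up to 24 bytes, and map[] starts right after them. */
_Static_assert(sizeof(ExERSTStorageHeader) == 24, "unexpected header size");
_Static_assert(offsetof(ExERSTStorageHeader, map) == 24, "unexpected map offset");
```

With these particular field sizes the natural layout already has no padding, so packing mostly serves to pin the on-disk layout down explicitly.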

> +
> +/*
> + * Object cast macro
> + */
> +#define ACPIERST(obj) \
> +    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
> +
> +/*
> + * Main ERST device state structure
> + */
> +typedef struct {
> +    PCIDevice parent_obj;
> +
> +    /* Backend storage */
> +    HostMemoryBackend *hostmem;
> +    MemoryRegion *hostmem_mr;
> +
> +    /* Programming registers */
> +    MemoryRegion iomem;
> +
> +    /* Exchange buffer */
> +    Object *exchange_obj;
> +    HostMemoryBackend *exchange;
> +    MemoryRegion *exchange_mr;
> +    uint32_t storage_size;
> +
> +    /* Interface state */
> +    uint8_t operation;
> +    uint8_t busy_status;
> +    uint8_t command_status;
> +    uint32_t record_offset;
> +    uint64_t reg_action;
> +    uint64_t reg_value;
> +    uint64_t record_identifier;
> +    erst_storage_header_t *header;
> +    unsigned next_record_index;
> +    unsigned first_record_index;
> +    unsigned last_record_index;
> +
> +} ERSTDeviceState;
> +
> +/*******************************************************************/
> +/*******************************************************************/
> +
> +static uint8_t *get_nvram_ptr_by_index(ERSTDeviceState *s, unsigned index)
> +{
> +    uint8_t *rc = NULL;
> +    off_t offset = (index * ERST_RECORD_SIZE);

> +    if ((offset + ERST_RECORD_SIZE) <= s->storage_size) {

it looks like 'index' passed by caller is always valid, if that's the case
convert this to
        g_assert((offset + ERST_RECORD_SIZE) <= s->storage_size);

also shouldn't <= be just <
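
To help settle the comparison question, here is the arithmetic as a small standalone sketch. It assumes storage_size is an exact multiple of ERST_RECORD_SIZE; slot_in_bounds() is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ERST_RECORD_SIZE 8192UL

/* True when slot 'index' lies entirely inside 'storage_size' bytes. */
static bool slot_in_bounds(unsigned index, uint32_t storage_size)
{
    uint64_t offset = (uint64_t)index * ERST_RECORD_SIZE;

    /* The end of the last slot coincides with storage_size. */
    return offset + ERST_RECORD_SIZE <= storage_size;
}
```

With size=0x10000 there are 8 slots; for index 7 the end offset is exactly 0x10000, so under this assumption '<=' admits the last slot while '<' would reject it.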


> +        if (s->hostmem_mr) {
can hostmem_mr be NULL, when this function is called?
if not I'd drop condition.

> +            uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(s->hostmem_mr);
> +            rc = p + offset;
> +        }
> +    }
> +    return rc;
> +}
> +
> +static void make_erst_storage_header(ERSTDeviceState *s)
> +{
> +    erst_storage_header_t *header = s->header;
> +    unsigned mapsz, headersz;
> +
> +    header->magic = ERST_STORE_MAGIC;
> +    header->record_size = ERST_RECORD_SIZE;
> +    header->version = 0x0101;

maybe 0 or 1 to avoid question about what previous versions are

> +    header->reserved = 0x0000;
s/0x.../0/

> +
> +    /* Compute mapsize */
> +    mapsz = s->storage_size / ERST_RECORD_SIZE;
> +    mapsz *= sizeof(uint64_t);
> +    /* Compute header+map size */
> +    headersz = sizeof(erst_storage_header_t) + mapsz;

> +    /* Round up to nearest integer multiple of ERST_RECORD_SIZE */
> +    headersz += (ERST_RECORD_SIZE - 1);
> +    headersz /= ERST_RECORD_SIZE;
> +    headersz *= ERST_RECORD_SIZE;
git grep ROUND_UP
may be of help here
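
As an illustration of the rounding step written with a single macro, a sketch follows. EX_ROUND_UP, EX_HEADER_FIXED_SIZE and ex_header_size() are local stand-ins invented for this example (the 24 is the sum of the fixed header fields in this patch); the in-tree macro may be defined differently.

```c
#include <stddef.h>
#include <stdint.h>

#define ERST_RECORD_SIZE 8192UL
#define EX_HEADER_FIXED_SIZE 24UL /* assumed: bytes before map[] */

/* Round n up to the next multiple of d (d > 0). */
#define EX_ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

/* Header plus one uint64_t map entry per slot, rounded to whole slots. */
static size_t ex_header_size(size_t storage_size)
{
    size_t mapsz = (storage_size / ERST_RECORD_SIZE) * sizeof(uint64_t);

    return EX_ROUND_UP(EX_HEADER_FIXED_SIZE + mapsz, ERST_RECORD_SIZE);
}
```

For the 64KiB store used in the examples this gives one slot of header; a 16MiB store needs three.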

> +    header->record_offset = headersz;
> +
> +    /*
> +     * The HostMemoryBackend initializes contents to zero,
> +     * so all record_ids stashed in the map are zero'd.
> +     * As well the record_count is zero. Properly initialized.
> +     */
> +}
> +
> +static void check_erst_backend_storage(ERSTDeviceState *s, Error **errp)
> +{
> +    erst_storage_header_t *header;
> +
> +    header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);
optionally check/assert that it's 64bit aligned,
if it's not you risk getting killed by SIGBUS on some hosts,
since you're accessing fields directly.
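
A minimal sketch of such a check in plain C; ex_is_aligned() is a hypothetical helper name, and in the patch the result would presumably feed an assert or an error path.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when pointer 'p' is a multiple of 'alignment' bytes. */
static bool ex_is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

For the header pointer the call would be along the lines of ex_is_aligned(header, sizeof(uint64_t)).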

> +    s->header = header;
> +
> +    /* Check if header is uninitialized */
> +    if (header->magic == 0UL) { /* HostMemoryBackend inits to 0 */
> +        make_erst_storage_header(s);
> +    }
> +
> +    if (!(
> +        (header->magic == ERST_STORE_MAGIC) &&
> +        (header->record_size == ERST_RECORD_SIZE) &&
> +        ((header->record_offset % ERST_RECORD_SIZE) == 0) &&
> +        (header->version == 0x0101) &&
> +        (header->reserved == 0x0000)
> +        )) {
> +        error_setg(errp, "ERST backend storage header is invalid");
> +    }
> +
> +    /* Compute offset of first and last record storage slot */
> +    s->first_record_index = header->record_offset / ERST_RECORD_SIZE;
> +    s->last_record_index = (s->storage_size / ERST_RECORD_SIZE);

applies to whole patch/series,
if mmapped header values are interpreted as integers you shall
take care of endianness, i.e. use cpu_to_foo/foo_to_cpu for access

and document file endianness in doc (2/10)
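
To make the point concrete, here is a standalone sketch of reading and writing a 64-bit header field with a fixed little-endian file layout, independent of host byte order. ex_load_le64() and ex_store_le64() are illustrative names only; in-tree code would normally use the existing cpu_to_le64()/le64_to_cpu() style helpers, and little-endian is an assumed choice here, not something the patch specifies.

```c
#include <stdint.h>

/* Read a 64-bit value stored least significant byte first. */
static uint64_t ex_load_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Store a 64-bit value least significant byte first. */
static void ex_store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}
```

A backing file written this way reads back the same on big- and little-endian hosts.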

> +}
> +
> +static void set_erst_map_by_index(ERSTDeviceState *s, unsigned index,
> +    uint64_t record_id)

update_[cache|map]_[entry|record_id]() or something like this might be
a better description; 'erst' and 'index' don't really add much here as it's
clear from context.


> +{
> +    if (index < s->last_record_index) {
> +        s->header->map[index] = record_id;
> +    }
> +}
> +
> +static unsigned lookup_erst_record(ERSTDeviceState *s,
> +    uint64_t record_identifier)
> +{
> +    unsigned rc = 0; /* 0 not a valid index */
> +    unsigned index = s->first_record_index;
> +
> +    /* Find the record_identifier in the map */
> +    if (record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
> +        /*
> +         * Count number of valid records encountered, and
> +         * short-circuit the loop if identifier not found
> +         */
> +        unsigned count = 0;
> +        for (; index < s->last_record_index &&
> +                count < s->header->record_count; ++index) {
> +            uint64_t map_record_identifier = s->header->map[index];
I'd drop map_record_identifier and use s->header->map[index] directly,
i.e
   if (s->header->map[index] ...

> +            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
> +                ++count;
> +            }
> +            if (map_record_identifier == record_identifier) {
> +                rc = index;
> +                break;
> +            }
> +        }
> +    } else {
> +        /* Find first available unoccupied slot */
> +        for (; index < s->last_record_index; ++index) {
> +            if (s->header->map[index] == ERST_UNSPECIFIED_RECORD_ID) {
> +                rc = index;
> +                break;
> +            }
> +        }
> +    }
> +
> +    return rc;
> +}

what's the reason for combining lookup and allocate ops?
if they were separated it would be easier to follow the code.
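
One possible shape for the split, as a sketch only. ExRecordMap is a hypothetical stand-in for the few ERSTDeviceState fields involved, the function names are invented, and the record_count short-circuit from the patch is left out for brevity.

```c
#include <stdint.h>

#define ERST_UNSPECIFIED_RECORD_ID 0ULL

/* Hypothetical stand-in for the relevant device state. */
typedef struct {
    uint64_t *map;               /* record_id per slot, 0 means unoccupied */
    unsigned first_record_index; /* first slot after the header */
    unsigned last_record_index;  /* one past the final slot */
} ExRecordMap;

/* Lookup: slot index holding record_id, or 0 (never a record slot). */
static unsigned ex_find_record(const ExRecordMap *m, uint64_t record_id)
{
    for (unsigned i = m->first_record_index; i < m->last_record_index; i++) {
        if (m->map[i] == record_id) {
            return i;
        }
    }
    return 0;
}

/* Allocate: first unoccupied slot index, or 0 when the store is full. */
static unsigned ex_find_free_slot(const ExRecordMap *m)
{
    for (unsigned i = m->first_record_index; i < m->last_record_index; i++) {
        if (m->map[i] == ERST_UNSPECIFIED_RECORD_ID) {
            return i;
        }
    }
    return 0;
}
```

The write path would then call ex_find_record() first and fall back to ex_find_free_slot(), which keeps the special meaning of record id 0 out of the lookup function.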

> +
> +/* ACPI 4.0: 17.4.2.3 Operations - Clearing */
> +static unsigned clear_erst_record(ERSTDeviceState *s)
> +{
> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
> +    unsigned index;
> +
> +    /* Check for valid record identifier */
> +    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
> +        return STATUS_FAILED;
> +    }
> +
> +    index = lookup_erst_record(s, s->record_identifier);
> +    if (index) {
> +        /* No need to wipe record, just invalidate its map entry */
> +        set_erst_map_by_index(s, index, ERST_UNSPECIFIED_RECORD_ID);
> +        s->header->record_count -= 1;
> +        rc = STATUS_SUCCESS;
> +    }
> +
> +    return rc;
> +}
> +
> +/* ACPI 4.0: 17.4.2.2 Operations - Reading */
> +static unsigned read_erst_record(ERSTDeviceState *s)
> +{
> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
> +    unsigned index;
> +
> +    /* Check record boundary wihin exchange buffer */
                                ^^^ typo

> +    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
> +        return STATUS_FAILED;
> +    }
> +
> +    /* Check for valid record identifier */
> +    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
> +        return STATUS_FAILED;
> +    }
> +
> +    index = lookup_erst_record(s, s->record_identifier);
> +    if (index) {
> +        uint8_t *ptr;
> +        uint8_t *record = ((uint8_t *)
> +            memory_region_get_ram_ptr(s->exchange_mr) +
> +            s->record_offset);
> +        ptr = get_nvram_ptr_by_index(s, index);
> +        memcpy(record, ptr, ERST_RECORD_SIZE - s->record_offset);

if record_offset is large enough that record won't fit,
this will copy truncated record into the exchange buffer.

Maybe it's better to fail whole op?
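
A sketch of what failing the whole op could look like: decode the stored record's length and refuse the read when it would not fit behind record_offset. ex_record_fits() is a hypothetical helper, and it assumes the 32-bit length field at UEFI_CPER_RECORD_LENGTH_OFFSET is stored little-endian.

```c
#include <stdbool.h>
#include <stdint.h>

#define ERST_RECORD_SIZE 8192U
#define UEFI_CPER_RECORD_MIN_SIZE 128U
#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U

/*
 * 'slot' points at ERST_RECORD_SIZE bytes of stored record.
 * Returns true when the whole record fits in the exchange buffer
 * starting at 'record_offset'.
 */
static bool ex_record_fits(const uint8_t *slot, uint32_t record_offset)
{
    const uint8_t *p = slot + UEFI_CPER_RECORD_LENGTH_OFFSET;
    uint32_t length = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                      ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

    if (length < UEFI_CPER_RECORD_MIN_SIZE || length > ERST_RECORD_SIZE) {
        return false; /* implausible length, treat as failure */
    }
    return record_offset <= ERST_RECORD_SIZE - length;
}
```

The read path could return STATUS_FAILED when this is false and copy only 'length' bytes otherwise.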

> +        rc = STATUS_SUCCESS;
> +    }
> +
> +    return rc;
> +}
> +
> +/* ACPI 4.0: 17.4.2.1 Operations - Writing */
> +static unsigned write_erst_record(ERSTDeviceState *s)
> +{
> +    unsigned rc = STATUS_FAILED;
> +    unsigned index;
> +    uint64_t record_identifier;
> +    uint8_t *record;
> +    uint8_t *ptr = NULL;
> +    bool record_found = false;
> +
> +    /* Check record boundary wihin exchange buffer */
ditto, typo

> +    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
> +        return STATUS_FAILED;
> +    }
> +
> +    /* Extract record identifier */
> +    record = ((uint8_t *)memory_region_get_ram_ptr(s->exchange_mr)
> +        + s->record_offset);
> +    record_identifier = THE_UEFI_CPER_RECORD_ID(record);
potentially unaligned access to int, should use memcpy()
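
For reference, a sketch of the memcpy() form of the same read. ex_cper_record_id() is an illustrative name; the value comes back in host byte order, so the endianness question still applies on top of this.

```c
#include <stdint.h>
#include <string.h>

#define UEFI_CPER_RECORD_ID_OFFSET 96U

/* Fetch the 64-bit record id without assuming 'record' is aligned. */
static uint64_t ex_cper_record_id(const uint8_t *record)
{
    uint64_t id;

    memcpy(&id, record + UEFI_CPER_RECORD_ID_OFFSET, sizeof(id));
    return id;
}
```

Compilers typically turn the fixed-size memcpy() into a plain load on hosts where unaligned access is fine, so there should be no cost to doing it this way.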

> +
> +    /* Check for valid record identifier */
> +    if (!ERST_IS_VALID_RECORD_ID(record_identifier)) {
> +        return STATUS_FAILED;
> +    }
> +
> +    index = lookup_erst_record(s, record_identifier);
> +    if (index) {
> +        /* Record found, overwrite existing record */
> +        ptr = get_nvram_ptr_by_index(s, index);
> +        record_found = true;
> +    } else {
> +        /* Record not found, not an overwrite, allocate for write */
> +        index = lookup_erst_record(s, ERST_UNSPECIFIED_RECORD_ID);
> +        if (index) {
> +            ptr = get_nvram_ptr_by_index(s, index);
> +        } else {
> +            rc = STATUS_NOT_ENOUGH_SPACE;
> +        }
> +    }
> +    if (ptr) {
> +        memcpy(ptr, record, ERST_RECORD_SIZE - s->record_offset);

> +        if (0 != s->record_offset) {
> +            memset(&ptr[ERST_RECORD_SIZE - s->record_offset],
> +                0xFF, s->record_offset);
> +        }
you've lost me here, care to explain what's going on here?

> +        if (!record_found) {
> +            s->header->record_count += 1; /* writing new record */
> +        }
> +        set_erst_map_by_index(s, index, record_identifier);
> +        rc = STATUS_SUCCESS;
> +    }
> +
> +    return rc;
> +}
> +
> +/* ACPI 4.0: 17.4.2.2 Operations - Reading "During boot..." */
> +static unsigned next_erst_record(ERSTDeviceState *s,
> +    uint64_t *record_identifier)
s/record_identifier/found.../

> +{
> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
> +    unsigned index = s->next_record_index;
> +
> +    *record_identifier = ERST_EMPTY_END_RECORD_ID;
> +
> +    if (s->header->record_count) {
> +        for (; index < s->last_record_index; ++index) {
> +            uint64_t map_record_identifier;
and then s/map_record_identifier/record_identifier/

the same applies to other occurrences within patch
(map_record_identifier is a bit confusing) or drop it
and use s->header->map[index] directly

> +            map_record_identifier = s->header->map[index];
> +            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
> +                    /* where to start next time */
> +                    s->next_record_index = index + 1;
> +                    *record_identifier = map_record_identifier;
> +                    rc = STATUS_SUCCESS;
> +                    break;
> +            }
> +        }
> +    }
> +    if (rc != STATUS_SUCCESS) {
> +        if (s->next_record_index == s->first_record_index) {
> +            /*
> +             * next_record_identifier is unchanged, no records found
> +             * and *record_identifier contains EMPTY_END id
> +             */
> +            rc = STATUS_RECORD_STORE_EMPTY;
> +        }
> +        /* at end/scan complete, reset */
> +        s->next_record_index = s->first_record_index;
> +    }

Table 17-16 says return existing error or ERST_EMPTY_END_RECORD_ID,
but nothing about the op returning an error, so I'd assume status
should always be STATUS_SUCCESS for GET_RECORD_IDENTIFIER.

Advancing to the next record is part of the record READ op and
not part of GET_RECORD_IDENTIFIER as it's done here.
  "The steps performed by the platform to carry out ...
     2. ..
        c. If the specified error record does not exist,
           ... update the status register’s Identifier field with the
           identifier of the ‘first’ error record
     4. Record the Identifier of the ‘next’ valid error record ...
  "


> +
> +    return rc;
> +}
> +
> +/*******************************************************************/
> +
> +static uint64_t erst_rd_reg64(hwaddr addr,
> +    uint64_t reg, unsigned size)
> +{
> +    uint64_t rdval;
> +    uint64_t mask;
> +    unsigned shift;
> +
> +    if (size == sizeof(uint64_t)) {
> +        /* 64b access */
> +        mask = 0xFFFFFFFFFFFFFFFFUL;
> +        shift = 0;
> +    } else {
> +        /* 32b access */
> +        mask = 0x00000000FFFFFFFFUL;
> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
> +    }
> +
> +    rdval = reg;
> +    rdval >>= shift;
> +    rdval &= mask;
> +
> +    return rdval;
> +}
> +
> +static uint64_t erst_wr_reg64(hwaddr addr,
> +    uint64_t reg, uint64_t val, unsigned size)
> +{
> +    uint64_t wrval;
> +    uint64_t mask;
> +    unsigned shift;
> +    if (size == sizeof(uint64_t)) {
> +        /* 64b access */
> +        mask = 0xFFFFFFFFFFFFFFFFUL;
> +        shift = 0;
> +    } else {
> +        /* 32b access */
> +        mask = 0x00000000FFFFFFFFUL;
> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
> +    }
> +
> +    val &= mask;
> +    val <<= shift;
> +    mask <<= shift;
> +    wrval = reg;
> +    wrval &= ~mask;
> +    wrval |= val;
> +
> +    return wrval;
> +}
> +
> +static void erst_reg_write(void *opaque, hwaddr addr,
> +    uint64_t val, unsigned size)
> +{
> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> +
> +    /*
> +     * NOTE: All actions/operations/side effects happen on the WRITE,
> +     * by design. The READs simply return the reg_value contents.

point to spec, pls.

> +     */
> +    trace_acpi_erst_reg_write(addr, val, size);
> +
> +    switch (addr) {
> +    case ERST_VALUE_OFFSET + 0:
> +    case ERST_VALUE_OFFSET + 4:
> +        s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
> +        break;
> +    case ERST_ACTION_OFFSET + 0:

> +/*  case ERST_ACTION_OFFSET+4: as coded, not really a 64b register */

what does this mean? 
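
For reference, as far as I can tell from erst_rd_reg64()/erst_wr_reg64()
above, a 32-bit access at offset +4 addresses the upper half of the 64-bit
register. A minimal standalone sketch of that behaviour (the function names
are made up for illustration, they are not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Return the 64-bit register, or the 32-bit half selected by bit 2 of addr. */
static uint64_t reg64_read(uint64_t addr, uint64_t reg, unsigned size)
{
    assert(size == 4 || size == 8);
    if (size == 8) {
        return reg;
    }
    unsigned shift = (addr & 0x4) ? 32 : 0;
    return (reg >> shift) & 0xFFFFFFFFULL;
}

/* Return the new register value after a 32-bit or 64-bit write. */
static uint64_t reg64_write(uint64_t addr, uint64_t reg, uint64_t val,
                            unsigned size)
{
    assert(size == 4 || size == 8);
    if (size == 8) {
        return val;
    }
    unsigned shift = (addr & 0x4) ? 32 : 0;
    uint64_t mask = 0xFFFFFFFFULL << shift;
    /* Keep the untouched half, replace the addressed half. */
    return (reg & ~mask) | ((val << shift) & mask);
}
```

With only 'ERST_ACTION_OFFSET + 0' handled in the write path, a 32-bit write
to +4 falls through to the default case and is ignored.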

> +        switch (val) {
> +        case ACTION_BEGIN_WRITE_OPERATION:
> +        case ACTION_BEGIN_READ_OPERATION:
> +        case ACTION_BEGIN_CLEAR_OPERATION:
> +        case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> +        case ACTION_END_OPERATION:
> +            s->operation = val;
> +            break;
> +        case ACTION_SET_RECORD_OFFSET:
> +            s->record_offset = s->reg_value;
> +            break;
> +        case ACTION_EXECUTE_OPERATION:
> +            if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
> +                s->busy_status = 1;
> +                switch (s->operation) {
> +                case ACTION_BEGIN_WRITE_OPERATION:
> +                    s->command_status = write_erst_record(s);
> +                    break;
> +                case ACTION_BEGIN_READ_OPERATION:
> +                    s->command_status = read_erst_record(s);
> +                    break;
> +                case ACTION_BEGIN_CLEAR_OPERATION:
> +                    s->command_status = clear_erst_record(s);
> +                    break;
> +                case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> +                    s->command_status = STATUS_SUCCESS;
> +                    break;
> +                case ACTION_END_OPERATION:
> +                    s->command_status = STATUS_SUCCESS;
> +                    break;
> +                default:
> +                    s->command_status = STATUS_FAILED;
> +                    break;
> +                }
> +                s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;
                   shouldn't happen in case of Read op

"
17.4.2.2
4. Record the Identifier of the ‘next’ valid error record that resides on the persistent store. This
allows OSPM to retrieve a valid record identifier by executing a GET_RECORD_IDENTIFIER
operation.
"

> +                s->busy_status = 0;
> +            }
> +            break;
> +        case ACTION_CHECK_BUSY_STATUS:
> +            s->reg_value = s->busy_status;
> +            break;
> +        case ACTION_GET_COMMAND_STATUS:
> +            s->reg_value = s->command_status;
> +            break;
> +        case ACTION_GET_RECORD_IDENTIFIER:
> +            s->command_status = next_erst_record(s, &s->reg_value);
> +            break;
> +        case ACTION_SET_RECORD_IDENTIFIER:
> +            s->record_identifier = s->reg_value;
> +            break;
> +        case ACTION_GET_RECORD_COUNT:
> +            s->reg_value = s->header->record_count;
> +            break;
> +        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
> +            s->reg_value = (hwaddr)pci_get_bar_addr(PCI_DEVICE(s), 1);
> +            break;
> +        case ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
> +            s->reg_value = ERST_RECORD_SIZE;
> +            break;
> +        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
> +            s->reg_value = 0x0; /* intentional, not NVRAM mode */
> +            break;
> +        case ACTION_GET_EXECUTE_OPERATION_TIMINGS:
> +            s->reg_value =
> +                (100ULL << 32) | /* 100us max time */
> +                (10ULL  <<  0) ; /*  10us min time */
> +            break;
> +        default:
> +            /* Unknown action/command, NOP */
> +            break;
> +        }
> +        break;
> +    default:
> +        /* This should not happen, but if it does, NOP */
> +        break;
> +    }
> +}
> +
> +static uint64_t erst_reg_read(void *opaque, hwaddr addr,
> +                                unsigned size)
> +{
> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> +    uint64_t val = 0;
> +
> +    switch (addr) {
> +    case ERST_ACTION_OFFSET + 0:
> +    case ERST_ACTION_OFFSET + 4:
> +        val = erst_rd_reg64(addr, s->reg_action, size);
> +        break;
> +    case ERST_VALUE_OFFSET + 0:
> +    case ERST_VALUE_OFFSET + 4:
> +        val = erst_rd_reg64(addr, s->reg_value, size);
> +        break;
> +    default:
> +        break;
> +    }
> +    trace_acpi_erst_reg_read(addr, val, size);
> +    return val;
> +}
> +
> +static const MemoryRegionOps erst_reg_ops = {
> +    .read = erst_reg_read,
> +    .write = erst_reg_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
> +/*******************************************************************/
> +/*******************************************************************/
> +static int erst_post_load(void *opaque, int version_id)
> +{
> +    ERSTDeviceState *s = opaque;
> +
> +    /* Recompute pointer to header */
> +    s->header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);
> +    trace_acpi_erst_post_load(s->header);
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription erst_vmstate  = {
> +    .name = "acpi-erst",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .post_load = erst_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(storage_size, ERSTDeviceState),
 1)
> +        VMSTATE_UINT8(operation, ERSTDeviceState),
> +        VMSTATE_UINT8(busy_status, ERSTDeviceState),
> +        VMSTATE_UINT8(command_status, ERSTDeviceState),
> +        VMSTATE_UINT32(record_offset, ERSTDeviceState),
> +        VMSTATE_UINT64(reg_action, ERSTDeviceState),
> +        VMSTATE_UINT64(reg_value, ERSTDeviceState),
> +        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
> +        VMSTATE_UINT32(next_record_index, ERSTDeviceState),

> +        VMSTATE_UINT32(first_record_index, ERSTDeviceState),
> +        VMSTATE_UINT32(last_record_index, ERSTDeviceState),
 2)
> +        VMSTATE_END_OF_LIST()
> +    }
> +};

 1 and 2 aren't runtime state, so why are they in the migration stream?

I'd imagine size could be used to check that the backend on the target is the
same size, to avoid a buffer overrun if the target side has a smaller backend,
and to fail migration if it's not the same. But it isn't used this way here.

the rest could be calculated at realize time.
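
To illustrate the size check I have in mind, a rough standalone sketch. The
struct and function names are hypothetical; in the device this would live in
the post_load hook, where (as far as I know) a non-zero return fails the
incoming migration:

```c
#include <stdint.h>

/* Hypothetical stand-in for the device state; only the two sizes matter. */
struct erst_size_model {
    uint32_t migrated_storage_size; /* value loaded from the migration stream */
    uint32_t backend_size;          /* size of the backend on the target */
};

/* Return 0 when the sizes agree, -1 to reject the incoming state. */
static int erst_size_model_post_load(const struct erst_size_model *s)
{
    if (s->migrated_storage_size != s->backend_size) {
        return -1;
    }
    return 0;
}
```

If the field is not used for a check like this, it does not need to be in the
stream at all.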

> +
> +static void erst_realizefn(PCIDevice *pci_dev, Error **errp)
> +{
> +    ERSTDeviceState *s = ACPIERST(pci_dev);
> +
> +    trace_acpi_erst_realizefn_in();
> +
> +    if (!s->hostmem) {
> +        error_setg(errp, "'" ACPI_ERST_MEMDEV_PROP "' property is not set");
> +        return;
> +    } else if (host_memory_backend_is_mapped(s->hostmem)) {
> +        error_setg(errp, "can't use already busy memdev: %s",
> +                   object_get_canonical_path_component(OBJECT(s->hostmem)));
> +        return;
> +    }
> +
> +    s->hostmem_mr = host_memory_backend_get_memory(s->hostmem);
> +
> +    /* HostMemoryBackend size will be multiple of PAGE_SIZE */
> +    s->storage_size = object_property_get_int(OBJECT(s->hostmem), "size", errp);
> +
> +    /* Check storage_size against ERST_RECORD_SIZE */
> +    if (((s->storage_size % ERST_RECORD_SIZE) != 0) ||
> +         (ERST_RECORD_SIZE > s->storage_size)) {
> +        error_setg(errp, "ACPI ERST requires size be multiple of "
> +            "record size (%luKiB)", ERST_RECORD_SIZE);
> +    }
> +
> +    /* Initialize backend storage and record_count */
> +    check_erst_backend_storage(s, errp);
> +
> +    /* BAR 0: Programming registers */
> +    memory_region_init_io(&s->iomem, OBJECT(pci_dev), &erst_reg_ops, s,
> +                          TYPE_ACPI_ERST, ERST_REG_SIZE);
> +    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->iomem);
> +
> +    /* BAR 1: Exchange buffer memory */


> +    /* Create a hostmem object to use as the exchange buffer */
> +    s->exchange_obj = object_new(TYPE_MEMORY_BACKEND_RAM);
> +    object_property_set_int(s->exchange_obj, "size", ERST_RECORD_SIZE, errp);
> +    user_creatable_complete(USER_CREATABLE(s->exchange_obj), errp);
> +    s->exchange = MEMORY_BACKEND(s->exchange_obj);
> +    host_memory_backend_set_mapped(s->exchange, true);
> +    s->exchange_mr = host_memory_backend_get_memory(s->exchange);
replace this block with a single memory_region_init_ram()


> +    memory_region_init_resizeable_ram(s->exchange_mr, OBJECT(pci_dev),
> +        TYPE_ACPI_ERST, ERST_RECORD_SIZE, ERST_RECORD_SIZE, NULL, errp);
have no idea why it's necessary, seems just wrong, it basically leaks the
previous memory region and creates a new one.

> +    pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, s->exchange_mr);

> +    /* Include the exchange buffer in the migration stream */
> +    vmstate_register_ram_global(s->exchange_mr);
not necessary if memory_region_init_ram() is used directly

> +
> +    /* Include the backend storage in the migration stream */
> +    vmstate_register_ram_global(s->hostmem_mr);
> +
> +    trace_acpi_erst_realizefn_out(s->storage_size);
> +}
> +
> +static void erst_reset(DeviceState *dev)
> +{
> +    ERSTDeviceState *s = ACPIERST(dev);
> +
> +    trace_acpi_erst_reset_in(s->header->record_count);
> +    s->operation = 0;
> +    s->busy_status = 0;
> +    s->command_status = STATUS_SUCCESS;
> +    s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;
> +    s->record_offset = 0;
> +    s->next_record_index = s->first_record_index;
> +    /* NOTE: first/last_record_index are computed only once */
> +    trace_acpi_erst_reset_out(s->header->record_count);
> +}
> +
> +static Property erst_properties[] = {
> +    DEFINE_PROP_LINK(ACPI_ERST_MEMDEV_PROP, ERSTDeviceState, hostmem,
> +                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void erst_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> +    trace_acpi_erst_class_init_in();
> +    k->realize = erst_realizefn;
> +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> +    k->device_id = PCI_DEVICE_ID_REDHAT_ACPI_ERST;
> +    k->revision = 0x00;
> +    k->class_id = PCI_CLASS_OTHERS;
> +    dc->reset = erst_reset;
> +    dc->vmsd = &erst_vmstate;
> +    dc->user_creatable = true;

can't be hotplugged, add:
       dc->hotpluggable = false;

> +    device_class_set_props(dc, erst_properties);
> +    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    trace_acpi_erst_class_init_out();
> +}
> +
> +static const TypeInfo erst_type_info = {
> +    .name          = TYPE_ACPI_ERST,
> +    .parent        = TYPE_PCI_DEVICE,
> +    .class_init    = erst_class_init,
> +    .instance_size = sizeof(ERSTDeviceState),
> +    .interfaces = (InterfaceInfo[]) {
> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> +        { }
> +    }
> +};
> +
> +static void erst_register_types(void)
> +{
> +    type_register_static(&erst_type_info);
> +}
> +
> +type_init(erst_register_types)
> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> index 29f804d..401d0e5 100644
> --- a/hw/acpi/meson.build
> +++ b/hw/acpi/meson.build
> @@ -5,6 +5,7 @@ acpi_ss.add(files(
>    'bios-linker-loader.c',
>    'core.c',
>    'utils.c',
> +  'erst.c',
>  ))
>  acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c'))
>  acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu_hotplug.c'))
> diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
> index 974d770..3579768 100644
> --- a/hw/acpi/trace-events
> +++ b/hw/acpi/trace-events
> @@ -55,3 +55,18 @@ piix4_gpe_writeb(uint64_t addr, unsigned width, uint64_t val) "addr: 0x%" PRIx64
>  # tco.c
>  tco_timer_reload(int ticks, int msec) "ticks=%d (%d ms)"
>  tco_timer_expired(int timeouts_no, bool strap, bool no_reboot) "timeouts_no=%d no_reboot=%d/%d"
> +
> +# erst.c
> +acpi_erst_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%04" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
> +acpi_erst_reg_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%04" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
> +acpi_erst_mem_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%06" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
> +acpi_erst_mem_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%06" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
> +acpi_erst_pci_bar_0(uint64_t addr) "BAR0: 0x%016" PRIx64
> +acpi_erst_pci_bar_1(uint64_t addr) "BAR1: 0x%016" PRIx64
> +acpi_erst_realizefn_in(void)
> +acpi_erst_realizefn_out(unsigned size) "total nvram size %u bytes"
> +acpi_erst_reset_in(unsigned record_count) "record_count %u"
> +acpi_erst_reset_out(unsigned record_count) "record_count %u"
> +acpi_erst_post_load(void *header) "header: 0x%p"
> +acpi_erst_class_init_in(void)
> +acpi_erst_class_init_out(void)




* Re: [PATCH v6 01/10] ACPI ERST: bios-tables-test.c steps 1 and 2
  2021-09-20 13:05   ` Igor Mammedov
@ 2021-10-04 20:37     ` Eric DeVolder
  0 siblings, 0 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-10-04 20:37 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth



On 9/20/21 8:05 AM, Igor Mammedov wrote:
> On Thu,  5 Aug 2021 18:30:30 -0400
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
>> Following the guidelines in tests/qtest/bios-tables-test.c, this
>> change adds empty placeholder files per step 1 for the new ERST
>> table, and excludes resulting changed files in bios-tables-test-allowed-diff.h
>> per step 2.
>>
> 
> I'd move this right before 10/10

done

> 
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> 
> Acked-by: Igor Mammedov <imammedo@redhat.com>
> 
> 
>> ---
>>   tests/data/acpi/microvm/ERST                | 0
>>   tests/data/acpi/pc/ERST                     | 0
>>   tests/data/acpi/q35/ERST                    | 0
>>   tests/qtest/bios-tables-test-allowed-diff.h | 6 ++++++
>>   4 files changed, 6 insertions(+)
>>   create mode 100644 tests/data/acpi/microvm/ERST
>>   create mode 100644 tests/data/acpi/pc/ERST
>>   create mode 100644 tests/data/acpi/q35/ERST
>>
>> diff --git a/tests/data/acpi/microvm/ERST b/tests/data/acpi/microvm/ERST
>> new file mode 100644
>> index 0000000..e69de29
>> diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
>> new file mode 100644
>> index 0000000..e69de29
>> diff --git a/tests/data/acpi/q35/ERST b/tests/data/acpi/q35/ERST
>> new file mode 100644
>> index 0000000..e69de29
>> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
>> index dfb8523..b3aaf76 100644
>> --- a/tests/qtest/bios-tables-test-allowed-diff.h
>> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
>> @@ -1 +1,7 @@
>>   /* List of comma-separated changed AML files to ignore */
>> +"tests/data/acpi/pc/ERST",
>> +"tests/data/acpi/q35/ERST",
>> +"tests/data/acpi/microvm/ERST",
>> +"tests/data/acpi/pc/DSDT",
>> +"tests/data/acpi/q35/DSDT",
>> +"tests/data/acpi/microvm/DSDT",
> 



* Re: [PATCH v6 02/10] ACPI ERST: specification for ERST support
  2021-09-20 13:38   ` Igor Mammedov
@ 2021-10-04 20:40     ` Eric DeVolder
  0 siblings, 0 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-10-04 20:40 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Daniel P. Berrangé,
	ehabkost, mst, konrad.wilk, qemu-devel, pbonzini,
	boris.ostrovsky, eblake, rth



On 9/20/21 8:38 AM, Igor Mammedov wrote:
> On Thu,  5 Aug 2021 18:30:31 -0400
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
>> Information on the implementation of the ACPI ERST support.
>>
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> 
> modulo missing parts, documentation looks good to me, but
> I'm tainted at this point (after so many reviews) so
> libvirt folks (CCed) can take a look at it and see if
> something needs to be changed here.

OK. I'll wait for Daniel's feedback before posting v7.

> 
>> ---
>>   docs/specs/acpi_erst.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 147 insertions(+)
>>   create mode 100644 docs/specs/acpi_erst.txt
>>
>> diff --git a/docs/specs/acpi_erst.txt b/docs/specs/acpi_erst.txt
>> new file mode 100644
>> index 0000000..7f7544f
>> --- /dev/null
>> +++ b/docs/specs/acpi_erst.txt
>> @@ -0,0 +1,147 @@
>> +ACPI ERST DEVICE
>> +================
>> +
>> +The ACPI ERST device is utilized to support the ACPI Error Record
>> +Serialization Table, ERST, functionality. This feature is designed for
>> +storing error records in persistent storage for future reference
>> +and/or debugging.
>> +
>> +The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
>> +(APEI)", and specifically subsection "Error Serialization", outlines a
>> +method for storing error records into persistent storage.
>> +
>> +The format of error records is described in the UEFI specification[2],
>> +in Appendix N "Common Platform Error Record".
>> +
>> +While the ACPI specification allows for an NVRAM "mode" (see
>> +GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES) where non-volatile RAM is
>> +directly exposed for direct access by the OS/guest, this device
>> +implements the non-NVRAM "mode". This non-NVRAM "mode" is what is
>> +implemented by most BIOS (since flash memory requires programming
>> +operations in order to update its contents). Furthermore, as of the
>> +time of this writing, Linux only supports the non-NVRAM "mode".
>> +
>> +
>> +Background/Motivation
>> +---------------------
>> +
>> +Linux uses the persistent storage filesystem, pstore, to record
>> +information (eg. dmesg tail) upon panics and shutdowns.  Pstore is
>> +independent of, and runs before, kdump.  In certain scenarios (ie.
>> +hosts/guests with root filesystems on NFS/iSCSI where networking
>> +software and/or hardware fails), pstore may contain information
>> +available for post-mortem debugging.
>> +
>> +Two common storage backends for the pstore filesystem are ACPI ERST
>> +and UEFI. Most BIOS implement ACPI ERST.  UEFI is not utilized in all
>> +guests. With QEMU supporting ACPI ERST, it becomes a viable pstore
>> +storage backend for virtual machines (as it is now for bare metal
>> +machines).
>> +
>> +Enabling support for ACPI ERST facilitates a consistent method to
>> +capture kernel panic information in a wide range of guests: from
>> +resource-constrained microvms to very large guests, and in particular,
>> +in direct-boot environments (which would lack UEFI run-time services).
>> +
>> +Note that Microsoft Windows also utilizes the ACPI ERST for certain
>> +crash information, if available[3].
>> +
>> +
>> +Configuration|Usage
>> +-------------------
>> +
>> +To use ACPI ERST, a memory-backend-file object and acpi-erst device
>> +can be created, for example:
>> +
>> + qemu ...
>> + -object memory-backend-file,id=erstnvram,mem-path=acpi-erst.backing,size=0x10000,share=on \
>> + -device acpi-erst,memdev=erstnvram
>> +
>> +For proper operation, the ACPI ERST device needs a memory-backend-file
>> +object with the following parameters:
>> +
>> + - id: The id of the memory-backend-file object is used to associate
>> +   this memory with the acpi-erst device.
>> + - size: The size of the ACPI ERST backing storage. This parameter is
>> +   required.
>> + - mem-path: The location of the ACPI ERST backing storage file. This
>> +   parameter is also required.
>> + - share: The share=on parameter is required so that updates to the
>> +   ERST backing store are written to the file.
>> +
>> +and ERST device:
>> +
>> + - memdev: Is the object id of the memory-backend-file.
>> +
>> +
>> +PCI Interface
>> +-------------
>> +
>> +The ERST device is a PCI device with two BARs, one for accessing the
>> +programming registers, and the other for accessing the record exchange
>> +buffer.
>> +
>> +BAR0 contains the programming interface consisting of ACTION and VALUE
>> +64-bit registers.  All ERST actions/operations/side effects happen on
>> +the write to the ACTION, by design. Any data needed by the action must
>> +be placed into VALUE prior to writing ACTION.  Reading the VALUE
>> +simply returns the register contents, which can be updated by a
>> +previous ACTION.
>> +
>> +BAR1 contains the 8KiB record exchange buffer, which is the
>> +implemented maximum record size.
>> +
>> +
>> +Backend Storage Format
>> +----------------------
>> +
>> +The backend storage is divided into fixed size "slots", 8KiB in
>> +length, with each slot storing a single record.  Not all slots need to
>> +be occupied, and they need not be occupied in a contiguous fashion.
>> +The ability to clear/erase specific records allows for the formation
>> +of unoccupied slots.
>> +
>> +Slot 0 is reserved for a backend storage header that identifies the
>> +contents as ERST and also facilitates efficient access to the records.
>> +Depending upon the size of the backend storage, additional slots will
>> +be reserved to be a part of the slot 0 header. For example, at 8KiB,
>> +the slot 0 header can accomodate 1024 records. Thus a storage size
>> +above 8MiB (8KiB * 1024) requires an additional slot. In this
>> +scenario, slot 0 and slot 1 form the backend storage header, and
>> +records can be stored starting at slot 2.
>> +
>> +Below is an example layout of the backend storage format (for storage
>> +size less than 8MiB). The size of the storage is a multiple of 8KiB,
>> +and contains N number of slots to store records. The example below
>> +shows two records (in CPER format) in the backend storage, while the
>> +remaining slots are empty/available.
>> +
>> + Slot   Record
>> +        +--------------------------------------------+
>> +    0   | reserved: storage header                   |
> 
> typically reserved means 'not used', so I'd drop mentioning reserved
> and leave it just as storage header.
done

> 
> Also header format should be described here
done

> 
>> +        +--------------------------------------------+
>> +    1   | empty/available                            |
>> +        +--------------------------------------------+
>> +    2   | CPER                                       |
>> +        +--------------------------------------------+
> 
> how can one distinguish empty vs used slots (i.e define empty somewhere here)
done; explained in v7.

> 
>> +    3   | CPER                                       |
>> +        +--------------------------------------------+
>> +  ...   |                                            |
>> +        +--------------------------------------------+
>> +    N   | empty/available                            |
>> +        +--------------------------------------------+
>> +        <------------------ 8KiB -------------------->
>> +
>> +
>> +
>> +References
>> +----------
>> +
>> +[1] "Advanced Configuration and Power Interface Specification",
>> +    version 4.0, June 2009.
>> +
>> +[2] "Unified Extensible Firmware Interface Specification",
>> +    version 2.1, October 2008.
>> +
>> +[3] "Windows Hardware Error Architecture", specfically
>> +    "Error Record Persistence Mechanism".
> 



* Re: [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature
  2021-09-21 15:30   ` Igor Mammedov
@ 2021-10-04 21:13     ` Eric DeVolder
  2021-10-05 11:39       ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Eric DeVolder @ 2021-10-04 21:13 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth

Igor, thanks for the close examination. Inline responses below.
eric

On 9/21/21 10:30 AM, Igor Mammedov wrote:
> On Thu,  5 Aug 2021 18:30:34 -0400
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
>> This implements a PCI device for ACPI ERST. This implements the
>> non-NVRAM "mode" of operation for ERST as it is supported by
>> Linux and Windows.
>>
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>> ---
>>   hw/acpi/erst.c       | 750 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   hw/acpi/meson.build  |   1 +
>>   hw/acpi/trace-events |  15 ++
>>   3 files changed, 766 insertions(+)
>>   create mode 100644 hw/acpi/erst.c
>>
>> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
>> new file mode 100644
>> index 0000000..eb4ab34
>> --- /dev/null
>> +++ b/hw/acpi/erst.c
>> @@ -0,0 +1,750 @@
>> +/*
>> + * ACPI Error Record Serialization Table, ERST, Implementation
>> + *
>> + * ACPI ERST introduced in ACPI 4.0, June 16, 2009.
>> + * ACPI Platform Error Interfaces : Error Serialization
>> + *
>> + * Copyright (c) 2021 Oracle and/or its affiliates.
>> + *
>> + * SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +
>> +#include <sys/types.h>
>> +#include <sys/stat.h>
>> +#include <unistd.h>
>> +
>> +#include "qemu/osdep.h"
>> +#include "qapi/error.h"
>> +#include "hw/qdev-core.h"
>> +#include "exec/memory.h"
>> +#include "qom/object.h"
>> +#include "hw/pci/pci.h"
>> +#include "qom/object_interfaces.h"
>> +#include "qemu/error-report.h"
>> +#include "migration/vmstate.h"
>> +#include "hw/qdev-properties.h"
>> +#include "hw/acpi/acpi.h"
>> +#include "hw/acpi/acpi-defs.h"
>> +#include "hw/acpi/aml-build.h"
>> +#include "hw/acpi/bios-linker-loader.h"
>> +#include "exec/address-spaces.h"
>> +#include "sysemu/hostmem.h"
>> +#include "hw/acpi/erst.h"
>> +#include "trace.h"
>> +
>> +/* ACPI 4.0: Table 17-16 Serialization Actions */
>> +#define ACTION_BEGIN_WRITE_OPERATION         0x0
>> +#define ACTION_BEGIN_READ_OPERATION          0x1
>> +#define ACTION_BEGIN_CLEAR_OPERATION         0x2
>> +#define ACTION_END_OPERATION                 0x3
>> +#define ACTION_SET_RECORD_OFFSET             0x4
>> +#define ACTION_EXECUTE_OPERATION             0x5
>> +#define ACTION_CHECK_BUSY_STATUS             0x6
>> +#define ACTION_GET_COMMAND_STATUS            0x7
>> +#define ACTION_GET_RECORD_IDENTIFIER         0x8
>> +#define ACTION_SET_RECORD_IDENTIFIER         0x9
>> +#define ACTION_GET_RECORD_COUNT              0xA
>> +#define ACTION_BEGIN_DUMMY_WRITE_OPERATION   0xB
>> +#define ACTION_RESERVED                      0xC
>> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE   0xD
>> +#define ACTION_GET_ERROR_LOG_ADDRESS_LENGTH  0xE
>> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES 0xF
>> +#define ACTION_GET_EXECUTE_OPERATION_TIMINGS 0x10
>> +
>> +/* ACPI 4.0: Table 17-17 Command Status Definitions */
>> +#define STATUS_SUCCESS                0x00
>> +#define STATUS_NOT_ENOUGH_SPACE       0x01
>> +#define STATUS_HARDWARE_NOT_AVAILABLE 0x02
>> +#define STATUS_FAILED                 0x03
>> +#define STATUS_RECORD_STORE_EMPTY     0x04
>> +#define STATUS_RECORD_NOT_FOUND       0x05
>> +
>> +
>> +/* UEFI 2.1: Appendix N Common Platform Error Record */
>> +#define UEFI_CPER_RECORD_MIN_SIZE 128U
>> +#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U
>> +#define UEFI_CPER_RECORD_ID_OFFSET 96U
>> +#define IS_UEFI_CPER_RECORD(ptr) \
>> +    (((ptr)[0] == 'C') && \
>> +     ((ptr)[1] == 'P') && \
>> +     ((ptr)[2] == 'E') && \
>> +     ((ptr)[3] == 'R'))
>> +#define THE_UEFI_CPER_RECORD_ID(ptr) \
>> +    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
>> +
>> +/*
>> + * This implementation is an ACTION (cmd) and VALUE (data)
>> + * interface consisting of just two 64-bit registers.
>> + */
>> +#define ERST_REG_SIZE (16UL)
>> +#define ERST_ACTION_OFFSET (0UL) /* action (cmd) */
>> +#define ERST_VALUE_OFFSET  (8UL) /* argument/value (data) */
>> +
>> +/*
>> + * ERST_RECORD_SIZE is the buffer size for exchanging ERST
>> + * record contents. Thus, it defines the maximum record size.
>> + * As this is mapped through a PCI BAR, it must be a power of
>> + * two and larger than UEFI_CPER_RECORD_MIN_SIZE.
>> + * The backing storage is divided into fixed size "slots",
>> + * each ERST_RECORD_SIZE in length, and each "slot"
>> + * storing a single record. No attempt at optimizing storage
>> + * through compression, compaction, etc is attempted.
>> + * NOTE that slot 0 is reserved for the backing storage header.
>> + * Depending upon the size of the backing storage, additional
>> + * slots will be part of the slot 0 header in order to account
>> + * for a record_id for each available remaining slot.
>> + */
>> +/* 8KiB records, not too small, not too big */
>> +#define ERST_RECORD_SIZE (8192UL)
>> +
>> +#define ACPI_ERST_MEMDEV_PROP "memdev"
>> +
>> +/*
>> + * From the ACPI ERST spec sections:
>> + * A record id of all 0s is used to indicate
>> + * 'unspecified' record id.
>> + * A record id of all 1s is used to indicate
>> + * empty or end.
>> + */
>> +#define ERST_UNSPECIFIED_RECORD_ID (0UL)
>> +#define ERST_EMPTY_END_RECORD_ID (~0UL)
>> +#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
>> +#define ERST_IS_VALID_RECORD_ID(rid) \
>> +    ((rid != ERST_UNSPECIFIED_RECORD_ID) && \
>> +     (rid != ERST_EMPTY_END_RECORD_ID))
>> +
>> +typedef struct erst_storage_header_s {
> 
>> +#define ERST_STORE_MAGIC 0x524F545354535245UL
> 
> move it out of structure definition,
> also where value comes from? (perhaps something starting
> with ERST... would be more self-describing)
done; this value is 'ERSTSTOR', which I've left as a comment in v7.
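
A quick way to see the 'ERSTSTOR' spelling: reading the constant one byte at a
time, least significant byte first (which is how it lands in memory on a
little-endian host), gives the ASCII string. A small standalone sketch, with a
helper name that is only for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define ERST_STORE_MAGIC 0x524F545354535245ULL

/* Unpack the 8 bytes of magic, lowest byte first, into a C string. */
static void erst_magic_to_string(uint64_t magic, char *out)
{
    assert(out != 0);
    for (int i = 0; i < 8; i++) {
        out[i] = (char)((magic >> (8 * i)) & 0xFF);
    }
    out[8] = '\0';
}
```

The caller passes a buffer of at least 9 bytes.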

> 
>> +    uint64_t magic;
>> +    uint32_t record_size;
>> +    uint32_t record_offset; /* offset to record storage beyond header */
>> +    uint16_t version;
>> +    uint16_t reserved;
>> +    uint32_t record_count;
>> +    uint64_t map[]; /* contains record_ids, and position indicates index */
>> +} erst_storage_header_t;
> docs/devel/style.rst: Typedefs
done; thanks

> 
> also given it's used as header layout in storage,
> set packed attribute for structure
done

> 
>> +
>> +/*
>> + * Object cast macro
>> + */
>> +#define ACPIERST(obj) \
>> +    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
>> +
>> +/*
>> + * Main ERST device state structure
>> + */
>> +typedef struct {
>> +    PCIDevice parent_obj;
>> +
>> +    /* Backend storage */
>> +    HostMemoryBackend *hostmem;
>> +    MemoryRegion *hostmem_mr;
>> +
>> +    /* Programming registers */
>> +    MemoryRegion iomem;
>> +
>> +    /* Exchange buffer */
>> +    Object *exchange_obj;
>> +    HostMemoryBackend *exchange;
>> +    MemoryRegion *exchange_mr;
>> +    uint32_t storage_size;
>> +
>> +    /* Interface state */
>> +    uint8_t operation;
>> +    uint8_t busy_status;
>> +    uint8_t command_status;
>> +    uint32_t record_offset;
>> +    uint64_t reg_action;
>> +    uint64_t reg_value;
>> +    uint64_t record_identifier;
>> +    erst_storage_header_t *header;
>> +    unsigned next_record_index;
>> +    unsigned first_record_index;
>> +    unsigned last_record_index;
>> +
>> +} ERSTDeviceState;
>> +
>> +/*******************************************************************/
>> +/*******************************************************************/
>> +
>> +static uint8_t *get_nvram_ptr_by_index(ERSTDeviceState *s, unsigned index)
>> +{
>> +    uint8_t *rc = NULL;
>> +    off_t offset = (index * ERST_RECORD_SIZE);
> 
>> +    if ((offset + ERST_RECORD_SIZE) <= s->storage_size) {
> 
> it looks like 'index' passed by caller is always valid, if it's the case
> convert  this to
>          g_assert((offset + ERST_RECORD_SIZE) <= s->storage_size))
done

> 
> also shouldn't <= be just <
yes, done

> 
> 
>> +        if (s->hostmem_mr) {
> can hostmem_mr be NULL, when this function is called?
> if not I'd drop condition.
no, so dropped. done

> 
>> +            uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(s->hostmem_mr);
>> +            rc = p + offset;
>> +        }
>> +    }
>> +    return rc;
>> +}
>> +
>> +static void make_erst_storage_header(ERSTDeviceState *s)
>> +{
>> +    erst_storage_header_t *header = s->header;
>> +    unsigned mapsz, headersz;
>> +
>> +    header->magic = ERST_STORE_MAGIC;
>> +    header->record_size = ERST_RECORD_SIZE;
>> +    header->version = 0x0101;
> 
> maybe 0 or 1 to avoid question about what previous versions are
changed to simply 0x0100 (ie 1.0)
> 
>> +    header->reserved = 0x0000;
> s/0x.../0/
done

> 
>> +
>> +    /* Compute mapsize */
>> +    mapsz = s->storage_size / ERST_RECORD_SIZE;
>> +    mapsz *= sizeof(uint64_t);
>> +    /* Compute header+map size */
>> +    headersz = sizeof(erst_storage_header_t) + mapsz;
> 
>> +    /* Round up to nearest integer multiple of ERST_RECORD_SIZE */
>> +    headersz += (ERST_RECORD_SIZE - 1);
>> +    headersz /= ERST_RECORD_SIZE;
>> +    headersz *= ERST_RECORD_SIZE;
> git grep ROUND_UP
> may be of help here
yes, thanks. I'm using that now, done.
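
For readers following along, a minimal standalone sketch of that rounding step. This is not the v7 code: sketch_round_up(), sketch_record_offset() and the 0x2000 record size are hypothetical stand-ins, and QEMU's own ROUND_UP macro is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical stand-in for ERST_RECORD_SIZE; the value is an assumption. */
#define SKETCH_RECORD_SIZE 0x2000u

/* Round n up to the next multiple of d (d must be non-zero). */
static inline uint32_t sketch_round_up(uint32_t n, uint32_t d)
{
    return ((n + d - 1) / d) * d;
}

/*
 * Offset of the first record slot: the header plus one uint64_t map
 * entry per slot, rounded up to a whole slot.
 */
static inline uint32_t sketch_record_offset(uint32_t header_size,
                                            uint32_t storage_size)
{
    uint32_t map_size =
        (uint32_t)((storage_size / SKETCH_RECORD_SIZE) * sizeof(uint64_t));

    return sketch_round_up(header_size + map_size, SKETCH_RECORD_SIZE);
}
```

The division-based form works for any non-zero multiple, which is why it is used in this sketch; a mask-based macro would additionally require a power-of-two size.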

> 
>> +    header->record_offset = headersz;
>> +
>> +    /*
>> +     * The HostMemoryBackend initializes contents to zero,
>> +     * so all record_ids stashed in the map are zero'd.
>> +     * As well the record_count is zero. Properly initialized.
>> +     */
>> +}
>> +
>> +static void check_erst_backend_storage(ERSTDeviceState *s, Error **errp)
>> +{
>> +    erst_storage_header_t *header;
>> +
>> +    header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);
> optionally check/assert that it's 64-bit aligned;
> if it's not, you risk getting killed by SIGBUS on some hosts,
> since you're accessing fields directly.
done!

> 
>> +    s->header = header;
>> +
>> +    /* Check if header is uninitialized */
>> +    if (header->magic == 0UL) { /* HostMemoryBackend inits to 0 */
>> +        make_erst_storage_header(s);
>> +    }
>> +
>> +    if (!(
>> +        (header->magic == ERST_STORE_MAGIC) &&
>> +        (header->record_size == ERST_RECORD_SIZE) &&
>> +        ((header->record_offset % ERST_RECORD_SIZE) == 0) &&
>> +        (header->version == 0x0101) &&
>> +        (header->reserved == 0x0000)
>> +        )) {
>> +        error_setg(errp, "ERST backend storage header is invalid");
>> +    }
>> +
>> +    /* Compute offset of first and last record storage slot */
>> +    s->first_record_index = header->record_offset / ERST_RECORD_SIZE;
>> +    s->last_record_index = (s->storage_size / ERST_RECORD_SIZE);
> 
> applies to the whole patch/series:
> if mmapped header values are interpreted as integers you should
> take care of endianness, i.e. use cpu_to_foo/foo_to_cpu for access
done; I'm using cpu_to_leX() and leX_to_cpu() for any access to the header.
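
To illustrate what fixing the header's byte order to little-endian means, here is a small portable sketch. The helper names are hypothetical and this is not the code from the patch, which relies on QEMU's conversion helpers instead of open-coded shifts.

```c
#include <stdint.h>

/* Store a 32-bit value as little-endian bytes, whatever the host order. */
static inline void sketch_store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Load a 32-bit little-endian value, whatever the host order. */
static inline uint32_t sketch_load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

With accessors like these, a backing file written on one host reads back identically on a host of the opposite endianness.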

> 
> and document file endianness in doc (2/10)
done

> 
>> +}
>> +
>> +static void set_erst_map_by_index(ERSTDeviceState *s, unsigned index,
>> +    uint64_t record_id)
> 
> update_[cache|map]_[entry|record_id]() or something like this might be
> a better description; 'erst' and 'index' don't really add much here, as it's
> clear from context.
done; now update_map_entry()

> 
> 
>> +{
>> +    if (index < s->last_record_index) {
>> +        s->header->map[index] = record_id;
>> +    }
>> +}
>> +
>> +static unsigned lookup_erst_record(ERSTDeviceState *s,
>> +    uint64_t record_identifier)
>> +{
>> +    unsigned rc = 0; /* 0 not a valid index */
>> +    unsigned index = s->first_record_index;
>> +
>> +    /* Find the record_identifier in the map */
>> +    if (record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
>> +        /*
>> +         * Count number of valid records encountered, and
>> +         * short-circuit the loop if identifier not found
>> +         */
>> +        unsigned count = 0;
>> +        for (; index < s->last_record_index &&
>> +                count < s->header->record_count; ++index) {
>> +            uint64_t map_record_identifier = s->header->map[index];
> I'd drop map_record_identifier and use s->header->map[index] directly,
> i.e
>     if (s->header->map[index] ...
done

> 
>> +            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
>> +                ++count;
>> +            }
>> +            if (map_record_identifier == record_identifier) {
>> +                rc = index;
>> +                break;
>> +            }
>> +        }
>> +    } else {
>> +        /* Find first available unoccupied slot */
>> +        for (; index < s->last_record_index; ++index) {
>> +            if (s->header->map[index] == ERST_UNSPECIFIED_RECORD_ID) {
>> +                rc = index;
>> +                break;
>> +            }
>> +        }
>> +    }
>> +
>> +    return rc;
>> +}
> 
> what's the reason for combining lookup and allocate ops?
> if they were separated, it would be easier to follow the code.
done; at one point it made sense; no longer.

> 
>> +
>> +/* ACPI 4.0: 17.4.2.3 Operations - Clearing */
>> +static unsigned clear_erst_record(ERSTDeviceState *s)
>> +{
>> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
>> +    unsigned index;
>> +
>> +    /* Check for valid record identifier */
>> +    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
>> +        return STATUS_FAILED;
>> +    }
>> +
>> +    index = lookup_erst_record(s, s->record_identifier);
>> +    if (index) {
>> +        /* No need to wipe record, just invalidate its map entry */
>> +        set_erst_map_by_index(s, index, ERST_UNSPECIFIED_RECORD_ID);
>> +        s->header->record_count -= 1;
>> +        rc = STATUS_SUCCESS;
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +/* ACPI 4.0: 17.4.2.2 Operations - Reading */
>> +static unsigned read_erst_record(ERSTDeviceState *s)
>> +{
>> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
>> +    unsigned index;
>> +
>> +    /* Check record boundary wihin exchange buffer */
>                                  ^^^ typo
done

> 
>> +    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
>> +        return STATUS_FAILED;
>> +    }
>> +
>> +    /* Check for valid record identifier */
>> +    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
>> +        return STATUS_FAILED;
>> +    }
>> +
>> +    index = lookup_erst_record(s, s->record_identifier);
>> +    if (index) {
>> +        uint8_t *ptr;
>> +        uint8_t *record = ((uint8_t *)
>> +            memory_region_get_ram_ptr(s->exchange_mr) +
>> +            s->record_offset);
>> +        ptr = get_nvram_ptr_by_index(s, index);
>> +        memcpy(record, ptr, ERST_RECORD_SIZE - s->record_offset);
> 
> if record_offset is large enough that record won't fit,
> this will copy truncated record into the exchange buffer.
> 
> Maybe it's better to fail whole op?
The first check within this function checks for this very condition, and does fail.
I believe the code does as you are asking.

> 
>> +        rc = STATUS_SUCCESS;
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +/* ACPI 4.0: 17.4.2.1 Operations - Writing */
>> +static unsigned write_erst_record(ERSTDeviceState *s)
>> +{
>> +    unsigned rc = STATUS_FAILED;
>> +    unsigned index;
>> +    uint64_t record_identifier;
>> +    uint8_t *record;
>> +    uint8_t *ptr = NULL;
>> +    bool record_found = false;
>> +
>> +    /* Check record boundary wihin exchange buffer */
> ditto, typo
done

> 
>> +    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
>> +        return STATUS_FAILED;
>> +    }
>> +
>> +    /* Extract record identifier */
>> +    record = ((uint8_t *)memory_region_get_ram_ptr(s->exchange_mr)
>> +        + s->record_offset);
>> +    record_identifier = THE_UEFI_CPER_RECORD_ID(record);
> potentially unaligned access to int, should use memcpy()
done
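
A minimal sketch of the memcpy() approach, for illustration only. The function name is hypothetical; the offset of 96 bytes is where the CPER record header places its 64-bit Record ID, stated here as an assumption about the macro in the patch. The value is returned in host byte order, so any little-endian conversion would be applied on top.

```c
#include <stdint.h>
#include <string.h>

/* Assumed byte offset of the Record ID field within a CPER record header. */
#define SKETCH_CPER_RECORD_ID_OFFSET 96u

/*
 * Read the 64-bit record identifier without dereferencing a possibly
 * misaligned uint64_t pointer: memcpy() has no alignment requirement.
 */
static inline uint64_t sketch_get_record_id(const uint8_t *record)
{
    uint64_t id;

    memcpy(&id, record + SKETCH_CPER_RECORD_ID_OFFSET, sizeof(id));
    return id;
}
```

Compilers typically turn the fixed-size memcpy() into a single load on hosts that allow unaligned access, so the safer form usually costs nothing there.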

> 
>> +
>> +    /* Check for valid record identifier */
>> +    if (!ERST_IS_VALID_RECORD_ID(record_identifier)) {
>> +        return STATUS_FAILED;
>> +    }
>> +
>> +    index = lookup_erst_record(s, record_identifier);
>> +    if (index) {
>> +        /* Record found, overwrite existing record */
>> +        ptr = get_nvram_ptr_by_index(s, index);
>> +        record_found = true;
>> +    } else {
>> +        /* Record not found, not an overwrite, allocate for write */
>> +        index = lookup_erst_record(s, ERST_UNSPECIFIED_RECORD_ID);
>> +        if (index) {
>> +            ptr = get_nvram_ptr_by_index(s, index);
>> +        } else {
>> +            rc = STATUS_NOT_ENOUGH_SPACE;
>> +        }
>> +    }
>> +    if (ptr) {
>> +        memcpy(ptr, record, ERST_RECORD_SIZE - s->record_offset);
> 
>> +        if (0 != s->record_offset) {
>> +            memset(&ptr[ERST_RECORD_SIZE - s->record_offset],
>> +                0xFF, s->record_offset);
>> +        }
> you've lost me here, care to explain what's going on here?
If the record_offset is not 0, then there can be bytes following the record within the slot that 
were not written. This simply sets them to 0xFF (so bytes from a previously written record that 
happened to occupy this slot do not "bleed" through).
I've left a comment in v7.
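
As a standalone illustration of the tail fill described above (the function and parameter names are hypothetical, and the caller is assumed to have already checked that record_offset does not exceed slot_size):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the record that starts at record_offset in the exchange buffer
 * into a storage slot, then overwrite the unused tail of the slot with
 * 0xFF so bytes of an older record cannot show through.
 */
static inline void sketch_store_record(uint8_t *slot,
                                       const uint8_t *exchange,
                                       size_t record_offset,
                                       size_t slot_size)
{
    size_t len = slot_size - record_offset;

    memcpy(slot, exchange + record_offset, len);
    memset(slot + len, 0xFF, record_offset);
}
```

With record_offset equal to 0 the memset() covers zero bytes, so the explicit "if (0 != s->record_offset)" guard in the quoted code is an optimization rather than a correctness requirement.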

> 
>> +        if (!record_found) {
>> +            s->header->record_count += 1; /* writing new record */
>> +        }
>> +        set_erst_map_by_index(s, index, record_identifier);
>> +        rc = STATUS_SUCCESS;
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +/* ACPI 4.0: 17.4.2.2 Operations - Reading "During boot..." */
>> +static unsigned next_erst_record(ERSTDeviceState *s,
>> +    uint64_t *record_identifier)
> s/record_identifier/found.../
done

> 
>> +{
>> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
>> +    unsigned index = s->next_record_index;
>> +
>> +    *record_identifier = ERST_EMPTY_END_RECORD_ID;
>> +
>> +    if (s->header->record_count) {
>> +        for (; index < s->last_record_index; ++index) {
>> +            uint64_t map_record_identifier;
> and then s/map_record_identifier/record_identifier/
done

> 
> the same applies to other occurrences within patch
> (map_record_identifier is a bit confusing) or drop it
> and use s->header->map[index] directly
done

> 
>> +            map_record_identifier = s->header->map[index];
>> +            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
>> +                    /* where to start next time */
>> +                    s->next_record_index = index + 1;
>> +                    *record_identifier = map_record_identifier;
>> +                    rc = STATUS_SUCCESS;
>> +                    break;
>> +            }
>> +        }
>> +    }
>> +    if (rc != STATUS_SUCCESS) {
>> +        if (s->next_record_index == s->first_record_index) {
>> +            /*
>> +             * next_record_identifier is unchanged, no records found
>> +             * and *record_identifier contains EMPTY_END id
>> +             */
>> +            rc = STATUS_RECORD_STORE_EMPTY;
>> +        }
>> +        /* at end/scan complete, reset */
>> +        s->next_record_index = s->first_record_index;
>> +    }
> 
> Table 17-16 says return an existing error or ERST_EMPTY_END_RECORD_ID,
> but nothing about the op returning an error, so I'd assume status
> should always be STATUS_SUCCESS for GET_RECORD_IDENTIFIER.
done

> 
> Advancing to the next record is part of the record READ op and
> not part of GET_RECORD_IDENTIFIER as it's done here.
well...

>    "The steps performed by the platform to carry out ...
>       2. ..
>          c. If the specified error record does not exist,
>             ... update the status register’s Identifier field with the identifier of the
> ‘first’ error record
>       4. Record the Identifier of the ‘next’ valid error record ...
>    "

I used ACPI spec v6, and I was asked to locate the first occurrence of ERST in the spec, which was
v4. So the above spec quotes are accurate; however, spec v6 deviates in an important way from the
above, and reads:

   "c. If the status is Record Not Found (0x05), indicating that the specified error record does not 
exist, OSPM retrieves a valid identifier by a GET_RECORD_IDENTIFIER action. The platform will return 
a valid record identifier."

So GET_RECORD_IDENTIFIER is essentially a factory that pumps out record identifiers for records
stored. I kind of think of it as the old DOS 'find_first/find_next'. And yes, v4 of the spec states
that the READ operation should initiate the first record_identifier. However, v6 clearly states this
is now the responsibility of OSPM, not the READ op.
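
To make the find_first/find_next analogy concrete, here is a simplified standalone sketch of such a cursor. It is not the patch code: the type and function names are hypothetical, and the two identifier values (0 for an unused slot, all ones for empty/end) are assumptions about the constants used in the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UNSPECIFIED_ID 0x0ULL   /* assumed: marks a free slot      */
#define SKETCH_EMPTY_END_ID   (~0ULL)  /* assumed: "no more records" id   */

typedef struct {
    const uint64_t *map;  /* one record identifier per storage slot */
    size_t first;         /* index of the first record slot         */
    size_t last;          /* one past the final record slot         */
    size_t next;          /* where the next scan resumes            */
} SketchCursor;

/*
 * Return the identifier of the next stored record and advance the
 * cursor. At the end of the store, return the empty/end identifier
 * and rewind so the following call starts over from the first slot.
 */
static inline uint64_t sketch_get_record_identifier(SketchCursor *c)
{
    for (size_t i = c->next; i < c->last; i++) {
        if (c->map[i] != SKETCH_UNSPECIFIED_ID) {
            c->next = i + 1;
            return c->map[i];
        }
    }
    c->next = c->first;
    return SKETCH_EMPTY_END_ID;
}
```

OSPM would then call this repeatedly, issuing a READ for each identifier returned, until the empty/end identifier comes back.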

I am thinking that the best way to handle this contradiction is to change the ACPI spec citation 
from v4 to v5, as the wording in v5 matches what I cite from v6, and implemented. Furthermore, this 
approach of OSPM obtaining the next valid record_id via GET_RECORD_IDENTIFIER is consistent with 
what I observed in BIOS and with how Linux is coded.

Thoughts?

> 
> 
>> +
>> +    return rc;
>> +}
>> +
>> +/*******************************************************************/
>> +
>> +static uint64_t erst_rd_reg64(hwaddr addr,
>> +    uint64_t reg, unsigned size)
>> +{
>> +    uint64_t rdval;
>> +    uint64_t mask;
>> +    unsigned shift;
>> +
>> +    if (size == sizeof(uint64_t)) {
>> +        /* 64b access */
>> +        mask = 0xFFFFFFFFFFFFFFFFUL;
>> +        shift = 0;
>> +    } else {
>> +        /* 32b access */
>> +        mask = 0x00000000FFFFFFFFUL;
>> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
>> +    }
>> +
>> +    rdval = reg;
>> +    rdval >>= shift;
>> +    rdval &= mask;
>> +
>> +    return rdval;
>> +}
>> +
>> +static uint64_t erst_wr_reg64(hwaddr addr,
>> +    uint64_t reg, uint64_t val, unsigned size)
>> +{
>> +    uint64_t wrval;
>> +    uint64_t mask;
>> +    unsigned shift;
>> +    if (size == sizeof(uint64_t)) {
>> +        /* 64b access */
>> +        mask = 0xFFFFFFFFFFFFFFFFUL;
>> +        shift = 0;
>> +    } else {
>> +        /* 32b access */
>> +        mask = 0x00000000FFFFFFFFUL;
>> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
>> +    }
>> +
>> +    val &= mask;
>> +    val <<= shift;
>> +    mask <<= shift;
>> +    wrval = reg;
>> +    wrval &= ~mask;
>> +    wrval |= val;
>> +
>> +    return wrval;
>> +}
>> +
>> +static void erst_reg_write(void *opaque, hwaddr addr,
>> +    uint64_t val, unsigned size)
>> +{
>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
>> +
>> +    /*
>> +     * NOTE: All actions/operations/side effects happen on the WRITE,
>> +     * by design. The READs simply return the reg_value contents.
> 
> point to spec, pls.
This was an implementation design choice, so no spec reference applicable, I left a comment.


> 
>> +     */
>> +    trace_acpi_erst_reg_write(addr, val, size);
>> +
>> +    switch (addr) {
>> +    case ERST_VALUE_OFFSET + 0:
>> +    case ERST_VALUE_OFFSET + 4:
>> +        s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
>> +        break;
>> +    case ERST_ACTION_OFFSET + 0:
> 
>> +/*  case ERST_ACTION_OFFSET+4: as coded, not really a 64b register */
> 
> what does this mean?
In short, all values written to this register are just the ACTION ops, so there wasn't a need to 
implement this as a 64-bit register, especially since Linux seems to issue two 32-bit accesses for 
64-bit; in this case the upper access is utterly useless.
I placed a comment in code.

> 
>> +        switch (val) {
>> +        case ACTION_BEGIN_WRITE_OPERATION:
>> +        case ACTION_BEGIN_READ_OPERATION:
>> +        case ACTION_BEGIN_CLEAR_OPERATION:
>> +        case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
>> +        case ACTION_END_OPERATION:
>> +            s->operation = val;
>> +            break;
>> +        case ACTION_SET_RECORD_OFFSET:
>> +            s->record_offset = s->reg_value;
>> +            break;
>> +        case ACTION_EXECUTE_OPERATION:
>> +            if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
>> +                s->busy_status = 1;
>> +                switch (s->operation) {
>> +                case ACTION_BEGIN_WRITE_OPERATION:
>> +                    s->command_status = write_erst_record(s);
>> +                    break;
>> +                case ACTION_BEGIN_READ_OPERATION:
>> +                    s->command_status = read_erst_record(s);
>> +                    break;
>> +                case ACTION_BEGIN_CLEAR_OPERATION:
>> +                    s->command_status = clear_erst_record(s);
>> +                    break;
>> +                case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
>> +                    s->command_status = STATUS_SUCCESS;
>> +                    break;
>> +                case ACTION_END_OPERATION:
>> +                    s->command_status = STATUS_SUCCESS;
>> +                    break;
>> +                default:
>> +                    s->command_status = STATUS_FAILED;
>> +                    break;
>> +                }
>> +                s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;
>                     shouldn't happen in case of Read op
correct, removed as not needed at all now.

> 
> "
> 17.4.2.2
> 4. Record the Identifier of the ‘next’ valid error record that resides on the persistent store. This
> allows OSPM to retrieve a valid record identifier by executing a GET_RECORD_IDENTIFIER
> operation.
> "
> 
>> +                s->busy_status = 0;
>> +            }
>> +            break;
>> +        case ACTION_CHECK_BUSY_STATUS:
>> +            s->reg_value = s->busy_status;
>> +            break;
>> +        case ACTION_GET_COMMAND_STATUS:
>> +            s->reg_value = s->command_status;
>> +            break;
>> +        case ACTION_GET_RECORD_IDENTIFIER:
>> +            s->command_status = next_erst_record(s, &s->reg_value);
>> +            break;
>> +        case ACTION_SET_RECORD_IDENTIFIER:
>> +            s->record_identifier = s->reg_value;
>> +            break;
>> +        case ACTION_GET_RECORD_COUNT:
>> +            s->reg_value = s->header->record_count;
>> +            break;
>> +        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
>> +            s->reg_value = (hwaddr)pci_get_bar_addr(PCI_DEVICE(s), 1);
>> +            break;
>> +        case ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
>> +            s->reg_value = ERST_RECORD_SIZE;
>> +            break;
>> +        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
>> +            s->reg_value = 0x0; /* intentional, not NVRAM mode */
>> +            break;
>> +        case ACTION_GET_EXECUTE_OPERATION_TIMINGS:
>> +            s->reg_value =
>> +                (100ULL << 32) | /* 100us max time */
>> +                (10ULL  <<  0) ; /*  10us min time */
>> +            break;
>> +        default:
>> +            /* Unknown action/command, NOP */
>> +            break;
>> +        }
>> +        break;
>> +    default:
>> +        /* This should not happen, but if it does, NOP */
>> +        break;
>> +    }
>> +}
>> +
>> +static uint64_t erst_reg_read(void *opaque, hwaddr addr,
>> +                                unsigned size)
>> +{
>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
>> +    uint64_t val = 0;
>> +
>> +    switch (addr) {
>> +    case ERST_ACTION_OFFSET + 0:
>> +    case ERST_ACTION_OFFSET + 4:
>> +        val = erst_rd_reg64(addr, s->reg_action, size);
>> +        break;
>> +    case ERST_VALUE_OFFSET + 0:
>> +    case ERST_VALUE_OFFSET + 4:
>> +        val = erst_rd_reg64(addr, s->reg_value, size);
>> +        break;
>> +    default:
>> +        break;
>> +    }
>> +    trace_acpi_erst_reg_read(addr, val, size);
>> +    return val;
>> +}
>> +
>> +static const MemoryRegionOps erst_reg_ops = {
>> +    .read = erst_reg_read,
>> +    .write = erst_reg_write,
>> +    .endianness = DEVICE_NATIVE_ENDIAN,
>> +};
>> +
>> +/*******************************************************************/
>> +/*******************************************************************/
>> +static int erst_post_load(void *opaque, int version_id)
>> +{
>> +    ERSTDeviceState *s = opaque;
>> +
>> +    /* Recompute pointer to header */
>> +    s->header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);
>> +    trace_acpi_erst_post_load(s->header);
>> +
>> +    return 0;
>> +}
>> +
>> +static const VMStateDescription erst_vmstate  = {
>> +    .name = "acpi-erst",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .post_load = erst_post_load,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT32(storage_size, ERSTDeviceState),
>   1)
>> +        VMSTATE_UINT8(operation, ERSTDeviceState),
>> +        VMSTATE_UINT8(busy_status, ERSTDeviceState),
>> +        VMSTATE_UINT8(command_status, ERSTDeviceState),
>> +        VMSTATE_UINT32(record_offset, ERSTDeviceState),
>> +        VMSTATE_UINT64(reg_action, ERSTDeviceState),
>> +        VMSTATE_UINT64(reg_value, ERSTDeviceState),
>> +        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
>> +        VMSTATE_UINT32(next_record_index, ERSTDeviceState),
> 
>> +        VMSTATE_UINT32(first_record_index, ERSTDeviceState),
>> +        VMSTATE_UINT32(last_record_index, ERSTDeviceState),
>   2)
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
> 
>   1 and 2 aren't runtime state, so why are they in the migration stream?
done; removed storage_size, first_record_index and last_record_index from the migration stream.


> 
> I'd imagine size could be used to check that the backend on the target is of the same size,
> to avoid a buffer overrun if the target side has a smaller backend, and fail migration if
> it's not the same. But it isn't used this way here.
I decided not to do this check, as that memory object is migrated automatically, so I don't think my
check adds any value.

> 
> the rest could be calculated at realize time.
and in fact they are.

> 
>> +
>> +static void erst_realizefn(PCIDevice *pci_dev, Error **errp)
>> +{
>> +    ERSTDeviceState *s = ACPIERST(pci_dev);
>> +
>> +    trace_acpi_erst_realizefn_in();
>> +
>> +    if (!s->hostmem) {
>> +        error_setg(errp, "'" ACPI_ERST_MEMDEV_PROP "' property is not set");
>> +        return;
>> +    } else if (host_memory_backend_is_mapped(s->hostmem)) {
>> +        error_setg(errp, "can't use already busy memdev: %s",
>> +                   object_get_canonical_path_component(OBJECT(s->hostmem)));
>> +        return;
>> +    }
>> +
>> +    s->hostmem_mr = host_memory_backend_get_memory(s->hostmem);
>> +
>> +    /* HostMemoryBackend size will be multiple of PAGE_SIZE */
>> +    s->storage_size = object_property_get_int(OBJECT(s->hostmem), "size", errp);
>> +
>> +    /* Check storage_size against ERST_RECORD_SIZE */
>> +    if (((s->storage_size % ERST_RECORD_SIZE) != 0) ||
>> +         (ERST_RECORD_SIZE > s->storage_size)) {
>> +        error_setg(errp, "ACPI ERST requires size be multiple of "
>> +            "record size (%luKiB)", ERST_RECORD_SIZE);
>> +    }
>> +
>> +    /* Initialize backend storage and record_count */
>> +    check_erst_backend_storage(s, errp);
>> +
>> +    /* BAR 0: Programming registers */
>> +    memory_region_init_io(&s->iomem, OBJECT(pci_dev), &erst_reg_ops, s,
>> +                          TYPE_ACPI_ERST, ERST_REG_SIZE);
>> +    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->iomem);
>> +
>> +    /* BAR 1: Exchange buffer memory */
> 
> 
>> +    /* Create a hostmem object to use as the exchange buffer */
>> +    s->exchange_obj = object_new(TYPE_MEMORY_BACKEND_RAM);
>> +    object_property_set_int(s->exchange_obj, "size", ERST_RECORD_SIZE, errp);
>> +    user_creatable_complete(USER_CREATABLE(s->exchange_obj), errp);
>> +    s->exchange = MEMORY_BACKEND(s->exchange_obj);
>> +    host_memory_backend_set_mapped(s->exchange, true);
>> +    s->exchange_mr = host_memory_backend_get_memory(s->exchange);
> replace this block with single memory_region_init_ram()
done!

> 
> 
>> +    memory_region_init_resizeable_ram(s->exchange_mr, OBJECT(pci_dev),
>> +        TYPE_ACPI_ERST, ERST_RECORD_SIZE, ERST_RECORD_SIZE, NULL, errp);
> have no idea why it's necessary, seems just wrong, it basically leaks
> the previous memory region and creates a new one.
done!

> 
>> +    pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, s->exchange_mr);
> 
>> +    /* Include the exchange buffer in the migration stream */
>> +    vmstate_register_ram_global(s->exchange_mr);
> not necessary if memory_region_init_ram() is used directly
done!

> 
>> +
>> +    /* Include the backend storage in the migration stream */
>> +    vmstate_register_ram_global(s->hostmem_mr);
>> +
>> +    trace_acpi_erst_realizefn_out(s->storage_size);
>> +}
>> +
>> +static void erst_reset(DeviceState *dev)
>> +{
>> +    ERSTDeviceState *s = ACPIERST(dev);
>> +
>> +    trace_acpi_erst_reset_in(s->header->record_count);
>> +    s->operation = 0;
>> +    s->busy_status = 0;
>> +    s->command_status = STATUS_SUCCESS;
>> +    s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;
>> +    s->record_offset = 0;
>> +    s->next_record_index = s->first_record_index;
>> +    /* NOTE: first/last_record_index are computed only once */
>> +    trace_acpi_erst_reset_out(s->header->record_count);
>> +}
>> +
>> +static Property erst_properties[] = {
>> +    DEFINE_PROP_LINK(ACPI_ERST_MEMDEV_PROP, ERSTDeviceState, hostmem,
>> +                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void erst_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>> +
>> +    trace_acpi_erst_class_init_in();
>> +    k->realize = erst_realizefn;
>> +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
>> +    k->device_id = PCI_DEVICE_ID_REDHAT_ACPI_ERST;
>> +    k->revision = 0x00;
>> +    k->class_id = PCI_CLASS_OTHERS;
>> +    dc->reset = erst_reset;
>> +    dc->vmsd = &erst_vmstate;
>> +    dc->user_creatable = true;
> 
> can't be hotplugged, add:
>         dc->hotpluggable = false;
done

> 
>> +    device_class_set_props(dc, erst_properties);
>> +    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>> +    trace_acpi_erst_class_init_out();
>> +}
>> +
>> +static const TypeInfo erst_type_info = {
>> +    .name          = TYPE_ACPI_ERST,
>> +    .parent        = TYPE_PCI_DEVICE,
>> +    .class_init    = erst_class_init,
>> +    .instance_size = sizeof(ERSTDeviceState),
>> +    .interfaces = (InterfaceInfo[]) {
>> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
>> +        { }
>> +    }
>> +};
>> +
>> +static void erst_register_types(void)
>> +{
>> +    type_register_static(&erst_type_info);
>> +}
>> +
>> +type_init(erst_register_types)
>> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
>> index 29f804d..401d0e5 100644
>> --- a/hw/acpi/meson.build
>> +++ b/hw/acpi/meson.build
>> @@ -5,6 +5,7 @@ acpi_ss.add(files(
>>     'bios-linker-loader.c',
>>     'core.c',
>>     'utils.c',
>> +  'erst.c',
>>   ))
>>   acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c'))
>>   acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu_hotplug.c'))
>> diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
>> index 974d770..3579768 100644
>> --- a/hw/acpi/trace-events
>> +++ b/hw/acpi/trace-events
>> @@ -55,3 +55,18 @@ piix4_gpe_writeb(uint64_t addr, unsigned width, uint64_t val) "addr: 0x%" PRIx64
>>   # tco.c
>>   tco_timer_reload(int ticks, int msec) "ticks=%d (%d ms)"
>>   tco_timer_expired(int timeouts_no, bool strap, bool no_reboot) "timeouts_no=%d no_reboot=%d/%d"
>> +
>> +# erst.c
>> +acpi_erst_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%04" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
>> +acpi_erst_reg_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%04" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
>> +acpi_erst_mem_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%06" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
>> +acpi_erst_mem_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%06" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
>> +acpi_erst_pci_bar_0(uint64_t addr) "BAR0: 0x%016" PRIx64
>> +acpi_erst_pci_bar_1(uint64_t addr) "BAR1: 0x%016" PRIx64
>> +acpi_erst_realizefn_in(void)
>> +acpi_erst_realizefn_out(unsigned size) "total nvram size %u bytes"
>> +acpi_erst_reset_in(unsigned record_count) "record_count %u"
>> +acpi_erst_reset_out(unsigned record_count) "record_count %u"
>> +acpi_erst_post_load(void *header) "header: 0x%p"
>> +acpi_erst_class_init_in(void)
>> +acpi_erst_class_init_out(void)
> 



* Re: [PATCH v6 09/10] ACPI ERST: bios-tables-test testcase
  2021-09-21 11:32   ` Igor Mammedov
@ 2021-10-04 21:13     ` Eric DeVolder
  0 siblings, 0 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-10-04 21:13 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth



On 9/21/21 6:32 AM, Igor Mammedov wrote:
> On Thu,  5 Aug 2021 18:30:38 -0400
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
>> This change implements the test suite checks for the ERST table.
>>
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>> ---
>>   tests/qtest/bios-tables-test.c | 43 ++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 43 insertions(+)
>>
>> diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
>> index 51d3a4e..6ee78ec 100644
>> --- a/tests/qtest/bios-tables-test.c
>> +++ b/tests/qtest/bios-tables-test.c
>> @@ -1378,6 +1378,45 @@ static void test_acpi_piix4_tcg_acpi_hmat(void)
>>       test_acpi_tcg_acpi_hmat(MACHINE_PC);
>>   }
>>   
>> +static void test_acpi_erst(const char *machine)
>> +{
>> +    test_data data;
>> +
>> +    memset(&data, 0, sizeof(data));
>> +    data.machine = machine;
>> +    /*data.variant = ".acpierst";*/
>> +    test_acpi_one(" -object memory-backend-file,id=erstnvram,"
>> +                    "mem-path=tests/acpi-erst.XXXXXX,size=0x10000,share=on"
>> +                    " -device acpi-erst,memdev=erstnvram",
>> +                  &data);
>> +    free_test_data(&data);
>> +}
>> +
>> +static void test_acpi_piix4_erst(void)
>> +{
>> +    test_acpi_erst(MACHINE_PC);
>> +}
>> +
>> +static void test_acpi_q35_erst(void)
>> +{
>> +    test_acpi_erst(MACHINE_Q35);
>> +}
>> +
>> +static void test_acpi_microvm_erst(void)
>> +{
>> +    test_data data;
>> +
>> +    test_acpi_microvm_prepare(&data);
>> +    data.variant = ".pcie";
>> +    data.tcg_only = true; /* need constant host-phys-bits */
>> +    test_acpi_one(" -machine microvm,acpi=on,ioapic2=off,rtc=off,pcie=on "
>> +                    "-object memory-backend-file,id=erstnvram,"
>> +                    "mem-path=tests/acpi-erst.XXXXXX,size=0x10000,share=on "
>                                   ^^^^
> shouldn't the path be generated with g_dir_make_tmp() & co + corresponding cleanup
done!

> 
>> +                    "-device acpi-erst,memdev=erstnvram",
>> +                  &data);
>> +    free_test_data(&data);
>> +}
>> +
>>   static void test_acpi_virt_tcg(void)
>>   {
>>       test_data data = {
>> @@ -1560,7 +1599,11 @@ int main(int argc, char *argv[])
>>           qtest_add_func("acpi/microvm/oem-fields", test_acpi_oem_fields_microvm);
>>           if (strcmp(arch, "x86_64") == 0) {
>>               qtest_add_func("acpi/microvm/pcie", test_acpi_microvm_pcie_tcg);
>> +            qtest_add_func("acpi/microvm/acpierst", test_acpi_microvm_erst);
>>           }
>> +        qtest_add_func("acpi/piix4/acpierst", test_acpi_piix4_erst);
>> +        qtest_add_func("acpi/q35/acpierst", test_acpi_q35_erst);
>> +
>>       } else if (strcmp(arch, "aarch64") == 0) {
>>           qtest_add_func("acpi/virt", test_acpi_virt_tcg);
>>           qtest_add_func("acpi/virt/numamem", test_acpi_virt_tcg_numamem);
> 



* Re: [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test
  2021-09-21 11:24   ` Igor Mammedov
@ 2021-10-04 21:14     ` Eric DeVolder
  0 siblings, 0 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-10-04 21:14 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth



On 9/21/21 6:24 AM, Igor Mammedov wrote:
> On Thu,  5 Aug 2021 18:30:39 -0400
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
>> Following the guidelines in tests/qtest/bios-tables-test.c, this
>> is step 6, the re-generated ACPI tables binary blobs.
> 
> 
> commit message should include ASL diff for new/changed content
> 
> for example see commit:
> 1aaef7d8092803 acpi: unit-test: Update WAET ACPI Table expected binaries

done

> 
>>
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>> ---
>>   tests/data/acpi/microvm/ERST.pcie           | Bin 0 -> 912 bytes
>>   tests/data/acpi/pc/DSDT                     | Bin 6002 -> 6009 bytes
>>   tests/data/acpi/pc/ERST                     | Bin 0 -> 912 bytes
>>   tests/data/acpi/q35/DSDT                    | Bin 8289 -> 8306 bytes
>>   tests/data/acpi/q35/ERST                    | Bin 0 -> 912 bytes
>>   tests/qtest/bios-tables-test-allowed-diff.h |   6 ------
>>   6 files changed, 6 deletions(-)
>>   create mode 100644 tests/data/acpi/microvm/ERST.pcie
>>
>> diff --git a/tests/data/acpi/microvm/ERST.pcie b/tests/data/acpi/microvm/ERST.pcie
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..d9a2b3211ab5893a50751ad52be3782579e367f2
>> GIT binary patch
>> literal 912
>> zcmaKpO%8%E5QPUQ|KVrvh9h_c12J)@5f?5!k_Ygv*jGA8UW7?#`}+D#XXyDpKHiZ?
>> z@anI_W$gOrZRl(SB7!yMqx}#E4EC&a5=}m^g_!0^0`kEl)DOuIXM6D@@*xq*8vyqH
>> z)b0KTlmlgmH~xt7vG<k#Z1~z=OnyT76ZX;Ysy^;NC0^^$`kY?zKK;^vMtny1JAD$P
>> zc^BR{l;i*H`IJAW`~~?1`_TXD_wQ2@UlL!DU$GCpQ-4i-O}x_^JdQTRH^e)=(_c$`
>> LOT5z?_v4Aa+v(5&
>>
>> literal 0
>> HcmV?d00001
>>
>> diff --git a/tests/data/acpi/pc/DSDT b/tests/data/acpi/pc/DSDT
>> index cc1223773e9c459a8d2f20666c051a74338d40b7..bff625a25602fa954b5b395fea53e3c7dfaca485 100644
>> GIT binary patch
>> delta 85
>> zcmeyQ_fwC{CD<jTQk;Q-F=QiG057Ni!kGAAr+5MP$;rGe;+`zQh8FQ0@s2J*JPZuX
>> l3>=QZp?+M<lN)&@ggD~CY!RV&S1$v`0B2XP&5C@1oB+Xc6m$Rp
>>
>> delta 65
>> zcmeyV_eqb-CD<jTNSuLzao$F*0A5ayg)#BLPVoW`laqN{#GF`y4K3n1;)6r|xR^QO
>> V9bJNW7#Nr*U*I#`Y|7`t2>@&@5ljF8
>>
>> diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
>> index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..f24fadd345c798ee5c17cdb66e0e1703bd1b2f26 100644
>> GIT binary patch
>> literal 912
>> zcmaKpOAdlC6h#XZC=fn#CoF*_7>J28jW}>wF2KFG3zs9lTPTnl;U#=7r>E_sr(1u2
>> z21<FK_R^jEx_w-`TFO&O;T_LLF4O@x8LMi!H}5Z^t6_Tah{H!Y?i2S%JoA7!BFgz1
>> zf~;?N{b8^}H2K=viyuzh`L7M``U{CiG=Ib#4X^gc{m10T<lDURCp`CW$T#HMd{o-?
>> zH~aE`PznCu9;f*enm;9;GDrTme_0zSBR|7ODR;g(@qEM!N8Z_gL4HBL%^N<3mgJY@
>> R+q~0XMSexT%^U0Ee0~)`g#iEn
>>
>> literal 0
>> HcmV?d00001
>>
>> diff --git a/tests/data/acpi/q35/DSDT b/tests/data/acpi/q35/DSDT
>> index 842533f53e6db40935c3cdecd1d182edba6c17d4..950c286b4c751f3c116a11d8892779942375e16b 100644
>> GIT binary patch
>> delta 59
>> zcmaFp@X3M8CD<jTNP&TYv2`OCrvjHhYfOBwQ@nsX>ttC4TZ!l<{$N9cc#e2SmmnSn
>> O1||j(wg6|p5C#C(xDBxY
>>
>> delta 42
>> xcmez5@X&$FCD<h-QGtPh@##h`P6aMMmYDcpr+5K3mdUaTw(KHo0nUCQ3;+kH3ZMW0
>>
>> diff --git a/tests/data/acpi/q35/ERST b/tests/data/acpi/q35/ERST
>> index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..f24fadd345c798ee5c17cdb66e0e1703bd1b2f26 100644
>> GIT binary patch
>> literal 912
>> zcmaKpOAdlC6h#XZC=fn#CoF*_7>J28jW}>wF2KFG3zs9lTPTnl;U#=7r>E_sr(1u2
>> z21<FK_R^jEx_w-`TFO&O;T_LLF4O@x8LMi!H}5Z^t6_Tah{H!Y?i2S%JoA7!BFgz1
>> zf~;?N{b8^}H2K=viyuzh`L7M``U{CiG=Ib#4X^gc{m10T<lDURCp`CW$T#HMd{o-?
>> zH~aE`PznCu9;f*enm;9;GDrTme_0zSBR|7ODR;g(@qEM!N8Z_gL4HBL%^N<3mgJY@
>> R+q~0XMSexT%^U0Ee0~)`g#iEn
>>
>> literal 0
>> HcmV?d00001
>>
>> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
>> index b3aaf76..dfb8523 100644
>> --- a/tests/qtest/bios-tables-test-allowed-diff.h
>> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
>> @@ -1,7 +1 @@
>>   /* List of comma-separated changed AML files to ignore */
>> -"tests/data/acpi/pc/ERST",
>> -"tests/data/acpi/q35/ERST",
>> -"tests/data/acpi/microvm/ERST",
>> -"tests/data/acpi/pc/DSDT",
>> -"tests/data/acpi/q35/DSDT",
>> -"tests/data/acpi/microvm/DSDT",
> 



* Re: [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature
  2021-10-04 21:13     ` Eric DeVolder
@ 2021-10-05 11:39       ` Igor Mammedov
  2021-10-05 16:40         ` Eric DeVolder
  0 siblings, 1 reply; 34+ messages in thread
From: Igor Mammedov @ 2021-10-05 11:39 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth

On Mon, 4 Oct 2021 16:13:09 -0500
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Igor, thanks for the close examination. Inline responses below.
> eric
> 
> On 9/21/21 10:30 AM, Igor Mammedov wrote:
> > On Thu,  5 Aug 2021 18:30:34 -0400
> > Eric DeVolder <eric.devolder@oracle.com> wrote:
> >   
> >> This implements a PCI device for ACPI ERST. This implements the
> >> non-NVRAM "mode" of operation for ERST as it is supported by
> >> Linux and Windows.
> >>
> >> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> >> ---
> >>   hw/acpi/erst.c       | 750 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>   hw/acpi/meson.build  |   1 +
> >>   hw/acpi/trace-events |  15 ++
> >>   3 files changed, 766 insertions(+)
> >>   create mode 100644 hw/acpi/erst.c
> >>
> >> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
> >> new file mode 100644
> >> index 0000000..eb4ab34
> >> --- /dev/null
> >> +++ b/hw/acpi/erst.c
> >> @@ -0,0 +1,750 @@
> >> +/*
> >> + * ACPI Error Record Serialization Table, ERST, Implementation
> >> + *
> >> + * ACPI ERST introduced in ACPI 4.0, June 16, 2009.
> >> + * ACPI Platform Error Interfaces : Error Serialization
> >> + *
> >> + * Copyright (c) 2021 Oracle and/or its affiliates.
> >> + *
> >> + * SPDX-License-Identifier: GPL-2.0-or-later
> >> + */
> >> +
> >> +#include <sys/types.h>
> >> +#include <sys/stat.h>
> >> +#include <unistd.h>
> >> +
> >> +#include "qemu/osdep.h"
> >> +#include "qapi/error.h"
> >> +#include "hw/qdev-core.h"
> >> +#include "exec/memory.h"
> >> +#include "qom/object.h"
> >> +#include "hw/pci/pci.h"
> >> +#include "qom/object_interfaces.h"
> >> +#include "qemu/error-report.h"
> >> +#include "migration/vmstate.h"
> >> +#include "hw/qdev-properties.h"
> >> +#include "hw/acpi/acpi.h"
> >> +#include "hw/acpi/acpi-defs.h"
> >> +#include "hw/acpi/aml-build.h"
> >> +#include "hw/acpi/bios-linker-loader.h"
> >> +#include "exec/address-spaces.h"
> >> +#include "sysemu/hostmem.h"
> >> +#include "hw/acpi/erst.h"
> >> +#include "trace.h"
> >> +
> >> +/* ACPI 4.0: Table 17-16 Serialization Actions */
> >> +#define ACTION_BEGIN_WRITE_OPERATION         0x0
> >> +#define ACTION_BEGIN_READ_OPERATION          0x1
> >> +#define ACTION_BEGIN_CLEAR_OPERATION         0x2
> >> +#define ACTION_END_OPERATION                 0x3
> >> +#define ACTION_SET_RECORD_OFFSET             0x4
> >> +#define ACTION_EXECUTE_OPERATION             0x5
> >> +#define ACTION_CHECK_BUSY_STATUS             0x6
> >> +#define ACTION_GET_COMMAND_STATUS            0x7
> >> +#define ACTION_GET_RECORD_IDENTIFIER         0x8
> >> +#define ACTION_SET_RECORD_IDENTIFIER         0x9
> >> +#define ACTION_GET_RECORD_COUNT              0xA
> >> +#define ACTION_BEGIN_DUMMY_WRITE_OPERATION   0xB
> >> +#define ACTION_RESERVED                      0xC
> >> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE   0xD
> >> +#define ACTION_GET_ERROR_LOG_ADDRESS_LENGTH  0xE
> >> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES 0xF
> >> +#define ACTION_GET_EXECUTE_OPERATION_TIMINGS 0x10
> >> +
> >> +/* ACPI 4.0: Table 17-17 Command Status Definitions */
> >> +#define STATUS_SUCCESS                0x00
> >> +#define STATUS_NOT_ENOUGH_SPACE       0x01
> >> +#define STATUS_HARDWARE_NOT_AVAILABLE 0x02
> >> +#define STATUS_FAILED                 0x03
> >> +#define STATUS_RECORD_STORE_EMPTY     0x04
> >> +#define STATUS_RECORD_NOT_FOUND       0x05
> >> +
> >> +
> >> +/* UEFI 2.1: Appendix N Common Platform Error Record */
> >> +#define UEFI_CPER_RECORD_MIN_SIZE 128U
> >> +#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U
> >> +#define UEFI_CPER_RECORD_ID_OFFSET 96U
> >> +#define IS_UEFI_CPER_RECORD(ptr) \
> >> +    (((ptr)[0] == 'C') && \
> >> +     ((ptr)[1] == 'P') && \
> >> +     ((ptr)[2] == 'E') && \
> >> +     ((ptr)[3] == 'R'))
> >> +#define THE_UEFI_CPER_RECORD_ID(ptr) \
> >> +    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
> >> +
> >> +/*
> >> + * This implementation is an ACTION (cmd) and VALUE (data)
> >> + * interface consisting of just two 64-bit registers.
> >> + */
> >> +#define ERST_REG_SIZE (16UL)
> >> +#define ERST_ACTION_OFFSET (0UL) /* action (cmd) */
> >> +#define ERST_VALUE_OFFSET  (8UL) /* argument/value (data) */
> >> +
> >> +/*
> >> + * ERST_RECORD_SIZE is the buffer size for exchanging ERST
> >> + * record contents. Thus, it defines the maximum record size.
> >> + * As this is mapped through a PCI BAR, it must be a power of
> >> + * two and larger than UEFI_CPER_RECORD_MIN_SIZE.
> >> + * The backing storage is divided into fixed size "slots",
> >> + * each ERST_RECORD_SIZE in length, and each "slot"
> >> + * storing a single record. No attempt at optimizing storage
> >> + * through compression, compaction, etc is attempted.
> >> + * NOTE that slot 0 is reserved for the backing storage header.
> >> + * Depending upon the size of the backing storage, additional
> >> + * slots will be part of the slot 0 header in order to account
> >> + * for a record_id for each available remaining slot.
> >> + */
> >> +/* 8KiB records, not too small, not too big */
> >> +#define ERST_RECORD_SIZE (8192UL)
> >> +
> >> +#define ACPI_ERST_MEMDEV_PROP "memdev"
> >> +
> >> +/*
> >> + * From the ACPI ERST spec sections:
> >> + * A record id of all 0s is used to indicate
> >> + * 'unspecified' record id.
> >> + * A record id of all 1s is used to indicate
> >> + * empty or end.
> >> + */
> >> +#define ERST_UNSPECIFIED_RECORD_ID (0UL)
> >> +#define ERST_EMPTY_END_RECORD_ID (~0UL)
> >> +#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
> >> +#define ERST_IS_VALID_RECORD_ID(rid) \
> >> +    ((rid != ERST_UNSPECIFIED_RECORD_ID) && \
> >> +     (rid != ERST_EMPTY_END_RECORD_ID))
> >> +
> >> +typedef struct erst_storage_header_s {  
> >   
> >> +#define ERST_STORE_MAGIC 0x524F545354535245UL  
> > 
> > move it out of structure definition,
> > also where value comes from? (perhaps something starting
> > with ERST... would be more self-describing)  
> done; this value is 'ERSTSTOR', which I've left as a comment in v7.
> 
> >   
> >> +    uint64_t magic;
> >> +    uint32_t record_size;
> >> +    uint32_t record_offset; /* offset to record storage beyond header */
> >> +    uint16_t version;
> >> +    uint16_t reserved;
> >> +    uint32_t record_count;
> >> +    uint64_t map[]; /* contains record_ids, and position indicates index */
> >> +} erst_storage_header_t;  
> > docs/devel/style.rst: Typedefs  
> done; thanks
> 
> > 
> > also, given it's used as the header layout in storage,
> > set packed attribute for structure  
> done
> 
> >   
> >> +
> >> +/*
> >> + * Object cast macro
> >> + */
> >> +#define ACPIERST(obj) \
> >> +    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
> >> +
> >> +/*
> >> + * Main ERST device state structure
> >> + */
> >> +typedef struct {
> >> +    PCIDevice parent_obj;
> >> +
> >> +    /* Backend storage */
> >> +    HostMemoryBackend *hostmem;
> >> +    MemoryRegion *hostmem_mr;
> >> +
> >> +    /* Programming registers */
> >> +    MemoryRegion iomem;
> >> +
> >> +    /* Exchange buffer */
> >> +    Object *exchange_obj;
> >> +    HostMemoryBackend *exchange;
> >> +    MemoryRegion *exchange_mr;
> >> +    uint32_t storage_size;
> >> +
> >> +    /* Interface state */
> >> +    uint8_t operation;
> >> +    uint8_t busy_status;
> >> +    uint8_t command_status;
> >> +    uint32_t record_offset;
> >> +    uint64_t reg_action;
> >> +    uint64_t reg_value;
> >> +    uint64_t record_identifier;
> >> +    erst_storage_header_t *header;
> >> +    unsigned next_record_index;
> >> +    unsigned first_record_index;
> >> +    unsigned last_record_index;
> >> +
> >> +} ERSTDeviceState;
> >> +
> >> +/*******************************************************************/
> >> +/*******************************************************************/
> >> +
> >> +static uint8_t *get_nvram_ptr_by_index(ERSTDeviceState *s, unsigned index)
> >> +{
> >> +    uint8_t *rc = NULL;
> >> +    off_t offset = (index * ERST_RECORD_SIZE);  
> >   
> >> +    if ((offset + ERST_RECORD_SIZE) <= s->storage_size) {  
> > 
> > it looks like 'index' passed by caller is always valid, if it's the case
> > convert  this to
> >          g_assert((offset + ERST_RECORD_SIZE) <= s->storage_size))  
> done
> 
> > 
> > also shouldn't <= be just <  
> yes, done
> 
> > 
> >   
> >> +        if (s->hostmem_mr) {  
> > can hostmem_mr be NULL, when this function is called?
> > if not I'd drop condition.  
> no, so dropped. done
> 
> >   
> >> +            uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(s->hostmem_mr);
> >> +            rc = p + offset;
> >> +        }
> >> +    }
> >> +    return rc;
> >> +}
> >> +
> >> +static void make_erst_storage_header(ERSTDeviceState *s)
> >> +{
> >> +    erst_storage_header_t *header = s->header;
> >> +    unsigned mapsz, headersz;
> >> +
> >> +    header->magic = ERST_STORE_MAGIC;
> >> +    header->record_size = ERST_RECORD_SIZE;
> >> +    header->version = 0x0101;  
> > 
> > maybe 0 or 1 to avoid question about what previous versions are  
> changed to simply 0x0100 (ie 1.0)
> >   
> >> +    header->reserved = 0x0000;  
> > s/0x.../0/  
> done
> 
> >   
> >> +
> >> +    /* Compute mapsize */
> >> +    mapsz = s->storage_size / ERST_RECORD_SIZE;
> >> +    mapsz *= sizeof(uint64_t);
> >> +    /* Compute header+map size */
> >> +    headersz = sizeof(erst_storage_header_t) + mapsz;  
> >   
> >> +    /* Round up to nearest integer multiple of ERST_RECORD_SIZE */
> >> +    headersz += (ERST_RECORD_SIZE - 1);
> >> +    headersz /= ERST_RECORD_SIZE;
> >> +    headersz *= ERST_RECORD_SIZE;  
> > git grep ROUND_UP
> > may be of help here  
> yes, thanks. I'm using that now, done.
> 
> >   
> >> +    header->record_offset = headersz;
> >> +
> >> +    /*
> >> +     * The HostMemoryBackend initializes contents to zero,
> >> +     * so all record_ids stashed in the map are zero'd.
> >> +     * As well the record_count is zero. Properly initialized.
> >> +     */
> >> +}
> >> +
> >> +static void check_erst_backend_storage(ERSTDeviceState *s, Error **errp)
> >> +{
> >> +    erst_storage_header_t *header;
> >> +
> >> +    header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);  
> > optionally check/assert if it's not 64bit aligned,
> > if it's not you risk getting killed by SIGBUS on some hosts,
> > since you're accessing fields directly.  
> done!
> 
> >   
> >> +    s->header = header;
> >> +
> >> +    /* Check if header is uninitialized */
> >> +    if (header->magic == 0UL) { /* HostMemoryBackend inits to 0 */
> >> +        make_erst_storage_header(s);
> >> +    }
> >> +
> >> +    if (!(
> >> +        (header->magic == ERST_STORE_MAGIC) &&
> >> +        (header->record_size == ERST_RECORD_SIZE) &&
> >> +        ((header->record_offset % ERST_RECORD_SIZE) == 0) &&
> >> +        (header->version == 0x0101) &&
> >> +        (header->reserved == 0x0000)
> >> +        )) {
> >> +        error_setg(errp, "ERST backend storage header is invalid");
> >> +    }
> >> +
> >> +    /* Compute offset of first and last record storage slot */
> >> +    s->first_record_index = header->record_offset / ERST_RECORD_SIZE;
> >> +    s->last_record_index = (s->storage_size / ERST_RECORD_SIZE);  
> > 
> > applies to whole patch/series,
> > if mmaped header values are interpreted as integers you shall
> > take care of endianness, i.e. use cpu_to_foo/foo_to_cpu for access  
> done; I'm using cpu_to_leX() and leX_to_cpu() for any access to the header.
> 
> > 
> > and document file endianness in doc (2/10)  
> done
> 
> >   
> >> +}
> >> +
> >> +static void set_erst_map_by_index(ERSTDeviceState *s, unsigned index,
> >> +    uint64_t record_id)  
> > 
> > update_[cache|map]_[entry|record_id]() or something like this might be
> > a better description erst and index don't really add much here as it's
> > clear from context.  
> done; now update_map_entry()
> 
> > 
> >   
> >> +{
> >> +    if (index < s->last_record_index) {
> >> +        s->header->map[index] = record_id;
> >> +    }
> >> +}
> >> +
> >> +static unsigned lookup_erst_record(ERSTDeviceState *s,
> >> +    uint64_t record_identifier)
> >> +{
> >> +    unsigned rc = 0; /* 0 not a valid index */
> >> +    unsigned index = s->first_record_index;
> >> +
> >> +    /* Find the record_identifier in the map */
> >> +    if (record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
> >> +        /*
> >> +         * Count number of valid records encountered, and
> >> +         * short-circuit the loop if identifier not found
> >> +         */
> >> +        unsigned count = 0;
> >> +        for (; index < s->last_record_index &&
> >> +                count < s->header->record_count; ++index) {
> >> +            uint64_t map_record_identifier = s->header->map[index];  
> > I'd drop map_record_identifier and use s->header->map[index] directly,
> > i.e
> >     if (s->header->map[index] ...  
> done
> 
> >   
> >> +            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
> >> +                ++count;
> >> +            }
> >> +            if (map_record_identifier == record_identifier) {
> >> +                rc = index;
> >> +                break;
> >> +            }
> >> +        }
> >> +    } else {
> >> +        /* Find first available unoccupied slot */
> >> +        for (; index < s->last_record_index; ++index) {
> >> +            if (s->header->map[index] == ERST_UNSPECIFIED_RECORD_ID) {
> >> +                rc = index;
> >> +                break;
> >> +            }
> >> +        }
> >> +    }
> >> +
> >> +    return rc;
> >> +}  
> > 
> > what's the reason for combining lookup and allocate ops,
> > if they were separated it would be easier to follow the code.
> done; at one point it made sense; no longer.
> 
> >   
> >> +
> >> +/* ACPI 4.0: 17.4.2.3 Operations - Clearing */
> >> +static unsigned clear_erst_record(ERSTDeviceState *s)
> >> +{
> >> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
> >> +    unsigned index;
> >> +
> >> +    /* Check for valid record identifier */
> >> +    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
> >> +        return STATUS_FAILED;
> >> +    }
> >> +
> >> +    index = lookup_erst_record(s, s->record_identifier);
> >> +    if (index) {
> >> +        /* No need to wipe record, just invalidate its map entry */
> >> +        set_erst_map_by_index(s, index, ERST_UNSPECIFIED_RECORD_ID);
> >> +        s->header->record_count -= 1;
> >> +        rc = STATUS_SUCCESS;
> >> +    }
> >> +
> >> +    return rc;
> >> +}
> >> +
> >> +/* ACPI 4.0: 17.4.2.2 Operations - Reading */
> >> +static unsigned read_erst_record(ERSTDeviceState *s)
> >> +{
> >> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
> >> +    unsigned index;
> >> +
> >> +    /* Check record boundary wihin exchange buffer */  
> >                                  ^^^ typo  
> done
> 
> >   
> >> +    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
> >> +        return STATUS_FAILED;
> >> +    }
> >> +
> >> +    /* Check for valid record identifier */
> >> +    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
> >> +        return STATUS_FAILED;
> >> +    }
> >> +
> >> +    index = lookup_erst_record(s, s->record_identifier);
> >> +    if (index) {
> >> +        uint8_t *ptr;
> >> +        uint8_t *record = ((uint8_t *)
> >> +            memory_region_get_ram_ptr(s->exchange_mr) +
> >> +            s->record_offset);
> >> +        ptr = get_nvram_ptr_by_index(s, index);
> >> +        memcpy(record, ptr, ERST_RECORD_SIZE - s->record_offset);  
> > 
> > if record_offset is large enough that record won't fit,
> > this will copy truncated record into the exchange buffer.
> > 
> > Maybe it's better to fail whole op?  
> The first check within this function checks for this very condition, and does fail.
> I believe the code does as you are asking.

The 1st check guarantees that the range 'exchange_mr + record_offset .. exchange_mr_end'
has space for at least UEFI_CPER_RECORD_MIN_SIZE bytes, while the source record
that's being copied can be larger than that.
i.e. assume
 record_offset = 7, ERST_RECORD_SIZE = 10, UEFI_CPER_RECORD_MIN_SIZE = 2, ptr->record_size = 9
 
 > if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE))
will pass (7 < 8), while
 > memcpy(record, ptr, ERST_RECORD_SIZE - s->record_offset);
will copy only 3 bytes, truncating the rest of the record
but still reporting success.

Also, while the max copied amount won't exceed the exchange_mr capacity
(it is equal to ERST_RECORD_SIZE in the current impl.), this can become
dangerous later on if buffer/record sizes diverge, as the dependency
coded here is implicit. A safer option would be to use the actual destination
buffer size and copied record size for the check, to avoid a potential buffer overrun
(I'm assuming that records are not random blobs but CPER formatted structures).

 copy_size = to_be_copied_record_size
 if copy_size <= memory_region_size(exchange_mr) - record_offset
    memcpy(record, ptr, copy_size)
 else
    error_out

[1] the same applies to the 'if (s->record_offset >= ...)' check:
make it use the actual exchange_mr size explicitly.
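
To make the pseudo-code above a bit more concrete, here is a rough standalone
sketch (plain C, not QEMU code). The helper names are made up for illustration,
and it assumes the record length is the 32-bit little-endian field at byte
offset 20 of the CPER header, which is what UEFI_CPER_RECORD_LENGTH_OFFSET
in the patch suggests:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same values as UEFI_CPER_RECORD_LENGTH_OFFSET / _MIN_SIZE in the patch */
#define CPER_LENGTH_OFFSET 20u
#define CPER_MIN_SIZE      128u

/* Hypothetical helper: decode the little-endian 32-bit record length */
static uint32_t cper_record_length(const uint8_t *rec)
{
    const uint8_t *p = rec + CPER_LENGTH_OFFSET;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: copy one stored record from its slot into the
 * exchange buffer at record_offset. Returns 0 on success, -1 if the
 * stored length is implausible or the record does not fit entirely.
 */
static int copy_record_to_exchange(uint8_t *exchange, size_t exchange_size,
                                   size_t record_offset,
                                   const uint8_t *slot, size_t slot_size)
{
    uint32_t len;

    /* A slot must at least be able to hold a CPER header */
    assert(slot_size >= CPER_MIN_SIZE);

    len = cper_record_length(slot);
    if (len < CPER_MIN_SIZE || len > slot_size) {
        return -1; /* not a plausible record */
    }
    /* Check against the real destination size, not ERST_RECORD_SIZE */
    if (record_offset > exchange_size ||
        len > exchange_size - record_offset) {
        return -1; /* would be truncated: fail the whole op */
    }
    memcpy(exchange + record_offset, slot, len);
    return 0;
}
```

With this shape the size dependency is explicit: the check keeps working
if the exchange buffer and slot sizes ever stop being equal.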

nit:
Also, the use of record_offset both in the header and in the state is a bit overloaded;
I'd consider renaming one of them to avoid confusion.

> >   
> >> +        rc = STATUS_SUCCESS;
> >> +    }
> >> +
> >> +    return rc;
> >> +}
> >> +
> >> +/* ACPI 4.0: 17.4.2.1 Operations - Writing */
> >> +static unsigned write_erst_record(ERSTDeviceState *s)
> >> +{
> >> +    unsigned rc = STATUS_FAILED;
> >> +    unsigned index;
> >> +    uint64_t record_identifier;
> >> +    uint8_t *record;
> >> +    uint8_t *ptr = NULL;
> >> +    bool record_found = false;
> >> +
> >> +    /* Check record boundary wihin exchange buffer */  
> > ditto, typo  
> done
> 
> >   
> >> +    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
> >> +        return STATUS_FAILED;
> >> +    }
see [1]

> >> +
> >> +    /* Extract record identifier */
> >> +    record = ((uint8_t *)memory_region_get_ram_ptr(s->exchange_mr)
> >> +        + s->record_offset);
> >> +    record_identifier = THE_UEFI_CPER_RECORD_ID(record);  
> > potentially unaligned access to int, should use memcpy()  
> done
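
For reference, a minimal standalone sketch of the memcpy() approach (plain C,
not the v7 code; the helper name is made up, and it assumes the record id is
the 64-bit little-endian field at offset 96, as UEFI_CPER_RECORD_ID_OFFSET
implies). In-tree, QEMU's own load helpers such as ldq_le_p() are probably
the better fit:

```c
#include <stdint.h>
#include <string.h>

/* Same value as UEFI_CPER_RECORD_ID_OFFSET in the patch */
#define CPER_RECORD_ID_OFFSET 96u

/*
 * Hypothetical helper: fetch the 64-bit record id without dereferencing
 * a possibly misaligned uint64_t pointer. The bytes are copied out first
 * and then assembled as a little-endian value, so the result does not
 * depend on host alignment rules or host byte order.
 */
static uint64_t cper_record_id(const uint8_t *record)
{
    uint8_t bytes[8];
    uint64_t id = 0;
    int i;

    memcpy(bytes, record + CPER_RECORD_ID_OFFSET, sizeof(bytes));
    for (i = 7; i >= 0; i--) {
        id = (id << 8) | bytes[i];
    }
    return id;
}
```
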
> 
> >   
> >> +
> >> +    /* Check for valid record identifier */
> >> +    if (!ERST_IS_VALID_RECORD_ID(record_identifier)) {
> >> +        return STATUS_FAILED;
> >> +    }
> >> +
> >> +    index = lookup_erst_record(s, record_identifier);
> >> +    if (index) {
> >> +        /* Record found, overwrite existing record */
> >> +        ptr = get_nvram_ptr_by_index(s, index);
> >> +        record_found = true;
> >> +    } else {
> >> +        /* Record not found, not an overwrite, allocate for write */
> >> +        index = lookup_erst_record(s, ERST_UNSPECIFIED_RECORD_ID);
> >> +        if (index) {
> >> +            ptr = get_nvram_ptr_by_index(s, index);
> >> +        } else {
> >> +            rc = STATUS_NOT_ENOUGH_SPACE;
> >> +        }
> >> +    }
> >> +    if (ptr) {
> >> +        memcpy(ptr, record, ERST_RECORD_SIZE - s->record_offset);  

This copies the remainder of the exchange buffer, including 'leftovers' from
previous operations.
Is there a reason why you are not verifying the actual 'record' size in the buffer
and, if it fits within the target 'ptr', copying just the useful payload from the buffer?

> >> +        if (0 != s->record_offset) {
> >> +            memset(&ptr[ERST_RECORD_SIZE - s->record_offset],
> >> +                0xFF, s->record_offset);
> >> +        }
> > you've lost me here, care to explain what's going on here?  
> If the record_offset is not 0, then there can be bytes following the record within the slot that 
> were not written. This simply sets them to 0xFF (so bytes from a previously written record that 
> happened to occupy this slot do not "bleed" through).
> I've left a comment in v7.

well, the 'bleed' happens because 'read_erst_record' copies the whole slot
instead of the actual record size.

And that works only while the exchange buffer size and record size
are equal; it falls apart silently as soon as that is not true,
leading to potential exploits.

it might be more robust if it were written like this:
   if_record_is_complete (i.e. record in buffer is not truncated)
       if_actual_record_size_fits_in_slot
           memcpy(slot, record, actual_record_size)
           memset(slot+actual_record_size, 0xff, slot_size - actual_record_size);
   otherwise error out
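
Spelled out as a rough standalone sketch (plain C, not QEMU code; function
names are made up for illustration, and the record length is assumed to be
the 32-bit little-endian field at offset 20 of the CPER header):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same values as UEFI_CPER_RECORD_LENGTH_OFFSET / _MIN_SIZE in the patch */
#define CPER_LENGTH_OFFSET 20u
#define CPER_MIN_SIZE      128u

/* Hypothetical helper: decode a little-endian 32-bit value */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: store the record found at record_offset in the
 * exchange buffer into one storage slot. Returns 0 on success, -1 if
 * the record is truncated in the buffer or too large for the slot.
 */
static int store_record(uint8_t *slot, size_t slot_size,
                        const uint8_t *exchange, size_t exchange_size,
                        size_t record_offset)
{
    const uint8_t *record;
    uint32_t len;

    assert(slot != NULL && exchange != NULL);

    /* The CPER header itself must lie inside the exchange buffer */
    if (record_offset > exchange_size ||
        exchange_size - record_offset < CPER_MIN_SIZE) {
        return -1;
    }
    record = exchange + record_offset;
    len = load_le32(record + CPER_LENGTH_OFFSET);

    /* Record must be complete in the buffer ... */
    if (len < CPER_MIN_SIZE || len > exchange_size - record_offset) {
        return -1;
    }
    /* ... and must fit into the slot */
    if (len > slot_size) {
        return -1;
    }
    /* Copy only the payload, then pad the rest of the slot with 0xFF */
    memcpy(slot, record, len);
    memset(slot + len, 0xFF, slot_size - len);
    return 0;
}
```

Because only 'len' bytes are copied and the tail is always padded, stale bytes
from a previous occupant of the slot cannot show up in a later read,
regardless of record_offset.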
 
> >> +        if (!record_found) {
> >> +            s->header->record_count += 1; /* writing new record */
> >> +        }
> >> +        set_erst_map_by_index(s, index, record_identifier);
> >> +        rc = STATUS_SUCCESS;
> >> +    }
> >> +
> >> +    return rc;
> >> +}
> >> +
> >> +/* ACPI 4.0: 17.4.2.2 Operations - Reading "During boot..." */
> >> +static unsigned next_erst_record(ERSTDeviceState *s,
> >> +    uint64_t *record_identifier)  
> > s/record_identifier/found.../  
> done
> 
> >   
> >> +{
> >> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
> >> +    unsigned index = s->next_record_index;
> >> +
> >> +    *record_identifier = ERST_EMPTY_END_RECORD_ID;
> >> +
> >> +    if (s->header->record_count) {
> >> +        for (; index < s->last_record_index; ++index) {
> >> +            uint64_t map_record_identifier;  
> > and then s/map_record_identifier/record_identifier/  
> done
> 
> > 
> > the same applies to other occurrences within patch
> > (map_record_identifier is a bit confusing) or drop it
> > and use s->header->map[index] directly  
> done
> 
> >   
> >> +            map_record_identifier = s->header->map[index];
> >> +            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
> >> +                    /* where to start next time */
> >> +                    s->next_record_index = index + 1;
> >> +                    *record_identifier = map_record_identifier;
> >> +                    rc = STATUS_SUCCESS;
> >> +                    break;
> >> +            }
> >> +        }
> >> +    }
> >> +    if (rc != STATUS_SUCCESS) {
> >> +        if (s->next_record_index == s->first_record_index) {
> >> +            /*
> >> +             * next_record_identifier is unchanged, no records found
> >> +             * and *record_identifier contains EMPTY_END id
> >> +             */
> >> +            rc = STATUS_RECORD_STORE_EMPTY;
> >> +        }
> >> +        /* at end/scan complete, reset */
> >> +        s->next_record_index = s->first_record_index;
> >> +    }  
> > 
> > Table 17-16, says return existing error or ERST_EMPTY_END_RECORD_ID
> > but nothing about the op returning an error, so I'd assume status
> > should always be STATUS_SUCCESS for GET_RECORD_IDENTIFIER.  
> done
> 
> > 
> > Advancing to the next record is part of record READ op and
> > not the part of GET_RECORD_IDENTIFIER as it's done here.  
> well...
> 
> >    "The steps performed by the platform to carry out ...
> >       2. ..
> >          c. If the specified error record does not exist,
> >             ... update the status register’s Identifier field with the identifier of the
> > ‘first’ error record
> >       4. Record the Identifier of the ‘next’ valid error record ...
> >    "  
> 
> I used ACPI spec v6 and I was asked to locate the first occurrence of ERST in the spec, which was 
> v4. So the above spec quotes are accurate, however, spec v6 deviates in an important way from the 
> above, which reads:
> 
>    "c. If the status is Record Not Found (0x05), indicating that the specified error record does not 
> exist, OSPM retrieves a valid identifier by a GET_RECORD_IDENTIFIER action. The platform will return 
> a valid record identifier."

that's a quote from the OSPM behavior;

the platform part still looks the same to me (in 4.0/5.0/6.0/6.3): they split 2.c into 2.c and 2.d,
but the meaning is the same.

 
> So GET_RECORD_IDENTIFIER is essentially a factory that pumps out record identifiers for records 
> stored. I kind of think of it as the old DOS 'find_first/find_next'. And yes v4 of the spec states 
> that the READ operation should initiate the first record_identifier. However v6 clearly states this 
> is now the responsibility of OSPM, not the READ op.
> 
> I am thinking that the best way to handle this contradiction is to change the ACPI spec citation 
> from v4 to v5, as the wording in v5 matches what I cite from v6, and implemented. Furthermore, this 
> approach of OSPM obtaining the next valid record_id via GET_RECORD_IDENTIFIER is consistent with 

> what I observed in BIOS and with how Linux is coded.
pointer[s] to source[s] please?


Well, spec can be wrong too (not the 1st time) but we need to be sure
what is broken and doesn't work as it's supposed to and document it
properly, before deviating from the spec.



> Thoughts?
> 
> > 
> >   
> >> +
> >> +    return rc;
> >> +}
> >> +
> >> +/*******************************************************************/
> >> +
> >> +static uint64_t erst_rd_reg64(hwaddr addr,
> >> +    uint64_t reg, unsigned size)
> >> +{
> >> +    uint64_t rdval;
> >> +    uint64_t mask;
> >> +    unsigned shift;
> >> +
> >> +    if (size == sizeof(uint64_t)) {
> >> +        /* 64b access */
> >> +        mask = 0xFFFFFFFFFFFFFFFFUL;
> >> +        shift = 0;
> >> +    } else {
> >> +        /* 32b access */
> >> +        mask = 0x00000000FFFFFFFFUL;
> >> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
> >> +    }
> >> +
> >> +    rdval = reg;
> >> +    rdval >>= shift;
> >> +    rdval &= mask;
> >> +
> >> +    return rdval;
> >> +}
> >> +
> >> +static uint64_t erst_wr_reg64(hwaddr addr,
> >> +    uint64_t reg, uint64_t val, unsigned size)
> >> +{
> >> +    uint64_t wrval;
> >> +    uint64_t mask;
> >> +    unsigned shift;
> >> +    if (size == sizeof(uint64_t)) {
> >> +        /* 64b access */
> >> +        mask = 0xFFFFFFFFFFFFFFFFUL;
> >> +        shift = 0;
> >> +    } else {
> >> +        /* 32b access */
> >> +        mask = 0x00000000FFFFFFFFUL;
> >> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
> >> +    }
> >> +
> >> +    val &= mask;
> >> +    val <<= shift;
> >> +    mask <<= shift;
> >> +    wrval = reg;
> >> +    wrval &= ~mask;
> >> +    wrval |= val;
> >> +
> >> +    return wrval;
> >> +}
> >> +
> >> +static void erst_reg_write(void *opaque, hwaddr addr,
> >> +    uint64_t val, unsigned size)
> >> +{
> >> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> >> +
> >> +    /*
> >> +     * NOTE: All actions/operations/side effects happen on the WRITE,
> >> +     * by design. The READs simply return the reg_value contents.  
> > 
> > point to spec, pls.  
> This was an implementation design choice, so no spec reference applicable, I left a comment.
> 
> 
> >   
> >> +     */
> >> +    trace_acpi_erst_reg_write(addr, val, size);
> >> +
> >> +    switch (addr) {
> >> +    case ERST_VALUE_OFFSET + 0:
> >> +    case ERST_VALUE_OFFSET + 4:
> >> +        s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
> >> +        break;
> >> +    case ERST_ACTION_OFFSET + 0:  
> >   
> >> +/*  case ERST_ACTION_OFFSET+4: as coded, not really a 64b register */  
> > 
> > what does this mean?  
> In short, all values written to this register are just the ACTION ops, so there wasn't a need to 
> implement this as a 64-bit register, especially since Linux seems to issue two 32-bit accesses for 
> 64-bit; in this case the upper access is utterly useless.
> I placed a comment in code.
the comment as it is above is not helpful,
so it would be better to have a comment that explains the reasoning a bit better.
like:
   supported/implemented ACTION ops are 32-bit only, so ...
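
For context, this is roughly what the 32-bit handling boils down to for the
VALUE register: a condensed standalone restatement of the erst_wr_reg64()
logic quoted above (plain C; the function name is made up, and the offsets
in the usage note assume ERST_VALUE_OFFSET is 8 as in the patch):

```c
#include <stdint.h>

/*
 * Merge a 4- or 8-byte guest write into a 64-bit register image.
 * For 4-byte accesses, bit 2 of the address selects the upper or
 * lower half; the other half of the register is preserved.
 */
static uint64_t merge_reg_write(uint64_t reg, uint64_t addr,
                                uint64_t val, unsigned size)
{
    uint64_t mask;
    unsigned shift;

    if (size == sizeof(uint64_t)) {
        return val; /* full 64-bit access replaces everything */
    }
    mask = 0xFFFFFFFFull;
    shift = (addr & 0x4) ? 32 : 0;
    return (reg & ~(mask << shift)) | ((val & mask) << shift);
}
```

Two 32-bit writes at offsets 8 and 12 then compose one 64-bit value, which is
needed for VALUE (record ids are 64-bit) but not for ACTION, where every
opcode fits in the low half.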

> >> +        switch (val) {
> >> +        case ACTION_BEGIN_WRITE_OPERATION:
> >> +        case ACTION_BEGIN_READ_OPERATION:
> >> +        case ACTION_BEGIN_CLEAR_OPERATION:
> >> +        case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> >> +        case ACTION_END_OPERATION:
> >> +            s->operation = val;
> >> +            break;
> >> +        case ACTION_SET_RECORD_OFFSET:
> >> +            s->record_offset = s->reg_value;
> >> +            break;
> >> +        case ACTION_EXECUTE_OPERATION:
> >> +            if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
> >> +                s->busy_status = 1;
> >> +                switch (s->operation) {
> >> +                case ACTION_BEGIN_WRITE_OPERATION:
> >> +                    s->command_status = write_erst_record(s);
> >> +                    break;
> >> +                case ACTION_BEGIN_READ_OPERATION:
> >> +                    s->command_status = read_erst_record(s);
> >> +                    break;
> >> +                case ACTION_BEGIN_CLEAR_OPERATION:
> >> +                    s->command_status = clear_erst_record(s);
> >> +                    break;
> >> +                case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> >> +                    s->command_status = STATUS_SUCCESS;
> >> +                    break;
> >> +                case ACTION_END_OPERATION:
> >> +                    s->command_status = STATUS_SUCCESS;
> >> +                    break;
> >> +                default:
> >> +                    s->command_status = STATUS_FAILED;
> >> +                    break;
> >> +                }
> >> +                s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;  
> >                     shouldn't happen in case of Read op  
> correct, removed as not needed at all now.
> 
> > 
> > "
> > 17.4.2.2
> > 4. Record the Identifier of the ‘next’ valid error record that resides on the persistent store. This
> > allows OSPM to retrieve a valid record identifier by executing a GET_RECORD_IDENTIFIER
> > operation.
> > "
> >   
> >> +                s->busy_status = 0;
> >> +            }
> >> +            break;
> >> +        case ACTION_CHECK_BUSY_STATUS:
> >> +            s->reg_value = s->busy_status;
> >> +            break;
> >> +        case ACTION_GET_COMMAND_STATUS:
> >> +            s->reg_value = s->command_status;
> >> +            break;
> >> +        case ACTION_GET_RECORD_IDENTIFIER:
> >> +            s->command_status = next_erst_record(s, &s->reg_value);
> >> +            break;
> >> +        case ACTION_SET_RECORD_IDENTIFIER:
> >> +            s->record_identifier = s->reg_value;
> >> +            break;
> >> +        case ACTION_GET_RECORD_COUNT:
> >> +            s->reg_value = s->header->record_count;
> >> +            break;
> >> +        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
> >> +            s->reg_value = (hwaddr)pci_get_bar_addr(PCI_DEVICE(s), 1);
> >> +            break;
> >> +        case ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
> >> +            s->reg_value = ERST_RECORD_SIZE;
> >> +            break;
> >> +        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
> >> +            s->reg_value = 0x0; /* intentional, not NVRAM mode */
> >> +            break;
> >> +        case ACTION_GET_EXECUTE_OPERATION_TIMINGS:
> >> +            s->reg_value =
> >> +                (100ULL << 32) | /* 100us max time */
> >> +                (10ULL  <<  0) ; /*  10us min time */
> >> +            break;
> >> +        default:
> >> +            /* Unknown action/command, NOP */
> >> +            break;
> >> +        }
> >> +        break;
> >> +    default:
> >> +        /* This should not happen, but if it does, NOP */
> >> +        break;
> >> +    }
> >> +}
> >> +
> >> +static uint64_t erst_reg_read(void *opaque, hwaddr addr,
> >> +                                unsigned size)
> >> +{
> >> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> >> +    uint64_t val = 0;
> >> +
> >> +    switch (addr) {
> >> +    case ERST_ACTION_OFFSET + 0:
> >> +    case ERST_ACTION_OFFSET + 4:
> >> +        val = erst_rd_reg64(addr, s->reg_action, size);
> >> +        break;
> >> +    case ERST_VALUE_OFFSET + 0:
> >> +    case ERST_VALUE_OFFSET + 4:
> >> +        val = erst_rd_reg64(addr, s->reg_value, size);
> >> +        break;
> >> +    default:
> >> +        break;
> >> +    }
> >> +    trace_acpi_erst_reg_read(addr, val, size);
> >> +    return val;
> >> +}
> >> +
> >> +static const MemoryRegionOps erst_reg_ops = {
> >> +    .read = erst_reg_read,
> >> +    .write = erst_reg_write,
> >> +    .endianness = DEVICE_NATIVE_ENDIAN,
> >> +};
> >> +
> >> +/*******************************************************************/
> >> +/*******************************************************************/
> >> +static int erst_post_load(void *opaque, int version_id)
> >> +{
> >> +    ERSTDeviceState *s = opaque;
> >> +
> >> +    /* Recompute pointer to header */
> >> +    s->header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);
> >> +    trace_acpi_erst_post_load(s->header);
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +static const VMStateDescription erst_vmstate  = {
> >> +    .name = "acpi-erst",
> >> +    .version_id = 1,
> >> +    .minimum_version_id = 1,
> >> +    .post_load = erst_post_load,
> >> +    .fields = (VMStateField[]) {
> >> +        VMSTATE_UINT32(storage_size, ERSTDeviceState),  
> >   1)  
> >> +        VMSTATE_UINT8(operation, ERSTDeviceState),
> >> +        VMSTATE_UINT8(busy_status, ERSTDeviceState),
> >> +        VMSTATE_UINT8(command_status, ERSTDeviceState),
> >> +        VMSTATE_UINT32(record_offset, ERSTDeviceState),
> >> +        VMSTATE_UINT64(reg_action, ERSTDeviceState),
> >> +        VMSTATE_UINT64(reg_value, ERSTDeviceState),
> >> +        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
> >> +        VMSTATE_UINT32(next_record_index, ERSTDeviceState),  
> >   
> >> +        VMSTATE_UINT32(first_record_index, ERSTDeviceState),
> >> +        VMSTATE_UINT32(last_record_index, ERSTDeviceState),  
> >   2)  
> >> +        VMSTATE_END_OF_LIST()
> >> +    }
> >> +};  
> > 
> >   1 and 2 aren't runtime state, so why are they in the migration stream?  
> done; removed storage_size, first_record_index and last_record_index from the migration stream.
> 
> 
> > 
> > I'd imagine size could be used to check that backend on target is of the same size
> > to avoid buffer overrun if target side has smaller backend, and fail migration if
> > it's not the same. But it isn't used this way here.  
> I decided not to do this check, as that memory object is migrated automatically, so I don't think my 
> check adds any value.
> 
> > 
> > the rest could be calculated at realize time.  
> and in fact they are.
> 
> >   
> >> +
> >> +static void erst_realizefn(PCIDevice *pci_dev, Error **errp)
> >> +{
> >> +    ERSTDeviceState *s = ACPIERST(pci_dev);
> >> +
> >> +    trace_acpi_erst_realizefn_in();
> >> +
> >> +    if (!s->hostmem) {
> >> +        error_setg(errp, "'" ACPI_ERST_MEMDEV_PROP "' property is not set");
> >> +        return;
> >> +    } else if (host_memory_backend_is_mapped(s->hostmem)) {
> >> +        error_setg(errp, "can't use already busy memdev: %s",
> >> +                   object_get_canonical_path_component(OBJECT(s->hostmem)));
> >> +        return;
> >> +    }
> >> +
> >> +    s->hostmem_mr = host_memory_backend_get_memory(s->hostmem);
> >> +
> >> +    /* HostMemoryBackend size will be multiple of PAGE_SIZE */
> >> +    s->storage_size = object_property_get_int(OBJECT(s->hostmem), "size", errp);
> >> +
> >> +    /* Check storage_size against ERST_RECORD_SIZE */
> >> +    if (((s->storage_size % ERST_RECORD_SIZE) != 0) ||
> >> +         (ERST_RECORD_SIZE > s->storage_size)) {
> >> +        error_setg(errp, "ACPI ERST requires size be multiple of "
> >> +            "record size (%luKiB)", ERST_RECORD_SIZE);
> >> +    }
> >> +
> >> +    /* Initialize backend storage and record_count */
> >> +    check_erst_backend_storage(s, errp);
> >> +
> >> +    /* BAR 0: Programming registers */
> >> +    memory_region_init_io(&s->iomem, OBJECT(pci_dev), &erst_reg_ops, s,
> >> +                          TYPE_ACPI_ERST, ERST_REG_SIZE);
> >> +    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->iomem);
> >> +
> >> +    /* BAR 1: Exchange buffer memory */  
> > 
> >   
> >> +    /* Create a hostmem object to use as the exchange buffer */
> >> +    s->exchange_obj = object_new(TYPE_MEMORY_BACKEND_RAM);
> >> +    object_property_set_int(s->exchange_obj, "size", ERST_RECORD_SIZE, errp);
> >> +    user_creatable_complete(USER_CREATABLE(s->exchange_obj), errp);
> >> +    s->exchange = MEMORY_BACKEND(s->exchange_obj);
> >> +    host_memory_backend_set_mapped(s->exchange, true);
> >> +    s->exchange_mr = host_memory_backend_get_memory(s->exchange);  
> > replace this block with single memory_region_init_ram()  
> done!
> 
> > 
> >   
> >> +    memory_region_init_resizeable_ram(s->exchange_mr, OBJECT(pci_dev),
> >> +        TYPE_ACPI_ERST, ERST_RECORD_SIZE, ERST_RECORD_SIZE, NULL, errp);  
> > I have no idea why it's necessary; it seems just wrong, as it basically leaks
> > the previous memory region and creates a new one.  
> done!
> 
> >   
> >> +    pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, s->exchange_mr);  
> >   
> >> +    /* Include the exchange buffer in the migration stream */
> >> +    vmstate_register_ram_global(s->exchange_mr);  
> > not necessary if memory_region_init_ram() is used directly  
> done!
> 
> >   
> >> +
> >> +    /* Include the backend storage in the migration stream */
> >> +    vmstate_register_ram_global(s->hostmem_mr);
> >> +
> >> +    trace_acpi_erst_realizefn_out(s->storage_size);
> >> +}
> >> +
> >> +static void erst_reset(DeviceState *dev)
> >> +{
> >> +    ERSTDeviceState *s = ACPIERST(dev);
> >> +
> >> +    trace_acpi_erst_reset_in(s->header->record_count);
> >> +    s->operation = 0;
> >> +    s->busy_status = 0;
> >> +    s->command_status = STATUS_SUCCESS;
> >> +    s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;
> >> +    s->record_offset = 0;
> >> +    s->next_record_index = s->first_record_index;
> >> +    /* NOTE: first/last_record_index are computed only once */
> >> +    trace_acpi_erst_reset_out(s->header->record_count);
> >> +}
> >> +
> >> +static Property erst_properties[] = {
> >> +    DEFINE_PROP_LINK(ACPI_ERST_MEMDEV_PROP, ERSTDeviceState, hostmem,
> >> +                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
> >> +    DEFINE_PROP_END_OF_LIST(),
> >> +};
> >> +
> >> +static void erst_class_init(ObjectClass *klass, void *data)
> >> +{
> >> +    DeviceClass *dc = DEVICE_CLASS(klass);
> >> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> >> +
> >> +    trace_acpi_erst_class_init_in();
> >> +    k->realize = erst_realizefn;
> >> +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> >> +    k->device_id = PCI_DEVICE_ID_REDHAT_ACPI_ERST;
> >> +    k->revision = 0x00;
> >> +    k->class_id = PCI_CLASS_OTHERS;
> >> +    dc->reset = erst_reset;
> >> +    dc->vmsd = &erst_vmstate;
> >> +    dc->user_creatable = true;  
> > 
> > can't be hotplugged, add:
> >         dc->hotpluggable = false;  
> done
> 
> >   
> >> +    device_class_set_props(dc, erst_properties);
> >> +    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
> >> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> >> +    trace_acpi_erst_class_init_out();
> >> +}
> >> +
> >> +static const TypeInfo erst_type_info = {
> >> +    .name          = TYPE_ACPI_ERST,
> >> +    .parent        = TYPE_PCI_DEVICE,
> >> +    .class_init    = erst_class_init,
> >> +    .instance_size = sizeof(ERSTDeviceState),
> >> +    .interfaces = (InterfaceInfo[]) {
> >> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> >> +        { }
> >> +    }
> >> +};
> >> +
> >> +static void erst_register_types(void)
> >> +{
> >> +    type_register_static(&erst_type_info);
> >> +}
> >> +
> >> +type_init(erst_register_types)
> >> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> >> index 29f804d..401d0e5 100644
> >> --- a/hw/acpi/meson.build
> >> +++ b/hw/acpi/meson.build
> >> @@ -5,6 +5,7 @@ acpi_ss.add(files(
> >>     'bios-linker-loader.c',
> >>     'core.c',
> >>     'utils.c',
> >> +  'erst.c',
> >>   ))
> >>   acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c'))
> >>   acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu_hotplug.c'))
> >> diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
> >> index 974d770..3579768 100644
> >> --- a/hw/acpi/trace-events
> >> +++ b/hw/acpi/trace-events
> >> @@ -55,3 +55,18 @@ piix4_gpe_writeb(uint64_t addr, unsigned width, uint64_t val) "addr: 0x%" PRIx64
> >>   # tco.c
> >>   tco_timer_reload(int ticks, int msec) "ticks=%d (%d ms)"
> >>   tco_timer_expired(int timeouts_no, bool strap, bool no_reboot) "timeouts_no=%d no_reboot=%d/%d"
> >> +
> >> +# erst.c
> >> +acpi_erst_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%04" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
> >> +acpi_erst_reg_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%04" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
> >> +acpi_erst_mem_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%06" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
> >> +acpi_erst_mem_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%06" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
> >> +acpi_erst_pci_bar_0(uint64_t addr) "BAR0: 0x%016" PRIx64
> >> +acpi_erst_pci_bar_1(uint64_t addr) "BAR1: 0x%016" PRIx64
> >> +acpi_erst_realizefn_in(void)
> >> +acpi_erst_realizefn_out(unsigned size) "total nvram size %u bytes"
> >> +acpi_erst_reset_in(unsigned record_count) "record_count %u"
> >> +acpi_erst_reset_out(unsigned record_count) "record_count %u"
> >> +acpi_erst_post_load(void *header) "header: 0x%p"
> >> +acpi_erst_class_init_in(void)
> >> +acpi_erst_class_init_out(void)  
> >   
> 




* Re: [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature
  2021-10-05 11:39       ` Igor Mammedov
@ 2021-10-05 16:40         ` Eric DeVolder
  2021-10-06 14:36           ` Igor Mammedov
  0 siblings, 1 reply; 34+ messages in thread
From: Eric DeVolder @ 2021-10-05 16:40 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth

Igor, again thanks for the detailed review. Inline responses below.
eric

On 10/5/21 6:39 AM, Igor Mammedov wrote:
> On Mon, 4 Oct 2021 16:13:09 -0500
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
>> Igor, thanks for the close examination. Inline responses below.
>> eric
>>
>> On 9/21/21 10:30 AM, Igor Mammedov wrote:
>>> On Thu,  5 Aug 2021 18:30:34 -0400
>>> Eric DeVolder <eric.devolder@oracle.com> wrote:
>>>    
>>>> This implements a PCI device for ACPI ERST. This implements the
>>>> non-NVRAM "mode" of operation for ERST as it is supported by
>>>> Linux and Windows.
>>>>
>>>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>>>> ---
>>>>    hw/acpi/erst.c       | 750 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    hw/acpi/meson.build  |   1 +
>>>>    hw/acpi/trace-events |  15 ++
>>>>    3 files changed, 766 insertions(+)
>>>>    create mode 100644 hw/acpi/erst.c
>>>>
>>>> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
>>>> new file mode 100644
>>>> index 0000000..eb4ab34
>>>> --- /dev/null
>>>> +++ b/hw/acpi/erst.c
>>>> @@ -0,0 +1,750 @@
>>>> +/*
>>>> + * ACPI Error Record Serialization Table, ERST, Implementation
>>>> + *
>>>> + * ACPI ERST introduced in ACPI 4.0, June 16, 2009.
>>>> + * ACPI Platform Error Interfaces : Error Serialization
>>>> + *
>>>> + * Copyright (c) 2021 Oracle and/or its affiliates.
>>>> + *
>>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>>> + */
>>>> +
>>>> +#include <sys/types.h>
>>>> +#include <sys/stat.h>
>>>> +#include <unistd.h>
>>>> +
>>>> +#include "qemu/osdep.h"
>>>> +#include "qapi/error.h"
>>>> +#include "hw/qdev-core.h"
>>>> +#include "exec/memory.h"
>>>> +#include "qom/object.h"
>>>> +#include "hw/pci/pci.h"
>>>> +#include "qom/object_interfaces.h"
>>>> +#include "qemu/error-report.h"
>>>> +#include "migration/vmstate.h"
>>>> +#include "hw/qdev-properties.h"
>>>> +#include "hw/acpi/acpi.h"
>>>> +#include "hw/acpi/acpi-defs.h"
>>>> +#include "hw/acpi/aml-build.h"
>>>> +#include "hw/acpi/bios-linker-loader.h"
>>>> +#include "exec/address-spaces.h"
>>>> +#include "sysemu/hostmem.h"
>>>> +#include "hw/acpi/erst.h"
>>>> +#include "trace.h"
>>>> +
>>>> +/* ACPI 4.0: Table 17-16 Serialization Actions */
>>>> +#define ACTION_BEGIN_WRITE_OPERATION         0x0
>>>> +#define ACTION_BEGIN_READ_OPERATION          0x1
>>>> +#define ACTION_BEGIN_CLEAR_OPERATION         0x2
>>>> +#define ACTION_END_OPERATION                 0x3
>>>> +#define ACTION_SET_RECORD_OFFSET             0x4
>>>> +#define ACTION_EXECUTE_OPERATION             0x5
>>>> +#define ACTION_CHECK_BUSY_STATUS             0x6
>>>> +#define ACTION_GET_COMMAND_STATUS            0x7
>>>> +#define ACTION_GET_RECORD_IDENTIFIER         0x8
>>>> +#define ACTION_SET_RECORD_IDENTIFIER         0x9
>>>> +#define ACTION_GET_RECORD_COUNT              0xA
>>>> +#define ACTION_BEGIN_DUMMY_WRITE_OPERATION   0xB
>>>> +#define ACTION_RESERVED                      0xC
>>>> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE   0xD
>>>> +#define ACTION_GET_ERROR_LOG_ADDRESS_LENGTH  0xE
>>>> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES 0xF
>>>> +#define ACTION_GET_EXECUTE_OPERATION_TIMINGS 0x10
>>>> +
>>>> +/* ACPI 4.0: Table 17-17 Command Status Definitions */
>>>> +#define STATUS_SUCCESS                0x00
>>>> +#define STATUS_NOT_ENOUGH_SPACE       0x01
>>>> +#define STATUS_HARDWARE_NOT_AVAILABLE 0x02
>>>> +#define STATUS_FAILED                 0x03
>>>> +#define STATUS_RECORD_STORE_EMPTY     0x04
>>>> +#define STATUS_RECORD_NOT_FOUND       0x05
>>>> +
>>>> +
>>>> +/* UEFI 2.1: Appendix N Common Platform Error Record */
>>>> +#define UEFI_CPER_RECORD_MIN_SIZE 128U
>>>> +#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U
>>>> +#define UEFI_CPER_RECORD_ID_OFFSET 96U
>>>> +#define IS_UEFI_CPER_RECORD(ptr) \
>>>> +    (((ptr)[0] == 'C') && \
>>>> +     ((ptr)[1] == 'P') && \
>>>> +     ((ptr)[2] == 'E') && \
>>>> +     ((ptr)[3] == 'R'))
>>>> +#define THE_UEFI_CPER_RECORD_ID(ptr) \
>>>> +    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
>>>> +
>>>> +/*
>>>> + * This implementation is an ACTION (cmd) and VALUE (data)
>>>> + * interface consisting of just two 64-bit registers.
>>>> + */
>>>> +#define ERST_REG_SIZE (16UL)
>>>> +#define ERST_ACTION_OFFSET (0UL) /* action (cmd) */
>>>> +#define ERST_VALUE_OFFSET  (8UL) /* argument/value (data) */
>>>> +
>>>> +/*
>>>> + * ERST_RECORD_SIZE is the buffer size for exchanging ERST
>>>> + * record contents. Thus, it defines the maximum record size.
>>>> + * As this is mapped through a PCI BAR, it must be a power of
>>>> + * two and larger than UEFI_CPER_RECORD_MIN_SIZE.
>>>> + * The backing storage is divided into fixed size "slots",
>>>> + * each ERST_RECORD_SIZE in length, and each "slot"
>>>> + * storing a single record. No attempt at optimizing storage
>>>> + * through compression, compaction, etc is attempted.
>>>> + * NOTE that slot 0 is reserved for the backing storage header.
>>>> + * Depending upon the size of the backing storage, additional
>>>> + * slots will be part of the slot 0 header in order to account
>>>> + * for a record_id for each available remaining slot.
>>>> + */
>>>> +/* 8KiB records, not too small, not too big */
>>>> +#define ERST_RECORD_SIZE (8192UL)
>>>> +
>>>> +#define ACPI_ERST_MEMDEV_PROP "memdev"
>>>> +
>>>> +/*
>>>> + * From the ACPI ERST spec sections:
>>>> + * A record id of all 0s is used to indicate
>>>> + * 'unspecified' record id.
>>>> + * A record id of all 1s is used to indicate
>>>> + * empty or end.
>>>> + */
>>>> +#define ERST_UNSPECIFIED_RECORD_ID (0UL)
>>>> +#define ERST_EMPTY_END_RECORD_ID (~0UL)
>>>> +#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
>>>> +#define ERST_IS_VALID_RECORD_ID(rid) \
>>>> +    ((rid != ERST_UNSPECIFIED_RECORD_ID) && \
>>>> +     (rid != ERST_EMPTY_END_RECORD_ID))
>>>> +
>>>> +typedef struct erst_storage_header_s {
>>>    
>>>> +#define ERST_STORE_MAGIC 0x524F545354535245UL
>>>
>>> move it out of structure definition,
>>> also where value comes from? (perhaps something starting
>>> with ERST... would be more self-describing)
>> done; this value is 'ERSTSTOR', which I've left as a comment in v7.
>>
>>>    
>>>> +    uint64_t magic;
>>>> +    uint32_t record_size;
>>>> +    uint32_t record_offset; /* offset to record storage beyond header */
>>>> +    uint16_t version;
>>>> +    uint16_t reserved;
>>>> +    uint32_t record_count;
>>>> +    uint64_t map[]; /* contains record_ids, and position indicates index */
>>>> +} erst_storage_header_t;
>>> docs/devel/style.rst: Typedefs
>> done; thanks
>>
>>>
>>> also, given it's used as the header layout in storage,
>>> set the packed attribute for the structure
>> done
>>
>>>    
>>>> +
>>>> +/*
>>>> + * Object cast macro
>>>> + */
>>>> +#define ACPIERST(obj) \
>>>> +    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
>>>> +
>>>> +/*
>>>> + * Main ERST device state structure
>>>> + */
>>>> +typedef struct {
>>>> +    PCIDevice parent_obj;
>>>> +
>>>> +    /* Backend storage */
>>>> +    HostMemoryBackend *hostmem;
>>>> +    MemoryRegion *hostmem_mr;
>>>> +
>>>> +    /* Programming registers */
>>>> +    MemoryRegion iomem;
>>>> +
>>>> +    /* Exchange buffer */
>>>> +    Object *exchange_obj;
>>>> +    HostMemoryBackend *exchange;
>>>> +    MemoryRegion *exchange_mr;
>>>> +    uint32_t storage_size;
>>>> +
>>>> +    /* Interface state */
>>>> +    uint8_t operation;
>>>> +    uint8_t busy_status;
>>>> +    uint8_t command_status;
>>>> +    uint32_t record_offset;
>>>> +    uint64_t reg_action;
>>>> +    uint64_t reg_value;
>>>> +    uint64_t record_identifier;
>>>> +    erst_storage_header_t *header;
>>>> +    unsigned next_record_index;
>>>> +    unsigned first_record_index;
>>>> +    unsigned last_record_index;
>>>> +
>>>> +} ERSTDeviceState;
>>>> +
>>>> +/*******************************************************************/
>>>> +/*******************************************************************/
>>>> +
>>>> +static uint8_t *get_nvram_ptr_by_index(ERSTDeviceState *s, unsigned index)
>>>> +{
>>>> +    uint8_t *rc = NULL;
>>>> +    off_t offset = (index * ERST_RECORD_SIZE);
>>>    
>>>> +    if ((offset + ERST_RECORD_SIZE) <= s->storage_size) {
>>>
>>> it looks like 'index' passed by caller is always valid, if it's the case
>>> convert  this to
>>>           g_assert((offset + ERST_RECORD_SIZE) <= s->storage_size)
>> done
>>
>>>
>>> also shouldn't <= be just <
>> yes, done
>>
>>>
>>>    
>>>> +        if (s->hostmem_mr) {
>>> can hostmem_mr be NULL, when this function is called?
>>> if not I'd drop condition.
>> no, so dropped. done
>>
>>>    
>>>> +            uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(s->hostmem_mr);
>>>> +            rc = p + offset;
>>>> +        }
>>>> +    }
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +static void make_erst_storage_header(ERSTDeviceState *s)
>>>> +{
>>>> +    erst_storage_header_t *header = s->header;
>>>> +    unsigned mapsz, headersz;
>>>> +
>>>> +    header->magic = ERST_STORE_MAGIC;
>>>> +    header->record_size = ERST_RECORD_SIZE;
>>>> +    header->version = 0x0101;
>>>
>>> maybe 0 or 1 to avoid question about what previous versions are
>> changed to simply 0x0100 (ie 1.0)
>>>    
>>>> +    header->reserved = 0x0000;
>>> s/0x.../0/
>> done
>>
>>>    
>>>> +
>>>> +    /* Compute mapsize */
>>>> +    mapsz = s->storage_size / ERST_RECORD_SIZE;
>>>> +    mapsz *= sizeof(uint64_t);
>>>> +    /* Compute header+map size */
>>>> +    headersz = sizeof(erst_storage_header_t) + mapsz;
>>>    
>>>> +    /* Round up to nearest integer multiple of ERST_RECORD_SIZE */
>>>> +    headersz += (ERST_RECORD_SIZE - 1);
>>>> +    headersz /= ERST_RECORD_SIZE;
>>>> +    headersz *= ERST_RECORD_SIZE;
>>> git grep ROUND_UP
>>> may be of help here
>> yes, thanks. I'm using that now, done.
>>
>>>    
>>>> +    header->record_offset = headersz;
>>>> +
>>>> +    /*
>>>> +     * The HostMemoryBackend initializes contents to zero,
>>>> +     * so all record_ids stashed in the map are zero'd.
>>>> +     * As well the record_count is zero. Properly initialized.
>>>> +     */
>>>> +}
>>>> +
>>>> +static void check_erst_backend_storage(ERSTDeviceState *s, Error **errp)
>>>> +{
>>>> +    erst_storage_header_t *header;
>>>> +
>>>> +    header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);
>>> optionally check/assert that it's 64-bit aligned;
>>> if it's not, you risk getting killed by SIGBUS on some hosts,
>>> since you're accessing fields directly.
>> done!
>>
>>>    
>>>> +    s->header = header;
>>>> +
>>>> +    /* Check if header is uninitialized */
>>>> +    if (header->magic == 0UL) { /* HostMemoryBackend inits to 0 */
>>>> +        make_erst_storage_header(s);
>>>> +    }
>>>> +
>>>> +    if (!(
>>>> +        (header->magic == ERST_STORE_MAGIC) &&
>>>> +        (header->record_size == ERST_RECORD_SIZE) &&
>>>> +        ((header->record_offset % ERST_RECORD_SIZE) == 0) &&
>>>> +        (header->version == 0x0101) &&
>>>> +        (header->reserved == 0x0000)
>>>> +        )) {
>>>> +        error_setg(errp, "ERST backend storage header is invalid");
>>>> +    }
>>>> +
>>>> +    /* Compute offset of first and last record storage slot */
>>>> +    s->first_record_index = header->record_offset / ERST_RECORD_SIZE;
>>>> +    s->last_record_index = (s->storage_size / ERST_RECORD_SIZE);
>>>
>>> applies to whole patch/series,
>>> if mmaped header values are interpreted as integers you shall
>>> take care of endianness, i.e. use cpu_to_foo/foo_to_cpu for access
>> done; I'm using cpu_to_leX() and leX_to_cpu() for any access to the header.
>>
>>>
>>> and document file endianness in doc (2/10)
>> done
>>
>>>    
>>>> +}
>>>> +
>>>> +static void set_erst_map_by_index(ERSTDeviceState *s, unsigned index,
>>>> +    uint64_t record_id)
>>>
>>> update_[cache|map]_[entry|record_id]() or something like this might be
>>> a better description erst and index don't really add much here as it's
>>> clear from context.
>> done; now update_map_entry()
>>
>>>
>>>    
>>>> +{
>>>> +    if (index < s->last_record_index) {
>>>> +        s->header->map[index] = record_id;
>>>> +    }
>>>> +}
>>>> +
>>>> +static unsigned lookup_erst_record(ERSTDeviceState *s,
>>>> +    uint64_t record_identifier)
>>>> +{
>>>> +    unsigned rc = 0; /* 0 not a valid index */
>>>> +    unsigned index = s->first_record_index;
>>>> +
>>>> +    /* Find the record_identifier in the map */
>>>> +    if (record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
>>>> +        /*
>>>> +         * Count number of valid records encountered, and
>>>> +         * short-circuit the loop if identifier not found
>>>> +         */
>>>> +        unsigned count = 0;
>>>> +        for (; index < s->last_record_index &&
>>>> +                count < s->header->record_count; ++index) {
>>>> +            uint64_t map_record_identifier = s->header->map[index];
>>> I'd drop map_record_identifier and use s->header->map[index] directly,
>>> i.e
>>>      if (s->header->map[index] ...
>> done
>>
>>>    
>>>> +            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
>>>> +                ++count;
>>>> +            }
>>>> +            if (map_record_identifier == record_identifier) {
>>>> +                rc = index;
>>>> +                break;
>>>> +            }
>>>> +        }
>>>> +    } else {
>>>> +        /* Find first available unoccupied slot */
>>>> +        for (; index < s->last_record_index; ++index) {
>>>> +            if (s->header->map[index] == ERST_UNSPECIFIED_RECORD_ID) {
>>>> +                rc = index;
>>>> +                break;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return rc;
>>>> +}
>>>
>>> what's the reason for combining the lookup and allocate ops?
>>> if they were separated, it would be easier to follow the code.
>> done; at one point it made sense; no longer.
>>
>>>    
>>>> +
>>>> +/* ACPI 4.0: 17.4.2.3 Operations - Clearing */
>>>> +static unsigned clear_erst_record(ERSTDeviceState *s)
>>>> +{
>>>> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
>>>> +    unsigned index;
>>>> +
>>>> +    /* Check for valid record identifier */
>>>> +    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
>>>> +        return STATUS_FAILED;
>>>> +    }
>>>> +
>>>> +    index = lookup_erst_record(s, s->record_identifier);
>>>> +    if (index) {
>>>> +        /* No need to wipe record, just invalidate its map entry */
>>>> +        set_erst_map_by_index(s, index, ERST_UNSPECIFIED_RECORD_ID);
>>>> +        s->header->record_count -= 1;
>>>> +        rc = STATUS_SUCCESS;
>>>> +    }
>>>> +
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +/* ACPI 4.0: 17.4.2.2 Operations - Reading */
>>>> +static unsigned read_erst_record(ERSTDeviceState *s)
>>>> +{
>>>> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
>>>> +    unsigned index;
>>>> +
>>>> +    /* Check record boundary wihin exchange buffer */
>>>                                   ^^^ typo
>> done
>>
>>>    
>>>> +    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
>>>> +        return STATUS_FAILED;
>>>> +    }
>>>> +
>>>> +    /* Check for valid record identifier */
>>>> +    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
>>>> +        return STATUS_FAILED;
>>>> +    }
>>>> +
>>>> +    index = lookup_erst_record(s, s->record_identifier);
>>>> +    if (index) {
>>>> +        uint8_t *ptr;
>>>> +        uint8_t *record = ((uint8_t *)
>>>> +            memory_region_get_ram_ptr(s->exchange_mr) +
>>>> +            s->record_offset);
>>>> +        ptr = get_nvram_ptr_by_index(s, index);
>>>> +        memcpy(record, ptr, ERST_RECORD_SIZE - s->record_offset);
>>>
>>> if record_offset is large enough that record won't fit,
>>> this will copy truncated record into the exchange buffer.
>>>
>>> Maybe it's better to fail whole op?
>> The first check within this function checks for this very condition, and does fail.
>> I believe the code does as you are asking.
> 
> The 1st check guarantees that 'exchange_mr + record_offset, exchange_mr_end'
> has space for at least UEFI_CPER_RECORD_MIN_SIZE, while the source record
> that's being copied can be larger than that.
> i.e. assume
>   record_offset = 7, ERST_RECORD_SIZE = 10, UEFI_CPER_RECORD_MIN_SIZE = 2, ptr->record_size = 9
>   
>   > if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE))
> will be passed, while
>   > memcpy(record, ptr, ERST_RECORD_SIZE - s->record_offset);
> will copy 3 bytes only, truncating the rest of the record
> but still report success.
ok, I understand now, thanks!

> 
> Also, while max copied amount won't exceed exchange_mr capacity
> due to it being equal to ERST_RECORD_SIZE in current impl., it can
> be dangerous later on if buffer/record sizes diverge as dependency
> coded here is implicit. Safer option would be using actual destination
> buffer/copied record size for check to avoid potential buffer overrun
> (I'm assuming that records are not random blobs but CPER formatted structure).
> 
>   copy_size = to_be_copied_record_size
>   if copy_size <= memory_region_size(exchange_mr) - record_offset
>      memcpy(record, ptr, copy_size)
>   else
>      error_out
ok
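
The size check suggested above can be sketched in plain C as follows. This is a standalone illustration under stated assumptions, not QEMU code: sketch_copy_record(), its parameters and the SKETCH_STATUS_* names are hypothetical, exchange_size stands in for memory_region_size(exchange_mr), and copy_size is assumed to be the already-validated size of the stored record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes mirroring STATUS_SUCCESS/STATUS_FAILED. */
enum { SKETCH_STATUS_SUCCESS = 0x00, SKETCH_STATUS_FAILED = 0x03 };

/*
 * Copy a stored record of copy_size bytes into the exchange buffer at
 * record_offset. The operation fails, rather than truncating, when the
 * record does not fit in the space left after record_offset.
 */
static unsigned sketch_copy_record(uint8_t *exchange, size_t exchange_size,
                                   size_t record_offset,
                                   const uint8_t *src, size_t copy_size)
{
    assert(exchange != NULL && src != NULL);

    /* The offset must lie inside the buffer; this also keeps the
     * subtraction below from wrapping around. */
    if (record_offset > exchange_size) {
        return SKETCH_STATUS_FAILED;
    }
    /* The whole record must fit in the remaining space. */
    if (copy_size > exchange_size - record_offset) {
        return SKETCH_STATUS_FAILED;
    }
    memcpy(exchange + record_offset, src, copy_size);
    return SKETCH_STATUS_SUCCESS;
}
```

With the numbers from the example above (buffer size 10, record_offset 7, record size 9), this returns the failure status instead of copying 3 bytes and reporting success.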

> 
> [1] the same applies to the 'if (s->record_offset >= ...)' check;
> make it use the actual exchange_mr size explicitly.
ok

> 
> nit:
> Also, the use of record_offset in both the header and the state is a bit overloaded;
> I'd consider renaming one of them to avoid confusion.
done; changed header field to 'storage_offset'

> 
>>>    
>>>> +        rc = STATUS_SUCCESS;
>>>> +    }
>>>> +
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +/* ACPI 4.0: 17.4.2.1 Operations - Writing */
>>>> +static unsigned write_erst_record(ERSTDeviceState *s)
>>>> +{
>>>> +    unsigned rc = STATUS_FAILED;
>>>> +    unsigned index;
>>>> +    uint64_t record_identifier;
>>>> +    uint8_t *record;
>>>> +    uint8_t *ptr = NULL;
>>>> +    bool record_found = false;
>>>> +
>>>> +    /* Check record boundary wihin exchange buffer */
>>> ditto, typo
>> done
>>
>>>    
>>>> +    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
>>>> +        return STATUS_FAILED;
>>>> +    }
> see [1]
yep

> 
>>>> +
>>>> +    /* Extract record identifier */
>>>> +    record = ((uint8_t *)memory_region_get_ram_ptr(s->exchange_mr)
>>>> +        + s->record_offset);
>>>> +    record_identifier = THE_UEFI_CPER_RECORD_ID(record);
>>> potentially unaligned access to int, should use memcpy()
>> done
>>
>>>    
>>>> +
>>>> +    /* Check for valid record identifier */
>>>> +    if (!ERST_IS_VALID_RECORD_ID(record_identifier)) {
>>>> +        return STATUS_FAILED;
>>>> +    }
>>>> +
>>>> +    index = lookup_erst_record(s, record_identifier);
>>>> +    if (index) {
>>>> +        /* Record found, overwrite existing record */
>>>> +        ptr = get_nvram_ptr_by_index(s, index);
>>>> +        record_found = true;
>>>> +    } else {
>>>> +        /* Record not found, not an overwrite, allocate for write */
>>>> +        index = lookup_erst_record(s, ERST_UNSPECIFIED_RECORD_ID);
>>>> +        if (index) {
>>>> +            ptr = get_nvram_ptr_by_index(s, index);
>>>> +        } else {
>>>> +            rc = STATUS_NOT_ENOUGH_SPACE;
>>>> +        }
>>>> +    }
>>>> +    if (ptr) {
>>>> +        memcpy(ptr, record, ERST_RECORD_SIZE - s->record_offset);
> 
> This copies the remainder of exchange buffer, including 'leftovers' from
> previous operations.
> Is there a reason why you are not verifying the actual 'record' size in the buffer
> and, if it fits within the target 'ptr', copying just the useful payload from the buffer?

I think this question gets at a fundamental difference in approach (and explains the points
you are raising). The UEFI CPER record has a member field 'record_length':

"Indicates the size of the actual error record, including the size of the record header, all section 
descriptors, and section bodies. The size may include extra buffer space to allow for the dynamic 
addition of error sections descriptors bodies."

Thus far, in this implementation, I have *avoided* using 'record_length' from the record, deeming it
untrustworthy and a possible attack vector. Instead, I've been using
(ERST_RECORD_SIZE - s->record_offset) as the length of the record to copy.

I could use 'record_length', and validate it prior to trusting it. Validation here would simply be 
ensuring it is <= ERST_RECORD_SIZE? I think this is what you are suggesting, correct?
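
To make the question concrete, here is a rough sketch of what validating
'record_length' before trusting it could look like. It is not code from the
patch: the helper name and the 'avail'/'slot_size' parameters are hypothetical,
the two macros are the values defined in the patch, and the sketch assumes a
little-endian host (real code would convert explicitly).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values taken from the patch under review */
#define UEFI_CPER_RECORD_MIN_SIZE      128U
#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U

/*
 * Hypothetical helper: fetch and validate the guest-supplied record_length.
 * 'avail' is the number of bytes between the start of the record and the
 * end of the exchange buffer; 'slot_size' is the size of one storage slot.
 * Assumes a little-endian host.
 */
static bool get_valid_record_length(const uint8_t *record, size_t avail,
                                    size_t slot_size, uint32_t *length)
{
    uint32_t len;

    assert(record != NULL && length != NULL);

    /* The header must be fully present before any field is read. */
    if (avail < UEFI_CPER_RECORD_MIN_SIZE) {
        return false;
    }
    /* memcpy() avoids a potentially unaligned 32-bit load. */
    memcpy(&len, record + UEFI_CPER_RECORD_LENGTH_OFFSET, sizeof(len));

    /* Reject truncated records and records too large for a slot. */
    if (len < UEFI_CPER_RECORD_MIN_SIZE || len > avail || len > slot_size) {
        return false;
    }
    *length = len;
    return true;
}
```

With a check along these lines, the length is only used once it is known to lie
within both the exchange buffer and the destination slot.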


> 
>>>> +        if (0 != s->record_offset) {
>>>> +            memset(&ptr[ERST_RECORD_SIZE - s->record_offset],
>>>> +                0xFF, s->record_offset);
>>>> +        }
>>> you've lost me here, care to explain what's going on here?
>> If the record_offset is not 0, then there can be bytes following the record within the slot that
>> were not written. This simply sets them to 0xFF (so bytes from a previously written record that
>> happened to occupy this slot do not "bleed" through).
>> I've left a comment in v7.
> 
> well, 'bleed' happens because 'read_erst_record' copies whole slot
> instead of the actual record size.
> 
> And that would work, only while exchange buffer size and record size
> are equal, and fall apart silently as soon as that is not true,
> leading to potential exploits.
> 
> it might be more robust if it were written like this:
>     if_record_is_complete (i.e. record in buffer is not truncated)
>         if_actual_record_size_fits_in_slot
>             memcpy(slot, record, actual_record_size)
>             memset(slot+actual_record_size, 0xff, slot_size - actual_record_size);
>     otherwise error out

See question on 'record_length' above.
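
For reference, the copy-then-pad step from the pseudocode above could be
sketched as follows, assuming the record size has already been validated. This
is only an illustration; the helper name and parameters are hypothetical and
not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a complete record into a storage slot and fill
 * the rest of the slot with 0xFF so that bytes from a previously stored
 * record cannot show through.
 */
static bool store_record_in_slot(uint8_t *slot, size_t slot_size,
                                 const uint8_t *record, size_t record_size)
{
    assert(slot != NULL && record != NULL);

    /* Error out if the record does not fit in the slot. */
    if (record_size > slot_size) {
        return false;
    }
    /* Copy only the useful payload... */
    memcpy(slot, record, record_size);
    /* ...then pad the remainder of the slot. */
    memset(slot + record_size, 0xFF, slot_size - record_size);
    return true;
}
```

Because the padding is derived from the actual record size, it no longer
depends on the exchange buffer and the slot being the same size.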

>   
>>>> +        if (!record_found) {
>>>> +            s->header->record_count += 1; /* writing new record */
>>>> +        }
>>>> +        set_erst_map_by_index(s, index, record_identifier);
>>>> +        rc = STATUS_SUCCESS;
>>>> +    }
>>>> +
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +/* ACPI 4.0: 17.4.2.2 Operations - Reading "During boot..." */
>>>> +static unsigned next_erst_record(ERSTDeviceState *s,
>>>> +    uint64_t *record_identifier)
>>> s/record_identifier/found.../
>> done
>>
>>>    
>>>> +{
>>>> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
>>>> +    unsigned index = s->next_record_index;
>>>> +
>>>> +    *record_identifier = ERST_EMPTY_END_RECORD_ID;
>>>> +
>>>> +    if (s->header->record_count) {
>>>> +        for (; index < s->last_record_index; ++index) {
>>>> +            uint64_t map_record_identifier;
>>> and then s/map_record_identifier/record_identifier/
>> done
>>
>>>
>>> the same applies to other occurrences within patch
>>> (map_record_identifier is a bit confusing) or drop it
>>> and use s->header->map[index] directly
>> done
>>
>>>    
>>>> +            map_record_identifier = s->header->map[index];
>>>> +            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
>>>> +                    /* where to start next time */
>>>> +                    s->next_record_index = index + 1;
>>>> +                    *record_identifier = map_record_identifier;
>>>> +                    rc = STATUS_SUCCESS;
>>>> +                    break;
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +    if (rc != STATUS_SUCCESS) {
>>>> +        if (s->next_record_index == s->first_record_index) {
>>>> +            /*
>>>> +             * next_record_identifier is unchanged, no records found
>>>> +             * and *record_identifier contains EMPTY_END id
>>>> +             */
>>>> +            rc = STATUS_RECORD_STORE_EMPTY;
>>>> +        }
>>>> +        /* at end/scan complete, reset */
>>>> +        s->next_record_index = s->first_record_index;
>>>> +    }
>>>
>>> Table 17-16, says return existing error or ERST_EMPTY_END_RECORD_ID
>>> but nothing about op returning a error, so I'd assume status
>>> should always be STATUS_SUCCESS for GET_RECORD_IDENTIFIER.
>> done
>>
>>>
>>> Advancing to the next record is part of record READ op and
>>> not the part of GET_RECORD_IDENTIFIER as it's done here.
>> well...
>>
>>>     "The steps performed by the platform to carry out ...
>>>        2. ..
>>>           c. If the specified error record does not exist,
>>>              ... update the status register’s Identifier field with the identifier of the
>>> ‘first’ error record
>>>        4. Record the Identifier of the ‘next’ valid error record ...
>>>     "
>>
>> I used ACPI spec v6 and I was asked to locate the first occurrence of ERST in the spec, which was
>> v4. So the above spec quotes are accurate, however, spec v6 deviates in an important way from the
>> above, which reads:
>>
>>     "c. If the status is Record Not Found (0x05), indicating that the specified error record does not
>> exist, OSPM retrieves a valid identifier by a GET_RECORD_IDENTIFIER action. The platform will return
>> a valid record identifier."
> 
> that's quote from OSPM behavior,
> 
> the platform part still looks the same to me (in 4.0/5.0/6.0/6.3) (they split 2.c into 2.c and 2.d)
> but the meaning is the same.

So I now see that the description of the READ operation actually has two sections: the first starts
with "during boot" and the other covers a straight-up read operation. I had been focusing on
the "on boot" reading, but I do need to better accommodate the plain read, as you point out.

> 
>   
>> So GET_RECORD_IDENTIFIER is essentially a factory that pumps out record identifiers for records
>> stored. I kind of think of it as the old DOS 'find_first/find_next'. And yes v4 of the spec states
>> that the READ operation should initiate the first record_identifer. However v6 clearly states this
>> is now the responsibility of OSPM, not the READ op.
>>
>> I am thinking that the best way to handle this contradiction is to change the ACPI spec citation
>> from v4 to v5, as the wording in v5 matches what I cite from v6, and implemented. Furthermore, this
>> approach of OSPM obtaining the next valid record_id via GET_RECORD_IDENTIFIER is consistent with
> 
>> what I observed in BIOS and with how Linux is coded.
> pointer[s] to source[s] please?

Fwiw, Linux converts all the entries in ERST into pstore entries upon boot. Any subsequent "read" of 
the pstore entry does not touch ERST again; instead it reads from the in-kernel pstore contents.

The driver in Linux is drivers/acpi/apei/erst.c; but note that it conforms to the pstore set of 
callbacks.

> 
> 
> Well, spec can be wrong too (not the 1st time) but we need to be sure
> what is broken and doesn't work as it's supposed to and document it
> properly, before deviating from the spec.

I see now that the specs appear to be the same. I need to accommodate the non-"on boot" path.

> 
> 
> 
>> Thoughts?
>>
>>>
>>>    
>>>> +
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +/*******************************************************************/
>>>> +
>>>> +static uint64_t erst_rd_reg64(hwaddr addr,
>>>> +    uint64_t reg, unsigned size)
>>>> +{
>>>> +    uint64_t rdval;
>>>> +    uint64_t mask;
>>>> +    unsigned shift;
>>>> +
>>>> +    if (size == sizeof(uint64_t)) {
>>>> +        /* 64b access */
>>>> +        mask = 0xFFFFFFFFFFFFFFFFUL;
>>>> +        shift = 0;
>>>> +    } else {
>>>> +        /* 32b access */
>>>> +        mask = 0x00000000FFFFFFFFUL;
>>>> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
>>>> +    }
>>>> +
>>>> +    rdval = reg;
>>>> +    rdval >>= shift;
>>>> +    rdval &= mask;
>>>> +
>>>> +    return rdval;
>>>> +}
>>>> +
>>>> +static uint64_t erst_wr_reg64(hwaddr addr,
>>>> +    uint64_t reg, uint64_t val, unsigned size)
>>>> +{
>>>> +    uint64_t wrval;
>>>> +    uint64_t mask;
>>>> +    unsigned shift;
>>>> +    if (size == sizeof(uint64_t)) {
>>>> +        /* 64b access */
>>>> +        mask = 0xFFFFFFFFFFFFFFFFUL;
>>>> +        shift = 0;
>>>> +    } else {
>>>> +        /* 32b access */
>>>> +        mask = 0x00000000FFFFFFFFUL;
>>>> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
>>>> +    }
>>>> +
>>>> +    val &= mask;
>>>> +    val <<= shift;
>>>> +    mask <<= shift;
>>>> +    wrval = reg;
>>>> +    wrval &= ~mask;
>>>> +    wrval |= val;
>>>> +
>>>> +    return wrval;
>>>> +}
>>>> +
>>>> +static void erst_reg_write(void *opaque, hwaddr addr,
>>>> +    uint64_t val, unsigned size)
>>>> +{
>>>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
>>>> +
>>>> +    /*
>>>> +     * NOTE: All actions/operations/side effects happen on the WRITE,
>>>> +     * by design. The READs simply return the reg_value contents.
>>>
>>> point to spec, pls.
>> This was an implementation design choice, so no spec reference applicable, I left a comment.
>>
>>
>>>    
>>>> +     */
>>>> +    trace_acpi_erst_reg_write(addr, val, size);
>>>> +
>>>> +    switch (addr) {
>>>> +    case ERST_VALUE_OFFSET + 0:
>>>> +    case ERST_VALUE_OFFSET + 4:
>>>> +        s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
>>>> +        break;
>>>> +    case ERST_ACTION_OFFSET + 0:
>>>    
>>>> +/*  case ERST_ACTION_OFFSET+4: as coded, not really a 64b register */
>>>
>>> what does this mean?
>> In short, all values written to this register are just the ACTION ops, so there wasn't a need to
>> implement this as a 64-bit register, especially since Linux seems to issue two 32-bit accesses for
>> 64-bit; in this case the upper access is utterly useless.
>> I placed a comment in code.
> the comment as it is above is not helpful,
> so it would be better to have a comment that explains the reasoning a bit better.
> like:
>     supported/impl ACTION ops are 32-bit only, so ...
ok

> 
>>>> +        switch (val) {
>>>> +        case ACTION_BEGIN_WRITE_OPERATION:
>>>> +        case ACTION_BEGIN_READ_OPERATION:
>>>> +        case ACTION_BEGIN_CLEAR_OPERATION:
>>>> +        case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
>>>> +        case ACTION_END_OPERATION:
>>>> +            s->operation = val;
>>>> +            break;
>>>> +        case ACTION_SET_RECORD_OFFSET:
>>>> +            s->record_offset = s->reg_value;
>>>> +            break;
>>>> +        case ACTION_EXECUTE_OPERATION:
>>>> +            if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
>>>> +                s->busy_status = 1;
>>>> +                switch (s->operation) {
>>>> +                case ACTION_BEGIN_WRITE_OPERATION:
>>>> +                    s->command_status = write_erst_record(s);
>>>> +                    break;
>>>> +                case ACTION_BEGIN_READ_OPERATION:
>>>> +                    s->command_status = read_erst_record(s);
>>>> +                    break;
>>>> +                case ACTION_BEGIN_CLEAR_OPERATION:
>>>> +                    s->command_status = clear_erst_record(s);
>>>> +                    break;
>>>> +                case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
>>>> +                    s->command_status = STATUS_SUCCESS;
>>>> +                    break;
>>>> +                case ACTION_END_OPERATION:
>>>> +                    s->command_status = STATUS_SUCCESS;
>>>> +                    break;
>>>> +                default:
>>>> +                    s->command_status = STATUS_FAILED;
>>>> +                    break;
>>>> +                }
>>>> +                s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;
>>>                      shouldn't happen in case of Read op
>> correct, removed as not needed at all now.
>>
>>>
>>> "
>>> 17.4.2.2
>>> 4. Record the Identifier of the ‘next’ valid error record that resides on the persistent store. This
>>> allows OSPM to retrieve a valid record identifier by executing a GET_RECORD_IDENTIFIER
>>> operation.
>>> "
>>>    
>>>> +                s->busy_status = 0;
>>>> +            }
>>>> +            break;
>>>> +        case ACTION_CHECK_BUSY_STATUS:
>>>> +            s->reg_value = s->busy_status;
>>>> +            break;
>>>> +        case ACTION_GET_COMMAND_STATUS:
>>>> +            s->reg_value = s->command_status;
>>>> +            break;
>>>> +        case ACTION_GET_RECORD_IDENTIFIER:
>>>> +            s->command_status = next_erst_record(s, &s->reg_value);
>>>> +            break;
>>>> +        case ACTION_SET_RECORD_IDENTIFIER:
>>>> +            s->record_identifier = s->reg_value;
>>>> +            break;
>>>> +        case ACTION_GET_RECORD_COUNT:
>>>> +            s->reg_value = s->header->record_count;
>>>> +            break;
>>>> +        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
>>>> +            s->reg_value = (hwaddr)pci_get_bar_addr(PCI_DEVICE(s), 1);
>>>> +            break;
>>>> +        case ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
>>>> +            s->reg_value = ERST_RECORD_SIZE;
>>>> +            break;
>>>> +        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
>>>> +            s->reg_value = 0x0; /* intentional, not NVRAM mode */
>>>> +            break;
>>>> +        case ACTION_GET_EXECUTE_OPERATION_TIMINGS:
>>>> +            s->reg_value =
>>>> +                (100ULL << 32) | /* 100us max time */
>>>> +                (10ULL  <<  0) ; /*  10us min time */
>>>> +            break;
>>>> +        default:
>>>> +            /* Unknown action/command, NOP */
>>>> +            break;
>>>> +        }
>>>> +        break;
>>>> +    default:
>>>> +        /* This should not happen, but if it does, NOP */
>>>> +        break;
>>>> +    }
>>>> +}
>>>> +
>>>> +static uint64_t erst_reg_read(void *opaque, hwaddr addr,
>>>> +                                unsigned size)
>>>> +{
>>>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
>>>> +    uint64_t val = 0;
>>>> +
>>>> +    switch (addr) {
>>>> +    case ERST_ACTION_OFFSET + 0:
>>>> +    case ERST_ACTION_OFFSET + 4:
>>>> +        val = erst_rd_reg64(addr, s->reg_action, size);
>>>> +        break;
>>>> +    case ERST_VALUE_OFFSET + 0:
>>>> +    case ERST_VALUE_OFFSET + 4:
>>>> +        val = erst_rd_reg64(addr, s->reg_value, size);
>>>> +        break;
>>>> +    default:
>>>> +        break;
>>>> +    }
>>>> +    trace_acpi_erst_reg_read(addr, val, size);
>>>> +    return val;
>>>> +}
>>>> +
>>>> +static const MemoryRegionOps erst_reg_ops = {
>>>> +    .read = erst_reg_read,
>>>> +    .write = erst_reg_write,
>>>> +    .endianness = DEVICE_NATIVE_ENDIAN,
>>>> +};
>>>> +
>>>> +/*******************************************************************/
>>>> +/*******************************************************************/
>>>> +static int erst_post_load(void *opaque, int version_id)
>>>> +{
>>>> +    ERSTDeviceState *s = opaque;
>>>> +
>>>> +    /* Recompute pointer to header */
>>>> +    s->header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);
>>>> +    trace_acpi_erst_post_load(s->header);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static const VMStateDescription erst_vmstate  = {
>>>> +    .name = "acpi-erst",
>>>> +    .version_id = 1,
>>>> +    .minimum_version_id = 1,
>>>> +    .post_load = erst_post_load,
>>>> +    .fields = (VMStateField[]) {
>>>> +        VMSTATE_UINT32(storage_size, ERSTDeviceState),
>>>    1)
>>>> +        VMSTATE_UINT8(operation, ERSTDeviceState),
>>>> +        VMSTATE_UINT8(busy_status, ERSTDeviceState),
>>>> +        VMSTATE_UINT8(command_status, ERSTDeviceState),
>>>> +        VMSTATE_UINT32(record_offset, ERSTDeviceState),
>>>> +        VMSTATE_UINT64(reg_action, ERSTDeviceState),
>>>> +        VMSTATE_UINT64(reg_value, ERSTDeviceState),
>>>> +        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
>>>> +        VMSTATE_UINT32(next_record_index, ERSTDeviceState),
>>>    
>>>> +        VMSTATE_UINT32(first_record_index, ERSTDeviceState),
>>>> +        VMSTATE_UINT32(last_record_index, ERSTDeviceState),
>>>    2)
>>>> +        VMSTATE_END_OF_LIST()
>>>> +    }
>>>> +};
>>>
>>>    1 and 2 aren't runtime state, so why they are in migration stream?
>> done; removed storage_size, first_record_index and last_record_index from the migration stream.
>>
>>
>>>
>>> I'd imagine size could be used to check that backend on target is of the same size
>>> to avoid buffer overrun if target side has smaller backend, and fail migration if
>>> it's not the same. But it aren't used this way here.
>> I decided to not do this check as that memory object is migrated automatically, so I dont think my
>> check adds any value.
>>
>>>
>>> the rest could be calculated at realize time.
>> and in fact they are.
>>
>>>    
>>>> +
>>>> +static void erst_realizefn(PCIDevice *pci_dev, Error **errp)
>>>> +{
>>>> +    ERSTDeviceState *s = ACPIERST(pci_dev);
>>>> +
>>>> +    trace_acpi_erst_realizefn_in();
>>>> +
>>>> +    if (!s->hostmem) {
>>>> +        error_setg(errp, "'" ACPI_ERST_MEMDEV_PROP "' property is not set");
>>>> +        return;
>>>> +    } else if (host_memory_backend_is_mapped(s->hostmem)) {
>>>> +        error_setg(errp, "can't use already busy memdev: %s",
>>>> +                   object_get_canonical_path_component(OBJECT(s->hostmem)));
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    s->hostmem_mr = host_memory_backend_get_memory(s->hostmem);
>>>> +
>>>> +    /* HostMemoryBackend size will be multiple of PAGE_SIZE */
>>>> +    s->storage_size = object_property_get_int(OBJECT(s->hostmem), "size", errp);
>>>> +
>>>> +    /* Check storage_size against ERST_RECORD_SIZE */
>>>> +    if (((s->storage_size % ERST_RECORD_SIZE) != 0) ||
>>>> +         (ERST_RECORD_SIZE > s->storage_size)) {
>>>> +        error_setg(errp, "ACPI ERST requires size be multiple of "
>>>> +            "record size (%luKiB)", ERST_RECORD_SIZE);
>>>> +    }
>>>> +
>>>> +    /* Initialize backend storage and record_count */
>>>> +    check_erst_backend_storage(s, errp);
>>>> +
>>>> +    /* BAR 0: Programming registers */
>>>> +    memory_region_init_io(&s->iomem, OBJECT(pci_dev), &erst_reg_ops, s,
>>>> +                          TYPE_ACPI_ERST, ERST_REG_SIZE);
>>>> +    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->iomem);
>>>> +
>>>> +    /* BAR 1: Exchange buffer memory */
>>>
>>>    
>>>> +    /* Create a hostmem object to use as the exchange buffer */
>>>> +    s->exchange_obj = object_new(TYPE_MEMORY_BACKEND_RAM);
>>>> +    object_property_set_int(s->exchange_obj, "size", ERST_RECORD_SIZE, errp);
>>>> +    user_creatable_complete(USER_CREATABLE(s->exchange_obj), errp);
>>>> +    s->exchange = MEMORY_BACKEND(s->exchange_obj);
>>>> +    host_memory_backend_set_mapped(s->exchange, true);
>>>> +    s->exchange_mr = host_memory_backend_get_memory(s->exchange);
>>> replace this block with single memory_region_init_ram()
>> done!
>>
>>>
>>>    
>>>> +    memory_region_init_resizeable_ram(s->exchange_mr, OBJECT(pci_dev),
>>>> +        TYPE_ACPI_ERST, ERST_RECORD_SIZE, ERST_RECORD_SIZE, NULL, errp);
>>> have ho idea why it's necessary, seems just wrong, it basically leaks
>>> previous memory region and creates a new one.
>> done!
>>
>>>    
>>>> +    pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, s->exchange_mr);
>>>    
>>>> +    /* Include the exchange buffer in the migration stream */
>>>> +    vmstate_register_ram_global(s->exchange_mr);
>>> not necessary if memory_region_init_ram() is used directly
>> done!
>>
>>>    
>>>> +
>>>> +    /* Include the backend storage in the migration stream */
>>>> +    vmstate_register_ram_global(s->hostmem_mr);
>>>> +
>>>> +    trace_acpi_erst_realizefn_out(s->storage_size);
>>>> +}
>>>> +
>>>> +static void erst_reset(DeviceState *dev)
>>>> +{
>>>> +    ERSTDeviceState *s = ACPIERST(dev);
>>>> +
>>>> +    trace_acpi_erst_reset_in(s->header->record_count);
>>>> +    s->operation = 0;
>>>> +    s->busy_status = 0;
>>>> +    s->command_status = STATUS_SUCCESS;
>>>> +    s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;
>>>> +    s->record_offset = 0;
>>>> +    s->next_record_index = s->first_record_index;
>>>> +    /* NOTE: first/last_record_index are computed only once */
>>>> +    trace_acpi_erst_reset_out(s->header->record_count);
>>>> +}
>>>> +
>>>> +static Property erst_properties[] = {
>>>> +    DEFINE_PROP_LINK(ACPI_ERST_MEMDEV_PROP, ERSTDeviceState, hostmem,
>>>> +                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
>>>> +    DEFINE_PROP_END_OF_LIST(),
>>>> +};
>>>> +
>>>> +static void erst_class_init(ObjectClass *klass, void *data)
>>>> +{
>>>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>>>> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>>>> +
>>>> +    trace_acpi_erst_class_init_in();
>>>> +    k->realize = erst_realizefn;
>>>> +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
>>>> +    k->device_id = PCI_DEVICE_ID_REDHAT_ACPI_ERST;
>>>> +    k->revision = 0x00;
>>>> +    k->class_id = PCI_CLASS_OTHERS;
>>>> +    dc->reset = erst_reset;
>>>> +    dc->vmsd = &erst_vmstate;
>>>> +    dc->user_creatable = true;
>>>
>>> can't be hotplugged, add:
>>>          dc->hotpluggable = false;
>> done
>>
>>>    
>>>> +    device_class_set_props(dc, erst_properties);
>>>> +    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
>>>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>>> +    trace_acpi_erst_class_init_out();
>>>> +}
>>>> +
>>>> +static const TypeInfo erst_type_info = {
>>>> +    .name          = TYPE_ACPI_ERST,
>>>> +    .parent        = TYPE_PCI_DEVICE,
>>>> +    .class_init    = erst_class_init,
>>>> +    .instance_size = sizeof(ERSTDeviceState),
>>>> +    .interfaces = (InterfaceInfo[]) {
>>>> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
>>>> +        { }
>>>> +    }
>>>> +};
>>>> +
>>>> +static void erst_register_types(void)
>>>> +{
>>>> +    type_register_static(&erst_type_info);
>>>> +}
>>>> +
>>>> +type_init(erst_register_types)
>>>> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
>>>> index 29f804d..401d0e5 100644
>>>> --- a/hw/acpi/meson.build
>>>> +++ b/hw/acpi/meson.build
>>>> @@ -5,6 +5,7 @@ acpi_ss.add(files(
>>>>      'bios-linker-loader.c',
>>>>      'core.c',
>>>>      'utils.c',
>>>> +  'erst.c',
>>>>    ))
>>>>    acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c'))
>>>>    acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu_hotplug.c'))
>>>> diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
>>>> index 974d770..3579768 100644
>>>> --- a/hw/acpi/trace-events
>>>> +++ b/hw/acpi/trace-events
>>>> @@ -55,3 +55,18 @@ piix4_gpe_writeb(uint64_t addr, unsigned width, uint64_t val) "addr: 0x%" PRIx64
>>>>    # tco.c
>>>>    tco_timer_reload(int ticks, int msec) "ticks=%d (%d ms)"
>>>>    tco_timer_expired(int timeouts_no, bool strap, bool no_reboot) "timeouts_no=%d no_reboot=%d/%d"
>>>> +
>>>> +# erst.c
>>>> +acpi_erst_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%04" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
>>>> +acpi_erst_reg_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%04" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
>>>> +acpi_erst_mem_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%06" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
>>>> +acpi_erst_mem_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%06" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
>>>> +acpi_erst_pci_bar_0(uint64_t addr) "BAR0: 0x%016" PRIx64
>>>> +acpi_erst_pci_bar_1(uint64_t addr) "BAR1: 0x%016" PRIx64
>>>> +acpi_erst_realizefn_in(void)
>>>> +acpi_erst_realizefn_out(unsigned size) "total nvram size %u bytes"
>>>> +acpi_erst_reset_in(unsigned record_count) "record_count %u"
>>>> +acpi_erst_reset_out(unsigned record_count) "record_count %u"
>>>> +acpi_erst_post_load(void *header) "header: 0x%p"
>>>> +acpi_erst_class_init_in(void)
>>>> +acpi_erst_class_init_out(void)
>>>    
>>
> 



* Re: [PATCH] ACPI ERST: specification for ERST support
  2021-08-05 22:30 ` [PATCH v6 02/10] ACPI ERST: specification for ERST support Eric DeVolder
  2021-09-20 13:38   ` Igor Mammedov
@ 2021-10-06  6:58   ` Ani Sinha
  2021-10-06  7:00     ` Ani Sinha
  2021-10-06  8:12   ` [PATCH v6 02/10] " Michael S. Tsirkin
  2 siblings, 1 reply; 34+ messages in thread
From: Ani Sinha @ 2021-10-06  6:58 UTC (permalink / raw)
  To: eric.devolder
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, imammedo,
	boris.ostrovsky, rth

From: Eric DeVolder <eric.devolder@oracle.com>

>---
> docs/specs/acpi_erst.txt | 147 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 147 insertions(+)
> create mode 100644 docs/specs/acpi_erst.txt
>



* Re: [PATCH] ACPI ERST: specification for ERST support
  2021-10-06  6:58   ` [PATCH] " Ani Sinha
@ 2021-10-06  7:00     ` Ani Sinha
  2021-10-06 20:07       ` Eric DeVolder
  0 siblings, 1 reply; 34+ messages in thread
From: Ani Sinha @ 2021-10-06  7:00 UTC (permalink / raw)
  To: Ani Sinha
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, imammedo,
	boris.ostrovsky, eric.devolder, rth



On Wed, 6 Oct 2021, Ani Sinha wrote:

> From: Eric DeVolder <eric.devolder@oracle.com>
>
> >---
> > docs/specs/acpi_erst.txt | 147 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 147 insertions(+)
> > create mode 100644 docs/specs/acpi_erst.txt
> >

OK it did not come out the way I wanted. But

Acked-by: Ani Sinha <ani@anisinha.ca>




* Re: [PATCH v6 02/10] ACPI ERST: specification for ERST support
  2021-08-05 22:30 ` [PATCH v6 02/10] ACPI ERST: specification for ERST support Eric DeVolder
  2021-09-20 13:38   ` Igor Mammedov
  2021-10-06  6:58   ` [PATCH] " Ani Sinha
@ 2021-10-06  8:12   ` Michael S. Tsirkin
  2021-10-06 20:07     ` Eric DeVolder
  2 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2021-10-06  8:12 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, konrad.wilk, qemu-devel, pbonzini, imammedo,
	boris.ostrovsky, rth

On Thu, Aug 05, 2021 at 06:30:31PM -0400, Eric DeVolder wrote:
> Information on the implementation of the ACPI ERST support.
> 
> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> ---
>  docs/specs/acpi_erst.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 147 insertions(+)
>  create mode 100644 docs/specs/acpi_erst.txt

It's probably a good idea to have new documents in the rst
format.

-- 
MST




* Re: [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature
  2021-10-05 16:40         ` Eric DeVolder
@ 2021-10-06 14:36           ` Igor Mammedov
  0 siblings, 0 replies; 34+ messages in thread
From: Igor Mammedov @ 2021-10-06 14:36 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, boris.ostrovsky, rth

On Tue, 5 Oct 2021 11:40:35 -0500
Eric DeVolder <eric.devolder@oracle.com> wrote:

> Igor, again thanks for the detailed review. Inline responses below.
> eric
> 
> On 10/5/21 6:39 AM, Igor Mammedov wrote:
> > On Mon, 4 Oct 2021 16:13:09 -0500
> > Eric DeVolder <eric.devolder@oracle.com> wrote:
> >   
> >> Igor, thanks for the close examination. Inline responses below.
> >> eric
> >>
> >> On 9/21/21 10:30 AM, Igor Mammedov wrote:  
> >>> On Thu,  5 Aug 2021 18:30:34 -0400
> >>> Eric DeVolder <eric.devolder@oracle.com> wrote:
> >>>      
> >>>> This implements a PCI device for ACPI ERST. This implements the
> >>>> non-NVRAM "mode" of operation for ERST as it is supported by
> >>>> Linux and Windows.
> >>>>
> >>>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
> >>>> ---
> >>>>    hw/acpi/erst.c       | 750 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>    hw/acpi/meson.build  |   1 +
> >>>>    hw/acpi/trace-events |  15 ++
> >>>>    3 files changed, 766 insertions(+)
> >>>>    create mode 100644 hw/acpi/erst.c
> >>>>
> >>>> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
> >>>> new file mode 100644
> >>>> index 0000000..eb4ab34
> >>>> --- /dev/null
> >>>> +++ b/hw/acpi/erst.c
> >>>> @@ -0,0 +1,750 @@
> >>>> +/*
> >>>> + * ACPI Error Record Serialization Table, ERST, Implementation
> >>>> + *
> >>>> + * ACPI ERST introduced in ACPI 4.0, June 16, 2009.
> >>>> + * ACPI Platform Error Interfaces : Error Serialization
> >>>> + *
> >>>> + * Copyright (c) 2021 Oracle and/or its affiliates.
> >>>> + *
> >>>> + * SPDX-License-Identifier: GPL-2.0-or-later
> >>>> + */
> >>>> +
> >>>> +#include <sys/types.h>
> >>>> +#include <sys/stat.h>
> >>>> +#include <unistd.h>
> >>>> +
> >>>> +#include "qemu/osdep.h"
> >>>> +#include "qapi/error.h"
> >>>> +#include "hw/qdev-core.h"
> >>>> +#include "exec/memory.h"
> >>>> +#include "qom/object.h"
> >>>> +#include "hw/pci/pci.h"
> >>>> +#include "qom/object_interfaces.h"
> >>>> +#include "qemu/error-report.h"
> >>>> +#include "migration/vmstate.h"
> >>>> +#include "hw/qdev-properties.h"
> >>>> +#include "hw/acpi/acpi.h"
> >>>> +#include "hw/acpi/acpi-defs.h"
> >>>> +#include "hw/acpi/aml-build.h"
> >>>> +#include "hw/acpi/bios-linker-loader.h"
> >>>> +#include "exec/address-spaces.h"
> >>>> +#include "sysemu/hostmem.h"
> >>>> +#include "hw/acpi/erst.h"
> >>>> +#include "trace.h"
> >>>> +
> >>>> +/* ACPI 4.0: Table 17-16 Serialization Actions */
> >>>> +#define ACTION_BEGIN_WRITE_OPERATION         0x0
> >>>> +#define ACTION_BEGIN_READ_OPERATION          0x1
> >>>> +#define ACTION_BEGIN_CLEAR_OPERATION         0x2
> >>>> +#define ACTION_END_OPERATION                 0x3
> >>>> +#define ACTION_SET_RECORD_OFFSET             0x4
> >>>> +#define ACTION_EXECUTE_OPERATION             0x5
> >>>> +#define ACTION_CHECK_BUSY_STATUS             0x6
> >>>> +#define ACTION_GET_COMMAND_STATUS            0x7
> >>>> +#define ACTION_GET_RECORD_IDENTIFIER         0x8
> >>>> +#define ACTION_SET_RECORD_IDENTIFIER         0x9
> >>>> +#define ACTION_GET_RECORD_COUNT              0xA
> >>>> +#define ACTION_BEGIN_DUMMY_WRITE_OPERATION   0xB
> >>>> +#define ACTION_RESERVED                      0xC
> >>>> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE   0xD
> >>>> +#define ACTION_GET_ERROR_LOG_ADDRESS_LENGTH  0xE
> >>>> +#define ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES 0xF
> >>>> +#define ACTION_GET_EXECUTE_OPERATION_TIMINGS 0x10
> >>>> +
> >>>> +/* ACPI 4.0: Table 17-17 Command Status Definitions */
> >>>> +#define STATUS_SUCCESS                0x00
> >>>> +#define STATUS_NOT_ENOUGH_SPACE       0x01
> >>>> +#define STATUS_HARDWARE_NOT_AVAILABLE 0x02
> >>>> +#define STATUS_FAILED                 0x03
> >>>> +#define STATUS_RECORD_STORE_EMPTY     0x04
> >>>> +#define STATUS_RECORD_NOT_FOUND       0x05
> >>>> +
> >>>> +
> >>>> +/* UEFI 2.1: Appendix N Common Platform Error Record */
> >>>> +#define UEFI_CPER_RECORD_MIN_SIZE 128U
> >>>> +#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U
> >>>> +#define UEFI_CPER_RECORD_ID_OFFSET 96U
> >>>> +#define IS_UEFI_CPER_RECORD(ptr) \
> >>>> +    (((ptr)[0] == 'C') && \
> >>>> +     ((ptr)[1] == 'P') && \
> >>>> +     ((ptr)[2] == 'E') && \
> >>>> +     ((ptr)[3] == 'R'))
> >>>> +#define THE_UEFI_CPER_RECORD_ID(ptr) \
> >>>> +    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
> >>>> +
> >>>> +/*
> >>>> + * This implementation is an ACTION (cmd) and VALUE (data)
> >>>> + * interface consisting of just two 64-bit registers.
> >>>> + */
> >>>> +#define ERST_REG_SIZE (16UL)
> >>>> +#define ERST_ACTION_OFFSET (0UL) /* action (cmd) */
> >>>> +#define ERST_VALUE_OFFSET  (8UL) /* argument/value (data) */
> >>>> +
> >>>> +/*
> >>>> + * ERST_RECORD_SIZE is the buffer size for exchanging ERST
> >>>> + * record contents. Thus, it defines the maximum record size.
> >>>> + * As this is mapped through a PCI BAR, it must be a power of
> >>>> + * two and larger than UEFI_CPER_RECORD_MIN_SIZE.
> >>>> + * The backing storage is divided into fixed size "slots",
> >>>> + * each ERST_RECORD_SIZE in length, and each "slot"
> >>>> + * storing a single record. No attempt at optimizing storage
> >>>> + * through compression, compaction, etc is attempted.
> >>>> + * NOTE that slot 0 is reserved for the backing storage header.
> >>>> + * Depending upon the size of the backing storage, additional
> >>>> + * slots will be part of the slot 0 header in order to account
> >>>> + * for a record_id for each available remaining slot.
> >>>> + */
> >>>> +/* 8KiB records, not too small, not too big */
> >>>> +#define ERST_RECORD_SIZE (8192UL)
> >>>> +
> >>>> +#define ACPI_ERST_MEMDEV_PROP "memdev"
> >>>> +
> >>>> +/*
> >>>> + * From the ACPI ERST spec sections:
> >>>> + * A record id of all 0s is used to indicate
> >>>> + * 'unspecified' record id.
> >>>> + * A record id of all 1s is used to indicate
> >>>> + * empty or end.
> >>>> + */
> >>>> +#define ERST_UNSPECIFIED_RECORD_ID (0UL)
> >>>> +#define ERST_EMPTY_END_RECORD_ID (~0UL)
> >>>> +#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
> >>>> +#define ERST_IS_VALID_RECORD_ID(rid) \
> >>>> +    ((rid != ERST_UNSPECIFIED_RECORD_ID) && \
> >>>> +     (rid != ERST_EMPTY_END_RECORD_ID))
> >>>> +
> >>>> +typedef struct erst_storage_header_s {  
> >>>      
> >>>> +#define ERST_STORE_MAGIC 0x524F545354535245UL  
> >>>
> >>> move it out of structure definition,
> >>> also where value comes from? (perhaps something starting
> >>> with ERST... would be more self-describing)  
> >> done; this value is 'ERSTSTOR', which I've left as a comment in v7.
> >>  
> >>>      
> >>>> +    uint64_t magic;
> >>>> +    uint32_t record_size;
> >>>> +    uint32_t record_offset; /* offset to record storage beyond header */
> >>>> +    uint16_t version;
> >>>> +    uint16_t reserved;
> >>>> +    uint32_t record_count;
> >>>> +    uint64_t map[]; /* contains record_ids, and position indicates index */
> >>>> +} erst_storage_header_t;  
> >>> docs/devel/style.rst: Typedefs  
> >> done; thanks
> >>  
> >>>
> >>> also, given it's used as the header layout in storage,
> >>> set the packed attribute on the structure  
> >> done
> >>  
> >>>      
> >>>> +
> >>>> +/*
> >>>> + * Object cast macro
> >>>> + */
> >>>> +#define ACPIERST(obj) \
> >>>> +    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
> >>>> +
> >>>> +/*
> >>>> + * Main ERST device state structure
> >>>> + */
> >>>> +typedef struct {
> >>>> +    PCIDevice parent_obj;
> >>>> +
> >>>> +    /* Backend storage */
> >>>> +    HostMemoryBackend *hostmem;
> >>>> +    MemoryRegion *hostmem_mr;
> >>>> +
> >>>> +    /* Programming registers */
> >>>> +    MemoryRegion iomem;
> >>>> +
> >>>> +    /* Exchange buffer */
> >>>> +    Object *exchange_obj;
> >>>> +    HostMemoryBackend *exchange;
> >>>> +    MemoryRegion *exchange_mr;
> >>>> +    uint32_t storage_size;
> >>>> +
> >>>> +    /* Interface state */
> >>>> +    uint8_t operation;
> >>>> +    uint8_t busy_status;
> >>>> +    uint8_t command_status;
> >>>> +    uint32_t record_offset;
> >>>> +    uint64_t reg_action;
> >>>> +    uint64_t reg_value;
> >>>> +    uint64_t record_identifier;
> >>>> +    erst_storage_header_t *header;
> >>>> +    unsigned next_record_index;
> >>>> +    unsigned first_record_index;
> >>>> +    unsigned last_record_index;
> >>>> +
> >>>> +} ERSTDeviceState;
> >>>> +
> >>>> +/*******************************************************************/
> >>>> +/*******************************************************************/
> >>>> +
> >>>> +static uint8_t *get_nvram_ptr_by_index(ERSTDeviceState *s, unsigned index)
> >>>> +{
> >>>> +    uint8_t *rc = NULL;
> >>>> +    off_t offset = (index * ERST_RECORD_SIZE);  
> >>>      
> >>>> +    if ((offset + ERST_RECORD_SIZE) <= s->storage_size) {  
> >>>
> >>> it looks like the 'index' passed by the caller is always valid; if that's
> >>> the case, convert this to
> >>>           g_assert((offset + ERST_RECORD_SIZE) <= s->storage_size)  
> >> done
> >>  
> >>>
> >>> also shouldn't <= be just <  
> >> yes, done
> >>  
> >>>
> >>>      
> >>>> +        if (s->hostmem_mr) {  
> >>> can hostmem_mr be NULL, when this function is called?
> >>> if not I'd drop condition.  
> >> no, so dropped. done
> >>  
> >>>      
> >>>> +            uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(s->hostmem_mr);
> >>>> +            rc = p + offset;
> >>>> +        }
> >>>> +    }
> >>>> +    return rc;
> >>>> +}
> >>>> +
> >>>> +static void make_erst_storage_header(ERSTDeviceState *s)
> >>>> +{
> >>>> +    erst_storage_header_t *header = s->header;
> >>>> +    unsigned mapsz, headersz;
> >>>> +
> >>>> +    header->magic = ERST_STORE_MAGIC;
> >>>> +    header->record_size = ERST_RECORD_SIZE;
> >>>> +    header->version = 0x0101;  
> >>>
> >>> maybe 0 or 1 to avoid question about what previous versions are  
> >> changed to simply 0x0100 (ie 1.0)  
> >>>      
> >>>> +    header->reserved = 0x0000;  
> >>> s/0x.../0/  
> >> done
> >>  
> >>>      
> >>>> +
> >>>> +    /* Compute mapsize */
> >>>> +    mapsz = s->storage_size / ERST_RECORD_SIZE;
> >>>> +    mapsz *= sizeof(uint64_t);
> >>>> +    /* Compute header+map size */
> >>>> +    headersz = sizeof(erst_storage_header_t) + mapsz;  
> >>>      
> >>>> +    /* Round up to nearest integer multiple of ERST_RECORD_SIZE */
> >>>> +    headersz += (ERST_RECORD_SIZE - 1);
> >>>> +    headersz /= ERST_RECORD_SIZE;
> >>>> +    headersz *= ERST_RECORD_SIZE;  
> >>> git grep ROUND_UP
> >>> may be of help here  
> >> yes, thanks. I'm using that now, done.
> >>  
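
For reference, the rounding discussed above could be sketched roughly as
follows. This is only a standalone illustration, not the v7 code: the macro
is defined locally as a stand-in for QEMU's ROUND_UP, and struct
sketch_header and header_size_for() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for a ROUND_UP-style macro: round n up to a multiple of d. */
#define SKETCH_ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

#define ERST_RECORD_SIZE 8192UL

/* Mirrors the header layout quoted in the patch (illustration only). */
struct sketch_header {
    uint64_t magic;
    uint32_t record_size;
    uint32_t record_offset;
    uint16_t version;
    uint16_t reserved;
    uint32_t record_count;
    uint64_t map[]; /* one record_id per slot */
};

/* Hypothetical helper: bytes taken by header + map, in whole slots. */
static size_t header_size_for(size_t storage_size)
{
    size_t mapsz;

    assert(storage_size >= ERST_RECORD_SIZE);
    mapsz = (storage_size / ERST_RECORD_SIZE) * sizeof(uint64_t);
    return SKETCH_ROUND_UP(sizeof(struct sketch_header) + mapsz,
                           ERST_RECORD_SIZE);
}
```

With 8KiB slots, a 64KiB backing store needs one slot for header plus map,
while an 8MiB store (1024 slots) spills into a second slot.
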
> >>>      
> >>>> +    header->record_offset = headersz;
> >>>> +
> >>>> +    /*
> >>>> +     * The HostMemoryBackend initializes contents to zero,
> >>>> +     * so all record_ids stashed in the map are zero'd.
> >>>> +     * As well the record_count is zero. Properly initialized.
> >>>> +     */
> >>>> +}
> >>>> +
> >>>> +static void check_erst_backend_storage(ERSTDeviceState *s, Error **errp)
> >>>> +{
> >>>> +    erst_storage_header_t *header;
> >>>> +
> >>>> +    header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);  
> >>> optionally check/assert that it's 64-bit aligned;
> >>> if it's not, you risk getting killed by SIGBUS on some hosts,
> >>> since you're accessing fields directly.  
> >> done!
> >>  
> >>>      
> >>>> +    s->header = header;
> >>>> +
> >>>> +    /* Check if header is uninitialized */
> >>>> +    if (header->magic == 0UL) { /* HostMemoryBackend inits to 0 */
> >>>> +        make_erst_storage_header(s);
> >>>> +    }
> >>>> +
> >>>> +    if (!(
> >>>> +        (header->magic == ERST_STORE_MAGIC) &&
> >>>> +        (header->record_size == ERST_RECORD_SIZE) &&
> >>>> +        ((header->record_offset % ERST_RECORD_SIZE) == 0) &&
> >>>> +        (header->version == 0x0101) &&
> >>>> +        (header->reserved == 0x0000)
> >>>> +        )) {
> >>>> +        error_setg(errp, "ERST backend storage header is invalid");
> >>>> +    }
> >>>> +
> >>>> +    /* Compute offset of first and last record storage slot */
> >>>> +    s->first_record_index = header->record_offset / ERST_RECORD_SIZE;
> >>>> +    s->last_record_index = (s->storage_size / ERST_RECORD_SIZE);  
> >>>
> >>> applies to whole patch/series,
> >>> if mmaped header values are interpreted as integers you shall
> >>> take care of endianness, i.e. use cpu_to_foo/foo_to_cpu for access  
> >> done; I'm using cpu_to_leX() and leX_to_cpu() for any access to the header.
> >>  
> >>>
> >>> and document file endianness in doc (2/10)  
> >> done
> >>  
> >>>      
> >>>> +}
> >>>> +
> >>>> +static void set_erst_map_by_index(ERSTDeviceState *s, unsigned index,
> >>>> +    uint64_t record_id)  
> >>>
> >>> update_[cache|map]_[entry|record_id]() or something like this might be
> >>> a better description; 'erst' and 'index' don't really add much here as
> >>> they're clear from context.  
> >> done; now update_map_entry()
> >>  
> >>>
> >>>      
> >>>> +{
> >>>> +    if (index < s->last_record_index) {
> >>>> +        s->header->map[index] = record_id;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +static unsigned lookup_erst_record(ERSTDeviceState *s,
> >>>> +    uint64_t record_identifier)
> >>>> +{
> >>>> +    unsigned rc = 0; /* 0 not a valid index */
> >>>> +    unsigned index = s->first_record_index;
> >>>> +
> >>>> +    /* Find the record_identifier in the map */
> >>>> +    if (record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
> >>>> +        /*
> >>>> +         * Count number of valid records encountered, and
> >>>> +         * short-circuit the loop if identifier not found
> >>>> +         */
> >>>> +        unsigned count = 0;
> >>>> +        for (; index < s->last_record_index &&
> >>>> +                count < s->header->record_count; ++index) {
> >>>> +            uint64_t map_record_identifier = s->header->map[index];  
> >>> I'd drop map_record_identifier and use s->header->map[index] directly,
> >>> i.e
> >>>      if (s->header->map[index] ...  
> >> done
> >>  
> >>>      
> >>>> +            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
> >>>> +                ++count;
> >>>> +            }
> >>>> +            if (map_record_identifier == record_identifier) {
> >>>> +                rc = index;
> >>>> +                break;
> >>>> +            }
> >>>> +        }
> >>>> +    } else {
> >>>> +        /* Find first available unoccupied slot */
> >>>> +        for (; index < s->last_record_index; ++index) {
> >>>> +            if (s->header->map[index] == ERST_UNSPECIFIED_RECORD_ID) {
> >>>> +                rc = index;
> >>>> +                break;
> >>>> +            }
> >>>> +        }
> >>>> +    }
> >>>> +
> >>>> +    return rc;
> >>>> +}  
> >>>
> >>> what's the reason for combining the lookup and allocate ops?
> >>> if they were separated, it would be easier to follow the code.  
> >> done; at one point it made sense; no longer.
> >>  
> >>>      
> >>>> +
> >>>> +/* ACPI 4.0: 17.4.2.3 Operations - Clearing */
> >>>> +static unsigned clear_erst_record(ERSTDeviceState *s)
> >>>> +{
> >>>> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
> >>>> +    unsigned index;
> >>>> +
> >>>> +    /* Check for valid record identifier */
> >>>> +    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
> >>>> +        return STATUS_FAILED;
> >>>> +    }
> >>>> +
> >>>> +    index = lookup_erst_record(s, s->record_identifier);
> >>>> +    if (index) {
> >>>> +        /* No need to wipe record, just invalidate its map entry */
> >>>> +        set_erst_map_by_index(s, index, ERST_UNSPECIFIED_RECORD_ID);
> >>>> +        s->header->record_count -= 1;
> >>>> +        rc = STATUS_SUCCESS;
> >>>> +    }
> >>>> +
> >>>> +    return rc;
> >>>> +}
> >>>> +
> >>>> +/* ACPI 4.0: 17.4.2.2 Operations - Reading */
> >>>> +static unsigned read_erst_record(ERSTDeviceState *s)
> >>>> +{
> >>>> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
> >>>> +    unsigned index;
> >>>> +
> >>>> +    /* Check record boundary wihin exchange buffer */  
> >>>                                   ^^^ typo  
> >> done
> >>  
> >>>      
> >>>> +    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
> >>>> +        return STATUS_FAILED;
> >>>> +    }
> >>>> +
> >>>> +    /* Check for valid record identifier */
> >>>> +    if (!ERST_IS_VALID_RECORD_ID(s->record_identifier)) {
> >>>> +        return STATUS_FAILED;
> >>>> +    }
> >>>> +
> >>>> +    index = lookup_erst_record(s, s->record_identifier);
> >>>> +    if (index) {
> >>>> +        uint8_t *ptr;
> >>>> +        uint8_t *record = ((uint8_t *)
> >>>> +            memory_region_get_ram_ptr(s->exchange_mr) +
> >>>> +            s->record_offset);
> >>>> +        ptr = get_nvram_ptr_by_index(s, index);
> >>>> +        memcpy(record, ptr, ERST_RECORD_SIZE - s->record_offset);  
> >>>
> >>> if record_offset is large enough that record won't fit,
> >>> this will copy truncated record into the exchange buffer.
> >>>
> >>> Maybe it's better to fail whole op?  
> >> The first check within this function checks for this very condition, and does fail.
> >> I believe the code does as you are asking.  
> > 
> > The 1st check guarantees that 'exchange_mr + record_offset, exchange_mr_end'
> > has space for at least UEFI_CPER_RECORD_MIN_SIZE bytes, while the source
> > record that's being copied can be larger than that.
> > i.e. assume
> >   record_offset = 7, ERST_RECORD_SIZE = 10, UEFI_CPER_RECORD_MIN_SIZE = 2, ptr->record_size = 9
> >     
> >   > if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE))  
> > will be passed, while  
> >   > memcpy(record, ptr, ERST_RECORD_SIZE - s->record_offset);  
> > will copy 3 bytes only, truncating the rest of the record
> > but still report success.  
> ok, I understand now, thanks!
> 
> > 
> > Also, while max copied amount won't exceed exchange_mr capacity
> > due to it being equal to ERST_RECORD_SIZE in current impl., it can
> > be dangerous later on if buffer/record sizes diverge as dependency
> > coded here is implicit. Safer option would be using actual destination
> > buffer/copied record size for check to avoid potential buffer overrun
> > (I'm assuming that records are not random blobs but CPER formatted structure).
> > 
> >   copy_size = to_be_copied_record_size
> >   if copy_size <= memory_region_size(exchange_mr) - record_offset
> >      memcpy(record, ptr, copy_size)
> >   else
> >      error_out  
> ok
> 
> > 
> > [1] the same applies to 'if (s->record_offset >= ...)' check
> > make it use actual exchange_mr size explicitly.  
> ok
> 
> > 
> > nit:
> > Also, the use of record_offset in both the header and the state is a bit
> > overloaded; I'd consider renaming one of them to avoid confusion.  
> done; changed header field to 'storage_offset'
> 
> >   
> >>>      
> >>>> +        rc = STATUS_SUCCESS;
> >>>> +    }
> >>>> +
> >>>> +    return rc;
> >>>> +}
> >>>> +
> >>>> +/* ACPI 4.0: 17.4.2.1 Operations - Writing */
> >>>> +static unsigned write_erst_record(ERSTDeviceState *s)
> >>>> +{
> >>>> +    unsigned rc = STATUS_FAILED;
> >>>> +    unsigned index;
> >>>> +    uint64_t record_identifier;
> >>>> +    uint8_t *record;
> >>>> +    uint8_t *ptr = NULL;
> >>>> +    bool record_found = false;
> >>>> +
> >>>> +    /* Check record boundary wihin exchange buffer */  
> >>> ditto, typo  
> >> done
> >>  
> >>>      
> >>>> +    if (s->record_offset >= (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
> >>>> +        return STATUS_FAILED;
> >>>> +    }  
> > see [1]  
> yep
> 
> >   
> >>>> +
> >>>> +    /* Extract record identifier */
> >>>> +    record = ((uint8_t *)memory_region_get_ram_ptr(s->exchange_mr)
> >>>> +        + s->record_offset);
> >>>> +    record_identifier = THE_UEFI_CPER_RECORD_ID(record);  
> >>> potentially unaligned access to int, should use memcpy()  
> >> done
> >>  
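
A minimal sketch of the memcpy()-based read suggested above, assuming the
identifier sits at the UEFI_CPER_RECORD_ID_OFFSET used in the patch;
cper_record_id() is a hypothetical name, not something from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UEFI_CPER_RECORD_ID_OFFSET 96U

/*
 * Hypothetical helper: fetch the 64-bit record id from a CPER record
 * without assuming the record pointer is 8-byte aligned.
 */
static uint64_t cper_record_id(const uint8_t *record)
{
    uint64_t id;

    assert(record != NULL);
    /* memcpy() is safe for any alignment; a direct cast + deref is not. */
    memcpy(&id, record + UEFI_CPER_RECORD_ID_OFFSET, sizeof(id));
    return id; /* raw stored bytes; no byte-order conversion done here */
}
```

In QEMU itself the returned value would presumably still need a byte-order
conversion (e.g. le64_to_cpu()) before being compared or stored, in line
with the endianness comment elsewhere in this review.
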
> >>>      
> >>>> +
> >>>> +    /* Check for valid record identifier */
> >>>> +    if (!ERST_IS_VALID_RECORD_ID(record_identifier)) {
> >>>> +        return STATUS_FAILED;
> >>>> +    }
> >>>> +
> >>>> +    index = lookup_erst_record(s, record_identifier);
> >>>> +    if (index) {
> >>>> +        /* Record found, overwrite existing record */
> >>>> +        ptr = get_nvram_ptr_by_index(s, index);
> >>>> +        record_found = true;
> >>>> +    } else {
> >>>> +        /* Record not found, not an overwrite, allocate for write */
> >>>> +        index = lookup_erst_record(s, ERST_UNSPECIFIED_RECORD_ID);
> >>>> +        if (index) {
> >>>> +            ptr = get_nvram_ptr_by_index(s, index);
> >>>> +        } else {
> >>>> +            rc = STATUS_NOT_ENOUGH_SPACE;
> >>>> +        }
> >>>> +    }
> >>>> +    if (ptr) {
> >>>> +        memcpy(ptr, record, ERST_RECORD_SIZE - s->record_offset);  
> > 
> > This copies the remainder of exchange buffer, including 'leftovers' from
> > previous operations.
> > Is there a reason why you are not verifying the actual 'record' size in the
> > buffer and, if it fits within the target 'ptr', copying just the useful
> > payload from the buffer?  
> 
> So I think this question might be getting at a fundamental difference (and thus the questions/points 
> you are raising). The UEFI CPER record has a member field 'record_length':
> 
> "Indicates the size of the actual error record, including the size of the record header, all section 
> descriptors, and section bodies. The size may include extra buffer space to allow for the dynamic 
> addition of error sections descriptors bodies."
> 
> Thus far, in this implementation, I have *avoided* using 'record_length' out of the record as simply 
> deeming it as untrustworthy and a possible attack vector. Instead, I've been using 
> (ERST_RECORD_SIZE-s->record_offset) as the length of the record to copy.
> 
> I could use 'record_length', and validate it prior to trusting it. Validation here would simply be 
> ensuring it is <= ERST_RECORD_SIZE? I think this is what you are suggesting, correct?

It should be verified explicitly against the size of the memcpy target (i.e. use
the exchange_mr size or the slot size) and must not exceed it. That should
decouple the exchange buffer size from the record size.
This way, if the exchange buffer is enlarged in the future, one won't have to
rewrite the whole implementation.
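
Something along these lines, as a rough standalone sketch rather than a
drop-in: copy_record_to_slot() and cper_record_length() are hypothetical
names, the sizes are passed in explicitly instead of being read from the
memory regions, and record_length is assumed to be stored little-endian at
offset 20 (UEFI_CPER_RECORD_LENGTH_OFFSET in the patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CPER_MIN_SIZE      128U /* UEFI_CPER_RECORD_MIN_SIZE in the patch */
#define CPER_LENGTH_OFFSET 20U  /* UEFI_CPER_RECORD_LENGTH_OFFSET */

/* Hypothetical helper: decode record_length byte by byte (little-endian). */
static uint32_t cper_record_length(const uint8_t *rec)
{
    const uint8_t *p = rec + CPER_LENGTH_OFFSET;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Copy one record from the exchange buffer into a storage slot.
 * Returns false, copying nothing, if the record is truncated in the
 * exchange buffer or does not fit in the slot.
 */
static bool copy_record_to_slot(uint8_t *slot, size_t slot_size,
                                const uint8_t *exchange, size_t exchange_size,
                                size_t record_offset)
{
    size_t avail, len;

    assert(slot != NULL && exchange != NULL);

    if (record_offset >= exchange_size) {
        return false;
    }
    avail = exchange_size - record_offset;
    if (avail < CPER_MIN_SIZE) {
        return false; /* not even a minimal record fits at this offset */
    }

    len = cper_record_length(exchange + record_offset);
    if (len < CPER_MIN_SIZE || len > avail || len > slot_size) {
        return false; /* untrusted length rejected before any copy */
    }

    memcpy(slot, exchange + record_offset, len);
    memset(slot + len, 0xFF, slot_size - len); /* no stale bytes after record */
    return true;
}
```

The point is that every bound is taken from the actual source and
destination sizes, so nothing silently depends on the exchange buffer and
the slot being the same size.
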

> 
> >   
> >>>> +        if (0 != s->record_offset) {
> >>>> +            memset(&ptr[ERST_RECORD_SIZE - s->record_offset],
> >>>> +                0xFF, s->record_offset);
> >>>> +        }  
> >>> you've lost me here, care to explain what's going on here?  
> >> If the record_offset is not 0, then there can be bytes following the record within the slot that
> >> were not written. This simply sets them to 0xFF (so bytes from a previously written record that
> >> happened to occupy this slot do not "bleed" through).
> >> I've left a comment in v7.  
> > 
> > well, 'bleed' happens because 'read_erst_record' copies whole slot
> > instead of the actual record size.
> > 
> > And that would work, only while exchange buffer size and record size
> > are equal, and fall apart silently as soon as that is not true,
> > leading to potential exploits.
> > 
> > it might be more robust if it were written like this:
> >     if_record_is_complete (i.e. record in buffer is not truncated)
> >         if_actual_record_size_fits_in_slot
> >             memcpy(slot, record, actual_record_size)
> >             memset(slot+actual_record_size, 0xff, slot_size - actual_record_size);
> >     otherwise error out  
> 
> See question on 'record_length' above.
> 
> >     
> >>>> +        if (!record_found) {
> >>>> +            s->header->record_count += 1; /* writing new record */
> >>>> +        }
> >>>> +        set_erst_map_by_index(s, index, record_identifier);
> >>>> +        rc = STATUS_SUCCESS;
> >>>> +    }
> >>>> +
> >>>> +    return rc;
> >>>> +}
> >>>> +
> >>>> +/* ACPI 4.0: 17.4.2.2 Operations - Reading "During boot..." */
> >>>> +static unsigned next_erst_record(ERSTDeviceState *s,
> >>>> +    uint64_t *record_identifier)  
> >>> s/record_identifier/found.../  
> >> done
> >>  
> >>>      
> >>>> +{
> >>>> +    unsigned rc = STATUS_RECORD_NOT_FOUND;
> >>>> +    unsigned index = s->next_record_index;
> >>>> +
> >>>> +    *record_identifier = ERST_EMPTY_END_RECORD_ID;
> >>>> +
> >>>> +    if (s->header->record_count) {
> >>>> +        for (; index < s->last_record_index; ++index) {
> >>>> +            uint64_t map_record_identifier;  
> >>> and then s/map_record_identifier/record_identifier/  
> >> done
> >>  
> >>>
> >>> the same applies to other occurrences within patch
> >>> (map_record_identifier is a bit confusing) or drop it
> >>> and use s->header->map[index] directly  
> >> done
> >>  
> >>>      
> >>>> +            map_record_identifier = s->header->map[index];
> >>>> +            if (map_record_identifier != ERST_UNSPECIFIED_RECORD_ID) {
> >>>> +                    /* where to start next time */
> >>>> +                    s->next_record_index = index + 1;
> >>>> +                    *record_identifier = map_record_identifier;
> >>>> +                    rc = STATUS_SUCCESS;
> >>>> +                    break;
> >>>> +            }
> >>>> +        }
> >>>> +    }
> >>>> +    if (rc != STATUS_SUCCESS) {
> >>>> +        if (s->next_record_index == s->first_record_index) {
> >>>> +            /*
> >>>> +             * next_record_identifier is unchanged, no records found
> >>>> +             * and *record_identifier contains EMPTY_END id
> >>>> +             */
> >>>> +            rc = STATUS_RECORD_STORE_EMPTY;
> >>>> +        }
> >>>> +        /* at end/scan complete, reset */
> >>>> +        s->next_record_index = s->first_record_index;
> >>>> +    }  
> >>>
> >>> Table 17-16, says return existing error or ERST_EMPTY_END_RECORD_ID
> >>> but nothing about op returning a error, so I'd assume status
> >>> should always be STATUS_SUCCESS for GET_RECORD_IDENTIFIER.  
> >> done
> >>  
> >>>
> >>> Advancing to the next record is part of record READ op and
> >>> not the part of GET_RECORD_IDENTIFIER as it's done here.  
> >> well...
> >>  
> >>>     "The steps performed by the platform to carry out ...
> >>>        2. ..
> >>>           c. If the specified error record does not exist,
> >>>              ... update the status register’s Identifier field with the identifier of the
> >>> ‘first’ error record
> >>>        4. Record the Identifier of the ‘next’ valid error record ...
> >>>     "  
> >>
> >> I used ACPI spec v6 and I was asked to locate the first occurrence of ERST in the spec, which was
> >> v4. So the above spec quotes are accurate, however, spec v6 deviates in an important way from the
> >> above, which reads:
> >>
> >>     "c. If the status is Record Not Found (0x05), indicating that the specified error record does not
> >> exist, OSPM retrieves a valid identifier by a GET_RECORD_IDENTIFIER action. The platform will return
> >> a valid record identifier."  
> > 
> > that's quote from OSPM behavior,
> > 
> > the platform part still looks the same to me (in 4.0/5.0/6.0/6.3) (they split 2.c on 2.c and 2.d)
> > but the meaning is the same.  
> 
> So I now see that the description of the READ operation actually has two sections; the first starts 
> with "during boot" and another talking about a straight up read operation. I had been focusing on 
> the "on boot" reading, but alas I do need to better accommodate the plain read, as you point out.

If it can work as the spec says, we had better follow it.
If it doesn't actually work, we can code whatever works, plus
sufficient documentation/proof to justify the deviation.

> > 
> >     
> >> So GET_RECORD_IDENTIFIER is essentially a factory that pumps out record identifiers for records
> >> stored. I kind of think of it as the old DOS 'find_first/find_next'. And yes v4 of the spec states
> >> that the READ operation should initiate the first record_identifier. However v6 clearly states this
> >> is now the responsibility of OSPM, not the READ op.
> >>
> >> I am thinking that the best way to handle this contradiction is to change the ACPI spec citation
> >> from v4 to v5, as the wording in v5 matches what I cite from v6, and implemented. Furthermore, this
> >> approach of OSPM obtaining the next valid record_id via GET_RECORD_IDENTIFIER is consistent with  
> >   
> >> what I observed in BIOS and with how Linux is coded.  
> > pointer[s] to source[s] please?  
> 
> Fwiw, Linux converts all the entries in ERST into pstore entries upon boot. Any subsequent "read" of 
> the pstore entry does not touch ERST again; instead it reads from the in-kernel pstore contents.
> 
> The driver in Linux is drivers/acpi/apei/erst.c; but note that it conforms to the pstore set of 
> callbacks.
> 
> > 
> > 
> > Well, spec can be wrong too (not the 1st time) but we need to be sure
> > what is broken and doesn't work as it's supposed to and document it
> > properly, before deviating from the spec.  
> 
> I see now that the specs appear to be the same. I need to accommodate the non "on boot" path.
> 
> > 
> > 
> >   
> >> Thoughts?
> >>  
> >>>
> >>>      
> >>>> +
> >>>> +    return rc;
> >>>> +}
> >>>> +
> >>>> +/*******************************************************************/
> >>>> +
> >>>> +static uint64_t erst_rd_reg64(hwaddr addr,
> >>>> +    uint64_t reg, unsigned size)
> >>>> +{
> >>>> +    uint64_t rdval;
> >>>> +    uint64_t mask;
> >>>> +    unsigned shift;
> >>>> +
> >>>> +    if (size == sizeof(uint64_t)) {
> >>>> +        /* 64b access */
> >>>> +        mask = 0xFFFFFFFFFFFFFFFFUL;
> >>>> +        shift = 0;
> >>>> +    } else {
> >>>> +        /* 32b access */
> >>>> +        mask = 0x00000000FFFFFFFFUL;
> >>>> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
> >>>> +    }
> >>>> +
> >>>> +    rdval = reg;
> >>>> +    rdval >>= shift;
> >>>> +    rdval &= mask;
> >>>> +
> >>>> +    return rdval;
> >>>> +}
> >>>> +
> >>>> +static uint64_t erst_wr_reg64(hwaddr addr,
> >>>> +    uint64_t reg, uint64_t val, unsigned size)
> >>>> +{
> >>>> +    uint64_t wrval;
> >>>> +    uint64_t mask;
> >>>> +    unsigned shift;
> >>>> +    if (size == sizeof(uint64_t)) {
> >>>> +        /* 64b access */
> >>>> +        mask = 0xFFFFFFFFFFFFFFFFUL;
> >>>> +        shift = 0;
> >>>> +    } else {
> >>>> +        /* 32b access */
> >>>> +        mask = 0x00000000FFFFFFFFUL;
> >>>> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
> >>>> +    }
> >>>> +
> >>>> +    val &= mask;
> >>>> +    val <<= shift;
> >>>> +    mask <<= shift;
> >>>> +    wrval = reg;
> >>>> +    wrval &= ~mask;
> >>>> +    wrval |= val;
> >>>> +
> >>>> +    return wrval;
> >>>> +}
> >>>> +
> >>>> +static void erst_reg_write(void *opaque, hwaddr addr,
> >>>> +    uint64_t val, unsigned size)
> >>>> +{
> >>>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> >>>> +
> >>>> +    /*
> >>>> +     * NOTE: All actions/operations/side effects happen on the WRITE,
> >>>> +     * by design. The READs simply return the reg_value contents.  
> >>>
> >>> point to spec, pls.  
> >> This was an implementation design choice, so no spec reference applicable, I left a comment.
> >>
> >>  
> >>>      
> >>>> +     */
> >>>> +    trace_acpi_erst_reg_write(addr, val, size);
> >>>> +
> >>>> +    switch (addr) {
> >>>> +    case ERST_VALUE_OFFSET + 0:
> >>>> +    case ERST_VALUE_OFFSET + 4:
> >>>> +        s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
> >>>> +        break;
> >>>> +    case ERST_ACTION_OFFSET + 0:  
> >>>      
> >>>> +/*  case ERST_ACTION_OFFSET+4: as coded, not really a 64b register */  
> >>>
> >>> what does this mean?  
> >> In short, all values written to this register are just the ACTION ops, so there wasn't a need to
> >> implement this as a 64-bit register, especially since Linux seems to issue two 32-bit accesses for
> >> 64-bit; in this case the upper access is utterly useless.
> >> I placed a comment in code.  
> > the comment as it is above is not helpful,
> > so it would be better to have a comment that explains the reasoning a bit better.
> > like:
> >     supported/impl ACTION ops are 32-bit only, so ...  
> ok
> 
> >   
> >>>> +        switch (val) {
> >>>> +        case ACTION_BEGIN_WRITE_OPERATION:
> >>>> +        case ACTION_BEGIN_READ_OPERATION:
> >>>> +        case ACTION_BEGIN_CLEAR_OPERATION:
> >>>> +        case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> >>>> +        case ACTION_END_OPERATION:
> >>>> +            s->operation = val;
> >>>> +            break;
> >>>> +        case ACTION_SET_RECORD_OFFSET:
> >>>> +            s->record_offset = s->reg_value;
> >>>> +            break;
> >>>> +        case ACTION_EXECUTE_OPERATION:
> >>>> +            if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
> >>>> +                s->busy_status = 1;
> >>>> +                switch (s->operation) {
> >>>> +                case ACTION_BEGIN_WRITE_OPERATION:
> >>>> +                    s->command_status = write_erst_record(s);
> >>>> +                    break;
> >>>> +                case ACTION_BEGIN_READ_OPERATION:
> >>>> +                    s->command_status = read_erst_record(s);
> >>>> +                    break;
> >>>> +                case ACTION_BEGIN_CLEAR_OPERATION:
> >>>> +                    s->command_status = clear_erst_record(s);
> >>>> +                    break;
> >>>> +                case ACTION_BEGIN_DUMMY_WRITE_OPERATION:
> >>>> +                    s->command_status = STATUS_SUCCESS;
> >>>> +                    break;
> >>>> +                case ACTION_END_OPERATION:
> >>>> +                    s->command_status = STATUS_SUCCESS;
> >>>> +                    break;
> >>>> +                default:
> >>>> +                    s->command_status = STATUS_FAILED;
> >>>> +                    break;
> >>>> +                }
> >>>> +                s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;  
> >>>                      shouldn't happen in case of Read op  
> >> correct, removed as not needed at all now.
> >>  
> >>>
> >>> "
> >>> 17.4.2.2
> >>> 4. Record the Identifier of the ‘next’ valid error record that resides on the persistent store. This
> >>> allows OSPM to retrieve a valid record identifier by executing a GET_RECORD_IDENTIFIER
> >>> operation.
> >>> "
> >>>      
> >>>> +                s->busy_status = 0;
> >>>> +            }
> >>>> +            break;
> >>>> +        case ACTION_CHECK_BUSY_STATUS:
> >>>> +            s->reg_value = s->busy_status;
> >>>> +            break;
> >>>> +        case ACTION_GET_COMMAND_STATUS:
> >>>> +            s->reg_value = s->command_status;
> >>>> +            break;
> >>>> +        case ACTION_GET_RECORD_IDENTIFIER:
> >>>> +            s->command_status = next_erst_record(s, &s->reg_value);
> >>>> +            break;
> >>>> +        case ACTION_SET_RECORD_IDENTIFIER:
> >>>> +            s->record_identifier = s->reg_value;
> >>>> +            break;
> >>>> +        case ACTION_GET_RECORD_COUNT:
> >>>> +            s->reg_value = s->header->record_count;
> >>>> +            break;
> >>>> +        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
> >>>> +            s->reg_value = (hwaddr)pci_get_bar_addr(PCI_DEVICE(s), 1);
> >>>> +            break;
> >>>> +        case ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
> >>>> +            s->reg_value = ERST_RECORD_SIZE;
> >>>> +            break;
> >>>> +        case ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
> >>>> +            s->reg_value = 0x0; /* intentional, not NVRAM mode */
> >>>> +            break;
> >>>> +        case ACTION_GET_EXECUTE_OPERATION_TIMINGS:
> >>>> +            s->reg_value =
> >>>> +                (100ULL << 32) | /* 100us max time */
> >>>> +                (10ULL  <<  0) ; /*  10us min time */
> >>>> +            break;
> >>>> +        default:
> >>>> +            /* Unknown action/command, NOP */
> >>>> +            break;
> >>>> +        }
> >>>> +        break;
> >>>> +    default:
> >>>> +        /* This should not happen, but if it does, NOP */
> >>>> +        break;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +static uint64_t erst_reg_read(void *opaque, hwaddr addr,
> >>>> +                                unsigned size)
> >>>> +{
> >>>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> >>>> +    uint64_t val = 0;
> >>>> +
> >>>> +    switch (addr) {
> >>>> +    case ERST_ACTION_OFFSET + 0:
> >>>> +    case ERST_ACTION_OFFSET + 4:
> >>>> +        val = erst_rd_reg64(addr, s->reg_action, size);
> >>>> +        break;
> >>>> +    case ERST_VALUE_OFFSET + 0:
> >>>> +    case ERST_VALUE_OFFSET + 4:
> >>>> +        val = erst_rd_reg64(addr, s->reg_value, size);
> >>>> +        break;
> >>>> +    default:
> >>>> +        break;
> >>>> +    }
> >>>> +    trace_acpi_erst_reg_read(addr, val, size);
> >>>> +    return val;
> >>>> +}
> >>>> +
> >>>> +static const MemoryRegionOps erst_reg_ops = {
> >>>> +    .read = erst_reg_read,
> >>>> +    .write = erst_reg_write,
> >>>> +    .endianness = DEVICE_NATIVE_ENDIAN,
> >>>> +};
> >>>> +
> >>>> +/*******************************************************************/
> >>>> +/*******************************************************************/
> >>>> +static int erst_post_load(void *opaque, int version_id)
> >>>> +{
> >>>> +    ERSTDeviceState *s = opaque;
> >>>> +
> >>>> +    /* Recompute pointer to header */
> >>>> +    s->header = (erst_storage_header_t *)get_nvram_ptr_by_index(s, 0);
> >>>> +    trace_acpi_erst_post_load(s->header);
> >>>> +
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>> +static const VMStateDescription erst_vmstate  = {
> >>>> +    .name = "acpi-erst",
> >>>> +    .version_id = 1,
> >>>> +    .minimum_version_id = 1,
> >>>> +    .post_load = erst_post_load,
> >>>> +    .fields = (VMStateField[]) {
> >>>> +        VMSTATE_UINT32(storage_size, ERSTDeviceState),  
> >>>    1)  
> >>>> +        VMSTATE_UINT8(operation, ERSTDeviceState),
> >>>> +        VMSTATE_UINT8(busy_status, ERSTDeviceState),
> >>>> +        VMSTATE_UINT8(command_status, ERSTDeviceState),
> >>>> +        VMSTATE_UINT32(record_offset, ERSTDeviceState),
> >>>> +        VMSTATE_UINT64(reg_action, ERSTDeviceState),
> >>>> +        VMSTATE_UINT64(reg_value, ERSTDeviceState),
> >>>> +        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
> >>>> +        VMSTATE_UINT32(next_record_index, ERSTDeviceState),  
> >>>      
> >>>> +        VMSTATE_UINT32(first_record_index, ERSTDeviceState),
> >>>> +        VMSTATE_UINT32(last_record_index, ERSTDeviceState),  
> >>>    2)  
> >>>> +        VMSTATE_END_OF_LIST()
> >>>> +    }
> >>>> +};  
> >>>
> >>>    1 and 2 aren't runtime state, so why are they in the migration stream?
> >> done; removed storage_size, first_record_index and last_record_index from the migration stream.
> >>
> >>  
> >>>
> >>> I'd imagine size could be used to check that the backend on the target is the same size,
> >>> to avoid a buffer overrun if the target side has a smaller backend, and to fail migration if
> >>> it's not the same. But it isn't used this way here.
> >> I decided not to do this check, as that memory object is migrated automatically, so I don't think my
> >> check adds any value.
> >>  
> >>>
> >>> the rest could be calculated at realize time.  
> >> and in fact they are.
> >>  
> >>>      
> >>>> +
> >>>> +static void erst_realizefn(PCIDevice *pci_dev, Error **errp)
> >>>> +{
> >>>> +    ERSTDeviceState *s = ACPIERST(pci_dev);
> >>>> +
> >>>> +    trace_acpi_erst_realizefn_in();
> >>>> +
> >>>> +    if (!s->hostmem) {
> >>>> +        error_setg(errp, "'" ACPI_ERST_MEMDEV_PROP "' property is not set");
> >>>> +        return;
> >>>> +    } else if (host_memory_backend_is_mapped(s->hostmem)) {
> >>>> +        error_setg(errp, "can't use already busy memdev: %s",
> >>>> +                   object_get_canonical_path_component(OBJECT(s->hostmem)));
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    s->hostmem_mr = host_memory_backend_get_memory(s->hostmem);
> >>>> +
> >>>> +    /* HostMemoryBackend size will be multiple of PAGE_SIZE */
> >>>> +    s->storage_size = object_property_get_int(OBJECT(s->hostmem), "size", errp);
> >>>> +
> >>>> +    /* Check storage_size against ERST_RECORD_SIZE */
> >>>> +    if (((s->storage_size % ERST_RECORD_SIZE) != 0) ||
> >>>> +         (ERST_RECORD_SIZE > s->storage_size)) {
> >>>> +        error_setg(errp, "ACPI ERST requires size be multiple of "
> >>>> +            "record size (%luKiB)", ERST_RECORD_SIZE);
> >>>> +    }
> >>>> +
> >>>> +    /* Initialize backend storage and record_count */
> >>>> +    check_erst_backend_storage(s, errp);
> >>>> +
> >>>> +    /* BAR 0: Programming registers */
> >>>> +    memory_region_init_io(&s->iomem, OBJECT(pci_dev), &erst_reg_ops, s,
> >>>> +                          TYPE_ACPI_ERST, ERST_REG_SIZE);
> >>>> +    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->iomem);
> >>>> +
> >>>> +    /* BAR 1: Exchange buffer memory */  
> >>>
> >>>      
> >>>> +    /* Create a hostmem object to use as the exchange buffer */
> >>>> +    s->exchange_obj = object_new(TYPE_MEMORY_BACKEND_RAM);
> >>>> +    object_property_set_int(s->exchange_obj, "size", ERST_RECORD_SIZE, errp);
> >>>> +    user_creatable_complete(USER_CREATABLE(s->exchange_obj), errp);
> >>>> +    s->exchange = MEMORY_BACKEND(s->exchange_obj);
> >>>> +    host_memory_backend_set_mapped(s->exchange, true);
> >>>> +    s->exchange_mr = host_memory_backend_get_memory(s->exchange);  
> >>> replace this block with single memory_region_init_ram()  
> >> done!
> >>  
> >>>
> >>>      
> >>>> +    memory_region_init_resizeable_ram(s->exchange_mr, OBJECT(pci_dev),
> >>>> +        TYPE_ACPI_ERST, ERST_RECORD_SIZE, ERST_RECORD_SIZE, NULL, errp);  
> >>> have no idea why it's necessary; it seems just wrong, it basically leaks
> >>> the previous memory region and creates a new one.
> >> done!
> >>  
> >>>      
> >>>> +    pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, s->exchange_mr);  
> >>>      
> >>>> +    /* Include the exchange buffer in the migration stream */
> >>>> +    vmstate_register_ram_global(s->exchange_mr);  
> >>> not necessary if memory_region_init_ram() is used directly  
> >> done!
> >>  
> >>>      
> >>>> +
> >>>> +    /* Include the backend storage in the migration stream */
> >>>> +    vmstate_register_ram_global(s->hostmem_mr);
> >>>> +
> >>>> +    trace_acpi_erst_realizefn_out(s->storage_size);
> >>>> +}
> >>>> +
> >>>> +static void erst_reset(DeviceState *dev)
> >>>> +{
> >>>> +    ERSTDeviceState *s = ACPIERST(dev);
> >>>> +
> >>>> +    trace_acpi_erst_reset_in(s->header->record_count);
> >>>> +    s->operation = 0;
> >>>> +    s->busy_status = 0;
> >>>> +    s->command_status = STATUS_SUCCESS;
> >>>> +    s->record_identifier = ERST_UNSPECIFIED_RECORD_ID;
> >>>> +    s->record_offset = 0;
> >>>> +    s->next_record_index = s->first_record_index;
> >>>> +    /* NOTE: first/last_record_index are computed only once */
> >>>> +    trace_acpi_erst_reset_out(s->header->record_count);
> >>>> +}
> >>>> +
> >>>> +static Property erst_properties[] = {
> >>>> +    DEFINE_PROP_LINK(ACPI_ERST_MEMDEV_PROP, ERSTDeviceState, hostmem,
> >>>> +                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
> >>>> +    DEFINE_PROP_END_OF_LIST(),
> >>>> +};
> >>>> +
> >>>> +static void erst_class_init(ObjectClass *klass, void *data)
> >>>> +{
> >>>> +    DeviceClass *dc = DEVICE_CLASS(klass);
> >>>> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> >>>> +
> >>>> +    trace_acpi_erst_class_init_in();
> >>>> +    k->realize = erst_realizefn;
> >>>> +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
> >>>> +    k->device_id = PCI_DEVICE_ID_REDHAT_ACPI_ERST;
> >>>> +    k->revision = 0x00;
> >>>> +    k->class_id = PCI_CLASS_OTHERS;
> >>>> +    dc->reset = erst_reset;
> >>>> +    dc->vmsd = &erst_vmstate;
> >>>> +    dc->user_creatable = true;  
> >>>
> >>> can't be hotplugged, add:
> >>>          dc->hotpluggable = false;  
> >> done
> >>  
> >>>      
> >>>> +    device_class_set_props(dc, erst_properties);
> >>>> +    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
> >>>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> >>>> +    trace_acpi_erst_class_init_out();
> >>>> +}
> >>>> +
> >>>> +static const TypeInfo erst_type_info = {
> >>>> +    .name          = TYPE_ACPI_ERST,
> >>>> +    .parent        = TYPE_PCI_DEVICE,
> >>>> +    .class_init    = erst_class_init,
> >>>> +    .instance_size = sizeof(ERSTDeviceState),
> >>>> +    .interfaces = (InterfaceInfo[]) {
> >>>> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> >>>> +        { }
> >>>> +    }
> >>>> +};
> >>>> +
> >>>> +static void erst_register_types(void)
> >>>> +{
> >>>> +    type_register_static(&erst_type_info);
> >>>> +}
> >>>> +
> >>>> +type_init(erst_register_types)
> >>>> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> >>>> index 29f804d..401d0e5 100644
> >>>> --- a/hw/acpi/meson.build
> >>>> +++ b/hw/acpi/meson.build
> >>>> @@ -5,6 +5,7 @@ acpi_ss.add(files(
> >>>>      'bios-linker-loader.c',
> >>>>      'core.c',
> >>>>      'utils.c',
> >>>> +  'erst.c',
> >>>>    ))
> >>>>    acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c'))
> >>>>    acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu_hotplug.c'))
> >>>> diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
> >>>> index 974d770..3579768 100644
> >>>> --- a/hw/acpi/trace-events
> >>>> +++ b/hw/acpi/trace-events
> >>>> @@ -55,3 +55,18 @@ piix4_gpe_writeb(uint64_t addr, unsigned width, uint64_t val) "addr: 0x%" PRIx64
> >>>>    # tco.c
> >>>>    tco_timer_reload(int ticks, int msec) "ticks=%d (%d ms)"
> >>>>    tco_timer_expired(int timeouts_no, bool strap, bool no_reboot) "timeouts_no=%d no_reboot=%d/%d"
> >>>> +
> >>>> +# erst.c
> >>>> +acpi_erst_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%04" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
> >>>> +acpi_erst_reg_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%04" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
> >>>> +acpi_erst_mem_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%06" PRIx64 " <== 0x%016" PRIx64 " (size: %u)"
> >>>> +acpi_erst_mem_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%06" PRIx64 " ==> 0x%016" PRIx64 " (size: %u)"
> >>>> +acpi_erst_pci_bar_0(uint64_t addr) "BAR0: 0x%016" PRIx64
> >>>> +acpi_erst_pci_bar_1(uint64_t addr) "BAR1: 0x%016" PRIx64
> >>>> +acpi_erst_realizefn_in(void)
> >>>> +acpi_erst_realizefn_out(unsigned size) "total nvram size %u bytes"
> >>>> +acpi_erst_reset_in(unsigned record_count) "record_count %u"
> >>>> +acpi_erst_reset_out(unsigned record_count) "record_count %u"
> >>>> +acpi_erst_post_load(void *header) "header: 0x%p"
> >>>> +acpi_erst_class_init_in(void)
> >>>> +acpi_erst_class_init_out(void)  
> >>>      
> >>  
> >   
> 
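On ACTION_GET_EXECUTE_OPERATION_TIMINGS in the quoted hunk: the returned value places one 32-bit microsecond count in bits 63:32 (100 in the patch) and another in bits 31:0 (10 in the patch). A minimal sketch of that packing is below; the helper names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack two 32-bit microsecond values the way the quoted hunk does:
 * the first argument lands in bits 63:32, the second in bits 31:0.
 */
static uint64_t erst_pack_timings(uint32_t upper_us, uint32_t lower_us)
{
    return ((uint64_t)upper_us << 32) | (uint64_t)lower_us;
}

/* Extract bits 63:32. */
static uint32_t erst_timings_upper(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Extract bits 31:0. */
static uint32_t erst_timings_lower(uint64_t v)
{
    return (uint32_t)(v & 0xffffffffu);
}

/* Round-trip check using the constants from the quoted hunk. */
static void erst_timings_example(void)
{
    uint64_t v = erst_pack_timings(100, 10);

    assert(v == 0x000000640000000aULL);
    assert(erst_timings_upper(v) == 100);
    assert(erst_timings_lower(v) == 10);
}
```

The casts to uint64_t before shifting matter for the same reason the patch uses the ULL suffix: shifting a 32-bit value left by 32 is undefined in C.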




* Re: [PATCH v6 02/10] ACPI ERST: specification for ERST support
  2021-10-06  8:12   ` [PATCH v6 02/10] " Michael S. Tsirkin
@ 2021-10-06 20:07     ` Eric DeVolder
  0 siblings, 0 replies; 34+ messages in thread
From: Eric DeVolder @ 2021-10-06 20:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: ehabkost, konrad.wilk, qemu-devel, pbonzini, imammedo,
	boris.ostrovsky, rth



On 10/6/21 3:12 AM, Michael S. Tsirkin wrote:
> On Thu, Aug 05, 2021 at 06:30:31PM -0400, Eric DeVolder wrote:
>> Information on the implementation of the ACPI ERST support.
>>
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>> ---
>>   docs/specs/acpi_erst.txt | 147 +++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 147 insertions(+)
>>   create mode 100644 docs/specs/acpi_erst.txt
> 
> It's probably a good idea to have new documents in the rst
> format.
> 
done!



* Re: [PATCH] ACPI ERST: specification for ERST support
  2021-10-06  7:00     ` Ani Sinha
@ 2021-10-06 20:07       ` Eric DeVolder
  2021-10-07  4:39         ` Ani Sinha
  0 siblings, 1 reply; 34+ messages in thread
From: Eric DeVolder @ 2021-10-06 20:07 UTC (permalink / raw)
  To: Ani Sinha
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, imammedo,
	boris.ostrovsky, rth



On 10/6/21 2:00 AM, Ani Sinha wrote:
> 
> 
> On Wed, 6 Oct 2021, Ani Sinha wrote:
> 
>> From: Eric DeVolder <eric.devolder@oracle.com>
>>
>>> ---
>>> docs/specs/acpi_erst.txt | 147 +++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 147 insertions(+)
>>> create mode 100644 docs/specs/acpi_erst.txt
>>>
> 
> OK it did not come out the way I wanted. But
> 
> Acked-by: Ani Sinha <ani@anisinha.ca>
> 
thank you!
eric



* Re: [PATCH] ACPI ERST: specification for ERST support
  2021-10-06 20:07       ` Eric DeVolder
@ 2021-10-07  4:39         ` Ani Sinha
  0 siblings, 0 replies; 34+ messages in thread
From: Ani Sinha @ 2021-10-07  4:39 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: ehabkost, mst, konrad.wilk, qemu-devel, pbonzini, Ani Sinha,
	imammedo, boris.ostrovsky, rth



On Wed, 6 Oct 2021, Eric DeVolder wrote:

>
>
> On 10/6/21 2:00 AM, Ani Sinha wrote:
> >
> >
> > On Wed, 6 Oct 2021, Ani Sinha wrote:
> >
> > > From: Eric DeVolder <eric.devolder@oracle.com>
> > >
> > > > ---
> > > > docs/specs/acpi_erst.txt | 147 +++++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 147 insertions(+)
> > > > create mode 100644 docs/specs/acpi_erst.txt
> > > >
> >
> > OK it did not come out the way I wanted. But
> >
> > Acked-by: Ani Sinha <ani@anisinha.ca>
> >
> thank you!

So this patchset was sent when I was not yet in the MAINTAINERS file. Hence
I was not on the CC list. It's a little hard to send reviews when the
patches are not in my inbox (this resulted in the mess above). So if you
have pending updates to the patch series, maybe you can spin out a new
revision and I can review it more comfortably (this time hopefully I will
be on the cc list).




Thread overview: 34+ messages
2021-08-05 22:30 [PATCH v6 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
2021-08-05 22:30 ` [PATCH v6 01/10] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
2021-09-20 13:05   ` Igor Mammedov
2021-10-04 20:37     ` Eric DeVolder
2021-08-05 22:30 ` [PATCH v6 02/10] ACPI ERST: specification for ERST support Eric DeVolder
2021-09-20 13:38   ` Igor Mammedov
2021-10-04 20:40     ` Eric DeVolder
2021-10-06  6:58   ` [PATCH] " Ani Sinha
2021-10-06  7:00     ` Ani Sinha
2021-10-06 20:07       ` Eric DeVolder
2021-10-07  4:39         ` Ani Sinha
2021-10-06  8:12   ` [PATCH v6 02/10] " Michael S. Tsirkin
2021-10-06 20:07     ` Eric DeVolder
2021-08-05 22:30 ` [PATCH v6 03/10] ACPI ERST: PCI device_id for ERST Eric DeVolder
2021-09-21 11:32   ` Igor Mammedov
2021-08-05 22:30 ` [PATCH v6 04/10] ACPI ERST: header file " Eric DeVolder
2021-08-05 22:30 ` [PATCH v6 05/10] ACPI ERST: support for ACPI ERST feature Eric DeVolder
2021-09-21 15:30   ` Igor Mammedov
2021-10-04 21:13     ` Eric DeVolder
2021-10-05 11:39       ` Igor Mammedov
2021-10-05 16:40         ` Eric DeVolder
2021-10-06 14:36           ` Igor Mammedov
2021-08-05 22:30 ` [PATCH v6 06/10] ACPI ERST: build the ACPI ERST table Eric DeVolder
2021-08-05 22:30 ` [PATCH v6 07/10] ACPI ERST: create ACPI ERST table for pc/x86 machines Eric DeVolder
2021-08-05 22:30 ` [PATCH v6 08/10] ACPI ERST: qtest for ERST Eric DeVolder
2021-08-05 22:30 ` [PATCH v6 09/10] ACPI ERST: bios-tables-test testcase Eric DeVolder
2021-09-21 11:32   ` Igor Mammedov
2021-10-04 21:13     ` Eric DeVolder
2021-08-05 22:30 ` [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test Eric DeVolder
2021-08-06 17:16   ` Eric DeVolder
2021-08-27 21:45     ` Eric DeVolder
2021-09-02  6:34       ` Igor Mammedov
2021-09-21 11:24   ` Igor Mammedov
2021-10-04 21:14     ` Eric DeVolder
