All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 0/3] Generate APEI GHES table and dynamically record CPER
@ 2017-07-12  2:08 Dongjiu Geng
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros Dongjiu Geng
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Dongjiu Geng @ 2017-07-12  2:08 UTC (permalink / raw)
  To: lersek, mst, imammedo, famz, qemu-devel, zhaoshenglong,
	peter.maydell, qemu-arm, james.morse, zhengqiang10, huangshaoyu,
	wuquanming
  Cc: gengdongjiu

In the armv8 platform, the mainly hardware error source are ARMv8
SEA/SEI/GSIV. For the ARMv8 SEA/SEI, the KVM or host kernel will signal SIGBUS
or use other interface to notify user space, such as Qemu. After Qemu gets
the notification, it will record the CPER and inject the SEA/SEI to KVM. this
series of patches will generate APEI table when guest OS boot up, and dynamically
record CPER for the guest OS about the generic hardware errors, currently the
userspace only handle the memory section hardware errors

how to test:
1. In the guest OS, use this command to dump the APEI table: 
	"iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST"
2. And find the address for the generic error status block
   according to the notification type
3. then find the CPER record through the generic error status block.

For example(notification type is SEA):

(1) root@genericarmv8:~# iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST
(2) root@genericarmv8:~# cat HEST.dsl
	/*
	 * Intel ACPI Component Architecture
	 * AML/ASL+ Disassembler version 20170303 (64-bit version)
	 * Copyright (c) 2000 - 2017 Intel Corporation
	 *
	 * Disassembly of /sys/firmware/acpi/tables/HEST, Mon Dec 12 07:19:43 2016
	 *
	 * ACPI Data Table [HEST]
	 *
	 * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
	 */
    ..................................................................................
    [228h 0552   2]                Subtable Type : 0009 [Generic Hardware Error Source]
	[22Ah 0554   2]                    Source Id : 0008
	[22Ch 0556   2]            Related Source Id : FFFF
	[22Eh 0558   1]                     Reserved : 00
	[22Fh 0559   1]                      Enabled : 01
	[230h 0560   4]       Records To Preallocate : 00000001
	[234h 0564   4]      Max Sections Per Record : 00000001
	[238h 0568   4]          Max Raw Data Length : 00001000

	[23Ch 0572  12]         Error Status Address : [Generic Address Structure]
	[23Ch 0572   1]                     Space ID : 00 [SystemMemory]
	[23Dh 0573   1]                    Bit Width : 40
	[23Eh 0574   1]                   Bit Offset : 00
	[23Fh 0575   1]         Encoded Access Width : 04 [QWord Access:64]
	[240h 0576   8]                      Address : 00000000785D0040

	[248h 0584  28]                       Notify : [Hardware Error Notification Structure]
	[248h 0584   1]                  Notify Type : 08 [SEA]
	[249h 0585   1]                Notify Length : 1C
	[24Ah 0586   2]   Configuration Write Enable : 0000
	[24Ch 0588   4]                 PollInterval : 00000000
	[250h 0592   4]                       Vector : 00000000
	[254h 0596   4]      Polling Threshold Value : 00000000
	[258h 0600   4]     Polling Threshold Window : 00000000
	[25Ch 0604   4]        Error Threshold Value : 00000000
	[260h 0608   4]       Error Threshold Window : 00000000

	[264h 0612   4]    Error Status Block Length : 00001000
    .....................................................................................
(3) according to above table, the address that contains the physical address of a block
    of memory that holds the error status data for SEA notification error source is 0x00000000785D0040
(4) the address for SEA notification error source is 0x785d8058
	(qemu) xp /2x 0x00000000785D0040
	00000000785d0040: 0x785d8058 0x00000000
(5) check the content of generic error status block and generic error data entry
    (qemu) xp /100x 0x785d8058
    00000000785d8058: 0x00000001 0x00000000 0x00000000 0x00000098
    00000000785d8068: 0x00000001 0xa5bc1114 0x4ede6f64 0x833e63b8
	00000000785d8078: 0xb1837ced 0x00000000 0x00000000 0x00000050
	00000000785d8088: 0x00000000 0x00000000 0x00000000 0x00000000
	00000000785d8098: 0x00000000 0x00000000 0x00000000 0x00000000
	00000000785d80a8: 0x00000000 0x00000000 0x00000000 0x00004002
	00000000785d80b8: 0x00000000 0x00000000 0x00000000 0x00001111
	00000000785d80c8: 0x00000000 0x00000000 0x00000000 0x00000000
	00000000785d80d8: 0x00000000 0x00000000 0x00000000 0x00000000
	00000000785d80e8: 0x00000000 0x00000000 0x00000000 0x00000000
	00000000785d80f8: 0x00000000 0x00000003 0x00000000 0x00000000
	00000000785d8108: 0x00000000 0x00000000 0x00000000 0x00000000
	00000000785d8118: 0x00000000 0x00000000 0x00000000 0x00000000
	00000000785d8128: 0x00000000 0x00000000 0x00000000 0x00000000
	00000000785d8138: 0x00000000 0x00000000 0x00000000 0x00000000

Dongjiu Geng (3):
  ACPI: Add new ACPI structures and macros
  ACPI: Add APEI GHES Table Generation support
  ACPI: build and enable APEI GHES in the Makefile and configuration

 default-configs/arm-softmmu.mak |   1 +
 hw/acpi/Makefile.objs           |   1 +
 hw/acpi/aml-build.c             |   2 +
 hw/acpi/hest_ghes.c             | 219 ++++++++++++++++++++++++++++++++++++++++
 hw/arm/virt-acpi-build.c        |   6 ++
 include/hw/acpi/acpi-defs.h     | 194 +++++++++++++++++++++++++++++++++++
 include/hw/acpi/aml-build.h     |   1 +
 include/hw/acpi/hest_ghes.h     |  47 +++++++++
 include/qemu/uuid.h             |  11 ++
 9 files changed, 482 insertions(+)
 create mode 100644 hw/acpi/hest_ghes.c
 create mode 100644 include/hw/acpi/hest_ghes.h

-- 
2.10.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-12  2:08 [Qemu-devel] [PATCH v5 0/3] Generate APEI GHES table and dynamically record CPER Dongjiu Geng
@ 2017-07-12  2:08 ` Dongjiu Geng
  2017-07-13 10:33   ` Laszlo Ersek
                     ` (2 more replies)
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 2/3] ACPI: Add APEI GHES Table Generation support Dongjiu Geng
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 19+ messages in thread
From: Dongjiu Geng @ 2017-07-12  2:08 UTC (permalink / raw)
  To: lersek, mst, imammedo, famz, qemu-devel, zhaoshenglong,
	peter.maydell, qemu-arm, james.morse, zhengqiang10, huangshaoyu,
	wuquanming
  Cc: gengdongjiu

(1) Add related APEI/HEST table structures and  macros, these
    definition refer to ACPI 6.1 and UEFI 2.6 spec.
(2) Add generic error status block and CPER memory section
    definition, user space only handle memory section errors.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
thanks Laszlo and Michael's review:

chnage since v4:
(1) fix email threading in this series is incorrect issue

change since v3:
(1) separate the original one patch into three patches: one is new
    ACPI structures and macros, another is C source file to generate ACPI HEST
    table and dynamically record CPER ,final patch is the change about Makefile
    and configuration
(2) add comments about where the ACPI structures and macros come from, for example,
    they come from the UEFI Spec 2.6, "xxxxxxxxxxxx"; ACPI 6.1 spec, 
    "xxxxxxxxxxxxxx".
(3) correct the macros name, for emaple, prefix some macro names with
    "UEFI_".
(4) remove the uuid_le struct and use the QemuUUID in the include/qemu/uuid.h"
(5) remove the duplicate ACPI address space, because it already defined in 
    the "include/hw/acpi/aml-build.h"
(6) remove the acpi_generic_address structure because same definition exists in the 
    AcpiGenericAddress.
(7) rename the struct acpi_hest_notify to AcpiHestNotifyType
(8) rename the struct acpi_hest_types to AcpiHestSourceType
(9) rename enum constants AcpiHestSourceType to ACPI_HEST_SOURCE_xxx from
     ACPI_HEST_TYPE_xxx
(10) remove the NOT_USED{3,4,5} enum constants in the AcpiHestSourceType.
(11) add missed QEMU_PACKED for the struct definition.
(12) remove the defnition of AcpiGenericErrorData, and rename the
     AcpiGenericErrorDataV300 to AcpiGenericErrorData.
(13) use the QemuUUID type for the "section_type" field AcpiGenericErrorData, and rename it
     to section_type_le.
(14) moving type AcpiGenericErrorSeverity above AcpiGenericErrorData and
     AcpiGenericErrorDataV300, and remarking on the "error_severity" fields
     that they take their values from AcpiGenericErrorSeverity
(15) remove the wrongly call to BERT(Boot Error Record Table)
(16) add comments for the struction member, for example, pint out that
     the block_status member in the AcpiGenericErrorStatus is a bitmask
     composed of ACPI_GEBS_xxx macros
(17) remove the hardware error source notification type list, and rename
     the MAX_ERROR_SOURCE_COUNT_V6 to ACPI_HEST_NOTIFY_RESERVED.
(18) remove the physical_addr member of GhesErrorState
(19) change the "uint64_t ghes_addr_le[8]" in GhesErrorState to uint64_t ghes_addr_le
(20) change the second parameter to "error_physical_addr" in the ghes_update_guest
     API statement
---
 include/hw/acpi/acpi-defs.h | 194 ++++++++++++++++++++++++++++++++++++++++++++
 include/hw/acpi/aml-build.h |   1 +
 include/hw/acpi/hest_ghes.h |  47 +++++++++++
 include/qemu/uuid.h         |  11 +++
 4 files changed, 253 insertions(+)
 create mode 100644 include/hw/acpi/hest_ghes.h

diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 4cc3630..0756339 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -15,6 +15,7 @@
 #ifndef QEMU_ACPI_DEFS_H
 #define QEMU_ACPI_DEFS_H
 
+#include "qemu/uuid.h"
 enum {
     ACPI_FADT_F_WBINVD,
     ACPI_FADT_F_WBINVD_FLUSH,
@@ -295,6 +296,44 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable;
 #define ACPI_APIC_GENERIC_TRANSLATOR    15
 #define ACPI_APIC_RESERVED              16   /* 16 and greater are reserved */
 
+/* UEFI Spec 2.6, "N.2.5 Memory Error Section */
+#define UEFI_CPER_MEM_VALID_ERROR_STATUS     0x0001
+#define UEFI_CPER_MEM_VALID_PA               0x0002
+#define UEFI_CPER_MEM_VALID_PA_MASK          0x0004
+#define UEFI_CPER_MEM_VALID_NODE             0x0008
+#define UEFI_CPER_MEM_VALID_CARD             0x0010
+#define UEFI_CPER_MEM_VALID_MODULE           0x0020
+#define UEFI_CPER_MEM_VALID_BANK             0x0040
+#define UEFI_CPER_MEM_VALID_DEVICE           0x0080
+#define UEFI_CPER_MEM_VALID_ROW              0x0100
+#define UEFI_CPER_MEM_VALID_COLUMN           0x0200
+#define UEFI_CPER_MEM_VALID_BIT_POSITION     0x0400
+#define UEFI_CPER_MEM_VALID_REQUESTOR        0x0800
+#define UEFI_CPER_MEM_VALID_RESPONDER        0x1000
+#define UEFI_CPER_MEM_VALID_TARGET           0x2000
+#define UEFI_CPER_MEM_VALID_ERROR_TYPE       0x4000
+#define UEFI_CPER_MEM_VALID_RANK_NUMBER      0x8000
+#define UEFI_CPER_MEM_VALID_CARD_HANDLE      0x10000
+#define UEFI_CPER_MEM_VALID_MODULE_HANDLE    0x20000
+#define UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC   3
+
+/* This is from the ACPI 6.1 spec, "18.3.2.9 Hardware Error Notification" */
+
+enum AcpiHestNotifyType {
+    ACPI_HEST_NOTIFY_POLLED = 0,
+    ACPI_HEST_NOTIFY_EXTERNAL = 1,
+    ACPI_HEST_NOTIFY_LOCAL = 2,
+    ACPI_HEST_NOTIFY_SCI = 3,
+    ACPI_HEST_NOTIFY_NMI = 4,
+    ACPI_HEST_NOTIFY_CMCI = 5,  /* ACPI 5.0 */
+    ACPI_HEST_NOTIFY_MCE = 6,   /* ACPI 5.0 */
+    ACPI_HEST_NOTIFY_GPIO = 7,  /* ACPI 6.0 */
+    ACPI_HEST_NOTIFY_SEA = 8,   /* ACPI 6.1 */
+    ACPI_HEST_NOTIFY_SEI = 9,   /* ACPI 6.1 */
+    ACPI_HEST_NOTIFY_GSIV = 10, /* ACPI 6.1 */
+    ACPI_HEST_NOTIFY_RESERVED = 11  /* 11 and greater are reserved */
+};
+
 /*
  * MADT sub-structures (Follow MULTIPLE_APIC_DESCRIPTION_TABLE)
  */
@@ -475,6 +514,161 @@ struct AcpiSystemResourceAffinityTable
 } QEMU_PACKED;
 typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable;
 
+/* Hardware Error Notification, this is from the ACPI 6.1
+ * spec, "18.3.2.9 Hardware Error Notification"
+ */
+struct AcpiHestNotify {
+    uint8_t type;
+    uint8_t length;
+    uint16_t config_write_enable;
+    uint32_t poll_interval;
+    uint32_t vector;
+    uint32_t polling_threshold_value;
+    uint32_t polling_threshold_window;
+    uint32_t error_threshold_value;
+    uint32_t error_threshold_window;
+} QEMU_PACKED;
+typedef struct AcpiHestNotify AcpiHestNotify;
+
+/* These are from ACPI 6.1, sections "18.3.2.1 IA-32 Architecture Machine
+ * Check Exception" through "18.3.2.8 Generic Hardware Error Source version 2".
+ */
+enum AcpiHestSourceType {
+    ACPI_HEST_SOURCE_IA32_CHECK = 0,
+    ACPI_HEST_SOURCE_IA32_CORRECTED_CHECK = 1,
+    ACPI_HEST_SOURCE_IA32_NMI = 2,
+    ACPI_HEST_SOURCE_AER_ROOT_PORT = 6,
+    ACPI_HEST_SOURCE_AER_ENDPOINT = 7,
+    ACPI_HEST_SOURCE_AER_BRIDGE = 8,
+    ACPI_HEST_SOURCE_GENERIC_ERROR = 9,
+    ACPI_HEST_SOURCE_GENERIC_ERROR_V2 = 10,
+    ACPI_HEST_SOURCE_RESERVED = 11    /* 11 and greater are reserved */
+};
+
+/* Block Status bitmasks from ACPI 6.1, "18.3.2.7.1 Generic Error Data" */
+#define ACPI_GEBS_UNCORRECTABLE             (1)
+#define ACPI_GEBS_CORRECTABLE               (1 << 1)
+#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE    (1 << 2)
+#define ACPI_GEBS_MULTIPLE_CORRECTABLE      (1 << 3)
+/* 10 bits, error count */
+#define ACPI_GEBS_ERROR_ENTRY_COUNT         (0x3FF << 4)
+
+/* Generic Hardware Error Source Structure, refer to ACPI 6.1
+ * "18.3.2.7 Generic Hardware Error Source". in this struct the
+ * "type" field has to be ACPI_HEST_SOURCE_GENERIC_ERROR
+ */
+
+struct AcpiGenericHardwareErrorSource {
+    uint16_t type;
+    uint16_t source_id;
+    uint16_t related_source_id;
+    uint8_t flags;
+    uint8_t enabled;
+    uint32_t number_of_records;
+    uint32_t max_sections_per_record;
+    uint32_t max_raw_data_length;
+    struct AcpiGenericAddress error_status_address;
+    struct AcpiHestNotify notify;
+    uint32_t error_status_block_length;
+} QEMU_PACKED;
+typedef struct AcpiGenericHardwareErrorSource AcpiGenericHardwareErrorSource;
+
+/* Generic Hardware Error Source, version 2, ACPI 6.1, "18.3.2.8 Generic
+ * Hardware Error Source version 2", in this struct the "type" field has to
+ * be ACPI_HEST_SOURCE_GENERIC_ERROR_V2
+ */
+struct AcpiGenericHardwareErrorSourceV2 {
+    uint16_t type;
+    uint16_t source_id;
+    uint16_t related_source_id;
+    uint8_t flags;
+    uint8_t enabled;
+    uint32_t number_of_records;
+    uint32_t max_sections_per_record;
+    uint32_t max_raw_data_length;
+    struct AcpiGenericAddress error_status_address;
+    struct AcpiHestNotify notify;
+    uint32_t error_status_block_length;
+    struct AcpiGenericAddress read_ack_register;
+    uint64_t read_ack_preserve;
+    uint64_t read_ack_write;
+} QEMU_PACKED;
+typedef struct AcpiGenericHardwareErrorSourceV2
+            AcpiGenericHardwareErrorSourceV2;
+
+/* Generic Error Status block, this is from ACPI 6.1,
+ * "18.3.2.7.1 Generic Error Data"
+ */
+struct AcpiGenericErrorStatus {
+    /* It is a bitmask composed of ACPI_GEBS_xxx macros */
+    uint32_t block_status;
+    uint32_t raw_data_offset;
+    uint32_t raw_data_length;
+    uint32_t data_length;
+    uint32_t error_severity;
+} QEMU_PACKED;
+typedef struct AcpiGenericErrorStatus AcpiGenericErrorStatus;
+
+enum AcpiGenericErrorSeverity {
+    ACPI_CPER_SEV_RECOVERABLE,
+    ACPI_CPER_SEV_FATAL,
+    ACPI_CPER_SEV_CORRECTED,
+    ACPI_CPER_SEV_NONE,
+};
+
+/* Generic Error Data entry, revision number is 0x0300,
+ * ACPI 6.1, "18.3.2.7.1 Generic Error Data"
+ */
+struct AcpiGenericErrorData {
+    QemuUUID section_type_le;
+    /* The "error_severity" fields that they take their
+     * values from AcpiGenericErrorSeverity
+     */
+    uint32_t error_severity;
+    uint16_t revision;
+    uint8_t validation_bits;
+    uint8_t flags;
+    uint32_t error_data_length;
+    uint8_t fru_id[16];
+    uint8_t fru_text[20];
+    uint64_t time_stamp;
+} QEMU_PACKED;
+typedef struct AcpiGenericErrorData AcpiGenericErrorData;
+
+/* This is from UEFI 2.6, "N.2.5 Memory Error Section" */
+struct UefiCperSecMemErr {
+    uint64_t    validation_bits;
+    uint64_t    error_status;
+    uint64_t    physical_addr;
+    uint64_t    physical_addr_mask;
+    uint16_t    node;
+    uint16_t    card;
+    uint16_t    module;
+    uint16_t    bank;
+    uint16_t    device;
+    uint16_t    row;
+    uint16_t    column;
+    uint16_t    bit_pos;
+    uint64_t    requestor_id;
+    uint64_t    responder_id;
+    uint64_t    target_id;
+    uint8_t     error_type;
+    uint8_t     reserved;
+    uint16_t    rank;
+    uint16_t    mem_array_handle;   /* card handle in UEFI 2.4 */
+    uint16_t    mem_dev_handle;     /* module handle in UEFI 2.4 */
+} QEMU_PACKED;
+typedef struct UefiCperSecMemErr UefiCperSecMemErr;
+
+/*
+ * HEST Description Table
+ */
+struct AcpiHardwareErrorSourceTable {
+    ACPI_TABLE_HEADER_DEF                    /* ACPI common table header */
+    uint32_t           error_source_count;
+} QEMU_PACKED;
+typedef struct AcpiHardwareErrorSourceTable AcpiHardwareErrorSourceTable;
+
 #define ACPI_SRAT_PROCESSOR_APIC     0
 #define ACPI_SRAT_MEMORY             1
 #define ACPI_SRAT_PROCESSOR_x2APIC   2
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 00c21f1..c1d15b3 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -211,6 +211,7 @@ struct AcpiBuildTables {
     GArray *rsdp;
     GArray *tcpalog;
     GArray *vmgenid;
+    GArray *hardware_errors;
     BIOSLinker *linker;
 } AcpiBuildTables;
 
diff --git a/include/hw/acpi/hest_ghes.h b/include/hw/acpi/hest_ghes.h
new file mode 100644
index 0000000..0772756
--- /dev/null
+++ b/include/hw/acpi/hest_ghes.h
@@ -0,0 +1,47 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Authors:
+ *   Dongjiu Geng <gengdongjiu@huawei.com>
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef ACPI_GHES_H
+#define ACPI_GHES_H
+
+#include "hw/acpi/bios-linker-loader.h"
+
+#define GHES_ERRORS_FW_CFG_FILE      "etc/hardware_errors"
+#define GHES_DATA_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
+
+#define GHES_GAS_ADDRESS_OFFSET              4
+#define GHES_ERROR_STATUS_ADDRESS_OFFSET     20
+#define GHES_NOTIFICATION_STRUCTURE          32
+
+#define GHES_CPER_OK   1
+#define GHES_CPER_FAIL 0
+
+#define GHES_ACPI_HEST_NOTIFY_RESERVED           11
+/* The max size in Bytes for one error block */
+#define GHES_MAX_RAW_DATA_LENGTH                 0x1000
+
+
+typedef struct GhesState {
+    uint64_t ghes_addr_le;
+} GhesState;
+
+void ghes_build_acpi(GArray *table_data, GArray *hardware_error,
+                            BIOSLinker *linker);
+void ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
+bool ghes_update_guest(uint32_t notify, uint64_t error_physical_addr);
+#endif
diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
index afe4840..99c4041 100644
--- a/include/qemu/uuid.h
+++ b/include/qemu/uuid.h
@@ -44,6 +44,17 @@ typedef struct {
 
 #define UUID_NONE "00000000-0000-0000-0000-000000000000"
 
+#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
+{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
+   ((b) >> 8) & 0xff, (b) & 0xff,                   \
+    ((c) >> 8) & 0xff, (c) & 0xff,                    \
+    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
+
+/* Platform Memory, this is from UEFI 2.6 N.2.2 Section Descriptor */
+#define UEFI_CPER_SEC_PLATFORM_MEM                   \
+    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
+    0xED, 0x7C, 0x83, 0xB1)
+
 void qemu_uuid_generate(QemuUUID *out);
 
 int qemu_uuid_is_null(const QemuUUID *uu);
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v5 2/3] ACPI: Add APEI GHES Table Generation support
  2017-07-12  2:08 [Qemu-devel] [PATCH v5 0/3] Generate APEI GHES table and dynamically record CPER Dongjiu Geng
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros Dongjiu Geng
@ 2017-07-12  2:08 ` Dongjiu Geng
  2017-07-13 17:32   ` Michael S. Tsirkin
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 3/3] ACPI: build and enable APEI GHES in the Makefile and configuration Dongjiu Geng
  2017-07-13 17:36 ` [Qemu-devel] [PATCH v5 0/3] Generate APEI GHES table and dynamically record CPER Michael S. Tsirkin
  3 siblings, 1 reply; 19+ messages in thread
From: Dongjiu Geng @ 2017-07-12  2:08 UTC (permalink / raw)
  To: lersek, mst, imammedo, famz, qemu-devel, zhaoshenglong,
	peter.maydell, qemu-arm, james.morse, zhengqiang10, huangshaoyu,
	wuquanming
  Cc: gengdongjiu

This implements APEI GHES Table by passing the error CPER info
to the guest via a fw_cfg_blob. After a CPER info is recorded, an
SEA(Synchronous External Abort)/SEI(SError Interrupt) exception
will be injected into the guest OS.

Below is the table layout, the max number of error soure is 11,
which is classified by notification type.

etc/acpi/tables                 etc/hardware_errors
================     ==========================================
                     +-----------+
+--------------+     | address   |         +-> +--------------+
|    HEST      +     | registers |         |   | Error Status |
+ +------------+     | +---------+         |   | Data Block 0 |
| | GHES0      | --> | |address0 | --------+   | +------------+
| | GHES1      | --> | |address1 | ------+     | |  CPER      |
| | GHES2      | --> | |address2 | ----+ |     | |  CPER      |
| |  ....      | --> | | ....... |     | |     | |  CPER      |
| | GHES10     | --> | |address10| -+  | |     | |  CPER      |
+-+------------+     +-+---------+  |  | |     +-+------------+
                                    |  | |
                                    |  | +---> +--------------+
                                    |  |       | Error Status |
                                    |  |       | Data Block 1 |
                                    |  |       | +------------+
                                    |  |       | |  CPER      |
                                    |  |       | |  CPER      |
                                    |  |       +-+------------+
                                    |  |
                                    |  +-----> +--------------+
                                    |          | Error Status |
                                    |          | Data Block 2 |
                                    |          | +------------+
                                    |          | |  CPER      |
                                    |          +-+------------+
                                    |            ...........
                                    +--------> +--------------+
                                               | Error Status |
                                               | Data Block 10|
                                               | +------------+
                                               | |  CPER      |
                                               | |  CPER      |
                                               | |  CPER      |
                                               +-+------------+
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
thanks a lot Laszlo's review and comments: 

change since v4:
1. fix email threading in this series is incorrect issue

change since v3:
1. remove the unnecessary include for "hw/acpi/vmgenid.h" in 
   hw/arm/virt-acpi-build.c
2. add conversion between LE and host-endian for the CPER record
3. handle the case that run out of the preallocated memory for the CPER record
4. change to use g_malloc0 instead of g_malloc 
5. change block_reqr_size name to block_rer_size
6. change QEMU coding style, that is, the operator is at the end of the line.
7. drop the ERROR_STATUS_ADDRESS_OFFSET and GAS_ADDRESS_OFFSET macros (from
   the header file as well), and use the offsetof to replace it.
8. remove the init_aml_allocator() / free_aml_allocator(), calculate the
   needed size, and push that many bytes directly to "table_data".
9. take an "OVMF header probe suppressor" into account
10.corrct HEST and CPER value assigment, for example, correct the source_id
   for every error source, this identifier of source_id should be unique among
   all error sources; 
11. create only one WRITE_POINTER command, for the base address of "etc/hardware_errors".
   This should be done outside of the loop.The base addresses of the individual error
   status data blocks should be calculated in ghes_update_guest(), based on the error
   source / notification type
12.correct the commit message lists error sources / notification types 0 through 10 (count=11 in total). 
13.correct the size calculation for GHES_DATA_ADDR_FW_CFG_FILE 
14.range-checked the value of "notify" before using it as an array subscript
---
 hw/acpi/aml-build.c      |   2 +
 hw/acpi/hest_ghes.c      | 219 +++++++++++++++++++++++++++++++++++++++++++++++
 hw/arm/virt-acpi-build.c |   6 ++
 3 files changed, 227 insertions(+)
 create mode 100644 hw/acpi/hest_ghes.c

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index c6f2032..802b98d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1560,6 +1560,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
     tables->table_data = g_array_new(false, true /* clear */, 1);
     tables->tcpalog = g_array_new(false, true /* clear */, 1);
     tables->vmgenid = g_array_new(false, true /* clear */, 1);
+    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
     tables->linker = bios_linker_loader_init();
 }
 
@@ -1570,6 +1571,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
     g_array_free(tables->table_data, true);
     g_array_free(tables->tcpalog, mfre);
     g_array_free(tables->vmgenid, mfre);
+    g_array_free(tables->hardware_errors, mfre);
 }
 
 /* Build rsdt table */
diff --git a/hw/acpi/hest_ghes.c b/hw/acpi/hest_ghes.c
new file mode 100644
index 0000000..c9442b6
--- /dev/null
+++ b/hw/acpi/hest_ghes.c
@@ -0,0 +1,219 @@
+/*
+ *  APEI GHES table Generation
+ *
+ *  Copyright (C) 2017 huawei.
+ *
+ *  Author: Dongjiu Geng <gengdongjiu@huawei.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qmp-commands.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/hest_ghes.h"
+#include "hw/nvram/fw_cfg.h"
+#include "sysemu/sysemu.h"
+
+static int ghes_record_cper(uint64_t error_block_address,
+                                    uint64_t error_physical_addr)
+{
+    AcpiGenericErrorStatus block;
+    AcpiGenericErrorData *gdata;
+    UefiCperSecMemErr *mem_err;
+    uint64_t current_block_length;
+    unsigned char *buffer;
+    QemuUUID section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
+
+
+    cpu_physical_memory_read(error_block_address, &block,
+                                sizeof(AcpiGenericErrorStatus));
+
+    /* Get the current generic error status block length */
+    current_block_length = sizeof(AcpiGenericErrorStatus) +
+        le32_to_cpu(block.data_length);
+
+    /* If the Generic Error Status Block is NULL, update
+     * the block header
+     */
+    if (!block.block_status) {
+        block.block_status = ACPI_GEBS_UNCORRECTABLE;
+        block.error_severity = ACPI_CPER_SEV_FATAL;
+    }
+
+    block.data_length += cpu_to_le32(sizeof(AcpiGenericErrorData));
+    block.data_length += cpu_to_le32(sizeof(UefiCperSecMemErr));
+
+    /* check whether it runs out of the preallocated memory */
+    if ((le32_to_cpu(block.data_length) + sizeof(AcpiGenericErrorStatus)) >
+       GHES_MAX_RAW_DATA_LENGTH) {
+        return GHES_CPER_FAIL;
+    }
+    /* Write back the Generic Error Status Block to guest memory */
+    cpu_physical_memory_write(error_block_address, &block,
+                        sizeof(AcpiGenericErrorStatus));
+
+    /* Fill in Generic Error Data Entry */
+    buffer = g_malloc0(sizeof(AcpiGenericErrorData) +
+                       sizeof(UefiCperSecMemErr));
+    memset(buffer, 0, sizeof(AcpiGenericErrorData) + sizeof(UefiCperSecMemErr));
+    gdata = (AcpiGenericErrorData *)buffer;
+
+    qemu_uuid_bswap(&section_id_le);
+    memcpy(&(gdata->section_type_le), &section_id_le,
+                sizeof(QemuUUID));
+    gdata->error_data_length = cpu_to_le32(sizeof(UefiCperSecMemErr));
+
+    mem_err = (UefiCperSecMemErr *) (gdata + 1);
+
+    /* In order to simplify simulation, hard code the CPER section to memory
+     * section.
+     */
+
+    /* Hard code to Multi-bit ECC error */
+    mem_err->validation_bits |= cpu_to_le32(UEFI_CPER_MEM_VALID_ERROR_TYPE);
+    mem_err->error_type = cpu_to_le32(UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC);
+
+    /* Record the physical address at which the memory error occurred */
+    mem_err->validation_bits |= cpu_to_le32(UEFI_CPER_MEM_VALID_PA);
+    mem_err->physical_addr = cpu_to_le32(error_physical_addr);
+
+    /* Write back the Generic Error Data Entry to guest memory */
+    cpu_physical_memory_write(error_block_address + current_block_length,
+        buffer, sizeof(AcpiGenericErrorData) + sizeof(UefiCperSecMemErr));
+
+    g_free(buffer);
+    return GHES_CPER_OK;
+}
+
+void ghes_build_acpi(GArray *table_data, GArray *hardware_error,
+                                            BIOSLinker *linker)
+{
+    GArray *buffer;
+    uint32_t address_registers_offset;
+    AcpiHardwareErrorSourceTable *error_source_table;
+    AcpiGenericHardwareErrorSource *error_source;
+    int i;
+    /*
+     * The block_req_size stands for one address and one
+     * generic error status block
+      +---------+
+      | address | --------+-> +---------+
+      +---------+             |  CPER   |
+                              |  CPER   |
+                              |  CPER   |
+                              |  CPER   |
+                              |  ....   |
+                              +---------+
+     */
+    int block_req_size = sizeof(uint64_t) + GHES_MAX_RAW_DATA_LENGTH;
+
+    /* The total size for address of data structure and
+     * error status data block
+     */
+    g_array_set_size(hardware_error, GHES_ACPI_HEST_NOTIFY_RESERVED *
+                                                block_req_size);
+
+    buffer = g_array_new(false, true /* clear */, 1);
+    address_registers_offset = table_data->len +
+        sizeof(AcpiHardwareErrorSourceTable) +
+        offsetof(AcpiGenericHardwareErrorSource, error_status_address) +
+        offsetof(struct AcpiGenericAddress, address);
+
+    /* Reserve space for HEST table size */
+    acpi_data_push(buffer, sizeof(AcpiHardwareErrorSourceTable) +
+                                GHES_ACPI_HEST_NOTIFY_RESERVED *
+                                sizeof(AcpiGenericHardwareErrorSource));
+
+    g_array_append_vals(table_data, buffer->data, buffer->len);
+    /* Allocate guest memory for the Data fw_cfg blob */
+    bios_linker_loader_alloc(linker, GHES_ERRORS_FW_CFG_FILE, hardware_error,
+                            4096, false /* page boundary, high memory */);
+
+    error_source_table = (AcpiHardwareErrorSourceTable *)(table_data->data
+                        + table_data->len - buffer->len);
+    error_source_table->error_source_count = GHES_ACPI_HEST_NOTIFY_RESERVED;
+    error_source = (AcpiGenericHardwareErrorSource *)
+        ((AcpiHardwareErrorSourceTable *)error_source_table + 1);
+
+    bios_linker_loader_write_pointer(linker, GHES_DATA_ADDR_FW_CFG_FILE,
+        0, sizeof(uint64_t), GHES_ERRORS_FW_CFG_FILE,
+        GHES_ACPI_HEST_NOTIFY_RESERVED * sizeof(uint64_t));
+
+    for (i = 0; i < GHES_ACPI_HEST_NOTIFY_RESERVED; i++) {
+        error_source->type = ACPI_HEST_SOURCE_GENERIC_ERROR;
+        error_source->source_id = cpu_to_le16(i);
+        error_source->related_source_id = 0xffff;
+        error_source->flags = 0;
+        error_source->enabled = 1;
+        /* The number of error status block per Generic Hardware Error Source */
+        error_source->number_of_records = 1;
+        error_source->max_sections_per_record = 1;
+        error_source->max_raw_data_length = GHES_MAX_RAW_DATA_LENGTH;
+        error_source->error_status_address.space_id =
+                                    AML_SYSTEM_MEMORY;
+        error_source->error_status_address.bit_width = 64;
+        error_source->error_status_address.bit_offset = 0;
+        error_source->error_status_address.access_width = 4;
+        error_source->notify.type = i;
+        error_source->notify.length = sizeof(AcpiHestNotify);
+
+        error_source->error_status_block_length = GHES_MAX_RAW_DATA_LENGTH;
+
+        bios_linker_loader_add_pointer(linker,
+            ACPI_BUILD_TABLE_FILE, address_registers_offset + i *
+            sizeof(AcpiGenericHardwareErrorSource), sizeof(uint64_t),
+            GHES_ERRORS_FW_CFG_FILE, i * sizeof(uint64_t));
+
+        error_source++;
+    }
+
+    for (i = 0; i < GHES_ACPI_HEST_NOTIFY_RESERVED; i++) {
+        bios_linker_loader_add_pointer(linker,
+            GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i, sizeof(uint64_t),
+            GHES_ERRORS_FW_CFG_FILE, GHES_ACPI_HEST_NOTIFY_RESERVED *
+            sizeof(uint64_t) + i * GHES_MAX_RAW_DATA_LENGTH);
+    }
+
+    build_header(linker, table_data,
+        (void *)error_source_table, "HEST", buffer->len, 1, NULL, "GHES");
+
+    g_array_free(buffer, true);
+}
+
+static GhesState ges;
+void ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
+{
+
+    size_t request_block_size = sizeof(uint64_t) + GHES_MAX_RAW_DATA_LENGTH;
+    size_t size = GHES_ACPI_HEST_NOTIFY_RESERVED * request_block_size;
+
+    /* Create a read-only fw_cfg file for GHES */
+    fw_cfg_add_file(s, GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
+                    size);
+    /* Create a read-write fw_cfg file for Address */
+    fw_cfg_add_file_callback(s, GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
+        &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
+}
+
+bool ghes_update_guest(uint32_t notify, uint64_t physical_address)
+{
+    uint64_t error_block_addr;
+
+    if (physical_address && notify < GHES_ACPI_HEST_NOTIFY_RESERVED) {
+        error_block_addr = ges.ghes_addr_le + notify * GHES_MAX_RAW_DATA_LENGTH;
+        error_block_addr = le32_to_cpu(error_block_addr);
+
+        /* A zero value in ghes_addr means that BIOS has not yet written
+         * the address
+         */
+        if (error_block_addr) {
+            return ghes_record_cper(error_block_addr, physical_address);
+        }
+    }
+
+    return GHES_CPER_FAIL;
+}
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 0835e59..5c97016 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -45,6 +45,7 @@
 #include "hw/arm/virt.h"
 #include "sysemu/numa.h"
 #include "kvm_arm.h"
+#include "hw/acpi/hest_ghes.h"
 
 #define ARM_SPI_BASE 32
 #define ACPI_POWER_BUTTON_DEVICE "PWRB"
@@ -778,6 +779,9 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     acpi_add_table(table_offsets, tables_blob);
     build_spcr(tables_blob, tables->linker, vms);
 
+    acpi_add_table(table_offsets, tables_blob);
+    ghes_build_acpi(tables_blob, tables->hardware_errors, tables->linker);
+
     if (nb_numa_nodes > 0) {
         acpi_add_table(table_offsets, tables_blob);
         build_srat(tables_blob, tables->linker, vms);
@@ -890,6 +894,8 @@ void virt_acpi_setup(VirtMachineState *vms)
     fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
                     acpi_data_len(tables.tcpalog));
 
+    ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
+
     build_state->rsdp_mr = acpi_add_rom_blob(build_state, tables.rsdp,
                                               ACPI_BUILD_RSDP_FILE, 0);
 
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v5 3/3] ACPI: build and enable APEI GHES in the Makefile and configuration
  2017-07-12  2:08 [Qemu-devel] [PATCH v5 0/3] Generate APEI GHES table and dynamically record CPER Dongjiu Geng
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros Dongjiu Geng
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 2/3] ACPI: Add APEI GHES Table Generation support Dongjiu Geng
@ 2017-07-12  2:08 ` Dongjiu Geng
  2017-07-13 17:36 ` [Qemu-devel] [PATCH v5 0/3] Generate APEI GHES table and dynamically record CPER Michael S. Tsirkin
  3 siblings, 0 replies; 19+ messages in thread
From: Dongjiu Geng @ 2017-07-12  2:08 UTC (permalink / raw)
  To: lersek, mst, imammedo, famz, qemu-devel, zhaoshenglong,
	peter.maydell, qemu-arm, james.morse, zhengqiang10, huangshaoyu,
	wuquanming
  Cc: gengdongjiu

Add CONFIG_ACPI_APEI configuration in the Makefile and
enable it in the arm-softmmu.mak

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
thanks a lot Laszlo's review and comments: 

change since v4:
(1) fix email threading in this series is incorrect issue

change since v3:
(1) change name to "CONFIG_ACPI_APEI" from CONFIG_ACPI_APEI_GENERATION

---
 default-configs/arm-softmmu.mak | 1 +
 hw/acpi/Makefile.objs           | 1 +
 2 files changed, 2 insertions(+)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 1e3bd2b..ee6f5fc 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -121,3 +121,4 @@ CONFIG_ACPI=y
 CONFIG_SMBIOS=y
 CONFIG_ASPEED_SOC=y
 CONFIG_GPIO_KEY=y
+CONFIG_ACPI_APEI=y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 11c35bc..bafb148 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
+common-obj-$(CONFIG_ACPI_APEI) += hest_ghes.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
-- 
2.10.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros Dongjiu Geng
@ 2017-07-13 10:33   ` Laszlo Ersek
  2017-07-13 12:00     ` gengdongjiu
  2017-07-13 17:01   ` Michael S. Tsirkin
  2017-07-13 17:11   ` Michael S. Tsirkin
  2 siblings, 1 reply; 19+ messages in thread
From: Laszlo Ersek @ 2017-07-13 10:33 UTC (permalink / raw)
  To: Dongjiu Geng, mst, imammedo, famz, qemu-devel, zhaoshenglong,
	peter.maydell, qemu-arm, james.morse, zhengqiang10, huangshaoyu,
	wuquanming

On 07/12/17 04:08, Dongjiu Geng wrote:
> (1) Add related APEI/HEST table structures and  macros, these
>     definition refer to ACPI 6.1 and UEFI 2.6 spec.
> (2) Add generic error status block and CPER memory section
>     definition, user space only handle memory section errors.
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>

This patch looks generally good to me; let me comment on the code
organization:

(a) Please make the subject less generic. For example,

  ACPI: add APEI/HEST/CPER structures and macros

> ---
> thanks Laszlo and Michael's review:
> 
> chnage since v4:
> (1) fix email threading in this series is incorrect issue
> 
> change since v3:
> (1) separate the original one patch into three patches: one is new
>     ACPI structures and macros, another is C source file to generate ACPI HEST
>     table and dynamically record CPER ,final patch is the change about Makefile
>     and configuration
> (2) add comments about where the ACPI structures and macros come from, for example,
>     they come from the UEFI Spec 2.6, "xxxxxxxxxxxx"; ACPI 6.1 spec, 
>     "xxxxxxxxxxxxxx".
> (3) correct the macros name, for emaple, prefix some macro names with
>     "UEFI_".
> (4) remove the uuid_le struct and use the QemuUUID in the include/qemu/uuid.h"
> (5) remove the duplicate ACPI address space, because it already defined in 
>     the "include/hw/acpi/aml-build.h"
> (6) remove the acpi_generic_address structure because same definition exists in the 
>     AcpiGenericAddress.
> (7) rename the struct acpi_hest_notify to AcpiHestNotifyType
> (8) rename the struct acpi_hest_types to AcpiHestSourceType
> (9) rename enum constants AcpiHestSourceType to ACPI_HEST_SOURCE_xxx from
>      ACPI_HEST_TYPE_xxx
> (10) remove the NOT_USED{3,4,5} enum constants in the AcpiHestSourceType.
> (11) add missed QEMU_PACKED for the struct definition.
> (12) remove the defnition of AcpiGenericErrorData, and rename the
>      AcpiGenericErrorDataV300 to AcpiGenericErrorData.
> (13) use the QemuUUID type for the "section_type" field AcpiGenericErrorData, and rename it
>      to section_type_le.
> (14) moving type AcpiGenericErrorSeverity above AcpiGenericErrorData and
>      AcpiGenericErrorDataV300, and remarking on the "error_severity" fields
>      that they take their values from AcpiGenericErrorSeverity
> (15) remove the wrongly call to BERT(Boot Error Record Table)
> (16) add comments for the struction member, for example, pint out that
>      the block_status member in the AcpiGenericErrorStatus is a bitmask
>      composed of ACPI_GEBS_xxx macros
> (17) remove the hardware error source notification type list, and rename
>      the MAX_ERROR_SOURCE_COUNT_V6 to ACPI_HEST_NOTIFY_RESERVED.
> (18) remove the physical_addr member of GhesErrorState
> (19) change the "uint64_t ghes_addr_le[8]" in GhesErrorState to uint64_t ghes_addr_le
> (20) change the second parameter to "error_physical_addr" in the ghes_update_guest
>      API statement
> ---
>  include/hw/acpi/acpi-defs.h | 194 ++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/acpi/aml-build.h |   1 +
>  include/hw/acpi/hest_ghes.h |  47 +++++++++++
>  include/qemu/uuid.h         |  11 +++
>  4 files changed, 253 insertions(+)
>  create mode 100644 include/hw/acpi/hest_ghes.h
> 
> diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
> index 4cc3630..0756339 100644
> --- a/include/hw/acpi/acpi-defs.h
> +++ b/include/hw/acpi/acpi-defs.h
> @@ -15,6 +15,7 @@
>  #ifndef QEMU_ACPI_DEFS_H
>  #define QEMU_ACPI_DEFS_H
>  
> +#include "qemu/uuid.h"

(b) This is what I first thought upon seeing this:

'I don't think this should be included here. You don't use UUID_BE or
UEFI_CPER_SEC_PLATFORM_MEM anywhere in this patch. So whatever uses
those macros can include "qemu/uuid.h" directly.'

However, after all, I think this #include is OK here, with a
modification lower down.

>  enum {
>      ACPI_FADT_F_WBINVD,
>      ACPI_FADT_F_WBINVD_FLUSH,
> @@ -295,6 +296,44 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable;
>  #define ACPI_APIC_GENERIC_TRANSLATOR    15
>  #define ACPI_APIC_RESERVED              16   /* 16 and greater are reserved */
>  
> +/* UEFI Spec 2.6, "N.2.5 Memory Error Section */
> +#define UEFI_CPER_MEM_VALID_ERROR_STATUS     0x0001
> +#define UEFI_CPER_MEM_VALID_PA               0x0002
> +#define UEFI_CPER_MEM_VALID_PA_MASK          0x0004
> +#define UEFI_CPER_MEM_VALID_NODE             0x0008
> +#define UEFI_CPER_MEM_VALID_CARD             0x0010
> +#define UEFI_CPER_MEM_VALID_MODULE           0x0020
> +#define UEFI_CPER_MEM_VALID_BANK             0x0040
> +#define UEFI_CPER_MEM_VALID_DEVICE           0x0080
> +#define UEFI_CPER_MEM_VALID_ROW              0x0100
> +#define UEFI_CPER_MEM_VALID_COLUMN           0x0200
> +#define UEFI_CPER_MEM_VALID_BIT_POSITION     0x0400
> +#define UEFI_CPER_MEM_VALID_REQUESTOR        0x0800
> +#define UEFI_CPER_MEM_VALID_RESPONDER        0x1000
> +#define UEFI_CPER_MEM_VALID_TARGET           0x2000
> +#define UEFI_CPER_MEM_VALID_ERROR_TYPE       0x4000
> +#define UEFI_CPER_MEM_VALID_RANK_NUMBER      0x8000
> +#define UEFI_CPER_MEM_VALID_CARD_HANDLE      0x10000
> +#define UEFI_CPER_MEM_VALID_MODULE_HANDLE    0x20000
> +#define UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC   3
> +
> +/* This is from the ACPI 6.1 spec, "18.3.2.9 Hardware Error Notification" */
> +
> +enum AcpiHestNotifyType {
> +    ACPI_HEST_NOTIFY_POLLED = 0,
> +    ACPI_HEST_NOTIFY_EXTERNAL = 1,
> +    ACPI_HEST_NOTIFY_LOCAL = 2,
> +    ACPI_HEST_NOTIFY_SCI = 3,
> +    ACPI_HEST_NOTIFY_NMI = 4,
> +    ACPI_HEST_NOTIFY_CMCI = 5,  /* ACPI 5.0 */
> +    ACPI_HEST_NOTIFY_MCE = 6,   /* ACPI 5.0 */
> +    ACPI_HEST_NOTIFY_GPIO = 7,  /* ACPI 6.0 */
> +    ACPI_HEST_NOTIFY_SEA = 8,   /* ACPI 6.1 */
> +    ACPI_HEST_NOTIFY_SEI = 9,   /* ACPI 6.1 */
> +    ACPI_HEST_NOTIFY_GSIV = 10, /* ACPI 6.1 */
> +    ACPI_HEST_NOTIFY_RESERVED = 11  /* 11 and greater are reserved */
> +};
> +
>  /*
>   * MADT sub-structures (Follow MULTIPLE_APIC_DESCRIPTION_TABLE)
>   */
> @@ -475,6 +514,161 @@ struct AcpiSystemResourceAffinityTable
>  } QEMU_PACKED;
>  typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable;
>  
> +/* Hardware Error Notification, this is from the ACPI 6.1
> + * spec, "18.3.2.9 Hardware Error Notification"
> + */
> +struct AcpiHestNotify {
> +    uint8_t type;
> +    uint8_t length;
> +    uint16_t config_write_enable;
> +    uint32_t poll_interval;
> +    uint32_t vector;
> +    uint32_t polling_threshold_value;
> +    uint32_t polling_threshold_window;
> +    uint32_t error_threshold_value;
> +    uint32_t error_threshold_window;
> +} QEMU_PACKED;
> +typedef struct AcpiHestNotify AcpiHestNotify;
> +
> +/* These are from ACPI 6.1, sections "18.3.2.1 IA-32 Architecture Machine
> + * Check Exception" through "18.3.2.8 Generic Hardware Error Source version 2".
> + */
> +enum AcpiHestSourceType {
> +    ACPI_HEST_SOURCE_IA32_CHECK = 0,
> +    ACPI_HEST_SOURCE_IA32_CORRECTED_CHECK = 1,
> +    ACPI_HEST_SOURCE_IA32_NMI = 2,
> +    ACPI_HEST_SOURCE_AER_ROOT_PORT = 6,
> +    ACPI_HEST_SOURCE_AER_ENDPOINT = 7,
> +    ACPI_HEST_SOURCE_AER_BRIDGE = 8,
> +    ACPI_HEST_SOURCE_GENERIC_ERROR = 9,
> +    ACPI_HEST_SOURCE_GENERIC_ERROR_V2 = 10,
> +    ACPI_HEST_SOURCE_RESERVED = 11    /* 11 and greater are reserved */
> +};
> +
> +/* Block Status bitmasks from ACPI 6.1, "18.3.2.7.1 Generic Error Data" */
> +#define ACPI_GEBS_UNCORRECTABLE             (1)
> +#define ACPI_GEBS_CORRECTABLE               (1 << 1)
> +#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE    (1 << 2)
> +#define ACPI_GEBS_MULTIPLE_CORRECTABLE      (1 << 3)
> +/* 10 bits, error count */
> +#define ACPI_GEBS_ERROR_ENTRY_COUNT         (0x3FF << 4)
> +
> +/* Generic Hardware Error Source Structure, refer to ACPI 6.1
> + * "18.3.2.7 Generic Hardware Error Source". in this struct the
> + * "type" field has to be ACPI_HEST_SOURCE_GENERIC_ERROR
> + */
> +
> +struct AcpiGenericHardwareErrorSource {
> +    uint16_t type;
> +    uint16_t source_id;
> +    uint16_t related_source_id;
> +    uint8_t flags;
> +    uint8_t enabled;
> +    uint32_t number_of_records;
> +    uint32_t max_sections_per_record;
> +    uint32_t max_raw_data_length;
> +    struct AcpiGenericAddress error_status_address;
> +    struct AcpiHestNotify notify;
> +    uint32_t error_status_block_length;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericHardwareErrorSource AcpiGenericHardwareErrorSource;
> +
> +/* Generic Hardware Error Source, version 2, ACPI 6.1, "18.3.2.8 Generic
> + * Hardware Error Source version 2", in this struct the "type" field has to
> + * be ACPI_HEST_SOURCE_GENERIC_ERROR_V2
> + */
> +struct AcpiGenericHardwareErrorSourceV2 {
> +    uint16_t type;
> +    uint16_t source_id;
> +    uint16_t related_source_id;
> +    uint8_t flags;
> +    uint8_t enabled;
> +    uint32_t number_of_records;
> +    uint32_t max_sections_per_record;
> +    uint32_t max_raw_data_length;
> +    struct AcpiGenericAddress error_status_address;
> +    struct AcpiHestNotify notify;
> +    uint32_t error_status_block_length;
> +    struct AcpiGenericAddress read_ack_register;
> +    uint64_t read_ack_preserve;
> +    uint64_t read_ack_write;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericHardwareErrorSourceV2
> +            AcpiGenericHardwareErrorSourceV2;
> +
> +/* Generic Error Status block, this is from ACPI 6.1,
> + * "18.3.2.7.1 Generic Error Data"
> + */
> +struct AcpiGenericErrorStatus {
> +    /* It is a bitmask composed of ACPI_GEBS_xxx macros */
> +    uint32_t block_status;
> +    uint32_t raw_data_offset;
> +    uint32_t raw_data_length;
> +    uint32_t data_length;
> +    uint32_t error_severity;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericErrorStatus AcpiGenericErrorStatus;
> +
> +enum AcpiGenericErrorSeverity {
> +    ACPI_CPER_SEV_RECOVERABLE,
> +    ACPI_CPER_SEV_FATAL,
> +    ACPI_CPER_SEV_CORRECTED,
> +    ACPI_CPER_SEV_NONE,
> +};
> +
> +/* Generic Error Data entry, revision number is 0x0300,
> + * ACPI 6.1, "18.3.2.7.1 Generic Error Data"
> + */
> +struct AcpiGenericErrorData {
> +    QemuUUID section_type_le;
> +    /* The "error_severity" fields that they take their
> +     * values from AcpiGenericErrorSeverity
> +     */
> +    uint32_t error_severity;
> +    uint16_t revision;
> +    uint8_t validation_bits;
> +    uint8_t flags;
> +    uint32_t error_data_length;
> +    uint8_t fru_id[16];
> +    uint8_t fru_text[20];
> +    uint64_t time_stamp;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericErrorData AcpiGenericErrorData;
> +
> +/* This is from UEFI 2.6, "N.2.5 Memory Error Section" */
> +struct UefiCperSecMemErr {
> +    uint64_t    validation_bits;
> +    uint64_t    error_status;
> +    uint64_t    physical_addr;
> +    uint64_t    physical_addr_mask;
> +    uint16_t    node;
> +    uint16_t    card;
> +    uint16_t    module;
> +    uint16_t    bank;
> +    uint16_t    device;
> +    uint16_t    row;
> +    uint16_t    column;
> +    uint16_t    bit_pos;
> +    uint64_t    requestor_id;
> +    uint64_t    responder_id;
> +    uint64_t    target_id;
> +    uint8_t     error_type;
> +    uint8_t     reserved;
> +    uint16_t    rank;
> +    uint16_t    mem_array_handle;   /* card handle in UEFI 2.4 */
> +    uint16_t    mem_dev_handle;     /* module handle in UEFI 2.4 */
> +} QEMU_PACKED;
> +typedef struct UefiCperSecMemErr UefiCperSecMemErr;
> +
> +/*
> + * HEST Description Table
> + */
> +struct AcpiHardwareErrorSourceTable {
> +    ACPI_TABLE_HEADER_DEF                    /* ACPI common table header */
> +    uint32_t           error_source_count;
> +} QEMU_PACKED;
> +typedef struct AcpiHardwareErrorSourceTable AcpiHardwareErrorSourceTable;
> +
>  #define ACPI_SRAT_PROCESSOR_APIC     0
>  #define ACPI_SRAT_MEMORY             1
>  #define ACPI_SRAT_PROCESSOR_x2APIC   2
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 00c21f1..c1d15b3 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -211,6 +211,7 @@ struct AcpiBuildTables {
>      GArray *rsdp;
>      GArray *tcpalog;
>      GArray *vmgenid;
> +    GArray *hardware_errors;
>      BIOSLinker *linker;
>  } AcpiBuildTables;
>  

(c) I think that this hunk belongs with the implementation (patch #2),
not with the ACPI struct / macro introduction patch.

> diff --git a/include/hw/acpi/hest_ghes.h b/include/hw/acpi/hest_ghes.h
> new file mode 100644
> index 0000000..0772756
> --- /dev/null
> +++ b/include/hw/acpi/hest_ghes.h
> @@ -0,0 +1,47 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * Authors:
> + *   Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef ACPI_GHES_H
> +#define ACPI_GHES_H
> +
> +#include "hw/acpi/bios-linker-loader.h"
> +
> +#define GHES_ERRORS_FW_CFG_FILE      "etc/hardware_errors"
> +#define GHES_DATA_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
> +
> +#define GHES_GAS_ADDRESS_OFFSET              4
> +#define GHES_ERROR_STATUS_ADDRESS_OFFSET     20
> +#define GHES_NOTIFICATION_STRUCTURE          32
> +
> +#define GHES_CPER_OK   1
> +#define GHES_CPER_FAIL 0
> +
> +#define GHES_ACPI_HEST_NOTIFY_RESERVED           11
> +/* The max size in Bytes for one error block */
> +#define GHES_MAX_RAW_DATA_LENGTH                 0x1000
> +
> +
> +typedef struct GhesState {
> +    uint64_t ghes_addr_le;
> +} GhesState;
> +
> +void ghes_build_acpi(GArray *table_data, GArray *hardware_error,
> +                            BIOSLinker *linker);
> +void ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +bool ghes_update_guest(uint32_t notify, uint64_t error_physical_addr);
> +#endif

(d) same comment as (c)

> diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
> index afe4840..99c4041 100644
> --- a/include/qemu/uuid.h
> +++ b/include/qemu/uuid.h
> @@ -44,6 +44,17 @@ typedef struct {
>  
>  #define UUID_NONE "00000000-0000-0000-0000-000000000000"
>  
> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> +   ((b) >> 8) & 0xff, (b) & 0xff,                   \
> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> +
> +/* Platform Memory, this is from UEFI 2.6 N.2.2 Section Descriptor */
> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> +    0xED, 0x7C, 0x83, 0xB1)
> +
>  void qemu_uuid_generate(QemuUUID *out);
>  
>  int qemu_uuid_is_null(const QemuUUID *uu);
> 

(e) I think the addition of UUID_BE should be split out to a separate
patch; it adds a general facility. It should likely be the very first
patch in the series.

(f) While I think it is justified to have UUID_BE() in "qemu/uuid.h", I
think UEFI_CPER_SEC_PLATFORM_MEM is too specific to have here.

If UEFI_CPER_SEC_PLATFORM_MEM were *not* a standardized UUID, I would
suggest moving it to the implementation ("include/hw/acpi/hest_ghes.h"
-- which in turn should be moved to patch #2, see my remark (d)), *plus*
I would suggest eliminating the new #include from "acpi-defs.h", see my
remark (b).

However, given that this UUID *is* standard, I suggest keeping the (b)
#include as you currently propose, and to move the definition of
UEFI_CPER_SEC_PLATFORM_MEM to "acpi-defs.h".

I vaguely recall that Michael commented on this previously, but I don't
remember what he said. Michael, are you OK with my suggestion?

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-13 10:33   ` Laszlo Ersek
@ 2017-07-13 12:00     ` gengdongjiu
  2017-07-13 15:33       ` Laszlo Ersek
  2017-07-13 17:35       ` Michael S. Tsirkin
  0 siblings, 2 replies; 19+ messages in thread
From: gengdongjiu @ 2017-07-13 12:00 UTC (permalink / raw)
  To: Laszlo Ersek, mst, imammedo, famz, qemu-devel, zhaoshenglong,
	peter.maydell, qemu-arm, james.morse, zhengqiang10, huangshaoyu,
	wuquanming, Gaozhihui

Laszlo,
  Thank you for your review and comments.


On 2017/7/13 18:33, Laszlo Ersek wrote:
> On 07/12/17 04:08, Dongjiu Geng wrote:
>> (1) Add related APEI/HEST table structures and  macros, these
>>     definition refer to ACPI 6.1 and UEFI 2.6 spec.
>> (2) Add generic error status block and CPER memory section
>>     definition, user space only handle memory section errors.
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> 
> This patch looks generally good to me; let me comment on the code
> organization:
> 
> (a) Please make the subject less generic. For example,
> 
>   ACPI: add APEI/HEST/CPER structures and macros
  Ok, got it.

> 
>> ---
>> thanks Laszlo and Michael's review:
>>
>> chnage since v4:
>> (1) fix email threading in this series is incorrect issue
>>
>> change since v3:
>> (1) separate the original one patch into three patches: one is new
>>     ACPI structures and macros, another is C source file to generate ACPI HEST
>>     table and dynamically record CPER ,final patch is the change about Makefile
>>     and configuration
>> (2) add comments about where the ACPI structures and macros come from, for example,
>>     they come from the UEFI Spec 2.6, "xxxxxxxxxxxx"; ACPI 6.1 spec, 
>>     "xxxxxxxxxxxxxx".
>> (3) correct the macros name, for emaple, prefix some macro names with
>>     "UEFI_".
>> (4) remove the uuid_le struct and use the QemuUUID in the include/qemu/uuid.h"
>> (5) remove the duplicate ACPI address space, because it already defined in 
>>     the "include/hw/acpi/aml-build.h"
>> (6) remove the acpi_generic_address structure because same definition exists in the 
>>     AcpiGenericAddress.
>> (7) rename the struct acpi_hest_notify to AcpiHestNotifyType
>> (8) rename the struct acpi_hest_types to AcpiHestSourceType
>> (9) rename enum constants AcpiHestSourceType to ACPI_HEST_SOURCE_xxx from
>>      ACPI_HEST_TYPE_xxx
>> (10) remove the NOT_USED{3,4,5} enum constants in the AcpiHestSourceType.
>> (11) add missed QEMU_PACKED for the struct definition.
>> (12) remove the defnition of AcpiGenericErrorData, and rename the
>>      AcpiGenericErrorDataV300 to AcpiGenericErrorData.
>> (13) use the QemuUUID type for the "section_type" field AcpiGenericErrorData, and rename it
>>      to section_type_le.
>> (14) moving type AcpiGenericErrorSeverity above AcpiGenericErrorData and
>>      AcpiGenericErrorDataV300, and remarking on the "error_severity" fields
>>      that they take their values from AcpiGenericErrorSeverity
>> (15) remove the wrongly call to BERT(Boot Error Record Table)
>> (16) add comments for the struction member, for example, pint out that
>>      the block_status member in the AcpiGenericErrorStatus is a bitmask
>>      composed of ACPI_GEBS_xxx macros
>> (17) remove the hardware error source notification type list, and rename
>>      the MAX_ERROR_SOURCE_COUNT_V6 to ACPI_HEST_NOTIFY_RESERVED.
>> (18) remove the physical_addr member of GhesErrorState
>> (19) change the "uint64_t ghes_addr_le[8]" in GhesErrorState to uint64_t ghes_addr_le
>> (20) change the second parameter to "error_physical_addr" in the ghes_update_guest
>>      API statement
>> ---
>>  include/hw/acpi/acpi-defs.h | 194 ++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/acpi/aml-build.h |   1 +
>>  include/hw/acpi/hest_ghes.h |  47 +++++++++++
>>  include/qemu/uuid.h         |  11 +++
>>  4 files changed, 253 insertions(+)
>>  create mode 100644 include/hw/acpi/hest_ghes.h
>>
>> diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
>> index 4cc3630..0756339 100644
>> --- a/include/hw/acpi/acpi-defs.h
>> +++ b/include/hw/acpi/acpi-defs.h
>> @@ -15,6 +15,7 @@
>>  #ifndef QEMU_ACPI_DEFS_H
>>  #define QEMU_ACPI_DEFS_H
>>  
>> +#include "qemu/uuid.h"
> 
> (b) This is what I first thought upon seeing this:
> 
> 'I don't think this should be included here. You don't use UUID_BE or
> UEFI_CPER_SEC_PLATFORM_MEM anywhere in this patch. So whatever uses
> those macros can include "qemu/uuid.h" directly.'
> 
> However, after all, I think this #include is OK here, with a
> modification lower down.
> 
>>  enum {
>>      ACPI_FADT_F_WBINVD,
>>      ACPI_FADT_F_WBINVD_FLUSH,
>> @@ -295,6 +296,44 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable;
>>  #define ACPI_APIC_GENERIC_TRANSLATOR    15
>>  #define ACPI_APIC_RESERVED              16   /* 16 and greater are reserved */
>>  
>> +/* UEFI Spec 2.6, "N.2.5 Memory Error Section */
>> +#define UEFI_CPER_MEM_VALID_ERROR_STATUS     0x0001
>> +#define UEFI_CPER_MEM_VALID_PA               0x0002
>> +#define UEFI_CPER_MEM_VALID_PA_MASK          0x0004
>> +#define UEFI_CPER_MEM_VALID_NODE             0x0008
>> +#define UEFI_CPER_MEM_VALID_CARD             0x0010
>> +#define UEFI_CPER_MEM_VALID_MODULE           0x0020
>> +#define UEFI_CPER_MEM_VALID_BANK             0x0040
>> +#define UEFI_CPER_MEM_VALID_DEVICE           0x0080
>> +#define UEFI_CPER_MEM_VALID_ROW              0x0100
>> +#define UEFI_CPER_MEM_VALID_COLUMN           0x0200
>> +#define UEFI_CPER_MEM_VALID_BIT_POSITION     0x0400
>> +#define UEFI_CPER_MEM_VALID_REQUESTOR        0x0800
>> +#define UEFI_CPER_MEM_VALID_RESPONDER        0x1000
>> +#define UEFI_CPER_MEM_VALID_TARGET           0x2000
>> +#define UEFI_CPER_MEM_VALID_ERROR_TYPE       0x4000
>> +#define UEFI_CPER_MEM_VALID_RANK_NUMBER      0x8000
>> +#define UEFI_CPER_MEM_VALID_CARD_HANDLE      0x10000
>> +#define UEFI_CPER_MEM_VALID_MODULE_HANDLE    0x20000
>> +#define UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC   3
>> +
>> +/* This is from the ACPI 6.1 spec, "18.3.2.9 Hardware Error Notification" */
>> +
>> +enum AcpiHestNotifyType {
>> +    ACPI_HEST_NOTIFY_POLLED = 0,
>> +    ACPI_HEST_NOTIFY_EXTERNAL = 1,
>> +    ACPI_HEST_NOTIFY_LOCAL = 2,
>> +    ACPI_HEST_NOTIFY_SCI = 3,
>> +    ACPI_HEST_NOTIFY_NMI = 4,
>> +    ACPI_HEST_NOTIFY_CMCI = 5,  /* ACPI 5.0 */
>> +    ACPI_HEST_NOTIFY_MCE = 6,   /* ACPI 5.0 */
>> +    ACPI_HEST_NOTIFY_GPIO = 7,  /* ACPI 6.0 */
>> +    ACPI_HEST_NOTIFY_SEA = 8,   /* ACPI 6.1 */
>> +    ACPI_HEST_NOTIFY_SEI = 9,   /* ACPI 6.1 */
>> +    ACPI_HEST_NOTIFY_GSIV = 10, /* ACPI 6.1 */
>> +    ACPI_HEST_NOTIFY_RESERVED = 11  /* 11 and greater are reserved */
>> +};
>> +
>>  /*
>>   * MADT sub-structures (Follow MULTIPLE_APIC_DESCRIPTION_TABLE)
>>   */
>> @@ -475,6 +514,161 @@ struct AcpiSystemResourceAffinityTable
>>  } QEMU_PACKED;
>>  typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable;
>>  
>> +/* Hardware Error Notification, this is from the ACPI 6.1
>> + * spec, "18.3.2.9 Hardware Error Notification"
>> + */
>> +struct AcpiHestNotify {
>> +    uint8_t type;
>> +    uint8_t length;
>> +    uint16_t config_write_enable;
>> +    uint32_t poll_interval;
>> +    uint32_t vector;
>> +    uint32_t polling_threshold_value;
>> +    uint32_t polling_threshold_window;
>> +    uint32_t error_threshold_value;
>> +    uint32_t error_threshold_window;
>> +} QEMU_PACKED;
>> +typedef struct AcpiHestNotify AcpiHestNotify;
>> +
>> +/* These are from ACPI 6.1, sections "18.3.2.1 IA-32 Architecture Machine
>> + * Check Exception" through "18.3.2.8 Generic Hardware Error Source version 2".
>> + */
>> +enum AcpiHestSourceType {
>> +    ACPI_HEST_SOURCE_IA32_CHECK = 0,
>> +    ACPI_HEST_SOURCE_IA32_CORRECTED_CHECK = 1,
>> +    ACPI_HEST_SOURCE_IA32_NMI = 2,
>> +    ACPI_HEST_SOURCE_AER_ROOT_PORT = 6,
>> +    ACPI_HEST_SOURCE_AER_ENDPOINT = 7,
>> +    ACPI_HEST_SOURCE_AER_BRIDGE = 8,
>> +    ACPI_HEST_SOURCE_GENERIC_ERROR = 9,
>> +    ACPI_HEST_SOURCE_GENERIC_ERROR_V2 = 10,
>> +    ACPI_HEST_SOURCE_RESERVED = 11    /* 11 and greater are reserved */
>> +};
>> +
>> +/* Block Status bitmasks from ACPI 6.1, "18.3.2.7.1 Generic Error Data" */
>> +#define ACPI_GEBS_UNCORRECTABLE             (1)
>> +#define ACPI_GEBS_CORRECTABLE               (1 << 1)
>> +#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE    (1 << 2)
>> +#define ACPI_GEBS_MULTIPLE_CORRECTABLE      (1 << 3)
>> +/* 10 bits, error count */
>> +#define ACPI_GEBS_ERROR_ENTRY_COUNT         (0x3FF << 4)
>> +
>> +/* Generic Hardware Error Source Structure, refer to ACPI 6.1
>> + * "18.3.2.7 Generic Hardware Error Source". in this struct the
>> + * "type" field has to be ACPI_HEST_SOURCE_GENERIC_ERROR
>> + */
>> +
>> +struct AcpiGenericHardwareErrorSource {
>> +    uint16_t type;
>> +    uint16_t source_id;
>> +    uint16_t related_source_id;
>> +    uint8_t flags;
>> +    uint8_t enabled;
>> +    uint32_t number_of_records;
>> +    uint32_t max_sections_per_record;
>> +    uint32_t max_raw_data_length;
>> +    struct AcpiGenericAddress error_status_address;
>> +    struct AcpiHestNotify notify;
>> +    uint32_t error_status_block_length;
>> +} QEMU_PACKED;
>> +typedef struct AcpiGenericHardwareErrorSource AcpiGenericHardwareErrorSource;
>> +
>> +/* Generic Hardware Error Source, version 2, ACPI 6.1, "18.3.2.8 Generic
>> + * Hardware Error Source version 2", in this struct the "type" field has to
>> + * be ACPI_HEST_SOURCE_GENERIC_ERROR_V2
>> + */
>> +struct AcpiGenericHardwareErrorSourceV2 {
>> +    uint16_t type;
>> +    uint16_t source_id;
>> +    uint16_t related_source_id;
>> +    uint8_t flags;
>> +    uint8_t enabled;
>> +    uint32_t number_of_records;
>> +    uint32_t max_sections_per_record;
>> +    uint32_t max_raw_data_length;
>> +    struct AcpiGenericAddress error_status_address;
>> +    struct AcpiHestNotify notify;
>> +    uint32_t error_status_block_length;
>> +    struct AcpiGenericAddress read_ack_register;
>> +    uint64_t read_ack_preserve;
>> +    uint64_t read_ack_write;
>> +} QEMU_PACKED;
>> +typedef struct AcpiGenericHardwareErrorSourceV2
>> +            AcpiGenericHardwareErrorSourceV2;
>> +
>> +/* Generic Error Status block, this is from ACPI 6.1,
>> + * "18.3.2.7.1 Generic Error Data"
>> + */
>> +struct AcpiGenericErrorStatus {
>> +    /* It is a bitmask composed of ACPI_GEBS_xxx macros */
>> +    uint32_t block_status;
>> +    uint32_t raw_data_offset;
>> +    uint32_t raw_data_length;
>> +    uint32_t data_length;
>> +    uint32_t error_severity;
>> +} QEMU_PACKED;
>> +typedef struct AcpiGenericErrorStatus AcpiGenericErrorStatus;
>> +
>> +enum AcpiGenericErrorSeverity {
>> +    ACPI_CPER_SEV_RECOVERABLE,
>> +    ACPI_CPER_SEV_FATAL,
>> +    ACPI_CPER_SEV_CORRECTED,
>> +    ACPI_CPER_SEV_NONE,
>> +};
>> +
>> +/* Generic Error Data entry, revision number is 0x0300,
>> + * ACPI 6.1, "18.3.2.7.1 Generic Error Data"
>> + */
>> +struct AcpiGenericErrorData {
>> +    QemuUUID section_type_le;
>> +    /* The "error_severity" fields that they take their
>> +     * values from AcpiGenericErrorSeverity
>> +     */
>> +    uint32_t error_severity;
>> +    uint16_t revision;
>> +    uint8_t validation_bits;
>> +    uint8_t flags;
>> +    uint32_t error_data_length;
>> +    uint8_t fru_id[16];
>> +    uint8_t fru_text[20];
>> +    uint64_t time_stamp;
>> +} QEMU_PACKED;
>> +typedef struct AcpiGenericErrorData AcpiGenericErrorData;
>> +
>> +/* This is from UEFI 2.6, "N.2.5 Memory Error Section" */
>> +struct UefiCperSecMemErr {
>> +    uint64_t    validation_bits;
>> +    uint64_t    error_status;
>> +    uint64_t    physical_addr;
>> +    uint64_t    physical_addr_mask;
>> +    uint16_t    node;
>> +    uint16_t    card;
>> +    uint16_t    module;
>> +    uint16_t    bank;
>> +    uint16_t    device;
>> +    uint16_t    row;
>> +    uint16_t    column;
>> +    uint16_t    bit_pos;
>> +    uint64_t    requestor_id;
>> +    uint64_t    responder_id;
>> +    uint64_t    target_id;
>> +    uint8_t     error_type;
>> +    uint8_t     reserved;
>> +    uint16_t    rank;
>> +    uint16_t    mem_array_handle;   /* card handle in UEFI 2.4 */
>> +    uint16_t    mem_dev_handle;     /* module handle in UEFI 2.4 */
>> +} QEMU_PACKED;
>> +typedef struct UefiCperSecMemErr UefiCperSecMemErr;
>> +
>> +/*
>> + * HEST Description Table
>> + */
>> +struct AcpiHardwareErrorSourceTable {
>> +    ACPI_TABLE_HEADER_DEF                    /* ACPI common table header */
>> +    uint32_t           error_source_count;
>> +} QEMU_PACKED;
>> +typedef struct AcpiHardwareErrorSourceTable AcpiHardwareErrorSourceTable;
>> +
>>  #define ACPI_SRAT_PROCESSOR_APIC     0
>>  #define ACPI_SRAT_MEMORY             1
>>  #define ACPI_SRAT_PROCESSOR_x2APIC   2
>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>> index 00c21f1..c1d15b3 100644
>> --- a/include/hw/acpi/aml-build.h
>> +++ b/include/hw/acpi/aml-build.h
>> @@ -211,6 +211,7 @@ struct AcpiBuildTables {
>>      GArray *rsdp;
>>      GArray *tcpalog;
>>      GArray *vmgenid;
>> +    GArray *hardware_errors;
>>      BIOSLinker *linker;
>>  } AcpiBuildTables;
>>  
> 
> (c) I think that this hunk belongs with the implementation (patch #2),
> not with the ACPI struct / macro introduction patch.
   yes, the "aml-build.h" and "hest_ghes.h" modification is about the HEST/GHES generation
   and CPER record, it should be remove to (patch #2). I will change it.

> 
>> diff --git a/include/hw/acpi/hest_ghes.h b/include/hw/acpi/hest_ghes.h
>> new file mode 100644
>> index 0000000..0772756
>> --- /dev/null
>> +++ b/include/hw/acpi/hest_ghes.h
>> @@ -0,0 +1,47 @@
>> +/*
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> +
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * Authors:
>> + *   Dongjiu Geng <gengdongjiu@huawei.com>
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef ACPI_GHES_H
>> +#define ACPI_GHES_H
>> +
>> +#include "hw/acpi/bios-linker-loader.h"
>> +
>> +#define GHES_ERRORS_FW_CFG_FILE      "etc/hardware_errors"
>> +#define GHES_DATA_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
>> +
>> +#define GHES_GAS_ADDRESS_OFFSET              4
>> +#define GHES_ERROR_STATUS_ADDRESS_OFFSET     20
>> +#define GHES_NOTIFICATION_STRUCTURE          32
>> +
>> +#define GHES_CPER_OK   1
>> +#define GHES_CPER_FAIL 0
>> +
>> +#define GHES_ACPI_HEST_NOTIFY_RESERVED           11
>> +/* The max size in Bytes for one error block */
>> +#define GHES_MAX_RAW_DATA_LENGTH                 0x1000
>> +
>> +
>> +typedef struct GhesState {
>> +    uint64_t ghes_addr_le;
>> +} GhesState;
>> +
>> +void ghes_build_acpi(GArray *table_data, GArray *hardware_error,
>> +                            BIOSLinker *linker);
>> +void ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
>> +bool ghes_update_guest(uint32_t notify, uint64_t error_physical_addr);
>> +#endif
> 
> (d) same comment as (c)
 got it.

> 
>> diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
>> index afe4840..99c4041 100644
>> --- a/include/qemu/uuid.h
>> +++ b/include/qemu/uuid.h
>> @@ -44,6 +44,17 @@ typedef struct {
>>  
>>  #define UUID_NONE "00000000-0000-0000-0000-000000000000"
>>  
>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>> +   ((b) >> 8) & 0xff, (b) & 0xff,                   \
>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>> +
>> +/* Platform Memory, this is from UEFI 2.6 N.2.2 Section Descriptor */
>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>> +    0xED, 0x7C, 0x83, 0xB1)
>> +
>>  void qemu_uuid_generate(QemuUUID *out);
>>  
>>  int qemu_uuid_is_null(const QemuUUID *uu);
>>
> 
> (e) I think the addition of UUID_BE should be split out to a separate
> patch; it adds a general facility. It should likely be the very first
> patch in the series.
  Ok.

> 
> (f) While I think it is justified to have UUID_BE() in "qemu/uuid.h", I
> think UEFI_CPER_SEC_PLATFORM_MEM is too specific to have here.
> 
> If UEFI_CPER_SEC_PLATFORM_MEM were *not* a standardized UUID, I would
> suggest moving it to the implementation ("include/hw/acpi/hest_ghes.h"
> -- which in turn should be moved to patch #2, see my remark (d)), *plus*
> I would suggest eliminating the new #include from "acpi-defs.h", see my
> remark (b).
  understand your idea.

> 
> However, given that this UUID *is* standard, I suggest keeping the (b)
> #include as you currently propose, and to move the definition of
> UEFI_CPER_SEC_PLATFORM_MEM to "acpi-defs.h".
  I agree with you.

> 
> I vaguely recall that Michael commented on this previously, but I don't
> remember what he said. Michael, are you OK with my suggestion?
   Laszlo, I pasted Michael's comments here, as shown below. Michael said the definition
should use build_append_int_noprefix to add data. but I think it may not good, becuase
the section "UEFI_CPER_SEC_PLATFORM_MEM" is runtime recorded as CPER, not a ACPI/HEST
table member, so it is not generated when system boot up. On the other hand,UEFI_CPER_SEC_PLATFORM_MEM
definition is from UEFI spec 2.6, N.2.2 Section Descriptor: {0xA5BC1114, 0x6F64, 0x4EDE, {0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1}}.
if use  build_append_int_noprefix to add, may confuse others.

-----------------------------------------------------------------
-----------------------------------------------------------------
There's no reason to define these messy one-time use macros.
They just make it hard to look things up in the spec.


You can use build_append_int_noprefix to add data of
any length in LE format.
-----------------------------------------------------------------
-----------------------------------------------------------------

Hi  Michael,
  what is your suggestion about it? do you agree with Laszlo?


> 
> Thanks
> Laszlo
> 
> .
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-13 12:00     ` gengdongjiu
@ 2017-07-13 15:33       ` Laszlo Ersek
  2017-07-13 17:13         ` Michael S. Tsirkin
  2017-07-13 17:35       ` Michael S. Tsirkin
  1 sibling, 1 reply; 19+ messages in thread
From: Laszlo Ersek @ 2017-07-13 15:33 UTC (permalink / raw)
  To: gengdongjiu, mst, imammedo, famz, qemu-devel, zhaoshenglong,
	peter.maydell, qemu-arm, james.morse, zhengqiang10, huangshaoyu,
	wuquanming, Gaozhihui

On 07/13/17 14:00, gengdongjiu wrote:
> Laszlo,
>   Thank you for your review and comments.
> 
> 
> On 2017/7/13 18:33, Laszlo Ersek wrote:
>> On 07/12/17 04:08, Dongjiu Geng wrote:

[snip]

>>> --- a/include/qemu/uuid.h
>>> +++ b/include/qemu/uuid.h
>>> @@ -44,6 +44,17 @@ typedef struct {
>>>  
>>>  #define UUID_NONE "00000000-0000-0000-0000-000000000000"
>>>  
>>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>>> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>>> +   ((b) >> 8) & 0xff, (b) & 0xff,                   \
>>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>>> +
>>> +/* Platform Memory, this is from UEFI 2.6 N.2.2 Section Descriptor */
>>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>>> +    0xED, 0x7C, 0x83, 0xB1)
>>> +
>>>  void qemu_uuid_generate(QemuUUID *out);
>>>  
>>>  int qemu_uuid_is_null(const QemuUUID *uu);
>>>
>>
>> (e) I think the addition of UUID_BE should be split out to a separate
>> patch; it adds a general facility. It should likely be the very first
>> patch in the series.
>   Ok.
> 
>>
>> (f) While I think it is justified to have UUID_BE() in "qemu/uuid.h", I
>> think UEFI_CPER_SEC_PLATFORM_MEM is too specific to have here.
>>
>> If UEFI_CPER_SEC_PLATFORM_MEM were *not* a standardized UUID, I would
>> suggest moving it to the implementation ("include/hw/acpi/hest_ghes.h"
>> -- which in turn should be moved to patch #2, see my remark (d)), *plus*
>> I would suggest eliminating the new #include from "acpi-defs.h", see my
>> remark (b).
>   understand your idea.
> 
>>
>> However, given that this UUID *is* standard, I suggest keeping the (b)
>> #include as you currently propose, and to move the definition of
>> UEFI_CPER_SEC_PLATFORM_MEM to "acpi-defs.h".
>   I agree with you.
> 
>>
>> I vaguely recall that Michael commented on this previously, but I don't
>> remember what he said. Michael, are you OK with my suggestion?
>    Laszlo, I pasted Michael's comments here, as shown below. Michael said the definition
> should use build_append_int_noprefix to add data. but I think it may not good, becuase
> the section "UEFI_CPER_SEC_PLATFORM_MEM" is runtime recorded as CPER, not a ACPI/HEST
> table member, so it is not generated when system boot up.

I agree: the UUID in question is not placed into the ACPI payload /
fw_cfg blobs, it is written into guest memory at runtime, into the
firmware-allocated area, if and when there is a hardware error to report.

Thanks
Laszlo

> On the other hand,UEFI_CPER_SEC_PLATFORM_MEM
> definition is from UEFI spec 2.6, N.2.2 Section Descriptor: {0xA5BC1114, 0x6F64, 0x4EDE, {0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1}}.
> if use  build_append_int_noprefix to add, may confuse others.
> 
> -----------------------------------------------------------------
> -----------------------------------------------------------------
> There's no reason to define these messy one-time use macros.
> They just make it hard to look things up in the spec.
> 
> 
> You can use build_append_int_noprefix to add data of
> any length in LE format.
> -----------------------------------------------------------------
> -----------------------------------------------------------------
> 
> Hi  Michael,
>   what is your suggestion about it? do you agree with Laszlo?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros Dongjiu Geng
  2017-07-13 10:33   ` Laszlo Ersek
@ 2017-07-13 17:01   ` Michael S. Tsirkin
  2017-07-14  5:51     ` gengdongjiu
  2017-07-13 17:11   ` Michael S. Tsirkin
  2 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-13 17:01 UTC (permalink / raw)
  To: Dongjiu Geng
  Cc: lersek, imammedo, famz, qemu-devel, zhaoshenglong, peter.maydell,
	qemu-arm, james.morse, zhengqiang10, huangshaoyu, wuquanming

On Wed, Jul 12, 2017 at 10:08:15AM +0800, Dongjiu Geng wrote:
> (1) Add related APEI/HEST table structures and  macros, these
>     definition refer to ACPI 6.1 and UEFI 2.6 spec.
> (2) Add generic error status block and CPER memory section
>     definition, user space only handle memory section errors.
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> ---
> thanks Laszlo and Michael's review:
> 
> chnage since v4:
> (1) fix email threading in this series is incorrect issue
> 
> change since v3:
> (1) separate the original one patch into three patches: one is new
>     ACPI structures and macros, another is C source file to generate ACPI HEST
>     table and dynamically record CPER ,final patch is the change about Makefile
>     and configuration
> (2) add comments about where the ACPI structures and macros come from, for example,
>     they come from the UEFI Spec 2.6, "xxxxxxxxxxxx"; ACPI 6.1 spec, 
>     "xxxxxxxxxxxxxx".
> (3) correct the macros name, for emaple, prefix some macro names with
>     "UEFI_".
> (4) remove the uuid_le struct and use the QemuUUID in the include/qemu/uuid.h"
> (5) remove the duplicate ACPI address space, because it already defined in 
>     the "include/hw/acpi/aml-build.h"
> (6) remove the acpi_generic_address structure because same definition exists in the 
>     AcpiGenericAddress.
> (7) rename the struct acpi_hest_notify to AcpiHestNotifyType
> (8) rename the struct acpi_hest_types to AcpiHestSourceType
> (9) rename enum constants AcpiHestSourceType to ACPI_HEST_SOURCE_xxx from
>      ACPI_HEST_TYPE_xxx
> (10) remove the NOT_USED{3,4,5} enum constants in the AcpiHestSourceType.
> (11) add missed QEMU_PACKED for the struct definition.
> (12) remove the defnition of AcpiGenericErrorData, and rename the
>      AcpiGenericErrorDataV300 to AcpiGenericErrorData.
> (13) use the QemuUUID type for the "section_type" field AcpiGenericErrorData, and rename it
>      to section_type_le.
> (14) moving type AcpiGenericErrorSeverity above AcpiGenericErrorData and
>      AcpiGenericErrorDataV300, and remarking on the "error_severity" fields
>      that they take their values from AcpiGenericErrorSeverity
> (15) remove the wrongly call to BERT(Boot Error Record Table)
> (16) add comments for the struction member, for example, pint out that
>      the block_status member in the AcpiGenericErrorStatus is a bitmask
>      composed of ACPI_GEBS_xxx macros
> (17) remove the hardware error source notification type list, and rename
>      the MAX_ERROR_SOURCE_COUNT_V6 to ACPI_HEST_NOTIFY_RESERVED.
> (18) remove the physical_addr member of GhesErrorState
> (19) change the "uint64_t ghes_addr_le[8]" in GhesErrorState to uint64_t ghes_addr_le
> (20) change the second parameter to "error_physical_addr" in the ghes_update_guest
>      API statement
> ---
>  include/hw/acpi/acpi-defs.h | 194 ++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/acpi/aml-build.h |   1 +
>  include/hw/acpi/hest_ghes.h |  47 +++++++++++
>  include/qemu/uuid.h         |  11 +++
>  4 files changed, 253 insertions(+)
>  create mode 100644 include/hw/acpi/hest_ghes.h
> 
> diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
> index 4cc3630..0756339 100644
> --- a/include/hw/acpi/acpi-defs.h
> +++ b/include/hw/acpi/acpi-defs.h
> @@ -15,6 +15,7 @@
>  #ifndef QEMU_ACPI_DEFS_H
>  #define QEMU_ACPI_DEFS_H
>  
> +#include "qemu/uuid.h"
>  enum {
>      ACPI_FADT_F_WBINVD,
>      ACPI_FADT_F_WBINVD_FLUSH,
> @@ -295,6 +296,44 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable;
>  #define ACPI_APIC_GENERIC_TRANSLATOR    15
>  #define ACPI_APIC_RESERVED              16   /* 16 and greater are reserved */
>  
> +/* UEFI Spec 2.6, "N.2.5 Memory Error Section */
> +#define UEFI_CPER_MEM_VALID_ERROR_STATUS     0x0001
> +#define UEFI_CPER_MEM_VALID_PA               0x0002
> +#define UEFI_CPER_MEM_VALID_PA_MASK          0x0004
> +#define UEFI_CPER_MEM_VALID_NODE             0x0008
> +#define UEFI_CPER_MEM_VALID_CARD             0x0010
> +#define UEFI_CPER_MEM_VALID_MODULE           0x0020
> +#define UEFI_CPER_MEM_VALID_BANK             0x0040
> +#define UEFI_CPER_MEM_VALID_DEVICE           0x0080
> +#define UEFI_CPER_MEM_VALID_ROW              0x0100
> +#define UEFI_CPER_MEM_VALID_COLUMN           0x0200
> +#define UEFI_CPER_MEM_VALID_BIT_POSITION     0x0400
> +#define UEFI_CPER_MEM_VALID_REQUESTOR        0x0800
> +#define UEFI_CPER_MEM_VALID_RESPONDER        0x1000
> +#define UEFI_CPER_MEM_VALID_TARGET           0x2000
> +#define UEFI_CPER_MEM_VALID_ERROR_TYPE       0x4000
> +#define UEFI_CPER_MEM_VALID_RANK_NUMBER      0x8000
> +#define UEFI_CPER_MEM_VALID_CARD_HANDLE      0x10000
> +#define UEFI_CPER_MEM_VALID_MODULE_HANDLE    0x20000
> +#define UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC   3
> +
> +/* This is from the ACPI 6.1 spec, "18.3.2.9 Hardware Error Notification" */
> +
> +enum AcpiHestNotifyType {
> +    ACPI_HEST_NOTIFY_POLLED = 0,
> +    ACPI_HEST_NOTIFY_EXTERNAL = 1,
> +    ACPI_HEST_NOTIFY_LOCAL = 2,
> +    ACPI_HEST_NOTIFY_SCI = 3,
> +    ACPI_HEST_NOTIFY_NMI = 4,
> +    ACPI_HEST_NOTIFY_CMCI = 5,  /* ACPI 5.0 */
> +    ACPI_HEST_NOTIFY_MCE = 6,   /* ACPI 5.0 */
> +    ACPI_HEST_NOTIFY_GPIO = 7,  /* ACPI 6.0 */
> +    ACPI_HEST_NOTIFY_SEA = 8,   /* ACPI 6.1 */
> +    ACPI_HEST_NOTIFY_SEI = 9,   /* ACPI 6.1 */
> +    ACPI_HEST_NOTIFY_GSIV = 10, /* ACPI 6.1 */
> +    ACPI_HEST_NOTIFY_RESERVED = 11  /* 11 and greater are reserved */
> +};
> +
>  /*
>   * MADT sub-structures (Follow MULTIPLE_APIC_DESCRIPTION_TABLE)
>   */
> @@ -475,6 +514,161 @@ struct AcpiSystemResourceAffinityTable
>  } QEMU_PACKED;
>  typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable;
>  
> +/* Hardware Error Notification, this is from the ACPI 6.1
> + * spec, "18.3.2.9 Hardware Error Notification"
> + */
> +struct AcpiHestNotify {
> +    uint8_t type;
> +    uint8_t length;
> +    uint16_t config_write_enable;
> +    uint32_t poll_interval;
> +    uint32_t vector;
> +    uint32_t polling_threshold_value;
> +    uint32_t polling_threshold_window;
> +    uint32_t error_threshold_value;
> +    uint32_t error_threshold_window;
> +} QEMU_PACKED;
> +typedef struct AcpiHestNotify AcpiHestNotify;
> +
> +/* These are from ACPI 6.1, sections "18.3.2.1 IA-32 Architecture Machine
> + * Check Exception" through "18.3.2.8 Generic Hardware Error Source version 2".
> + */
> +enum AcpiHestSourceType {
> +    ACPI_HEST_SOURCE_IA32_CHECK = 0,
> +    ACPI_HEST_SOURCE_IA32_CORRECTED_CHECK = 1,
> +    ACPI_HEST_SOURCE_IA32_NMI = 2,
> +    ACPI_HEST_SOURCE_AER_ROOT_PORT = 6,
> +    ACPI_HEST_SOURCE_AER_ENDPOINT = 7,
> +    ACPI_HEST_SOURCE_AER_BRIDGE = 8,
> +    ACPI_HEST_SOURCE_GENERIC_ERROR = 9,
> +    ACPI_HEST_SOURCE_GENERIC_ERROR_V2 = 10,
> +    ACPI_HEST_SOURCE_RESERVED = 11    /* 11 and greater are reserved */
> +};
> +
> +/* Block Status bitmasks from ACPI 6.1, "18.3.2.7.1 Generic Error Data" */
> +#define ACPI_GEBS_UNCORRECTABLE             (1)
> +#define ACPI_GEBS_CORRECTABLE               (1 << 1)
> +#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE    (1 << 2)
> +#define ACPI_GEBS_MULTIPLE_CORRECTABLE      (1 << 3)
> +/* 10 bits, error count */
> +#define ACPI_GEBS_ERROR_ENTRY_COUNT         (0x3FF << 4)
> +
> +/* Generic Hardware Error Source Structure, refer to ACPI 6.1
> + * "18.3.2.7 Generic Hardware Error Source". in this struct the
> + * "type" field has to be ACPI_HEST_SOURCE_GENERIC_ERROR
> + */
> +
> +struct AcpiGenericHardwareErrorSource {
> +    uint16_t type;
> +    uint16_t source_id;
> +    uint16_t related_source_id;
> +    uint8_t flags;
> +    uint8_t enabled;
> +    uint32_t number_of_records;
> +    uint32_t max_sections_per_record;
> +    uint32_t max_raw_data_length;
> +    struct AcpiGenericAddress error_status_address;
> +    struct AcpiHestNotify notify;
> +    uint32_t error_status_block_length;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericHardwareErrorSource AcpiGenericHardwareErrorSource;
> +
> +/* Generic Hardware Error Source, version 2, ACPI 6.1, "18.3.2.8 Generic
> + * Hardware Error Source version 2", in this struct the "type" field has to
> + * be ACPI_HEST_SOURCE_GENERIC_ERROR_V2
> + */
> +struct AcpiGenericHardwareErrorSourceV2 {
> +    uint16_t type;
> +    uint16_t source_id;
> +    uint16_t related_source_id;
> +    uint8_t flags;
> +    uint8_t enabled;
> +    uint32_t number_of_records;
> +    uint32_t max_sections_per_record;
> +    uint32_t max_raw_data_length;
> +    struct AcpiGenericAddress error_status_address;
> +    struct AcpiHestNotify notify;
> +    uint32_t error_status_block_length;
> +    struct AcpiGenericAddress read_ack_register;
> +    uint64_t read_ack_preserve;
> +    uint64_t read_ack_write;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericHardwareErrorSourceV2
> +            AcpiGenericHardwareErrorSourceV2;
> +
> +/* Generic Error Status block, this is from ACPI 6.1,
> + * "18.3.2.7.1 Generic Error Data"
> + */
> +struct AcpiGenericErrorStatus {
> +    /* It is a bitmask composed of ACPI_GEBS_xxx macros */
> +    uint32_t block_status;
> +    uint32_t raw_data_offset;
> +    uint32_t raw_data_length;
> +    uint32_t data_length;
> +    uint32_t error_severity;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericErrorStatus AcpiGenericErrorStatus;
> +
> +enum AcpiGenericErrorSeverity {
> +    ACPI_CPER_SEV_RECOVERABLE,
> +    ACPI_CPER_SEV_FATAL,
> +    ACPI_CPER_SEV_CORRECTED,
> +    ACPI_CPER_SEV_NONE,
> +};
> +
> +/* Generic Error Data entry, revision number is 0x0300,
> + * ACPI 6.1, "18.3.2.7.1 Generic Error Data"
> + */
> +struct AcpiGenericErrorData {
> +    QemuUUID section_type_le;
> +    /* The "error_severity" fields that they take their
> +     * values from AcpiGenericErrorSeverity
> +     */
> +    uint32_t error_severity;
> +    uint16_t revision;
> +    uint8_t validation_bits;
> +    uint8_t flags;
> +    uint32_t error_data_length;
> +    uint8_t fru_id[16];
> +    uint8_t fru_text[20];
> +    uint64_t time_stamp;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericErrorData AcpiGenericErrorData;
> +
> +/* This is from UEFI 2.6, "N.2.5 Memory Error Section" */
> +struct UefiCperSecMemErr {
> +    uint64_t    validation_bits;
> +    uint64_t    error_status;
> +    uint64_t    physical_addr;
> +    uint64_t    physical_addr_mask;
> +    uint16_t    node;
> +    uint16_t    card;
> +    uint16_t    module;
> +    uint16_t    bank;
> +    uint16_t    device;
> +    uint16_t    row;
> +    uint16_t    column;
> +    uint16_t    bit_pos;
> +    uint64_t    requestor_id;
> +    uint64_t    responder_id;
> +    uint64_t    target_id;
> +    uint8_t     error_type;
> +    uint8_t     reserved;
> +    uint16_t    rank;
> +    uint16_t    mem_array_handle;   /* card handle in UEFI 2.4 */
> +    uint16_t    mem_dev_handle;     /* module handle in UEFI 2.4 */
> +} QEMU_PACKED;
> +typedef struct UefiCperSecMemErr UefiCperSecMemErr;
> +
> +/*
> + * HEST Description Table
> + */
> +struct AcpiHardwareErrorSourceTable {
> +    ACPI_TABLE_HEADER_DEF                    /* ACPI common table header */
> +    uint32_t           error_source_count;
> +} QEMU_PACKED;
> +typedef struct AcpiHardwareErrorSourceTable AcpiHardwareErrorSourceTable;
> +
>  #define ACPI_SRAT_PROCESSOR_APIC     0
>  #define ACPI_SRAT_MEMORY             1
>  #define ACPI_SRAT_PROCESSOR_x2APIC   2
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 00c21f1..c1d15b3 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -211,6 +211,7 @@ struct AcpiBuildTables {
>      GArray *rsdp;
>      GArray *tcpalog;
>      GArray *vmgenid;
> +    GArray *hardware_errors;
>      BIOSLinker *linker;
>  } AcpiBuildTables;
>  
> diff --git a/include/hw/acpi/hest_ghes.h b/include/hw/acpi/hest_ghes.h
> new file mode 100644
> index 0000000..0772756
> --- /dev/null
> +++ b/include/hw/acpi/hest_ghes.h
> @@ -0,0 +1,47 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * Authors:
> + *   Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef ACPI_GHES_H
> +#define ACPI_GHES_H
> +
> +#include "hw/acpi/bios-linker-loader.h"
> +
> +#define GHES_ERRORS_FW_CFG_FILE      "etc/hardware_errors"
> +#define GHES_DATA_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
> +
> +#define GHES_GAS_ADDRESS_OFFSET              4
> +#define GHES_ERROR_STATUS_ADDRESS_OFFSET     20
> +#define GHES_NOTIFICATION_STRUCTURE          32
> +
> +#define GHES_CPER_OK   1
> +#define GHES_CPER_FAIL 0
> +
> +#define GHES_ACPI_HEST_NOTIFY_RESERVED           11
> +/* The max size in Bytes for one error block */
> +#define GHES_MAX_RAW_DATA_LENGTH                 0x1000
> +
> +
> +typedef struct GhesState {
> +    uint64_t ghes_addr_le;
> +} GhesState;
> +
> +void ghes_build_acpi(GArray *table_data, GArray *hardware_error,
> +                            BIOSLinker *linker);
> +void ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +bool ghes_update_guest(uint32_t notify, uint64_t error_physical_addr);
> +#endif

It's pretty unusual to add the function declaration but not the
definition.

> diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
> index afe4840..99c4041 100644
> --- a/include/qemu/uuid.h
> +++ b/include/qemu/uuid.h
> @@ -44,6 +44,17 @@ typedef struct {
>  
>  #define UUID_NONE "00000000-0000-0000-0000-000000000000"
>  
> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> +   ((b) >> 8) & 0xff, (b) & 0xff,                   \
> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> +
> +/* Platform Memory, this is from UEFI 2.6 N.2.2 Section Descriptor */

Drop "this is" from the sentence.

> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> +    0xED, 0x7C, 0x83, 0xB1)
> +
>  void qemu_uuid_generate(QemuUUID *out);
>  
>  int qemu_uuid_is_null(const QemuUUID *uu);
> -- 
> 2.10.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros Dongjiu Geng
  2017-07-13 10:33   ` Laszlo Ersek
  2017-07-13 17:01   ` Michael S. Tsirkin
@ 2017-07-13 17:11   ` Michael S. Tsirkin
  2017-07-14  8:22     ` gengdongjiu
  2 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-13 17:11 UTC (permalink / raw)
  To: Dongjiu Geng
  Cc: lersek, imammedo, famz, qemu-devel, zhaoshenglong, peter.maydell,
	qemu-arm, james.morse, zhengqiang10, huangshaoyu, wuquanming

On Wed, Jul 12, 2017 at 10:08:15AM +0800, Dongjiu Geng wrote:
> (1) Add related APEI/HEST table structures and  macros, these
>     definition refer to ACPI 6.1 and UEFI 2.6 spec.
> (2) Add generic error status block and CPER memory section
>     definition, user space only handle memory section errors.
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> ---
> thanks Laszlo and Michael's review:
> 
> chnage since v4:
> (1) fix email threading in this series is incorrect issue
> 
> change since v3:
> (1) separate the original one patch into three patches: one is new
>     ACPI structures and macros, another is C source file to generate ACPI HEST
>     table and dynamically record CPER ,final patch is the change about Makefile
>     and configuration
> (2) add comments about where the ACPI structures and macros come from, for example,
>     they come from the UEFI Spec 2.6, "xxxxxxxxxxxx"; ACPI 6.1 spec, 
>     "xxxxxxxxxxxxxx".
> (3) correct the macros name, for emaple, prefix some macro names with
>     "UEFI_".
> (4) remove the uuid_le struct and use the QemuUUID in the include/qemu/uuid.h"
> (5) remove the duplicate ACPI address space, because it already defined in 
>     the "include/hw/acpi/aml-build.h"
> (6) remove the acpi_generic_address structure because same definition exists in the 
>     AcpiGenericAddress.
> (7) rename the struct acpi_hest_notify to AcpiHestNotifyType
> (8) rename the struct acpi_hest_types to AcpiHestSourceType
> (9) rename enum constants AcpiHestSourceType to ACPI_HEST_SOURCE_xxx from
>      ACPI_HEST_TYPE_xxx
> (10) remove the NOT_USED{3,4,5} enum constants in the AcpiHestSourceType.
> (11) add missed QEMU_PACKED for the struct definition.
> (12) remove the defnition of AcpiGenericErrorData, and rename the
>      AcpiGenericErrorDataV300 to AcpiGenericErrorData.
> (13) use the QemuUUID type for the "section_type" field AcpiGenericErrorData, and rename it
>      to section_type_le.
> (14) moving type AcpiGenericErrorSeverity above AcpiGenericErrorData and
>      AcpiGenericErrorDataV300, and remarking on the "error_severity" fields
>      that they take their values from AcpiGenericErrorSeverity
> (15) remove the wrongly call to BERT(Boot Error Record Table)
> (16) add comments for the struction member, for example, pint out that
>      the block_status member in the AcpiGenericErrorStatus is a bitmask
>      composed of ACPI_GEBS_xxx macros
> (17) remove the hardware error source notification type list, and rename
>      the MAX_ERROR_SOURCE_COUNT_V6 to ACPI_HEST_NOTIFY_RESERVED.
> (18) remove the physical_addr member of GhesErrorState
> (19) change the "uint64_t ghes_addr_le[8]" in GhesErrorState to uint64_t ghes_addr_le
> (20) change the second parameter to "error_physical_addr" in the ghes_update_guest
>      API statement
> ---
>  include/hw/acpi/acpi-defs.h | 194 ++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/acpi/aml-build.h |   1 +
>  include/hw/acpi/hest_ghes.h |  47 +++++++++++
>  include/qemu/uuid.h         |  11 +++
>  4 files changed, 253 insertions(+)
>  create mode 100644 include/hw/acpi/hest_ghes.h
> 
> diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
> index 4cc3630..0756339 100644
> --- a/include/hw/acpi/acpi-defs.h
> +++ b/include/hw/acpi/acpi-defs.h
> @@ -15,6 +15,7 @@
>  #ifndef QEMU_ACPI_DEFS_H
>  #define QEMU_ACPI_DEFS_H
>  
> +#include "qemu/uuid.h"
>  enum {
>      ACPI_FADT_F_WBINVD,
>      ACPI_FADT_F_WBINVD_FLUSH,
> @@ -295,6 +296,44 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable;
>  #define ACPI_APIC_GENERIC_TRANSLATOR    15
>  #define ACPI_APIC_RESERVED              16   /* 16 and greater are reserved */
>  
> +/* UEFI Spec 2.6, "N.2.5 Memory Error Section */
> +#define UEFI_CPER_MEM_VALID_ERROR_STATUS     0x0001
> +#define UEFI_CPER_MEM_VALID_PA               0x0002
> +#define UEFI_CPER_MEM_VALID_PA_MASK          0x0004
> +#define UEFI_CPER_MEM_VALID_NODE             0x0008
> +#define UEFI_CPER_MEM_VALID_CARD             0x0010
> +#define UEFI_CPER_MEM_VALID_MODULE           0x0020
> +#define UEFI_CPER_MEM_VALID_BANK             0x0040
> +#define UEFI_CPER_MEM_VALID_DEVICE           0x0080
> +#define UEFI_CPER_MEM_VALID_ROW              0x0100
> +#define UEFI_CPER_MEM_VALID_COLUMN           0x0200
> +#define UEFI_CPER_MEM_VALID_BIT_POSITION     0x0400
> +#define UEFI_CPER_MEM_VALID_REQUESTOR        0x0800
> +#define UEFI_CPER_MEM_VALID_RESPONDER        0x1000
> +#define UEFI_CPER_MEM_VALID_TARGET           0x2000
> +#define UEFI_CPER_MEM_VALID_ERROR_TYPE       0x4000
> +#define UEFI_CPER_MEM_VALID_RANK_NUMBER      0x8000
> +#define UEFI_CPER_MEM_VALID_CARD_HANDLE      0x10000
> +#define UEFI_CPER_MEM_VALID_MODULE_HANDLE    0x20000
> +#define UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC   3
> +
> +/* This is from the ACPI 6.1 spec, "18.3.2.9 Hardware Error Notification" */
> +
> +enum AcpiHestNotifyType {
> +    ACPI_HEST_NOTIFY_POLLED = 0,
> +    ACPI_HEST_NOTIFY_EXTERNAL = 1,
> +    ACPI_HEST_NOTIFY_LOCAL = 2,
> +    ACPI_HEST_NOTIFY_SCI = 3,
> +    ACPI_HEST_NOTIFY_NMI = 4,
> +    ACPI_HEST_NOTIFY_CMCI = 5,  /* ACPI 5.0 */
> +    ACPI_HEST_NOTIFY_MCE = 6,   /* ACPI 5.0 */
> +    ACPI_HEST_NOTIFY_GPIO = 7,  /* ACPI 6.0 */
> +    ACPI_HEST_NOTIFY_SEA = 8,   /* ACPI 6.1 */
> +    ACPI_HEST_NOTIFY_SEI = 9,   /* ACPI 6.1 */
> +    ACPI_HEST_NOTIFY_GSIV = 10, /* ACPI 6.1 */
> +    ACPI_HEST_NOTIFY_RESERVED = 11  /* 11 and greater are reserved */
> +};
> +
>  /*
>   * MADT sub-structures (Follow MULTIPLE_APIC_DESCRIPTION_TABLE)
>   */
> @@ -475,6 +514,161 @@ struct AcpiSystemResourceAffinityTable
>  } QEMU_PACKED;
>  typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable;
>  
> +/* Hardware Error Notification, this is from the ACPI 6.1
> + * spec, "18.3.2.9 Hardware Error Notification"
> + */
> +struct AcpiHestNotify {
> +    uint8_t type;
> +    uint8_t length;
> +    uint16_t config_write_enable;
> +    uint32_t poll_interval;
> +    uint32_t vector;
> +    uint32_t polling_threshold_value;
> +    uint32_t polling_threshold_window;
> +    uint32_t error_threshold_value;
> +    uint32_t error_threshold_window;
> +} QEMU_PACKED;
> +typedef struct AcpiHestNotify AcpiHestNotify;
> +
> +/* These are from ACPI 6.1, sections "18.3.2.1 IA-32 Architecture Machine
> + * Check Exception" through "18.3.2.8 Generic Hardware Error Source version 2".
> + */
> +enum AcpiHestSourceType {
> +    ACPI_HEST_SOURCE_IA32_CHECK = 0,
> +    ACPI_HEST_SOURCE_IA32_CORRECTED_CHECK = 1,
> +    ACPI_HEST_SOURCE_IA32_NMI = 2,
> +    ACPI_HEST_SOURCE_AER_ROOT_PORT = 6,
> +    ACPI_HEST_SOURCE_AER_ENDPOINT = 7,
> +    ACPI_HEST_SOURCE_AER_BRIDGE = 8,
> +    ACPI_HEST_SOURCE_GENERIC_ERROR = 9,
> +    ACPI_HEST_SOURCE_GENERIC_ERROR_V2 = 10,
> +    ACPI_HEST_SOURCE_RESERVED = 11    /* 11 and greater are reserved */
> +};
> +
> +/* Block Status bitmasks from ACPI 6.1, "18.3.2.7.1 Generic Error Data" */
> +#define ACPI_GEBS_UNCORRECTABLE             (1)
> +#define ACPI_GEBS_CORRECTABLE               (1 << 1)
> +#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE    (1 << 2)
> +#define ACPI_GEBS_MULTIPLE_CORRECTABLE      (1 << 3)
> +/* 10 bits, error count */
> +#define ACPI_GEBS_ERROR_ENTRY_COUNT         (0x3FF << 4)
> +
> +/* Generic Hardware Error Source Structure, refer to ACPI 6.1
> + * "18.3.2.7 Generic Hardware Error Source". in this struct the
> + * "type" field has to be ACPI_HEST_SOURCE_GENERIC_ERROR
> + */
> +
> +struct AcpiGenericHardwareErrorSource {
> +    uint16_t type;
> +    uint16_t source_id;
> +    uint16_t related_source_id;
> +    uint8_t flags;
> +    uint8_t enabled;
> +    uint32_t number_of_records;
> +    uint32_t max_sections_per_record;
> +    uint32_t max_raw_data_length;
> +    struct AcpiGenericAddress error_status_address;
> +    struct AcpiHestNotify notify;
> +    uint32_t error_status_block_length;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericHardwareErrorSource AcpiGenericHardwareErrorSource;
> +
> +/* Generic Hardware Error Source, version 2, ACPI 6.1, "18.3.2.8 Generic
> + * Hardware Error Source version 2", in this struct the "type" field has to
> + * be ACPI_HEST_SOURCE_GENERIC_ERROR_V2
> + */
> +struct AcpiGenericHardwareErrorSourceV2 {
> +    uint16_t type;
> +    uint16_t source_id;
> +    uint16_t related_source_id;
> +    uint8_t flags;
> +    uint8_t enabled;
> +    uint32_t number_of_records;
> +    uint32_t max_sections_per_record;
> +    uint32_t max_raw_data_length;
> +    struct AcpiGenericAddress error_status_address;
> +    struct AcpiHestNotify notify;
> +    uint32_t error_status_block_length;
> +    struct AcpiGenericAddress read_ack_register;
> +    uint64_t read_ack_preserve;
> +    uint64_t read_ack_write;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericHardwareErrorSourceV2
> +            AcpiGenericHardwareErrorSourceV2;
> +
> +/* Generic Error Status block, this is from ACPI 6.1,
> + * "18.3.2.7.1 Generic Error Data"
> + */
> +struct AcpiGenericErrorStatus {
> +    /* It is a bitmask composed of ACPI_GEBS_xxx macros */
> +    uint32_t block_status;
> +    uint32_t raw_data_offset;
> +    uint32_t raw_data_length;
> +    uint32_t data_length;
> +    uint32_t error_severity;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericErrorStatus AcpiGenericErrorStatus;
> +
> +enum AcpiGenericErrorSeverity {
> +    ACPI_CPER_SEV_RECOVERABLE,
> +    ACPI_CPER_SEV_FATAL,
> +    ACPI_CPER_SEV_CORRECTED,
> +    ACPI_CPER_SEV_NONE,
> +};
> +
> +/* Generic Error Data entry, revision number is 0x0300,
> + * ACPI 6.1, "18.3.2.7.1 Generic Error Data"
> + */
> +struct AcpiGenericErrorData {
> +    QemuUUID section_type_le;
> +    /* The "error_severity" fields that they take their
> +     * values from AcpiGenericErrorSeverity
> +     */
> +    uint32_t error_severity;
> +    uint16_t revision;
> +    uint8_t validation_bits;
> +    uint8_t flags;
> +    uint32_t error_data_length;
> +    uint8_t fru_id[16];
> +    uint8_t fru_text[20];
> +    uint64_t time_stamp;
> +} QEMU_PACKED;
> +typedef struct AcpiGenericErrorData AcpiGenericErrorData;
> +
> +/* This is from UEFI 2.6, "N.2.5 Memory Error Section" */
> +struct UefiCperSecMemErr {
> +    uint64_t    validation_bits;
> +    uint64_t    error_status;
> +    uint64_t    physical_addr;
> +    uint64_t    physical_addr_mask;
> +    uint16_t    node;
> +    uint16_t    card;
> +    uint16_t    module;
> +    uint16_t    bank;
> +    uint16_t    device;
> +    uint16_t    row;
> +    uint16_t    column;
> +    uint16_t    bit_pos;
> +    uint64_t    requestor_id;
> +    uint64_t    responder_id;
> +    uint64_t    target_id;
> +    uint8_t     error_type;
> +    uint8_t     reserved;
> +    uint16_t    rank;
> +    uint16_t    mem_array_handle;   /* card handle in UEFI 2.4 */
> +    uint16_t    mem_dev_handle;     /* module handle in UEFI 2.4 */
> +} QEMU_PACKED;
> +typedef struct UefiCperSecMemErr UefiCperSecMemErr;
> +
> +/*
> + * HEST Description Table
> + */
> +struct AcpiHardwareErrorSourceTable {
> +    ACPI_TABLE_HEADER_DEF                    /* ACPI common table header */
> +    uint32_t           error_source_count;
> +} QEMU_PACKED;
> +typedef struct AcpiHardwareErrorSourceTable AcpiHardwareErrorSourceTable;
> +
>  #define ACPI_SRAT_PROCESSOR_APIC     0
>  #define ACPI_SRAT_MEMORY             1
>  #define ACPI_SRAT_PROCESSOR_x2APIC   2
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 00c21f1..c1d15b3 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -211,6 +211,7 @@ struct AcpiBuildTables {
>      GArray *rsdp;
>      GArray *tcpalog;
>      GArray *vmgenid;
> +    GArray *hardware_errors;
>      BIOSLinker *linker;
>  } AcpiBuildTables;
>  
> diff --git a/include/hw/acpi/hest_ghes.h b/include/hw/acpi/hest_ghes.h
> new file mode 100644
> index 0000000..0772756
> --- /dev/null
> +++ b/include/hw/acpi/hest_ghes.h
> @@ -0,0 +1,47 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * Authors:
> + *   Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef ACPI_GHES_H
> +#define ACPI_GHES_H
> +
> +#include "hw/acpi/bios-linker-loader.h"
> +
> +#define GHES_ERRORS_FW_CFG_FILE      "etc/hardware_errors"
> +#define GHES_DATA_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
> +
> +#define GHES_GAS_ADDRESS_OFFSET              4
> +#define GHES_ERROR_STATUS_ADDRESS_OFFSET     20
> +#define GHES_NOTIFICATION_STRUCTURE          32
> +
> +#define GHES_CPER_OK   1
> +#define GHES_CPER_FAIL 0
> +
> +#define GHES_ACPI_HEST_NOTIFY_RESERVED           11
> +/* The max size in Bytes for one error block */
> +#define GHES_MAX_RAW_DATA_LENGTH                 0x1000
> +
> +
> +typedef struct GhesState {
> +    uint64_t ghes_addr_le;
> +} GhesState;
> +
> +void ghes_build_acpi(GArray *table_data, GArray *hardware_error,
> +                            BIOSLinker *linker);
> +void ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +bool ghes_update_guest(uint32_t notify, uint64_t error_physical_addr);
> +#endif
> diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
> index afe4840..99c4041 100644
> --- a/include/qemu/uuid.h
> +++ b/include/qemu/uuid.h
> @@ -44,6 +44,17 @@ typedef struct {
>  
>  #define UUID_NONE "00000000-0000-0000-0000-000000000000"
>  
> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> +   ((b) >> 8) & 0xff, (b) & 0xff,                   \
> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> +
> +/* Platform Memory, this is from UEFI 2.6 N.2.2 Section Descriptor */
> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> +    0xED, 0x7C, 0x83, 0xB1)
> +

Seems easier to just define the array:

#define UEFI_CPER_SEC_PLATFORM_MEM { \
	0xA5, 0xBC, 0x11, 0x14, \
	0x6F, 0x64, \
	0x4E, 0xDE, \
	0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1 \
}

but looking at it, there is a single user and the name is
not readable at all.  Just open-code it where it's used
with the comment.

Same applies elsewhere.


>  void qemu_uuid_generate(QemuUUID *out);
>  
>  int qemu_uuid_is_null(const QemuUUID *uu);
> -- 
> 2.10.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-13 15:33       ` Laszlo Ersek
@ 2017-07-13 17:13         ` Michael S. Tsirkin
  2017-07-14  8:25           ` gengdongjiu
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-13 17:13 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: gengdongjiu, imammedo, famz, qemu-devel, zhaoshenglong,
	peter.maydell, qemu-arm, james.morse, zhengqiang10, huangshaoyu,
	wuquanming, Gaozhihui

On Thu, Jul 13, 2017 at 05:33:30PM +0200, Laszlo Ersek wrote:
> On 07/13/17 14:00, gengdongjiu wrote:
> > Laszlo,
> >   Thank you for your review and comments.
> > 
> > 
> > On 2017/7/13 18:33, Laszlo Ersek wrote:
> >> On 07/12/17 04:08, Dongjiu Geng wrote:
> 
> [snip]
> 
> >>> --- a/include/qemu/uuid.h
> >>> +++ b/include/qemu/uuid.h
> >>> @@ -44,6 +44,17 @@ typedef struct {
> >>>  
> >>>  #define UUID_NONE "00000000-0000-0000-0000-000000000000"
> >>>  
> >>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> >>> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> >>> +   ((b) >> 8) & 0xff, (b) & 0xff,                   \
> >>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> >>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> >>> +
> >>> +/* Platform Memory, this is from UEFI 2.6 N.2.2 Section Descriptor */
> >>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> >>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> >>> +    0xED, 0x7C, 0x83, 0xB1)
> >>> +
> >>>  void qemu_uuid_generate(QemuUUID *out);
> >>>  
> >>>  int qemu_uuid_is_null(const QemuUUID *uu);
> >>>
> >>
> >> (e) I think the addition of UUID_BE should be split out to a separate
> >> patch; it adds a general facility. It should likely be the very first
> >> patch in the series.
> >   Ok.
> > 
> >>
> >> (f) While I think it is justified to have UUID_BE() in "qemu/uuid.h", I
> >> think UEFI_CPER_SEC_PLATFORM_MEM is too specific to have here.
> >>
> >> If UEFI_CPER_SEC_PLATFORM_MEM were *not* a standardized UUID, I would
> >> suggest moving it to the implementation ("include/hw/acpi/hest_ghes.h"
> >> -- which in turn should be moved to patch #2, see my remark (d)), *plus*
> >> I would suggest eliminating the new #include from "acpi-defs.h", see my
> >> remark (b).
> >   understand your idea.
> > 
> >>
> >> However, given that this UUID *is* standard, I suggest keeping the (b)
> >> #include as you currently propose, and to move the definition of
> >> UEFI_CPER_SEC_PLATFORM_MEM to "acpi-defs.h".
> >   I agree with you.
> > 
> >>
> >> I vaguely recall that Michael commented on this previously, but I don't
> >> remember what he said. Michael, are you OK with my suggestion?
> >    Laszlo, I pasted Michael's comments here, as shown below. Michael said the definition
> > should use build_append_int_noprefix to add data. but I think it may not good, becuase
> > the section "UEFI_CPER_SEC_PLATFORM_MEM" is runtime recorded as CPER, not a ACPI/HEST
> > table member, so it is not generated when system boot up.
> 
> I agree: the UUID in question is not placed into the ACPI payload /
> fw_cfg blobs, it is written into guest memory at runtime, into the
> firmware-allocated area, if and when there is a hardware error to report.
> 
> Thanks
> Laszlo

The main point is that wrapping it up in a macro with an
unreadable name is not really helpful when it's only
used in a single place.


> > On the other hand,UEFI_CPER_SEC_PLATFORM_MEM
> > definition is from UEFI spec 2.6, N.2.2 Section Descriptor: {0xA5BC1114, 0x6F64, 0x4EDE, {0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1}}.
> > if use  build_append_int_noprefix to add, may confuse others.
> > 
> > -----------------------------------------------------------------
> > -----------------------------------------------------------------
> > There's no reason to define these messy one-time use macros.
> > They just make it hard to look things up in the spec.
> > 
> > 
> > You can use build_append_int_noprefix to add data of
> > any length in LE format.
> > -----------------------------------------------------------------
> > -----------------------------------------------------------------
> > 
> > Hi  Michael,
> >   what is your suggestion about it? do you agree with Laszlo?

My main point is that the macros do not seem helpful.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/3] ACPI: Add APEI GHES Table Generation support
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 2/3] ACPI: Add APEI GHES Table Generation support Dongjiu Geng
@ 2017-07-13 17:32   ` Michael S. Tsirkin
  2017-07-13 19:41     ` Laszlo Ersek
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-13 17:32 UTC (permalink / raw)
  To: Dongjiu Geng
  Cc: lersek, imammedo, famz, qemu-devel, zhaoshenglong, peter.maydell,
	qemu-arm, james.morse, zhengqiang10, huangshaoyu, wuquanming

On Wed, Jul 12, 2017 at 10:08:16AM +0800, Dongjiu Geng wrote:
> This implements APEI GHES Table by passing the error CPER info
> to the guest via a fw_cfg_blob. After a CPER info is recorded, an
> SEA(Synchronous External Abort)/SEI(SError Interrupt) exception
> will be injected into the guest OS.
> 
> Below is the table layout, the max number of error soure is 11,
> which is classified by notification type.
> 
> etc/acpi/tables                 etc/hardware_errors
> ================     ==========================================
>                      +-----------+
> +--------------+     | address   |         +-> +--------------+
> |    HEST      +     | registers |         |   | Error Status |
> + +------------+     | +---------+         |   | Data Block 0 |
> | | GHES0      | --> | |address0 | --------+   | +------------+
> | | GHES1      | --> | |address1 | ------+     | |  CPER      |
> | | GHES2      | --> | |address2 | ----+ |     | |  CPER      |
> | |  ....      | --> | | ....... |     | |     | |  CPER      |
> | | GHES10     | --> | |address10| -+  | |     | |  CPER      |
> +-+------------+     +-+---------+  |  | |     +-+------------+
>                                     |  | |
>                                     |  | +---> +--------------+
>                                     |  |       | Error Status |
>                                     |  |       | Data Block 1 |
>                                     |  |       | +------------+
>                                     |  |       | |  CPER      |
>                                     |  |       | |  CPER      |
>                                     |  |       +-+------------+
>                                     |  |
>                                     |  +-----> +--------------+
>                                     |          | Error Status |
>                                     |          | Data Block 2 |
>                                     |          | +------------+
>                                     |          | |  CPER      |
>                                     |          +-+------------+
>                                     |            ...........
>                                     +--------> +--------------+
>                                                | Error Status |
>                                                | Data Block 10|
>                                                | +------------+
>                                                | |  CPER      |
>                                                | |  CPER      |
>                                                | |  CPER      |
>                                                +-+------------+
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> ---
> thanks a lot Laszlo's review and comments: 
> 
> change since v4:
> 1. fix email threading in this series is incorrect issue
> 
> change since v3:
> 1. remove the unnecessary include for "hw/acpi/vmgenid.h" in 
>    hw/arm/virt-acpi-build.c
> 2. add conversion between LE and host-endian for the CPER record
> 3. handle the case that run out of the preallocated memory for the CPER record
> 4. change to use g_malloc0 instead of g_malloc 
> 5. change block_reqr_size name to block_rer_size
> 6. change QEMU coding style, that is, the operator is at the end of the line.
> 7. drop the ERROR_STATUS_ADDRESS_OFFSET and GAS_ADDRESS_OFFSET macros (from
>    the header file as well), and use the offsetof to replace it.
> 8. remove the init_aml_allocator() / free_aml_allocator(), calculate the
>    needed size, and push that many bytes directly to "table_data".
> 9. take an "OVMF header probe suppressor" into account
> 10.corrct HEST and CPER value assigment, for example, correct the source_id
>    for every error source, this identifier of source_id should be unique among
>    all error sources; 
> 11. create only one WRITE_POINTER command, for the base address of "etc/hardware_errors".
>    This should be done outside of the loop.The base addresses of the individual error
>    status data blocks should be calculated in ghes_update_guest(), based on the error
>    source / notification type
> 12.correct the commit message lists error sources / notification types 0 through 10 (count=11 in total). 
> 13.correct the size calculation for GHES_DATA_ADDR_FW_CFG_FILE 
> 14.range-checked the value of "notify" before using it as an array subscript
> ---
>  hw/acpi/aml-build.c      |   2 +
>  hw/acpi/hest_ghes.c      | 219 +++++++++++++++++++++++++++++++++++++++++++++++
>  hw/arm/virt-acpi-build.c |   6 ++
>  3 files changed, 227 insertions(+)
>  create mode 100644 hw/acpi/hest_ghes.c
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index c6f2032..802b98d 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1560,6 +1560,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
>      tables->table_data = g_array_new(false, true /* clear */, 1);
>      tables->tcpalog = g_array_new(false, true /* clear */, 1);
>      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> +    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
>      tables->linker = bios_linker_loader_init();
>  }
>  
> @@ -1570,6 +1571,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
>      g_array_free(tables->table_data, true);
>      g_array_free(tables->tcpalog, mfre);
>      g_array_free(tables->vmgenid, mfre);
> +    g_array_free(tables->hardware_errors, mfre);
>  }
>  
>  /* Build rsdt table */
> diff --git a/hw/acpi/hest_ghes.c b/hw/acpi/hest_ghes.c
> new file mode 100644
> index 0000000..c9442b6
> --- /dev/null
> +++ b/hw/acpi/hest_ghes.c
> @@ -0,0 +1,219 @@
> +/*
> + *  APEI GHES table Generation
> + *
> + *  Copyright (C) 2017 huawei.
> + *
> + *  Author: Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qmp-commands.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/hest_ghes.h"
> +#include "hw/nvram/fw_cfg.h"
> +#include "sysemu/sysemu.h"
> +
> +static int ghes_record_cper(uint64_t error_block_address,
> +                                    uint64_t error_physical_addr)
> +{
> +    AcpiGenericErrorStatus block;
> +    AcpiGenericErrorData *gdata;
> +    UefiCperSecMemErr *mem_err;
> +    uint64_t current_block_length;
> +    unsigned char *buffer;
> +    QemuUUID section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
> +
> +
> +    cpu_physical_memory_read(error_block_address, &block,
> +                                sizeof(AcpiGenericErrorStatus));
> +
> +    /* Get the current generic error status block length */
> +    current_block_length = sizeof(AcpiGenericErrorStatus) +
> +        le32_to_cpu(block.data_length);
> +
> +    /* If the Generic Error Status Block is NULL, update
> +     * the block header
> +     */
> +    if (!block.block_status) {
> +        block.block_status = ACPI_GEBS_UNCORRECTABLE;
> +        block.error_severity = ACPI_CPER_SEV_FATAL;
> +    }
> +
> +    block.data_length += cpu_to_le32(sizeof(AcpiGenericErrorData));
> +    block.data_length += cpu_to_le32(sizeof(UefiCperSecMemErr));
> +
> +    /* check whether it runs out of the preallocated memory */
> +    if ((le32_to_cpu(block.data_length) + sizeof(AcpiGenericErrorStatus)) >
> +       GHES_MAX_RAW_DATA_LENGTH) {
> +        return GHES_CPER_FAIL;
> +    }
> +    /* Write back the Generic Error Status Block to guest memory */
> +    cpu_physical_memory_write(error_block_address, &block,
> +                        sizeof(AcpiGenericErrorStatus));
> +
> +    /* Fill in Generic Error Data Entry */
> +    buffer = g_malloc0(sizeof(AcpiGenericErrorData) +
> +                       sizeof(UefiCperSecMemErr));
> +    memset(buffer, 0, sizeof(AcpiGenericErrorData) + sizeof(UefiCperSecMemErr));
> +    gdata = (AcpiGenericErrorData *)buffer;
> +
> +    qemu_uuid_bswap(&section_id_le);
> +    memcpy(&(gdata->section_type_le), &section_id_le,
> +                sizeof(QemuUUID));
> +    gdata->error_data_length = cpu_to_le32(sizeof(UefiCperSecMemErr));
> +
> +    mem_err = (UefiCperSecMemErr *) (gdata + 1);
> +
> +    /* In order to simplify simulation, hard code the CPER section to memory
> +     * section.
> +     */
> +
> +    /* Hard code to Multi-bit ECC error */
> +    mem_err->validation_bits |= cpu_to_le32(UEFI_CPER_MEM_VALID_ERROR_TYPE);
> +    mem_err->error_type = cpu_to_le32(UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC);
> +
> +    /* Record the physical address at which the memory error occurred */
> +    mem_err->validation_bits |= cpu_to_le32(UEFI_CPER_MEM_VALID_PA);
> +    mem_err->physical_addr = cpu_to_le32(error_physical_addr);
> +
> +    /* Write back the Generic Error Data Entry to guest memory */
> +    cpu_physical_memory_write(error_block_address + current_block_length,
> +        buffer, sizeof(AcpiGenericErrorData) + sizeof(UefiCperSecMemErr));
> +
> +    g_free(buffer);
> +    return GHES_CPER_OK;
> +}
> +
> +void ghes_build_acpi(GArray *table_data, GArray *hardware_error,
> +                                            BIOSLinker *linker)
> +{
> +    GArray *buffer;
> +    uint32_t address_registers_offset;
> +    AcpiHardwareErrorSourceTable *error_source_table;
> +    AcpiGenericHardwareErrorSource *error_source;
> +    int i;
> +    /*
> +     * The block_req_size stands for one address and one
> +     * generic error status block
> +      +---------+
> +      | address | --------+-> +---------+
> +      +---------+             |  CPER   |
> +                              |  CPER   |
> +                              |  CPER   |
> +                              |  CPER   |
> +                              |  ....   |
> +                              +---------+
> +     */
> +    int block_req_size = sizeof(uint64_t) + GHES_MAX_RAW_DATA_LENGTH;
> +
> +    /* The total size for address of data structure and
> +     * error status data block

Pls start multiple comments with /* by itself

> +     */
> +    g_array_set_size(hardware_error, GHES_ACPI_HEST_NOTIFY_RESERVED *
> +                                                block_req_size);
> +
> +    buffer = g_array_new(false, true /* clear */, 1);
> +    address_registers_offset = table_data->len +
> +        sizeof(AcpiHardwareErrorSourceTable) +
> +        offsetof(AcpiGenericHardwareErrorSource, error_status_address) +
> +        offsetof(struct AcpiGenericAddress, address);
> +
> +    /* Reserve space for HEST table size */
> +    acpi_data_push(buffer, sizeof(AcpiHardwareErrorSourceTable) +
> +                                GHES_ACPI_HEST_NOTIFY_RESERVED *
> +                                sizeof(AcpiGenericHardwareErrorSource));
> +
> +    g_array_append_vals(table_data, buffer->data, buffer->len);
> +    /* Allocate guest memory for the Data fw_cfg blob */
> +    bios_linker_loader_alloc(linker, GHES_ERRORS_FW_CFG_FILE, hardware_error,
> +                            4096, false /* page boundary, high memory */);

page boundary refers to 4096? Pls move it there, or put
before bios_linker_loader_alloc.

> +
> +    error_source_table = (AcpiHardwareErrorSourceTable *)(table_data->data
> +                        + table_data->len - buffer->len);
> +    error_source_table->error_source_count = GHES_ACPI_HEST_NOTIFY_RESERVED;
> +    error_source = (AcpiGenericHardwareErrorSource *)
> +        ((AcpiHardwareErrorSourceTable *)error_source_table + 1);

Concern with all these casts and pointer math is that it is barely readable.
We can easily overflow the array and get hard to debug heap corruption
errors.

What can I suggest?  Just add this to the array in steps. Then you will
not need all the math.

Or again, you can just add things as you init them using build_append_int_noprefix.


> +
> +    bios_linker_loader_write_pointer(linker, GHES_DATA_ADDR_FW_CFG_FILE,
> +        0, sizeof(uint64_t), GHES_ERRORS_FW_CFG_FILE,
> +        GHES_ACPI_HEST_NOTIFY_RESERVED * sizeof(uint64_t));

What does this uint64_t refer to? It's repeated everywhere.

> +
> +    for (i = 0; i < GHES_ACPI_HEST_NOTIFY_RESERVED; i++) {
> +        error_source->type = ACPI_HEST_SOURCE_GENERIC_ERROR;
> +        error_source->source_id = cpu_to_le16(i);
> +        error_source->related_source_id = 0xffff;
> +        error_source->flags = 0;
> +        error_source->enabled = 1;
> +        /* The number of error status block per Generic Hardware Error Source */

number of ... blocks ?

> +        error_source->number_of_records = 1;
> +        error_source->max_sections_per_record = 1;
> +        error_source->max_raw_data_length = GHES_MAX_RAW_DATA_LENGTH;
> +        error_source->error_status_address.space_id =
> +                                    AML_SYSTEM_MEMORY;
> +        error_source->error_status_address.bit_width = 64;
> +        error_source->error_status_address.bit_offset = 0;
> +        error_source->error_status_address.access_width = 4;
> +        error_source->notify.type = i;
> +        error_source->notify.length = sizeof(AcpiHestNotify);
> +
> +        error_source->error_status_block_length = GHES_MAX_RAW_DATA_LENGTH;
> +
> +        bios_linker_loader_add_pointer(linker,
> +            ACPI_BUILD_TABLE_FILE, address_registers_offset + i *
> +            sizeof(AcpiGenericHardwareErrorSource), sizeof(uint64_t),
> +            GHES_ERRORS_FW_CFG_FILE, i * sizeof(uint64_t));

and what's this math for?

> +
> +        error_source++;
> +    }
> +
> +    for (i = 0; i < GHES_ACPI_HEST_NOTIFY_RESERVED; i++) {
> +        bios_linker_loader_add_pointer(linker,
> +            GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i, sizeof(uint64_t),
> +            GHES_ERRORS_FW_CFG_FILE, GHES_ACPI_HEST_NOTIFY_RESERVED *
> +            sizeof(uint64_t) + i * GHES_MAX_RAW_DATA_LENGTH);
> +    }
> +
> +    build_header(linker, table_data,
> +        (void *)error_source_table, "HEST", buffer->len, 1, NULL, "GHES");

Pls indent so continuation lines are ro the right of (.

> +
> +    g_array_free(buffer, true);
> +}
> +
> +static GhesState ges;
> +void ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> +{
> +
> +    size_t request_block_size = sizeof(uint64_t) + GHES_MAX_RAW_DATA_LENGTH;
> +    size_t size = GHES_ACPI_HEST_NOTIFY_RESERVED * request_block_size;
> +
> +    /* Create a read-only fw_cfg file for GHES */
> +    fw_cfg_add_file(s, GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> +                    size);
> +    /* Create a read-write fw_cfg file for Address */
> +    fw_cfg_add_file_callback(s, GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> +        &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> +}
> +
> +bool ghes_update_guest(uint32_t notify, uint64_t physical_address)
> +{
> +    uint64_t error_block_addr;
> +
> +    if (physical_address && notify < GHES_ACPI_HEST_NOTIFY_RESERVED) {
> +        error_block_addr = ges.ghes_addr_le + notify * GHES_MAX_RAW_DATA_LENGTH;
> +        error_block_addr = le32_to_cpu(error_block_addr);
> +
> +        /* A zero value in ghes_addr means that BIOS has not yet written
> +         * the address
> +         */
> +        if (error_block_addr) {
> +            return ghes_record_cper(error_block_addr, physical_address);
> +        }
> +    }
> +
> +    return GHES_CPER_FAIL;
> +}
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 0835e59..5c97016 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -45,6 +45,7 @@
>  #include "hw/arm/virt.h"
>  #include "sysemu/numa.h"
>  #include "kvm_arm.h"
> +#include "hw/acpi/hest_ghes.h"
>  
>  #define ARM_SPI_BASE 32
>  #define ACPI_POWER_BUTTON_DEVICE "PWRB"
> @@ -778,6 +779,9 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>      acpi_add_table(table_offsets, tables_blob);
>      build_spcr(tables_blob, tables->linker, vms);
>  
> +    acpi_add_table(table_offsets, tables_blob);
> +    ghes_build_acpi(tables_blob, tables->hardware_errors, tables->linker);
> +
>      if (nb_numa_nodes > 0) {
>          acpi_add_table(table_offsets, tables_blob);
>          build_srat(tables_blob, tables->linker, vms);
> @@ -890,6 +894,8 @@ void virt_acpi_setup(VirtMachineState *vms)
>      fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
>                      acpi_data_len(tables.tcpalog));
>  
> +    ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
> +
>      build_state->rsdp_mr = acpi_add_rom_blob(build_state, tables.rsdp,
>                                                ACPI_BUILD_RSDP_FILE, 0);
>  
> -- 
> 2.10.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-13 12:00     ` gengdongjiu
  2017-07-13 15:33       ` Laszlo Ersek
@ 2017-07-13 17:35       ` Michael S. Tsirkin
  1 sibling, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-13 17:35 UTC (permalink / raw)
  To: gengdongjiu
  Cc: Laszlo Ersek, imammedo, famz, qemu-devel, zhaoshenglong,
	peter.maydell, qemu-arm, james.morse, zhengqiang10, huangshaoyu,
	wuquanming, Gaozhihui

On Thu, Jul 13, 2017 at 08:00:26PM +0800, gengdongjiu wrote:
>    Laszlo, I pasted Michael's comments here, as shown below. Michael said the definition
> should use build_append_int_noprefix to add data. but I think it may not good, becuase
> the section "UEFI_CPER_SEC_PLATFORM_MEM" is runtime recorded as CPER,

One thing I wanted to understand is how are races avoided. E.g. what if
you are in the process of writing out CPER and guest reads it.
I tried to find where is this function writing CPER called and couldn't.
Can you clarify pls?

> not a ACPI/HEST
> table member, so it is not generated when system boot up. On the other hand,UEFI_CPER_SEC_PLATFORM_MEM
> definition is from UEFI spec 2.6, N.2.2 Section Descriptor: {0xA5BC1114, 0x6F64, 0x4EDE, {0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1}}.
> if use  build_append_int_noprefix to add, may confuse others.

Right but ACPI/HEST generation could use cleanup, and build_append_int_noprefix
would be a nice way to do it.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/3] Generate APEI GHES table and dynamically record CPER
  2017-07-12  2:08 [Qemu-devel] [PATCH v5 0/3] Generate APEI GHES table and dynamically record CPER Dongjiu Geng
                   ` (2 preceding siblings ...)
  2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 3/3] ACPI: build and enable APEI GHES in the Makefile and configuration Dongjiu Geng
@ 2017-07-13 17:36 ` Michael S. Tsirkin
  3 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-13 17:36 UTC (permalink / raw)
  To: Dongjiu Geng
  Cc: lersek, imammedo, famz, qemu-devel, zhaoshenglong, peter.maydell,
	qemu-arm, james.morse, zhengqiang10, huangshaoyu, wuquanming

On Wed, Jul 12, 2017 at 10:08:14AM +0800, Dongjiu Geng wrote:
> In the armv8 platform, the mainly hardware error source are ARMv8
> SEA/SEI/GSIV. For the ARMv8 SEA/SEI, the KVM or host kernel will signal SIGBUS
> or use other interface to notify user space, such as Qemu. After Qemu gets
> the notification, it will record the CPER and inject the SEA/SEI to KVM. this
> series of patches will generate APEI table when guest OS boot up, and dynamically
> record CPER for the guest OS about the generic hardware errors, currently the
> userspace only handle the memory section hardware errors

This is in a decent shape. I would prefer some cod cleanups
before this goes in. Mostly I would like the amount of
pointer math trickery to go down, drop a bunch of
unused code and fix up comments that point to ACPI spec.

I pointed out some instances but pls find all instances and try
to clean up some more.

Thanks!

> how to test:
> 1. In the guest OS, use this command to dump the APEI table: 
> 	"iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST"
> 2. And find the address for the generic error status block
>    according to the notification type
> 3. then find the CPER record through the generic error status block.
> 
> For example(notification type is SEA):
> 
> (1) root@genericarmv8:~# iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST
> (2) root@genericarmv8:~# cat HEST.dsl
> 	/*
> 	 * Intel ACPI Component Architecture
> 	 * AML/ASL+ Disassembler version 20170303 (64-bit version)
> 	 * Copyright (c) 2000 - 2017 Intel Corporation
> 	 *
> 	 * Disassembly of /sys/firmware/acpi/tables/HEST, Mon Dec 12 07:19:43 2016
> 	 *
> 	 * ACPI Data Table [HEST]
> 	 *
> 	 * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
> 	 */
>     ..................................................................................
>     [228h 0552   2]                Subtable Type : 0009 [Generic Hardware Error Source]
> 	[22Ah 0554   2]                    Source Id : 0008
> 	[22Ch 0556   2]            Related Source Id : FFFF
> 	[22Eh 0558   1]                     Reserved : 00
> 	[22Fh 0559   1]                      Enabled : 01
> 	[230h 0560   4]       Records To Preallocate : 00000001
> 	[234h 0564   4]      Max Sections Per Record : 00000001
> 	[238h 0568   4]          Max Raw Data Length : 00001000
> 
> 	[23Ch 0572  12]         Error Status Address : [Generic Address Structure]
> 	[23Ch 0572   1]                     Space ID : 00 [SystemMemory]
> 	[23Dh 0573   1]                    Bit Width : 40
> 	[23Eh 0574   1]                   Bit Offset : 00
> 	[23Fh 0575   1]         Encoded Access Width : 04 [QWord Access:64]
> 	[240h 0576   8]                      Address : 00000000785D0040
> 
> 	[248h 0584  28]                       Notify : [Hardware Error Notification Structure]
> 	[248h 0584   1]                  Notify Type : 08 [SEA]
> 	[249h 0585   1]                Notify Length : 1C
> 	[24Ah 0586   2]   Configuration Write Enable : 0000
> 	[24Ch 0588   4]                 PollInterval : 00000000
> 	[250h 0592   4]                       Vector : 00000000
> 	[254h 0596   4]      Polling Threshold Value : 00000000
> 	[258h 0600   4]     Polling Threshold Window : 00000000
> 	[25Ch 0604   4]        Error Threshold Value : 00000000
> 	[260h 0608   4]       Error Threshold Window : 00000000
> 
> 	[264h 0612   4]    Error Status Block Length : 00001000
>     .....................................................................................
> (3) according to above table, the address that contains the physical address of a block
>     of memory that holds the error status data for SEA notification error source is 0x00000000785D0040
> (4) the address for SEA notification error source is 0x785d8058
> 	(qemu) xp /2x 0x00000000785D0040
> 	00000000785d0040: 0x785d8058 0x00000000
> (5) check the content of generic error status block and generic error data entry
>     (qemu) xp /100x 0x785d8058
>     00000000785d8058: 0x00000001 0x00000000 0x00000000 0x00000098
>     00000000785d8068: 0x00000001 0xa5bc1114 0x4ede6f64 0x833e63b8
> 	00000000785d8078: 0xb1837ced 0x00000000 0x00000000 0x00000050
> 	00000000785d8088: 0x00000000 0x00000000 0x00000000 0x00000000
> 	00000000785d8098: 0x00000000 0x00000000 0x00000000 0x00000000
> 	00000000785d80a8: 0x00000000 0x00000000 0x00000000 0x00004002
> 	00000000785d80b8: 0x00000000 0x00000000 0x00000000 0x00001111
> 	00000000785d80c8: 0x00000000 0x00000000 0x00000000 0x00000000
> 	00000000785d80d8: 0x00000000 0x00000000 0x00000000 0x00000000
> 	00000000785d80e8: 0x00000000 0x00000000 0x00000000 0x00000000
> 	00000000785d80f8: 0x00000000 0x00000003 0x00000000 0x00000000
> 	00000000785d8108: 0x00000000 0x00000000 0x00000000 0x00000000
> 	00000000785d8118: 0x00000000 0x00000000 0x00000000 0x00000000
> 	00000000785d8128: 0x00000000 0x00000000 0x00000000 0x00000000
> 	00000000785d8138: 0x00000000 0x00000000 0x00000000 0x00000000
> 
> Dongjiu Geng (3):
>   ACPI: Add new ACPI structures and macros
>   ACPI: Add APEI GHES Table Generation support
>   ACPI: build and enable APEI GHES in the Makefile and configuration
> 
>  default-configs/arm-softmmu.mak |   1 +
>  hw/acpi/Makefile.objs           |   1 +
>  hw/acpi/aml-build.c             |   2 +
>  hw/acpi/hest_ghes.c             | 219 ++++++++++++++++++++++++++++++++++++++++
>  hw/arm/virt-acpi-build.c        |   6 ++
>  include/hw/acpi/acpi-defs.h     | 194 +++++++++++++++++++++++++++++++++++
>  include/hw/acpi/aml-build.h     |   1 +
>  include/hw/acpi/hest_ghes.h     |  47 +++++++++
>  include/qemu/uuid.h             |  11 ++
>  9 files changed, 482 insertions(+)
>  create mode 100644 hw/acpi/hest_ghes.c
>  create mode 100644 include/hw/acpi/hest_ghes.h
> 
> -- 
> 2.10.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/3] ACPI: Add APEI GHES Table Generation support
  2017-07-13 17:32   ` Michael S. Tsirkin
@ 2017-07-13 19:41     ` Laszlo Ersek
  2017-07-13 22:31       ` Michael S. Tsirkin
  2017-07-17  4:43       ` gengdongjiu
  0 siblings, 2 replies; 19+ messages in thread
From: Laszlo Ersek @ 2017-07-13 19:41 UTC (permalink / raw)
  To: Michael S. Tsirkin, Dongjiu Geng
  Cc: imammedo, famz, qemu-devel, zhaoshenglong, peter.maydell,
	qemu-arm, james.morse, zhengqiang10, huangshaoyu, wuquanming

On 07/13/17 19:32, Michael S. Tsirkin wrote:
> On Wed, Jul 12, 2017 at 10:08:16AM +0800, Dongjiu Geng wrote:

snip

>> etc/acpi/tables                 etc/hardware_errors
>> ================     ==========================================
>>                      +-----------+
>> +--------------+     | address   |         +-> +--------------+
>> |    HEST      +     | registers |         |   | Error Status |
>> + +------------+     | +---------+         |   | Data Block 0 |
>> | | GHES0      | --> | |address0 | --------+   | +------------+
>> | | GHES1      | --> | |address1 | ------+     | |  CPER      |
>> | | GHES2      | --> | |address2 | ----+ |     | |  CPER      |
>> | |  ....      | --> | | ....... |     | |     | |  CPER      |
>> | | GHES10     | --> | |address10| -+  | |     | |  CPER      |
>> +-+------------+     +-+---------+  |  | |     +-+------------+
>>                                     |  | |
>>                                     |  | +---> +--------------+
>>                                     |  |       | Error Status |
>>                                     |  |       | Data Block 1 |
>>                                     |  |       | +------------+
>>                                     |  |       | |  CPER      |
>>                                     |  |       | |  CPER      |
>>                                     |  |       +-+------------+
>>                                     |  |
>>                                     |  +-----> +--------------+
>>                                     |          | Error Status |
>>                                     |          | Data Block 2 |
>>                                     |          | +------------+
>>                                     |          | |  CPER      |
>>                                     |          +-+------------+
>>                                     |            ...........
>>                                     +--------> +--------------+
>>                                                | Error Status |
>>                                                | Data Block 10|
>>                                                | +------------+
>>                                                | |  CPER      |
>>                                                | |  CPER      |
>>                                                | |  CPER      |
>>                                                +-+------------+

snip

>> +
>> +    error_source_table = (AcpiHardwareErrorSourceTable *)(table_data->data
>> +                        + table_data->len - buffer->len);
>> +    error_source_table->error_source_count = GHES_ACPI_HEST_NOTIFY_RESERVED;
>> +    error_source = (AcpiGenericHardwareErrorSource *)
>> +        ((AcpiHardwareErrorSourceTable *)error_source_table + 1);
> 
> Concern with all these casts and pointer math is that it is barely readable.
> We can easily overflow the array and get hard to debug heap corruption
> errors.
> 
> What can I suggest?  Just add this to the array in steps. Then you will
> not need all the math.
> 
> Or again, you can just add things as you init them using build_append_int_noprefix.
> 
> 
>> +
>> +    bios_linker_loader_write_pointer(linker, GHES_DATA_ADDR_FW_CFG_FILE,
>> +        0, sizeof(uint64_t), GHES_ERRORS_FW_CFG_FILE,
>> +        GHES_ACPI_HEST_NOTIFY_RESERVED * sizeof(uint64_t));
> 
> What does this uint64_t refer to? It's repeated everywhere.
> 
>> +
>> +    for (i = 0; i < GHES_ACPI_HEST_NOTIFY_RESERVED; i++) {
>> +        error_source->type = ACPI_HEST_SOURCE_GENERIC_ERROR;
>> +        error_source->source_id = cpu_to_le16(i);
>> +        error_source->related_source_id = 0xffff;
>> +        error_source->flags = 0;
>> +        error_source->enabled = 1;
>> +        /* The number of error status block per Generic Hardware Error Source */
> 
> number of ... blocks ?
> 
>> +        error_source->number_of_records = 1;
>> +        error_source->max_sections_per_record = 1;
>> +        error_source->max_raw_data_length = GHES_MAX_RAW_DATA_LENGTH;
>> +        error_source->error_status_address.space_id =
>> +                                    AML_SYSTEM_MEMORY;
>> +        error_source->error_status_address.bit_width = 64;
>> +        error_source->error_status_address.bit_offset = 0;
>> +        error_source->error_status_address.access_width = 4;
>> +        error_source->notify.type = i;
>> +        error_source->notify.length = sizeof(AcpiHestNotify);
>> +
>> +        error_source->error_status_block_length = GHES_MAX_RAW_DATA_LENGTH;
>> +
>> +        bios_linker_loader_add_pointer(linker,
>> +            ACPI_BUILD_TABLE_FILE, address_registers_offset + i *
>> +            sizeof(AcpiGenericHardwareErrorSource), sizeof(uint64_t),
>> +            GHES_ERRORS_FW_CFG_FILE, i * sizeof(uint64_t));
> 
> and what's this math for?
> 
>> +
>> +        error_source++;
>> +    }
>> +
>> +    for (i = 0; i < GHES_ACPI_HEST_NOTIFY_RESERVED; i++) {
>> +        bios_linker_loader_add_pointer(linker,
>> +            GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i, sizeof(uint64_t),
>> +            GHES_ERRORS_FW_CFG_FILE, GHES_ACPI_HEST_NOTIFY_RESERVED *
>> +            sizeof(uint64_t) + i * GHES_MAX_RAW_DATA_LENGTH);
>> +    }

So basically all this math exists to set up the pointers that are shown
in the diagram in the commit message. It is a bit tricky because most of
those pointer fields (all 8-bytes wide) are individually embedded into
their own containing structures. In the previous version of this patch
set, I painstakingly verified the math, and pointed out wherever I
thought updates were necessary.

I agree the math is hard to read, the code is very "dense". My
suggestion (supporting yours) would be to calculate the fw_cfg blob
offsets that should be patched in more fine-grained steps, possibly with
multiple separate increments, using:
- structure type names,
- sizeof operators,
- offsetof macros,
- and possibly a separate comment for each offset increment.

Thanks,
Laszlo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/3] ACPI: Add APEI GHES Table Generation support
  2017-07-13 19:41     ` Laszlo Ersek
@ 2017-07-13 22:31       ` Michael S. Tsirkin
  2017-07-17  4:43       ` gengdongjiu
  1 sibling, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2017-07-13 22:31 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Dongjiu Geng, imammedo, famz, qemu-devel, zhaoshenglong,
	peter.maydell, qemu-arm, james.morse, zhengqiang10, huangshaoyu,
	wuquanming

On Thu, Jul 13, 2017 at 09:41:03PM +0200, Laszlo Ersek wrote:
> >> +
> >> +        error_source++;
> >> +    }
> >> +
> >> +    for (i = 0; i < GHES_ACPI_HEST_NOTIFY_RESERVED; i++) {
> >> +        bios_linker_loader_add_pointer(linker,
> >> +            GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i, sizeof(uint64_t),
> >> +            GHES_ERRORS_FW_CFG_FILE, GHES_ACPI_HEST_NOTIFY_RESERVED *
> >> +            sizeof(uint64_t) + i * GHES_MAX_RAW_DATA_LENGTH);
> >> +    }
> 
> So basically all this math exists to set up the pointers that are shown
> in the diagram in the commit message. It is a bit tricky because most of
> those pointer fields (all 8-bytes wide) are individually embedded into
> their own containing structures. In the previous version of this patch
> set, I painstakingly verified the math, and pointed out wherever I
> thought updates were necessary.
> 
> I agree the math is hard to read, the code is very "dense". My
> suggestion (supporting yours) would be to calculate the fw_cfg blob
> offsets that should be patched in more fine-grained steps, possibly with
> multiple separate increments, using:
> - structure type names,
> - sizeof operators,
> - offsetof macros,
> - and possibly a separate comment for each offset increment.
> 
> Thanks,
> Laszlo

Right. That's not what rest of ACPI does though. What we do there
is one of:

1. Use GArray to gradually build up the structure.

Struct *s = acpi_data_push(table, sizeof (*s));
s->foo = bar;

2. Even simpler: use build_append_int_noprefix:

build_append_int_noprefix(table, 2, bar); /* Foo field */

this removes the need to declare Struct in a header,
just add comments to match the spec exactly.

We might gradually move more code to (2) but (1) is an
OK style too, it makes it obvious the sizes are fine.


-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-13 17:01   ` Michael S. Tsirkin
@ 2017-07-14  5:51     ` gengdongjiu
  0 siblings, 0 replies; 19+ messages in thread
From: gengdongjiu @ 2017-07-14  5:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: lersek, imammedo, famz, qemu-devel, zhaoshenglong, peter.maydell,
	qemu-arm, james.morse, zhengqiang10, huangshaoyu, wuquanming

Michael,
  Thanks for your review.


On 2017/7/14 1:01, Michael S. Tsirkin wrote:
>> +typedef struct GhesState {
>> +    uint64_t ghes_addr_le;
>> +} GhesState;
>> +
>> +void ghes_build_acpi(GArray *table_data, GArray *hardware_error,
>> +                            BIOSLinker *linker);
>> +void ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
>> +bool ghes_update_guest(uint32_t notify, uint64_t error_physical_addr);
>> +#endif
> It's pretty unusual to add the function declaration but not the
> definition.
  you are right. I will move this declaration file to another patch.
  thanks. Laszlo also ever mentioned that.

> 
>> diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
>> index afe4840..99c4041 100644
>> --- a/include/qemu/uuid.h
>> +++ b/include/qemu/uuid.h
>> @@ -44,6 +44,17 @@ typedef struct {
>>  
>>  #define UUID_NONE "00000000-0000-0000-0000-000000000000"
>>  
>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>> +   ((b) >> 8) & 0xff, (b) & 0xff,                   \
>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>> +
>> +/* Platform Memory, this is from UEFI 2.6 N.2.2 Section Descriptor */
> Drop "this is" from the sentence.
  OK.

> 
>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>> +    0xED, 0x7C, 0x83, 0xB1)
>> +
>>  void qemu_uuid_generate(QemuUUID *out);
>>  
>>  int qemu_uuid_is_null(const QemuUUID *uu);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-13 17:11   ` Michael S. Tsirkin
@ 2017-07-14  8:22     ` gengdongjiu
  0 siblings, 0 replies; 19+ messages in thread
From: gengdongjiu @ 2017-07-14  8:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: lersek, imammedo, famz, qemu-devel, zhaoshenglong, peter.maydell,
	qemu-arm, james.morse, zhengqiang10, huangshaoyu, wuquanming

Michael,
  thanks for the review and comments.

On 2017/7/14 1:11, Michael S. Tsirkin wrote:
> On Wed, Jul 12, 2017 at 10:08:15AM +0800, Dongjiu Geng wrote:
>> (1) Add related APEI/HEST table structures and  macros, these
>>     definition refer to ACPI 6.1 and UEFI 2.6 spec.
>> (2) Add generic error status block and CPER memory section
>>     definition, user space only handle memory section errors.
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> ---
>> thanks Laszlo and Michael's review:
>>
>> chnage since v4:
>> (1) fix email threading in this series is incorrect issue
>>
>> change since v3:
>> (1) separate the original one patch into three patches: one is new
>>     ACPI structures and macros, another is C source file to generate ACPI HEST
>>     table and dynamically record CPER ,final patch is the change about Makefile
>>     and configuration
>> (2) add comments about where the ACPI structures and macros come from, for example,
>>     they come from the UEFI Spec 2.6, "xxxxxxxxxxxx"; ACPI 6.1 spec, 
>>     "xxxxxxxxxxxxxx".
>> (3) correct the macros name, for emaple, prefix some macro names with
>>     "UEFI_".
>> (4) remove the uuid_le struct and use the QemuUUID in the include/qemu/uuid.h"
>> (5) remove the duplicate ACPI address space, because it already defined in 
>>     the "include/hw/acpi/aml-build.h"
>> (6) remove the acpi_generic_address structure because same definition exists in the 
>>     AcpiGenericAddress.
>> (7) rename the struct acpi_hest_notify to AcpiHestNotifyType
>> (8) rename the struct acpi_hest_types to AcpiHestSourceType
>> (9) rename enum constants AcpiHestSourceType to ACPI_HEST_SOURCE_xxx from
>>      ACPI_HEST_TYPE_xxx
>> (10) remove the NOT_USED{3,4,5} enum constants in the AcpiHestSourceType.
>> (11) add missed QEMU_PACKED for the struct definition.
>> (12) remove the defnition of AcpiGenericErrorData, and rename the
>>      AcpiGenericErrorDataV300 to AcpiGenericErrorData.
>> (13) use the QemuUUID type for the "section_type" field AcpiGenericErrorData, and rename it
>>      to section_type_le.
>> (14) moving type AcpiGenericErrorSeverity above AcpiGenericErrorData and
>>      AcpiGenericErrorDataV300, and remarking on the "error_severity" fields
>>      that they take their values from AcpiGenericErrorSeverity
>> (15) remove the wrongly call to BERT(Boot Error Record Table)
>> (16) add comments for the struction member, for example, pint out that
>>      the block_status member in the AcpiGenericErrorStatus is a bitmask
>>      composed of ACPI_GEBS_xxx macros
>> (17) remove the hardware error source notification type list, and rename
>>      the MAX_ERROR_SOURCE_COUNT_V6 to ACPI_HEST_NOTIFY_RESERVED.
>> (18) remove the physical_addr member of GhesErrorState
>> (19) change the "uint64_t ghes_addr_le[8]" in GhesErrorState to uint64_t ghes_addr_le
>> (20) change the second parameter to "error_physical_addr" in the ghes_update_guest
>>      API statement
>> ---
>>  include/hw/acpi/acpi-defs.h | 194 ++++++++++++++++++++++++++++++++++++++++++++
>>  include/hw/acpi/aml-build.h |   1 +
>>  include/hw/acpi/hest_ghes.h |  47 +++++++++++
>>  include/qemu/uuid.h         |  11 +++
>>  4 files changed, 253 insertions(+)
>>  create mode 100644 include/hw/acpi/hest_ghes.h
>>
>> diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
>> index 4cc3630..0756339 100644
>> --- a/include/hw/acpi/acpi-defs.h
>> +++ b/include/hw/acpi/acpi-defs.h
>> @@ -15,6 +15,7 @@
>>  #ifndef QEMU_ACPI_DEFS_H
>>  #define QEMU_ACPI_DEFS_H
>>  
>> +#include "qemu/uuid.h"
>>  enum {
>>      ACPI_FADT_F_WBINVD,
>>      ACPI_FADT_F_WBINVD_FLUSH,
>> @@ -295,6 +296,44 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable;
>>  #define ACPI_APIC_GENERIC_TRANSLATOR    15
>>  #define ACPI_APIC_RESERVED              16   /* 16 and greater are reserved */
>>  
>> +/* UEFI Spec 2.6, "N.2.5 Memory Error Section */
>> +#define UEFI_CPER_MEM_VALID_ERROR_STATUS     0x0001
>> +#define UEFI_CPER_MEM_VALID_PA               0x0002
>> +#define UEFI_CPER_MEM_VALID_PA_MASK          0x0004
>> +#define UEFI_CPER_MEM_VALID_NODE             0x0008
>> +#define UEFI_CPER_MEM_VALID_CARD             0x0010
>> +#define UEFI_CPER_MEM_VALID_MODULE           0x0020
>> +#define UEFI_CPER_MEM_VALID_BANK             0x0040
>> +#define UEFI_CPER_MEM_VALID_DEVICE           0x0080
>> +#define UEFI_CPER_MEM_VALID_ROW              0x0100
>> +#define UEFI_CPER_MEM_VALID_COLUMN           0x0200
>> +#define UEFI_CPER_MEM_VALID_BIT_POSITION     0x0400
>> +#define UEFI_CPER_MEM_VALID_REQUESTOR        0x0800
>> +#define UEFI_CPER_MEM_VALID_RESPONDER        0x1000
>> +#define UEFI_CPER_MEM_VALID_TARGET           0x2000
>> +#define UEFI_CPER_MEM_VALID_ERROR_TYPE       0x4000
>> +#define UEFI_CPER_MEM_VALID_RANK_NUMBER      0x8000
>> +#define UEFI_CPER_MEM_VALID_CARD_HANDLE      0x10000
>> +#define UEFI_CPER_MEM_VALID_MODULE_HANDLE    0x20000
>> +#define UEFI_CPER_MEM_ERROR_TYPE_MULTI_ECC   3
>> +
>> +/* This is from the ACPI 6.1 spec, "18.3.2.9 Hardware Error Notification" */
>> +
>> +enum AcpiHestNotifyType {
>> +    ACPI_HEST_NOTIFY_POLLED = 0,
>> +    ACPI_HEST_NOTIFY_EXTERNAL = 1,
>> +    ACPI_HEST_NOTIFY_LOCAL = 2,
>> +    ACPI_HEST_NOTIFY_SCI = 3,
>> +    ACPI_HEST_NOTIFY_NMI = 4,
>> +    ACPI_HEST_NOTIFY_CMCI = 5,  /* ACPI 5.0 */
>> +    ACPI_HEST_NOTIFY_MCE = 6,   /* ACPI 5.0 */
>> +    ACPI_HEST_NOTIFY_GPIO = 7,  /* ACPI 6.0 */
>> +    ACPI_HEST_NOTIFY_SEA = 8,   /* ACPI 6.1 */
>> +    ACPI_HEST_NOTIFY_SEI = 9,   /* ACPI 6.1 */
>> +    ACPI_HEST_NOTIFY_GSIV = 10, /* ACPI 6.1 */
>> +    ACPI_HEST_NOTIFY_RESERVED = 11  /* 11 and greater are reserved */
>> +};
>> +
>>  /*
>>   * MADT sub-structures (Follow MULTIPLE_APIC_DESCRIPTION_TABLE)
>>   */
>> @@ -475,6 +514,161 @@ struct AcpiSystemResourceAffinityTable
>>  } QEMU_PACKED;
>>  typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable;
>>  
>> +/* Hardware Error Notification, this is from the ACPI 6.1
>> + * spec, "18.3.2.9 Hardware Error Notification"
>> + */
>> +struct AcpiHestNotify {
>> +    uint8_t type;
>> +    uint8_t length;
>> +    uint16_t config_write_enable;
>> +    uint32_t poll_interval;
>> +    uint32_t vector;
>> +    uint32_t polling_threshold_value;
>> +    uint32_t polling_threshold_window;
>> +    uint32_t error_threshold_value;
>> +    uint32_t error_threshold_window;
>> +} QEMU_PACKED;
>> +typedef struct AcpiHestNotify AcpiHestNotify;
>> +
>> +/* These are from ACPI 6.1, sections "18.3.2.1 IA-32 Architecture Machine
>> + * Check Exception" through "18.3.2.8 Generic Hardware Error Source version 2".
>> + */
>> +enum AcpiHestSourceType {
>> +    ACPI_HEST_SOURCE_IA32_CHECK = 0,
>> +    ACPI_HEST_SOURCE_IA32_CORRECTED_CHECK = 1,
>> +    ACPI_HEST_SOURCE_IA32_NMI = 2,
>> +    ACPI_HEST_SOURCE_AER_ROOT_PORT = 6,
>> +    ACPI_HEST_SOURCE_AER_ENDPOINT = 7,
>> +    ACPI_HEST_SOURCE_AER_BRIDGE = 8,
>> +    ACPI_HEST_SOURCE_GENERIC_ERROR = 9,
>> +    ACPI_HEST_SOURCE_GENERIC_ERROR_V2 = 10,
>> +    ACPI_HEST_SOURCE_RESERVED = 11    /* 11 and greater are reserved */
>> +};
>> +
>> +/* Block Status bitmasks from ACPI 6.1, "18.3.2.7.1 Generic Error Data" */
>> +#define ACPI_GEBS_UNCORRECTABLE             (1)
>> +#define ACPI_GEBS_CORRECTABLE               (1 << 1)
>> +#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE    (1 << 2)
>> +#define ACPI_GEBS_MULTIPLE_CORRECTABLE      (1 << 3)
>> +/* 10 bits, error count */
>> +#define ACPI_GEBS_ERROR_ENTRY_COUNT         (0x3FF << 4)
>> +
>> +/* Generic Hardware Error Source Structure, refer to ACPI 6.1
>> + * "18.3.2.7 Generic Hardware Error Source". in this struct the
>> + * "type" field has to be ACPI_HEST_SOURCE_GENERIC_ERROR
>> + */
>> +
>> +struct AcpiGenericHardwareErrorSource {
>> +    uint16_t type;
>> +    uint16_t source_id;
>> +    uint16_t related_source_id;
>> +    uint8_t flags;
>> +    uint8_t enabled;
>> +    uint32_t number_of_records;
>> +    uint32_t max_sections_per_record;
>> +    uint32_t max_raw_data_length;
>> +    struct AcpiGenericAddress error_status_address;
>> +    struct AcpiHestNotify notify;
>> +    uint32_t error_status_block_length;
>> +} QEMU_PACKED;
>> +typedef struct AcpiGenericHardwareErrorSource AcpiGenericHardwareErrorSource;
>> +
>> +/* Generic Hardware Error Source, version 2, ACPI 6.1, "18.3.2.8 Generic
>> + * Hardware Error Source version 2", in this struct the "type" field has to
>> + * be ACPI_HEST_SOURCE_GENERIC_ERROR_V2
>> + */
>> +struct AcpiGenericHardwareErrorSourceV2 {
>> +    uint16_t type;
>> +    uint16_t source_id;
>> +    uint16_t related_source_id;
>> +    uint8_t flags;
>> +    uint8_t enabled;
>> +    uint32_t number_of_records;
>> +    uint32_t max_sections_per_record;
>> +    uint32_t max_raw_data_length;
>> +    struct AcpiGenericAddress error_status_address;
>> +    struct AcpiHestNotify notify;
>> +    uint32_t error_status_block_length;
>> +    struct AcpiGenericAddress read_ack_register;
>> +    uint64_t read_ack_preserve;
>> +    uint64_t read_ack_write;
>> +} QEMU_PACKED;
>> +typedef struct AcpiGenericHardwareErrorSourceV2
>> +            AcpiGenericHardwareErrorSourceV2;
>> +
>> +/* Generic Error Status block, this is from ACPI 6.1,
>> + * "18.3.2.7.1 Generic Error Data"
>> + */
>> +struct AcpiGenericErrorStatus {
>> +    /* It is a bitmask composed of ACPI_GEBS_xxx macros */
>> +    uint32_t block_status;
>> +    uint32_t raw_data_offset;
>> +    uint32_t raw_data_length;
>> +    uint32_t data_length;
>> +    uint32_t error_severity;
>> +} QEMU_PACKED;
>> +typedef struct AcpiGenericErrorStatus AcpiGenericErrorStatus;
>> +
>> +enum AcpiGenericErrorSeverity {
>> +    ACPI_CPER_SEV_RECOVERABLE,
>> +    ACPI_CPER_SEV_FATAL,
>> +    ACPI_CPER_SEV_CORRECTED,
>> +    ACPI_CPER_SEV_NONE,
>> +};
>> +
>> +/* Generic Error Data entry, revision number is 0x0300,
>> + * ACPI 6.1, "18.3.2.7.1 Generic Error Data"
>> + */
>> +struct AcpiGenericErrorData {
>> +    QemuUUID section_type_le;
>> +    /* The "error_severity" fields that they take their
>> +     * values from AcpiGenericErrorSeverity
>> +     */
>> +    uint32_t error_severity;
>> +    uint16_t revision;
>> +    uint8_t validation_bits;
>> +    uint8_t flags;
>> +    uint32_t error_data_length;
>> +    uint8_t fru_id[16];
>> +    uint8_t fru_text[20];
>> +    uint64_t time_stamp;
>> +} QEMU_PACKED;
>> +typedef struct AcpiGenericErrorData AcpiGenericErrorData;
>> +
>> +/* This is from UEFI 2.6, "N.2.5 Memory Error Section" */
>> +struct UefiCperSecMemErr {
>> +    uint64_t    validation_bits;
>> +    uint64_t    error_status;
>> +    uint64_t    physical_addr;
>> +    uint64_t    physical_addr_mask;
>> +    uint16_t    node;
>> +    uint16_t    card;
>> +    uint16_t    module;
>> +    uint16_t    bank;
>> +    uint16_t    device;
>> +    uint16_t    row;
>> +    uint16_t    column;
>> +    uint16_t    bit_pos;
>> +    uint64_t    requestor_id;
>> +    uint64_t    responder_id;
>> +    uint64_t    target_id;
>> +    uint8_t     error_type;
>> +    uint8_t     reserved;
>> +    uint16_t    rank;
>> +    uint16_t    mem_array_handle;   /* card handle in UEFI 2.4 */
>> +    uint16_t    mem_dev_handle;     /* module handle in UEFI 2.4 */
>> +} QEMU_PACKED;
>> +typedef struct UefiCperSecMemErr UefiCperSecMemErr;
>> +
>> +/*
>> + * HEST Description Table
>> + */
>> +struct AcpiHardwareErrorSourceTable {
>> +    ACPI_TABLE_HEADER_DEF                    /* ACPI common table header */
>> +    uint32_t           error_source_count;
>> +} QEMU_PACKED;
>> +typedef struct AcpiHardwareErrorSourceTable AcpiHardwareErrorSourceTable;
>> +
>>  #define ACPI_SRAT_PROCESSOR_APIC     0
>>  #define ACPI_SRAT_MEMORY             1
>>  #define ACPI_SRAT_PROCESSOR_x2APIC   2
>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>> index 00c21f1..c1d15b3 100644
>> --- a/include/hw/acpi/aml-build.h
>> +++ b/include/hw/acpi/aml-build.h
>> @@ -211,6 +211,7 @@ struct AcpiBuildTables {
>>      GArray *rsdp;
>>      GArray *tcpalog;
>>      GArray *vmgenid;
>> +    GArray *hardware_errors;
>>      BIOSLinker *linker;
>>  } AcpiBuildTables;
>>  
>> diff --git a/include/hw/acpi/hest_ghes.h b/include/hw/acpi/hest_ghes.h
>> new file mode 100644
>> index 0000000..0772756
>> --- /dev/null
>> +++ b/include/hw/acpi/hest_ghes.h
>> @@ -0,0 +1,47 @@
>> +/*
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> +
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * Authors:
>> + *   Dongjiu Geng <gengdongjiu@huawei.com>
>> + *
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef ACPI_GHES_H
>> +#define ACPI_GHES_H
>> +
>> +#include "hw/acpi/bios-linker-loader.h"
>> +
>> +#define GHES_ERRORS_FW_CFG_FILE      "etc/hardware_errors"
>> +#define GHES_DATA_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
>> +
>> +#define GHES_GAS_ADDRESS_OFFSET              4
>> +#define GHES_ERROR_STATUS_ADDRESS_OFFSET     20
>> +#define GHES_NOTIFICATION_STRUCTURE          32
>> +
>> +#define GHES_CPER_OK   1
>> +#define GHES_CPER_FAIL 0
>> +
>> +#define GHES_ACPI_HEST_NOTIFY_RESERVED           11
>> +/* The max size in Bytes for one error block */
>> +#define GHES_MAX_RAW_DATA_LENGTH                 0x1000
>> +
>> +
>> +typedef struct GhesState {
>> +    uint64_t ghes_addr_le;
>> +} GhesState;
>> +
>> +void ghes_build_acpi(GArray *table_data, GArray *hardware_error,
>> +                            BIOSLinker *linker);
>> +void ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
>> +bool ghes_update_guest(uint32_t notify, uint64_t error_physical_addr);
>> +#endif
>> diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
>> index afe4840..99c4041 100644
>> --- a/include/qemu/uuid.h
>> +++ b/include/qemu/uuid.h
>> @@ -44,6 +44,17 @@ typedef struct {
>>  
>>  #define UUID_NONE "00000000-0000-0000-0000-000000000000"
>>  
>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>> +   ((b) >> 8) & 0xff, (b) & 0xff,                   \
>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>> +
>> +/* Platform Memory, this is from UEFI 2.6 N.2.2 Section Descriptor */
>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>> +    0xED, 0x7C, 0x83, 0xB1)
>> +
> 
> Seems easier to just define the array:
> 
> #define UEFI_CPER_SEC_PLATFORM_MEM { \
> 	0xA5, 0xBC, 0x11, 0x14, \
> 	0x6F, 0x64, \
> 	0x4E, 0xDE, \
> 	0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1 \
> }
> 
> but looking at it, there is a single user and the name is
> not readable at all.  Just open-code it where it's used
> with the comment.
> 
> Same applies elsewhere.
 thanks for the point out. I will move it to the place where it is used
 with the comment.

> 
> 
>>  void qemu_uuid_generate(QemuUUID *out);
>>  
>>  int qemu_uuid_is_null(const QemuUUID *uu);
>> -- 
>> 2.10.1
> 
> .
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros
  2017-07-13 17:13         ` Michael S. Tsirkin
@ 2017-07-14  8:25           ` gengdongjiu
  0 siblings, 0 replies; 19+ messages in thread
From: gengdongjiu @ 2017-07-14  8:25 UTC (permalink / raw)
  To: Michael S. Tsirkin, Laszlo Ersek
  Cc: imammedo, famz, qemu-devel, zhaoshenglong, peter.maydell,
	qemu-arm, james.morse, zhengqiang10, huangshaoyu, wuquanming,
	Gaozhihui

Dear Michael,

On 2017/7/14 1:13, Michael S. Tsirkin wrote:
> On Thu, Jul 13, 2017 at 05:33:30PM +0200, Laszlo Ersek wrote:
>> On 07/13/17 14:00, gengdongjiu wrote:
>>> Laszlo,
>>>   Thank you for your review and comments.
>>>
>>>
>>> On 2017/7/13 18:33, Laszlo Ersek wrote:
>>>> On 07/12/17 04:08, Dongjiu Geng wrote:
>>
>> [snip]
>>
>>>>> --- a/include/qemu/uuid.h
>>>>> +++ b/include/qemu/uuid.h
>>>>> @@ -44,6 +44,17 @@ typedef struct {
>>>>>  
>>>>>  #define UUID_NONE "00000000-0000-0000-0000-000000000000"
>>>>>  
>>>>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>>>>> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>>>>> +   ((b) >> 8) & 0xff, (b) & 0xff,                   \
>>>>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>>>>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>>>>> +
>>>>> +/* Platform Memory, this is from UEFI 2.6 N.2.2 Section Descriptor */
>>>>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>>>>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>>>>> +    0xED, 0x7C, 0x83, 0xB1)
>>>>> +
>>>>>  void qemu_uuid_generate(QemuUUID *out);
>>>>>  
>>>>>  int qemu_uuid_is_null(const QemuUUID *uu);
>>>>>
>>>>
>>>> (e) I think the addition of UUID_BE should be split out to a separate
>>>> patch; it adds a general facility. It should likely be the very first
>>>> patch in the series.
>>>   Ok.
>>>
>>>>
>>>> (f) While I think it is justified to have UUID_BE() in "qemu/uuid.h", I
>>>> think UEFI_CPER_SEC_PLATFORM_MEM is too specific to have here.
>>>>
>>>> If UEFI_CPER_SEC_PLATFORM_MEM were *not* a standardized UUID, I would
>>>> suggest moving it to the implementation ("include/hw/acpi/hest_ghes.h"
>>>> -- which in turn should be moved to patch #2, see my remark (d)), *plus*
>>>> I would suggest eliminating the new #include from "acpi-defs.h", see my
>>>> remark (b).
>>>   understand your idea.
>>>
>>>>
>>>> However, given that this UUID *is* standard, I suggest keeping the (b)
>>>> #include as you currently propose, and to move the definition of
>>>> UEFI_CPER_SEC_PLATFORM_MEM to "acpi-defs.h".
>>>   I agree with you.
>>>
>>>>
>>>> I vaguely recall that Michael commented on this previously, but I don't
>>>> remember what he said. Michael, are you OK with my suggestion?
>>>    Laszlo, I pasted Michael's comments here, as shown below. Michael said the definition
>>> should use build_append_int_noprefix to add data. but I think it may not good, becuase
>>> the section "UEFI_CPER_SEC_PLATFORM_MEM" is runtime recorded as CPER, not a ACPI/HEST
>>> table member, so it is not generated when system boot up.
>>
>> I agree: the UUID in question is not placed into the ACPI payload /
>> fw_cfg blobs, it is written into guest memory at runtime, into the
>> firmware-allocated area, if and when there is a hardware error to report.
>>
>> Thanks
>> Laszlo
> 
> The main point is that wrapping it up in a macro with an
> unreadable name is not really helpful when it's only
> used in a single place.

 As I said in another mail, I will move it to the place where it is used
 with the comment. thanks

> 
> 
>>> On the other hand,UEFI_CPER_SEC_PLATFORM_MEM
>>> definition is from UEFI spec 2.6, N.2.2 Section Descriptor: {0xA5BC1114, 0x6F64, 0x4EDE, {0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1}}.
>>> if use  build_append_int_noprefix to add, may confuse others.
>>>
>>> -----------------------------------------------------------------
>>> -----------------------------------------------------------------
>>> There's no reason to define these messy one-time use macros.
>>> They just make it hard to look things up in the spec.
>>>
>>>
>>> You can use build_append_int_noprefix to add data of
>>> any length in LE format.
>>> -----------------------------------------------------------------
>>> -----------------------------------------------------------------
>>>
>>> Hi  Michael,
>>>   what is your suggestion about it? do you agree with Laszlo?
> 
> My main point is that the macros do not seem helpful.
> 
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/3] ACPI: Add APEI GHES Table Generation support
  2017-07-13 19:41     ` Laszlo Ersek
  2017-07-13 22:31       ` Michael S. Tsirkin
@ 2017-07-17  4:43       ` gengdongjiu
  1 sibling, 0 replies; 19+ messages in thread
From: gengdongjiu @ 2017-07-17  4:43 UTC (permalink / raw)
  To: Laszlo Ersek, Michael S. Tsirkin
  Cc: imammedo, famz, qemu-devel, zhaoshenglong, peter.maydell,
	qemu-arm, james.morse, zhengqiang10, huangshaoyu, wuquanming

Laszlo,
  Thanks for the comments.

On 2017/7/14 3:41, Laszlo Ersek wrote:
snip

> 
> snip
> 
>>> +
>>> +    error_source_table = (AcpiHardwareErrorSourceTable *)(table_data->data
>>> +                        + table_data->len - buffer->len);
>>> +    error_source_table->error_source_count = GHES_ACPI_HEST_NOTIFY_RESERVED;
>>> +    error_source = (AcpiGenericHardwareErrorSource *)
>>> +        ((AcpiHardwareErrorSourceTable *)error_source_table + 1);
>>
>> Concern with all these casts and pointer math is that it is barely readable.
>> We can easily overflow the array and get hard to debug heap corruption
>> errors.
>>
>> What can I suggest?  Just add this to the array in steps. Then you will
>> not need all the math.
>>
>> Or again, you can just add things as you init them using build_append_int_noprefix.
>>
>>
>>> +
>>> +    bios_linker_loader_write_pointer(linker, GHES_DATA_ADDR_FW_CFG_FILE,
>>> +        0, sizeof(uint64_t), GHES_ERRORS_FW_CFG_FILE,
>>> +        GHES_ACPI_HEST_NOTIFY_RESERVED * sizeof(uint64_t));
>>
>> What does this uint64_t refer to? It's repeated everywhere.
>>
>>> +
>>> +    for (i = 0; i < GHES_ACPI_HEST_NOTIFY_RESERVED; i++) {
>>> +        error_source->type = ACPI_HEST_SOURCE_GENERIC_ERROR;
>>> +        error_source->source_id = cpu_to_le16(i);
>>> +        error_source->related_source_id = 0xffff;
>>> +        error_source->flags = 0;
>>> +        error_source->enabled = 1;
>>> +        /* The number of error status block per Generic Hardware Error Source */
>>
>> number of ... blocks ?
>>
>>> +        error_source->number_of_records = 1;
>>> +        error_source->max_sections_per_record = 1;
>>> +        error_source->max_raw_data_length = GHES_MAX_RAW_DATA_LENGTH;
>>> +        error_source->error_status_address.space_id =
>>> +                                    AML_SYSTEM_MEMORY;
>>> +        error_source->error_status_address.bit_width = 64;
>>> +        error_source->error_status_address.bit_offset = 0;
>>> +        error_source->error_status_address.access_width = 4;
>>> +        error_source->notify.type = i;
>>> +        error_source->notify.length = sizeof(AcpiHestNotify);
>>> +
>>> +        error_source->error_status_block_length = GHES_MAX_RAW_DATA_LENGTH;
>>> +
>>> +        bios_linker_loader_add_pointer(linker,
>>> +            ACPI_BUILD_TABLE_FILE, address_registers_offset + i *
>>> +            sizeof(AcpiGenericHardwareErrorSource), sizeof(uint64_t),
>>> +            GHES_ERRORS_FW_CFG_FILE, i * sizeof(uint64_t));
>>
>> and what's this math for?
>>
>>> +
>>> +        error_source++;
>>> +    }
>>> +
>>> +    for (i = 0; i < GHES_ACPI_HEST_NOTIFY_RESERVED; i++) {
>>> +        bios_linker_loader_add_pointer(linker,
>>> +            GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i, sizeof(uint64_t),
>>> +            GHES_ERRORS_FW_CFG_FILE, GHES_ACPI_HEST_NOTIFY_RESERVED *
>>> +            sizeof(uint64_t) + i * GHES_MAX_RAW_DATA_LENGTH);
>>> +    }
> 
> So basically all this math exists to set up the pointers that are shown
> in the diagram in the commit message. It is a bit tricky because most of
> those pointer fields (all 8-bytes wide) are individually embedded into
> their own containing structures. In the previous version of this patch
> set, I painstakingly verified the math, and pointed out wherever I
> thought updates were necessary.
Laszlo, you are right. it is indeed a bit tricky. Michael also has a concern
about it. I am changing to make it clear

> 
> I agree the math is hard to read, the code is very "dense". My
> suggestion (supporting yours) would be to calculate the fw_cfg blob
> offsets that should be patched in more fine-grained steps, possibly with
> multiple separate increments, using:
> - structure type names,
> - sizeof operators,
> - offsetof macros,
> - and possibly a separate comment for each offset increment.
good suggestion. I will follow that.

> 
> Thanks,
> Laszlo
> 
> .
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-07-17  4:43 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-12  2:08 [Qemu-devel] [PATCH v5 0/3] Generate APEI GHES table and dynamically record CPER Dongjiu Geng
2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 1/3] ACPI: Add new ACPI structures and macros Dongjiu Geng
2017-07-13 10:33   ` Laszlo Ersek
2017-07-13 12:00     ` gengdongjiu
2017-07-13 15:33       ` Laszlo Ersek
2017-07-13 17:13         ` Michael S. Tsirkin
2017-07-14  8:25           ` gengdongjiu
2017-07-13 17:35       ` Michael S. Tsirkin
2017-07-13 17:01   ` Michael S. Tsirkin
2017-07-14  5:51     ` gengdongjiu
2017-07-13 17:11   ` Michael S. Tsirkin
2017-07-14  8:22     ` gengdongjiu
2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 2/3] ACPI: Add APEI GHES Table Generation support Dongjiu Geng
2017-07-13 17:32   ` Michael S. Tsirkin
2017-07-13 19:41     ` Laszlo Ersek
2017-07-13 22:31       ` Michael S. Tsirkin
2017-07-17  4:43       ` gengdongjiu
2017-07-12  2:08 ` [Qemu-devel] [PATCH v5 3/3] ACPI: build and enable APEI GHES in the Makefile and configuration Dongjiu Geng
2017-07-13 17:36 ` [Qemu-devel] [PATCH v5 0/3] Generate APEI GHES table and dynamically record CPER Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.