* [RESEND PATCH v21 0/6] Add ARMv8 RAS virtualization support in QEMU
@ 2019-11-11  1:40 ` Xiang Zheng
  0 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-11  1:40 UTC (permalink / raw)
  To: pbonzini, mst, imammedo, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm
  Cc: zhengxiang9, wanghaibin.wang

On the ARMv8 platform, CPU errors are reported either as a Synchronous External
Abort (SEA) or as an SError Interrupt (SEI). If an exception happens in the
guest, it is sometimes better for the guest to perform the recovery, because the
host does not know the details of the guest. For example, if an exception
happens in a user-space application within the guest, the host does not know
which application encountered the error.

For an ARMv8 SEA/SEI, KVM or the host kernel delivers SIGBUS to notify user
space. After user space gets the notification, it records the CPER into the
guest GHES buffer and injects an exception or IRQ into the guest.

In the current implementation, if the SIGBUS code is BUS_MCEERR_AR, we treat it
as a synchronous exception and notify the guest with the ARMv8 SEA notification
type after recording the CPER into the guest.

This patch series is based on QEMU 4.1 and consists of two parts:
1. Generate the APEI/GHES table.
2. Handle the SIGBUS signal, record the CPER at runtime and fill it into guest
   memory, then notify the guest according to the type of SIGBUS.

The overall solution was suggested by James (james.morse@arm.com); the solution
for the APEI part was suggested by Laszlo (lersek@redhat.com).
See [1] for some of the discussions.

This patch series has been tested on an ARM64 platform with the RAS feature
enabled:
See [2] for the APEI part verification result.
See [3] for the BUS_MCEERR_AR SIGBUS handling verification result.

---
Change since v20:
1. Move some implementation details from acpi_ghes.h to acpi_ghes.c
2. Add the reviewers for the ACPI/APEI/GHES part

Change since v19:
1. Fix clang compile error
2. Fix sphinx build error

Change since v18:
1. Fix some code-style and typo/grammar problems.
2. Remove no_ras in the VirtMachineClass struct.
3. Convert documentation to rst format.
4. Simplify the code and add comments for some magic values.
5. Move kvm_inject_arm_sea() function into the patch where it's used.
6. Register the reset handler(kvm_unpoison_all()) in the kvm_init() function.

Change since v17:
1. Improve some commit messages and comments.
2. Fix some code-style problems.
3. Add a *ras* machine option.
4. Move HEST/GHES related structures and macros into "hw/acpi/acpi_ghes.*".
5. Move HWPoison page functions into "include/sysemu/kvm_int.h".
6. Fix some bugs.
7. Improve the design document.

Change since v16:
1. Check whether the ACPI table is enabled when handling the memory error in the SIGBUS handler.

Change since v15:
1. Add a doc-comment in the proper format for 'include/exec/ram_addr.h'
2. Remove write_part_cpustate_to_list() because another bug fix patch has
   already been merged: "arm: Allow system registers for KVM guests to be changed by QEMU code"
3. Add some comments for kvm_inject_arm_sea() in 'target/arm/kvm64.c'
4. Compare the arm_current_el() return value to 0,1,2,3, not to PSTATE_MODE_* constants.
5. Reflect that RAS support was not introduced before QEMU version 4.1.
6. Move the no_ras flag patch to the beginning of this series

Change since v14:
1. Remove the BUS_MCEERR_AO handling logic because this asynchronous signal was masked by main thread
2. Address some Igor Mammedov's comments(ACPI part)
   1) change the comments for the enum AcpiHestNotifyType definition and remove ditto in patch 1
   2) change some patch commit messages and separate "APEI GHES table generation" patch to more patches.
3. Address some of Peter's comments (arm64 Synchronous External Abort injection)
   1) change some code notes
   2) use arm_current_el() for the current EL
   3) use the helper functions for those (syn_data_abort_*).

Change since v13:
1. Move the patches that set guest ESR and inject virtual SError out of this series
2. Clean and optimize the APEI part patches
3. Update the commit messages and add some comments for the code

Change since v12:
1. Address Paolo's comments to move HWPoisonPage definition to accel/kvm/kvm-all.c
2. Only call kvm_cpu_synchronize_state() when getting the BUS_MCEERR_AR signal
3. Only add and enable two hardware error sources: GPIO-Signal and ARMv8 SEA
4. Address Michael's comments to not sync SPDX from Linux kernel header file

Change since v11:
Address James's comments(james.morse@arm.com)
1. Check whether KVM has the capability to set ESR instead of detecting host CPU RAS capability
2. For a BUS_MCEERR_AR SIGBUS, use the Synchronous-External-Abort (SEA) notification type;
   for a BUS_MCEERR_AO SIGBUS, use the GPIO-Signal notification


Address Shannon's comments(for ACPI part):
1. Unify hest_ghes.c and hest_ghes.h license declaration
2. Remove unnecessary including "qmp-commands.h" in hest_ghes.c
3. Unconditionally add guest APEI table based on James's comments(james.morse@arm.com)
4. Add an option to the virt machine for migration compatibility. On new virt machines it is on
   by default while it is off for old ones; we enabled it since 2.12
5. Refer to the ACPI spec version which introduces Hardware Error Notification for the first time
6. Add ACPI_HEST_NOTIFY_RESERVED notification type

Address Igor's comments(for ACPI part):
1. Add doc patch first which will describe how it's supposed to work between QEMU/firmware/guest
   OS with expected flows.
2. Move APEI diagrams into doc/spec patch
3. Remove redundant g_malloc in ghes_record_cper()
4. Use build_append_int_noprefix() API to compose whole error status block and whole APEI table,
   and try to get rid of most structures in patch 1, as they will be left unused after that
5. Reuse something like https://github.com/imammedo/qemu/commit/3d2fd6d13a3ea298d2ee814835495ce6241d085c
   to build GAS
6. Remove most offsetof() uses in the function
7. Build independent tables first and only then build dependent tables, passing them pointers
   to previously built tables if necessary.
8. Redefine macro GHES_ACPI_HEST_NOTIFY_RESERVED to ACPI_HEST_ERROR_SOURCE_COUNT to avoid confusion


Address Peter Maydell's comments
1. The linux-headers update is done as a patch of its own, created using
   scripts/update-linux-headers.sh run against a mainline kernel tree
2. Tested whether this patchset builds OK on aarch32
3. Abstract the Hwpoison page adding code out of target/i386/kvm.c into a cpu-independent source file,
   such as kvm-all.c
4. Add doc-comment formatted documentation comment for new globally-visible function prototype in a header

---
[1]:
https://lkml.org/lkml/2017/2/27/246
https://patchwork.kernel.org/patch/9633105/
https://patchwork.kernel.org/patch/9925227/

[2]:
Note: the UEFI firmware (QEMU_EFI.fd) is needed if the guest wants to use the ACPI table.

After the guest boots up, dump the APEI table to see the initialized table:
(1) # iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST
(2) # cat HEST.dsl
    /*
     * Intel ACPI Component Architecture
     * AML/ASL+ Disassembler version 20170728 (64-bit version)
     * Copyright (c) 2000 - 2017 Intel Corporation
     *
     * Disassembly of /sys/firmware/acpi/tables/HEST, Mon Sep  5 07:59:17 2016
     *
     * ACPI Data Table [HEST]
     *
     * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
     */

    ..................................................................................
    [308h 0776   2]                Subtable Type : 000A [Generic Hardware Error Source V2]
    [30Ah 0778   2]                    Source Id : 0001
    [30Ch 0780   2]            Related Source Id : FFFF
    [30Eh 0782   1]                     Reserved : 00
    [30Fh 0783   1]                      Enabled : 01
    [310h 0784   4]       Records To Preallocate : 00000001
    [314h 0788   4]      Max Sections Per Record : 00000001
    [318h 0792   4]          Max Raw Data Length : 00001000

    [31Ch 0796  12]         Error Status Address : [Generic Address Structure]
    [31Ch 0796   1]                     Space ID : 00 [SystemMemory]
    [31Dh 0797   1]                    Bit Width : 40
    [31Eh 0798   1]                   Bit Offset : 00
    [31Fh 0799   1]         Encoded Access Width : 04 [QWord Access:64]
    [320h 0800   8]                      Address : 00000000785D0040

    [328h 0808  28]                       Notify : [Hardware Error Notification Structure]
    [328h 0808   1]                  Notify Type : 08 [SEA]
    [329h 0809   1]                Notify Length : 1C
    [32Ah 0810   2]   Configuration Write Enable : 0000
    [32Ch 0812   4]                 PollInterval : 00000000
    [330h 0816   4]                       Vector : 00000000
    [334h 0820   4]      Polling Threshold Value : 00000000
    [338h 0824   4]     Polling Threshold Window : 00000000
    [33Ch 0828   4]        Error Threshold Value : 00000000
    [340h 0832   4]       Error Threshold Window : 00000000

    [344h 0836   4]    Error Status Block Length : 00001000
    [348h 0840  12]            Read Ack Register : [Generic Address Structure]
    [348h 0840   1]                     Space ID : 00 [SystemMemory]
    [349h 0841   1]                    Bit Width : 40
    [34Ah 0842   1]                   Bit Offset : 00
    [34Bh 0843   1]         Encoded Access Width : 04 [QWord Access:64]
    [34Ch 0844   8]                      Address : 00000000785D0098

    [354h 0852   8]            Read Ack Preserve : 00000000FFFFFFFE
    [35Ch 0860   8]               Read Ack Write : 0000000000000001

    .....................................................................................

(3) After a synchronous external abort (SEA) happens, QEMU receives a SIGBUS and
    fills the CPER into guest GHES memory. For example, according to the table above,
    the address that contains the physical address of the block of memory holding
    the error status data for this abort is 0x00000000785D0040.
(4) The address of the error status block for the SEA notification error source is 0x785d80b0:
    (qemu) xp /1 0x00000000785D0040
    00000000785d0040: 0x785d80b0

(5) Check the content of the generic error status block and the generic error data entry:
    (qemu) xp /100x 0x785d80b0
    00000000785d80b0: 0x00000001 0x00000000 0x00000000 0x00000098
    00000000785d80c0: 0x00000000 0xa5bc1114 0x4ede6f64 0x833e63b8
    00000000785d80d0: 0xb1837ced 0x00000000 0x00000300 0x00000050
    00000000785d80e0: 0x00000000 0x00000000 0x00000000 0x00000000
    00000000785d80f0: 0x00000000 0x00000000 0x00000000 0x00000000
    00000000785d8100: 0x00000000 0x00000000 0x00000000 0x00004002
(6) Check the OSPM's ACK value (for example, for SEA):
    /* Before OSPM acknowledges the error, check the ACK value */
    (qemu) xp /1 0x00000000785D0098
    00000000785d00f0: 0x00000000

    /* After OSPM acknowledges the error, check the ACK value; it changes from 0 to 1 */
    (qemu) xp /1 0x00000000785D0098
    00000000785d00f0: 0x00000001
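
The 0 to 1 transition in step (6) follows from the Read Ack Preserve and
Read Ack Write fields in the HEST dump above. A minimal sketch of that
computation, assuming a bit offset of 0 as in the table (the function name is
hypothetical and is not part of this series):

```c
#include <stdint.h>

/*
 * Compute the value OSPM writes back to the Read Ack Register:
 * keep the bits selected by the Read Ack Preserve mask, then set
 * the bits given by Read Ack Write. Assumes Bit Offset == 0.
 */
static uint64_t ghes_ack_value(uint64_t current, uint64_t preserve,
                               uint64_t write)
{
    return (current & preserve) | write;
}
```

With the values from the table (Preserve 0x00000000FFFFFFFE, Write 0x1), a
register that reads 0 before the acknowledgement reads 1 afterwards.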

[3]: KVM delivers "BUS_MCEERR_AR" to QEMU; QEMU records the guest CPER and injects
    a synchronous external abort to notify the guest, then the guest does the recovery.

[ 1552.516170] Synchronous External Abort: synchronous external abort (0x92000410) at 0x000000003751c6b4
[ 1553.074073] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 8
[ 1553.081654] {1}[Hardware Error]: event severity: recoverable
[ 1554.034191] {1}[Hardware Error]:  Error 0, type: recoverable
[ 1554.037934] {1}[Hardware Error]:   section_type: memory error
[ 1554.513261] {1}[Hardware Error]:   physical_address: 0x0000000040fa6000
[ 1554.513944] {1}[Hardware Error]:   error_type: 0, unknown
[ 1555.041451] Memory failure: 0x40fa6: Killing mca-recover:1296 due to hardware memory corruption
[ 1555.373116] Memory failure: 0x40fa6: recovery action for dirty LRU page: Recovered
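
The "event severity: recoverable" line in the log is consistent with the
status block dumped in step (5) of [2]. As a sketch, assuming the ACPI
Generic Error Status Block header layout of five little-endian 32-bit fields,
the first five words of that dump can be decoded as follows (the struct and
function names are hypothetical and are not the ones used in this series):

```c
#include <stdint.h>

/* Header of an ACPI Generic Error Status Block (five 32-bit fields). */
struct gesb_header {
    uint32_t block_status;
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;     /* total size of the data entries that follow */
    uint32_t error_severity;  /* 0 recoverable, 1 fatal, 2 corrected, 3 none */
};

/* Build the header from five 32-bit words as printed by "xp /100x". */
static struct gesb_header gesb_parse(const uint32_t words[5])
{
    struct gesb_header h = {
        .block_status    = words[0],
        .raw_data_offset = words[1],
        .raw_data_length = words[2],
        .data_length     = words[3],
        .error_severity  = words[4],
    };
    return h;
}

/* Human-readable name for the error severity field. */
static const char *gesb_severity_name(uint32_t severity)
{
    switch (severity) {
    case 0:  return "recoverable";
    case 1:  return "fatal";
    case 2:  return "corrected";
    case 3:  return "none";
    default: return "unknown";
    }
}
```

Applied to the dump, the words 0x00000001, 0x00000000, 0x00000000,
0x00000098, 0x00000000 give a data length of 0x98 bytes and a severity of 0.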

Dongjiu Geng (6):
  hw/arm/virt: Introduce a RAS machine option
  docs: APEI GHES generation and CPER record description
  ACPI: Add APEI GHES table generation support
  KVM: Move hwpoison page related functions into kvm-all.c
  target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  MAINTAINERS: Add APCI/APEI/GHES entries

 MAINTAINERS                     |   9 +
 accel/kvm/kvm-all.c             |  36 ++
 default-configs/arm-softmmu.mak |   1 +
 docs/specs/acpi_hest_ghes.rst   |  95 ++++++
 docs/specs/index.rst            |   1 +
 hw/acpi/Kconfig                 |   4 +
 hw/acpi/Makefile.objs           |   1 +
 hw/acpi/acpi_ghes.c             | 564 ++++++++++++++++++++++++++++++++
 hw/acpi/aml-build.c             |   2 +
 hw/arm/virt-acpi-build.c        |  12 +
 hw/arm/virt.c                   |  23 ++
 include/hw/acpi/acpi_ghes.h     |  60 ++++
 include/hw/acpi/aml-build.h     |   1 +
 include/hw/arm/virt.h           |   1 +
 include/sysemu/kvm.h            |   3 +-
 include/sysemu/kvm_int.h        |  12 +
 target/arm/cpu.h                |   4 +
 target/arm/helper.c             |   2 +-
 target/arm/internals.h          |   5 +-
 target/arm/kvm64.c              |  64 ++++
 target/arm/tlb_helper.c         |   2 +-
 target/i386/cpu.h               |   2 +
 target/i386/kvm.c               |  36 --
 23 files changed, 898 insertions(+), 42 deletions(-)
 create mode 100644 docs/specs/acpi_hest_ghes.rst
 create mode 100644 hw/acpi/acpi_ghes.c
 create mode 100644 include/hw/acpi/acpi_ghes.h

-- 
2.19.1




* [RESEND PATCH v21 1/6] hw/arm/virt: Introduce a RAS machine option
  2019-11-11  1:40 ` Xiang Zheng
@ 2019-11-11  1:40   ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-11  1:40 UTC (permalink / raw)
  To: pbonzini, mst, imammedo, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm
  Cc: zhengxiang9, wanghaibin.wang

From: Dongjiu Geng <gengdongjiu@huawei.com>

The RAS virtualization feature is not supported yet, so add a RAS machine
option and disable it by default.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
---
 hw/arm/virt.c         | 23 +++++++++++++++++++++++
 include/hw/arm/virt.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d4bedc2607..ea0fbf82be 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1819,6 +1819,20 @@ static void virt_set_its(Object *obj, bool value, Error **errp)
     vms->its = value;
 }
 
+static bool virt_get_ras(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->ras;
+}
+
+static void virt_set_ras(Object *obj, bool value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->ras = value;
+}
+
 static char *virt_get_gic_version(Object *obj, Error **errp)
 {
     VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2122,6 +2136,15 @@ static void virt_instance_init(Object *obj)
                                     "Valid values are none and smmuv3",
                                     NULL);
 
+    /* Default disallows RAS instantiation */
+    vms->ras = false;
+    object_property_add_bool(obj, "ras", virt_get_ras,
+                             virt_set_ras, NULL);
+    object_property_set_description(obj, "ras",
+                                    "Set on/off to enable/disable "
+                                    "RAS instantiation",
+                                    NULL);
+
     vms->irqmap = a15irqmap;
 
     virt_flash_create(vms);
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 0b41083e9d..989785f2f7 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -122,6 +122,7 @@ typedef struct {
     bool highmem_ecam;
     bool its;
     bool virt;
+    bool ras;
     int32_t gic_version;
     VirtIOMMUType iommu;
     struct arm_boot_info bootinfo;
-- 
2.19.1




* [RESEND PATCH v21 2/6] docs: APEI GHES generation and CPER record description
  2019-11-11  1:40 ` Xiang Zheng
@ 2019-11-11  1:40   ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-11  1:40 UTC (permalink / raw)
  To: pbonzini, mst, imammedo, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm
  Cc: zhengxiang9, wanghaibin.wang

From: Dongjiu Geng <gengdongjiu@huawei.com>

Add APEI/GHES detailed design document

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
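As a rough illustration of the blob layout that steps (2) and (3) of the
document describe, the offsets inside "etc/hardware_errors" can be sketched
as below. This is a standalone sketch, not code from this series: the helper
and macro names are made up for illustration, and only the sizes (8-byte
entries, 4096-byte blocks) and the ordering of the three regions come from
the document.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the design document; the names are illustrative only. */
#define ADDR_ENTRY_SIZE   8u       /* one address register entry */
#define STATUS_BLOCK_SIZE 0x1000u  /* one Error Status Data Block */

/* Offset of error_block_address[i] within "etc/hardware_errors". */
static size_t error_block_address_offset(size_t i)
{
    return i * ADDR_ENTRY_SIZE;
}

/* Offset of read_ack_register[i]; these follow all n error block addresses. */
static size_t read_ack_register_offset(size_t n, size_t i)
{
    return (n + i) * ADDR_ENTRY_SIZE;
}

/* Offset of Error Status Data Block i; blocks follow both register arrays. */
static size_t error_status_block_offset(size_t n, size_t i)
{
    return 2 * n * ADDR_ENTRY_SIZE + i * STATUS_BLOCK_SIZE;
}

/* Total blob size: n * 8 * 2 + n * 4096 bytes. */
static size_t hardware_errors_blob_size(size_t n)
{
    return n * ADDR_ENTRY_SIZE * 2 + n * STATUS_BLOCK_SIZE;
}
```

With a single error source (N = 1), this gives a 4112-byte blob, with the
read_ack_register at offset 8 and the Error Status Data Block at offset 16.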
 docs/specs/acpi_hest_ghes.rst | 95 +++++++++++++++++++++++++++++++++++
 docs/specs/index.rst          |  1 +
 2 files changed, 96 insertions(+)
 create mode 100644 docs/specs/acpi_hest_ghes.rst

diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
new file mode 100644
index 0000000000..348825f9d3
--- /dev/null
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -0,0 +1,95 @@
+APEI table generation and CPER recording
+========================================
+
+..
+   Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
+
+   This work is licensed under the terms of the GNU GPL, version 2 or later.
+   See the COPYING file in the top-level directory.
+
+Design Details
+--------------
+
+::
+
+         etc/acpi/tables                                 etc/hardware_errors
+      ====================                      ==========================================
+  + +--------------------------+            +-----------------------+
+  | | HEST                     |            |    address            |            +--------------+
+  | +--------------------------+            |    registers          |            | Error Status |
+  | | GHES1                    |            | +---------------------+            | Data Block 1 |
+  | +--------------------------+ +--------->| |error_block_address1 |----------->| +------------+
+  | | .................        | |          | +---------------------+            | |  CPER      |
+  | | error_status_address-----+-+ +------->| |error_block_address2 |--------+   | |  CPER      |
+  | | .................        |   |        | +---------------------+        |   | |  ....      |
+  | | read_ack_register--------+-+ |        | |    ..............   |        |   | |  CPER      |
+  | | read_ack_preserve        | | |        +-----------------------+        |   | +------------+
+  | | read_ack_write           | | | +----->| |error_block_addressN |------+ |   | Error Status |
+  + +--------------------------+ | | |      | +---------------------+      | |   | Data Block 2 |
+  | | GHES2                    | +-+-+----->| |read_ack_register1   |      | +-->| +------------+
+  + +--------------------------+   | |      | +---------------------+      |     | |  CPER      |
+  | | .................        |   | | +--->| |read_ack_register2   |      |     | |  CPER      |
+  | | error_status_address-----+---+ | |    | +---------------------+      |     | |  ....      |
+  | | .................        |     | |    | |  .............      |      |     | |  CPER      |
+  | | read_ack_register--------+-----+-+    | +---------------------+      |     +-+------------+
+  | | read_ack_preserve        |     |   +->| |read_ack_registerN   |      |     | |..........  |
+  | | read_ack_write           |     |   |  | +---------------------+      |     | +------------+
+  + +--------------------------+     |   |                                 |     | Error Status |
+  | | ...............          |     |   |                                 |     | Data Block N |
+  + +--------------------------+     |   |                                 +---->| +------------+
+  | | GHESN                    |     |   |                                       | |  CPER      |
+  + +--------------------------+     |   |                                       | |  CPER      |
+  | | .................        |     |   |                                       | |  ....      |
+  | | error_status_address-----+-----+   |                                       | |  CPER      |
+  | | .................        |         |                                       +-+------------+
+  | | read_ack_register--------+---------+
+  | | read_ack_preserve        |
+  | | read_ack_write           |
+  + +--------------------------+
+
+(1) QEMU generates the ACPI HEST table. This table goes in the current
+    "etc/acpi/tables" fw_cfg blob. Each error source has its own
+    notification type.
+
+(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
+    also needs to populate this blob. The "etc/hardware_errors" fw_cfg blob
+    contains an address registers table and an Error Status Data Block table.
+
+(3) The address registers table contains N Error Block Address entries
+    and N Read Ack Register entries. Each entry is 8 bytes in size.
+    The Error Status Data Block table contains N Error Status Data Block
+    entries. Each entry is 4096 (0x1000) bytes in size. The total size
+    of the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
+    N is the number of hardware error source types.
+
+(4) QEMU generates the ACPI linker/loader script for the firmware. The
+    firmware pre-allocates memory for "etc/acpi/tables" and
+    "etc/hardware_errors", and copies the blob contents there.
+
+(5) QEMU generates N ADD_POINTER commands, which patch the
+    "error_status_address" fields of the HEST table with pointers to the
+    corresponding "error_block_address" entries in "etc/hardware_errors".
+
+(6) QEMU generates N ADD_POINTER commands, which patch the
+    "read_ack_register" fields of the HEST table with pointers to the
+    corresponding "read_ack_register" entries in "etc/hardware_errors".
+
+(7) QEMU generates N ADD_POINTER commands for the firmware, which patch
+    addresses in the "error_block_address" fields with a pointer to the
+    respective "Error Status Data Block" in the "etc/hardware_errors" blob.
+
+(8) QEMU defines a third, write-only fw_cfg blob called
+    "etc/hardware_errors_addr". Through that blob, the firmware can send back
+    the guest-side allocation address to QEMU. The "etc/hardware_errors_addr"
+    blob contains an 8-byte entry. QEMU generates a single WRITE_POINTER command
+    for the firmware. The firmware writes back the start address of the
+    "etc/hardware_errors" blob to the fw_cfg file "etc/hardware_errors_addr".
+
+(9) When QEMU gets a SIGBUS from the kernel, QEMU formats the CPER right into
+    guest memory, and then injects a platform-specific interrupt (in the
+    case of the arm/virt machine, a Synchronous External Abort) to notify
+    the guest.
+
+(10) This notification (in virtual hardware) is handled by the guest
+     kernel: the guest APEI driver reads the CPER recorded by QEMU and
+     performs the recovery.
diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index 984ba44029..3019b9c976 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -13,3 +13,4 @@ Contents:
    ppc-xive
    ppc-spapr-xive
    acpi_hw_reduced_hotplug
+   acpi_hest_ghes
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-11  1:40 ` Xiang Zheng
@ 2019-11-11  1:40   ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-11  1:40 UTC (permalink / raw)
  To: pbonzini, mst, imammedo, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm
  Cc: zhengxiang9, wanghaibin.wang

From: Dongjiu Geng <gengdongjiu@huawei.com>

This patch implements APEI GHES table generation via fw_cfg blobs. For now
it only supports ARMv8 SEA, a type of GHESv2 error source; further types
can be added later if needed. The CPER section is currently a memory
section, because the kernel mainly wants userspace to handle memory
errors.

This patch follows the ACPI 6.2 specification to build the Hardware Error
Source Table. For more details, please refer to:
docs/specs/acpi_hest_ghes.rst

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
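The magic numbers 60, 104 and 92 used in acpi_ghes.c follow from the field
sizes that acpi_ghes_build_hest() appends. The sketch below recomputes them
from those sizes. It is a standalone illustration, not code from this patch:
the enum and function names are hypothetical, and it assumes the standard
36-byte ACPI table header and the 12-byte Generic Address Structure layout.

```c
#include <assert.h>
#include <stddef.h>

/* Field sizes in bytes; the names are illustrative, not QEMU identifiers. */
enum {
    HEST_HEADER_SIZE   = 36, /* standard ACPI table header */
    SOURCE_COUNT_SIZE  = 4,  /* Error Source Count */
    GAS_SIZE           = 12, /* Generic Address Structure */
    GAS_ADDRESS_OFFSET = 4,  /* 64-bit Address follows four 1-byte fields */
    NOTIFY_SIZE        = 28  /* 1 + 1 + 2 + 6 * 4 */
};

/* Offset of the first GHESv2 entry from the start of the HEST. */
static size_t ghes_entry_start(void)
{
    return HEST_HEADER_SIZE + SOURCE_COUNT_SIZE;
}

/* Offset of the Error Status Address GAS of the first entry. */
static size_t ghes_error_status_address(void)
{
    /*
     * Type, Source Id, Related Source Id: 2 bytes each.
     * Flags, Enabled: 1 byte each.
     * Records To Pre-allocate, Max Sections, Max Raw Data Length: 4 each.
     */
    return ghes_entry_start() + 3 * 2 + 2 * 1 + 3 * 4;
}

/* Offset of the Read Ack Register GAS of the first entry. */
static size_t ghes_read_ack_register(void)
{
    /*
     * Skip the Error Status Address GAS, the notification structure and
     * the 4-byte Error Status Block Length.
     */
    return ghes_error_status_address() + GAS_SIZE + NOTIFY_SIZE + 4;
}

/* Size of one GHESv2 entry, i.e. the stride between consecutive entries. */
static size_t ghes_entry_size(void)
{
    /* Read Ack Register GAS, then Read Ack Preserve and Write (8 each). */
    size_t end = ghes_read_ack_register() + GAS_SIZE + 8 + 8;

    return end - ghes_entry_start();
}

/* Location that ADD_POINTER patches: the Address field inside a GAS. */
static size_t gas_address_field(size_t gas_offset)
{
    return gas_offset + GAS_ADDRESS_OFFSET;
}
```

For the first error source this yields 60, 104 and 92, and the pointers are
patched 4 bytes into each GAS, at offsets 64 and 108 from the start of HEST.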
 default-configs/arm-softmmu.mak |   1 +
 hw/acpi/Kconfig                 |   4 +
 hw/acpi/Makefile.objs           |   1 +
 hw/acpi/acpi_ghes.c             | 267 ++++++++++++++++++++++++++++++++
 hw/acpi/aml-build.c             |   2 +
 hw/arm/virt-acpi-build.c        |  12 ++
 include/hw/acpi/acpi_ghes.h     |  56 +++++++
 include/hw/acpi/aml-build.h     |   1 +
 8 files changed, 344 insertions(+)
 create mode 100644 hw/acpi/acpi_ghes.c
 create mode 100644 include/hw/acpi/acpi_ghes.h
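
The Read Ack Register is initialized to 1 and the HEST entry carries
Read Ack Preserve = ~0x1 and Read Ack Write = 0x1. A minimal sketch of how
these values interact is shown below. The OSPM side follows the GHESv2
description (preserve the masked bits, then OR in the write value). The
producer side is an assumption for illustration only: producer_try_claim()
is a hypothetical helper and is not taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values this patch writes into the GHESv2 entry. */
#define READ_ACK_PRESERVE (~0x1ULL)
#define READ_ACK_WRITE    0x1ULL

/* Guest (OSPM) side: acknowledge that the error status block was consumed. */
static uint64_t ospm_ack(uint64_t read_ack_register)
{
    return (read_ack_register & READ_ACK_PRESERVE) | READ_ACK_WRITE;
}

/*
 * Producer side (hypothetical, for illustration): a new record may only be
 * written while bit 0 is set. Claiming the block clears bit 0 until the
 * guest acknowledges again.
 */
static bool producer_try_claim(uint64_t *read_ack_register)
{
    if (!(*read_ack_register & 0x1ULL)) {
        return false; /* previous record not yet acknowledged */
    }
    *read_ack_register &= ~0x1ULL;
    return true;
}
```

Starting from the initial value 1, the first claim succeeds and a second
claim is refused until ospm_ack() sets bit 0 again; all other bits of the
register are left untouched by the acknowledgement.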

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 1f2e0e7fde..5722f3130e 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
 CONFIG_FSL_IMX7=y
 CONFIG_FSL_IMX6UL=y
 CONFIG_SEMIHOSTING=y
+CONFIG_ACPI_APEI=y
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 12e3f1e86e..ed8c34d238 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -23,6 +23,10 @@ config ACPI_NVDIMM
     bool
     depends on ACPI
 
+config ACPI_APEI
+    bool
+    depends on ACPI
+
 config ACPI_PCI
     bool
     depends on ACPI && PCI
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 655a9c1973..84474b0ca8 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
+common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
 common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
new file mode 100644
index 0000000000..42c00ff3d3
--- /dev/null
+++ b/hw/acpi/acpi_ghes.c
@@ -0,0 +1,267 @@
+/*
+ * Support for generating APEI tables and recording CPER for Guests
+ *
+ * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
+ *
+ * Author: Dongjiu Geng <gengdongjiu@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/acpi_ghes.h"
+#include "hw/nvram/fw_cfg.h"
+#include "sysemu/sysemu.h"
+#include "qemu/error-report.h"
+
+#define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
+#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
+
+/*
+ * The size of Address field in Generic Address Structure.
+ * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure.
+ */
+#define ACPI_GHES_ADDRESS_SIZE              8
+
+/* The max size in bytes for one error block */
+#define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
+
+/*
+ * Only the ARMv8 SEA notification type error source is supported for now
+ */
+#define ACPI_GHES_ERROR_SOURCE_COUNT        1
+
+/*
+ * Generic Hardware Error Source version 2
+ */
+#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
+
+/*
+ * | +--------------------------+ 0
+ * | |        Header            |
+ * | +--------------------------+ 40---+-
+ * | | .................        |      |
+ * | | error_status_address-----+ 60   |
+ * | | .................        |      |
+ * | | read_ack_register--------+ 104  92
+ * | | read_ack_preserve        |      |
+ * | | read_ack_write           |      |
+ * + +--------------------------+ 132--+-
+ *
+ * From the above GHES definition, the error status address offset is 60,
+ * the Read Ack Register offset is 104, and the size of one GHESv2 entry is 92.
+ */
+
+/* The error status address offset in GHES */
+#define ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(start, n) ((start) + \
+            60 + offsetof(struct AcpiGenericAddress, address) + (n) * 92)
+
+/* The Read Ack Register offset in GHES */
+#define ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(start, n) ((start) + \
+            104 + offsetof(struct AcpiGenericAddress, address) + (n) * 92)
+
+typedef struct AcpiGhesState {
+    uint64_t ghes_addr_le;
+} AcpiGhesState;
+
+/*
+ * Hardware Error Notification
+ * ACPI 4.0: 17.3.2.7 Hardware Error Notification
+ */
+static void acpi_ghes_build_notify(GArray *table, const uint8_t type)
+{
+    /* Type */
+    build_append_int_noprefix(table, type, 1);
+    /*
+     * Length:
+     * Total length of the structure in bytes
+     */
+    build_append_int_noprefix(table, 28, 1);
+    /* Configuration Write Enable */
+    build_append_int_noprefix(table, 0, 2);
+    /* Poll Interval */
+    build_append_int_noprefix(table, 0, 4);
+    /* Vector */
+    build_append_int_noprefix(table, 0, 4);
+    /* Switch To Polling Threshold Value */
+    build_append_int_noprefix(table, 0, 4);
+    /* Switch To Polling Threshold Window */
+    build_append_int_noprefix(table, 0, 4);
+    /* Error Threshold Value */
+    build_append_int_noprefix(table, 0, 4);
+    /* Error Threshold Window */
+    build_append_int_noprefix(table, 0, 4);
+}
+
+/* Build table for the hardware error fw_cfg blob */
+void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker)
+{
+    int i, error_status_block_offset;
+
+    /*
+     * | +--------------------------+
+     * | |    error_block_address   |
+     * | |      ..........          |
+     * | +--------------------------+
+     * | |    read_ack_register     |
+     * | |     ...........          |
+     * | +--------------------------+
+     * | |  Error Status Data Block |
+     * | |      ........            |
+     * | +--------------------------+
+     */
+
+    /* Build error_block_address */
+    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+        build_append_int_noprefix(hardware_errors, 0, ACPI_GHES_ADDRESS_SIZE);
+    }
+
+    /* Build read_ack_register */
+    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+        /*
+         * Initialize the value of read_ack_register to 1, so that the GHES
+         * is writable the first time.
+         * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
+         * (GHESv2 - Type 10)
+         */
+        build_append_int_noprefix(hardware_errors, 1, ACPI_GHES_ADDRESS_SIZE);
+    }
+
+    /* Generic Error Status Block offset in the hardware error fw_cfg blob */
+    error_status_block_offset = hardware_errors->len;
+
+    /* Build Error Status Data Block */
+    build_append_int_noprefix(hardware_errors, 0,
+        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
+
+    /* Allocate guest memory for the hardware error fw_cfg blob */
+    bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
+                             hardware_errors, 1, false);
+
+    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+        /*
+         * Patch the address of Error Status Data Block into
+         * the error_block_address of hardware_errors fw_cfg blob
+         */
+        bios_linker_loader_add_pointer(linker,
+            ACPI_GHES_ERRORS_FW_CFG_FILE, ACPI_GHES_ADDRESS_SIZE * i,
+            ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
+            error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
+    }
+
+    /*
+     * Write the address of hardware_errors fw_cfg blob into the
+     * hardware_errors_addr fw_cfg blob.
+     */
+    bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
+        0, ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
+}
+
+/* Build Hardware Error Source Table */
+void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_errors,
+                          BIOSLinker *linker)
+{
+    uint32_t hest_start = table_data->len;
+    uint32_t source_id = 0;
+
+    /* Hardware Error Source Table header */
+    acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+    /* Error Source Count */
+    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
+
+    /*
+     * Type:
+     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
+     */
+    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
+    /*
+     * Source Id
+     * Once we support more than one hardware error source, we need to
+     * increment the value of this field.
+     */
+    build_append_int_noprefix(table_data, source_id, 2);
+    /* Related Source Id */
+    build_append_int_noprefix(table_data, 0xffff, 2);
+    /* Flags */
+    build_append_int_noprefix(table_data, 0, 1);
+    /* Enabled */
+    build_append_int_noprefix(table_data, 1, 1);
+
+    /* Number of Records To Pre-allocate */
+    build_append_int_noprefix(table_data, 1, 4);
+    /* Max Sections Per Record */
+    build_append_int_noprefix(table_data, 1, 4);
+    /* Max Raw Data Length */
+    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
+
+    /* Error Status Address */
+    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
+                     4 /* QWord access */, 0);
+    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
+        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
+        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
+        source_id * ACPI_GHES_ADDRESS_SIZE);
+
+    /*
+     * Notification Structure
+     * Only the ARMv8 SEA notification type is enabled for now
+     */
+    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
+
+    /* Error Status Block Length */
+    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
+
+    /*
+     * Read Ack Register
+     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
+     * version 2 (GHESv2 - Type 10)
+     */
+    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
+                     4 /* QWord access */, 0);
+    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
+        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, source_id),
+        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
+        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
+
+    /*
+     * Read Ack Preserve
+     * We only provide the first bit in Read Ack Register to OSPM to write
+     * while the other bits are preserved.
+     */
+    build_append_int_noprefix(table_data, ~0x1ULL, 8);
+    /* Read Ack Write */
+    build_append_int_noprefix(table_data, 0x1, 8);
+
+    build_header(linker, table_data, (void *)(table_data->data + hest_start),
+        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
+}
+
+static AcpiGhesState ges;
+void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
+{
+
+    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
+    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
+
+    /* Create a read-only fw_cfg file for GHES */
+    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
+                    request_block_size);
+
+    /* Create a read-write fw_cfg file for Address */
+    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
+        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
+}
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 2c3702b882..3681ec6e3d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
     tables->table_data = g_array_new(false, true /* clear */, 1);
     tables->tcpalog = g_array_new(false, true /* clear */, 1);
     tables->vmgenid = g_array_new(false, true /* clear */, 1);
+    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
     tables->linker = bios_linker_loader_init();
 }
 
@@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
     g_array_free(tables->table_data, true);
     g_array_free(tables->tcpalog, mfre);
     g_array_free(tables->vmgenid, mfre);
+    g_array_free(tables->hardware_errors, mfre);
 }
 
 /*
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 4cd50175e0..1b1fd273e4 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -48,6 +48,7 @@
 #include "sysemu/reset.h"
 #include "kvm_arm.h"
 #include "migration/vmstate.h"
+#include "hw/acpi/acpi_ghes.h"
 
 #define ARM_SPI_BASE 32
 
@@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     acpi_add_table(table_offsets, tables_blob);
     build_spcr(tables_blob, tables->linker, vms);
 
+    if (vms->ras) {
+        acpi_add_table(table_offsets, tables_blob);
+        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
+        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
+                             tables->linker);
+    }
+
     if (ms->numa_state->num_nodes > 0) {
         acpi_add_table(table_offsets, tables_blob);
         build_srat(tables_blob, tables->linker, vms);
@@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
     fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
                     acpi_data_len(tables.tcpalog));
 
+    if (vms->ras) {
+        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
+    }
+
     build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
                                              build_state, tables.rsdp,
                                              ACPI_BUILD_RSDP_FILE, 0);
diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
new file mode 100644
index 0000000000..cb62ec9c7b
--- /dev/null
+++ b/include/hw/acpi/acpi_ghes.h
@@ -0,0 +1,56 @@
+/*
+ * Support for generating APEI tables and recording CPER for Guests
+ *
+ * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
+ *
+ * Author: Dongjiu Geng <gengdongjiu@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef ACPI_GHES_H
+#define ACPI_GHES_H
+
+#include "hw/acpi/bios-linker-loader.h"
+
+/*
+ * Values for Hardware Error Notification Type field
+ */
+enum AcpiGhesNotifyType {
+    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
+    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
+    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
+    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
+    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
+    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
+    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
+    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
+    ACPI_GHES_NOTIFY_GPIO = 7,
+    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
+    ACPI_GHES_NOTIFY_SEA = 8,
+    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
+    ACPI_GHES_NOTIFY_SEI = 9,
+    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
+    ACPI_GHES_NOTIFY_GSIV = 10,
+    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
+    ACPI_GHES_NOTIFY_SDEI = 11,
+    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
+};
+
+void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
+                          BIOSLinker *linker);
+
+void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
+void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
+#endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index de4a406568..8f13620701 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -220,6 +220,7 @@ struct AcpiBuildTables {
     GArray *rsdp;
     GArray *tcpalog;
     GArray *vmgenid;
+    GArray *hardware_errors;
     BIOSLinker *linker;
 } AcpiBuildTables;
 
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
@ 2019-11-11  1:40   ` Xiang Zheng
  0 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-11  1:40 UTC (permalink / raw)
  To: pbonzini, mst, imammedo, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm
  Cc: wanghaibin.wang, zhengxiang9

From: Dongjiu Geng <gengdongjiu@huawei.com>

This patch implements APEI GHES table generation via fw_cfg blobs. For
now it supports only ARMv8 SEA, a type of GHESv2 error source; more
types can be added later if needed. The only CPER section currently
generated is the memory error section, because the kernel mainly wants
userspace to handle memory errors.

This patch follows the ACPI 6.2 specification to build the Hardware
Error Source Table. For more details, please refer to:
docs/specs/acpi_hest_ghes.rst

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 default-configs/arm-softmmu.mak |   1 +
 hw/acpi/Kconfig                 |   4 +
 hw/acpi/Makefile.objs           |   1 +
 hw/acpi/acpi_ghes.c             | 267 ++++++++++++++++++++++++++++++++
 hw/acpi/aml-build.c             |   2 +
 hw/arm/virt-acpi-build.c        |  12 ++
 include/hw/acpi/acpi_ghes.h     |  56 +++++++
 include/hw/acpi/aml-build.h     |   1 +
 8 files changed, 344 insertions(+)
 create mode 100644 hw/acpi/acpi_ghes.c
 create mode 100644 include/hw/acpi/acpi_ghes.h
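
A note for reviewers on the layout arithmetic: the following is a minimal
sketch, not part of the patch, of how the offsets used by
acpi_ghes_build_error_table() and acpi_ghes_build_hest() are derived. The
constants mirror the ones in the diff below; the helper names are
hypothetical and exist only for illustration. The value 4 for the Address
field inside a Generic Address Structure assumes the packed 12-byte layout
(four one-byte fields followed by the 64-bit address).

```c
#include <stddef.h>

/* Constants mirrored from hw/acpi/acpi_ghes.c in this patch. */
#define GHES_ADDRESS_SIZE         8       /* one 64-bit address slot */
#define GHES_MAX_RAW_DATA_LENGTH  0x1000  /* one error status data block */
#define GHES_ENTRY_SIZE           92      /* one GHESv2 entry in the HEST */

/* Assumed offset of the 64-bit Address field inside a packed GAS. */
#define GAS_ADDRESS_FIELD_OFFSET  4

/* The helper names below are hypothetical; they exist only in this sketch. */

/* error_block_address slot of source n within etc/hardware_errors */
static size_t blob_error_block_address(size_t n)
{
    return n * GHES_ADDRESS_SIZE;
}

/* read_ack_register slot of source n; these follow all the address slots */
static size_t blob_read_ack_register(size_t count, size_t n)
{
    return (count + n) * GHES_ADDRESS_SIZE;
}

/* Error Status Data Block of source n; these follow both slot arrays */
static size_t blob_error_status_block(size_t count, size_t n)
{
    return 2 * count * GHES_ADDRESS_SIZE + n * GHES_MAX_RAW_DATA_LENGTH;
}

/* Total blob size, matching request_block_size in acpi_ghes_add_fw_cfg() */
static size_t blob_size(size_t count)
{
    return count * (2 * GHES_ADDRESS_SIZE + GHES_MAX_RAW_DATA_LENGTH);
}

/* Address field of the Error Status Address GAS of source n in the HEST */
static size_t hest_error_status_address(size_t hest_start, size_t n)
{
    return hest_start + 60 + GAS_ADDRESS_FIELD_OFFSET + n * GHES_ENTRY_SIZE;
}

/* Address field of the Read Ack Register GAS of source n in the HEST */
static size_t hest_read_ack_register(size_t hest_start, size_t n)
{
    return hest_start + 104 + GAS_ADDRESS_FIELD_OFFSET + n * GHES_ENTRY_SIZE;
}
```

With a single error source, the blob holds the address slot at offset 0, the
read ack slot at offset 8, and the data block at offset 16, for a total of
0x1010 bytes.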

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 1f2e0e7fde..5722f3130e 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
 CONFIG_FSL_IMX7=y
 CONFIG_FSL_IMX6UL=y
 CONFIG_SEMIHOSTING=y
+CONFIG_ACPI_APEI=y
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 12e3f1e86e..ed8c34d238 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -23,6 +23,10 @@ config ACPI_NVDIMM
     bool
     depends on ACPI
 
+config ACPI_APEI
+    bool
+    depends on ACPI
+
 config ACPI_PCI
     bool
     depends on ACPI && PCI
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 655a9c1973..84474b0ca8 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
+common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
 common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
new file mode 100644
index 0000000000..42c00ff3d3
--- /dev/null
+++ b/hw/acpi/acpi_ghes.c
@@ -0,0 +1,267 @@
+/*
+ * Support for generating APEI tables and recording CPER for Guests
+ *
+ * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
+ *
+ * Author: Dongjiu Geng <gengdongjiu@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/acpi_ghes.h"
+#include "hw/nvram/fw_cfg.h"
+#include "sysemu/sysemu.h"
+#include "qemu/error-report.h"
+
+#define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
+#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
+
+/*
+ * The size of Address field in Generic Address Structure.
+ * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure.
+ */
+#define ACPI_GHES_ADDRESS_SIZE              8
+
+/* The max size in bytes for one error block */
+#define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
+
+/*
+ * Now only support ARMv8 SEA notification type error source
+ */
+#define ACPI_GHES_ERROR_SOURCE_COUNT        1
+
+/*
+ * Generic Hardware Error Source version 2
+ */
+#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
+
+/*
+ * | +--------------------------+ 0
+ * | |        Header            |
+ * | +--------------------------+ 40---+-
+ * | | .................        |      |
+ * | | error_status_address-----+ 60   |
+ * | | .................        |      |
+ * | | read_ack_register--------+ 104  92
+ * | | read_ack_preserve        |      |
+ * | | read_ack_write           |      |
+ * + +--------------------------+ 132--+-
+ *
+ * From the GHES definition above, the Error Status Address offset is 60, the
+ * Read Ack Register offset is 104, and the size of one GHESv2 entry is 92.
+ */
+
+/* The error status address offset in GHES */
+#define ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(start_addr, n) (start_addr + \
+            60 + offsetof(struct AcpiGenericAddress, address) + n * 92)
+
+/* The Read Ack Register offset in GHES */
+#define ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(start_addr, n) (start_addr +\
+            104 + offsetof(struct AcpiGenericAddress, address) + n * 92)
+
+typedef struct AcpiGhesState {
+    uint64_t ghes_addr_le;
+} AcpiGhesState;
+
+/*
+ * Hardware Error Notification
+ * ACPI 4.0: 17.3.2.7 Hardware Error Notification
+ */
+static void acpi_ghes_build_notify(GArray *table, const uint8_t type)
+{
+    /* Type */
+    build_append_int_noprefix(table, type, 1);
+    /*
+     * Length:
+     * Total length of the structure in bytes
+     */
+    build_append_int_noprefix(table, 28, 1);
+    /* Configuration Write Enable */
+    build_append_int_noprefix(table, 0, 2);
+    /* Poll Interval */
+    build_append_int_noprefix(table, 0, 4);
+    /* Vector */
+    build_append_int_noprefix(table, 0, 4);
+    /* Switch To Polling Threshold Value */
+    build_append_int_noprefix(table, 0, 4);
+    /* Switch To Polling Threshold Window */
+    build_append_int_noprefix(table, 0, 4);
+    /* Error Threshold Value */
+    build_append_int_noprefix(table, 0, 4);
+    /* Error Threshold Window */
+    build_append_int_noprefix(table, 0, 4);
+}
+
+/* Build table for the hardware error fw_cfg blob */
+void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker)
+{
+    int i, error_status_block_offset;
+
+    /*
+     * | +--------------------------+
+     * | |    error_block_address   |
+     * | |      ..........          |
+     * | +--------------------------+
+     * | |    read_ack_register     |
+     * | |     ...........          |
+     * | +--------------------------+
+     * | |  Error Status Data Block |
+     * | |      ........            |
+     * | +--------------------------+
+     */
+
+    /* Build error_block_address */
+    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+        build_append_int_noprefix(hardware_errors, 0, ACPI_GHES_ADDRESS_SIZE);
+    }
+
+    /* Build read_ack_register */
+    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+        /*
+         * Initialize the value of read_ack_register to 1, so that the GHES
+         * can be written to the first time.
+         * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
+         * (GHESv2 - Type 10)
+         */
+        build_append_int_noprefix(hardware_errors, 1, ACPI_GHES_ADDRESS_SIZE);
+    }
+
+    /* Generic Error Status Block offset in the hardware error fw_cfg blob */
+    error_status_block_offset = hardware_errors->len;
+
+    /* Build Error Status Data Block */
+    build_append_int_noprefix(hardware_errors, 0,
+        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
+
+    /* Allocate guest memory for the hardware error fw_cfg blob */
+    bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
+                             hardware_errors, 1, false);
+
+    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+        /*
+         * Patch the address of Error Status Data Block into
+         * the error_block_address of hardware_errors fw_cfg blob
+         */
+        bios_linker_loader_add_pointer(linker,
+            ACPI_GHES_ERRORS_FW_CFG_FILE, ACPI_GHES_ADDRESS_SIZE * i,
+            ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
+            error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
+    }
+
+    /*
+     * Write the address of hardware_errors fw_cfg blob into the
+     * hardware_errors_addr fw_cfg blob.
+     */
+    bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
+        0, ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
+}
+
+/* Build Hardware Error Source Table */
+void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_errors,
+                          BIOSLinker *linker)
+{
+    uint32_t hest_start = table_data->len;
+    uint32_t source_id = 0;
+
+    /* Hardware Error Source Table header */
+    acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+    /* Error Source Count */
+    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
+
+    /*
+     * Type:
+     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
+     */
+    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
+    /*
+     * Source Id
+     * Once we support more than one hardware error source, we need to
+     * increase the value of this field.
+     */
+    build_append_int_noprefix(table_data, source_id, 2);
+    /* Related Source Id */
+    build_append_int_noprefix(table_data, 0xffff, 2);
+    /* Flags */
+    build_append_int_noprefix(table_data, 0, 1);
+    /* Enabled */
+    build_append_int_noprefix(table_data, 1, 1);
+
+    /* Number of Records To Pre-allocate */
+    build_append_int_noprefix(table_data, 1, 4);
+    /* Max Sections Per Record */
+    build_append_int_noprefix(table_data, 1, 4);
+    /* Max Raw Data Length */
+    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
+
+    /* Error Status Address */
+    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
+                     4 /* QWord access */, 0);
+    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
+        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
+        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
+        source_id * ACPI_GHES_ADDRESS_SIZE);
+
+    /*
+     * Notification Structure
+     * Only the ARMv8 SEA notification type is enabled for now
+     */
+    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
+
+    /* Error Status Block Length */
+    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
+
+    /*
+     * Read Ack Register
+     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
+     * version 2 (GHESv2 - Type 10)
+     */
+    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
+                     4 /* QWord access */, 0);
+    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
+        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, source_id),
+        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
+        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
+
+    /*
+     * Read Ack Preserve
+     * We only provide the first bit in Read Ack Register to OSPM to write
+     * while the other bits are preserved.
+     */
+    build_append_int_noprefix(table_data, ~0x1ULL, 8);
+    /* Read Ack Write */
+    build_append_int_noprefix(table_data, 0x1, 8);
+
+    build_header(linker, table_data, (void *)(table_data->data + hest_start),
+        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
+}
+
+static AcpiGhesState ges;
+void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
+{
+
+    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
+    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
+
+    /* Create a read-only fw_cfg file for GHES */
+    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
+                    request_block_size);
+
+    /* Create a read-write fw_cfg file for Address */
+    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
+        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
+}
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 2c3702b882..3681ec6e3d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
     tables->table_data = g_array_new(false, true /* clear */, 1);
     tables->tcpalog = g_array_new(false, true /* clear */, 1);
     tables->vmgenid = g_array_new(false, true /* clear */, 1);
+    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
     tables->linker = bios_linker_loader_init();
 }
 
@@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
     g_array_free(tables->table_data, true);
     g_array_free(tables->tcpalog, mfre);
     g_array_free(tables->vmgenid, mfre);
+    g_array_free(tables->hardware_errors, mfre);
 }
 
 /*
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 4cd50175e0..1b1fd273e4 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -48,6 +48,7 @@
 #include "sysemu/reset.h"
 #include "kvm_arm.h"
 #include "migration/vmstate.h"
+#include "hw/acpi/acpi_ghes.h"
 
 #define ARM_SPI_BASE 32
 
@@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     acpi_add_table(table_offsets, tables_blob);
     build_spcr(tables_blob, tables->linker, vms);
 
+    if (vms->ras) {
+        acpi_add_table(table_offsets, tables_blob);
+        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
+        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
+                             tables->linker);
+    }
+
     if (ms->numa_state->num_nodes > 0) {
         acpi_add_table(table_offsets, tables_blob);
         build_srat(tables_blob, tables->linker, vms);
@@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
     fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
                     acpi_data_len(tables.tcpalog));
 
+    if (vms->ras) {
+        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
+    }
+
     build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
                                              build_state, tables.rsdp,
                                              ACPI_BUILD_RSDP_FILE, 0);
diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
new file mode 100644
index 0000000000..cb62ec9c7b
--- /dev/null
+++ b/include/hw/acpi/acpi_ghes.h
@@ -0,0 +1,56 @@
+/*
+ * Support for generating APEI tables and recording CPER for Guests
+ *
+ * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
+ *
+ * Author: Dongjiu Geng <gengdongjiu@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef ACPI_GHES_H
+#define ACPI_GHES_H
+
+#include "hw/acpi/bios-linker-loader.h"
+
+/*
+ * Values for Hardware Error Notification Type field
+ */
+enum AcpiGhesNotifyType {
+    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
+    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
+    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
+    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
+    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
+    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
+    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
+    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
+    ACPI_GHES_NOTIFY_GPIO = 7,
+    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
+    ACPI_GHES_NOTIFY_SEA = 8,
+    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
+    ACPI_GHES_NOTIFY_SEI = 9,
+    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
+    ACPI_GHES_NOTIFY_GSIV = 10,
+    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
+    ACPI_GHES_NOTIFY_SDEI = 11,
+    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
+};
+
+void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
+                          BIOSLinker *linker);
+
+void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
+void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
+#endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index de4a406568..8f13620701 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -220,6 +220,7 @@ struct AcpiBuildTables {
     GArray *rsdp;
     GArray *tcpalog;
     GArray *vmgenid;
+    GArray *hardware_errors;
     BIOSLinker *linker;
 } AcpiBuildTables;
 
-- 
2.19.1




^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RESEND PATCH v21 4/6] KVM: Move hwpoison page related functions into kvm-all.c
  2019-11-11  1:40 ` Xiang Zheng
@ 2019-11-11  1:40   ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-11  1:40 UTC (permalink / raw)
  To: pbonzini, mst, imammedo, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm
  Cc: zhengxiang9, wanghaibin.wang

From: Dongjiu Geng <gengdongjiu@huawei.com>

kvm_hwpoison_page_add() and kvm_unpoison_all() will be used by both the
x86 and ARM platforms, so move them into "accel/kvm/kvm-all.c" to avoid
duplicating code.

For architectures that don't use the poison-list functionality, the
reset handler harmlessly does nothing, so register kvm_unpoison_all()
in the generic kvm_init() function.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
---
 accel/kvm/kvm-all.c      | 36 ++++++++++++++++++++++++++++++++++++
 include/sysemu/kvm_int.h | 12 ++++++++++++
 target/i386/kvm.c        | 36 ------------------------------------
 3 files changed, 48 insertions(+), 36 deletions(-)
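
To illustrate the semantics being moved, here is a minimal stand-alone
sketch, not part of the patch, of a poison list that ignores duplicate
addresses and is drained on reset. It assumes a plain singly linked list in
place of QEMU's QLIST macros, uses hypothetical names, and returns values
(which the real functions do not) purely so the behaviour can be observed.
The real reset handler also calls qemu_ram_remap() for each page, which is
only noted in a comment here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for HWPoisonPage, using a plain next pointer. */
typedef struct PoisonPage {
    uint64_t ram_addr;
    struct PoisonPage *next;
} PoisonPage;

static PoisonPage *poison_list;

/*
 * Record a poisoned page. Returns true if the address was added, false if
 * it was already on the list (or if allocation failed).
 */
static bool poison_page_add(uint64_t ram_addr)
{
    for (PoisonPage *p = poison_list; p; p = p->next) {
        if (p->ram_addr == ram_addr) {
            return false;   /* already recorded, nothing to do */
        }
    }

    PoisonPage *page = malloc(sizeof(*page));
    if (!page) {
        return false;
    }
    page->ram_addr = ram_addr;
    page->next = poison_list;   /* insert at the head */
    poison_list = page;
    return true;
}

/*
 * Drain the list and return how many pages were released. The real
 * handler would remap each page before freeing its list entry.
 */
static size_t poison_unpoison_all(void)
{
    size_t released = 0;

    while (poison_list) {
        PoisonPage *page = poison_list;
        poison_list = page->next;
        free(page);
        released++;
    }
    return released;
}
```

On an architecture that never adds a page, the drain loop sees an empty
list and returns immediately, which is why registering the handler in the
generic kvm_init() is harmless.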

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 140b0bd8f6..f45096d5a0 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -41,6 +41,7 @@
 #include "hw/irq.h"
 #include "sysemu/sev.h"
 #include "sysemu/balloon.h"
+#include "sysemu/reset.h"
 
 #include "hw/boards.h"
 
@@ -856,6 +857,39 @@ int kvm_vm_check_extension(KVMState *s, unsigned int extension)
     return ret;
 }
 
+typedef struct HWPoisonPage {
+    ram_addr_t ram_addr;
+    QLIST_ENTRY(HWPoisonPage) list;
+} HWPoisonPage;
+
+static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
+    QLIST_HEAD_INITIALIZER(hwpoison_page_list);
+
+static void kvm_unpoison_all(void *param)
+{
+    HWPoisonPage *page, *next_page;
+
+    QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
+        QLIST_REMOVE(page, list);
+        qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
+        g_free(page);
+    }
+}
+
+void kvm_hwpoison_page_add(ram_addr_t ram_addr)
+{
+    HWPoisonPage *page;
+
+    QLIST_FOREACH(page, &hwpoison_page_list, list) {
+        if (page->ram_addr == ram_addr) {
+            return;
+        }
+    }
+    page = g_new(HWPoisonPage, 1);
+    page->ram_addr = ram_addr;
+    QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
+}
+
 static uint32_t adjust_ioeventfd_endianness(uint32_t val, uint32_t size)
 {
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN)
@@ -2031,6 +2065,8 @@ static int kvm_init(MachineState *ms)
         goto err;
     }
 
+    qemu_register_reset(kvm_unpoison_all, NULL);
+
     if (machine_kernel_irqchip_allowed(ms)) {
         kvm_irqchip_create(ms, s);
     }
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index ac2d1f8b56..c660a70c51 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -42,4 +42,16 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
                                   AddressSpace *as, int as_id);
 
 void kvm_set_max_memslot_size(hwaddr max_slot_size);
+
+/**
+ * kvm_hwpoison_page_add:
+ *
+ * Parameters:
+ *  @ram_addr: the address in the RAM for the poisoned page
+ *
+ * Add a poisoned page to the list
+ *
+ * Return: None.
+ */
+void kvm_hwpoison_page_add(ram_addr_t ram_addr);
 #endif
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index bfd09bd441..d8f2507a8d 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -24,7 +24,6 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm_int.h"
-#include "sysemu/reset.h"
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "hyperv.h"
@@ -521,40 +520,6 @@ uint64_t kvm_arch_get_supported_msr_feature(KVMState *s, uint32_t index)
     }
 }
 
-
-typedef struct HWPoisonPage {
-    ram_addr_t ram_addr;
-    QLIST_ENTRY(HWPoisonPage) list;
-} HWPoisonPage;
-
-static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
-    QLIST_HEAD_INITIALIZER(hwpoison_page_list);
-
-static void kvm_unpoison_all(void *param)
-{
-    HWPoisonPage *page, *next_page;
-
-    QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
-        QLIST_REMOVE(page, list);
-        qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
-        g_free(page);
-    }
-}
-
-static void kvm_hwpoison_page_add(ram_addr_t ram_addr)
-{
-    HWPoisonPage *page;
-
-    QLIST_FOREACH(page, &hwpoison_page_list, list) {
-        if (page->ram_addr == ram_addr) {
-            return;
-        }
-    }
-    page = g_new(HWPoisonPage, 1);
-    page->ram_addr = ram_addr;
-    QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
-}
-
 static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
                                      int *max_banks)
 {
@@ -2157,7 +2122,6 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         fprintf(stderr, "e820_add_entry() table is full\n");
         return ret;
     }
-    qemu_register_reset(kvm_unpoison_all, NULL);
 
     shadow_mem = machine_kvm_shadow_mem(ms);
     if (shadow_mem != -1) {
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-11  1:40 ` Xiang Zheng
@ 2019-11-11  1:40   ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-11  1:40 UTC (permalink / raw)
  To: pbonzini, mst, imammedo, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm
  Cc: zhengxiang9, wanghaibin.wang

From: Dongjiu Geng <gengdongjiu@huawei.com>

Add a SIGBUS signal handler. The handler checks the SIGBUS type,
translates the host VA delivered by the host into a guest PA, records
that PA in the guest APEI GHES memory, and then notifies the guest
according to the SIGBUS type.

When the guest accesses poisoned memory, a Synchronous External Abort
(SEA) is generated. The host kernel then gets an APEI notification and
calls memory_failure() to unmap the affected page in stage 2, and
finally returns to the guest.

If the guest continues to access the PG_hwpoison page, the access traps
to KVM as a stage 2 fault, and a synchronous BUS_MCEERR_AR SIGBUS is
delivered to QEMU. QEMU records the error address in the guest APEI GHES
memory and notifies the guest using a Synchronous External Abort (SEA).

To inject a vSEA, we introduce the kvm_inject_arm_sea() function, in
which we set up the exception type and the syndrome information. When
switching to the guest, the target vcpu jumps to the synchronous
external abort vector table entry.

ESR_ELx.DFSC is set to synchronous external abort (0x10), and
ESR_ELx.FnV is set to not valid (0x1), which tells the guest that FAR is
not valid and holds an UNKNOWN value. These values are written to the
KVM register structures through the KVM_SET_ONE_REG ioctl.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
 include/hw/acpi/acpi_ghes.h |   4 +
 include/sysemu/kvm.h        |   3 +-
 target/arm/cpu.h            |   4 +
 target/arm/helper.c         |   2 +-
 target/arm/internals.h      |   5 +-
 target/arm/kvm64.c          |  64 ++++++++
 target/arm/tlb_helper.c     |   2 +-
 target/i386/cpu.h           |   2 +
 9 files changed, 377 insertions(+), 6 deletions(-)
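
As a small illustration of the syndrome values described above, here is a
sketch, not part of the patch, that composes only the two ISS fields named
in the commit message. It assumes the data abort ISS layout in which DFSC
occupies bits [5:0] and FnV is bit 10; the macro and function names are
hypothetical, and the exception class and other ESR_ELx fields are left out
on purpose.

```c
#include <stdint.h>

/* Assumed field positions within the data abort ISS. */
#define ISS_DFSC_MASK      0x3fu        /* DFSC, bits [5:0] */
#define ISS_DFSC_SYNC_EXT  0x10u        /* synchronous external abort */
#define ISS_FNV            (1u << 10)   /* FAR not Valid */

/* Compose the ISS bits for the injected SEA: DFSC = 0x10, FnV = 1. */
static uint32_t sea_iss(void)
{
    return ISS_FNV | ISS_DFSC_SYNC_EXT;
}

/* Extract the fault status code from an ISS value. */
static uint32_t iss_dfsc(uint32_t iss)
{
    return iss & ISS_DFSC_MASK;
}

/* Nonzero if the guest may trust FAR for this abort. */
static int iss_far_valid(uint32_t iss)
{
    return !(iss & ISS_FNV);
}
```

With FnV set, a guest decoding this syndrome should not consult FAR and
would instead rely on the physical address recorded in the GHES memory.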

diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
index 42c00ff3d3..f5b54990c0 100644
--- a/hw/acpi/acpi_ghes.c
+++ b/hw/acpi/acpi_ghes.c
@@ -39,6 +39,34 @@
 /* The max size in bytes for one error block */
 #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
 
+/*
+ * The total size of Generic Error Data Entry
+ * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
+ * Table 18-343 Generic Error Data Entry
+ */
+#define ACPI_GHES_DATA_LENGTH               72
+
+/*
+ * The memory section CPER size,
+ * UEFI 2.6: N.2.5 Memory Error Section
+ */
+#define ACPI_GHES_MEM_CPER_LENGTH           80
+
+/*
+ * Masks for block_status flags
+ */
+#define ACPI_GEBS_UNCORRECTABLE         1
+
+/*
+ * Values for error_severity field
+ */
+enum AcpiGenericErrorSeverity {
+    ACPI_CPER_SEV_RECOVERABLE,
+    ACPI_CPER_SEV_FATAL,
+    ACPI_CPER_SEV_CORRECTED,
+    ACPI_CPER_SEV_NONE,
+};
+
 /*
  * Now only support ARMv8 SEA notification type error source
  */
@@ -49,6 +77,16 @@
  */
 #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
 
+#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
+    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
+    ((b) >> 8) & 0xff, (b) & 0xff,                   \
+    ((c) >> 8) & 0xff, (c) & 0xff,                    \
+    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
+
+#define UEFI_CPER_SEC_PLATFORM_MEM                   \
+    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
+    0xED, 0x7C, 0x83, 0xB1)
+
 /*
  * | +--------------------------+ 0
  * | |        Header            |
@@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
     uint64_t ghes_addr_le;
 } AcpiGhesState;
 
+/*
+ * Total size for Generic Error Status Block
+ * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
+ * Table 18-380 Generic Error Status Block
+ */
+#define ACPI_GHES_GESB_SIZE                 20
+/* The offset of Data Length in Generic Error Status Block */
+#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
+
+/*
+ * Record the value of data length for each error status block to avoid getting
+ * this value from guest.
+ */
+static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
+
+/*
+ * Generic Error Data Entry
+ * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ */
+static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
+                uint32_t error_severity, uint16_t revision,
+                uint8_t validation_bits, uint8_t flags,
+                uint32_t error_data_length, QemuUUID fru_id,
+                uint8_t *fru_text, uint64_t time_stamp)
+{
+    QemuUUID uuid_le;
+
+    /* Section Type */
+    uuid_le = qemu_uuid_bswap(section_type);
+    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
+
+    /* Error Severity */
+    build_append_int_noprefix(table, error_severity, 4);
+    /* Revision */
+    build_append_int_noprefix(table, revision, 2);
+    /* Validation Bits */
+    build_append_int_noprefix(table, validation_bits, 1);
+    /* Flags */
+    build_append_int_noprefix(table, flags, 1);
+    /* Error Data Length */
+    build_append_int_noprefix(table, error_data_length, 4);
+
+    /* FRU Id */
+    uuid_le = qemu_uuid_bswap(fru_id);
+    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
+
+    /* FRU Text */
+    g_array_append_vals(table, fru_text, 20);
+    /* Timestamp */
+    build_append_int_noprefix(table, time_stamp, 8);
+}
+
+/*
+ * Generic Error Status Block
+ * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ */
+static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
+                uint32_t raw_data_offset, uint32_t raw_data_length,
+                uint32_t data_length, uint32_t error_severity)
+{
+    /* Block Status */
+    build_append_int_noprefix(table, block_status, 4);
+    /* Raw Data Offset */
+    build_append_int_noprefix(table, raw_data_offset, 4);
+    /* Raw Data Length */
+    build_append_int_noprefix(table, raw_data_length, 4);
+    /* Data Length */
+    build_append_int_noprefix(table, data_length, 4);
+    /* Error Severity */
+    build_append_int_noprefix(table, error_severity, 4);
+}
+
+/* UEFI 2.6: N.2.5 Memory Error Section */
+static void acpi_ghes_build_append_mem_cper(GArray *table,
+                                            uint64_t error_physical_addr)
+{
+    /*
+     * Memory Error Record
+     */
+
+    /* Validation Bits */
+    build_append_int_noprefix(table,
+                              (1UL << 14) | /* Type Valid */
+                              (1UL << 1) /* Physical Address Valid */,
+                              8);
+    /* Error Status */
+    build_append_int_noprefix(table, 0, 8);
+    /* Physical Address */
+    build_append_int_noprefix(table, error_physical_addr, 8);
+    /* Skip all the detailed information normally found in such a record */
+    build_append_int_noprefix(table, 0, 48);
+    /* Memory Error Type */
+    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
+    /* Skip all the detailed information normally found in such a record */
+    build_append_int_noprefix(table, 0, 7);
+}
+
+static int acpi_ghes_record_mem_error(uint64_t error_block_address,
+                                      uint64_t error_physical_addr,
+                                      uint32_t data_length)
+{
+    GArray *block;
+    uint64_t current_block_length;
+    /* Memory Error Section Type */
+    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
+    QemuUUID fru_id = {};
+    uint8_t fru_text[20] = {};
+
+    /*
+     * Generic Error Status Block
+     * | +---------------------+
+     * | |     block_status    |
+     * | +---------------------+
+     * | |    raw_data_offset  |
+     * | +---------------------+
+     * | |    raw_data_length  |
+     * | +---------------------+
+     * | |     data_length     |
+     * | +---------------------+
+     * | |   error_severity    |
+     * | +---------------------+
+     */
+    block = g_array_new(false, true /* clear */, 1);
+
+    /* The current whole length of the generic error status block */
+    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
+
+    /* This is the length if adding a new generic error data entry */
+    data_length += ACPI_GHES_DATA_LENGTH;
+    data_length += ACPI_GHES_MEM_CPER_LENGTH;
+
+    /*
+     * Check whether it will run out of the preallocated memory if adding a new
+     * generic error data entry
+     */
+    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
+        error_report("Record CPER out of boundary!!!");
+        return ACPI_GHES_CPER_FAIL;
+    }
+
+    /* Build the new generic error status block header */
+    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
+        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
+
+    /* Write back above generic error status block header to guest memory */
+    cpu_physical_memory_write(error_block_address, block->data,
+                              block->len);
+
+    /* Add a new generic error data entry */
+
+    data_length = block->len;
+    /* Build this new generic error data entry header */
+    acpi_ghes_generic_error_data(block, mem_section_id_le,
+        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
+        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
+
+    /* Build the memory section CPER for above new generic error data entry */
+    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
+
+    /* Write back above this new generic error data entry to guest memory */
+    cpu_physical_memory_write(error_block_address + current_block_length,
+        block->data + data_length, block->len - data_length);
+
+    g_array_free(block, true);
+
+    return ACPI_GHES_CPER_OK;
+}
+
 /*
  * Hardware Error Notification
  * ACPI 4.0: 17.3.2.7 Hardware Error Notification
@@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
     fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
         NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
 }
+
+bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
+{
+    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
+    int loop = 0;
+    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
+    bool ret = ACPI_GHES_CPER_FAIL;
+    uint8_t source_id;
+    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
+
+    /*
+     * | +---------------------+ ges.ghes_addr_le
+     * | |error_block_address0 |
+     * | +---------------------+ --+--
+     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
+     * | +---------------------+ --+--
+     * | |error_block_addressN |
+     * | +---------------------+
+     * | | read_ack_register0  |
+     * | +---------------------+ --+--
+     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
+     * | +---------------------+ --+--
+     * | | read_ack_registerN  |
+     * | +---------------------+ --+--
+     * | |      CPER           |   |
+     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGTH
+     * | |      CPER           |   |
+     * | +---------------------+ --+--
+     * | |    ..........       |
+     * | +---------------------+
+     * | |      CPER           |
+     * | |      ....           |
+     * | |      CPER           |
+     * | +---------------------+
+     */
+    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
+        /* Find and check the source id for this new CPER */
+        source_id = error_source_id[notify];
+        if (source_id != 0xff) {
+            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
+        } else {
+            goto out;
+        }
+
+        cpu_physical_memory_read(start_addr, &error_block_addr,
+                                 ACPI_GHES_ADDRESS_SIZE);
+
+        read_ack_register_addr = start_addr +
+            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
+retry:
+        cpu_physical_memory_read(read_ack_register_addr,
+                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
+
+        /* zero means OSPM does not acknowledge the error */
+        if (!read_ack_register) {
+            if (loop < 3) {
+                usleep(100 * 1000);
+                loop++;
+                goto retry;
+            } else {
+                error_report("OSPM does not acknowledge previous error,"
+                    " so can not record CPER for current error, forcibly"
+                    " acknowledge previous error to avoid blocking next time"
+                    " CPER record! Exit");
+                read_ack_register = 1;
+                cpu_physical_memory_write(read_ack_register_addr,
+                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
+            }
+        } else {
+            if (error_block_addr) {
+                read_ack_register = 0;
+                /*
+                 * Clear the Read Ack Register; OSPM will write 1 to it when
+                 * it acknowledges this error.
+                 */
+                cpu_physical_memory_write(read_ack_register_addr,
+                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
+                ret = acpi_ghes_record_mem_error(error_block_addr,
+                          physical_address, acpi_ghes_data_length[source_id]);
+                if (ret == ACPI_GHES_CPER_OK) {
+                    acpi_ghes_data_length[source_id] +=
+                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
+                }
+            }
+        }
+    }
+
+out:
+    return ret;
+}
diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
index cb62ec9c7b..8e3c5b879e 100644
--- a/include/hw/acpi/acpi_ghes.h
+++ b/include/hw/acpi/acpi_ghes.h
@@ -24,6 +24,9 @@
 
 #include "hw/acpi/bios-linker-loader.h"
 
+#define ACPI_GHES_CPER_OK                   1
+#define ACPI_GHES_CPER_FAIL                 0
+
 /*
  * Values for Hardware Error Notification Type field
  */
@@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
 
 void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
 void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
+bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
 #endif
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 9d143282bc..321ead8115 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
 /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
 unsigned long kvm_arch_vcpu_id(CPUState *cpu);
 
-#ifdef TARGET_I386
-#define KVM_HAVE_MCE_INJECTION 1
+#ifdef KVM_HAVE_MCE_INJECTION
 void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
 #endif
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index d844ea21d8..c4fe6ccc63 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -28,6 +28,10 @@
 /* ARM processors have a weak memory model */
 #define TCG_GUEST_DEFAULT_MO      (0)
 
+#ifdef TARGET_AARCH64
+#define KVM_HAVE_MCE_INJECTION 1
+#endif
+
 #define EXCP_UDEF            1   /* undefined instruction */
 #define EXCP_SWI             2   /* software interrupt */
 #define EXCP_PREFETCH_ABORT  3
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 63815fc4cf..a9ce97efb1 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
              * Report exception with ESR indicating a fault due to a
              * translation table walk for a cache maintenance instruction.
              */
-            syn = syn_data_abort_no_iss(current_el == target_el,
+            syn = syn_data_abort_no_iss(current_el == target_el, 0,
                                         fi.ea, 1, fi.s1ptw, 1, fsc);
             env->exception.vaddress = value;
             env->exception.fsr = fsr;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index f5313dd3d4..28b8451d6d 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
         | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
 }
 
-static inline uint32_t syn_data_abort_no_iss(int same_el,
+static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
                                              int ea, int cm, int s1ptw,
                                              int wnr, int fsc)
 {
     return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
            | ARM_EL_IL
-           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
+           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
+           | (wnr << 6) | fsc;
 }
 
 static inline uint32_t syn_data_abort_with_iss(int same_el,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 28f6db57d5..c7b7653d3f 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -28,6 +28,8 @@
 #include "kvm_arm.h"
 #include "hw/boards.h"
 #include "internals.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/acpi_ghes.h"
 
 static bool have_guest_debug;
 
@@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
     return KVM_PUT_RUNTIME_STATE;
 }
 
+/* Callers must hold the iothread mutex lock */
+static void kvm_inject_arm_sea(CPUState *c)
+{
+    ARMCPU *cpu = ARM_CPU(c);
+    CPUARMState *env = &cpu->env;
+    CPUClass *cc = CPU_GET_CLASS(c);
+    uint32_t esr;
+    bool same_el;
+
+    c->exception_index = EXCP_DATA_ABORT;
+    env->exception.target_el = 1;
+
+    /*
+     * Set the DFSC to synchronous external abort and set FnV to not valid,
+     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
+     */
+    same_el = arm_current_el(env) == env->exception.target_el;
+    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
+
+    env->exception.syndrome = esr;
+
+    cc->do_interrupt(c);
+}
+
 #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
                  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
@@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
     return ret;
 }
 
+void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
+{
+    ram_addr_t ram_addr;
+    hwaddr paddr;
+
+    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
+
+    if (acpi_enabled && addr &&
+            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
+        ram_addr = qemu_ram_addr_from_host(addr);
+        if (ram_addr != RAM_ADDR_INVALID &&
+            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
+            kvm_hwpoison_page_add(ram_addr);
+            /*
+             * Asynchronous signal will be masked by main thread, so
+             * only handle synchronous signal.
+             */
+            if (code == BUS_MCEERR_AR) {
+                kvm_cpu_synchronize_state(c);
+                if (ACPI_GHES_CPER_FAIL !=
+                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
+                    kvm_inject_arm_sea(c);
+                } else {
+                    fprintf(stderr, "failed to record the error\n");
+                }
+            }
+            return;
+        }
+        fprintf(stderr, "Hardware memory error for memory used by "
+                "QEMU itself instead of guest system!\n");
+    }
+
+    if (code == BUS_MCEERR_AR) {
+        fprintf(stderr, "Hardware memory error!\n");
+        exit(1);
+    }
+}
+
 /* C6.6.29 BRK instruction */
 static const uint32_t brk_insn = 0xd4200000;
 
diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
index 5feb312941..499672ebbc 100644
--- a/target/arm/tlb_helper.c
+++ b/target/arm/tlb_helper.c
@@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
      * ISV field.
      */
     if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
-        syn = syn_data_abort_no_iss(same_el,
+        syn = syn_data_abort_no_iss(same_el, 0,
                                     ea, 0, s1ptw, is_write, fsc);
     } else {
         /*
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5352c9ff55..f75a210f96 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -29,6 +29,8 @@
 /* The x86 has a strong memory model with some store-after-load re-ordering */
 #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
+#define KVM_HAVE_MCE_INJECTION 1
+
 /* Maximum instruction code size */
 #define TARGET_MAX_INSN_SIZE 16
 
-- 
2.19.1




* [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
@ 2019-11-11  1:40   ` Xiang Zheng
  0 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-11  1:40 UTC (permalink / raw)
  To: pbonzini, mst, imammedo, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm
  Cc: wanghaibin.wang, zhengxiang9

From: Dongjiu Geng <gengdongjiu@huawei.com>

Add a SIGBUS signal handler. The handler checks the SIGBUS type,
translates the host VA delivered by the host to a guest PA, writes this PA
into guest APEI GHES memory, and then notifies the guest according to the
SIGBUS type.

When the guest accesses poisoned memory, it generates a Synchronous
External Abort (SEA). The host kernel then gets an APEI notification and
calls memory_failure() to unmap the affected page in stage 2, and finally
returns to the guest.

If the guest continues to access the PG_hwpoison page, the access traps to
KVM as a stage 2 fault, and a synchronous SIGBUS with code BUS_MCEERR_AR
is delivered to QEMU. QEMU records the error address into guest APEI GHES
memory and notifies the guest using a Synchronous External Abort (SEA).

To inject a vSEA, we introduce the kvm_inject_arm_sea() function, in
which we set up the exception type and the syndrome information. When
switching to the guest, the target vCPU jumps to the synchronous external
abort vector table entry.

ESR_ELx.DFSC is set to synchronous external abort (0x10), and
ESR_ELx.FnV is set to not valid (0x1), which tells the guest that FAR is
not valid and holds an UNKNOWN value. These values are set in the KVM
register structures through the KVM_SET_ONE_REG ioctl.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
 include/hw/acpi/acpi_ghes.h |   4 +
 include/sysemu/kvm.h        |   3 +-
 target/arm/cpu.h            |   4 +
 target/arm/helper.c         |   2 +-
 target/arm/internals.h      |   5 +-
 target/arm/kvm64.c          |  64 ++++++++
 target/arm/tlb_helper.c     |   2 +-
 target/i386/cpu.h           |   2 +
 9 files changed, 377 insertions(+), 6 deletions(-)
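
Note for readers: how acpi_ghes_record_errors() locates the two guest
addresses it needs can be sketched separately. This is an illustration,
not code from the series. ghes_locate() and struct ghes_slots are
hypothetical, and the source count and address size are passed as
parameters because the values of ACPI_GHES_ERROR_SOURCE_COUNT and
ACPI_GHES_ADDRESS_SIZE are not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the two guest addresses used per source. */
struct ghes_slots {
    uint64_t error_block_ptr;  /* slot holding the error block address */
    uint64_t read_ack_reg;     /* matching Read Ack Register */
};

/*
 * Mirror the address arithmetic in acpi_ghes_record_errors():
 * the error block address slots come first, one per source, followed
 * by the Read Ack Registers in the same order.
 */
static struct ghes_slots ghes_locate(uint64_t base, uint32_t source_id,
                                     uint32_t source_count,
                                     uint32_t addr_size)
{
    struct ghes_slots s;

    assert(source_id < source_count);
    s.error_block_ptr = base + (uint64_t)source_id * addr_size;
    s.read_ack_reg = s.error_block_ptr + (uint64_t)source_count * addr_size;
    return s;
}
```

For example, assuming a single error source and 8-byte slots,
ghes_locate(0x1000, 0, 1, 8) places the error block address slot at 0x1000
and the Read Ack Register at 0x1008.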

diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
index 42c00ff3d3..f5b54990c0 100644
--- a/hw/acpi/acpi_ghes.c
+++ b/hw/acpi/acpi_ghes.c
@@ -39,6 +39,34 @@
 /* The max size in bytes for one error block */
 #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
 
+/*
+ * The total size of Generic Error Data Entry
+ * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
+ * Table 18-343 Generic Error Data Entry
+ */
+#define ACPI_GHES_DATA_LENGTH               72
+
+/*
+ * The memory section CPER size,
+ * UEFI 2.6: N.2.5 Memory Error Section
+ */
+#define ACPI_GHES_MEM_CPER_LENGTH           80
+
+/*
+ * Masks for block_status flags
+ */
+#define ACPI_GEBS_UNCORRECTABLE         1
+
+/*
+ * Values for error_severity field
+ */
+enum AcpiGenericErrorSeverity {
+    ACPI_CPER_SEV_RECOVERABLE,
+    ACPI_CPER_SEV_FATAL,
+    ACPI_CPER_SEV_CORRECTED,
+    ACPI_CPER_SEV_NONE,
+};
+
 /*
  * Now only support ARMv8 SEA notification type error source
  */
@@ -49,6 +77,16 @@
  */
 #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
 
+#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
+    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
+    ((b) >> 8) & 0xff, (b) & 0xff,                   \
+    ((c) >> 8) & 0xff, (c) & 0xff,                    \
+    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
+
+#define UEFI_CPER_SEC_PLATFORM_MEM                   \
+    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
+    0xED, 0x7C, 0x83, 0xB1)
+
 /*
  * | +--------------------------+ 0
  * | |        Header            |
@@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
     uint64_t ghes_addr_le;
 } AcpiGhesState;
 
+/*
+ * Total size for Generic Error Status Block
+ * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
+ * Table 18-380 Generic Error Status Block
+ */
+#define ACPI_GHES_GESB_SIZE                 20
+/* The offset of Data Length in Generic Error Status Block */
+#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
+
+/*
+ * Record the value of data length for each error status block to avoid getting
+ * this value from guest.
+ */
+static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
+
+/*
+ * Generic Error Data Entry
+ * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ */
+static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
+                uint32_t error_severity, uint16_t revision,
+                uint8_t validation_bits, uint8_t flags,
+                uint32_t error_data_length, QemuUUID fru_id,
+                uint8_t *fru_text, uint64_t time_stamp)
+{
+    QemuUUID uuid_le;
+
+    /* Section Type */
+    uuid_le = qemu_uuid_bswap(section_type);
+    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
+
+    /* Error Severity */
+    build_append_int_noprefix(table, error_severity, 4);
+    /* Revision */
+    build_append_int_noprefix(table, revision, 2);
+    /* Validation Bits */
+    build_append_int_noprefix(table, validation_bits, 1);
+    /* Flags */
+    build_append_int_noprefix(table, flags, 1);
+    /* Error Data Length */
+    build_append_int_noprefix(table, error_data_length, 4);
+
+    /* FRU Id */
+    uuid_le = qemu_uuid_bswap(fru_id);
+    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
+
+    /* FRU Text */
+    g_array_append_vals(table, fru_text, 20);
+    /* Timestamp */
+    build_append_int_noprefix(table, time_stamp, 8);
+}
+
+/*
+ * Generic Error Status Block
+ * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ */
+static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
+                uint32_t raw_data_offset, uint32_t raw_data_length,
+                uint32_t data_length, uint32_t error_severity)
+{
+    /* Block Status */
+    build_append_int_noprefix(table, block_status, 4);
+    /* Raw Data Offset */
+    build_append_int_noprefix(table, raw_data_offset, 4);
+    /* Raw Data Length */
+    build_append_int_noprefix(table, raw_data_length, 4);
+    /* Data Length */
+    build_append_int_noprefix(table, data_length, 4);
+    /* Error Severity */
+    build_append_int_noprefix(table, error_severity, 4);
+}
+
+/* UEFI 2.6: N.2.5 Memory Error Section */
+static void acpi_ghes_build_append_mem_cper(GArray *table,
+                                            uint64_t error_physical_addr)
+{
+    /*
+     * Memory Error Record
+     */
+
+    /* Validation Bits */
+    build_append_int_noprefix(table,
+                              (1UL << 14) | /* Type Valid */
+                              (1UL << 1) /* Physical Address Valid */,
+                              8);
+    /* Error Status */
+    build_append_int_noprefix(table, 0, 8);
+    /* Physical Address */
+    build_append_int_noprefix(table, error_physical_addr, 8);
+    /* Skip all the detailed information normally found in such a record */
+    build_append_int_noprefix(table, 0, 48);
+    /* Memory Error Type */
+    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
+    /* Skip all the detailed information normally found in such a record */
+    build_append_int_noprefix(table, 0, 7);
+}
+
+static int acpi_ghes_record_mem_error(uint64_t error_block_address,
+                                      uint64_t error_physical_addr,
+                                      uint32_t data_length)
+{
+    GArray *block;
+    uint64_t current_block_length;
+    /* Memory Error Section Type */
+    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
+    QemuUUID fru_id = {};
+    uint8_t fru_text[20] = {};
+
+    /*
+     * Generic Error Status Block
+     * | +---------------------+
+     * | |     block_status    |
+     * | +---------------------+
+     * | |    raw_data_offset  |
+     * | +---------------------+
+     * | |    raw_data_length  |
+     * | +---------------------+
+     * | |     data_length     |
+     * | +---------------------+
+     * | |   error_severity    |
+     * | +---------------------+
+     */
+    block = g_array_new(false, true /* clear */, 1);
+
+    /* The current whole length of the generic error status block */
+    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
+
+    /* This is the length if adding a new generic error data entry */
+    data_length += ACPI_GHES_DATA_LENGTH;
+    data_length += ACPI_GHES_MEM_CPER_LENGTH;
+
+    /*
+     * Check whether it will run out of the preallocated memory if adding a new
+     * generic error data entry
+     */
+    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
+        error_report("Record CPER out of boundary!!!");
+        return ACPI_GHES_CPER_FAIL;
+    }
+
+    /* Build the new generic error status block header */
+    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
+        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
+
+    /* Write back above generic error status block header to guest memory */
+    cpu_physical_memory_write(error_block_address, block->data,
+                              block->len);
+
+    /* Add a new generic error data entry */
+
+    data_length = block->len;
+    /* Build this new generic error data entry header */
+    acpi_ghes_generic_error_data(block, mem_section_id_le,
+        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
+        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
+
+    /* Build the memory section CPER for above new generic error data entry */
+    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
+
+    /* Write back above this new generic error data entry to guest memory */
+    cpu_physical_memory_write(error_block_address + current_block_length,
+        block->data + data_length, block->len - data_length);
+
+    g_array_free(block, true);
+
+    return ACPI_GHES_CPER_OK;
+}
+
 /*
  * Hardware Error Notification
  * ACPI 4.0: 17.3.2.7 Hardware Error Notification
@@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
     fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
         NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
 }
+
+bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
+{
+    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
+    int loop = 0;
+    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
+    bool ret = ACPI_GHES_CPER_FAIL;
+    uint8_t source_id;
+    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
+
+    /*
+     * | +---------------------+ ges.ghes_addr_le
+     * | |error_block_address0 |
+     * | +---------------------+ --+--
+     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
+     * | +---------------------+ --+--
+     * | |error_block_addressN |
+     * | +---------------------+
+     * | | read_ack_register0  |
+     * | +---------------------+ --+--
+     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
+     * | +---------------------+ --+--
+     * | | read_ack_registerN  |
+     * | +---------------------+ --+--
+     * | |      CPER           |   |
+     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGTH
+     * | |      CPER           |   |
+     * | +---------------------+ --+--
+     * | |    ..........       |
+     * | +---------------------+
+     * | |      CPER           |
+     * | |      ....           |
+     * | |      CPER           |
+     * | +---------------------+
+     */
+    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
+        /* Find and check the source id for this new CPER */
+        source_id = error_source_id[notify];
+        if (source_id != 0xff) {
+            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
+        } else {
+            goto out;
+        }
+
+        cpu_physical_memory_read(start_addr, &error_block_addr,
+                                 ACPI_GHES_ADDRESS_SIZE);
+
+        read_ack_register_addr = start_addr +
+            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
+retry:
+        cpu_physical_memory_read(read_ack_register_addr,
+                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
+
+        /* zero means OSPM does not acknowledge the error */
+        if (!read_ack_register) {
+            if (loop < 3) {
+                usleep(100 * 1000);
+                loop++;
+                goto retry;
+            } else {
+                error_report("OSPM does not acknowledge previous error,"
+                    " so can not record CPER for current error, forcibly"
+                    " acknowledge previous error to avoid blocking next time"
+                    " CPER record! Exit");
+                read_ack_register = 1;
+                cpu_physical_memory_write(read_ack_register_addr,
+                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
+            }
+        } else {
+            if (error_block_addr) {
+                read_ack_register = 0;
+                /*
+                 * Clear the Read Ack Register; OSPM will write 1 to it when
+                 * it acknowledges this error.
+                 */
+                cpu_physical_memory_write(read_ack_register_addr,
+                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
+                ret = acpi_ghes_record_mem_error(error_block_addr,
+                          physical_address, acpi_ghes_data_length[source_id]);
+                if (ret == ACPI_GHES_CPER_OK) {
+                    acpi_ghes_data_length[source_id] +=
+                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
+                }
+            }
+        }
+    }
+
+out:
+    return ret;
+}
diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
index cb62ec9c7b..8e3c5b879e 100644
--- a/include/hw/acpi/acpi_ghes.h
+++ b/include/hw/acpi/acpi_ghes.h
@@ -24,6 +24,9 @@
 
 #include "hw/acpi/bios-linker-loader.h"
 
+#define ACPI_GHES_CPER_OK                   1
+#define ACPI_GHES_CPER_FAIL                 0
+
 /*
  * Values for Hardware Error Notification Type field
  */
@@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
 
 void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
 void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
+bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
 #endif
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 9d143282bc..321ead8115 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
 /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
 unsigned long kvm_arch_vcpu_id(CPUState *cpu);
 
-#ifdef TARGET_I386
-#define KVM_HAVE_MCE_INJECTION 1
+#ifdef KVM_HAVE_MCE_INJECTION
 void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
 #endif
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index d844ea21d8..c4fe6ccc63 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -28,6 +28,10 @@
 /* ARM processors have a weak memory model */
 #define TCG_GUEST_DEFAULT_MO      (0)
 
+#ifdef TARGET_AARCH64
+#define KVM_HAVE_MCE_INJECTION 1
+#endif
+
 #define EXCP_UDEF            1   /* undefined instruction */
 #define EXCP_SWI             2   /* software interrupt */
 #define EXCP_PREFETCH_ABORT  3
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 63815fc4cf..a9ce97efb1 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
              * Report exception with ESR indicating a fault due to a
              * translation table walk for a cache maintenance instruction.
              */
-            syn = syn_data_abort_no_iss(current_el == target_el,
+            syn = syn_data_abort_no_iss(current_el == target_el, 0,
                                         fi.ea, 1, fi.s1ptw, 1, fsc);
             env->exception.vaddress = value;
             env->exception.fsr = fsr;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index f5313dd3d4..28b8451d6d 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
         | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
 }
 
-static inline uint32_t syn_data_abort_no_iss(int same_el,
+static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
                                              int ea, int cm, int s1ptw,
                                              int wnr, int fsc)
 {
     return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
            | ARM_EL_IL
-           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
+           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
+           | (wnr << 6) | fsc;
 }
 
 static inline uint32_t syn_data_abort_with_iss(int same_el,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 28f6db57d5..c7b7653d3f 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -28,6 +28,8 @@
 #include "kvm_arm.h"
 #include "hw/boards.h"
 #include "internals.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/acpi_ghes.h"
 
 static bool have_guest_debug;
 
@@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
     return KVM_PUT_RUNTIME_STATE;
 }
 
+/* Callers must hold the iothread mutex lock */
+static void kvm_inject_arm_sea(CPUState *c)
+{
+    ARMCPU *cpu = ARM_CPU(c);
+    CPUARMState *env = &cpu->env;
+    CPUClass *cc = CPU_GET_CLASS(c);
+    uint32_t esr;
+    bool same_el;
+
+    c->exception_index = EXCP_DATA_ABORT;
+    env->exception.target_el = 1;
+
+    /*
+     * Set the DFSC to synchronous external abort and set FnV to not valid,
+     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
+     */
+    same_el = arm_current_el(env) == env->exception.target_el;
+    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
+
+    env->exception.syndrome = esr;
+
+    cc->do_interrupt(c);
+}
+
 #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
                  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
@@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
     return ret;
 }
 
+void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
+{
+    ram_addr_t ram_addr;
+    hwaddr paddr;
+
+    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
+
+    if (acpi_enabled && addr &&
+            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
+        ram_addr = qemu_ram_addr_from_host(addr);
+        if (ram_addr != RAM_ADDR_INVALID &&
+            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
+            kvm_hwpoison_page_add(ram_addr);
+            /*
+             * Asynchronous signal will be masked by main thread, so
+             * only handle synchronous signal.
+             */
+            if (code == BUS_MCEERR_AR) {
+                kvm_cpu_synchronize_state(c);
+                if (ACPI_GHES_CPER_FAIL !=
+                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
+                    kvm_inject_arm_sea(c);
+                } else {
+                    fprintf(stderr, "failed to record the error\n");
+                }
+            }
+            return;
+        }
+        fprintf(stderr, "Hardware memory error for memory used by "
+                "QEMU itself instead of guest system!\n");
+    }
+
+    if (code == BUS_MCEERR_AR) {
+        fprintf(stderr, "Hardware memory error!\n");
+        exit(1);
+    }
+}
+
 /* C6.6.29 BRK instruction */
 static const uint32_t brk_insn = 0xd4200000;
 
diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
index 5feb312941..499672ebbc 100644
--- a/target/arm/tlb_helper.c
+++ b/target/arm/tlb_helper.c
@@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
      * ISV field.
      */
     if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
-        syn = syn_data_abort_no_iss(same_el,
+        syn = syn_data_abort_no_iss(same_el, 0,
                                     ea, 0, s1ptw, is_write, fsc);
     } else {
         /*
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5352c9ff55..f75a210f96 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -29,6 +29,8 @@
 /* The x86 has a strong memory model with some store-after-load re-ordering */
 #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
+#define KVM_HAVE_MCE_INJECTION 1
+
 /* Maximum instruction code size */
 #define TARGET_MAX_INSN_SIZE 16
 
-- 
2.19.1
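
The Read Ack handshake in acpi_ghes_record_errors() above can be sketched in
isolation. This is a minimal model under stated assumptions, not the patch's
code: GhesSourceModel and the ghes_model_* helpers are hypothetical names,
guest memory is replaced by a plain struct, and the 100 ms sleep between
polls is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one error source's guest-visible state. */
typedef struct {
    uint64_t read_ack;   /* 1: OSPM acknowledged the last error, 0: pending */
    unsigned records;    /* number of CPERs recorded so far */
} GhesSourceModel;

/*
 * Try to record a new error for this source.
 * Returns true if a record was written, false if the previous error was
 * still unacknowledged after max_retries extra polls.
 */
static bool ghes_model_record(GhesSourceModel *s, unsigned max_retries)
{
    unsigned loop = 0;

    while (!s->read_ack) {
        if (loop++ >= max_retries) {
            /*
             * Give up on this error, but force the acknowledgement so
             * that the next error is not blocked as well.
             */
            s->read_ack = 1;
            return false;
        }
        /* The patch sleeps 100 ms here before polling again. */
    }

    /* Clear the register; OSPM sets it back to 1 when it has consumed
     * the record. */
    s->read_ack = 0;
    s->records++;
    return true;
}

/* What OSPM does after it has processed the error record. */
static void ghes_model_ospm_ack(GhesSourceModel *s)
{
    assert(s->read_ack == 0);
    s->read_ack = 1;
}
```

With max_retries set to 3 the model polls the register four times in total
before forcing the acknowledgement, which matches the loop counter in the
hunk above. In the model, as in the hunk shown, the error that hits the
timeout is not recorded.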




^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [RESEND PATCH v21 6/6] MAINTAINERS: Add APCI/APEI/GHES entries
  2019-11-11  1:40 ` Xiang Zheng
@ 2019-11-11  1:40   ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-11  1:40 UTC (permalink / raw)
  To: pbonzini, mst, imammedo, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm
  Cc: zhengxiang9, wanghaibin.wang

From: Dongjiu Geng <gengdongjiu@huawei.com>

Xiang and I are willing to review the APEI-related patches and
volunteer as the reviewers for the APEI/GHES part.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
---
 MAINTAINERS | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 325e67a04e..043f7a928e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1414,6 +1414,15 @@ F: tests/bios-tables-test.c
 F: tests/acpi-utils.[hc]
 F: tests/data/acpi/
 
+ACPI/APEI/GHES
+R: Dongjiu Geng <gengdongjiu@huawei.com>
+R: Xiang Zheng <zhengxiang9@huawei.com>
+L: qemu-arm@nongnu.org
+S: Maintained
+F: hw/acpi/acpi_ghes.c
+F: include/hw/acpi/acpi_ghes.h
+F: docs/specs/acpi_hest_ghes.rst
+
 ppc4xx
 M: David Gibson <david@gibson.dropbear.id.au>
 L: qemu-ppc@nongnu.org
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-11  1:40   ` Xiang Zheng
@ 2019-11-15  9:38     ` Igor Mammedov
  -1 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-15  9:38 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: pbonzini, mst, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

On Mon, 11 Nov 2019 09:40:45 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> From: Dongjiu Geng <gengdongjiu@huawei.com>
> 
> This patch implements APEI GHES Table generation via fw_cfg blobs. Now
> it only supports ARMv8 SEA, a type of GHESv2 error source. Afterwards,
> we can extend the supported types if needed. For the CPER section,
> currently it is memory section because kernel mainly wants userspace to
> handle the memory errors.
> 
> This patch follows the spec ACPI 6.2 to build the Hardware Error Source
> table. For more detailed information, please refer to document:
> docs/specs/acpi_hest_ghes.rst
> 
> Suggested-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  default-configs/arm-softmmu.mak |   1 +
>  hw/acpi/Kconfig                 |   4 +
>  hw/acpi/Makefile.objs           |   1 +
>  hw/acpi/acpi_ghes.c             | 267 ++++++++++++++++++++++++++++++++
>  hw/acpi/aml-build.c             |   2 +
>  hw/arm/virt-acpi-build.c        |  12 ++
>  include/hw/acpi/acpi_ghes.h     |  56 +++++++
>  include/hw/acpi/aml-build.h     |   1 +
>  8 files changed, 344 insertions(+)
>  create mode 100644 hw/acpi/acpi_ghes.c
>  create mode 100644 include/hw/acpi/acpi_ghes.h
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 1f2e0e7fde..5722f3130e 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
>  CONFIG_FSL_IMX7=y
>  CONFIG_FSL_IMX6UL=y
>  CONFIG_SEMIHOSTING=y
> +CONFIG_ACPI_APEI=y
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 12e3f1e86e..ed8c34d238 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -23,6 +23,10 @@ config ACPI_NVDIMM
>      bool
>      depends on ACPI
>  
> +config ACPI_APEI
> +    bool
> +    depends on ACPI
> +
>  config ACPI_PCI
>      bool
>      depends on ACPI && PCI
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index 655a9c1973..84474b0ca8 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
>  common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> +common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o
>  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
>  common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
>  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> new file mode 100644
> index 0000000000..42c00ff3d3
> --- /dev/null
> +++ b/hw/acpi/acpi_ghes.c
> @@ -0,0 +1,267 @@
> +/*
> + * Support for generating APEI tables and recording CPER for Guests
> + *
> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/acpi_ghes.h"
> +#include "hw/nvram/fw_cfg.h"
> +#include "sysemu/sysemu.h"
> +#include "qemu/error-report.h"
> +
> +#define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
> +#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
> +
> +/*
> + * The size of Address field in Generic Address Structure.
> + * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure.
> + */
> +#define ACPI_GHES_ADDRESS_SIZE              8
there is no such thing as GHES_ADDRESS_SIZE.

I'd just use sizeof(uint64_t); it is shorter and the value is obvious
when seen at a call site

> +
> +/* The max size in bytes for one error block */
> +#define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> +
> +/*
> + * Now only support ARMv8 SEA notification type error source
> + */
maybe make this a one-line comment

> +#define ACPI_GHES_ERROR_SOURCE_COUNT        1
> +
> +/*
> + * Generic Hardware Error Source version 2
> + */
ditto

> +#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10


> +
> +/*
> + * | +--------------------------+ 0
> + * | |        Header            |
> + * | +--------------------------+ 40---+-
> + * | | .................        |      |
> + * | | error_status_address-----+ 60   |
> + * | | .................        |      |
> + * | | read_ack_register--------+ 104  92
> + * | | read_ack_preserve        |      |
> + * | | read_ack_write           |      |
> + * + +--------------------------+ 132--+-
> + *
> + * From above GHES definition, the error status address offset is 60;
> + * the Read Ack Register offset is 104, the whole size of GHESv2 is 92
> + */
> +
> +/* The error status address offset in GHES */
> +#define ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(start_addr, n) (start_addr + \
> +            60 + offsetof(struct AcpiGenericAddress, address) + n * 92)
> +
> +/* The Read Ack Register offset in GHES */
> +#define ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(start_addr, n) (start_addr +\
> +            104 + offsetof(struct AcpiGenericAddress, address) + n * 92)
drop this hunk, see below for why

> +
> +typedef struct AcpiGhesState {
> +    uint64_t ghes_addr_le;
> +} AcpiGhesState;
> +
> +/*
> + * Hardware Error Notification
> + * ACPI 4.0: 17.3.2.7 Hardware Error Notification

add:
composes a dummy Hardware Error Notification descriptor of the specified type

> + */
> +static void acpi_ghes_build_notify(GArray *table, const uint8_t type)

typically the format should be build_WHAT(), so
 build_ghes_hw_error_notification()

And I'd move this out into its own patch.
This applies to other trivial, independent sub-tables
that take all the data needed to construct them from the supplied arguments.

> +{
> +        /* Type */
> +        build_append_int_noprefix(table, type, 1);
> +        /*
> +         * Length:
> +         * Total length of the structure in bytes
> +         */
> +        build_append_int_noprefix(table, 28, 1);
> +        /* Configuration Write Enable */
> +        build_append_int_noprefix(table, 0, 2);
> +        /* Poll Interval */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Vector */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Switch To Polling Threshold Value */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Switch To Polling Threshold Window */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Error Threshold Value */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Error Threshold Window */
> +        build_append_int_noprefix(table, 0, 4);
> +}
> +

/*
  Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fwcfg blobs.
  See docs/specs/acpi_hest_ghes.rst for blobs format
*/
> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker)
build_ghes_error_table()

also I'd move this function into its own patch along with other
related code that initializes and wires it into virt board.

> +{
> +    int i, error_status_block_offset;
> +
> +    /*
> +     * | +--------------------------+
> +     * | |    error_block_address   |
> +     * | |      ..........          |
> +     * | +--------------------------+
> +     * | |    read_ack_register     |
> +     * | |     ...........          |
> +     * | +--------------------------+
> +     * | |  Error Status Data Block |
> +     * | |      ........            |
> +     * | +--------------------------+
> +     */
I'd drop this comment; acpi_hest_ghes.rst should be sufficient.
If it's not, then fix the spec. For example, it's not obvious from the spec
that "Error Status Data Block" immediately follows 'read_ack_register'.
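
To make the implied layout explicit, here is a small sketch of the offsets
inside the "etc/hardware_errors" blob as the quoted function builds it. The
helper names are hypothetical; the constants mirror ACPI_GHES_ADDRESS_SIZE
and ACPI_GHES_MAX_RAW_DATA_LENGTH from the patch, and n stands for
ACPI_GHES_ERROR_SOURCE_COUNT.

```c
#include <stddef.h>
#include <stdint.h>

#define ADDR_SIZE        8        /* one 64-bit address or register slot */
#define MAX_RAW_DATA_LEN 0x1000   /* bytes reserved per Error Status Data Block */

/* Offset of error_block_address[i]: the first array in the blob. */
static size_t error_block_address_off(unsigned i)
{
    return (size_t)ADDR_SIZE * i;
}

/* Offset of read_ack_register[i]: the second array, after n addresses. */
static size_t read_ack_register_off(unsigned n, unsigned i)
{
    return (size_t)ADDR_SIZE * (n + i);
}

/* Offset of Error Status Data Block i: right after both arrays. */
static size_t error_status_block_off(unsigned n, unsigned i)
{
    return (size_t)2 * ADDR_SIZE * n + (size_t)MAX_RAW_DATA_LEN * i;
}

/* Total blob size; same value as request_block_size in the patch. */
static size_t hardware_errors_size(unsigned n)
{
    return (size_t)n * (2 * ADDR_SIZE + MAX_RAW_DATA_LEN);
}
```

For the single SEA source in this series (n = 1) that gives offsets 0, 8
and 16, and a blob of 0x1010 bytes.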

> +
> +    /* Build error_block_address */
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        build_append_int_noprefix(hardware_errors, 0, ACPI_GHES_ADDRESS_SIZE);
> +    }
> +
> +    /* Build read_ack_register */
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        /*
> +         * Initialize the value of read_ack_register to 1, so GHES can be
> +         * writeable in the first time.
s/in the first time/after (re)boot/

> +         * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> +         * (GHESv2 - Type 10)
> +         */
> +        build_append_int_noprefix(hardware_errors, 1, ACPI_GHES_ADDRESS_SIZE);
> +    }
> +
> +    /* Generic Error Status Block offset in the hardware error fw_cfg blob */
> +    error_status_block_offset = hardware_errors->len;
> +
> +    /* Build Error Status Data Block */

/* reserve space for Error Status Data Block */

> +    build_append_int_noprefix(hardware_errors, 0,
> +        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
this function is for integers only; if you just need to reserve space,
you can use acpi_data_push().

> +
> +    /* Allocate guest memory for the hardware error fw_cfg blob */
/* tell guest firmware to place hardware_errors blob into RAM */

> +    bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +                             hardware_errors, 1, false);
> +
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        /*
> +         * Patch the address of Error Status Data Block into
> +         * the error_block_address of hardware_errors fw_cfg blob
 Tell firmware to patch error_block_address entries to point to
 corresponding "Error Status Data Block"

> +         */
> +        bios_linker_loader_add_pointer(linker,
> +            ACPI_GHES_ERRORS_FW_CFG_FILE, ACPI_GHES_ADDRESS_SIZE * i,
> +            ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +            error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
> +    }
> +
> +    /*
> +     * Write the address of hardware_errors blob into the
> +     * hardware_errors_addr fw_cfg blob.
/*
tell firmware to write hardware_errors GPA into hardware_errors_addr fw_cfg,
once the former has been initialized.
*/

> +     */
> +    bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
> +        0, ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> +}
> +
> +/* Build Hardware Error Source Table */
> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_errors,
> +                          BIOSLinker *linker)
it's not a GHES-specific table, so
  build_hest()

> +{
> +    uint32_t hest_start = table_data->len;
> +    uint32_t source_id = 0;
> +
> +    /* Hardware Error Source Table header*/
> +    acpi_data_push(table_data, sizeof(AcpiTableHeader));
> +
> +    /* Error Source Count */
> +    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> +

this is the place where all error source structures will be enumerated.
I'd move the GHESv2-specific code out into a separate function so that the
code here would look like this

    build_ghes_v2(...);
    
    
   
> +    /*
> +     * Type:
> +     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> +     */
> +    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> +    /*
> +     * Source Id

> +     * Once we support more than one hardware error sources, we need to
> +     * increase the value of this field.
I'm not sure ^^^ is correct. According to the spec, it's just a unique id per
distinct error structure, so we just assign arbitrary values to each
declared source, and that never changes once assigned.

For now I'd make source_id an enum with one member
  enum {
    ACPI_HEST_SRC_ID_SEA = 0,
    /* future ids go here */
    ACPI_HEST_SRC_ID_RESERVED,
  };

and use that instead of allocating a magic 0 at the beginning of the function.
 build_ghes_v2(ACPI_HEST_SRC_ID_SEA);
Also add a comment to the declaration saying that already assigned values
are not to be changed.

> +     */
> +    build_append_int_noprefix(table_data, source_id, 2);
> +    /* Related Source Id */
> +    build_append_int_noprefix(table_data, 0xffff, 2);
> +    /* Flags */
> +    build_append_int_noprefix(table_data, 0, 1);
> +    /* Enabled */
> +    build_append_int_noprefix(table_data, 1, 1);
> +
> +    /* Number of Records To Pre-allocate */
> +    build_append_int_noprefix(table_data, 1, 4);
> +    /* Max Sections Per Record */
> +    build_append_int_noprefix(table_data, 1, 4);
> +    /* Max Raw Data Length */
> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> +
> +    /* Error Status Address */
> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> +                     4 /* QWord access */, 0);
> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
it's fine only if GHESv2 entries are the only entries in HEST, but once
other types are added this macro will silently fall apart and
cause table corruption.

Instead of an offset from hest_start, I suggest using an offset relative
to the GAS structure; here is an idea

#define GAS_ADDR_OFFSET 4

    off = table->len
    build_append_gas()
    bios_linker_loader_add_pointer(...,
        off + GAS_ADDR_OFFSET, ...
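
A self-contained sketch of the same idea, assuming the standard 12-byte
Generic Address Structure layout (four 1-byte fields followed by the 8-byte
Address). Table, table_append_int() and table_append_gas() are hypothetical
stand-ins for GArray, build_append_int_noprefix() and build_append_gas();
they are not QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GAS_ADDR_OFFSET 4   /* Address field follows four 1-byte fields */

/* Hypothetical stand-in for the GArray that holds the table bytes. */
typedef struct {
    uint8_t data[256];
    size_t len;
} Table;

/* Append the low 'size' bytes of 'value' in little-endian order. */
static void table_append_int(Table *t, uint64_t value, size_t size)
{
    assert(size <= 8 && t->len + size <= sizeof(t->data));
    for (size_t i = 0; i < size; i++) {
        t->data[t->len++] = (uint8_t)(value >> (8 * i));
    }
}

/*
 * Append a GAS with a zero address and return the table offset of its
 * Address field, i.e. the offset a pointer-patching command would need.
 */
static size_t table_append_gas(Table *t, uint8_t space_id, uint8_t bit_width,
                               uint8_t bit_offset, uint8_t access_size)
{
    size_t gas_start = t->len;   /* remember where this GAS begins */

    table_append_int(t, space_id, 1);
    table_append_int(t, bit_width, 1);
    table_append_int(t, bit_offset, 1);
    table_append_int(t, access_size, 1);
    table_append_int(t, 0, 8);   /* Address, to be patched later */

    return gas_start + GAS_ADDR_OFFSET;
}
```

Because the offset is taken from the table length just before the GAS is
appended, it stays correct no matter how many bytes other error source
structures add in front of it.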

> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +        source_id * ACPI_GHES_ADDRESS_SIZE);
> +
> +    /*
> +     * Notification Structure
> +     * Now only enable ARMv8 SEA notification type
> +     */
> +    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
> +
> +    /* Error Status Block Length */
> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> +
> +    /*
> +     * Read Ack Register
> +     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> +     * version 2 (GHESv2 - Type 10)
> +     */
> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> +                     4 /* QWord access */, 0);
> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> +        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, 0),
ditto

> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
> +
> +    /*
> +     * Read Ack Preserve
> +     * We only provide the first bit in Read Ack Register to OSPM to write
> +     * while the other bits are preserved.
> +     */
> +    build_append_int_noprefix(table_data, ~0x1ULL, 8);
> +    /* Read Ack Write */
> +    build_append_int_noprefix(table_data, 0x1, 8);
> +
> +    build_header(linker, table_data, (void *)(table_data->data + hest_start),
> +        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
HEST is not GHES-specific, so s/GHES/NULL/
                                                         
> +}
> +
> +static AcpiGhesState ges;
> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> +{
> +
> +    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
> +    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
> +

> +    /* Create a read-only fw_cfg file for GHES */
> +    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> +                    request_block_size);
> +
> +    /* Create a read-write fw_cfg file for Address */
> +    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> +        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> +}
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 2c3702b882..3681ec6e3d 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
>      tables->table_data = g_array_new(false, true /* clear */, 1);
>      tables->tcpalog = g_array_new(false, true /* clear */, 1);
>      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> +    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
>      tables->linker = bios_linker_loader_init();
>  }
>  
> @@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
>      g_array_free(tables->table_data, true);
>      g_array_free(tables->tcpalog, mfre);
>      g_array_free(tables->vmgenid, mfre);
> +    g_array_free(tables->hardware_errors, mfre);
>  }
>  
>  /*
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 4cd50175e0..1b1fd273e4 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -48,6 +48,7 @@
>  #include "sysemu/reset.h"
>  #include "kvm_arm.h"
>  #include "migration/vmstate.h"
> +#include "hw/acpi/acpi_ghes.h"
>  
>  #define ARM_SPI_BASE 32
>  
> @@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>      acpi_add_table(table_offsets, tables_blob);
>      build_spcr(tables_blob, tables->linker, vms);
>  
> +    if (vms->ras) {
> +        acpi_add_table(table_offsets, tables_blob);
> +        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
> +        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
> +                             tables->linker);
> +    }
> +
>      if (ms->numa_state->num_nodes > 0) {
>          acpi_add_table(table_offsets, tables_blob);
>          build_srat(tables_blob, tables->linker, vms);
> @@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
>      fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
>                      acpi_data_len(tables.tcpalog));
>  
> +    if (vms->ras) {
> +        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
> +    }
> +
>      build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
>                                               build_state, tables.rsdp,
>                                               ACPI_BUILD_RSDP_FILE, 0);
> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> new file mode 100644
> index 0000000000..cb62ec9c7b
> --- /dev/null
> +++ b/include/hw/acpi/acpi_ghes.h
> @@ -0,0 +1,56 @@
> +/*
> + * Support for generating APEI tables and recording CPER for Guests
> + *
> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef ACPI_GHES_H
> +#define ACPI_GHES_H
> +
> +#include "hw/acpi/bios-linker-loader.h"
> +
> +/*
> + * Values for Hardware Error Notification Type field
> + */
> +enum AcpiGhesNotifyType {
> +    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
> +    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
> +    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
> +    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
> +    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
> +    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
> +    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
> +    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
> +    ACPI_GHES_NOTIFY_GPIO = 7,
> +    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_SEA = 8,
> +    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_SEI = 9,
> +    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_GSIV = 10,
> +    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
> +    ACPI_GHES_NOTIFY_SDEI = 11,
> +    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
> +};
maybe put every comment on its own line; otherwise the zoo above looks ugly
 
> +
> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> +                          BIOSLinker *linker);
> +
> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +#endif
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index de4a406568..8f13620701 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -220,6 +220,7 @@ struct AcpiBuildTables {
>      GArray *rsdp;
>      GArray *tcpalog;
>      GArray *vmgenid;
> +    GArray *hardware_errors;
>      BIOSLinker *linker;
>  } AcpiBuildTables;
>  


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
@ 2019-11-15  9:38     ` Igor Mammedov
  0 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-15  9:38 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: peter.maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl, qemu-arm,
	james.morse, xuwei5, jonathan.cameron, pbonzini, lersek, rth

On Mon, 11 Nov 2019 09:40:45 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> From: Dongjiu Geng <gengdongjiu@huawei.com>
> 
> This patch implements APEI GHES Table generation via fw_cfg blobs. Now
> it only supports ARMv8 SEA, a type of GHESv2 error source. Afterwards,
> we can extend the supported types if needed. For the CPER section,
> currently it is memory section because kernel mainly wants userspace to
> handle the memory errors.
> 
> This patch follows the spec ACPI 6.2 to build the Hardware Error Source
> table. For more detailed information, please refer to document:
> docs/specs/acpi_hest_ghes.rst
> 
> Suggested-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  default-configs/arm-softmmu.mak |   1 +
>  hw/acpi/Kconfig                 |   4 +
>  hw/acpi/Makefile.objs           |   1 +
>  hw/acpi/acpi_ghes.c             | 267 ++++++++++++++++++++++++++++++++
>  hw/acpi/aml-build.c             |   2 +
>  hw/arm/virt-acpi-build.c        |  12 ++
>  include/hw/acpi/acpi_ghes.h     |  56 +++++++
>  include/hw/acpi/aml-build.h     |   1 +
>  8 files changed, 344 insertions(+)
>  create mode 100644 hw/acpi/acpi_ghes.c
>  create mode 100644 include/hw/acpi/acpi_ghes.h
> 
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 1f2e0e7fde..5722f3130e 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
>  CONFIG_FSL_IMX7=y
>  CONFIG_FSL_IMX6UL=y
>  CONFIG_SEMIHOSTING=y
> +CONFIG_ACPI_APEI=y
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 12e3f1e86e..ed8c34d238 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -23,6 +23,10 @@ config ACPI_NVDIMM
>      bool
>      depends on ACPI
>  
> +config ACPI_APEI
> +    bool
> +    depends on ACPI
> +
>  config ACPI_PCI
>      bool
>      depends on ACPI && PCI
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index 655a9c1973..84474b0ca8 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
>  common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> +common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o
>  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
>  common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
>  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> new file mode 100644
> index 0000000000..42c00ff3d3
> --- /dev/null
> +++ b/hw/acpi/acpi_ghes.c
> @@ -0,0 +1,267 @@
> +/*
> + * Support for generating APEI tables and recording CPER for Guests
> + *
> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/acpi_ghes.h"
> +#include "hw/nvram/fw_cfg.h"
> +#include "sysemu/sysemu.h"
> +#include "qemu/error-report.h"
> +
> +#define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
> +#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
> +
> +/*
> + * The size of Address field in Generic Address Structure.
> + * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure.
> + */
> +#define ACPI_GHES_ADDRESS_SIZE              8
There is no such thing as GHES_ADDRESS_SIZE.

I'd just use sizeof(uint64_t); it is shorter and its value is obvious
when seen at a call site.

> +
> +/* The max size in bytes for one error block */
> +#define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> +
> +/*
> + * Now only support ARMv8 SEA notification type error source
> + */
Maybe make this a one-line comment.

> +#define ACPI_GHES_ERROR_SOURCE_COUNT        1
> +
> +/*
> + * Generic Hardware Error Source version 2
> + */
ditto

> +#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10


> +
> +/*
> + * | +--------------------------+ 0
> + * | |        Header            |
> + * | +--------------------------+ 40---+-
> + * | | .................        |      |
> + * | | error_status_address-----+ 60   |
> + * | | .................        |      |
> + * | | read_ack_register--------+ 104  92
> + * | | read_ack_preserve        |      |
> + * | | read_ack_write           |      |
> + * + +--------------------------+ 132--+-
> + *
> + * From above GHES definition, the error status address offset is 60;
> + * the Read Ack Register offset is 104, the whole size of GHESv2 is 92
> + */
> +
> +/* The error status address offset in GHES */
> +#define ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(start_addr, n) (start_addr + \
> +            60 + offsetof(struct AcpiGenericAddress, address) + n * 92)
> +
> +/* The Read Ack Register offset in GHES */
> +#define ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(start_addr, n) (start_addr +\
> +            104 + offsetof(struct AcpiGenericAddress, address) + n * 92)
drop this hunk, see below why

> +
> +typedef struct AcpiGhesState {
> +    uint64_t ghes_addr_le;
> +} AcpiGhesState;
> +
> +/*
> + * Hardware Error Notification
> + * ACPI 4.0: 17.3.2.7 Hardware Error Notification

Add:
  Composes a dummy Hardware Error Notification descriptor of the specified type.

> + */
> +static void acpi_ghes_build_notify(GArray *table, const uint8_t type)

Typically the naming format should be build_WHAT(), so
 build_ghes_hw_error_notification()

And I'd move this out into its own patch.
This applies to other trivial, independent sub-tables
that take all the data needed to construct them from the supplied arguments.

> +{
> +        /* Type */
> +        build_append_int_noprefix(table, type, 1);
> +        /*
> +         * Length:
> +         * Total length of the structure in bytes
> +         */
> +        build_append_int_noprefix(table, 28, 1);
> +        /* Configuration Write Enable */
> +        build_append_int_noprefix(table, 0, 2);
> +        /* Poll Interval */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Vector */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Switch To Polling Threshold Value */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Switch To Polling Threshold Window */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Error Threshold Value */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Error Threshold Window */
> +        build_append_int_noprefix(table, 0, 4);
> +}
> +

/*
  Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fwcfg blobs.
  See docs/specs/acpi_hest_ghes.rst for blobs format
*/
> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker)
build_ghes_error_table()

Also, I'd move this function into its own patch, along with the other
related code that initializes it and wires it into the virt board.

> +{
> +    int i, error_status_block_offset;
> +
> +    /*
> +     * | +--------------------------+
> +     * | |    error_block_address   |
> +     * | |      ..........          |
> +     * | +--------------------------+
> +     * | |    read_ack_register     |
> +     * | |     ...........          |
> +     * | +--------------------------+
> +     * | |  Error Status Data Block |
> +     * | |      ........            |
> +     * | +--------------------------+
> +     */
I'd drop this comment; acpi_hest_ghes.rst should be sufficient, and
if it's not, then fix the spec. For example, it's not obvious from the spec
that "Error Status Data Block" immediately follows 'read_ack_register'.

> +
> +    /* Build error_block_address */
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        build_append_int_noprefix(hardware_errors, 0, ACPI_GHES_ADDRESS_SIZE);
> +    }
> +
> +    /* Build read_ack_register */
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        /*
> +         * Initialize the value of read_ack_register to 1, so GHES can be
> +         * writeable in the first time.
s/in the first time/after (re)boot/

> +         * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> +         * (GHESv2 - Type 10)
> +         */
> +        build_append_int_noprefix(hardware_errors, 1, ACPI_GHES_ADDRESS_SIZE);
> +    }
> +
> +    /* Generic Error Status Block offset in the hardware error fw_cfg blob */
> +    error_status_block_offset = hardware_errors->len;
> +
> +    /* Build Error Status Data Block */

/* reserve space for Error Status Data Block */

> +    build_append_int_noprefix(hardware_errors, 0,
> +        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
This function is for integers only. If you just need to reserve space,
you can use acpi_data_push().

> +
> +    /* Allocate guest memory for the hardware error fw_cfg blob */
/* tell guest firmware to place hardware_errors blob into RAM */

> +    bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +                             hardware_errors, 1, false);
> +
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        /*
> +         * Patch the address of Error Status Data Block into
> +         * the error_block_address of hardware_errors fw_cfg blob
 Tell firmware to patch error_block_address entries to point to
 corresponding "Error Status Data Block"

> +         */
> +        bios_linker_loader_add_pointer(linker,
> +            ACPI_GHES_ERRORS_FW_CFG_FILE, ACPI_GHES_ADDRESS_SIZE * i,
> +            ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +            error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
> +    }
> +
> +    /*
> +     * Write the address of hardware_errors blob into the
> +     * hardware_errors_addr fw_cfg blob.
/*
tell firmware to write hardware_errors GPA into hardware_errors_addr fw_cfg,
once the former has been initialized.
*/

> +     */
> +    bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
> +        0, ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> +}
> +
> +/* Build Hardware Error Source Table */
> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_errors,
> +                          BIOSLinker *linker)
It's not a GHES-specific table, so
  build_hest()

> +{
> +    uint32_t hest_start = table_data->len;
> +    uint32_t source_id = 0;
> +
> +    /* Hardware Error Source Table header*/
> +    acpi_data_push(table_data, sizeof(AcpiTableHeader));
> +
> +    /* Error Source Count */
> +    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> +

This is the place where all error source structures will be enumerated.
I'd move the GHESv2-specific code out into a separate function, so that the code here
would look like this:

    build_ghes_v2(...);
    
    
   
> +    /*
> +     * Type:
> +     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> +     */
> +    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> +    /*
> +     * Source Id

> +     * Once we support more than one hardware error sources, we need to
> +     * increase the value of this field.
I'm not sure ^^^ is correct. According to the spec it's just a unique ID per
distinct error source structure, so we just assign an arbitrary value to each
declared source, and that value never changes once assigned.

For now I'd make source_id an enum with one member
  enum {
    ACPI_HEST_SRC_ID_SEA = 0,
    /* future ids go here */
    ACPI_HEST_SRC_ID_RESERVED,
  };

and use that instead of allocating a magic 0 at the beginning of the function:
 build_ghes_v2(ACPI_HEST_SRC_ID_SEA);
Also add a comment to the declaration saying that already assigned values must not be changed.

> +     */
> +    build_append_int_noprefix(table_data, source_id, 2);
> +    /* Related Source Id */
> +    build_append_int_noprefix(table_data, 0xffff, 2);
> +    /* Flags */
> +    build_append_int_noprefix(table_data, 0, 1);
> +    /* Enabled */
> +    build_append_int_noprefix(table_data, 1, 1);
> +
> +    /* Number of Records To Pre-allocate */
> +    build_append_int_noprefix(table_data, 1, 4);
> +    /* Max Sections Per Record */
> +    build_append_int_noprefix(table_data, 1, 4);
> +    /* Max Raw Data Length */
> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> +
> +    /* Error Status Address */
> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> +                     4 /* QWord access */, 0);
> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
This is fine only while GHESv2 entries are the only entries in HEST; once
other types are added, this macro will silently fall apart and
cause table corruption.

Instead of an offset from hest_start, I suggest using an offset relative
to the GAS structure. Here is an idea:

#define GAS_ADDR_OFFSET 4

    off = table->len
    build_append_gas()
    bios_linker_loader_add_pointer(...,
        off + GAS_ADDR_OFFSET, ...
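
To illustrate the GAS-relative approach, here is a minimal standalone sketch.
It does not use QEMU's API: Blob, blob_append_le() and blob_append_gas() are
hypothetical stand-ins for the GArray-based helpers. The only fact taken from
the ACPI Generic Address Structure layout is that the 8-byte Address field
follows four 1-byte fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the GArray-backed ACPI table blob. */
typedef struct {
    uint8_t data[256];
    size_t len;
} Blob;

/* Offset of the 8-byte Address field inside a 12-byte GAS. */
#define GAS_ADDR_OFFSET 4

/* Append the low 'size' bytes of 'value' in little-endian order. */
void blob_append_le(Blob *b, uint64_t value, size_t size)
{
    assert(size <= sizeof(value));
    assert(b->len + size <= sizeof(b->data));
    for (size_t i = 0; i < size; i++) {
        b->data[b->len++] = (uint8_t)(value >> (8 * i));
    }
}

/*
 * Append a GAS with a zero address and return the blob offset of its
 * Address field. The offset is captured from the current blob length,
 * so it does not depend on what was appended before the GAS.
 */
size_t blob_append_gas(Blob *b, uint8_t space_id, uint8_t bit_width,
                       uint8_t bit_offset, uint8_t access_size)
{
    size_t gas_start = b->len;

    blob_append_le(b, space_id, 1);
    blob_append_le(b, bit_width, 1);
    blob_append_le(b, bit_offset, 1);
    blob_append_le(b, access_size, 1);
    blob_append_le(b, 0, 8); /* Address, to be patched later */

    return gas_start + GAS_ADDR_OFFSET;
}
```

The returned offset is the kind of value that would be handed to
bios_linker_loader_add_pointer() in place of the hest_start-based macro; how
exactly that is wired up in the patch is left to the author.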

> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +        source_id * ACPI_GHES_ADDRESS_SIZE);
> +
> +    /*
> +     * Notification Structure
> +     * Now only enable ARMv8 SEA notification type
> +     */
> +    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
> +
> +    /* Error Status Block Length */
> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> +
> +    /*
> +     * Read Ack Register
> +     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> +     * version 2 (GHESv2 - Type 10)
> +     */
> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> +                     4 /* QWord access */, 0);
> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> +        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, 0),
ditto

> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
> +
> +    /*
> +     * Read Ack Preserve
> +     * We only provide the first bit in Read Ack Register to OSPM to write
> +     * while the other bits are preserved.
> +     */
> +    build_append_int_noprefix(table_data, ~0x1ULL, 8);
> +    /* Read Ack Write */
> +    build_append_int_noprefix(table_data, 0x1, 8);
> +
> +    build_header(linker, table_data, (void *)(table_data->data + hest_start),
> +        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
HEST is not GHES-specific, so s/"GHES"/NULL/
                                                         
> +}
> +
> +static AcpiGhesState ges;
> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> +{
> +
> +    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
> +    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
> +

> +    /* Create a read-only fw_cfg file for GHES */
> +    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> +                    request_block_size);
> +
> +    /* Create a read-write fw_cfg file for Address */
> +    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> +        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> +}
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 2c3702b882..3681ec6e3d 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
>      tables->table_data = g_array_new(false, true /* clear */, 1);
>      tables->tcpalog = g_array_new(false, true /* clear */, 1);
>      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> +    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
>      tables->linker = bios_linker_loader_init();
>  }
>  
> @@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
>      g_array_free(tables->table_data, true);
>      g_array_free(tables->tcpalog, mfre);
>      g_array_free(tables->vmgenid, mfre);
> +    g_array_free(tables->hardware_errors, mfre);
>  }
>  
>  /*
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 4cd50175e0..1b1fd273e4 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -48,6 +48,7 @@
>  #include "sysemu/reset.h"
>  #include "kvm_arm.h"
>  #include "migration/vmstate.h"
> +#include "hw/acpi/acpi_ghes.h"
>  
>  #define ARM_SPI_BASE 32
>  
> @@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>      acpi_add_table(table_offsets, tables_blob);
>      build_spcr(tables_blob, tables->linker, vms);
>  
> +    if (vms->ras) {
> +        acpi_add_table(table_offsets, tables_blob);
> +        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
> +        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
> +                             tables->linker);
> +    }
> +
>      if (ms->numa_state->num_nodes > 0) {
>          acpi_add_table(table_offsets, tables_blob);
>          build_srat(tables_blob, tables->linker, vms);
> @@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
>      fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
>                      acpi_data_len(tables.tcpalog));
>  
> +    if (vms->ras) {
> +        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
> +    }
> +
>      build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
>                                               build_state, tables.rsdp,
>                                               ACPI_BUILD_RSDP_FILE, 0);
> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> new file mode 100644
> index 0000000000..cb62ec9c7b
> --- /dev/null
> +++ b/include/hw/acpi/acpi_ghes.h
> @@ -0,0 +1,56 @@
> +/*
> + * Support for generating APEI tables and recording CPER for Guests
> + *
> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef ACPI_GHES_H
> +#define ACPI_GHES_H
> +
> +#include "hw/acpi/bios-linker-loader.h"
> +
> +/*
> + * Values for Hardware Error Notification Type field
> + */
> +enum AcpiGhesNotifyType {
> +    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
> +    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
> +    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
> +    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
> +    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
> +    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
> +    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
> +    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
> +    ACPI_GHES_NOTIFY_GPIO = 7,
> +    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_SEA = 8,
> +    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_SEI = 9,
> +    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_GSIV = 10,
> +    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
> +    ACPI_GHES_NOTIFY_SDEI = 11,
> +    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
> +};
Maybe put every comment on its own line; otherwise the mix of styles above looks ugly.
 
> +
> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> +                          BIOSLinker *linker);
> +
> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +#endif
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index de4a406568..8f13620701 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -220,6 +220,7 @@ struct AcpiBuildTables {
>      GArray *rsdp;
>      GArray *tcpalog;
>      GArray *vmgenid;
> +    GArray *hardware_errors;
>      BIOSLinker *linker;
>  } AcpiBuildTables;
>  



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 2/6] docs: APEI GHES generation and CPER record description
  2019-11-11  1:40   ` Xiang Zheng
@ 2019-11-15  9:44     ` Igor Mammedov
  -1 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-15  9:44 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: pbonzini, mst, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

On Mon, 11 Nov 2019 09:40:44 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> From: Dongjiu Geng <gengdongjiu@huawei.com>
> 
> Add APEI/GHES detailed design document
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  docs/specs/acpi_hest_ghes.rst | 95 +++++++++++++++++++++++++++++++++++
>  docs/specs/index.rst          |  1 +
>  2 files changed, 96 insertions(+)
>  create mode 100644 docs/specs/acpi_hest_ghes.rst
> 
> diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
> new file mode 100644
> index 0000000000..348825f9d3
> --- /dev/null
> +++ b/docs/specs/acpi_hest_ghes.rst
> @@ -0,0 +1,95 @@
> +APEI tables generating and CPER record
> +======================================
> +
> +..
> +   Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> +
> +   This work is licensed under the terms of the GNU GPL, version 2 or later.
> +   See the COPYING file in the top-level directory.
> +
> +Design Details
> +--------------
> +
> +::
> +
> +         etc/acpi/tables                                 etc/hardware_errors
> +      ====================                      ==========================================
> +  + +--------------------------+            +-----------------------+
> +  | | HEST                     |            |    address            |            +--------------+
> +  | +--------------------------+            |    registers          |            | Error Status |
> +  | | GHES1                    |            | +---------------------+            | Data Block 1 |
> +  | +--------------------------+ +--------->| |error_block_address1 |----------->| +------------+
> +  | | .................        | |          | +---------------------+            | |  CPER      |
> +  | | error_status_address-----+-+ +------->| |error_block_address2 |--------+   | |  CPER      |
> +  | | .................        |   |        | +---------------------+        |   | |  ....      |
> +  | | read_ack_register--------+-+ |        | |    ..............   |        |   | |  CPER      |
> +  | | read_ack_preserve        | | |        +-----------------------+        |   | +------------+
> +  | | read_ack_write           | | | +----->| |error_block_addressN |------+ |   | Error Status |
> +  + +--------------------------+ | | |      | +---------------------+      | |   | Data Block 2 |
> +  | | GHES2                    | +-+-+----->| |read_ack_register1   |      | +-->| +------------+
> +  + +--------------------------+   | |      | +---------------------+      |     | |  CPER      |
> +  | | .................        |   | | +--->| |read_ack_register2   |      |     | |  CPER      |
> +  | | error_status_address-----+---+ | |    | +---------------------+      |     | |  ....      |
> +  | | .................        |     | |    | |  .............      |      |     | |  CPER      |
> +  | | read_ack_register--------+-----+-+    | +---------------------+      |     +-+------------+
> +  | | read_ack_preserve        |     |   +->| |read_ack_registerN   |      |     | |..........  |
> +  | | read_ack_write           |     |   |  | +---------------------+      |     | +------------+
> +  + +--------------------------|     |   |                                 |     | Error Status |
> +  | | ...............          |     |   |                                 |     | Data Block N |
> +  + +--------------------------+     |   |                                 +---->| +------------+
> +  | | GHESN                    |     |   |                                       | |  CPER      |
> +  + +--------------------------+     |   |                                       | |  CPER      |
> +  | | .................        |     |   |                                       | |  ....      |
> +  | | error_status_address-----+-----+   |                                       | |  CPER      |
> +  | | .................        |         |                                       +-+------------+
> +  | | read_ack_register--------+---------+
> +  | | read_ack_preserve        |
> +  | | read_ack_write           |
> +  + +--------------------------+

I'd merge "Error Status Data Block" with "address registers", so it would be
clear that "Error Status Data Block" is located after "read_ack_registerN"

> +
> +(1) QEMU generates the ACPI HEST table. This table goes in the current
> +    "etc/acpi/tables" fw_cfg blob. Each error source has different
> +    notification types.
> +
> +(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
> +    also needs to populate this blob. The "etc/hardware_errors" fw_cfg blob
> +    contains an address registers table and an Error Status Data Block table.
> +
> +(3) The address registers table contains N Error Block Address entries
> +    and N Read Ack Register entries. The size for each entry is 8-byte.
> +    The Error Status Data Block table contains N Error Status Data Block
> +    entries. The size for each entry is 4096(0x1000) bytes. The total size
> +    for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
> +    N is the number of the kinds of hardware error sources.
> +
> +(4) QEMU generates the ACPI linker/loader script for the firmware. The
> +    firmware pre-allocates memory for "etc/acpi/tables", "etc/hardware_errors"
> +    and copies blob contents there.
> +
> +(5) QEMU generates N ADD_POINTER commands, which patch addresses in the
> +    "error_status_address" fields of the HEST table with a pointer to the
> +    corresponding "address registers" in the "etc/hardware_errors" blob.
> +
> +(6) QEMU generates N ADD_POINTER commands, which patch addresses in the
> +    "read_ack_register" fields of the HEST table with a pointer to the
> +    corresponding "address registers" in the "etc/hardware_errors" blob.

s/"address registers" in/"read_ack_register" within/

> +
> +(7) QEMU generates N ADD_POINTER commands for the firmware, which patch
> +    addresses in the "error_block_address" fields with a pointer to the
> +    respective "Error Status Data Block" in the "etc/hardware_errors" blob.
> +
> +(8) QEMU defines a third and write-only fw_cfg blob which is called
> +    "etc/hardware_errors_addr". Through that blob, the firmware can send back
> +    the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr"
> +    blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER command
> +    for the firmware. The firmware will write back the start address of
> +    "etc/hardware_errors" blob to the fw_cfg file "etc/hardware_errors_addr".
> +

> +(9) When QEMU gets a SIGBUS from the kernel, QEMU formats the CPER right into
> +    guest memory, 

s/
QEMU formats the CPER right into guest memory
/
QEMU writes CPER into corresponding "Error Status Data Block"
/

> and then injects platform specific interrupt (in case of
> +    arm/virt machine it's Synchronous External Abort) as a notification which
> +    is necessary for notifying the guest.


> +
> +(10) This notification (in virtual hardware) will be handled by the guest
> +     kernel, guest APEI driver will read the CPER which is recorded by QEMU and
> +     do the recovery.
Maybe it would be better to say:
"
On receiving the notification, the guest APEI driver could read the CPER error
and take appropriate action.
"


Also, the HEST patches contain an implicit ABI, which should probably be documented here.
More specifically, the kvm_arch_on_sigbus_vcpu() error injection
uses source_id as an index into "etc/hardware_errors" to find the "Error Status Data Block"
entry corresponding to the error source. So the supported source_id values should be assigned
here and not be changed afterwards, to make sure that the error is written into the
expected "Error Status Data Block" even if the guest was migrated to a newer QEMU.
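
As a rough illustration of that implicit ABI, the sketch below computes where
each per-source entry lives inside "etc/hardware_errors", assuming the layout
described in step (3) of the document: N error_block_address entries, then
N read_ack_register entries, then N data blocks of 0x1000 bytes. The function
names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_ENTRY_SIZE  sizeof(uint64_t)  /* one 8-byte address register */
#define DATA_BLOCK_SIZE  ((size_t)0x1000)  /* one Error Status Data Block */

/* Offset of error_block_address[source_id] within the blob. */
size_t error_block_address_offset(size_t n_sources, size_t source_id)
{
    assert(source_id < n_sources);
    return source_id * ADDR_ENTRY_SIZE;
}

/* Offset of read_ack_register[source_id]; it follows all address entries. */
size_t read_ack_register_offset(size_t n_sources, size_t source_id)
{
    assert(source_id < n_sources);
    return (n_sources + source_id) * ADDR_ENTRY_SIZE;
}

/* Offset of the data block for source_id; blocks follow both register arrays. */
size_t error_status_data_block_offset(size_t n_sources, size_t source_id)
{
    assert(source_id < n_sources);
    return 2 * n_sources * ADDR_ENTRY_SIZE + source_id * DATA_BLOCK_SIZE;
}

/* Total blob size: N * 8 * 2 + N * 4096 bytes. */
size_t hardware_errors_blob_size(size_t n_sources)
{
    return n_sources * (2 * ADDR_ENTRY_SIZE + DATA_BLOCK_SIZE);
}
```

Every offset depends on source_id, which is why renumbering an already
assigned source would make the injection code address a different entry
after migration.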


> diff --git a/docs/specs/index.rst b/docs/specs/index.rst
> index 984ba44029..3019b9c976 100644
> --- a/docs/specs/index.rst
> +++ b/docs/specs/index.rst
> @@ -13,3 +13,4 @@ Contents:
>     ppc-xive
>     ppc-spapr-xive
>     acpi_hw_reduced_hotplug
> +   acpi_hest_ghes


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 2/6] docs: APEI GHES generation and CPER record description
@ 2019-11-15  9:44     ` Igor Mammedov
  0 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-15  9:44 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: peter.maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl, qemu-arm,
	james.morse, xuwei5, jonathan.cameron, pbonzini, lersek, rth

On Mon, 11 Nov 2019 09:40:44 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> From: Dongjiu Geng <gengdongjiu@huawei.com>
> 
> Add APEI/GHES detailed design document
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  docs/specs/acpi_hest_ghes.rst | 95 +++++++++++++++++++++++++++++++++++
>  docs/specs/index.rst          |  1 +
>  2 files changed, 96 insertions(+)
>  create mode 100644 docs/specs/acpi_hest_ghes.rst
> 
> diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
> new file mode 100644
> index 0000000000..348825f9d3
> --- /dev/null
> +++ b/docs/specs/acpi_hest_ghes.rst
> @@ -0,0 +1,95 @@
> +APEI tables generating and CPER record
> +======================================
> +
> +..
> +   Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> +
> +   This work is licensed under the terms of the GNU GPL, version 2 or later.
> +   See the COPYING file in the top-level directory.
> +
> +Design Details
> +--------------
> +
> +::
> +
> +         etc/acpi/tables                                 etc/hardware_errors
> +      ====================                      ==========================================
> +  + +--------------------------+            +-----------------------+
> +  | | HEST                     |            |    address            |            +--------------+
> +  | +--------------------------+            |    registers          |            | Error Status |
> +  | | GHES1                    |            | +---------------------+            | Data Block 1 |
> +  | +--------------------------+ +--------->| |error_block_address1 |----------->| +------------+
> +  | | .................        | |          | +---------------------+            | |  CPER      |
> +  | | error_status_address-----+-+ +------->| |error_block_address2 |--------+   | |  CPER      |
> +  | | .................        |   |        | +---------------------+        |   | |  ....      |
> +  | | read_ack_register--------+-+ |        | |    ..............   |        |   | |  CPER      |
> +  | | read_ack_preserve        | | |        +-----------------------+        |   | +------------+
> +  | | read_ack_write           | | | +----->| |error_block_addressN |------+ |   | Error Status |
> +  + +--------------------------+ | | |      | +---------------------+      | |   | Data Block 2 |
> +  | | GHES2                    | +-+-+----->| |read_ack_register1   |      | +-->| +------------+
> +  + +--------------------------+   | |      | +---------------------+      |     | |  CPER      |
> +  | | .................        |   | | +--->| |read_ack_register2   |      |     | |  CPER      |
> +  | | error_status_address-----+---+ | |    | +---------------------+      |     | |  ....      |
> +  | | .................        |     | |    | |  .............      |      |     | |  CPER      |
> +  | | read_ack_register--------+-----+-+    | +---------------------+      |     +-+------------+
> +  | | read_ack_preserve        |     |   +->| |read_ack_registerN   |      |     | |..........  |
> +  | | read_ack_write           |     |   |  | +---------------------+      |     | +------------+
> +  + +--------------------------|     |   |                                 |     | Error Status |
> +  | | ...............          |     |   |                                 |     | Data Block N |
> +  + +--------------------------+     |   |                                 +---->| +------------+
> +  | | GHESN                    |     |   |                                       | |  CPER      |
> +  + +--------------------------+     |   |                                       | |  CPER      |
> +  | | .................        |     |   |                                       | |  ....      |
> +  | | error_status_address-----+-----+   |                                       | |  CPER      |
> +  | | .................        |         |                                       +-+------------+
> +  | | read_ack_register--------+---------+
> +  | | read_ack_preserve        |
> +  | | read_ack_write           |
> +  + +--------------------------+

I'd merge "Error Status Data Block" with "address registers", so it would be
clear that "Error Status Data Block" is located after "read_ack_registerN"

> +
> +(1) QEMU generates the ACPI HEST table. This table goes in the current
> +    "etc/acpi/tables" fw_cfg blob. Each error source has different
> +    notification types.
> +
> +(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
> +    also needs to populate this blob. The "etc/hardware_errors" fw_cfg blob
> +    contains an address registers table and an Error Status Data Block table.
> +
> +(3) The address registers table contains N Error Block Address entries
> +    and N Read Ack Register entries. The size for each entry is 8-byte.
> +    The Error Status Data Block table contains N Error Status Data Block
> +    entries. The size for each entry is 4096(0x1000) bytes. The total size
> +    for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
> +    N is the number of the kinds of hardware error sources.
> +
> +(4) QEMU generates the ACPI linker/loader script for the firmware. The
> +    firmware pre-allocates memory for "etc/acpi/tables", "etc/hardware_errors"
> +    and copies blob contents there.
> +
> +(5) QEMU generates N ADD_POINTER commands, which patch addresses in the
> +    "error_status_address" fields of the HEST table with a pointer to the
> +    corresponding "address registers" in the "etc/hardware_errors" blob.
> +
> +(6) QEMU generates N ADD_POINTER commands, which patch addresses in the
> +    "read_ack_register" fields of the HEST table with a pointer to the
> +    corresponding "address registers" in the "etc/hardware_errors" blob.

s/"address registers" in/"read_ack_register" within/

> +
> +(7) QEMU generates N ADD_POINTER commands for the firmware, which patch
> +    addresses in the "error_block_address" fields with a pointer to the
> +    respective "Error Status Data Block" in the "etc/hardware_errors" blob.
> +
> +(8) QEMU defines a third and write-only fw_cfg blob which is called
> +    "etc/hardware_errors_addr". Through that blob, the firmware can send back
> +    the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr"
> +    blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER command
> +    for the firmware. The firmware will write back the start address of
> +    "etc/hardware_errors" blob to the fw_cfg file "etc/hardware_errors_addr".
> +

> +(9) When QEMU gets a SIGBUS from the kernel, QEMU formats the CPER right into
> +    guest memory, 

s/
QEMU formats the CPER right into guest memory
/
QEMU writes CPER into corresponding "Error Status Data Block"
/

> and then injects platform specific interrupt (in case of
> +    arm/virt machine it's Synchronous External Abort) as a notification which
> +    is necessary for notifying the guest.


> +
> +(10) This notification (in virtual hardware) will be handled by the guest
> +     kernel, guest APEI driver will read the CPER which is recorded by QEMU and
> +     do the recovery.
Maybe better would be to say:
"
On receiving the notification, the guest APEI driver could read the CPER error
and take appropriate action.
"


Also, the HEST patches introduce an implicit ABI, which should probably be documented here.
More specifically, kvm_arch_on_sigbus_vcpu() error injection
uses source_id as an index into "etc/hardware_errors" to find the "Error Status Data Block"
entry corresponding to the error source. So the supported source_id values should be assigned
here and not changed afterwards, to make sure that the error is written into the
expected "Error Status Data Block" even if the guest was migrated to a newer QEMU.
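
To make that ABI concrete, here is a rough, untested sketch of how offsets
inside "etc/hardware_errors" follow from source_id, assuming the layout from
step (3) of the document (N address entries, then N read ack registers, then
N 4096-byte blocks). The macro and function names are made up for
illustration and are not from the patch:

```c
#include <stdint.h>

/* Hypothetical constants mirroring the layout described in step (3). */
#define GHES_ADDRESS_SIZE     8u      /* bytes per address/ack entry */
#define GHES_MAX_BLOCK_SIZE   0x1000u /* bytes per Error Status Data Block */

/* Offset of error_block_address[source_id] within the blob. */
static uint64_t ghes_error_block_addr_offset(uint32_t source_id)
{
    return (uint64_t)source_id * GHES_ADDRESS_SIZE;
}

/* Offset of read_ack_register[source_id]; it follows the N address entries. */
static uint64_t ghes_read_ack_offset(uint32_t n_sources, uint32_t source_id)
{
    return (uint64_t)(n_sources + source_id) * GHES_ADDRESS_SIZE;
}

/* Offset of Error Status Data Block source_id; blocks follow both tables. */
static uint64_t ghes_data_block_offset(uint32_t n_sources, uint32_t source_id)
{
    return (uint64_t)n_sources * GHES_ADDRESS_SIZE * 2
           + (uint64_t)source_id * GHES_MAX_BLOCK_SIZE;
}

/* Total blob size: N * 8 * 2 + N * 4096. */
static uint64_t ghes_blob_size(uint32_t n_sources)
{
    return ghes_data_block_offset(n_sources, n_sources);
}
```

With a single source (N = 1) the blob is 16 + 4096 = 4112 bytes. Whatever
source_id a given notification type gets, it has to keep selecting the same
slot in every later QEMU version.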


> diff --git a/docs/specs/index.rst b/docs/specs/index.rst
> index 984ba44029..3019b9c976 100644
> --- a/docs/specs/index.rst
> +++ b/docs/specs/index.rst
> @@ -13,3 +13,4 @@ Contents:
>     ppc-xive
>     ppc-spapr-xive
>     acpi_hw_reduced_hotplug
> +   acpi_hest_ghes




* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-11  1:40   ` Xiang Zheng
@ 2019-11-15 16:37     ` Igor Mammedov
  -1 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-15 16:37 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: pbonzini, mst, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

On Mon, 11 Nov 2019 09:40:47 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> From: Dongjiu Geng <gengdongjiu@huawei.com>
> 
> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> translates the host VA delivered by host to guest PA, then fills this PA
> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> type.
> 
> When guest accesses the poisoned memory, it will generate a Synchronous
> External Abort(SEA). Then host kernel gets an APEI notification and calls
> memory_failure() to unmapped the affected page in stage 2, finally
> returns to guest.
> 
> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> Qemu, Qemu records this error address into guest APEI GHES memory and
> notifes guest using Synchronous-External-Abort(SEA).
> 
> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> in which we can setup the type of exception and the syndrome information.
> When switching to guest, the target vcpu will jump to the synchronous
> external abort vector table entry.
> 
> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> not valid and hold an UNKNOWN value. These values will be set to KVM
> register structures through KVM_SET_ONE_REG IOCTL.
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>  include/hw/acpi/acpi_ghes.h |   4 +
>  include/sysemu/kvm.h        |   3 +-
>  target/arm/cpu.h            |   4 +
>  target/arm/helper.c         |   2 +-
>  target/arm/internals.h      |   5 +-
>  target/arm/kvm64.c          |  64 ++++++++
>  target/arm/tlb_helper.c     |   2 +-
>  target/i386/cpu.h           |   2 +
>  9 files changed, 377 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> index 42c00ff3d3..f5b54990c0 100644
> --- a/hw/acpi/acpi_ghes.c
> +++ b/hw/acpi/acpi_ghes.c
> @@ -39,6 +39,34 @@
>  /* The max size in bytes for one error block */
>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>  
> +/*
> + * The total size of Generic Error Data Entry
> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> + * Table 18-343 Generic Error Data Entry
> + */
> +#define ACPI_GHES_DATA_LENGTH               72
> +
> +/*
> + * The memory section CPER size,
> + * UEFI 2.6: N.2.5 Memory Error Section
> + */
maybe use one line comment

> +#define ACPI_GHES_MEM_CPER_LENGTH           80
> +
> +/*
> + * Masks for block_status flags
> + */
ditto

> +#define ACPI_GEBS_UNCORRECTABLE         1
> +
> +/*
> + * Values for error_severity field
> + */
ditto

> +enum AcpiGenericErrorSeverity {
> +    ACPI_CPER_SEV_RECOVERABLE,
> +    ACPI_CPER_SEV_FATAL,
> +    ACPI_CPER_SEV_CORRECTED,
> +    ACPI_CPER_SEV_NONE,
I'd assign values explicitly here
  foo = x,
  ...

> +};
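
For the severity enum above, explicit values could look like the sketch
below. The numbers are the ones the enum already gets implicitly (0..3), so
nothing changes for the guest; it only makes the values that end up in guest
memory visible in the source:

```c
/* Values for the error_severity field, spelled out explicitly. */
enum AcpiGenericErrorSeverity {
    ACPI_CPER_SEV_RECOVERABLE = 0,
    ACPI_CPER_SEV_FATAL       = 1,
    ACPI_CPER_SEV_CORRECTED   = 2,
    ACPI_CPER_SEV_NONE        = 3,
};
```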
> +
>  /*
>   * Now only support ARMv8 SEA notification type error source
>   */
> @@ -49,6 +77,16 @@
>   */
>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>  
> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> +
> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> +    0xED, 0x7C, 0x83, 0xB1)
> +
>  /*
>   * | +--------------------------+ 0
>   * | |        Header            |
> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>      uint64_t ghes_addr_le;
>  } AcpiGhesState;
>  
> +/*
> + * Total size for Generic Error Status Block
> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> + * Table 18-380 Generic Error Status Block
> + */
> +#define ACPI_GHES_GESB_SIZE                 20

> +/* The offset of Data Length in Generic Error Status Block */
> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12

unused, drop it

> +
> +/*
> + * Record the value of data length for each error status block to avoid getting
> + * this value from guest.
> + */
> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> +
> +/*
> + * Generic Error Data Entry
> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + */
> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> +                uint32_t error_severity, uint16_t revision,
> +                uint8_t validation_bits, uint8_t flags,
> +                uint32_t error_data_length, QemuUUID fru_id,
> +                uint8_t *fru_text, uint64_t time_stamp)
> +{
> +    QemuUUID uuid_le;
> +
> +    /* Section Type */
> +    uuid_le = qemu_uuid_bswap(section_type);
> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> +
> +    /* Error Severity */
> +    build_append_int_noprefix(table, error_severity, 4);
> +    /* Revision */
> +    build_append_int_noprefix(table, revision, 2);
> +    /* Validation Bits */
> +    build_append_int_noprefix(table, validation_bits, 1);
> +    /* Flags */
> +    build_append_int_noprefix(table, flags, 1);
> +    /* Error Data Length */
> +    build_append_int_noprefix(table, error_data_length, 4);
> +
> +    /* FRU Id */
> +    uuid_le = qemu_uuid_bswap(fru_id);
> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> +
> +    /* FRU Text */
> +    g_array_append_vals(table, fru_text, 20);
What if fru_text is shorter than 20 bytes?

I'd suggest either passing the length along, or
dropping all FRU handling in the caller and just hardcoding an invalid FRU with empty text here;
the function could be extended later, once there is something meaningful to put in the FRU.
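
If you go for the "pass length along" variant, a minimal sketch (untested,
standalone, with hypothetical names that are not part of the patch) might be:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GHES_FRU_TEXT_LEN 20 /* fixed width of the FRU Text field */

/*
 * Fill the fixed 20-byte FRU Text field from a caller buffer of fru_len
 * bytes: longer input is truncated, shorter input is zero-padded, and a
 * NULL pointer yields an all-zero (empty) field.
 */
static void ghes_fill_fru_text(uint8_t out[GHES_FRU_TEXT_LEN],
                               const uint8_t *fru_text, size_t fru_len)
{
    size_t n = fru_len < GHES_FRU_TEXT_LEN ? fru_len : GHES_FRU_TEXT_LEN;

    memset(out, 0, GHES_FRU_TEXT_LEN);
    if (fru_text != NULL && n > 0) {
        memcpy(out, fru_text, n);
    }
}
```

The padded buffer can then be appended as a whole, so the helper never reads
past the end of what the caller actually provided.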


> +    /* Timestamp */
> +    build_append_int_noprefix(table, time_stamp, 8);
> +}
> +
> +/*
> + * Generic Error Status Block
> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + */
> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> +                uint32_t raw_data_offset, uint32_t raw_data_length,
> +                uint32_t data_length, uint32_t error_severity)
> +{
> +    /* Block Status */
> +    build_append_int_noprefix(table, block_status, 4);
> +    /* Raw Data Offset */
> +    build_append_int_noprefix(table, raw_data_offset, 4);
> +    /* Raw Data Length */
> +    build_append_int_noprefix(table, raw_data_length, 4);
> +    /* Data Length */
> +    build_append_int_noprefix(table, data_length, 4);
> +    /* Error Severity */
> +    build_append_int_noprefix(table, error_severity, 4);
> +}
> +
> +/* UEFI 2.6: N.2.5 Memory Error Section */
> +static void acpi_ghes_build_append_mem_cper(GArray *table,
> +                                            uint64_t error_physical_addr)
I'd split out this and acpi_ghes_generic_error_status() and
acpi_ghes_generic_error_data()  functions into a separate patch.

> +{
> +    /*
> +     * Memory Error Record
> +     */
> +
> +    /* Validation Bits */
> +    build_append_int_noprefix(table,

> +                              (1UL << 14) | /* Type Valid */
> +                              (1UL << 1) /* Physical Address Valid */,
shouldn't it use ULL suffixes?

> +                              8);
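
For the validation bits above, something like the following would keep the
constant 64 bits wide on every host, matching the 8-byte field. The macro and
function names are hypothetical; only the bit positions come from the patch.
Both bits fit in 32 bits, so this is about consistency rather than a visible
bug:

```c
#include <stdint.h>

/* Hypothetical names for the two validation bits set by the patch. */
#define MEM_CPER_VALID_PHYS_ADDR   (1ULL << 1)   /* Physical Address Valid */
#define MEM_CPER_VALID_ERROR_TYPE  (1ULL << 14)  /* Memory Error Type Valid */

/* Value for the 8-byte Validation Bits field of the memory error section. */
static uint64_t mem_cper_validation_bits(void)
{
    return MEM_CPER_VALID_ERROR_TYPE | MEM_CPER_VALID_PHYS_ADDR;
}
```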
> +    /* Error Status */
> +    build_append_int_noprefix(table, 0, 8);
> +    /* Physical Address */
> +    build_append_int_noprefix(table, error_physical_addr, 8);
> +    /* Skip all the detailed information normally found in such a record */
> +    build_append_int_noprefix(table, 0, 48);
> +    /* Memory Error Type */
> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> +    /* Skip all the detailed information normally found in such a record */
> +    build_append_int_noprefix(table, 0, 7);
> +}
> +
> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> +                                      uint64_t error_physical_addr,
> +                                      uint32_t data_length)
> +{
> +    GArray *block;
> +    uint64_t current_block_length;
> +    /* Memory Error Section Type */
> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
                               ^^
UEFI_CPER_SEC_PLATFORM_MEM is defined as BE, so _le here is wrong
and then later you use qemu_uuid_bswap() to make it LE.

Why not define it as LE to begin with, like it's been done for NVDIMM_UUID_LE?
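
A rough sketch of what I mean, with a local hypothetical macro standing in
for whatever LE helper you end up using (untested, names are mine):

```c
#include <stdint.h>

/*
 * Hypothetical local macro: lay out a UUID with the first three fields
 * little-endian and the remaining eight bytes as written, so no byte
 * swapping is needed at runtime.
 */
#define GHES_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
    { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff,             \
      ((a) >> 24) & 0xff,                                            \
      (b) & 0xff, ((b) >> 8) & 0xff,                                 \
      (c) & 0xff, ((c) >> 8) & 0xff,                                 \
      (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }

/* Platform Memory Error section type, same values as in the patch. */
static const uint8_t uefi_cper_sec_platform_mem_le[16] =
    GHES_UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE,
                 0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1);
```

With the constant already in LE order, the bytes can be appended as is and
the qemu_uuid_bswap() call for the section type goes away.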


> +    QemuUUID fru_id = {};
> +    uint8_t fru_text[20] = {};
> +
> +    /*
> +     * Generic Error Status Block
> +     * | +---------------------+
> +     * | |     block_status    |
> +     * | +---------------------+
> +     * | |    raw_data_offset  |
> +     * | +---------------------+
> +     * | |    raw_data_length  |
> +     * | +---------------------+
> +     * | |     data_length     |
> +     * | +---------------------+
> +     * | |   error_severity    |
> +     * | +---------------------+
> +     */
not necessary, just point to concrete part of ACPI spec if needed.

> +    block = g_array_new(false, true /* clear */, 1);
> +
> +    /* The current whole length of the generic error status block */
> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> +
> +    /* This is the length if adding a new generic error data entry*/
> +    data_length += ACPI_GHES_DATA_LENGTH;
> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> +
> +    /*
> +     * Check whether it will run out of the preallocated memory if adding a new
> +     * generic error data entry
> +     */
> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> +        error_report("Record CPER out of boundary!!!");
> +        return ACPI_GHES_CPER_FAIL;
> +    }
> +
> +    /* Build the new generic error status block header */
> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
Down the road, the arguments are passed to build_append_int_noprefix(), which takes
numbers in host byte order, so manually calling cpu_to_le32() is wrong.
Just drop cpu_to_le32() here.
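
To illustrate: the helper emits the value byte by byte, least significant
byte first, so the result is little-endian on any host. The function below is
a standalone stand-in written for this mail (hypothetical name, plain buffer
instead of GArray), not the QEMU function itself:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for build_append_int_noprefix(): emit the low `size` bytes of a
 * host-order value, least significant byte first. Returns the new length.
 */
static size_t append_int_le(uint8_t *buf, size_t len, uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[len++] = value & 0xff;
        value >>= 8;
    }
    return len;
}
```

Passing a value that was already swapped with cpu_to_le32() would be a no-op
on a little-endian host, but on a big-endian host the bytes would come out
reversed.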


> +
> +    /* Write back above generic error status block header to guest memory */
> +    cpu_physical_memory_write(error_block_address, block->data,
> +                              block->len);
> +
> +    /* Add a new generic error data entry */
> +
> +    data_length = block->len;
> +    /* Build this new generic error data entry header */
> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
ditto

> +
> +    /* Build the memory section CPER for above new generic error data entry */
> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> +
> +    /* Write back above this new generic error data entry to guest memory */
> +    cpu_physical_memory_write(error_block_address + current_block_length,
> +        block->data + data_length, block->len - data_length);

If I read it right, the first write builds an updated "Error Status Block"
header whose "Data Length" accounts for an additional
"Error Data Entry", and then this second write appends the new "Error Data Entry"
after the previous one (if any existed).

Now for GHESv2, OSPM is supposed to copy the existing "Error Status Block" and ack
that fact via the "Read Ack Register", and QEMU must not overwrite old data until
it has been acked by OSPM.

With that in mind, appending a new error seems pointless, since the guest has
already consumed any pre-existing error before we are able to write.
So we can drop "Error Status Block" tracking and just
 1. compose the whole "Error Status Block" with 1 new "Error Data Entry"
 2. check that it fits into the [start, start + ACPI_GHES_MAX_RAW_DATA_LENGTH) range
 3. push it into guest RAM with a single write

and drop all data_length tracking related code.
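
The size arithmetic for that single-entry block is fixed, roughly as in the
sketch below. The sizes are the ones defined in this patch; the macro and
function names are shortened, hypothetical ones:

```c
#include <stdbool.h>
#include <stdint.h>

#define GESB_SIZE          20u     /* Generic Error Status Block header */
#define GED_ENTRY_SIZE     72u     /* Generic Error Data Entry header */
#define MEM_CPER_SIZE      80u     /* Memory Error Section */
#define MAX_BLOCK_SIZE     0x1000u /* space reserved per error source */

/* Data Length field: one data entry plus its memory error section. */
static uint32_t ghes_single_entry_data_length(void)
{
    return GED_ENTRY_SIZE + MEM_CPER_SIZE;
}

/* Size of the whole block as it would be written to the guest in one go. */
static uint32_t ghes_single_entry_block_size(void)
{
    return GESB_SIZE + ghes_single_entry_data_length();
}

/* The bounds check from step 2, which no longer depends on any history. */
static bool ghes_single_entry_block_fits(void)
{
    return ghes_single_entry_block_size() <= MAX_BLOCK_SIZE;
}
```

That is 172 bytes per error, well inside the 4096 bytes reserved per source,
so there is no per-source state left to track or migrate.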

> +
> +    g_array_free(block, true);
> +
> +    return ACPI_GHES_CPER_OK;
> +}
> +
>  /*
>   * Hardware Error Notification
>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>  }
> +
> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> +{
> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> +    int loop = 0;
> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
                                         ^^^^^^^^^^^^^^^
Forgot to mention in patch [3/6],

Migration is definitely broken here, since ges.ghes_addr_le is
not migrated to the target QEMU. For an example of how it should be done, see:
  vmgenid_addr_le and vmstate_vmgenid

For that you'd need to make ghes_addr_le a part of some device
(the recently added hw/acpi/generic_event_device.c looks like a suitable victim).


> +    bool ret = ACPI_GHES_CPER_FAIL;
> +    uint8_t source_id;

> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
Put the map at the beginning of this file:

s/const/static const/
s/error_source_id/ghes_notify2source_id_map/
 = { ...,
     ACPI_HEST_SRC_ID_SEA,
     ...,
     ACPI_HEST_SRC_ID_RESERVED
   }
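
Spelled out, that could look roughly like this (untested sketch; the
ACPI_HEST_SRC_ID_* and GHES_NOTIFY_* names are suggestions, and the table
contents are the same as in your error_source_id[] array):

```c
#include <stdint.h>

/* Hypothetical names; only the SEA notification type has a source so far. */
enum {
    ACPI_HEST_SRC_ID_SEA = 0,
    ACPI_HEST_SRC_ID_RESERVED = 0xff, /* no error source for this type */
};

enum {
    GHES_NOTIFY_SEA = 8,        /* index used for SEA in the patch */
    GHES_NOTIFY_RESERVED = 12,  /* number of entries in the patch's table */
};

/* Every slot is listed, since an omitted one would silently become 0 (SEA). */
static const uint8_t ghes_notify2source_id_map[GHES_NOTIFY_RESERVED] = {
    ACPI_HEST_SRC_ID_RESERVED, ACPI_HEST_SRC_ID_RESERVED,
    ACPI_HEST_SRC_ID_RESERVED, ACPI_HEST_SRC_ID_RESERVED,
    ACPI_HEST_SRC_ID_RESERVED, ACPI_HEST_SRC_ID_RESERVED,
    ACPI_HEST_SRC_ID_RESERVED, ACPI_HEST_SRC_ID_RESERVED,
    ACPI_HEST_SRC_ID_SEA,      ACPI_HEST_SRC_ID_RESERVED,
    ACPI_HEST_SRC_ID_RESERVED, ACPI_HEST_SRC_ID_RESERVED,
};

/* Map a notification type to its source id, or RESERVED if there is none. */
static uint8_t ghes_notify_to_source_id(uint32_t notify)
{
    if (notify >= GHES_NOTIFY_RESERVED) {
        return ACPI_HEST_SRC_ID_RESERVED;
    }
    return ghes_notify2source_id_map[notify];
}
```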


> +
> +    /*
> +     * | +---------------------+ ges.ghes_addr_le
> +     * | |error_block_address0 |
> +     * | +---------------------+ --+--
> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> +     * | +---------------------+ --+--
> +     * | |error_block_addressN |
> +     * | +---------------------+
> +     * | | read_ack_register0  |
> +     * | +---------------------+ --+--
> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> +     * | +---------------------+ --+--
> +     * | | read_ack_registerN  |

above part is not necessary

> +     * | +---------------------+ --+--
> +     * | |      CPER           |   |
> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> +     * | |      CPER           |   |
> +     * | +---------------------+ --+--
and this one is not precise, as it holds not only the CPER record but
Generic Error Status Block + Generic Error Data (with CPER inside).

Also, looking at the code here and the spec, I'm not sure we can actually do
several Error Data Entries as implemented here; more on that later.

> +     * | |    ..........       |
> +     * | +---------------------+
> +     * | |      CPER           |
> +     * | |      ....           |
> +     * | |      CPER           |
> +     * | +---------------------+
> +     */
> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> +        /* Find and check the source id for this new CPER */
> +        source_id = error_source_id[notify];
> +        if (source_id != 0xff) {
> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> +        } else {
> +            goto out;
assert() ???


> +        }
> +
> +        cpu_physical_memory_read(start_addr, &error_block_addr,
> +                                 ACPI_GHES_ADDRESS_SIZE);
> +
> +        read_ack_register_addr = start_addr +
> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
> +retry:
> +        cpu_physical_memory_read(read_ack_register_addr,
> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
it's safer to use
   sizeof(read_ack_register)
instead of ACPI_GHES_ADDRESS_SIZE to make sure that stack won't be corrupted
by accident later, the same applies to other reads.

> +
> +        /* zero means OSPM does not acknowledge the error */
> +        if (!read_ack_register) {
> +            if (loop < 3) {
> +                usleep(100 * 1000);
> +                loop++;
> +                goto retry;
At a minimum, this loop can stall the guest repeatedly for 0.3s if the guest triggers BQL,
until it handles the error.

(not sure what to suggest here though)

> +            } else {
> +                error_report("OSPM does not acknowledge previous error,"
> +                    " so can not record CPER for current error, forcibly"
> +                    " acknowledge previous error to avoid blocking next time"
> +                    " CPER record! Exit");

Also, overwriting the error goes against the spec, which says
"
Platforms with RAS
controllers must prevent concurrent accesses to the Error Status Block (i.e., the RAS controller
must not overwrite the Error Status Block before the OS has completed reading it).
******************
"
so we probably shouldn't overwrite a block that has not been acked.
The question is: what do bare metal machines do in this case?

> +                read_ack_register = 1;
> +                cpu_physical_memory_write(read_ack_register_addr,
> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
Function writes data as is, so one has to ensure that endianness of
read_ack_register matches that of the spec/guest.
The same applies to the code below marked with "^^^".
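
In QEMU proper I'd expect cpu_to_le64()/le64_to_cpu() around these accesses.
Just to show the intent, here is a standalone sketch with hand-rolled helpers
(hypothetical names, untested) that convert between a host-order value and
the 8 little-endian bytes that live in guest memory:

```c
#include <stdint.h>

/* Encode a host-order value as the 8 little-endian bytes the guest expects. */
static void store_le64(uint8_t out[8], uint64_t value)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (value >> (8 * i)) & 0xff;
    }
}

/* Decode 8 little-endian bytes read from guest memory into host order. */
static uint64_t load_le64(const uint8_t in[8])
{
    uint64_t value = 0;

    for (int i = 0; i < 8; i++) {
        value |= (uint64_t)in[i] << (8 * i);
    }
    return value;
}
```

Writing read_ack_register = 1 without such a conversion puts the 1 into the
last byte on a big-endian host, and the guest would read it back as 1 << 56.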

> +            }
> +        } else {
> +            if (error_block_addr) {

} else if () {

> +                read_ack_register = 0;
> +                /*
> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
> +                 * acknowledge this error.
> +                 */
> +                cpu_physical_memory_write(read_ack_register_addr,
> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
                         ^^^ - for 0 it doesn't really matter but conversion should be done
                                even if it's just for the sake of documenting interface

> +                ret = acpi_ghes_record_mem_error(error_block_addr,
                                                    ^^^^

> +                          physical_address, acpi_ghes_data_length[source_id]);
                             ^^^

> +                if (ret == ACPI_GHES_CPER_OK) {
> +                    acpi_ghes_data_length[source_id] +=
> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
Eventually we will run out of space, and nothing short of a QEMU restart will
help to reclaim it.

Also, if you keep track of available space in QEMU,
you'd also have to migrate it, otherwise it's lost after migration.
But maybe we don't need to keep track of free space;
see my other comment in acpi_ghes_record_mem_error().

> +                }
> +            }
> +        }
> +    }
> +
> +out:
> +    return ret;
> +}
> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> index cb62ec9c7b..8e3c5b879e 100644
> --- a/include/hw/acpi/acpi_ghes.h
> +++ b/include/hw/acpi/acpi_ghes.h
> @@ -24,6 +24,9 @@
>  
>  #include "hw/acpi/bios-linker-loader.h"
>  
> +#define ACPI_GHES_CPER_OK                   1
> +#define ACPI_GHES_CPER_FAIL                 0
> +
>  /*
>   * Values for Hardware Error Notification Type field
>   */
> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
>  
>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
>  #endif
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 9d143282bc..321ead8115 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
>  
> -#ifdef TARGET_I386
> -#define KVM_HAVE_MCE_INJECTION 1
> +#ifdef KVM_HAVE_MCE_INJECTION
>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
>  #endif
>  
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index d844ea21d8..c4fe6ccc63 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -28,6 +28,10 @@
>  /* ARM processors have a weak memory model */
>  #define TCG_GUEST_DEFAULT_MO      (0)
>  
> +#ifdef TARGET_AARCH64
> +#define KVM_HAVE_MCE_INJECTION 1
> +#endif
> +
>  #define EXCP_UDEF            1   /* undefined instruction */
>  #define EXCP_SWI             2   /* software interrupt */
>  #define EXCP_PREFETCH_ABORT  3
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 63815fc4cf..a9ce97efb1 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
>               * Report exception with ESR indicating a fault due to a
>               * translation table walk for a cache maintenance instruction.
>               */
> -            syn = syn_data_abort_no_iss(current_el == target_el,
> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
>              env->exception.vaddress = value;
>              env->exception.fsr = fsr;
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index f5313dd3d4..28b8451d6d 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
>  }
>  
> -static inline uint32_t syn_data_abort_no_iss(int same_el,
> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
>                                               int ea, int cm, int s1ptw,
>                                               int wnr, int fsc)
>  {
>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
>             | ARM_EL_IL
> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
> +           | (wnr << 6) | fsc;
>  }
>  
>  static inline uint32_t syn_data_abort_with_iss(int same_el,
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 28f6db57d5..c7b7653d3f 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -28,6 +28,8 @@
>  #include "kvm_arm.h"
>  #include "hw/boards.h"
>  #include "internals.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/acpi_ghes.h"
>  
>  static bool have_guest_debug;
>  
> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
>      return KVM_PUT_RUNTIME_STATE;
>  }
>  
> +/* Callers must hold the iothread mutex lock */
> +static void kvm_inject_arm_sea(CPUState *c)
> +{
> +    ARMCPU *cpu = ARM_CPU(c);
> +    CPUARMState *env = &cpu->env;
> +    CPUClass *cc = CPU_GET_CLASS(c);
> +    uint32_t esr;
> +    bool same_el;
> +
> +    c->exception_index = EXCP_DATA_ABORT;
> +    env->exception.target_el = 1;
> +
> +    /*
> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> +     */
> +    same_el = arm_current_el(env) == env->exception.target_el;
> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> +
> +    env->exception.syndrome = esr;
> +
> +    cc->do_interrupt(c);
> +}
> +
>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>  
> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>      return ret;
>  }
>  
> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> +{
> +    ram_addr_t ram_addr;
> +    hwaddr paddr;
> +
> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
you let BUS_MCEERR_AO in but then it's unused, so what's the purpose of allowing it?

> +
> +    if (acpi_enabled && addr &&
> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> +        ram_addr = qemu_ram_addr_from_host(addr);
> +        if (ram_addr != RAM_ADDR_INVALID &&
> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> +            kvm_hwpoison_page_add(ram_addr);
> +            /*
> +             * Asynchronous signal will be masked by main thread, so
> +             * only handle synchronous signal.
> +             */
> +            if (code == BUS_MCEERR_AR) {
> +                kvm_cpu_synchronize_state(c);
> +                if (ACPI_GHES_CPER_FAIL !=
> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> +                    kvm_inject_arm_sea(c);
> +                } else {
> +                    fprintf(stderr, "failed to record the error\n");

fprintf() shouldn't be used in new code.
Another question: is it fine to ignore the error?
Maybe we should use error_fatal in such cases?

> +                }
> +            }
> +            return;
> +        }
> +        fprintf(stderr, "Hardware memory error for memory used by "
> +                "QEMU itself instead of guest system!\n");

> +    }
> +
> +    if (code == BUS_MCEERR_AR) {
> +        fprintf(stderr, "Hardware memory error!\n");
> +        exit(1);
> +    }
> +}
> +
>  /* C6.6.29 BRK instruction */
>  static const uint32_t brk_insn = 0xd4200000;
>  
> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> index 5feb312941..499672ebbc 100644
> --- a/target/arm/tlb_helper.c
> +++ b/target/arm/tlb_helper.c
> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
>       * ISV field.
>       */
>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> -        syn = syn_data_abort_no_iss(same_el,
> +        syn = syn_data_abort_no_iss(same_el, 0,
>                                      ea, 0, s1ptw, is_write, fsc);
>      } else {
>          /*
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 5352c9ff55..f75a210f96 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -29,6 +29,8 @@
>  /* The x86 has a strong memory model with some store-after-load re-ordering */
>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
>  
> +#define KVM_HAVE_MCE_INJECTION 1
> +
>  /* Maximum instruction code size */
>  #define TARGET_MAX_INSN_SIZE 16
>  



* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
@ 2019-11-15 16:37     ` Igor Mammedov
  0 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-15 16:37 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: peter.maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl, qemu-arm,
	james.morse, xuwei5, jonathan.cameron, pbonzini, lersek, rth

On Mon, 11 Nov 2019 09:40:47 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> From: Dongjiu Geng <gengdongjiu@huawei.com>
> 
> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> translates the host VA delivered by host to guest PA, then fills this PA
> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> type.
> 
> When guest accesses the poisoned memory, it will generate a Synchronous
> External Abort(SEA). Then host kernel gets an APEI notification and calls
> memory_failure() to unmap the affected page in stage 2, finally
> returns to guest.
> 
> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> Qemu, Qemu records this error address into guest APEI GHES memory and
> notifies guest using Synchronous-External-Abort(SEA).
> 
> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> in which we can setup the type of exception and the syndrome information.
> When switching to guest, the target vcpu will jump to the synchronous
> external abort vector table entry.
> 
> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> not valid and hold an UNKNOWN value. These values will be set to KVM
> register structures through KVM_SET_ONE_REG IOCTL.
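
To make the syndrome encoding described above concrete, here is a minimal
standalone sketch. It is not the QEMU helper itself: every SKETCH_/sketch_
name is made up for illustration. It assumes the usual ESR_ELx layout (EC in
bits [31:26], IL at bit 25) and the FnV position used by the
syn_data_abort_no_iss() change in this patch (bit 10).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; the names are local to this sketch. */
#define SKETCH_ESR_EC_SHIFT       26
#define SKETCH_ESR_EC_DABT_LOWER  0x24u       /* Data Abort from a lower EL */
#define SKETCH_ESR_EC_DABT_SAME   0x25u       /* Data Abort, same EL */
#define SKETCH_ESR_IL             (1u << 25)  /* 32-bit instruction length */
#define SKETCH_ESR_FNV            (1u << 10)  /* FAR not valid */
#define SKETCH_ESR_DFSC_SEA       0x10u       /* synchronous external abort */

/* Compose the syndrome for an injected SEA whose FAR is not valid. */
static uint32_t sketch_sea_syndrome(int same_el)
{
    uint32_t ec;

    assert(same_el == 0 || same_el == 1);
    ec = same_el ? SKETCH_ESR_EC_DABT_SAME : SKETCH_ESR_EC_DABT_LOWER;

    /* Unsigned arithmetic avoids shifting into the sign bit of an int. */
    return (ec << SKETCH_ESR_EC_SHIFT) | SKETCH_ESR_IL |
           SKETCH_ESR_FNV | SKETCH_ESR_DFSC_SEA;
}
```

With these assumptions a guest that sees FnV set knows not to trust FAR_ELx
and has to take the faulting address from the GHES record instead.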
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>  include/hw/acpi/acpi_ghes.h |   4 +
>  include/sysemu/kvm.h        |   3 +-
>  target/arm/cpu.h            |   4 +
>  target/arm/helper.c         |   2 +-
>  target/arm/internals.h      |   5 +-
>  target/arm/kvm64.c          |  64 ++++++++
>  target/arm/tlb_helper.c     |   2 +-
>  target/i386/cpu.h           |   2 +
>  9 files changed, 377 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> index 42c00ff3d3..f5b54990c0 100644
> --- a/hw/acpi/acpi_ghes.c
> +++ b/hw/acpi/acpi_ghes.c
> @@ -39,6 +39,34 @@
>  /* The max size in bytes for one error block */
>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>  
> +/*
> + * The total size of Generic Error Data Entry
> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> + * Table 18-343 Generic Error Data Entry
> + */
> +#define ACPI_GHES_DATA_LENGTH               72
> +
> +/*
> + * The memory section CPER size,
> + * UEFI 2.6: N.2.5 Memory Error Section
> + */
Maybe use a one-line comment.

> +#define ACPI_GHES_MEM_CPER_LENGTH           80
> +
> +/*
> + * Masks for block_status flags
> + */
ditto

> +#define ACPI_GEBS_UNCORRECTABLE         1
> +
> +/*
> + * Values for error_severity field
> + */
ditto

> +enum AcpiGenericErrorSeverity {
> +    ACPI_CPER_SEV_RECOVERABLE,
> +    ACPI_CPER_SEV_FATAL,
> +    ACPI_CPER_SEV_CORRECTED,
> +    ACPI_CPER_SEV_NONE,
I'd assign values explicitly here
  foo = x,
  ...

> +};
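
A minimal sketch of what explicit values could look like, assuming the CPER
severity encoding from the UEFI spec (0 = recoverable, 1 = fatal,
2 = corrected, 3 = informational/none). The SKETCH_ names are placeholders,
not the identifiers from the patch.

```c
#include <assert.h>

/* Values are part of the guest-visible format, so spell them out. */
enum SketchCperSeverity {
    SKETCH_CPER_SEV_RECOVERABLE = 0,
    SKETCH_CPER_SEV_FATAL       = 1,
    SKETCH_CPER_SEV_CORRECTED   = 2,
    SKETCH_CPER_SEV_NONE        = 3,
};

/* Catch an accidental reordering at compile time. */
static_assert(SKETCH_CPER_SEV_NONE == 3, "severity values must not shift");

/* Return 1 if v is one of the four defined severities, 0 otherwise. */
static int sketch_cper_severity_is_valid(unsigned int v)
{
    return v <= SKETCH_CPER_SEV_NONE;
}
```

Explicit values make it obvious that inserting a new member in the middle
would change what the guest sees.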
> +
>  /*
>   * Now only support ARMv8 SEA notification type error source
>   */
> @@ -49,6 +77,16 @@
>   */
>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>  
> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> +
> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> +    0xED, 0x7C, 0x83, 0xB1)
> +
>  /*
>   * | +--------------------------+ 0
>   * | |        Header            |
> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>      uint64_t ghes_addr_le;
>  } AcpiGhesState;
>  
> +/*
> + * Total size for Generic Error Status Block
> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> + * Table 18-380 Generic Error Status Block
> + */
> +#define ACPI_GHES_GESB_SIZE                 20

> +/* The offset of Data Length in Generic Error Status Block */
> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12

unused, drop it

> +
> +/*
> + * Record the value of data length for each error status block to avoid getting
> + * this value from guest.
> + */
> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> +
> +/*
> + * Generic Error Data Entry
> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + */
> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> +                uint32_t error_severity, uint16_t revision,
> +                uint8_t validation_bits, uint8_t flags,
> +                uint32_t error_data_length, QemuUUID fru_id,
> +                uint8_t *fru_text, uint64_t time_stamp)
> +{
> +    QemuUUID uuid_le;
> +
> +    /* Section Type */
> +    uuid_le = qemu_uuid_bswap(section_type);
> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> +
> +    /* Error Severity */
> +    build_append_int_noprefix(table, error_severity, 4);
> +    /* Revision */
> +    build_append_int_noprefix(table, revision, 2);
> +    /* Validation Bits */
> +    build_append_int_noprefix(table, validation_bits, 1);
> +    /* Flags */
> +    build_append_int_noprefix(table, flags, 1);
> +    /* Error Data Length */
> +    build_append_int_noprefix(table, error_data_length, 4);
> +
> +    /* FRU Id */
> +    uuid_le = qemu_uuid_bswap(fru_id);
> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> +
> +    /* FRU Text */
> +    g_array_append_vals(table, fru_text, 20);
What if fru_text is shorter than 20 bytes?

I suggest either passing the length along, or dropping all FRU handling in the
caller and hardcoding an invalid FRU with empty text here; the function can be
extended later, once there is something meaningful to put in the FRU fields.
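
A minimal sketch of the "pass the length along" variant, assuming the
20-byte FRU Text field used in the patch. sketch_fill_fru_text() and
SKETCH_FRU_TEXT_LEN are hypothetical names, and the destination is a plain
byte array rather than a GArray so that the example stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FRU_TEXT_LEN 20

/*
 * Copy at most SKETCH_FRU_TEXT_LEN bytes of text into the fixed-width
 * field and zero-fill the remainder, so a short (or NULL) text never
 * causes a read past the end of the caller's buffer.
 */
static void sketch_fill_fru_text(uint8_t out[SKETCH_FRU_TEXT_LEN],
                                 const uint8_t *text, size_t text_len)
{
    size_t n = text_len < SKETCH_FRU_TEXT_LEN ? text_len
                                              : SKETCH_FRU_TEXT_LEN;

    assert(out != NULL);
    memset(out, 0, SKETCH_FRU_TEXT_LEN);
    if (text != NULL && n > 0) {
        memcpy(out, text, n);
    }
}
```

Calling it with a NULL text and length 0 yields the all-zero field, which
covers the "invalid FRU with empty text" case as well.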


> +    /* Timestamp */
> +    build_append_int_noprefix(table, time_stamp, 8);
> +}
> +
> +/*
> + * Generic Error Status Block
> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + */
> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> +                uint32_t raw_data_offset, uint32_t raw_data_length,
> +                uint32_t data_length, uint32_t error_severity)
> +{
> +    /* Block Status */
> +    build_append_int_noprefix(table, block_status, 4);
> +    /* Raw Data Offset */
> +    build_append_int_noprefix(table, raw_data_offset, 4);
> +    /* Raw Data Length */
> +    build_append_int_noprefix(table, raw_data_length, 4);
> +    /* Data Length */
> +    build_append_int_noprefix(table, data_length, 4);
> +    /* Error Severity */
> +    build_append_int_noprefix(table, error_severity, 4);
> +}
> +
> +/* UEFI 2.6: N.2.5 Memory Error Section */
> +static void acpi_ghes_build_append_mem_cper(GArray *table,
> +                                            uint64_t error_physical_addr)
I'd split this function, acpi_ghes_generic_error_status() and
acpi_ghes_generic_error_data() out into a separate patch.

> +{
> +    /*
> +     * Memory Error Record
> +     */
> +
> +    /* Validation Bits */
> +    build_append_int_noprefix(table,

> +                              (1UL << 14) | /* Type Valid */
> +                              (1UL << 1) /* Physical Address Valid */,
shouldn't it use ULL suffixes?

> +                              8);
> +    /* Error Status */
> +    build_append_int_noprefix(table, 0, 8);
> +    /* Physical Address */
> +    build_append_int_noprefix(table, error_physical_addr, 8);
> +    /* Skip all the detailed information normally found in such a record */
> +    build_append_int_noprefix(table, 0, 48);
> +    /* Memory Error Type */
> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> +    /* Skip all the detailed information normally found in such a record */
> +    build_append_int_noprefix(table, 0, 7);
> +}
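
On the suffix question for the Validation Bits value: a minimal sketch,
assuming the field is 8 bytes wide as in the patch. Bits 1 and 14 fit in
32 bits either way, so this is about consistency with the field width, since
long is only 32 bits on some hosts. The SKETCH_ names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* unsigned long long is guaranteed to be at least 64 bits wide. */
static_assert(sizeof(1ULL) >= 8, "ULL constants cover an 8-byte field");

#define SKETCH_MEM_VALID_PHYS_ADDR   (1ULL << 1)   /* Physical Address Valid */
#define SKETCH_MEM_VALID_ERROR_TYPE  (1ULL << 14)  /* Type Valid */

/* Validation bits for a record carrying only address and error type. */
static uint64_t sketch_mem_validation_bits(void)
{
    return SKETCH_MEM_VALID_ERROR_TYPE | SKETCH_MEM_VALID_PHYS_ADDR;
}
```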
> +
> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> +                                      uint64_t error_physical_addr,
> +                                      uint32_t data_length)
> +{
> +    GArray *block;
> +    uint64_t current_block_length;
> +    /* Memory Error Section Type */
> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
                               ^^
UEFI_CPER_SEC_PLATFORM_MEM is defined as BE, so the _le suffix here is wrong,
and later you use qemu_uuid_bswap() to make it LE.

Why not define it as LE to begin with, as has been done for NVDIMM_UUID_LE?
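
A minimal sketch of defining the section type in LE form directly, so that
no byte swap is needed before appending it. SKETCH_UUID_LE is a hypothetical
macro written for this example (the first three fields are stored least
significant byte first, the last eight bytes are stored as given); it is not
claimed to match any existing QEMU macro exactly.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)          \
    { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff,                 \
      ((a) >> 24) & 0xff,                                                \
      (b) & 0xff, ((b) >> 8) & 0xff,                                     \
      (c) & 0xff, ((c) >> 8) & 0xff,                                     \
      (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }

/* Platform Memory error section type, already in on-the-wire byte order. */
static const uint8_t sketch_mem_section_uuid_le[] =
    SKETCH_UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE,
                   0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1);

static_assert(sizeof(sketch_mem_section_uuid_le) == 16,
              "a UUID is 16 bytes");
```

The array can then be appended to the table as is, and the variable name
ending in _le would actually describe its contents.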


> +    QemuUUID fru_id = {};
> +    uint8_t fru_text[20] = {};
> +
> +    /*
> +     * Generic Error Status Block
> +     * | +---------------------+
> +     * | |     block_status    |
> +     * | +---------------------+
> +     * | |    raw_data_offset  |
> +     * | +---------------------+
> +     * | |    raw_data_length  |
> +     * | +---------------------+
> +     * | |     data_length     |
> +     * | +---------------------+
> +     * | |   error_severity    |
> +     * | +---------------------+
> +     */
Not necessary; just point to the concrete part of the ACPI spec if needed.

> +    block = g_array_new(false, true /* clear */, 1);
> +
> +    /* The current whole length of the generic error status block */
> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> +
> +    /* This is the length if adding a new generic error data entry*/
> +    data_length += ACPI_GHES_DATA_LENGTH;
> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> +
> +    /*
> +     * Check whether it will run out of the preallocated memory if adding a new
> +     * generic error data entry
> +     */
> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> +        error_report("Record CPER out of boundary!!!");
> +        return ACPI_GHES_CPER_FAIL;
> +    }
> +
> +    /* Build the new generic error status block header */
> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
Down the road, the arguments are passed to build_append_int_noprefix(), which
takes numbers in host byte order, so manually calling cpu_to_le32() is wrong.
Just drop cpu_to_le32() here.
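
To illustrate why the extra conversion is wrong, here is a minimal sketch of
an append helper that takes a host-order value and emits it least
significant byte first. sketch_append_le() is a hypothetical stand-in that,
as far as I can tell, mirrors how build_append_int_noprefix() behaves; it
writes into a plain byte buffer so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Append the low `size` bytes of a host-order value at buf[pos], least
 * significant byte first. Shifting works on the value, not on its memory
 * representation, so the output is little-endian on any host.
 */
static size_t sketch_append_le(uint8_t *buf, size_t pos,
                               uint64_t value, int size)
{
    assert(buf != NULL && size >= 1 && size <= 8);
    for (int i = 0; i < size; i++) {
        buf[pos++] = (uint8_t)(value >> (8 * i));
    }
    return pos;
}
```

Passing an already swapped value to such a helper on a big-endian host would
swap the bytes twice and emit a big-endian field.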


> +
> +    /* Write back above generic error status block header to guest memory */
> +    cpu_physical_memory_write(error_block_address, block->data,
> +                              block->len);
> +
> +    /* Add a new generic error data entry */
> +
> +    data_length = block->len;
> +    /* Build this new generic error data entry header */
> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
ditto

> +
> +    /* Build the memory section CPER for above new generic error data entry */
> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> +
> +    /* Write back above this new generic error data entry to guest memory */
> +    cpu_physical_memory_write(error_block_address + current_block_length,
> +        block->data + data_length, block->len - data_length);

If I read it right, the first write builds an updated "Error Status Block"
header in which "Data Length" accounts for an additional "Error Data Entry",
and this second write then appends the new "Error Data Entry" after the
previous one (if any existed).

Now for GHESv2, OSPM is supposed to copy the existing "Error Status Block" and
acknowledge that via the "Read Ack Register", and QEMU must not overwrite old
data until OSPM has acked it.

With that in mind, appending a new error seems pointless, since the guest has
already consumed any pre-existing error before we are able to write.
So we can drop "Error Status Block" tracking and just
 1. compose the whole "Error Status Block" with 1 new "Error Data Entry"
 2. check that it fits into the start, start+ACPI_GHES_MAX_RAW_DATA_LENGTH range
 3. push it into guest RAM with a single write

and drop all data_length tracking related code.
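
The three steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the sizes (20, 72, 80, 0x1000) are the
ones used in the patch, all sketch_/SKETCH_ names are hypothetical, the
section type and FRU fields are left zeroed as placeholders, and the buffer
is a plain byte array instead of a GArray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GESB_SIZE      20      /* Generic Error Status Block header */
#define SKETCH_GED_SIZE       72      /* Generic Error Data Entry header */
#define SKETCH_MEM_CPER_SIZE  80      /* Memory Error Section */
#define SKETCH_MAX_BLOCK      0x1000  /* space reserved per error source */

/* Store the low `size` bytes of a host-order value, LSB first. */
static size_t sketch_put_le(uint8_t *buf, size_t pos, uint64_t v, int size)
{
    for (int i = 0; i < size; i++) {
        buf[pos++] = (uint8_t)(v >> (8 * i));
    }
    return pos;
}

/* Compose status block + one data entry; return its length, 0 on no fit. */
static size_t sketch_build_block(uint8_t *out, size_t cap, uint64_t paddr)
{
    const uint32_t data_len = SKETCH_GED_SIZE + SKETCH_MEM_CPER_SIZE;
    const size_t total = SKETCH_GESB_SIZE + data_len;
    size_t pos = 0;

    if (total > cap || total > SKETCH_MAX_BLOCK) {
        return 0;                                   /* step 2: bounds check */
    }
    memset(out, 0, total);

    /* Generic Error Status Block */
    pos = sketch_put_le(out, pos, 1, 4);            /* uncorrectable */
    pos = sketch_put_le(out, pos, 0, 4);            /* raw data offset */
    pos = sketch_put_le(out, pos, 0, 4);            /* raw data length */
    pos = sketch_put_le(out, pos, data_len, 4);     /* data length */
    pos = sketch_put_le(out, pos, 0, 4);            /* severity: recoverable */

    /* Generic Error Data Entry */
    pos += 16;                                      /* section type: placeholder */
    pos = sketch_put_le(out, pos, 0, 4);            /* severity: recoverable */
    pos = sketch_put_le(out, pos, 0x300, 2);        /* revision */
    pos = sketch_put_le(out, pos, 0, 1);            /* validation bits */
    pos = sketch_put_le(out, pos, 0, 1);            /* flags */
    pos = sketch_put_le(out, pos, SKETCH_MEM_CPER_SIZE, 4);
    pos += 16 + 20;                                 /* FRU id, FRU text: zero */
    pos = sketch_put_le(out, pos, 0, 8);            /* timestamp */

    /* Memory Error Section */
    pos = sketch_put_le(out, pos, (1ULL << 14) | (1ULL << 1), 8);
    pos = sketch_put_le(out, pos, 0, 8);            /* error status */
    pos = sketch_put_le(out, pos, paddr, 8);        /* physical address */
    pos += 48 + 1 + 7;                              /* remaining fields: zero */

    assert(pos == total);
    return total;
}
```

The caller would then push the returned number of bytes to the error block
address with one cpu_physical_memory_write() call, and no per-source length
needs to be remembered between errors.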

> +
> +    g_array_free(block, true);
> +
> +    return ACPI_GHES_CPER_OK;
> +}
> +
>  /*
>   * Hardware Error Notification
>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>  }
> +
> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> +{
> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> +    int loop = 0;
> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
                                         ^^^^^^^^^^^^^^^
I forgot to mention this in patch [3/6]:

Migration is definitely broken here, since ges.ghes_addr_le is
not migrated to the target QEMU. For an example of how it should be done, see
  vmgenid_addr_le and vmstate_vmgenid

For that you'd need to make ghes_addr_le a part of some device
(the recently added hw/acpi/generic_event_device.c looks like a suitable victim).


> +    bool ret = ACPI_GHES_CPER_FAIL;
> +    uint8_t source_id;

> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
Put the map at the beginning of this file:

s/const/static const/
s/error_source_id/ghes_notify2source_id_map/
 = { ...,
     ACPI_HEST_SRC_ID_SEA,
     ...,
     ACPI_HEST_SRC_ID_RESERVED
   }
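
A minimal sketch of such a file-scope map, assuming the layout of the array
in the patch (12 notification types, with SEA at index 8 mapping to source
id 0). The SKETCH_ names and the lookup helper are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_NOTIFY_SEA = 8, SKETCH_NOTIFY_RESERVED = 12 };
enum { SKETCH_SRC_ID_SEA = 0, SKETCH_SRC_ID_RESERVED };

#define R SKETCH_SRC_ID_RESERVED
static const uint8_t sketch_notify2source_id_map[] = {
    R, R, R, R, R, R, R, R,     /* notification types 0..7: no source */
    SKETCH_SRC_ID_SEA,          /* 8: ARMv8 SEA */
    R, R, R,                    /* 9..11: no source */
};
#undef R

static_assert(sizeof(sketch_notify2source_id_map) /
              sizeof(sketch_notify2source_id_map[0]) == SKETCH_NOTIFY_RESERVED,
              "one map entry per notification type");

/* Return the source id for a notification type, or -1 if there is none. */
static int sketch_source_id(uint32_t notify)
{
    uint8_t id;

    if (notify >= SKETCH_NOTIFY_RESERVED) {
        return -1;
    }
    id = sketch_notify2source_id_map[notify];
    return id == SKETCH_SRC_ID_RESERVED ? -1 : id;
}
```

Using the RESERVED member as the "no source" marker removes the bare 0xff
from both the table and the check.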


> +
> +    /*
> +     * | +---------------------+ ges.ghes_addr_le
> +     * | |error_block_address0 |
> +     * | +---------------------+ --+--
> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> +     * | +---------------------+ --+--
> +     * | |error_block_addressN |
> +     * | +---------------------+
> +     * | | read_ack_register0  |
> +     * | +---------------------+ --+--
> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> +     * | +---------------------+ --+--
> +     * | | read_ack_registerN  |

above part is not necessary

> +     * | +---------------------+ --+--
> +     * | |      CPER           |   |
> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> +     * | |      CPER           |   |
> +     * | +---------------------+ --+--
And this one is not precise, as the block holds not just a CPER record but a
Generic Error Status Block + Generic Error Data (with the CPER inside).

Looking at the code here and at the spec, I'm not sure we can actually do
several Error Data Entries as implemented here; more on that later.

> +     * | |    ..........       |
> +     * | +---------------------+
> +     * | |      CPER           |
> +     * | |      ....           |
> +     * | |      CPER           |
> +     * | +---------------------+
> +     */
> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> +        /* Find and check the source id for this new CPER */
> +        source_id = error_source_id[notify];
> +        if (source_id != 0xff) {
> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> +        } else {
> +            goto out;
assert() ???


> +        }
> +
> +        cpu_physical_memory_read(start_addr, &error_block_addr,
> +                                 ACPI_GHES_ADDRESS_SIZE);
> +
> +        read_ack_register_addr = start_addr +
> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
> +retry:
> +        cpu_physical_memory_read(read_ack_register_addr,
> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
It's safer to use
   sizeof(read_ack_register)
instead of ACPI_GHES_ADDRESS_SIZE, to make sure that the stack won't be
corrupted by accident later. The same applies to the other reads.

> +
> +        /* zero means OSPM does not acknowledge the error */
> +        if (!read_ack_register) {
> +            if (loop < 3) {
> +                usleep(100 * 1000);
> +                loop++;
> +                goto retry;
At a minimum, this loop can stall the guest repeatedly for 0.3s if the guest
triggers anything that takes the BQL, until it handles the error.

(not sure what to suggest here though)

> +            } else {
> +                error_report("OSPM does not acknowledge previous error,"
> +                    " so can not record CPER for current error, forcibly"
> +                    " acknowledge previous error to avoid blocking next time"
> +                    " CPER record! Exit");

Also, overwriting the error goes against the spec, which says
"
Platforms with RAS
controllers must prevent concurrent accesses to the Error Status Block (i.e., the RAS controller
must not overwrite the Error Status Block before the OS has completed reading it).
******************
"
so we probably shouldn't overwrite a block that has not been acked.
The question is what bare metal machines do in this case.

> +                read_ack_register = 1;
> +                cpu_physical_memory_write(read_ack_register_addr,
> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
The function writes data as is, so one has to ensure that the endianness of
read_ack_register matches that of the spec/guest.
The same applies to the code below marked with "^^^".

> +            }
> +        } else {
> +            if (error_block_addr) {

} else if () {

> +                read_ack_register = 0;
> +                /*
> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
> +                 * acknowledge this error.
> +                 */
> +                cpu_physical_memory_write(read_ack_register_addr,
> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
                         ^^^ - for 0 it doesn't really matter but conversion should be done
                                even if it's just for the sake of documenting interface

> +                ret = acpi_ghes_record_mem_error(error_block_addr,
                                                    ^^^^

> +                          physical_address, acpi_ghes_data_length[source_id]);
                             ^^^

> +                if (ret == ACPI_GHES_CPER_OK) {
> +                    acpi_ghes_data_length[source_id] +=
> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
Eventually we will run out of space, and nothing short of a QEMU restart will
help to reclaim it.

Also, if you keep track of the available space in QEMU,
you'd also have to migrate it, otherwise it's lost after migration.
But maybe we don't need to keep track of free space at all;
see my other comment in acpi_ghes_record_mem_error().

> +                }
> +            }
> +        }
> +    }
> +
> +out:
> +    return ret;
> +}
> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> index cb62ec9c7b..8e3c5b879e 100644
> --- a/include/hw/acpi/acpi_ghes.h
> +++ b/include/hw/acpi/acpi_ghes.h
> @@ -24,6 +24,9 @@
>  
>  #include "hw/acpi/bios-linker-loader.h"
>  
> +#define ACPI_GHES_CPER_OK                   1
> +#define ACPI_GHES_CPER_FAIL                 0
> +
>  /*
>   * Values for Hardware Error Notification Type field
>   */
> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
>  
>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
>  #endif
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 9d143282bc..321ead8115 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
>  
> -#ifdef TARGET_I386
> -#define KVM_HAVE_MCE_INJECTION 1
> +#ifdef KVM_HAVE_MCE_INJECTION
>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
>  #endif
>  
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index d844ea21d8..c4fe6ccc63 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -28,6 +28,10 @@
>  /* ARM processors have a weak memory model */
>  #define TCG_GUEST_DEFAULT_MO      (0)
>  
> +#ifdef TARGET_AARCH64
> +#define KVM_HAVE_MCE_INJECTION 1
> +#endif
> +
>  #define EXCP_UDEF            1   /* undefined instruction */
>  #define EXCP_SWI             2   /* software interrupt */
>  #define EXCP_PREFETCH_ABORT  3
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 63815fc4cf..a9ce97efb1 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
>               * Report exception with ESR indicating a fault due to a
>               * translation table walk for a cache maintenance instruction.
>               */
> -            syn = syn_data_abort_no_iss(current_el == target_el,
> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
>              env->exception.vaddress = value;
>              env->exception.fsr = fsr;
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index f5313dd3d4..28b8451d6d 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
>  }
>  
> -static inline uint32_t syn_data_abort_no_iss(int same_el,
> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
>                                               int ea, int cm, int s1ptw,
>                                               int wnr, int fsc)
>  {
>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
>             | ARM_EL_IL
> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
> +           | (wnr << 6) | fsc;
>  }
>  
>  static inline uint32_t syn_data_abort_with_iss(int same_el,
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 28f6db57d5..c7b7653d3f 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -28,6 +28,8 @@
>  #include "kvm_arm.h"
>  #include "hw/boards.h"
>  #include "internals.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/acpi_ghes.h"
>  
>  static bool have_guest_debug;
>  
> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
>      return KVM_PUT_RUNTIME_STATE;
>  }
>  
> +/* Callers must hold the iothread mutex lock */
> +static void kvm_inject_arm_sea(CPUState *c)
> +{
> +    ARMCPU *cpu = ARM_CPU(c);
> +    CPUARMState *env = &cpu->env;
> +    CPUClass *cc = CPU_GET_CLASS(c);
> +    uint32_t esr;
> +    bool same_el;
> +
> +    c->exception_index = EXCP_DATA_ABORT;
> +    env->exception.target_el = 1;
> +
> +    /*
> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> +     */
> +    same_el = arm_current_el(env) == env->exception.target_el;
> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> +
> +    env->exception.syndrome = esr;
> +
> +    cc->do_interrupt(c);
> +}
> +
>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>  
> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>      return ret;
>  }
>  
> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> +{
> +    ram_addr_t ram_addr;
> +    hwaddr paddr;
> +
> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
BUS_MCEERR_AO is let in here but is never handled afterwards, so what is the purpose of allowing it?

> +
> +    if (acpi_enabled && addr &&
> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> +        ram_addr = qemu_ram_addr_from_host(addr);
> +        if (ram_addr != RAM_ADDR_INVALID &&
> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> +            kvm_hwpoison_page_add(ram_addr);
> +            /*
> +             * Asynchronous signal will be masked by main thread, so
> +             * only handle synchronous signal.
> +             */
> +            if (code == BUS_MCEERR_AR) {
> +                kvm_cpu_synchronize_state(c);
> +                if (ACPI_GHES_CPER_FAIL !=
> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> +                    kvm_inject_arm_sea(c);
> +                } else {
> +                    fprintf(stderr, "failed to record the error\n");

fprintf() shouldn't be used in new code.
Another question: is it fine to ignore the error?
Maybe we should use error_fatal in such cases?

> +                }
> +            }
> +            return;
> +        }
> +        fprintf(stderr, "Hardware memory error for memory used by "
> +                "QEMU itself instead of guest system!\n");

> +    }
> +
> +    if (code == BUS_MCEERR_AR) {
> +        fprintf(stderr, "Hardware memory error!\n");
> +        exit(1);
> +    }
> +}
> +
>  /* C6.6.29 BRK instruction */
>  static const uint32_t brk_insn = 0xd4200000;
>  
> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> index 5feb312941..499672ebbc 100644
> --- a/target/arm/tlb_helper.c
> +++ b/target/arm/tlb_helper.c
> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
>       * ISV field.
>       */
>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> -        syn = syn_data_abort_no_iss(same_el,
> +        syn = syn_data_abort_no_iss(same_el, 0,
>                                      ea, 0, s1ptw, is_write, fsc);
>      } else {
>          /*
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 5352c9ff55..f75a210f96 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -29,6 +29,8 @@
>  /* The x86 has a strong memory model with some store-after-load re-ordering */
>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
>  
> +#define KVM_HAVE_MCE_INJECTION 1
> +
>  /* Maximum instruction code size */
>  #define TARGET_MAX_INSN_SIZE 16
>  




* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-15  9:38     ` Igor Mammedov
@ 2019-11-18 12:49       ` gengdongjiu
  -1 siblings, 0 replies; 82+ messages in thread
From: gengdongjiu @ 2019-11-18 12:49 UTC (permalink / raw)
  To: Igor Mammedov, Xiang Zheng
  Cc: pbonzini, mst, shannon.zhaosl, peter.maydell, lersek,
	james.morse, mtosatti, rth, ehabkost, jonathan.cameron, xuwei5,
	kvm, qemu-devel, qemu-arm, linuxarm, wanghaibin.wang

Hi, Igor,
   Thanks for your review and time.

>    
>> +    /*
>> +     * Type:
>> +     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
>> +     */
>> +    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
>> +    /*
>> +     * Source Id
> 
>> +     * Once we support more than one hardware error sources, we need to
>> +     * increase the value of this field.
> I'm not sure ^^^ is correct, according to spec it's just unique id per
> distinct error structure, so we just assign arbitrary values to each
> declared source and that never changes once assigned.
The source id is used to distinguish the error sources: each source has its own
unique 'source id'. For example, the 'source id' of error source 0 is 0, and
the 'source id' of error source 1 is 1.



> 
> For now I'd make source_id an enum with one member
>   enum {
>     ACPI_HEST_SRC_ID_SEA = 0,
>     /* future ids go here */
>     ACPI_HEST_SRC_ID_RESERVED,
>   }
If we only have one error source, we can use an enum instead of allocating a magic 0.
But if we have more error sources, such as 10 of them, using an enum may not be a good idea.

For example, if there are 10 error sources, I can just use the loop below:

for (i = 0; i < 10; i++)
   build_ghes_v2(source_id++);

> 
> and use that instead of allocating magic 0 at the beginning of the function.
>  build_ghes_v2(ACPI_HEST_GHES_SEA);
> Also add a comment to declaration that already assigned values are not to be changed
> 
>> +     */
>> +    build_append_int_noprefix(table_data, source_id, 2);
>> +    /* Related Source Id */
>> +    build_append_int_noprefix(table_data, 0xffff, 2);
>> +    /* Flags */
>> +    build_append_int_noprefix(table_data, 0, 1);
>> +    /* Enabled */
>> +    build_append_int_noprefix(table_data, 1, 1);
>> +
>> +    /* Number of Records To Pre-allocate */
>> +    build_append_int_noprefix(table_data, 1, 4);
>> +    /* Max Sections Per Record */
>> +    build_append_int_noprefix(table_data, 1, 4);
>> +    /* Max Raw Data Length */
>> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
>> +
>> +    /* Error Status Address */
>> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
>> +                     4 /* QWord access */, 0);
>> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
>> +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
> it's fine only if GHESv2 is the only entries in HEST, but once
> other types are added this macro will silently fall apart and
> cause table corruption.
> 
> Instead of offset from hest_start, I suggest to use offset relative
> to GAS structure, here is an idea
> 
> #define GAS_ADDR_OFFSET 4
> 
>     off = table->len
>     build_append_gas()
>     bios_linker_loader_add_pointer(...,
>         off + GAS_ADDR_OFFSET, ...
I think your suggestion is good.
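
A minimal sketch of that idea, under stated assumptions: struct blob, append_gas() and gas_addr_patch_offset() are hypothetical stand-ins for the GArray table blob, build_append_gas() and the offset passed to bios_linker_loader_add_pointer(). It only shows that the patch offset can be derived from the table length captured just before the GAS is appended.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GAS_ADDR_OFFSET 4   /* space id, bit width, bit offset, access size */
#define GAS_SIZE        12  /* 4 header bytes + 8-byte address */

/* Hypothetical stand-in for the table blob. */
struct blob {
    uint8_t data[256];
    size_t len;
};

/*
 * Append a GAS with a zeroed address field and return the offset at which
 * it starts, or (size_t)-1 if the blob is full.
 */
static size_t append_gas(struct blob *b, uint8_t space_id, uint8_t bit_width,
                         uint8_t bit_offset, uint8_t access_size)
{
    size_t off = b->len;
    uint8_t gas[GAS_SIZE] = { space_id, bit_width, bit_offset, access_size };

    if (off + GAS_SIZE > sizeof(b->data)) {
        return (size_t)-1;
    }
    memcpy(b->data + off, gas, GAS_SIZE);
    b->len += GAS_SIZE;
    return off;
}

/* Offset of the 64-bit address field that the linker would patch. */
static size_t gas_addr_patch_offset(size_t gas_start)
{
    return gas_start + GAS_ADDR_OFFSET;
}
```

Because the offset is taken from the current table length, it stays correct no matter what was appended to the table earlier.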

> 
>> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
>> +        source_id * ACPI_GHES_ADDRESS_SIZE);
>> +
>> +    /*
>> +     * Notification Structure
>> +     * Now only enable ARMv8 SEA notification type
>> +     */
>> +    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
>> +
>> +    /* Error Status Block Length */
>> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
>> +
>> +    /*
>> +     * Read Ack Register
>> +     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
>> +     * version 2 (GHESv2 - Type 10)
>> +     */
>> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
>> +                     4 /* QWord access */, 0);
>> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
>> +        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, 0),
> ditto
> 
>> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
>> +        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
>> +
>> +    /*
>> +     * Read Ack Preserve
>> +     * We only provide the first bit in Read Ack Register to OSPM to write
>> +     * while the other bits are preserved.
>> +     */
>> +    build_append_int_noprefix(table_data, ~0x1ULL, 8);
>> +    /* Read Ack Write */
>> +    build_append_int_noprefix(table_data, 0x1, 8);
>> +
>> +    build_header(linker, table_data, (void *)(table_data->data + hest_start),
>> +        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
> hest is not GHEST specific so s/GHES/NULL/
>                                                          
>> +}
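
As an aside on the Read Ack Preserve and Read Ack Write values quoted above: a minimal sketch of how OSPM is expected to combine them when acknowledging an error, assuming the usual read, mask, set, write-back sequence. The helper name read_ack_value() is hypothetical.

```c
#include <stdint.h>

#define READ_ACK_PRESERVE (~0x1ULL) /* keep every bit except bit 0 */
#define READ_ACK_WRITE    0x1ULL    /* set bit 0 to acknowledge */

/* Value written back to the Read Ack Register after reading 'current'. */
static uint64_t read_ack_value(uint64_t current)
{
    return (current & READ_ACK_PRESERVE) | READ_ACK_WRITE;
}
```

With these masks only bit 0 ever changes; all other bits of the register are written back unmodified.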
>> +
>> +static AcpiGhesState ges;
>> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>> +{
>> +
>> +    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
>> +    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
>> +
> 
>> +    /* Create a read-only fw_cfg file for GHES */
>> +    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
>> +                    request_block_size);
>> +
>> +    /* Create a read-write fw_cfg file for Address */
>> +    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>> +        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>> +}
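
To make the layout implied by the size computation and the pointer offsets above easier to follow, here is a small sketch. It assumes the hardware error blob is laid out as all error block address slots, then all read ack register slots, then the error status blocks, and that ACPI_GHES_ADDRESS_SIZE is sizeof(uint64_t). The helper names are hypothetical and the raw data length is passed in rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define ADDRESS_SIZE sizeof(uint64_t)

/* Offset of the error block address slot for a given source. */
static size_t err_addr_slot(size_t source_id)
{
    return source_id * ADDRESS_SIZE;
}

/* Offset of the read ack register slot; these follow all address slots. */
static size_t read_ack_slot(size_t source_count, size_t source_id)
{
    return (source_count + source_id) * ADDRESS_SIZE;
}

/* Total blob size: two slots plus one error status block per source. */
static size_t errors_blob_size(size_t source_count, size_t max_raw_len)
{
    return source_count * (2 * ADDRESS_SIZE + max_raw_len);
}
```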
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index 2c3702b882..3681ec6e3d 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
>>      tables->table_data = g_array_new(false, true /* clear */, 1);
>>      tables->tcpalog = g_array_new(false, true /* clear */, 1);
>>      tables->vmgenid = g_array_new(false, true /* clear */, 1);
>> +    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
>>      tables->linker = bios_linker_loader_init();
>>  }
>>  
>> @@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
>>      g_array_free(tables->table_data, true);
>>      g_array_free(tables->tcpalog, mfre);
>>      g_array_free(tables->vmgenid, mfre);
>> +    g_array_free(tables->hardware_errors, mfre);
>>  }
>>  
>>  /*
>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
>> index 4cd50175e0..1b1fd273e4 100644
>> --- a/hw/arm/virt-acpi-build.c
>> +++ b/hw/arm/virt-acpi-build.c
>> @@ -48,6 +48,7 @@
>>  #include "sysemu/reset.h"
>>  #include "kvm_arm.h"
>>  #include "migration/vmstate.h"
>> +#include "hw/acpi/acpi_ghes.h"
>>  
>>  #define ARM_SPI_BASE 32
>>  
>> @@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>>      acpi_add_table(table_offsets, tables_blob);
>>      build_spcr(tables_blob, tables->linker, vms);
>>  
>> +    if (vms->ras) {
>> +        acpi_add_table(table_offsets, tables_blob);
>> +        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
>> +        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
>> +                             tables->linker);
>> +    }
>> +
>>      if (ms->numa_state->num_nodes > 0) {
>>          acpi_add_table(table_offsets, tables_blob);
>>          build_srat(tables_blob, tables->linker, vms);
>> @@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
>>      fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
>>                      acpi_data_len(tables.tcpalog));
>>  
>> +    if (vms->ras) {
>> +        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
>> +    }
>> +
>>      build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
>>                                               build_state, tables.rsdp,
>>                                               ACPI_BUILD_RSDP_FILE, 0);
>> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
>> new file mode 100644
>> index 0000000000..cb62ec9c7b
>> --- /dev/null
>> +++ b/include/hw/acpi/acpi_ghes.h
>> @@ -0,0 +1,56 @@
>> +/*
>> + * Support for generating APEI tables and recording CPER for Guests
>> + *
>> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
>> + *
>> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> +
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> +
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef ACPI_GHES_H
>> +#define ACPI_GHES_H
>> +
>> +#include "hw/acpi/bios-linker-loader.h"
>> +
>> +/*
>> + * Values for Hardware Error Notification Type field
>> + */
>> +enum AcpiGhesNotifyType {
>> +    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
>> +    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
>> +    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
>> +    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
>> +    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
>> +    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
>> +    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
>> +    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
>> +    ACPI_GHES_NOTIFY_GPIO = 7,
>> +    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
>> +    ACPI_GHES_NOTIFY_SEA = 8,
>> +    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
>> +    ACPI_GHES_NOTIFY_SEI = 9,
>> +    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
>> +    ACPI_GHES_NOTIFY_GSIV = 10,
>> +    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
>> +    ACPI_GHES_NOTIFY_SDEI = 11,
>> +    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
>> +};
> maybe make all comment go on newline, otherwise zoo above look ugly
sure.

>  
>> +
>> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
>> +                          BIOSLinker *linker);
>> +
>> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
>> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
>> +#endif
>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>> index de4a406568..8f13620701 100644
>> --- a/include/hw/acpi/aml-build.h
>> +++ b/include/hw/acpi/aml-build.h
>> @@ -220,6 +220,7 @@ struct AcpiBuildTables {
>>      GArray *rsdp;
>>      GArray *tcpalog;
>>      GArray *vmgenid;
>> +    GArray *hardware_errors;
>>      BIOSLinker *linker;
>>  } AcpiBuildTables;
>>  
> 
> .
> 


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-18 12:49       ` gengdongjiu
@ 2019-11-18 13:18         ` gengdongjiu
  -1 siblings, 0 replies; 82+ messages in thread
From: gengdongjiu @ 2019-11-18 13:18 UTC (permalink / raw)
  To: Igor Mammedov, Xiang Zheng
  Cc: pbonzini, mst, shannon.zhaosl, peter.maydell, lersek,
	james.morse, mtosatti, rth, ehabkost, jonathan.cameron, xuwei5,
	kvm, qemu-devel, qemu-arm, linuxarm, wanghaibin.wang

On 2019/11/18 20:49, gengdongjiu wrote:
>>> +     */
>>> +    build_append_int_noprefix(table_data, source_id, 2);
>>> +    /* Related Source Id */
>>> +    build_append_int_noprefix(table_data, 0xffff, 2);
>>> +    /* Flags */
>>> +    build_append_int_noprefix(table_data, 0, 1);
>>> +    /* Enabled */
>>> +    build_append_int_noprefix(table_data, 1, 1);
>>> +
>>> +    /* Number of Records To Pre-allocate */
>>> +    build_append_int_noprefix(table_data, 1, 4);
>>> +    /* Max Sections Per Record */
>>> +    build_append_int_noprefix(table_data, 1, 4);
>>> +    /* Max Raw Data Length */
>>> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
>>> +
>>> +    /* Error Status Address */
>>> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
>>> +                     4 /* QWord access */, 0);
>>> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
>>> +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
>> it's fine only if GHESv2 is the only entries in HEST, but once
>> other types are added this macro will silently fall apart and
>> cause table corruption.
   Why would it silently fall apart?
   I think acpi_ghes.c only supports the GHESv2 type and no other types.
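
To make the concern concrete, here is a small sketch. The exact definition of ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET is not quoted in this part of the thread, so fixed_stride_offset() below only assumes its general shape: a hest_start-relative formula in which every entry is a GHESv2. The sizes are derived from the field widths in the quoted patch (92 bytes for GHESv2, 64 bytes for a GHES entry without the read ack fields, 40 bytes for the HEST header plus error source count). All helper names are hypothetical.

```c
#include <stddef.h>

#define HEST_HEADER_SIZE         40 /* 36-byte ACPI header + 4-byte count */
#define GHES_V1_SIZE             64
#define GHES_V2_SIZE             92
#define ERR_STATUS_ADDR_IN_ENTRY 24 /* 20 bytes of fields + 4-byte GAS header */

/* Assumed shape of a hest_start-relative macro: all entries are GHESv2. */
static size_t fixed_stride_offset(size_t hest_start, unsigned source_id)
{
    return hest_start + HEST_HEADER_SIZE + source_id * GHES_V2_SIZE
           + ERR_STATUS_ADDR_IN_ENTRY;
}

/* Real offset of entry 'index' when the preceding entries vary in size. */
static size_t actual_offset(size_t hest_start, const size_t *entry_sizes,
                            unsigned index)
{
    size_t off = hest_start + HEST_HEADER_SIZE;

    for (unsigned i = 0; i < index; i++) {
        off += entry_sizes[i];
    }
    return off + ERR_STATUS_ADDR_IN_ENTRY;
}
```

With only GHESv2 entries both functions agree. As soon as an entry of another size precedes the GHESv2 one, the fixed-stride result points into the wrong place and nothing in the build flags it.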

>>
>> Instead of offset from hest_start, I suggest to use offset relative
>> to GAS structure, here is an idea
>>
>> #define GAS_ADDR_OFFSET 4
>>
>>     off = table->len
>>     build_append_gas()
>>     bios_linker_loader_add_pointer(...,
>>         off + GAS_ADDR_OFFSET, ...

If we use an offset relative to the GAS structure, the code cannot easily be extended to support more Generic Hardware Error Sources.
If we use an offset relative to hest_start, a simple loop lets the code support more error sources, for example:
for (source_id = 0; source_id < n; source_id++)
{
   ......
    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
        sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
        source_id * sizeof(uint64_t));
  .......
}

My previous patch series supported 2 error sources, but this one only enables the 'SEA' type error source.


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-18 13:18         ` gengdongjiu
@ 2019-11-18 13:21           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 82+ messages in thread
From: Michael S. Tsirkin @ 2019-11-18 13:21 UTC (permalink / raw)
  To: gengdongjiu
  Cc: Igor Mammedov, Xiang Zheng, pbonzini, shannon.zhaosl,
	peter.maydell, lersek, james.morse, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

On Mon, Nov 18, 2019 at 09:18:01PM +0800, gengdongjiu wrote:
> On 2019/11/18 20:49, gengdongjiu wrote:
> >>> +     */
> >>> +    build_append_int_noprefix(table_data, source_id, 2);
> >>> +    /* Related Source Id */
> >>> +    build_append_int_noprefix(table_data, 0xffff, 2);
> >>> +    /* Flags */
> >>> +    build_append_int_noprefix(table_data, 0, 1);
> >>> +    /* Enabled */
> >>> +    build_append_int_noprefix(table_data, 1, 1);
> >>> +
> >>> +    /* Number of Records To Pre-allocate */
> >>> +    build_append_int_noprefix(table_data, 1, 4);
> >>> +    /* Max Sections Per Record */
> >>> +    build_append_int_noprefix(table_data, 1, 4);
> >>> +    /* Max Raw Data Length */
> >>> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> >>> +
> >>> +    /* Error Status Address */
> >>> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> >>> +                     4 /* QWord access */, 0);
> >>> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> >>> +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
> >> it's fine only if GHESv2 is the only entries in HEST, but once
> >> other types are added this macro will silently fall apart and
> >> cause table corruption.
>    why  silently fall?
>    I think the acpi_ghes.c only support GHESv2 type, not support other type.
> 
> >>
> >> Instead of offset from hest_start, I suggest to use offset relative
> >> to GAS structure, here is an idea
> >>
> >> #define GAS_ADDR_OFFSET 4
> >>
> >>     off = table->len
> >>     build_append_gas()
> >>     bios_linker_loader_add_pointer(...,
> >>         off + GAS_ADDR_OFFSET, ...
> 
> If use offset relative to GAS structure, the code does not easily extend to support more Generic Hardware Error Source.
> if use offset relative to hest_start, just use a loop, the code can support  more error source, for example:
> for (source_id = 0; source_id < n; source_id++)
> {
>    ......
>     bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
>         ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
>         sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
>         source_id * sizeof(uint64_t));
>   .......
> }
> 
> My previous series patch support 2 error sources, but now only enable 'SEA' type Error Source

I'd try to merge this, worry about extending things later.
This is at v21 and the simpler you can keep things,
the faster it'll go in.


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-18 13:21           ` Michael S. Tsirkin
@ 2019-11-18 13:57             ` gengdongjiu
  -1 siblings, 0 replies; 82+ messages in thread
From: gengdongjiu @ 2019-11-18 13:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Igor Mammedov, Xiang Zheng, pbonzini, shannon.zhaosl,
	peter.maydell, lersek, james.morse, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

On 2019/11/18 21:21, Michael S. Tsirkin wrote:
>> If use offset relative to GAS structure, the code does not easily extend to support more Generic Hardware Error Source.
>> if use offset relative to hest_start, just use a loop, the code can support  more error source, for example:
>> for (source_id = 0; source_id < n; source_id++)
>> {
>>    ......
>>     bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
>>         ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
>>         sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
>>         source_id * sizeof(uint64_t));
>>   .......
>> }
>>
>> My previous series patch support 2 error sources, but now only enable 'SEA' type Error Source
> I'd try to merge this, worry about extending things later.
> This is at v21 and the simpler you can keep things,
> the faster it'll go in.
Thanks a lot for the comments. Yes, I think we can merge the v21 series.


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-11  1:40   ` Xiang Zheng
@ 2019-11-22 15:42     ` Beata Michalska
  -1 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-22 15:42 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: pbonzini, mst, imammedo, shannon.zhaosl, Peter Maydell,
	Laszlo Ersek, james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

Hi Xiang,

On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> From: Dongjiu Geng <gengdongjiu@huawei.com>
>
> This patch implements APEI GHES Table generation via fw_cfg blobs. Now
> it only supports ARMv8 SEA, a type of GHESv2 error source. Afterwards,
> we can extend the supported types if needed. For the CPER section,
> currently it is memory section because kernel mainly wants userspace to
> handle the memory errors.
>
> This patch follows the spec ACPI 6.2 to build the Hardware Error Source
> table. For more detailed information, please refer to document:
> docs/specs/acpi_hest_ghes.rst
>
> Suggested-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  default-configs/arm-softmmu.mak |   1 +
>  hw/acpi/Kconfig                 |   4 +
>  hw/acpi/Makefile.objs           |   1 +
>  hw/acpi/acpi_ghes.c             | 267 ++++++++++++++++++++++++++++++++
>  hw/acpi/aml-build.c             |   2 +
>  hw/arm/virt-acpi-build.c        |  12 ++
>  include/hw/acpi/acpi_ghes.h     |  56 +++++++
>  include/hw/acpi/aml-build.h     |   1 +
>  8 files changed, 344 insertions(+)
>  create mode 100644 hw/acpi/acpi_ghes.c
>  create mode 100644 include/hw/acpi/acpi_ghes.h
>
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 1f2e0e7fde..5722f3130e 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
>  CONFIG_FSL_IMX7=y
>  CONFIG_FSL_IMX6UL=y
>  CONFIG_SEMIHOSTING=y
> +CONFIG_ACPI_APEI=y
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 12e3f1e86e..ed8c34d238 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -23,6 +23,10 @@ config ACPI_NVDIMM
>      bool
>      depends on ACPI
>
> +config ACPI_APEI
> +    bool
> +    depends on ACPI
> +
>  config ACPI_PCI
>      bool
>      depends on ACPI && PCI
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index 655a9c1973..84474b0ca8 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
>  common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> +common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o

Minor: The 'acpi' prefix could be dropped - it does not seem to be used
for other files (it is implied by the directory name).
This also applies to most of the naming within this patch.

>  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
>  common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
>  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> new file mode 100644
> index 0000000000..42c00ff3d3
> --- /dev/null
> +++ b/hw/acpi/acpi_ghes.c
> @@ -0,0 +1,267 @@
> +/*
> + * Support for generating APEI tables and recording CPER for Guests
> + *
> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/acpi_ghes.h"
> +#include "hw/nvram/fw_cfg.h"
> +#include "sysemu/sysemu.h"
> +#include "qemu/error-report.h"
> +
> +#define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
> +#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
> +
> +/*
> + * The size of Address field in Generic Address Structure.
> + * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure.
> + */
> +#define ACPI_GHES_ADDRESS_SIZE              8
> +
As already mentioned, you can safely drop this and use sizeof(uint64_t).

> +/* The max size in bytes for one error block */
> +#define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> +
> +/*
> + * Now only support ARMv8 SEA notification type error source
> + */
> +#define ACPI_GHES_ERROR_SOURCE_COUNT        1
> +
> +/*
> + * Generic Hardware Error Source version 2
> + */
> +#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10

Minor: this is actually a type, so it would be good if the name
reflected that somehow.

> +
> +/*
> + * | +--------------------------+ 0
> + * | |        Header            |
> + * | +--------------------------+ 40---+-
> + * | | .................        |      |
> + * | | error_status_address-----+ 60   |
> + * | | .................        |      |
> + * | | read_ack_register--------+ 104  92
> + * | | read_ack_preserve        |      |
> + * | | read_ack_write           |      |
> + * + +--------------------------+ 132--+-
> + *
> + * From above GHES definition, the error status address offset is 60;
> + * the Read Ack Register offset is 104, the whole size of GHESv2 is 92
> + */
> +
This could potentially go into the doc instead.
Also, GHES is actually part of HEST, so your offsets are relative to
HEST, not to GHES itself; the comment might be slightly misleading.
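
To make the distinction concrete, here is a minimal sketch that derives the numbers in the quoted diagram from the field sizes the patch appends, keeping GHES-relative and HEST-relative offsets apart. The enum and function names are hypothetical. The 36-byte ACPI table header and the 12-byte Generic Address Structure are standard ACPI sizes; the other sizes are read off the build_append_* calls in the patch.

```c
#include <stdint.h>

enum {
    ACPI_TABLE_HEADER_SIZE  = 36, /* standard ACPI table header */
    HEST_SOURCE_COUNT_SIZE  = 4,  /* Error Source Count */
    GAS_SIZE                = 12, /* Generic Address Structure */
    NOTIFY_SIZE             = 28, /* Hardware Error Notification */
    ERR_BLOCK_LENGTH_SIZE   = 4,  /* Error Status Block Length */
    READ_ACK_MASK_SIZE      = 8   /* Read Ack Preserve / Read Ack Write */
};

enum {
    /* Offset of the first error source entry, relative to HEST. */
    HEST_FIRST_SOURCE = ACPI_TABLE_HEADER_SIZE + HEST_SOURCE_COUNT_SIZE,
    /*
     * Offsets relative to the start of one GHESv2 entry.
     * Type(2) + Source Id(2) + Related Source Id(2) + Flags(1) + Enabled(1)
     * + Records To Pre-allocate(4) + Max Sections(4) + Max Raw Data(4) = 20
     */
    GHES_ERR_STATUS_REL = 20,
    GHES_READ_ACK_REL   = GHES_ERR_STATUS_REL + GAS_SIZE + NOTIFY_SIZE
                          + ERR_BLOCK_LENGTH_SIZE,
    GHESV2_SIZE         = GHES_READ_ACK_REL + GAS_SIZE
                          + 2 * READ_ACK_MASK_SIZE
};

/* The HEST-relative numbers used in the patch fall out of the sums. */
_Static_assert(HEST_FIRST_SOURCE + GHES_ERR_STATUS_REL == 60, "status GAS");
_Static_assert(HEST_FIRST_SOURCE + GHES_READ_ACK_REL == 104, "read ack GAS");
_Static_assert(GHESV2_SIZE == 92, "GHESv2 entry size");

/* Hypothetical helper: HEST-relative offset of a field of source 'id'. */
uint32_t hest_field_offset(uint32_t hest_start, uint32_t id, uint32_t rel)
{
    return hest_start + HEST_FIRST_SOURCE + id * GHESV2_SIZE + rel;
}
```

The macros in the patch additionally add the offset of the address field inside the GAS (4 bytes), since the linker patches that field rather than the start of the structure.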

> +/* The error status address offset in GHES */
> +#define ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(start_addr, n) (start_addr + \
> +            60 + offsetof(struct AcpiGenericAddress, address) + n * 92)
> +
> +/* The Read Ack Register offset in GHES */
> +#define ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(start_addr, n) (start_addr +\
> +            104 + offsetof(struct AcpiGenericAddress, address) + n * 92)
> +
> +typedef struct AcpiGhesState {
> +    uint64_t ghes_addr_le;
> +} AcpiGhesState;
> +
Minor: Why AcpiGhes*State*? And do we need the struct to track a single address?

> +/*
> + * Hardware Error Notification
> + * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> + */
You are referencing an older spec here, while the commit message states
version 6.2. Not to mention that 4.0 did not support the ARMv8 SEA source.
You should not mention sections that do not correspond to the spec
the patch is based on.

> +static void acpi_ghes_build_notify(GArray *table, const uint8_t type)

As has already been mentioned, the naming here could follow the existing
convention. Also, this function creates the Hardware Error Notification
structure, which is not necessarily tightly connected to GHES.
The same applies to the overall naming used within this patch.
> +{
> +        /* Type */
> +        build_append_int_noprefix(table, type, 1);
> +        /*
> +         * Length:
> +         * Total length of the structure in bytes
> +         */
> +        build_append_int_noprefix(table, 28, 1);
> +        /* Configuration Write Enable */
> +        build_append_int_noprefix(table, 0, 2);
> +        /* Poll Interval */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Vector */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Switch To Polling Threshold Value */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Switch To Polling Threshold Window */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Error Threshold Value */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Error Threshold Window */
> +        build_append_int_noprefix(table, 0, 4);

Most of those fields are being set to the same single value.
Why not cover them all in one go?
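
As a rough illustration of that suggestion, the sketch below builds the 28-byte structure twice, once field by field and once with a single zero-fill for the 26 bytes after Type and Length. It is standalone C: ByteBuf and append_int_le are hypothetical stand-ins for the GArray and the little-endian append helper used in the patch, not QEMU code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size byte buffer standing in for a GArray. */
typedef struct {
    uint8_t data[64];
    size_t len;
} ByteBuf;

/* Append 'size' bytes of 'value' in little-endian order. */
void append_int_le(ByteBuf *buf, uint64_t value, size_t size)
{
    for (size_t i = 0; i < size && buf->len < sizeof(buf->data); i++) {
        buf->data[buf->len++] = (uint8_t)(value & 0xff);
        value >>= 8;
    }
}

/* Field-by-field variant, mirroring the patch. */
void build_notify_fieldwise(ByteBuf *buf, uint8_t type)
{
    append_int_le(buf, type, 1); /* Type */
    append_int_le(buf, 28, 1);   /* Length */
    append_int_le(buf, 0, 2);    /* Configuration Write Enable */
    append_int_le(buf, 0, 4);    /* Poll Interval */
    append_int_le(buf, 0, 4);    /* Vector */
    append_int_le(buf, 0, 4);    /* Switch To Polling Threshold Value */
    append_int_le(buf, 0, 4);    /* Switch To Polling Threshold Window */
    append_int_le(buf, 0, 4);    /* Error Threshold Value */
    append_int_le(buf, 0, 4);    /* Error Threshold Window */
}

/* Compact variant: all remaining fields are zero, so fill them at once. */
void build_notify_compact(ByteBuf *buf, uint8_t type)
{
    append_int_le(buf, type, 1); /* Type */
    append_int_le(buf, 28, 1);   /* Length */
    append_int_le(buf, 0, 26);   /* 2 + 6 * 4 bytes of zeroed fields */
}
```

Both variants produce the same bytes. The compact form is shorter but loses the per-field comments, which some reviewers prefer to keep for tables that mirror a spec layout.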

> +}
> +
> +/* Build table for the hardware error fw_cfg blob */
> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker)
> +{
> +    int i, error_status_block_offset;
> +
> +    /*
> +     * | +--------------------------+
> +     * | |    error_block_address   |
> +     * | |      ..........          |
> +     * | +--------------------------+
> +     * | |    read_ack_register     |
> +     * | |     ...........          |
> +     * | +--------------------------+
> +     * | |  Error Status Data Block |
> +     * | |      ........            |
> +     * | +--------------------------+
> +     */
> +
> +    /* Build error_block_address */
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        build_append_int_noprefix(hardware_errors, 0, ACPI_GHES_ADDRESS_SIZE);
> +    }
> +
> +    /* Build read_ack_register */
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        /*
> +         * Initialize the value of read_ack_register to 1, so GHES can be
> +         * writeable in the first time.
> +         * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> +         * (GHESv2 - Type 10)
> +         */
> +        build_append_int_noprefix(hardware_errors, 1, ACPI_GHES_ADDRESS_SIZE);

This is a bit of a simplification (justified to some extent), but the
initial value should take both the Read Ack Preserve and Read Ack Write
masks into account, or there should at least be a comment about it.

Also, the above implies support for GHESv2 only (the 'Ack' registers are
GHESv2 specific), yet the loop iterates over all potentially supported
hw error sources. At this point that is OK, but if the support gets
extended this will no longer be valid: managing the 'Ack' registers
should be properly guarded for GHESv2.
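
For reference, a minimal sketch of how the two masks combine, assuming the GHESv2 acknowledgement sequence in which OSPM reads the register, ANDs it with Read Ack Preserve, ORs in Read Ack Write and writes the result back. The mask values are the ones this patch puts into the HEST; the function names are hypothetical, and the "writable" check is an assumption based on the patch comment about initializing the register to 1.

```c
#include <stdint.h>

/* Mask values taken from the patch under review. */
#define READ_ACK_PRESERVE (~0x1ULL) /* keep every bit except bit 0 */
#define READ_ACK_WRITE    0x1ULL    /* set bit 0 to acknowledge */

/* Value OSPM would write back after consuming an error record. */
uint64_t ghesv2_ack_value(uint64_t reg, uint64_t preserve, uint64_t write)
{
    return (reg & preserve) | write;
}

/*
 * Hypothetical producer-side check: treat the error block as writable
 * only when the acknowledgement bit is set.
 */
int ghesv2_block_writable(uint64_t reg)
{
    return (reg & READ_ACK_WRITE) != 0;
}
```

Deriving the initial register value from the same masks, for example ghesv2_ack_value(0, READ_ACK_PRESERVE, READ_ACK_WRITE), yields 1 and keeps the initial state consistent with what an acknowledgement would write.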

> +    }
> +
> +    /* Generic Error Status Block offset in the hardware error fw_cfg blob */
> +    error_status_block_offset = hardware_errors->len;
> +
> +    /* Build Error Status Data Block */
> +    build_append_int_noprefix(hardware_errors, 0,
> +        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
> +
> +    /* Allocate guest memory for the hardware error fw_cfg blob */
> +    bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +                             hardware_errors, 1, false);
> +
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        /*
> +         * Patch the address of Error Status Data Block into
> +         * the error_block_address of hardware_errors fw_cfg blob
> +         */
> +        bios_linker_loader_add_pointer(linker,
> +            ACPI_GHES_ERRORS_FW_CFG_FILE, ACPI_GHES_ADDRESS_SIZE * i,
> +            ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +            error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
> +    }
> +
> +    /*
> +     * Write the address of hardware_errors fw_cfg blob into the
> +     * hardware_errors_addr fw_cfg blob.
> +     */
> +    bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
> +        0, ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> +}
> +
> +/* Build Hardware Error Source Table */
> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_errors,
> +                          BIOSLinker *linker)
> +{
> +    uint32_t hest_start = table_data->len;
> +    uint32_t source_id = 0;
> +
> +    /* Hardware Error Source Table header*/
> +    acpi_data_push(table_data, sizeof(AcpiTableHeader));
> +
> +    /* Error Source Count */
> +    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> +
> +    /*
> +     * Type:
> +     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> +     */
> +    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> +    /*
> +     * Source Id
> +     * Once we support more than one hardware error sources, we need to
> +     * increase the value of this field.
> +     */
> +    build_append_int_noprefix(table_data, source_id, 2);
> +    /* Related Source Id */
> +    build_append_int_noprefix(table_data, 0xffff, 2);

It would be nice to have a comment on the value used, e.g.
'no alternate sources'.

> +    /* Flags */
> +    build_append_int_noprefix(table_data, 0, 1);
> +    /* Enabled */
> +    build_append_int_noprefix(table_data, 1, 1);
> +
> +    /* Number of Records To Pre-allocate */
> +    build_append_int_noprefix(table_data, 1, 4);
> +    /* Max Sections Per Record */
> +    build_append_int_noprefix(table_data, 1, 4);
> +    /* Max Raw Data Length */
> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> +
> +    /* Error Status Address */
> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> +                     4 /* QWord access */, 0);
> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +        source_id * ACPI_GHES_ADDRESS_SIZE);
> +
> +    /*
> +     * Notification Structure
> +     * Now only enable ARMv8 SEA notification type
> +     */
> +    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
> +
> +    /* Error Status Block Length */
> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> +
> +    /*
> +     * Read Ack Register
> +     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> +     * version 2 (GHESv2 - Type 10)
> +     */
> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> +                     4 /* QWord access */, 0);
> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> +        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, 0),
> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
> +
> +    /*
> +     * Read Ack Preserve
> +     * We only provide the first bit in Read Ack Register to OSPM to write
> +     * while the other bits are preserved.
> +     */
> +    build_append_int_noprefix(table_data, ~0x1ULL, 8);
> +    /* Read Ack Write */
> +    build_append_int_noprefix(table_data, 0x1, 8);
> +
> +    build_header(linker, table_data, (void *)(table_data->data + hest_start),
> +        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
> +}
> +
Already mentioned, but:
the last few lines are GHESv2 specific, yet it seems that HES/GHES/GHESv2
are being mixed within this patch. It would be nice if those could be
separated to ease future extensions.

BR

Beata

> +static AcpiGhesState ges;
> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> +{
> +
> +    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
> +    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
> +
> +    /* Create a read-only fw_cfg file for GHES */
> +    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> +                    request_block_size);
> +
> +    /* Create a read-write fw_cfg file for Address */
> +    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> +        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> +}
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 2c3702b882..3681ec6e3d 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
>      tables->table_data = g_array_new(false, true /* clear */, 1);
>      tables->tcpalog = g_array_new(false, true /* clear */, 1);
>      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> +    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
>      tables->linker = bios_linker_loader_init();
>  }
>
> @@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
>      g_array_free(tables->table_data, true);
>      g_array_free(tables->tcpalog, mfre);
>      g_array_free(tables->vmgenid, mfre);
> +    g_array_free(tables->hardware_errors, mfre);
>  }
>
>  /*
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 4cd50175e0..1b1fd273e4 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -48,6 +48,7 @@
>  #include "sysemu/reset.h"
>  #include "kvm_arm.h"
>  #include "migration/vmstate.h"
> +#include "hw/acpi/acpi_ghes.h"
>
>  #define ARM_SPI_BASE 32
>
> @@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>      acpi_add_table(table_offsets, tables_blob);
>      build_spcr(tables_blob, tables->linker, vms);
>
> +    if (vms->ras) {
> +        acpi_add_table(table_offsets, tables_blob);
> +        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
> +        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
> +                             tables->linker);
> +    }
> +
>      if (ms->numa_state->num_nodes > 0) {
>          acpi_add_table(table_offsets, tables_blob);
>          build_srat(tables_blob, tables->linker, vms);
> @@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
>      fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
>                      acpi_data_len(tables.tcpalog));
>
> +    if (vms->ras) {
> +        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
> +    }
> +
>      build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
>                                               build_state, tables.rsdp,
>                                               ACPI_BUILD_RSDP_FILE, 0);
> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> new file mode 100644
> index 0000000000..cb62ec9c7b
> --- /dev/null
> +++ b/include/hw/acpi/acpi_ghes.h
> @@ -0,0 +1,56 @@
> +/*
> + * Support for generating APEI tables and recording CPER for Guests
> + *
> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef ACPI_GHES_H
> +#define ACPI_GHES_H
> +
> +#include "hw/acpi/bios-linker-loader.h"
> +
> +/*
> + * Values for Hardware Error Notification Type field
> + */
> +enum AcpiGhesNotifyType {
> +    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
> +    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
> +    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
> +    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
> +    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
> +    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
> +    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
> +    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
> +    ACPI_GHES_NOTIFY_GPIO = 7,
> +    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_SEA = 8,
> +    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_SEI = 9,
> +    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_GSIV = 10,
> +    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
> +    ACPI_GHES_NOTIFY_SDEI = 11,
> +    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
> +};
> +
> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> +                          BIOSLinker *linker);
> +
> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +#endif
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index de4a406568..8f13620701 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -220,6 +220,7 @@ struct AcpiBuildTables {
>      GArray *rsdp;
>      GArray *tcpalog;
>      GArray *vmgenid;
> +    GArray *hardware_errors;
>      BIOSLinker *linker;
>  } AcpiBuildTables;
>
> --
> 2.19.1
>
>
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
@ 2019-11-22 15:42     ` Beata Michalska
  0 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-22 15:42 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: Peter Maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl, qemu-arm,
	james.morse, jonathan.cameron, imammedo, pbonzini, xuwei5,
	Laszlo Ersek, rth

Hi Xiang,

On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> From: Dongjiu Geng <gengdongjiu@huawei.com>
>
> This patch implements APEI GHES Table generation via fw_cfg blobs. Now
> it only supports ARMv8 SEA, a type of GHESv2 error source. Afterwards,
> we can extend the supported types if needed. For the CPER section,
> currently it is memory section because kernel mainly wants userspace to
> handle the memory errors.
>
> This patch follows the spec ACPI 6.2 to build the Hardware Error Source
> table. For more detailed information, please refer to document:
> docs/specs/acpi_hest_ghes.rst
>
> Suggested-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  default-configs/arm-softmmu.mak |   1 +
>  hw/acpi/Kconfig                 |   4 +
>  hw/acpi/Makefile.objs           |   1 +
>  hw/acpi/acpi_ghes.c             | 267 ++++++++++++++++++++++++++++++++
>  hw/acpi/aml-build.c             |   2 +
>  hw/arm/virt-acpi-build.c        |  12 ++
>  include/hw/acpi/acpi_ghes.h     |  56 +++++++
>  include/hw/acpi/aml-build.h     |   1 +
>  8 files changed, 344 insertions(+)
>  create mode 100644 hw/acpi/acpi_ghes.c
>  create mode 100644 include/hw/acpi/acpi_ghes.h
>
> diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> index 1f2e0e7fde..5722f3130e 100644
> --- a/default-configs/arm-softmmu.mak
> +++ b/default-configs/arm-softmmu.mak
> @@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
>  CONFIG_FSL_IMX7=y
>  CONFIG_FSL_IMX6UL=y
>  CONFIG_SEMIHOSTING=y
> +CONFIG_ACPI_APEI=y
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 12e3f1e86e..ed8c34d238 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -23,6 +23,10 @@ config ACPI_NVDIMM
>      bool
>      depends on ACPI
>
> +config ACPI_APEI
> +    bool
> +    depends on ACPI
> +
>  config ACPI_PCI
>      bool
>      depends on ACPI && PCI
> diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> index 655a9c1973..84474b0ca8 100644
> --- a/hw/acpi/Makefile.objs
> +++ b/hw/acpi/Makefile.objs
> @@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
>  common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
>  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
>  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> +common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o

Minor: The 'acpi' prefix could be dropped - it does not seem to be used
for other files (self impliend by the dir name).
This also applies to most of the naming within this patch

>  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
>  common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
>  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> new file mode 100644
> index 0000000000..42c00ff3d3
> --- /dev/null
> +++ b/hw/acpi/acpi_ghes.c
> @@ -0,0 +1,267 @@
> +/*
> + * Support for generating APEI tables and recording CPER for Guests
> + *
> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/acpi_ghes.h"
> +#include "hw/nvram/fw_cfg.h"
> +#include "sysemu/sysemu.h"
> +#include "qemu/error-report.h"
> +
> +#define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
> +#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
> +
> +/*
> + * The size of Address field in Generic Address Structure.
> + * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure.
> + */
> +#define ACPI_GHES_ADDRESS_SIZE              8
> +
As already mentioned, you can safely drop this and use sizeof(unit64_t).

> +/* The max size in bytes for one error block */
> +#define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> +
> +/*
> + * Now only support ARMv8 SEA notification type error source
> + */
> +#define ACPI_GHES_ERROR_SOURCE_COUNT        1
> +
> +/*
> + * Generic Hardware Error Source version 2
> + */
> +#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10

Minor: this is actually a type so would be good if the name would
reflect that somehow......

> +
> +/*
> + * | +--------------------------+ 0
> + * | |        Header            |
> + * | +--------------------------+ 40---+-
> + * | | .................        |      |
> + * | | error_status_address-----+ 60   |
> + * | | .................        |      |
> + * | | read_ack_register--------+ 104  92
> + * | | read_ack_preserve        |      |
> + * | | read_ack_write           |      |
> + * + +--------------------------+ 132--+-
> + *
> + * From above GHES definition, the error status address offset is 60;
> + * the Read Ack Register offset is 104, the whole size of GHESv2 is 92
> + */
> +
This could potentially land into the doc instead.
Also the GHEST is actually part of HEST so your offsets are for
HEST not GHEST itself so the comment might be slightly misleading

> +/* The error status address offset in GHES */
> +#define ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(start_addr, n) (start_addr + \
> +            60 + offsetof(struct AcpiGenericAddress, address) + n * 92)
> +
> +/* The Read Ack Register offset in GHES */
> +#define ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(start_addr, n) (start_addr +\
> +            104 + offsetof(struct AcpiGenericAddress, address) + n * 92)
> +
> +typedef struct AcpiGhesState {
> +    uint64_t ghes_addr_le;
> +} AcpiGhesState;
> +
Minor: Why AcpiGhes*State* ? And do we need the struct to track single address?

> +/*
> + * Hardware Error Notification
> + * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> + */
You are referencing older spec here. The commit message states
6.2 version. Not to mention that 4.0 did not support ARMv8 SEA source.
You should not mention sections that do not correspond to the spec
the patch is based on.

> +static void acpi_ghes_build_notify(GArray *table, const uint8_t type)

As it has already been mentioned - the naming here could follow the existing
convention. Also this function is creating Hardware Error Notification table
which is not necessarily tightly connected to GHES
Similarly this applies to the overall naming used within this patch.
> +{
> +        /* Type */
> +        build_append_int_noprefix(table, type, 1);
> +        /*
> +         * Length:
> +         * Total length of the structure in bytes
> +         */
> +        build_append_int_noprefix(table, 28, 1);
> +        /* Configuration Write Enable */
> +        build_append_int_noprefix(table, 0, 2);
> +        /* Poll Interval */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Vector */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Switch To Polling Threshold Value */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Switch To Polling Threshold Window */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Error Threshold Value */
> +        build_append_int_noprefix(table, 0, 4);
> +        /* Error Threshold Window */
> +        build_append_int_noprefix(table, 0, 4);

Most of  those fields are being set to the same single value.
Why not covering it all in one go ?

> +}
> +
> +/* Build table for the hardware error fw_cfg blob */
> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker)
> +{
> +    int i, error_status_block_offset;
> +
> +    /*
> +     * | +--------------------------+
> +     * | |    error_block_address   |
> +     * | |      ..........          |
> +     * | +--------------------------+
> +     * | |    read_ack_register     |
> +     * | |     ...........          |
> +     * | +--------------------------+
> +     * | |  Error Status Data Block |
> +     * | |      ........            |
> +     * | +--------------------------+
> +     */
> +
> +    /* Build error_block_address */
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        build_append_int_noprefix(hardware_errors, 0, ACPI_GHES_ADDRESS_SIZE);
> +    }
> +
> +    /* Build read_ack_register */
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        /*
> +         * Initialize the value of read_ack_register to 1, so GHES can be
> +         * writeable in the first time.
> +         * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> +         * (GHESv2 - Type 10)
> +         */
> +        build_append_int_noprefix(hardware_errors, 1, ACPI_GHES_ADDRESS_SIZE);

This is a bit of a simplification (justified to some extent), but it should
take into account both the Read Ack Preserve and Read Ack Write masks,
or at least have a comment explaining why it does not.

Also, the above implies support for GHESv2 only (the 'Ack' registers are
GHESv2 specific), yet the loop iterates over all potentially
available/supported hardware error sources.
That is OK at this point, but it will no longer be valid if support gets
extended: managing the 'Ack' registers should be properly guarded for GHESv2.
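
For reference, a minimal sketch of how the two masks combine when the error is acknowledged, as I read the GHESv2 description in the spec. read_ack_value() and check_patch_masks() are hypothetical helpers, not part of the patch or of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value to write back to the Read Ack Register. */
uint64_t read_ack_value(uint64_t current, uint64_t preserve, uint64_t write)
{
    /* Keep the bits selected by Read Ack Preserve, then set Read Ack Write. */
    return (current & preserve) | write;
}

/* With the masks used by the patch, only bit 0 is ever changed. */
void check_patch_masks(void)
{
    const uint64_t preserve = ~0x1ULL; /* Read Ack Preserve in the patch */
    const uint64_t write = 0x1;        /* Read Ack Write in the patch */

    assert(read_ack_value(0x0, preserve, write) == 0x1);
    assert(read_ack_value(0xf0, preserve, write) == 0xf1);
}
```

With these masks the hard-coded initial value of 1 is consistent, which is why a comment stating that assumption might already be enough.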

> +    }
> +
> +    /* Generic Error Status Block offset in the hardware error fw_cfg blob */
> +    error_status_block_offset = hardware_errors->len;
> +
> +    /* Build Error Status Data Block */
> +    build_append_int_noprefix(hardware_errors, 0,
> +        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
> +
> +    /* Allocate guest memory for the hardware error fw_cfg blob */
> +    bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +                             hardware_errors, 1, false);
> +
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        /*
> +         * Patch the address of Error Status Data Block into
> +         * the error_block_address of hardware_errors fw_cfg blob
> +         */
> +        bios_linker_loader_add_pointer(linker,
> +            ACPI_GHES_ERRORS_FW_CFG_FILE, ACPI_GHES_ADDRESS_SIZE * i,
> +            ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +            error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
> +    }
> +
> +    /*
> +     * Write the address of hardware_errors fw_cfg blob into the
> +     * hardware_errors_addr fw_cfg blob.
> +     */
> +    bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
> +        0, ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> +}
> +
> +/* Build Hardware Error Source Table */
> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_errors,
> +                          BIOSLinker *linker)
> +{
> +    uint32_t hest_start = table_data->len;
> +    uint32_t source_id = 0;
> +
> +    /* Hardware Error Source Table header*/
> +    acpi_data_push(table_data, sizeof(AcpiTableHeader));
> +
> +    /* Error Source Count */
> +    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> +
> +    /*
> +     * Type:
> +     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> +     */
> +    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> +    /*
> +     * Source Id
> +     * Once we support more than one hardware error sources, we need to
> +     * increase the value of this field.
> +     */
> +    build_append_int_noprefix(table_data, source_id, 2);
> +    /* Related Source Id */
> +    build_append_int_noprefix(table_data, 0xffff, 2);

It would be nice to have a comment on the value used:
0xffff means 'no alternate sources'.

> +    /* Flags */
> +    build_append_int_noprefix(table_data, 0, 1);
> +    /* Enabled */
> +    build_append_int_noprefix(table_data, 1, 1);
> +
> +    /* Number of Records To Pre-allocate */
> +    build_append_int_noprefix(table_data, 1, 4);
> +    /* Max Sections Per Record */
> +    build_append_int_noprefix(table_data, 1, 4);
> +    /* Max Raw Data Length */
> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> +
> +    /* Error Status Address */
> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> +                     4 /* QWord access */, 0);
> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +        source_id * ACPI_GHES_ADDRESS_SIZE);
> +
> +    /*
> +     * Notification Structure
> +     * Now only enable ARMv8 SEA notification type
> +     */
> +    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
> +
> +    /* Error Status Block Length */
> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> +
> +    /*
> +     * Read Ack Register
> +     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> +     * version 2 (GHESv2 - Type 10)
> +     */
> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> +                     4 /* QWord access */, 0);
> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> +        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, 0),
> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> +        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
> +
> +    /*
> +     * Read Ack Preserve
> +     * We only provide the first bit in Read Ack Register to OSPM to write
> +     * while the other bits are preserved.
> +     */
> +    build_append_int_noprefix(table_data, ~0x1ULL, 8);
> +    /* Read Ack Write */
> +    build_append_int_noprefix(table_data, 0x1, 8);
> +
> +    build_header(linker, table_data, (void *)(table_data->data + hest_start),
> +        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
> +}
> +
Already mentioned, but...
the last few lines are GHESv2 specific, yet HES/GHES/GHESv2 seem to be
mixed within this patch. It would be nice if those could be separated
to ease future extensions.

BR

Beata

> +static AcpiGhesState ges;
> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> +{
> +
> +    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
> +    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
> +
> +    /* Create a read-only fw_cfg file for GHES */
> +    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> +                    request_block_size);
> +
> +    /* Create a read-write fw_cfg file for Address */
> +    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> +        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> +}
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 2c3702b882..3681ec6e3d 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
>      tables->table_data = g_array_new(false, true /* clear */, 1);
>      tables->tcpalog = g_array_new(false, true /* clear */, 1);
>      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> +    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
>      tables->linker = bios_linker_loader_init();
>  }
>
> @@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
>      g_array_free(tables->table_data, true);
>      g_array_free(tables->tcpalog, mfre);
>      g_array_free(tables->vmgenid, mfre);
> +    g_array_free(tables->hardware_errors, mfre);
>  }
>
>  /*
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 4cd50175e0..1b1fd273e4 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -48,6 +48,7 @@
>  #include "sysemu/reset.h"
>  #include "kvm_arm.h"
>  #include "migration/vmstate.h"
> +#include "hw/acpi/acpi_ghes.h"
>
>  #define ARM_SPI_BASE 32
>
> @@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>      acpi_add_table(table_offsets, tables_blob);
>      build_spcr(tables_blob, tables->linker, vms);
>
> +    if (vms->ras) {
> +        acpi_add_table(table_offsets, tables_blob);
> +        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
> +        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
> +                             tables->linker);
> +    }
> +
>      if (ms->numa_state->num_nodes > 0) {
>          acpi_add_table(table_offsets, tables_blob);
>          build_srat(tables_blob, tables->linker, vms);
> @@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
>      fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
>                      acpi_data_len(tables.tcpalog));
>
> +    if (vms->ras) {
> +        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
> +    }
> +
>      build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
>                                               build_state, tables.rsdp,
>                                               ACPI_BUILD_RSDP_FILE, 0);
> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> new file mode 100644
> index 0000000000..cb62ec9c7b
> --- /dev/null
> +++ b/include/hw/acpi/acpi_ghes.h
> @@ -0,0 +1,56 @@
> +/*
> + * Support for generating APEI tables and recording CPER for Guests
> + *
> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> + *
> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef ACPI_GHES_H
> +#define ACPI_GHES_H
> +
> +#include "hw/acpi/bios-linker-loader.h"
> +
> +/*
> + * Values for Hardware Error Notification Type field
> + */
> +enum AcpiGhesNotifyType {
> +    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
> +    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
> +    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
> +    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
> +    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
> +    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
> +    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
> +    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
> +    ACPI_GHES_NOTIFY_GPIO = 7,
> +    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_SEA = 8,
> +    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_SEI = 9,
> +    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
> +    ACPI_GHES_NOTIFY_GSIV = 10,
> +    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
> +    ACPI_GHES_NOTIFY_SDEI = 11,
> +    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
> +};
> +
> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> +                          BIOSLinker *linker);
> +
> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +#endif
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index de4a406568..8f13620701 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -220,6 +220,7 @@ struct AcpiBuildTables {
>      GArray *rsdp;
>      GArray *tcpalog;
>      GArray *vmgenid;
> +    GArray *hardware_errors;
>      BIOSLinker *linker;
>  } AcpiBuildTables;
>
> --
> 2.19.1
>
>
>



* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-18 12:49       ` gengdongjiu
@ 2019-11-22 15:44         ` Beata Michalska
  -1 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-22 15:44 UTC (permalink / raw)
  To: gengdongjiu
  Cc: Igor Mammedov, Xiang Zheng, Peter Maydell, ehabkost, kvm, mst,
	wanghaibin.wang, mtosatti, qemu-devel, linuxarm, shannon.zhaosl,
	qemu-arm, james.morse, xuwei5, jonathan.cameron, pbonzini,
	Laszlo Ersek, rth

Hi,

On Mon, 18 Nov 2019 at 12:50, gengdongjiu <gengdongjiu@huawei.com> wrote:
>
> Hi,Igor,
>    Thanks for you review and time.
>
> >
> >> +    /*
> >> +     * Type:
> >> +     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> >> +     */
> >> +    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> >> +    /*
> >> +     * Source Id
> >
> >> +     * Once we support more than one hardware error sources, we need to
> >> +     * increase the value of this field.
> > I'm not sure ^^^ is correct, according to spec it's just unique id per
> > distinct error structure, so we just assign arbitrary values to each
> > declared source and that never changes once assigned.
> The source id is used to distinct the error source, for each source, the ‘source id’ is unique,
> but different source has different source id. for example, the 'source id' of the error source 0 is 0,
> the 'source id' of the error source 1 is 1.
>

I might be wrong, but the source id is not a sequence number: it can have
any value as long as it is unique, and the comment about 'increasing the
number' reads a bit wrong.

>
> >
> > For now I'd make source_id an enum with one member
> >   enum {
> >     ACPI_HEST_SRC_ID_SEA = 0,
> >     /* future ids go here */
> >     ACPI_HEST_SRC_ID_RESERVED,
> >   }
> If we only have one error source, we can use enum instead of allocating magic 0.
> But if we have more error source , such as 10 error source. using enum  maybe not a good idea.
>
> for example, if there are 10 error sources, I can just using below loop
>
> for(i=0; i< 10; i++)
>    build_ghes_v2(source_id++);
>

You can do that, but using an enum makes it more readable and maintainable.
You can also keep the source id as a sequence number and still represent it
with an enum, as has been suggested, using the 'RESERVED' member for
loop control.
I think it might also be worth representing the HES type as an enum:
enum {
    ACPI_HES_TYPE_GHESv2 = 10,
};
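
A minimal sketch of the loop-control idea, under stated assumptions: the enum names follow Igor's suggestion, struct hest_log is hypothetical, and build_ghes_v2() here is only a stand-in that records the id it was given instead of emitting a table entry.

```c
#include <assert.h>
#include <stdint.h>

/* Source ids: values already assigned must not be changed. */
enum {
    ACPI_HEST_SRC_ID_SEA = 0,
    /* future ids go here */
    ACPI_HEST_SRC_ID_RESERVED, /* number of ids in use, loop bound only */
};

/* Hypothetical log of the ids handed to the builder. */
struct hest_log {
    uint16_t ids[ACPI_HEST_SRC_ID_RESERVED];
    int count;
};

/* Stand-in for the per-source builder. */
void build_ghes_v2(struct hest_log *log, uint16_t source_id)
{
    assert(log->count < ACPI_HEST_SRC_ID_RESERVED);
    log->ids[log->count++] = source_id;
}

/* Iterate over every declared source without a magic count. */
void build_all_sources(struct hest_log *log)
{
    for (int id = 0; id < ACPI_HEST_SRC_ID_RESERVED; id++) {
        build_ghes_v2(log, (uint16_t)id);
    }
}
```

Adding a source then means adding one enum member before ACPI_HEST_SRC_ID_RESERVED; the loop bound follows automatically.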

> >
> > and use that instead of allocating magic 0 at the beginning of the function.
> >  build_ghes_v2(ACPI_HEST_GHES_SEA);
> > Also add a comment to declaration that already assigned values are not to be changed
> >
> >> +     */
> >> +    build_append_int_noprefix(table_data, source_id, 2);
> >> +    /* Related Source Id */
> >> +    build_append_int_noprefix(table_data, 0xffff, 2);
> >> +    /* Flags */
> >> +    build_append_int_noprefix(table_data, 0, 1);
> >> +    /* Enabled */
> >> +    build_append_int_noprefix(table_data, 1, 1);
> >> +
> >> +    /* Number of Records To Pre-allocate */
> >> +    build_append_int_noprefix(table_data, 1, 4);
> >> +    /* Max Sections Per Record */
> >> +    build_append_int_noprefix(table_data, 1, 4);
> >> +    /* Max Raw Data Length */
> >> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> >> +
> >> +    /* Error Status Address */
> >> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> >> +                     4 /* QWord access */, 0);
> >> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> >> +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
> > it's fine only if GHESv2 is the only entries in HEST, but once
> > other types are added this macro will silently fall apart and
> > cause table corruption.
> >
> > Instead of offset from hest_start, I suggest to use offset relative
> > to GAS structure, here is an idea
> >
> > #define GAS_ADDR_OFFSET 4
> >
> >     off = table->len
> >     build_append_gas()
> >     bios_linker_loader_add_pointer(...,
> >         off + GAS_ADDR_OFFSET, ...
> I think your suggestion is good.
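
A standalone sketch of the GAS-relative offset idea quoted above, for illustration only. struct table_buf, append_gas() and append_gas_and_get_pointer_offset() are hypothetical stand-ins, not QEMU functions; the 12-byte layout (four 1-byte fields followed by an 8-byte address) is the Generic Address Structure layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GAS_ADDR_OFFSET 4 /* Address field follows four 1-byte fields */

/* Hypothetical stand-in for the table GArray. */
struct table_buf {
    uint8_t data[128];
    size_t len;
};

/* Stand-in for build_append_gas(): 12-byte Generic Address Structure. */
void append_gas(struct table_buf *t, uint8_t space_id, uint8_t bit_width,
                uint8_t bit_offset, uint8_t access_size, uint64_t address)
{
    assert(t->len + 12 <= sizeof(t->data));
    t->data[t->len++] = space_id;
    t->data[t->len++] = bit_width;
    t->data[t->len++] = bit_offset;
    t->data[t->len++] = access_size;
    for (int i = 0; i < 8; i++) {
        t->data[t->len++] = (address >> (8 * i)) & 0xff; /* little-endian */
    }
}

/* Returns the table offset that the linker would be asked to patch. */
size_t append_gas_and_get_pointer_offset(struct table_buf *t)
{
    size_t off = t->len; /* start of the GAS, wherever it lands */

    append_gas(t, 0 /* system memory */, 0x40, 0, 4 /* QWord */, 0);
    return off + GAS_ADDR_OFFSET;
}
```

Because the offset is taken from the table length right before the GAS is appended, it stays correct regardless of which entries precede it in the HEST.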
>
> >
> >> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> >> +        source_id * ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +    /*
> >> +     * Notification Structure
> >> +     * Now only enable ARMv8 SEA notification type
> >> +     */
> >> +    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
> >> +
> >> +    /* Error Status Block Length */
> >> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> >> +
> >> +    /*
> >> +     * Read Ack Register
> >> +     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> >> +     * version 2 (GHESv2 - Type 10)
> >> +     */
> >> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> >> +                     4 /* QWord access */, 0);
> >> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> >> +        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, 0),
> > ditto
> >
> >> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> >> +        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +    /*
> >> +     * Read Ack Preserve
> >> +     * We only provide the first bit in Read Ack Register to OSPM to write
> >> +     * while the other bits are preserved.
> >> +     */
> >> +    build_append_int_noprefix(table_data, ~0x1ULL, 8);
> >> +    /* Read Ack Write */
> >> +    build_append_int_noprefix(table_data, 0x1, 8);
> >> +
> >> +    build_header(linker, table_data, (void *)(table_data->data + hest_start),
> >> +        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
> > hest is not GHEST specific so s/GHES/NULL/
> >
> >> +}
> >> +
> >> +static AcpiGhesState ges;
> >> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> >> +{
> >> +
> >> +    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
> >> +    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
> >> +
> >
> >> +    /* Create a read-only fw_cfg file for GHES */
> >> +    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> >> +                    request_block_size);
> >> +
> >> +    /* Create a read-write fw_cfg file for Address */
> >> +    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> >> +        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> >> +}
> >> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> >> index 2c3702b882..3681ec6e3d 100644
> >> --- a/hw/acpi/aml-build.c
> >> +++ b/hw/acpi/aml-build.c
> >> @@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
> >>      tables->table_data = g_array_new(false, true /* clear */, 1);
> >>      tables->tcpalog = g_array_new(false, true /* clear */, 1);
> >>      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> >> +    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
> >>      tables->linker = bios_linker_loader_init();
> >>  }
> >>
> >> @@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
> >>      g_array_free(tables->table_data, true);
> >>      g_array_free(tables->tcpalog, mfre);
> >>      g_array_free(tables->vmgenid, mfre);
> >> +    g_array_free(tables->hardware_errors, mfre);
> >>  }
> >>
> >>  /*
> >> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> >> index 4cd50175e0..1b1fd273e4 100644
> >> --- a/hw/arm/virt-acpi-build.c
> >> +++ b/hw/arm/virt-acpi-build.c
> >> @@ -48,6 +48,7 @@
> >>  #include "sysemu/reset.h"
> >>  #include "kvm_arm.h"
> >>  #include "migration/vmstate.h"
> >> +#include "hw/acpi/acpi_ghes.h"
> >>
> >>  #define ARM_SPI_BASE 32
> >>
> >> @@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> >>      acpi_add_table(table_offsets, tables_blob);
> >>      build_spcr(tables_blob, tables->linker, vms);
> >>
> >> +    if (vms->ras) {
> >> +        acpi_add_table(table_offsets, tables_blob);
> >> +        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
> >> +        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
> >> +                             tables->linker);
> >> +    }
> >> +
> >>      if (ms->numa_state->num_nodes > 0) {
> >>          acpi_add_table(table_offsets, tables_blob);
> >>          build_srat(tables_blob, tables->linker, vms);
> >> @@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
> >>      fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
> >>                      acpi_data_len(tables.tcpalog));
> >>
> >> +    if (vms->ras) {
> >> +        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
> >> +    }
> >> +
> >>      build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
> >>                                               build_state, tables.rsdp,
> >>                                               ACPI_BUILD_RSDP_FILE, 0);
> >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> >> new file mode 100644
> >> index 0000000000..cb62ec9c7b
> >> --- /dev/null
> >> +++ b/include/hw/acpi/acpi_ghes.h
> >> @@ -0,0 +1,56 @@
> >> +/*
> >> + * Support for generating APEI tables and recording CPER for Guests
> >> + *
> >> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> >> + *
> >> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation; either version 2 of the License, or
> >> + * (at your option) any later version.
> >> +
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> +
> >> + * You should have received a copy of the GNU General Public License along
> >> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#ifndef ACPI_GHES_H
> >> +#define ACPI_GHES_H
> >> +
> >> +#include "hw/acpi/bios-linker-loader.h"
> >> +
> >> +/*
> >> + * Values for Hardware Error Notification Type field
> >> + */
> >> +enum AcpiGhesNotifyType {
> >> +    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
> >> +    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
> >> +    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
> >> +    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
> >> +    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
> >> +    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
> >> +    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
> >> +    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
> >> +    ACPI_GHES_NOTIFY_GPIO = 7,
> >> +    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
> >> +    ACPI_GHES_NOTIFY_SEA = 8,
> >> +    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
> >> +    ACPI_GHES_NOTIFY_SEI = 9,
> >> +    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
> >> +    ACPI_GHES_NOTIFY_GSIV = 10,
> >> +    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
> >> +    ACPI_GHES_NOTIFY_SDEI = 11,
> >> +    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
> >> +};
> > maybe make all comment go on newline, otherwise zoo above look ugly
> sure.
>
> >
> >> +
> >> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> >> +                          BIOSLinker *linker);
> >> +
> >> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> >> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> >> +#endif
> >> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> >> index de4a406568..8f13620701 100644
> >> --- a/include/hw/acpi/aml-build.h
> >> +++ b/include/hw/acpi/aml-build.h
> >> @@ -220,6 +220,7 @@ struct AcpiBuildTables {
> >>      GArray *rsdp;
> >>      GArray *tcpalog;
> >>      GArray *vmgenid;
> >> +    GArray *hardware_errors;
> >>      BIOSLinker *linker;
> >>  } AcpiBuildTables;
> >>
> >
> > .
> >
>
>


* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
@ 2019-11-22 15:44         ` Beata Michalska
  0 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-22 15:44 UTC (permalink / raw)
  To: gengdongjiu
  Cc: Peter Maydell, ehabkost, kvm, mst, jonathan.cameron, pbonzini,
	mtosatti, qemu-devel, linuxarm, shannon.zhaosl, Xiang Zheng,
	qemu-arm, james.morse, xuwei5, wanghaibin.wang, Igor Mammedov,
	Laszlo Ersek, rth

Hi,

On Mon, 18 Nov 2019 at 12:50, gengdongjiu <gengdongjiu@huawei.com> wrote:
>
> Hi,Igor,
>    Thanks for you review and time.
>
> >
> >> +    /*
> >> +     * Type:
> >> +     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> >> +     */
> >> +    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> >> +    /*
> >> +     * Source Id
> >
> >> +     * Once we support more than one hardware error sources, we need to
> >> +     * increase the value of this field.
> > I'm not sure ^^^ is correct, according to spec it's just unique id per
> > distinct error structure, so we just assign arbitrary values to each
> > declared source and that never changes once assigned.
> The source id is used to distinct the error source, for each source, the ‘source id’ is unique,
> but different source has different source id. for example, the 'source id' of the error source 0 is 0,
> the 'source id' of the error source 1 is 1.
>

I might be wrong, but the source id is not a sequence number: it can have
any value as long as it is unique, and the comment about 'increasing the
number' reads a bit wrong.

>
> >
> > For now I'd make source_id an enum with one member
> >   enum {
> >     ACPI_HEST_SRC_ID_SEA = 0,
> >     /* future ids go here */
> >     ACPI_HEST_SRC_ID_RESERVED,
> >   }
> If we only have one error source, we can use enum instead of allocating magic 0.
> But if we have more error source , such as 10 error source. using enum  maybe not a good idea.
>
> for example, if there are 10 error sources, I can just using below loop
>
> for(i=0; i< 10; i++)
>    build_ghes_v2(source_id++);
>

You can do that, but using an enum makes it more readable and maintainable.
You can also keep the source id as a sequence number and still represent it
with an enum, as has been suggested, using the 'RESERVED' member for
loop control.
I think it might also be worth representing the HES type as an enum:
enum {
    ACPI_HES_TYPE_GHESv2 = 10,
};

> >
> > and use that instead of allocating magic 0 at the beginning of the function.
> >  build_ghes_v2(ACPI_HEST_GHES_SEA);
> > Also add a comment to declaration that already assigned values are not to be changed
> >
> >> +     */
> >> +    build_append_int_noprefix(table_data, source_id, 2);
> >> +    /* Related Source Id */
> >> +    build_append_int_noprefix(table_data, 0xffff, 2);
> >> +    /* Flags */
> >> +    build_append_int_noprefix(table_data, 0, 1);
> >> +    /* Enabled */
> >> +    build_append_int_noprefix(table_data, 1, 1);
> >> +
> >> +    /* Number of Records To Pre-allocate */
> >> +    build_append_int_noprefix(table_data, 1, 4);
> >> +    /* Max Sections Per Record */
> >> +    build_append_int_noprefix(table_data, 1, 4);
> >> +    /* Max Raw Data Length */
> >> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> >> +
> >> +    /* Error Status Address */
> >> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> >> +                     4 /* QWord access */, 0);
> >> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> >> +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
> > it's fine only if GHESv2 is the only entries in HEST, but once
> > other types are added this macro will silently fall apart and
> > cause table corruption.
> >
> > Instead of offset from hest_start, I suggest to use offset relative
> > to GAS structure, here is an idea
> >
> > #define GAS_ADDR_OFFSET 4
> >
> >     off = table->len
> >     build_append_gas()
> >     bios_linker_loader_add_pointer(...,
> >         off + GAS_ADDR_OFFSET, ...
> I think your suggestion is good.
>
> >
> >> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> >> +        source_id * ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +    /*
> >> +     * Notification Structure
> >> +     * Now only enable ARMv8 SEA notification type
> >> +     */
> >> +    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
> >> +
> >> +    /* Error Status Block Length */
> >> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> >> +
> >> +    /*
> >> +     * Read Ack Register
> >> +     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> >> +     * version 2 (GHESv2 - Type 10)
> >> +     */
> >> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> >> +                     4 /* QWord access */, 0);
> >> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> >> +        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, 0),
> > ditto
> >
> >> +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> >> +        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +    /*
> >> +     * Read Ack Preserve
> >> +     * We only provide the first bit in Read Ack Register to OSPM to write
> >> +     * while the other bits are preserved.
> >> +     */
> >> +    build_append_int_noprefix(table_data, ~0x1ULL, 8);
> >> +    /* Read Ack Write */
> >> +    build_append_int_noprefix(table_data, 0x1, 8);
> >> +
> >> +    build_header(linker, table_data, (void *)(table_data->data + hest_start),
> >> +        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
> > hest is not GHEST specific so s/GHES/NULL/
> >
> >> +}
> >> +
> >> +static AcpiGhesState ges;
> >> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> >> +{
> >> +
> >> +    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
> >> +    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
> >> +
> >
> >> +    /* Create a read-only fw_cfg file for GHES */
> >> +    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> >> +                    request_block_size);
> >> +
> >> +    /* Create a read-write fw_cfg file for Address */
> >> +    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> >> +        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> >> +}
> >> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> >> index 2c3702b882..3681ec6e3d 100644
> >> --- a/hw/acpi/aml-build.c
> >> +++ b/hw/acpi/aml-build.c
> >> @@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
> >>      tables->table_data = g_array_new(false, true /* clear */, 1);
> >>      tables->tcpalog = g_array_new(false, true /* clear */, 1);
> >>      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> >> +    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
> >>      tables->linker = bios_linker_loader_init();
> >>  }
> >>
> >> @@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
> >>      g_array_free(tables->table_data, true);
> >>      g_array_free(tables->tcpalog, mfre);
> >>      g_array_free(tables->vmgenid, mfre);
> >> +    g_array_free(tables->hardware_errors, mfre);
> >>  }
> >>
> >>  /*
> >> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> >> index 4cd50175e0..1b1fd273e4 100644
> >> --- a/hw/arm/virt-acpi-build.c
> >> +++ b/hw/arm/virt-acpi-build.c
> >> @@ -48,6 +48,7 @@
> >>  #include "sysemu/reset.h"
> >>  #include "kvm_arm.h"
> >>  #include "migration/vmstate.h"
> >> +#include "hw/acpi/acpi_ghes.h"
> >>
> >>  #define ARM_SPI_BASE 32
> >>
> >> @@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> >>      acpi_add_table(table_offsets, tables_blob);
> >>      build_spcr(tables_blob, tables->linker, vms);
> >>
> >> +    if (vms->ras) {
> >> +        acpi_add_table(table_offsets, tables_blob);
> >> +        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
> >> +        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
> >> +                             tables->linker);
> >> +    }
> >> +
> >>      if (ms->numa_state->num_nodes > 0) {
> >>          acpi_add_table(table_offsets, tables_blob);
> >>          build_srat(tables_blob, tables->linker, vms);
> >> @@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
> >>      fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
> >>                      acpi_data_len(tables.tcpalog));
> >>
> >> +    if (vms->ras) {
> >> +        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
> >> +    }
> >> +
> >>      build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
> >>                                               build_state, tables.rsdp,
> >>                                               ACPI_BUILD_RSDP_FILE, 0);
> >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> >> new file mode 100644
> >> index 0000000000..cb62ec9c7b
> >> --- /dev/null
> >> +++ b/include/hw/acpi/acpi_ghes.h
> >> @@ -0,0 +1,56 @@
> >> +/*
> >> + * Support for generating APEI tables and recording CPER for Guests
> >> + *
> >> + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> >> + *
> >> + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation; either version 2 of the License, or
> >> + * (at your option) any later version.
> >> +
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> +
> >> + * You should have received a copy of the GNU General Public License along
> >> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#ifndef ACPI_GHES_H
> >> +#define ACPI_GHES_H
> >> +
> >> +#include "hw/acpi/bios-linker-loader.h"
> >> +
> >> +/*
> >> + * Values for Hardware Error Notification Type field
> >> + */
> >> +enum AcpiGhesNotifyType {
> >> +    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
> >> +    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
> >> +    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
> >> +    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
> >> +    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
> >> +    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
> >> +    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
> >> +    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
> >> +    ACPI_GHES_NOTIFY_GPIO = 7,
> >> +    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
> >> +    ACPI_GHES_NOTIFY_SEA = 8,
> >> +    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
> >> +    ACPI_GHES_NOTIFY_SEI = 9,
> >> +    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
> >> +    ACPI_GHES_NOTIFY_GSIV = 10,
> >> +    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
> >> +    ACPI_GHES_NOTIFY_SDEI = 11,
> >> +    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
> >> +};
> > maybe put every comment on its own line; otherwise the enum above looks ugly
> sure.
>
> >
> >> +
> >> +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> >> +                          BIOSLinker *linker);
> >> +
> >> +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> >> +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> >> +#endif
> >> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> >> index de4a406568..8f13620701 100644
> >> --- a/include/hw/acpi/aml-build.h
> >> +++ b/include/hw/acpi/aml-build.h
> >> @@ -220,6 +220,7 @@ struct AcpiBuildTables {
> >>      GArray *rsdp;
> >>      GArray *tcpalog;
> >>      GArray *vmgenid;
> >> +    GArray *hardware_errors;
> >>      BIOSLinker *linker;
> >>  } AcpiBuildTables;
> >>
> >
> > .
> >
>
>



* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-11  1:40   ` Xiang Zheng
@ 2019-11-22 15:47     ` Beata Michalska
  -1 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-22 15:47 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: pbonzini, mst, imammedo, shannon.zhaosl, Peter Maydell,
	Laszlo Ersek, james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

Hi,

On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> From: Dongjiu Geng <gengdongjiu@huawei.com>
>
> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> translates the host VA delivered by host to guest PA, then fills this PA
> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> type.
>
> When the guest accesses poisoned memory, it generates a Synchronous
> External Abort (SEA). The host kernel then gets an APEI notification and
> calls memory_failure() to unmap the affected page in stage 2, and finally
> returns to the guest.
>
> If the guest continues to access the PG_hwpoison page, the access traps to
> KVM as a stage 2 fault, and a synchronous SIGBUS with code BUS_MCEERR_AR is
> delivered to Qemu. Qemu records the error address into guest APEI GHES
> memory and notifies the guest using a Synchronous External Abort (SEA).
>
> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> in which we can setup the type of exception and the syndrome information.
> When switching to guest, the target vcpu will jump to the synchronous
> external abort vector table entry.
>
> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> not valid and hold an UNKNOWN value. These values will be set to KVM
> register structures through KVM_SET_ONE_REG IOCTL.
>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>  include/hw/acpi/acpi_ghes.h |   4 +
>  include/sysemu/kvm.h        |   3 +-
>  target/arm/cpu.h            |   4 +
>  target/arm/helper.c         |   2 +-
>  target/arm/internals.h      |   5 +-
>  target/arm/kvm64.c          |  64 ++++++++
>  target/arm/tlb_helper.c     |   2 +-
>  target/i386/cpu.h           |   2 +
>  9 files changed, 377 insertions(+), 6 deletions(-)
>
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> index 42c00ff3d3..f5b54990c0 100644
> --- a/hw/acpi/acpi_ghes.c
> +++ b/hw/acpi/acpi_ghes.c
> @@ -39,6 +39,34 @@
>  /* The max size in bytes for one error block */
>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>
> +/*
> + * The total size of Generic Error Data Entry
> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> + * Table 18-343 Generic Error Data Entry
> + */
> +#define ACPI_GHES_DATA_LENGTH               72
> +
> +/*
> + * The memory section CPER size,
> + * UEFI 2.6: N.2.5 Memory Error Section
> + */
> +#define ACPI_GHES_MEM_CPER_LENGTH           80
> +
> +/*
> + * Masks for block_status flags
> + */
> +#define ACPI_GEBS_UNCORRECTABLE         1

Why not list all the supported statuses, similar to the error severity
enum below?

> +
> +/*
> + * Values for error_severity field
> + */
> +enum AcpiGenericErrorSeverity {
> +    ACPI_CPER_SEV_RECOVERABLE,
> +    ACPI_CPER_SEV_FATAL,
> +    ACPI_CPER_SEV_CORRECTED,
> +    ACPI_CPER_SEV_NONE,
> +};
> +
>  /*
>   * Now only support ARMv8 SEA notification type error source
>   */
> @@ -49,6 +77,16 @@
>   */
>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>
> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> +
> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> +    0xED, 0x7C, 0x83, 0xB1)
> +
>  /*
>   * | +--------------------------+ 0
>   * | |        Header            |
> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>      uint64_t ghes_addr_le;
>  } AcpiGhesState;
>
> +/*
> + * Total size for Generic Error Status Block
> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> + * Table 18-380 Generic Error Status Block
> + */
> +#define ACPI_GHES_GESB_SIZE                 20

Minor: This is not entirely correct. The GEDEs are part of the GESB, so the
total length would be ACPI_GHES_GESB_SIZE + n * sizeof(GEDE).

> +/* The offset of Data Length in Generic Error Status Block */
> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
> +

If those were represented as structures, you would get the offsets easily
without needing a number of defines. That could simplify the code and make
it more readable - see comments below
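
A rough sketch of what I have in mind for the status block, not a
definitive layout for the patch. The type name is hypothetical, and I am
assuming the GCC/Clang packed attribute (QEMU would normally spell this
QEMU_PACKED). Field order follows the five fields the patch appends in
acpi_ghes_generic_error_status().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Generic Error Status Block header (hypothetical type name).
 * All fields are stored little-endian in guest memory.
 */
typedef struct AcpiGenericErrorStatus {
    uint32_t block_status;
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;
    uint32_t error_severity;
} __attribute__((packed)) AcpiGenericErrorStatus;

/* The two defines from the patch now fall out of the layout itself. */
static_assert(sizeof(AcpiGenericErrorStatus) == 20,
              "matches ACPI_GHES_GESB_SIZE");
static_assert(offsetof(AcpiGenericErrorStatus, data_length) == 12,
              "matches ACPI_GHES_GESB_DATA_LENGTH_OFFSET");
```

With this, ACPI_GHES_GESB_SIZE and ACPI_GHES_GESB_DATA_LENGTH_OFFSET become
sizeof() and offsetof() expressions checked at compile time.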

> +/*
> + * Record the value of data length for each error status block to avoid getting
> + * this value from guest.
> + */
> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> +
> +/*
> + * Generic Error Data Entry
> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + */
> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> +                uint32_t error_severity, uint16_t revision,
> +                uint8_t validation_bits, uint8_t flags,
> +                uint32_t error_data_length, QemuUUID fru_id,
> +                uint8_t *fru_text, uint64_t time_stamp)

Why not just define a struct that represents the GED entry?
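
Something along these lines, as a sketch only. The type and helper names
are hypothetical, the packed attribute is the GCC/Clang spelling, and the
helper assumes a little-endian host (real code would still need the
cpu_to_le*() conversions). Field sizes are the ones the patch passes to
build_append_int_noprefix() and g_array_append_vals().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic Error Data Entry header (hypothetical type name). */
typedef struct AcpiGenericErrorData {
    uint8_t  section_type[16];
    uint32_t error_severity;
    uint16_t revision;
    uint8_t  validation_bits;
    uint8_t  flags;
    uint32_t error_data_length;
    uint8_t  fru_id[16];
    uint8_t  fru_text[20];
    uint64_t time_stamp;
} __attribute__((packed)) AcpiGenericErrorData;

static_assert(sizeof(AcpiGenericErrorData) == 72,
              "matches ACPI_GHES_DATA_LENGTH");

/*
 * Hypothetical helper: fill an entry with everything zeroed except the
 * fields the patch actually sets. The revision is hard-coded to the
 * 0x300 value the patch passes in.
 */
static void ghes_init_data_entry(AcpiGenericErrorData *e,
                                 const uint8_t section_type[16],
                                 uint32_t severity, uint32_t payload_len)
{
    memset(e, 0, sizeof(*e));
    memcpy(e->section_type, section_type, sizeof(e->section_type));
    e->error_severity = severity;
    e->revision = 0x300;
    e->error_data_length = payload_len;
}
```

That would reduce the ten-parameter function to three inputs and make the
72-byte size self-documenting.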

> +{
> +    QemuUUID uuid_le;
> +
> +    /* Section Type */
> +    uuid_le = qemu_uuid_bswap(section_type);
> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> +
> +    /* Error Severity */
> +    build_append_int_noprefix(table, error_severity, 4);
> +    /* Revision */
> +    build_append_int_noprefix(table, revision, 2);

Minor: According to the spec, the revision number seems to be a fixed
value, so you could drop it from the parameters ...
or ... use a struct to represent the data

> +    /* Validation Bits */
> +    build_append_int_noprefix(table, validation_bits, 1);
> +    /* Flags */
> +    build_append_int_noprefix(table, flags, 1);
> +    /* Error Data Length */
> +    build_append_int_noprefix(table, error_data_length, 4);
> +
> +    /* FRU Id */
> +    uuid_le = qemu_uuid_bswap(fru_id);
> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> +
> +    /* FRU Text */
> +    g_array_append_vals(table, fru_text, 20);
> +    /* Timestamp */
> +    build_append_int_noprefix(table, time_stamp, 8);
> +}
> +
> +/*
> + * Generic Error Status Block
> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + */
> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> +                uint32_t raw_data_offset, uint32_t raw_data_length,
> +                uint32_t data_length, uint32_t error_severity)

Same as the above

> +{
> +    /* Block Status */
> +    build_append_int_noprefix(table, block_status, 4);
> +    /* Raw Data Offset */
> +    build_append_int_noprefix(table, raw_data_offset, 4);
> +    /* Raw Data Length */
> +    build_append_int_noprefix(table, raw_data_length, 4);
> +    /* Data Length */
> +    build_append_int_noprefix(table, data_length, 4);
> +    /* Error Severity */
> +    build_append_int_noprefix(table, error_severity, 4);
> +}
> +
> +/* UEFI 2.6: N.2.5 Memory Error Section */
> +static void acpi_ghes_build_append_mem_cper(GArray *table,
> +                                            uint64_t error_physical_addr)
> +{
> +    /*
> +     * Memory Error Record
> +     */
> +
> +    /* Validation Bits */
> +    build_append_int_noprefix(table,
> +                              (1UL << 14) | /* Type Valid */
> +                              (1UL << 1) /* Physical Address Valid */,
> +                              8);
> +    /* Error Status */
> +    build_append_int_noprefix(table, 0, 8);

Just wondering whether it would be worth specifying the Error Type
through the Error Status?

> +    /* Physical Address */
> +    build_append_int_noprefix(table, error_physical_addr, 8);
> +    /* Skip all the detailed information normally found in such a record */
> +    build_append_int_noprefix(table, 0, 48);
> +    /* Memory Error Type */
> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> +    /* Skip all the detailed information normally found in such a record */
> +    build_append_int_noprefix(table, 0, 7);
> +}
> +
> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> +                                      uint64_t error_physical_addr,
> +                                      uint32_t data_length)
> +{
> +    GArray *block;
> +    uint64_t current_block_length;
> +    /* Memory Error Section Type */
> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;

As already mentioned - this mixes LE with BE: the variable is named _le but
is initialized from UUID_BE().
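
To illustrate the naming concern, here is a small standalone sketch of the
conversion that has to happen between the two forms. The helper name is
hypothetical; it only shows the byte reordering between the big-endian
UUID bytes produced by UUID_BE() and the mixed-endian GUID layout, where
the first three fields are byte-reversed and the last eight bytes are left
alone.

```c
#include <stdint.h>

/*
 * Hypothetical helper: reorder a big-endian UUID into the mixed-endian
 * GUID layout. Bytes 0-3, 4-5 and 6-7 are each reversed; bytes 8-15
 * are copied unchanged.
 */
static void uuid_be_to_guid(const uint8_t be[16], uint8_t guid[16])
{
    static const uint8_t map[16] = {
        3, 2, 1, 0,         /* 32-bit field */
        5, 4,               /* 16-bit field */
        7, 6,               /* 16-bit field */
        8, 9, 10, 11, 12, 13, 14, 15
    };
    int i;

    for (i = 0; i < 16; i++) {
        guid[i] = be[map[i]];
    }
}
```

Keeping the BE value in a variable without the _le suffix, and only
calling the swapped copy _le, would avoid the confusion.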

> +    QemuUUID fru_id = {};
> +    uint8_t fru_text[20] = {};
> +
> +    /*
> +     * Generic Error Status Block
> +     * | +---------------------+
> +     * | |     block_status    |
> +     * | +---------------------+
> +     * | |    raw_data_offset  |
> +     * | +---------------------+
> +     * | |    raw_data_length  |
> +     * | +---------------------+
> +     * | |     data_length     |
> +     * | +---------------------+
> +     * | |   error_severity    |
> +     * | +---------------------+
> +     */
> +    block = g_array_new(false, true /* clear */, 1);
> +
> +    /* The current whole length of the generic error status block */
> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> +
> +    /* This is the length if adding a new generic error data entry*/
> +    data_length += ACPI_GHES_DATA_LENGTH;
> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> +
> +    /*
> +     * Check whether it will run out of the preallocated memory if adding a new
> +     * generic error data entry
> +     */
> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> +        error_report("Record CPER out of boundary!!!");

Minor: The error message could be made more accurate, like:
    "Not enough memory to record new CPER"

> +        return ACPI_GHES_CPER_FAIL;
> +    }
> +
> +    /* Build the new generic error status block header */
> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
> +
> +    /* Write back above generic error status block header to guest memory */
> +    cpu_physical_memory_write(error_block_address, block->data,
> +                              block->len);
> +
> +    /* Add a new generic error data entry */
> +
> +    data_length = block->len;
> +    /* Build this new generic error data entry header */
> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
> +
> +    /* Build the memory section CPER for above new generic error data entry */
> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> +
> +    /* Write back above this new generic error data entry to guest memory */
> +    cpu_physical_memory_write(error_block_address + current_block_length,
> +        block->data + data_length, block->len - data_length);
> +

As already mentioned, and unless I have missed something (which is highly
possible), this will append new records while the GESB is kept 'in-place',
so the used space only grows.

> +    g_array_free(block, true);
> +
> +    return ACPI_GHES_CPER_OK;
> +}
> +
>  /*
>   * Hardware Error Notification
>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>  }
> +
> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> +{
> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> +    int loop = 0;
> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
> +    bool ret = ACPI_GHES_CPER_FAIL;
> +    uint8_t source_id;
> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
> +

I'm not entirely sure why this is needed - see below

> +    /*
> +     * | +---------------------+ ges.ghes_addr_le
> +     * | |error_block_address0 |
> +     * | +---------------------+ --+--
> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> +     * | +---------------------+ --+--
> +     * | |error_block_addressN |
> +     * | +---------------------+
> +     * | | read_ack_register0  |
> +     * | +---------------------+ --+--
> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> +     * | +---------------------+ --+--
> +     * | | read_ack_registerN  |
> +     * | +---------------------+ --+--
> +     * | |      CPER           |   |
> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGTH
> +     * | |      CPER           |   |
> +     * | +---------------------+ --+--
> +     * | |    ..........       |
> +     * | +---------------------+
> +     * | |      CPER           |
> +     * | |      ....           |
> +     * | |      CPER           |
> +     * | +---------------------+
> +     */
> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> +        /* Find and check the source id for this new CPER */
> +        source_id = error_source_id[notify];

Why not use a switch-case for the supported source types?
For the time being only one is supported, and the table is only used to
verify that support - it seems a bit unnecessary.
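
Roughly what I mean, as a sketch. The function name and the local constant
are hypothetical; the value 8 is the one the patch assigns to
ACPI_GHES_NOTIFY_SEA, and index 0 is what the lookup table returns for it.

```c
#include <stdint.h>

/* Same value as ACPI_GHES_NOTIFY_SEA in the patch. */
enum { GHES_NOTIFY_SEA = 8 };

/*
 * Hypothetical helper: map a notification type to its error source
 * index, or return -1 if the type is not supported.
 */
static int ghes_source_index(uint32_t notify)
{
    switch (notify) {
    case GHES_NOTIFY_SEA:
        return 0;
    default:
        return -1;
    }
}
```

This also makes the explicit range check against
ACPI_GHES_NOTIFY_RESERVED redundant, since the default case covers it.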

> +        if (source_id != 0xff) {
> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> +        } else {
> +            goto out;
> +        }
> +
> +        cpu_physical_memory_read(start_addr, &error_block_addr,
> +                                 ACPI_GHES_ADDRESS_SIZE);
> +
> +        read_ack_register_addr = start_addr +
> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
> +retry:
> +        cpu_physical_memory_read(read_ack_register_addr,
> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> +
> +        /* zero means OSPM does not acknowledge the error */
> +        if (!read_ack_register) {
> +            if (loop < 3) {
> +                usleep(100 * 1000);
> +                loop++;
> +                goto retry;
> +            } else {
> +                error_report("OSPM does not acknowledge previous error,"
> +                    " so can not record CPER for current error, forcibly"
> +                    " acknowledge previous error to avoid blocking next time"
> +                    " CPER record! Exit");
> +                read_ack_register = 1;
> +                cpu_physical_memory_write(read_ack_register_addr,
> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);

Already mentioned ...
This seems to be against the spec. It not only ignores the requirement
for OSPM to acknowledge receiving notifications for previous errors,
but it also loses one of them. Why not cache it somewhere until
OSPM acknowledges the old ones?
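
One possible shape for such a cache, purely as an illustration. All names
and the queue depth are made up, and this says nothing about where the
queue would live or how it would be migrated. It is a plain fixed-size
FIFO of guest physical addresses that are waiting for the ack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GHES_PENDING_MAX 8  /* hypothetical depth */

/* Errors that could not be recorded yet because OSPM has not acked. */
typedef struct GhesPendingQueue {
    uint64_t paddr[GHES_PENDING_MAX];
    size_t head;    /* index of the oldest entry */
    size_t count;   /* number of queued entries */
} GhesPendingQueue;

/* Queue an error address; returns false if the queue is full. */
static bool ghes_pending_push(GhesPendingQueue *q, uint64_t paddr)
{
    if (q->count == GHES_PENDING_MAX) {
        return false;
    }
    q->paddr[(q->head + q->count) % GHES_PENDING_MAX] = paddr;
    q->count++;
    return true;
}

/* Take the oldest queued address; returns false if nothing is queued. */
static bool ghes_pending_pop(GhesPendingQueue *q, uint64_t *paddr)
{
    if (q->count == 0) {
        return false;
    }
    *paddr = q->paddr[q->head];
    q->head = (q->head + 1) % GHES_PENDING_MAX;
    q->count--;
    return true;
}
```

The record path would push when the ack register still reads zero, and pop
once OSPM has written the ack, instead of forcing the register to 1.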

> +            }
> +        } else {
> +            if (error_block_addr) {

What is the use case for the address not being set ?

> +                read_ack_register = 0;
> +                /*
> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
> +                 * acknowledge this error.
> +                 */
> +                cpu_physical_memory_write(read_ack_register_addr,
> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);

If the ack register has been cleared (which is why we end up here),
why write it back if there is no notification for the system to process?

> +                ret = acpi_ghes_record_mem_error(error_block_addr,
> +                          physical_address, acpi_ghes_data_length[source_id]);
> +                if (ret == ACPI_GHES_CPER_OK) {
> +                    acpi_ghes_data_length[source_id] +=
> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);

As mentioned ... this will run out of space. Some roll-back
mechanism is needed to overwrite stale entries.
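
To put a number on it, using only the constants from this patch (renamed
locally so the snippet stands alone), each record consumes one data entry
header plus one memory CPER section. The reset helper at the end is just
one hypothetical way to roll back and is not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the sizes defined in the patch. */
#define GHES_MAX_RAW_DATA_LENGTH 0x1000
#define GHES_GESB_SIZE           20
#define GHES_DATA_LENGTH         72
#define GHES_MEM_CPER_LENGTH     80

/* How many records fit in one error block before the bounds check fails. */
static uint32_t ghes_max_records(void)
{
    return (GHES_MAX_RAW_DATA_LENGTH - GHES_GESB_SIZE) /
           (GHES_DATA_LENGTH + GHES_MEM_CPER_LENGTH);
}

/*
 * Hypothetical roll-back: once OSPM has acknowledged the block, the old
 * entries are stale, so the next record can start from an empty block.
 */
static uint32_t ghes_data_length_for_next_record(uint32_t current,
                                                 bool acked)
{
    return acked ? 0 : current;
}
```

So with the current code the 27th error on a given source can never be
recorded, no matter how many of the earlier ones the guest has consumed.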

> +                }
> +            }
> +        }
> +    }
> +
> +out:
> +    return ret;
> +}
> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> index cb62ec9c7b..8e3c5b879e 100644
> --- a/include/hw/acpi/acpi_ghes.h
> +++ b/include/hw/acpi/acpi_ghes.h
> @@ -24,6 +24,9 @@
>
>  #include "hw/acpi/bios-linker-loader.h"
>
> +#define ACPI_GHES_CPER_OK                   1
> +#define ACPI_GHES_CPER_FAIL                 0
> +

Is there really a need to introduce those ?

>  /*
>   * Values for Hardware Error Notification Type field
>   */
> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
>
>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
>  #endif

All the above should preferably land in a separate patch

> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 9d143282bc..321ead8115 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
>
> -#ifdef TARGET_I386
> -#define KVM_HAVE_MCE_INJECTION 1
> +#ifdef KVM_HAVE_MCE_INJECTION
>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
>  #endif
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index d844ea21d8..c4fe6ccc63 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -28,6 +28,10 @@
>  /* ARM processors have a weak memory model */
>  #define TCG_GUEST_DEFAULT_MO      (0)
>
> +#ifdef TARGET_AARCH64
> +#define KVM_HAVE_MCE_INJECTION 1
> +#endif
> +
>  #define EXCP_UDEF            1   /* undefined instruction */
>  #define EXCP_SWI             2   /* software interrupt */
>  #define EXCP_PREFETCH_ABORT  3
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 63815fc4cf..a9ce97efb1 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
>               * Report exception with ESR indicating a fault due to a
>               * translation table walk for a cache maintenance instruction.
>               */
> -            syn = syn_data_abort_no_iss(current_el == target_el,
> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
>              env->exception.vaddress = value;
>              env->exception.fsr = fsr;
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index f5313dd3d4..28b8451d6d 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
>  }
>
> -static inline uint32_t syn_data_abort_no_iss(int same_el,
> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
>                                               int ea, int cm, int s1ptw,
>                                               int wnr, int fsc)
>  {
>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
>             | ARM_EL_IL
> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
> +           | (wnr << 6) | fsc;
>  }
>
>  static inline uint32_t syn_data_abort_with_iss(int same_el,
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 28f6db57d5..c7b7653d3f 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -28,6 +28,8 @@
>  #include "kvm_arm.h"
>  #include "hw/boards.h"
>  #include "internals.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/acpi_ghes.h"
>
>  static bool have_guest_debug;
>
> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
>      return KVM_PUT_RUNTIME_STATE;
>  }
>
> +/* Callers must hold the iothread mutex lock */
> +static void kvm_inject_arm_sea(CPUState *c)

We could enclose this function, along with kvm_arch_on_sigbus_vcpu(),
in an #ifdef KVM_HAVE_MCE_INJECTION block.

> +{
> +    ARMCPU *cpu = ARM_CPU(c);
> +    CPUARMState *env = &cpu->env;
> +    CPUClass *cc = CPU_GET_CLASS(c);
> +    uint32_t esr;
> +    bool same_el;
> +
> +    c->exception_index = EXCP_DATA_ABORT;
> +    env->exception.target_el = 1;
> +
> +    /*
> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> +     */
> +    same_el = arm_current_el(env) == env->exception.target_el;
> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);

If I'm not mistaken, this is the only case where FnV is set, so I'm not
convinced it is worth modifying syn_data_abort_no_iss()
just for this.
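
A standalone sketch of the alternative: keep the existing six-argument
helper and OR the FnV bit in at the single call site. The helper body is
copied from the pre-patch version shown in this diff; the numeric values of
EC_DATAABORT, ARM_EL_EC_SHIFT and ARM_EL_IL are my assumption of what
internals.h defines, and ESR_DABT_FNV and sea_syndrome() are hypothetical
names.

```c
#include <stdint.h>

/* Assumed values, following the ESR_ELx layout (EC in bits [31:26]). */
#define EC_DATAABORT     0x24
#define ARM_EL_EC_SHIFT  26
#define ARM_EL_IL        (1U << 25)

/* Hypothetical names for the two values the patch needs. */
#define ESR_DABT_FNV              (1U << 10)  /* FAR not valid */
#define DFSC_SYNC_EXTERNAL_ABORT  0x10

/* Existing helper, unchanged from before this patch. */
static inline uint32_t syn_data_abort_no_iss(int same_el,
                                             int ea, int cm, int s1ptw,
                                             int wnr, int fsc)
{
    return ((uint32_t)EC_DATAABORT << ARM_EL_EC_SHIFT)
           | ((uint32_t)same_el << ARM_EL_EC_SHIFT)
           | ARM_EL_IL
           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
}

/* Syndrome for the injected SEA: set FnV only here. */
static uint32_t sea_syndrome(int same_el)
{
    return syn_data_abort_no_iss(same_el, 0, 0, 0, 0,
                                 DFSC_SYNC_EXTERNAL_ABORT)
           | ESR_DABT_FNV;
}
```

That way the callers in helper.c and tlb_helper.c would not need to be
touched at all.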

> +
> +    env->exception.syndrome = esr;
> +
> +    cc->do_interrupt(c);
> +}
> +
>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>
> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>      return ret;
>  }
>
> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> +{
> +    ram_addr_t ram_addr;
> +    hwaddr paddr;
> +
> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> +
> +    if (acpi_enabled && addr &&
> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> +        ram_addr = qemu_ram_addr_from_host(addr);
> +        if (ram_addr != RAM_ADDR_INVALID &&
> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> +            kvm_hwpoison_page_add(ram_addr);
> +            /*
> +             * Asynchronous signal will be masked by main thread, so
> +             * only handle synchronous signal.
> +             */

I'm not entirely sure that the comment above is correct (this has been
pointed out before). I would expect the AO signal to be handled here as
well. Not having proper support for that just yet is another story, but
the comment might be a bit misleading.


> +            if (code == BUS_MCEERR_AR) {
> +                kvm_cpu_synchronize_state(c);
> +                if (ACPI_GHES_CPER_FAIL !=
> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> +                    kvm_inject_arm_sea(c);
> +                } else {
> +                    fprintf(stderr, "failed to record the error\n");
> +                }
> +            }
> +            return;
> +        }
> +        fprintf(stderr, "Hardware memory error for memory used by "
> +                "QEMU itself instead of guest system!\n");
> +    }
> +
> +    if (code == BUS_MCEERR_AR) {
> +        fprintf(stderr, "Hardware memory error!\n");
> +        exit(1);
> +    }
> +}
> +
>  /* C6.6.29 BRK instruction */
>  static const uint32_t brk_insn = 0xd4200000;
>
> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> index 5feb312941..499672ebbc 100644
> --- a/target/arm/tlb_helper.c
> +++ b/target/arm/tlb_helper.c
> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
>       * ISV field.
>       */
>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> -        syn = syn_data_abort_no_iss(same_el,
> +        syn = syn_data_abort_no_iss(same_el, 0,
>                                      ea, 0, s1ptw, is_write, fsc);
>      } else {
>          /*
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 5352c9ff55..f75a210f96 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -29,6 +29,8 @@
>  /* The x86 has a strong memory model with some store-after-load re-ordering */
>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
>
> +#define KVM_HAVE_MCE_INJECTION 1
> +
>  /* Maximum instruction code size */
>  #define TARGET_MAX_INSN_SIZE 16
>
> --
> 2.19.1
>
>
>


* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
@ 2019-11-22 15:47     ` Beata Michalska
  0 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-22 15:47 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: Peter Maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl, qemu-arm,
	james.morse, jonathan.cameron, imammedo, pbonzini, xuwei5,
	Laszlo Ersek, rth

Hi,

On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> From: Dongjiu Geng <gengdongjiu@huawei.com>
>
> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> translates the host VA delivered by host to guest PA, then fills this PA
> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> type.
>
> When guest accesses the poisoned memory, it will generate a Synchronous
> External Abort(SEA). Then host kernel gets an APEI notification and calls
> memory_failure() to unmapped the affected page in stage 2, finally
> returns to guest.
>
> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> Qemu, Qemu records this error address into guest APEI GHES memory and
> notifes guest using Synchronous-External-Abort(SEA).
>
> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> in which we can setup the type of exception and the syndrome information.
> When switching to guest, the target vcpu will jump to the synchronous
> external abort vector table entry.
>
> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> not valid and hold an UNKNOWN value. These values will be set to KVM
> register structures through KVM_SET_ONE_REG IOCTL.
>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>  include/hw/acpi/acpi_ghes.h |   4 +
>  include/sysemu/kvm.h        |   3 +-
>  target/arm/cpu.h            |   4 +
>  target/arm/helper.c         |   2 +-
>  target/arm/internals.h      |   5 +-
>  target/arm/kvm64.c          |  64 ++++++++
>  target/arm/tlb_helper.c     |   2 +-
>  target/i386/cpu.h           |   2 +
>  9 files changed, 377 insertions(+), 6 deletions(-)
>
> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> index 42c00ff3d3..f5b54990c0 100644
> --- a/hw/acpi/acpi_ghes.c
> +++ b/hw/acpi/acpi_ghes.c
> @@ -39,6 +39,34 @@
>  /* The max size in bytes for one error block */
>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>
> +/*
> + * The total size of Generic Error Data Entry
> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> + * Table 18-343 Generic Error Data Entry
> + */
> +#define ACPI_GHES_DATA_LENGTH               72
> +
> +/*
> + * The memory section CPER size,
> + * UEFI 2.6: N.2.5 Memory Error Section
> + */
> +#define ACPI_GHES_MEM_CPER_LENGTH           80
> +
> +/*
> + * Masks for block_status flags
> + */
> +#define ACPI_GEBS_UNCORRECTABLE         1

Why not list all supported statuses, similar to the error severity enum below?
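For illustration, a minimal sketch of what a fuller list could look like. The bit positions follow my reading of the Block Status field of the Generic Error Status Block in the ACPI spec, so please double-check them; every name other than ACPI_GEBS_UNCORRECTABLE, including the helper, is made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Block Status flag bits; names other than the first are hypothetical */
#define ACPI_GEBS_UNCORRECTABLE             (1u << 0)
#define ACPI_GEBS_CORRECTABLE               (1u << 1)
#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE    (1u << 2)
#define ACPI_GEBS_MULTIPLE_CORRECTABLE      (1u << 3)

/* The value already used by the patch must not change */
static_assert(ACPI_GEBS_UNCORRECTABLE == 1, "must match the patch");

/* Hypothetical helper: non-zero if any uncorrectable bit is set */
static inline int acpi_gebs_has_uncorrectable(uint32_t block_status)
{
    return (block_status & (ACPI_GEBS_UNCORRECTABLE |
                            ACPI_GEBS_MULTIPLE_UNCORRECTABLE)) != 0;
}
```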

> +
> +/*
> + * Values for error_severity field
> + */
> +enum AcpiGenericErrorSeverity {
> +    ACPI_CPER_SEV_RECOVERABLE,
> +    ACPI_CPER_SEV_FATAL,
> +    ACPI_CPER_SEV_CORRECTED,
> +    ACPI_CPER_SEV_NONE,
> +};
> +
>  /*
>   * Now only support ARMv8 SEA notification type error source
>   */
> @@ -49,6 +77,16 @@
>   */
>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>
> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> +
> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> +    0xED, 0x7C, 0x83, 0xB1)
> +
>  /*
>   * | +--------------------------+ 0
>   * | |        Header            |
> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>      uint64_t ghes_addr_le;
>  } AcpiGhesState;
>
> +/*
> + * Total size for Generic Error Status Block
> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> + * Table 18-380 Generic Error Status Block
> + */
> +#define ACPI_GHES_GESB_SIZE                 20

Minor: This is not entirely correct. The GEDEs are part of the GESB, so the
total length would be ACPI_GHES_GESB_SIZE + n * sizeof(GEDE).
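To make the arithmetic concrete, a small sketch using the sizes defined in this patch. It assumes every entry is a memory error, i.e. a 72-byte GEDE header followed by the 80-byte memory section; the two helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define ACPI_GHES_MAX_RAW_DATA_LENGTH   0x1000  /* one error block */
#define ACPI_GHES_GESB_SIZE             20      /* status block header */
#define ACPI_GHES_DATA_LENGTH           72      /* one GEDE header */
#define ACPI_GHES_MEM_CPER_LENGTH       80      /* memory error section */

static_assert(ACPI_GHES_GESB_SIZE < ACPI_GHES_MAX_RAW_DATA_LENGTH,
              "the header alone must fit into one error block");

/* Hypothetical helper: bytes used by a block holding n memory error entries */
static inline uint32_t ghes_block_total_size(uint32_t n)
{
    return ACPI_GHES_GESB_SIZE +
           n * (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
}

/* Hypothetical helper: how many such entries fit into one error block */
static inline uint32_t ghes_block_max_entries(void)
{
    return (ACPI_GHES_MAX_RAW_DATA_LENGTH - ACPI_GHES_GESB_SIZE) /
           (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
}
```

With these numbers a block with one entry takes 172 bytes, and the preallocated 0x1000 bytes hold at most 26 entries.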

> +/* The offset of Data Length in Generic Error Status Block */
> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
> +

If those were represented as structures you would get the offsets for free,
without needing a number of defines. That could simplify the code and make
it more readable - see comments below
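Something along these lines, as a rough sketch only. The struct name is hypothetical, the packed attribute assumes GCC or Clang (QEMU would use its own QEMU_PACKED wrapper), and the fields would still need explicit little-endian conversion before being written to guest memory:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the Generic Error Status Block header */
typedef struct __attribute__((packed)) AcpiGenericErrorStatus {
    uint32_t block_status;
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;
    uint32_t error_severity;
} AcpiGenericErrorStatus;

/* These replace ACPI_GHES_GESB_SIZE and ACPI_GHES_GESB_DATA_LENGTH_OFFSET */
static_assert(sizeof(AcpiGenericErrorStatus) == 20,
              "GESB header must be 20 bytes");
static_assert(offsetof(AcpiGenericErrorStatus, data_length) == 12,
              "Data Length must sit at offset 12");
```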

> +/*
> + * Record the value of data length for each error status block to avoid getting
> + * this value from guest.
> + */
> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> +
> +/*
> + * Generic Error Data Entry
> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + */
> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> +                uint32_t error_severity, uint16_t revision,
> +                uint8_t validation_bits, uint8_t flags,
> +                uint32_t error_data_length, QemuUUID fru_id,
> +                uint8_t *fru_text, uint64_t time_stamp)

Why not just define a struct that represents the GED entry?
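A possible shape for it, purely as a sketch. The struct and the init helper are hypothetical names, the packed attribute assumes GCC or Clang, the revision is hardcoded to the 0x300 this patch passes in, and multi-byte fields are kept in host order here, so they would still need conversion to little-endian before being copied to the guest:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of one Generic Error Data Entry header (72 bytes) */
typedef struct __attribute__((packed)) AcpiGenericErrorData {
    uint8_t  section_type[16];
    uint32_t error_severity;
    uint16_t revision;
    uint8_t  validation_bits;
    uint8_t  flags;
    uint32_t error_data_length;
    uint8_t  fru_id[16];
    uint8_t  fru_text[20];
    uint64_t time_stamp;
} AcpiGenericErrorData;

/* Hypothetical helper: fill in the few fields the patch actually uses */
static void ghes_init_gede(AcpiGenericErrorData *e, uint32_t severity,
                           uint32_t error_data_length)
{
    assert(e != NULL);
    memset(e, 0, sizeof(*e));   /* empty FRU id/text, no validation bits */
    e->error_severity = severity;
    e->revision = 0x300;        /* fixed value, as passed by the patch */
    e->error_data_length = error_data_length;
}
```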

> +{
> +    QemuUUID uuid_le;
> +
> +    /* Section Type */
> +    uuid_le = qemu_uuid_bswap(section_type);
> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> +
> +    /* Error Severity */
> +    build_append_int_noprefix(table, error_severity, 4);
> +    /* Revision */
> +    build_append_int_noprefix(table, revision, 2);

Minor: According to the spec the revision number seems to be a fixed
value, so you could drop it from the parameters, or use a struct to
represent the data.

> +    /* Validation Bits */
> +    build_append_int_noprefix(table, validation_bits, 1);
> +    /* Flags */
> +    build_append_int_noprefix(table, flags, 1);
> +    /* Error Data Length */
> +    build_append_int_noprefix(table, error_data_length, 4);
> +
> +    /* FRU Id */
> +    uuid_le = qemu_uuid_bswap(fru_id);
> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> +
> +    /* FRU Text */
> +    g_array_append_vals(table, fru_text, 20);
> +    /* Timestamp */
> +    build_append_int_noprefix(table, time_stamp, 8);
> +}
> +
> +/*
> + * Generic Error Status Block
> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + */
> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> +                uint32_t raw_data_offset, uint32_t raw_data_length,
> +                uint32_t data_length, uint32_t error_severity)

Same as the above

> +{
> +    /* Block Status */
> +    build_append_int_noprefix(table, block_status, 4);
> +    /* Raw Data Offset */
> +    build_append_int_noprefix(table, raw_data_offset, 4);
> +    /* Raw Data Length */
> +    build_append_int_noprefix(table, raw_data_length, 4);
> +    /* Data Length */
> +    build_append_int_noprefix(table, data_length, 4);
> +    /* Error Severity */
> +    build_append_int_noprefix(table, error_severity, 4);
> +}
> +
> +/* UEFI 2.6: N.2.5 Memory Error Section */
> +static void acpi_ghes_build_append_mem_cper(GArray *table,
> +                                            uint64_t error_physical_addr)
> +{
> +    /*
> +     * Memory Error Record
> +     */
> +
> +    /* Validation Bits */
> +    build_append_int_noprefix(table,
> +                              (1UL << 14) | /* Type Valid */
> +                              (1UL << 1) /* Physical Address Valid */,
> +                              8);
> +    /* Error Status */
> +    build_append_int_noprefix(table, 0, 8);

Just wondering whether it would be worth specifying the Error Type
through the Error Status?

> +    /* Physical Address */
> +    build_append_int_noprefix(table, error_physical_addr, 8);
> +    /* Skip all the detailed information normally found in such a record */
> +    build_append_int_noprefix(table, 0, 48);
> +    /* Memory Error Type */
> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> +    /* Skip all the detailed information normally found in such a record */
> +    build_append_int_noprefix(table, 0, 7);
> +}
> +
> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> +                                      uint64_t error_physical_addr,
> +                                      uint32_t data_length)
> +{
> +    GArray *block;
> +    uint64_t current_block_length;
> +    /* Memory Error Section Type */
> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;

As already mentioned: this mixes LE with BE.

> +    QemuUUID fru_id = {};
> +    uint8_t fru_text[20] = {};
> +
> +    /*
> +     * Generic Error Status Block
> +     * | +---------------------+
> +     * | |     block_status    |
> +     * | +---------------------+
> +     * | |    raw_data_offset  |
> +     * | +---------------------+
> +     * | |    raw_data_length  |
> +     * | +---------------------+
> +     * | |     data_length     |
> +     * | +---------------------+
> +     * | |   error_severity    |
> +     * | +---------------------+
> +     */
> +    block = g_array_new(false, true /* clear */, 1);
> +
> +    /* The current whole length of the generic error status block */
> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> +
> +    /* This is the length if adding a new generic error data entry*/
> +    data_length += ACPI_GHES_DATA_LENGTH;
> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> +
> +    /*
> +     * Check whether it will run out of the preallocated memory if adding a new
> +     * generic error data entry
> +     */
> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> +        error_report("Record CPER out of boundary!!!");

Minor: The error message could be made more accurate, like:
    "Not enough memory to record new CPER"

> +        return ACPI_GHES_CPER_FAIL;
> +    }
> +
> +    /* Build the new generic error status block header */
> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
> +
> +    /* Write back above generic error status block header to guest memory */
> +    cpu_physical_memory_write(error_block_address, block->data,
> +                              block->len);
> +
> +    /* Add a new generic error data entry */
> +
> +    data_length = block->len;
> +    /* Build this new generic error data entry header */
> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
> +
> +    /* Build the memory section CPER for above new generic error data entry */
> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> +
> +    /* Write back above this new generic error data entry to guest memory */
> +    cpu_physical_memory_write(error_block_address + current_block_length,
> +        block->data + data_length, block->len - data_length);
> +

As already mentioned, and unless I have missed something (which is highly
possible), this will append new records while the GESB is kept 'in-place',
so the used space only ever grows.

> +    g_array_free(block, true);
> +
> +    return ACPI_GHES_CPER_OK;
> +}
> +
>  /*
>   * Hardware Error Notification
>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>  }
> +
> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> +{
> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> +    int loop = 0;
> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
> +    bool ret = ACPI_GHES_CPER_FAIL;
> +    uint8_t source_id;
> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
> +

I'm not entirely sure why this is needed - see below

> +    /*
> +     * | +---------------------+ ges.ghes_addr_le
> +     * | |error_block_address0 |
> +     * | +---------------------+ --+--
> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> +     * | +---------------------+ --+--
> +     * | |error_block_addressN |
> +     * | +---------------------+
> +     * | | read_ack_register0  |
> +     * | +---------------------+ --+--
> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> +     * | +---------------------+ --+--
> +     * | | read_ack_registerN  |
> +     * | +---------------------+ --+--
> +     * | |      CPER           |   |
> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> +     * | |      CPER           |   |
> +     * | +---------------------+ --+--
> +     * | |    ..........       |
> +     * | +---------------------+
> +     * | |      CPER           |
> +     * | |      ....           |
> +     * | |      CPER           |
> +     * | +---------------------+
> +     */
> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> +        /* Find and check the source id for this new CPER */
> +        source_id = error_source_id[notify];

Why not use a switch statement for the supported source types?
For the time being only one is supported, and the table is only used to
verify that support, which seems a bit unnecessary.
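Roughly what I have in mind, as a sketch. The helper name is hypothetical, and ACPI_GHES_NOTIFY_SEA is assumed to be 8, matching the only valid slot in the lookup table above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: index 8 is the only non-0xff slot in the patch's table */
#define ACPI_GHES_NOTIFY_SEA 8

/*
 * Hypothetical helper: map a notification type to its error source id.
 * Returns false for notification types that are not supported yet.
 */
static bool ghes_notify_to_source_id(uint32_t notify, uint8_t *source_id)
{
    assert(source_id != NULL);

    switch (notify) {
    case ACPI_GHES_NOTIFY_SEA:
        *source_id = 0;
        return true;
    default:
        return false;
    }
}
```

Adding another source later would then just be one more case label rather than another table slot.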

> +        if (source_id != 0xff) {
> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> +        } else {
> +            goto out;
> +        }
> +
> +        cpu_physical_memory_read(start_addr, &error_block_addr,
> +                                 ACPI_GHES_ADDRESS_SIZE);
> +
> +        read_ack_register_addr = start_addr +
> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
> +retry:
> +        cpu_physical_memory_read(read_ack_register_addr,
> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> +
> +        /* zero means OSPM does not acknowledge the error */
> +        if (!read_ack_register) {
> +            if (loop < 3) {
> +                usleep(100 * 1000);
> +                loop++;
> +                goto retry;
> +            } else {
> +                error_report("OSPM does not acknowledge previous error,"
> +                    " so can not record CPER for current error, forcibly"
> +                    " acknowledge previous error to avoid blocking next time"
> +                    " CPER record! Exit");
> +                read_ack_register = 1;
> +                cpu_physical_memory_write(read_ack_register_addr,
> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);

Already mentioned ...
This seems to be against the spec. It not only ignores the requirement
for OSPM to acknowledge notifications for previous errors, but it also
loses one of them. Why not cache it somewhere until OSPM acknowledges
the old ones?
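As a rough idea of the caching, a small fixed-size FIFO of pending error addresses would do: push when the ack is still outstanding, pop and record once OSPM has acknowledged. Everything below (type, function names, queue depth) is hypothetical and only meant to show the shape:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GHES_PENDING_MAX 8  /* arbitrary depth chosen for this sketch */

typedef struct GhesPendingQueue {
    uint64_t paddr[GHES_PENDING_MAX];
    unsigned head;      /* index of the oldest pending address */
    unsigned count;     /* number of pending addresses */
} GhesPendingQueue;

/* Queue an error address; returns false if the queue is full */
static bool ghes_pending_push(GhesPendingQueue *q, uint64_t paddr)
{
    assert(q != NULL);
    if (q->count == GHES_PENDING_MAX) {
        return false;
    }
    q->paddr[(q->head + q->count) % GHES_PENDING_MAX] = paddr;
    q->count++;
    return true;
}

/* Take the oldest queued address; returns false if nothing is pending */
static bool ghes_pending_pop(GhesPendingQueue *q, uint64_t *paddr)
{
    assert(q != NULL && paddr != NULL);
    if (q->count == 0) {
        return false;
    }
    *paddr = q->paddr[q->head];
    q->head = (q->head + 1) % GHES_PENDING_MAX;
    q->count--;
    return true;
}
```

What should happen when the queue itself fills up is a separate policy question that this sketch does not answer.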

> +            }
> +        } else {
> +            if (error_block_addr) {

What is the use case for the address not being set?

> +                read_ack_register = 0;
> +                /*
> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
> +                 * acknowledge this error.
> +                 */
> +                cpu_physical_memory_write(read_ack_register_addr,
> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);

If the ack register has been cleared (which is why we end up here),
why write it back if there is no notification for the system to process?

> +                ret = acpi_ghes_record_mem_error(error_block_addr,
> +                          physical_address, acpi_ghes_data_length[source_id]);
> +                if (ret == ACPI_GHES_CPER_OK) {
> +                    acpi_ghes_data_length[source_id] +=
> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);

As mentioned, this will run out of space. Some roll-back
mechanism is needed to overwrite stale entries.
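One simple option, sketched below under the assumption that entries OSPM has already acknowledged are stale and may be overwritten: restart from the beginning of the data area whenever the ack has been seen. The helper name and its parameters are hypothetical; the sizes are the ones from this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACPI_GHES_MAX_RAW_DATA_LENGTH   0x1000
#define ACPI_GHES_GESB_SIZE             20
#define GHES_MEM_ENTRY_SIZE             (72 + 80)  /* GEDE + memory CPER */

/*
 * Hypothetical helper: reserve room for one more memory error entry.
 * If OSPM has acknowledged the previous errors, the recorded length is
 * rolled back to zero first. On success *offset is where the new entry
 * goes (relative to the start of the block) and *data_length is updated.
 */
static bool ghes_reserve_entry(uint32_t *data_length, bool ospm_acked,
                               uint32_t *offset)
{
    assert(data_length != NULL && offset != NULL);

    if (ospm_acked) {
        *data_length = 0;   /* stale entries can be overwritten */
    }
    if (ACPI_GHES_GESB_SIZE + *data_length + GHES_MEM_ENTRY_SIZE >
        ACPI_GHES_MAX_RAW_DATA_LENGTH) {
        return false;
    }
    *offset = ACPI_GHES_GESB_SIZE + *data_length;
    *data_length += GHES_MEM_ENTRY_SIZE;
    return true;
}
```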

> +                }
> +            }
> +        }
> +    }
> +
> +out:
> +    return ret;
> +}
> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> index cb62ec9c7b..8e3c5b879e 100644
> --- a/include/hw/acpi/acpi_ghes.h
> +++ b/include/hw/acpi/acpi_ghes.h
> @@ -24,6 +24,9 @@
>
>  #include "hw/acpi/bios-linker-loader.h"
>
> +#define ACPI_GHES_CPER_OK                   1
> +#define ACPI_GHES_CPER_FAIL                 0
> +

Is there really a need to introduce those?

>  /*
>   * Values for Hardware Error Notification Type field
>   */
> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
>
>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
>  #endif

All the above should preferably land in a separate patch

> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 9d143282bc..321ead8115 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
>
> -#ifdef TARGET_I386
> -#define KVM_HAVE_MCE_INJECTION 1
> +#ifdef KVM_HAVE_MCE_INJECTION
>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
>  #endif
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index d844ea21d8..c4fe6ccc63 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -28,6 +28,10 @@
>  /* ARM processors have a weak memory model */
>  #define TCG_GUEST_DEFAULT_MO      (0)
>
> +#ifdef TARGET_AARCH64
> +#define KVM_HAVE_MCE_INJECTION 1
> +#endif
> +
>  #define EXCP_UDEF            1   /* undefined instruction */
>  #define EXCP_SWI             2   /* software interrupt */
>  #define EXCP_PREFETCH_ABORT  3
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 63815fc4cf..a9ce97efb1 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
>               * Report exception with ESR indicating a fault due to a
>               * translation table walk for a cache maintenance instruction.
>               */
> -            syn = syn_data_abort_no_iss(current_el == target_el,
> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
>              env->exception.vaddress = value;
>              env->exception.fsr = fsr;
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index f5313dd3d4..28b8451d6d 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
>  }
>
> -static inline uint32_t syn_data_abort_no_iss(int same_el,
> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
>                                               int ea, int cm, int s1ptw,
>                                               int wnr, int fsc)
>  {
>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
>             | ARM_EL_IL
> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
> +           | (wnr << 6) | fsc;
>  }
>
>  static inline uint32_t syn_data_abort_with_iss(int same_el,
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 28f6db57d5..c7b7653d3f 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -28,6 +28,8 @@
>  #include "kvm_arm.h"
>  #include "hw/boards.h"
>  #include "internals.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/acpi_ghes.h"
>
>  static bool have_guest_debug;
>
> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
>      return KVM_PUT_RUNTIME_STATE;
>  }
>
> +/* Callers must hold the iothread mutex lock */
> +static void kvm_inject_arm_sea(CPUState *c)

We could enclose this function, along with kvm_arch_on_sigbus_vcpu,
in an #ifdef KVM_HAVE_MCE_INJECTION block.

> +{
> +    ARMCPU *cpu = ARM_CPU(c);
> +    CPUARMState *env = &cpu->env;
> +    CPUClass *cc = CPU_GET_CLASS(c);
> +    uint32_t esr;
> +    bool same_el;
> +
> +    c->exception_index = EXCP_DATA_ABORT;
> +    env->exception.target_el = 1;
> +
> +    /*
> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> +     */
> +    same_el = arm_current_el(env) == env->exception.target_el;
> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);

IINM this is the only case where FnV is actually used, so I'm not
convinced it is worth modifying syn_data_abort_no_iss just for this.
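An alternative would be to leave the helper alone and OR the bit in at the one call site. A standalone sketch: the constants mirror what internals.h uses as far as I can tell, the helper is a copy of the unmodified syn_data_abort_no_iss, and ESR_ISS_FNV, DFSC_SYNC_EXTERNAL and syn_sea_far_not_valid are names invented for this example:

```c
#include <assert.h>
#include <stdint.h>

#define ARM_EL_EC_SHIFT     26
#define ARM_EL_IL           (1u << 25)
#define EC_DATAABORT        0x24
#define ESR_ISS_FNV         (1u << 10)  /* hypothetical name, FnV bit */
#define DFSC_SYNC_EXTERNAL  0x10        /* hypothetical name, DFSC value */

/* Copy of the helper without the extra fnv parameter */
static inline uint32_t syn_data_abort_no_iss(int same_el, int ea, int cm,
                                             int s1ptw, int wnr, int fsc)
{
    return ((uint32_t)EC_DATAABORT << ARM_EL_EC_SHIFT)
           | ((uint32_t)same_el << ARM_EL_EC_SHIFT)
           | ARM_EL_IL
           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
}

/* Hypothetical wrapper: SEA syndrome with FAR marked as not valid */
static inline uint32_t syn_sea_far_not_valid(int same_el)
{
    assert(same_el == 0 || same_el == 1);
    return syn_data_abort_no_iss(same_el, 0, 0, 0, 0, DFSC_SYNC_EXTERNAL)
           | ESR_ISS_FNV;
}
```

That way helper.c and tlb_helper.c would not need to be touched at all.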

> +
> +    env->exception.syndrome = esr;
> +
> +    cc->do_interrupt(c);
> +}
> +
>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>
> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>      return ret;
>  }
>
> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> +{
> +    ram_addr_t ram_addr;
> +    hwaddr paddr;
> +
> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> +
> +    if (acpi_enabled && addr &&
> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> +        ram_addr = qemu_ram_addr_from_host(addr);
> +        if (ram_addr != RAM_ADDR_INVALID &&
> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> +            kvm_hwpoison_page_add(ram_addr);
> +            /*
> +             * Asynchronous signal will be masked by main thread, so
> +             * only handle synchronous signal.
> +             */

I'm not entirely sure that the comment above is correct (it has been
pointed out before). I would expect the AO signal to be handled here as
well. Not having proper support for that just yet is another story, but
the comment might be a bit misleading.


> +            if (code == BUS_MCEERR_AR) {
> +                kvm_cpu_synchronize_state(c);
> +                if (ACPI_GHES_CPER_FAIL !=
> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> +                    kvm_inject_arm_sea(c);
> +                } else {
> +                    fprintf(stderr, "failed to record the error\n");
> +                }
> +            }
> +            return;
> +        }
> +        fprintf(stderr, "Hardware memory error for memory used by "
> +                "QEMU itself instead of guest system!\n");
> +    }
> +
> +    if (code == BUS_MCEERR_AR) {
> +        fprintf(stderr, "Hardware memory error!\n");
> +        exit(1);
> +    }
> +}
> +
>  /* C6.6.29 BRK instruction */
>  static const uint32_t brk_insn = 0xd4200000;
>
> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> index 5feb312941..499672ebbc 100644
> --- a/target/arm/tlb_helper.c
> +++ b/target/arm/tlb_helper.c
> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
>       * ISV field.
>       */
>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> -        syn = syn_data_abort_no_iss(same_el,
> +        syn = syn_data_abort_no_iss(same_el, 0,
>                                      ea, 0, s1ptw, is_write, fsc);
>      } else {
>          /*
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 5352c9ff55..f75a210f96 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -29,6 +29,8 @@
>  /* The x86 has a strong memory model with some store-after-load re-ordering */
>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
>
> +#define KVM_HAVE_MCE_INJECTION 1
> +
>  /* Maximum instruction code size */
>  #define TARGET_MAX_INSN_SIZE 16
>
> --
> 2.19.1
>
>
>



* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-15 16:37     ` Igor Mammedov
@ 2019-11-22 15:47       ` Beata Michalska
  -1 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-22 15:47 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Xiang Zheng, Peter Maydell, ehabkost, kvm, mst, wanghaibin.wang,
	mtosatti, linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl,
	qemu-arm, james.morse, xuwei5, jonathan.cameron, pbonzini,
	Laszlo Ersek, rth

Hi,

On Fri, 15 Nov 2019 at 16:54, Igor Mammedov <imammedo@redhat.com> wrote:
>
> On Mon, 11 Nov 2019 09:40:47 +0800
> Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> > From: Dongjiu Geng <gengdongjiu@huawei.com>
> >
> > Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> > translates the host VA delivered by host to guest PA, then fills this PA
> > to guest APEI GHES memory, then notifies guest according to the SIGBUS
> > type.
> >
> > When guest accesses the poisoned memory, it will generate a Synchronous
> > External Abort(SEA). Then host kernel gets an APEI notification and calls
> > memory_failure() to unmap the affected page in stage 2, finally
> > returns to guest.
> >
> > Guest continues to access the PG_hwpoison page, it will trap to KVM as
> > stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> > Qemu, Qemu records this error address into guest APEI GHES memory and
> > notifies guest using Synchronous-External-Abort(SEA).
> >
> > In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> > in which we can setup the type of exception and the syndrome information.
> > When switching to guest, the target vcpu will jump to the synchronous
> > external abort vector table entry.
> >
> > The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> > ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> > not valid and hold an UNKNOWN value. These values will be set to KVM
> > register structures through KVM_SET_ONE_REG IOCTL.
> >
> > Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> > Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
> >  include/hw/acpi/acpi_ghes.h |   4 +
> >  include/sysemu/kvm.h        |   3 +-
> >  target/arm/cpu.h            |   4 +
> >  target/arm/helper.c         |   2 +-
> >  target/arm/internals.h      |   5 +-
> >  target/arm/kvm64.c          |  64 ++++++++
> >  target/arm/tlb_helper.c     |   2 +-
> >  target/i386/cpu.h           |   2 +
> >  9 files changed, 377 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> > index 42c00ff3d3..f5b54990c0 100644
> > --- a/hw/acpi/acpi_ghes.c
> > +++ b/hw/acpi/acpi_ghes.c
> > @@ -39,6 +39,34 @@
> >  /* The max size in bytes for one error block */
> >  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> >
> > +/*
> > + * The total size of Generic Error Data Entry
> > + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> > + * Table 18-343 Generic Error Data Entry
> > + */
> > +#define ACPI_GHES_DATA_LENGTH               72
> > +
> > +/*
> > + * The memory section CPER size,
> > + * UEFI 2.6: N.2.5 Memory Error Section
> > + */
> maybe use one line comment
>
> > +#define ACPI_GHES_MEM_CPER_LENGTH           80
> > +
> > +/*
> > + * Masks for block_status flags
> > + */
> ditto
>
> > +#define ACPI_GEBS_UNCORRECTABLE         1
> > +
> > +/*
> > + * Values for error_severity field
> > + */
> ditto
>
> > +enum AcpiGenericErrorSeverity {
> > +    ACPI_CPER_SEV_RECOVERABLE,
> > +    ACPI_CPER_SEV_FATAL,
> > +    ACPI_CPER_SEV_CORRECTED,
> > +    ACPI_CPER_SEV_NONE,
> I'd assign values explicitly here
>   foo = x,
>   ...
>
> > +};
> > +
> >  /*
> >   * Now only support ARMv8 SEA notification type error source
> >   */
> > @@ -49,6 +77,16 @@
> >   */
> >  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >
> > +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> > +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> > +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> > +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> > +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> > +
> > +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> > +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> > +    0xED, 0x7C, 0x83, 0xB1)
> > +
> >  /*
> >   * | +--------------------------+ 0
> >   * | |        Header            |
> > @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >      uint64_t ghes_addr_le;
> >  } AcpiGhesState;
> >
> > +/*
> > + * Total size for Generic Error Status Block
> > + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> > + * Table 18-380 Generic Error Status Block
> > + */
> > +#define ACPI_GHES_GESB_SIZE                 20
>
> > +/* The offset of Data Length in Generic Error Status Block */
> > +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
>
> unused, drop it
>
> > +
> > +/*
> > + * Record the value of data length for each error status block to avoid getting
> > + * this value from guest.
> > + */
> > +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> > +
> > +/*
> > + * Generic Error Data Entry
> > + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> > + */
> > +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> > +                uint32_t error_severity, uint16_t revision,
> > +                uint8_t validation_bits, uint8_t flags,
> > +                uint32_t error_data_length, QemuUUID fru_id,
> > +                uint8_t *fru_text, uint64_t time_stamp)
> > +{
> > +    QemuUUID uuid_le;
> > +
> > +    /* Section Type */
> > +    uuid_le = qemu_uuid_bswap(section_type);
> > +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> > +
> > +    /* Error Severity */
> > +    build_append_int_noprefix(table, error_severity, 4);
> > +    /* Revision */
> > +    build_append_int_noprefix(table, revision, 2);
> > +    /* Validation Bits */
> > +    build_append_int_noprefix(table, validation_bits, 1);
> > +    /* Flags */
> > +    build_append_int_noprefix(table, flags, 1);
> > +    /* Error Data Length */
> > +    build_append_int_noprefix(table, error_data_length, 4);
> > +
> > +    /* FRU Id */
> > +    uuid_le = qemu_uuid_bswap(fru_id);
> > +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> > +
> > +    /* FRU Text */
> > +    g_array_append_vals(table, fru_text, 20);
> what if fru_text were shorter than 20 bytes?
>
> Suggest to pass length along or
> drop all fru handling in the caller and just hardcode here invalid fru with empty text,
> as function could be extended later, once there is something meaningful to put in fru.
>
>
> > +    /* Timestamp */
> > +    build_append_int_noprefix(table, time_stamp, 8);
> > +}
> > +
> > +/*
> > + * Generic Error Status Block
> > + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> > + */
> > +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> > +                uint32_t raw_data_offset, uint32_t raw_data_length,
> > +                uint32_t data_length, uint32_t error_severity)
> > +{
> > +    /* Block Status */
> > +    build_append_int_noprefix(table, block_status, 4);
> > +    /* Raw Data Offset */
> > +    build_append_int_noprefix(table, raw_data_offset, 4);
> > +    /* Raw Data Length */
> > +    build_append_int_noprefix(table, raw_data_length, 4);
> > +    /* Data Length */
> > +    build_append_int_noprefix(table, data_length, 4);
> > +    /* Error Severity */
> > +    build_append_int_noprefix(table, error_severity, 4);
> > +}
> > +
> > +/* UEFI 2.6: N.2.5 Memory Error Section */
> > +static void acpi_ghes_build_append_mem_cper(GArray *table,
> > +                                            uint64_t error_physical_addr)
> I'd split out this and acpi_ghes_generic_error_status() and
> acpi_ghes_generic_error_data()  functions into a separate patch.
>
> > +{
> > +    /*
> > +     * Memory Error Record
> > +     */
> > +
> > +    /* Validation Bits */
> > +    build_append_int_noprefix(table,
>
> > +                              (1UL << 14) | /* Type Valid */
> > +                              (1UL << 1) /* Physical Address Valid */,
> shouldn't it use ULL suffixes?
>
> > +                              8);
> > +    /* Error Status */
> > +    build_append_int_noprefix(table, 0, 8);
> > +    /* Physical Address */
> > +    build_append_int_noprefix(table, error_physical_addr, 8);
> > +    /* Skip all the detailed information normally found in such a record */
> > +    build_append_int_noprefix(table, 0, 48);
> > +    /* Memory Error Type */
> > +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> > +    /* Skip all the detailed information normally found in such a record */
> > +    build_append_int_noprefix(table, 0, 7);
> > +}
> > +
> > +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> > +                                      uint64_t error_physical_addr,
> > +                                      uint32_t data_length)
> > +{
> > +    GArray *block;
> > +    uint64_t current_block_length;
> > +    /* Memory Error Section Type */
> > +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
>                                ^^
> UEFI_CPER_SEC_PLATFORM_MEM is defined as BE, so _le here is wrong
> and then later you use qemu_uuid_bswap() to make it LE.
>
> Why not define it as LE to begin with, like it's been done for NVDIMM_UUID_LE?
>
Is there a chance to make it common for both?
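For instance, a generic little-endian initializer that both users could share. This is only a sketch: ExampleUUID stands in for QemuUUID, and the UUID_LE and UEFI_CPER_SEC_PLATFORM_MEM_LE names are hypothetical; the GUID value is the one from this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for QemuUUID, just 16 raw bytes */
typedef struct ExampleUUID {
    uint8_t data[16];
} ExampleUUID;

/* Hypothetical shared macro: first three fields are stored byte-swapped */
#define UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)              \
    { { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff,            \
        ((a) >> 24) & 0xff,                                           \
        (b) & 0xff, ((b) >> 8) & 0xff,                                \
        (c) & 0xff, ((c) >> 8) & 0xff,                                \
        (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } }

#define UEFI_CPER_SEC_PLATFORM_MEM_LE                                 \
    UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83,       \
            0xED, 0x7C, 0x83, 0xB1)

/* Already in the byte order the guest expects, no bswap needed later */
static const ExampleUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM_LE;
```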


>
> > +    QemuUUID fru_id = {};
> > +    uint8_t fru_text[20] = {};
> > +
> > +    /*
> > +     * Generic Error Status Block
> > +     * | +---------------------+
> > +     * | |     block_status    |
> > +     * | +---------------------+
> > +     * | |    raw_data_offset  |
> > +     * | +---------------------+
> > +     * | |    raw_data_length  |
> > +     * | +---------------------+
> > +     * | |     data_length     |
> > +     * | +---------------------+
> > +     * | |   error_severity    |
> > +     * | +---------------------+
> > +     */
> not necessary, just point to concrete part of ACPI spec if needed.
>
> > +    block = g_array_new(false, true /* clear */, 1);
> > +
> > +    /* The current whole length of the generic error status block */
> > +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> > +
> > +    /* This is the length if adding a new generic error data entry*/
> > +    data_length += ACPI_GHES_DATA_LENGTH;
> > +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> > +
> > +    /*
> > +     * Check whether it will run out of the preallocated memory if adding a new
> > +     * generic error data entry
> > +     */
> > +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> > +        error_report("Record CPER out of boundary!!!");
> > +        return ACPI_GHES_CPER_FAIL;
> > +    }
> > +
> > +    /* Build the new generic error status block header */
> > +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> > +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
> Down the road, the arguments are passed to build_append_int_noprefix() which takes
> numbers in host byte order, so manually calling cpu_to_le32() is wrong.
> just drop cpu_to_le32() here.
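
To illustrate the byte order point, here is a minimal standalone sketch. It is not QEMU code: append_le() is a made-up stand-in for a helper that, like build_append_int_noprefix(), takes a host-order value and emits it least significant byte first, and swap32() stands in for what cpu_to_le32() amounts to on a big-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: append `size` bytes of `value`, least significant
 * byte first. It shifts the value instead of reading memory, so the output
 * is the same on little-endian and big-endian hosts.
 */
static size_t append_le(uint8_t *buf, size_t pos, uint64_t value, int size)
{
    assert(size >= 1 && size <= 8);
    for (int i = 0; i < size; i++) {
        buf[pos++] = (uint8_t)(value & 0xff);
        value >>= 8;
    }
    return pos;
}

/* Plain 32-bit byte swap, i.e. the effect of a pre-conversion on a BE host. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0xff00) | ((v << 8) & 0xff0000) | (v << 24);
}
```

Appending 80 in 4 bytes gives 50 00 00 00, which is the intended little-endian encoding. Appending swap32(80) gives 00 00 00 50, so a value that was already converted by the caller ends up swapped twice on a big-endian host.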
>
>
> > +
> > +    /* Write back above generic error status block header to guest memory */
> > +    cpu_physical_memory_write(error_block_address, block->data,
> > +                              block->len);
> > +
> > +    /* Add a new generic error data entry */
> > +
> > +    data_length = block->len;
> > +    /* Build this new generic error data entry header */
> > +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> > +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> > +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
> ditto
>
> > +
> > +    /* Build the memory section CPER for above new generic error data entry */
> > +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> > +
> > +    /* Write back above this new generic error data entry to guest memory */
> > +    cpu_physical_memory_write(error_block_address + current_block_length,
> > +        block->data + data_length, block->len - data_length);
>
> If I read it right, the first write builds an updated "Error Status Block"
> header whose "Data Length" accounts for an additional "Error Data Entry",
> and this second write then appends the new "Error Data Entry" after the
> previous one (if any existed).
>
> Now for GHESv2, OSPM is supposed to copy existing "Error Status Block" and Ack
> that fact via "Read Ack Register" and QEMU must not overwrite old data until
> they are acked by OSPM.
>
> With that in mind, appending a new error seems pointless, since the guest
> has already consumed any pre-existing error before we are able to write.
> So we can drop "Error Status Block" tracking and just
>  1. compose the whole "Error Status Block" with 1 new "Error Data Entry"
>  2. check that it fits into start, start+ACPI_GHES_MAX_RAW_DATA_LENGTH range
>  3. push it into guest RAM with a single write
>
> and drop all data_length tracking related code.
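
The three steps above can be sketched as follows. This is a standalone illustration under stated assumptions, not the QEMU implementation: put_le() and record_one_mem_error() are hypothetical names, a plain byte buffer stands in for guest RAM, and the Section Type GUID is left zero for brevity. The sizes and field values are the ones used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GESB_SIZE      20   /* Generic Error Status Block header */
#define GED_SIZE       72   /* Generic Error Data Entry header */
#define MEM_CPER_SIZE  80   /* Memory Error Section */

/* Hypothetical helper: append `size` bytes of `v`, least significant first. */
static size_t put_le(uint8_t *buf, size_t pos, uint64_t v, int size)
{
    assert(size >= 1 && size <= 8);
    for (int i = 0; i < size; i++, v >>= 8) {
        buf[pos++] = (uint8_t)(v & 0xff);
    }
    return pos;
}

/*
 * Compose one status block carrying exactly one data entry and copy it to
 * `guest_block` (stand-in for guest RAM) in one go.
 * Returns 0 on success, -1 if the record does not fit.
 */
static int record_one_mem_error(uint8_t *guest_block, size_t guest_block_size,
                                uint64_t error_paddr)
{
    uint8_t block[GESB_SIZE + GED_SIZE + MEM_CPER_SIZE] = { 0 };
    size_t pos = 0;

    /* Step 2: the whole record must fit into the reserved area. */
    if (sizeof(block) > guest_block_size) {
        return -1;
    }

    /* Step 1a: Generic Error Status Block */
    pos = put_le(block, pos, 1, 4);                        /* Block Status: uncorrectable */
    pos = put_le(block, pos, 0, 4);                        /* Raw Data Offset */
    pos = put_le(block, pos, 0, 4);                        /* Raw Data Length */
    pos = put_le(block, pos, GED_SIZE + MEM_CPER_SIZE, 4); /* Data Length */
    pos = put_le(block, pos, 0, 4);                        /* Error Severity: recoverable */

    /* Step 1b: Generic Error Data Entry */
    pos += 16;                                  /* Section Type GUID: zero in this sketch */
    pos = put_le(block, pos, 0, 4);             /* Error Severity: recoverable */
    pos = put_le(block, pos, 0x300, 2);         /* Revision */
    pos = put_le(block, pos, 0, 1);             /* Validation Bits */
    pos = put_le(block, pos, 0, 1);             /* Flags */
    pos = put_le(block, pos, MEM_CPER_SIZE, 4); /* Error Data Length */
    pos += 16 + 20 + 8;                         /* FRU Id, FRU Text, Timestamp: zero */

    /* Step 1c: Memory Error Section */
    pos = put_le(block, pos, (1ULL << 14) | (1ULL << 1), 8); /* Validation Bits */
    pos = put_le(block, pos, 0, 8);                          /* Error Status */
    pos = put_le(block, pos, error_paddr, 8);                /* Physical Address */
    pos += 48 + 1 + 7;                                       /* remaining fields: zero */

    assert(pos == sizeof(block));

    /* Step 3: the only write to "guest memory". */
    memcpy(guest_block, block, sizeof(block));
    return 0;
}
```

Since every record is complete on its own, nothing about the previous record has to be remembered between calls, which is why the data_length array is no longer needed in this shape.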
>
> > +
> > +    g_array_free(block, true);
> > +
> > +    return ACPI_GHES_CPER_OK;
> > +}
> > +
> >  /*
> >   * Hardware Error Notification
> >   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> > @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> >      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> >          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> >  }
> > +
> > +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> > +{
> > +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > +    int loop = 0;
> > +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
>                                          ^^^^^^^^^^^^^^^
> Forgot to mention in patch [3/6],
>
> Migration is definitely broken here, since ges.ghes_addr_le is
> not migrated to the target QEMU. For an example of how it should be done, see:
>   vmgenid_addr_le and vmstate_vmgenid
>
> For that you'd need to make ghes_addr_le a part of some device
> (the recently added hw/acpi/generic_event_device.c looks like a suitable victim)
>
>
> > +    bool ret = ACPI_GHES_CPER_FAIL;
> > +    uint8_t source_id;
>
> > +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> > +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
> put map at the beginning of this file
>
> s/const/static const/
> s/error_source_id/ghes_notify2source_id_map/
>  = { ...,
>      ACPI_HEST_SRC_ID_SEA,
>      ...,
>      ACPI_HEST_SRC_ID_RESERVED
>    }
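
A file-scope map along those lines could look like the sketch below. It is illustrative only: the SKETCH_* names are hypothetical, the index-to-source assignment mirrors the error_source_id[] array in the patch (index 8, ARMv8 SEA, maps to source 0, everything else is unassigned), and notify_to_source_id() is an assumed helper that also covers out-of-range input.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SRC_ID_SEA       0
#define SKETCH_SRC_ID_RESERVED  0xff
#define SKETCH_NOTIFY_SEA       8
#define SKETCH_NOTIFY_RESERVED  12  /* first reserved notification type */

/* One entry per notification type, indexed by the type value. */
static const uint8_t notify2source_id_map[SKETCH_NOTIFY_RESERVED] = {
    SKETCH_SRC_ID_RESERVED, SKETCH_SRC_ID_RESERVED, SKETCH_SRC_ID_RESERVED,
    SKETCH_SRC_ID_RESERVED, SKETCH_SRC_ID_RESERVED, SKETCH_SRC_ID_RESERVED,
    SKETCH_SRC_ID_RESERVED, SKETCH_SRC_ID_RESERVED,
    SKETCH_SRC_ID_SEA,      /* index 8: ARMv8 SEA */
    SKETCH_SRC_ID_RESERVED, SKETCH_SRC_ID_RESERVED, SKETCH_SRC_ID_RESERVED,
};

/* Returns the source id, or SKETCH_SRC_ID_RESERVED if there is none. */
static uint8_t notify_to_source_id(uint32_t notify)
{
    if (notify >= SKETCH_NOTIFY_RESERVED) {
        return SKETCH_SRC_ID_RESERVED;
    }
    return notify2source_id_map[notify];
}
```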
>
>
> > +
> > +    /*
> > +     * | +---------------------+ ges.ghes_addr_le
> > +     * | |error_block_address0 |
> > +     * | +---------------------+ --+--
> > +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> > +     * | +---------------------+ --+--
> > +     * | |error_block_addressN |
> > +     * | +---------------------+
> > +     * | | read_ack_register0  |
> > +     * | +---------------------+ --+--
> > +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> > +     * | +---------------------+ --+--
> > +     * | | read_ack_registerN  |
>
> above part is not necessary
>
> > +     * | +---------------------+ --+--
> > +     * | |      CPER           |   |
> > +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> > +     * | |      CPER           |   |
> > +     * | +---------------------+ --+--
> and this one is not precise, as it holds not only a CPER record but a
> Generic Error Status Block + Generic Error Data (with CPER inside)
>
> and looking at code here and spec I'm not sure we can actually do
> several Error Data Entries as implemented here, more on that later
>
> > +     * | |    ..........       |
> > +     * | +---------------------+
> > +     * | |      CPER           |
> > +     * | |      ....           |
> > +     * | |      CPER           |
> > +     * | +---------------------+
> > +     */
> > +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> > +        /* Find and check the source id for this new CPER */
> > +        source_id = error_source_id[notify];
> > +        if (source_id != 0xff) {
> > +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> > +        } else {
> > +            goto out;
> assert() ???
>
>
> > +        }
> > +
> > +        cpu_physical_memory_read(start_addr, &error_block_addr,
> > +                                 ACPI_GHES_ADDRESS_SIZE);
> > +
> > +        read_ack_register_addr = start_addr +
> > +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
> > +retry:
> > +        cpu_physical_memory_read(read_ack_register_addr,
> > +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> it's safer to use
>    sizeof(read_ack_register)
> instead of ACPI_GHES_ADDRESS_SIZE to make sure that stack won't be corrupted
> by accident later, the same applies to other reads.
>
> > +
> > +        /* zero means OSPM does not acknowledge the error */
> > +        if (!read_ack_register) {
> > +            if (loop < 3) {
> > +                usleep(100 * 1000);
> > +                loop++;
> > +                goto retry;
> at a minimum, this loop can stall the guest repeatedly for 0.3s if the guest
> triggers BQL, until it handles the error.
>
> (not sure what to suggest here though)
>
> > +            } else {
> > +                error_report("OSPM does not acknowledge previous error,"
> > +                    " so can not record CPER for current error, forcibly"
> > +                    " acknowledge previous error to avoid blocking next time"
> > +                    " CPER record! Exit");
>
> Also error overwrite goes against the spec, which says
> "
> Platforms with RAS
> controllers must prevent concurrent accesses to the Error Status Block (i.e., the RAS controller
> must not overwrite the Error Status Block before the OS has completed reading it).
> ******************
> "
> we probably shouldn't overwrite a block that has not been acked.
> The question is, what do bare metal machines do in this case?
>
> > +                read_ack_register = 1;
> > +                cpu_physical_memory_write(read_ack_register_addr,
> > +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> Function writes data as is, so one has to ensure that endianness of
> read_ack_register matches that of the spec/guest.
> The same applies to the code below marked with "^^^".
>
> > +            }
> > +        } else {
> > +            if (error_block_addr) {
>
> } else if () {
>
> > +                read_ack_register = 0;
> > +                /*
> > +                 * Clear the Read Ack Register, OSPM will write it to 1 when
> > +                 * acknowledge this error.
> > +                 */
> > +                cpu_physical_memory_write(read_ack_register_addr,
> > +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
>                          ^^^ - for 0 it doesn't really matter but conversion should be done
>                                 even if it's just for the sake of documenting interface
>
> > +                ret = acpi_ghes_record_mem_error(error_block_addr,
>                                                     ^^^^
>
> > +                          physical_address, acpi_ghes_data_length[source_id]);
>                              ^^^
>
> > +                if (ret == ACPI_GHES_CPER_OK) {
> > +                    acpi_ghes_data_length[source_id] +=
> > +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
> eventually we will run out of space and nothing short of QEMU restart will
> help to reclaim that.
>
> Also if you keep track of available space in QEMU,
> you'd also have to migrate it otherwise it's lost after migration.
> But maybe we don't need to keep a track of free space,
> see my other comment in acpi_ghes_record_mem_error()
>
> > +                }
> > +            }
> > +        }
> > +    }
> > +
> > +out:
> > +    return ret;
> > +}
> > diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> > index cb62ec9c7b..8e3c5b879e 100644
> > --- a/include/hw/acpi/acpi_ghes.h
> > +++ b/include/hw/acpi/acpi_ghes.h
> > @@ -24,6 +24,9 @@
> >
> >  #include "hw/acpi/bios-linker-loader.h"
> >
> > +#define ACPI_GHES_CPER_OK                   1
> > +#define ACPI_GHES_CPER_FAIL                 0
> > +
> >  /*
> >   * Values for Hardware Error Notification Type field
> >   */
> > @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> >
> >  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> >  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> > +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
> >  #endif
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > index 9d143282bc..321ead8115 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
> >  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
> >  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
> >
> > -#ifdef TARGET_I386
> > -#define KVM_HAVE_MCE_INJECTION 1
> > +#ifdef KVM_HAVE_MCE_INJECTION
> >  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
> >  #endif
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index d844ea21d8..c4fe6ccc63 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -28,6 +28,10 @@
> >  /* ARM processors have a weak memory model */
> >  #define TCG_GUEST_DEFAULT_MO      (0)
> >
> > +#ifdef TARGET_AARCH64
> > +#define KVM_HAVE_MCE_INJECTION 1
> > +#endif
> > +
> >  #define EXCP_UDEF            1   /* undefined instruction */
> >  #define EXCP_SWI             2   /* software interrupt */
> >  #define EXCP_PREFETCH_ABORT  3
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index 63815fc4cf..a9ce97efb1 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
> >               * Report exception with ESR indicating a fault due to a
> >               * translation table walk for a cache maintenance instruction.
> >               */
> > -            syn = syn_data_abort_no_iss(current_el == target_el,
> > +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
> >                                          fi.ea, 1, fi.s1ptw, 1, fsc);
> >              env->exception.vaddress = value;
> >              env->exception.fsr = fsr;
> > diff --git a/target/arm/internals.h b/target/arm/internals.h
> > index f5313dd3d4..28b8451d6d 100644
> > --- a/target/arm/internals.h
> > +++ b/target/arm/internals.h
> > @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
> >          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
> >  }
> >
> > -static inline uint32_t syn_data_abort_no_iss(int same_el,
> > +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
> >                                               int ea, int cm, int s1ptw,
> >                                               int wnr, int fsc)
> >  {
> >      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
> >             | ARM_EL_IL
> > -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
> > +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
> > +           | (wnr << 6) | fsc;
> >  }
> >
> >  static inline uint32_t syn_data_abort_with_iss(int same_el,
> > diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> > index 28f6db57d5..c7b7653d3f 100644
> > --- a/target/arm/kvm64.c
> > +++ b/target/arm/kvm64.c
> > @@ -28,6 +28,8 @@
> >  #include "kvm_arm.h"
> >  #include "hw/boards.h"
> >  #include "internals.h"
> > +#include "hw/acpi/acpi.h"
> > +#include "hw/acpi/acpi_ghes.h"
> >
> >  static bool have_guest_debug;
> >
> > @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
> >      return KVM_PUT_RUNTIME_STATE;
> >  }
> >
> > +/* Callers must hold the iothread mutex lock */
> > +static void kvm_inject_arm_sea(CPUState *c)
> > +{
> > +    ARMCPU *cpu = ARM_CPU(c);
> > +    CPUARMState *env = &cpu->env;
> > +    CPUClass *cc = CPU_GET_CLASS(c);
> > +    uint32_t esr;
> > +    bool same_el;
> > +
> > +    c->exception_index = EXCP_DATA_ABORT;
> > +    env->exception.target_el = 1;
> > +
> > +    /*
> > +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> > +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> > +     */
> > +    same_el = arm_current_el(env) == env->exception.target_el;
> > +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> > +
> > +    env->exception.syndrome = esr;
> > +
> > +    cc->do_interrupt(c);
> > +}
> > +
> >  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> >                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
> >
> > @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
> >      return ret;
> >  }
> >
> > +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> > +{
> > +    ram_addr_t ram_addr;
> > +    hwaddr paddr;
> > +
> > +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> you let BUS_MCEERR_AO in but then it's unused, so what's the purpose of allowing it?
>
> > +
> > +    if (acpi_enabled && addr &&
> > +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> > +        ram_addr = qemu_ram_addr_from_host(addr);
> > +        if (ram_addr != RAM_ADDR_INVALID &&
> > +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> > +            kvm_hwpoison_page_add(ram_addr);
> > +            /*
> > +             * Asynchronous signal will be masked by main thread, so
> > +             * only handle synchronous signal.
> > +             */
> > +            if (code == BUS_MCEERR_AR) {
> > +                kvm_cpu_synchronize_state(c);
> > +                if (ACPI_GHES_CPER_FAIL !=
> > +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> > +                    kvm_inject_arm_sea(c);
> > +                } else {
> > +                    fprintf(stderr, "failed to record the error\n");
>
> fprintf() shouldn't be used in new code,
> and another question is whether it's fine to ignore the error.
> Maybe we should use error_fatal in such cases?
>
> > +                }
> > +            }
> > +            return;
> > +        }
> > +        fprintf(stderr, "Hardware memory error for memory used by "
> > +                "QEMU itself instead of guest system!\n");
>
> > +    }
> > +
> > +    if (code == BUS_MCEERR_AR) {
> > +        fprintf(stderr, "Hardware memory error!\n");
> > +        exit(1);
> > +    }
> > +}
> > +
> >  /* C6.6.29 BRK instruction */
> >  static const uint32_t brk_insn = 0xd4200000;
> >
> > diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> > index 5feb312941..499672ebbc 100644
> > --- a/target/arm/tlb_helper.c
> > +++ b/target/arm/tlb_helper.c
> > @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
> >       * ISV field.
> >       */
> >      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> > -        syn = syn_data_abort_no_iss(same_el,
> > +        syn = syn_data_abort_no_iss(same_el, 0,
> >                                      ea, 0, s1ptw, is_write, fsc);
> >      } else {
> >          /*
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 5352c9ff55..f75a210f96 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -29,6 +29,8 @@
> >  /* The x86 has a strong memory model with some store-after-load re-ordering */
> >  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
> >
> > +#define KVM_HAVE_MCE_INJECTION 1
> > +
> >  /* Maximum instruction code size */
> >  #define TARGET_MAX_INSN_SIZE 16
> >
>
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
@ 2019-11-22 15:47       ` Beata Michalska
  0 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-22 15:47 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, ehabkost, kvm, mst, pbonzini, mtosatti, linuxarm,
	qemu-devel, shannon.zhaosl, Xiang Zheng, qemu-arm, james.morse,
	xuwei5, jonathan.cameron, wanghaibin.wang, gengdongjiu,
	Laszlo Ersek, rth

Hi,

On Fri, 15 Nov 2019 at 16:54, Igor Mammedov <imammedo@redhat.com> wrote:
>
> On Mon, 11 Nov 2019 09:40:47 +0800
> Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> > From: Dongjiu Geng <gengdongjiu@huawei.com>
> >
> > Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> > translates the host VA delivered by host to guest PA, then fills this PA
> > to guest APEI GHES memory, then notifies guest according to the SIGBUS
> > type.
> >
> > When guest accesses the poisoned memory, it will generate a Synchronous
> > External Abort(SEA). Then host kernel gets an APEI notification and calls
> > memory_failure() to unmap the affected page in stage 2, finally
> > returns to guest.
> >
> > Guest continues to access the PG_hwpoison page, it will trap to KVM as
> > stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> > Qemu, Qemu records this error address into guest APEI GHES memory and
> > notifies guest using Synchronous-External-Abort(SEA).
> >
> > In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> > in which we can setup the type of exception and the syndrome information.
> > When switching to guest, the target vcpu will jump to the synchronous
> > external abort vector table entry.
> >
> > The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> > ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> > not valid and holds an UNKNOWN value. These values will be set to KVM
> > register structures through KVM_SET_ONE_REG IOCTL.
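
As a worked example of the syndrome described above, the sketch below computes the resulting ESR value outside of QEMU. The function name sea_syndrome() and the SKETCH_* constants are hypothetical. The field positions follow the ESR_ELx layout for data aborts: EC in bits [31:26] (0x24 from a lower EL, 0x25 from the same EL), IL in bit 25, FnV in bit 10 and DFSC in bits [5:0].

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EC_DATAABORT  0x24u        /* data abort from a lower EL */
#define SKETCH_EC_SHIFT      26
#define SKETCH_ESR_IL        (1u << 25)   /* 32-bit instruction length */

/* Syndrome for a synchronous external abort with FAR marked not valid. */
static uint32_t sea_syndrome(int same_el)
{
    const uint32_t fnv = 1;       /* FnV: FAR is not valid */
    const uint32_t dfsc = 0x10;   /* DFSC: synchronous external abort */
    uint32_t ec = SKETCH_EC_DATAABORT + (same_el ? 1u : 0u);

    return (ec << SKETCH_EC_SHIFT) | SKETCH_ESR_IL | (fnv << 10) | dfsc;
}
```

All other ISS bits (EA, CM, S1PTW, WnR) are zero here, matching the arguments passed to syn_data_abort_no_iss() in kvm_inject_arm_sea().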
> >
> > Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> > Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
> >  include/hw/acpi/acpi_ghes.h |   4 +
> >  include/sysemu/kvm.h        |   3 +-
> >  target/arm/cpu.h            |   4 +
> >  target/arm/helper.c         |   2 +-
> >  target/arm/internals.h      |   5 +-
> >  target/arm/kvm64.c          |  64 ++++++++
> >  target/arm/tlb_helper.c     |   2 +-
> >  target/i386/cpu.h           |   2 +
> >  9 files changed, 377 insertions(+), 6 deletions(-)
> >
> > diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> > index 42c00ff3d3..f5b54990c0 100644
> > --- a/hw/acpi/acpi_ghes.c
> > +++ b/hw/acpi/acpi_ghes.c
> > @@ -39,6 +39,34 @@
> >  /* The max size in bytes for one error block */
> >  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> >
> > +/*
> > + * The total size of Generic Error Data Entry
> > + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> > + * Table 18-343 Generic Error Data Entry
> > + */
> > +#define ACPI_GHES_DATA_LENGTH               72
> > +
> > +/*
> > + * The memory section CPER size,
> > + * UEFI 2.6: N.2.5 Memory Error Section
> > + */
> maybe use one line comment
>
> > +#define ACPI_GHES_MEM_CPER_LENGTH           80
> > +
> > +/*
> > + * Masks for block_status flags
> > + */
> ditto
>
> > +#define ACPI_GEBS_UNCORRECTABLE         1
> > +
> > +/*
> > + * Values for error_severity field
> > + */
> ditto
>
> > +enum AcpiGenericErrorSeverity {
> > +    ACPI_CPER_SEV_RECOVERABLE,
> > +    ACPI_CPER_SEV_FATAL,
> > +    ACPI_CPER_SEV_CORRECTED,
> > +    ACPI_CPER_SEV_NONE,
> I'd assign values explicitly here
>   foo = x,
>   ...
>
> > +};
> > +
> >  /*
> >   * Now only support ARMv8 SEA notification type error source
> >   */
> > @@ -49,6 +77,16 @@
> >   */
> >  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >
> > +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> > +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> > +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> > +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> > +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> > +
> > +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> > +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> > +    0xED, 0x7C, 0x83, 0xB1)
> > +
> >  /*
> >   * | +--------------------------+ 0
> >   * | |        Header            |
> > @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >      uint64_t ghes_addr_le;
> >  } AcpiGhesState;
> >
> > +/*
> > + * Total size for Generic Error Status Block
> > + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> > + * Table 18-380 Generic Error Status Block
> > + */
> > +#define ACPI_GHES_GESB_SIZE                 20
>
> > +/* The offset of Data Length in Generic Error Status Block */
> > +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
>
> unused, drop it
>
> > +
> > +/*
> > + * Record the value of data length for each error status block to avoid getting
> > + * this value from guest.
> > + */
> > +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> > +
> > +/*
> > + * Generic Error Data Entry
> > + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> > + */
> > +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> > +                uint32_t error_severity, uint16_t revision,
> > +                uint8_t validation_bits, uint8_t flags,
> > +                uint32_t error_data_length, QemuUUID fru_id,
> > +                uint8_t *fru_text, uint64_t time_stamp)
> > +{
> > +    QemuUUID uuid_le;
> > +
> > +    /* Section Type */
> > +    uuid_le = qemu_uuid_bswap(section_type);
> > +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> > +
> > +    /* Error Severity */
> > +    build_append_int_noprefix(table, error_severity, 4);
> > +    /* Revision */
> > +    build_append_int_noprefix(table, revision, 2);
> > +    /* Validation Bits */
> > +    build_append_int_noprefix(table, validation_bits, 1);
> > +    /* Flags */
> > +    build_append_int_noprefix(table, flags, 1);
> > +    /* Error Data Length */
> > +    build_append_int_noprefix(table, error_data_length, 4);
> > +
> > +    /* FRU Id */
> > +    uuid_le = qemu_uuid_bswap(fru_id);
> > +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> > +
> > +    /* FRU Text */
> > +    g_array_append_vals(table, fru_text, 20);
> what if fru_text were shorter than 20 bytes?
>
> Suggest to pass length along or
> drop all fru handling in the caller and just hardcode here invalid fru with empty text,
> as function could be extended later, once there is something meaningful to put in fru.
>
>
> > +    /* Timestamp */
> > +    build_append_int_noprefix(table, time_stamp, 8);
> > +}
> > +
> > +/*
> > + * Generic Error Status Block
> > + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> > + */
> > +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> > +                uint32_t raw_data_offset, uint32_t raw_data_length,
> > +                uint32_t data_length, uint32_t error_severity)
> > +{
> > +    /* Block Status */
> > +    build_append_int_noprefix(table, block_status, 4);
> > +    /* Raw Data Offset */
> > +    build_append_int_noprefix(table, raw_data_offset, 4);
> > +    /* Raw Data Length */
> > +    build_append_int_noprefix(table, raw_data_length, 4);
> > +    /* Data Length */
> > +    build_append_int_noprefix(table, data_length, 4);
> > +    /* Error Severity */
> > +    build_append_int_noprefix(table, error_severity, 4);
> > +}
> > +
> > +/* UEFI 2.6: N.2.5 Memory Error Section */
> > +static void acpi_ghes_build_append_mem_cper(GArray *table,
> > +                                            uint64_t error_physical_addr)
> I'd split out this and acpi_ghes_generic_error_status() and
> acpi_ghes_generic_error_data()  functions into a separate patch.
>
> > +{
> > +    /*
> > +     * Memory Error Record
> > +     */
> > +
> > +    /* Validation Bits */
> > +    build_append_int_noprefix(table,
>
> > +                              (1UL << 14) | /* Type Valid */
> > +                              (1UL << 1) /* Physical Address Valid */,
> shouldn't it use ULL suffixes?
>
> > +                              8);
> > +    /* Error Status */
> > +    build_append_int_noprefix(table, 0, 8);
> > +    /* Physical Address */
> > +    build_append_int_noprefix(table, error_physical_addr, 8);
> > +    /* Skip all the detailed information normally found in such a record */
> > +    build_append_int_noprefix(table, 0, 48);
> > +    /* Memory Error Type */
> > +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> > +    /* Skip all the detailed information normally found in such a record */
> > +    build_append_int_noprefix(table, 0, 7);
> > +}
> > +
> > +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> > +                                      uint64_t error_physical_addr,
> > +                                      uint32_t data_length)
> > +{
> > +    GArray *block;
> > +    uint64_t current_block_length;
> > +    /* Memory Error Section Type */
> > +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
>                                ^^
> UEFI_CPER_SEC_PLATFORM_MEM is defined as BE, so _le here is wrong
> and then later you use qemu_uuid_bswap() to make it LE.
>
> Why not define it as LE to begin with, like it's been done for NVDIMM_UUID_LE?
>
Is there a chance to make it common for both?


>
> > +    QemuUUID fru_id = {};
> > +    uint8_t fru_text[20] = {};
> > +
> > +    /*
> > +     * Generic Error Status Block
> > +     * | +---------------------+
> > +     * | |     block_status    |
> > +     * | +---------------------+
> > +     * | |    raw_data_offset  |
> > +     * | +---------------------+
> > +     * | |    raw_data_length  |
> > +     * | +---------------------+
> > +     * | |     data_length     |
> > +     * | +---------------------+
> > +     * | |   error_severity    |
> > +     * | +---------------------+
> > +     */
> not necessary, just point to concrete part of ACPI spec if needed.
>
> > +    block = g_array_new(false, true /* clear */, 1);
> > +
> > +    /* The current whole length of the generic error status block */
> > +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> > +
> > +    /* This is the length if adding a new generic error data entry*/
> > +    data_length += ACPI_GHES_DATA_LENGTH;
> > +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> > +
> > +    /*
> > +     * Check whether it will run out of the preallocated memory if adding a new
> > +     * generic error data entry
> > +     */
> > +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> > +        error_report("Record CPER out of boundary!!!");
> > +        return ACPI_GHES_CPER_FAIL;
> > +    }
> > +
> > +    /* Build the new generic error status block header */
> > +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> > +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
> Down the road, the arguments are passed to build_append_int_noprefix() which takes
> numbers in host byte order, so manually calling cpu_to_le32() is wrong.
> just drop cpu_to_le32() here.
>
>
> > +
> > +    /* Write back above generic error status block header to guest memory */
> > +    cpu_physical_memory_write(error_block_address, block->data,
> > +                              block->len);
> > +
> > +    /* Add a new generic error data entry */
> > +
> > +    data_length = block->len;
> > +    /* Build this new generic error data entry header */
> > +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> > +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> > +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
> ditto
>
> > +
> > +    /* Build the memory section CPER for above new generic error data entry */
> > +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> > +
> > +    /* Write back above this new generic error data entry to guest memory */
> > +    cpu_physical_memory_write(error_block_address + current_block_length,
> > +        block->data + data_length, block->len - data_length);
>
> If I read it right, the first write builds an updated "Error Status Block"
> header whose "Data Length" accounts for an additional "Error Data Entry",
> and this second write then appends the new "Error Data Entry" after the
> previous one (if any existed).
>
> Now for GHESv2, OSPM is supposed to copy existing "Error Status Block" and Ack
> that fact via "Read Ack Register" and QEMU must not overwrite old data until
> they are acked by OSPM.
>
> With that in mind, appending a new error seems pointless, since the guest
> has already consumed any pre-existing error before we are able to write.
> So we can drop "Error Status Block" tracking and just
>  1. compose the whole "Error Status Block" with 1 new "Error Data Entry"
>  2. check that it fits into start, start+ACPI_GHES_MAX_RAW_DATA_LENGTH range
>  3. push it into guest RAM with a single write
>
> and drop all data_length tracking related code.
>
> > +
> > +    g_array_free(block, true);
> > +
> > +    return ACPI_GHES_CPER_OK;
> > +}
> > +
> >  /*
> >   * Hardware Error Notification
> >   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> > @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> >      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> >          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> >  }
> > +
> > +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> > +{
> > +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > +    int loop = 0;
> > +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
>                                          ^^^^^^^^^^^^^^^
> Forgot to mention in patch [3/6],
>
> Migration is definitely broken here, since ges.ghes_addr_le is
> not migrated to the target QEMU. For an example of how it should be done, see:
>   vmgenid_addr_le and vmstate_vmgenid
>
> For that you'd need to make ghes_addr_le a part of some device
> (the recently added hw/acpi/generic_event_device.c looks like a suitable victim)
>
>
> > +    bool ret = ACPI_GHES_CPER_FAIL;
> > +    uint8_t source_id;
>
> > +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> > +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
> put map at the beginning of this file
>
> s/const/static const/
> s/error_source_id/ghes_notify2source_id_map/
>  = { ...,
>      ACPI_HEST_SRC_ID_SEA,
>      ...,
>      ACPI_HEST_SRC_ID_RESERVED
>    }
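
A minimal sketch of such a map, assuming the same 12-entry layout as the
array in the patch (only index 8 maps to source id 0). The macro names and the
notify_to_source_id() helper are hypothetical and chosen for illustration; they
are not existing QEMU identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only the SEA source exists in this series. */
#define SRC_ID_SEA       0
#define SRC_ID_RESERVED  0xff
#define NOTIFY_RESERVED  12   /* number of entries in the patch's table */

/* Indexed by notification type; entry 8 is the one the patch maps to 0. */
static const uint8_t notify2source_id_map[NOTIFY_RESERVED] = {
    SRC_ID_RESERVED, SRC_ID_RESERVED, SRC_ID_RESERVED, SRC_ID_RESERVED,
    SRC_ID_RESERVED, SRC_ID_RESERVED, SRC_ID_RESERVED, SRC_ID_RESERVED,
    SRC_ID_SEA,
    SRC_ID_RESERVED, SRC_ID_RESERVED, SRC_ID_RESERVED,
};

/* Bounds-checked lookup; out-of-range types map to the reserved id. */
static uint8_t notify_to_source_id(uint32_t notify)
{
    if (notify >= NOTIFY_RESERVED) {
        return SRC_ID_RESERVED;
    }
    return notify2source_id_map[notify];
}
```

Keeping the table static const at file scope avoids rebuilding it on the stack
for every call, and the named entries make the single supported source obvious.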
>
>
> > +
> > +    /*
> > +     * | +---------------------+ ges.ghes_addr_le
> > +     * | |error_block_address0 |
> > +     * | +---------------------+ --+--
> > +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> > +     * | +---------------------+ --+--
> > +     * | |error_block_addressN |
> > +     * | +---------------------+
> > +     * | | read_ack_register0  |
> > +     * | +---------------------+ --+--
> > +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> > +     * | +---------------------+ --+--
> > +     * | | read_ack_registerN  |
>
> above part is not necessary
>
> > +     * | +---------------------+ --+--
> > +     * | |      CPER           |   |
> > +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> > +     * | |      CPER           |   |
> > +     * | +---------------------+ --+--
> and this one is not precise, as it holds not only the CPER record but the
> Generic Error Status Block + Generic Error Data (with CPER inside)
>
> and looking at code here and spec I'm not sure we can actually do
> several Error Data Entries as implemented here, more on that later
>
> > +     * | |    ..........       |
> > +     * | +---------------------+
> > +     * | |      CPER           |
> > +     * | |      ....           |
> > +     * | |      CPER           |
> > +     * | +---------------------+
> > +     */
> > +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> > +        /* Find and check the source id for this new CPER */
> > +        source_id = error_source_id[notify];
> > +        if (source_id != 0xff) {
> > +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> > +        } else {
> > +            goto out;
> assert() ???
>
>
> > +        }
> > +
> > +        cpu_physical_memory_read(start_addr, &error_block_addr,
> > +                                 ACPI_GHES_ADDRESS_SIZE);
> > +
> > +        read_ack_register_addr = start_addr +
> > +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
> > +retry:
> > +        cpu_physical_memory_read(read_ack_register_addr,
> > +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> it's safer to use
>    sizeof(read_ack_register)
> instead of ACPI_GHES_ADDRESS_SIZE, to make sure that the stack won't be
> corrupted by accident later. The same applies to the other reads.
>
> > +
> > +        /* zero means OSPM does not acknowledge the error */
> > +        if (!read_ack_register) {
> > +            if (loop < 3) {
> > +                usleep(100 * 1000);
> > +                loop++;
> > +                goto retry;
> At a minimum, this loop can stall the guest repeatedly for 0.3s if the guest
> triggers BQL, until it handles the error.
>
> (not sure what to suggest here though)
>
> > +            } else {
> > +                error_report("OSPM does not acknowledge previous error,"
> > +                    " so can not record CPER for current error, forcibly"
> > +                    " acknowledge previous error to avoid blocking next time"
> > +                    " CPER record! Exit");
>
> Also error overwrite goes against the spec, which says
> "
> Platforms with RAS
> controllers must prevent concurrent accesses to the Error Status Block (i.e., the RAS controller
> must not overwrite the Error Status Block before the OS has completed reading it).
> ******************
> "
> we probably shouldn't overwrite a block that has not been acked.
> The question is: what do bare metal machines do in this case?
>
> > +                read_ack_register = 1;
> > +                cpu_physical_memory_write(read_ack_register_addr,
> > +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> Function writes data as is, so one has to ensure that endianness of
> read_ack_register matches that of the spec/guest.
> The same applies to the code below marked with "^^^".
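
To illustrate the point about endianness (and the earlier one about
sizeof), here is a small self-contained sketch. guest_read_ack is a
hypothetical stand-in for the guest-visible register, and store_le64() /
load_le64() are written out by hand so the example builds on its own; in QEMU
the conversion would normally be done with cpu_to_le64() / le64_to_cpu().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the guest-visible Read Ack Register. */
static uint8_t guest_read_ack[8];

/* Encode a host integer as 8 little-endian bytes, whatever the host order. */
static void store_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Decode 8 little-endian bytes into a host integer. */
static uint64_t load_le64(const uint8_t in[8])
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)in[i] << (8 * i);
    }
    return v;
}

/* Write the register; the transfer size is derived from the variable. */
static void write_read_ack(uint64_t value)
{
    uint8_t raw[sizeof(value)];

    store_le64(raw, value);
    memcpy(guest_read_ack, raw, sizeof(raw));
}

/* Read the register back into host byte order. */
static uint64_t read_read_ack(void)
{
    uint8_t raw[sizeof(uint64_t)];

    memcpy(raw, guest_read_ack, sizeof(raw));
    return load_le64(raw);
}
```

Because the length passed to the copy comes from sizeof on the local buffer, a
later change to the variable's type cannot silently overrun the stack.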
>
> > +            }
> > +        } else {
> > +            if (error_block_addr) {
>
> } else if () {
>
> > +                read_ack_register = 0;
> > +                /*
> > +                 * Clear the Read Ack Register, OSPM will write it to 1 when
> > +                 * acknowledge this error.
> > +                 */
> > +                cpu_physical_memory_write(read_ack_register_addr,
> > +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
>                          ^^^ - for 0 it doesn't really matter but conversion should be done
>                                 even if it's just for the sake of documenting interface
>
> > +                ret = acpi_ghes_record_mem_error(error_block_addr,
>                                                     ^^^^
>
> > +                          physical_address, acpi_ghes_data_length[source_id]);
>                              ^^^
>
> > +                if (ret == ACPI_GHES_CPER_OK) {
> > +                    acpi_ghes_data_length[source_id] +=
> > +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
> eventually we will run out of space and nothing short of QEMU restart will
> help to reclaim that.
>
> Also, if you keep track of available space in QEMU,
> you'd also have to migrate it, otherwise it's lost after migration.
> But maybe we don't need to keep track of free space,
> see my other comment in acpi_ghes_record_mem_error()
>
> > +                }
> > +            }
> > +        }
> > +    }
> > +
> > +out:
> > +    return ret;
> > +}
> > diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> > index cb62ec9c7b..8e3c5b879e 100644
> > --- a/include/hw/acpi/acpi_ghes.h
> > +++ b/include/hw/acpi/acpi_ghes.h
> > @@ -24,6 +24,9 @@
> >
> >  #include "hw/acpi/bios-linker-loader.h"
> >
> > +#define ACPI_GHES_CPER_OK                   1
> > +#define ACPI_GHES_CPER_FAIL                 0
> > +
> >  /*
> >   * Values for Hardware Error Notification Type field
> >   */
> > @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> >
> >  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> >  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> > +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
> >  #endif
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > index 9d143282bc..321ead8115 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
> >  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
> >  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
> >
> > -#ifdef TARGET_I386
> > -#define KVM_HAVE_MCE_INJECTION 1
> > +#ifdef KVM_HAVE_MCE_INJECTION
> >  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
> >  #endif
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index d844ea21d8..c4fe6ccc63 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -28,6 +28,10 @@
> >  /* ARM processors have a weak memory model */
> >  #define TCG_GUEST_DEFAULT_MO      (0)
> >
> > +#ifdef TARGET_AARCH64
> > +#define KVM_HAVE_MCE_INJECTION 1
> > +#endif
> > +
> >  #define EXCP_UDEF            1   /* undefined instruction */
> >  #define EXCP_SWI             2   /* software interrupt */
> >  #define EXCP_PREFETCH_ABORT  3
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index 63815fc4cf..a9ce97efb1 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
> >               * Report exception with ESR indicating a fault due to a
> >               * translation table walk for a cache maintenance instruction.
> >               */
> > -            syn = syn_data_abort_no_iss(current_el == target_el,
> > +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
> >                                          fi.ea, 1, fi.s1ptw, 1, fsc);
> >              env->exception.vaddress = value;
> >              env->exception.fsr = fsr;
> > diff --git a/target/arm/internals.h b/target/arm/internals.h
> > index f5313dd3d4..28b8451d6d 100644
> > --- a/target/arm/internals.h
> > +++ b/target/arm/internals.h
> > @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
> >          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
> >  }
> >
> > -static inline uint32_t syn_data_abort_no_iss(int same_el,
> > +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
> >                                               int ea, int cm, int s1ptw,
> >                                               int wnr, int fsc)
> >  {
> >      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
> >             | ARM_EL_IL
> > -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
> > +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
> > +           | (wnr << 6) | fsc;
> >  }
> >
> >  static inline uint32_t syn_data_abort_with_iss(int same_el,
> > diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> > index 28f6db57d5..c7b7653d3f 100644
> > --- a/target/arm/kvm64.c
> > +++ b/target/arm/kvm64.c
> > @@ -28,6 +28,8 @@
> >  #include "kvm_arm.h"
> >  #include "hw/boards.h"
> >  #include "internals.h"
> > +#include "hw/acpi/acpi.h"
> > +#include "hw/acpi/acpi_ghes.h"
> >
> >  static bool have_guest_debug;
> >
> > @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
> >      return KVM_PUT_RUNTIME_STATE;
> >  }
> >
> > +/* Callers must hold the iothread mutex lock */
> > +static void kvm_inject_arm_sea(CPUState *c)
> > +{
> > +    ARMCPU *cpu = ARM_CPU(c);
> > +    CPUARMState *env = &cpu->env;
> > +    CPUClass *cc = CPU_GET_CLASS(c);
> > +    uint32_t esr;
> > +    bool same_el;
> > +
> > +    c->exception_index = EXCP_DATA_ABORT;
> > +    env->exception.target_el = 1;
> > +
> > +    /*
> > +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> > +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> > +     */
> > +    same_el = arm_current_el(env) == env->exception.target_el;
> > +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> > +
> > +    env->exception.syndrome = esr;
> > +
> > +    cc->do_interrupt(c);
> > +}
> > +
> >  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> >                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
> >
> > @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
> >      return ret;
> >  }
> >
> > +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> > +{
> > +    ram_addr_t ram_addr;
> > +    hwaddr paddr;
> > +
> > +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> you let BUS_MCEERR_AO in but then it's unused, so what's the purpose of allowing it?
>
> > +
> > +    if (acpi_enabled && addr &&
> > +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> > +        ram_addr = qemu_ram_addr_from_host(addr);
> > +        if (ram_addr != RAM_ADDR_INVALID &&
> > +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> > +            kvm_hwpoison_page_add(ram_addr);
> > +            /*
> > +             * Asynchronous signal will be masked by main thread, so
> > +             * only handle synchronous signal.
> > +             */
> > +            if (code == BUS_MCEERR_AR) {
> > +                kvm_cpu_synchronize_state(c);
> > +                if (ACPI_GHES_CPER_FAIL !=
> > +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> > +                    kvm_inject_arm_sea(c);
> > +                } else {
> > +                    fprintf(stderr, "failed to record the error\n");
>
> fprintf() shouldn't be used in new code,
> and another question: is it fine to ignore the error?
> Maybe we should use error_fatal in such cases?
>
> > +                }
> > +            }
> > +            return;
> > +        }
> > +        fprintf(stderr, "Hardware memory error for memory used by "
> > +                "QEMU itself instead of guest system!\n");
>
> > +    }
> > +
> > +    if (code == BUS_MCEERR_AR) {
> > +        fprintf(stderr, "Hardware memory error!\n");
> > +        exit(1);
> > +    }
> > +}
> > +
> >  /* C6.6.29 BRK instruction */
> >  static const uint32_t brk_insn = 0xd4200000;
> >
> > diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> > index 5feb312941..499672ebbc 100644
> > --- a/target/arm/tlb_helper.c
> > +++ b/target/arm/tlb_helper.c
> > @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
> >       * ISV field.
> >       */
> >      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> > -        syn = syn_data_abort_no_iss(same_el,
> > +        syn = syn_data_abort_no_iss(same_el, 0,
> >                                      ea, 0, s1ptw, is_write, fsc);
> >      } else {
> >          /*
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 5352c9ff55..f75a210f96 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -29,6 +29,8 @@
> >  /* The x86 has a strong memory model with some store-after-load re-ordering */
> >  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
> >
> > +#define KVM_HAVE_MCE_INJECTION 1
> > +
> >  /* Maximum instruction code size */
> >  #define TARGET_MAX_INSN_SIZE 16
> >
>
>



* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-22 15:42     ` Beata Michalska
@ 2019-11-25  9:23       ` Igor Mammedov
  -1 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-25  9:23 UTC (permalink / raw)
  To: Beata Michalska
  Cc: Xiang Zheng, pbonzini, mst, shannon.zhaosl, Peter Maydell,
	Laszlo Ersek, james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

On Fri, 22 Nov 2019 15:42:52 +0000
Beata Michalska <beata.michalska@linaro.org> wrote:

> Hi Xiang,
> 
> On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
> >
> > From: Dongjiu Geng <gengdongjiu@huawei.com>
> >
> > This patch implements APEI GHES Table generation via fw_cfg blobs. Now
> > it only supports ARMv8 SEA, a type of GHESv2 error source. Afterwards,
> > we can extend the supported types if needed. For the CPER section,
> > currently it is memory section because kernel mainly wants userspace to
> > handle the memory errors.
> >
> > This patch follows the spec ACPI 6.2 to build the Hardware Error Source
> > table. For more detailed information, please refer to document:
> > docs/specs/acpi_hest_ghes.rst
> >
> > Suggested-by: Laszlo Ersek <lersek@redhat.com>
> > Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> > Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  default-configs/arm-softmmu.mak |   1 +
> >  hw/acpi/Kconfig                 |   4 +
> >  hw/acpi/Makefile.objs           |   1 +
> >  hw/acpi/acpi_ghes.c             | 267 ++++++++++++++++++++++++++++++++
> >  hw/acpi/aml-build.c             |   2 +
> >  hw/arm/virt-acpi-build.c        |  12 ++
> >  include/hw/acpi/acpi_ghes.h     |  56 +++++++
> >  include/hw/acpi/aml-build.h     |   1 +
> >  8 files changed, 344 insertions(+)
> >  create mode 100644 hw/acpi/acpi_ghes.c
> >  create mode 100644 include/hw/acpi/acpi_ghes.h
> >
> > diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> > index 1f2e0e7fde..5722f3130e 100644
> > --- a/default-configs/arm-softmmu.mak
> > +++ b/default-configs/arm-softmmu.mak
> > @@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
> >  CONFIG_FSL_IMX7=y
> >  CONFIG_FSL_IMX6UL=y
> >  CONFIG_SEMIHOSTING=y
> > +CONFIG_ACPI_APEI=y
> > diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> > index 12e3f1e86e..ed8c34d238 100644
> > --- a/hw/acpi/Kconfig
> > +++ b/hw/acpi/Kconfig
> > @@ -23,6 +23,10 @@ config ACPI_NVDIMM
> >      bool
> >      depends on ACPI
> >
> > +config ACPI_APEI
> > +    bool
> > +    depends on ACPI
> > +
> >  config ACPI_PCI
> >      bool
> >      depends on ACPI && PCI
> > diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> > index 655a9c1973..84474b0ca8 100644
> > --- a/hw/acpi/Makefile.objs
> > +++ b/hw/acpi/Makefile.objs
> > @@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
> >  common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
> >  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
> >  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> > +common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o  
> 
> Minor: The 'acpi' prefix could be dropped - it does not seem to be used
> for other files (it is implied by the dir name).
> This also applies to most of the naming within this patch.
> 
> >  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
> >  common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
> >  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> > diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> > new file mode 100644
> > index 0000000000..42c00ff3d3
> > --- /dev/null
> > +++ b/hw/acpi/acpi_ghes.c
> > @@ -0,0 +1,267 @@
> > +/*
> > + * Support for generating APEI tables and recording CPER for Guests
> > + *
> > + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> > + *
> > + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > +
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > +
> > + * You should have received a copy of the GNU General Public License along
> > + * with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "hw/acpi/acpi.h"
> > +#include "hw/acpi/aml-build.h"
> > +#include "hw/acpi/acpi_ghes.h"
> > +#include "hw/nvram/fw_cfg.h"
> > +#include "sysemu/sysemu.h"
> > +#include "qemu/error-report.h"
> > +
> > +#define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
> > +#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
> > +
> > +/*
> > + * The size of Address field in Generic Address Structure.
> > + * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure.
> > + */
> > +#define ACPI_GHES_ADDRESS_SIZE              8
> > +  
> As already mentioned, you can safely drop this and use sizeof(uint64_t).
> 
> > +/* The max size in bytes for one error block */
> > +#define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> > +
> > +/*
> > + * Now only support ARMv8 SEA notification type error source
> > + */
> > +#define ACPI_GHES_ERROR_SOURCE_COUNT        1
> > +
> > +/*
> > + * Generic Hardware Error Source version 2
> > + */
> > +#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10  
> 
> Minor: this is actually a type so would be good if the name would
> reflect that somehow......
> 
> > +
> > +/*
> > + * | +--------------------------+ 0
> > + * | |        Header            |
> > + * | +--------------------------+ 40---+-
> > + * | | .................        |      |
> > + * | | error_status_address-----+ 60   |
> > + * | | .................        |      |
> > + * | | read_ack_register--------+ 104  92
> > + * | | read_ack_preserve        |      |
> > + * | | read_ack_write           |      |
> > + * + +--------------------------+ 132--+-
> > + *
> > + * From above GHES definition, the error status address offset is 60;
> > + * the Read Ack Register offset is 104, the whole size of GHESv2 is 92
> > + */
> > +  
> This could potentially land in the doc instead.
> Also, the GHES is actually part of the HEST, so your offsets are for the
> HEST, not the GHES itself, so the comment might be slightly misleading.
> 
> > +/* The error status address offset in GHES */
> > +#define ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(start_addr, n) (start_addr + \
> > +            60 + offsetof(struct AcpiGenericAddress, address) + n * 92)
> > +
> > +/* The Read Ack Register offset in GHES */
> > +#define ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(start_addr, n) (start_addr +\
> > +            104 + offsetof(struct AcpiGenericAddress, address) + n * 92)
> > +
> > +typedef struct AcpiGhesState {
> > +    uint64_t ghes_addr_le;
> > +} AcpiGhesState;
> > +  
> Minor: Why AcpiGhes*State*? And do we need the struct to track a single address?
> 
> > +/*
> > + * Hardware Error Notification
> > + * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> > + */  
> You are referencing older spec here. The commit message states
> 6.2 version. Not to mention that 4.0 did not support ARMv8 SEA source.
> You should not mention sections that do not correspond to the spec
> the patch is based on.

Normally we use the spec where the structure first appeared,
and use a later one only when there is no other choice
(i.e. the work uses/implements fields that weren't in the original structure
revision).


> 
> > +static void acpi_ghes_build_notify(GArray *table, const uint8_t type)  
> 
> As it has already been mentioned - the naming here could follow the existing
> convention. Also this function is creating Hardware Error Notification table
> which is not necessarily tightly connected to GHES
> Similarly this applies to the overall naming used within this patch.
> > +{
> > +        /* Type */
> > +        build_append_int_noprefix(table, type, 1);
> > +        /*
> > +         * Length:
> > +         * Total length of the structure in bytes
> > +         */
> > +        build_append_int_noprefix(table, 28, 1);
> > +        /* Configuration Write Enable */
> > +        build_append_int_noprefix(table, 0, 2);
> > +        /* Poll Interval */
> > +        build_append_int_noprefix(table, 0, 4);
> > +        /* Vector */
> > +        build_append_int_noprefix(table, 0, 4);
> > +        /* Switch To Polling Threshold Value */
> > +        build_append_int_noprefix(table, 0, 4);
> > +        /* Switch To Polling Threshold Window */
> > +        build_append_int_noprefix(table, 0, 4);
> > +        /* Error Threshold Value */
> > +        build_append_int_noprefix(table, 0, 4);
> > +        /* Error Threshold Window */
> > +        build_append_int_noprefix(table, 0, 4);  
> 
> Most of those fields are being set to the same single value.
> Why not cover it all in one go?


That's intentional.
Yes, it takes more lines of code, but it also makes comparing the
code against the spec much easier, as it practically matches the table
in the spec line by line.
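
As a rough, self-contained illustration of that style: one append per field,
in table order, so the field sizes can be checked against the Length value of
28 used in the patch. Table and append_int() are hypothetical stand-ins for
GArray and build_append_int_noprefix(), not the real QEMU helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size byte buffer standing in for QEMU's GArray. */
typedef struct {
    uint8_t data[64];
    size_t len;
} Table;

/* Append the low `size` bytes of `value` in little-endian order. */
static void append_int(Table *t, uint64_t value, size_t size)
{
    assert(t->len + size <= sizeof(t->data));
    for (size_t i = 0; i < size; i++) {
        t->data[t->len++] = (uint8_t)(value >> (8 * i));
    }
}

/* One call per field, in the same order as the patch. */
static void build_notify(Table *t, uint8_t type)
{
    append_int(t, type, 1); /* Type */
    append_int(t, 28, 1);   /* Length */
    append_int(t, 0, 2);    /* Configuration Write Enable */
    append_int(t, 0, 4);    /* Poll Interval */
    append_int(t, 0, 4);    /* Vector */
    append_int(t, 0, 4);    /* Switch To Polling Threshold Value */
    append_int(t, 0, 4);    /* Switch To Polling Threshold Window */
    append_int(t, 0, 4);    /* Error Threshold Value */
    append_int(t, 0, 4);    /* Error Threshold Window */
}
```

Summing the size arguments (1 + 1 + 2 + 6 * 4) gives 28, which is exactly the
kind of check that becomes hard once several zero fields are folded into one
call.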

> > +}
> > +
> > +/* Build table for the hardware error fw_cfg blob */
> > +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker)
> > +{
> > +    int i, error_status_block_offset;
> > +
> > +    /*
> > +     * | +--------------------------+
> > +     * | |    error_block_address   |
> > +     * | |      ..........          |
> > +     * | +--------------------------+
> > +     * | |    read_ack_register     |
> > +     * | |     ...........          |
> > +     * | +--------------------------+
> > +     * | |  Error Status Data Block |
> > +     * | |      ........            |
> > +     * | +--------------------------+
> > +     */
> > +
> > +    /* Build error_block_address */
> > +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > +        build_append_int_noprefix(hardware_errors, 0, ACPI_GHES_ADDRESS_SIZE);
> > +    }
> > +
> > +    /* Build read_ack_register */
> > +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > +        /*
> > +         * Initialize the value of read_ack_register to 1, so GHES can be
> > +         * writeable in the first time.
> > +         * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> > +         * (GHESv2 - Type 10)
> > +         */
> > +        build_append_int_noprefix(hardware_errors, 1, ACPI_GHES_ADDRESS_SIZE);  
> This is a bit of a simplification (justified to some extent), but this
> should take into account both the Read Ack Preserve and Read Ack Write
> masks, or at least a comment would be good.
> 
> Also, the above implies support only for GHESv2 (the 'Ack' regs are GHESv2
> specific), yet it iterates over all potentially available/supported
> hw error sources.
> At this point that is OK, but if the support gets extended this will no
> longer be valid: managing the 'Ack' regs should be properly guarded for
> GHESv2.

It's OK for code to be more complicated so that it is able to handle
other use cases in the future.
But if there aren't actual plans to add other use cases, then it's just
over-engineering, which might mislead a reader later on into trying to
figure out what's going on here
(modulo the case where we are defining ABI between guest and QEMU).

So I'd rather simplify the code if there aren't plans to extend it,
and let someone else generalize it when there is an actual need for it.

> 
> > +    }
> > +
> > +    /* Generic Error Status Block offset in the hardware error fw_cfg blob */
> > +    error_status_block_offset = hardware_errors->len;
> > +
> > +    /* Build Error Status Data Block */
> > +    build_append_int_noprefix(hardware_errors, 0,
> > +        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
> > +
> > +    /* Allocate guest memory for the hardware error fw_cfg blob */
> > +    bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
> > +                             hardware_errors, 1, false);
> > +
> > +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > +        /*
> > +         * Patch the address of Error Status Data Block into
> > +         * the error_block_address of hardware_errors fw_cfg blob
> > +         */
> > +        bios_linker_loader_add_pointer(linker,
> > +            ACPI_GHES_ERRORS_FW_CFG_FILE, ACPI_GHES_ADDRESS_SIZE * i,
> > +            ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> > +            error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
> > +    }
> > +
> > +    /*
> > +     * Write the address of hardware_errors fw_cfg blob into the
> > +     * hardware_errors_addr fw_cfg blob.
> > +     */
> > +    bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
> > +        0, ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> > +}
> > +
> > +/* Build Hardware Error Source Table */
> > +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_errors,
> > +                          BIOSLinker *linker)
> > +{
> > +    uint32_t hest_start = table_data->len;
> > +    uint32_t source_id = 0;
> > +
> > +    /* Hardware Error Source Table header*/
> > +    acpi_data_push(table_data, sizeof(AcpiTableHeader));
> > +
> > +    /* Error Source Count */
> > +    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> > +
> > +    /*
> > +     * Type:
> > +     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> > +     */
> > +    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> > +    /*
> > +     * Source Id
> > +     * Once we support more than one hardware error sources, we need to
> > +     * increase the value of this field.
> > +     */
> > +    build_append_int_noprefix(table_data, source_id, 2);
> > +    /* Related Source Id */
> > +    build_append_int_noprefix(table_data, 0xffff, 2);  
> 
> Would be nice to have a comment on the value used ->
> 'no alternate sources'

Usually we use the verbatim field name as in the spec table definition.
This way the reader can easily find it in the spec and read up on the meaning
of the values. This is done to avoid needlessly copying spec text into QEMU.
So if 0xffff is described in the table definition, then typically just
the field name comment is sufficient.

> > +    /* Flags */
> > +    build_append_int_noprefix(table_data, 0, 1);
> > +    /* Enabled */
> > +    build_append_int_noprefix(table_data, 1, 1);
> > +
> > +    /* Number of Records To Pre-allocate */
> > +    build_append_int_noprefix(table_data, 1, 4);
> > +    /* Max Sections Per Record */
> > +    build_append_int_noprefix(table_data, 1, 4);
> > +    /* Max Raw Data Length */
> > +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> > +
> > +    /* Error Status Address */
> > +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> > +                     4 /* QWord access */, 0);
> > +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> > +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
> > +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> > +        source_id * ACPI_GHES_ADDRESS_SIZE);
> > +
> > +    /*
> > +     * Notification Structure
> > +     * Now only enable ARMv8 SEA notification type
> > +     */
> > +    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
> > +
> > +    /* Error Status Block Length */
> > +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> > +
> > +    /*
> > +     * Read Ack Register
> > +     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> > +     * version 2 (GHESv2 - Type 10)
> > +     */
> > +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> > +                     4 /* QWord access */, 0);
> > +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> > +        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, 0),
> > +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> > +        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
> > +
> > +    /*
> > +     * Read Ack Preserve
> > +     * We only provide the first bit in Read Ack Register to OSPM to write
> > +     * while the other bits are preserved.
> > +     */
> > +    build_append_int_noprefix(table_data, ~0x1ULL, 8);
> > +    /* Read Ack Write */
> > +    build_append_int_noprefix(table_data, 0x1, 8);
> > +
> > +    build_header(linker, table_data, (void *)(table_data->data + hest_start),
> > +        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
> > +}
> > +  
> Already mentioned .... but ...
> the last few lines are GHESv2 specific, but it seems that HES/GHES/GHESv2
> are being mixed within this patch. It would be nice if those could be
> separated to ease future extensions.
> 
> BR
> 
> Beata
> 
> > +static AcpiGhesState ges;
> > +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> > +{
> > +
> > +    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > +    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
> > +
> > +    /* Create a read-only fw_cfg file for GHES */
> > +    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> > +                    request_block_size);
> > +
> > +    /* Create a read-write fw_cfg file for Address */
> > +    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> > +        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> > +}
> > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > index 2c3702b882..3681ec6e3d 100644
> > --- a/hw/acpi/aml-build.c
> > +++ b/hw/acpi/aml-build.c
> > @@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
> >      tables->table_data = g_array_new(false, true /* clear */, 1);
> >      tables->tcpalog = g_array_new(false, true /* clear */, 1);
> >      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> > +    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
> >      tables->linker = bios_linker_loader_init();
> >  }
> >
> > @@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
> >      g_array_free(tables->table_data, true);
> >      g_array_free(tables->tcpalog, mfre);
> >      g_array_free(tables->vmgenid, mfre);
> > +    g_array_free(tables->hardware_errors, mfre);
> >  }
> >
> >  /*
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index 4cd50175e0..1b1fd273e4 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -48,6 +48,7 @@
> >  #include "sysemu/reset.h"
> >  #include "kvm_arm.h"
> >  #include "migration/vmstate.h"
> > +#include "hw/acpi/acpi_ghes.h"
> >
> >  #define ARM_SPI_BASE 32
> >
> > @@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> >      acpi_add_table(table_offsets, tables_blob);
> >      build_spcr(tables_blob, tables->linker, vms);
> >
> > +    if (vms->ras) {
> > +        acpi_add_table(table_offsets, tables_blob);
> > +        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
> > +        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
> > +                             tables->linker);
> > +    }
> > +
> >      if (ms->numa_state->num_nodes > 0) {
> >          acpi_add_table(table_offsets, tables_blob);
> >          build_srat(tables_blob, tables->linker, vms);
> > @@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
> >      fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
> >                      acpi_data_len(tables.tcpalog));
> >
> > +    if (vms->ras) {
> > +        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
> > +    }
> > +
> >      build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
> >                                               build_state, tables.rsdp,
> >                                               ACPI_BUILD_RSDP_FILE, 0);
> > diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> > new file mode 100644
> > index 0000000000..cb62ec9c7b
> > --- /dev/null
> > +++ b/include/hw/acpi/acpi_ghes.h
> > @@ -0,0 +1,56 @@
> > +/*
> > + * Support for generating APEI tables and recording CPER for Guests
> > + *
> > + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> > + *
> > + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > +
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > +
> > + * You should have received a copy of the GNU General Public License along
> > + * with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#ifndef ACPI_GHES_H
> > +#define ACPI_GHES_H
> > +
> > +#include "hw/acpi/bios-linker-loader.h"
> > +
> > +/*
> > + * Values for Hardware Error Notification Type field
> > + */
> > +enum AcpiGhesNotifyType {
> > +    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
> > +    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
> > +    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
> > +    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
> > +    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
> > +    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
> > +    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
> > +    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
> > +    ACPI_GHES_NOTIFY_GPIO = 7,
> > +    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
> > +    ACPI_GHES_NOTIFY_SEA = 8,
> > +    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
> > +    ACPI_GHES_NOTIFY_SEI = 9,
> > +    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
> > +    ACPI_GHES_NOTIFY_GSIV = 10,
> > +    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
> > +    ACPI_GHES_NOTIFY_SDEI = 11,
> > +    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
> > +};
> > +
> > +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> > +                          BIOSLinker *linker);
> > +
> > +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> > +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> > +#endif
> > diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> > index de4a406568..8f13620701 100644
> > --- a/include/hw/acpi/aml-build.h
> > +++ b/include/hw/acpi/aml-build.h
> > @@ -220,6 +220,7 @@ struct AcpiBuildTables {
> >      GArray *rsdp;
> >      GArray *tcpalog;
> >      GArray *vmgenid;
> > +    GArray *hardware_errors;
> >      BIOSLinker *linker;
> >  } AcpiBuildTables;
> >
> > --
> > 2.19.1
> >
> >
> >  
> 



* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
@ 2019-11-25  9:23       ` Igor Mammedov
  0 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-25  9:23 UTC (permalink / raw)
  To: Beata Michalska
  Cc: Peter Maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl, Xiang Zheng,
	qemu-arm, james.morse, xuwei5, jonathan.cameron, pbonzini,
	Laszlo Ersek, rth

On Fri, 22 Nov 2019 15:42:52 +0000
Beata Michalska <beata.michalska@linaro.org> wrote:

> Hi Xiang,
> 
> On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
> >
> > From: Dongjiu Geng <gengdongjiu@huawei.com>
> >
> > This patch implements APEI GHES Table generation via fw_cfg blobs. Now
> > it only supports ARMv8 SEA, a type of GHESv2 error source. Afterwards,
> > we can extend the supported types if needed. For the CPER section,
> > currently it is memory section because kernel mainly wants userspace to
> > handle the memory errors.
> >
> > This patch follows the spec ACPI 6.2 to build the Hardware Error Source
> > table. For more detailed information, please refer to document:
> > docs/specs/acpi_hest_ghes.rst
> >
> > Suggested-by: Laszlo Ersek <lersek@redhat.com>
> > Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> > Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  default-configs/arm-softmmu.mak |   1 +
> >  hw/acpi/Kconfig                 |   4 +
> >  hw/acpi/Makefile.objs           |   1 +
> >  hw/acpi/acpi_ghes.c             | 267 ++++++++++++++++++++++++++++++++
> >  hw/acpi/aml-build.c             |   2 +
> >  hw/arm/virt-acpi-build.c        |  12 ++
> >  include/hw/acpi/acpi_ghes.h     |  56 +++++++
> >  include/hw/acpi/aml-build.h     |   1 +
> >  8 files changed, 344 insertions(+)
> >  create mode 100644 hw/acpi/acpi_ghes.c
> >  create mode 100644 include/hw/acpi/acpi_ghes.h
> >
> > diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
> > index 1f2e0e7fde..5722f3130e 100644
> > --- a/default-configs/arm-softmmu.mak
> > +++ b/default-configs/arm-softmmu.mak
> > @@ -40,3 +40,4 @@ CONFIG_FSL_IMX25=y
> >  CONFIG_FSL_IMX7=y
> >  CONFIG_FSL_IMX6UL=y
> >  CONFIG_SEMIHOSTING=y
> > +CONFIG_ACPI_APEI=y
> > diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> > index 12e3f1e86e..ed8c34d238 100644
> > --- a/hw/acpi/Kconfig
> > +++ b/hw/acpi/Kconfig
> > @@ -23,6 +23,10 @@ config ACPI_NVDIMM
> >      bool
> >      depends on ACPI
> >
> > +config ACPI_APEI
> > +    bool
> > +    depends on ACPI
> > +
> >  config ACPI_PCI
> >      bool
> >      depends on ACPI && PCI
> > diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
> > index 655a9c1973..84474b0ca8 100644
> > --- a/hw/acpi/Makefile.objs
> > +++ b/hw/acpi/Makefile.objs
> > @@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
> >  common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
> >  common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
> >  common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
> > +common-obj-$(CONFIG_ACPI_APEI) += acpi_ghes.o  
> 
> Minor: The 'acpi' prefix could be dropped - it does not seem to be used
> for other files (self-implied by the dir name).
> This also applies to most of the naming within this patch
> 
> >  common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
> >  common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
> >  common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
> > diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> > new file mode 100644
> > index 0000000000..42c00ff3d3
> > --- /dev/null
> > +++ b/hw/acpi/acpi_ghes.c
> > @@ -0,0 +1,267 @@
> > +/*
> > + * Support for generating APEI tables and recording CPER for Guests
> > + *
> > + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> > + *
> > + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > +
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > +
> > + * You should have received a copy of the GNU General Public License along
> > + * with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "hw/acpi/acpi.h"
> > +#include "hw/acpi/aml-build.h"
> > +#include "hw/acpi/acpi_ghes.h"
> > +#include "hw/nvram/fw_cfg.h"
> > +#include "sysemu/sysemu.h"
> > +#include "qemu/error-report.h"
> > +
> > +#define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
> > +#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
> > +
> > +/*
> > + * The size of Address field in Generic Address Structure.
> > + * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure.
> > + */
> > +#define ACPI_GHES_ADDRESS_SIZE              8
> > +  
> As already mentioned, you can safely drop this and use sizeof(uint64_t).
> 
> > +/* The max size in bytes for one error block */
> > +#define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> > +
> > +/*
> > + * Now only support ARMv8 SEA notification type error source
> > + */
> > +#define ACPI_GHES_ERROR_SOURCE_COUNT        1
> > +
> > +/*
> > + * Generic Hardware Error Source version 2
> > + */
> > +#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10  
> 
> Minor: this is actually a type so would be good if the name would
> reflect that somehow......
> 
> > +
> > +/*
> > + * | +--------------------------+ 0
> > + * | |        Header            |
> > + * | +--------------------------+ 40---+-
> > + * | | .................        |      |
> > + * | | error_status_address-----+ 60   |
> > + * | | .................        |      |
> > + * | | read_ack_register--------+ 104  92
> > + * | | read_ack_preserve        |      |
> > + * | | read_ack_write           |      |
> > + * + +--------------------------+ 132--+-
> > + *
> > + * From above GHES definition, the error status address offset is 60;
> > + * the Read Ack Register offset is 104, the whole size of GHESv2 is 92
> > + */
> > +  
> This could potentially land into the doc instead.
> Also the GHES is actually part of HEST so your offsets are for
> HEST not GHES itself so the comment might be slightly misleading
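
To make the point about HEST-relative offsets concrete, here is a minimal sketch that re-derives the numbers in the quoted diagram by summing field sizes. It assumes a 36-byte ACPI table header, a 12-byte Generic Address Structure whose Address field sits at byte 4, and the 28-byte notification structure built later in the patch; the struct and function names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes assumed from the ACPI spec and from the patch under review. */
enum {
    ACPI_TABLE_HEADER_SIZE = 36, /* standard ACPI table header */
    GAS_SIZE = 12,               /* Generic Address Structure */
    GAS_ADDRESS_OFFSET = 4,      /* offset of the Address field inside GAS */
    NOTIFY_SIZE = 28             /* Hardware Error Notification structure */
};

/* Hypothetical helper type: byte offsets measured from the start of HEST. */
struct hest_layout {
    size_t source_start;         /* first GHESv2 entry */
    size_t error_status_address; /* Error Status Address GAS */
    size_t read_ack_register;    /* Read Ack Register GAS */
    size_t source_size;          /* size of one GHESv2 entry */
};

static struct hest_layout hest_layout(void)
{
    struct hest_layout l;
    size_t off = 0;

    off += ACPI_TABLE_HEADER_SIZE;  /* HEST header */
    off += 4;                       /* Error Source Count */
    l.source_start = off;           /* 40 */

    off += 2;                       /* Type */
    off += 2;                       /* Source Id */
    off += 2;                       /* Related Source Id */
    off += 1;                       /* Flags */
    off += 1;                       /* Enabled */
    off += 4;                       /* Number of Records To Pre-allocate */
    off += 4;                       /* Max Sections Per Record */
    off += 4;                       /* Max Raw Data Length */
    l.error_status_address = off;   /* 60 */

    off += GAS_SIZE;                /* Error Status Address */
    off += NOTIFY_SIZE;             /* Notification Structure */
    off += 4;                       /* Error Status Block Length */
    l.read_ack_register = off;      /* 104 */

    off += GAS_SIZE;                /* Read Ack Register */
    off += 8;                       /* Read Ack Preserve */
    off += 8;                       /* Read Ack Write */
    l.source_size = off - l.source_start; /* 92 */

    return l;
}

/* Offset of the pointer the linker patches for error source n. */
static size_t error_status_pointer_offset(size_t hest_start, unsigned n)
{
    struct hest_layout l = hest_layout();

    return hest_start + l.error_status_address + GAS_ADDRESS_OFFSET +
           n * l.source_size;
}
```

Under these assumptions the sums reproduce 60, 104 and 92 from the comment, and they only hold when counted from the start of HEST, not from the start of the GHES entry.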
> 
> > +/* The error status address offset in GHES */
> > +#define ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(start_addr, n) (start_addr + \
> > +            60 + offsetof(struct AcpiGenericAddress, address) + n * 92)
> > +
> > +/* The Read Ack Register offset in GHES */
> > +#define ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(start_addr, n) (start_addr +\
> > +            104 + offsetof(struct AcpiGenericAddress, address) + n * 92)
> > +
> > +typedef struct AcpiGhesState {
> > +    uint64_t ghes_addr_le;
> > +} AcpiGhesState;
> > +  
> Minor: Why AcpiGhes*State* ? And do we need the struct to track single address?
> 
> > +/*
> > + * Hardware Error Notification
> > + * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> > + */  
> You are referencing older spec here. The commit message states
> 6.2 version. Not to mention that 4.0 did not support ARMv8 SEA source.
> You should not mention sections that do not correspond to the spec
> the patch is based on.

Normally we reference the spec revision where the structure first appeared,
and use a later one only when there is no other choice
(i.e. the code uses/implements fields that weren't in the original structure revision).


> 
> > +static void acpi_ghes_build_notify(GArray *table, const uint8_t type)  
> 
> As it has already been mentioned - the naming here could follow the existing
> convention. Also this function is creating Hardware Error Notification table
> which is not necessarily tightly connected to GHES
> Similarly this applies to the overall naming used within this patch.
> > +{
> > +        /* Type */
> > +        build_append_int_noprefix(table, type, 1);
> > +        /*
> > +         * Length:
> > +         * Total length of the structure in bytes
> > +         */
> > +        build_append_int_noprefix(table, 28, 1);
> > +        /* Configuration Write Enable */
> > +        build_append_int_noprefix(table, 0, 2);
> > +        /* Poll Interval */
> > +        build_append_int_noprefix(table, 0, 4);
> > +        /* Vector */
> > +        build_append_int_noprefix(table, 0, 4);
> > +        /* Switch To Polling Threshold Value */
> > +        build_append_int_noprefix(table, 0, 4);
> > +        /* Switch To Polling Threshold Window */
> > +        build_append_int_noprefix(table, 0, 4);
> > +        /* Error Threshold Value */
> > +        build_append_int_noprefix(table, 0, 4);
> > +        /* Error Threshold Window */
> > +        build_append_int_noprefix(table, 0, 4);  
> 
> Most of  those fields are being set to the same single value.
> Why not covering it all in one go ?


That's intentional.
Yes, it takes more lines of code, but it also makes comparing the
code against the spec much easier, as it practically matches the table
in the spec line by line.
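
As an illustration of that field-by-field style, here is a minimal self-contained sketch that mirrors the quoted function and checks that the appended fields add up to the Length value of 28. The `struct blob` type and `append_le()` helper are hypothetical stand-ins for GArray and build_append_int_noprefix(), not QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the GArray-backed table blob. */
struct blob {
    uint8_t data[64];
    size_t len;
};

/* Append 'size' bytes of 'value' in little-endian order. */
static void append_le(struct blob *b, uint64_t value, size_t size)
{
    assert(size <= 8 && b->len + size <= sizeof(b->data));
    for (size_t i = 0; i < size; i++) {
        b->data[b->len++] = (uint8_t)(value >> (8 * i));
    }
}

/* One append per spec table row, in spec order. */
static void build_notify(struct blob *b, uint8_t type)
{
    size_t start = b->len;

    append_le(b, type, 1);  /* Type */
    append_le(b, 28, 1);    /* Length */
    append_le(b, 0, 2);     /* Configuration Write Enable */
    append_le(b, 0, 4);     /* Poll Interval */
    append_le(b, 0, 4);     /* Vector */
    append_le(b, 0, 4);     /* Switch To Polling Threshold Value */
    append_le(b, 0, 4);     /* Switch To Polling Threshold Window */
    append_le(b, 0, 4);     /* Error Threshold Value */
    append_le(b, 0, 4);     /* Error Threshold Window */

    /* The rows must add up to the Length field written above. */
    assert(b->len - start == 28);
}
```

With one call per row, a mismatch between the Length field and the actual structure size is easy to spot during review.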

> > +}
> > +
> > +/* Build table for the hardware error fw_cfg blob */
> > +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker)
> > +{
> > +    int i, error_status_block_offset;
> > +
> > +    /*
> > +     * | +--------------------------+
> > +     * | |    error_block_address   |
> > +     * | |      ..........          |
> > +     * | +--------------------------+
> > +     * | |    read_ack_register     |
> > +     * | |     ...........          |
> > +     * | +--------------------------+
> > +     * | |  Error Status Data Block |
> > +     * | |      ........            |
> > +     * | +--------------------------+
> > +     */
> > +
> > +    /* Build error_block_address */
> > +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > +        build_append_int_noprefix(hardware_errors, 0, ACPI_GHES_ADDRESS_SIZE);
> > +    }
> > +
> > +    /* Build read_ack_register */
> > +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > +        /*
> > +         * Initialize the value of read_ack_register to 1, so GHES can be
> > +         * writeable in the first time.
> > +         * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> > +         * (GHESv2 - Type 10)
> > +         */
> > +        build_append_int_noprefix(hardware_errors, 1, ACPI_GHES_ADDRESS_SIZE);  
> This is a bit of a simplification (justified to some extent) but this
> should take into account both Read Ack Preserve and Read Ack Write masks.....
> or having at least a comment would be good
> 
> Also the above implies support only for GHESv2 (the 'Ack' regs are GHESv2
> specific) still this is iterating over potentially available/supported
> hw error sources
> At this point it is ok but if the support gets extended this will not
> be valid - managing 'Ack' regs should be properly guarded for GHESv2 ..

It's OK for code to be more complicated so that it can handle
other use cases in the future.
But if there are no actual plans to add other use cases, then it's just
over-engineering, which might mislead a reader later on who is trying to
figure out what's going on here
(modulo the case where we are defining ABI between guest and QEMU).

So I'd rather simplify the code if there are no plans to extend it,
and let someone else generalize it when there is an actual need for it.
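
For reference, a minimal sketch of how the two masks from the quoted patch combine when the guest acknowledges an error. It assumes the usual GHESv2 read-modify-write (keep the bits selected by Read Ack Preserve, then OR in Read Ack Write) and a register bit offset of 0, as in the patch; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values taken from the quoted patch. */
#define READ_ACK_PRESERVE (~0x1ULL) /* every bit except bit 0 is preserved */
#define READ_ACK_WRITE    0x1ULL    /* bit 0 is set on acknowledge */

/* Value OSPM would write back to the Read Ack Register. */
static uint64_t ghes_ack_value(uint64_t reg)
{
    return (reg & READ_ACK_PRESERVE) | READ_ACK_WRITE;
}

/* Small self-check of the mask arithmetic. */
static void ghes_ack_selfcheck(void)
{
    assert(ghes_ack_value(0x0) == 0x1);   /* bit 0 gets set */
    assert(ghes_ack_value(0xF0) == 0xF1); /* other bits are untouched */
    assert(ghes_ack_value(0x1) == 0x1);   /* acknowledging twice is harmless */
}
```

With these masks, initializing the register to 1 is indistinguishable from a state where the guest has already acknowledged a previous error, which is why the first error can be recorded.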

> 
> > +    }
> > +
> > +    /* Generic Error Status Block offset in the hardware error fw_cfg blob */
> > +    error_status_block_offset = hardware_errors->len;
> > +
> > +    /* Build Error Status Data Block */
> > +    build_append_int_noprefix(hardware_errors, 0,
> > +        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
> > +
> > +    /* Allocate guest memory for the hardware error fw_cfg blob */
> > +    bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
> > +                             hardware_errors, 1, false);
> > +
> > +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > +        /*
> > +         * Patch the address of Error Status Data Block into
> > +         * the error_block_address of hardware_errors fw_cfg blob
> > +         */
> > +        bios_linker_loader_add_pointer(linker,
> > +            ACPI_GHES_ERRORS_FW_CFG_FILE, ACPI_GHES_ADDRESS_SIZE * i,
> > +            ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> > +            error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
> > +    }
> > +
> > +    /*
> > +     * Write the address of hardware_errors fw_cfg blob into the
> > +     * hardware_errors_addr fw_cfg blob.
> > +     */
> > +    bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
> > +        0, ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> > +}
> > +
> > +/* Build Hardware Error Source Table */
> > +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_errors,
> > +                          BIOSLinker *linker)
> > +{
> > +    uint32_t hest_start = table_data->len;
> > +    uint32_t source_id = 0;
> > +
> > +    /* Hardware Error Source Table header*/
> > +    acpi_data_push(table_data, sizeof(AcpiTableHeader));
> > +
> > +    /* Error Source Count */
> > +    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> > +
> > +    /*
> > +     * Type:
> > +     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> > +     */
> > +    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> > +    /*
> > +     * Source Id
> > +     * Once we support more than one hardware error sources, we need to
> > +     * increase the value of this field.
> > +     */
> > +    build_append_int_noprefix(table_data, source_id, 2);
> > +    /* Related Source Id */
> > +    build_append_int_noprefix(table_data, 0xffff, 2);  
> 
> Would be nice to have a comment on the value used ->
> 'no alternate sources'

Usually we use the verbatim field name from the spec table definition.
This way the reader can easily find it in the spec and read up on the meaning
of the values. This is done to avoid needlessly copying spec text into QEMU.
So if 0xffff is described in the table definition, then typically just
the field name comment is sufficient.

> > +    /* Flags */
> > +    build_append_int_noprefix(table_data, 0, 1);
> > +    /* Enabled */
> > +    build_append_int_noprefix(table_data, 1, 1);
> > +
> > +    /* Number of Records To Pre-allocate */
> > +    build_append_int_noprefix(table_data, 1, 4);
> > +    /* Max Sections Per Record */
> > +    build_append_int_noprefix(table_data, 1, 4);
> > +    /* Max Raw Data Length */
> > +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> > +
> > +    /* Error Status Address */
> > +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> > +                     4 /* QWord access */, 0);
> > +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> > +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
> > +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> > +        source_id * ACPI_GHES_ADDRESS_SIZE);
> > +
> > +    /*
> > +     * Notification Structure
> > +     * Now only enable ARMv8 SEA notification type
> > +     */
> > +    acpi_ghes_build_notify(table_data, ACPI_GHES_NOTIFY_SEA);
> > +
> > +    /* Error Status Block Length */
> > +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> > +
> > +    /*
> > +     * Read Ack Register
> > +     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> > +     * version 2 (GHESv2 - Type 10)
> > +     */
> > +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> > +                     4 /* QWord access */, 0);
> > +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> > +        ACPI_GHES_READ_ACK_REGISTER_ADDRESS_OFFSET(hest_start, 0),
> > +        ACPI_GHES_ADDRESS_SIZE, ACPI_GHES_ERRORS_FW_CFG_FILE,
> > +        (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * ACPI_GHES_ADDRESS_SIZE);
> > +
> > +    /*
> > +     * Read Ack Preserve
> > +     * We only provide the first bit in Read Ack Register to OSPM to write
> > +     * while the other bits are preserved.
> > +     */
> > +    build_append_int_noprefix(table_data, ~0x1ULL, 8);
> > +    /* Read Ack Write */
> > +    build_append_int_noprefix(table_data, 0x1, 8);
> > +
> > +    build_header(linker, table_data, (void *)(table_data->data + hest_start),
> > +        "HEST", table_data->len - hest_start, 1, NULL, "GHES");
> > +}
> > +  
> Already mentioned .... but ...
> the last few lines are GHESv2 specific but it seems that HES/GHES/GHESv2
> are being mixed within this patch. Would be nice if those could be separated
> to ease future extensions
> 
> BR
> 
> Beata
> 
> > +static AcpiGhesState ges;
> > +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> > +{
> > +
> > +    size_t size = 2 * ACPI_GHES_ADDRESS_SIZE + ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > +    size_t request_block_size = ACPI_GHES_ERROR_SOURCE_COUNT * size;
> > +
> > +    /* Create a read-only fw_cfg file for GHES */
> > +    fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> > +                    request_block_size);
> > +
> > +    /* Create a read-write fw_cfg file for Address */
> > +    fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> > +        NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> > +}
> > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > index 2c3702b882..3681ec6e3d 100644
> > --- a/hw/acpi/aml-build.c
> > +++ b/hw/acpi/aml-build.c
> > @@ -1578,6 +1578,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
> >      tables->table_data = g_array_new(false, true /* clear */, 1);
> >      tables->tcpalog = g_array_new(false, true /* clear */, 1);
> >      tables->vmgenid = g_array_new(false, true /* clear */, 1);
> > +    tables->hardware_errors = g_array_new(false, true /* clear */, 1);
> >      tables->linker = bios_linker_loader_init();
> >  }
> >
> > @@ -1588,6 +1589,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
> >      g_array_free(tables->table_data, true);
> >      g_array_free(tables->tcpalog, mfre);
> >      g_array_free(tables->vmgenid, mfre);
> > +    g_array_free(tables->hardware_errors, mfre);
> >  }
> >
> >  /*
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index 4cd50175e0..1b1fd273e4 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -48,6 +48,7 @@
> >  #include "sysemu/reset.h"
> >  #include "kvm_arm.h"
> >  #include "migration/vmstate.h"
> > +#include "hw/acpi/acpi_ghes.h"
> >
> >  #define ARM_SPI_BASE 32
> >
> > @@ -825,6 +826,13 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> >      acpi_add_table(table_offsets, tables_blob);
> >      build_spcr(tables_blob, tables->linker, vms);
> >
> > +    if (vms->ras) {
> > +        acpi_add_table(table_offsets, tables_blob);
> > +        acpi_ghes_build_error_table(tables->hardware_errors, tables->linker);
> > +        acpi_ghes_build_hest(tables_blob, tables->hardware_errors,
> > +                             tables->linker);
> > +    }
> > +
> >      if (ms->numa_state->num_nodes > 0) {
> >          acpi_add_table(table_offsets, tables_blob);
> >          build_srat(tables_blob, tables->linker, vms);
> > @@ -942,6 +950,10 @@ void virt_acpi_setup(VirtMachineState *vms)
> >      fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
> >                      acpi_data_len(tables.tcpalog));
> >
> > +    if (vms->ras) {
> > +        acpi_ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
> > +    }
> > +
> >      build_state->rsdp_mr = acpi_add_rom_blob(virt_acpi_build_update,
> >                                               build_state, tables.rsdp,
> >                                               ACPI_BUILD_RSDP_FILE, 0);
> > diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> > new file mode 100644
> > index 0000000000..cb62ec9c7b
> > --- /dev/null
> > +++ b/include/hw/acpi/acpi_ghes.h
> > @@ -0,0 +1,56 @@
> > +/*
> > + * Support for generating APEI tables and recording CPER for Guests
> > + *
> > + * Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
> > + *
> > + * Author: Dongjiu Geng <gengdongjiu@huawei.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > +
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > +
> > + * You should have received a copy of the GNU General Public License along
> > + * with this program; if not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#ifndef ACPI_GHES_H
> > +#define ACPI_GHES_H
> > +
> > +#include "hw/acpi/bios-linker-loader.h"
> > +
> > +/*
> > + * Values for Hardware Error Notification Type field
> > + */
> > +enum AcpiGhesNotifyType {
> > +    ACPI_GHES_NOTIFY_POLLED = 0,    /* Polled */
> > +    ACPI_GHES_NOTIFY_EXTERNAL = 1,  /* External Interrupt */
> > +    ACPI_GHES_NOTIFY_LOCAL = 2, /* Local Interrupt */
> > +    ACPI_GHES_NOTIFY_SCI = 3,   /* SCI */
> > +    ACPI_GHES_NOTIFY_NMI = 4,   /* NMI */
> > +    ACPI_GHES_NOTIFY_CMCI = 5,  /* CMCI, ACPI 5.0: 18.3.2.7, Table 18-290 */
> > +    ACPI_GHES_NOTIFY_MCE = 6,   /* MCE, ACPI 5.0: 18.3.2.7, Table 18-290 */
> > +    /* GPIO-Signal, ACPI 6.0: 18.3.2.7, Table 18-332 */
> > +    ACPI_GHES_NOTIFY_GPIO = 7,
> > +    /* ARMv8 SEA, ACPI 6.1: 18.3.2.9, Table 18-345 */
> > +    ACPI_GHES_NOTIFY_SEA = 8,
> > +    /* ARMv8 SEI, ACPI 6.1: 18.3.2.9, Table 18-345 */
> > +    ACPI_GHES_NOTIFY_SEI = 9,
> > +    /* External Interrupt - GSIV, ACPI 6.1: 18.3.2.9, Table 18-345 */
> > +    ACPI_GHES_NOTIFY_GSIV = 10,
> > +    /* Software Delegated Exception, ACPI 6.2: 18.3.2.9, Table 18-383 */
> > +    ACPI_GHES_NOTIFY_SDEI = 11,
> > +    ACPI_GHES_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
> > +};
> > +
> > +void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> > +                          BIOSLinker *linker);
> > +
> > +void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> > +void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> > +#endif
> > diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> > index de4a406568..8f13620701 100644
> > --- a/include/hw/acpi/aml-build.h
> > +++ b/include/hw/acpi/aml-build.h
> > @@ -220,6 +220,7 @@ struct AcpiBuildTables {
> >      GArray *rsdp;
> >      GArray *tcpalog;
> >      GArray *vmgenid;
> > +    GArray *hardware_errors;
> >      BIOSLinker *linker;
> >  } AcpiBuildTables;
> >
> > --
> > 2.19.1
> >
> >
> >  
> 




* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-22 15:47       ` Beata Michalska
@ 2019-11-25  9:37         ` Igor Mammedov
  -1 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-25  9:37 UTC (permalink / raw)
  To: Beata Michalska
  Cc: Xiang Zheng, Peter Maydell, ehabkost, kvm, mst, wanghaibin.wang,
	mtosatti, linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl,
	qemu-arm, james.morse, xuwei5, jonathan.cameron, pbonzini,
	Laszlo Ersek, rth

On Fri, 22 Nov 2019 15:47:24 +0000
Beata Michalska <beata.michalska@linaro.org> wrote:

> Hi,
> 
> On Fri, 15 Nov 2019 at 16:54, Igor Mammedov <imammedo@redhat.com> wrote:
> >
> > On Mon, 11 Nov 2019 09:40:47 +0800
> > Xiang Zheng <zhengxiang9@huawei.com> wrote:
> >  
> > > From: Dongjiu Geng <gengdongjiu@huawei.com>
> > >
> > > Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> > > translates the host VA delivered by host to guest PA, then fills this PA
> > > to guest APEI GHES memory, then notifies guest according to the SIGBUS
> > > type.
> > >
> > > When guest accesses the poisoned memory, it will generate a Synchronous
> > > External Abort(SEA). Then host kernel gets an APEI notification and calls
> > > memory_failure() to unmap the affected page in stage 2, finally
> > > returns to guest.
> > >
> > > Guest continues to access the PG_hwpoison page, it will trap to KVM as
> > > stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> > > Qemu, Qemu records this error address into guest APEI GHES memory and
> > > notifies guest using Synchronous-External-Abort(SEA).
> > >
> > > In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> > > in which we can setup the type of exception and the syndrome information.
> > > When switching to guest, the target vcpu will jump to the synchronous
> > > external abort vector table entry.
> > >
> > > The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> > > ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> > > not valid and hold an UNKNOWN value. These values will be set to KVM
> > > register structures through KVM_SET_ONE_REG IOCTL.
> > >
> > > Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> > > Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> > > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
[...]
> > > +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> > > +                                      uint64_t error_physical_addr,
> > > +                                      uint32_t data_length)
> > > +{
> > > +    GArray *block;
> > > +    uint64_t current_block_length;
> > > +    /* Memory Error Section Type */
> > > +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;  
> >                                ^^
> > UEFI_CPER_SEC_PLATFORM_MEM is defined as BE, so _le here is wrong
> > and then later you use qemu_uuid_bswap() to make it LE.
> >
> > Why not define it as LE to begin with, like it's been done for NVDIMM_UUID_LE?
> >  
> Is there a chance to make it common for both ?

sure, it just should be a separate patch.

Maybe put it in include/qemu/uuid.h
or maybe make qemu_uuid_parse() return QemuUUID
so we could initialize like this:
  QemuUUID mem_section_id_le = qemu_uuid_parse("00000000-0000-0000-0000-000000000000", &error_abort);
where the UUID value used is easy to read and compare with the spec.
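
To illustrate the BE vs. LE point, here is a minimal sketch of the field swap that separates the two layouts. It is similar in spirit to the qemu_uuid_bswap() call mentioned above, but Uuid16 and uuid_swap_fields() are hypothetical names used only for this example, not QEMU API. It assumes the usual GUID convention that only the first three fields (4, 2 and 2 bytes) differ in byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QemuUUID: 16 raw bytes. */
typedef struct {
    uint8_t data[16];
} Uuid16;

/* Reverse `len` bytes in place. */
static void reverse_bytes(uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len / 2; i++) {
        uint8_t tmp = p[i];
        p[i] = p[len - 1 - i];
        p[len - 1 - i] = tmp;
    }
}

/*
 * Convert between the big-endian (text order) layout and the
 * mixed-endian layout used for GUIDs in UEFI/ACPI structures:
 * the first three fields are byte-swapped, the last 8 bytes are
 * left untouched. Applying the function twice is a no-op.
 */
static Uuid16 uuid_swap_fields(Uuid16 in)
{
    reverse_bytes(&in.data[0], 4);
    reverse_bytes(&in.data[4], 2);
    reverse_bytes(&in.data[6], 2);
    return in;
}
```

Defining the constant directly in the swapped order, as suggested, would make the run-time swap unnecessary.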

[...]


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-18 13:21           ` Michael S. Tsirkin
@ 2019-11-25  9:48             ` Igor Mammedov
  -1 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-25  9:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: gengdongjiu, peter.maydell, ehabkost, kvm, wanghaibin.wang,
	mtosatti, qemu-devel, linuxarm, shannon.zhaosl, Xiang Zheng,
	qemu-arm, james.morse, jonathan.cameron, pbonzini, xuwei5,
	lersek, rth

On Mon, 18 Nov 2019 08:21:18 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Nov 18, 2019 at 09:18:01PM +0800, gengdongjiu wrote:
> > On 2019/11/18 20:49, gengdongjiu wrote:  
> > >>> +     */
> > >>> +    build_append_int_noprefix(table_data, source_id, 2);
> > >>> +    /* Related Source Id */
> > >>> +    build_append_int_noprefix(table_data, 0xffff, 2);
> > >>> +    /* Flags */
> > >>> +    build_append_int_noprefix(table_data, 0, 1);
> > >>> +    /* Enabled */
> > >>> +    build_append_int_noprefix(table_data, 1, 1);
> > >>> +
> > >>> +    /* Number of Records To Pre-allocate */
> > >>> +    build_append_int_noprefix(table_data, 1, 4);
> > >>> +    /* Max Sections Per Record */
> > >>> +    build_append_int_noprefix(table_data, 1, 4);
> > >>> +    /* Max Raw Data Length */
> > >>> +    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> > >>> +
> > >>> +    /* Error Status Address */
> > >>> +    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> > >>> +                     4 /* QWord access */, 0);
> > >>> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> > >>> +        ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),  
> > >> it's fine only if GHESv2 is the only entries in HEST, but once
> > >> other types are added this macro will silently fall apart and
> > >> cause table corruption.  
> >    Why would it silently fall apart?
> >    I think acpi_ghes.c only supports the GHESv2 type, not other types.
> >   
> > >>
> > >> Instead of offset from hest_start, I suggest to use offset relative
> > >> to GAS structure, here is an idea:
> > >>
> > >> #define GAS_ADDR_OFFSET 4
> > >>
> > >>     off = table->len
> > >>     build_append_gas()
> > >>     bios_linker_loader_add_pointer(...,
> > >>         off + GAS_ADDR_OFFSET, ...  
> > 
> > If the offset is relative to the GAS structure, the code does not easily extend to support more Generic Hardware Error Sources.
> > If the offset is relative to hest_start, a simple loop lets the code support more error sources, for example:
> > for (source_id = 0; source_id < n; source_id++)
> > {
> >    ......
> >     bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> >         ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
> >         sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
> >         source_id * sizeof(uint64_t));
> >   .......
> > }
> > 
> > My previous patch series supported 2 error sources, but now only the 'SEA' type error source is enabled.
> 
> I'd try to merge this, worry about extending things later.
> This is at v21 and the simpler you can keep things,
> the faster it'll go in.
I don't think the series is ready for merging yet.
It has a number of issues (not stylistic ones) that need to be fixed first.

As for extending, I think I've already suggested in some places that
the series be simplified to account for a single error source only,
so it would be easier on the author and reviewers, and to worry about
extending it later.
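
For reference, the GAS-relative offset idea quoted above can be sketched as follows. This is only an illustration under stated assumptions: Table, append_int() and append_gas() are hypothetical stand-ins for the table_data GArray and the build_append_*() helpers, and the 12-byte Generic Address Structure layout (four 1-byte fields followed by an 8-byte Address) is taken from the ACPI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GAS_ADDR_OFFSET 4   /* Address field follows four 1-byte fields */
#define GAS_SIZE        12  /* total size of a Generic Address Structure */

/* Hypothetical fixed-capacity stand-in for the table_data GArray. */
typedef struct {
    uint8_t data[256];
    size_t len;
} Table;

/* Append `size` bytes of `value` in little-endian order. */
static void append_int(Table *t, uint64_t value, size_t size)
{
    assert(size <= 8 && t->len + size <= sizeof(t->data));
    for (size_t i = 0; i < size; i++) {
        t->data[t->len++] = (uint8_t)(value >> (8 * i));
    }
}

/*
 * Append a GAS and return the table offset of its Address field.
 * The offset is captured from the current table length, so it stays
 * correct no matter what kind of entries were appended before it.
 */
static size_t append_gas(Table *t, uint8_t space_id, uint8_t bit_width,
                         uint8_t bit_offset, uint8_t access_size,
                         uint64_t address)
{
    size_t off = t->len;

    append_int(t, space_id, 1);
    append_int(t, bit_width, 1);
    append_int(t, bit_offset, 1);
    append_int(t, access_size, 1);
    append_int(t, address, 8);
    return off + GAS_ADDR_OFFSET;
}
```

The returned value is what would be handed to the linker/loader as the location to patch. A loop over several error sources works the same way, since each iteration records its own offset.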



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 2/6] docs: APEI GHES generation and CPER record description
  2019-11-15  9:44     ` Igor Mammedov
@ 2019-11-27  1:37       ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-27  1:37 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: pbonzini, mst, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

Hi Igor,

Thanks for your review!
Since this patch series is going to be merged, we will address your comments in follow-up patches.

On 2019/11/15 17:44, Igor Mammedov wrote:
> On Mon, 11 Nov 2019 09:40:44 +0800
> Xiang Zheng <zhengxiang9@huawei.com> wrote:
> 
>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>
>> Add APEI/GHES detailed design document
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>  docs/specs/acpi_hest_ghes.rst | 95 +++++++++++++++++++++++++++++++++++
>>  docs/specs/index.rst          |  1 +
>>  2 files changed, 96 insertions(+)
>>  create mode 100644 docs/specs/acpi_hest_ghes.rst
>>
>> diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
>> new file mode 100644
>> index 0000000000..348825f9d3
>> --- /dev/null
>> +++ b/docs/specs/acpi_hest_ghes.rst
>> @@ -0,0 +1,95 @@
>> +APEI tables generating and CPER record
>> +======================================
>> +
>> +..
>> +   Copyright (c) 2019 HUAWEI TECHNOLOGIES CO., LTD.
>> +
>> +   This work is licensed under the terms of the GNU GPL, version 2 or later.
>> +   See the COPYING file in the top-level directory.
>> +
>> +Design Details
>> +--------------
>> +
>> +::
>> +
>> +         etc/acpi/tables                                 etc/hardware_errors
>> +      ====================                      ==========================================
>> +  + +--------------------------+            +-----------------------+
>> +  | | HEST                     |            |    address            |            +--------------+
>> +  | +--------------------------+            |    registers          |            | Error Status |
>> +  | | GHES1                    |            | +---------------------+            | Data Block 1 |
>> +  | +--------------------------+ +--------->| |error_block_address1 |----------->| +------------+
>> +  | | .................        | |          | +---------------------+            | |  CPER      |
>> +  | | error_status_address-----+-+ +------->| |error_block_address2 |--------+   | |  CPER      |
>> +  | | .................        |   |        | +---------------------+        |   | |  ....      |
>> +  | | read_ack_register--------+-+ |        | |    ..............   |        |   | |  CPER      |
>> +  | | read_ack_preserve        | | |        +-----------------------+        |   | +------------+
>> +  | | read_ack_write           | | | +----->| |error_block_addressN |------+ |   | Error Status |
>> +  + +--------------------------+ | | |      | +---------------------+      | |   | Data Block 2 |
>> +  | | GHES2                    | +-+-+----->| |read_ack_register1   |      | +-->| +------------+
>> +  + +--------------------------+   | |      | +---------------------+      |     | |  CPER      |
>> +  | | .................        |   | | +--->| |read_ack_register2   |      |     | |  CPER      |
>> +  | | error_status_address-----+---+ | |    | +---------------------+      |     | |  ....      |
>> +  | | .................        |     | |    | |  .............      |      |     | |  CPER      |
>> +  | | read_ack_register--------+-----+-+    | +---------------------+      |     +-+------------+
>> +  | | read_ack_preserve        |     |   +->| |read_ack_registerN   |      |     | |..........  |
>> +  | | read_ack_write           |     |   |  | +---------------------+      |     | +------------+
>> +  + +--------------------------|     |   |                                 |     | Error Status |
>> +  | | ...............          |     |   |                                 |     | Data Block N |
>> +  + +--------------------------+     |   |                                 +---->| +------------+
>> +  | | GHESN                    |     |   |                                       | |  CPER      |
>> +  + +--------------------------+     |   |                                       | |  CPER      |
>> +  | | .................        |     |   |                                       | |  ....      |
>> +  | | error_status_address-----+-----+   |                                       | |  CPER      |
>> +  | | .................        |         |                                       +-+------------+
>> +  | | read_ack_register--------+---------+
>> +  | | read_ack_preserve        |
>> +  | | read_ack_write           |
>> +  + +--------------------------+
> 
> I'd merge "Error Status Data Block" with "address registers", so it would be
> clear that "Error Status Data Block" is located after "read_ack_registerN"

Yes, this image doesn't demonstrate this point. We will make some changes to
this image.

> 
>> +
>> +(1) QEMU generates the ACPI HEST table. This table goes in the current
>> +    "etc/acpi/tables" fw_cfg blob. Each error source has different
>> +    notification types.
>> +
>> +(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
>> +    also needs to populate this blob. The "etc/hardware_errors" fw_cfg blob
>> +    contains an address registers table and an Error Status Data Block table.
>> +
>> +(3) The address registers table contains N Error Block Address entries
>> +    and N Read Ack Register entries. The size for each entry is 8-byte.
>> +    The Error Status Data Block table contains N Error Status Data Block
>> +    entries. The size for each entry is 4096(0x1000) bytes. The total size
>> +    for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
>> +    N is the number of the kinds of hardware error sources.
>> +
>> +(4) QEMU generates the ACPI linker/loader script for the firmware. The
>> +    firmware pre-allocates memory for "etc/acpi/tables", "etc/hardware_errors"
>> +    and copies blob contents there.
>> +
>> +(5) QEMU generates N ADD_POINTER commands, which patch addresses in the
>> +    "error_status_address" fields of the HEST table with a pointer to the
>> +    corresponding "address registers" in the "etc/hardware_errors" blob.
>> +
>> +(6) QEMU generates N ADD_POINTER commands, which patch addresses in the
>> +    "read_ack_register" fields of the HEST table with a pointer to the
>> +    corresponding "address registers" in the "etc/hardware_errors" blob.
> 
> s/"address registers" in/"read_ack_register" within/

OK.

> 
>> +
>> +(7) QEMU generates N ADD_POINTER commands for the firmware, which patch
>> +    addresses in the "error_block_address" fields with a pointer to the
>> +    respective "Error Status Data Block" in the "etc/hardware_errors" blob.
>> +
>> +(8) QEMU defines a third and write-only fw_cfg blob which is called
>> +    "etc/hardware_errors_addr". Through that blob, the firmware can send back
>> +    the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr"
>> +    blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER command
>> +    for the firmware. The firmware will write back the start address of
>> +    "etc/hardware_errors" blob to the fw_cfg file "etc/hardware_errors_addr".
>> +
> 
>> +(9) When QEMU gets a SIGBUS from the kernel, QEMU formats the CPER right into
>> +    guest memory, 
> 
> s/
> QEMU formats the CPER right into guest memory
> /
> QEMU writes CPER into corresponding "Error Status Data Block"
> /
> 

OK.

>> and then injects platform specific interrupt (in case of
>> +    arm/virt machine it's Synchronous External Abort) as a notification which
>> +    is necessary for notifying the guest.
> 
> 
>> +
>> +(10) This notification (in virtual hardware) will be handled by the guest
>> +     kernel, guest APEI driver will read the CPER which is recorded by QEMU and
>> +     do the recovery.
> Maybe better would be to say:
> "
> On receiving notification, guest APEI driver could read the CPER error
> and take appropriate action
> "

OK.

> 
> 
> also in HEST patches there is implicit ABI, which probably should be documented here.
> More specifically kvm_arch_on_sigbus_vcpu() error injection
> uses source_id as index in "etc/hardware_errors" to find out "Error Status Data Block"
> entry corresponding to error source. So supported source_id values should be assigned
> here and not be changed afterwards to make sure that guest will write error into
> expected "Error Status Data Block" even if guest was migrated to a newer QEMU.
> 

OK, I will add the descriptions of the implicit ABI.
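
As a starting point for that description, here is a rough sketch of how source_id could map to offsets inside "etc/hardware_errors". It assumes the layout from step (3) and from the review comment above: N Error Block Address entries first, then N Read Ack Register entries, then N Error Status Data Blocks. The function and macro names are hypothetical and not taken from the patches.

```c
#include <stdint.h>

#define ADDR_ENTRY_SIZE  8       /* each address register entry is 8 bytes */
#define DATA_BLOCK_SIZE  0x1000  /* each Error Status Data Block is 4096 bytes */

/* Offset of error_block_address[source_id] within the blob. */
static uint64_t error_block_address_offset(uint32_t source_id)
{
    return (uint64_t)source_id * ADDR_ENTRY_SIZE;
}

/* Offset of read_ack_register[source_id]; follows the N address entries. */
static uint64_t read_ack_register_offset(uint32_t n, uint32_t source_id)
{
    return (uint64_t)n * ADDR_ENTRY_SIZE +
           (uint64_t)source_id * ADDR_ENTRY_SIZE;
}

/* Offset of Error Status Data Block[source_id]; follows both tables. */
static uint64_t data_block_offset(uint32_t n, uint32_t source_id)
{
    return (uint64_t)n * ADDR_ENTRY_SIZE * 2 +
           (uint64_t)source_id * DATA_BLOCK_SIZE;
}

/* Total blob size: N * 8 * 2 + N * 4096 bytes. */
static uint64_t hardware_errors_size(uint32_t n)
{
    return (uint64_t)n * ADDR_ENTRY_SIZE * 2 + (uint64_t)n * DATA_BLOCK_SIZE;
}
```

Because every offset depends on source_id, renumbering the sources in a later QEMU version would make a migrated guest look in the wrong block, which is why the assigned values need to stay fixed.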

-- 

Thanks,
Xiang


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-15 16:37     ` Igor Mammedov
@ 2019-11-27  1:40       ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-27  1:40 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: pbonzini, mst, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

Hi,

On 2019/11/16 0:37, Igor Mammedov wrote:
> On Mon, 11 Nov 2019 09:40:47 +0800
> Xiang Zheng <zhengxiang9@huawei.com> wrote:
> 
>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>
>> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
>> translates the host VA delivered by host to guest PA, then fills this PA
>> to guest APEI GHES memory, then notifies guest according to the SIGBUS
>> type.
>>
>> When guest accesses the poisoned memory, it will generate a Synchronous
>> External Abort(SEA). Then host kernel gets an APEI notification and calls
>> memory_failure() to unmap the affected page in stage 2, finally
>> returns to guest.
>>
>> Guest continues to access the PG_hwpoison page, it will trap to KVM as
>> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
>> Qemu, Qemu records this error address into guest APEI GHES memory and
>> notifies guest using Synchronous-External-Abort(SEA).
>>
>> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
>> in which we can setup the type of exception and the syndrome information.
>> When switching to guest, the target vcpu will jump to the synchronous
>> external abort vector table entry.
>>
>> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
>> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
>> not valid and hold an UNKNOWN value. These values will be set to KVM
>> register structures through KVM_SET_ONE_REG IOCTL.
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>>  include/hw/acpi/acpi_ghes.h |   4 +
>>  include/sysemu/kvm.h        |   3 +-
>>  target/arm/cpu.h            |   4 +
>>  target/arm/helper.c         |   2 +-
>>  target/arm/internals.h      |   5 +-
>>  target/arm/kvm64.c          |  64 ++++++++
>>  target/arm/tlb_helper.c     |   2 +-
>>  target/i386/cpu.h           |   2 +
>>  9 files changed, 377 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
>> index 42c00ff3d3..f5b54990c0 100644
>> --- a/hw/acpi/acpi_ghes.c
>> +++ b/hw/acpi/acpi_ghes.c
>> @@ -39,6 +39,34 @@
>>  /* The max size in bytes for one error block */
>>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>>  
>> +/*
>> + * The total size of Generic Error Data Entry
>> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-343 Generic Error Data Entry
>> + */
>> +#define ACPI_GHES_DATA_LENGTH               72
>> +
>> +/*
>> + * The memory section CPER size,
>> + * UEFI 2.6: N.2.5 Memory Error Section
>> + */
> maybe use one line comment
> 
>> +#define ACPI_GHES_MEM_CPER_LENGTH           80
>> +
>> +/*
>> + * Masks for block_status flags
>> + */
> ditto
> 
>> +#define ACPI_GEBS_UNCORRECTABLE         1
>> +
>> +/*
>> + * Values for error_severity field
>> + */
> ditto
> 

OK, I will use one line comment.

>> +enum AcpiGenericErrorSeverity {
>> +    ACPI_CPER_SEV_RECOVERABLE,
>> +    ACPI_CPER_SEV_FATAL,
>> +    ACPI_CPER_SEV_CORRECTED,
>> +    ACPI_CPER_SEV_NONE,
> I'd assign values explicitly here
>   foo = x,
>   ...

OK.
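
For reference, a sketch of the enum with explicit values, assuming the usual ACPI/CPER severity encoding (0 recoverable, 1 fatal, 2 corrected, 3 none); the final form in the next version may differ:

```c
/* Values for the error_severity field, assigned explicitly. */
enum AcpiGenericErrorSeverity {
    ACPI_CPER_SEV_RECOVERABLE = 0,
    ACPI_CPER_SEV_FATAL       = 1,
    ACPI_CPER_SEV_CORRECTED   = 2,
    ACPI_CPER_SEV_NONE        = 3,
};
```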

> 
>> +};
>> +
>>  /*
>>   * Now only support ARMv8 SEA notification type error source
>>   */
>> @@ -49,6 +77,16 @@
>>   */
>>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>>  
>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>> +
>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>> +    0xED, 0x7C, 0x83, 0xB1)
>> +
>>  /*
>>   * | +--------------------------+ 0
>>   * | |        Header            |
>> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>>      uint64_t ghes_addr_le;
>>  } AcpiGhesState;
>>  
>> +/*
>> + * Total size for Generic Error Status Block
>> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-380 Generic Error Status Block
>> + */
>> +#define ACPI_GHES_GESB_SIZE                 20
> 
>> +/* The offset of Data Length in Generic Error Status Block */
>> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
> 
> unused, drop it
> 

OK.

>> +
>> +/*
>> + * Record the value of data length for each error status block to avoid getting
>> + * this value from guest.
>> + */
>> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
>> +
>> +/*
>> + * Generic Error Data Entry
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
>> +                uint32_t error_severity, uint16_t revision,
>> +                uint8_t validation_bits, uint8_t flags,
>> +                uint32_t error_data_length, QemuUUID fru_id,
>> +                uint8_t *fru_text, uint64_t time_stamp)
>> +{
>> +    QemuUUID uuid_le;
>> +
>> +    /* Section Type */
>> +    uuid_le = qemu_uuid_bswap(section_type);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +    /* Revision */
>> +    build_append_int_noprefix(table, revision, 2);
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table, validation_bits, 1);
>> +    /* Flags */
>> +    build_append_int_noprefix(table, flags, 1);
>> +    /* Error Data Length */
>> +    build_append_int_noprefix(table, error_data_length, 4);
>> +
>> +    /* FRU Id */
>> +    uuid_le = qemu_uuid_bswap(fru_id);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* FRU Text */
>> +    g_array_append_vals(table, fru_text, 20);
> what if fru_text were shorter than 20 bytes?
> 
> Suggest to pass length along or
> drop all fru handling in the caller and just hardcode here invalid fru with empty text,
> as function could be extended later, once there is something meaningful to put in fru.
> 

"FRU Text" and "FRU Id" are only used in acpi_ghes_generic_error_data(), so I'd move the
definition into acpi_ghes_generic_error_data() and just hardcode here.

> 
>> +    /* Timestamp */
>> +    build_append_int_noprefix(table, time_stamp, 8);
>> +}
>> +
>> +/*
>> + * Generic Error Status Block
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
>> +                uint32_t raw_data_offset, uint32_t raw_data_length,
>> +                uint32_t data_length, uint32_t error_severity)
>> +{
>> +    /* Block Status */
>> +    build_append_int_noprefix(table, block_status, 4);
>> +    /* Raw Data Offset */
>> +    build_append_int_noprefix(table, raw_data_offset, 4);
>> +    /* Raw Data Length */
>> +    build_append_int_noprefix(table, raw_data_length, 4);
>> +    /* Data Length */
>> +    build_append_int_noprefix(table, data_length, 4);
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +}
>> +
>> +/* UEFI 2.6: N.2.5 Memory Error Section */
>> +static void acpi_ghes_build_append_mem_cper(GArray *table,
>> +                                            uint64_t error_physical_addr)
> I'd split out this and acpi_ghes_generic_error_status() and
> acpi_ghes_generic_error_data()  functions into a separate patch.
> 

OK.

>> +{
>> +    /*
>> +     * Memory Error Record
>> +     */
>> +
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table,
> 
>> +                              (1UL << 14) | /* Type Valid */
>> +                              (1UL << 1) /* Physical Address Valid */,
> shouldn't it use ULL suffixes?
> 

Yes, we should use ULL. The field "Validation Bits" in Memory Error Section is
8 bytes in size.
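
A small sketch of what that could look like. The macro and function names are hypothetical; the bit positions are the ones used in the hunk above.

```c
#include <stdint.h>

/* Hypothetical names; bit positions taken from the quoted hunk. */
#define MEM_CPER_VALID_PHYS_ADDR   (1ULL << 1)   /* Physical Address Valid */
#define MEM_CPER_VALID_ERROR_TYPE  (1ULL << 14)  /* Type Valid */

/* The field is 64 bits wide, so build it from 64-bit constants. */
static uint64_t mem_cper_validation_bits(void)
{
    return MEM_CPER_VALID_ERROR_TYPE | MEM_CPER_VALID_PHYS_ADDR;
}
```

With ULL the constants are at least 64 bits wide on every host, whereas unsigned long is only 32 bits on some ABIs.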

>> +                              8);
>> +    /* Error Status */
>> +    build_append_int_noprefix(table, 0, 8);
>> +    /* Physical Address */
>> +    build_append_int_noprefix(table, error_physical_addr, 8);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 48);
>> +    /* Memory Error Type */
>> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 7);
>> +}
>> +
>> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
>> +                                      uint64_t error_physical_addr,
>> +                                      uint32_t data_length)
>> +{
>> +    GArray *block;
>> +    uint64_t current_block_length;
>> +    /* Memory Error Section Type */
>> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
>                                ^^
> UEFI_CPER_SEC_PLATFORM_MEM is defined as BE, so _le here is wrong
> and then later you use qemu_uuid_bswap() to make it LE.
> 
> Why not define it as LE to begin with, like it's been done for NVDIMM_UUID_LE?
> 

OK.
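
A rough sketch of an LE definition, in the spirit of NVDIMM_UUID_LE. SKETCH_UUID_LE and mem_section_uuid_le are hypothetical names for illustration, and a plain byte array stands in for QemuUUID so the snippet is standalone:

```c
#include <stdint.h>

/*
 * Lay out the first three UUID fields least significant byte first;
 * the trailing eight bytes are stored as given.
 */
#define SKETCH_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)             \
    { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
      (b) & 0xff, ((b) >> 8) & 0xff,                                        \
      (c) & 0xff, ((c) >> 8) & 0xff,                                        \
      (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }

/* Memory Error Section type, a5bc1114-6f64-4ede-b863-3e83ed7c83b1 */
static const uint8_t mem_section_uuid_le[16] =
    SKETCH_UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE,
                   0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1);
```

Defined this way, the bytes can be appended to the table as they are, without a qemu_uuid_bswap() call.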

> 
>> +    QemuUUID fru_id = {};
>> +    uint8_t fru_text[20] = {};
>> +
>> +    /*
>> +     * Generic Error Status Block
>> +     * | +---------------------+
>> +     * | |     block_status    |
>> +     * | +---------------------+
>> +     * | |    raw_data_offset  |
>> +     * | +---------------------+
>> +     * | |    raw_data_length  |
>> +     * | +---------------------+
>> +     * | |     data_length     |
>> +     * | +---------------------+
>> +     * | |   error_severity    |
>> +     * | +---------------------+
>> +     */
> not necessary, just point to concrete part of ACPI spec if needed.
> 

OK.

>> +    block = g_array_new(false, true /* clear */, 1);
>> +
>> +    /* The current whole length of the generic error status block */
>> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
>> +
>> +    /* This is the length if adding a new generic error data entry*/
>> +    data_length += ACPI_GHES_DATA_LENGTH;
>> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
>> +
>> +    /*
>> +     * Check whether it will run out of the preallocated memory if adding a new
>> +     * generic error data entry
>> +     */
>> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
>> +        error_report("Record CPER out of boundary!!!");
>> +        return ACPI_GHES_CPER_FAIL;
>> +    }
>> +
>> +    /* Build the new generic error status block header */
>> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
>> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
> Down the road, the arguments are passed to build_append_int_noprefix() which takes
> numbers in host byte order, so manually calling cpu_to_le32() is wrong.
> just drop cpu_to_le32() here.
> 
> 
>> +
>> +    /* Write back above generic error status block header to guest memory */
>> +    cpu_physical_memory_write(error_block_address, block->data,
>> +                              block->len);
>> +
>> +    /* Add a new generic error data entry */
>> +
>> +    data_length = block->len;
>> +    /* Build this new generic error data entry header */
>> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
>> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
>> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
> ditto
>

Yes, this would be wrong when running with big-endian QEMU. I will drop "cpu_to_le32()".
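
To illustrate why the extra swap is harmful, here is a standalone sketch of an appender that behaves like build_append_int_noprefix(): it takes a host-order value and emits it byte by byte, least significant first. append_int_le() is a hypothetical stand-in, not the QEMU function.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Append the low `size` bytes of a host-order value, least significant
 * byte first. Returns the new write position.
 */
static size_t append_int_le(uint8_t *buf, size_t pos, uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[pos++] = value & 0xff;
        value >>= 8;
    }
    return pos;
}
```

Since the shifts work on the value and not on its in-memory representation, the output is little-endian on any host. Passing a value that was already run through cpu_to_le32() would swap it a second time on a big-endian host.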

>> +
>> +    /* Build the memory section CPER for above new generic error data entry */
>> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
>> +
>> +    /* Write back above this new generic error data entry to guest memory */
>> +    cpu_physical_memory_write(error_block_address + current_block_length,
>> +        block->data + data_length, block->len - data_length);
> 
> If I read it right, in the first write you build an updated "Error Status Block"
> header where you update "Data Length" to account for an additional
> "Error Data Entry" and then this second write appends a new "Error Data Entry"
> after the previous one (if any existed).
> 
> Now for GHESv2, OSPM is supposed to copy existing "Error Status Block" and Ack
> that fact via "Read Ack Register" and QEMU must not overwrite old data until
> they are acked by OSPM.
> 
> With that in mind, appending a new error seems pointless since the guest has
> already consumed any pre-existing error before we are able to write.
> So we can drop "Error Status Block" tracking and just
>  1. compose whole "Error Status Block" with 1 new "Error Data Entry"
>  2. check that it fits into start, start+ACPI_GHES_MAX_RAW_DATA_LENGTH range
>  3. push it into guest RAM with only 1 write
> 
> and drop all data_length tracking related code.
> 

Yes, you're right. After OSPM acks the "Error Status Block", it can be reused. Maybe one
"Error Data Entry" is enough for each "Error Status Block" in current implementation.

In the ACPI spec, it says that "One or more Generic Error Data Entry structures may be
recorded in the Generic Error Data Entries field of the Generic Error Status Block
structure". I'm not sure whether a single "Error Data Entry" is sufficient.
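
If a single entry turns out to be enough, the three steps suggested above could look roughly like the standalone sketch below. It composes one "Error Status Block" with one "Error Data Entry" and one memory CPER in a plain buffer, checks the size, and leaves the single guest write to the caller. All names are hypothetical, the sizes are the ones from this patch (20 + 72 + 80 bytes), and the FRU fields are simply left zeroed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GESB_SIZE       20      /* Generic Error Status Block header */
#define GED_ENTRY_SIZE  72      /* Generic Error Data Entry header */
#define MEM_CPER_SIZE   80      /* Memory Error Section */
#define MAX_BLOCK_SIZE  0x1000  /* same limit as ACPI_GHES_MAX_RAW_DATA_LENGTH */

/* Emit a host-order value as `size` little-endian bytes. */
static size_t put_le(uint8_t *buf, size_t pos, uint64_t v, int size)
{
    for (int i = 0; i < size; i++) {
        buf[pos++] = v & 0xff;
        v >>= 8;
    }
    return pos;
}

/* Returns the number of bytes composed, or 0 if the block does not fit. */
static size_t compose_mem_error_block(uint8_t *buf, size_t cap,
                                      const uint8_t section_type_le[16],
                                      uint64_t paddr)
{
    const size_t total = GESB_SIZE + GED_ENTRY_SIZE + MEM_CPER_SIZE;
    size_t pos = 0;

    if (total > cap || total > MAX_BLOCK_SIZE) {
        return 0;
    }
    memset(buf, 0, total);

    /* Generic Error Status Block */
    pos = put_le(buf, pos, 1, 4);                  /* Block Status */
    pos = put_le(buf, pos, 0, 4);                  /* Raw Data Offset */
    pos = put_le(buf, pos, 0, 4);                  /* Raw Data Length */
    pos = put_le(buf, pos, GED_ENTRY_SIZE + MEM_CPER_SIZE, 4); /* Data Length */
    pos = put_le(buf, pos, 0, 4);                  /* Severity: recoverable */

    /* Generic Error Data Entry */
    memcpy(buf + pos, section_type_le, 16);        /* Section Type */
    pos += 16;
    pos = put_le(buf, pos, 0, 4);                  /* Severity: recoverable */
    pos = put_le(buf, pos, 0x300, 2);              /* Revision */
    pos = put_le(buf, pos, 0, 1);                  /* Validation Bits */
    pos = put_le(buf, pos, 0, 1);                  /* Flags */
    pos = put_le(buf, pos, MEM_CPER_SIZE, 4);      /* Error Data Length */
    pos += 16;                                     /* FRU Id, left zero */
    pos += 20;                                     /* FRU Text, left zero */
    pos = put_le(buf, pos, 0, 8);                  /* Timestamp */

    /* Memory Error Section */
    pos = put_le(buf, pos, (1ULL << 14) | (1ULL << 1), 8); /* Validation Bits */
    pos = put_le(buf, pos, 0, 8);                  /* Error Status */
    pos = put_le(buf, pos, paddr, 8);              /* Physical Address */
    pos += 48;                                     /* details, left zero */
    pos = put_le(buf, pos, 0, 1);                  /* Error Type: unknown */
    pos += 7;                                      /* remaining details */

    return pos;
}
```

With this shape there is nothing to track between errors: each record is built from scratch and pushed with one write once the previous one has been acked.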

>> +
>> +    g_array_free(block, true);
>> +
>> +    return ACPI_GHES_CPER_OK;
>> +}
>> +
>>  /*
>>   * Hardware Error Notification
>>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
>> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>>  }
>> +
>> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
>> +{
>> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
>> +    int loop = 0;
>> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
>                                          ^^^^^^^^^^^^^^^
> Forgot to mention in patch [3/6],
> 
> Migration is definitively broken here, since ges.ghes_addr_le is
> not migrated to target QEMU. For example how it should be done see:
>   vmgenid_addr_le and vmstate_vmgenid
> 
> for that you'd need to make ghes_addr_le a part of some device
> (recently added hw/acpi/generic_event_device.c looks like suitable victim)
> 

Yes, this is a serious problem! Is there any better way to migrate ges.ghes_addr_le?
It looks weird to make ghes_addr_le a part of GED. How about registering a single
"ACPI GHES Device" which inherits from "TYPE_DEVICE"?

> 
>> +    bool ret = ACPI_GHES_CPER_FAIL;
>> +    uint8_t source_id;
> 
>> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
> put map at the beginning of this file 
> 
> s/const/static const/
> s/error_source_id/ghes_notify2source_id_map/
>  = { ...,
>      ACPI_HEST_SRC_ID_SEA,
>      ...,
>      ACPI_HEST_SRC_ID_RESERVED
>    }
> 

OK.
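
A sketch of the map as I understand the suggestion. The enum names here are placeholders (the real ones would be the ACPI_HEST_SRC_ID_* and ACPI_GHES_NOTIFY_* constants), and notify_to_source_id() is only a hypothetical lookup helper:

```c
#include <stdint.h>

/* Placeholder constants for the sketch. */
enum { NOTIFY_SEA = 8, NOTIFY_RESERVED = 12 };
enum { SRC_ID_SEA = 0, SRC_ID_RESERVED = 0xff };

/* Indexed by notification type; only SEA has an error source so far. */
static const uint8_t ghes_notify2source_id_map[NOTIFY_RESERVED] = {
    SRC_ID_RESERVED, SRC_ID_RESERVED, SRC_ID_RESERVED, SRC_ID_RESERVED,
    SRC_ID_RESERVED, SRC_ID_RESERVED, SRC_ID_RESERVED, SRC_ID_RESERVED,
    SRC_ID_SEA,
    SRC_ID_RESERVED, SRC_ID_RESERVED, SRC_ID_RESERVED,
};

/* Returns the source id, or -1 if the notification type has no source. */
static int notify_to_source_id(uint32_t notify)
{
    if (notify >= NOTIFY_RESERVED) {
        return -1;
    }
    uint8_t id = ghes_notify2source_id_map[notify];
    return id == SRC_ID_RESERVED ? -1 : id;
}
```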

> 
>> +
>> +    /*
>> +     * | +---------------------+ ges.ghes_addr_le
>> +     * | |error_block_address0 |
>> +     * | +---------------------+ --+--
>> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | |error_block_addressN |
>> +     * | +---------------------+
>> +     * | | read_ack_register0  |
>> +     * | +---------------------+ --+--
>> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | | read_ack_registerN  |
> 
> above part is not necessary
> 

OK

>> +     * | +---------------------+ --+--
>> +     * | |      CPER           |   |
>> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
>> +     * | |      CPER           |   |
>> +     * | +---------------------+ --+--
> and this one is not precise as it holds not only a CPER record but a
> Generic Error Status Block + Generic Error Data (with CPER inside)
> 

Even in the spec, it only shows the CPER in Error Data Entry. But I
also think using "Error Data Entry" is more precise.

> and looking at code here and spec I'm not sure we can actually do
> several Error Data Entries as implemented here, more on that later
> 


>> +     * | |    ..........       |
>> +     * | +---------------------+
>> +     * | |      CPER           |
>> +     * | |      ....           |
>> +     * | |      CPER           |
>> +     * | +---------------------+
>> +     */
>> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
>> +        /* Find and check the source id for this new CPER */
>> +        source_id = error_source_id[notify];
>> +        if (source_id != 0xff) {
>> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
>> +        } else {
>> +            goto out;
> assert() ???
> 

> 
>> +        }
>> +
>> +        cpu_physical_memory_read(start_addr, &error_block_addr,
>> +                                 ACPI_GHES_ADDRESS_SIZE);
>> +
>> +        read_ack_register_addr = start_addr +
>> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
>> +retry:
>> +        cpu_physical_memory_read(read_ack_register_addr,
>> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> it's safer to use
>    sizeof(read_ack_register)
> instead of ACPI_GHES_ADDRESS_SIZE to make sure that stack won't be corrupted
> by accident later, the same applies to other reads.
> 

OK, I will drop the ACPI_GHES_ADDRESS_SIZE and replace it with "sizeof()".
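
A tiny standalone sketch of the pattern. guest_read() is a hypothetical stand-in for cpu_physical_memory_read() that copies from a local array, so the snippet can be compiled on its own:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a guest memory read. */
static void guest_read(const uint8_t *mem, size_t addr, void *dst, size_t len)
{
    memcpy(dst, mem + addr, len);
}

static uint64_t read_ack_register_at(const uint8_t *mem, size_t addr)
{
    uint64_t read_ack_register = 0;

    /*
     * The length comes from the destination object, so the copy cannot
     * overrun it even if the variable's type changes later.
     */
    guest_read(mem, addr, &read_ack_register, sizeof(read_ack_register));
    return read_ack_register;
}
```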

>> +
>> +        /* zero means OSPM does not acknowledge the error */
>> +        if (!read_ack_register) {
>> +            if (loop < 3) {
>> +                usleep(100 * 1000);
>> +                loop++;
>> +                goto retry;
> at a minimum this loop can stall guest repeatedly for 0.3s if guest triggers BQL,
> until it handles error.
> 
> (not sure what to suggest here though)
> 

>> +            } else {
>> +                error_report("OSPM does not acknowledge previous error,"
>> +                    " so can not record CPER for current error, forcibly"
>> +                    " acknowledge previous error to avoid blocking next time"
>> +                    " CPER record! Exit");
> 
> Also error overwrite goes against the spec, which says
> "
> Platforms with RAS
> controllers must prevent concurrent accesses to the Error Status Block (i.e., the RAS controller
> must not overwrite the Error Status Block before the OS has completed reading it).
> ******************
> "
> we probably shouldn't overwrite a block that has not been acked.
> Question is what bare metal machines do in this case?
> 

Hmm... Yes, you're right. For example, on bare metal machines there are 3 pre-allocated GHESv2(s)
for SEA. If none of them is acked, it will do nothing for the incoming SEA.

On Qemu there is one pre-allocated GHESv2 for SEA, so I think we should do nothing if the block is
not acked.
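
In other words, something along the lines of the sketch below. The helper and its arguments are hypothetical, and plain variables stand in for the guest's Read Ack Register and the error status block:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Record a new error only if OSPM has acked the previous one.
 * Returns false and leaves everything untouched otherwise.
 */
static bool try_record_error(uint64_t *read_ack_register, uint64_t *block,
                             uint64_t paddr)
{
    if (*read_ack_register == 0) {
        /* OSPM has not finished reading the block: do not overwrite it. */
        return false;
    }

    /* Clear the ack; OSPM writes 1 again once it has consumed the record. */
    *read_ack_register = 0;
    *block = paddr;
    return true;
}
```

No retry loop and no forced ack are needed in that case; the caller only has to decide what to do when the record could not be written.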


>> +                read_ack_register = 1;
>> +                cpu_physical_memory_write(read_ack_register_addr,
>> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> Function writes data as is, so one has to ensure that endianness of
> read_ack_register matches that of the spec/guest.
> The same applies to the code below marked with "^^^".
> 
Yes, for the code here and below marked with "^^^", we need to add cpu_to_leXX() to
match the spec.
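
As an illustration of what the conversion guarantees, here is a standalone sketch that stores and loads a 64-bit register value as little-endian bytes by hand. store_le64() and load_le64() are hypothetical helpers written out for clarity; in QEMU the existing cpu_to_le64()/le64_to_cpu() would be used instead.

```c
#include <stdint.h>

/* Store a host-order value as 8 little-endian bytes. */
static void store_le64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        dst[i] = (v >> (8 * i)) & 0xff;
    }
}

/* Load 8 little-endian bytes into a host-order value. */
static uint64_t load_le64(const uint8_t *src)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)src[i] << (8 * i);
    }
    return v;
}
```

The byte layout seen by the guest is then the same whether QEMU runs on a little-endian or a big-endian host.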

>> +            }
>> +        } else {
>> +            if (error_block_addr) {
> 
> } else if () {
> 

OK.

>> +                read_ack_register = 0;
>> +                /*
>> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
>> +                 * acknowledge this error.
>> +                 */
>> +                cpu_physical_memory_write(read_ack_register_addr,
>> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
>                          ^^^ - for 0 it doesn't really matter but conversion should be done
>                                 even if it's just for the sake of documenting interface
> 

>> +                ret = acpi_ghes_record_mem_error(error_block_addr,
>                                                     ^^^^
> 
>> +                          physical_address, acpi_ghes_data_length[source_id]);
>                              ^^^
> 
>> +                if (ret == ACPI_GHES_CPER_OK) {
>> +                    acpi_ghes_data_length[source_id] +=
>> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
> eventually we will run out of space and nothing short of QEMU restart will
> help to reclaim that.
> 
> Also if you keep track of available space in QEMU,
> you'd also have to migrate it otherwise it's lost after migration.
> But maybe we don't need to keep a track of free space,
> see my other comment in acpi_ghes_record_mem_error()
> 
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +out:
>> +    return ret;
>> +}
>> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
>> index cb62ec9c7b..8e3c5b879e 100644
>> --- a/include/hw/acpi/acpi_ghes.h
>> +++ b/include/hw/acpi/acpi_ghes.h
>> @@ -24,6 +24,9 @@
>>  
>>  #include "hw/acpi/bios-linker-loader.h"
>>  
>> +#define ACPI_GHES_CPER_OK                   1
>> +#define ACPI_GHES_CPER_FAIL                 0
>> +
>>  /*
>>   * Values for Hardware Error Notification Type field
>>   */
>> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
>>  
>>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
>>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
>> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
>>  #endif
>> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
>> index 9d143282bc..321ead8115 100644
>> --- a/include/sysemu/kvm.h
>> +++ b/include/sysemu/kvm.h
>> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
>>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
>>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
>>  
>> -#ifdef TARGET_I386
>> -#define KVM_HAVE_MCE_INJECTION 1
>> +#ifdef KVM_HAVE_MCE_INJECTION
>>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
>>  #endif
>>  
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index d844ea21d8..c4fe6ccc63 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -28,6 +28,10 @@
>>  /* ARM processors have a weak memory model */
>>  #define TCG_GUEST_DEFAULT_MO      (0)
>>  
>> +#ifdef TARGET_AARCH64
>> +#define KVM_HAVE_MCE_INJECTION 1
>> +#endif
>> +
>>  #define EXCP_UDEF            1   /* undefined instruction */
>>  #define EXCP_SWI             2   /* software interrupt */
>>  #define EXCP_PREFETCH_ABORT  3
>> diff --git a/target/arm/helper.c b/target/arm/helper.c
>> index 63815fc4cf..a9ce97efb1 100644
>> --- a/target/arm/helper.c
>> +++ b/target/arm/helper.c
>> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
>>               * Report exception with ESR indicating a fault due to a
>>               * translation table walk for a cache maintenance instruction.
>>               */
>> -            syn = syn_data_abort_no_iss(current_el == target_el,
>> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
>>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
>>              env->exception.vaddress = value;
>>              env->exception.fsr = fsr;
>> diff --git a/target/arm/internals.h b/target/arm/internals.h
>> index f5313dd3d4..28b8451d6d 100644
>> --- a/target/arm/internals.h
>> +++ b/target/arm/internals.h
>> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
>>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
>>  }
>>  
>> -static inline uint32_t syn_data_abort_no_iss(int same_el,
>> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
>>                                               int ea, int cm, int s1ptw,
>>                                               int wnr, int fsc)
>>  {
>>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
>>             | ARM_EL_IL
>> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
>> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
>> +           | (wnr << 6) | fsc;
>>  }
>>  
>>  static inline uint32_t syn_data_abort_with_iss(int same_el,
>> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
>> index 28f6db57d5..c7b7653d3f 100644
>> --- a/target/arm/kvm64.c
>> +++ b/target/arm/kvm64.c
>> @@ -28,6 +28,8 @@
>>  #include "kvm_arm.h"
>>  #include "hw/boards.h"
>>  #include "internals.h"
>> +#include "hw/acpi/acpi.h"
>> +#include "hw/acpi/acpi_ghes.h"
>>  
>>  static bool have_guest_debug;
>>  
>> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
>>      return KVM_PUT_RUNTIME_STATE;
>>  }
>>  
>> +/* Callers must hold the iothread mutex lock */
>> +static void kvm_inject_arm_sea(CPUState *c)
>> +{
>> +    ARMCPU *cpu = ARM_CPU(c);
>> +    CPUARMState *env = &cpu->env;
>> +    CPUClass *cc = CPU_GET_CLASS(c);
>> +    uint32_t esr;
>> +    bool same_el;
>> +
>> +    c->exception_index = EXCP_DATA_ABORT;
>> +    env->exception.target_el = 1;
>> +
>> +    /*
>> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
>> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
>> +     */
>> +    same_el = arm_current_el(env) == env->exception.target_el;
>> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
>> +
>> +    env->exception.syndrome = esr;
>> +
>> +    cc->do_interrupt(c);
>> +}
>> +
>>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>>  
>> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>>      return ret;
>>  }
>>  
>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> +{
>> +    ram_addr_t ram_addr;
>> +    hwaddr paddr;
>> +
>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> you let BUS_MCEERR_AO in but then it's unused, so what's the purpose of allowing it?
> 

Yes, the assertion "code == BUS_MCEERR_AO" is useless at least for now.

>> +
>> +    if (acpi_enabled && addr &&
>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>> +        ram_addr = qemu_ram_addr_from_host(addr);
>> +        if (ram_addr != RAM_ADDR_INVALID &&
>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>> +            kvm_hwpoison_page_add(ram_addr);
>> +            /*
>> +             * Asynchronous signal will be masked by main thread, so
>> +             * only handle synchronous signal.
>> +             */
>> +            if (code == BUS_MCEERR_AR) {
>> +                kvm_cpu_synchronize_state(c);
>> +                if (ACPI_GHES_CPER_FAIL !=
>> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
>> +                    kvm_inject_arm_sea(c);
>> +                } else {
>> +                    fprintf(stderr, "failed to record the error\n");
> 
> fprintf() shouldn't be used in new code
> and another question is whether it's fine to ignore the error.
> maybe we should use error_fatal in such cases?
> 

I'd use error_fatal in this case.

>> +                }
>> +            }
>> +            return;
>> +        }
>> +        fprintf(stderr, "Hardware memory error for memory used by "
>> +                "QEMU itself instead of guest system!\n");
> 
>> +    }
>> +
>> +    if (code == BUS_MCEERR_AR) {
>> +        fprintf(stderr, "Hardware memory error!\n");
>> +        exit(1);
>> +    }
>> +}
>> +
>>  /* C6.6.29 BRK instruction */
>>  static const uint32_t brk_insn = 0xd4200000;
>>  
>> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
>> index 5feb312941..499672ebbc 100644
>> --- a/target/arm/tlb_helper.c
>> +++ b/target/arm/tlb_helper.c
>> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
>>       * ISV field.
>>       */
>>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
>> -        syn = syn_data_abort_no_iss(same_el,
>> +        syn = syn_data_abort_no_iss(same_el, 0,
>>                                      ea, 0, s1ptw, is_write, fsc);
>>      } else {
>>          /*
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index 5352c9ff55..f75a210f96 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -29,6 +29,8 @@
>>  /* The x86 has a strong memory model with some store-after-load re-ordering */
>>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
>>  
>> +#define KVM_HAVE_MCE_INJECTION 1
>> +
>>  /* Maximum instruction code size */
>>  #define TARGET_MAX_INSN_SIZE 16
>>  
> 
> 
> .
> 

-- 

Thanks,
Xiang



* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
@ 2019-11-27  1:40       ` Xiang Zheng
  0 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-27  1:40 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl, qemu-arm,
	james.morse, xuwei5, jonathan.cameron, pbonzini, lersek, rth

Hi,

On 2019/11/16 0:37, Igor Mammedov wrote:
> On Mon, 11 Nov 2019 09:40:47 +0800
> Xiang Zheng <zhengxiang9@huawei.com> wrote:
> 
>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>
>> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
>> translates the host VA delivered by host to guest PA, then fills this PA
>> to guest APEI GHES memory, then notifies guest according to the SIGBUS
>> type.
>>
>> When guest accesses the poisoned memory, it will generate a Synchronous
>> External Abort(SEA). Then host kernel gets an APEI notification and calls
>> memory_failure() to unmap the affected page in stage 2, finally
>> returns to guest.
>>
>> Guest continues to access the PG_hwpoison page, it will trap to KVM as
>> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
>> Qemu, Qemu records this error address into guest APEI GHES memory and
>> notifies guest using Synchronous-External-Abort(SEA).
>>
>> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
>> in which we can setup the type of exception and the syndrome information.
>> When switching to guest, the target vcpu will jump to the synchronous
>> external abort vector table entry.
>>
>> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
>> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
>> not valid and hold an UNKNOWN value. These values will be set to KVM
>> register structures through KVM_SET_ONE_REG IOCTL.
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>>  include/hw/acpi/acpi_ghes.h |   4 +
>>  include/sysemu/kvm.h        |   3 +-
>>  target/arm/cpu.h            |   4 +
>>  target/arm/helper.c         |   2 +-
>>  target/arm/internals.h      |   5 +-
>>  target/arm/kvm64.c          |  64 ++++++++
>>  target/arm/tlb_helper.c     |   2 +-
>>  target/i386/cpu.h           |   2 +
>>  9 files changed, 377 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
>> index 42c00ff3d3..f5b54990c0 100644
>> --- a/hw/acpi/acpi_ghes.c
>> +++ b/hw/acpi/acpi_ghes.c
>> @@ -39,6 +39,34 @@
>>  /* The max size in bytes for one error block */
>>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>>  
>> +/*
>> + * The total size of Generic Error Data Entry
>> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-343 Generic Error Data Entry
>> + */
>> +#define ACPI_GHES_DATA_LENGTH               72
>> +
>> +/*
>> + * The memory section CPER size,
>> + * UEFI 2.6: N.2.5 Memory Error Section
>> + */
> maybe use one line comment
> 
>> +#define ACPI_GHES_MEM_CPER_LENGTH           80
>> +
>> +/*
>> + * Masks for block_status flags
>> + */
> ditto
> 
>> +#define ACPI_GEBS_UNCORRECTABLE         1
>> +
>> +/*
>> + * Values for error_severity field
>> + */
> ditto
> 

OK, I will use one line comment.

>> +enum AcpiGenericErrorSeverity {
>> +    ACPI_CPER_SEV_RECOVERABLE,
>> +    ACPI_CPER_SEV_FATAL,
>> +    ACPI_CPER_SEV_CORRECTED,
>> +    ACPI_CPER_SEV_NONE,
> I'd assign values explicitly here
>   foo = x,
>   ...

OK.

> 
>> +};
>> +
>>  /*
>>   * Now only support ARMv8 SEA notification type error source
>>   */
>> @@ -49,6 +77,16 @@
>>   */
>>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>>  
>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>> +
>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>> +    0xED, 0x7C, 0x83, 0xB1)
>> +
>>  /*
>>   * | +--------------------------+ 0
>>   * | |        Header            |
>> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>>      uint64_t ghes_addr_le;
>>  } AcpiGhesState;
>>  
>> +/*
>> + * Total size for Generic Error Status Block
>> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-380 Generic Error Status Block
>> + */
>> +#define ACPI_GHES_GESB_SIZE                 20
> 
>> +/* The offset of Data Length in Generic Error Status Block */
>> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
> 
> unused, drop it
> 

OK.

>> +
>> +/*
>> + * Record the value of data length for each error status block to avoid getting
>> + * this value from guest.
>> + */
>> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
>> +
>> +/*
>> + * Generic Error Data Entry
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
>> +                uint32_t error_severity, uint16_t revision,
>> +                uint8_t validation_bits, uint8_t flags,
>> +                uint32_t error_data_length, QemuUUID fru_id,
>> +                uint8_t *fru_text, uint64_t time_stamp)
>> +{
>> +    QemuUUID uuid_le;
>> +
>> +    /* Section Type */
>> +    uuid_le = qemu_uuid_bswap(section_type);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +    /* Revision */
>> +    build_append_int_noprefix(table, revision, 2);
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table, validation_bits, 1);
>> +    /* Flags */
>> +    build_append_int_noprefix(table, flags, 1);
>> +    /* Error Data Length */
>> +    build_append_int_noprefix(table, error_data_length, 4);
>> +
>> +    /* FRU Id */
>> +    uuid_le = qemu_uuid_bswap(fru_id);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* FRU Text */
>> +    g_array_append_vals(table, fru_text, 20);
> what if fru_text were shorter than 20 bytes?
> 
> Suggest to pass length along or
> drop all fru handling in the caller and just hardcode here invalid fru with empty text,
> as function could be extended later, once there is something meaningful to put in fru.
> 

"FRU Text" and "FRU Id" are only used in acpi_ghes_generic_error_data(), so I'd move the
definition into acpi_ghes_generic_error_data() and just hardcode here.

> 
>> +    /* Timestamp */
>> +    build_append_int_noprefix(table, time_stamp, 8);
>> +}
>> +
>> +/*
>> + * Generic Error Status Block
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
>> +                uint32_t raw_data_offset, uint32_t raw_data_length,
>> +                uint32_t data_length, uint32_t error_severity)
>> +{
>> +    /* Block Status */
>> +    build_append_int_noprefix(table, block_status, 4);
>> +    /* Raw Data Offset */
>> +    build_append_int_noprefix(table, raw_data_offset, 4);
>> +    /* Raw Data Length */
>> +    build_append_int_noprefix(table, raw_data_length, 4);
>> +    /* Data Length */
>> +    build_append_int_noprefix(table, data_length, 4);
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +}
>> +
>> +/* UEFI 2.6: N.2.5 Memory Error Section */
>> +static void acpi_ghes_build_append_mem_cper(GArray *table,
>> +                                            uint64_t error_physical_addr)
> I'd split out this and acpi_ghes_generic_error_status() and
> acpi_ghes_generic_error_data()  functions into a separate patch.
> 

OK.

>> +{
>> +    /*
>> +     * Memory Error Record
>> +     */
>> +
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table,
> 
>> +                              (1UL << 14) | /* Type Valid */
>> +                              (1UL << 1) /* Physical Address Valid */,
> shouldn't it use ULL suffixes?
> 

Yes, we should use ULL. The "Validation Bits" field in the Memory Error Section is
8 bytes in size.
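
A small sketch of the ULL variant. The macro and function names below are
hypothetical and only serve the illustration; the bit positions (1 and 14) are
the ones used in the patch:

```c
#include <stdint.h>

/* Hypothetical names for the two validation bits set in the patch */
#define MEM_CPER_VALID_PHYS_ADDR   (1ULL << 1)   /* Physical Address Valid */
#define MEM_CPER_VALID_ERROR_TYPE  (1ULL << 14)  /* Type Valid */

/* Builds the 8-byte Validation Bits value as a 64-bit quantity. */
static inline uint64_t mem_cper_validation_bits(void)
{
    return MEM_CPER_VALID_ERROR_TYPE | MEM_CPER_VALID_PHYS_ADDR;
}
```

With ULL the constants are at least 64 bits wide on every host, so bits above
31 could be added later without depending on the width of unsigned long.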

>> +                              8);
>> +    /* Error Status */
>> +    build_append_int_noprefix(table, 0, 8);
>> +    /* Physical Address */
>> +    build_append_int_noprefix(table, error_physical_addr, 8);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 48);
>> +    /* Memory Error Type */
>> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 7);
>> +}
>> +
>> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
>> +                                      uint64_t error_physical_addr,
>> +                                      uint32_t data_length)
>> +{
>> +    GArray *block;
>> +    uint64_t current_block_length;
>> +    /* Memory Error Section Type */
>> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
>                                ^^
> UEFI_CPER_SEC_PLATFORM_MEM is defined as BE, so _le here is wrong
> and then later you use qemu_uuid_bswap() to make it LE.
> 
> Why not define it as LE to begin with, like it's been done for NVDIMM_UUID_LE?
> 

OK.
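
A possible shape for the LE definition, sketched under assumptions: UUID_LE and
uefi_cper_sec_platform_mem_le are hypothetical names, and a plain byte array
stands in for QemuUUID so that the fragment is self-contained. The macro mirrors
UUID_BE from the patch but stores the first three fields least significant byte
first:

```c
#include <stdint.h>

/* Hypothetical LE counterpart of UUID_BE: first three fields are byte-swapped */
#define UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)                     \
    { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
      (b) & 0xff, ((b) >> 8) & 0xff,                                         \
      (c) & 0xff, ((c) >> 8) & 0xff,                                         \
      (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }

/* Memory Error Section type, using the same field values as in the patch */
static const uint8_t uefi_cper_sec_platform_mem_le[16] =
    UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83,
            0xED, 0x7C, 0x83, 0xB1);
```

With the bytes already laid out this way, the qemu_uuid_bswap() call before
appending the section type should no longer be needed.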

> 
>> +    QemuUUID fru_id = {};
>> +    uint8_t fru_text[20] = {};
>> +
>> +    /*
>> +     * Generic Error Status Block
>> +     * | +---------------------+
>> +     * | |     block_status    |
>> +     * | +---------------------+
>> +     * | |    raw_data_offset  |
>> +     * | +---------------------+
>> +     * | |    raw_data_length  |
>> +     * | +---------------------+
>> +     * | |     data_length     |
>> +     * | +---------------------+
>> +     * | |   error_severity    |
>> +     * | +---------------------+
>> +     */
> not necessary, just point to concrete part of ACPI spec if needed.
> 

OK.

>> +    block = g_array_new(false, true /* clear */, 1);
>> +
>> +    /* The current whole length of the generic error status block */
>> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
>> +
>> +    /* This is the length if adding a new generic error data entry*/
>> +    data_length += ACPI_GHES_DATA_LENGTH;
>> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
>> +
>> +    /*
>> +     * Check whether it will run out of the preallocated memory if adding a new
>> +     * generic error data entry
>> +     */
>> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
>> +        error_report("Record CPER out of boundary!!!");
>> +        return ACPI_GHES_CPER_FAIL;
>> +    }
>> +
>> +    /* Build the new generic error status block header */
>> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
>> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
> Down the road, the arguments are passed to build_append_int_noprefix() which takes
> numbers in host byte order, so manually calling cpu_to_le32() is wrong.
> just drop cpu_to_le32() here.
> 
> 
>> +
>> +    /* Write back above generic error status block header to guest memory */
>> +    cpu_physical_memory_write(error_block_address, block->data,
>> +                              block->len);
>> +
>> +    /* Add a new generic error data entry */
>> +
>> +    data_length = block->len;
>> +    /* Build this new generic error data entry header */
>> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
>> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
>> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
> ditto
>

Yes, this would be wrong when running with big-endian QEMU. I will drop "cpu_to_le32()".
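
To illustrate why the extra conversion is harmful: append_int_le() below is a
hypothetical stand-in that behaves the way build_append_int_noprefix() is
described above, i.e. it takes a number in host byte order and emits it least
significant byte first. This is a sketch, not the QEMU helper itself:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for build_append_int_noprefix(): writes 'size' bytes of 'value'
 * into 'buf' starting at 'pos', least significant byte first, and returns
 * the new position. The result does not depend on host endianness.
 */
static size_t append_int_le(uint8_t *buf, size_t pos, uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[pos++] = (uint8_t)(value & 0xff);
        value >>= 8;
    }
    return pos;
}
```

Because the helper already produces little-endian output from the numeric
value, a caller that passes cpu_to_le32(x) would get the bytes swapped twice on
a big-endian host.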

>> +
>> +    /* Build the memory section CPER for above new generic error data entry */
>> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
>> +
>> +    /* Write back above this new generic error data entry to guest memory */
>> +    cpu_physical_memory_write(error_block_address + current_block_length,
>> +        block->data + data_length, block->len - data_length);
> 
> If I read it right you are in the first write build an updated "Error Status Block"
> header where you update "Data Length" to account for an additional
> "Error Data Entry" and then this second write appends a new "Error Data Entry"
> after the previous one (if any existed).
> 
> Now for GHESv2, OSPM is supposed to copy existing "Error Status Block" and Ack
> that fact via "Read Ack Register" and QEMU must not overwrite old data until
> they are acked by OSPM.
> 
> With that in mind appending a new error seems pointless since guest
> already consumed any pre-existing error before we are able to write.
> So we can drop "Error Status Block" tracking and just
>  1. compose whole "Error Status Block" with 1 new "Error Data Entry"
>  2. check that it fits into start, start+ACPI_GHES_MAX_RAW_DATA_LENGTH range
>  3. push it into guest RAM with 1 only write
> 
> and drop all data_length tracking related code.
> 

Yes, you're right. After OSPM acks the "Error Status Block", it can be reused. Maybe one
"Error Data Entry" is enough for each "Error Status Block" in current implementation.

In the ACPI spec, it says that "One or more Generic Error Data Entry structures may be
recorded in the Generic Error Data Entries field of the Generic Error Status Block
structure". I'm not sure whether a single "Error Data Entry" is sufficient.
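
The three steps suggested above could be sketched roughly as follows. All names
are hypothetical; the sizes (20, 72, 80, 0x1000) and the Data Length offset (12)
are taken from the defines in the patch, and the data entry and CPER payload are
left zeroed to keep the sketch short:

```c
#include <stdint.h>
#include <string.h>

#define GESB_SIZE        20      /* Generic Error Status Block header */
#define DATA_ENTRY_SIZE  72      /* Generic Error Data Entry header */
#define MEM_CPER_SIZE    80      /* Memory Error Section */
#define MAX_BLOCK_SIZE   0x1000  /* preallocated space per error block */

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/*
 * Compose one status block carrying exactly one data entry into 'out'.
 * Returns the number of bytes to push to guest RAM in a single write,
 * or 0 if the block would not fit.
 */
static size_t compose_single_entry_block(uint8_t *out, size_t out_size)
{
    const uint32_t data_length = DATA_ENTRY_SIZE + MEM_CPER_SIZE;
    const size_t total = GESB_SIZE + data_length;

    if (total > MAX_BLOCK_SIZE || total > out_size) {
        return 0;
    }
    memset(out, 0, total);
    put_le32(out + 0, 1);             /* Block Status: uncorrectable */
    put_le32(out + 12, data_length);  /* Data Length */
    /* Error Severity at offset 16 stays 0 (recoverable) */
    return total;
}
```

Since the block is always rebuilt from scratch, nothing like
acpi_ghes_data_length[] has to be tracked between errors in this scheme.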

>> +
>> +    g_array_free(block, true);
>> +
>> +    return ACPI_GHES_CPER_OK;
>> +}
>> +
>>  /*
>>   * Hardware Error Notification
>>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
>> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>>  }
>> +
>> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
>> +{
>> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
>> +    int loop = 0;
>> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
>                                          ^^^^^^^^^^^^^^^
> Forgot to mention in patch [3/6],
> 
> Migration is definitively broken here, since ges.ghes_addr_le is
> not migrated to target QEMU. For example how it should be done see:
>   vmgenid_addr_le and vmstate_vmgenid
> 
> for that you'd need to make ghes_addr_le a part of some device
> (recently added hw/acpi/generic_event_device.c looks like suitable victim)
> 

Yes, this is a serious problem! Is there any better way to migrate ges.ghes_addr_le?
Making ghes_addr_le a part of GED looks weird. How about registering a separate
"ACPI GHES Device" which inherits from "TYPE_DEVICE"?

> 
>> +    bool ret = ACPI_GHES_CPER_FAIL;
>> +    uint8_t source_id;
> 
>> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
> put map at the beginning of this file 
> 
> s/const/static const/
> s/error_source_id/ghes_notify2source_id_map/
>  = { ...,
>      ACPI_HEST_SRC_ID_SEA,
>      ...,
>      ACPI_HEST_SRC_ID_RESERVED
>    }
> 

OK.
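
A sketch of what the file-scope map might look like. The identifiers are
hypothetical, and the numeric values (SEA at notification index 8, 12 entries,
0xff for unsupported types) are read off the array in the patch rather than
from the headers:

```c
#include <stdint.h>

/* Hypothetical source ids; only the SEA source exists in this series */
enum {
    GHES_SRC_ID_SEA = 0,
    GHES_SRC_ID_RESERVED = 0xff,
};

#define GHES_NOTIFY_SEA       8   /* assumed index of the SEA notification */
#define GHES_NOTIFY_RESERVED  12  /* assumed number of notification types */

/* Maps a notification type to its error source id */
static const uint8_t ghes_notify2source_id_map[GHES_NOTIFY_RESERVED] = {
    GHES_SRC_ID_RESERVED, GHES_SRC_ID_RESERVED, GHES_SRC_ID_RESERVED,
    GHES_SRC_ID_RESERVED, GHES_SRC_ID_RESERVED, GHES_SRC_ID_RESERVED,
    GHES_SRC_ID_RESERVED, GHES_SRC_ID_RESERVED,
    GHES_SRC_ID_SEA,      /* index GHES_NOTIFY_SEA */
    GHES_SRC_ID_RESERVED, GHES_SRC_ID_RESERVED, GHES_SRC_ID_RESERVED,
};
```

Every slot is listed explicitly on purpose: with designated initializers the
omitted entries would default to 0, which is the SEA source id here.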

> 
>> +
>> +    /*
>> +     * | +---------------------+ ges.ghes_addr_le
>> +     * | |error_block_address0 |
>> +     * | +---------------------+ --+--
>> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | |error_block_addressN |
>> +     * | +---------------------+
>> +     * | | read_ack_register0  |
>> +     * | +---------------------+ --+--
>> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | | read_ack_registerN  |
> 
> above part is not necessary
> 

OK

>> +     * | +---------------------+ --+--
>> +     * | |      CPER           |   |
>> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
>> +     * | |      CPER           |   |
>> +     * | +---------------------+ --+--
> and this one is not precise as it holds not only CPER record
> Generic Error Status Block + Generic Error Data (with CPER inside)
> 

Even in the spec, it only shows the CPER in Error Data Entry. But I
also think using "Error Data Entry" is more precise.

> and looking at code here and spec I'm not sure we can actually do
> several Error Data Entries as implemented here, more on that later
> 


>> +     * | |    ..........       |
>> +     * | +---------------------+
>> +     * | |      CPER           |
>> +     * | |      ....           |
>> +     * | |      CPER           |
>> +     * | +---------------------+
>> +     */
>> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
>> +        /* Find and check the source id for this new CPER */
>> +        source_id = error_source_id[notify];
>> +        if (source_id != 0xff) {
>> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
>> +        } else {
>> +            goto out;
> assert() ???
> 

> 
>> +        }
>> +
>> +        cpu_physical_memory_read(start_addr, &error_block_addr,
>> +                                 ACPI_GHES_ADDRESS_SIZE);
>> +
>> +        read_ack_register_addr = start_addr +
>> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
>> +retry:
>> +        cpu_physical_memory_read(read_ack_register_addr,
>> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> it's safer to use
>    sizeof(read_ack_register)
> instead of ACPI_GHES_ADDRESS_SIZE to make sure that stack won't be corrupted
> by accident later, the same applies to other reads.
> 

OK, I will drop the ACPI_GHES_ADDRESS_SIZE and replace it with "sizeof()".
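
A minimal sketch of the sizeof() pattern. fake_guest_read() is a hypothetical
stand-in for cpu_physical_memory_read() operating on a local buffer, so that the
fragment is self-contained:

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for cpu_physical_memory_read(): copies from a fake guest RAM. */
static void fake_guest_read(const uint8_t *ram, uint64_t addr,
                            void *buf, size_t len)
{
    memcpy(buf, ram + addr, len);
}

/* Reads the raw (not yet byte-swapped) Read Ack Register at 'addr'. */
static uint64_t read_ack_register_at(const uint8_t *ram, uint64_t addr)
{
    uint64_t read_ack_register = 0;

    /* sizeof(variable) ties the copy length to the destination's type */
    fake_guest_read(ram, addr, &read_ack_register, sizeof(read_ack_register));
    return read_ack_register;
}
```

If the variable's type ever changes, the length follows automatically and the
read cannot run past the stack slot.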

>> +
>> +        /* zero means OSPM does not acknowledge the error */
>> +        if (!read_ack_register) {
>> +            if (loop < 3) {
>> +                usleep(100 * 1000);
>> +                loop++;
>> +                goto retry;
> as minimum this loop can stall guest repeatedly for 0.3s if guest triggers BQL,
> until it handles error.
> 
> (not sure what to suggest here though)
> 

>> +            } else {
>> +                error_report("OSPM does not acknowledge previous error,"
>> +                    " so can not record CPER for current error, forcibly"
>> +                    " acknowledge previous error to avoid blocking next time"
>> +                    " CPER record! Exit");
> 
> Also error overwrite goes against the spec, which says
> "
> Platforms with RAS
> controllers must prevent concurrent accesses to the Error Status Block (i.e., the RAS controller
> must not overwrite the Error Status Block before the OS has completed reading it).
> ******************
> "
> we probably shouldn't override not acked block.
> Question is what bare metal machines do in this case?
> 

Hmm... Yes, you're right. For example, on bare metal machines there are 3 pre-allocated GHESv2(s)
for SEA. If none of them is acked, it will do nothing for the incoming SEA.

On Qemu there is one pre-allocated GHESv2 for SEA, so I think we should do nothing if the block is
not acked.


>> +                read_ack_register = 1;
>> +                cpu_physical_memory_write(read_ack_register_addr,
>> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> Function writes data as is, so one has to ensure that endianness of
> read_ack_register matches that of the spec/guest.
> The same applies to the code below marked with "^^^".
> 
Yes, for the code here and below marked with "^^^", we need to add cpu_to_leXX() to
match the spec.
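
For raw guest memory accesses the conversion has to be explicit. The helpers
below are hypothetical, portable equivalents of writing cpu_to_le64(v) and of
reading with le64_to_cpu(); they only show the byte layout the guest is
expected to see:

```c
#include <stdint.h>

/* Store 'v' so that byte 0 holds the least significant byte. */
static void store_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Load a little-endian 64-bit value regardless of host byte order. */
static uint64_t load_le64(const uint8_t in[8])
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)in[i] << (8 * i);
    }
    return v;
}
```

For the value 0 the layout is the same either way, but a Read Ack Register
value of 1 written unconverted from a big-endian host would put the set bit in
the last byte instead of the first.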

>> +            }
>> +        } else {
>> +            if (error_block_addr) {
> 
> } else if () {
> 

OK.

>> +                read_ack_register = 0;
>> +                /*
>> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
>> +                 * acknowledge this error.
>> +                 */
>> +                cpu_physical_memory_write(read_ack_register_addr,
>> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
>                          ^^^ - for 0 it doesn't really matter but conversion should be done
>                                 even if it's just for the sake of documenting interface
> 

>> +                ret = acpi_ghes_record_mem_error(error_block_addr,
>                                                     ^^^^
> 
>> +                          physical_address, acpi_ghes_data_length[source_id]);
>                              ^^^
> 
>> +                if (ret == ACPI_GHES_CPER_OK) {
>> +                    acpi_ghes_data_length[source_id] +=
>> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
> eventually we will run out of space and nothing short of QEMU restart will
> help to reclaim that.
> 
> Also if you keep track of available space in QEMU,
> you'd also have to migrate it otherwise it's lost after migration.
> But maybe we don't need to keep a track of free space,
> see my another comment in acpi_ghes_record_mem_error()
> 
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +out:
>> +    return ret;
>> +}
>> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
>> index cb62ec9c7b..8e3c5b879e 100644
>> --- a/include/hw/acpi/acpi_ghes.h
>> +++ b/include/hw/acpi/acpi_ghes.h
>> @@ -24,6 +24,9 @@
>>  
>>  #include "hw/acpi/bios-linker-loader.h"
>>  
>> +#define ACPI_GHES_CPER_OK                   1
>> +#define ACPI_GHES_CPER_FAIL                 0
>> +
>>  /*
>>   * Values for Hardware Error Notification Type field
>>   */
>> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
>>  
>>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
>>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
>> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
>>  #endif
>> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
>> index 9d143282bc..321ead8115 100644
>> --- a/include/sysemu/kvm.h
>> +++ b/include/sysemu/kvm.h
>> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
>>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
>>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
>>  
>> -#ifdef TARGET_I386
>> -#define KVM_HAVE_MCE_INJECTION 1
>> +#ifdef KVM_HAVE_MCE_INJECTION
>>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
>>  #endif
>>  
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index d844ea21d8..c4fe6ccc63 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -28,6 +28,10 @@
>>  /* ARM processors have a weak memory model */
>>  #define TCG_GUEST_DEFAULT_MO      (0)
>>  
>> +#ifdef TARGET_AARCH64
>> +#define KVM_HAVE_MCE_INJECTION 1
>> +#endif
>> +
>>  #define EXCP_UDEF            1   /* undefined instruction */
>>  #define EXCP_SWI             2   /* software interrupt */
>>  #define EXCP_PREFETCH_ABORT  3
>> diff --git a/target/arm/helper.c b/target/arm/helper.c
>> index 63815fc4cf..a9ce97efb1 100644
>> --- a/target/arm/helper.c
>> +++ b/target/arm/helper.c
>> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
>>               * Report exception with ESR indicating a fault due to a
>>               * translation table walk for a cache maintenance instruction.
>>               */
>> -            syn = syn_data_abort_no_iss(current_el == target_el,
>> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
>>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
>>              env->exception.vaddress = value;
>>              env->exception.fsr = fsr;
>> diff --git a/target/arm/internals.h b/target/arm/internals.h
>> index f5313dd3d4..28b8451d6d 100644
>> --- a/target/arm/internals.h
>> +++ b/target/arm/internals.h
>> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
>>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
>>  }
>>  
>> -static inline uint32_t syn_data_abort_no_iss(int same_el,
>> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
>>                                               int ea, int cm, int s1ptw,
>>                                               int wnr, int fsc)
>>  {
>>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
>>             | ARM_EL_IL
>> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
>> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
>> +           | (wnr << 6) | fsc;
>>  }
>>  
>>  static inline uint32_t syn_data_abort_with_iss(int same_el,
>> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
>> index 28f6db57d5..c7b7653d3f 100644
>> --- a/target/arm/kvm64.c
>> +++ b/target/arm/kvm64.c
>> @@ -28,6 +28,8 @@
>>  #include "kvm_arm.h"
>>  #include "hw/boards.h"
>>  #include "internals.h"
>> +#include "hw/acpi/acpi.h"
>> +#include "hw/acpi/acpi_ghes.h"
>>  
>>  static bool have_guest_debug;
>>  
>> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
>>      return KVM_PUT_RUNTIME_STATE;
>>  }
>>  
>> +/* Callers must hold the iothread mutex lock */
>> +static void kvm_inject_arm_sea(CPUState *c)
>> +{
>> +    ARMCPU *cpu = ARM_CPU(c);
>> +    CPUARMState *env = &cpu->env;
>> +    CPUClass *cc = CPU_GET_CLASS(c);
>> +    uint32_t esr;
>> +    bool same_el;
>> +
>> +    c->exception_index = EXCP_DATA_ABORT;
>> +    env->exception.target_el = 1;
>> +
>> +    /*
>> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
>> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
>> +     */
>> +    same_el = arm_current_el(env) == env->exception.target_el;
>> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
>> +
>> +    env->exception.syndrome = esr;
>> +
>> +    cc->do_interrupt(c);
>> +}
>> +
>>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>>  
>> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>>      return ret;
>>  }
>>  
>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> +{
>> +    ram_addr_t ram_addr;
>> +    hwaddr paddr;
>> +
>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> you let BUS_MCEERR_AO in but then it's unused, so what's the purpose of allowing it?
> 

Yes, the assertion "code == BUS_MCEERR_AO" is useless at least for now.

>> +
>> +    if (acpi_enabled && addr &&
>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>> +        ram_addr = qemu_ram_addr_from_host(addr);
>> +        if (ram_addr != RAM_ADDR_INVALID &&
>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>> +            kvm_hwpoison_page_add(ram_addr);
>> +            /*
>> +             * Asynchronous signal will be masked by main thread, so
>> +             * only handle synchronous signal.
>> +             */
>> +            if (code == BUS_MCEERR_AR) {
>> +                kvm_cpu_synchronize_state(c);
>> +                if (ACPI_GHES_CPER_FAIL !=
>> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
>> +                    kvm_inject_arm_sea(c);
>> +                } else {
>> +                    fprintf(stderr, "failed to record the error\n");
> 
> fprintf() shouldn't be used in new code
> and another question is whether it's fine to ignore the error?
> maybe we should use error_fatal in such cases?
> 

I'd use error_fatal in this case.

>> +                }
>> +            }
>> +            return;
>> +        }
>> +        fprintf(stderr, "Hardware memory error for memory used by "
>> +                "QEMU itself instead of guest system!\n");
> 
>> +    }
>> +
>> +    if (code == BUS_MCEERR_AR) {
>> +        fprintf(stderr, "Hardware memory error!\n");
>> +        exit(1);
>> +    }
>> +}
>> +
>>  /* C6.6.29 BRK instruction */
>>  static const uint32_t brk_insn = 0xd4200000;
>>  
>> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
>> index 5feb312941..499672ebbc 100644
>> --- a/target/arm/tlb_helper.c
>> +++ b/target/arm/tlb_helper.c
>> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
>>       * ISV field.
>>       */
>>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
>> -        syn = syn_data_abort_no_iss(same_el,
>> +        syn = syn_data_abort_no_iss(same_el, 0,
>>                                      ea, 0, s1ptw, is_write, fsc);
>>      } else {
>>          /*
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index 5352c9ff55..f75a210f96 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -29,6 +29,8 @@
>>  /* The x86 has a strong memory model with some store-after-load re-ordering */
>>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
>>  
>> +#define KVM_HAVE_MCE_INJECTION 1
>> +
>>  /* Maximum instruction code size */
>>  #define TARGET_MAX_INSN_SIZE 16
>>  
> 
> 
> .
> 

-- 

Thanks,
Xiang



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 2/6] docs: APEI GHES generation and CPER record description
  2019-11-27  1:37       ` Xiang Zheng
@ 2019-11-27  8:12         ` Igor Mammedov
  -1 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-27  8:12 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: pbonzini, mst, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

On Wed, 27 Nov 2019 09:37:57 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> Hi Igor,
> 
> Thanks for your review!
> Since the series of patches are going to be merged, we will address your comments by follow up patches.

Yes, I know it's quite frustrating to respin a series multiple times,
but on the other hand it's more frustrating later on when a reader
tries to figure out the mess caused by a bunch of fixups in the commit
history.

Given the number of issues spotted during review, some of which require
rewriting patches, I don't see a big vXX as a valid reason
to merge without some other compelling reason, especially at
the beginning of the merge window.
(It might be fine right before soft-freeze if the issues were minor,
but that is not the case here.)

If I were you, I'd just respin v22 with the comments addressed.
(From my side, I can promise to review it shortly after that,
while I still remember how it works.)

[...]


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-27  1:40       ` Xiang Zheng
@ 2019-11-27 10:43         ` Igor Mammedov
  -1 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-27 10:43 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: pbonzini, mst, shannon.zhaosl, peter.maydell, lersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

On Wed, 27 Nov 2019 09:40:14 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> Hi,
> 
> On 2019/11/16 0:37, Igor Mammedov wrote:
> > On Mon, 11 Nov 2019 09:40:47 +0800
> > Xiang Zheng <zhengxiang9@huawei.com> wrote:
> >   
> >> From: Dongjiu Geng <gengdongjiu@huawei.com>
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When guest accesses the poisoned memory, it will generate a Synchronous
> >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmap the affected page in stage 2, finally
> >> returns to guest.
> >>
> >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> >> stage2 fault, then a BUS_MCEERR_AR synchronous SIGBUS signal is delivered to
> >> Qemu, Qemu records this error address into guest APEI GHES memory and
> >> notifies guest using Synchronous-External-Abort(SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> >> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> ---
> >>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
> >>  include/hw/acpi/acpi_ghes.h |   4 +
> >>  include/sysemu/kvm.h        |   3 +-
> >>  target/arm/cpu.h            |   4 +
> >>  target/arm/helper.c         |   2 +-
> >>  target/arm/internals.h      |   5 +-
> >>  target/arm/kvm64.c          |  64 ++++++++
> >>  target/arm/tlb_helper.c     |   2 +-
> >>  target/i386/cpu.h           |   2 +
> >>  9 files changed, 377 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> >> index 42c00ff3d3..f5b54990c0 100644
> >> --- a/hw/acpi/acpi_ghes.c
> >> +++ b/hw/acpi/acpi_ghes.c
> >> @@ -39,6 +39,34 @@
> >>  /* The max size in bytes for one error block */
> >>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> >>  
> >> +/*
> >> + * The total size of Generic Error Data Entry
> >> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-343 Generic Error Data Entry
> >> + */
> >> +#define ACPI_GHES_DATA_LENGTH               72
> >> +
> >> +/*
> >> + * The memory section CPER size,
> >> + * UEFI 2.6: N.2.5 Memory Error Section
> >> + */  
> > maybe use one line comment
> >   
> >> +#define ACPI_GHES_MEM_CPER_LENGTH           80
> >> +
> >> +/*
> >> + * Masks for block_status flags
> >> + */  
> > ditto
> >   
> >> +#define ACPI_GEBS_UNCORRECTABLE         1
> >> +
> >> +/*
> >> + * Values for error_severity field
> >> + */  
> > ditto
> >   
> 
> OK, I will use one line comment.
> 
> >> +enum AcpiGenericErrorSeverity {
> >> +    ACPI_CPER_SEV_RECOVERABLE,
> >> +    ACPI_CPER_SEV_FATAL,
> >> +    ACPI_CPER_SEV_CORRECTED,
> >> +    ACPI_CPER_SEV_NONE,  
> > I'd assign values explicitly here
> >   foo = x,
> >   ...  
> 
> OK.
> 
> >   
> >> +};
> >> +
> >>  /*
> >>   * Now only support ARMv8 SEA notification type error source
> >>   */
> >> @@ -49,6 +77,16 @@
> >>   */
> >>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >>  
> >> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> >> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> >> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> >> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> >> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> >> +
> >> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> >> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> >> +    0xED, 0x7C, 0x83, 0xB1)
> >> +
> >>  /*
> >>   * | +--------------------------+ 0
> >>   * | |        Header            |
> >> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >>      uint64_t ghes_addr_le;
> >>  } AcpiGhesState;
> >>  
> >> +/*
> >> + * Total size for Generic Error Status Block
> >> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-380 Generic Error Status Block
> >> + */
> >> +#define ACPI_GHES_GESB_SIZE                 20  
> >   
> >> +/* The offset of Data Length in Generic Error Status Block */
> >> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12  
> > 
> > unused, drop it
> >   
> 
> OK.
> 
> >> +
> >> +/*
> >> + * Record the value of data length for each error status block to avoid getting
> >> + * this value from guest.
> >> + */
> >> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> >> +
> >> +/*
> >> + * Generic Error Data Entry
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> >> +                uint32_t error_severity, uint16_t revision,
> >> +                uint8_t validation_bits, uint8_t flags,
> >> +                uint32_t error_data_length, QemuUUID fru_id,
> >> +                uint8_t *fru_text, uint64_t time_stamp)
> >> +{
> >> +    QemuUUID uuid_le;
> >> +
> >> +    /* Section Type */
> >> +    uuid_le = qemu_uuid_bswap(section_type);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +    /* Revision */
> >> +    build_append_int_noprefix(table, revision, 2);
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table, validation_bits, 1);
> >> +    /* Flags */
> >> +    build_append_int_noprefix(table, flags, 1);
> >> +    /* Error Data Length */
> >> +    build_append_int_noprefix(table, error_data_length, 4);
> >> +
> >> +    /* FRU Id */
> >> +    uuid_le = qemu_uuid_bswap(fru_id);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* FRU Text */
> >> +    g_array_append_vals(table, fru_text, 20);  
> > what if fru_text were shorter than 20 bytes?
> > 
> > I suggest either passing the length along, or dropping all FRU handling in
> > the caller and hardcoding an invalid FRU with empty text here; the function
> > can be extended later, once there is something meaningful to put in the FRU fields.
> >   
> 
> "FRU Text" and "FRU Id" are only used in acpi_ghes_generic_error_data(), so I'd move the
> definition into acpi_ghes_generic_error_data() and just hardcode here.
> 
> >   
> >> +    /* Timestamp */
> >> +    build_append_int_noprefix(table, time_stamp, 8);
> >> +}
> >> +
> >> +/*
> >> + * Generic Error Status Block
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> >> +                uint32_t raw_data_offset, uint32_t raw_data_length,
> >> +                uint32_t data_length, uint32_t error_severity)
> >> +{
> >> +    /* Block Status */
> >> +    build_append_int_noprefix(table, block_status, 4);
> >> +    /* Raw Data Offset */
> >> +    build_append_int_noprefix(table, raw_data_offset, 4);
> >> +    /* Raw Data Length */
> >> +    build_append_int_noprefix(table, raw_data_length, 4);
> >> +    /* Data Length */
> >> +    build_append_int_noprefix(table, data_length, 4);
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +}
> >> +
> >> +/* UEFI 2.6: N.2.5 Memory Error Section */
> >> +static void acpi_ghes_build_append_mem_cper(GArray *table,
> >> +                                            uint64_t error_physical_addr)  
> > I'd split out this and acpi_ghes_generic_error_status() and
> > acpi_ghes_generic_error_data()  functions into a separate patch.
> >   
> 
> OK.
> 
> >> +{
> >> +    /*
> >> +     * Memory Error Record
> >> +     */
> >> +
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table,  
> >   
> >> +                              (1UL << 14) | /* Type Valid */
> >> +                              (1UL << 1) /* Physical Address Valid */,  
> > shouldn't it use ULL suffixes?
> >   
> 
> Yes, we should use ULL. The "Validation Bits" field in the Memory Error Section is
> 8 bytes wide.
> 
> >> +                              8);
> >> +    /* Error Status */
> >> +    build_append_int_noprefix(table, 0, 8);
> >> +    /* Physical Address */
> >> +    build_append_int_noprefix(table, error_physical_addr, 8);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 48);
> >> +    /* Memory Error Type */
> >> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 7);
> >> +}
> >> +
> >> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> >> +                                      uint64_t error_physical_addr,
> >> +                                      uint32_t data_length)
> >> +{
> >> +    GArray *block;
> >> +    uint64_t current_block_length;
> >> +    /* Memory Error Section Type */
> >> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;  
> >                                ^^
> > UEFI_CPER_SEC_PLATFORM_MEM is defined as BE, so _le here is wrong
> > and then later you use qemu_uuid_bswap() to make it LE.
> > 
> > Why not define it as LE to begin with, like it's been done for NVDIMM_UUID_LE?
> >   
> 
> OK.
> 
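
For illustration only, a minimal sketch of what a little-endian definition could look like, in the spirit of the NVDIMM_UUID_LE approach mentioned above. The macro name UUID_LE_SKETCH and the helper copy_mem_section_uuid() are hypothetical and not taken from the patch. With the bytes already laid out in storage order, no qemu_uuid_bswap() call would be needed before appending them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical macro (name is an assumption): emit the first three UUID
 * fields least-significant byte first, as they are stored in a CPER record.
 * The trailing eight bytes are stored in the order given.
 */
#define UUID_LE_SKETCH(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)              \
    { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
      (b) & 0xff, ((b) >> 8) & 0xff,                                         \
      (c) & 0xff, ((c) >> 8) & 0xff,                                         \
      (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }

/* Memory Error Section type, same values as UEFI_CPER_SEC_PLATFORM_MEM */
static const uint8_t mem_section_uuid_le[16] =
    UUID_LE_SKETCH(0xA5BC1114, 0x6F64, 0x4EDE,
                   0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1);

/* Copy the 16 UUID bytes to dst exactly as they should appear in guest RAM */
static void copy_mem_section_uuid(uint8_t dst[16])
{
    assert(dst != NULL);
    memcpy(dst, mem_section_uuid_le, sizeof(mem_section_uuid_le));
}
```

The first byte written is 0x14 (the low byte of 0xA5BC1114), which is the order the guest expects to read from the Section Type field.
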
> >   
> >> +    QemuUUID fru_id = {};
> >> +    uint8_t fru_text[20] = {};
> >> +
> >> +    /*
> >> +     * Generic Error Status Block
> >> +     * | +---------------------+
> >> +     * | |     block_status    |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_offset  |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_length  |
> >> +     * | +---------------------+
> >> +     * | |     data_length     |
> >> +     * | +---------------------+
> >> +     * | |   error_severity    |
> >> +     * | +---------------------+
> >> +     */  
> > not necessary, just point to concrete part of ACPI spec if needed.
> >   
> 
> OK.
> 
> >> +    block = g_array_new(false, true /* clear */, 1);
> >> +
> >> +    /* The current whole length of the generic error status block */
> >> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> >> +
> >> +    /* This is the length if adding a new generic error data entry*/
> >> +    data_length += ACPI_GHES_DATA_LENGTH;
> >> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> >> +
> >> +    /*
> >> +     * Check whether it will run out of the preallocated memory if adding a new
> >> +     * generic error data entry
> >> +     */
> >> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> >> +        error_report("Record CPER out of boundary!!!");
> >> +        return ACPI_GHES_CPER_FAIL;
> >> +    }
> >> +
> >> +    /* Build the new generic error status block header */
> >> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> >> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));  
> > Down the road, the arguments are passed to build_append_int_noprefix(), which takes
> > numbers in host byte order, so manually calling cpu_to_le32() is wrong.
> > Just drop cpu_to_le32() here.
> > 
> >   
> >> +
> >> +    /* Write back above generic error status block header to guest memory */
> >> +    cpu_physical_memory_write(error_block_address, block->data,
> >> +                              block->len);
> >> +
> >> +    /* Add a new generic error data entry */
> >> +
> >> +    data_length = block->len;
> >> +    /* Build this new generic error data entry header */
> >> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> >> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> >> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);  
> > ditto
> >  
> 
> Yes, this would be wrong when running with big-endian QEMU. I will drop "cpu_to_le32()".
> 
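
To make the endianness point concrete, here is a small sketch of an appender that takes a host-order value and emits little-endian bytes by shifting, which is the behaviour described above for build_append_int_noprefix(). append_le() is a hypothetical stand-in written for this example, not the QEMU function itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for build_append_int_noprefix(): store 'size' bytes
 * of 'value' at buf[*len], least-significant byte first. Shifting works on
 * the numeric value, so the output is the same on any host endianness.
 */
static void append_le(uint8_t *buf, size_t *len, uint64_t value, int size)
{
    assert(size >= 1 && size <= 8);
    for (int i = 0; i < size; i++) {
        buf[(*len)++] = value & 0xff;
        value >>= 8;
    }
}
```

Because the helper already produces little-endian output from a plain number, feeding it a value that was pre-swapped with cpu_to_le32() would swap the bytes a second time on a big-endian host, while on a little-endian host the extra call is a no-op that hides the bug.
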
> >> +
> >> +    /* Build the memory section CPER for above new generic error data entry */
> >> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> >> +
> >> +    /* Write back above this new generic error data entry to guest memory */
> >> +    cpu_physical_memory_write(error_block_address + current_block_length,
> >> +        block->data + data_length, block->len - data_length);  
> > 
> > If I read it right, the first write builds an updated "Error Status Block"
> > header whose "Data Length" accounts for an additional
> > "Error Data Entry", and this second write then appends a new "Error Data Entry"
> > after the previous one (if any existed).
> > 
> > Now for GHESv2, OSPM is supposed to copy existing "Error Status Block" and Ack
> > that fact via "Read Ack Register" and QEMU must not overwrite old data until
> > they are acked by OSPM.
> > 
> > With that in mind, appending a new error seems pointless, since the guest
> > has already consumed any pre-existing error before we are able to write.
> > So we can drop "Error Status Block" tracking and just
> >  1. compose the whole "Error Status Block" with 1 new "Error Data Entry"
> >  2. check that it fits into the start, start+ACPI_GHES_MAX_RAW_DATA_LENGTH range
> >  3. push it into guest RAM with a single write
> > 
> > and drop all data_length tracking related code.
> >   
> 
> Yes, you're right. After OSPM acks the "Error Status Block", it can be reused. Maybe one
> "Error Data Entry" is enough for each "Error Status Block" in current implementation.
> 
> In the ACPI spec, it says that "One or more Generic Error Data Entry structures may be
> recorded in the Generic Error Data Entries field of the Generic Error Status Block
> structure". I'm not sure whether a single "Error Data Entry" is sufficient.
Several data entries could be used if the system detects several errors at the same time,
so that it can create a single status block with all the errors it knows about.
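
A rough sketch of the compose-then-write approach with a single data entry, for illustration only. The helper names (put_le, put_zeros, compose_mem_error_block) and the flat byte buffer are assumptions made for this example; the field sizes and values follow the patch quoted above (20-byte status block, 72-byte data entry, 80-byte memory section, revision 0x300).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GESB_SIZE        20      /* Generic Error Status Block header */
#define GED_ENTRY_SIZE   72      /* Generic Error Data Entry header */
#define MEM_CPER_SIZE    80      /* Memory Error Section */
#define MAX_BLOCK_SIZE   0x1000  /* mirrors ACPI_GHES_MAX_RAW_DATA_LENGTH */

/* Append 'size' bytes of 'value', least-significant byte first */
static void put_le(uint8_t *buf, size_t *len, uint64_t value, int size)
{
    assert(size >= 1 && size <= 8);
    for (int i = 0; i < size; i++) {
        buf[(*len)++] = value & 0xff;
        value >>= 8;
    }
}

/* Append 'count' zero bytes */
static void put_zeros(uint8_t *buf, size_t *len, size_t count)
{
    memset(buf + *len, 0, count);
    *len += count;
}

/*
 * Build status block + one data entry + memory CPER in 'buf'.
 * Returns the total length, or 0 if it does not fit.
 */
static size_t compose_mem_error_block(uint8_t *buf, size_t cap,
                                      const uint8_t section_uuid_le[16],
                                      uint64_t error_physical_addr)
{
    const size_t total = GESB_SIZE + GED_ENTRY_SIZE + MEM_CPER_SIZE;
    size_t len = 0;

    if (total > cap || total > MAX_BLOCK_SIZE) {
        return 0;
    }

    /* Generic Error Status Block */
    put_le(buf, &len, 1, 4);                 /* Block Status: uncorrectable */
    put_le(buf, &len, 0, 4);                 /* Raw Data Offset */
    put_le(buf, &len, 0, 4);                 /* Raw Data Length */
    put_le(buf, &len, GED_ENTRY_SIZE + MEM_CPER_SIZE, 4); /* Data Length */
    put_le(buf, &len, 0, 4);                 /* Error Severity: recoverable */

    /* Generic Error Data Entry */
    memcpy(buf + len, section_uuid_le, 16);  /* Section Type */
    len += 16;
    put_le(buf, &len, 0, 4);                 /* Error Severity */
    put_le(buf, &len, 0x300, 2);             /* Revision */
    put_le(buf, &len, 0, 1);                 /* Validation Bits */
    put_le(buf, &len, 0, 1);                 /* Flags */
    put_le(buf, &len, MEM_CPER_SIZE, 4);     /* Error Data Length */
    put_zeros(buf, &len, 16);                /* FRU Id: not provided */
    put_zeros(buf, &len, 20);                /* FRU Text: empty */
    put_le(buf, &len, 0, 8);                 /* Timestamp */

    /* Memory Error Section */
    put_le(buf, &len, (1ULL << 14) | (1ULL << 1), 8); /* Validation Bits */
    put_le(buf, &len, 0, 8);                 /* Error Status */
    put_le(buf, &len, error_physical_addr, 8);
    put_zeros(buf, &len, 48);                /* details not reported */
    put_le(buf, &len, 0, 1);                 /* Memory Error Type: unknown */
    put_zeros(buf, &len, 7);

    assert(len == total);
    return len;
}
```

The caller would then push the returned bytes to error_block_address with one cpu_physical_memory_write(), so no per-source data_length needs to be tracked or migrated.
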


> >> +    g_array_free(block, true);
> >> +
> >> +    return ACPI_GHES_CPER_OK;
> >> +}
> >> +
> >>  /*
> >>   * Hardware Error Notification
> >>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> >> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> >>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> >>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> >>  }
> >> +
> >> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> >> +{
> >> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> >> +    int loop = 0;
> >> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);  
> >                                          ^^^^^^^^^^^^^^^
> > Forgot to mention in patch [3/6],
> > 
> > Migration is definitely broken here, since ges.ghes_addr_le is
> > not migrated to target QEMU. For an example of how it should be done, see:
> >   vmgenid_addr_le and vmstate_vmgenid
> > 
> > For that you'd need to make ghes_addr_le a part of some device
> > (the recently added hw/acpi/generic_event_device.c looks like a suitable victim)
> >   
> 
> Yes, this is a serious problem! Is there any better way to migrate ges.ghes_addr_le?
> It looks weird to make ghes_addr_le a part of GED. How about registering a single
> "ACPI GHES Device" which inherits from "TYPE_DEVICE"?
We already have an ACPI-specific device (GED), so I suggest reusing it.


> >> +    bool ret = ACPI_GHES_CPER_FAIL;
> >> +    uint8_t source_id;  
> >   
> >> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> >> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};  
> > put map at the beginning of this file 
> > 
> > s/const/static const/
> > s/error_source_id/ghes_notify2source_id_map/
> >  = { ...,
> >      ACPI_HEST_SRC_ID_SEA,
> >      ...,
> >      ACPI_HEST_SRC_ID_RESERVED
> >    }
> >   
> 
> OK.
> 
> >   
> >> +
> >> +    /*
> >> +     * | +---------------------+ ges.ghes_addr_le
> >> +     * | |error_block_address0 |
> >> +     * | +---------------------+ --+--
> >> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | |error_block_addressN |
> >> +     * | +---------------------+
> >> +     * | | read_ack_register0  |
> >> +     * | +---------------------+ --+--
> >> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | | read_ack_registerN  |  
> > 
> > above part is not necessary
> >   
> 
> OK
> 
> >> +     * | +---------------------+ --+--
> >> +     * | |      CPER           |   |
> >> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> >> +     * | |      CPER           |   |
> >> +     * | +---------------------+ --+--  
> > and this one is not precise, as it holds not only the CPER record but
> > Generic Error Status Block + Generic Error Data (with CPER inside)
> >   
> 
> Even in the spec, it only shows the CPER in Error Data Entry. But I
> also think using "Error Data Entry" is more precise.
> 
> > and looking at code here and spec I'm not sure we can actually do
> > several Error Data Entries as implemented here, more on that later
> >   
> 
> 
> >> +     * | |    ..........       |
> >> +     * | +---------------------+
> >> +     * | |      CPER           |
> >> +     * | |      ....           |
> >> +     * | |      CPER           |
> >> +     * | +---------------------+
> >> +     */
> >> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> >> +        /* Find and check the source id for this new CPER */
> >> +        source_id = error_source_id[notify];
> >> +        if (source_id != 0xff) {
> >> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> >> +        } else {
> >> +            goto out;  
> > assert() ???
> >   
> 
> >   
> >> +        }
> >> +
> >> +        cpu_physical_memory_read(start_addr, &error_block_addr,
> >> +                                 ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +        read_ack_register_addr = start_addr +
> >> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
> >> +retry:
> >> +        cpu_physical_memory_read(read_ack_register_addr,
> >> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);  
> > it's safer to use
> >    sizeof(read_ack_register)
> > instead of ACPI_GHES_ADDRESS_SIZE to make sure that stack won't be corrupted
> > by accident later, the same applies to other reads.
> >   
> 
> OK, I will drop the ACPI_GHES_ADDRESS_SIZE and replace it with "sizeof()".
> 
> >> +
> >> +        /* zero means OSPM does not acknowledge the error */
> >> +        if (!read_ack_register) {
> >> +            if (loop < 3) {
> >> +                usleep(100 * 1000);
> >> +                loop++;
> >> +                goto retry;  
> > At a minimum, this loop can stall the guest repeatedly for 0.3s if the guest triggers BQL,
> > until it handles the error.
> > 
> > (not sure what to suggest here though)
> >   
> 
> >> +            } else {
> >> +                error_report("OSPM does not acknowledge previous error,"
> >> +                    " so can not record CPER for current error, forcibly"
> >> +                    " acknowledge previous error to avoid blocking next time"
> >> +                    " CPER record! Exit");  
> > 
> > Also error overwrite goes against the spec, which says
> > "
> > Platforms with RAS
> > controllers must prevent concurrent accesses to the Error Status Block (i.e., the RAS controller
> > must not overwrite the Error Status Block before the OS has completed reading it).
> > ******************
> > "
> > We probably shouldn't overwrite a block that has not been acked.
> > The question is, what do bare-metal machines do in this case?
> >   
> 
> Hmm... Yes, you're right. For example, on bare metal machines there are 3 pre-allocated GHESv2(s)
> for SEA. If none of them is acked, it will do nothing for the incoming SEA.
> 
> On QEMU there is one pre-allocated GHESv2 for SEA, so I think we should do nothing if the block is
> not acked.
OK, if bare metal behaves like this, let's go with it for now.
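
As a sketch of that behaviour (illustrative only, not the patch code): the recorder checks the Read Ack Register first and leaves the block untouched when the previous error has not been acknowledged. Guest RAM is simulated with plain byte arrays here, the helper names are hypothetical, and the sketch assumes the register starts out as 1 so the very first error can be recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load a 64-bit little-endian value from guest memory */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Store a 64-bit value to guest memory in little-endian order */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = v & 0xff;
        v >>= 8;
    }
}

/*
 * Record 'len' bytes of error data only if OSPM has acknowledged the
 * previous error. 'ack_reg' and 'block' point into (simulated) guest RAM.
 * Returns false and leaves the block untouched when the ack is missing.
 */
static bool record_if_acked(uint8_t *ack_reg, uint8_t *block,
                            const uint8_t *data, size_t len)
{
    assert(ack_reg && block && data);

    if (load_le64(ack_reg) == 0) {
        return false;            /* previous error not yet consumed */
    }

    store_le64(ack_reg, 0);      /* OSPM sets it back to 1 when done */
    memcpy(block, data, len);
    return true;
}
```

With this shape there is no retry loop and no forced acknowledgement: an un-acked block simply causes the new error to be dropped, and the caller decides how to report that.
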


> >> +                read_ack_register = 1;
> >> +                cpu_physical_memory_write(read_ack_register_addr,
> >> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);  
> > Function writes data as is, so one has to ensure that endianness of
> > read_ack_register matches that of the spec/guest.
> > The same applies to the code below marked with "^^^".
> >   
> Yes, for the code here and below marked with "^^^", we need to add cpu_to_leXX() to
> match the spec.
> 
> >> +            }
> >> +        } else {
> >> +            if (error_block_addr) {  
> > 
> > } else if () {
> >   
> 
> OK.
> 
> >> +                read_ack_register = 0;
> >> +                /*
> >> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
> >> +                 * acknowledge this error.
> >> +                 */
> >> +                cpu_physical_memory_write(read_ack_register_addr,
> >> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);  
> >                          ^^^ - for 0 it doesn't really matter but conversion should be done
> >                                 even if it's just for the sake of documenting interface
> >   
> 
> >> +                ret = acpi_ghes_record_mem_error(error_block_addr,  
> >                                                     ^^^^
> >   
> >> +                          physical_address, acpi_ghes_data_length[source_id]);  
> >                              ^^^
> >   
> >> +                if (ret == ACPI_GHES_CPER_OK) {
> >> +                    acpi_ghes_data_length[source_id] +=
> >> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);  
> > eventually we will run out of space and nothing short of QEMU restart will
> > help to reclaim that.
> > 
> > Also, if you keep track of available space in QEMU,
> > you'd also have to migrate it, otherwise it's lost after migration.
> > But maybe we don't need to keep track of free space,
> > see my other comment in acpi_ghes_record_mem_error()
> >  
> >> +                }
> >> +            }
> >> +        }
> >> +    }
> >> +
> >> +out:
> >> +    return ret;
> >> +}
> >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> >> index cb62ec9c7b..8e3c5b879e 100644
> >> --- a/include/hw/acpi/acpi_ghes.h
> >> +++ b/include/hw/acpi/acpi_ghes.h
> >> @@ -24,6 +24,9 @@
> >>  
> >>  #include "hw/acpi/bios-linker-loader.h"
> >>  
> >> +#define ACPI_GHES_CPER_OK                   1
> >> +#define ACPI_GHES_CPER_FAIL                 0
> >> +
> >>  /*
> >>   * Values for Hardware Error Notification Type field
> >>   */
> >> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> >>  
> >>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> >>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> >> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
> >>  #endif
> >> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> >> index 9d143282bc..321ead8115 100644
> >> --- a/include/sysemu/kvm.h
> >> +++ b/include/sysemu/kvm.h
> >> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
> >>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
> >>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
> >>  
> >> -#ifdef TARGET_I386
> >> -#define KVM_HAVE_MCE_INJECTION 1
> >> +#ifdef KVM_HAVE_MCE_INJECTION
> >>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
> >>  #endif
> >>  
> >> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> >> index d844ea21d8..c4fe6ccc63 100644
> >> --- a/target/arm/cpu.h
> >> +++ b/target/arm/cpu.h
> >> @@ -28,6 +28,10 @@
> >>  /* ARM processors have a weak memory model */
> >>  #define TCG_GUEST_DEFAULT_MO      (0)
> >>  
> >> +#ifdef TARGET_AARCH64
> >> +#define KVM_HAVE_MCE_INJECTION 1
> >> +#endif
> >> +
> >>  #define EXCP_UDEF            1   /* undefined instruction */
> >>  #define EXCP_SWI             2   /* software interrupt */
> >>  #define EXCP_PREFETCH_ABORT  3
> >> diff --git a/target/arm/helper.c b/target/arm/helper.c
> >> index 63815fc4cf..a9ce97efb1 100644
> >> --- a/target/arm/helper.c
> >> +++ b/target/arm/helper.c
> >> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
> >>               * Report exception with ESR indicating a fault due to a
> >>               * translation table walk for a cache maintenance instruction.
> >>               */
> >> -            syn = syn_data_abort_no_iss(current_el == target_el,
> >> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
> >>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
> >>              env->exception.vaddress = value;
> >>              env->exception.fsr = fsr;
> >> diff --git a/target/arm/internals.h b/target/arm/internals.h
> >> index f5313dd3d4..28b8451d6d 100644
> >> --- a/target/arm/internals.h
> >> +++ b/target/arm/internals.h
> >> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
> >>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
> >>  }
> >>  
> >> -static inline uint32_t syn_data_abort_no_iss(int same_el,
> >> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
> >>                                               int ea, int cm, int s1ptw,
> >>                                               int wnr, int fsc)
> >>  {
> >>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
> >>             | ARM_EL_IL
> >> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
> >> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
> >> +           | (wnr << 6) | fsc;
> >>  }
> >>  
> >>  static inline uint32_t syn_data_abort_with_iss(int same_el,
> >> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> >> index 28f6db57d5..c7b7653d3f 100644
> >> --- a/target/arm/kvm64.c
> >> +++ b/target/arm/kvm64.c
> >> @@ -28,6 +28,8 @@
> >>  #include "kvm_arm.h"
> >>  #include "hw/boards.h"
> >>  #include "internals.h"
> >> +#include "hw/acpi/acpi.h"
> >> +#include "hw/acpi/acpi_ghes.h"
> >>  
> >>  static bool have_guest_debug;
> >>  
> >> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
> >>      return KVM_PUT_RUNTIME_STATE;
> >>  }
> >>  
> >> +/* Callers must hold the iothread mutex lock */
> >> +static void kvm_inject_arm_sea(CPUState *c)
> >> +{
> >> +    ARMCPU *cpu = ARM_CPU(c);
> >> +    CPUARMState *env = &cpu->env;
> >> +    CPUClass *cc = CPU_GET_CLASS(c);
> >> +    uint32_t esr;
> >> +    bool same_el;
> >> +
> >> +    c->exception_index = EXCP_DATA_ABORT;
> >> +    env->exception.target_el = 1;
> >> +
> >> +    /*
> >> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> >> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> >> +     */
> >> +    same_el = arm_current_el(env) == env->exception.target_el;
> >> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> >> +
> >> +    env->exception.syndrome = esr;
> >> +
> >> +    cc->do_interrupt(c);
> >> +}
> >> +
> >>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> >>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
> >>  
> >> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
> >>      return ret;
> >>  }
> >>  
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> +    ram_addr_t ram_addr;
> >> +    hwaddr paddr;
> >> +
> >> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);  
> > you let BUS_MCEERR_AO in but then it's unused, so what's the purpose of allowing it?
> >   
> 
> Yes, the assertion "code == BUS_MCEERR_AO" is useless at least for now.
> 
> >> +
> >> +    if (acpi_enabled && addr &&
> >> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> +        ram_addr = qemu_ram_addr_from_host(addr);
> >> +        if (ram_addr != RAM_ADDR_INVALID &&
> >> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> >> +            kvm_hwpoison_page_add(ram_addr);
> >> +            /*
> >> +             * Asynchronous signal will be masked by main thread, so
> >> +             * only handle synchronous signal.
> >> +             */
> >> +            if (code == BUS_MCEERR_AR) {
> >> +                kvm_cpu_synchronize_state(c);
> >> +                if (ACPI_GHES_CPER_FAIL !=
> >> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> >> +                    kvm_inject_arm_sea(c);
> >> +                } else {
> >> +                    fprintf(stderr, "failed to record the error\n");  
> > 
> > fprintf() shouldn't be used in new code.
> > Another question is whether it's fine to ignore the error.
> > Maybe we should use error_fatal in such cases?
> >   
> 
> I'd use error_fatal in this case.
> 
> >> +                }
> >> +            }
> >> +            return;
> >> +        }
> >> +        fprintf(stderr, "Hardware memory error for memory used by "
> >> +                "QEMU itself instead of guest system!\n");  
> >   
> >> +    }
> >> +
> >> +    if (code == BUS_MCEERR_AR) {
> >> +        fprintf(stderr, "Hardware memory error!\n");
> >> +        exit(1);
> >> +    }
> >> +}
> >> +
> >>  /* C6.6.29 BRK instruction */
> >>  static const uint32_t brk_insn = 0xd4200000;
> >>  
> >> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> >> index 5feb312941..499672ebbc 100644
> >> --- a/target/arm/tlb_helper.c
> >> +++ b/target/arm/tlb_helper.c
> >> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
> >>       * ISV field.
> >>       */
> >>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> >> -        syn = syn_data_abort_no_iss(same_el,
> >> +        syn = syn_data_abort_no_iss(same_el, 0,
> >>                                      ea, 0, s1ptw, is_write, fsc);
> >>      } else {
> >>          /*
> >> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> >> index 5352c9ff55..f75a210f96 100644
> >> --- a/target/i386/cpu.h
> >> +++ b/target/i386/cpu.h
> >> @@ -29,6 +29,8 @@
> >>  /* The x86 has a strong memory model with some store-after-load re-ordering */
> >>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
> >>  
> >> +#define KVM_HAVE_MCE_INJECTION 1
> >> +
> >>  /* Maximum instruction code size */
> >>  #define TARGET_MAX_INSN_SIZE 16
> >>    
> > 
> > 
> > .
> >   
> 



* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
@ 2019-11-27 10:43         ` Igor Mammedov
  0 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-27 10:43 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: peter.maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl, qemu-arm,
	james.morse, xuwei5, jonathan.cameron, pbonzini, lersek, rth

On Wed, 27 Nov 2019 09:40:14 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> Hi,
> 
> On 2019/11/16 0:37, Igor Mammedov wrote:
> > On Mon, 11 Nov 2019 09:40:47 +0800
> > Xiang Zheng <zhengxiang9@huawei.com> wrote:
> >   
> >> From: Dongjiu Geng <gengdongjiu@huawei.com>
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When guest accesses the poisoned memory, it will generate a Synchronous
> >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmap the affected page in stage 2, finally
> >> returns to guest.
> >>
> >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> >> stage2 fault, then a BUS_MCEERR_AR synchronous signal is delivered to
> >> Qemu, Qemu records this error address into guest APEI GHES memory and
> >> notifies guest using Synchronous-External-Abort(SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and holds an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> >> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> ---
> >>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
> >>  include/hw/acpi/acpi_ghes.h |   4 +
> >>  include/sysemu/kvm.h        |   3 +-
> >>  target/arm/cpu.h            |   4 +
> >>  target/arm/helper.c         |   2 +-
> >>  target/arm/internals.h      |   5 +-
> >>  target/arm/kvm64.c          |  64 ++++++++
> >>  target/arm/tlb_helper.c     |   2 +-
> >>  target/i386/cpu.h           |   2 +
> >>  9 files changed, 377 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> >> index 42c00ff3d3..f5b54990c0 100644
> >> --- a/hw/acpi/acpi_ghes.c
> >> +++ b/hw/acpi/acpi_ghes.c
> >> @@ -39,6 +39,34 @@
> >>  /* The max size in bytes for one error block */
> >>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> >>  
> >> +/*
> >> + * The total size of Generic Error Data Entry
> >> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-343 Generic Error Data Entry
> >> + */
> >> +#define ACPI_GHES_DATA_LENGTH               72
> >> +
> >> +/*
> >> + * The memory section CPER size,
> >> + * UEFI 2.6: N.2.5 Memory Error Section
> >> + */  
> > maybe use one line comment
> >   
> >> +#define ACPI_GHES_MEM_CPER_LENGTH           80
> >> +
> >> +/*
> >> + * Masks for block_status flags
> >> + */  
> > ditto
> >   
> >> +#define ACPI_GEBS_UNCORRECTABLE         1
> >> +
> >> +/*
> >> + * Values for error_severity field
> >> + */  
> > ditto
> >   
> 
> OK, I will use one line comment.
> 
> >> +enum AcpiGenericErrorSeverity {
> >> +    ACPI_CPER_SEV_RECOVERABLE,
> >> +    ACPI_CPER_SEV_FATAL,
> >> +    ACPI_CPER_SEV_CORRECTED,
> >> +    ACPI_CPER_SEV_NONE,  
> > I'd assign values explicitly here
> >   foo = x,
> >   ...  
> 
> OK.
> 
> >   
> >> +};
> >> +
> >>  /*
> >>   * Now only support ARMv8 SEA notification type error source
> >>   */
> >> @@ -49,6 +77,16 @@
> >>   */
> >>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >>  
> >> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> >> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> >> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> >> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> >> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> >> +
> >> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> >> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> >> +    0xED, 0x7C, 0x83, 0xB1)
> >> +
> >>  /*
> >>   * | +--------------------------+ 0
> >>   * | |        Header            |
> >> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >>      uint64_t ghes_addr_le;
> >>  } AcpiGhesState;
> >>  
> >> +/*
> >> + * Total size for Generic Error Status Block
> >> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-380 Generic Error Status Block
> >> + */
> >> +#define ACPI_GHES_GESB_SIZE                 20  
> >   
> >> +/* The offset of Data Length in Generic Error Status Block */
> >> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12  
> > 
> > unused, drop it
> >   
> 
> OK.
> 
> >> +
> >> +/*
> >> + * Record the value of data length for each error status block to avoid getting
> >> + * this value from guest.
> >> + */
> >> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> >> +
> >> +/*
> >> + * Generic Error Data Entry
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> >> +                uint32_t error_severity, uint16_t revision,
> >> +                uint8_t validation_bits, uint8_t flags,
> >> +                uint32_t error_data_length, QemuUUID fru_id,
> >> +                uint8_t *fru_text, uint64_t time_stamp)
> >> +{
> >> +    QemuUUID uuid_le;
> >> +
> >> +    /* Section Type */
> >> +    uuid_le = qemu_uuid_bswap(section_type);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +    /* Revision */
> >> +    build_append_int_noprefix(table, revision, 2);
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table, validation_bits, 1);
> >> +    /* Flags */
> >> +    build_append_int_noprefix(table, flags, 1);
> >> +    /* Error Data Length */
> >> +    build_append_int_noprefix(table, error_data_length, 4);
> >> +
> >> +    /* FRU Id */
> >> +    uuid_le = qemu_uuid_bswap(fru_id);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* FRU Text */
> >> +    g_array_append_vals(table, fru_text, 20);  
> > what if fru_text were shorter than 20 bytes?
> > 
> > Suggest to pass length along or
> > drop all fru handling in the caller and just hardcode here invalid fru with empty text,
> > as function could be extended later, once there is something meaningful to put in fru.
> >   
> 
> "FRU Text" and "FRU Id" are only used in acpi_ghes_generic_error_data(), so I'd move the
> definition into acpi_ghes_generic_error_data() and just hardcode here.
> 
> >   
> >> +    /* Timestamp */
> >> +    build_append_int_noprefix(table, time_stamp, 8);
> >> +}
> >> +
> >> +/*
> >> + * Generic Error Status Block
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> >> +                uint32_t raw_data_offset, uint32_t raw_data_length,
> >> +                uint32_t data_length, uint32_t error_severity)
> >> +{
> >> +    /* Block Status */
> >> +    build_append_int_noprefix(table, block_status, 4);
> >> +    /* Raw Data Offset */
> >> +    build_append_int_noprefix(table, raw_data_offset, 4);
> >> +    /* Raw Data Length */
> >> +    build_append_int_noprefix(table, raw_data_length, 4);
> >> +    /* Data Length */
> >> +    build_append_int_noprefix(table, data_length, 4);
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +}
> >> +
> >> +/* UEFI 2.6: N.2.5 Memory Error Section */
> >> +static void acpi_ghes_build_append_mem_cper(GArray *table,
> >> +                                            uint64_t error_physical_addr)  
> > I'd split out this and acpi_ghes_generic_error_status() and
> > acpi_ghes_generic_error_data()  functions into a separate patch.
> >   
> 
> OK.
> 
> >> +{
> >> +    /*
> >> +     * Memory Error Record
> >> +     */
> >> +
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table,  
> >   
> >> +                              (1UL << 14) | /* Type Valid */
> >> +                              (1UL << 1) /* Physical Address Valid */,  
> > shouldn't it use ULL suffixes?
> >   
> 
> Yes, we should use ULL. The field "Validation Bits" in Memory Error Section is
> in 8-bytes size.
> 
> >> +                              8);
> >> +    /* Error Status */
> >> +    build_append_int_noprefix(table, 0, 8);
> >> +    /* Physical Address */
> >> +    build_append_int_noprefix(table, error_physical_addr, 8);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 48);
> >> +    /* Memory Error Type */
> >> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 7);
> >> +}
> >> +
> >> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> >> +                                      uint64_t error_physical_addr,
> >> +                                      uint32_t data_length)
> >> +{
> >> +    GArray *block;
> >> +    uint64_t current_block_length;
> >> +    /* Memory Error Section Type */
> >> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;  
> >                                ^^
> > UEFI_CPER_SEC_PLATFORM_MEM is defined as BE, so _le here is wrong
> > and then later you use qemu_uuid_bswap() to make it LE.
> > 
> > Why not define it as LE to begin with, like it's been done for NVDIMM_UUID_LE?
> >   
> 
> OK.
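
A minimal sketch of what defining the section GUID as little-endian up front could look like, so that no later qemu_uuid_bswap() is needed. SketchUUID, SKETCH_UUID_LE and sketch_mem_section_id_le are hypothetical names used only for illustration; the GUID values are the ones from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for QemuUUID: 16 raw bytes. */
typedef struct {
    uint8_t data[16];
} SketchUUID;

/*
 * Hypothetical macro: store the first three fields least significant
 * byte first, which is the layout ACPI/UEFI tables use. The trailing
 * eight bytes are stored in the order given.
 */
#define SKETCH_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)      \
    { { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff,           \
        ((a) >> 24) & 0xff,                                          \
        (b) & 0xff, ((b) >> 8) & 0xff,                               \
        (c) & 0xff, ((c) >> 8) & 0xff,                               \
        (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } }

/* Platform Memory Error section type, already in table byte order. */
static const SketchUUID sketch_mem_section_id_le =
    SKETCH_UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE,
                   0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1);
```

With this layout the 16 bytes can be appended to the table as they are, and the _le suffix on the variable matches its contents.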
> 
> >   
> >> +    QemuUUID fru_id = {};
> >> +    uint8_t fru_text[20] = {};
> >> +
> >> +    /*
> >> +     * Generic Error Status Block
> >> +     * | +---------------------+
> >> +     * | |     block_status    |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_offset  |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_length  |
> >> +     * | +---------------------+
> >> +     * | |     data_length     |
> >> +     * | +---------------------+
> >> +     * | |   error_severity    |
> >> +     * | +---------------------+
> >> +     */  
> > not necessary, just point to concrete part of ACPI spec if needed.
> >   
> 
> OK.
> 
> >> +    block = g_array_new(false, true /* clear */, 1);
> >> +
> >> +    /* The current whole length of the generic error status block */
> >> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> >> +
> >> +    /* This is the length if adding a new generic error data entry*/
> >> +    data_length += ACPI_GHES_DATA_LENGTH;
> >> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> >> +
> >> +    /*
> >> +     * Check whether it will run out of the preallocated memory if adding a new
> >> +     * generic error data entry
> >> +     */
> >> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> >> +        error_report("Record CPER out of boundary!!!");
> >> +        return ACPI_GHES_CPER_FAIL;
> >> +    }
> >> +
> >> +    /* Build the new generic error status block header */
> >> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> >> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));  
> > Down the road, the arguments are passed to build_append_int_noprefix() which takes
> > numbers in host byte order, so manually calling cpu_to_le32() is wrong.
> > just drop cpu_to_le32() here.
> > 
> >   
> >> +
> >> +    /* Write back above generic error status block header to guest memory */
> >> +    cpu_physical_memory_write(error_block_address, block->data,
> >> +                              block->len);
> >> +
> >> +    /* Add a new generic error data entry */
> >> +
> >> +    data_length = block->len;
> >> +    /* Build this new generic error data entry header */
> >> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> >> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> >> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);  
> > ditto
> >  
> 
> Yes, this would be wrong when running with big-endian QEMU. I will drop "cpu_to_le32()".
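
To illustrate why the extra conversion is harmful, here is a simplified stand-in for build_append_int_noprefix(). It is a sketch, not the QEMU function itself: sketch_append_le() and sketch_bswap32() are hypothetical names, and the second helper only mimics what cpu_to_le32() does on a big-endian host.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Takes a value in host byte order and emits `size` bytes, least
 * significant byte first. It works on the value, not on its memory
 * image, so the output is the same on little- and big-endian hosts.
 */
static size_t sketch_append_le(uint8_t *buf, size_t pos,
                               uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[pos++] = value & 0xff;
        value >>= 8;
    }
    return pos;
}

/* Byte swap, i.e. the effect of cpu_to_le32() on a big-endian host. */
static uint32_t sketch_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0xff00) |
           ((v << 8) & 0xff0000) | (v << 24);
}
```

Passing sketch_bswap32(1) to sketch_append_le() emits 00 00 00 01, so the guest would read 0x01000000 instead of 1. Passing the plain host-order value gives the intended 01 00 00 00 on any host.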
> 
> >> +
> >> +    /* Build the memory section CPER for above new generic error data entry */
> >> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> >> +
> >> +    /* Write back above this new generic error data entry to guest memory */
> >> +    cpu_physical_memory_write(error_block_address + current_block_length,
> >> +        block->data + data_length, block->len - data_length);  
> > 
> > If I read it right you are in the first write build an updated "Error Status Block"
> > header where you update "Data Length" to account for an additional
> > "Error Data Entry" and then this second write appends a new "Error Data Entry"
> > after the previous one (if any existed).
> > 
> > Now for GHESv2, OSPM is supposed to copy existing "Error Status Block" and Ack
> > that fact via "Read Ack Register" and QEMU must not overwrite old data until
> > they are acked by OSPM.
> > 
> > With that in mind appending a new error seems pointless since guest
> > already consumed any pre-existing error before we are able to write.
> > So we can drop "Error Status Block" tracking and just
> >  1. compose whole "Error Status Block" with 1 new "Error Data Entry"
> >  2. check that it fits into start, start+ACPI_GHES_MAX_RAW_DATA_LENGTH range
> >  3. push it into guest RAM with 1 only write
> > 
> > and drop all data_length tracking related code.
> >   
> 
> Yes, you're right. After OSPM acks the "Error Status Block", it can be reused. Maybe one
> "Error Data Entry" is enough for each "Error Status Block" in current implementation.
> 
> In the ACPI spec, it says that "One or more Generic Error Data Entry structures may be
> recorded in the Generic Error Data Entries field of the Generic Error Status Block
> structure". I'm not sure whether a single "Error Data Entry" is sufficient.
Several data entries can be used if the system detects several errors at the same time,
so it can create a single status block containing all the errors it knows about.
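
The three steps suggested above (compose the whole block with one entry, check that it fits, push it with a single write) can be sketched roughly as follows. This is an illustration under assumptions, not the final implementation: all SKETCH_* names and both functions are hypothetical, the sizes are the ones quoted in the patch (20 + 72 + 80 bytes), and FRU Id / FRU Text are simply left zeroed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GESB_SIZE      20  /* Generic Error Status Block header */
#define SKETCH_GED_SIZE       72  /* Generic Error Data Entry header */
#define SKETCH_MEM_CPER_SIZE  80  /* Memory Error Section */

/* Emit `size` bytes of `value`, least significant byte first. */
static size_t sketch_put_le(uint8_t *buf, size_t pos, uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[pos++] = value & 0xff;
        value >>= 8;
    }
    return pos;
}

/*
 * Compose status block + one data entry + memory CPER in `out`.
 * Returns the total length, or 0 if it does not fit into `max` bytes.
 * The caller can then copy the result to guest RAM with one write.
 */
static size_t sketch_build_block(uint8_t *out, size_t max, uint64_t phys_addr,
                                 const uint8_t section_type_le[16])
{
    const uint32_t data_length = SKETCH_GED_SIZE + SKETCH_MEM_CPER_SIZE;
    const size_t total = SKETCH_GESB_SIZE + data_length;
    size_t pos = 0;

    if (total > max) {
        return 0;
    }
    memset(out, 0, total);

    /* Generic Error Status Block */
    pos = sketch_put_le(out, pos, 1, 4);            /* Block Status: uncorrectable */
    pos = sketch_put_le(out, pos, 0, 4);            /* Raw Data Offset */
    pos = sketch_put_le(out, pos, 0, 4);            /* Raw Data Length */
    pos = sketch_put_le(out, pos, data_length, 4);  /* Data Length */
    pos = sketch_put_le(out, pos, 0, 4);            /* Error Severity: recoverable */

    /* Generic Error Data Entry */
    memcpy(out + pos, section_type_le, 16);         /* Section Type */
    pos += 16;
    pos = sketch_put_le(out, pos, 0, 4);            /* Error Severity: recoverable */
    pos = sketch_put_le(out, pos, 0x300, 2);        /* Revision */
    pos = sketch_put_le(out, pos, 0, 1);            /* Validation Bits */
    pos = sketch_put_le(out, pos, 0, 1);            /* Flags */
    pos = sketch_put_le(out, pos, SKETCH_MEM_CPER_SIZE, 4); /* Error Data Length */
    pos += 16;                                      /* FRU Id: left zeroed */
    pos += 20;                                      /* FRU Text: left zeroed */
    pos = sketch_put_le(out, pos, 0, 8);            /* Timestamp */

    /* Memory Error Section */
    pos = sketch_put_le(out, pos, (1ULL << 14) | (1ULL << 1), 8); /* Validation Bits */
    pos = sketch_put_le(out, pos, 0, 8);            /* Error Status */
    pos = sketch_put_le(out, pos, phys_addr, 8);    /* Physical Address */
    pos += 48;                                      /* details not filled in */
    pos = sketch_put_le(out, pos, 0, 1);            /* Memory Error Type: unknown */
    pos += 7;                                       /* details not filled in */

    return pos;
}
```

Since the block is always rebuilt from scratch, there is no per-source data_length to track or migrate.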


> >> +    g_array_free(block, true);
> >> +
> >> +    return ACPI_GHES_CPER_OK;
> >> +}
> >> +
> >>  /*
> >>   * Hardware Error Notification
> >>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> >> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> >>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> >>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> >>  }
> >> +
> >> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> >> +{
> >> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> >> +    int loop = 0;
> >> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);  
> >                                          ^^^^^^^^^^^^^^^
> > Forgot to mention in patch [3/6],
> > 
> > Migration is definitively broken here, since ges.ghes_addr_le is
> > not migrated to target QEMU. For example how it should be done see:
> >   vmgenid_addr_le and vmstate_vmgenid
> > 
> > for that you'd need to make ghes_addr_le a part of some device
> > (recently added hw/acpi/generic_event_device.c looks like suitable victim)
> >   
> 
> Yes, this is a serious problem! Is there any better way to migrate ges.ghes_addr_le?
> It looks weird that making ghes_addr_le a part of GED. How about registering a single
> "ACPI GHES Device" which inherits from "TYPE_DEVICE"?
We already have an ACPI-specific device (GED), so I suggest reusing it.


> >> +    bool ret = ACPI_GHES_CPER_FAIL;
> >> +    uint8_t source_id;  
> >   
> >> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> >> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};  
> > put map at the beginning of this file 
> > 
> > s/const/static const/
> > s/error_source_id/ghes_notify2source_id_map/
> >  = { ...,
> >      ACPI_HEST_SRC_ID_SEA,
> >      ...,
> >      ACPI_HEST_SRC_ID_RESERVED
> >    }
> >   
> 
> OK.
> 
> >   
> >> +
> >> +    /*
> >> +     * | +---------------------+ ges.ghes_addr_le
> >> +     * | |error_block_address0 |
> >> +     * | +---------------------+ --+--
> >> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | |error_block_addressN |
> >> +     * | +---------------------+
> >> +     * | | read_ack_register0  |
> >> +     * | +---------------------+ --+--
> >> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | | read_ack_registerN  |  
> > 
> > above part is not necessary
> >   
> 
> OK
> 
> >> +     * | +---------------------+ --+--
> >> +     * | |      CPER           |   |
> >> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> >> +     * | |      CPER           |   |
> >> +     * | +---------------------+ --+--  
> > and this one is not precise as it holds not only CPER record
> > Generic Error Status Block + Generic Error Data (with CPER inside)
> >   
> 
> Even in the spec, it only shows the CPER in Error Data Entry. But I
> also think using "Error Data Entry" is more precise.
> 
> > and looking at code here and spec I'm not sure we can actually do
> > several Error Data Entries as implemented here, more on that later
> >   
> 
> 
> >> +     * | |    ..........       |
> >> +     * | +---------------------+
> >> +     * | |      CPER           |
> >> +     * | |      ....           |
> >> +     * | |      CPER           |
> >> +     * | +---------------------+
> >> +     */
> >> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> >> +        /* Find and check the source id for this new CPER */
> >> +        source_id = error_source_id[notify];
> >> +        if (source_id != 0xff) {
> >> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> >> +        } else {
> >> +            goto out;  
> > assert() ???
> >   
> 
> >   
> >> +        }
> >> +
> >> +        cpu_physical_memory_read(start_addr, &error_block_addr,
> >> +                                 ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +        read_ack_register_addr = start_addr +
> >> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
> >> +retry:
> >> +        cpu_physical_memory_read(read_ack_register_addr,
> >> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);  
> > it's safer to use
> >    sizeof(read_ack_register)
> > instead of ACPI_GHES_ADDRESS_SIZE to make sure that stack won't be corrupted
> > by accident later, the same applies to other reads.
> >   
> 
> OK, I will drop the ACPI_GHES_ADDRESS_SIZE and replace it with "sizeof()".
> 
> >> +
> >> +        /* zero means OSPM does not acknowledge the error */
> >> +        if (!read_ack_register) {
> >> +            if (loop < 3) {
> >> +                usleep(100 * 1000);
> >> +                loop++;
> >> +                goto retry;  
> > as minimum this loop can stall guest repeatedly for 0.3s if guest triggers BQL,
> > until it handles error.
> > 
> > (not sure what to suggest here though)
> >   
> 
> >> +            } else {
> >> +                error_report("OSPM does not acknowledge previous error,"
> >> +                    " so can not record CPER for current error, forcibly"
> >> +                    " acknowledge previous error to avoid blocking next time"
> >> +                    " CPER record! Exit");  
> > 
> > Also error overwrite goes against the spec, which says
> > "
> > Platforms with RAS
> > controllers must prevent concurrent accesses to the Error Status Block (i.e., the RAS controller
> > must not overwrite the Error Status Block before the OS has completed reading it).
> > ******************
> > "
> > we probably shouldn't override not acked block.
> > Question is what bare metal machines do in this case?
> >   
> 
> Hmm... Yes, you're right. For example, on bare metal machines there are 3 pre-allocated GHESv2(s)
> for SEA. If none of them is acked, it will do nothing for the incoming SEA.
> 
> On Qemu there is one pre-allocated GHESv2 for SEA, so I think we should do nothing if the block is
> not acked.
OK, if bare metal behaves like this, let's go with it for now.
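
A rough sketch of the agreed behaviour: if the Read Ack Register is still zero, the previous block is left untouched and recording fails. The function and helper names are hypothetical, and guest memory is modelled as plain byte buffers. The register is accessed through explicit little-endian helpers, which also covers the endianness remark below.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 64-bit little-endian value from a byte buffer. */
static uint64_t sketch_load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Write a 64-bit value to a byte buffer in little-endian order. */
static void sketch_store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = v & 0xff;
        v >>= 8;
    }
}

/*
 * `ack_reg` and `block` point into (simulated) guest memory.
 * Returns false, without touching guest memory, if OSPM has not
 * acknowledged the previous error yet.
 */
static bool sketch_record_if_acked(uint8_t *ack_reg, uint8_t *block,
                                   const uint8_t *new_block, size_t len)
{
    if (sketch_load_le64(ack_reg) == 0) {
        return false;               /* previous block not consumed yet */
    }
    sketch_store_le64(ack_reg, 0);  /* OSPM sets it back to 1 on ack */
    memcpy(block, new_block, len);  /* single write of the new block */
    return true;
}
```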


> >> +                read_ack_register = 1;
> >> +                cpu_physical_memory_write(read_ack_register_addr,
> >> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);  
> > Function writes data as is, so one has to ensure that endianness of
> > read_ack_register matches that of the spec/guest.
> > The same applies to the code below marked with "^^^".
> >   
> Yes, for the code here and below marked with "^^^", we need to add cpu_to_leXX() to
> match the spec.
> 
> >> +            }
> >> +        } else {
> >> +            if (error_block_addr) {  
> > 
> > } else if () {
> >   
> 
> OK.
> 
> >> +                read_ack_register = 0;
> >> +                /*
> >> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
> >> +                 * acknowledge this error.
> >> +                 */
> >> +                cpu_physical_memory_write(read_ack_register_addr,
> >> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);  
> >                          ^^^ - for 0 it doesn't really matter but conversion should be done
> >                                 even if it's just for the sake of documenting interface
> >   
> 
> >> +                ret = acpi_ghes_record_mem_error(error_block_addr,  
> >                                                     ^^^^
> >   
> >> +                          physical_address, acpi_ghes_data_length[source_id]);  
> >                              ^^^
> >   
> >> +                if (ret == ACPI_GHES_CPER_OK) {
> >> +                    acpi_ghes_data_length[source_id] +=
> >> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);  
> > eventually we will run out of space and nothing short of QEMU restart will
> > help to reclaim that.
> > 
> > Also if you keep track of available space in QEMU,
> > you'd also have to migrate it otherwise it's lost after migration.
> > But maybe we don't need to keep a track of free space,
> > see my another comment in acpi_ghes_record_mem_error()
> >  
> >> +                }
> >> +            }
> >> +        }
> >> +    }
> >> +
> >> +out:
> >> +    return ret;
> >> +}
> >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> >> index cb62ec9c7b..8e3c5b879e 100644
> >> --- a/include/hw/acpi/acpi_ghes.h
> >> +++ b/include/hw/acpi/acpi_ghes.h
> >> @@ -24,6 +24,9 @@
> >>  
> >>  #include "hw/acpi/bios-linker-loader.h"
> >>  
> >> +#define ACPI_GHES_CPER_OK                   1
> >> +#define ACPI_GHES_CPER_FAIL                 0
> >> +
> >>  /*
> >>   * Values for Hardware Error Notification Type field
> >>   */
> >> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> >>  
> >>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> >>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> >> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
> >>  #endif
> >> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> >> index 9d143282bc..321ead8115 100644
> >> --- a/include/sysemu/kvm.h
> >> +++ b/include/sysemu/kvm.h
> >> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
> >>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
> >>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
> >>  
> >> -#ifdef TARGET_I386
> >> -#define KVM_HAVE_MCE_INJECTION 1
> >> +#ifdef KVM_HAVE_MCE_INJECTION
> >>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
> >>  #endif
> >>  
> >> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> >> index d844ea21d8..c4fe6ccc63 100644
> >> --- a/target/arm/cpu.h
> >> +++ b/target/arm/cpu.h
> >> @@ -28,6 +28,10 @@
> >>  /* ARM processors have a weak memory model */
> >>  #define TCG_GUEST_DEFAULT_MO      (0)
> >>  
> >> +#ifdef TARGET_AARCH64
> >> +#define KVM_HAVE_MCE_INJECTION 1
> >> +#endif
> >> +
> >>  #define EXCP_UDEF            1   /* undefined instruction */
> >>  #define EXCP_SWI             2   /* software interrupt */
> >>  #define EXCP_PREFETCH_ABORT  3
> >> diff --git a/target/arm/helper.c b/target/arm/helper.c
> >> index 63815fc4cf..a9ce97efb1 100644
> >> --- a/target/arm/helper.c
> >> +++ b/target/arm/helper.c
> >> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
> >>               * Report exception with ESR indicating a fault due to a
> >>               * translation table walk for a cache maintenance instruction.
> >>               */
> >> -            syn = syn_data_abort_no_iss(current_el == target_el,
> >> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
> >>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
> >>              env->exception.vaddress = value;
> >>              env->exception.fsr = fsr;
> >> diff --git a/target/arm/internals.h b/target/arm/internals.h
> >> index f5313dd3d4..28b8451d6d 100644
> >> --- a/target/arm/internals.h
> >> +++ b/target/arm/internals.h
> >> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
> >>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
> >>  }
> >>  
> >> -static inline uint32_t syn_data_abort_no_iss(int same_el,
> >> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
> >>                                               int ea, int cm, int s1ptw,
> >>                                               int wnr, int fsc)
> >>  {
> >>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
> >>             | ARM_EL_IL
> >> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
> >> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
> >> +           | (wnr << 6) | fsc;
> >>  }
> >>  
> >>  static inline uint32_t syn_data_abort_with_iss(int same_el,
> >> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> >> index 28f6db57d5..c7b7653d3f 100644
> >> --- a/target/arm/kvm64.c
> >> +++ b/target/arm/kvm64.c
> >> @@ -28,6 +28,8 @@
> >>  #include "kvm_arm.h"
> >>  #include "hw/boards.h"
> >>  #include "internals.h"
> >> +#include "hw/acpi/acpi.h"
> >> +#include "hw/acpi/acpi_ghes.h"
> >>  
> >>  static bool have_guest_debug;
> >>  
> >> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
> >>      return KVM_PUT_RUNTIME_STATE;
> >>  }
> >>  
> >> +/* Callers must hold the iothread mutex lock */
> >> +static void kvm_inject_arm_sea(CPUState *c)
> >> +{
> >> +    ARMCPU *cpu = ARM_CPU(c);
> >> +    CPUARMState *env = &cpu->env;
> >> +    CPUClass *cc = CPU_GET_CLASS(c);
> >> +    uint32_t esr;
> >> +    bool same_el;
> >> +
> >> +    c->exception_index = EXCP_DATA_ABORT;
> >> +    env->exception.target_el = 1;
> >> +
> >> +    /*
> >> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> >> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> >> +     */
> >> +    same_el = arm_current_el(env) == env->exception.target_el;
> >> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> >> +
> >> +    env->exception.syndrome = esr;
> >> +
> >> +    cc->do_interrupt(c);
> >> +}
> >> +
> >>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> >>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
> >>  
> >> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
> >>      return ret;
> >>  }
> >>  
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> +    ram_addr_t ram_addr;
> >> +    hwaddr paddr;
> >> +
> >> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);  
> > you let BUS_MCEERR_AO in but then it's unused, so what's the purpose of allowing it?
> >   
> 
> Yes, the assertion "code == BUS_MCEERR_AO" is useless at least for now.
> 
> >> +
> >> +    if (acpi_enabled && addr &&
> >> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> +        ram_addr = qemu_ram_addr_from_host(addr);
> >> +        if (ram_addr != RAM_ADDR_INVALID &&
> >> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> >> +            kvm_hwpoison_page_add(ram_addr);
> >> +            /*
> >> +             * Asynchronous signal will be masked by main thread, so
> >> +             * only handle synchronous signal.
> >> +             */
> >> +            if (code == BUS_MCEERR_AR) {
> >> +                kvm_cpu_synchronize_state(c);
> >> +                if (ACPI_GHES_CPER_FAIL !=
> >> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> >> +                    kvm_inject_arm_sea(c);
> >> +                } else {
> >> +                    fprintf(stderr, "failed to record the error\n");  
> > 
> > fprintf() shouldn't be used in new code
> > and another question is whether it's fine to ignore the error?
> > maybe we should use error_fatal in such cases?
> >   
> 
> I'd use error_fatal in this case.
> 
> >> +                }
> >> +            }
> >> +            return;
> >> +        }
> >> +        fprintf(stderr, "Hardware memory error for memory used by "
> >> +                "QEMU itself instead of guest system!\n");  
> >   
> >> +    }
> >> +
> >> +    if (code == BUS_MCEERR_AR) {
> >> +        fprintf(stderr, "Hardware memory error!\n");
> >> +        exit(1);
> >> +    }
> >> +}
> >> +
> >>  /* C6.6.29 BRK instruction */
> >>  static const uint32_t brk_insn = 0xd4200000;
> >>  
> >> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> >> index 5feb312941..499672ebbc 100644
> >> --- a/target/arm/tlb_helper.c
> >> +++ b/target/arm/tlb_helper.c
> >> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
> >>       * ISV field.
> >>       */
> >>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> >> -        syn = syn_data_abort_no_iss(same_el,
> >> +        syn = syn_data_abort_no_iss(same_el, 0,
> >>                                      ea, 0, s1ptw, is_write, fsc);
> >>      } else {
> >>          /*
> >> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> >> index 5352c9ff55..f75a210f96 100644
> >> --- a/target/i386/cpu.h
> >> +++ b/target/i386/cpu.h
> >> @@ -29,6 +29,8 @@
> >>  /* The x86 has a strong memory model with some store-after-load re-ordering */
> >>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
> >>  
> >> +#define KVM_HAVE_MCE_INJECTION 1
> >> +
> >>  /* Maximum instruction code size */
> >>  #define TARGET_MAX_INSN_SIZE 16
> >>    
> > 
> > 
> > .
> >   
> 



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support
  2019-11-25  9:48             ` Igor Mammedov
@ 2019-11-27 11:16               ` gengdongjiu
  -1 siblings, 0 replies; 82+ messages in thread
From: gengdongjiu @ 2019-11-27 11:16 UTC (permalink / raw)
  To: Igor Mammedov, Michael S. Tsirkin
  Cc: peter.maydell, ehabkost, kvm, wanghaibin.wang, mtosatti,
	qemu-devel, linuxarm, shannon.zhaosl, Xiang Zheng, qemu-arm,
	james.morse, jonathan.cameron, pbonzini, xuwei5, lersek, rth

On 2019/11/25 17:48, Igor Mammedov wrote:
>>>    ......
>>>     bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
>>>         ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
>>>         sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
>>>         source_id * sizeof(uint64_t));
>>>   .......
>>> }
>>>
>>> My previous series patch support 2 error sources, but now only enable 'SEA' type Error Source  
>> I'd try to merge this, worry about extending things later.
>> This is at v21 and the simpler you can keep things,
>> the faster it'll go in.
> I don't think the series is ready for merging yet.
> It has a number of issues (not stylistic ones) that need to be fixed first.
> 
> As for extending, I think I've suggested to simplify series
> to account for single error source only in some places so it
> would be easier on author and reviewers and worry about extending
> it later.
Sure, thanks for the review. We are preparing another series that fixes the issues you mentioned.

> 


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-22 15:47     ` Beata Michalska
@ 2019-11-27 12:47       ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-27 12:47 UTC (permalink / raw)
  To: Beata Michalska
  Cc: pbonzini, mst, imammedo, shannon.zhaosl, Peter Maydell,
	Laszlo Ersek, james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

Hi Beata,

Thanks for your review!

On 2019/11/22 23:47, Beata Michalska wrote:
> Hi,
> 
> On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>>
>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>
>> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
>> translates the host VA delivered by host to guest PA, then fills this PA
>> to guest APEI GHES memory, then notifies guest according to the SIGBUS
>> type.
>>
>> When guest accesses the poisoned memory, it will generate a Synchronous
>> External Abort(SEA). Then host kernel gets an APEI notification and calls
>> memory_failure() to unmap the affected page in stage 2, finally
>> returns to guest.
>>
>> Guest continues to access the PG_hwpoison page, it will trap to KVM as
>> stage2 fault, then a BUS_MCEERR_AR synchronous signal is delivered to
>> Qemu, Qemu records this error address into guest APEI GHES memory and
>> notifies guest using Synchronous-External-Abort(SEA).
>>
>> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
>> in which we can setup the type of exception and the syndrome information.
>> When switching to guest, the target vcpu will jump to the synchronous
>> external abort vector table entry.
>>
>> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
>> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
>> not valid and hold an UNKNOWN value. These values will be set to KVM
>> register structures through KVM_SET_ONE_REG IOCTL.
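
For reference, the syndrome value described above can be worked out as in the sketch below. It assumes the ESR_ELx layout from the Arm architecture (EC in bits [31:26], IL in bit 25, FnV in bit 10, DFSC in bits [5:0]). The SKETCH_* names and sketch_syn_sea() are hypothetical and only mirror what syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10) in this patch is expected to produce.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_EC_SHIFT      26      /* ESR_ELx.EC, bits [31:26] */
#define SKETCH_EC_DATAABORT  0x24u   /* data abort from a lower EL */
#define SKETCH_IL            (1u << 25)  /* 32-bit instruction length */
#define SKETCH_FNV           (1u << 10)  /* FAR not valid */
#define SKETCH_DFSC_SEA      0x10u   /* synchronous external abort */

/*
 * Build the ESR value for an injected SEA. When the abort is taken
 * from the same EL, EC becomes 0x25 (0x24 | 1).
 */
static uint32_t sketch_syn_sea(bool same_el)
{
    uint32_t ec = SKETCH_EC_DATAABORT | (same_el ? 1u : 0u);

    return (ec << SKETCH_EC_SHIFT) | SKETCH_IL | SKETCH_FNV | SKETCH_DFSC_SEA;
}
```

Under these assumptions an abort taken from EL0 to EL1 gives 0x92000410, and one taken within EL1 gives 0x96000410.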
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>>  include/hw/acpi/acpi_ghes.h |   4 +
>>  include/sysemu/kvm.h        |   3 +-
>>  target/arm/cpu.h            |   4 +
>>  target/arm/helper.c         |   2 +-
>>  target/arm/internals.h      |   5 +-
>>  target/arm/kvm64.c          |  64 ++++++++
>>  target/arm/tlb_helper.c     |   2 +-
>>  target/i386/cpu.h           |   2 +
>>  9 files changed, 377 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
>> index 42c00ff3d3..f5b54990c0 100644
>> --- a/hw/acpi/acpi_ghes.c
>> +++ b/hw/acpi/acpi_ghes.c
>> @@ -39,6 +39,34 @@
>>  /* The max size in bytes for one error block */
>>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>>
>> +/*
>> + * The total size of Generic Error Data Entry
>> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-343 Generic Error Data Entry
>> + */
>> +#define ACPI_GHES_DATA_LENGTH               72
>> +
>> +/*
>> + * The memory section CPER size,
>> + * UEFI 2.6: N.2.5 Memory Error Section
>> + */
>> +#define ACPI_GHES_MEM_CPER_LENGTH           80
>> +
>> +/*
>> + * Masks for block_status flags
>> + */
>> +#define ACPI_GEBS_UNCORRECTABLE         1
> 
> Why not listing all supported statuses ? Similar to error severity below ?
> 

We now only use the first bit for uncorrectable error. The correctable errors
are handled in host and would not be delivered to QEMU.

I think it's unnecessary to list all the bit masks.

>> +
>> +/*
>> + * Values for error_severity field
>> + */
>> +enum AcpiGenericErrorSeverity {
>> +    ACPI_CPER_SEV_RECOVERABLE,
>> +    ACPI_CPER_SEV_FATAL,
>> +    ACPI_CPER_SEV_CORRECTED,
>> +    ACPI_CPER_SEV_NONE,
>> +};
>> +
>>  /*
>>   * Now only support ARMv8 SEA notification type error source
>>   */
>> @@ -49,6 +77,16 @@
>>   */
>>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>>
>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>> +
>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>> +    0xED, 0x7C, 0x83, 0xB1)
>> +
>>  /*
>>   * | +--------------------------+ 0
>>   * | |        Header            |
>> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>>      uint64_t ghes_addr_le;
>>  } AcpiGhesState;
>>
>> +/*
>> + * Total size for Generic Error Status Block
>> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-380 Generic Error Status Block
>> + */
>> +#define ACPI_GHES_GESB_SIZE                 20
> 
> Minor: This is not entirely correct: GEDE is part of GESB so the total length
> would be ACPI_GHES_GESB_SIZE + n* sizeof(GEDE)
> 

Yes, here it only indicates the length of the Generic Error Status Block structure
itself, excluding the "GEDEs".
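
To make the size arithmetic concrete, here is a rough sketch (not part of the
patch). The constants are copied from the quoted diff; the helper names
ghes_block_length() and ghes_max_entries() are hypothetical and exist only for
this illustration.

```c
#include <stdint.h>

/* Values copied from the quoted patch. */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH  0x1000
#define ACPI_GHES_GESB_SIZE            20   /* GESB header only, no GEDEs */
#define ACPI_GHES_DATA_LENGTH          72   /* one GEDE header */
#define ACPI_GHES_MEM_CPER_LENGTH      80   /* one memory error section */

/* Hypothetical helper: total block length holding n memory-error entries. */
static uint32_t ghes_block_length(uint32_t n)
{
    return ACPI_GHES_GESB_SIZE +
           n * (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
}

/* Hypothetical helper: number of entries that fit in one error block. */
static uint32_t ghes_max_entries(void)
{
    return (ACPI_GHES_MAX_RAW_DATA_LENGTH - ACPI_GHES_GESB_SIZE) /
           (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
}
```

With these values, each recorded memory error adds 152 bytes on top of the
20-byte header.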

>> +/* The offset of Data Length in Generic Error Status Block */
>> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
>> +
> 
> If those were nicely represented as structures you get the offsets easily
> without having number of defines. That could simplify the code and make it
> more readable - see comments below
> 

To address Igor's comment, this macro is useless and I will drop it.
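
For reference, the struct-based alternative suggested in the review could look
roughly like the sketch below. This is only an illustration: the struct name
GhesStatusBlockHeader is hypothetical, and the field names are taken from the
GESB diagram in the quoted patch. The patch itself builds the block with
build_append_int_noprefix() instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout of the Generic Error Status Block header.
 * All fields are 32-bit, so the compiler inserts no padding.
 */
struct GhesStatusBlockHeader {
    uint32_t block_status;
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;
    uint32_t error_severity;
};

/* The size and offset macros from the patch fall out of the layout. */
_Static_assert(sizeof(struct GhesStatusBlockHeader) == 20,
               "GESB header is 20 bytes");
_Static_assert(offsetof(struct GhesStatusBlockHeader, data_length) == 12,
               "Data Length lives at offset 12");
```

Note that with such a struct each field would still need an explicit
little-endian conversion before being written to guest memory.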

>> +/*
>> + * Record the value of data length for each error status block to avoid getting
>> + * this value from guest.
>> + */
>> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
>> +
>> +/*
>> + * Generic Error Data Entry
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
>> +                uint32_t error_severity, uint16_t revision,
>> +                uint8_t validation_bits, uint8_t flags,
>> +                uint32_t error_data_length, QemuUUID fru_id,
>> +                uint8_t *fru_text, uint64_t time_stamp)
> 
> Why not just defining a struct that represents the GED entry?
> 
>> +{
>> +    QemuUUID uuid_le;
>> +
>> +    /* Section Type */
>> +    uuid_le = qemu_uuid_bswap(section_type);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +    /* Revision */
>> +    build_append_int_noprefix(table, revision, 2);
> 
> Minor: According to the spec it seems that the revision number is
> a fixed value so you could drop that from the parameters....
> or ... use a struct to represent the data
> 
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table, validation_bits, 1);
>> +    /* Flags */
>> +    build_append_int_noprefix(table, flags, 1);
>> +    /* Error Data Length */
>> +    build_append_int_noprefix(table, error_data_length, 4);
>> +
>> +    /* FRU Id */
>> +    uuid_le = qemu_uuid_bswap(fru_id);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* FRU Text */
>> +    g_array_append_vals(table, fru_text, 20);
>> +    /* Timestamp */
>> +    build_append_int_noprefix(table, time_stamp, 8);
>> +}
>> +
>> +/*
>> + * Generic Error Status Block
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
>> +                uint32_t raw_data_offset, uint32_t raw_data_length,
>> +                uint32_t data_length, uint32_t error_severity)
> 
> Same as the above
> 

>> +{
>> +    /* Block Status */
>> +    build_append_int_noprefix(table, block_status, 4);
>> +    /* Raw Data Offset */
>> +    build_append_int_noprefix(table, raw_data_offset, 4);
>> +    /* Raw Data Length */
>> +    build_append_int_noprefix(table, raw_data_length, 4);
>> +    /* Data Length */
>> +    build_append_int_noprefix(table, data_length, 4);
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +}
>> +
>> +/* UEFI 2.6: N.2.5 Memory Error Section */
>> +static void acpi_ghes_build_append_mem_cper(GArray *table,
>> +                                            uint64_t error_physical_addr)
>> +{
>> +    /*
>> +     * Memory Error Record
>> +     */
>> +
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table,
>> +                              (1UL << 14) | /* Type Valid */
>> +                              (1UL << 1) /* Physical Address Valid */,
>> +                              8);
>> +    /* Error Status */
>> +    build_append_int_noprefix(table, 0, 8);
> 
> Just wondering whether it would be worth to specify the Error Type
> through the Error Status ?
> 

Error Status relies on information from implementation-specific error registers,
which means more information would have to be provided to QEMU.

In the current implementation, KVM only delivers a BUS_MCEERR_AR signal and the poisoned
HVA to userspace (QEMU). If we want to extract more information in QEMU, KVM has to
provide it. However, KVM is not ready for that now.

>> +    /* Physical Address */
>> +    build_append_int_noprefix(table, error_physical_addr, 8);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 48);
>> +    /* Memory Error Type */
>> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 7);
>> +}
>> +
>> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
>> +                                      uint64_t error_physical_addr,
>> +                                      uint32_t data_length)
>> +{
>> +    GArray *block;
>> +    uint64_t current_block_length;
>> +    /* Memory Error Section Type */
>> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
> 
> As already mentioned - mixing LE /w BE
> 

>> +    QemuUUID fru_id = {};
>> +    uint8_t fru_text[20] = {};
>> +
>> +    /*
>> +     * Generic Error Status Block
>> +     * | +---------------------+
>> +     * | |     block_status    |
>> +     * | +---------------------+
>> +     * | |    raw_data_offset  |
>> +     * | +---------------------+
>> +     * | |    raw_data_length  |
>> +     * | +---------------------+
>> +     * | |     data_length     |
>> +     * | +---------------------+
>> +     * | |   error_severity    |
>> +     * | +---------------------+
>> +     */
>> +    block = g_array_new(false, true /* clear */, 1);
>> +
>> +    /* The current whole length of the generic error status block */
>> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
>> +
>> +    /* This is the length if adding a new generic error data entry*/
>> +    data_length += ACPI_GHES_DATA_LENGTH;
>> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
>> +
>> +    /*
>> +     * Check whether it will run out of the preallocated memory if adding a new
>> +     * generic error data entry
>> +     */
>> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
>> +        error_report("Record CPER out of boundary!!!");
> 
> Minor: The error message could be made more accurate, like:
>     "Not enough memory to record new CPER"
> 

OK.

>> +        return ACPI_GHES_CPER_FAIL;
>> +    }
>> +
>> +    /* Build the new generic error status block header */
>> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
>> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
>> +
>> +    /* Write back above generic error status block header to guest memory */
>> +    cpu_physical_memory_write(error_block_address, block->data,
>> +                              block->len);
>> +
>> +    /* Add a new generic error data entry */
>> +
>> +    data_length = block->len;
>> +    /* Build this new generic error data entry header */
>> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
>> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
>> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
>> +
>> +    /* Build the memory section CPER for above new generic error data entry */
>> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
>> +
>> +    /* Write back above this new generic error data entry to guest memory */
>> +    cpu_physical_memory_write(error_block_address + current_block_length,
>> +        block->data + data_length, block->len - data_length);
>> +
> 
> As already mentioned and unless I have missed smth (which is highly possible)
> this will append new records while the GESB is kept 'in-place'. So the
> used space is
> only growing.
> 

Yes, we need to address the unbounded growth of these records.

>> +    g_array_free(block, true);
>> +
>> +    return ACPI_GHES_CPER_OK;
>> +}
>> +
>>  /*
>>   * Hardware Error Notification
>>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
>> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>>  }
>> +
>> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
>> +{
>> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
>> +    int loop = 0;
>> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
>> +    bool ret = ACPI_GHES_CPER_FAIL;
>> +    uint8_t source_id;
>> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
>> +
> 
> I'm not entirely sure why this is needed - see below
> 

>> +    /*
>> +     * | +---------------------+ ges.ghes_addr_le
>> +     * | |error_block_address0 |
>> +     * | +---------------------+ --+--
>> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | |error_block_addressN |
>> +     * | +---------------------+
>> +     * | | read_ack_register0  |
>> +     * | +---------------------+ --+--
>> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | | read_ack_registerN  |
>> +     * | +---------------------+ --+--
>> +     * | |      CPER           |   |
>> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
>> +     * | |      CPER           |   |
>> +     * | +---------------------+ --+--
>> +     * | |    ..........       |
>> +     * | +---------------------+
>> +     * | |      CPER           |
>> +     * | |      ....           |
>> +     * | |      CPER           |
>> +     * | +---------------------+
>> +     */
>> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
>> +        /* Find and check the source id for this new CPER */
>> +        source_id = error_source_id[notify];
> 
> Why not using switch case for supported source types ?
> For the time being only one is being supported. And you only use that to
> verify that support - seems a bit unnecessary.
> 

Good idea, I think using switch case is much better.
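
A minimal sketch of what the switch-based lookup might look like. The helper
name ghes_source_id_from_notify() is hypothetical, and the value 8 for
ACPI_GHES_NOTIFY_SEA is an assumption taken from the position of the only valid
entry in the error_source_id[] table of the quoted patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: index of the only valid entry in error_source_id[]. */
#define ACPI_GHES_NOTIFY_SEA  8

/*
 * Hypothetical helper: map a notification type to its error source id.
 * Returns false for notification types that are not supported.
 */
static bool ghes_source_id_from_notify(uint32_t notify, uint8_t *source_id)
{
    switch (notify) {
    case ACPI_GHES_NOTIFY_SEA:
        *source_id = 0;
        return true;
    default:
        return false;
    }
}
```

This would replace both the 0xff-filled lookup table and the
"notify < ACPI_GHES_NOTIFY_RESERVED" range check.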

>> +        if (source_id != 0xff) {
>> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
>> +        } else {
>> +            goto out;
>> +        }
>> +
>> +        cpu_physical_memory_read(start_addr, &error_block_addr,
>> +                                 ACPI_GHES_ADDRESS_SIZE);
>> +
>> +        read_ack_register_addr = start_addr +
>> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
>> +retry:
>> +        cpu_physical_memory_read(read_ack_register_addr,
>> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
>> +
>> +        /* zero means OSPM does not acknowledge the error */
>> +        if (!read_ack_register) {
>> +            if (loop < 3) {
>> +                usleep(100 * 1000);
>> +                loop++;
>> +                goto retry;
>> +            } else {
>> +                error_report("OSPM does not acknowledge previous error,"
>> +                    " so can not record CPER for current error, forcibly"
>> +                    " acknowledge previous error to avoid blocking next time"
>> +                    " CPER record! Exit");
>> +                read_ack_register = 1;
>> +                cpu_physical_memory_write(read_ack_register_addr,
>> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> 
> Already mentioned ...
> This seems to be against the spec. It not only ignores the req
> for OSPM to acknowledge receiving notifications for previous errors ,
> but it also loses one of them. Why not caching it somewhere until
> OSPM acknowledges the old ones ?
> 

Yes, Igor had mentioned this point in the previous comments.

>> +            }
>> +        } else {
>> +            if (error_block_addr) {
> 
> What is the use case for the address not being set ?
> 

Hmmm...I'd add a "error_fatal" in this case.

>> +                read_ack_register = 0;
>> +                /*
>> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
>> +                 * acknowledge this error.
>> +                 */
>> +                cpu_physical_memory_write(read_ack_register_addr,
>> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> 
> If the ack register has been cleared - which is why we end up here ....
> why writing it back if there is no notification for the system to process ?
> 

Ending up here means the ack register has been acked by OSPM, so we need to clear it so
that OSPM can write it back to 1 next time.
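
The handshake can be sketched as follows. This is an illustration only: the
helper ghes_claim_error_block() is hypothetical and operates on a plain
variable standing in for the Read Ack Register in guest memory, which the patch
accesses through cpu_physical_memory_read()/cpu_physical_memory_write().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper modelling the Read Ack Register handshake.
 * Returns true if a new error may be recorded, false if OSPM has not
 * yet acknowledged the previous one.
 */
static bool ghes_claim_error_block(uint64_t *read_ack_register)
{
    if (*read_ack_register == 0) {
        /* Zero means OSPM has not acknowledged the previous error. */
        return false;
    }

    /* Acknowledged: clear it so OSPM can set it back to 1 next time. */
    *read_ack_register = 0;
    return true;
}
```

In this model a second error arriving before OSPM writes 1 back is refused
rather than recorded.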

>> +                ret = acpi_ghes_record_mem_error(error_block_addr,
>> +                          physical_address, acpi_ghes_data_length[source_id]);
>> +                if (ret == ACPI_GHES_CPER_OK) {
>> +                    acpi_ghes_data_length[source_id] +=
>> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
> 
> As mentioned .. this will run out of space - some roll-back
> mechanism is needed to overwrite stale entries
> 

Yes, Igor had mentioned this point too.

>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +out:
>> +    return ret;
>> +}
>> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
>> index cb62ec9c7b..8e3c5b879e 100644
>> --- a/include/hw/acpi/acpi_ghes.h
>> +++ b/include/hw/acpi/acpi_ghes.h
>> @@ -24,6 +24,9 @@
>>
>>  #include "hw/acpi/bios-linker-loader.h"
>>
>> +#define ACPI_GHES_CPER_OK                   1
>> +#define ACPI_GHES_CPER_FAIL                 0
>> +
> 
> Is there really a need to introduce those ?
> 

Don't you think it's more clear than using "1" or "0"? :)

>>  /*
>>   * Values for Hardware Error Notification Type field
>>   */
>> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
>>
>>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
>>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
>> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
>>  #endif
> 
> All the above should preferably land in a separate patch
> 

OK.

>> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
>> index 9d143282bc..321ead8115 100644
>> --- a/include/sysemu/kvm.h
>> +++ b/include/sysemu/kvm.h
>> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
>>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
>>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
>>
>> -#ifdef TARGET_I386
>> -#define KVM_HAVE_MCE_INJECTION 1
>> +#ifdef KVM_HAVE_MCE_INJECTION
>>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
>>  #endif
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index d844ea21d8..c4fe6ccc63 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -28,6 +28,10 @@
>>  /* ARM processors have a weak memory model */
>>  #define TCG_GUEST_DEFAULT_MO      (0)
>>
>> +#ifdef TARGET_AARCH64
>> +#define KVM_HAVE_MCE_INJECTION 1
>> +#endif
>> +
>>  #define EXCP_UDEF            1   /* undefined instruction */
>>  #define EXCP_SWI             2   /* software interrupt */
>>  #define EXCP_PREFETCH_ABORT  3
>> diff --git a/target/arm/helper.c b/target/arm/helper.c
>> index 63815fc4cf..a9ce97efb1 100644
>> --- a/target/arm/helper.c
>> +++ b/target/arm/helper.c
>> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
>>               * Report exception with ESR indicating a fault due to a
>>               * translation table walk for a cache maintenance instruction.
>>               */
>> -            syn = syn_data_abort_no_iss(current_el == target_el,
>> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
>>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
>>              env->exception.vaddress = value;
>>              env->exception.fsr = fsr;
>> diff --git a/target/arm/internals.h b/target/arm/internals.h
>> index f5313dd3d4..28b8451d6d 100644
>> --- a/target/arm/internals.h
>> +++ b/target/arm/internals.h
>> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
>>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
>>  }
>>
>> -static inline uint32_t syn_data_abort_no_iss(int same_el,
>> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
>>                                               int ea, int cm, int s1ptw,
>>                                               int wnr, int fsc)
>>  {
>>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
>>             | ARM_EL_IL
>> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
>> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
>> +           | (wnr << 6) | fsc;
>>  }
>>
>>  static inline uint32_t syn_data_abort_with_iss(int same_el,
>> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
>> index 28f6db57d5..c7b7653d3f 100644
>> --- a/target/arm/kvm64.c
>> +++ b/target/arm/kvm64.c
>> @@ -28,6 +28,8 @@
>>  #include "kvm_arm.h"
>>  #include "hw/boards.h"
>>  #include "internals.h"
>> +#include "hw/acpi/acpi.h"
>> +#include "hw/acpi/acpi_ghes.h"
>>
>>  static bool have_guest_debug;
>>
>> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
>>      return KVM_PUT_RUNTIME_STATE;
>>  }
>>
>> +/* Callers must hold the iothread mutex lock */
>> +static void kvm_inject_arm_sea(CPUState *c)
> 
> We could enclose this function along with the kvm_arch_on_sigbus_vcpu
> within ifdef switch for KVM_HAVE_MCE_INJECTION
> 

Peter suggested defining KVM_HAVE_MCE_INJECTION within ifdef TARGET_AARCH64.
So isn't KVM_HAVE_MCE_INJECTION always defined in target/arm/kvm64.c?

>> +{
>> +    ARMCPU *cpu = ARM_CPU(c);
>> +    CPUARMState *env = &cpu->env;
>> +    CPUClass *cc = CPU_GET_CLASS(c);
>> +    uint32_t esr;
>> +    bool same_el;
>> +
>> +    c->exception_index = EXCP_DATA_ABORT;
>> +    env->exception.target_el = 1;
>> +
>> +    /*
>> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
>> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
>> +     */
>> +    same_el = arm_current_el(env) == env->exception.target_el;
>> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> 
> IINM this is the only use case when FnV is considered to be valid
> so I'm not convinced it is worth to modify the syn_data_abort_no_iss
> just for this.
> 
>> +
>> +    env->exception.syndrome = esr;
>> +
>> +    cc->do_interrupt(c);
>> +}
>> +
>>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>>
>> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>>      return ret;
>>  }
>>
>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> +{
>> +    ram_addr_t ram_addr;
>> +    hwaddr paddr;
>> +
>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>> +
>> +    if (acpi_enabled && addr &&
>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>> +        ram_addr = qemu_ram_addr_from_host(addr);
>> +        if (ram_addr != RAM_ADDR_INVALID &&
>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>> +            kvm_hwpoison_page_add(ram_addr);
>> +            /*
>> +             * Asynchronous signal will be masked by main thread, so
>> +             * only handle synchronous signal.
>> +             */
> 
> I'm not entirely sure that the comment above is correct (it has been
> pointed out before). I would expect the AO signal to be handled here as
> well. Not having proper support to do that just yet is another story but
> the comment might be bit misleading.
> 

We also expect the AO signal can be handled here. Maybe we could add the comment like:

"Asynchronous signal is masked by main thread now. Once it can be asserted, we could
handle it." :)

> 
>> +            if (code == BUS_MCEERR_AR) {
>> +                kvm_cpu_synchronize_state(c);
>> +                if (ACPI_GHES_CPER_FAIL !=
>> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
>> +                    kvm_inject_arm_sea(c);
>> +                } else {
>> +                    fprintf(stderr, "failed to record the error\n");
>> +                }
>> +            }
>> +            return;
>> +        }
>> +        fprintf(stderr, "Hardware memory error for memory used by "
>> +                "QEMU itself instead of guest system!\n");
>> +    }
>> +
>> +    if (code == BUS_MCEERR_AR) {
>> +        fprintf(stderr, "Hardware memory error!\n");
>> +        exit(1);
>> +    }
>> +}
>> +
>>  /* C6.6.29 BRK instruction */
>>  static const uint32_t brk_insn = 0xd4200000;
>>
>> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
>> index 5feb312941..499672ebbc 100644
>> --- a/target/arm/tlb_helper.c
>> +++ b/target/arm/tlb_helper.c
>> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
>>       * ISV field.
>>       */
>>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
>> -        syn = syn_data_abort_no_iss(same_el,
>> +        syn = syn_data_abort_no_iss(same_el, 0,
>>                                      ea, 0, s1ptw, is_write, fsc);
>>      } else {
>>          /*
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index 5352c9ff55..f75a210f96 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -29,6 +29,8 @@
>>  /* The x86 has a strong memory model with some store-after-load re-ordering */
>>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
>>
>> +#define KVM_HAVE_MCE_INJECTION 1
>> +
>>  /* Maximum instruction code size */
>>  #define TARGET_MAX_INSN_SIZE 16
>>
>> --
>> 2.19.1
>>
>>
>>
> 
> .
> 

-- 

Thanks,
Xiang


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
@ 2019-11-27 12:47       ` Xiang Zheng
  0 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-11-27 12:47 UTC (permalink / raw)
  To: Beata Michalska
  Cc: Peter Maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl, qemu-arm,
	james.morse, jonathan.cameron, imammedo, pbonzini, xuwei5,
	Laszlo Ersek, rth

Hi Beata,

Thanks for your review!

On 2019/11/22 23:47, Beata Michalska wrote:
> Hi,
> 
> On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>>
>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>
>> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
>> translates the host VA delivered by host to guest PA, then fills this PA
>> to guest APEI GHES memory, then notifies guest according to the SIGBUS
>> type.
>>
>> When guest accesses the poisoned memory, it will generate a Synchronous
>> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmap the affected page in stage 2, finally
>> returns to guest.
>>
> >> If the guest then accesses the PG_hwpoison page again, the access traps to
> >> KVM as a stage 2 fault and a synchronous SIGBUS (BUS_MCEERR_AR) is delivered
> >> to QEMU. QEMU records the error address into guest APEI GHES memory and
> >> notifies the guest using a Synchronous External Abort (SEA).
>>
>> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
>> in which we can setup the type of exception and the syndrome information.
>> When switching to guest, the target vcpu will jump to the synchronous
>> external abort vector table entry.
>>
>> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
>> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
>> not valid and hold an UNKNOWN value. These values will be set to KVM
>> register structures through KVM_SET_ONE_REG IOCTL.
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>>  include/hw/acpi/acpi_ghes.h |   4 +
>>  include/sysemu/kvm.h        |   3 +-
>>  target/arm/cpu.h            |   4 +
>>  target/arm/helper.c         |   2 +-
>>  target/arm/internals.h      |   5 +-
>>  target/arm/kvm64.c          |  64 ++++++++
>>  target/arm/tlb_helper.c     |   2 +-
>>  target/i386/cpu.h           |   2 +
>>  9 files changed, 377 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
>> index 42c00ff3d3..f5b54990c0 100644
>> --- a/hw/acpi/acpi_ghes.c
>> +++ b/hw/acpi/acpi_ghes.c
>> @@ -39,6 +39,34 @@
>>  /* The max size in bytes for one error block */
>>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>>
>> +/*
>> + * The total size of Generic Error Data Entry
>> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-343 Generic Error Data Entry
>> + */
>> +#define ACPI_GHES_DATA_LENGTH               72
>> +
>> +/*
>> + * The memory section CPER size,
>> + * UEFI 2.6: N.2.5 Memory Error Section
>> + */
>> +#define ACPI_GHES_MEM_CPER_LENGTH           80
>> +
>> +/*
>> + * Masks for block_status flags
>> + */
>> +#define ACPI_GEBS_UNCORRECTABLE         1
> 
> Why not listing all supported statuses ? Similar to error severity below ?
> 

We now only use the first bit for uncorrectable error. The correctable errors
are handled in host and would not be delivered to QEMU.

I think it's unnecessary to list all the bit masks.

>> +
>> +/*
>> + * Values for error_severity field
>> + */
>> +enum AcpiGenericErrorSeverity {
>> +    ACPI_CPER_SEV_RECOVERABLE,
>> +    ACPI_CPER_SEV_FATAL,
>> +    ACPI_CPER_SEV_CORRECTED,
>> +    ACPI_CPER_SEV_NONE,
>> +};
>> +
>>  /*
>>   * Now only support ARMv8 SEA notification type error source
>>   */
>> @@ -49,6 +77,16 @@
>>   */
>>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>>
>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>> +
>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>> +    0xED, 0x7C, 0x83, 0xB1)
>> +
>>  /*
>>   * | +--------------------------+ 0
>>   * | |        Header            |
>> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>>      uint64_t ghes_addr_le;
>>  } AcpiGhesState;
>>
>> +/*
>> + * Total size for Generic Error Status Block
>> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-380 Generic Error Status Block
>> + */
>> +#define ACPI_GHES_GESB_SIZE                 20
> 
> Minor: This is not entirely correct: GEDE is part of GESB so the total length
> would be ACPI_GHES_GESB_SIZE + n* sizeof(GEDE)
> 

Yes, here it only indicates the length of the Generic Error Status Block structure
itself, excluding the "GEDEs".

>> +/* The offset of Data Length in Generic Error Status Block */
>> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
>> +
> 
> If those were nicely represented as structures you get the offsets easily
> without having number of defines. That could simplify the code and make it
> more readable - see comments below
> 

To address Igor's comment, this macro is useless and I will drop it.

>> +/*
>> + * Record the value of data length for each error status block to avoid getting
>> + * this value from guest.
>> + */
>> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
>> +
>> +/*
>> + * Generic Error Data Entry
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
>> +                uint32_t error_severity, uint16_t revision,
>> +                uint8_t validation_bits, uint8_t flags,
>> +                uint32_t error_data_length, QemuUUID fru_id,
>> +                uint8_t *fru_text, uint64_t time_stamp)
> 
> Why not just defining a struct that represents the GED entry?
> 
>> +{
>> +    QemuUUID uuid_le;
>> +
>> +    /* Section Type */
>> +    uuid_le = qemu_uuid_bswap(section_type);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +    /* Revision */
>> +    build_append_int_noprefix(table, revision, 2);
> 
> Minor: According to the spec it seems that the revision number is
> a fixed value so you could drop that from the parameters....
> or ... use a struct to represent the data
> 
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table, validation_bits, 1);
>> +    /* Flags */
>> +    build_append_int_noprefix(table, flags, 1);
>> +    /* Error Data Length */
>> +    build_append_int_noprefix(table, error_data_length, 4);
>> +
>> +    /* FRU Id */
>> +    uuid_le = qemu_uuid_bswap(fru_id);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* FRU Text */
>> +    g_array_append_vals(table, fru_text, 20);
>> +    /* Timestamp */
>> +    build_append_int_noprefix(table, time_stamp, 8);
>> +}
>> +
>> +/*
>> + * Generic Error Status Block
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
>> +                uint32_t raw_data_offset, uint32_t raw_data_length,
>> +                uint32_t data_length, uint32_t error_severity)
> 
> Same as the above
> 

>> +{
>> +    /* Block Status */
>> +    build_append_int_noprefix(table, block_status, 4);
>> +    /* Raw Data Offset */
>> +    build_append_int_noprefix(table, raw_data_offset, 4);
>> +    /* Raw Data Length */
>> +    build_append_int_noprefix(table, raw_data_length, 4);
>> +    /* Data Length */
>> +    build_append_int_noprefix(table, data_length, 4);
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +}
>> +
>> +/* UEFI 2.6: N.2.5 Memory Error Section */
>> +static void acpi_ghes_build_append_mem_cper(GArray *table,
>> +                                            uint64_t error_physical_addr)
>> +{
>> +    /*
>> +     * Memory Error Record
>> +     */
>> +
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table,
>> +                              (1UL << 14) | /* Type Valid */
>> +                              (1UL << 1) /* Physical Address Valid */,
>> +                              8);
>> +    /* Error Status */
>> +    build_append_int_noprefix(table, 0, 8);
> 
> Just wondering whether it would be worth to specify the Error Type
> through the Error Status ?
> 

Error Status relies on information from implementation-specific error registers,
which means QEMU would need to be given more information to fill it in.

In the current implementation, KVM only delivers a BUS_MCEERR_AR signal and the
poisoned HVA to userspace (QEMU). If we want to extract more information in QEMU,
KVM has to provide it, and KVM is not ready for that yet.

>> +    /* Physical Address */
>> +    build_append_int_noprefix(table, error_physical_addr, 8);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 48);
>> +    /* Memory Error Type */
>> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 7);
>> +}
>> +
>> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
>> +                                      uint64_t error_physical_addr,
>> +                                      uint32_t data_length)
>> +{
>> +    GArray *block;
>> +    uint64_t current_block_length;
>> +    /* Memory Error Section Type */
>> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
> 
> As already mentioned - mixing LE /w BE
> 

>> +    QemuUUID fru_id = {};
>> +    uint8_t fru_text[20] = {};
>> +
>> +    /*
>> +     * Generic Error Status Block
>> +     * | +---------------------+
>> +     * | |     block_status    |
>> +     * | +---------------------+
>> +     * | |    raw_data_offset  |
>> +     * | +---------------------+
>> +     * | |    raw_data_length  |
>> +     * | +---------------------+
>> +     * | |     data_length     |
>> +     * | +---------------------+
>> +     * | |   error_severity    |
>> +     * | +---------------------+
>> +     */
>> +    block = g_array_new(false, true /* clear */, 1);
>> +
>> +    /* The current whole length of the generic error status block */
>> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
>> +
>> +    /* This is the length if adding a new generic error data entry*/
>> +    data_length += ACPI_GHES_DATA_LENGTH;
>> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
>> +
>> +    /*
>> +     * Check whether it will run out of the preallocated memory if adding a new
>> +     * generic error data entry
>> +     */
>> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
>> +        error_report("Record CPER out of boundary!!!");
> 
> Minor: The error message could be made more accurate, like:
>     "Not enough memory to record new CPER"
> 

OK.
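
As a side note, a small sketch of the capacity check in isolation, to show how
many records fit into one error block. The macro and function names here are
shortened, hypothetical stand-ins; the values are the ones defined in the patch
(20-byte GESB header, 72-byte GEDE, 80-byte memory CPER, 0x1000-byte block).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GESB_SIZE       20      /* ACPI_GHES_GESB_SIZE */
#define GEDE_SIZE       72      /* ACPI_GHES_DATA_LENGTH */
#define MEM_CPER_SIZE   80      /* ACPI_GHES_MEM_CPER_LENGTH */
#define MAX_BLOCK_SIZE  0x1000  /* ACPI_GHES_MAX_RAW_DATA_LENGTH */

/*
 * data_length is the number of bytes of entries already recorded.
 * Returns true if one more GEDE plus memory CPER still fits in the block.
 */
static bool ghes_entry_fits(uint32_t data_length)
{
    uint64_t needed = (uint64_t)GESB_SIZE + data_length +
                      GEDE_SIZE + MEM_CPER_SIZE;

    return needed <= MAX_BLOCK_SIZE;
}

/* Upper bound on entries per block when nothing is ever reclaimed. */
static uint32_t ghes_max_entries(void)
{
    return (MAX_BLOCK_SIZE - GESB_SIZE) / (GEDE_SIZE + MEM_CPER_SIZE);
}
```

Under these sizes the block is full after 26 records, which is why the growing
data length discussed below matters.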

>> +        return ACPI_GHES_CPER_FAIL;
>> +    }
>> +
>> +    /* Build the new generic error status block header */
>> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
>> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
>> +
>> +    /* Write back above generic error status block header to guest memory */
>> +    cpu_physical_memory_write(error_block_address, block->data,
>> +                              block->len);
>> +
>> +    /* Add a new generic error data entry */
>> +
>> +    data_length = block->len;
>> +    /* Build this new generic error data entry header */
>> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
>> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
>> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
>> +
>> +    /* Build the memory section CPER for above new generic error data entry */
>> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
>> +
>> +    /* Write back above this new generic error data entry to guest memory */
>> +    cpu_physical_memory_write(error_block_address + current_block_length,
>> +        block->data + data_length, block->len - data_length);
>> +
> 
> As already mentioned and unless I have missed smth (which is highly possible)
> this will append new records while the GESB is kept 'in-place'. So the
> used space is
> only growing.
> 

Yes, we need to address this unbounded growth of records.

>> +    g_array_free(block, true);
>> +
>> +    return ACPI_GHES_CPER_OK;
>> +}
>> +
>>  /*
>>   * Hardware Error Notification
>>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
>> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>>  }
>> +
>> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
>> +{
>> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
>> +    int loop = 0;
>> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
>> +    bool ret = ACPI_GHES_CPER_FAIL;
>> +    uint8_t source_id;
>> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
>> +
> 
> I'm not entirely sure why this is needed - see below
> 

>> +    /*
>> +     * | +---------------------+ ges.ghes_addr_le
>> +     * | |error_block_address0 |
>> +     * | +---------------------+ --+--
>> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | |error_block_addressN |
>> +     * | +---------------------+
>> +     * | | read_ack_register0  |
>> +     * | +---------------------+ --+--
>> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | | read_ack_registerN  |
>> +     * | +---------------------+ --+--
>> +     * | |      CPER           |   |
>> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
>> +     * | |      CPER           |   |
>> +     * | +---------------------+ --+--
>> +     * | |    ..........       |
>> +     * | +---------------------+
>> +     * | |      CPER           |
>> +     * | |      ....           |
>> +     * | |      CPER           |
>> +     * | +---------------------+
>> +     */
>> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
>> +        /* Find and check the source id for this new CPER */
>> +        source_id = error_source_id[notify];
> 
> Why not using switch case for supported source types ?
> For the time being only one is being supported. And you only use that to
> verify that support - seems a bit unnecessary.
> 

Good idea, I think using switch case is much better.
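
Roughly something like the sketch below. The names are hypothetical, and the
value 8 for the SEA notification type is an assumption taken from the lookup
table in the patch, where index 8 is the only slot that maps to a source id.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of ACPI_GHES_NOTIFY_SEA (index 8 in the patch's table). */
#define GHES_NOTIFY_SEA  8

/* Hypothetical source id for the only supported error source. */
#define GHES_SOURCE_SEA  0

/*
 * Map a notification type to an error source id.
 * Returns false for notification types that are not supported.
 */
static bool ghes_source_id_for_notify(uint32_t notify, uint8_t *source_id)
{
    switch (notify) {
    case GHES_NOTIFY_SEA:
        *source_id = GHES_SOURCE_SEA;
        return true;
    default:
        return false;
    }
}
```

This removes both the 0xff sentinel array and the separate range check against
ACPI_GHES_NOTIFY_RESERVED.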

>> +        if (source_id != 0xff) {
>> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
>> +        } else {
>> +            goto out;
>> +        }
>> +
>> +        cpu_physical_memory_read(start_addr, &error_block_addr,
>> +                                 ACPI_GHES_ADDRESS_SIZE);
>> +
>> +        read_ack_register_addr = start_addr +
>> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
>> +retry:
>> +        cpu_physical_memory_read(read_ack_register_addr,
>> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
>> +
>> +        /* zero means OSPM does not acknowledge the error */
>> +        if (!read_ack_register) {
>> +            if (loop < 3) {
>> +                usleep(100 * 1000);
>> +                loop++;
>> +                goto retry;
>> +            } else {
>> +                error_report("OSPM does not acknowledge previous error,"
>> +                    " so can not record CPER for current error, forcibly"
>> +                    " acknowledge previous error to avoid blocking next time"
>> +                    " CPER record! Exit");
>> +                read_ack_register = 1;
>> +                cpu_physical_memory_write(read_ack_register_addr,
>> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> 
> Already mentioned ...
> This seems to be against the spec. It not only ignores the req
> for OSPM to acknowledge receiving notifications for previous errors ,
> but it also loses one of them. Why not caching it somewhere until
> OSPM acknowledges the old ones ?
> 

Yes, Igor had mentioned this point in the previous comments.

>> +            }
>> +        } else {
>> +            if (error_block_addr) {
> 
> What is the use case for the address not being set ?
> 

Hmmm... I'd add an "error_fatal" in this case.

>> +                read_ack_register = 0;
>> +                /*
>> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
>> +                 * acknowledge this error.
>> +                 */
>> +                cpu_physical_memory_write(read_ack_register_addr,
>> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> 
> If the ack register has been cleared - which is why we end up here ....
> why writing it back if there is no notification for the system to process ?
> 

Ending up here means OSPM has acknowledged the previous error by writing the ack
register. We need to clear it so that OSPM can set it back to 1 when it
acknowledges the next error.
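
To make the handshake explicit, here is a toy model of it. Everything in this
sketch is hypothetical (plain variables instead of guest memory), and it
assumes the register starts at 1 so that the first record can be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one error source: ack register plus a counter. */
typedef struct GhesSourceModel {
    uint64_t read_ack;   /* 1: previous error acknowledged by OSPM */
    unsigned records;    /* number of records written so far */
} GhesSourceModel;

/*
 * Producer side (QEMU): only record when the previous error was acked,
 * and clear the register so the next ack can be observed.
 */
static bool ghes_model_try_record(GhesSourceModel *s)
{
    if (!s->read_ack) {
        return false;    /* OSPM has not consumed the previous record yet */
    }
    s->read_ack = 0;
    s->records++;
    return true;
}

/* Consumer side (OSPM): acknowledge the record by writing 1. */
static void ghes_model_ospm_ack(GhesSourceModel *s)
{
    s->read_ack = 1;
}
```

A second record attempted before the ack is refused in this model, which is the
case the retry loop in the patch is waiting on.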

>> +                ret = acpi_ghes_record_mem_error(error_block_addr,
>> +                          physical_address, acpi_ghes_data_length[source_id]);
>> +                if (ret == ACPI_GHES_CPER_OK) {
>> +                    acpi_ghes_data_length[source_id] +=
>> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
> 
> As mentioned .. this will run out of space - some roll-back
> mechanism is needed to overwrite stale entries
> 

Yes, Igor had mentioned this point too.

>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +out:
>> +    return ret;
>> +}
>> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
>> index cb62ec9c7b..8e3c5b879e 100644
>> --- a/include/hw/acpi/acpi_ghes.h
>> +++ b/include/hw/acpi/acpi_ghes.h
>> @@ -24,6 +24,9 @@
>>
>>  #include "hw/acpi/bios-linker-loader.h"
>>
>> +#define ACPI_GHES_CPER_OK                   1
>> +#define ACPI_GHES_CPER_FAIL                 0
>> +
> 
> Is there really a need to introduce those ?
> 

Don't you think it's more clear than using "1" or "0"? :)

>>  /*
>>   * Values for Hardware Error Notification Type field
>>   */
>> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
>>
>>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
>>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
>> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
>>  #endif
> 
> All the above should preferably land in a separate patch
> 

OK.

>> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
>> index 9d143282bc..321ead8115 100644
>> --- a/include/sysemu/kvm.h
>> +++ b/include/sysemu/kvm.h
>> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
>>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
>>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
>>
>> -#ifdef TARGET_I386
>> -#define KVM_HAVE_MCE_INJECTION 1
>> +#ifdef KVM_HAVE_MCE_INJECTION
>>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
>>  #endif
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index d844ea21d8..c4fe6ccc63 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -28,6 +28,10 @@
>>  /* ARM processors have a weak memory model */
>>  #define TCG_GUEST_DEFAULT_MO      (0)
>>
>> +#ifdef TARGET_AARCH64
>> +#define KVM_HAVE_MCE_INJECTION 1
>> +#endif
>> +
>>  #define EXCP_UDEF            1   /* undefined instruction */
>>  #define EXCP_SWI             2   /* software interrupt */
>>  #define EXCP_PREFETCH_ABORT  3
>> diff --git a/target/arm/helper.c b/target/arm/helper.c
>> index 63815fc4cf..a9ce97efb1 100644
>> --- a/target/arm/helper.c
>> +++ b/target/arm/helper.c
>> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
>>               * Report exception with ESR indicating a fault due to a
>>               * translation table walk for a cache maintenance instruction.
>>               */
>> -            syn = syn_data_abort_no_iss(current_el == target_el,
>> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
>>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
>>              env->exception.vaddress = value;
>>              env->exception.fsr = fsr;
>> diff --git a/target/arm/internals.h b/target/arm/internals.h
>> index f5313dd3d4..28b8451d6d 100644
>> --- a/target/arm/internals.h
>> +++ b/target/arm/internals.h
>> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
>>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
>>  }
>>
>> -static inline uint32_t syn_data_abort_no_iss(int same_el,
>> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
>>                                               int ea, int cm, int s1ptw,
>>                                               int wnr, int fsc)
>>  {
>>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
>>             | ARM_EL_IL
>> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
>> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
>> +           | (wnr << 6) | fsc;
>>  }
>>
>>  static inline uint32_t syn_data_abort_with_iss(int same_el,
>> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
>> index 28f6db57d5..c7b7653d3f 100644
>> --- a/target/arm/kvm64.c
>> +++ b/target/arm/kvm64.c
>> @@ -28,6 +28,8 @@
>>  #include "kvm_arm.h"
>>  #include "hw/boards.h"
>>  #include "internals.h"
>> +#include "hw/acpi/acpi.h"
>> +#include "hw/acpi/acpi_ghes.h"
>>
>>  static bool have_guest_debug;
>>
>> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
>>      return KVM_PUT_RUNTIME_STATE;
>>  }
>>
>> +/* Callers must hold the iothread mutex lock */
>> +static void kvm_inject_arm_sea(CPUState *c)
> 
> We could enclose this function along with the kvm_arch_on_sigbus_vcpu
> within ifdef switch for KVM_HAVE_MCE_INJECTION
> 

Peter suggested defining KVM_HAVE_MCE_INJECTION within #ifdef TARGET_AARCH64.
So isn't KVM_HAVE_MCE_INJECTION always defined in target/arm/kvm64.c?

>> +{
>> +    ARMCPU *cpu = ARM_CPU(c);
>> +    CPUARMState *env = &cpu->env;
>> +    CPUClass *cc = CPU_GET_CLASS(c);
>> +    uint32_t esr;
>> +    bool same_el;
>> +
>> +    c->exception_index = EXCP_DATA_ABORT;
>> +    env->exception.target_el = 1;
>> +
>> +    /*
>> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
>> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
>> +     */
>> +    same_el = arm_current_el(env) == env->exception.target_el;
>> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> 
> IINM this is the only use case when FnV is considered to be valid
> so I'm not convinced it is worth to modify the syn_data_abort_no_iss
> just for this.
> 
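
For reference, a standalone sketch of the syndrome value this call produces.
The constants are assumptions copied from my reading of target/arm/internals.h
(EC_DATAABORT = 0x24, EC shift 26, IL = bit 25); the function name is
hypothetical and mirrors the patched syn_data_abort_no_iss().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, see the note above. */
#define SKETCH_EC_DATAABORT  0x24u
#define SKETCH_EC_SHIFT      26
#define SKETCH_IL            (1u << 25)

/* Same bit layout as the patched helper, with FnV at bit 10. */
static uint32_t sketch_syn_data_abort(uint32_t same_el, uint32_t fnv,
                                      uint32_t ea, uint32_t cm,
                                      uint32_t s1ptw, uint32_t wnr,
                                      uint32_t fsc)
{
    return (SKETCH_EC_DATAABORT << SKETCH_EC_SHIFT)
           | (same_el << SKETCH_EC_SHIFT)
           | SKETCH_IL
           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
           | (wnr << 6) | fsc;
}
```

With fnv = 1 and fsc = 0x10 this gives 0x92000410 for an abort taken from a
lower EL and 0x96000410 for one taken from the same EL.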
>> +
>> +    env->exception.syndrome = esr;
>> +
>> +    cc->do_interrupt(c);
>> +}
>> +
>>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>>
>> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>>      return ret;
>>  }
>>
>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> +{
>> +    ram_addr_t ram_addr;
>> +    hwaddr paddr;
>> +
>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>> +
>> +    if (acpi_enabled && addr &&
>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>> +        ram_addr = qemu_ram_addr_from_host(addr);
>> +        if (ram_addr != RAM_ADDR_INVALID &&
>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>> +            kvm_hwpoison_page_add(ram_addr);
>> +            /*
>> +             * Asynchronous signal will be masked by main thread, so
>> +             * only handle synchronous signal.
>> +             */
> 
> I'm not entirely sure that the comment above is correct (it has been
> pointed out before). I would expect the AO signal to be handled here as
> well. Not having proper support to do that just yet is another story but
> the comment might be bit misleading.
> 

We also expect the AO signal to be handled here eventually. Maybe we could
change the comment to something like:

"The asynchronous signal is currently masked by the main thread. Once it can be
delivered here, we could handle it." :)

> 
>> +            if (code == BUS_MCEERR_AR) {
>> +                kvm_cpu_synchronize_state(c);
>> +                if (ACPI_GHES_CPER_FAIL !=
>> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
>> +                    kvm_inject_arm_sea(c);
>> +                } else {
>> +                    fprintf(stderr, "failed to record the error\n");
>> +                }
>> +            }
>> +            return;
>> +        }
>> +        fprintf(stderr, "Hardware memory error for memory used by "
>> +                "QEMU itself instead of guest system!\n");
>> +    }
>> +
>> +    if (code == BUS_MCEERR_AR) {
>> +        fprintf(stderr, "Hardware memory error!\n");
>> +        exit(1);
>> +    }
>> +}
>> +
>>  /* C6.6.29 BRK instruction */
>>  static const uint32_t brk_insn = 0xd4200000;
>>
>> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
>> index 5feb312941..499672ebbc 100644
>> --- a/target/arm/tlb_helper.c
>> +++ b/target/arm/tlb_helper.c
>> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
>>       * ISV field.
>>       */
>>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
>> -        syn = syn_data_abort_no_iss(same_el,
>> +        syn = syn_data_abort_no_iss(same_el, 0,
>>                                      ea, 0, s1ptw, is_write, fsc);
>>      } else {
>>          /*
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index 5352c9ff55..f75a210f96 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -29,6 +29,8 @@
>>  /* The x86 has a strong memory model with some store-after-load re-ordering */
>>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
>>
>> +#define KVM_HAVE_MCE_INJECTION 1
>> +
>>  /* Maximum instruction code size */
>>  #define TARGET_MAX_INSN_SIZE 16
>>
>> --
>> 2.19.1
>>
>>
>>
> 
> .
> 

-- 

Thanks,
Xiang



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-27 12:47       ` Xiang Zheng
@ 2019-11-27 13:02         ` Igor Mammedov
  -1 siblings, 0 replies; 82+ messages in thread
From: Igor Mammedov @ 2019-11-27 13:02 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: Beata Michalska, pbonzini, mst, shannon.zhaosl, Peter Maydell,
	Laszlo Ersek, james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

On Wed, 27 Nov 2019 20:47:15 +0800
Xiang Zheng <zhengxiang9@huawei.com> wrote:

> Hi Beata,
> 
> Thanks for you review!
> 
> On 2019/11/22 23:47, Beata Michalska wrote:
> > Hi,
> > 
> > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:  
> >>
> >> From: Dongjiu Geng <gengdongjiu@huawei.com>
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When guest accesses the poisoned memory, it will generate a Synchronous
> >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmapped the affected page in stage 2, finally
> >> returns to guest.
> >>
> >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> >> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> >> Qemu, Qemu records this error address into guest APEI GHES memory and
> >> notifes guest using Synchronous-External-Abort(SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> >> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> ---
[...]
> >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> >> index cb62ec9c7b..8e3c5b879e 100644
> >> --- a/include/hw/acpi/acpi_ghes.h
> >> +++ b/include/hw/acpi/acpi_ghes.h
> >> @@ -24,6 +24,9 @@
> >>
> >>  #include "hw/acpi/bios-linker-loader.h"
> >>
> >> +#define ACPI_GHES_CPER_OK                   1
> >> +#define ACPI_GHES_CPER_FAIL                 0
> >> +  
> > 
> > Is there really a need to introduce those ?
> >   
> 
> Don't you think it's more clear than using "1" or "0"? :)

or maybe just reuse default libc return convention: 0 - ok, -1 - fail
and drop custom macros
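
A minimal sketch of that convention, with a hypothetical stand-in for
acpi_ghes_record_errors(). Note that the return type would have to become int,
since -1 stored in a bool reads back as true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for acpi_ghes_record_errors():
 * returns 0 on success, -1 on failure.
 */
static int sketch_record_error(uint64_t physical_address)
{
    if (!physical_address) {
        return -1;       /* nothing to record */
    }
    /* ... the CPER would be written to guest memory here ... */
    return 0;
}

/* Caller side: inject the SEA only if the record was written. */
static bool sketch_should_inject(uint64_t physical_address)
{
    return sketch_record_error(physical_address) == 0;
}
```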

> 
> >>  /*
> >>   * Values for Hardware Error Notification Type field
> >>   */
[...]


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-27 12:47       ` Xiang Zheng
@ 2019-11-27 14:17         ` Beata Michalska
  -1 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-27 14:17 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: pbonzini, mst, Igor Mammedov, shannon.zhaosl, Peter Maydell,
	Laszlo Ersek, james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

Hi

On Wed, 27 Nov 2019 at 12:47, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> Hi Beata,
>
> Thanks for you review!
>
YAW

> On 2019/11/22 23:47, Beata Michalska wrote:
> > Hi,
> >
> > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
> >>
> >> From: Dongjiu Geng <gengdongjiu@huawei.com>
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When guest accesses the poisoned memory, it will generate a Synchronous
> >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmapped the affected page in stage 2, finally
> >> returns to guest.
> >>
> >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> >> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> >> Qemu, Qemu records this error address into guest APEI GHES memory and
> >> notifes guest using Synchronous-External-Abort(SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> >> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> ---
> >>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
> >>  include/hw/acpi/acpi_ghes.h |   4 +
> >>  include/sysemu/kvm.h        |   3 +-
> >>  target/arm/cpu.h            |   4 +
> >>  target/arm/helper.c         |   2 +-
> >>  target/arm/internals.h      |   5 +-
> >>  target/arm/kvm64.c          |  64 ++++++++
> >>  target/arm/tlb_helper.c     |   2 +-
> >>  target/i386/cpu.h           |   2 +
> >>  9 files changed, 377 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> >> index 42c00ff3d3..f5b54990c0 100644
> >> --- a/hw/acpi/acpi_ghes.c
> >> +++ b/hw/acpi/acpi_ghes.c
> >> @@ -39,6 +39,34 @@
> >>  /* The max size in bytes for one error block */
> >>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> >>
> >> +/*
> >> + * The total size of Generic Error Data Entry
> >> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-343 Generic Error Data Entry
> >> + */
> >> +#define ACPI_GHES_DATA_LENGTH               72
> >> +
> >> +/*
> >> + * The memory section CPER size,
> >> + * UEFI 2.6: N.2.5 Memory Error Section
> >> + */
> >> +#define ACPI_GHES_MEM_CPER_LENGTH           80
> >> +
> >> +/*
> >> + * Masks for block_status flags
> >> + */
> >> +#define ACPI_GEBS_UNCORRECTABLE         1
> >
> > Why not listing all supported statuses ? Similar to error severity below ?
> >
>
> We now only use the first bit for uncorrectable error. The correctable errors
> are handled in host and would not be delivered to QEMU.
>
> I think it's unnecessary to list all the bit masks.

I'm not sure we are using all the error severity types either, but fair enough.
>
> >> +
> >> +/*
> >> + * Values for error_severity field
> >> + */
> >> +enum AcpiGenericErrorSeverity {
> >> +    ACPI_CPER_SEV_RECOVERABLE,
> >> +    ACPI_CPER_SEV_FATAL,
> >> +    ACPI_CPER_SEV_CORRECTED,
> >> +    ACPI_CPER_SEV_NONE,
> >> +};
> >> +
> >>  /*
> >>   * Now only support ARMv8 SEA notification type error source
> >>   */
> >> @@ -49,6 +77,16 @@
> >>   */
> >>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >>
> >> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> >> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> >> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> >> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> >> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> >> +
> >> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> >> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> >> +    0xED, 0x7C, 0x83, 0xB1)
> >> +
> >>  /*

As suggested in different thread - could this be also made common with
NVMe code ?
> >>   * | +--------------------------+ 0
> >>   * | |        Header            |
> >> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >>      uint64_t ghes_addr_le;
> >>  } AcpiGhesState;
> >>
> >> +/*
> >> + * Total size for Generic Error Status Block
> >> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-380 Generic Error Status Block
> >> + */
> >> +#define ACPI_GHES_GESB_SIZE                 20
> >
> > Minor: This is not entirely correct: GEDE is part of GESB so the total length
> > would be ACPI_GHES_GESB_SIZE + n* sizeof(GEDE)
> >
>
> Yes, here it only indicates the total length of Generic Error Status Block structure
> expect "GEDEs".
>
Sure, just the comment might be misleading. That's minor though.

> >> +/* The offset of Data Length in Generic Error Status Block */
> >> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
> >> +
> >
> > If those were nicely represented as structures you get the offsets easily
> > without having number of defines. That could simplify the code and make it
> > more readable - see comments below
> >
>
> To address Igor's comment, this macro is useless and I will drop it.
>
> >> +/*
> >> + * Record the value of data length for each error status block to avoid getting
> >> + * this value from guest.
> >> + */
> >> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> >> +
> >> +/*
> >> + * Generic Error Data Entry
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> >> +                uint32_t error_severity, uint16_t revision,
> >> +                uint8_t validation_bits, uint8_t flags,
> >> +                uint32_t error_data_length, QemuUUID fru_id,
> >> +                uint8_t *fru_text, uint64_t time_stamp)
> >
> > Why not just defining a struct that represents the GED entry?
> >
> >> +{
> >> +    QemuUUID uuid_le;
> >> +
> >> +    /* Section Type */
> >> +    uuid_le = qemu_uuid_bswap(section_type);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +    /* Revision */
> >> +    build_append_int_noprefix(table, revision, 2);
> >
> > Minor: According to the spec it seems that the revision number is
> > a fixed value so you could drop that from the parameters....
> > or ... use a struct to represent the data
> >
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table, validation_bits, 1);
> >> +    /* Flags */
> >> +    build_append_int_noprefix(table, flags, 1);
> >> +    /* Error Data Length */
> >> +    build_append_int_noprefix(table, error_data_length, 4);
> >> +
> >> +    /* FRU Id */
> >> +    uuid_le = qemu_uuid_bswap(fru_id);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* FRU Text */
> >> +    g_array_append_vals(table, fru_text, 20);
> >> +    /* Timestamp */
> >> +    build_append_int_noprefix(table, time_stamp, 8);
> >> +}
> >> +
> >> +/*
> >> + * Generic Error Status Block
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> >> +                uint32_t raw_data_offset, uint32_t raw_data_length,
> >> +                uint32_t data_length, uint32_t error_severity)
> >
> > Same as the above
> >
>
I still believe having a struct could make the code a bit more maintainable
and readable ... :)
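
As an illustration of the struct idea, a minimal sketch follows. The type
names are hypothetical, the field order simply mirrors the append calls in
this patch, and the packed attribute assumes GCC or Clang. It is not a
proposal for the final layout, and the fields would still have to be stored
in little-endian order before being written to guest memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: Generic Error Status Block header (no entries). */
typedef struct __attribute__((packed)) GhesStatusBlock {
    uint32_t block_status;
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;
    uint32_t error_severity;
} GhesStatusBlock;

/* Hypothetical type: Generic Error Data Entry header (no section body). */
typedef struct __attribute__((packed)) GhesDataEntry {
    uint8_t  section_type[16];
    uint32_t error_severity;
    uint16_t revision;
    uint8_t  validation_bits;
    uint8_t  flags;
    uint32_t error_data_length;
    uint8_t  fru_id[16];
    uint8_t  fru_text[20];
    uint64_t time_stamp;
} GhesDataEntry;

/* The sizes and the offset match the defines used in the patch. */
_Static_assert(sizeof(GhesStatusBlock) == 20, "GESB header size");
_Static_assert(sizeof(GhesDataEntry) == 72, "GEDE header size");
_Static_assert(offsetof(GhesStatusBlock, data_length) == 12,
               "Data Length offset");
```

With such types, sizeof() and offsetof() would replace the hand-written
size and offset defines.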
> >> +{
> >> +    /* Block Status */
> >> +    build_append_int_noprefix(table, block_status, 4);
> >> +    /* Raw Data Offset */
> >> +    build_append_int_noprefix(table, raw_data_offset, 4);
> >> +    /* Raw Data Length */
> >> +    build_append_int_noprefix(table, raw_data_length, 4);
> >> +    /* Data Length */
> >> +    build_append_int_noprefix(table, data_length, 4);
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +}
> >> +
> >> +/* UEFI 2.6: N.2.5 Memory Error Section */
> >> +static void acpi_ghes_build_append_mem_cper(GArray *table,
> >> +                                            uint64_t error_physical_addr)
> >> +{
> >> +    /*
> >> +     * Memory Error Record
> >> +     */
> >> +
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table,
> >> +                              (1UL << 14) | /* Type Valid */
> >> +                              (1UL << 1) /* Physical Address Valid */,
> >> +                              8);
> >> +    /* Error Status */
> >> +    build_append_int_noprefix(table, 0, 8);
> >
> > Just wondering whether it would be worth to specify the Error Type
> > through the Error Status ?
> >
>
> Error Status relies on information from implementation-specific error registers,
> which means we would need to provide more information to QEMU to handle it.
>
> In the current implementation, KVM only delivers the BUS_MCEERR_AR type of signal and the poisoned
> HVA to userspace (QEMU). If we want to extract more information in QEMU, KVM must
> provide the corresponding information. However, KVM is not ready for that now.

Fair enough - thanks.
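
For reference, a minimal sketch of the 80-byte record this function
produces, written as a flat buffer so the offsets are visible. The helper
names are hypothetical; the offsets follow the append order in this patch
(Validation Bits at 0, Error Status at 8, Physical Address at 16, Memory
Error Type at 72).

```c
#include <stdint.h>
#include <string.h>

#define MEM_CPER_SIZE 80 /* ACPI_GHES_MEM_CPER_LENGTH in the patch */

/* Store a value as little-endian bytes, independent of host byte order. */
static void put_le64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Hypothetical helper mirroring acpi_ghes_build_append_mem_cper(). */
static void fill_mem_cper(uint8_t buf[MEM_CPER_SIZE], uint64_t phys_addr)
{
    /* Every field not set below, including Error Status, stays zero. */
    memset(buf, 0, MEM_CPER_SIZE);
    /* Validation Bits: bit 1 (Physical Address), bit 14 (Error Type). */
    put_le64(buf + 0, (1ULL << 14) | (1ULL << 1));
    /* Physical Address of the poisoned location. */
    put_le64(buf + 16, phys_addr);
    /* Memory Error Type: 0 means "Unknown". */
    buf[72] = 0;
}
```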
>
> >> +    /* Physical Address */
> >> +    build_append_int_noprefix(table, error_physical_addr, 8);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 48);
> >> +    /* Memory Error Type */
> >> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 7);
> >> +}
> >> +
> >> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> >> +                                      uint64_t error_physical_addr,
> >> +                                      uint32_t data_length)
> >> +{
> >> +    GArray *block;
> >> +    uint64_t current_block_length;
> >> +    /* Memory Error Section Type */
> >> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
> >
> > As already mentioned - mixing LE /w BE
> >
>
> >> +    QemuUUID fru_id = {};
> >> +    uint8_t fru_text[20] = {};
> >> +
> >> +    /*
> >> +     * Generic Error Status Block
> >> +     * | +---------------------+
> >> +     * | |     block_status    |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_offset  |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_length  |
> >> +     * | +---------------------+
> >> +     * | |     data_length     |
> >> +     * | +---------------------+
> >> +     * | |   error_severity    |
> >> +     * | +---------------------+
> >> +     */
> >> +    block = g_array_new(false, true /* clear */, 1);
> >> +
> >> +    /* The current whole length of the generic error status block */
> >> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> >> +
> >> +    /* This is the length if adding a new generic error data entry*/
> >> +    data_length += ACPI_GHES_DATA_LENGTH;
> >> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> >> +
> >> +    /*
> >> +     * Check whether it will run out of the preallocated memory if adding a new
> >> +     * generic error data entry
> >> +     */
> >> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> >> +        error_report("Record CPER out of boundary!!!");
> >
> > Minor: The error message could be made more accurate, like:
> >     "Not enough memory to record new CPER"
> >
>
> OK.
>
> >> +        return ACPI_GHES_CPER_FAIL;
> >> +    }
> >> +
> >> +    /* Build the new generic error status block header */
> >> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> >> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
> >> +
> >> +    /* Write back above generic error status block header to guest memory */
> >> +    cpu_physical_memory_write(error_block_address, block->data,
> >> +                              block->len);
> >> +
> >> +    /* Add a new generic error data entry */
> >> +
> >> +    data_length = block->len;
> >> +    /* Build this new generic error data entry header */
> >> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> >> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> >> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
> >> +
> >> +    /* Build the memory section CPER for above new generic error data entry */
> >> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> >> +
> >> +    /* Write back above this new generic error data entry to guest memory */
> >> +    cpu_physical_memory_write(error_block_address + current_block_length,
> >> +        block->data + data_length, block->len - data_length);
> >> +
> >
> > As already mentioned, and unless I have missed something (which is highly possible),
> > this will append new records while the GESB is kept 'in-place'. So the
> > used space is only growing.
> >
>
> Yes, we need to address this unlimited growing records.
>
> >> +    g_array_free(block, true);
> >> +
> >> +    return ACPI_GHES_CPER_OK;
> >> +}
> >> +
> >>  /*
> >>   * Hardware Error Notification
> >>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> >> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> >>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> >>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> >>  }
> >> +
> >> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> >> +{
> >> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> >> +    int loop = 0;
> >> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
> >> +    bool ret = ACPI_GHES_CPER_FAIL;
> >> +    uint8_t source_id;
> >> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> >> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
> >> +
> >
> > I'm not entirely sure why this is needed - see below
> >
>
> >> +    /*
> >> +     * | +---------------------+ ges.ghes_addr_le
> >> +     * | |error_block_address0 |
> >> +     * | +---------------------+ --+--
> >> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | |error_block_addressN |
> >> +     * | +---------------------+
> >> +     * | | read_ack_register0  |
> >> +     * | +---------------------+ --+--
> >> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | | read_ack_registerN  |
> >> +     * | +---------------------+ --+--
> >> +     * | |      CPER           |   |
> >> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> >> +     * | |      CPER           |   |
> >> +     * | +---------------------+ --+--
> >> +     * | |    ..........       |
> >> +     * | +---------------------+
> >> +     * | |      CPER           |
> >> +     * | |      ....           |
> >> +     * | |      CPER           |
> >> +     * | +---------------------+
> >> +     */
> >> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> >> +        /* Find and check the source id for this new CPER */
> >> +        source_id = error_source_id[notify];
> >
> > Why not using switch case for supported source types ?
> > For the time being only one is being supported. And you only use that to
> > verify that support - seems a bit unnecessary.
> >
>
> Good idea, I think using switch case is much better.
>
> >> +        if (source_id != 0xff) {
> >> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> >> +        } else {
> >> +            goto out;
> >> +        }
> >> +
> >> +        cpu_physical_memory_read(start_addr, &error_block_addr,
> >> +                                 ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +        read_ack_register_addr = start_addr +
> >> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
> >> +retry:
> >> +        cpu_physical_memory_read(read_ack_register_addr,
> >> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +        /* zero means OSPM does not acknowledge the error */
> >> +        if (!read_ack_register) {
> >> +            if (loop < 3) {
> >> +                usleep(100 * 1000);
> >> +                loop++;
> >> +                goto retry;
> >> +            } else {
> >> +                error_report("OSPM does not acknowledge previous error,"
> >> +                    " so can not record CPER for current error, forcibly"
> >> +                    " acknowledge previous error to avoid blocking next time"
> >> +                    " CPER record! Exit");
> >> +                read_ack_register = 1;
> >> +                cpu_physical_memory_write(read_ack_register_addr,
> >> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> >
> > Already mentioned ...
> > This seems to be against the spec. It not only ignores the requirement
> > for OSPM to acknowledge receiving notifications for previous errors,
> > but it also loses one of them. Why not cache it somewhere until
> > OSPM acknowledges the old ones?
> >
>
> Yes, Igor had mentioned this point in the previous comments.
>
> >> +            }
> >> +        } else {
> >> +            if (error_block_addr) {
> >
> > What is the use case for the address not being set ?
> >
>
> Hmmm...I'd add an "error_fatal" in this case.
>
> >> +                read_ack_register = 0;
> >> +                /*
> >> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
> >> +                 * acknowledge this error.
> >> +                 */
> >> +                cpu_physical_memory_write(read_ack_register_addr,
> >> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> >
> > If the ack register has been cleared - which is why we end up here ....
> > why writing it back if there is no notification for the system to process ?
> >
>
> Ending up here means the ack register has been acked by OSPM, we need to clear it so
> that OSPM can write it back to 1 at the next time.
>
My bad - missed the zeroing part.
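
For anyone else reading along, the handshake can be sketched as a tiny
state machine. This is only a model with hypothetical names, and it assumes
the register starts out as 1 (acknowledged); it does not touch guest memory
the way the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one error source and its Read Ack Register. */
typedef struct GhesSource {
    uint64_t read_ack;  /* 1: OSPM has consumed the previous record */
    unsigned recorded;  /* number of records written so far */
} GhesSource;

/* Producer side (QEMU in the patch): record only if the last one was acked. */
static bool ghes_try_record(GhesSource *s)
{
    if (!s->read_ack) {
        return false;   /* previous record is still pending */
    }
    s->read_ack = 0;    /* clear it so the next ack becomes observable */
    s->recorded++;
    return true;
}

/* Consumer side (guest OSPM): acknowledge after reading the record. */
static void ghes_ospm_ack(GhesSource *s)
{
    s->read_ack = 1;
}
```

A second ghes_try_record() call fails until ghes_ospm_ack() has run, which
is the case where the patch currently forces the acknowledgement.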

> >> +                ret = acpi_ghes_record_mem_error(error_block_addr,
> >> +                          physical_address, acpi_ghes_data_length[source_id]);
> >> +                if (ret == ACPI_GHES_CPER_OK) {
> >> +                    acpi_ghes_data_length[source_id] +=
> >> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
> >
> > As mentioned .. this will run out of space - some roll-back
> > mechanism is needed to overwrite stale entries
> >
>
> Yes, Igor had mentioned this point too.
>
> >> +                }
> >> +            }
> >> +        }
> >> +    }
> >> +
> >> +out:
> >> +    return ret;
> >> +}
> >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> >> index cb62ec9c7b..8e3c5b879e 100644
> >> --- a/include/hw/acpi/acpi_ghes.h
> >> +++ b/include/hw/acpi/acpi_ghes.h
> >> @@ -24,6 +24,9 @@
> >>
> >>  #include "hw/acpi/bios-linker-loader.h"
> >>
> >> +#define ACPI_GHES_CPER_OK                   1
> >> +#define ACPI_GHES_CPER_FAIL                 0
> >> +
> >
> > Is there really a need to introduce those ?
> >
>
> Don't you think it's more clear than using "1" or "0"? :)
>
> >>  /*
> >>   * Values for Hardware Error Notification Type field
> >>   */
> >> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> >>
> >>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> >>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> >> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
> >>  #endif
> >
> > All the above should preferably land in a separate patch
> >
>
> OK.
>
> >> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> >> index 9d143282bc..321ead8115 100644
> >> --- a/include/sysemu/kvm.h
> >> +++ b/include/sysemu/kvm.h
> >> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
> >>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
> >>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
> >>
> >> -#ifdef TARGET_I386
> >> -#define KVM_HAVE_MCE_INJECTION 1
> >> +#ifdef KVM_HAVE_MCE_INJECTION
> >>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
> >>  #endif
> >>
> >> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> >> index d844ea21d8..c4fe6ccc63 100644
> >> --- a/target/arm/cpu.h
> >> +++ b/target/arm/cpu.h
> >> @@ -28,6 +28,10 @@
> >>  /* ARM processors have a weak memory model */
> >>  #define TCG_GUEST_DEFAULT_MO      (0)
> >>
> >> +#ifdef TARGET_AARCH64
> >> +#define KVM_HAVE_MCE_INJECTION 1
> >> +#endif
> >> +
> >>  #define EXCP_UDEF            1   /* undefined instruction */
> >>  #define EXCP_SWI             2   /* software interrupt */
> >>  #define EXCP_PREFETCH_ABORT  3
> >> diff --git a/target/arm/helper.c b/target/arm/helper.c
> >> index 63815fc4cf..a9ce97efb1 100644
> >> --- a/target/arm/helper.c
> >> +++ b/target/arm/helper.c
> >> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
> >>               * Report exception with ESR indicating a fault due to a
> >>               * translation table walk for a cache maintenance instruction.
> >>               */
> >> -            syn = syn_data_abort_no_iss(current_el == target_el,
> >> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
> >>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
> >>              env->exception.vaddress = value;
> >>              env->exception.fsr = fsr;
> >> diff --git a/target/arm/internals.h b/target/arm/internals.h
> >> index f5313dd3d4..28b8451d6d 100644
> >> --- a/target/arm/internals.h
> >> +++ b/target/arm/internals.h
> >> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
> >>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
> >>  }
> >>
> >> -static inline uint32_t syn_data_abort_no_iss(int same_el,
> >> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
> >>                                               int ea, int cm, int s1ptw,
> >>                                               int wnr, int fsc)
> >>  {
> >>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
> >>             | ARM_EL_IL
> >> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
> >> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
> >> +           | (wnr << 6) | fsc;
> >>  }
> >>
> >>  static inline uint32_t syn_data_abort_with_iss(int same_el,
> >> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> >> index 28f6db57d5..c7b7653d3f 100644
> >> --- a/target/arm/kvm64.c
> >> +++ b/target/arm/kvm64.c
> >> @@ -28,6 +28,8 @@
> >>  #include "kvm_arm.h"
> >>  #include "hw/boards.h"
> >>  #include "internals.h"
> >> +#include "hw/acpi/acpi.h"
> >> +#include "hw/acpi/acpi_ghes.h"
> >>
> >>  static bool have_guest_debug;
> >>
> >> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
> >>      return KVM_PUT_RUNTIME_STATE;
> >>  }
> >>
> >> +/* Callers must hold the iothread mutex lock */
> >> +static void kvm_inject_arm_sea(CPUState *c)
> >
> > We could enclose this function along with the kvm_arch_on_sigbus_vcpu
> > within ifdef switch for KVM_HAVE_MCE_INJECTION
> >
>
> Peter suggested to define KVM_HAVE_MCE_INJECTION within ifdef TARGET_AARCH64.
> So isn't KVM_HAVE_MCE_INJECTION always defined in the target/arm/kvm64.c?
>
OK, not sure why I had v7 in mind but the code changes are in kvm64 so all good.
Apologies for the confusion.

> >> +{
> >> +    ARMCPU *cpu = ARM_CPU(c);
> >> +    CPUARMState *env = &cpu->env;
> >> +    CPUClass *cc = CPU_GET_CLASS(c);
> >> +    uint32_t esr;
> >> +    bool same_el;
> >> +
> >> +    c->exception_index = EXCP_DATA_ABORT;
> >> +    env->exception.target_el = 1;
> >> +
> >> +    /*
> >> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> >> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> >> +     */
> >> +    same_el = arm_current_el(env) == env->exception.target_el;
> >> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> >
> > IINM this is the only use case when FnV is considered to be valid
> > so I'm not convinced it is worth to modify the syn_data_abort_no_iss
> > just for this.
> >
> >> +
> >> +    env->exception.syndrome = esr;
> >> +
> >> +    cc->do_interrupt(c);
> >> +}
> >> +
> >>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> >>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
> >>
> >> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
> >>      return ret;
> >>  }
> >>
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> +    ram_addr_t ram_addr;
> >> +    hwaddr paddr;
> >> +
> >> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> >> +
> >> +    if (acpi_enabled && addr &&
> >> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> +        ram_addr = qemu_ram_addr_from_host(addr);
> >> +        if (ram_addr != RAM_ADDR_INVALID &&
> >> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> >> +            kvm_hwpoison_page_add(ram_addr);
> >> +            /*
> >> +             * Asynchronous signal will be masked by main thread, so
> >> +             * only handle synchronous signal.
> >> +             */
> >
> > I'm not entirely sure that the comment above is correct (it has been
> > pointed out before). I would expect the AO signal to be handled here as
> > well. Not having proper support to do that just yet is another story, but
> > the comment might be a bit misleading.
> >
>
> We also expect the AO signal can be handled here. Maybe we could add the comment like:
>
> "Asynchronous signal is masked by main thread now. Once it can be asserted, we could
> handle it." :)
>
Still not entirely there, if I'm not mistaken. Both BUS_MCEERR_AR and
BUS_MCEERR_AO can end up here.
I'm not entirely sure what you mean by "masked by main thread". Both will be
handled by sigbus_handler, and as such both will end up here, either
directly through kvm_on_sigbus or through kvm_cpu_exec with a pending
SIGBUS. Or am I misguided?

BR
Beata
> >
> >> +            if (code == BUS_MCEERR_AR) {
> >> +                kvm_cpu_synchronize_state(c);
> >> +                if (ACPI_GHES_CPER_FAIL !=
> >> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> >> +                    kvm_inject_arm_sea(c);
> >> +                } else {
> >> +                    fprintf(stderr, "failed to record the error\n");
> >> +                }
> >> +            }
> >> +            return;
> >> +        }
> >> +        fprintf(stderr, "Hardware memory error for memory used by "
> >> +                "QEMU itself instead of guest system!\n");
> >> +    }
> >> +
> >> +    if (code == BUS_MCEERR_AR) {
> >> +        fprintf(stderr, "Hardware memory error!\n");
> >> +        exit(1);
> >> +    }
> >> +}
> >> +
> >>  /* C6.6.29 BRK instruction */
> >>  static const uint32_t brk_insn = 0xd4200000;
> >>
> >> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> >> index 5feb312941..499672ebbc 100644
> >> --- a/target/arm/tlb_helper.c
> >> +++ b/target/arm/tlb_helper.c
> >> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
> >>       * ISV field.
> >>       */
> >>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> >> -        syn = syn_data_abort_no_iss(same_el,
> >> +        syn = syn_data_abort_no_iss(same_el, 0,
> >>                                      ea, 0, s1ptw, is_write, fsc);
> >>      } else {
> >>          /*
> >> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> >> index 5352c9ff55..f75a210f96 100644
> >> --- a/target/i386/cpu.h
> >> +++ b/target/i386/cpu.h
> >> @@ -29,6 +29,8 @@
> >>  /* The x86 has a strong memory model with some store-after-load re-ordering */
> >>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
> >>
> >> +#define KVM_HAVE_MCE_INJECTION 1
> >> +
> >>  /* Maximum instruction code size */
> >>  #define TARGET_MAX_INSN_SIZE 16
> >>
> >> --
> >> 2.19.1
> >>
> >>
> >>
> >
> > .
> >
>
> --
>
> Thanks,
> Xiang
>


* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
@ 2019-11-27 14:17         ` Beata Michalska
  0 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-27 14:17 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: Peter Maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	linuxarm, qemu-devel, gengdongjiu, shannon.zhaosl, qemu-arm,
	james.morse, jonathan.cameron, Igor Mammedov, pbonzini, xuwei5,
	Laszlo Ersek, rth

Hi

On Wed, 27 Nov 2019 at 12:47, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> Hi Beata,
>
> Thanks for your review!
>
YAW

> On 2019/11/22 23:47, Beata Michalska wrote:
> > Hi,
> >
> > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
> >>
> >> From: Dongjiu Geng <gengdongjiu@huawei.com>
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When guest accesses the poisoned memory, it will generate a Synchronous
> >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmap the affected page in stage 2, finally
> >> returns to guest.
> >>
> >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> >> stage2 fault, then a BUS_MCEERR_AR synchronous signal is delivered to
> >> Qemu, Qemu records this error address into guest APEI GHES memory and
> >> notifies guest using Synchronous-External-Abort(SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> >> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> ---
> >>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
> >>  include/hw/acpi/acpi_ghes.h |   4 +
> >>  include/sysemu/kvm.h        |   3 +-
> >>  target/arm/cpu.h            |   4 +
> >>  target/arm/helper.c         |   2 +-
> >>  target/arm/internals.h      |   5 +-
> >>  target/arm/kvm64.c          |  64 ++++++++
> >>  target/arm/tlb_helper.c     |   2 +-
> >>  target/i386/cpu.h           |   2 +
> >>  9 files changed, 377 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> >> index 42c00ff3d3..f5b54990c0 100644
> >> --- a/hw/acpi/acpi_ghes.c
> >> +++ b/hw/acpi/acpi_ghes.c
> >> @@ -39,6 +39,34 @@
> >>  /* The max size in bytes for one error block */
> >>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> >>
> >> +/*
> >> + * The total size of Generic Error Data Entry
> >> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-343 Generic Error Data Entry
> >> + */
> >> +#define ACPI_GHES_DATA_LENGTH               72
> >> +
> >> +/*
> >> + * The memory section CPER size,
> >> + * UEFI 2.6: N.2.5 Memory Error Section
> >> + */
> >> +#define ACPI_GHES_MEM_CPER_LENGTH           80
> >> +
> >> +/*
> >> + * Masks for block_status flags
> >> + */
> >> +#define ACPI_GEBS_UNCORRECTABLE         1
> >
> > Why not listing all supported statuses ? Similar to error severity below ?
> >
>
> We now only use the first bit for uncorrectable error. The correctable errors
> are handled in host and would not be delivered to QEMU.
>
> I think it's unnecessary to list all the bit masks.

I'm not sure we are using all the error severity types either, but fair enough.
>
> >> +
> >> +/*
> >> + * Values for error_severity field
> >> + */
> >> +enum AcpiGenericErrorSeverity {
> >> +    ACPI_CPER_SEV_RECOVERABLE,
> >> +    ACPI_CPER_SEV_FATAL,
> >> +    ACPI_CPER_SEV_CORRECTED,
> >> +    ACPI_CPER_SEV_NONE,
> >> +};
> >> +
> >>  /*
> >>   * Now only support ARMv8 SEA notification type error source
> >>   */
> >> @@ -49,6 +77,16 @@
> >>   */
> >>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >>
> >> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> >> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> >> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> >> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> >> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> >> +
> >> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> >> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> >> +    0xED, 0x7C, 0x83, 0xB1)
> >> +
> >>  /*

As suggested in a different thread - could this also be made common with
the NVMe code?
> >>   * | +--------------------------+ 0
> >>   * | |        Header            |
> >> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >>      uint64_t ghes_addr_le;
> >>  } AcpiGhesState;
> >>
> >> +/*
> >> + * Total size for Generic Error Status Block
> >> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-380 Generic Error Status Block
> >> + */
> >> +#define ACPI_GHES_GESB_SIZE                 20
> >
> > Minor: This is not entirely correct: GEDE is part of GESB so the total length
> > would be ACPI_GHES_GESB_SIZE + n* sizeof(GEDE)
> >
>
> Yes, here it only indicates the total length of Generic Error Status Block structure
> except "GEDEs".
>
Sure, just the comment might be misleading. That's minor though.

> >> +/* The offset of Data Length in Generic Error Status Block */
> >> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
> >> +
> >
> > If those were nicely represented as structures you get the offsets easily
> > without having number of defines. That could simplify the code and make it
> > more readable - see comments below
> >
>
> To address Igor's comment, this macro is useless and I will drop it.
>
> >> +/*
> >> + * Record the value of data length for each error status block to avoid getting
> >> + * this value from guest.
> >> + */
> >> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> >> +
> >> +/*
> >> + * Generic Error Data Entry
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> >> +                uint32_t error_severity, uint16_t revision,
> >> +                uint8_t validation_bits, uint8_t flags,
> >> +                uint32_t error_data_length, QemuUUID fru_id,
> >> +                uint8_t *fru_text, uint64_t time_stamp)
> >
> > Why not just defining a struct that represents the GED entry?
> >
> >> +{
> >> +    QemuUUID uuid_le;
> >> +
> >> +    /* Section Type */
> >> +    uuid_le = qemu_uuid_bswap(section_type);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +    /* Revision */
> >> +    build_append_int_noprefix(table, revision, 2);
> >
> > Minor: According to the spec it seems that the revision number is
> > a fixed value so you could drop that from the parameters....
> > or ... use a struct to represent the data
> >
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table, validation_bits, 1);
> >> +    /* Flags */
> >> +    build_append_int_noprefix(table, flags, 1);
> >> +    /* Error Data Length */
> >> +    build_append_int_noprefix(table, error_data_length, 4);
> >> +
> >> +    /* FRU Id */
> >> +    uuid_le = qemu_uuid_bswap(fru_id);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* FRU Text */
> >> +    g_array_append_vals(table, fru_text, 20);
> >> +    /* Timestamp */
> >> +    build_append_int_noprefix(table, time_stamp, 8);
> >> +}
> >> +
> >> +/*
> >> + * Generic Error Status Block
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> >> +                uint32_t raw_data_offset, uint32_t raw_data_length,
> >> +                uint32_t data_length, uint32_t error_severity)
> >
> > Same as the above
> >
>
I still believe having a struct would make the code a bit more maintainable
and readable ... :)
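
For illustration, the struct-based alternative suggested above might look
roughly like the sketch below. It is not part of the patch: the demo_* names
are hypothetical, and it assumes only the five 32-bit little-endian fields
that acpi_ghes_generic_error_status() appends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct mirroring the five 32-bit fields that the patch
 * appends one call at a time. Names are illustrative only. */
struct demo_gesb {
    uint32_t block_status;
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;
    uint32_t error_severity;
};

enum { DEMO_GESB_SIZE = 20 }; /* 5 fields * 4 bytes */

/* Store a 32-bit value in little-endian order, independent of the host. */
static void demo_put_le32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)((v >> 8) & 0xff);
    dst[2] = (uint8_t)((v >> 16) & 0xff);
    dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Serialize the whole header in one place; returns the bytes written. */
static size_t demo_gesb_serialize(const struct demo_gesb *b,
                                  uint8_t out[DEMO_GESB_SIZE])
{
    assert(b != NULL && out != NULL);
    demo_put_le32(out + 0, b->block_status);
    demo_put_le32(out + 4, b->raw_data_offset);
    demo_put_le32(out + 8, b->raw_data_length);
    demo_put_le32(out + 12, b->data_length);
    demo_put_le32(out + 16, b->error_severity);
    return DEMO_GESB_SIZE;
}
```

With this shape the caller fills named fields instead of passing five
positional integers, which is the readability gain being argued for.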
> >> +{
> >> +    /* Block Status */
> >> +    build_append_int_noprefix(table, block_status, 4);
> >> +    /* Raw Data Offset */
> >> +    build_append_int_noprefix(table, raw_data_offset, 4);
> >> +    /* Raw Data Length */
> >> +    build_append_int_noprefix(table, raw_data_length, 4);
> >> +    /* Data Length */
> >> +    build_append_int_noprefix(table, data_length, 4);
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +}
> >> +
> >> +/* UEFI 2.6: N.2.5 Memory Error Section */
> >> +static void acpi_ghes_build_append_mem_cper(GArray *table,
> >> +                                            uint64_t error_physical_addr)
> >> +{
> >> +    /*
> >> +     * Memory Error Record
> >> +     */
> >> +
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table,
> >> +                              (1UL << 14) | /* Type Valid */
> >> +                              (1UL << 1) /* Physical Address Valid */,
> >> +                              8);
> >> +    /* Error Status */
> >> +    build_append_int_noprefix(table, 0, 8);
> >
> > Just wondering whether it would be worth specifying the Error Type
> > through the Error Status?
> >
>
> Error Status relies on information from implementation-specific error registers,
> which means we would need to provide more information for QEMU to handle.
>
> In the current implementation, KVM only delivers the BUS_MCEERR_AR type of signal and the
> poisoned HVA to userspace (QEMU). If we want to extract more information in QEMU, KVM has
> to provide the corresponding information. However, KVM is not ready for that now.

Fair enough - thanks.
>
> >> +    /* Physical Address */
> >> +    build_append_int_noprefix(table, error_physical_addr, 8);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 48);
> >> +    /* Memory Error Type */
> >> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 7);
> >> +}
> >> +
> >> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> >> +                                      uint64_t error_physical_addr,
> >> +                                      uint32_t data_length)
> >> +{
> >> +    GArray *block;
> >> +    uint64_t current_block_length;
> >> +    /* Memory Error Section Type */
> >> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
> >
> > As already mentioned - mixing LE /w BE
> >
>
> >> +    QemuUUID fru_id = {};
> >> +    uint8_t fru_text[20] = {};
> >> +
> >> +    /*
> >> +     * Generic Error Status Block
> >> +     * | +---------------------+
> >> +     * | |     block_status    |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_offset  |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_length  |
> >> +     * | +---------------------+
> >> +     * | |     data_length     |
> >> +     * | +---------------------+
> >> +     * | |   error_severity    |
> >> +     * | +---------------------+
> >> +     */
> >> +    block = g_array_new(false, true /* clear */, 1);
> >> +
> >> +    /* The current whole length of the generic error status block */
> >> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> >> +
> >> +    /* This is the length if adding a new generic error data entry*/
> >> +    data_length += ACPI_GHES_DATA_LENGTH;
> >> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> >> +
> >> +    /*
> >> +     * Check whether it will run out of the preallocated memory if adding a new
> >> +     * generic error data entry
> >> +     */
> >> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> >> +        error_report("Record CPER out of boundary!!!");
> >
> > Minor: The error message could be made more accurate, like:
> >     "Not enough memory to record new CPER"
> >
>
> OK.
>
> >> +        return ACPI_GHES_CPER_FAIL;
> >> +    }
> >> +
> >> +    /* Build the new generic error status block header */
> >> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> >> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
> >> +
> >> +    /* Write back above generic error status block header to guest memory */
> >> +    cpu_physical_memory_write(error_block_address, block->data,
> >> +                              block->len);
> >> +
> >> +    /* Add a new generic error data entry */
> >> +
> >> +    data_length = block->len;
> >> +    /* Build this new generic error data entry header */
> >> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> >> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> >> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
> >> +
> >> +    /* Build the memory section CPER for above new generic error data entry */
> >> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> >> +
> >> +    /* Write back above this new generic error data entry to guest memory */
> >> +    cpu_physical_memory_write(error_block_address + current_block_length,
> >> +        block->data + data_length, block->len - data_length);
> >> +
> >
> > As already mentioned, and unless I have missed something (which is highly
> > possible), this will append new records while the GESB is kept 'in place',
> > so the used space only grows.
> >
>
> Yes, we need to address this unbounded growth of records.
>
> >> +    g_array_free(block, true);
> >> +
> >> +    return ACPI_GHES_CPER_OK;
> >> +}
> >> +
> >>  /*
> >>   * Hardware Error Notification
> >>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> >> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> >>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> >>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> >>  }
> >> +
> >> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> >> +{
> >> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> >> +    int loop = 0;
> >> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
> >> +    bool ret = ACPI_GHES_CPER_FAIL;
> >> +    uint8_t source_id;
> >> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> >> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
> >> +
> >
> > I'm not entirely sure why this is needed - see below
> >
>
> >> +    /*
> >> +     * | +---------------------+ ges.ghes_addr_le
> >> +     * | |error_block_address0 |
> >> +     * | +---------------------+ --+--
> >> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | |error_block_addressN |
> >> +     * | +---------------------+
> >> +     * | | read_ack_register0  |
> >> +     * | +---------------------+ --+--
> >> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | | read_ack_registerN  |
> >> +     * | +---------------------+ --+--
> >> +     * | |      CPER           |   |
> >> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> >> +     * | |      CPER           |   |
> >> +     * | +---------------------+ --+--
> >> +     * | |    ..........       |
> >> +     * | +---------------------+
> >> +     * | |      CPER           |
> >> +     * | |      ....           |
> >> +     * | |      CPER           |
> >> +     * | +---------------------+
> >> +     */
> >> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> >> +        /* Find and check the source id for this new CPER */
> >> +        source_id = error_source_id[notify];
> >
> > Why not use a switch case for the supported source types?
> > For the time being only one is being supported. And you only use that to
> > verify that support - seems a bit unnecessary.
> >
>
> Good idea, I think using switch case is much better.
>
> >> +        if (source_id != 0xff) {
> >> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> >> +        } else {
> >> +            goto out;
> >> +        }
> >> +
> >> +        cpu_physical_memory_read(start_addr, &error_block_addr,
> >> +                                 ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +        read_ack_register_addr = start_addr +
> >> +            ACPI_GHES_ERROR_SOURCE_COUNT * ACPI_GHES_ADDRESS_SIZE;
> >> +retry:
> >> +        cpu_physical_memory_read(read_ack_register_addr,
> >> +                                 &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> >> +
> >> +        /* zero means OSPM does not acknowledge the error */
> >> +        if (!read_ack_register) {
> >> +            if (loop < 3) {
> >> +                usleep(100 * 1000);
> >> +                loop++;
> >> +                goto retry;
> >> +            } else {
> >> +                error_report("OSPM does not acknowledge previous error,"
> >> +                    " so can not record CPER for current error, forcibly"
> >> +                    " acknowledge previous error to avoid blocking next time"
> >> +                    " CPER record! Exit");
> >> +                read_ack_register = 1;
> >> +                cpu_physical_memory_write(read_ack_register_addr,
> >> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> >
> > Already mentioned ...
> > This seems to be against the spec. It not only ignores the requirement
> > for OSPM to acknowledge receiving notifications for previous errors,
> > but it also loses one of them. Why not cache it somewhere until
> > OSPM acknowledges the old ones?
> >
>
> Yes, Igor had mentioned this point in the previous comments.
>
> >> +            }
> >> +        } else {
> >> +            if (error_block_addr) {
> >
> > What is the use case for the address not being set ?
> >
>
> Hmmm...I'd add a "error_fatal" in this case.
>
> >> +                read_ack_register = 0;
> >> +                /*
> >> +                 * Clear the Read Ack Register, OSPM will write it to 1 when
> >> +                 * acknowledge this error.
> >> +                 */
> >> +                cpu_physical_memory_write(read_ack_register_addr,
> >> +                    &read_ack_register, ACPI_GHES_ADDRESS_SIZE);
> >
> > If the ack register has been cleared - which is why we end up here ....
> > why writing it back if there is no notification for the system to process ?
> >
>
> Ending up here means the ack register has been acked by OSPM, we need to clear it so
> that OSPM can write it back to 1 at the next time.
>
My bad - missed the zeroing part.
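
The handshake described in the reply above can be modelled in a few lines.
This is only a sketch under stated assumptions: demo_source and the demo_*
functions are hypothetical stand-ins for one error source's Read Ack Register
in guest memory, not QEMU code, and the retry/timeout path is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one error source: its Read Ack Register and a
 * counter of records written so far. */
struct demo_source {
    uint64_t read_ack;
    unsigned recorded;
};

/* Producer side (QEMU in the patch): record only if the previous error was
 * acknowledged, and clear the register so the consumer can ack again. */
static bool demo_producer_record(struct demo_source *s)
{
    assert(s != NULL);
    if (s->read_ack == 0) {
        return false; /* previous error not yet acknowledged */
    }
    s->read_ack = 0;
    s->recorded++;
    return true;
}

/* Consumer side (OSPM): writes 1 once it has consumed the record. */
static void demo_consumer_ack(struct demo_source *s)
{
    assert(s != NULL);
    s->read_ack = 1;
}
```

A second demo_producer_record() call without an intervening
demo_consumer_ack() is refused, which is the case the forced write of 1 in
the patch works around.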

> >> +                ret = acpi_ghes_record_mem_error(error_block_addr,
> >> +                          physical_address, acpi_ghes_data_length[source_id]);
> >> +                if (ret == ACPI_GHES_CPER_OK) {
> >> +                    acpi_ghes_data_length[source_id] +=
> >> +                        (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
> >
> > As mentioned .. this will run out of space - some roll-back
> > mechanism is needed to overwrite stale entries
> >
>
> Yes, Igor had mentioned this point too.
>
> >> +                }
> >> +            }
> >> +        }
> >> +    }
> >> +
> >> +out:
> >> +    return ret;
> >> +}
> >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> >> index cb62ec9c7b..8e3c5b879e 100644
> >> --- a/include/hw/acpi/acpi_ghes.h
> >> +++ b/include/hw/acpi/acpi_ghes.h
> >> @@ -24,6 +24,9 @@
> >>
> >>  #include "hw/acpi/bios-linker-loader.h"
> >>
> >> +#define ACPI_GHES_CPER_OK                   1
> >> +#define ACPI_GHES_CPER_FAIL                 0
> >> +
> >
> > Is there really a need to introduce those ?
> >
>
> Don't you think it's more clear than using "1" or "0"? :)
>
> >>  /*
> >>   * Values for Hardware Error Notification Type field
> >>   */
> >> @@ -53,4 +56,5 @@ void acpi_ghes_build_hest(GArray *table_data, GArray *hardware_error,
> >>
> >>  void acpi_ghes_build_error_table(GArray *hardware_errors, BIOSLinker *linker);
> >>  void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_errors);
> >> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t error_physical_addr);
> >>  #endif
> >
> > All the above should preferably land in a separate patch
> >
>
> OK.
>
> >> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> >> index 9d143282bc..321ead8115 100644
> >> --- a/include/sysemu/kvm.h
> >> +++ b/include/sysemu/kvm.h
> >> @@ -378,8 +378,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
> >>  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
> >>  unsigned long kvm_arch_vcpu_id(CPUState *cpu);
> >>
> >> -#ifdef TARGET_I386
> >> -#define KVM_HAVE_MCE_INJECTION 1
> >> +#ifdef KVM_HAVE_MCE_INJECTION
> >>  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
> >>  #endif
> >>
> >> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> >> index d844ea21d8..c4fe6ccc63 100644
> >> --- a/target/arm/cpu.h
> >> +++ b/target/arm/cpu.h
> >> @@ -28,6 +28,10 @@
> >>  /* ARM processors have a weak memory model */
> >>  #define TCG_GUEST_DEFAULT_MO      (0)
> >>
> >> +#ifdef TARGET_AARCH64
> >> +#define KVM_HAVE_MCE_INJECTION 1
> >> +#endif
> >> +
> >>  #define EXCP_UDEF            1   /* undefined instruction */
> >>  #define EXCP_SWI             2   /* software interrupt */
> >>  #define EXCP_PREFETCH_ABORT  3
> >> diff --git a/target/arm/helper.c b/target/arm/helper.c
> >> index 63815fc4cf..a9ce97efb1 100644
> >> --- a/target/arm/helper.c
> >> +++ b/target/arm/helper.c
> >> @@ -3005,7 +3005,7 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
> >>               * Report exception with ESR indicating a fault due to a
> >>               * translation table walk for a cache maintenance instruction.
> >>               */
> >> -            syn = syn_data_abort_no_iss(current_el == target_el,
> >> +            syn = syn_data_abort_no_iss(current_el == target_el, 0,
> >>                                          fi.ea, 1, fi.s1ptw, 1, fsc);
> >>              env->exception.vaddress = value;
> >>              env->exception.fsr = fsr;
> >> diff --git a/target/arm/internals.h b/target/arm/internals.h
> >> index f5313dd3d4..28b8451d6d 100644
> >> --- a/target/arm/internals.h
> >> +++ b/target/arm/internals.h
> >> @@ -451,13 +451,14 @@ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
> >>          | ARM_EL_IL | (ea << 9) | (s1ptw << 7) | fsc;
> >>  }
> >>
> >> -static inline uint32_t syn_data_abort_no_iss(int same_el,
> >> +static inline uint32_t syn_data_abort_no_iss(int same_el, int fnv,
> >>                                               int ea, int cm, int s1ptw,
> >>                                               int wnr, int fsc)
> >>  {
> >>      return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
> >>             | ARM_EL_IL
> >> -           | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
> >> +           | (fnv << 10) | (ea << 9) | (cm << 8) | (s1ptw << 7)
> >> +           | (wnr << 6) | fsc;
> >>  }
> >>
> >>  static inline uint32_t syn_data_abort_with_iss(int same_el,
> >> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> >> index 28f6db57d5..c7b7653d3f 100644
> >> --- a/target/arm/kvm64.c
> >> +++ b/target/arm/kvm64.c
> >> @@ -28,6 +28,8 @@
> >>  #include "kvm_arm.h"
> >>  #include "hw/boards.h"
> >>  #include "internals.h"
> >> +#include "hw/acpi/acpi.h"
> >> +#include "hw/acpi/acpi_ghes.h"
> >>
> >>  static bool have_guest_debug;
> >>
> >> @@ -710,6 +712,30 @@ int kvm_arm_cpreg_level(uint64_t regidx)
> >>      return KVM_PUT_RUNTIME_STATE;
> >>  }
> >>
> >> +/* Callers must hold the iothread mutex lock */
> >> +static void kvm_inject_arm_sea(CPUState *c)
> >
> > We could enclose this function along with the kvm_arch_on_sigbus_vcpu
> > within ifdef switch for KVM_HAVE_MCE_INJECTION
> >
>
> Peter suggested to define KVM_HAVE_MCE_INJECTION within ifdef TARGET_AARCH64.
> So isn't KVM_HAVE_MCE_INJECTION always defined in the target/arm/kvm64.c?
>
OK, not sure why I had v7 in mind but the code changes are in kvm64 so all good.
Apologies for the confusion.

> >> +{
> >> +    ARMCPU *cpu = ARM_CPU(c);
> >> +    CPUARMState *env = &cpu->env;
> >> +    CPUClass *cc = CPU_GET_CLASS(c);
> >> +    uint32_t esr;
> >> +    bool same_el;
> >> +
> >> +    c->exception_index = EXCP_DATA_ABORT;
> >> +    env->exception.target_el = 1;
> >> +
> >> +    /*
> >> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> >> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> >> +     */
> >> +    same_el = arm_current_el(env) == env->exception.target_el;
> >> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> >
> > IINM this is the only use case when FnV is considered to be valid,
> > so I'm not convinced it is worth modifying syn_data_abort_no_iss
> > just for this.
> >
> >> +
> >> +    env->exception.syndrome = esr;
> >> +
> >> +    cc->do_interrupt(c);
> >> +}
> >> +
> >>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> >>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
> >>
> >> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
> >>      return ret;
> >>  }
> >>
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> +    ram_addr_t ram_addr;
> >> +    hwaddr paddr;
> >> +
> >> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> >> +
> >> +    if (acpi_enabled && addr &&
> >> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> +        ram_addr = qemu_ram_addr_from_host(addr);
> >> +        if (ram_addr != RAM_ADDR_INVALID &&
> >> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> >> +            kvm_hwpoison_page_add(ram_addr);
> >> +            /*
> >> +             * Asynchronous signal will be masked by main thread, so
> >> +             * only handle synchronous signal.
> >> +             */
> >
> > I'm not entirely sure that the comment above is correct (it has been
> > pointed out before). I would expect the AO signal to be handled here as
> > well. Not having proper support to do that just yet is another story, but
> > the comment might be a bit misleading.
> >
>
> We also expect the AO signal can be handled here. Maybe we could add the comment like:
>
> "Asynchronous signal is masked by main thread now. Once it can be asserted, we could
> handle it." :)
>
Still not entirely there, if I'm not mistaken. Both BUS_MCEERR_AR and
BUS_MCEERR_AO can end up here.
I'm not entirely sure what you mean by "masked by main thread". Both will be
handled by sigbus_handler, and as such both will end up here, either
directly through kvm_on_sigbus or through kvm_cpu_exec with a pending
SIGBUS. Or am I misguided?

BR
Beata
> >
> >> +            if (code == BUS_MCEERR_AR) {
> >> +                kvm_cpu_synchronize_state(c);
> >> +                if (ACPI_GHES_CPER_FAIL !=
> >> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> >> +                    kvm_inject_arm_sea(c);
> >> +                } else {
> >> +                    fprintf(stderr, "failed to record the error\n");
> >> +                }
> >> +            }
> >> +            return;
> >> +        }
> >> +        fprintf(stderr, "Hardware memory error for memory used by "
> >> +                "QEMU itself instead of guest system!\n");
> >> +    }
> >> +
> >> +    if (code == BUS_MCEERR_AR) {
> >> +        fprintf(stderr, "Hardware memory error!\n");
> >> +        exit(1);
> >> +    }
> >> +}
> >> +
> >>  /* C6.6.29 BRK instruction */
> >>  static const uint32_t brk_insn = 0xd4200000;
> >>
> >> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> >> index 5feb312941..499672ebbc 100644
> >> --- a/target/arm/tlb_helper.c
> >> +++ b/target/arm/tlb_helper.c
> >> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
> >>       * ISV field.
> >>       */
> >>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> >> -        syn = syn_data_abort_no_iss(same_el,
> >> +        syn = syn_data_abort_no_iss(same_el, 0,
> >>                                      ea, 0, s1ptw, is_write, fsc);
> >>      } else {
> >>          /*
> >> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> >> index 5352c9ff55..f75a210f96 100644
> >> --- a/target/i386/cpu.h
> >> +++ b/target/i386/cpu.h
> >> @@ -29,6 +29,8 @@
> >>  /* The x86 has a strong memory model with some store-after-load re-ordering */
> >>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
> >>
> >> +#define KVM_HAVE_MCE_INJECTION 1
> >> +
> >>  /* Maximum instruction code size */
> >>  #define TARGET_MAX_INSN_SIZE 16
> >>
> >> --
> >> 2.19.1
> >>
> >>
> >>
> >
> > .
> >
>
> --
>
> Thanks,
> Xiang
>


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-27 13:02         ` Igor Mammedov
@ 2019-11-27 14:17           ` Beata Michalska
  -1 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-11-27 14:17 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Xiang Zheng, pbonzini, mst, shannon.zhaosl, Peter Maydell,
	Laszlo Ersek, james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang

On Wed, 27 Nov 2019 at 13:03, Igor Mammedov <imammedo@redhat.com> wrote:
>
> On Wed, 27 Nov 2019 20:47:15 +0800
> Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> > Hi Beata,
> >
> > Thanks for your review!
> >
> > On 2019/11/22 23:47, Beata Michalska wrote:
> > > Hi,
> > >
> > > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
> > >>
> > >> From: Dongjiu Geng <gengdongjiu@huawei.com>
> > >>
> > >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> > >> translates the host VA delivered by host to guest PA, then fills this PA
> > >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> > >> type.
> > >>
> > >> When guest accesses the poisoned memory, it will generate a Synchronous
> > >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> > >> memory_failure() to unmap the affected page in stage 2, finally
> > >> returns to guest.
> > >>
> > >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> > >> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> > >> Qemu, Qemu records this error address into guest APEI GHES memory and
> > >> notifies guest using Synchronous-External-Abort(SEA).
> > >>
> > >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> > >> in which we can setup the type of exception and the syndrome information.
> > >> When switching to guest, the target vcpu will jump to the synchronous
> > >> external abort vector table entry.
> > >>
> > >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> > >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> > >> not valid and hold an UNKNOWN value. These values will be set to KVM
> > >> register structures through KVM_SET_ONE_REG IOCTL.
> > >>
> > >> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> > >> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> > >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > >> ---
> [...]
> > >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> > >> index cb62ec9c7b..8e3c5b879e 100644
> > >> --- a/include/hw/acpi/acpi_ghes.h
> > >> +++ b/include/hw/acpi/acpi_ghes.h
> > >> @@ -24,6 +24,9 @@
> > >>
> > >>  #include "hw/acpi/bios-linker-loader.h"
> > >>
> > >> +#define ACPI_GHES_CPER_OK                   1
> > >> +#define ACPI_GHES_CPER_FAIL                 0
> > >> +
> > >
> > > Is there really a need to introduce those ?
> > >
> >
> > Don't you think it's more clear than using "1" or "0"? :)
>
> or maybe just reuse default libc return convention: 0 - ok, -1 - fail
> and drop custom macros
>

Totally agree.
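
As a rough sketch of what that convention would look like for the bounds
check in acpi_ghes_record_mem_error(): the demo_* names and
DEMO_MAX_RAW_DATA_LENGTH below are assumptions for illustration, while the
20-, 72- and 80-byte sizes follow from the field widths appended in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes derived from the field widths in the patch; the maximum length is
 * an assumed value for this sketch only. */
enum {
    DEMO_GESB_SIZE = 20,            /* status block header */
    DEMO_ENTRY_SIZE = 72 + 80,      /* data entry header + memory CPER */
    DEMO_MAX_RAW_DATA_LENGTH = 0x1000
};

/* Returns 0 on success, -1 if a new entry would not fit. On failure the
 * caller's running length is left untouched. */
static int demo_record_mem_error(uint32_t *used_len)
{
    uint32_t new_len;

    assert(used_len != NULL);
    new_len = *used_len + DEMO_ENTRY_SIZE;
    if (new_len + DEMO_GESB_SIZE > DEMO_MAX_RAW_DATA_LENGTH) {
        return -1;
    }
    *used_len = new_len;
    return 0;
}
```

A caller then tests "if (demo_record_mem_error(&len) < 0)" and no custom
OK/FAIL macros are needed.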

BR
Beata
> >
> > >>  /*
> > >>   * Values for Hardware Error Notification Type field
> > >>   */
> [...]
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 1/6] hw/arm/virt: Introduce a RAS machine option
  2019-11-11  1:40   ` Xiang Zheng
@ 2019-12-02 18:22     ` Peter Maydell
  -1 siblings, 0 replies; 82+ messages in thread
From: Peter Maydell @ 2019-12-02 18:22 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: Paolo Bonzini, Michael S. Tsirkin, Igor Mammedov, Shannon Zhao,
	Laszlo Ersek, James Morse, gengdongjiu, Marcelo Tosatti,
	Richard Henderson, Eduardo Habkost, Jonathan Cameron, xuwei (O),
	kvm-devel, QEMU Developers, qemu-arm, Linuxarm, wanghaibin.wang

On Mon, 11 Nov 2019 at 01:44, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> From: Dongjiu Geng <gengdongjiu@huawei.com>
>
> RAS Virtualization feature is not supported now, so add a RAS machine
> option and disable it by default.
>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> ---
>  hw/arm/virt.c         | 23 +++++++++++++++++++++++
>  include/hw/arm/virt.h |  1 +
>  2 files changed, 24 insertions(+)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index d4bedc2607..ea0fbf82be 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1819,6 +1819,20 @@ static void virt_set_its(Object *obj, bool value, Error **errp)
>      vms->its = value;
>  }
>
> +static bool virt_get_ras(Object *obj, Error **errp)
> +{
> +    VirtMachineState *vms = VIRT_MACHINE(obj);
> +
> +    return vms->ras;
> +}
> +
> +static void virt_set_ras(Object *obj, bool value, Error **errp)
> +{
> +    VirtMachineState *vms = VIRT_MACHINE(obj);
> +
> +    vms->ras = value;
> +}
> +
>  static char *virt_get_gic_version(Object *obj, Error **errp)
>  {
>      VirtMachineState *vms = VIRT_MACHINE(obj);
> @@ -2122,6 +2136,15 @@ static void virt_instance_init(Object *obj)
>                                      "Valid values are none and smmuv3",
>                                      NULL);
>
> +    /* Default disallows RAS instantiation */
> +    vms->ras = false;
> +    object_property_add_bool(obj, "ras", virt_get_ras,
> +                             virt_set_ras, NULL);
> +    object_property_set_description(obj, "ras",
> +                                    "Set on/off to enable/disable "
> +                                    "RAS instantiation",
> +                                    NULL);

I think we could make the user-facing description of
the option a little clearer: something like
"Set on/off to enable/disable reporting host memory errors
to a KVM guest using ACPI and guest external abort exceptions"

?

Otherwise
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 4/6] KVM: Move hwpoison page related functions into kvm-all.c
  2019-11-11  1:40   ` Xiang Zheng
@ 2019-12-02 18:23     ` Peter Maydell
  -1 siblings, 0 replies; 82+ messages in thread
From: Peter Maydell @ 2019-12-02 18:23 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: Paolo Bonzini, Michael S. Tsirkin, Igor Mammedov, Shannon Zhao,
	Laszlo Ersek, James Morse, gengdongjiu, Marcelo Tosatti,
	Richard Henderson, Eduardo Habkost, Jonathan Cameron, xuwei (O),
	kvm-devel, QEMU Developers, qemu-arm, Linuxarm, wanghaibin.wang

On Mon, 11 Nov 2019 at 01:44, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> From: Dongjiu Geng <gengdongjiu@huawei.com>
>
> kvm_hwpoison_page_add() and kvm_unpoison_all() will both be used by X86
> and ARM platforms, so move them into "accel/kvm/kvm-all.c" to avoid
> duplicate code.
>
> For architectures that don't use the poison-list functionality the
> reset handler will harmlessly do nothing, so let's register the
> kvm_unpoison_all() function in the generic kvm_init() function.
>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 0/6] Add ARMv8 RAS virtualization support in QEMU
  2019-11-11  1:40 ` Xiang Zheng
@ 2019-12-02 18:27   ` Peter Maydell
  -1 siblings, 0 replies; 82+ messages in thread
From: Peter Maydell @ 2019-12-02 18:27 UTC (permalink / raw)
  To: Xiang Zheng
  Cc: Paolo Bonzini, Michael S. Tsirkin, Igor Mammedov, Shannon Zhao,
	Laszlo Ersek, James Morse, gengdongjiu, Marcelo Tosatti,
	Richard Henderson, Eduardo Habkost, Jonathan Cameron, xuwei (O),
	kvm-devel, QEMU Developers, qemu-arm, Linuxarm, wanghaibin.wang

On Mon, 11 Nov 2019 at 01:44, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>
> In the ARMv8 platform, the CPU error types are synchronous external abort(SEA)
> and SError Interrupt (SEI). If exception happens in guest, sometimes it's better
> for guest to perform the recovery, because host does not know the detailed
> information of guest. For example, if an exception happens in a user-space
> application within guest, host does not know which application encounters
> errors.
>
> For the ARMv8 SEA/SEI, KVM or host kernel delivers SIGBUS to notify userspace.
> After user space gets the notification, it will record the CPER into guest GHES
> buffer and inject an exception or IRQ into guest.
>
> In the current implementation, if the type of SIGBUS is BUS_MCEERR_AR, we will
> treat it as a synchronous exception, and notify guest with ARMv8 SEA
> notification type after recording CPER into guest.

Hi; I've given you reviewed-by tags on a couple of patches; other
people have given review comments on some of the other patches,
so I think you have enough to do a v22 addressing those.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 0/6] Add ARMv8 RAS virtualization support in QEMU
  2019-12-02 18:27   ` Peter Maydell
@ 2019-12-03  2:09     ` gengdongjiu
  -1 siblings, 0 replies; 82+ messages in thread
From: gengdongjiu @ 2019-12-03  2:09 UTC (permalink / raw)
  To: Peter Maydell, Xiang Zheng
  Cc: Paolo Bonzini, Michael S. Tsirkin, Igor Mammedov, Shannon Zhao,
	Laszlo Ersek, James Morse, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Jonathan Cameron, xuwei (O),
	kvm-devel, QEMU Developers, qemu-arm, Linuxarm, wanghaibin.wang

On 2019/12/3 2:27, Peter Maydell wrote:
>> application within guest, host does not know which application encounters
>> errors.
>>
>> For the ARMv8 SEA/SEI, KVM or host kernel delivers SIGBUS to notify userspace.
>> After user space gets the notification, it will record the CPER into guest GHES
>> buffer and inject an exception or IRQ into guest.
>>
>> In the current implementation, if the type of SIGBUS is BUS_MCEERR_AR, we will
>> treat it as a synchronous exception, and notify guest with ARMv8 SEA
>> notification type after recording CPER into guest.
> Hi; I've given you reviewed-by tags on a couple of patches; other
> people have given review comments on some of the other patches,
> so I think you have enough to do a v22 addressing those.

Thanks very much for the Reviewed-by tags; we will upload v22.

> thanks
> -- PMM


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-27 14:17         ` Beata Michalska
@ 2019-12-03  3:35           ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-12-03  3:35 UTC (permalink / raw)
  To: Beata Michalska
  Cc: pbonzini, mst, Igor Mammedov, shannon.zhaosl, Peter Maydell,
	Laszlo Ersek, james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang



On 2019/11/27 22:17, Beata Michalska wrote:
> Hi
> 
> On Wed, 27 Nov 2019 at 12:47, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>>
>> Hi Beata,
>>
>> Thanks for your review!
>>
> YAW
> 
>> On 2019/11/22 23:47, Beata Michalska wrote:
>>> Hi,
>>>
>>> On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>>>>
>>>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>>>
>>>> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
>>>> translates the host VA delivered by host to guest PA, then fills this PA
>>>> to guest APEI GHES memory, then notifies guest according to the SIGBUS
>>>> type.
>>>>
>>>> When guest accesses the poisoned memory, it will generate a Synchronous
>>>> External Abort(SEA). Then host kernel gets an APEI notification and calls
>>>> memory_failure() to unmap the affected page in stage 2, finally
>>>> returns to guest.
>>>>
>>>> Guest continues to access the PG_hwpoison page, it will trap to KVM as
>>>> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
>>>> Qemu, Qemu records this error address into guest APEI GHES memory and
>>>> notifies guest using Synchronous-External-Abort(SEA).
>>>>
>>>> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
>>>> in which we can setup the type of exception and the syndrome information.
>>>> When switching to guest, the target vcpu will jump to the synchronous
>>>> external abort vector table entry.
>>>>
>>>> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
>>>> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
>>>> not valid and hold an UNKNOWN value. These values will be set to KVM
>>>> register structures through KVM_SET_ONE_REG IOCTL.
>>>>
>>>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>>>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>> ---
>>>>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>>>>  include/hw/acpi/acpi_ghes.h |   4 +
>>>>  include/sysemu/kvm.h        |   3 +-
>>>>  target/arm/cpu.h            |   4 +
>>>>  target/arm/helper.c         |   2 +-
>>>>  target/arm/internals.h      |   5 +-
>>>>  target/arm/kvm64.c          |  64 ++++++++
>>>>  target/arm/tlb_helper.c     |   2 +-
>>>>  target/i386/cpu.h           |   2 +
>>>>  9 files changed, 377 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
>>>> index 42c00ff3d3..f5b54990c0 100644
>>>> --- a/hw/acpi/acpi_ghes.c
>>>> +++ b/hw/acpi/acpi_ghes.c
>>>> @@ -39,6 +39,34 @@
>>>>  /* The max size in bytes for one error block */
>>>>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>>>>
>>>> +/*
>>>> + * The total size of Generic Error Data Entry
>>>> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
>>>> + * Table 18-343 Generic Error Data Entry
>>>> + */
>>>> +#define ACPI_GHES_DATA_LENGTH               72
>>>> +
>>>> +/*
>>>> + * The memory section CPER size,
>>>> + * UEFI 2.6: N.2.5 Memory Error Section
>>>> + */
>>>> +#define ACPI_GHES_MEM_CPER_LENGTH           80
>>>> +
>>>> +/*
>>>> + * Masks for block_status flags
>>>> + */
>>>> +#define ACPI_GEBS_UNCORRECTABLE         1
>>>
>>> Why not listing all supported statuses ? Similar to error severity below ?
>>>
>>
>> We now only use the first bit for uncorrectable error. The correctable errors
>> are handled in host and would not be delivered to QEMU.
>>
>> I think it's unnecessary to list all the bit masks.
> 
> I'm not sure we are using all the error severity types either, but fair enough.
>>
>>>> +
>>>> +/*
>>>> + * Values for error_severity field
>>>> + */
>>>> +enum AcpiGenericErrorSeverity {
>>>> +    ACPI_CPER_SEV_RECOVERABLE,
>>>> +    ACPI_CPER_SEV_FATAL,
>>>> +    ACPI_CPER_SEV_CORRECTED,
>>>> +    ACPI_CPER_SEV_NONE,
>>>> +};
>>>> +
>>>>  /*
>>>>   * Now only support ARMv8 SEA notification type error source
>>>>   */
>>>> @@ -49,6 +77,16 @@
>>>>   */
>>>>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>>>>
>>>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>>>> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>>>> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
>>>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>>>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>>>> +
>>>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>>>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>>>> +    0xED, 0x7C, 0x83, 0xB1)
>>>> +
>>>>  /*
> 
> As suggested in different thread - could this be also made common with
> NVMe code ?

Sure, I will make it common in a separate patch.
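
For reference, the byte layout that the quoted UUID_BE() macro produces can
be checked with a small stand-alone sketch. This is only an illustration
under assumptions: ExampleUUID, EXAMPLE_UUID_BE and the example_* helpers
are hypothetical names, and the struct is a simplified stand-in rather than
QEMU's own UUID type. Only the field order and the GUID values are taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 16-byte UUID; only the raw bytes are modelled. */
typedef struct {
    uint8_t data[16];
} ExampleUUID;

/* Same field order as the quoted UUID_BE(): a, b and c are stored big-endian. */
#define EXAMPLE_UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)       \
    { { ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff,     \
        (a) & 0xff,                                                    \
        ((b) >> 8) & 0xff, (b) & 0xff,                                 \
        ((c) >> 8) & 0xff, (c) & 0xff,                                 \
        (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } }

/* Platform memory error section GUID, values copied from the quoted patch. */
static const ExampleUUID example_mem_section =
    EXAMPLE_UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE,
                    0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1);

/* Print the 16 bytes in storage order, separated by spaces. */
static void example_print_uuid(const ExampleUUID *u)
{
    assert(u != NULL);
    for (int i = 0; i < 16; i++) {
        printf("%02x%s", u->data[i], i == 15 ? "\n" : " ");
    }
}
```

Calling example_print_uuid(&example_mem_section) from a main() prints
"a5 bc 11 14 6f 64 4e de b8 63 3e 83 ed 7c 83 b1", i.e. the first three
fields appear most-significant byte first.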

>>>> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>>>>      return ret;
>>>>  }
>>>>
>>>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>>> +{
>>>> +    ram_addr_t ram_addr;
>>>> +    hwaddr paddr;
>>>> +
>>>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>>>> +
>>>> +    if (acpi_enabled && addr &&
>>>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>>>> +        ram_addr = qemu_ram_addr_from_host(addr);
>>>> +        if (ram_addr != RAM_ADDR_INVALID &&
>>>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>>>> +            kvm_hwpoison_page_add(ram_addr);
>>>> +            /*
>>>> +             * Asynchronous signal will be masked by main thread, so
>>>> +             * only handle synchronous signal.
>>>> +             */
>>>
>>> I'm not entirely sure that the comment above is correct (it has been
>>> pointed out before). I would expect the AO signal to be handled here as
>>> well. Not having proper support to do that just yet is another story but
>>> the comment might be bit misleading.
>>>
>>
>> We also expect the AO signal can be handled here. Maybe we could add the comment like:
>>
>> "Asynchronous signal is masked by main thread now. Once it can be asserted, we could
>> handle it." :)
>>
> Still not entirely there - if I'm not mistaken. Both BUS_MCEERR_AR and
> BUS_MCEERR_AO can end up here.
> I'm not entirely sure what you mean by "masked by main thread"? Both will be
> handled by sigbus_handler and as such both will end up here, either
> directly through kvm_on_sigbus or through kvm_cpu_exec with a pending
> sigbus. Or am I misguided?
> 

In fact, BUS_MCEERR_AO cannot get here, because the QEMU main thread masks SIGBUS [1]
and the vcpu threads can only handle BUS_MCEERR_AR.

         Qemu Main Thread   VCPU Threads

Kernel:  Mask SIGBUS        AO SIGBUS would be sent to the Qemu main thread by the kernel (kill_proc())

KVM:     Mask SIGBUS        Only AR SIGBUS is sent to VCPU threads by KVM (kvm_send_hwpoison_signal())


However, maybe we shouldn't rely on the behavior of the kernel or KVM, and should just keep
the logic for handling the AO signal in kvm_arch_on_sigbus_vcpu(), as the x86 version
does.
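
To make the AR/AO distinction concrete, here is a minimal sketch of the
decision the quoted handler fragment appears to make. It is an assumption
based only on the lines quoted above: ExampleSigbusPlan and
example_plan_sigbus() are hypothetical names, not QEMU code, and the
numeric fallbacks are the Linux si_code values for C libraries that do not
expose them.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Linux si_code values for SIGBUS; fallbacks for libcs that hide them. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4   /* action required: error hit by the running context */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5   /* action optional: error detected asynchronously */
#endif

/* Hypothetical result type describing what the handler would do. */
typedef struct {
    bool poison_page;   /* remember the page so a reset handler can unpoison it */
    bool notify_sea;    /* record a CPER and notify the guest with an SEA */
} ExampleSigbusPlan;

static ExampleSigbusPlan example_plan_sigbus(int code)
{
    /* In the quoted fragment the page is added to the poison list first. */
    ExampleSigbusPlan plan = { .poison_page = true, .notify_sea = false };

    /* Same precondition as the assert() in the quoted handler. */
    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);

    /* Only the synchronous (AR) case is reported to the guest. */
    if (code == BUS_MCEERR_AR) {
        plan.notify_sea = true;
    }
    return plan;
}
```

Keeping an explicit AO branch, as suggested above, would mean the function
stays correct even if a future kernel or KVM starts delivering AO signals
to vcpu threads.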


[1] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03575.html

-- 

Thanks,
Xiang


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-27 14:17           ` Beata Michalska
@ 2019-12-03  3:35             ` Xiang Zheng
  -1 siblings, 0 replies; 82+ messages in thread
From: Xiang Zheng @ 2019-12-03  3:35 UTC (permalink / raw)
  To: Beata Michalska, Igor Mammedov
  Cc: pbonzini, mst, shannon.zhaosl, Peter Maydell, Laszlo Ersek,
	james.morse, gengdongjiu, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang


On 2019/11/27 22:17, Beata Michalska wrote:
> On Wed, 27 Nov 2019 at 13:03, Igor Mammedov <imammedo@redhat.com> wrote:
>>
>> On Wed, 27 Nov 2019 20:47:15 +0800
>> Xiang Zheng <zhengxiang9@huawei.com> wrote:
>>
>>> Hi Beata,
>>>
>>> Thanks for your review!
>>>
>>> On 2019/11/22 23:47, Beata Michalska wrote:
>>>> Hi,
>>>>
>>>> On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>>>>>
>>>>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>>>>
>>>>> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
>>>>> translates the host VA delivered by host to guest PA, then fills this PA
>>>>> to guest APEI GHES memory, then notifies guest according to the SIGBUS
>>>>> type.
>>>>>
>>>>> When guest accesses the poisoned memory, it will generate a Synchronous
>>>>> External Abort(SEA). Then host kernel gets an APEI notification and calls
>>>>> memory_failure() to unmap the affected page in stage 2, finally
>>>>> returns to guest.
>>>>>
>>>>> Guest continues to access the PG_hwpoison page, it will trap to KVM as
>>>>> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
>>>>> Qemu, Qemu records this error address into guest APEI GHES memory and
>>>>> notifies guest using Synchronous-External-Abort(SEA).
>>>>>
>>>>> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
>>>>> in which we can setup the type of exception and the syndrome information.
>>>>> When switching to guest, the target vcpu will jump to the synchronous
>>>>> external abort vector table entry.
>>>>>
>>>>> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
>>>>> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
>>>>> not valid and hold an UNKNOWN value. These values will be set to KVM
>>>>> register structures through KVM_SET_ONE_REG IOCTL.
>>>>>
>>>>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>>>>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> ---
>> [...]
>>>>> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
>>>>> index cb62ec9c7b..8e3c5b879e 100644
>>>>> --- a/include/hw/acpi/acpi_ghes.h
>>>>> +++ b/include/hw/acpi/acpi_ghes.h
>>>>> @@ -24,6 +24,9 @@
>>>>>
>>>>>  #include "hw/acpi/bios-linker-loader.h"
>>>>>
>>>>> +#define ACPI_GHES_CPER_OK                   1
>>>>> +#define ACPI_GHES_CPER_FAIL                 0
>>>>> +
>>>>
>>>> Is there really a need to introduce those ?
>>>>
>>>
>>> Don't you think it's more clear than using "1" or "0"? :)
>>
>> or maybe just reuse default libc return convention: 0 - ok, -1 - fail
>> and drop custom macros
>>
> 
> Totally agree.
> 

OK, let's reuse default libc return convention.
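
A minimal sketch of what that convention looks like at a call site,
assuming a hypothetical helper (example_record_error() and example_usage()
are illustrative names, not the functions in the patch): 0 means success,
-1 means failure, and callers test for non-zero instead of comparing
against custom OK/FAIL macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical recorder: stores a physical address into a slot.
 * Returns 0 on success and -1 on failure (libc-style convention).
 */
static int example_record_error(uint64_t *slot, uint64_t paddr)
{
    if (slot == NULL) {
        return -1;
    }
    *slot = paddr;
    return 0;
}

static void example_usage(void)
{
    uint64_t slot = 0;

    /* Any non-zero return is treated as failure. */
    if (example_record_error(&slot, 0x40001000) != 0) {
        return;
    }
    assert(slot == 0x40001000);
}
```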

-- 

Thanks,
Xiang


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-22 15:47     ` Beata Michalska
@ 2019-12-07  9:33       ` gengdongjiu
  -1 siblings, 0 replies; 82+ messages in thread
From: gengdongjiu @ 2019-12-07  9:33 UTC (permalink / raw)
  To: Beata Michalska, Xiang Zheng
  Cc: pbonzini, mst, imammedo, shannon.zhaosl, Peter Maydell,
	Laszlo Ersek, james.morse, mtosatti, rth, ehabkost,
	jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm, linuxarm,
	wanghaibin.wang



On 2019/11/22 23:47, Beata Michalska wrote:
> Hi,
> 
> On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>>
>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>
>> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
>> translates the host VA delivered by host to guest PA, then fills this PA
>> to guest APEI GHES memory, then notifies guest according to the SIGBUS
>> type.
>>
>> When guest accesses the poisoned memory, it will generate a Synchronous
>> External Abort(SEA). Then host kernel gets an APEI notification and calls
>> memory_failure() to unmap the affected page in stage 2, finally
>> returns to guest.
>>
>> Guest continues to access the PG_hwpoison page, it will trap to KVM as
>> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
>> Qemu, Qemu records this error address into guest APEI GHES memory and
>> notifies guest using Synchronous-External-Abort(SEA).
>>
>> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
>> in which we can setup the type of exception and the syndrome information.
>> When switching to guest, the target vcpu will jump to the synchronous
>> external abort vector table entry.
>>
>> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
>> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
>> not valid and hold an UNKNOWN value. These values will be set to KVM
>> register structures through KVM_SET_ONE_REG IOCTL.
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>>  include/hw/acpi/acpi_ghes.h |   4 +
>>  include/sysemu/kvm.h        |   3 +-
>>  target/arm/cpu.h            |   4 +
>>  target/arm/helper.c         |   2 +-
>>  target/arm/internals.h      |   5 +-
>>  target/arm/kvm64.c          |  64 ++++++++
>>  target/arm/tlb_helper.c     |   2 +-
>>  target/i386/cpu.h           |   2 +
>>  9 files changed, 377 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
>> index 42c00ff3d3..f5b54990c0 100644
>> --- a/hw/acpi/acpi_ghes.c
>> +++ b/hw/acpi/acpi_ghes.c
>> @@ -39,6 +39,34 @@
>>  /* The max size in bytes for one error block */
>>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>>
>> +/*
>> + * The total size of Generic Error Data Entry
>> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-343 Generic Error Data Entry
>> + */
>> +#define ACPI_GHES_DATA_LENGTH               72
>> +
>> +/*
>> + * The memory section CPER size,
>> + * UEFI 2.6: N.2.5 Memory Error Section
>> + */
>> +#define ACPI_GHES_MEM_CPER_LENGTH           80
>> +
>> +/*
>> + * Masks for block_status flags
>> + */
>> +#define ACPI_GEBS_UNCORRECTABLE         1
> 
> Why not listing all supported statuses ? Similar to error severity below ?
> 
>> +
>> +/*
>> + * Values for error_severity field
>> + */
>> +enum AcpiGenericErrorSeverity {
>> +    ACPI_CPER_SEV_RECOVERABLE,
>> +    ACPI_CPER_SEV_FATAL,
>> +    ACPI_CPER_SEV_CORRECTED,
>> +    ACPI_CPER_SEV_NONE,
>> +};
>> +
>>  /*
>>   * Now only support ARMv8 SEA notification type error source
>>   */
>> @@ -49,6 +77,16 @@
>>   */
>>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>>
>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>> +
>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>> +    0xED, 0x7C, 0x83, 0xB1)
>> +
>>  /*
>>   * | +--------------------------+ 0
>>   * | |        Header            |
>> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>>      uint64_t ghes_addr_le;
>>  } AcpiGhesState;
>>
>> +/*
>> + * Total size for Generic Error Status Block
>> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-380 Generic Error Status Block
>> + */
>> +#define ACPI_GHES_GESB_SIZE                 20
> 
> Minor: This is not entirely correct: GEDE is part of GESB so the total length
> would be ACPI_GHES_GESB_SIZE + n* sizeof(GEDE)
Yes, the comment needs to be corrected.
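
To make the sizes concrete, here is a rough sketch of the arithmetic, using the
constants quoted in this patch (20-byte status block header, 72-byte data entry
header, 80-byte memory CPER, 0x1000-byte error block). The macro and function names
are made up for the example:

```c
#include <stdint.h>

/* Sizes quoted in the patch under review; the names here are illustrative. */
#define GESB_HEADER_SIZE   20u      /* ACPI_GHES_GESB_SIZE */
#define GEDE_HEADER_SIZE   72u      /* ACPI_GHES_DATA_LENGTH */
#define MEM_CPER_SIZE      80u      /* ACPI_GHES_MEM_CPER_LENGTH */
#define ERROR_BLOCK_MAX    0x1000u  /* ACPI_GHES_MAX_RAW_DATA_LENGTH */

/* Total bytes used by a status block that holds n memory-error entries. */
uint32_t gesb_total_size(uint32_t n_entries)
{
    return GESB_HEADER_SIZE + n_entries * (GEDE_HEADER_SIZE + MEM_CPER_SIZE);
}

/* Largest number of memory-error entries that fits in one error block. */
uint32_t gesb_max_entries(void)
{
    return (ERROR_BLOCK_MAX - GESB_HEADER_SIZE) /
           (GEDE_HEADER_SIZE + MEM_CPER_SIZE);
}
```

With these numbers one entry takes 152 bytes, so a block with one entry is 172
bytes long and at most 26 entries fit before the boundary check fails.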

> 
>> +/* The offset of Data Length in Generic Error Status Block */
>> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
>> +
> 
> If those were nicely represented as structures you get the offsets easily
> without having number of defines. That could simplify the code and make it
> more readable - see comments below
> 
>> +/*
>> + * Record the value of data length for each error status block to avoid getting
>> + * this value from guest.
>> + */
>> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
>> +
>> +/*
>> + * Generic Error Data Entry
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
>> +                uint32_t error_severity, uint16_t revision,
>> +                uint8_t validation_bits, uint8_t flags,
>> +                uint32_t error_data_length, QemuUUID fru_id,
>> +                uint8_t *fru_text, uint64_t time_stamp)
> 
> Why not just defining a struct that represents the GED entry?

This was done to address Igor's comments. There are two reasons:
1. It avoids defining many structures for APEI/GHES/CPER, which is why acpi_ghes.h contains very few structure definitions.
2. Composing the table with build_append_int_noprefix() avoids having to handle endianness explicitly.
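
The second point can be sketched as follows. This is a standalone approximation of
the idea, not QEMU's actual helper: append_le() is a hypothetical name, and it writes
into a plain byte buffer instead of a GArray. Because the bytes are produced by
shifting, the output is little-endian on any host:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store the low `size` bytes of `value` at `dst`,
 * least significant byte first, and return the number of bytes written.
 * For size > 8 the remaining bytes are written as zero.
 */
size_t append_le(uint8_t *dst, uint64_t value, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        dst[i] = (uint8_t)(value & 0xff);
        value >>= 8;
    }
    return size;
}
```

With a helper of this shape the caller passes values in host byte order; converting
them with cpu_to_le32() first would, as far as I can tell, swap the bytes a second
time on a big-endian host.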

> 
>> +{
>> +    QemuUUID uuid_le;
>> +
>> +    /* Section Type */
>> +    uuid_le = qemu_uuid_bswap(section_type);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +    /* Revision */
>> +    build_append_int_noprefix(table, revision, 2);
> 
> Minor: According to the spec it seems that the revision number is
> a fixed value so you could drop that from the parameters....
> or ... use a struct to represent the data
> 
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table, validation_bits, 1);
>> +    /* Flags */
>> +    build_append_int_noprefix(table, flags, 1);
>> +    /* Error Data Length */
>> +    build_append_int_noprefix(table, error_data_length, 4);
>> +
>> +    /* FRU Id */
>> +    uuid_le = qemu_uuid_bswap(fru_id);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* FRU Text */
>> +    g_array_append_vals(table, fru_text, 20);
>> +    /* Timestamp */
>> +    build_append_int_noprefix(table, time_stamp, 8);
>> +}
>> +
>> +/*
>> + * Generic Error Status Block
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
>> +                uint32_t raw_data_offset, uint32_t raw_data_length,
>> +                uint32_t data_length, uint32_t error_severity)
> 
> Same as the above
> 
>> +{
>> +    /* Block Status */
>> +    build_append_int_noprefix(table, block_status, 4);
>> +    /* Raw Data Offset */
>> +    build_append_int_noprefix(table, raw_data_offset, 4);
>> +    /* Raw Data Length */
>> +    build_append_int_noprefix(table, raw_data_length, 4);
>> +    /* Data Length */
>> +    build_append_int_noprefix(table, data_length, 4);
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +}
>> +
>> +/* UEFI 2.6: N.2.5 Memory Error Section */
>> +static void acpi_ghes_build_append_mem_cper(GArray *table,
>> +                                            uint64_t error_physical_addr)
>> +{
>> +    /*
>> +     * Memory Error Record
>> +     */
>> +
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table,
>> +                              (1UL << 14) | /* Type Valid */
>> +                              (1UL << 1) /* Physical Address Valid */,
>> +                              8);
>> +    /* Error Status */
>> +    build_append_int_noprefix(table, 0, 8);
> 
> Just wondering whether it would be worth to specify the Error Type
> through the Error Status ?
> 
>> +    /* Physical Address */
>> +    build_append_int_noprefix(table, error_physical_addr, 8);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 48);
>> +    /* Memory Error Type */
>> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 7);
>> +}
>> +
>> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
>> +                                      uint64_t error_physical_addr,
>> +                                      uint32_t data_length)
>> +{
>> +    GArray *block;
>> +    uint64_t current_block_length;
>> +    /* Memory Error Section Type */
>> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
> 
> As already mentioned - mixing LE /w BE
> 
>> +    QemuUUID fru_id = {};
>> +    uint8_t fru_text[20] = {};
>> +
>> +    /*
>> +     * Generic Error Status Block
>> +     * | +---------------------+
>> +     * | |     block_status    |
>> +     * | +---------------------+
>> +     * | |    raw_data_offset  |
>> +     * | +---------------------+
>> +     * | |    raw_data_length  |
>> +     * | +---------------------+
>> +     * | |     data_length     |
>> +     * | +---------------------+
>> +     * | |   error_severity    |
>> +     * | +---------------------+
>> +     */
>> +    block = g_array_new(false, true /* clear */, 1);
>> +
>> +    /* The current whole length of the generic error status block */
>> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
>> +
>> +    /* This is the length if adding a new generic error data entry*/
>> +    data_length += ACPI_GHES_DATA_LENGTH;
>> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
>> +
>> +    /*
>> +     * Check whether it will run out of the preallocated memory if adding a new
>> +     * generic error data entry
>> +     */
>> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
>> +        error_report("Record CPER out of boundary!!!");
> 
> Minor: The error message could be made more accurate, like:
>     "Not enough memory to record new CPER"
> 
>> +        return ACPI_GHES_CPER_FAIL;
>> +    }
>> +
>> +    /* Build the new generic error status block header */
>> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
>> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
>> +
>> +    /* Write back above generic error status block header to guest memory */
>> +    cpu_physical_memory_write(error_block_address, block->data,
>> +                              block->len);
>> +
>> +    /* Add a new generic error data entry */
>> +
>> +    data_length = block->len;
>> +    /* Build this new generic error data entry header */
>> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
>> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
>> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
>> +
>> +    /* Build the memory section CPER for above new generic error data entry */
>> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
>> +
>> +    /* Write back above this new generic error data entry to guest memory */
>> +    cpu_physical_memory_write(error_block_address + current_block_length,
>> +        block->data + data_length, block->len - data_length);
>> +
> 
> As already mentioned, and unless I have missed something (which is highly
> possible), this will append new records while the GESB is kept 'in-place',
> so the used space is only growing.
> 
>> +    g_array_free(block, true);
>> +
>> +    return ACPI_GHES_CPER_OK;
>> +}
>> +
>>  /*
>>   * Hardware Error Notification
>>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
>> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>>  }
>> +
>> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
>> +{
>> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
>> +    int loop = 0;
>> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
>> +    bool ret = ACPI_GHES_CPER_FAIL;
>> +    uint8_t source_id;
>> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
>> +
> 
> I'm not entirely sure why this is needed - see below
> 
>> +    /*
>> +     * | +---------------------+ ges.ghes_addr_le
>> +     * | |error_block_address0 |
>> +     * | +---------------------+ --+--
>> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | |error_block_addressN |
>> +     * | +---------------------+
>> +     * | | read_ack_register0  |
>> +     * | +---------------------+ --+--
>> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | | read_ack_registerN  |
>> +     * | +---------------------+ --+--
>> +     * | |      CPER           |   |
>> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
>> +     * | |      CPER           |   |
>> +     * | +---------------------+ --+--
>> +     * | |    ..........       |
>> +     * | +---------------------+
>> +     * | |      CPER           |
>> +     * | |      ....           |
>> +     * | |      CPER           |
>> +     * | +---------------------+
>> +     */
>> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
>> +        /* Find and check the source id for this new CPER */
>> +        source_id = error_source_id[notify];
> 
> Why not using switch case for supported source types ?
> For the time being only one is being supported. And you only use that to
> verify that support - seems a bit unnecessary.

We may need to support many source types later, so Igor's suggestion, shown below, is better.

static const uint8_t ghes_notify2source_id_map[] = {
    ACPI_HEST_SRC_ID_SEA,
    ACPI_HEST_SRC_ID_RESERVED
};
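
One possible shape for such a lookup, as a self-contained sketch. All names and
values below are assumptions for illustration: SEA is taken to be notification
type 8 and the first reserved type to be 12, matching the 12-entry array in the
patch. The table stores "source id + 1" so that entries left at zero by the
initializer mean "unsupported":

```c
#include <stdint.h>

#define NOTIFY_SEA        8    /* assumed: ARMv8 SEA notification type */
#define NOTIFY_RESERVED   12   /* assumed: first reserved notification type */
#define SRC_ID_SEA        0    /* hypothetical index of the SEA error source */

/* Entries hold (source id + 1); 0 means no error source for that type. */
static const uint8_t notify_to_source[NOTIFY_RESERVED] = {
    [NOTIFY_SEA] = SRC_ID_SEA + 1,
};

/* Returns the source id for a notification type, or -1 if unsupported. */
int notify_to_source_id(uint32_t notify)
{
    if (notify >= NOTIFY_RESERVED || notify_to_source[notify] == 0) {
        return -1;
    }
    return notify_to_source[notify] - 1;
}
```

Adding another source type then only needs one more designated initializer, with
no change to the lookup code.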


> 
>> +        if (source_id != 0xff) {
>> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
>> +        } else {
>> +            goto out;
>> +        }
>> +
[...]
>>
>> +/* Callers must hold the iothread mutex lock */
>> +static void kvm_inject_arm_sea(CPUState *c)
> 
> We could enclose this function along with the kvm_arch_on_sigbus_vcpu
> within ifdef switch for KVM_HAVE_MCE_INJECTION
> 
>> +{
>> +    ARMCPU *cpu = ARM_CPU(c);
>> +    CPUARMState *env = &cpu->env;
>> +    CPUClass *cc = CPU_GET_CLASS(c);
>> +    uint32_t esr;
>> +    bool same_el;
>> +
>> +    c->exception_index = EXCP_DATA_ABORT;
>> +    env->exception.target_el = 1;
>> +
>> +    /*
>> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
>> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
>> +     */
>> +    same_el = arm_current_el(env) == env->exception.target_el;
>> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> 
> IINM this is the only use case when FnV is considered to be valid
> so I'm not convinced it is worth to modify the syn_data_abort_no_iss
> just for this.

Here we set FnV to "not valid"; we do not mark the FAR as valid.
The guest does not need the FAR because it uses the physical address recorded in the APEI table.
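
For reference, the syndrome value this produces can be sketched standalone. The
macro and function names are invented for the example; the bit positions follow the
ESR_ELx data abort layout as I understand it (EC in bits [31:26], IL in bit 25,
FnV in bit 10, DFSC in bits [5:0]):

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SHIFT        26
#define ESR_EC_DABT_LOWER   0x24u       /* data abort from a lower EL */
#define ESR_EC_DABT_SAME    0x25u       /* data abort without a change in EL */
#define ESR_IL              (1u << 25)  /* 32-bit instruction length */
#define ESR_ISS_FNV         (1u << 10)  /* FAR not valid */
#define ESR_DFSC_SEA        0x10u       /* synchronous external abort */

/* Build the syndrome for an injected SEA whose FAR is marked not valid. */
uint32_t sea_syndrome(bool same_el)
{
    uint32_t ec = same_el ? ESR_EC_DABT_SAME : ESR_EC_DABT_LOWER;

    return (ec << ESR_EC_SHIFT) | ESR_IL | ESR_ISS_FNV | ESR_DFSC_SEA;
}
```

This gives 0x92000410 for an abort taken from EL0 to EL1 and 0x96000410 for one
taken from EL1 to EL1.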

> 
>> +
>> +    env->exception.syndrome = esr;
>> +
>> +    cc->do_interrupt(c);
>> +}
>> +
>>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>>
>> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>>      return ret;
>>  }
>>
>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> +{
>> +    ram_addr_t ram_addr;
>> +    hwaddr paddr;
>> +
>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>> +
>> +    if (acpi_enabled && addr &&
>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>> +        ram_addr = qemu_ram_addr_from_host(addr);
>> +        if (ram_addr != RAM_ADDR_INVALID &&
>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>> +            kvm_hwpoison_page_add(ram_addr);
>> +            /*
>> +             * Asynchronous signal will be masked by main thread, so
>> +             * only handle synchronous signal.
>> +             */
> 
> I'm not entirely sure that the comment above is correct (it has been
> pointed out before). I would expect the AO signal to be handled here as
> well. Not having proper support to do that just yet is another story but
> the comment might be a bit misleading.
> 
> 
>> +            if (code == BUS_MCEERR_AR) {
>> +                kvm_cpu_synchronize_state(c);
>> +                if (ACPI_GHES_CPER_FAIL !=
>> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
>> +                    kvm_inject_arm_sea(c);
>> +                } else {
>> +                    fprintf(stderr, "failed to record the error\n");
>> +                }
>> +            }
>> +            return;
>> +        }
>> +        fprintf(stderr, "Hardware memory error for memory used by "
>> +                "QEMU itself instead of guest system!\n");
>> +    }
>> +
>> +    if (code == BUS_MCEERR_AR) {
>> +        fprintf(stderr, "Hardware memory error!\n");
>> +        exit(1);
>> +    }
>> +}
>> +
>>  /* C6.6.29 BRK instruction */
>>  static const uint32_t brk_insn = 0xd4200000;
>>
>> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
>> index 5feb312941..499672ebbc 100644
>> --- a/target/arm/tlb_helper.c
>> +++ b/target/arm/tlb_helper.c
>> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
>>       * ISV field.
>>       */
>>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
>> -        syn = syn_data_abort_no_iss(same_el,
>> +        syn = syn_data_abort_no_iss(same_el, 0,
>>                                      ea, 0, s1ptw, is_write, fsc);
>>      } else {
>>          /*
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index 5352c9ff55..f75a210f96 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -29,6 +29,8 @@
>>  /* The x86 has a strong memory model with some store-after-load re-ordering */
>>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
>>
>> +#define KVM_HAVE_MCE_INJECTION 1
>> +
>>  /* Maximum instruction code size */
>>  #define TARGET_MAX_INSN_SIZE 16
>>
>> --
>> 2.19.1
>>
>>
>>
> .
> 



* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
@ 2019-12-07  9:33       ` gengdongjiu
  0 siblings, 0 replies; 82+ messages in thread
From: gengdongjiu @ 2019-12-07  9:33 UTC (permalink / raw)
  To: Beata Michalska, Xiang Zheng
  Cc: Peter Maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	qemu-devel, linuxarm, shannon.zhaosl, qemu-arm, james.morse,
	jonathan.cameron, imammedo, pbonzini, xuwei5, Laszlo Ersek, rth



On 2019/11/22 23:47, Beata Michalska wrote:
> Hi,
> 
> On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
>>
>> From: Dongjiu Geng <gengdongjiu@huawei.com>
>>
>> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
>> translates the host VA delivered by host to guest PA, then fills this PA
>> to guest APEI GHES memory, then notifies guest according to the SIGBUS
>> type.
>>
>> When guest accesses the poisoned memory, it will generate a Synchronous
>> External Abort(SEA). Then host kernel gets an APEI notification and calls
>> memory_failure() to unmapped the affected page in stage 2, finally
>> returns to guest.
>>
>> Guest continues to access the PG_hwpoison page, it will trap to KVM as
>> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
>> Qemu, Qemu records this error address into guest APEI GHES memory and
>> notifes guest using Synchronous-External-Abort(SEA).
>>
>> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
>> in which we can setup the type of exception and the syndrome information.
>> When switching to guest, the target vcpu will jump to the synchronous
>> external abort vector table entry.
>>
>> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
>> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
>> not valid and hold an UNKNOWN value. These values will be set to KVM
>> register structures through KVM_SET_ONE_REG IOCTL.
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
>>  include/hw/acpi/acpi_ghes.h |   4 +
>>  include/sysemu/kvm.h        |   3 +-
>>  target/arm/cpu.h            |   4 +
>>  target/arm/helper.c         |   2 +-
>>  target/arm/internals.h      |   5 +-
>>  target/arm/kvm64.c          |  64 ++++++++
>>  target/arm/tlb_helper.c     |   2 +-
>>  target/i386/cpu.h           |   2 +
>>  9 files changed, 377 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
>> index 42c00ff3d3..f5b54990c0 100644
>> --- a/hw/acpi/acpi_ghes.c
>> +++ b/hw/acpi/acpi_ghes.c
>> @@ -39,6 +39,34 @@
>>  /* The max size in bytes for one error block */
>>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
>>
>> +/*
>> + * The total size of Generic Error Data Entry
>> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-343 Generic Error Data Entry
>> + */
>> +#define ACPI_GHES_DATA_LENGTH               72
>> +
>> +/*
>> + * The memory section CPER size,
>> + * UEFI 2.6: N.2.5 Memory Error Section
>> + */
>> +#define ACPI_GHES_MEM_CPER_LENGTH           80
>> +
>> +/*
>> + * Masks for block_status flags
>> + */
>> +#define ACPI_GEBS_UNCORRECTABLE         1
> 
> Why not listing all supported statuses ? Similar to error severity below ?
> 
>> +
>> +/*
>> + * Values for error_severity field
>> + */
>> +enum AcpiGenericErrorSeverity {
>> +    ACPI_CPER_SEV_RECOVERABLE,
>> +    ACPI_CPER_SEV_FATAL,
>> +    ACPI_CPER_SEV_CORRECTED,
>> +    ACPI_CPER_SEV_NONE,
>> +};
>> +
>>  /*
>>   * Now only support ARMv8 SEA notification type error source
>>   */
>> @@ -49,6 +77,16 @@
>>   */
>>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>>
>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
>> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
>> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
>> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
>> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>> +
>> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
>> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>> +    0xED, 0x7C, 0x83, 0xB1)
>> +
>>  /*
>>   * | +--------------------------+ 0
>>   * | |        Header            |
>> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>>      uint64_t ghes_addr_le;
>>  } AcpiGhesState;
>>
>> +/*
>> + * Total size for Generic Error Status Block
>> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-380 Generic Error Status Block
>> + */
>> +#define ACPI_GHES_GESB_SIZE                 20
> 
> Minor: This is not entirely correct: GEDE is part of GESB so the total length
> would be ACPI_GHES_GESB_SIZE + n* sizeof(GEDE)
yes, the comments needs to correct.

> 
>> +/* The offset of Data Length in Generic Error Status Block */
>> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
>> +
> 
> If those were nicely represented as structures you get the offsets easily
> without having number of defines. That could simplify the code and make it
> more readable - see comments below
> 
>> +/*
>> + * Record the value of data length for each error status block to avoid getting
>> + * this value from guest.
>> + */
>> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
>> +
>> +/*
>> + * Generic Error Data Entry
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
>> +                uint32_t error_severity, uint16_t revision,
>> +                uint8_t validation_bits, uint8_t flags,
>> +                uint32_t error_data_length, QemuUUID fru_id,
>> +                uint8_t *fru_text, uint64_t time_stamp)
> 
> Why not just defining a struct that represents the GED entry?

This is due to address Igor's comments. there are two reasons:
1. avoid define many structures about APEI/GHES/CPER, so you can see it has very little structures definition in acpi_ghes.h
2. using build_append_int_noprefix() to compose the table can avoid considering endian

> 
>> +{
>> +    QemuUUID uuid_le;
>> +
>> +    /* Section Type */
>> +    uuid_le = qemu_uuid_bswap(section_type);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +    /* Revision */
>> +    build_append_int_noprefix(table, revision, 2);
> 
> Minor: According to the spec it seems that the revision number is
> a fixed value so you could drop that from the parameters....
> or ... use a struct to represent the data
> 
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table, validation_bits, 1);
>> +    /* Flags */
>> +    build_append_int_noprefix(table, flags, 1);
>> +    /* Error Data Length */
>> +    build_append_int_noprefix(table, error_data_length, 4);
>> +
>> +    /* FRU Id */
>> +    uuid_le = qemu_uuid_bswap(fru_id);
>> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
>> +
>> +    /* FRU Text */
>> +    g_array_append_vals(table, fru_text, 20);
>> +    /* Timestamp */
>> +    build_append_int_noprefix(table, time_stamp, 8);
>> +}
>> +
>> +/*
>> + * Generic Error Status Block
>> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
>> + */
>> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
>> +                uint32_t raw_data_offset, uint32_t raw_data_length,
>> +                uint32_t data_length, uint32_t error_severity)
> 
> Same as the above
> 
>> +{
>> +    /* Block Status */
>> +    build_append_int_noprefix(table, block_status, 4);
>> +    /* Raw Data Offset */
>> +    build_append_int_noprefix(table, raw_data_offset, 4);
>> +    /* Raw Data Length */
>> +    build_append_int_noprefix(table, raw_data_length, 4);
>> +    /* Data Length */
>> +    build_append_int_noprefix(table, data_length, 4);
>> +    /* Error Severity */
>> +    build_append_int_noprefix(table, error_severity, 4);
>> +}
>> +
>> +/* UEFI 2.6: N.2.5 Memory Error Section */
>> +static void acpi_ghes_build_append_mem_cper(GArray *table,
>> +                                            uint64_t error_physical_addr)
>> +{
>> +    /*
>> +     * Memory Error Record
>> +     */
>> +
>> +    /* Validation Bits */
>> +    build_append_int_noprefix(table,
>> +                              (1UL << 14) | /* Type Valid */
>> +                              (1UL << 1) /* Physical Address Valid */,
>> +                              8);
>> +    /* Error Status */
>> +    build_append_int_noprefix(table, 0, 8);
> 
> Just wondering whether it would be worth to specify the Error Type
> through the Error Status ?
> 
>> +    /* Physical Address */
>> +    build_append_int_noprefix(table, error_physical_addr, 8);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 48);
>> +    /* Memory Error Type */
>> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
>> +    /* Skip all the detailed information normally found in such a record */
>> +    build_append_int_noprefix(table, 0, 7);
>> +}
>> +
>> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
>> +                                      uint64_t error_physical_addr,
>> +                                      uint32_t data_length)
>> +{
>> +    GArray *block;
>> +    uint64_t current_block_length;
>> +    /* Memory Error Section Type */
>> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
> 
> As already mentioned - mixing LE /w BE
> 
>> +    QemuUUID fru_id = {};
>> +    uint8_t fru_text[20] = {};
>> +
>> +    /*
>> +     * Generic Error Status Block
>> +     * | +---------------------+
>> +     * | |     block_status    |
>> +     * | +---------------------+
>> +     * | |    raw_data_offset  |
>> +     * | +---------------------+
>> +     * | |    raw_data_length  |
>> +     * | +---------------------+
>> +     * | |     data_length     |
>> +     * | +---------------------+
>> +     * | |   error_severity    |
>> +     * | +---------------------+
>> +     */
>> +    block = g_array_new(false, true /* clear */, 1);
>> +
>> +    /* The current whole length of the generic error status block */
>> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
>> +
>> +    /* This is the length if adding a new generic error data entry*/
>> +    data_length += ACPI_GHES_DATA_LENGTH;
>> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
>> +
>> +    /*
>> +     * Check whether it will run out of the preallocated memory if adding a new
>> +     * generic error data entry
>> +     */
>> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
>> +        error_report("Record CPER out of boundary!!!");
> 
> Minor: The error message could be made more accurate, like:
>     "Not enough memory to record new CPER"
> 
>> +        return ACPI_GHES_CPER_FAIL;
>> +    }
>> +
>> +    /* Build the new generic error status block header */
>> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
>> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
>> +
>> +    /* Write back above generic error status block header to guest memory */
>> +    cpu_physical_memory_write(error_block_address, block->data,
>> +                              block->len);
>> +
>> +    /* Add a new generic error data entry */
>> +
>> +    data_length = block->len;
>> +    /* Build this new generic error data entry header */
>> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
>> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
>> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
>> +
>> +    /* Build the memory section CPER for above new generic error data entry */
>> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
>> +
>> +    /* Write back above this new generic error data entry to guest memory */
>> +    cpu_physical_memory_write(error_block_address + current_block_length,
>> +        block->data + data_length, block->len - data_length);
>> +
> 
> As already mentioned and unless I have missed smth (which is highly possible)
> this will append new records while the GESB is kept 'in-place'. So the
> used space is
> only growing.
> 
>> +    g_array_free(block, true);
>> +
>> +    return ACPI_GHES_CPER_OK;
>> +}
>> +
>>  /*
>>   * Hardware Error Notification
>>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
>> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
>>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
>>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
>>  }
>> +
>> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
>> +{
>> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
>> +    int loop = 0;
>> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
>> +    bool ret = ACPI_GHES_CPER_FAIL;
>> +    uint8_t source_id;
>> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
>> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
>> +
> 
> I'm not entirely sure why this is needed - see below
> 
>> +    /*
>> +     * | +---------------------+ ges.ghes_addr_le
>> +     * | |error_block_address0 |
>> +     * | +---------------------+ --+--
>> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | |error_block_addressN |
>> +     * | +---------------------+
>> +     * | | read_ack_register0  |
>> +     * | +---------------------+ --+--
>> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
>> +     * | +---------------------+ --+--
>> +     * | | read_ack_registerN  |
>> +     * | +---------------------+ --+--
>> +     * | |      CPER           |   |
>> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
>> +     * | |      CPER           |   |
>> +     * | +---------------------+ --+--
>> +     * | |    ..........       |
>> +     * | +---------------------+
>> +     * | |      CPER           |
>> +     * | |      ....           |
>> +     * | |      CPER           |
>> +     * | +---------------------+
>> +     */
>> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
>> +        /* Find and check the source id for this new CPER */
>> +        source_id = error_source_id[notify];
> 
> Why not use a switch statement for the supported source types?
> For the time being only one is supported, and the array is only used to
> verify that support, which seems a bit unnecessary.

We may need to support many source types later, so Igor's suggestion, shown below, is better.

static const uint8_t ghes_notify2source_id_map[] = {
    ACPI_HEST_SRC_ID_SEA,
    ACPI_HEST_SRC_ID_RESERVED
};
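
For comparison, a minimal sketch of the same lookup written as a switch. The enum names here are made up for illustration and are not the ones from the patch; the notification type values (8 for ARMv8 SEA, 12 and above reserved) are taken from the ACPI HEST notification type list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; values follow the ACPI HEST notification types. */
enum {
    NOTIFY_SEA = 8,        /* ARMv8 SEA */
    NOTIFY_RESERVED = 12,  /* first reserved notification type */
};

/* Hypothetical source ids: index of the error source in the GHES area. */
enum {
    SRC_ID_SEA = 0,
    SRC_ID_RESERVED,       /* number of supported sources */
};

/*
 * Map a notification type to a source id.
 * Returns false and leaves *source_id untouched if the type is unsupported.
 */
static bool notify_to_source_id(uint32_t notify, uint8_t *source_id)
{
    assert(source_id != NULL);

    switch (notify) {
    case NOTIFY_SEA:
        *source_id = SRC_ID_SEA;
        return true;
    default:
        return false;
    }
}
```

Supporting another source type then means adding one enum value and one case, with no 0xff sentinel to keep in sync.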


> 
>> +        if (source_id != 0xff) {
>> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
>> +        } else {
>> +            goto out;
>> +        }
>> +
[...]
>>
>> +/* Callers must hold the iothread mutex lock */
>> +static void kvm_inject_arm_sea(CPUState *c)
> 
> We could enclose this function along with the kvm_arch_on_sigbus_vcpu
> within ifdef switch for KVM_HAVE_MCE_INJECTION
> 
>> +{
>> +    ARMCPU *cpu = ARM_CPU(c);
>> +    CPUARMState *env = &cpu->env;
>> +    CPUClass *cc = CPU_GET_CLASS(c);
>> +    uint32_t esr;
>> +    bool same_el;
>> +
>> +    c->exception_index = EXCP_DATA_ABORT;
>> +    env->exception.target_el = 1;
>> +
>> +    /*
>> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
>> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
>> +     */
>> +    same_el = arm_current_el(env) == env->exception.target_el;
>> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> 
> IINM this is the only use case where FnV is considered to be valid,
> so I'm not convinced it is worth modifying syn_data_abort_no_iss
> just for this.

Here we set FnV to "not valid" rather than "valid", because the guest
will use the physical address recorded in the APEI table.
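
To make the encoding concrete, here is a minimal standalone sketch of the syndrome value under discussion. The bit positions follow the ESR_ELx layout for data aborts (EC in bits [31:26], IL at bit 25, FnV at bit 10, DFSC in bits [5:0]). The macro and function names are made up for this example and are not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SHIFT      26          /* ESR_ELx.EC occupies bits [31:26] */
#define ESR_EC_DABT_LOWER 0x24u       /* data abort from a lower EL */
#define ESR_EC_DABT_SAME  0x25u       /* data abort without a change of EL */
#define ESR_IL            (1u << 25)  /* 32-bit instruction length */
#define ESR_FNV           (1u << 10)  /* FnV: 1 means FAR is not valid */
#define ESR_DFSC_SEA      0x10u       /* sync external abort, not on a table walk */

/* Build the syndrome for an injected synchronous external data abort. */
static uint32_t sea_syndrome(bool same_el, bool far_not_valid)
{
    uint32_t ec = same_el ? ESR_EC_DABT_SAME : ESR_EC_DABT_LOWER;
    uint32_t esr = (ec << ESR_EC_SHIFT) | ESR_IL | ESR_DFSC_SEA;

    if (far_not_valid) {
        esr |= ESR_FNV;
    }
    /* The fault status code must survive the composition unchanged. */
    assert((esr & 0x3fu) == ESR_DFSC_SEA);
    return esr;
}
```

With FnV set, an abort taken from a lower EL encodes as 0x92000410 and one taken at the same EL as 0x96000410.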

> 
>> +
>> +    env->exception.syndrome = esr;
>> +
>> +    cc->do_interrupt(c);
>> +}
>> +
>>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
>>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
>>
>> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
>>      return ret;
>>  }
>>
>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> +{
>> +    ram_addr_t ram_addr;
>> +    hwaddr paddr;
>> +
>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>> +
>> +    if (acpi_enabled && addr &&
>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>> +        ram_addr = qemu_ram_addr_from_host(addr);
>> +        if (ram_addr != RAM_ADDR_INVALID &&
>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>> +            kvm_hwpoison_page_add(ram_addr);
>> +            /*
>> +             * Asynchronous signal will be masked by main thread, so
>> +             * only handle synchronous signal.
>> +             */
> 
> I'm not entirely sure that the comment above is correct (it has been
> pointed out before). I would expect the AO signal to be handled here as
> well. Not having proper support for that just yet is another story, but
> the comment might be a bit misleading.
> 
> 
>> +            if (code == BUS_MCEERR_AR) {
>> +                kvm_cpu_synchronize_state(c);
>> +                if (ACPI_GHES_CPER_FAIL !=
>> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
>> +                    kvm_inject_arm_sea(c);
>> +                } else {
>> +                    fprintf(stderr, "failed to record the error\n");
>> +                }
>> +            }
>> +            return;
>> +        }
>> +        fprintf(stderr, "Hardware memory error for memory used by "
>> +                "QEMU itself instead of guest system!\n");
>> +    }
>> +
>> +    if (code == BUS_MCEERR_AR) {
>> +        fprintf(stderr, "Hardware memory error!\n");
>> +        exit(1);
>> +    }
>> +}
>> +
>>  /* C6.6.29 BRK instruction */
>>  static const uint32_t brk_insn = 0xd4200000;
>>
>> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
>> index 5feb312941..499672ebbc 100644
>> --- a/target/arm/tlb_helper.c
>> +++ b/target/arm/tlb_helper.c
>> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
>>       * ISV field.
>>       */
>>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
>> -        syn = syn_data_abort_no_iss(same_el,
>> +        syn = syn_data_abort_no_iss(same_el, 0,
>>                                      ea, 0, s1ptw, is_write, fsc);
>>      } else {
>>          /*
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index 5352c9ff55..f75a210f96 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -29,6 +29,8 @@
>>  /* The x86 has a strong memory model with some store-after-load re-ordering */
>>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
>>
>> +#define KVM_HAVE_MCE_INJECTION 1
>> +
>>  /* Maximum instruction code size */
>>  #define TARGET_MAX_INSN_SIZE 16
>>
>> --
>> 2.19.1
>>
>>
>>
> .
> 



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 1/6] hw/arm/virt: Introduce a RAS machine option
  2019-12-02 18:22     ` Peter Maydell
@ 2019-12-07 12:10       ` gengdongjiu
  -1 siblings, 0 replies; 82+ messages in thread
From: gengdongjiu @ 2019-12-07 12:10 UTC (permalink / raw)
  To: Peter Maydell, Xiang Zheng
  Cc: Paolo Bonzini, Michael S. Tsirkin, Igor Mammedov, Shannon Zhao,
	Laszlo Ersek, James Morse, Marcelo Tosatti, Richard Henderson,
	Eduardo Habkost, Jonathan Cameron, xuwei (O),
	kvm-devel, QEMU Developers, qemu-arm, Linuxarm, wanghaibin.wang


> 
> I think we could make the user-facing description of
> the option a little clearer: something like
> "Set on/off to enable/disable reporting host memory errors
> to a KVM guest using ACPI and guest external abort exceptions"
> 
> ?
Peter, sorry for the late response.
Sure, we have already updated it and will send PATCH v22.

> 
> Otherwise
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> 
> thanks
> -- PMM
> .
> 


^ permalink raw reply	[flat|nested] 82+ messages in thread


* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-12-07  9:33       ` gengdongjiu
@ 2019-12-09 13:05         ` Beata Michalska
  -1 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-12-09 13:05 UTC (permalink / raw)
  To: gengdongjiu
  Cc: Xiang Zheng, pbonzini, mst, Igor Mammedov, shannon.zhaosl,
	Peter Maydell, Laszlo Ersek, james.morse, mtosatti, rth,
	ehabkost, jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm,
	linuxarm, wanghaibin.wang

On Sat, 7 Dec 2019 at 09:33, gengdongjiu <gengdongjiu@huawei.com> wrote:
>
>
>
> On 2019/11/22 23:47, Beata Michalska wrote:
> > Hi,
> >
> > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
> >>
> >> From: Dongjiu Geng <gengdongjiu@huawei.com>
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When the guest accesses the poisoned memory, it generates a Synchronous
> >> External Abort (SEA). The host kernel then gets an APEI notification,
> >> calls memory_failure() to unmap the affected page in stage 2, and
> >> finally returns to the guest.
> >>
> >> If the guest continues to access the PG_hwpoison page, the access traps
> >> to KVM as a stage 2 fault and a synchronous SIGBUS with code
> >> BUS_MCEERR_AR is delivered to QEMU. QEMU records the error address in
> >> the guest APEI GHES memory and notifies the guest using a Synchronous
> >> External Abort (SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> >> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> ---
> >>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
> >>  include/hw/acpi/acpi_ghes.h |   4 +
> >>  include/sysemu/kvm.h        |   3 +-
> >>  target/arm/cpu.h            |   4 +
> >>  target/arm/helper.c         |   2 +-
> >>  target/arm/internals.h      |   5 +-
> >>  target/arm/kvm64.c          |  64 ++++++++
> >>  target/arm/tlb_helper.c     |   2 +-
> >>  target/i386/cpu.h           |   2 +
> >>  9 files changed, 377 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> >> index 42c00ff3d3..f5b54990c0 100644
> >> --- a/hw/acpi/acpi_ghes.c
> >> +++ b/hw/acpi/acpi_ghes.c
> >> @@ -39,6 +39,34 @@
> >>  /* The max size in bytes for one error block */
> >>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> >>
> >> +/*
> >> + * The total size of Generic Error Data Entry
> >> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-343 Generic Error Data Entry
> >> + */
> >> +#define ACPI_GHES_DATA_LENGTH               72
> >> +
> >> +/*
> >> + * The memory section CPER size,
> >> + * UEFI 2.6: N.2.5 Memory Error Section
> >> + */
> >> +#define ACPI_GHES_MEM_CPER_LENGTH           80
> >> +
> >> +/*
> >> + * Masks for block_status flags
> >> + */
> >> +#define ACPI_GEBS_UNCORRECTABLE         1
> >
> > Why not list all supported statuses, similar to the error severity below?
> >
> >> +
> >> +/*
> >> + * Values for error_severity field
> >> + */
> >> +enum AcpiGenericErrorSeverity {
> >> +    ACPI_CPER_SEV_RECOVERABLE,
> >> +    ACPI_CPER_SEV_FATAL,
> >> +    ACPI_CPER_SEV_CORRECTED,
> >> +    ACPI_CPER_SEV_NONE,
> >> +};
> >> +
> >>  /*
> >>   * Now only support ARMv8 SEA notification type error source
> >>   */
> >> @@ -49,6 +77,16 @@
> >>   */
> >>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >>
> >> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> >> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> >> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> >> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> >> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> >> +
> >> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> >> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> >> +    0xED, 0x7C, 0x83, 0xB1)
> >> +
> >>  /*
> >>   * | +--------------------------+ 0
> >>   * | |        Header            |
> >> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >>      uint64_t ghes_addr_le;
> >>  } AcpiGhesState;
> >>
> >> +/*
> >> + * Total size for Generic Error Status Block
> >> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-380 Generic Error Status Block
> >> + */
> >> +#define ACPI_GHES_GESB_SIZE                 20
> >
> > Minor: This is not entirely correct: GEDE is part of GESB so the total length
> > would be ACPI_GHES_GESB_SIZE + n* sizeof(GEDE)
> Yes, the comment needs to be corrected.
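
To put numbers on the sizes involved, a minimal sketch of the arithmetic, assuming every entry is a Generic Error Data Entry followed by one memory error section. The macro names are shortened for the example; the values are the ones defined in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define GESB_HEADER_SIZE 20u      /* ACPI_GHES_GESB_SIZE in the patch */
#define GEDE_HEADER_SIZE 72u      /* ACPI_GHES_DATA_LENGTH */
#define MEM_CPER_SIZE    80u      /* ACPI_GHES_MEM_CPER_LENGTH */
#define BLOCK_MAX_SIZE   0x1000u  /* ACPI_GHES_MAX_RAW_DATA_LENGTH */

/* Total bytes used by a status block carrying n memory error entries. */
static uint32_t gesb_total_size(uint32_t n_entries)
{
    return GESB_HEADER_SIZE + n_entries * (GEDE_HEADER_SIZE + MEM_CPER_SIZE);
}

/* Largest number of entries that still fits in one error block. */
static uint32_t gesb_max_entries(void)
{
    uint32_t n = (BLOCK_MAX_SIZE - GESB_HEADER_SIZE) /
                 (GEDE_HEADER_SIZE + MEM_CPER_SIZE);

    assert(gesb_total_size(n) <= BLOCK_MAX_SIZE);
    return n;
}
```

Each entry costs 152 bytes, so if records are only ever appended, a 4 KiB block holds 26 of them before the boundary check fails.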
>
> >
> >> +/* The offset of Data Length in Generic Error Status Block */
> >> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
> >> +
> >
> > If those were nicely represented as structures you get the offsets easily
> > without having number of defines. That could simplify the code and make it
> > more readable - see comments below
> >
> >> +/*
> >> + * Record the value of data length for each error status block to avoid getting
> >> + * this value from guest.
> >> + */
> >> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> >> +
> >> +/*
> >> + * Generic Error Data Entry
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> >> +                uint32_t error_severity, uint16_t revision,
> >> +                uint8_t validation_bits, uint8_t flags,
> >> +                uint32_t error_data_length, QemuUUID fru_id,
> >> +                uint8_t *fru_text, uint64_t time_stamp)
> >
> > Why not just define a struct that represents the GED entry?
>
> This is to address Igor's comments. There are two reasons:
> 1. To avoid defining many structures for APEI/GHES/CPER; as you can see, acpi_ghes.h contains very few structure definitions.
> 2. Composing the table with build_append_int_noprefix() avoids having to consider endianness.
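
A minimal sketch of why a byte-wise append sidesteps the endianness question. This is a standalone stand-in written for illustration, not QEMU's build_append_int_noprefix() itself; the function name and the fixed-buffer interface are assumptions of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Append the low `size` bytes of `value` to `buf`, least significant
 * byte first, and return the new length. Shifting the value instead of
 * copying its memory gives the same little-endian output on any host.
 */
static size_t append_le(uint8_t *buf, size_t len, size_t cap,
                        uint64_t value, size_t size)
{
    assert(size <= sizeof(value) && len + size <= cap);

    for (size_t i = 0; i < size; i++) {
        buf[len++] = (uint8_t)(value & 0xff);
        value >>= 8;
    }
    return len;
}
```

With a helper of this shape the caller passes plain host-endian values. As far as I can tell, wrapping the arguments in cpu_to_le32() first would then swap the bytes twice on a big-endian host.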
>
> >
> >> +{
> >> +    QemuUUID uuid_le;
> >> +
> >> +    /* Section Type */
> >> +    uuid_le = qemu_uuid_bswap(section_type);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +    /* Revision */
> >> +    build_append_int_noprefix(table, revision, 2);
> >
> > Minor: According to the spec it seems that the revision number is
> > a fixed value so you could drop that from the parameters....
> > or ... use a struct to represent the data
> >
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table, validation_bits, 1);
> >> +    /* Flags */
> >> +    build_append_int_noprefix(table, flags, 1);
> >> +    /* Error Data Length */
> >> +    build_append_int_noprefix(table, error_data_length, 4);
> >> +
> >> +    /* FRU Id */
> >> +    uuid_le = qemu_uuid_bswap(fru_id);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* FRU Text */
> >> +    g_array_append_vals(table, fru_text, 20);
> >> +    /* Timestamp */
> >> +    build_append_int_noprefix(table, time_stamp, 8);
> >> +}
> >> +
> >> +/*
> >> + * Generic Error Status Block
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> >> +                uint32_t raw_data_offset, uint32_t raw_data_length,
> >> +                uint32_t data_length, uint32_t error_severity)
> >
> > Same as the above
> >
> >> +{
> >> +    /* Block Status */
> >> +    build_append_int_noprefix(table, block_status, 4);
> >> +    /* Raw Data Offset */
> >> +    build_append_int_noprefix(table, raw_data_offset, 4);
> >> +    /* Raw Data Length */
> >> +    build_append_int_noprefix(table, raw_data_length, 4);
> >> +    /* Data Length */
> >> +    build_append_int_noprefix(table, data_length, 4);
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +}
> >> +
> >> +/* UEFI 2.6: N.2.5 Memory Error Section */
> >> +static void acpi_ghes_build_append_mem_cper(GArray *table,
> >> +                                            uint64_t error_physical_addr)
> >> +{
> >> +    /*
> >> +     * Memory Error Record
> >> +     */
> >> +
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table,
> >> +                              (1UL << 14) | /* Type Valid */
> >> +                              (1UL << 1) /* Physical Address Valid */,
> >> +                              8);
> >> +    /* Error Status */
> >> +    build_append_int_noprefix(table, 0, 8);
> >
> > Just wondering whether it would be worth to specify the Error Type
> > through the Error Status ?
> >
> >> +    /* Physical Address */
> >> +    build_append_int_noprefix(table, error_physical_addr, 8);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 48);
> >> +    /* Memory Error Type */
> >> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 7);
> >> +}
> >> +
> >> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> >> +                                      uint64_t error_physical_addr,
> >> +                                      uint32_t data_length)
> >> +{
> >> +    GArray *block;
> >> +    uint64_t current_block_length;
> >> +    /* Memory Error Section Type */
> >> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
> >
> > As already mentioned - mixing LE /w BE
> >
> >> +    QemuUUID fru_id = {};
> >> +    uint8_t fru_text[20] = {};
> >> +
> >> +    /*
> >> +     * Generic Error Status Block
> >> +     * | +---------------------+
> >> +     * | |     block_status    |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_offset  |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_length  |
> >> +     * | +---------------------+
> >> +     * | |     data_length     |
> >> +     * | +---------------------+
> >> +     * | |   error_severity    |
> >> +     * | +---------------------+
> >> +     */
> >> +    block = g_array_new(false, true /* clear */, 1);
> >> +
> >> +    /* The current whole length of the generic error status block */
> >> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> >> +
> >> +    /* This is the length if adding a new generic error data entry*/
> >> +    data_length += ACPI_GHES_DATA_LENGTH;
> >> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> >> +
> >> +    /*
> >> +     * Check whether it will run out of the preallocated memory if adding a new
> >> +     * generic error data entry
> >> +     */
> >> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> >> +        error_report("Record CPER out of boundary!!!");
> >
> > Minor: The error message could be made more accurate, like:
> >     "Not enough memory to record new CPER"
> >
> >> +        return ACPI_GHES_CPER_FAIL;
> >> +    }
> >> +
> >> +    /* Build the new generic error status block header */
> >> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> >> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
> >> +
> >> +    /* Write back above generic error status block header to guest memory */
> >> +    cpu_physical_memory_write(error_block_address, block->data,
> >> +                              block->len);
> >> +
> >> +    /* Add a new generic error data entry */
> >> +
> >> +    data_length = block->len;
> >> +    /* Build this new generic error data entry header */
> >> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> >> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> >> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
> >> +
> >> +    /* Build the memory section CPER for above new generic error data entry */
> >> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> >> +
> >> +    /* Write back above this new generic error data entry to guest memory */
> >> +    cpu_physical_memory_write(error_block_address + current_block_length,
> >> +        block->data + data_length, block->len - data_length);
> >> +
> >
> > As already mentioned, and unless I have missed something (which is highly
> > possible), this will append new records while the GESB is kept 'in place',
> > so the used space only grows.
> >
> >> +    g_array_free(block, true);
> >> +
> >> +    return ACPI_GHES_CPER_OK;
> >> +}
> >> +
> >>  /*
> >>   * Hardware Error Notification
> >>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> >> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> >>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> >>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> >>  }
> >> +
> >> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> >> +{
> >> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> >> +    int loop = 0;
> >> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
> >> +    bool ret = ACPI_GHES_CPER_FAIL;
> >> +    uint8_t source_id;
> >> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> >> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
> >> +
> >
> > I'm not entirely sure why this is needed - see below
> >
> >> +    /*
> >> +     * | +---------------------+ ges.ghes_addr_le
> >> +     * | |error_block_address0 |
> >> +     * | +---------------------+ --+--
> >> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | |error_block_addressN |
> >> +     * | +---------------------+
> >> +     * | | read_ack_register0  |
> >> +     * | +---------------------+ --+--
> >> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | | read_ack_registerN  |
> >> +     * | +---------------------+ --+--
> >> +     * | |      CPER           |   |
> >> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> >> +     * | |      CPER           |   |
> >> +     * | +---------------------+ --+--
> >> +     * | |    ..........       |
> >> +     * | +---------------------+
> >> +     * | |      CPER           |
> >> +     * | |      ....           |
> >> +     * | |      CPER           |
> >> +     * | +---------------------+
> >> +     */
> >> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> >> +        /* Find and check the source id for this new CPER */
> >> +        source_id = error_source_id[notify];
> >
> > Why not use a switch statement for the supported source types?
> > For the time being only one is supported, and the array is only used to
> > verify that support, which seems a bit unnecessary.
>
> We may need to support many source types later, so Igor's suggestion, shown below, is better.
>
> static const uint8_t ghes_notify2source_id_map[] = {
>     ACPI_HEST_SRC_ID_SEA,
>     ACPI_HEST_SRC_ID_RESERVED
> };
>
>
> >
> >> +        if (source_id != 0xff) {
> >> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> >> +        } else {
> >> +            goto out;
> >> +        }
> >> +
> [...]
> >>
> >> +/* Callers must hold the iothread mutex lock */
> >> +static void kvm_inject_arm_sea(CPUState *c)
> >
> > We could enclose this function along with the kvm_arch_on_sigbus_vcpu
> > within ifdef switch for KVM_HAVE_MCE_INJECTION
> >
> >> +{
> >> +    ARMCPU *cpu = ARM_CPU(c);
> >> +    CPUARMState *env = &cpu->env;
> >> +    CPUClass *cc = CPU_GET_CLASS(c);
> >> +    uint32_t esr;
> >> +    bool same_el;
> >> +
> >> +    c->exception_index = EXCP_DATA_ABORT;
> >> +    env->exception.target_el = 1;
> >> +
> >> +    /*
> >> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> >> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> >> +     */
> >> +    same_el = arm_current_el(env) == env->exception.target_el;
> >> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> >
> > IINM this is the only use case where FnV is considered to be valid,
> > so I'm not convinced it is worth modifying syn_data_abort_no_iss
> > just for this.
>
> Here we set FnV to "not valid" rather than "valid", because the guest
> will use the physical address recorded in the APEI table.
>
To be precise: FnV gives the status of FAR, so what you are setting
here is a status of 0b0, which means FAR is valid, not FnV on its own.
My point was that you are changing the prototype of syn_data_abort_no_iss
just for this one case, so I was thinking it might not be worth it.
Instead you could just set the bit here or, to be more flexible, provide
a way to set specific bits on demand.


BR
Beata

> >
> >> +
> >> +    env->exception.syndrome = esr;
> >> +
> >> +    cc->do_interrupt(c);
> >> +}
> >> +
> >>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> >>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
> >>
> >> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
> >>      return ret;
> >>  }
> >>
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> +    ram_addr_t ram_addr;
> >> +    hwaddr paddr;
> >> +
> >> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> >> +
> >> +    if (acpi_enabled && addr &&
> >> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> +        ram_addr = qemu_ram_addr_from_host(addr);
> >> +        if (ram_addr != RAM_ADDR_INVALID &&
> >> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> >> +            kvm_hwpoison_page_add(ram_addr);
> >> +            /*
> >> +             * Asynchronous signal will be masked by main thread, so
> >> +             * only handle synchronous signal.
> >> +             */
> >
> > I'm not entirely sure that the comment above is correct (it has been
> > pointed out before). I would expect the AO signal to be handled here as
> > well. Not having proper support for that just yet is another story, but
> > the comment might be a bit misleading.
> >
> >
> >> +            if (code == BUS_MCEERR_AR) {
> >> +                kvm_cpu_synchronize_state(c);
> >> +                if (ACPI_GHES_CPER_FAIL !=
> >> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> >> +                    kvm_inject_arm_sea(c);
> >> +                } else {
> >> +                    fprintf(stderr, "failed to record the error\n");
> >> +                }
> >> +            }
> >> +            return;
> >> +        }
> >> +        fprintf(stderr, "Hardware memory error for memory used by "
> >> +                "QEMU itself instead of guest system!\n");
> >> +    }
> >> +
> >> +    if (code == BUS_MCEERR_AR) {
> >> +        fprintf(stderr, "Hardware memory error!\n");
> >> +        exit(1);
> >> +    }
> >> +}
> >> +
> >>  /* C6.6.29 BRK instruction */
> >>  static const uint32_t brk_insn = 0xd4200000;
> >>
> >> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> >> index 5feb312941..499672ebbc 100644
> >> --- a/target/arm/tlb_helper.c
> >> +++ b/target/arm/tlb_helper.c
> >> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
> >>       * ISV field.
> >>       */
> >>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> >> -        syn = syn_data_abort_no_iss(same_el,
> >> +        syn = syn_data_abort_no_iss(same_el, 0,
> >>                                      ea, 0, s1ptw, is_write, fsc);
> >>      } else {
> >>          /*
> >> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> >> index 5352c9ff55..f75a210f96 100644
> >> --- a/target/i386/cpu.h
> >> +++ b/target/i386/cpu.h
> >> @@ -29,6 +29,8 @@
> >>  /* The x86 has a strong memory model with some store-after-load re-ordering */
> >>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
> >>
> >> +#define KVM_HAVE_MCE_INJECTION 1
> >> +
> >>  /* Maximum instruction code size */
> >>  #define TARGET_MAX_INSN_SIZE 16
> >>
> >> --
> >> 2.19.1
> >>
> >>
> >>
> > .
> >
>

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
@ 2019-12-09 13:05         ` Beata Michalska
  0 siblings, 0 replies; 82+ messages in thread
From: Beata Michalska @ 2019-12-09 13:05 UTC (permalink / raw)
  To: gengdongjiu
  Cc: Peter Maydell, ehabkost, kvm, mst, wanghaibin.wang, mtosatti,
	qemu-devel, linuxarm, shannon.zhaosl, Xiang Zheng, qemu-arm,
	james.morse, jonathan.cameron, Igor Mammedov, pbonzini, xuwei5,
	Laszlo Ersek, rth

On Sat, 7 Dec 2019 at 09:33, gengdongjiu <gengdongjiu@huawei.com> wrote:
>
>
>
> On 2019/11/22 23:47, Beata Michalska wrote:
> > Hi,
> >
> > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng <zhengxiang9@huawei.com> wrote:
> >>
> >> From: Dongjiu Geng <gengdongjiu@huawei.com>
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When the guest accesses the poisoned memory, it generates a Synchronous
> >> External Abort (SEA). The host kernel then gets an APEI notification,
> >> calls memory_failure() to unmap the affected page in stage 2, and
> >> finally returns to the guest.
> >>
> >> If the guest continues to access the PG_hwpoison page, the access traps
> >> to KVM as a stage 2 fault and a synchronous SIGBUS with code
> >> BUS_MCEERR_AR is delivered to QEMU. QEMU records the error address in
> >> the guest APEI GHES memory and notifies the guest using a Synchronous
> >> External Abort (SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> >> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com>
> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> ---
> >>  hw/acpi/acpi_ghes.c         | 297 ++++++++++++++++++++++++++++++++++++
> >>  include/hw/acpi/acpi_ghes.h |   4 +
> >>  include/sysemu/kvm.h        |   3 +-
> >>  target/arm/cpu.h            |   4 +
> >>  target/arm/helper.c         |   2 +-
> >>  target/arm/internals.h      |   5 +-
> >>  target/arm/kvm64.c          |  64 ++++++++
> >>  target/arm/tlb_helper.c     |   2 +-
> >>  target/i386/cpu.h           |   2 +
> >>  9 files changed, 377 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> >> index 42c00ff3d3..f5b54990c0 100644
> >> --- a/hw/acpi/acpi_ghes.c
> >> +++ b/hw/acpi/acpi_ghes.c
> >> @@ -39,6 +39,34 @@
> >>  /* The max size in bytes for one error block */
> >>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH       0x1000
> >>
> >> +/*
> >> + * The total size of Generic Error Data Entry
> >> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-343 Generic Error Data Entry
> >> + */
> >> +#define ACPI_GHES_DATA_LENGTH               72
> >> +
> >> +/*
> >> + * The memory section CPER size,
> >> + * UEFI 2.6: N.2.5 Memory Error Section
> >> + */
> >> +#define ACPI_GHES_MEM_CPER_LENGTH           80
> >> +
> >> +/*
> >> + * Masks for block_status flags
> >> + */
> >> +#define ACPI_GEBS_UNCORRECTABLE         1
> >
> > Why not list all supported statuses, similar to the error severity below?
> >
> >> +
> >> +/*
> >> + * Values for error_severity field
> >> + */
> >> +enum AcpiGenericErrorSeverity {
> >> +    ACPI_CPER_SEV_RECOVERABLE,
> >> +    ACPI_CPER_SEV_FATAL,
> >> +    ACPI_CPER_SEV_CORRECTED,
> >> +    ACPI_CPER_SEV_NONE,
> >> +};
> >> +
> >>  /*
> >>   * Now only support ARMv8 SEA notification type error source
> >>   */
> >> @@ -49,6 +77,16 @@
> >>   */
> >>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >>
> >> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)        \
> >> +    {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> >> +    ((b) >> 8) & 0xff, (b) & 0xff,                   \
> >> +    ((c) >> 8) & 0xff, (c) & 0xff,                    \
> >> +    (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> >> +
> >> +#define UEFI_CPER_SEC_PLATFORM_MEM                   \
> >> +    UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> >> +    0xED, 0x7C, 0x83, 0xB1)
> >> +
> >>  /*
> >>   * | +--------------------------+ 0
> >>   * | |        Header            |
> >> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >>      uint64_t ghes_addr_le;
> >>  } AcpiGhesState;
> >>
> >> +/*
> >> + * Total size for Generic Error Status Block
> >> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-380 Generic Error Status Block
> >> + */
> >> +#define ACPI_GHES_GESB_SIZE                 20
> >
> > Minor: This is not entirely correct: GEDE is part of GESB, so the total length
> > would be ACPI_GHES_GESB_SIZE + n * sizeof(GEDE).
> Yes, the comment needs to be corrected.
>
> >
> >> +/* The offset of Data Length in Generic Error Status Block */
> >> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
> >> +
> >
> > If those were nicely represented as structures you get the offsets easily
> > without having number of defines. That could simplify the code and make it
> > more readable - see comments below
> >
> >> +/*
> >> + * Record the value of data length for each error status block to avoid getting
> >> + * this value from guest.
> >> + */
> >> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> >> +
> >> +/*
> >> + * Generic Error Data Entry
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_data(GArray *table, QemuUUID section_type,
> >> +                uint32_t error_severity, uint16_t revision,
> >> +                uint8_t validation_bits, uint8_t flags,
> >> +                uint32_t error_data_length, QemuUUID fru_id,
> >> +                uint8_t *fru_text, uint64_t time_stamp)
> >
> > Why not just define a struct that represents the GED entry?
>
> This is to address Igor's comments. There are two reasons:
> 1. It avoids defining many structures for APEI/GHES/CPER, which is why acpi_ghes.h has very few structure definitions.
> 2. Using build_append_int_noprefix() to compose the table avoids having to consider endianness.
>
> >
> >> +{
> >> +    QemuUUID uuid_le;
> >> +
> >> +    /* Section Type */
> >> +    uuid_le = qemu_uuid_bswap(section_type);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +    /* Revision */
> >> +    build_append_int_noprefix(table, revision, 2);
> >
> > Minor: According to the spec it seems that the revision number is
> > a fixed value so you could drop that from the parameters....
> > or ... use a struct to represent the data
> >
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table, validation_bits, 1);
> >> +    /* Flags */
> >> +    build_append_int_noprefix(table, flags, 1);
> >> +    /* Error Data Length */
> >> +    build_append_int_noprefix(table, error_data_length, 4);
> >> +
> >> +    /* FRU Id */
> >> +    uuid_le = qemu_uuid_bswap(fru_id);
> >> +    g_array_append_vals(table, uuid_le.data, ARRAY_SIZE(uuid_le.data));
> >> +
> >> +    /* FRU Text */
> >> +    g_array_append_vals(table, fru_text, 20);
> >> +    /* Timestamp */
> >> +    build_append_int_noprefix(table, time_stamp, 8);
> >> +}
> >> +
> >> +/*
> >> + * Generic Error Status Block
> >> + * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> >> + */
> >> +static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> >> +                uint32_t raw_data_offset, uint32_t raw_data_length,
> >> +                uint32_t data_length, uint32_t error_severity)
> >
> > Same as the above
> >
> >> +{
> >> +    /* Block Status */
> >> +    build_append_int_noprefix(table, block_status, 4);
> >> +    /* Raw Data Offset */
> >> +    build_append_int_noprefix(table, raw_data_offset, 4);
> >> +    /* Raw Data Length */
> >> +    build_append_int_noprefix(table, raw_data_length, 4);
> >> +    /* Data Length */
> >> +    build_append_int_noprefix(table, data_length, 4);
> >> +    /* Error Severity */
> >> +    build_append_int_noprefix(table, error_severity, 4);
> >> +}
> >> +
> >> +/* UEFI 2.6: N.2.5 Memory Error Section */
> >> +static void acpi_ghes_build_append_mem_cper(GArray *table,
> >> +                                            uint64_t error_physical_addr)
> >> +{
> >> +    /*
> >> +     * Memory Error Record
> >> +     */
> >> +
> >> +    /* Validation Bits */
> >> +    build_append_int_noprefix(table,
> >> +                              (1UL << 14) | /* Type Valid */
> >> +                              (1UL << 1) /* Physical Address Valid */,
> >> +                              8);
> >> +    /* Error Status */
> >> +    build_append_int_noprefix(table, 0, 8);
> >
> > Just wondering whether it would be worth to specify the Error Type
> > through the Error Status ?
> >
> >> +    /* Physical Address */
> >> +    build_append_int_noprefix(table, error_physical_addr, 8);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 48);
> >> +    /* Memory Error Type */
> >> +    build_append_int_noprefix(table, 0 /* Unknown error */, 1);
> >> +    /* Skip all the detailed information normally found in such a record */
> >> +    build_append_int_noprefix(table, 0, 7);
> >> +}
> >> +
> >> +static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> >> +                                      uint64_t error_physical_addr,
> >> +                                      uint32_t data_length)
> >> +{
> >> +    GArray *block;
> >> +    uint64_t current_block_length;
> >> +    /* Memory Error Section Type */
> >> +    QemuUUID mem_section_id_le = UEFI_CPER_SEC_PLATFORM_MEM;
> >
> > As already mentioned: this mixes LE with BE.
> >
> >> +    QemuUUID fru_id = {};
> >> +    uint8_t fru_text[20] = {};
> >> +
> >> +    /*
> >> +     * Generic Error Status Block
> >> +     * | +---------------------+
> >> +     * | |     block_status    |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_offset  |
> >> +     * | +---------------------+
> >> +     * | |    raw_data_length  |
> >> +     * | +---------------------+
> >> +     * | |     data_length     |
> >> +     * | +---------------------+
> >> +     * | |   error_severity    |
> >> +     * | +---------------------+
> >> +     */
> >> +    block = g_array_new(false, true /* clear */, 1);
> >> +
> >> +    /* The current whole length of the generic error status block */
> >> +    current_block_length = ACPI_GHES_GESB_SIZE + data_length;
> >> +
> >> +    /* This is the length if adding a new generic error data entry*/
> >> +    data_length += ACPI_GHES_DATA_LENGTH;
> >> +    data_length += ACPI_GHES_MEM_CPER_LENGTH;
> >> +
> >> +    /*
> >> +     * Check whether it will run out of the preallocated memory if adding a new
> >> +     * generic error data entry
> >> +     */
> >> +    if ((data_length + ACPI_GHES_GESB_SIZE) > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> >> +        error_report("Record CPER out of boundary!!!");
> >
> > Minor: The error message could be made more accurate, like:
> >     "Not enough memory to record new CPER"
> >
> >> +        return ACPI_GHES_CPER_FAIL;
> >> +    }
> >> +
> >> +    /* Build the new generic error status block header */
> >> +    acpi_ghes_generic_error_status(block, cpu_to_le32(ACPI_GEBS_UNCORRECTABLE),
> >> +        0, 0, cpu_to_le32(data_length), cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE));
> >> +
> >> +    /* Write back above generic error status block header to guest memory */
> >> +    cpu_physical_memory_write(error_block_address, block->data,
> >> +                              block->len);
> >> +
> >> +    /* Add a new generic error data entry */
> >> +
> >> +    data_length = block->len;
> >> +    /* Build this new generic error data entry header */
> >> +    acpi_ghes_generic_error_data(block, mem_section_id_le,
> >> +        cpu_to_le32(ACPI_CPER_SEV_RECOVERABLE), cpu_to_le32(0x300), 0, 0,
> >> +        cpu_to_le32(ACPI_GHES_MEM_CPER_LENGTH), fru_id, fru_text, 0);
> >> +
> >> +    /* Build the memory section CPER for above new generic error data entry */
> >> +    acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> >> +
> >> +    /* Write back above this new generic error data entry to guest memory */
> >> +    cpu_physical_memory_write(error_block_address + current_block_length,
> >> +        block->data + data_length, block->len - data_length);
> >> +
> >
> > As already mentioned, and unless I have missed something (which is highly
> > possible), this will append new records while the GESB is kept 'in place',
> > so the used space only grows.
> >
> >> +    g_array_free(block, true);
> >> +
> >> +    return ACPI_GHES_CPER_OK;
> >> +}
> >> +
> >>  /*
> >>   * Hardware Error Notification
> >>   * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> >> @@ -265,3 +471,94 @@ void acpi_ghes_add_fw_cfg(FWCfgState *s, GArray *hardware_error)
> >>      fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> >>          NULL, &ges.ghes_addr_le, sizeof(ges.ghes_addr_le), false);
> >>  }
> >> +
> >> +bool acpi_ghes_record_errors(uint32_t notify, uint64_t physical_address)
> >> +{
> >> +    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> >> +    int loop = 0;
> >> +    uint64_t start_addr = le64_to_cpu(ges.ghes_addr_le);
> >> +    bool ret = ACPI_GHES_CPER_FAIL;
> >> +    uint8_t source_id;
> >> +    const uint8_t error_source_id[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> >> +                                        0xff, 0xff,    0, 0xff, 0xff, 0xff};
> >> +
> >
> > I'm not entirely sure why this is needed - see below
> >
> >> +    /*
> >> +     * | +---------------------+ ges.ghes_addr_le
> >> +     * | |error_block_address0 |
> >> +     * | +---------------------+ --+--
> >> +     * | |    .............    | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | |error_block_addressN |
> >> +     * | +---------------------+
> >> +     * | | read_ack_register0  |
> >> +     * | +---------------------+ --+--
> >> +     * | |   .............     | ACPI_GHES_ADDRESS_SIZE
> >> +     * | +---------------------+ --+--
> >> +     * | | read_ack_registerN  |
> >> +     * | +---------------------+ --+--
> >> +     * | |      CPER           |   |
> >> +     * | |      ....           | ACPI_GHES_MAX_RAW_DATA_LENGT
> >> +     * | |      CPER           |   |
> >> +     * | +---------------------+ --+--
> >> +     * | |    ..........       |
> >> +     * | +---------------------+
> >> +     * | |      CPER           |
> >> +     * | |      ....           |
> >> +     * | |      CPER           |
> >> +     * | +---------------------+
> >> +     */
> >> +    if (physical_address && notify < ACPI_GHES_NOTIFY_RESERVED) {
> >> +        /* Find and check the source id for this new CPER */
> >> +        source_id = error_source_id[notify];
> >
> > Why not use a switch case for the supported source types?
> > For the time being only one is supported, and you only use the table to
> > verify that support, which seems a bit unnecessary.
>
> Later we may need to support many source types, so Igor's suggestion is better, as shown below.
>
> static const uint8_t ghes_notify2source_id_map[] = {
>     ACPI_HEST_SRC_ID_SEA,
>     ACPI_HEST_SRC_ID_RESERVED
> };
>
>
> >
> >> +        if (source_id != 0xff) {
> >> +            start_addr += source_id * ACPI_GHES_ADDRESS_SIZE;
> >> +        } else {
> >> +            goto out;
> >> +        }
> >> +
> [...]
> >>
> >> +/* Callers must hold the iothread mutex lock */
> >> +static void kvm_inject_arm_sea(CPUState *c)
> >
> > We could enclose this function along with the kvm_arch_on_sigbus_vcpu
> > within ifdef switch for KVM_HAVE_MCE_INJECTION
> >
> >> +{
> >> +    ARMCPU *cpu = ARM_CPU(c);
> >> +    CPUARMState *env = &cpu->env;
> >> +    CPUClass *cc = CPU_GET_CLASS(c);
> >> +    uint32_t esr;
> >> +    bool same_el;
> >> +
> >> +    c->exception_index = EXCP_DATA_ABORT;
> >> +    env->exception.target_el = 1;
> >> +
> >> +    /*
> >> +     * Set the DFSC to synchronous external abort and set FnV to not valid,
> >> +     * this will tell guest the FAR_ELx is UNKNOWN for this abort.
> >> +     */
> >> +    same_el = arm_current_el(env) == env->exception.target_el;
> >> +    esr = syn_data_abort_no_iss(same_el, 1, 0, 0, 0, 0, 0x10);
> >
> > IINM this is the only use case where FnV is considered to be valid,
> > so I'm not convinced it is worth modifying syn_data_abort_no_iss
> > just for this.
>
> Here we set FnV to "not valid"; we do not set it to "valid",
> because the guest will use the physical address recorded in the APEI table.
>
To be precise: FnV gives the status of FAR, so what you are setting here is
a status of 0b0, which means FAR is valid, not FnV on its own.
My point was that you are changing the prototype of syn_data_abort_no_iss
for this one case only, so I was thinking that it might not be worth it.
Instead you could just set the bit here or, to be more flexible, provide
a way to set specific bits on demand.
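
A minimal sketch of the "set specific bits on demand" idea, not the code
from the patch. The helper syn_set_iss_bits() and all macro names below are
hypothetical and exist only for illustration; the bit positions (EC in
[31:26], IL at bit 25, FnV at bit 10, DFSC in [5:0]) follow the ESR_ELx
layout for data aborts. The point is that an extra ISS bit can be ORed into
an already-built syndrome without touching the prototype of the builder:

```c
#include <stdint.h>

/* Field positions in ESR_ELx for a data abort (names are illustrative). */
#define ESR_EC_SHIFT       26
#define ESR_IL             (1u << 25)   /* 32-bit instruction length */
#define ESR_ISS_MASK       0x01ffffffu  /* ISS occupies bits [24:0] */
#define ESR_ISS_FNV        (1u << 10)   /* FAR not Valid */
#define ESR_EC_DABT_LOWER  0x24u        /* data abort from a lower EL */
#define ESR_EC_DABT_SAME   0x25u        /* data abort from the same EL */
#define ESR_DFSC_SEA       0x10u        /* synchronous external abort */

/* Hypothetical helper: OR extra ISS bits into an existing syndrome. */
static inline uint32_t syn_set_iss_bits(uint32_t syn, uint32_t bits)
{
    return syn | (bits & ESR_ISS_MASK);
}

/* Build a synchronous external abort syndrome that marks FAR as unknown. */
static inline uint32_t syn_sea_far_unknown(int same_el)
{
    uint32_t ec = same_el ? ESR_EC_DABT_SAME : ESR_EC_DABT_LOWER;
    uint32_t syn = (ec << ESR_EC_SHIFT) | ESR_IL | ESR_DFSC_SEA;

    return syn_set_iss_bits(syn, ESR_ISS_FNV);
}
```

With same_el set, this sketch yields 0x96000410; whether such a helper is
preferable to an extra parameter is a design choice for the maintainers.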


BR
Beata

> >
> >> +
> >> +    env->exception.syndrome = esr;
> >> +
> >> +    cc->do_interrupt(c);
> >> +}
> >> +
> >>  #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
> >>                   KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
> >>
> >> @@ -1036,6 +1062,44 @@ int kvm_arch_get_registers(CPUState *cs)
> >>      return ret;
> >>  }
> >>
> >> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >> +{
> >> +    ram_addr_t ram_addr;
> >> +    hwaddr paddr;
> >> +
> >> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
> >> +
> >> +    if (acpi_enabled && addr &&
> >> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
> >> +        ram_addr = qemu_ram_addr_from_host(addr);
> >> +        if (ram_addr != RAM_ADDR_INVALID &&
> >> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
> >> +            kvm_hwpoison_page_add(ram_addr);
> >> +            /*
> >> +             * Asynchronous signal will be masked by main thread, so
> >> +             * only handle synchronous signal.
> >> +             */
> >
> > I'm not entirely sure that the comment above is correct (it has been
> > pointed out before). I would expect the AO signal to be handled here as
> > well. Not having proper support to do that just yet is another story but
> > the comment might be a bit misleading.
> >
> >
> >> +            if (code == BUS_MCEERR_AR) {
> >> +                kvm_cpu_synchronize_state(c);
> >> +                if (ACPI_GHES_CPER_FAIL !=
> >> +                    acpi_ghes_record_errors(ACPI_GHES_NOTIFY_SEA, paddr)) {
> >> +                    kvm_inject_arm_sea(c);
> >> +                } else {
> >> +                    fprintf(stderr, "failed to record the error\n");
> >> +                }
> >> +            }
> >> +            return;
> >> +        }
> >> +        fprintf(stderr, "Hardware memory error for memory used by "
> >> +                "QEMU itself instead of guest system!\n");
> >> +    }
> >> +
> >> +    if (code == BUS_MCEERR_AR) {
> >> +        fprintf(stderr, "Hardware memory error!\n");
> >> +        exit(1);
> >> +    }
> >> +}
> >> +
> >>  /* C6.6.29 BRK instruction */
> >>  static const uint32_t brk_insn = 0xd4200000;
> >>
> >> diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
> >> index 5feb312941..499672ebbc 100644
> >> --- a/target/arm/tlb_helper.c
> >> +++ b/target/arm/tlb_helper.c
> >> @@ -33,7 +33,7 @@ static inline uint32_t merge_syn_data_abort(uint32_t template_syn,
> >>       * ISV field.
> >>       */
> >>      if (!(template_syn & ARM_EL_ISV) || target_el != 2 || s1ptw) {
> >> -        syn = syn_data_abort_no_iss(same_el,
> >> +        syn = syn_data_abort_no_iss(same_el, 0,
> >>                                      ea, 0, s1ptw, is_write, fsc);
> >>      } else {
> >>          /*
> >> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> >> index 5352c9ff55..f75a210f96 100644
> >> --- a/target/i386/cpu.h
> >> +++ b/target/i386/cpu.h
> >> @@ -29,6 +29,8 @@
> >>  /* The x86 has a strong memory model with some store-after-load re-ordering */
> >>  #define TCG_GUEST_DEFAULT_MO      (TCG_MO_ALL & ~TCG_MO_ST_LD)
> >>
> >> +#define KVM_HAVE_MCE_INJECTION 1
> >> +
> >>  /* Maximum instruction code size */
> >>  #define TARGET_MAX_INSN_SIZE 16
> >>
> >> --
> >> 2.19.1
> >>
> >>
> >>
> > .
> >
>



* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-12-09 13:05         ` Beata Michalska
@ 2019-12-09 14:12           ` gengdongjiu
  -1 siblings, 0 replies; 82+ messages in thread
From: gengdongjiu @ 2019-12-09 14:12 UTC (permalink / raw)
  To: Beata Michalska
  Cc: Xiang Zheng, pbonzini, mst, Igor Mammedov, shannon.zhaosl,
	Peter Maydell, Laszlo Ersek, james.morse, mtosatti, rth,
	ehabkost, jonathan.cameron, xuwei5, kvm, qemu-devel, qemu-arm,
	linuxarm, wanghaibin.wang



On 2019/12/9 21:05, Beata Michalska wrote:
>> Here we set the FnV to not valid, not to set it to valid.
>> because Guest will use the physical address that recorded in APEI table.
>>
> To be precise : the FnV is  giving the status of FAR - so what you are setting
> here is status of 0b0 which means FAR is valid, not FnV on it's own.
> And my point was that you are changing the prototype for syn_data_abort_no_iss
> just for this case only so I was just thinking that it might not be
> worth that, instead
> you could just set it here ... or to be more flexible , provide a way
> to set specific bits
> on demand.

No, I set FnV to 0b1, not 0b0. The whole ESR_EL1 value is 0x96000410, as shown in the log below [1].
As I remember, changing the prototype of syn_data_abort_no_iss was suggested by Peter Maydell.


[1]:
[   62.851830] Internal error: synchronous external abort: 96000410 [#1] PREEMPT SMP
[   62.854465] Modules linked in:
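
The ESR value in the log above can be checked field by field. The sketch
below is illustrative only: the struct and function names are hypothetical,
and the bit positions are those of the ESR_ELx data abort layout (EC in
[31:26], IL at bit 25, FnV at bit 10, DFSC in [5:0]).

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the ESR_ELx fields discussed in this thread. */
struct esr_fields {
    uint32_t ec;    /* exception class, bits [31:26] */
    bool il;        /* instruction length bit, bit 25 */
    bool fnv;       /* FAR not Valid, ISS bit 10 */
    uint32_t dfsc;  /* data fault status code, ISS bits [5:0] */
};

/* Hypothetical helper: split a raw ESR value into the fields above. */
static struct esr_fields esr_decode(uint32_t esr)
{
    struct esr_fields f;

    f.ec = (esr >> 26) & 0x3fu;
    f.il = (esr >> 25) & 1u;
    f.fnv = (esr >> 10) & 1u;
    f.dfsc = esr & 0x3fu;
    return f;
}
```

Decoding 0x96000410 this way gives EC = 0x25 (data abort taken without a
change in exception level), IL = 1, FnV = 1 and DFSC = 0x10 (synchronous
external abort).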



> 



* Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
  2019-11-15 16:37     ` Igor Mammedov
@ 2019-12-21 12:35       ` gengdongjiu
  -1 siblings, 0 replies; 82+ messages in thread
From: gengdongjiu @ 2019-12-21 12:35 UTC (permalink / raw)
  To: Igor Mammedov, Xiang Zheng
  Cc: pbonzini, mst, shannon.zhaosl, peter.maydell, lersek,
	james.morse, mtosatti, rth, ehabkost, jonathan.cameron, xuwei5,
	kvm, qemu-devel, qemu-arm, linuxarm, wanghaibin.wang



On 2019/11/16 0:37, Igor Mammedov wrote:
>> +
>> +        /* zero means OSPM does not acknowledge the error */
>> +        if (!read_ack_register) {
>> +            if (loop < 3) {
>> +                usleep(100 * 1000);
>> +                loop++;
>> +                goto retry;
> as minimum this loop can stall guest repeatedly for 0.3s if guest triggers BQL,
> until it handles error.

I think retrying for up to 0.3s is reasonable.
1. 0.3s is the worst case; if the guest acknowledges the error before then, it does not stall any longer.
2. If the previous error is not acknowledged, the next error will be lost, and error handling (safety) is more important than the rest.
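
The bounded retry being discussed can be sketched as below. This is a
simplified stand-in, not the patch code: wait_for_ack() and its parameters
are hypothetical, the acknowledgement value is read through a plain pointer
rather than from guest memory, and the sleep is injected as a callback
(for example a thin wrapper around usleep()) so the worst-case stall can be
computed without actually sleeping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Poll an acknowledgement value a bounded number of times.
 *
 * ack         - stand-in for the read_ack_register value (hypothetical)
 * max_retries - how many times to pause and re-read before giving up
 * delay_us    - pause between reads, in microseconds
 * sleep_us    - sleep callback; may be NULL to skip the real sleep
 * stalled_us  - out: total pause time accounted for
 *
 * Returns true as soon as a non-zero (acknowledged) value is seen.
 */
static bool wait_for_ack(const volatile uint64_t *ack, int max_retries,
                         unsigned int delay_us,
                         void (*sleep_us)(unsigned int),
                         unsigned int *stalled_us)
{
    *stalled_us = 0;
    for (int attempt = 0; ; attempt++) {
        if (*ack != 0) {
            return true;        /* previous error was acknowledged */
        }
        if (attempt >= max_retries) {
            return false;       /* retries exhausted, still not acked */
        }
        if (sleep_us) {
            sleep_us(delay_us);
        }
        *stalled_us += delay_us;
    }
}
```

With max_retries = 3 and delay_us = 100000 the accounted stall tops out at
300000 microseconds, which is the 0.3s worst case mentioned above; an
acknowledgement that is already present costs no delay at all.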


>
> (not sure what to suggest here though)
> 



end of thread

Thread overview: 82+ messages
2019-11-11  1:40 [RESEND PATCH v21 0/6] Add ARMv8 RAS virtualization support in QEMU Xiang Zheng
2019-11-11  1:40 ` Xiang Zheng
2019-11-11  1:40 ` [RESEND PATCH v21 1/6] hw/arm/virt: Introduce a RAS machine option Xiang Zheng
2019-11-11  1:40   ` Xiang Zheng
2019-12-02 18:22   ` Peter Maydell
2019-12-02 18:22     ` Peter Maydell
2019-12-07 12:10     ` gengdongjiu
2019-12-07 12:10       ` gengdongjiu
2019-11-11  1:40 ` [RESEND PATCH v21 2/6] docs: APEI GHES generation and CPER record description Xiang Zheng
2019-11-11  1:40   ` Xiang Zheng
2019-11-15  9:44   ` Igor Mammedov
2019-11-15  9:44     ` Igor Mammedov
2019-11-27  1:37     ` Xiang Zheng
2019-11-27  1:37       ` Xiang Zheng
2019-11-27  8:12       ` Igor Mammedov
2019-11-27  8:12         ` Igor Mammedov
2019-11-11  1:40 ` [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support Xiang Zheng
2019-11-11  1:40   ` Xiang Zheng
2019-11-15  9:38   ` Igor Mammedov
2019-11-15  9:38     ` Igor Mammedov
2019-11-18 12:49     ` gengdongjiu
2019-11-18 12:49       ` gengdongjiu
2019-11-18 13:18       ` gengdongjiu
2019-11-18 13:18         ` gengdongjiu
2019-11-18 13:21         ` Michael S. Tsirkin
2019-11-18 13:21           ` Michael S. Tsirkin
2019-11-18 13:57           ` gengdongjiu
2019-11-18 13:57             ` gengdongjiu
2019-11-25  9:48           ` Igor Mammedov
2019-11-25  9:48             ` Igor Mammedov
2019-11-27 11:16             ` gengdongjiu
2019-11-27 11:16               ` gengdongjiu
2019-11-22 15:44       ` Beata Michalska
2019-11-22 15:44         ` Beata Michalska
2019-11-22 15:42   ` Beata Michalska
2019-11-22 15:42     ` Beata Michalska
2019-11-25  9:23     ` Igor Mammedov
2019-11-25  9:23       ` Igor Mammedov
2019-11-11  1:40 ` [RESEND PATCH v21 4/6] KVM: Move hwpoison page related functions into kvm-all.c Xiang Zheng
2019-11-11  1:40   ` Xiang Zheng
2019-12-02 18:23   ` Peter Maydell
2019-12-02 18:23     ` Peter Maydell
2019-11-11  1:40 ` [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM Xiang Zheng
2019-11-11  1:40   ` Xiang Zheng
2019-11-15 16:37   ` Igor Mammedov
2019-11-15 16:37     ` Igor Mammedov
2019-11-22 15:47     ` Beata Michalska
2019-11-22 15:47       ` Beata Michalska
2019-11-25  9:37       ` Igor Mammedov
2019-11-25  9:37         ` Igor Mammedov
2019-11-27  1:40     ` Xiang Zheng
2019-11-27  1:40       ` Xiang Zheng
2019-11-27 10:43       ` Igor Mammedov
2019-11-27 10:43         ` Igor Mammedov
2019-12-21 12:35     ` gengdongjiu
2019-12-21 12:35       ` gengdongjiu
2019-11-22 15:47   ` Beata Michalska
2019-11-22 15:47     ` Beata Michalska
2019-11-27 12:47     ` Xiang Zheng
2019-11-27 12:47       ` Xiang Zheng
2019-11-27 13:02       ` Igor Mammedov
2019-11-27 13:02         ` Igor Mammedov
2019-11-27 14:17         ` Beata Michalska
2019-11-27 14:17           ` Beata Michalska
2019-12-03  3:35           ` Xiang Zheng
2019-12-03  3:35             ` Xiang Zheng
2019-11-27 14:17       ` Beata Michalska
2019-11-27 14:17         ` Beata Michalska
2019-12-03  3:35         ` Xiang Zheng
2019-12-03  3:35           ` Xiang Zheng
2019-12-07  9:33     ` gengdongjiu
2019-12-07  9:33       ` gengdongjiu
2019-12-09 13:05       ` Beata Michalska
2019-12-09 13:05         ` Beata Michalska
2019-12-09 14:12         ` gengdongjiu
2019-12-09 14:12           ` gengdongjiu
2019-11-11  1:40 ` [RESEND PATCH v21 6/6] MAINTAINERS: Add APCI/APEI/GHES entries Xiang Zheng
2019-11-11  1:40   ` Xiang Zheng
2019-12-02 18:27 ` [RESEND PATCH v21 0/6] Add ARMv8 RAS virtualization support in QEMU Peter Maydell
2019-12-02 18:27   ` Peter Maydell
2019-12-03  2:09   ` gengdongjiu
2019-12-03  2:09     ` gengdongjiu
