* [PATCH V8 00/10] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi
  Cc: Tyler Baicar

When a memory error, CPU error, PCIe error, or other type of hardware error
that's covered by RAS occurs, firmware should populate the shared GHES memory
location with the proper GHES structures to notify the OS of the error.
For example, platforms that implement firmware-first handling may implement
separate GHES sources for corrected errors and uncorrected errors. If the
error is an uncorrectable error, then the firmware will notify the OS
immediately since the error needs to be handled ASAP. The OS will then be able
to take the appropriate action needed such as offlining a page. If the error
is a corrected error, then the firmware will not interrupt the OS immediately.
Instead, the OS will see and report the error the next time its GHES timer
expires. The kernel will first parse the GHES structures and report the errors
through the kernel logs and then notify user space through RAS trace
events. This allows user space applications such as RAS Daemon to see the
errors and report them however the user desires. This patchset extends the
kernel functionality for RAS errors based on updates in the UEFI 2.6 and
ACPI 6.1 specifications.

An example flow from firmware to user space could be:

                 +---------------+
       +-------->|               |
       |         |  GHES polling |--+
+-------------+  |    source     |  |   +---------------+   +------------+
|             |  +---------------+  |   |  Kernel GHES  |   |            |
|  Firmware   |                     +-->|  CPER AER and |-->|  RAS trace |
|             |  +---------------+  |   |  EDAC drivers |   |   event    |
+-------------+  |               |  |   +---------------+   +------------+
       |         |  GHES sci     |--+
       +-------->|   source      |
                 +---------------+

Add support for Generic Hardware Error Source (GHES) v2, which introduces the
capability for the OS to acknowledge the consumption of the error record
generated by the Reliability, Availability and Serviceability (RAS) controller.
This eliminates potential race conditions between the OS and the RAS controller.
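The acknowledgement step can be sketched in plain C as follows. This is a
minimal user-space illustration, not the code from the series: the struct
read_ack_desc type and both function names are hypothetical, and the register
is simulated with an ordinary variable. It assumes the GHESv2 entry supplies a
preserve mask and a write value, both shifted left by the register's bit
offset as mentioned in the V4 changelog.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the Read Ack fields of a GHESv2 entry. */
struct read_ack_desc {
	uint8_t  bit_offset;  /* bit offset of the ack field in the register */
	uint64_t preserve;    /* bits to keep from the current register value */
	uint64_t write;       /* bits to set to signal consumption */
};

/*
 * Compute the value to write back to the Read Ack Register: keep the bits
 * selected by the preserve mask, then set the bits in the write value.
 * Both are shifted left by the register's bit offset first.
 */
static uint64_t read_ack_value(uint64_t current, const struct read_ack_desc *d)
{
	uint64_t val = current;

	assert(d->bit_offset < 64);
	val &= d->preserve << d->bit_offset;
	val |= d->write << d->bit_offset;
	return val;
}

/* Acknowledge an error record by updating a simulated register in place. */
static void ack_error(volatile uint64_t *reg, const struct read_ack_desc *d)
{
	*reg = read_ack_value(*reg, d);
}
```

Firmware that waits for the acknowledgement before reusing the error status
block cannot overwrite a record the OS is still reading, which is the race
this mechanism is meant to close.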

Add support for the timestamp field added to the Generic Error Data Entry v3,
allowing the OS to log the time that the error is generated by the firmware,
rather than the time the error is consumed. This improves the correctness of
event sequences when analyzing error logs. The timestamp is added in
ACPI 6.1, reference Table 18-343 Generic Error Data Entry.
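As a rough illustration of decoding such a timestamp, the sketch below
converts the BCD fields into ordinary integers. The byte layout used here
(seconds, minutes, hours, a flags byte, day, month, year, century) is an
assumption based on my reading of the ACPI 6.1 table referenced above, and
struct fw_time and both function names are hypothetical.

```c
#include <stdint.h>

/* Decoded firmware timestamp; hypothetical type for illustration only. */
struct fw_time {
	unsigned year, month, day, hour, minute, second;
};

/* Convert one packed BCD byte (two decimal digits) to binary. */
static unsigned bcd_to_bin(uint8_t v)
{
	return (v & 0x0f) + (v >> 4) * 10u;
}

/*
 * Decode an 8-byte timestamp. Assumed layout: byte 0 = seconds,
 * 1 = minutes, 2 = hours, 3 = flags (not BCD, ignored here), 4 = day,
 * 5 = month, 6 = year within the century, 7 = century.
 */
static struct fw_time decode_timestamp(const uint8_t ts[8])
{
	struct fw_time t;

	t.second = bcd_to_bin(ts[0]);
	t.minute = bcd_to_bin(ts[1]);
	t.hour   = bcd_to_bin(ts[2]);
	t.day    = bcd_to_bin(ts[4]);
	t.month  = bcd_to_bin(ts[5]);
	t.year   = bcd_to_bin(ts[7]) * 100u + bcd_to_bin(ts[6]);
	return t;
}
```

Keeping year and century as separate BCD bytes matches the V2 changelog entry
about splitting the timestamp year.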

Add support for the ARMv8 Common Platform Error Record (CPER) per the UEFI 2.6
specification. ARMv8-specific processor error information is reported as part
of the CPER records. This provides more detail for processor error logs and
can help describe ARMv8 cache, TLB, and bus errors.

Synchronous External Abort (SEA) represents a specific processor error condition
in ARM systems. A handler is added to recognize SEA errors, and a notifier is
added to parse and report the errors before the process is killed. Refer to
section N.2.1.1 in the Common Platform Error Record appendix of the UEFI 2.6
specification.
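The "FAR not valid" check listed in the V8 changelog can be sketched as below.
This is a stand-alone illustration and not the handler from the series. The
macro and function names are hypothetical, and the bit positions and fault
status codes are assumptions taken from my understanding of the ARMv8-A
syndrome register encoding (FnV at bit 10, fault status code in bits [5:0]).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ESR encoding; verify against the architecture manual. */
#define ESR_FNV       (UINT64_C(1) << 10) /* FAR not Valid */
#define ESR_FSC_MASK  UINT64_C(0x3f)      /* fault status code field */
#define FSC_SEA       0x10 /* synchronous external abort, not on a table walk */
#define FSC_SEA_TTW0  0x14 /* ... on a translation table walk, level 0 */
#define FSC_SEA_TTW3  0x17 /* ... on a translation table walk, level 3 */

/* Return true if the syndrome value describes a synchronous external abort. */
static bool is_sea(uint64_t esr)
{
	uint64_t fsc = esr & ESR_FSC_MASK;

	return fsc == FSC_SEA || (fsc >= FSC_SEA_TTW0 && fsc <= FSC_SEA_TTW3);
}

/*
 * Store the faulting address in *addr only when the abort is an SEA and the
 * fault address register is flagged as valid. Returns false otherwise, in
 * which case *addr is left untouched.
 */
static bool sea_error_address(uint64_t esr, uint64_t far, uint64_t *addr)
{
	if (!is_sea(esr) || (esr & ESR_FNV))
		return false;
	*addr = far;
	return true;
}
```

Reporting no address at all when FnV is set avoids attributing the error to
whatever stale value the fault address register happens to hold.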

Currently the kernel ignores CPER records that are unrecognized.
However, the UEFI spec allows for non-standard (e.g. vendor
proprietary) error section types in CPER (Common Platform Error Record),
as defined in section N2.3 of UEFI version 2.5. As a result, the user
cannot see the hardware error data in non-standard sections.

When the Section Type field of a Generic Error Data Entry is unrecognized,
this series prints the raw data to the dmesg buffer and adds a tracepoint
for reporting such hardware errors.
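The fallback behaviour can be pictured with the following user-space sketch:
look up the section type in a table of known types, and hex-dump the payload
when nothing matches. Everything here is hypothetical, including the table,
the function names, and the placeholder GUID bytes, which are not real CPER
section type GUIDs.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct section_handler {
	uint8_t guid[16];   /* section type identifier */
	const char *name;   /* human-readable section name */
};

/* Placeholder entries only; these are NOT real section type GUIDs. */
static const struct section_handler known_sections[] = {
	{ { 0x01 }, "example section A" },
	{ { 0x02 }, "example section B" },
};

/* Return the name of a known section type, or NULL if unrecognized. */
static const char *lookup_section(const uint8_t guid[16])
{
	size_t n = sizeof(known_sections) / sizeof(known_sections[0]);

	for (size_t i = 0; i < n; i++)
		if (memcmp(guid, known_sections[i].guid, 16) == 0)
			return known_sections[i].name;
	return NULL;
}

/* Print a buffer as hex bytes, 16 per line. */
static void hex_dump(FILE *out, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		fprintf(out, "%02x%c", (unsigned)data[i],
			(i % 16 == 15 || i + 1 == len) ? '\n' : ' ');
}

/* Returns 1 if a known type matched, 0 if the raw payload was dumped. */
static int report_section(FILE *out, const uint8_t guid[16],
			  const uint8_t *data, size_t len)
{
	const char *name = lookup_section(guid);

	if (name) {
		fprintf(out, "section: %s\n", name);
		return 1;
	}
	fprintf(out, "unrecognized section, %zu bytes of raw data:\n", len);
	hex_dump(out, data, len);
	return 0;
}
```

Dumping the raw bytes keeps vendor-specific data available for offline
decoding even though the kernel cannot interpret it.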

Currently, even if an error status block's severity is fatal, the kernel
does not honor the severity level by panicking. With the firmware-first
model, the platform could inform the OS about a fatal hardware error
through the non-NMI GHES notification type. The OS should panic when a
hardware error record is received with this severity.
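The intended policy can be summarized in a few lines of C. This is only a
sketch of the decision described above: the enum names and action_for() are
hypothetical, and the real driver does considerably more work before it
panics.

```c
/* Hypothetical severity levels, ordered from least to most severe. */
enum err_severity { SEV_NONE, SEV_CORRECTED, SEV_RECOVERABLE, SEV_FATAL };

/* Hypothetical actions the OS can take for a received error record. */
enum err_action { ACT_LOG, ACT_RECOVER, ACT_PANIC };

/*
 * Map the severity of an error status block to an action. A fatal record
 * leads to a panic regardless of how the notification was delivered.
 */
static enum err_action action_for(enum err_severity sev)
{
	if (sev >= SEV_FATAL)
		return ACT_PANIC;
	if (sev == SEV_RECOVERABLE)
		return ACT_RECOVER;
	return ACT_LOG;
}
```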

Add support to handle SEAs that occur while a KVM guest kernel is
running. Currently these are unsupported by the guest abort handling.

V8: Remove SEA notifier
    Add FAR not valid bit check when populating the SEA error address
    Move nmi_enter/exit() to architecture specific code
    Add synchronize_rcu() usage to SEA handling
    Make GHES_IOREMAP_PAGES always 2
    Update ghes_ioremap_pfn_nmi() to work like ghes_ioremap_pfn_irq()
    Remove the SEA print from handle_guest_sea()

V7: Update a couple prints for ARM processor errors
    Add print notifying if overflow occurred for ARM processor errors
    Check for ARM configuration to allow the compiler to ignore ARM code
     on non-ARM systems
    Use SEA acronym instead of spelling it out
    Update fault_info prints to be more clear
    Add NMI locking to SEA notification
    Remove error info structure from ARM trace event since there can be
     a variable amount of these structures

V6: Change HEST_TYPE_GENERIC_V2 to IS_HEST_TYPE_GENERIC_V2 for readability
    Move APEI helper defines from cper.h to ghes.h
    Add data_len decrement back into print loop
    Change references to ARMv8 to just ARM
    Rewrite ARM processor context info parsing
    Check valid bit of ARM error info field before printing it
    Add include of linux/uuid.h in ghes.c

V5: Fix GHES goto logic for error conditions
    Change ghes_do_read_ack to ghes_ack_error
    Make sure data version check is >= 3
    Use CPER helper functions in print functions
    Make handle_guest_sea() dummy function static for arm
    Add arm to subject line for KVM patch

V4: Add bit offset left shift to read_ack_write value
    Make HEST generic and generic_v2 structures a union in the ghes structure
    Move gdata v3 helper functions into ghes.h to avoid duplication
    Reorder the timestamp print and avoid memcpy
    Add helper functions for gdata size checking
    Rename the SEA functions
    Add helper function for GHES panics
    Set fru_id to NULL UUID at variable declaration
    Limit ARM trace event parameters to the needed structures
    Reorder the ARM trace event variables to save space
    Add comment for why we don't pass SEAs to the guest when it aborts
    Move ARM trace event call into GHES driver instead of CPER

V3: Fix unmapped address to the read_ack_register in ghes.c
    Add helper function to get the proper payload based on generic data entry
     version
    Move timestamp print to avoid changing function calls in cper.c
    Remove patch "arm64: exception: handle instruction abort at current EL"
     since the el1_ia handler is already added in 4.8
    Add EFI and ARM64 dependencies for HAVE_ACPI_APEI_SEA
    Add a new trace event for ARM type errors
    Add support to handle KVM guest SEAs

V2: Add PSCI state print for the ARMv8 error type.
    Separate timestamp year into year and century using BCD format.
    Rebase on top of ACPICA 20160318 release and remove header file changes
     in include/acpi/actbl1.h.
    Add panic OS with fatal error status block patch.
    Add processing of unrecognized CPER error section patches with updates
     from previous comments. Original patches: https://lkml.org/lkml/2015/9/8/646

V1: https://lkml.org/lkml/2016/2/5/544

Jonathan (Zhixiong) Zhang (1):
  acpi: apei: panic OS with fatal error status block

Tyler Baicar (9):
  acpi: apei: read ack upon ghes record consumption
  ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1
  efi: parse ARM processor error
  arm64: exception: handle Synchronous External Abort
  acpi: apei: handle SEA notification type for ARMv8
  efi: print unrecognized CPER section
  ras: acpi / apei: generate trace event for unrecognized CPER section
  trace, ras: add ARM processor error trace event
  arm/arm64: KVM: add guest SEA support

 arch/arm/include/asm/kvm_arm.h       |   1 +
 arch/arm/include/asm/system_misc.h   |   5 +
 arch/arm/kvm/mmu.c                   |  18 ++-
 arch/arm64/Kconfig                   |   2 +
 arch/arm64/include/asm/kvm_arm.h     |   1 +
 arch/arm64/include/asm/system_misc.h |   2 +
 arch/arm64/mm/fault.c                |  72 ++++++++++--
 drivers/acpi/apei/Kconfig            |  14 +++
 drivers/acpi/apei/ghes.c             | 182 ++++++++++++++++++++++++++----
 drivers/acpi/apei/hest.c             |   7 +-
 drivers/firmware/efi/cper.c          | 209 ++++++++++++++++++++++++++++++++---
 drivers/ras/ras.c                    |   2 +
 include/acpi/ghes.h                  |  29 ++++-
 include/linux/cper.h                 |  54 +++++++++
 include/ras/ras_event.h              |  79 +++++++++++++
 15 files changed, 626 insertions(+), 51 deletions(-)

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V8 00/10] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64
@ 2017-02-01 17:16 ` Tyler Baicar
  0 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

When a memory error, CPU error, PCIe error, or other type of hardware error
that's covered by RAS occurs, firmware should populate the shared GHES memory
location with the proper GHES structures to notify the OS of the error.
For example, platforms that implement firmware first handling may implement
separate GHES sources for corrected errors and uncorrected errors. If the
error is an uncorrectable error, then the firmware will notify the OS
immediately since the error needs to be handled ASAP. The OS will then be able
to take the appropriate action needed such as offlining a page. If the error
is a corrected error, then the firmware will not interrupt the OS immediately.
Instead, the OS will see and report the error the next time it's GHES timer
expires. The kernel will first parse the GHES structures and report the errors
through the kernel logs and then notify the user space through RAS trace
events. This allows user space applications such as RAS Daemon to see the
errors and report them however the user desires. This patchset extends the
kernel functionality for RAS errors based on updates in the UEFI 2.6 and
ACPI 6.1 specifications.

An example flow from firmware to user space could be:

                 +---------------+
       +-------->|               |
       |         |  GHES polling |--+
+-------------+  |    source     |  |   +---------------+   +------------+
|             |  +---------------+  |   |  Kernel GHES  |   |            |
|  Firmware   |                     +-->|  CPER AER and |-->|  RAS trace |
|             |  +---------------+  |   |  EDAC drivers |   |   event    |
+-------------+  |               |  |   +---------------+   +------------+
       |         |  GHES sci     |--+
       +-------->|   source      |
                 +---------------+

Add support for Generic Hardware Error Source (GHES) v2, which introduces the
capability for the OS to acknowledge the consumption of the error record
generated by the Reliability, Availability and Serviceability (RAS) controller.
This eliminates potential race conditions between the OS and the RAS controller.

Add support for the timestamp field added to the Generic Error Data Entry v3,
allowing the OS to log the time that the error is generated by the firmware,
rather than the time the error is consumed. This improves the correctness of
event sequences when analyzing error logs. The timestamp is added in
ACPI 6.1, reference Table 18-343 Generic Error Data Entry.

Add support for ARMv8 Common Platform Error Record (CPER) per UEFI 2.6
specification. ARMv8 specific processor error information is reported as part of
the CPER records.  This provides more detail on for processor error logs. This
can help describe ARMv8 cache, tlb, and bus errors.

Synchronous External Abort (SEA) represents a specific processor error condition
in ARM systems. A handler is added to recognize SEA errors, and a notifier is
added to parse and report the errors before the process is killed. Refer to
section N.2.1.1 in the Common Platform Error Record appendix of the UEFI 2.6
specification.

Currently the kernel ignores CPER records that are unrecognized.
On the other hand, the UEFI spec allows for non-standard (e.g. vendor
proprietary) error section types in CPER (Common Platform Error Record),
as defined in section N2.3 of UEFI version 2.5. Therefore, the user
is not able to see hardware error data from non-standard sections.

If the Section Type field of a Generic Error Data Entry is unrecognized,
print out the raw data in the dmesg buffer, and also add a tracepoint
for reporting such hardware errors.
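
The fallback can be sketched as a table lookup followed by a raw hex dump.
Everything here is illustrative: the table entry uses made-up bytes rather
than a real CPER section GUID, and lookup_section() and format_raw() are
hypothetical names, not functions from this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GUID_LEN 16

struct section_handler {
	uint8_t type[GUID_LEN];	/* section type identifier */
	const char *name;
};

/* Placeholder entry: the bytes are made up, not a real CPER GUID. */
static const struct section_handler known_sections[] = {
	{ { 0x01, 0x02, 0x03, 0x04 }, "example-known-section" },
};

/* Return the name for a recognized section type, or NULL if unknown. */
static const char *lookup_section(const uint8_t type[GUID_LEN])
{
	size_t n = sizeof(known_sections) / sizeof(known_sections[0]);

	for (size_t i = 0; i < n; i++)
		if (memcmp(known_sections[i].type, type, GUID_LEN) == 0)
			return known_sections[i].name;
	return NULL;
}

/* Render raw payload bytes as space-separated hex; returns chars written. */
static size_t format_raw(const uint8_t *data, size_t len,
			 char *out, size_t out_len)
{
	size_t used = 0;

	assert(out != NULL && out_len > 0);
	out[0] = '\0';
	/* Stop early if another "xx " plus the NUL would not fit. */
	for (size_t i = 0; i < len && used + 3 < out_len; i++)
		used += (size_t)snprintf(out + used, out_len - used,
					 "%02x ", data[i]);
	return used;
}
```

A caller would try lookup_section() first and fall back to format_raw()
only when it returns NULL, so recognized sections keep their existing
decoded output.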

Currently, even if an error status block's severity is fatal, the kernel
does not honor the severity level and does not panic. With the firmware
first model, the platform could inform the OS about a fatal hardware error
through the non-NMI GHES notification type. The OS should panic when a
hardware error record is received with this severity.
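
As a rough sketch of the intended policy, the severity of a status block
can be mapped to an action as below. The enum and function names are
hypothetical, and the numeric severity values are assumed to follow the
UEFI CPER encoding; the actual driver uses its own constants and helpers.

```c
#include <assert.h>

/* Severity values, assumed to follow the UEFI CPER encoding. */
enum err_severity {
	ERR_RECOVERABLE = 0,
	ERR_FATAL = 1,
	ERR_CORRECTED = 2,
	ERR_INFORMATIONAL = 3,
};

enum err_action { ACT_LOG_ONLY, ACT_TRY_RECOVERY, ACT_PANIC };

/* Decide what to do with an error status block of the given severity. */
static enum err_action action_for_severity(enum err_severity sev)
{
	switch (sev) {
	case ERR_FATAL:
		return ACT_PANIC;	/* honor fatal severity */
	case ERR_RECOVERABLE:
		return ACT_TRY_RECOVERY;
	case ERR_CORRECTED:
	case ERR_INFORMATIONAL:
		return ACT_LOG_ONLY;
	}
	/* Unknown encodings should not occur; fail loudly in this sketch. */
	assert(0 && "unknown severity");
	return ACT_PANIC;
}
```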

Add support to handle SEAs that occur while a KVM guest kernel is
running. Currently these are unsupported by the guest abort handling.

V8: Remove SEA notifier
    Add FAR not valid bit check when populating the SEA error address
    Move nmi_enter/exit() to architecture specific code
    Add synchronize_rcu() usage to SEA handling
    Make GHES_IOREMAP_PAGES always 2
    Update ghes_ioremap_pfn_nmi() to work like ghes_ioremap_pfn_irq()
    Remove the SEA print from handle_guest_sea()

V7: Update a couple prints for ARM processor errors
    Add print notifying if overflow occurred for ARM processor errors
    Check for ARM configuration to allow the compiler to ignore ARM code
     on non-ARM systems
    Use SEA acronym instead of spelling it out
    Update fault_info prints to be more clear
    Add NMI locking to SEA notification
    Remove error info structure from ARM trace event since there can be
     a variable amount of these structures

V6: Change HEST_TYPE_GENERIC_V2 to IS_HEST_TYPE_GENERIC_V2 for readability
    Move APEI helper defines from cper.h to ghes.h
    Add data_len decrement back into print loop
    Change references to ARMv8 to just ARM
    Rewrite ARM processor context info parsing
    Check valid bit of ARM error info field before printing it
    Add include of linux/uuid.h in ghes.c

V5: Fix GHES goto logic for error conditions
    Change ghes_do_read_ack to ghes_ack_error
    Make sure data version check is >= 3
    Use CPER helper functions in print functions
    Make handle_guest_sea() dummy function static for arm
    Add arm to subject line for KVM patch

V4: Add bit offset left shift to read_ack_write value
    Make HEST generic and generic_v2 structures a union in the ghes structure
    Move gdata v3 helper functions into ghes.h to avoid duplication
    Reorder the timestamp print and avoid memcpy
    Add helper functions for gdata size checking
    Rename the SEA functions
    Add helper function for GHES panics
    Set fru_id to NULL UUID at variable declaration
    Limit ARM trace event parameters to the needed structures
    Reorder the ARM trace event variables to save space
    Add comment for why we don't pass SEAs to the guest when it aborts
    Move ARM trace event call into GHES driver instead of CPER

V3: Fix unmapped address to the read_ack_register in ghes.c
    Add helper function to get the proper payload based on generic data entry
     version
    Move timestamp print to avoid changing function calls in cper.c
    Remove patch "arm64: exception: handle instruction abort at current EL"
     since the el1_ia handler is already added in 4.8
    Add EFI and ARM64 dependencies for HAVE_ACPI_APEI_SEA
    Add a new trace event for ARM type errors
    Add support to handle KVM guest SEAs

V2: Add PSCI state print for the ARMv8 error type.
    Separate timestamp year into year and century using BCD format.
    Rebase on top of ACPICA 20160318 release and remove header file changes
     in include/acpi/actbl1.h.
    Add panic OS with fatal error status block patch.
    Add processing of unrecognized CPER error section patches with updates
     from previous comments. Original patches: https://lkml.org/lkml/2015/9/8/646

V1: https://lkml.org/lkml/2016/2/5/544

Jonathan (Zhixiong) Zhang (1):
  acpi: apei: panic OS with fatal error status block

Tyler Baicar (9):
  acpi: apei: read ack upon ghes record consumption
  ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1
  efi: parse ARM processor error
  arm64: exception: handle Synchronous External Abort
  acpi: apei: handle SEA notification type for ARMv8
  efi: print unrecognized CPER section
  ras: acpi / apei: generate trace event for unrecognized CPER section
  trace, ras: add ARM processor error trace event
  arm/arm64: KVM: add guest SEA support

 arch/arm/include/asm/kvm_arm.h       |   1 +
 arch/arm/include/asm/system_misc.h   |   5 +
 arch/arm/kvm/mmu.c                   |  18 ++-
 arch/arm64/Kconfig                   |   2 +
 arch/arm64/include/asm/kvm_arm.h     |   1 +
 arch/arm64/include/asm/system_misc.h |   2 +
 arch/arm64/mm/fault.c                |  72 ++++++++++--
 drivers/acpi/apei/Kconfig            |  14 +++
 drivers/acpi/apei/ghes.c             | 182 ++++++++++++++++++++++++++----
 drivers/acpi/apei/hest.c             |   7 +-
 drivers/firmware/efi/cper.c          | 209 ++++++++++++++++++++++++++++++++---
 drivers/ras/ras.c                    |   2 +
 include/acpi/ghes.h                  |  29 ++++-
 include/linux/cper.h                 |  54 +++++++++
 include/ras/ras_event.h              |  79 +++++++++++++
 15 files changed, 626 insertions(+), 51 deletions(-)

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

* [PATCH V8 01/10] acpi: apei: read ack upon ghes record consumption
  2017-02-01 17:16 ` Tyler Baicar
  (?)
  (?)
@ 2017-02-01 17:16   ` Tyler Baicar
  -1 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi
  Cc: Tyler Baicar

A RAS (Reliability, Availability, Serviceability) controller
may be a separate processor running in parallel with OS
execution, and may generate error records for consumption by
the OS. If the RAS controller produces multiple error records,
then they may be overwritten before the OS has consumed them.

The Generic Hardware Error Source (GHES) v2 structure
introduces the capability for the OS to acknowledge the
consumption of the error record generated by the RAS
controller. A RAS controller supporting GHESv2 shall wait for
the acknowledgment before writing a new error record, thus
eliminating the race condition.

Add support for parsing of GHESv2 sub-tables as well.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
---
 drivers/acpi/apei/ghes.c | 49 +++++++++++++++++++++++++++++++++++++++++++++---
 drivers/acpi/apei/hest.c |  7 +++++--
 include/acpi/ghes.h      |  5 ++++-
 3 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e53bef6..5e1ec41 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -45,6 +45,7 @@
 #include <linux/aer.h>
 #include <linux/nmi.h>
 
+#include <acpi/actbl1.h>
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
@@ -79,6 +80,10 @@
 	((struct acpi_hest_generic_status *)				\
 	 ((struct ghes_estatus_node *)(estatus_node) + 1))
 
+#define IS_HEST_TYPE_GENERIC_V2(ghes)					\
+	(((struct acpi_hest_header *)(ghes)->generic)->type ==		\
+	 ACPI_HEST_TYPE_GENERIC_ERROR_V2)
+
 /*
  * This driver isn't really modular, however for the time being,
  * continuing to use module_param is the easiest way to remain
@@ -248,10 +253,18 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 	ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
 	if (!ghes)
 		return ERR_PTR(-ENOMEM);
+
 	ghes->generic = generic;
+	if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
+		rc = apei_map_generic_address(
+			&ghes->generic_v2->read_ack_register);
+		if (rc)
+			goto err_free;
+	}
+
 	rc = apei_map_generic_address(&generic->error_status_address);
 	if (rc)
-		goto err_free;
+		goto err_unmap_read_ack_addr;
 	error_block_length = generic->error_block_length;
 	if (error_block_length > GHES_ESTATUS_MAX_SIZE) {
 		pr_warning(FW_WARN GHES_PFX
@@ -263,13 +276,17 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 	ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
 	if (!ghes->estatus) {
 		rc = -ENOMEM;
-		goto err_unmap;
+		goto err_unmap_status_addr;
 	}
 
 	return ghes;
 
-err_unmap:
+err_unmap_status_addr:
 	apei_unmap_generic_address(&generic->error_status_address);
+err_unmap_read_ack_addr:
+	if (IS_HEST_TYPE_GENERIC_V2(ghes))
+		apei_unmap_generic_address(
+			&ghes->generic_v2->read_ack_register);
 err_free:
 	kfree(ghes);
 	return ERR_PTR(rc);
@@ -279,6 +296,9 @@ static void ghes_fini(struct ghes *ghes)
 {
 	kfree(ghes->estatus);
 	apei_unmap_generic_address(&ghes->generic->error_status_address);
+	if (IS_HEST_TYPE_GENERIC_V2(ghes))
+		apei_unmap_generic_address(
+			&ghes->generic_v2->read_ack_register);
 }
 
 static inline int ghes_severity(int severity)
@@ -648,6 +668,23 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
+static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
+{
+	int rc;
+	u64 val = 0;
+
+	rc = apei_read(&val, &generic_v2->read_ack_register);
+	if (rc)
+		return rc;
+	val &= generic_v2->read_ack_preserve <<
+		generic_v2->read_ack_register.bit_offset;
+	val |= generic_v2->read_ack_write <<
+		generic_v2->read_ack_register.bit_offset;
+	rc = apei_write(val, &generic_v2->read_ack_register);
+
+	return rc;
+}
+
 static int ghes_proc(struct ghes *ghes)
 {
 	int rc;
@@ -660,6 +697,12 @@ static int ghes_proc(struct ghes *ghes)
 			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
 	}
 	ghes_do_proc(ghes, ghes->estatus);
+
+	if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
+		rc = ghes_ack_error(ghes->generic_v2);
+		if (rc)
+			return rc;
+	}
 out:
 	ghes_clear_estatus(ghes);
 	return rc;
diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index 8f2a98e..456b488 100644
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -52,6 +52,7 @@
 	[ACPI_HEST_TYPE_AER_ENDPOINT] = sizeof(struct acpi_hest_aer),
 	[ACPI_HEST_TYPE_AER_BRIDGE] = sizeof(struct acpi_hest_aer_bridge),
 	[ACPI_HEST_TYPE_GENERIC_ERROR] = sizeof(struct acpi_hest_generic),
+	[ACPI_HEST_TYPE_GENERIC_ERROR_V2] = sizeof(struct acpi_hest_generic_v2),
 };
 
 static int hest_esrc_len(struct acpi_hest_header *hest_hdr)
@@ -141,7 +142,8 @@ static int __init hest_parse_ghes_count(struct acpi_hest_header *hest_hdr, void
 {
 	int *count = data;
 
-	if (hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR)
+	if (hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR ||
+	    hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR_V2)
 		(*count)++;
 	return 0;
 }
@@ -152,7 +154,8 @@ static int __init hest_parse_ghes(struct acpi_hest_header *hest_hdr, void *data)
 	struct ghes_arr *ghes_arr = data;
 	int rc, i;
 
-	if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR)
+	if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR &&
+	    hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR_V2)
 		return 0;
 
 	if (!((struct acpi_hest_generic *)hest_hdr)->enabled)
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 720446c..68f088a 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -13,7 +13,10 @@
 #define GHES_EXITING		0x0002
 
 struct ghes {
-	struct acpi_hest_generic *generic;
+	union {
+		struct acpi_hest_generic *generic;
+		struct acpi_hest_generic_v2 *generic_v2;
+	};
 	struct acpi_hest_generic_status *estatus;
 	u64 buffer_paddr;
 	unsigned long flags;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

* [PATCH V8 01/10] acpi: apei: read ack upon ghes record consumption
@ 2017-02-01 17:16   ` Tyler Baicar
  0 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

A RAS (Reliability, Availability, Serviceability) controller
may be a separate processor running in parallel with OS
execution, and may generate error records for consumption by
the OS. If the RAS controller produces multiple error records,
then they may be overwritten before the OS has consumed them.

The Generic Hardware Error Source (GHES) v2 structure
introduces the capability for the OS to acknowledge the
consumption of the error record generated by the RAS
controller. A RAS controller supporting GHESv2 shall wait for
the acknowledgment before writing a new error record, thus
eliminating the race condition.

Add support for parsing of GHESv2 sub-tables as well.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
---
 drivers/acpi/apei/ghes.c | 49 +++++++++++++++++++++++++++++++++++++++++++++---
 drivers/acpi/apei/hest.c |  7 +++++--
 include/acpi/ghes.h      |  5 ++++-
 3 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e53bef6..5e1ec41 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -45,6 +45,7 @@
 #include <linux/aer.h>
 #include <linux/nmi.h>
 
+#include <acpi/actbl1.h>
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
@@ -79,6 +80,10 @@
 	((struct acpi_hest_generic_status *)				\
 	 ((struct ghes_estatus_node *)(estatus_node) + 1))
 
+#define IS_HEST_TYPE_GENERIC_V2(ghes)				\
+	((struct acpi_hest_header *)ghes->generic)->type ==	\
+	 ACPI_HEST_TYPE_GENERIC_ERROR_V2
+
 /*
  * This driver isn't really modular, however for the time being,
  * continuing to use module_param is the easiest way to remain
@@ -248,10 +253,18 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 	ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
 	if (!ghes)
 		return ERR_PTR(-ENOMEM);
+
 	ghes->generic = generic;
+	if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
+		rc = apei_map_generic_address(
+			&ghes->generic_v2->read_ack_register);
+		if (rc)
+			goto err_free;
+	}
+
 	rc = apei_map_generic_address(&generic->error_status_address);
 	if (rc)
-		goto err_free;
+		goto err_unmap_read_ack_addr;
 	error_block_length = generic->error_block_length;
 	if (error_block_length > GHES_ESTATUS_MAX_SIZE) {
 		pr_warning(FW_WARN GHES_PFX
@@ -263,13 +276,17 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
 	ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
 	if (!ghes->estatus) {
 		rc = -ENOMEM;
-		goto err_unmap;
+		goto err_unmap_status_addr;
 	}
 
 	return ghes;
 
-err_unmap:
+err_unmap_status_addr:
 	apei_unmap_generic_address(&generic->error_status_address);
+err_unmap_read_ack_addr:
+	if (IS_HEST_TYPE_GENERIC_V2(ghes))
+		apei_unmap_generic_address(
+			&ghes->generic_v2->read_ack_register);
 err_free:
 	kfree(ghes);
 	return ERR_PTR(rc);
@@ -279,6 +296,9 @@ static void ghes_fini(struct ghes *ghes)
 {
 	kfree(ghes->estatus);
 	apei_unmap_generic_address(&ghes->generic->error_status_address);
+	if (IS_HEST_TYPE_GENERIC_V2(ghes))
+		apei_unmap_generic_address(
+			&ghes->generic_v2->read_ack_register);
 }
 
 static inline int ghes_severity(int severity)
@@ -648,6 +668,23 @@ static void ghes_estatus_cache_add(
 	rcu_read_unlock();
 }
 
+static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
+{
+	int rc;
+	u64 val = 0;
+
+	rc = apei_read(&val, &generic_v2->read_ack_register);
+	if (rc)
+		return rc;
+	val &= generic_v2->read_ack_preserve <<
+		generic_v2->read_ack_register.bit_offset;
+	val |= generic_v2->read_ack_write <<
+		generic_v2->read_ack_register.bit_offset;
+	rc = apei_write(val, &generic_v2->read_ack_register);
+
+	return rc;
+}
+
 static int ghes_proc(struct ghes *ghes)
 {
 	int rc;
@@ -660,6 +697,12 @@ static int ghes_proc(struct ghes *ghes)
 			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
 	}
 	ghes_do_proc(ghes, ghes->estatus);
+
+	if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
+		rc = ghes_ack_error(ghes->generic_v2);
+		if (rc)
+			return rc;
+	}
 out:
 	ghes_clear_estatus(ghes);
 	return rc;
diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index 8f2a98e..456b488 100644
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -52,6 +52,7 @@
 	[ACPI_HEST_TYPE_AER_ENDPOINT] = sizeof(struct acpi_hest_aer),
 	[ACPI_HEST_TYPE_AER_BRIDGE] = sizeof(struct acpi_hest_aer_bridge),
 	[ACPI_HEST_TYPE_GENERIC_ERROR] = sizeof(struct acpi_hest_generic),
+	[ACPI_HEST_TYPE_GENERIC_ERROR_V2] = sizeof(struct acpi_hest_generic_v2),
 };
 
 static int hest_esrc_len(struct acpi_hest_header *hest_hdr)
@@ -141,7 +142,8 @@ static int __init hest_parse_ghes_count(struct acpi_hest_header *hest_hdr, void
 {
 	int *count = data;
 
-	if (hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR)
+	if (hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR ||
+	    hest_hdr->type == ACPI_HEST_TYPE_GENERIC_ERROR_V2)
 		(*count)++;
 	return 0;
 }
@@ -152,7 +154,8 @@ static int __init hest_parse_ghes(struct acpi_hest_header *hest_hdr, void *data)
 	struct ghes_arr *ghes_arr = data;
 	int rc, i;
 
-	if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR)
+	if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR &&
+	    hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR_V2)
 		return 0;
 
 	if (!((struct acpi_hest_generic *)hest_hdr)->enabled)
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 720446c..68f088a 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -13,7 +13,10 @@
 #define GHES_EXITING		0x0002
 
 struct ghes {
-	struct acpi_hest_generic *generic;
+	union {
+		struct acpi_hest_generic *generic;
+		struct acpi_hest_generic_v2 *generic_v2;
+	};
 	struct acpi_hest_generic_status *estatus;
 	u64 buffer_paddr;
 	unsigned long flags;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V8 02/10] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1
  2017-02-01 17:16 ` Tyler Baicar
  (?)
@ 2017-02-01 17:16   ` Tyler Baicar
  -1 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba
  Cc: Tyler Baicar

Currently, a reported RAS error carries no timestamp.
The ACPI 6.1 spec adds a timestamp field to the generic error
data entry v3 structure. When that field is marked valid, print
the time at which the firmware generated the error.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
---
 drivers/acpi/apei/ghes.c    |  9 ++++---
 drivers/firmware/efi/cper.c | 63 +++++++++++++++++++++++++++++++++++----------
 include/acpi/ghes.h         | 22 ++++++++++++++++
 3 files changed, 77 insertions(+), 17 deletions(-)
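
For readers following the timestamp decoding outside the kernel, here is a
minimal userspace sketch. It assumes the byte layout that
cper_estatus_print_section_v300() reads in this patch: seconds, minutes,
hours, a flags byte whose bit 0 means "precise", then day, month, year and
century, with every field except the flags byte in BCD. The names
bcd_to_bin() and format_cper_timestamp() are hypothetical stand-ins and not
kernel API; the kernel code uses bcd2bin() from <linux/bcd.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's bcd2bin(): 0x59 -> 59. */
static unsigned int bcd_to_bin(uint8_t v)
{
	return (v & 0x0f) + (v >> 4) * 10;
}

/*
 * Format an 8-byte timestamp laid out as the patch reads it (assumed):
 * [0] seconds, [1] minutes, [2] hours, [3] bit 0 = precise,
 * [4] day, [5] month, [6] year, [7] century.
 * Returns what snprintf() returns.
 */
static int format_cper_timestamp(const uint8_t ts[8], char *buf, size_t len)
{
	assert(ts && buf);
	return snprintf(buf, len, "%s%02u%02u-%02u-%02u %02u:%02u:%02u",
			(ts[3] & 0x01) ? "precise " : "",
			bcd_to_bin(ts[7]), bcd_to_bin(ts[6]),
			bcd_to_bin(ts[5]), bcd_to_bin(ts[4]),
			bcd_to_bin(ts[2]), bcd_to_bin(ts[1]),
			bcd_to_bin(ts[0]));
}
```

With the bytes 59 16 17 01 01 02 17 20 (hex), the sketch produces
"precise 2017-02-01 17:16:59". Unlike the "%7s" in the patch, it does not
pad the output when the precise flag is clear.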

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 5e1ec41..b25e7cf 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -420,7 +420,8 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 	int flags = -1;
 	int sec_sev = ghes_severity(gdata->error_severity);
 	struct cper_sec_mem_err *mem_err;
-	mem_err = (struct cper_sec_mem_err *)(gdata + 1);
+
+	mem_err = acpi_hest_generic_data_payload(gdata);
 
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
 		return;
@@ -457,7 +458,8 @@ static void ghes_do_proc(struct ghes *ghes,
 		if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
 				 CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err;
-			mem_err = (struct cper_sec_mem_err *)(gdata+1);
+
+			mem_err = acpi_hest_generic_data_payload(gdata);
 			ghes_edac_report_mem_error(ghes, sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
@@ -467,7 +469,8 @@ static void ghes_do_proc(struct ghes *ghes,
 		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
 				      CPER_SEC_PCIE)) {
 			struct cper_sec_pcie *pcie_err;
-			pcie_err = (struct cper_sec_pcie *)(gdata+1);
+
+			pcie_err = acpi_hest_generic_data_payload(gdata);
 			if (sev == GHES_SEV_RECOVERABLE &&
 			    sec_sev == GHES_SEV_RECOVERABLE &&
 			    pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index d425374..8fa4e23 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -32,6 +32,9 @@
 #include <linux/acpi.h>
 #include <linux/pci.h>
 #include <linux/aer.h>
+#include <linux/printk.h>
+#include <linux/bcd.h>
+#include <acpi/ghes.h>
 
 #define INDENT_SP	" "
 
@@ -386,13 +389,37 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
 	pfx, pcie->bridge.secondary_status, pcie->bridge.control);
 }
 
+static void cper_estatus_print_section_v300(const char *pfx,
+	const struct acpi_hest_generic_data_v300 *gdata)
+{
+	__u8 hour, min, sec, day, mon, year, century, *timestamp;
+
+	if (gdata->validation_bits & ACPI_HEST_GEN_VALID_TIMESTAMP) {
+		timestamp = (__u8 *)&(gdata->time_stamp);
+		sec = bcd2bin(timestamp[0]);
+		min = bcd2bin(timestamp[1]);
+		hour = bcd2bin(timestamp[2]);
+		day = bcd2bin(timestamp[4]);
+		mon = bcd2bin(timestamp[5]);
+		year = bcd2bin(timestamp[6]);
+		century = bcd2bin(timestamp[7]);
+		printk("%stime: %7s %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
+			0x01 & *(timestamp + 3) ? "precise" : "", century,
+			year, mon, day, hour, min, sec);
+	}
+}
+
 static void cper_estatus_print_section(
-	const char *pfx, const struct acpi_hest_generic_data *gdata, int sec_no)
+	const char *pfx, struct acpi_hest_generic_data *gdata, int sec_no)
 {
 	uuid_le *sec_type = (uuid_le *)gdata->section_type;
 	__u16 severity;
 	char newpfx[64];
 
+	if (acpi_hest_generic_data_version(gdata) >= 3)
+		cper_estatus_print_section_v300(pfx,
+			(const struct acpi_hest_generic_data_v300 *)gdata);
+
 	severity = gdata->error_severity;
 	printk("%s""Error %d, type: %s\n", pfx, sec_no,
 	       cper_severity_str(severity));
@@ -403,14 +430,18 @@ static void cper_estatus_print_section(
 
 	snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
 	if (!uuid_le_cmp(*sec_type, CPER_SEC_PROC_GENERIC)) {
-		struct cper_sec_proc_generic *proc_err = (void *)(gdata + 1);
+		struct cper_sec_proc_generic *proc_err;
+
+		proc_err = acpi_hest_generic_data_payload(gdata);
 		printk("%s""section_type: general processor error\n", newpfx);
 		if (gdata->error_data_length >= sizeof(*proc_err))
 			cper_print_proc_generic(newpfx, proc_err);
 		else
 			goto err_section_too_small;
 	} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PLATFORM_MEM)) {
-		struct cper_sec_mem_err *mem_err = (void *)(gdata + 1);
+		struct cper_sec_mem_err *mem_err;
+
+		mem_err = acpi_hest_generic_data_payload(gdata);
 		printk("%s""section_type: memory error\n", newpfx);
 		if (gdata->error_data_length >=
 		    sizeof(struct cper_sec_mem_err_old))
@@ -419,7 +450,9 @@ static void cper_estatus_print_section(
 		else
 			goto err_section_too_small;
 	} else if (!uuid_le_cmp(*sec_type, CPER_SEC_PCIE)) {
-		struct cper_sec_pcie *pcie = (void *)(gdata + 1);
+		struct cper_sec_pcie *pcie;
+
+		pcie = acpi_hest_generic_data_payload(gdata);
 		printk("%s""section_type: PCIe error\n", newpfx);
 		if (gdata->error_data_length >= sizeof(*pcie))
 			cper_print_pcie(newpfx, pcie, gdata);
@@ -438,7 +471,7 @@ void cper_estatus_print(const char *pfx,
 			const struct acpi_hest_generic_status *estatus)
 {
 	struct acpi_hest_generic_data *gdata;
-	unsigned int data_len, gedata_len;
+	unsigned int data_len;
 	int sec_no = 0;
 	char newpfx[64];
 	__u16 severity;
@@ -451,12 +484,13 @@ void cper_estatus_print(const char *pfx,
 	printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
 	data_len = estatus->data_length;
 	gdata = (struct acpi_hest_generic_data *)(estatus + 1);
+
 	snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
-	while (data_len >= sizeof(*gdata)) {
-		gedata_len = gdata->error_data_length;
+
+	while (data_len >= acpi_hest_generic_data_size(gdata)) {
 		cper_estatus_print_section(newpfx, gdata, sec_no);
-		data_len -= gedata_len + sizeof(*gdata);
-		gdata = (void *)(gdata + 1) + gedata_len;
+		data_len -= acpi_hest_generic_data_record_size(gdata);
+		gdata = acpi_hest_generic_data_next(gdata);
 		sec_no++;
 	}
 }
@@ -486,12 +520,13 @@ int cper_estatus_check(const struct acpi_hest_generic_status *estatus)
 		return rc;
 	data_len = estatus->data_length;
 	gdata = (struct acpi_hest_generic_data *)(estatus + 1);
-	while (data_len >= sizeof(*gdata)) {
-		gedata_len = gdata->error_data_length;
-		if (gedata_len > data_len - sizeof(*gdata))
+
+	while (data_len >= acpi_hest_generic_data_size(gdata)) {
+		gedata_len = acpi_hest_generic_data_error_length(gdata);
+		if (gedata_len > data_len - acpi_hest_generic_data_size(gdata))
 			return -EINVAL;
-		data_len -= gedata_len + sizeof(*gdata);
-		gdata = (void *)(gdata + 1) + gedata_len;
+		data_len -= gedata_len + acpi_hest_generic_data_size(gdata);
+		gdata = acpi_hest_generic_data_next(gdata);
 	}
 	if (data_len)
 		return -EINVAL;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 68f088a..6ae318b 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -12,6 +12,18 @@
 #define GHES_TO_CLEAR		0x0001
 #define GHES_EXITING		0x0002
 
+#define acpi_hest_generic_data_error_length(gdata)	\
+	(((struct acpi_hest_generic_data *)(gdata))->error_data_length)
+#define acpi_hest_generic_data_size(gdata)		\
+	((acpi_hest_generic_data_version(gdata) >= 3) ?	\
+	sizeof(struct acpi_hest_generic_data_v300) :	\
+	sizeof(struct acpi_hest_generic_data))
+#define acpi_hest_generic_data_record_size(gdata)	\
+	(acpi_hest_generic_data_size(gdata) +		\
+	acpi_hest_generic_data_error_length(gdata))
+#define acpi_hest_generic_data_next(gdata)		\
+	((void *)(gdata) + acpi_hest_generic_data_record_size(gdata))
+
 struct ghes {
 	union {
 		struct acpi_hest_generic *generic;
@@ -73,3 +85,13 @@ static inline void ghes_edac_unregister(struct ghes *ghes)
 {
 }
 #endif
+
+#define acpi_hest_generic_data_version(gdata)			\
+	((gdata)->revision >> 8)
+
+static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data *gdata)
+{
+	return acpi_hest_generic_data_version(gdata) >= 3 ?
+		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
+		gdata + 1;
+}
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V8 03/10] efi: parse ARM processor error
  2017-02-01 17:16 ` Tyler Baicar
  (?)
@ 2017-02-01 17:16   ` Tyler Baicar
  -1 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba
  Cc: Tyler Baicar

Add support for ARM Common Platform Error Record (CPER).
The UEFI 2.6 specification adds support for ARM-specific
processor error information to be reported as part of the
CPER records. This provides more detail in processor error logs.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
---
 drivers/firmware/efi/cper.c | 133 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/cper.h        |  54 ++++++++++++++++++
 2 files changed, 187 insertions(+)
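
To illustrate the length sanity check that cper_print_proc_arm() performs
before walking the error info and context structures, here is a simplified
userspace sketch. The struct, the entry size and the function name are
hypothetical and much smaller than the real struct cper_sec_proc_arm; only
the shape of the check is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header: only the fields the check needs. */
struct demo_arm_section {
	uint32_t section_length;   /* total bytes, header included */
	uint16_t err_info_num;     /* count of fixed-size error info entries */
	uint16_t context_info_num; /* count of variable-size context entries */
};

#define DEMO_ERR_INFO_SIZE 32u /* assumed fixed entry size for this sketch */

/*
 * Return the bytes left for context and vendor data after the header and
 * the error info array, or -1 if the firmware-provided section_length
 * cannot hold even those.
 */
static long demo_arm_remaining_len(const struct demo_arm_section *sec)
{
	size_t fixed;

	assert(sec);
	fixed = sizeof(*sec) + (size_t)sec->err_info_num * DEMO_ERR_INFO_SIZE;
	if (sec->section_length < fixed)
		return -1;
	return (long)(sec->section_length - fixed);
}
```

The sketch compares before subtracting, so it does not depend on an
unsigned difference converting to a negative int.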

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 8fa4e23..c2b0a12 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 static const char * const proc_type_strs[] = {
 	"IA32/X64",
 	"IA64",
+	"ARM",
 };
 
 static const char * const proc_isa_strs[] = {
 	"IA32",
 	"IA64",
 	"X64",
+	"ARM A32/T32",
+	"ARM A64",
 };
 
 static const char * const proc_error_type_strs[] = {
@@ -139,6 +142,18 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 	"corrected",
 };
 
+static const char * const arm_reg_ctx_strs[] = {
+	"AArch32 general purpose registers",
+	"AArch32 EL1 context registers",
+	"AArch32 EL2 context registers",
+	"AArch32 secure context registers",
+	"AArch64 general purpose registers",
+	"AArch64 EL1 context registers",
+	"AArch64 EL2 context registers",
+	"AArch64 EL3 context registers",
+	"Misc. system register structure",
+};
+
 static void cper_print_proc_generic(const char *pfx,
 				    const struct cper_sec_proc_generic *proc)
 {
@@ -184,6 +199,114 @@ static void cper_print_proc_generic(const char *pfx,
 		printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
 }
 
+static void cper_print_proc_arm(const char *pfx,
+				const struct cper_sec_proc_arm *proc)
+{
+	int i, len, max_ctx_type;
+	struct cper_arm_err_info *err_info;
+	struct cper_arm_ctx_info *ctx_info;
+	char newpfx[64];
+
+	printk("%s""section length: %d\n", pfx, proc->section_length);
+	printk("%s""MIDR: 0x%016llx\n", pfx, proc->midr);
+
+	len = proc->section_length - (sizeof(*proc) +
+		proc->err_info_num * (sizeof(*err_info)));
+	if (len < 0) {
+		printk("%s""section length is too small\n", pfx);
+		printk("%s""firmware-generated error record is incorrect\n", pfx);
+		printk("%s""ERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
+		return;
+	}
+
+	if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
+		printk("%s""MPIDR: 0x%016llx\n", pfx, proc->mpidr);
+	if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
+		printk("%s""error affinity level: %d\n", pfx,
+			proc->affinity_level);
+	if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
+		printk("%s""running state: 0x%x\n", pfx, proc->running_state);
+		printk("%s""PSCI state: %d\n", pfx, proc->psci_state);
+	}
+
+	snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
+
+	err_info = (struct cper_arm_err_info *)(proc + 1);
+	for (i = 0; i < proc->err_info_num; i++) {
+		printk("%s""Error info structure %d:\n", pfx, i);
+		printk("%s""version:%d\n", newpfx, err_info->version);
+		printk("%s""length:%d\n", newpfx, err_info->length);
+		if (err_info->validation_bits &
+		    CPER_ARM_INFO_VALID_MULTI_ERR) {
+			if (err_info->multiple_error == 0)
+				printk("%s""single error\n", newpfx);
+			else if (err_info->multiple_error == 1)
+				printk("%s""multiple errors\n", newpfx);
+			else
+				printk("%s""multiple errors count:%u\n",
+				newpfx, err_info->multiple_error);
+		}
+		if (err_info->validation_bits & CPER_ARM_INFO_VALID_FLAGS) {
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_FIRST)
+				printk("%s""first error captured\n", newpfx);
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_LAST)
+				printk("%s""last error captured\n", newpfx);
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_PROPAGATED)
+				printk("%s""propagated error captured\n",
+				       newpfx);
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_OVERFLOW)
+				printk("%s""overflow occurred, error info is incomplete\n",
+				       newpfx);
+		}
+		printk("%s""error_type: %d, %s\n", newpfx, err_info->type,
+			err_info->type < ARRAY_SIZE(proc_error_type_strs) ?
+			proc_error_type_strs[err_info->type] : "unknown");
+		if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
+			printk("%s""error_info: 0x%016llx\n", newpfx,
+			       err_info->error_info);
+		if (err_info->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
+			printk("%s""virtual fault address: 0x%016llx\n",
+				newpfx, err_info->virt_fault_addr);
+		if (err_info->validation_bits &
+		    CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
+			printk("%s""physical fault address: 0x%016llx\n",
+				newpfx, err_info->physical_fault_addr);
+		err_info += 1;
+	}
+	ctx_info = (struct cper_arm_ctx_info *)err_info;
+	max_ctx_type = ARRAY_SIZE(arm_reg_ctx_strs) - 1;
+	for (i = 0; i < proc->context_info_num; i++) {
+		int size = sizeof(*ctx_info) + ctx_info->size;
+
+		printk("%s""Context info structure %d:\n", pfx, i);
+		if (len < size) {
+			printk("%s""section length is too small\n", newpfx);
+			printk("%s""firmware-generated error record is incorrect\n", pfx);
+			return;
+		}
+		if (ctx_info->type > max_ctx_type) {
+			printk("%s""Invalid context type: %d\n", newpfx,
+							ctx_info->type);
+			printk("%s""Max context type: %d\n", newpfx,
+							max_ctx_type);
+			return;
+		}
+		printk("%s""register context type %d: %s\n", newpfx,
+			ctx_info->type, arm_reg_ctx_strs[ctx_info->type]);
+		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
+				(ctx_info + 1), ctx_info->size, 0);
+		len -= size;
+		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + size);
+	}
+
+	if (len > 0) {
+		printk("%s""Vendor specific error info has %u bytes:\n", pfx,
+		       len);
+		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4, ctx_info,
+				len, 0);
+	}
+}
+
 static const char * const mem_err_type_strs[] = {
 	"unknown",
 	"no error",
@@ -458,6 +581,16 @@ static void cper_estatus_print_section(
 			cper_print_pcie(newpfx, pcie, gdata);
 		else
 			goto err_section_too_small;
+	} else if ((IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_ARM)) &&
+		   !uuid_le_cmp(*sec_type, CPER_SEC_PROC_ARM)) {
+		struct cper_sec_proc_arm *arm_err;
+
+		arm_err = acpi_hest_generic_data_payload(gdata);
+		printk("%ssection_type: ARM processor error\n", newpfx);
+		if (gdata->error_data_length >= sizeof(*arm_err))
+			cper_print_proc_arm(newpfx, arm_err);
+		else
+			goto err_section_too_small;
 	} else
 		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
 
diff --git a/include/linux/cper.h b/include/linux/cper.h
index dcacb1a..85450f3 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -180,6 +180,10 @@ enum {
 #define CPER_SEC_PROC_IPF						\
 	UUID_LE(0xE429FAF1, 0x3CB7, 0x11D4, 0x0B, 0xCA, 0x07, 0x00,	\
 		0x80, 0xC7, 0x3C, 0x88, 0x81)
+/* Processor Specific: ARM */
+#define CPER_SEC_PROC_ARM						\
+	UUID_LE(0xE19E3D16, 0xBC11, 0x11E4, 0x9C, 0xAA, 0xC2, 0x05,	\
+		0x1D, 0x5D, 0x46, 0xB0)
 /* Platform Memory */
 #define CPER_SEC_PLATFORM_MEM						\
 	UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83,	\
@@ -255,6 +259,22 @@ enum {
 
 #define CPER_PCIE_SLOT_SHIFT			3
 
+#define CPER_ARM_VALID_MPIDR			0x00000001
+#define CPER_ARM_VALID_AFFINITY_LEVEL		0x00000002
+#define CPER_ARM_VALID_RUNNING_STATE		0x00000004
+#define CPER_ARM_VALID_VENDOR_INFO		0x00000008
+
+#define CPER_ARM_INFO_VALID_MULTI_ERR		0x0001
+#define CPER_ARM_INFO_VALID_FLAGS		0x0002
+#define CPER_ARM_INFO_VALID_ERR_INFO		0x0004
+#define CPER_ARM_INFO_VALID_VIRT_ADDR		0x0008
+#define CPER_ARM_INFO_VALID_PHYSICAL_ADDR	0x0010
+
+#define CPER_ARM_INFO_FLAGS_FIRST		0x0001
+#define CPER_ARM_INFO_FLAGS_LAST		0x0002
+#define CPER_ARM_INFO_FLAGS_PROPAGATED		0x0004
+#define CPER_ARM_INFO_FLAGS_OVERFLOW		0x0008
+
 /*
  * All tables and structs must be byte-packed to match CPER
  * specification, since the tables are provided by the system BIOS
@@ -340,6 +360,40 @@ struct cper_ia_proc_ctx {
 	__u64	mm_reg_addr;
 };
 
+/* ARM Processor Error Section */
+struct cper_sec_proc_arm {
+	__u32	validation_bits;
+	__u16	err_info_num; /* Number of Processor Error Info */
+	__u16	context_info_num; /* Number of Processor Context Info Records*/
+	__u32	section_length;
+	__u8	affinity_level;
+	__u8	reserved[3];	/* must be zero */
+	__u64	mpidr;
+	__u64	midr;
+	__u32	running_state; /* Bit 0 set - Processor running. PSCI = 0 */
+	__u32	psci_state;
+};
+
+/* ARM Processor Error Information Structure */
+struct cper_arm_err_info {
+	__u8	version;
+	__u8	length;
+	__u16	validation_bits;
+	__u8	type;
+	__u16	multiple_error;
+	__u8	flags;
+	__u64	error_info;
+	__u64	virt_fault_addr;
+	__u64	physical_fault_addr;
+};
+
+/* ARM Processor Context Information Structure */
+struct cper_arm_ctx_info {
+	__u16	version;
+	__u16	type;
+	__u32	size;
+};
+
 /* Old Memory Error Section UEFI 2.1, 2.2 */
 struct cper_sec_mem_err_old {
 	__u64	validation_bits;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
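
The length accounting in cper_print_proc_arm() above can be sketched in user space as follows. This is only an illustration, not part of the patch: the helper name arm_sec_vendor_bytes() and the three size macros are made up here, the sizes are the byte-packed sizes implied by the structures added to cper.h, and a little-endian host is assumed. Unlike the patch, the sketch also checks that each context header fits before reading its size field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-packed sizes implied by the structures this patch adds to cper.h. */
#define ARM_SEC_HDR_SIZE	40u	/* struct cper_sec_proc_arm */
#define ARM_ERR_INFO_SIZE	32u	/* struct cper_arm_err_info */
#define ARM_CTX_HDR_SIZE	8u	/* struct cper_arm_ctx_info */

/*
 * Hypothetical helper (not in the patch): walk the context structures of
 * an ARM processor error section and return the number of trailing
 * vendor-specific bytes, or -1 if the record is inconsistent.
 * Assumes a little-endian host, matching the CPER byte order.
 */
static long arm_sec_vendor_bytes(const uint8_t *buf, size_t buf_len)
{
	uint16_t err_info_num, ctx_info_num;
	uint32_t section_length;
	size_t off, i;

	if (buf_len < ARM_SEC_HDR_SIZE)
		return -1;
	memcpy(&err_info_num, buf + 4, sizeof(err_info_num));
	memcpy(&ctx_info_num, buf + 6, sizeof(ctx_info_num));
	memcpy(&section_length, buf + 8, sizeof(section_length));
	if (section_length > buf_len)
		return -1;

	/* Fixed header plus the array of error info structures. */
	off = ARM_SEC_HDR_SIZE + (size_t)err_info_num * ARM_ERR_INFO_SIZE;
	if (off > section_length)
		return -1;

	/* Each context structure is an 8-byte header plus a register array. */
	for (i = 0; i < ctx_info_num; i++) {
		uint32_t reg_size;

		if (section_length - off < ARM_CTX_HDR_SIZE)
			return -1;
		memcpy(&reg_size, buf + off + 4, sizeof(reg_size));
		if (reg_size > section_length - off - ARM_CTX_HDR_SIZE)
			return -1;
		off += ARM_CTX_HDR_SIZE + reg_size;
	}

	/* Whatever is left is vendor-specific error information. */
	return (long)(section_length - off);
}
```

With one error info structure and one context structure carrying 8 bytes of registers, a section_length of 92 leaves 4 vendor-specific bytes, while a section_length of 80 is rejected because the register array no longer fits.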


* [PATCH V8 03/10] efi: parse ARM processor error
@ 2017-02-01 17:16   ` Tyler Baicar
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba, hanjun.guo,
	john.garry, shiju.jose
  Cc: Tyler Baicar

Add support for the ARM Common Platform Error Record (CPER).
The UEFI 2.6 specification adds support for ARM-specific
processor error information to be reported as part of the
CPER records. This provides more detail in processor error logs.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
---
 drivers/firmware/efi/cper.c | 133 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/cper.h        |  54 ++++++++++++++++++
 2 files changed, 187 insertions(+)
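
A note on structure layout: cper.h states that all tables and structs must be byte-packed to match the CPER specification, and the new struct cper_arm_err_info depends on that, since a __u16 follows a lone __u8. The stand-alone sketch below illustrates the difference, assuming a compiler that accepts #pragma pack. The struct names arm_err_info_packed and arm_err_info_natural are made up for this example and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same member order as struct cper_arm_err_info, byte-packed. */
#pragma pack(push, 1)
struct arm_err_info_packed {
	uint8_t  version;
	uint8_t  length;
	uint16_t validation_bits;
	uint8_t  type;
	uint16_t multiple_error;	/* lands on the odd offset 5 */
	uint8_t  flags;
	uint64_t error_info;
	uint64_t virt_fault_addr;
	uint64_t physical_fault_addr;
};
#pragma pack(pop)

/* Same members with the compiler's natural alignment, for comparison. */
struct arm_err_info_natural {
	uint8_t  version;
	uint8_t  length;
	uint16_t validation_bits;
	uint8_t  type;
	uint16_t multiple_error;	/* padded up to an even offset */
	uint8_t  flags;
	uint64_t error_info;
	uint64_t virt_fault_addr;
	uint64_t physical_fault_addr;
};

/* The packed layout adds up to the 32 bytes of the member sizes. */
static_assert(sizeof(struct arm_err_info_packed) == 32,
	      "packed error info structure should be 32 bytes");
static_assert(offsetof(struct arm_err_info_packed, multiple_error) == 5,
	      "multiple_error should directly follow type");
static_assert(offsetof(struct arm_err_info_packed, error_info) == 8,
	      "error_info should start at byte 8");
```

Without packing, a typical ABI inserts padding before multiple_error and error_info, so walking the firmware record with err_info += 1 would step by the wrong amount.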

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 8fa4e23..c2b0a12 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 static const char * const proc_type_strs[] = {
 	"IA32/X64",
 	"IA64",
+	"ARM",
 };
 
 static const char * const proc_isa_strs[] = {
 	"IA32",
 	"IA64",
 	"X64",
+	"ARM A32/T32",
+	"ARM A64",
 };
 
 static const char * const proc_error_type_strs[] = {
@@ -139,6 +142,18 @@ void cper_print_bits(const char *pfx, unsigned int bits,
 	"corrected",
 };
 
+static const char * const arm_reg_ctx_strs[] = {
+	"AArch32 general purpose registers",
+	"AArch32 EL1 context registers",
+	"AArch32 EL2 context registers",
+	"AArch32 secure context registers",
+	"AArch64 general purpose registers",
+	"AArch64 EL1 context registers",
+	"AArch64 EL2 context registers",
+	"AArch64 EL3 context registers",
+	"Misc. system register structure",
+};
+
 static void cper_print_proc_generic(const char *pfx,
 				    const struct cper_sec_proc_generic *proc)
 {
@@ -184,6 +199,114 @@ static void cper_print_proc_generic(const char *pfx,
 		printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
 }
 
+static void cper_print_proc_arm(const char *pfx,
+				const struct cper_sec_proc_arm *proc)
+{
+	int i, len, max_ctx_type;
+	struct cper_arm_err_info *err_info;
+	struct cper_arm_ctx_info *ctx_info;
+	char newpfx[64];
+
+	printk("%s""section length: %d\n", pfx, proc->section_length);
+	printk("%s""MIDR: 0x%016llx\n", pfx, proc->midr);
+
+	len = proc->section_length - (sizeof(*proc) +
+		proc->err_info_num * (sizeof(*err_info)));
+	if (len < 0) {
+		printk("%s""section length is too small\n", pfx);
+		printk("%s""firmware-generated error record is incorrect\n", pfx);
+		printk("%s""ERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
+		return;
+	}
+
+	if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
+		printk("%s""MPIDR: 0x%016llx\n", pfx, proc->mpidr);
+	if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
+		printk("%s""error affinity level: %d\n", pfx,
+			proc->affinity_level);
+	if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
+		printk("%s""running state: 0x%x\n", pfx, proc->running_state);
+		printk("%s""PSCI state: %d\n", pfx, proc->psci_state);
+	}
+
+	snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
+
+	err_info = (struct cper_arm_err_info *)(proc + 1);
+	for (i = 0; i < proc->err_info_num; i++) {
+		printk("%s""Error info structure %d:\n", pfx, i);
+		printk("%s""version:%d\n", newpfx, err_info->version);
+		printk("%s""length:%d\n", newpfx, err_info->length);
+		if (err_info->validation_bits &
+		    CPER_ARM_INFO_VALID_MULTI_ERR) {
+			if (err_info->multiple_error == 0)
+				printk("%s""single error\n", newpfx);
+			else if (err_info->multiple_error == 1)
+				printk("%s""multiple errors\n", newpfx);
+			else
+				printk("%s""multiple errors count:%u\n",
+				newpfx, err_info->multiple_error);
+		}
+		if (err_info->validation_bits & CPER_ARM_INFO_VALID_FLAGS) {
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_FIRST)
+				printk("%s""first error captured\n", newpfx);
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_LAST)
+				printk("%s""last error captured\n", newpfx);
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_PROPAGATED)
+				printk("%s""propagated error captured\n",
+				       newpfx);
+			if (err_info->flags & CPER_ARM_INFO_FLAGS_OVERFLOW)
+				printk("%s""overflow occurred, error info is incomplete\n",
+				       newpfx);
+		}
+		printk("%s""error_type: %d, %s\n", newpfx, err_info->type,
+			err_info->type < ARRAY_SIZE(proc_error_type_strs) ?
+			proc_error_type_strs[err_info->type] : "unknown");
+		if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO)
+			printk("%s""error_info: 0x%016llx\n", newpfx,
+			       err_info->error_info);
+		if (err_info->validation_bits & CPER_ARM_INFO_VALID_VIRT_ADDR)
+			printk("%s""virtual fault address: 0x%016llx\n",
+				newpfx, err_info->virt_fault_addr);
+		if (err_info->validation_bits &
+		    CPER_ARM_INFO_VALID_PHYSICAL_ADDR)
+			printk("%s""physical fault address: 0x%016llx\n",
+				newpfx, err_info->physical_fault_addr);
+		err_info += 1;
+	}
+	ctx_info = (struct cper_arm_ctx_info *)err_info;
+	max_ctx_type = ARRAY_SIZE(arm_reg_ctx_strs) - 1;
+	for (i = 0; i < proc->context_info_num; i++) {
+		int size = sizeof(*ctx_info) + ctx_info->size;
+
+		printk("%s""Context info structure %d:\n", pfx, i);
+		if (len < size) {
+			printk("%s""section length is too small\n", newpfx);
+			printk("%s""firmware-generated error record is incorrect\n", pfx);
+			return;
+		}
+		if (ctx_info->type > max_ctx_type) {
+			printk("%s""Invalid context type: %d\n", newpfx,
+							ctx_info->type);
+			printk("%s""Max context type: %d\n", newpfx,
+							max_ctx_type);
+			return;
+		}
+		printk("%s""register context type %d: %s\n", newpfx,
+			ctx_info->type, arm_reg_ctx_strs[ctx_info->type]);
+		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
+				(ctx_info + 1), ctx_info->size, 0);
+		len -= size;
+		ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + size);
+	}
+
+	if (len > 0) {
+		printk("%s""Vendor specific error info has %u bytes:\n", pfx,
+		       len);
+		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4, ctx_info,
+				len, 0);
+	}
+}
+
 static const char * const mem_err_type_strs[] = {
 	"unknown",
 	"no error",
@@ -458,6 +581,16 @@ static void cper_estatus_print_section(
 			cper_print_pcie(newpfx, pcie, gdata);
 		else
 			goto err_section_too_small;
+	} else if ((IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_ARM)) &&
+		   !uuid_le_cmp(*sec_type, CPER_SEC_PROC_ARM)) {
+		struct cper_sec_proc_arm *arm_err;
+
+		arm_err = acpi_hest_generic_data_payload(gdata);
+		printk("%ssection_type: ARM processor error\n", newpfx);
+		if (gdata->error_data_length >= sizeof(*arm_err))
+			cper_print_proc_arm(newpfx, arm_err);
+		else
+			goto err_section_too_small;
 	} else
 		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
 
diff --git a/include/linux/cper.h b/include/linux/cper.h
index dcacb1a..85450f3 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -180,6 +180,10 @@ enum {
 #define CPER_SEC_PROC_IPF						\
 	UUID_LE(0xE429FAF1, 0x3CB7, 0x11D4, 0x0B, 0xCA, 0x07, 0x00,	\
 		0x80, 0xC7, 0x3C, 0x88, 0x81)
+/* Processor Specific: ARM */
+#define CPER_SEC_PROC_ARM						\
+	UUID_LE(0xE19E3D16, 0xBC11, 0x11E4, 0x9C, 0xAA, 0xC2, 0x05,	\
+		0x1D, 0x5D, 0x46, 0xB0)
 /* Platform Memory */
 #define CPER_SEC_PLATFORM_MEM						\
 	UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83,	\
@@ -255,6 +259,22 @@ enum {
 
 #define CPER_PCIE_SLOT_SHIFT			3
 
+#define CPER_ARM_VALID_MPIDR			0x00000001
+#define CPER_ARM_VALID_AFFINITY_LEVEL		0x00000002
+#define CPER_ARM_VALID_RUNNING_STATE		0x00000004
+#define CPER_ARM_VALID_VENDOR_INFO		0x00000008
+
+#define CPER_ARM_INFO_VALID_MULTI_ERR		0x0001
+#define CPER_ARM_INFO_VALID_FLAGS		0x0002
+#define CPER_ARM_INFO_VALID_ERR_INFO		0x0004
+#define CPER_ARM_INFO_VALID_VIRT_ADDR		0x0008
+#define CPER_ARM_INFO_VALID_PHYSICAL_ADDR	0x0010
+
+#define CPER_ARM_INFO_FLAGS_FIRST		0x0001
+#define CPER_ARM_INFO_FLAGS_LAST		0x0002
+#define CPER_ARM_INFO_FLAGS_PROPAGATED		0x0004
+#define CPER_ARM_INFO_FLAGS_OVERFLOW		0x0008
+
 /*
  * All tables and structs must be byte-packed to match CPER
  * specification, since the tables are provided by the system BIOS
@@ -340,6 +360,40 @@ struct cper_ia_proc_ctx {
 	__u64	mm_reg_addr;
 };
 
+/* ARM Processor Error Section */
+struct cper_sec_proc_arm {
+	__u32	validation_bits;
+	__u16	err_info_num; /* Number of Processor Error Info */
+	__u16	context_info_num; /* Number of Processor Context Info Records*/
+	__u32	section_length;
+	__u8	affinity_level;
+	__u8	reserved[3];	/* must be zero */
+	__u64	mpidr;
+	__u64	midr;
+	__u32	running_state; /* Bit 0 set - Processor running. PSCI = 0 */
+	__u32	psci_state;
+};
+
+/* ARM Processor Error Information Structure */
+struct cper_arm_err_info {
+	__u8	version;
+	__u8	length;
+	__u16	validation_bits;
+	__u8	type;
+	__u16	multiple_error;
+	__u8	flags;
+	__u64	error_info;
+	__u64	virt_fault_addr;
+	__u64	physical_fault_addr;
+};
+
+/* ARM Processor Context Information Structure */
+struct cper_arm_ctx_info {
+	__u16	version;
+	__u16	type;
+	__u32	size;
+};
+
 /* Old Memory Error Section UEFI 2.1, 2.2 */
 struct cper_sec_mem_err_old {
 	__u64	validation_bits;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V8 04/10] arm64: exception: handle Synchronous External Abort
@ 2017-02-01 17:16   ` Tyler Baicar
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba
  Cc: Tyler Baicar

SEA exceptions are often caused by an uncorrected hardware
error. They are identified when the data abort and instruction
abort exception classes carry specific Fault Status Code
values.
When an SEA occurs, report the error in the kernel logs
before killing the process.
Update fault_info[] so that the SEA fault codes use the
new SEA handler.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/mm/fault.c | 45 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)
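
For reference, the fault status codes that this patch routes to do_sea() can be summarised with a small stand-alone sketch. It assumes that fault_info[] is indexed by the low six bits of the ESR, and it takes the entry numbers from the table in the diff below (16, 20 to 24, and 28 to 31). The names ESR_FSC_MASK and fsc_is_sea() are made up for this example and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the low six ESR bits select the fault_info[] entry. */
#define ESR_FSC_MASK	0x3fu

/*
 * Hypothetical helper: return true if the fault status code in esr is
 * one of the entries this patch switches from do_bad to do_sea.
 */
static bool fsc_is_sea(uint32_t esr)
{
	uint32_t fsc = esr & ESR_FSC_MASK;

	return fsc == 16 ||			/* synchronous external abort */
	       (fsc >= 20 && fsc <= 23) ||	/* SEA on table walk, level 0-3 */
	       fsc == 24 ||			/* synchronous parity or ECC error */
	       (fsc >= 28 && fsc <= 31);	/* parity on table walk, level 0-3 */
}
```

Entries 17 to 19 and 25 to 27 stay on do_bad, so a helper like this returns false for them.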

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 156169c..9ae7e65 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -487,6 +487,31 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 1;
 }
 
+#define SEA_FnV_MASK	0x00000400
+
+/*
+ * This abort handler deals with Synchronous External Abort.
+ * It calls notifiers, and then returns "fault".
+ */
+static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+	struct siginfo info;
+
+	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
+		 fault_name(esr), esr, addr);
+
+	info.si_signo = SIGBUS;
+	info.si_errno = 0;
+	info.si_code  = 0;
+	if (esr & SEA_FnV_MASK)
+		info.si_addr = 0;
+	else
+		info.si_addr  = (void __user *)addr;
+	arm64_notify_die("", regs, &info, esr);
+
+	return 0;
+}
+
 static const struct fault_info {
 	int	(*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
 	int	sig;
@@ -509,22 +534,22 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
+	{ do_sea,		SIGBUS,  0,		"synchronous external abort"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
+	{ do_sea,		SIGBUS,  0,		"level 0 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 1 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 2 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 3 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"synchronous parity or ECC error" },
 	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
+	{ do_sea,		SIGBUS,  0,		"level 0 synchronous parity error (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 1 synchronous parity error (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 2 synchronous parity error (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 3 synchronous parity error (translation table walk)"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 32"			},
 	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
 	{ do_bad,		SIGBUS,  0,		"unknown 34"			},
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
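
The si_addr selection in do_sea() above depends on ESR bit 10 (0x400), which the patch names SEA_FnV_MASK. The sketch below restates that choice in isolation: when the bit is set, the fault address is treated as not valid and zero is reported instead. The names SEA_FNV_MASK and sea_reported_addr() are made up for this stand-alone example, which is not part of the patch.

```c
#include <stdint.h>

/* Same value as SEA_FnV_MASK in the patch: ESR bit 10. */
#define SEA_FNV_MASK	0x00000400u

/*
 * Hypothetical helper: pick the address to report for an SEA.
 * If the FnV bit is set the fault address is not valid, so report 0;
 * otherwise pass the fault address through unchanged.
 */
static uint64_t sea_reported_addr(uint32_t esr, uint64_t fault_addr)
{
	return (esr & SEA_FNV_MASK) ? 0 : fault_addr;
}
```

A signal handler that receives SIGBUS from this path therefore cannot assume that a zero si_addr means the fault happened at address zero.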

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH V8 04/10] arm64: exception: handle Synchronous External Abort
@ 2017-02-01 17:16   ` Tyler Baicar
  0 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba, hanjun.guo,
	john.garry, shiju.jose
  Cc: Tyler Baicar

SEA exceptions are often caused by an uncorrected hardware
error. They are identified when the data abort and instruction
abort exception classes carry specific Fault Status Code
values.
When an SEA occurs, report the error in the kernel logs
before killing the process.
Update fault_info[] so that the SEA fault codes use the
new SEA handler.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/mm/fault.c | 45 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 156169c..9ae7e65 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -487,6 +487,31 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 1;
 }
 
+#define SEA_FnV_MASK	0x00000400
+
+/*
+ * This abort handler deals with Synchronous External Abort.
+ * It calls notifiers, and then returns "fault".
+ */
+static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+	struct siginfo info;
+
+	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
+		 fault_name(esr), esr, addr);
+
+	info.si_signo = SIGBUS;
+	info.si_errno = 0;
+	info.si_code  = 0;
+	if (esr & SEA_FnV_MASK)
+		info.si_addr = 0;
+	else
+		info.si_addr  = (void __user *)addr;
+	arm64_notify_die("", regs, &info, esr);
+
+	return 0;
+}
+
 static const struct fault_info {
 	int	(*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
 	int	sig;
@@ -509,22 +534,22 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
+	{ do_sea,		SIGBUS,  0,		"synchronous external abort"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
+	{ do_sea,		SIGBUS,  0,		"level 0 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 1 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 2 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 3 (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"synchronous parity or ECC error" },
 	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
+	{ do_sea,		SIGBUS,  0,		"level 0 synchronous parity error (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 1 synchronous parity error (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 2 synchronous parity error (translation table walk)"	},
+	{ do_sea,		SIGBUS,  0,		"level 3 synchronous parity error (translation table walk)"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 32"			},
 	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
 	{ do_bad,		SIGBUS,  0,		"unknown 34"			},
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
  2017-02-01 17:16 ` Tyler Baicar
  (?)
@ 2017-02-01 17:16   ` Tyler Baicar
  -1 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba
  Cc: Tyler Baicar

The ARM APEI extension proposal added the SEA (Synchronous External
Abort) notification type for ARMv8.
Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then the error source is added to
the list that the SEA exception handler processes. That way GHES will
parse and report SEA exceptions when they occur.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/Kconfig        |  2 ++
 arch/arm64/mm/fault.c     | 11 +++++++
 drivers/acpi/apei/Kconfig | 14 +++++++++
 drivers/acpi/apei/ghes.c  | 78 ++++++++++++++++++++++++++++++++++++++++++-----
 include/acpi/ghes.h       |  2 ++
 5 files changed, 100 insertions(+), 7 deletions(-)
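
To illustrate the registration flow, here is a user-space sketch of the
add/remove/notify pattern that ghes_sea_add(), ghes_sea_remove() and
ghes_notify_sea() follow. It is a simplified model only: it uses a plain
singly linked list with no RCU and no mutex, and struct sea_source and
the sea_*() helpers are hypothetical names, not part of this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ghes: one registered SEA error source. */
struct sea_source {
	int source_id;
	int hits;			/* times this source was processed */
	struct sea_source *next;
};

static struct sea_source *sea_sources;	/* plays the role of the ghes_sea list */

/* Register a source, as ghes_sea_add() does at probe time. */
void sea_source_add(struct sea_source *src)
{
	assert(src != NULL);
	src->next = sea_sources;
	sea_sources = src;
}

/* Unregister a source, as ghes_sea_remove() does; no-op if it is absent. */
void sea_source_remove(struct sea_source *src)
{
	struct sea_source **pp;

	for (pp = &sea_sources; *pp; pp = &(*pp)->next) {
		if (*pp == src) {
			*pp = src->next;
			src->next = NULL;
			return;
		}
	}
}

/* Walk every registered source, as ghes_notify_sea() does; returns the count. */
int sea_notify(void)
{
	struct sea_source *s;
	int n = 0;

	for (s = sea_sources; s; s = s->next) {
		s->hits++;		/* stands in for ghes_proc(ghes) */
		n++;
	}
	return n;
}
```

The real code differs in the part that matters most for safety: the list
is walked under nmi_enter()/nmi_exit(), and removal waits in
synchronize_rcu() before the source can be freed.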

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1117421..f92778d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -53,6 +53,8 @@ config ARM64
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
 	select HAVE_ACPI_APEI if (ACPI && EFI)
+	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
+	select HAVE_NMI if HAVE_ACPI_APEI_SEA
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_BITREVERSE
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 9ae7e65..5a5a096 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -30,6 +30,7 @@
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 #include <linux/preempt.h>
+#include <linux/hardirq.h>
 
 #include <asm/bug.h>
 #include <asm/cpufeature.h>
@@ -41,6 +42,8 @@
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 
+#include <acpi/ghes.h>
+
 static const char *fault_name(unsigned int esr);
 
 #ifdef CONFIG_KPROBES
@@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
 		 fault_name(esr), esr, addr);
 
+	/*
+	 * synchronize_rcu() will wait for nmi_exit(), so no need to
+	 * rcu_read_lock().
+	 */
+	nmi_enter();
+	ghes_notify_sea();
+	nmi_exit();
+
 	info.si_signo = SIGBUS;
 	info.si_errno = 0;
 	info.si_code  = 0;
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b0140c8..3786ff1 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
 config HAVE_ACPI_APEI_NMI
 	bool
 
+config HAVE_ACPI_APEI_SEA
+	bool "APEI Synchronous External Abort logging/recovering support"
+	depends on ARM64
+	help
+	  This option should be enabled if the system supports
+	  firmware-first handling of SEA (Synchronous External Abort).
+	  SEA happens with certain faults of data abort or instruction
+	  abort synchronous exceptions on ARMv8 systems. If a system
+	  supports firmware-first handling of SEA, the platform analyzes
+	  and handles hardware error notifications with SEA, and it may then
+	  form a HW error record for the OS to parse and handle. This
+	  option allows the OS to look for such a HW error record and
+	  take appropriate action.
+
 config ACPI_APEI
 	bool "ACPI Platform Error Interface (APEI)"
 	select MISC_FILESYSTEMS
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index b25e7cf..8756172 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -114,11 +114,7 @@
  * Two virtual pages are used, one for IRQ/PROCESS context, the other for
  * NMI context (optionally).
  */
-#ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #define GHES_IOREMAP_PAGES           2
-#else
-#define GHES_IOREMAP_PAGES           1
-#endif
 #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
 #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
 
@@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
 
 static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
 {
-	unsigned long vaddr;
+	unsigned long vaddr, paddr;
+	pgprot_t prot;
 
 	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
-	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
-			   pfn << PAGE_SHIFT, PAGE_KERNEL);
+
+	paddr = pfn << PAGE_SHIFT;
+	prot = arch_apei_get_mem_attribute(paddr);
+	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
 
 	return (void __iomem *)vaddr;
 }
@@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
 	.notifier_call = ghes_notify_sci,
 };
 
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+static LIST_HEAD(ghes_sea);
+
+void ghes_notify_sea(void)
+{
+	struct ghes *ghes;
+
+	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
+		ghes_proc(ghes);
+	}
+}
+
+static int ghes_sea_add(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_add_rcu(&ghes->list, &ghes_sea);
+	mutex_unlock(&ghes_list_mutex);
+	return 0;
+}
+
+static void ghes_sea_remove(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_del_rcu(&ghes->list);
+	mutex_unlock(&ghes_list_mutex);
+	synchronize_rcu();
+}
+#else /* CONFIG_HAVE_ACPI_APEI_SEA */
+static inline int ghes_sea_add(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+	return -ENOTSUPP;
+}
+
+static inline void ghes_sea_remove(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+}
+#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
+
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
  * printk is not safe in NMI context.  So in NMI handler, we allocate
@@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_EXTERNAL:
 	case ACPI_HEST_NOTIFY_SCI:
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
+				generic->header.source_id);
+			rc = -ENOTSUPP;
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
 			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
 			   generic->header.source_id);
 		goto err;
+	case ACPI_HEST_NOTIFY_GPIO:
+	case ACPI_HEST_NOTIFY_SEI:
+	case ACPI_HEST_NOTIFY_GSIV:
+		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
+			generic->header.source_id, generic->notify.type);
+		rc = -ENOTSUPP;
+		goto err;
 	default:
 		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
 			   generic->notify.type, generic->header.source_id);
@@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		list_add_rcu(&ghes->list, &ghes_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		rc = ghes_sea_add(ghes);
+		if (rc)
+			goto err_edac_unreg;
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
@@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 			unregister_acpi_hed_notifier(&ghes_notifier_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		ghes_sea_remove(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 6ae318b..adf5455 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
 		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
 		gdata + 1;
 }
+
+void ghes_notify_sea(void);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
@ 2017-02-01 17:16   ` Tyler Baicar
  0 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba, hanjun.guo,
	john.garry, shiju.jose
  Cc: Tyler Baicar

ARM APEI extension proposal added SEA (Synchronous External Abort)
notification type for ARMv8.
Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then this function can be registered
into the SEA exception handler. That way GHES will parse and report
SEA exceptions when they occur.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/Kconfig        |  2 ++
 arch/arm64/mm/fault.c     | 11 +++++++
 drivers/acpi/apei/Kconfig | 14 +++++++++
 drivers/acpi/apei/ghes.c  | 78 ++++++++++++++++++++++++++++++++++++++++++-----
 include/acpi/ghes.h       |  2 ++
 5 files changed, 100 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1117421..f92778d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -53,6 +53,8 @@ config ARM64
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
 	select HAVE_ACPI_APEI if (ACPI && EFI)
+	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
+	select HAVE_NMI if HAVE_ACPI_APEI_SEA
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_BITREVERSE
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 9ae7e65..5a5a096 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -30,6 +30,7 @@
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 #include <linux/preempt.h>
+#include <linux/hardirq.h>
 
 #include <asm/bug.h>
 #include <asm/cpufeature.h>
@@ -41,6 +42,8 @@
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 
+#include <acpi/ghes.h>
+
 static const char *fault_name(unsigned int esr);
 
 #ifdef CONFIG_KPROBES
@@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
 		 fault_name(esr), esr, addr);
 
+	/*
+	 * synchronize_rcu() will wait for nmi_exit(), so no need to
+	 * rcu_read_lock().
+	 */
+	nmi_enter();
+	ghes_notify_sea();
+	nmi_exit();
+
 	info.si_signo = SIGBUS;
 	info.si_errno = 0;
 	info.si_code  = 0;
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b0140c8..3786ff1 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
 config HAVE_ACPI_APEI_NMI
 	bool
 
+config HAVE_ACPI_APEI_SEA
+	bool "APEI Synchronous External Abort logging/recovering support"
+	depends on ARM64
+	help
+	  This option should be enabled if the system supports
+	  firmware first handling of SEA (Synchronous External Abort).
+	  SEA happens with certain faults of data abort or instruction
+	  abort synchronous exceptions on ARMv8 systems. If a system
+	  supports firmware first handling of SEA, the platform analyzes
+	  and handles hardware error notifications with SEA, and it may then
+	  form a HW error record for the OS to parse and handle. This
+	  option allows the OS to look for such HW error record, and
+	  take appropriate action.
+
 config ACPI_APEI
 	bool "ACPI Platform Error Interface (APEI)"
 	select MISC_FILESYSTEMS
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index b25e7cf..8756172 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -114,11 +114,7 @@
  * Two virtual pages are used, one for IRQ/PROCESS context, the other for
  * NMI context (optionally).
  */
-#ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #define GHES_IOREMAP_PAGES           2
-#else
-#define GHES_IOREMAP_PAGES           1
-#endif
 #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
 #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
 
@@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
 
 static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
 {
-	unsigned long vaddr;
+	unsigned long vaddr, paddr;
+	pgprot_t prot;
 
 	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
-	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
-			   pfn << PAGE_SHIFT, PAGE_KERNEL);
+
+	paddr = pfn << PAGE_SHIFT;
+	prot = arch_apei_get_mem_attribute(paddr);
+	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
 
 	return (void __iomem *)vaddr;
 }
@@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
 	.notifier_call = ghes_notify_sci,
 };
 
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+static LIST_HEAD(ghes_sea);
+
+void ghes_notify_sea(void)
+{
+	struct ghes *ghes;
+
+	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
+		ghes_proc(ghes);
+	}
+}
+
+static int ghes_sea_add(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_add_rcu(&ghes->list, &ghes_sea);
+	mutex_unlock(&ghes_list_mutex);
+	return 0;
+}
+
+static void ghes_sea_remove(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_del_rcu(&ghes->list);
+	mutex_unlock(&ghes_list_mutex);
+	synchronize_rcu();
+}
+#else /* CONFIG_HAVE_ACPI_APEI_SEA */
+static inline int ghes_sea_add(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+	return -ENOTSUPP;
+}
+
+static inline void ghes_sea_remove(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+}
+#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
+
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
  * printk is not safe in NMI context.  So in NMI handler, we allocate
@@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_EXTERNAL:
 	case ACPI_HEST_NOTIFY_SCI:
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
+				generic->header.source_id);
+			rc = -ENOTSUPP;
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
 			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
 			   generic->header.source_id);
 		goto err;
+	case ACPI_HEST_NOTIFY_GPIO:
+	case ACPI_HEST_NOTIFY_SEI:
+	case ACPI_HEST_NOTIFY_GSIV:
+		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
+			generic->header.source_id, generic->header.source_id);
+		rc = -ENOTSUPP;
+		goto err;
 	default:
 		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
 			   generic->notify.type, generic->header.source_id);
@@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		list_add_rcu(&ghes->list, &ghes_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		rc = ghes_sea_add(ghes);
+		if (rc)
+			goto err_edac_unreg;
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
@@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 			unregister_acpi_hed_notifier(&ghes_notifier_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		ghes_sea_remove(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 6ae318b..adf5455 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
 		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
 		gdata + 1;
 }
+
+void ghes_notify_sea(void);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
@ 2017-02-01 17:16   ` Tyler Baicar
  0 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

ARM APEI extension proposal added SEA (Synchronous External Abort)
notification type for ARMv8.
Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then this function can be registered
into the SEA exception handler. That way GHES will parse and report
SEA exceptions when they occur.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/Kconfig        |  2 ++
 arch/arm64/mm/fault.c     | 11 +++++++
 drivers/acpi/apei/Kconfig | 14 +++++++++
 drivers/acpi/apei/ghes.c  | 78 ++++++++++++++++++++++++++++++++++++++++++-----
 include/acpi/ghes.h       |  2 ++
 5 files changed, 100 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1117421..f92778d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -53,6 +53,8 @@ config ARM64
 	select HANDLE_DOMAIN_IRQ
 	select HARDIRQS_SW_RESEND
 	select HAVE_ACPI_APEI if (ACPI && EFI)
+	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
+	select HAVE_NMI if HAVE_ACPI_APEI_SEA
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_BITREVERSE
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 9ae7e65..5a5a096 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -30,6 +30,7 @@
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 #include <linux/preempt.h>
+#include <linux/hardirq.h>
 
 #include <asm/bug.h>
 #include <asm/cpufeature.h>
@@ -41,6 +42,8 @@
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 
+#include <acpi/ghes.h>
+
 static const char *fault_name(unsigned int esr);
 
 #ifdef CONFIG_KPROBES
@@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
 		 fault_name(esr), esr, addr);
 
+	/*
+	 * synchronize_rcu() will wait for nmi_exit(), so no need to
+	 * rcu_read_lock().
+	 */
+	nmi_enter();
+	ghes_notify_sea();
+	nmi_exit();
+
 	info.si_signo = SIGBUS;
 	info.si_errno = 0;
 	info.si_code  = 0;
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index b0140c8..3786ff1 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
 config HAVE_ACPI_APEI_NMI
 	bool
 
+config HAVE_ACPI_APEI_SEA
+	bool "APEI Synchronous External Abort logging/recovering support"
+	depends on ARM64
+	help
+	  This option should be enabled if the system supports
+	  firmware first handling of SEA (Synchronous External Abort).
+	  SEA happens with certain faults of data abort or instruction
+	  abort synchronous exceptions on ARMv8 systems. If a system
+	  supports firmware first handling of SEA, the platform analyzes
+	  and handles hardware error notifications with SEA, and it may then
+	  form a HW error record for the OS to parse and handle. This
+	  option allows the OS to look for such HW error record, and
+	  take appropriate action.
+
 config ACPI_APEI
 	bool "ACPI Platform Error Interface (APEI)"
 	select MISC_FILESYSTEMS
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index b25e7cf..8756172 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -114,11 +114,7 @@
  * Two virtual pages are used, one for IRQ/PROCESS context, the other for
  * NMI context (optionally).
  */
-#ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #define GHES_IOREMAP_PAGES           2
-#else
-#define GHES_IOREMAP_PAGES           1
-#endif
 #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
 #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
 
@@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
 
 static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
 {
-	unsigned long vaddr;
+	unsigned long vaddr, paddr;
+	pgprot_t prot;
 
 	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
-	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
-			   pfn << PAGE_SHIFT, PAGE_KERNEL);
+
+	paddr = pfn << PAGE_SHIFT;
+	prot = arch_apei_get_mem_attribute(paddr);
+	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
 
 	return (void __iomem *)vaddr;
 }
@@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
 	.notifier_call = ghes_notify_sci,
 };
 
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+static LIST_HEAD(ghes_sea);
+
+void ghes_notify_sea(void)
+{
+	struct ghes *ghes;
+
+	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
+		ghes_proc(ghes);
+	}
+}
+
+static int ghes_sea_add(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_add_rcu(&ghes->list, &ghes_sea);
+	mutex_unlock(&ghes_list_mutex);
+	return 0;
+}
+
+static void ghes_sea_remove(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_del_rcu(&ghes->list);
+	mutex_unlock(&ghes_list_mutex);
+	synchronize_rcu();
+}
+#else /* CONFIG_HAVE_ACPI_APEI_SEA */
+static inline int ghes_sea_add(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+	return -ENOTSUPP;
+}
+
+static inline void ghes_sea_remove(struct ghes *ghes)
+{
+	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
+	       ghes->generic->header.source_id);
+}
+#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
+
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
  * printk is not safe in NMI context.  So in NMI handler, we allocate
@@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_EXTERNAL:
 	case ACPI_HEST_NOTIFY_SCI:
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
+				generic->header.source_id);
+			rc = -ENOTSUPP;
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
 			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
 			   generic->header.source_id);
 		goto err;
+	case ACPI_HEST_NOTIFY_GPIO:
+	case ACPI_HEST_NOTIFY_SEI:
+	case ACPI_HEST_NOTIFY_GSIV:
+		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
+			generic->header.source_id, generic->header.source_id);
+		rc = -ENOTSUPP;
+		goto err;
 	default:
 		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
 			   generic->notify.type, generic->header.source_id);
@@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
 		list_add_rcu(&ghes->list, &ghes_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		rc = ghes_sea_add(ghes);
+		if (rc)
+			goto err_edac_unreg;
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
@@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 			unregister_acpi_hed_notifier(&ghes_notifier_sci);
 		mutex_unlock(&ghes_list_mutex);
 		break;
+	case ACPI_HEST_NOTIFY_SEA:
+		ghes_sea_remove(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 6ae318b..adf5455 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
 		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
 		gdata + 1;
 }
+
+void ghes_notify_sea(void);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V8 06/10] acpi: apei: panic OS with fatal error status block
  2017-02-01 17:16 ` Tyler Baicar
  (?)
@ 2017-02-01 17:16   ` Tyler Baicar
  -1 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba
  Cc: Tyler Baicar

From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>

Even when an error status block's severity is fatal, the kernel does
not honor the severity level and does not panic.

With the firmware-first model, the platform can inform the OS about a
fatal hardware error through a non-NMI GHES notification type. The OS
should panic when it receives a hardware error record with this
severity.

If the severity is fatal, call panic() after the CPER data in the error
status block is printed and before any error section is handled.
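
For illustration only, the panic decision described in this message can
be modelled in user space roughly as follows. This is a minimal sketch,
not the kernel code: the sk_* names and the enum values are hypothetical
stand-ins for the CPER and GHES severity constants, and treating an
unknown severity as fatal is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the firmware (CPER) severity codes. */
enum sk_cper_sev {
	SK_CPER_RECOVERABLE,
	SK_CPER_FATAL,
	SK_CPER_CORRECTED,
	SK_CPER_INFORMATIONAL,
};

/* Hypothetical stand-ins for the ordered GHES severity levels. */
enum sk_ghes_sev {
	SK_GHES_NO,
	SK_GHES_CORRECTED,
	SK_GHES_RECOVERABLE,
	SK_GHES_PANIC,
};

/* Map a firmware severity onto an ordered scale so ">=" is meaningful. */
static int sk_ghes_severity(int cper_sev)
{
	switch (cper_sev) {
	case SK_CPER_INFORMATIONAL:
		return SK_GHES_NO;
	case SK_CPER_CORRECTED:
		return SK_GHES_CORRECTED;
	case SK_CPER_RECOVERABLE:
		return SK_GHES_RECOVERABLE;
	case SK_CPER_FATAL:
		return SK_GHES_PANIC;
	default:
		/* Assumption of this sketch: unknown means fatal. */
		return SK_GHES_PANIC;
	}
}

/* The check made on the whole status block before sections are handled. */
static bool sk_should_panic(int block_cper_sev)
{
	return sk_ghes_severity(block_cper_sev) >= SK_GHES_PANIC;
}

/* A timeout of 0 is replaced by the GHES default; other values are kept. */
static int sk_effective_timeout(int panic_timeout, int ghes_panic_timeout)
{
	return panic_timeout == 0 ? ghes_panic_timeout : panic_timeout;
}
```

With these definitions, sk_should_panic(SK_CPER_FATAL) is true and
sk_effective_timeout(0, 30) yields 30, which mirrors the default of
ghes_panic_timeout in the patch.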

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 drivers/acpi/apei/ghes.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 8756172..86c1f15 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -133,6 +133,8 @@
 static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
+static int ghes_panic_timeout __read_mostly = 30;
+
 static int ghes_ioremap_init(void)
 {
 	ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
@@ -687,6 +689,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
 	return rc;
 }
 
+static void __ghes_call_panic(void)
+{
+	if (panic_timeout == 0)
+		panic_timeout = ghes_panic_timeout;
+	panic("Fatal hardware error!");
+}
+
 static int ghes_proc(struct ghes *ghes)
 {
 	int rc;
@@ -698,6 +707,10 @@ static int ghes_proc(struct ghes *ghes)
 		if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))
 			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
 	}
+	if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
+		__ghes_call_panic();
+	}
+
 	ghes_do_proc(ghes, ghes->estatus);
 
 	if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
@@ -828,8 +841,6 @@ static inline void ghes_sea_remove(struct ghes *ghes)
 
 static LIST_HEAD(ghes_nmi);
 
-static int ghes_panic_timeout	__read_mostly = 30;
-
 static void ghes_proc_in_irq(struct irq_work *irq_work)
 {
 	struct llist_node *llnode, *next;
@@ -922,9 +933,7 @@ static void __ghes_panic(struct ghes *ghes)
 	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
 
 	/* reboot to log the error! */
-	if (panic_timeout == 0)
-		panic_timeout = ghes_panic_timeout;
-	panic("Fatal hardware error!");
+	__ghes_call_panic();
 }
 
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V8 07/10] efi: print unrecognized CPER section
  2017-02-01 17:16 ` Tyler Baicar
  (?)
@ 2017-02-01 17:16   ` Tyler Baicar
  -1 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba
  Cc: Tyler Baicar

The UEFI spec allows non-standard sections in a Common Platform Error
Record. This is defined in section N.2.3 of UEFI version 2.5.

Currently, if the CPER section's type (UUID) does not match any of the
section types that the kernel knows how to parse, the section is
skipped. As a result, the user cannot see such CPER data, for example
the error record of a non-standard section.

For this case, print the raw data in hex to the dmesg buffer. The data
length is taken from the Error Data Length field of the Generic Error
Data Entry.

The following is sample output from dmesg:
[  115.771702] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[  115.779042] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[  115.787456] {1}[Hardware Error]: event severity: corrected
[  115.792927] {1}[Hardware Error]:  Error 0, type: corrected
[  115.798415] {1}[Hardware Error]:  fru_id: 00000000-0000-0000-0000-000000000000
[  115.805596] {1}[Hardware Error]:  fru_text:
[  115.816105] {1}[Hardware Error]:  section type: d2e2621c-f936-468d-0d84-15a4ed015c8b
[  115.823880] {1}[Hardware Error]:  section length: 88
[  115.828779] {1}[Hardware Error]:   00000000: 01000001 00000002 5f434345 525f4543
[  115.836153] {1}[Hardware Error]:   00000010: 0000574d 00000000 00000000 00000000
[  115.843531] {1}[Hardware Error]:   00000020: 00000000 00000000 00000000 00000000
[  115.850908] {1}[Hardware Error]:   00000030: 00000000 00000000 00000000 00000000
[  115.858288] {1}[Hardware Error]:   00000040: fe800000 00000000 00000004 5f434345
[  115.865665] {1}[Hardware Error]:   00000050: 525f4543 0000574d
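
The row layout in the sample output (an offset followed by up to four
32-bit words) can be reproduced in user space with a small helper. This
is a sketch under stated assumptions, not the kernel's print_hex_dump():
sk_hex_row() is a hypothetical name, it formats a single row into a
caller-supplied buffer, and it only accepts rows that are a whole number
of 4-byte words.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one row of at most 16 bytes as native-endian 32-bit words,
 * prefixed with the byte offset of the row.
 * Returns the number of characters written, or -1 if the row is empty,
 * too long, not a multiple of 4 bytes, or does not fit in 'out'.
 */
static int sk_hex_row(char *out, size_t outsz, size_t offset,
		      const uint8_t *buf, size_t len)
{
	size_t used, i;
	int n;

	if (len == 0 || len > 16 || len % 4 != 0)
		return -1;

	n = snprintf(out, outsz, "%08zx:", offset);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	used = (size_t)n;

	for (i = 0; i < len; i += 4) {
		uint32_t word;

		/* memcpy avoids unaligned access on the raw byte buffer */
		memcpy(&word, buf + i, sizeof(word));
		n = snprintf(out + used, outsz - used, " %08x",
			     (unsigned int)word);
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;
		used += (size_t)n;
	}

	return (int)used;
}
```

Calling sk_hex_row() once per 16 bytes of payload, with the offset
advanced by 16 each time, gives rows in the same shape as the dump
above. The kernel output additionally carries the log prefix on every
line.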

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
---
 drivers/firmware/efi/cper.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index c2b0a12..48cb8ee 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -591,8 +591,16 @@ static void cper_estatus_print_section(
 			cper_print_proc_arm(newpfx, arm_err);
 		else
 			goto err_section_too_small;
-	} else
-		printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
+	} else {
+		const void *unknown_err;
+
+		unknown_err = acpi_hest_generic_data_payload(gdata);
+		printk("%ssection type: %pUl\n", newpfx, sec_type);
+		printk("%ssection length: %u\n", newpfx,
+		       gdata->error_data_length);
+		print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
+			       unknown_err, gdata->error_data_length, 0);
+	}
 
 	return;
 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V8 08/10] ras: acpi / apei: generate trace event for unrecognized CPER section
  2017-02-01 17:16 ` Tyler Baicar
  (?)
@ 2017-02-01 17:16   ` Tyler Baicar
  -1 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba
  Cc: Tyler Baicar

The UEFI spec allows non-standard sections in a Common Platform Error
Record. This is defined in section N.2.3 of UEFI version 2.5.

Currently, if the CPER section's type (UUID) does not match any
section type that the kernel knows how to parse, no trace event is
generated for that section. As a result, the user has no way of knowing
that such a hardware error occurred, including errors reported in
non-standard sections.

Generate a trace event that contains the raw error data for each
unrecognized CPER section.
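
As a rough user-space model of the fallback described above, the
section type is compared against the known types and anything else is
recorded with its raw payload. This is only a sketch: the sk_* names
are hypothetical, the "known" type values are placeholders rather than
real CPER GUIDs, and the fixed 64-byte buffer (with truncation) stands
in for the dynamically sized array used by the trace event.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_UUID_LEN 16

/* Placeholder section types; these are NOT the real CPER GUIDs. */
static const uint8_t sk_known_types[][SK_UUID_LEN] = {
	{ 0x01 },	/* stands in for the memory error section */
	{ 0x02 },	/* stands in for the PCIe error section */
};

/* What the sketch records for a section it cannot parse. */
struct sk_unknown_event {
	uint8_t sec_type[SK_UUID_LEN];
	uint8_t sev;
	uint32_t len;
	uint8_t buf[64];
};

static bool sk_is_known(const uint8_t *type)
{
	size_t i;

	for (i = 0; i < sizeof(sk_known_types) / sizeof(sk_known_types[0]); i++)
		if (!memcmp(type, sk_known_types[i], SK_UUID_LEN))
			return true;
	return false;
}

/*
 * Returns true if an "unknown section" event was recorded in 'ev',
 * false if the section type is known and would be handled elsewhere.
 */
static bool sk_report_section(const uint8_t *type, uint8_t sev,
			      const uint8_t *payload, uint32_t len,
			      struct sk_unknown_event *ev)
{
	if (sk_is_known(type))
		return false;

	/* Sketch simplification: truncate instead of sizing dynamically. */
	if (len > sizeof(ev->buf))
		len = sizeof(ev->buf);

	memcpy(ev->sec_type, type, SK_UUID_LEN);
	ev->sev = sev;
	ev->len = len;
	memcpy(ev->buf, payload, len);
	return true;
}
```

A consumer of such a record has the section type, the severity, and
the raw bytes, which is the same information the new trace event is
meant to hand to user space.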

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
---
 drivers/acpi/apei/ghes.c | 22 ++++++++++++++++++++--
 drivers/ras/ras.c        |  1 +
 include/ras/ras_event.h  | 45 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 86c1f15..a989345 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -44,11 +44,13 @@
 #include <linux/pci.h>
 #include <linux/aer.h>
 #include <linux/nmi.h>
+#include <linux/uuid.h>
 
 #include <acpi/actbl1.h>
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
+#include <ras/ras_event.h>
 
 #include "apei-internal.h"
 
@@ -452,11 +454,21 @@ static void ghes_do_proc(struct ghes *ghes,
 {
 	int sev, sec_sev;
 	struct acpi_hest_generic_data *gdata;
+	uuid_le sec_type;
+	uuid_le *fru_id = &NULL_UUID_LE;
+	char *fru_text = "";
 
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
 		sec_sev = ghes_severity(gdata->error_severity);
-		if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
+		sec_type = *(uuid_le *)gdata->section_type;
+
+		if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+			fru_id = (uuid_le *)gdata->fru_id;
+		if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+			fru_text = gdata->fru_text;
+
+		if (!uuid_le_cmp(sec_type,
 				 CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err;
 
@@ -467,7 +479,7 @@ static void ghes_do_proc(struct ghes *ghes,
 			ghes_handle_memory_failure(gdata, sev);
 		}
 #ifdef CONFIG_ACPI_APEI_PCIEAER
-		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
+		else if (!uuid_le_cmp(sec_type,
 				      CPER_SEC_PCIE)) {
 			struct cper_sec_pcie *pcie_err;
 
@@ -500,6 +512,12 @@ static void ghes_do_proc(struct ghes *ghes,
 
 		}
 #endif
+		else {
+			void *unknown_err = acpi_hest_generic_data_payload(gdata);
+			trace_unknown_sec_event(&sec_type,
+					fru_id, fru_text, sec_sev,
+					unknown_err, gdata->error_data_length);
+		}
 	}
 }
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index b67dd36..fb2500b 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -27,3 +27,4 @@ static int __init ras_init(void)
 EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 #endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
+EXPORT_TRACEPOINT_SYMBOL_GPL(unknown_sec_event);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 1791a12..5861b6f 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -162,6 +162,51 @@
 );
 
 /*
+ * Unknown Section Report
+ *
+ * This event is generated when the hardware reports an error in a
+ * section the kernel does not recognize: either a non-standard
+ * section as defined in the UEFI spec appendix "Common Platform
+ * Error Record", or a section for which no TRACE_EVENT is defined.
+ *
+ */
+TRACE_EVENT(unknown_sec_event,
+
+	TP_PROTO(const uuid_le *sec_type,
+		 const uuid_le *fru_id,
+		 const char *fru_text,
+		 const u8 sev,
+		 const u8 *err,
+		 const u32 len),
+
+	TP_ARGS(sec_type, fru_id, fru_text, sev, err, len),
+
+	TP_STRUCT__entry(
+		__array(char, sec_type, 16)
+		__array(char, fru_id, 16)
+		__string(fru_text, fru_text)
+		__field(u8, sev)
+		__field(u32, len)
+		__dynamic_array(u8, buf, len)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->sec_type, sec_type, sizeof(uuid_le));
+		memcpy(__entry->fru_id, fru_id, sizeof(uuid_le));
+		__assign_str(fru_text, fru_text);
+		__entry->sev = sev;
+		__entry->len = len;
+		memcpy(__get_dynamic_array(buf), err, len);
+	),
+
+	TP_printk("severity: %d; sec type:%pU; FRU: %pU %s; data len:%d; raw data:%s",
+		  __entry->sev, __entry->sec_type,
+		  __entry->fru_id, __get_str(fru_text),
+		  __entry->len,
+		  __print_hex(__get_dynamic_array(buf), __entry->len))
+);
+
+/*
  * PCIe AER Trace event
  *
  * These events are generated when hardware detects a corrected or
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V8 08/10] ras: acpi / apei: generate trace event for unrecognized CPER section
@ 2017-02-01 17:16   ` Tyler Baicar
  0 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

The UEFI spec allows non-standard sections in a Common Platform Error
Record. This is defined in section N.2.3 of UEFI version 2.5.

Currently, if the CPER section's type (UUID) does not match any
section type that the kernel knows how to parse, no trace event is
generated for that section. As a result, the user has no way of knowing
that such a hardware error occurred, including errors reported in
non-standard sections.

Generate a trace event that contains the raw error data for each
unrecognized CPER section.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
---
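For illustration only (not part of the patch), the fallback added to
ghes_do_proc() can be sketched in userspace C: compare the section type
against the known types and, if none matches, keep the raw payload as a
hex string. The names sec_uuid, classify_section() and hex_dump(), and
the two demo UUID values, are assumptions made up for this sketch. They
are not kernel symbols, and the UUIDs are not the real CPER GUIDs.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical 16-byte section type, standing in for uuid_le. */
struct sec_uuid { uint8_t b[16]; };

enum sec_kind { SEC_MEM, SEC_PCIE, SEC_UNKNOWN };

/* Placeholder values for illustration only; not the real CPER GUIDs. */
static const struct sec_uuid demo_mem_uuid  = { { 0x01 } };
static const struct sec_uuid demo_pcie_uuid = { { 0x02 } };

/* Classify a section by comparing its type against the known ones. */
static enum sec_kind classify_section(const struct sec_uuid *type)
{
	if (!memcmp(type, &demo_mem_uuid, sizeof(*type)))
		return SEC_MEM;
	if (!memcmp(type, &demo_pcie_uuid, sizeof(*type)))
		return SEC_PCIE;
	return SEC_UNKNOWN;	/* caller reports the raw payload instead */
}

/*
 * Render a raw payload as space-separated hex bytes, loosely modeled on
 * what __print_hex() shows in the trace output. Returns the number of
 * payload bytes rendered; stops early if the output buffer is too small.
 */
static size_t hex_dump(const uint8_t *buf, size_t len, char *out, size_t outsz)
{
	size_t i, pos = 0;

	if (outsz)
		out[0] = '\0';
	for (i = 0; i < len; i++) {
		/* "xx" or " xx", plus room for the terminating NUL */
		size_t need = i ? 4 : 3;

		if (outsz - pos < need)
			break;
		pos += (size_t)snprintf(out + pos, outsz - pos, "%s%02x",
					i ? " " : "", buf[i]);
	}
	return i;
}
```

A caller would run classify_section() first and call hex_dump() on the
payload only for SEC_UNKNOWN, which corresponds to the final else branch
in the hunk below.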
 drivers/acpi/apei/ghes.c | 22 ++++++++++++++++++++--
 drivers/ras/ras.c        |  1 +
 include/ras/ras_event.h  | 45 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 86c1f15..a989345 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -44,11 +44,13 @@
 #include <linux/pci.h>
 #include <linux/aer.h>
 #include <linux/nmi.h>
+#include <linux/uuid.h>
 
 #include <acpi/actbl1.h>
 #include <acpi/ghes.h>
 #include <acpi/apei.h>
 #include <asm/tlbflush.h>
+#include <ras/ras_event.h>
 
 #include "apei-internal.h"
 
@@ -452,11 +454,21 @@ static void ghes_do_proc(struct ghes *ghes,
 {
 	int sev, sec_sev;
 	struct acpi_hest_generic_data *gdata;
+	uuid_le sec_type;
+	uuid_le *fru_id = &NULL_UUID_LE;
+	char *fru_text = "";
 
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
 		sec_sev = ghes_severity(gdata->error_severity);
-		if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
+		sec_type = *(uuid_le *)gdata->section_type;
+
+		if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+			fru_id = (uuid_le *)gdata->fru_id;
+		if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+			fru_text = gdata->fru_text;
+
+		if (!uuid_le_cmp(sec_type,
 				 CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err;
 
@@ -467,7 +479,7 @@ static void ghes_do_proc(struct ghes *ghes,
 			ghes_handle_memory_failure(gdata, sev);
 		}
 #ifdef CONFIG_ACPI_APEI_PCIEAER
-		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
+		else if (!uuid_le_cmp(sec_type,
 				      CPER_SEC_PCIE)) {
 			struct cper_sec_pcie *pcie_err;
 
@@ -500,6 +512,12 @@ static void ghes_do_proc(struct ghes *ghes,
 
 		}
 #endif
+		else {
+			void *unknown_err = acpi_hest_generic_data_payload(gdata);
+			trace_unknown_sec_event(&sec_type,
+					fru_id, fru_text, sec_sev,
+					unknown_err, gdata->error_data_length);
+		}
 	}
 }
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index b67dd36..fb2500b 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -27,3 +27,4 @@ static int __init ras_init(void)
 EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 #endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
+EXPORT_TRACEPOINT_SYMBOL_GPL(unknown_sec_event);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 1791a12..5861b6f 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -162,6 +162,51 @@
 );
 
 /*
+ * Unknown Section Report
+ *
+ * This event is generated when hardware detected a hardware
+ * error event, which may be of non-standard section as defined
+ * in UEFI spec appendix "Common Platform Error Record", or may
+ * be of sections for which TRACE_EVENT is not defined.
+ *
+ */
+TRACE_EVENT(unknown_sec_event,
+
+	TP_PROTO(const uuid_le *sec_type,
+		 const uuid_le *fru_id,
+		 const char *fru_text,
+		 const u8 sev,
+		 const u8 *err,
+		 const u32 len),
+
+	TP_ARGS(sec_type, fru_id, fru_text, sev, err, len),
+
+	TP_STRUCT__entry(
+		__array(char, sec_type, 16)
+		__array(char, fru_id, 16)
+		__string(fru_text, fru_text)
+		__field(u8, sev)
+		__field(u32, len)
+		__dynamic_array(u8, buf, len)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->sec_type, sec_type, sizeof(uuid_le));
+		memcpy(__entry->fru_id, fru_id, sizeof(uuid_le));
+		__assign_str(fru_text, fru_text);
+		__entry->sev = sev;
+		__entry->len = len;
+		memcpy(__get_dynamic_array(buf), err, len);
+	),
+
+	TP_printk("severity: %d; sec type:%pU; FRU: %pU %s; data len:%d; raw data:%s",
+		  __entry->sev, __entry->sec_type,
+		  __entry->fru_id, __get_str(fru_text),
+		  __entry->len,
+		  __print_hex(__get_dynamic_array(buf), __entry->len))
+);
+
+/*
  * PCIe AER Trace event
  *
  * These events are generated when hardware detects a corrected or
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V8 09/10] trace, ras: add ARM processor error trace event
  2017-02-01 17:16 ` Tyler Baicar
  (?)
@ 2017-02-01 17:16   ` Tyler Baicar
  -1 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba
  Cc: Tyler Baicar

Currently there are trace events for the various RAS
errors with the exception of ARM processor type errors.
Add a new trace event for such errors so that the user
will know when they occur. These trace events are
consistent with the ARM processor error section type
defined in UEFI 2.6 spec section N.2.4.4.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
---
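For illustration only (not part of the patch), the text that the
TP_printk() format in this patch would produce for a given record can be
sketched in userspace C. The struct arm_event_rec and the function
format_arm_event() are hypothetical names for this sketch; only the
field names and the format string are taken from the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace mirror of the fields recorded by arm_event. */
struct arm_event_rec {
	uint64_t mpidr;
	uint64_t midr;
	uint32_t running_state;
	uint32_t psci_state;
	uint8_t affinity;
};

/* Format a record using the same layout as the TP_printk() string. */
static int format_arm_event(const struct arm_event_rec *r,
			    char *out, size_t outsz)
{
	return snprintf(out, outsz,
			"affinity level: %d; MPIDR: %016" PRIx64
			"; MIDR: %016" PRIx64 "; "
			"running state: %d; PSCI state: %d",
			r->affinity, r->mpidr, r->midr,
			(int)r->running_state, (int)r->psci_state);
}
```

A tool that parses the trace text could use a helper like this to
generate test input without needing hardware that raises the error.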
 drivers/acpi/apei/ghes.c    |  7 ++++++-
 drivers/firmware/efi/cper.c |  1 +
 drivers/ras/ras.c           |  1 +
 include/ras/ras_event.h     | 34 ++++++++++++++++++++++++++++++++++
 4 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index a989345..013faf0 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -512,7 +512,12 @@ static void ghes_do_proc(struct ghes *ghes,
 
 		}
 #endif
-		else {
+		else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM)) {
+			struct cper_sec_proc_arm *arm_err;
+
+			arm_err = acpi_hest_generic_data_payload(gdata);
+			trace_arm_event(arm_err);
+		} else {
 			void *unknown_err = acpi_hest_generic_data_payload(gdata);
 			trace_unknown_sec_event(&sec_type,
 					fru_id, fru_text, sec_sev,
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 48cb8ee..0ec678e 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -35,6 +35,7 @@
 #include <linux/printk.h>
 #include <linux/bcd.h>
 #include <acpi/ghes.h>
+#include <ras/ras_event.h>
 
 #define INDENT_SP	" "
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index fb2500b..8ba5a94 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -28,3 +28,4 @@ static int __init ras_init(void)
 #endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
 EXPORT_TRACEPOINT_SYMBOL_GPL(unknown_sec_event);
+EXPORT_TRACEPOINT_SYMBOL_GPL(arm_event);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 5861b6f..b36db48 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -162,6 +162,40 @@
 );
 
 /*
+ * ARM Processor Events Report
+ *
+ * This event is generated when hardware detects an ARM processor error
+ * has occurred. UEFI 2.6 spec section N.2.4.4.
+ */
+TRACE_EVENT(arm_event,
+
+	TP_PROTO(const struct cper_sec_proc_arm *proc),
+
+	TP_ARGS(proc),
+
+	TP_STRUCT__entry(
+		__field(u64, mpidr)
+		__field(u64, midr)
+		__field(u32, running_state)
+		__field(u32, psci_state)
+		__field(u8, affinity)
+	),
+
+	TP_fast_assign(
+		__entry->affinity = proc->affinity_level;
+		__entry->mpidr = proc->mpidr;
+		__entry->midr = proc->midr;
+		__entry->running_state = proc->running_state;
+		__entry->psci_state = proc->psci_state;
+	),
+
+	TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
+		  "running state: %d; PSCI state: %d",
+		  __entry->affinity, __entry->mpidr, __entry->midr,
+		  __entry->running_state, __entry->psci_state)
+);
+
+/*
  * Unknown Section Report
  *
  * This event is generated when hardware detected a hardware
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* [PATCH V8 10/10] arm/arm64: KVM: add guest SEA support
  2017-02-01 17:16 ` Tyler Baicar
  (?)
@ 2017-02-01 17:16   ` Tyler Baicar
  -1 siblings, 0 replies; 97+ messages in thread
From: Tyler Baicar @ 2017-02-01 17:16 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba
  Cc: Tyler Baicar

Currently, the guest abort handling does not support external aborts.
Add handling for synchronous external aborts (SEAs) so that the host
kernel reports SEAs that occur in the guest kernel.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
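For illustration only (not part of the patch), the new ordering of the
fault status checks in kvm_handle_guest_abort() can be sketched in
userspace C. The FSC_* values are the 32-bit ARM ones from this patch;
enum abort_action and classify_guest_abort() are hypothetical names for
this sketch. It models only the classification, not what the handler
does after a guest SEA has been reported.

```c
/* Fault status codes as defined for 32-bit ARM in this patch. */
#define FSC_FAULT	0x04
#define FSC_ACCESS	0x08
#define FSC_PERM	0x0c
#define FSC_EXTABT	0x10

enum abort_action {
	ABORT_HOST_SEA,		/* report through the host's SEA path */
	ABORT_STAGE2,		/* normal stage-2 fault handling */
	ABORT_UNSUPPORTED,	/* rejected, as before the patch */
};

/* External aborts are checked first; the old test becomes the else-if. */
static enum abort_action classify_guest_abort(unsigned int fault_status)
{
	if (fault_status == FSC_EXTABT)
		return ABORT_HOST_SEA;
	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
	    fault_status != FSC_ACCESS)
		return ABORT_UNSUPPORTED;
	return ABORT_STAGE2;
}
```

Before the patch, FSC_EXTABT fell into the unsupported case because it
is none of FSC_FAULT, FSC_PERM or FSC_ACCESS.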
 arch/arm/include/asm/kvm_arm.h       |  1 +
 arch/arm/include/asm/system_misc.h   |  5 +++++
 arch/arm/kvm/mmu.c                   | 18 ++++++++++++++++--
 arch/arm64/include/asm/kvm_arm.h     |  1 +
 arch/arm64/include/asm/system_misc.h |  2 ++
 arch/arm64/mm/fault.c                | 16 ++++++++++++++++
 6 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index e22089f..33a77509 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -187,6 +187,7 @@
 #define FSC_FAULT	(0x04)
 #define FSC_ACCESS	(0x08)
 #define FSC_PERM	(0x0c)
+#define FSC_EXTABT	(0x10)
 
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK	(~0xf)
diff --git a/arch/arm/include/asm/system_misc.h b/arch/arm/include/asm/system_misc.h
index a3d61ad..ea45d94 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -24,4 +24,9 @@
 
 #endif /* !__ASSEMBLY__ */
 
+static inline int handle_guest_sea(unsigned long addr, unsigned int esr)
+{
+	return -1;
+}
+
 #endif /* __ASM_ARM_SYSTEM_MISC_H */
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index a5265ed..04f1dd50 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -29,6 +29,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/virt.h>
+#include <asm/system_misc.h>
 
 #include "trace.h"
 
@@ -1444,8 +1445,21 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	/* Check the stage-2 fault is trans. fault or write fault */
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
-	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
-	    fault_status != FSC_ACCESS) {
+
+	/* The host kernel will handle the synchronous external abort. There
+	 * is no need to pass the error into the guest.
+	 */
+	if (fault_status == FSC_EXTABT) {
+		if (handle_guest_sea((unsigned long)fault_ipa,
+				     kvm_vcpu_get_hsr(vcpu))) {
+			kvm_err("Failed to handle guest SEA, FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
+				kvm_vcpu_trap_get_class(vcpu),
+				(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
+				(unsigned long)kvm_vcpu_get_hsr(vcpu));
+			return -EFAULT;
+		}
+	} else if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
+		   fault_status != FSC_ACCESS) {
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 2a2752b..2b11d59 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -201,6 +201,7 @@
 #define FSC_FAULT	ESR_ELx_FSC_FAULT
 #define FSC_ACCESS	ESR_ELx_FSC_ACCESS
 #define FSC_PERM	ESR_ELx_FSC_PERM
+#define FSC_EXTABT	ESR_ELx_FSC_EXTABT
 
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK	(~UL(0xf))
diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index bc81243..5b2cecd1 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -58,4 +58,6 @@ void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
 
 #endif	/* __ASSEMBLY__ */
 
+int handle_guest_sea(unsigned long addr, unsigned int esr);
+
 #endif	/* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 5a5a096..edd0c4f 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -602,6 +602,22 @@ static const char *fault_name(unsigned int esr)
 }
 
 /*
+ * Handle Synchronous External Aborts that occur in a guest kernel.
+ */
+int handle_guest_sea(unsigned long addr, unsigned int esr)
+{
+	/*
+	 * synchronize_rcu() will wait for nmi_exit(), so no need to
+	 * rcu_read_lock().
+	 */
+	nmi_enter();
+	ghes_notify_sea();
+	nmi_exit();
+
+	return 0;
+}
+
+/*
  * Dispatch a data abort to the relevant handler.
  */
 asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* Re: [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
  2017-02-01 17:16   ` Tyler Baicar
  (?)
  (?)
@ 2017-02-01 22:26       ` kbuild test robot
  -1 siblings, 0 replies; 97+ messages in thread
From: kbuild test robot @ 2017-02-01 22:26 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: kbuild-all, christoffer.dall, marc.zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, rjw, lenb, matt,
	robert.moore, lv.zheng, nkaje, zjzhang, mark.rutland,
	james.morse, akpm, eun.taik.lee, sandeepa.s.prabhu, labbott,
	shijie.huang, rruigrok, paul.gortmaker, tn, fu.wei, rostedt,
	bristot, linux-arm-kernel, kvmarm, kvm, linux-kernel, linux-acpi

[-- Attachment #1: Type: text/plain, Size: 1505 bytes --]

Hi Tyler,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.10-rc6 next-20170201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Tyler-Baicar/Add-UEFI-2-6-and-ACPI-6-1-updates-for-RAS-on-ARM64/20170202-020320
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   arch/arm64/mm/built-in.o: In function `do_sea':
>> arch/arm64/mm/fault.c:511: undefined reference to `ghes_notify_sea'
   arch/arm64/mm/fault.c:511:(.text+0x1868): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ghes_notify_sea'

vim +511 arch/arm64/mm/fault.c

   505	
   506		/*
   507		 * synchronize_rcu() will wait for nmi_exit(), so no need to
   508		 * rcu_read_lock().
   509		 */
   510		nmi_enter();
 > 511		ghes_notify_sea();
   512		nmi_exit();
   513	
   514		info.si_signo = SIGBUS;
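
A note on the failure above: the linker reports that do_sea() calls
ghes_notify_sea() but no object in this arm64 defconfig build defines it.
One plausible reading (an assumption, the report itself does not say so) is
that the caller is always compiled while the definition lives in a file that
is only built when the relevant APEI option is enabled. A common way to keep
such a caller linkable is a static inline stub in the header. The sketch
below shows only that pattern; DEMO_HAVE_SEA_NOTIFIER, demo_notify_sea() and
demo_handle_sea() are hypothetical names, not symbols from this series.

```c
/*
 * DEMO_HAVE_SEA_NOTIFIER is a hypothetical stand-in for whichever
 * Kconfig symbol causes the real notifier to be built.
 */
#ifdef DEMO_HAVE_SEA_NOTIFIER
int demo_notify_sea(void);	/* defined in another object file */
#else
/* Stub: keeps callers linkable when the notifier is configured out. */
static inline int demo_notify_sea(void)
{
	return -1;	/* no handler available */
}
#endif

/* Caller that is always built, analogous to an arch fault handler. */
int demo_handle_sea(void)
{
	int handled = demo_notify_sea();

	/* 0: the notifier consumed the error, 1: use default handling */
	return handled == 0 ? 0 : 1;
}
```

With the macro left undefined, as here, demo_handle_sea() links against the
stub and reports that default handling is needed.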

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33928 bytes --]

* Re: [PATCH V8 08/10] ras: acpi / apei: generate trace event for unrecognized CPER section
  2017-02-01 17:16   ` Tyler Baicar
  (?)
  (?)
@ 2017-02-01 23:20     ` kbuild test robot
  -1 siblings, 0 replies; 97+ messages in thread
From: kbuild test robot @ 2017-02-01 23:20 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: kbuild-all, christoffer.dall, marc.zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, rjw, lenb, matt,
	robert.moore, lv.zheng, nkaje, zjzhang, mark.rutland,
	james.morse, akpm, eun.taik.lee, sandeepa.s.prabhu, labbott,
	shijie.huang, rruigrok, paul.gortmaker, tn, fu.wei, rostedt,
	bristot, linux-arm-kernel, kvmarm, kvm, linux-kernel, linux-acpi

[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]

Hi Tyler,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.10-rc6 next-20170201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Tyler-Baicar/Add-UEFI-2-6-and-ACPI-6-1-updates-for-RAS-on-ARM64/20170202-020320
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: x86_64-randconfig-r0-02020102 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `ghes_do_proc.isra.9':
>> ghes.c:(.text+0x92f3a): undefined reference to `__tracepoint_unknown_sec_event'
   ghes.c:(.text+0x92f73): undefined reference to `__tracepoint_unknown_sec_event'
   ghes.c:(.text+0x92fdb): undefined reference to `__tracepoint_unknown_sec_event'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25871 bytes --]

* Re: [PATCH V8 09/10] trace, ras: add ARM processor error trace event
  2017-02-01 17:16   ` Tyler Baicar
  (?)
  (?)
@ 2017-02-02  2:34     ` kbuild test robot
  -1 siblings, 0 replies; 97+ messages in thread
From: kbuild test robot @ 2017-02-02  2:34 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: kbuild-all, christoffer.dall, marc.zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, rjw, lenb, matt,
	robert.moore, lv.zheng, nkaje, zjzhang, mark.rutland,
	james.morse, akpm, eun.taik.lee, sandeepa.s.prabhu, labbott,
	shijie.huang, rruigrok, paul.gortmaker, tn, fu.wei, rostedt,
	bristot, linux-arm-kernel, kvmarm, kvm, linux-kernel, linux-acpi

[-- Attachment #1: Type: text/plain, Size: 1353 bytes --]

Hi Tyler,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.10-rc6 next-20170201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Tyler-Baicar/Add-UEFI-2-6-and-ACPI-6-1-updates-for-RAS-on-ARM64/20170202-020320
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: x86_64-randconfig-r0-02020102 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/built-in.o: In function `ghes_do_proc.isra.9':
>> ghes.c:(.text+0x92fda): undefined reference to `__tracepoint_arm_event'
   ghes.c:(.text+0x9300a): undefined reference to `__tracepoint_arm_event'
   ghes.c:(.text+0x93054): undefined reference to `__tracepoint_arm_event'
   ghes.c:(.text+0x93083): undefined reference to `__tracepoint_unknown_sec_event'
   ghes.c:(.text+0x930b8): undefined reference to `__tracepoint_unknown_sec_event'
   ghes.c:(.text+0x93121): undefined reference to `__tracepoint_unknown_sec_event'
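
A note on the errors above: these are link-time undefined references, not
compile-time errors, so the trace_*() calls were declared when ghes.c was
compiled but the __tracepoint_* objects were never emitted into the image.
In the kernel, the storage for a tracepoint is emitted only by the one
source file that defines CREATE_TRACE_POINTS before including the event
header. If that file is not built in this randconfig (an assumption, the
report does not state it), every caller fails to link in this way. The
sketch below imitates the declare/define split in plain C; all DEMO_* and
demo_* names are hypothetical and are not the kernel's macros.

```c
/* Hypothetical stand-in for a tracepoint descriptor. */
struct demo_tracepoint {
	const char *name;
	int hits;
};

/*
 * What every includer of the (hypothetical) event header sees: an
 * extern declaration plus an inline caller. No storage is emitted.
 */
#define DEMO_DECLARE_EVENT(ev)					\
	extern struct demo_tracepoint __demo_tracepoint_##ev;	\
	static inline void demo_trace_##ev(void)		\
	{							\
		__demo_tracepoint_##ev.hits++;			\
	}

/* What exactly one translation unit must expand in addition. */
#define DEMO_DEFINE_EVENT(ev)					\
	struct demo_tracepoint __demo_tracepoint_##ev = { #ev, 0 }

DEMO_DECLARE_EVENT(arm_event_demo)

/*
 * Dropping this line leaves only the extern declaration, and the
 * call below then fails with an undefined reference at link time.
 */
DEMO_DEFINE_EVENT(arm_event_demo);

int demo_fire_twice(void)
{
	demo_trace_arm_event_demo();
	demo_trace_arm_event_demo();
	return __demo_tracepoint_arm_event_demo.hits;
}
```

In this sketch the fix is to make sure the defining expansion is always
linked whenever a caller is; making the caller conditional on the same
option would be the other way out.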

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25871 bytes --]

* Re: [PATCH V8 09/10] trace, ras: add ARM processor error trace event
  2017-02-01 17:16   ` Tyler Baicar
  (?)
@ 2017-02-02  3:15     ` Steven Rostedt
  -1 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-02-02  3:15 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: linux-efi, kvm, matt, catalin.marinas, will.deacon, robert.moore,
	paul.gortmaker, lv.zheng, kvmarm, fu.wei, zjzhang, linux,
	linux-acpi, eun.taik.lee, shijie.huang, labbott, lenb, harba,
	john.garry, marc.zyngier, punit.agrawal, nkaje,
	sandeepa.s.prabhu, linux-arm-kernel, devel, rjw, rruigrok,
	linux-kernel, astone, hanjun.guo, pbonzini, akpm, bristot,
	shiju.jose

On Wed,  1 Feb 2017 10:16:52 -0700
Tyler Baicar <tbaicar@codeaurora.org> wrote:

> Currently there are trace events for the various RAS
> errors with the exception of ARM processor type errors.
> Add a new trace event for such errors so that the user
> will know when they occur. These trace events are
> consistent with the ARM processor error section type
> defined in UEFI 2.6 spec section N.2.4.4.
> 
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> Acked-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  drivers/acpi/apei/ghes.c    |  7 ++++++-
>  drivers/firmware/efi/cper.c |  1 +
>  drivers/ras/ras.c           |  1 +
>  include/ras/ras_event.h     | 34 ++++++++++++++++++++++++++++++++++
>  4 files changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index a989345..013faf0 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -512,7 +512,12 @@ static void ghes_do_proc(struct ghes *ghes,
>  
>  		}
>  #endif
> -		else {
> +		else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM)) {
> +			struct cper_sec_proc_arm *arm_err;
> +
> +			arm_err = acpi_hest_generic_data_payload(gdata);
> +			trace_arm_event(arm_err);

According to the kbuild failure, I'm guessing this file requires a:

 #include <ras/ras_event.h>

-- Steve

> +		} else {
>  			void *unknown_err = acpi_hest_generic_data_payload(gdata);
>  			trace_unknown_sec_event(&sec_type,
>  					fru_id, fru_text, sec_sev,
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 48cb8ee..0ec678e 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -35,6 +35,7 @@
>  #include <linux/printk.h>
>  #include <linux/bcd.h>
>  #include <acpi/ghes.h>
> +#include <ras/ras_event.h>
>  
>  #define INDENT_SP	" "
>  
> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index fb2500b..8ba5a94 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -28,3 +28,4 @@ static int __init ras_init(void)
>  #endif
>  EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(unknown_sec_event);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(arm_event);
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index 5861b6f..b36db48 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -162,6 +162,40 @@
>  );
>  
>  /*
> + * ARM Processor Events Report
> + *
> + * This event is generated when hardware detects an ARM processor error
> + * has occurred. UEFI 2.6 spec section N.2.4.4.
> + */
> +TRACE_EVENT(arm_event,
> +
> +	TP_PROTO(const struct cper_sec_proc_arm *proc),
> +
> +	TP_ARGS(proc),
> +
> +	TP_STRUCT__entry(
> +		__field(u64, mpidr)
> +		__field(u64, midr)
> +		__field(u32, running_state)
> +		__field(u32, psci_state)
> +		__field(u8, affinity)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->affinity = proc->affinity_level;
> +		__entry->mpidr = proc->mpidr;
> +		__entry->midr = proc->midr;
> +		__entry->running_state = proc->running_state;
> +		__entry->psci_state = proc->psci_state;
> +	),
> +
> +	TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
> +		  "running state: %d; PSCI state: %d",
> +		  __entry->affinity, __entry->mpidr, __entry->midr,
> +		  __entry->running_state, __entry->psci_state)
> +);
> +
> +/*
>   * Unknown Section Report
>   *
>   * This event is generated when hardware detected a hardware

* Re: [PATCH V8 09/10] trace, ras: add ARM processor error trace event
@ 2017-02-02  3:15     ` Steven Rostedt
  0 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-02-02  3:15 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, bristot, linux-arm-kernel, kvmarm,
	kvm, linux-kernel, linux-acpi, linux-efi, devel, Suzuki.Poulose,
	punit.agrawal, astone, harba, hanjun.guo, john.garry, shiju.jose

On Wed,  1 Feb 2017 10:16:52 -0700
Tyler Baicar <tbaicar@codeaurora.org> wrote:

> Currently there are trace events for the various RAS
> errors with the exception of ARM processor type errors.
> Add a new trace event for such errors so that the user
> will know when they occur. These trace events are
> consistent with the ARM processor error section type
> defined in UEFI 2.6 spec section N.2.4.4.
> 
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> Acked-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  drivers/acpi/apei/ghes.c    |  7 ++++++-
>  drivers/firmware/efi/cper.c |  1 +
>  drivers/ras/ras.c           |  1 +
>  include/ras/ras_event.h     | 34 ++++++++++++++++++++++++++++++++++
>  4 files changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index a989345..013faf0 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -512,7 +512,12 @@ static void ghes_do_proc(struct ghes *ghes,
>  
>  		}
>  #endif
> -		else {
> +		else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM)) {
> +			struct cper_sec_proc_arm *arm_err;
> +
> +			arm_err = acpi_hest_generic_data_payload(gdata);
> +			trace_arm_event(arm_err);

According to the kbuild failure, I'm guessing this file requires a:

 #include <ras/ras_event.h>

-- Steve

> +		} else {
>  			void *unknown_err = acpi_hest_generic_data_payload(gdata);
>  			trace_unknown_sec_event(&sec_type,
>  					fru_id, fru_text, sec_sev,
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 48cb8ee..0ec678e 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -35,6 +35,7 @@
>  #include <linux/printk.h>
>  #include <linux/bcd.h>
>  #include <acpi/ghes.h>
> +#include <ras/ras_event.h>
>  
>  #define INDENT_SP	" "
>  
> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index fb2500b..8ba5a94 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -28,3 +28,4 @@ static int __init ras_init(void)
>  #endif
>  EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(unknown_sec_event);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(arm_event);
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index 5861b6f..b36db48 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -162,6 +162,40 @@
>  );
>  
>  /*
> + * ARM Processor Events Report
> + *
> + * This event is generated when hardware detects an ARM processor error
> + * has occurred. UEFI 2.6 spec section N.2.4.4.
> + */
> +TRACE_EVENT(arm_event,
> +
> +	TP_PROTO(const struct cper_sec_proc_arm *proc),
> +
> +	TP_ARGS(proc),
> +
> +	TP_STRUCT__entry(
> +		__field(u64, mpidr)
> +		__field(u64, midr)
> +		__field(u32, running_state)
> +		__field(u32, psci_state)
> +		__field(u8, affinity)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->affinity = proc->affinity_level;
> +		__entry->mpidr = proc->mpidr;
> +		__entry->midr = proc->midr;
> +		__entry->running_state = proc->running_state;
> +		__entry->psci_state = proc->psci_state;
> +	),
> +
> +	TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
> +		  "running state: %d; PSCI state: %d",
> +		  __entry->affinity, __entry->mpidr, __entry->midr,
> +		  __entry->running_state, __entry->psci_state)
> +);
> +
> +/*
>   * Unknown Section Report
>   *
>   * This event is generated when hardware detected a hardware

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 04/10] arm64: exception: handle Synchronous External Abort
  2017-02-01 17:16   ` Tyler Baicar
  (?)
@ 2017-02-03 15:59     ` James Morse
  -1 siblings, 0 replies; 97+ messages in thread
From: James Morse @ 2017-02-03 15:59 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: linux-efi, kvm, matt, catalin.marinas, will.deacon, robert.moore,
	paul.gortmaker, lv.zheng, kvmarm, fu.wei, zjzhang, linux,
	linux-acpi, eun.taik.lee, shijie.huang, labbott, lenb, harba,
	john.garry, marc.zyngier, punit.agrawal, rostedt, nkaje,
	sandeepa.s.prabhu, linux-arm-kernel, devel, rjw, rruigrok,
	linux-kernel, astone, hanjun.guo, pbonzini, akpm, bristot,
	shiju.jose

Hi Tyler,

On 01/02/17 17:16, Tyler Baicar wrote:
> SEA exceptions are often caused by an uncorrected hardware
> error, and are handled when data abort and instruction abort
> exception classes have specific values for their Fault Status
> Code.
> When SEA occurs, before killing the process, report the error
> in the kernel logs.
> Update fault_info[] with specific SEA faults so that the
> new SEA handler is used.

> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 156169c..9ae7e65 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -487,6 +487,31 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>  	return 1;
>  }
>  
> +#define SEA_FnV_MASK	0x00000400

There is a glut of ESR_ELx_ macros in arch/arm64/include/asm/esr.h; could this
be fitted in there in a similar format?


--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -83,6 +83,7 @@
 #define ESR_ELx_WNR            (UL(1) << 6)

 /* Shared ISS field definitions for Data/Instruction aborts */
+#define ESR_ELx_FnV            (UL(1) << 10)
 #define ESR_ELx_EA             (UL(1) << 9)
 #define ESR_ELx_S1PTW          (UL(1) << 7)
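
The suggested macro names the same bit as SEA_FnV_MASK in the patch. As a
minimal user-space sketch (not kernel code), the check can be modelled as
below. UL() is spelled out as a plain 1UL, and sea_report_addr() is a
hypothetical helper introduced only for illustration; it mirrors how do_sea()
chooses si_addr.

```c
#include <assert.h>
#include <stdint.h>

/* User-space spelling of the suggested esr.h definition. */
#define ESR_ELx_FnV	(1UL << 10)

/* The patch's SEA_FnV_MASK (0x00000400) selects the same bit. */
static_assert(ESR_ELx_FnV == 0x00000400UL, "FnV is ISS bit 10");

/*
 * Hypothetical helper, not part of the patch: choose the address to
 * report. When FnV ("FAR not Valid") is set, the fault address cannot
 * be trusted, so 0 is reported, as do_sea() does for si_addr.
 */
static uint64_t sea_report_addr(uint32_t esr, uint64_t far)
{
	return (esr & ESR_ELx_FnV) ? 0 : far;
}
```

Calling sea_report_addr() with an ESR value that has bit 10 set yields 0, and
any other ESR value passes the fault address through unchanged.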


> +
> +/*
> + * This abort handler deals with Synchronous External Abort.
> + * It calls notifiers, and then returns "fault".
> + */
> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
> +{
> +	struct siginfo info;
> +
> +	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
> +		 fault_name(esr), esr, addr);
> +
> +	info.si_signo = SIGBUS;
> +	info.si_errno = 0;
> +	info.si_code  = 0;
> +	if (esr & SEA_FnV_MASK)
> +		info.si_addr = 0;
> +	else
> +		info.si_addr  = (void __user *)addr;
> +	arm64_notify_die("", regs, &info, esr);
> +
> +	return 0;
> +}
> +
>  static const struct fault_info {
>  	int	(*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
>  	int	sig;
> @@ -509,22 +534,22 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
> +	{ do_sea,		SIGBUS,  0,		"synchronous external abort"	},

This will print:
> Synchronous External Abort: synchronous external abort

It looks odd, but I can't think of anything better to put there.


>  	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
>  	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
>  	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
> +	{ do_sea,		SIGBUS,  0,		"level 0 (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  0,		"level 1 (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  0,		"level 2 (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  0,		"level 3 (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  0,		"synchronous parity or ECC error" },
>  	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
>  	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
>  	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
> +	{ do_sea,		SIGBUS,  0,		"level 0 synchronous parity error (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  0,		"level 1 synchronous parity error (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  0,		"level 2 synchronous parity error (translation table walk)"	},
> +	{ do_sea,		SIGBUS,  0,		"level 3 synchronous parity error (translation table walk)"	},
>  	{ do_bad,		SIGBUS,  0,		"unknown 32"			},
>  	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
>  	{ do_bad,		SIGBUS,  0,		"unknown 34"			},
> 


With the ESR_ELx_FnV change above,
Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
  2017-02-01 17:16   ` Tyler Baicar
  (?)
@ 2017-02-03 16:00     ` James Morse
  -1 siblings, 0 replies; 97+ messages in thread
From: James Morse @ 2017-02-03 16:00 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: linux-efi, kvm, matt, catalin.marinas, will.deacon, robert.moore,
	paul.gortmaker, lv.zheng, kvmarm, fu.wei, zjzhang, linux,
	linux-acpi, eun.taik.lee, shijie.huang, labbott, lenb, harba,
	john.garry, marc.zyngier, punit.agrawal, rostedt, nkaje,
	sandeepa.s.prabhu, linux-arm-kernel, devel, rjw, rruigrok,
	linux-kernel, astone, hanjun.guo, pbonzini, akpm, bristot,
	shiju.jose

Hi Tyler,

On 01/02/17 17:16, Tyler Baicar wrote:
> ARM APEI extension proposal added SEA (Synchronous External Abort)
> notification type for ARMv8.
> Add a new GHES error source handling function for SEA. If an error
> source's notification type is SEA, then this function can be registered
> into the SEA exception handler. That way GHES will parse and report
> SEA exceptions when they occur.

It's worth adding details of the other things this patch changes, just to alert
busy reviewers. Something like:
An SEA can interrupt code that had interrupts masked and is treated as an NMI.
To aid this, the page of address space for mapping APEI buffers while in_nmi() is
always reserved, and ghes_ioremap_pfn_nmi() is changed to use the helper methods
to find the pgprot_t to map with, in the same way as ghes_ioremap_pfn_irq().


> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1117421..f92778d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -53,6 +53,8 @@ config ARM64
>  	select HANDLE_DOMAIN_IRQ
>  	select HARDIRQS_SW_RESEND
>  	select HAVE_ACPI_APEI if (ACPI && EFI)
> +	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)

> +	select HAVE_NMI if HAVE_ACPI_APEI_SEA

Nit: This section of the file is largely in alphabetical order, can we try to
keep it that way?!


>  	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>  	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_BITREVERSE
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 9ae7e65..5a5a096 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -30,6 +30,7 @@
>  #include <linux/highmem.h>
>  #include <linux/perf_event.h>
>  #include <linux/preempt.h>

> +#include <linux/hardirq.h>

This header is already included by this file further up.


>  
>  #include <asm/bug.h>
>  #include <asm/cpufeature.h>
> @@ -41,6 +42,8 @@
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
>  
> +#include <acpi/ghes.h>
> +
>  static const char *fault_name(unsigned int esr);
>  
>  #ifdef CONFIG_KPROBES
> @@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>  	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>  		 fault_name(esr), esr, addr);
>  

> +	/*
> +	 * synchronize_rcu() will wait for nmi_exit(), so no need to
> +	 * rcu_read_lock().
> +	 */

This comment belongs next to the use of RCU in ghes_notify_sea(), but there
should be something here to explain the surprising use of NMI. Something like:
Synchronous aborts may interrupt code which had interrupts masked. Before
calling out into the wider kernel, tell the interested subsystems.


This should be wrapped in:
if (IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {

> +	nmi_enter();
> +	ghes_notify_sea();
> +	nmi_exit();

}

To avoid breaking systems that don't have SEA configured.
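
The shape of that guard can be sketched in user space as follows. This is a
mock under stated assumptions, not the kernel implementation: SEA_ENABLED
stands in for IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA), the three functions are
stubs that only record the order in which they are called, and
sea_notify_guarded() is a hypothetical name for the guarded region of do_sea().

```c
#include <stddef.h>

/* Stand-in for IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA); assumed on here. */
#ifndef SEA_ENABLED
#define SEA_ENABLED 1
#endif

/* Call log so the ordering of the mocked calls can be inspected. */
static char call_log[8];
static size_t call_len;

static void log_call(char c)
{
	if (call_len < sizeof(call_log) - 1)
		call_log[call_len++] = c;
}

/* Mocks of the kernel functions; each only records that it ran. */
static void nmi_enter(void)       { log_call('E'); }
static void ghes_notify_sea(void) { log_call('N'); }
static void nmi_exit(void)        { log_call('X'); }

/* Hypothetical wrapper modelling the guarded region of do_sea(). */
static void sea_notify_guarded(void)
{
	/* A constant condition: the branch is compiled out when it is 0. */
	if (SEA_ENABLED) {
		nmi_enter();
		ghes_notify_sea();
		nmi_exit();
	}
}
```

Unlike an #ifdef, a constant condition keeps the guarded calls visible to the
compiler for type checking while still allowing the branch to be discarded
when the option is off.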


> +
>  	info.si_signo = SIGBUS;
>  	info.si_errno = 0;
>  	info.si_code  = 0;
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index b0140c8..3786ff1 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
>  config HAVE_ACPI_APEI_NMI
>  	bool
>  
> +config HAVE_ACPI_APEI_SEA
> +	bool "APEI Synchronous External Abort logging/recovering support"
> +	depends on ARM64

depends on ACPI_APEI_GHES
?

(I think this is what the kbuild robot has managed to miss out!)


> +	help
> +	  This option should be enabled if the system supports
> +	  firmware first handling of SEA (Synchronous External Abort).
> +	  SEA happens with certain faults of data abort or instruction
> +	  abort synchronous exceptions on ARMv8 systems. If a system
> +	  supports firmware first handling of SEA, the platform analyzes
> +	  and handles hardware error notifications with SEA, and it may then

Nit: notifications from SEA,


> +	  form a HW error record for the OS to parse and handle. This
> +	  option allows the OS to look for such HW error record, and

Nit: 'HW'->hardware. This is spelled out for the other seven uses in the file.


> +	  take appropriate action.
> +
>  config ACPI_APEI
>  	bool "ACPI Platform Error Interface (APEI)"
>  	select MISC_FILESYSTEMS
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index b25e7cf..8756172 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -114,11 +114,7 @@
>   * Two virtual pages are used, one for IRQ/PROCESS context, the other for
>   * NMI context (optionally).
>   */
> -#ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  #define GHES_IOREMAP_PAGES           2
> -#else
> -#define GHES_IOREMAP_PAGES           1
> -#endif
>  #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
>  #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
>  
> @@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
>  
>  static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>  {
> -	unsigned long vaddr;
> +	unsigned long vaddr, paddr;
> +	pgprot_t prot;
>  
>  	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
> -	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
> -			   pfn << PAGE_SHIFT, PAGE_KERNEL);
> +

> +	paddr = pfn << PAGE_SHIFT;

Physical addresses might not always fit in 'unsigned long'. phys_addr_t exists
to hide this nasty detail!

From arch/x86/Kconfig:
> config ARCH_PHYS_ADDR_T_64BIT
> 	def_bool y
> 	depends on X86_64 || X86_PAE

32bit x86 kernels configured with PAE define phys_addr_t to be u64.
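
To make the truncation concrete, here is a small user-space sketch. The
fixed-width typedefs are assumptions that model a 32-bit 'unsigned long' and a
64-bit phys_addr_t; PAGE_SHIFT is assumed to be 12 (4 KiB pages), and both
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumed 4 KiB pages */

typedef uint32_t ulong32_t;		/* models a 32-bit 'unsigned long' */
typedef uint64_t phys_addr64_t;		/* models phys_addr_t with PAE */

static_assert(sizeof(phys_addr64_t) > sizeof(ulong32_t),
	      "the model only makes sense if phys_addr_t is wider");

/* Mirrors 'unsigned long paddr = pfn << PAGE_SHIFT' on a 32-bit kernel. */
static ulong32_t paddr_truncated(uint64_t pfn)
{
	return (ulong32_t)(pfn << PAGE_SHIFT);	/* upper bits are lost */
}

/* The same computation stored in a type wide enough to hold it. */
static phys_addr64_t paddr_full(uint64_t pfn)
{
	return (phys_addr64_t)pfn << PAGE_SHIFT;
}
```

For a page frame above 4 GiB, such as pfn 0x123456, paddr_full() gives
0x123456000 while paddr_truncated() gives 0x23456000, which names a different
physical page.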


> +	prot = arch_apei_get_mem_attribute(paddr);
> +	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
>  
>  	return (void __iomem *)vaddr;
>  }
> @@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
>  	.notifier_call = ghes_notify_sci,
>  };
>  
> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
> +static LIST_HEAD(ghes_sea);
> +
> +void ghes_notify_sea(void)
> +{
> +	struct ghes *ghes;
> +

	/*
	 * synchronize_rcu() will wait for nmi_exit(), so no need to
	 * rcu_read_lock().
	 */

> +	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
> +		ghes_proc(ghes);
> +	}
> +}
> +
> +static int ghes_sea_add(struct ghes *ghes)
> +{
> +	mutex_lock(&ghes_list_mutex);
> +	list_add_rcu(&ghes->list, &ghes_sea);
> +	mutex_unlock(&ghes_list_mutex);

> +	return 0;

This function returns 0 or -ENOTSUPP, depending on CONFIG_HAVE_ACPI_APEI_SEA,
but ...


> +}
> +
> +static void ghes_sea_remove(struct ghes *ghes)
> +{
> +	mutex_lock(&ghes_list_mutex);
> +	list_del_rcu(&ghes->list);
> +	mutex_unlock(&ghes_list_mutex);
> +	synchronize_rcu();
> +}
> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
> +static inline int ghes_sea_add(struct ghes *ghes)
> +{
> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
> +	       ghes->generic->header.source_id);
> +	return -ENOTSUPP;
> +}
> +
> +static inline void ghes_sea_remove(struct ghes *ghes)
> +{
> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
> +	       ghes->generic->header.source_id);
> +}
> +#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
> +
>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  /*
>   * printk is not safe in NMI context.  So in NMI handler, we allocate
> @@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  	case ACPI_HEST_NOTIFY_EXTERNAL:
>  	case ACPI_HEST_NOTIFY_SCI:
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
> +			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
> +				generic->header.source_id);
> +			rc = -ENOTSUPP;
> +			goto err;

... here we jump out of ghes_probe() if NOTIFY_SEA is used but the kernel wasn't
built with CONFIG_HAVE_ACPI_APEI_SEA....


> +		}
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>  			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
> @@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
>  			   generic->header.source_id);
>  		goto err;
> +	case ACPI_HEST_NOTIFY_GPIO:
> +	case ACPI_HEST_NOTIFY_SEI:
> +	case ACPI_HEST_NOTIFY_GSIV:
> +		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
> +			generic->header.source_id, generic->header.source_id);
> +		rc = -ENOTSUPP;
> +		goto err;
>  	default:
>  		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
>  			   generic->notify.type, generic->header.source_id);
> @@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		list_add_rcu(&ghes->list, &ghes_sci);
>  		mutex_unlock(&ghes_list_mutex);
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		rc = ghes_sea_add(ghes);

... so this error handling will never be needed.
ghes_nmi_add() returns void.

I guess the not-configured versions of the symbols need to exist for older
compilers that can't work out that this can never be called.


> +		if (rc)
> +			goto err_edac_unreg;
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		ghes_nmi_add(ghes);
>  		break;
> @@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>  			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>  		mutex_unlock(&ghes_list_mutex);
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		ghes_sea_remove(ghes);
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		ghes_nmi_remove(ghes);
>  		break;

> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 6ae318b..adf5455 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
>  		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
>  		gdata + 1;
>  }
> +
> +void ghes_notify_sea(void);
> 

This header file ought to have an include guard, could you add one?
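
As a reminder of what the guard buys, the sketch below pastes the body of a
hypothetical guarded header twice into one translation unit, which is what a
double #include amounts to. The guard name EXAMPLE_GHES_H and the struct are
illustrative assumptions, not the contents of include/acpi/ghes.h.

```c
/* First "inclusion": the guard is not yet defined, so the body is seen. */
#ifndef EXAMPLE_GHES_H
#define EXAMPLE_GHES_H

struct example_ghes {
	int source_id;
};

static inline int example_source_id(const struct example_ghes *g)
{
	return g->source_id;
}

#endif /* EXAMPLE_GHES_H */

/* Second "inclusion": the guard is defined, so the body is skipped. */
#ifndef EXAMPLE_GHES_H
#define EXAMPLE_GHES_H

struct example_ghes {
	int source_id;
};

static inline int example_source_id(const struct example_ghes *g)
{
	return g->source_id;
}

#endif /* EXAMPLE_GHES_H */
```

Without the #ifndef/#define pair, the second copy would fail to compile with a
redefinition error for both the struct and the inline function.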

I think the kbuild-robot has managed to configure SEA on, but GHES off so ghes.c
isn't included in the kernel. I will dig some more into this on Monday.



Thanks,

James

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
@ 2017-02-03 16:00     ` James Morse
  0 siblings, 0 replies; 97+ messages in thread
From: James Morse @ 2017-02-03 16:00 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, akpm, eun.taik.lee,
	sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba, hanjun.guo,
	john.garry, shiju.jose

Hi Tyler,

On 01/02/17 17:16, Tyler Baicar wrote:
> ARM APEI extension proposal added SEA (Synchronous External Abort)
> notification type for ARMv8.
> Add a new GHES error source handling function for SEA. If an error
> source's notification type is SEA, then this function can be registered
> into the SEA exception handler. That way GHES will parse and report
> SEA exceptions when they occur.

It's worth adding details of the other things this patch changes, just to alert
busy reviewers. Something like:
An SEA can interrupt code that had interrupts masked, so it is treated as an NMI.
To support this, the page of address space used for mapping APEI buffers while
in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is changed to use the
helper methods to find the pgprot_t to map with, in the same way as
ghes_ioremap_pfn_irq().


> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1117421..f92778d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -53,6 +53,8 @@ config ARM64
>  	select HANDLE_DOMAIN_IRQ
>  	select HARDIRQS_SW_RESEND
>  	select HAVE_ACPI_APEI if (ACPI && EFI)
> +	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)

> +	select HAVE_NMI if HAVE_ACPI_APEI_SEA

Nit: This section of the file is largely in alphabetical order, can we try to
keep it that way?!


>  	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>  	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_BITREVERSE
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 9ae7e65..5a5a096 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -30,6 +30,7 @@
>  #include <linux/highmem.h>
>  #include <linux/perf_event.h>
>  #include <linux/preempt.h>

> +#include <linux/hardirq.h>

This header is already included by this file further up.


>  
>  #include <asm/bug.h>
>  #include <asm/cpufeature.h>
> @@ -41,6 +42,8 @@
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
>  
> +#include <acpi/ghes.h>
> +
>  static const char *fault_name(unsigned int esr);
>  
>  #ifdef CONFIG_KPROBES
> @@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>  	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>  		 fault_name(esr), esr, addr);
>  

> +	/*
> +	 * synchronize_rcu() will wait for nmi_exit(), so no need to
> +	 * rcu_read_lock().
> +	 */

This comment belongs next to the use of RCU in ghes_notify_sea(), but there
should be something here to explain the surprising use of NMI. Something like:
Synchronous aborts may interrupt code which had interrupts masked. Before
calling out into the wider kernel, tell the interested subsystems.


This should be wrapped in:
if (IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {

> +	nmi_enter();
> +	ghes_notify_sea();
> +	nmi_exit();

}

To avoid breaking systems that don't have SEA configured.
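
The wrapping suggested above relies on IS_ENABLED() expanding to a constant
0 or 1: the guarded call is still parsed and type-checked, but the compiler
can discard it when the option is off. Below is a minimal userspace sketch of
that pattern. It assumes a simplified macro modelled on the trick used in
include/linux/kconfig.h; every EX_/ex_ name, the CONFIG_EX_* options and the
counter are hypothetical and exist only for illustration.

```c
/* Simplified model of the kernel's IS_ENABLED(); not the real header. */
#define EX_ARG_PLACEHOLDER_1 0,
#define EX_TAKE_SECOND(ignored, val, ...) val
#define EX_IS_DEFINED(x) EX_IS_DEFINED_(x)
#define EX_IS_DEFINED_(val) EX_IS_DEFINED__(EX_ARG_PLACEHOLDER_##val)
/* The trailing 0 keeps the variadic part non-empty for strict C99. */
#define EX_IS_DEFINED__(arg1_or_junk) EX_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define EX_IS_ENABLED(option) EX_IS_DEFINED(option)

#define CONFIG_EX_SEA 1		/* pretend this option was selected */
/* CONFIG_EX_OTHER is deliberately left undefined. */

static int ex_notify_count;

/* Hypothetical stand-in for a notification hook such as ghes_notify_sea(). */
static void ex_notify_sea(void)
{
	ex_notify_count++;
}

/* Hypothetical stand-in for the fault handler that calls the hook. */
static int ex_do_sea(void)
{
	if (EX_IS_ENABLED(CONFIG_EX_SEA))
		ex_notify_sea();

	/* Constant-false condition: this branch is dead code. */
	if (EX_IS_ENABLED(CONFIG_EX_OTHER))
		ex_notify_count += 100;

	return ex_notify_count;
}
```

Unlike an #ifdef block, both branches are seen by the compiler, so a typo in
the disabled path still shows up as a build error.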


> +
>  	info.si_signo = SIGBUS;
>  	info.si_errno = 0;
>  	info.si_code  = 0;
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index b0140c8..3786ff1 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
>  config HAVE_ACPI_APEI_NMI
>  	bool
>  
> +config HAVE_ACPI_APEI_SEA
> +	bool "APEI Synchronous External Abort logging/recovering support"
> +	depends on ARM64

depends on ACPI_APEI_GHES
?

(I think this is what the kbuild robot has managed to miss out!)


> +	help
> +	  This option should be enabled if the system supports
> +	  firmware first handling of SEA (Synchronous External Abort).
> +	  SEA happens with certain faults of data abort or instruction
> +	  abort synchronous exceptions on ARMv8 systems. If a system
> +	  supports firmware first handling of SEA, the platform analyzes
> +	  and handles hardware error notifications with SEA, and it may then

Nit: notifications from SEA,


> +	  form a HW error record for the OS to parse and handle. This
> +	  option allows the OS to look for such HW error record, and

Nit: 'HW'->hardware. This is spelled out for the other seven uses in the file.


> +	  take appropriate action.
> +
>  config ACPI_APEI
>  	bool "ACPI Platform Error Interface (APEI)"
>  	select MISC_FILESYSTEMS
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index b25e7cf..8756172 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -114,11 +114,7 @@
>   * Two virtual pages are used, one for IRQ/PROCESS context, the other for
>   * NMI context (optionally).
>   */
> -#ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  #define GHES_IOREMAP_PAGES           2
> -#else
> -#define GHES_IOREMAP_PAGES           1
> -#endif
>  #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
>  #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
>  
> @@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
>  
>  static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>  {
> -	unsigned long vaddr;
> +	unsigned long vaddr, paddr;
> +	pgprot_t prot;
>  
>  	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
> -	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
> -			   pfn << PAGE_SHIFT, PAGE_KERNEL);
> +

> +	paddr = pfn << PAGE_SHIFT;

Physical addresses might not always fit in 'unsigned long'. phys_addr_t exists
to hide this nasty detail!

From arch/x86/Kconfig:
> config ARCH_PHYS_ADDR_T_64BIT
> 	def_bool y
> 	depends on X86_64 || X86_PAE

32-bit x86 kernels configured with PAE define phys_addr_t to be u64.
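
To make the truncation concrete, here is a small userspace sketch. It assumes
a 32-bit 'unsigned long' (modelled with uint32_t) and a 64-bit phys_addr_t
(modelled with uint64_t); the ex_ names and EX_PAGE_SHIFT are hypothetical
stand-ins, not kernel definitions.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT 12	/* 4 KiB pages, assumed for this example */

/* Models 'unsigned long' on a 32-bit kernel. */
typedef uint32_t ex_ulong32_t;
/* Models phys_addr_t when ARCH_PHYS_ADDR_T_64BIT is set. */
typedef uint64_t ex_phys_addr_t;

/* Storing the shifted pfn in a 32-bit type silently drops the high bits. */
static ex_ulong32_t ex_paddr_truncated(uint64_t pfn)
{
	return (ex_ulong32_t)(pfn << EX_PAGE_SHIFT);
}

/* Using a 64-bit physical address type keeps the full value. */
static ex_phys_addr_t ex_paddr_full(uint64_t pfn)
{
	return (ex_phys_addr_t)pfn << EX_PAGE_SHIFT;
}
```

For the page frame at the 4 GiB boundary (pfn 0x100000) the truncated helper
yields 0, while the 64-bit one yields 0x100000000.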


> +	prot = arch_apei_get_mem_attribute(paddr);
> +	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
>  
>  	return (void __iomem *)vaddr;
>  }
> @@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
>  	.notifier_call = ghes_notify_sci,
>  };
>  
> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
> +static LIST_HEAD(ghes_sea);
> +
> +void ghes_notify_sea(void)
> +{
> +	struct ghes *ghes;
> +

	/*
	 * synchronize_rcu() will wait for nmi_exit(), so no need to
	 * rcu_read_lock().
	 */

> +	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
> +		ghes_proc(ghes);
> +	}
> +}
> +
> +static int ghes_sea_add(struct ghes *ghes)
> +{
> +	mutex_lock(&ghes_list_mutex);
> +	list_add_rcu(&ghes->list, &ghes_sea);
> +	mutex_unlock(&ghes_list_mutex);

> +	return 0;

This function returns 0 or -ENOTSUPP, depending on CONFIG_HAVE_ACPI_APEI_SEA,
but ...


> +}
> +
> +static void ghes_sea_remove(struct ghes *ghes)
> +{
> +	mutex_lock(&ghes_list_mutex);
> +	list_del_rcu(&ghes->list);
> +	mutex_unlock(&ghes_list_mutex);
> +	synchronize_rcu();
> +}
> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
> +static inline int ghes_sea_add(struct ghes *ghes)
> +{
> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
> +	       ghes->generic->header.source_id);
> +	return -ENOTSUPP;
> +}
> +
> +static inline void ghes_sea_remove(struct ghes *ghes)
> +{
> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
> +	       ghes->generic->header.source_id);
> +}
> +#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
> +
>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  /*
>   * printk is not safe in NMI context.  So in NMI handler, we allocate
> @@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  	case ACPI_HEST_NOTIFY_EXTERNAL:
>  	case ACPI_HEST_NOTIFY_SCI:
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
> +			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
> +				generic->header.source_id);
> +			rc = -ENOTSUPP;
> +			goto err;

... here we jump out of ghes_probe() if NOTIFY_SEA is used but the kernel wasn't
built with CONFIG_HAVE_ACPI_APEI_SEA....


> +		}
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>  			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
> @@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
>  			   generic->header.source_id);
>  		goto err;
> +	case ACPI_HEST_NOTIFY_GPIO:
> +	case ACPI_HEST_NOTIFY_SEI:
> +	case ACPI_HEST_NOTIFY_GSIV:
> +		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
> +			generic->header.source_id, generic->header.source_id);
> +		rc = -ENOTSUPP;
> +		goto err;
>  	default:
>  		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
>  			   generic->notify.type, generic->header.source_id);
> @@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		list_add_rcu(&ghes->list, &ghes_sci);
>  		mutex_unlock(&ghes_list_mutex);
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		rc = ghes_sea_add(ghes);

... so this error handling will never be needed.
ghes_nmi_add() returns void.

I guess the not-configured versions of the symbols need to exist for older
compilers that can't work out that this can never be called.
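
The control flow being described can be sketched in userspace as follows.
This is only a model of the argument, not the driver: the ex_ names, the
'sea_configured' flag and EX_ENOTSUPP (which mirrors the kernel-internal
ENOTSUPP value, 524) are assumptions made for illustration. Because the first
switch already rejects the unsupported case, the later registration step has
nothing left to fail on and can return void, like ghes_nmi_add().

```c
#include <stdbool.h>

#define EX_ENOTSUPP 524	/* mirrors the kernel-internal ENOTSUPP value */

enum ex_notify_type { EX_NOTIFY_SCI, EX_NOTIFY_SEA };

static int ex_sea_sources;

/* Registration cannot fail, so it returns void. */
static void ex_sea_add(void)
{
	ex_sea_sources++;
}

static int ex_probe(enum ex_notify_type type, bool sea_configured)
{
	/* First pass: bail out early on notification types we cannot handle. */
	switch (type) {
	case EX_NOTIFY_SEA:
		if (!sea_configured)
			return -EX_ENOTSUPP;
		break;
	default:
		break;
	}

	/* Second pass: only reached for supported types. */
	switch (type) {
	case EX_NOTIFY_SEA:
		ex_sea_add();
		break;
	default:
		break;
	}

	return 0;
}
```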


> +		if (rc)
> +			goto err_edac_unreg;
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		ghes_nmi_add(ghes);
>  		break;
> @@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>  			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>  		mutex_unlock(&ghes_list_mutex);
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		ghes_sea_remove(ghes);
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		ghes_nmi_remove(ghes);
>  		break;

> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 6ae318b..adf5455 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
>  		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
>  		gdata + 1;
>  }
> +
> +void ghes_notify_sea(void);
> 

This header file ought to have an include guard. Could you add one?

I think the kbuild-robot has managed to configure SEA on, but GHES off so ghes.c
isn't included in the kernel. I will dig some more into this on Monday.
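
For reference, an include guard has the shape sketched below. The guard name
EX_GHES_H and the ex_ declarations are hypothetical; the header contents are
repeated in one file purely to simulate the header being included twice.

```c
/* ---- first inclusion of the hypothetical header ---- */
#ifndef EX_GHES_H
#define EX_GHES_H

struct ex_ghes_record {
	int source_id;
};

static inline int ex_ghes_source_id(const struct ex_ghes_record *rec)
{
	return rec->source_id;
}

#endif /* EX_GHES_H */

/* ---- second inclusion: EX_GHES_H is now defined, so this is skipped ---- */
#ifndef EX_GHES_H
#define EX_GHES_H

struct ex_ghes_record {
	int source_id;
};

static inline int ex_ghes_source_id(const struct ex_ghes_record *rec)
{
	return rec->source_id;
}

#endif /* EX_GHES_H */
```

Without the guard, the second copy would redefine the struct and the inline
function, and the build would fail.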



Thanks,

James

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 09/10] trace, ras: add ARM processor error trace event
  2017-02-02  3:15     ` Steven Rostedt
  (?)
@ 2017-02-03 20:18       ` Baicar, Tyler
  -1 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-03 20:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, bristot, linux-arm-kernel, kvmarm,
	kvm, linux-kernel, linux-acpi, linux-efi, devel, S

Hello Steve,


On 2/1/2017 8:15 PM, Steven Rostedt wrote:
> On Wed,  1 Feb 2017 10:16:52 -0700
> Tyler Baicar <tbaicar@codeaurora.org> wrote:
>
>> Currently there are trace events for the various RAS
>> errors with the exception of ARM processor type errors.
>> Add a new trace event for such errors so that the user
>> will know when they occur. These trace events are
>> consistent with the ARM processor error section type
>> defined in UEFI 2.6 spec section N.2.4.4.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> Acked-by: Steven Rostedt <rostedt@goodmis.org>
>> ---
>>   drivers/acpi/apei/ghes.c    |  7 ++++++-
>>   drivers/firmware/efi/cper.c |  1 +
>>   drivers/ras/ras.c           |  1 +
>>   include/ras/ras_event.h     | 34 ++++++++++++++++++++++++++++++++++
>>   4 files changed, 42 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index a989345..013faf0 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -512,7 +512,12 @@ static void ghes_do_proc(struct ghes *ghes,
>>   
>>   		}
>>   #endif
>> -		else {
>> +		else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM)) {
>> +			struct cper_sec_proc_arm *arm_err;
>> +
>> +			arm_err = acpi_hest_generic_data_payload(gdata);
>> +			trace_arm_event(arm_err);
> According to the kbuild failure, I'm guessing this file requires a:
>
>   #include <ras/ras_event.h>
I add that include in patch 8/10 of this series, so ghes.c has the include
before this patch. The kbuild complained about the same thing for both trace
events added in this series. Upon further debugging, it looks like I'll need to
verify that CONFIG_RAS is enabled to make these trace event calls. It's strange
that kbuild didn't complain in earlier versions of this series because these
have been this way the whole time :) I'll add the config check in the next
series.

Thanks,
Tyler
> -- Steve
>
>> +		} else {
>>   			void *unknown_err = acpi_hest_generic_data_payload(gdata);
>>   			trace_unknown_sec_event(&sec_type,
>>   					fru_id, fru_text, sec_sev,
>> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
>> index 48cb8ee..0ec678e 100644
>> --- a/drivers/firmware/efi/cper.c
>> +++ b/drivers/firmware/efi/cper.c
>> @@ -35,6 +35,7 @@
>>   #include <linux/printk.h>
>>   #include <linux/bcd.h>
>>   #include <acpi/ghes.h>
>> +#include <ras/ras_event.h>
>>   
>>   #define INDENT_SP	" "
>>   
>> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
>> index fb2500b..8ba5a94 100644
>> --- a/drivers/ras/ras.c
>> +++ b/drivers/ras/ras.c
>> @@ -28,3 +28,4 @@ static int __init ras_init(void)
>>   #endif
>>   EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
>>   EXPORT_TRACEPOINT_SYMBOL_GPL(unknown_sec_event);
>> +EXPORT_TRACEPOINT_SYMBOL_GPL(arm_event);
>> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
>> index 5861b6f..b36db48 100644
>> --- a/include/ras/ras_event.h
>> +++ b/include/ras/ras_event.h
>> @@ -162,6 +162,40 @@
>>   );
>>   
>>   /*
>> + * ARM Processor Events Report
>> + *
>> + * This event is generated when hardware detects an ARM processor error
>> + * has occurred. UEFI 2.6 spec section N.2.4.4.
>> + */
>> +TRACE_EVENT(arm_event,
>> +
>> +	TP_PROTO(const struct cper_sec_proc_arm *proc),
>> +
>> +	TP_ARGS(proc),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(u64, mpidr)
>> +		__field(u64, midr)
>> +		__field(u32, running_state)
>> +		__field(u32, psci_state)
>> +		__field(u8, affinity)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->affinity = proc->affinity_level;
>> +		__entry->mpidr = proc->mpidr;
>> +		__entry->midr = proc->midr;
>> +		__entry->running_state = proc->running_state;
>> +		__entry->psci_state = proc->psci_state;
>> +	),
>> +
>> +	TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
>> +		  "running state: %d; PSCI state: %d",
>> +		  __entry->affinity, __entry->mpidr, __entry->midr,
>> +		  __entry->running_state, __entry->psci_state)
>> +);
>> +
>> +/*
>>    * Unknown Section Report
>>    *
>>    * This event is generated when hardware detected a hardware

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH V8 09/10] trace, ras: add ARM processor error trace event
@ 2017-02-03 20:18       ` Baicar, Tyler
  0 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-03 20:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Steve,


On 2/1/2017 8:15 PM, Steven Rostedt wrote:
> On Wed,  1 Feb 2017 10:16:52 -0700
> Tyler Baicar <tbaicar@codeaurora.org> wrote:
>
>> Currently there are trace events for the various RAS
>> errors with the exception of ARM processor type errors.
>> Add a new trace event for such errors so that the user
>> will know when they occur. These trace events are
>> consistent with the ARM processor error section type
>> defined in UEFI 2.6 spec section N.2.4.4.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> Acked-by: Steven Rostedt <rostedt@goodmis.org>
>> ---
>>   drivers/acpi/apei/ghes.c    |  7 ++++++-
>>   drivers/firmware/efi/cper.c |  1 +
>>   drivers/ras/ras.c           |  1 +
>>   include/ras/ras_event.h     | 34 ++++++++++++++++++++++++++++++++++
>>   4 files changed, 42 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index a989345..013faf0 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -512,7 +512,12 @@ static void ghes_do_proc(struct ghes *ghes,
>>   
>>   		}
>>   #endif
>> -		else {
>> +		else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM)) {
>> +			struct cper_sec_proc_arm *arm_err;
>> +
>> +			arm_err = acpi_hest_generic_data_payload(gdata);
>> +			trace_arm_event(arm_err);
> According to the kbuild failure, I'm guessing this file requires a:
>
>   #include <ras/ras_event.h>
I add that include in patch 8/10 of this series, so ghes.c already has it
before this patch is applied. The kbuild robot complained about the same thing
for both trace events added in this series. After further debugging, it looks
like I need to check that CONFIG_RAS is enabled before making these trace
event calls. It's strange that kbuild didn't complain about earlier versions
of this series, since the code has been this way the whole time :) I'll add
the config check in the next version.

Thanks,
Tyler
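
One common way to make such a config check is to give the call a no-op
fallback when the option is off. The following is a minimal user-space sketch
of that pattern, not the change that was actually posted: DEMO_CONFIG_RAS
and every demo_* name are hypothetical stand-ins for CONFIG_RAS,
struct cper_sec_proc_arm and trace_arm_event().

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_RAS: build with -DDEMO_CONFIG_RAS=1 to enable. */
#ifndef DEMO_CONFIG_RAS
#define DEMO_CONFIG_RAS 0
#endif

static_assert(DEMO_CONFIG_RAS == 0 || DEMO_CONFIG_RAS == 1,
	      "the demo option is a simple on/off switch");

/* Hypothetical, cut-down error record. */
struct demo_arm_err {
	unsigned long long mpidr;
};

/* Counts how many events the "tracepoint" emitted. */
static int demo_events_emitted;

#if DEMO_CONFIG_RAS
/* Option on: the event is really emitted. */
static void demo_trace_arm_event(const struct demo_arm_err *err)
{
	(void)err;
	demo_events_emitted++;
}
#else
/* Option off: the call compiles to nothing, so callers need no #ifdef. */
static inline void demo_trace_arm_event(const struct demo_arm_err *err)
{
	(void)err;
}
#endif

/* The caller looks the same whether or not the option is enabled. */
static int demo_handle_arm_section(const struct demo_arm_err *err)
{
	demo_trace_arm_event(err);
	return demo_events_emitted;
}
```

With the option left at 0 the handler still builds and simply reports that no
event was emitted.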
> -- Steve
>
>> +		} else {
>>   			void *unknown_err = acpi_hest_generic_data_payload(gdata);
>>   			trace_unknown_sec_event(&sec_type,
>>   					fru_id, fru_text, sec_sev,
>> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
>> index 48cb8ee..0ec678e 100644
>> --- a/drivers/firmware/efi/cper.c
>> +++ b/drivers/firmware/efi/cper.c
>> @@ -35,6 +35,7 @@
>>   #include <linux/printk.h>
>>   #include <linux/bcd.h>
>>   #include <acpi/ghes.h>
>> +#include <ras/ras_event.h>
>>   
>>   #define INDENT_SP	" "
>>   
>> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
>> index fb2500b..8ba5a94 100644
>> --- a/drivers/ras/ras.c
>> +++ b/drivers/ras/ras.c
>> @@ -28,3 +28,4 @@ static int __init ras_init(void)
>>   #endif
>>   EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
>>   EXPORT_TRACEPOINT_SYMBOL_GPL(unknown_sec_event);
>> +EXPORT_TRACEPOINT_SYMBOL_GPL(arm_event);
>> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
>> index 5861b6f..b36db48 100644
>> --- a/include/ras/ras_event.h
>> +++ b/include/ras/ras_event.h
>> @@ -162,6 +162,40 @@
>>   );
>>   
>>   /*
>> + * ARM Processor Events Report
>> + *
>> + * This event is generated when hardware detects an ARM processor error
>> + * has occurred. UEFI 2.6 spec section N.2.4.4.
>> + */
>> +TRACE_EVENT(arm_event,
>> +
>> +	TP_PROTO(const struct cper_sec_proc_arm *proc),
>> +
>> +	TP_ARGS(proc),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(u64, mpidr)
>> +		__field(u64, midr)
>> +		__field(u32, running_state)
>> +		__field(u32, psci_state)
>> +		__field(u8, affinity)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->affinity = proc->affinity_level;
>> +		__entry->mpidr = proc->mpidr;
>> +		__entry->midr = proc->midr;
>> +		__entry->running_state = proc->running_state;
>> +		__entry->psci_state = proc->psci_state;
>> +	),
>> +
>> +	TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
>> +		  "running state: %d; PSCI state: %d",
>> +		  __entry->affinity, __entry->mpidr, __entry->midr,
>> +		  __entry->running_state, __entry->psci_state)
>> +);
>> +
>> +/*
>>    * Unknown Section Report
>>    *
>>    * This event is generated when hardware detected a hardware

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 04/10] arm64: exception: handle Synchronous External Abort
  2017-02-03 15:59     ` James Morse
  (?)
@ 2017-02-03 20:24       ` Baicar, Tyler
  -1 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-03 20:24 UTC (permalink / raw)
  To: James Morse
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, akpm, eun.taik.lee,
	sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba, hanjun.guo,
	john.garry, shiju.jose

Hello James,


On 2/3/2017 8:59 AM, James Morse wrote:
> On 01/02/17 17:16, Tyler Baicar wrote:
>> SEA exceptions are often caused by an uncorrected hardware
>> error, and are handled when data abort and instruction abort
>> exception classes have specific values for their Fault Status
>> Code.
>> When SEA occurs, before killing the process, report the error
>> in the kernel logs.
>> Update fault_info[] with specific SEA faults so that the
>> new SEA handler is used.
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 156169c..9ae7e65 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -487,6 +487,31 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>   	return 1;
>>   }
>>   
>> +#define SEA_FnV_MASK	0x00000400
> There is a glut of ESR_ELx_ macros in arch/arm64/include/asm/esr.h, could this
> be fitted in there in a similar format?
>
>
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -83,6 +83,7 @@
>   #define ESR_ELx_WNR            (UL(1) << 6)
>
>   /* Shared ISS field definitions for Data/Instruction aborts */
> +#define ESR_ELx_FnV            (UL(1) << 10)
>   #define ESR_ELx_EA             (UL(1) << 9)
>   #define ESR_ELx_S1PTW          (UL(1) << 7)
I'll make this change in the next version.
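
As a rough illustration of the suggested macro, the sketch below checks that
bit 10 is the same value as the patch's SEA_FnV_MASK and applies it the way
do_sea() does when choosing si_addr. It is a user-space approximation: the
DEMO_ prefix and demo_sea_fault_addr() are hypothetical names, and
uintptr_t stands in for the kernel's user pointer type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space copies of the kernel-style definitions. */
#define DEMO_UL(x)		((unsigned long)(x))
#define DEMO_ESR_ELx_FnV	(DEMO_UL(1) << 10)	/* FAR not Valid */

/* Bit 10 is exactly the 0x00000400 mask the patch open-coded. */
static_assert(DEMO_ESR_ELx_FnV == 0x00000400,
	      "bit 10 must match SEA_FnV_MASK");

/*
 * Pick the address to report with SIGBUS: when FnV is set the fault
 * address register holds no valid address, so report 0 instead.
 */
static uintptr_t demo_sea_fault_addr(unsigned int esr, uintptr_t far)
{
	return (esr & DEMO_ESR_ELx_FnV) ? 0 : far;
}
```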
>> +
>> +/*
>> + * This abort handler deals with Synchronous External Abort.
>> + * It calls notifiers, and then returns "fault".
>> + */
>> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>> +{
>> +	struct siginfo info;
>> +
>> +	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>> +		 fault_name(esr), esr, addr);
>> +
>> +	info.si_signo = SIGBUS;
>> +	info.si_errno = 0;
>> +	info.si_code  = 0;
>> +	if (esr & SEA_FnV_MASK)
>> +		info.si_addr = 0;
>> +	else
>> +		info.si_addr  = (void __user *)addr;
>> +	arm64_notify_die("", regs, &info, esr);
>> +
>> +	return 0;
>> +}
>> +
>>   static const struct fault_info {
>>   	int	(*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
>>   	int	sig;
>> @@ -509,22 +534,22 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>   	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
>>   	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
>>   	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
>> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
>> +	{ do_sea,		SIGBUS,  0,		"synchronous external abort"	},
> This will print:
>> Synchronous External Abort: synchronous external abort
> It looks odd, but I can't think of anything better to put there.
>
>
>>   	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
>>   	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
>>   	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
>> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
>> +	{ do_sea,		SIGBUS,  0,		"level 0 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  0,		"level 1 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  0,		"level 2 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  0,		"level 3 (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  0,		"synchronous parity or ECC error" },
>>   	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
>>   	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
>>   	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
>> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
>> +	{ do_sea,		SIGBUS,  0,		"level 0 synchronous parity error (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  0,		"level 1 synchronous parity error (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  0,		"level 2 synchronous parity error (translation table walk)"	},
>> +	{ do_sea,		SIGBUS,  0,		"level 3 synchronous parity error (translation table walk)"	},
>>   	{ do_bad,		SIGBUS,  0,		"unknown 32"			},
>>   	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
>>   	{ do_bad,		SIGBUS,  0,		"unknown 34"			},
>>
>
> With the ESR_ELx_FnV change above,
> Reviewed-by: James Morse <james.morse@arm.com>
>
Great, thanks!

Tyler
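
To see the message James points out, here is a small user-space sketch that
looks up the SEA entries by fault status code and formats the same string as
the pr_err() call. It assumes the table is indexed by the low six bits of the
ESR, with the entry positions inferred from the neighbouring "unknown NN"
rows in the diff; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: fault_info[] is indexed by the low six bits of the ESR. */
#define DEMO_FSC_MASK	0x3f

static_assert(DEMO_FSC_MASK == 63, "six bits select one of 64 table entries");

/* Return the table name for an SEA status code, or NULL for anything else. */
static const char *demo_sea_name(unsigned int esr)
{
	switch (esr & DEMO_FSC_MASK) {
	case 16: return "synchronous external abort";
	case 20: return "level 0 (translation table walk)";
	case 21: return "level 1 (translation table walk)";
	case 22: return "level 2 (translation table walk)";
	case 23: return "level 3 (translation table walk)";
	case 24: return "synchronous parity or ECC error";
	case 28: return "level 0 synchronous parity error (translation table walk)";
	case 29: return "level 1 synchronous parity error (translation table walk)";
	case 30: return "level 2 synchronous parity error (translation table walk)";
	case 31: return "level 3 synchronous parity error (translation table walk)";
	default: return NULL;
	}
}

/* Build the log line do_sea() would print; returns -1 for non-SEA codes. */
static int demo_format_sea(char *buf, size_t len, unsigned int esr,
			   unsigned long addr)
{
	const char *name = demo_sea_name(esr);

	if (!name)
		return -1;
	return snprintf(buf, len,
			"Synchronous External Abort: %s (0x%08x) at 0x%016lx",
			name, esr, addr);
}
```

For a status code of 16 the prefix and the table name repeat each other,
which is the odd-looking output discussed above.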

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
  2017-02-03 16:00     ` James Morse
  (?)
@ 2017-02-03 20:38       ` Baicar, Tyler
  -1 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-03 20:38 UTC (permalink / raw)
  To: James Morse
  Cc: linux-efi, kvm, matt, catalin.marinas, will.deacon, robert.moore,
	paul.gortmaker, lv.zheng, kvmarm, fu.wei, zjzhang, linux,
	linux-acpi, eun.taik.lee, shijie.huang, labbott, lenb, harba,
	john.garry, marc.zyngier, punit.agrawal, rostedt, nkaje,
	sandeepa.s.prabhu, linux-arm-kernel, devel, rjw, rruigrok,
	linux-kernel, astone, hanjun.guo, pbonzini, akpm, bristot,
	shiju.jose

Hello James,


On 2/3/2017 9:00 AM, James Morse wrote:
> On 01/02/17 17:16, Tyler Baicar wrote:
>> ARM APEI extension proposal added SEA (Synchronous External Abort)
>> notification type for ARMv8.
>> Add a new GHES error source handling function for SEA. If an error
>> source's notification type is SEA, then this function can be registered
>> into the SEA exception handler. That way GHES will parse and report
>> SEA exceptions when they occur.
> It's worth adding details of the other things this patch changes, just to alert
> busy reviewers, something like:
> An SEA can interrupt code that had interrupts masked and is treated as an NMI.
> To aid this the page of address space for mapping APEI buffers while in_nmi() is
> always reserved, and ghes_ioremap_pfn_nmi() is changed to use the helper methods
> to find the pgprot_t to map with in the same way as ghes_ioremap_pfn_irq().
I'll add this in the next version.
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 1117421..f92778d 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -53,6 +53,8 @@ config ARM64
>>   	select HANDLE_DOMAIN_IRQ
>>   	select HARDIRQS_SW_RESEND
>>   	select HAVE_ACPI_APEI if (ACPI && EFI)
>> +	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
>> +	select HAVE_NMI if HAVE_ACPI_APEI_SEA
> Nit: This section of the file is largely in alphabetical order, can we try to
> keep it that way?!
Yes, will do!
>>   	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>>   	select HAVE_ARCH_AUDITSYSCALL
>>   	select HAVE_ARCH_BITREVERSE
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 9ae7e65..5a5a096 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -30,6 +30,7 @@
>>   #include <linux/highmem.h>
>>   #include <linux/perf_event.h>
>>   #include <linux/preempt.h>
>> +#include <linux/hardirq.h>
> This header is already included by this file further up.
Oops, guess I missed that :)
>>   
>>   #include <asm/bug.h>
>>   #include <asm/cpufeature.h>
>> @@ -41,6 +42,8 @@
>>   #include <asm/pgtable.h>
>>   #include <asm/tlbflush.h>
>>   
>> +#include <acpi/ghes.h>
>> +
>>   static const char *fault_name(unsigned int esr);
>>   
>>   #ifdef CONFIG_KPROBES
>> @@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>   	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>   		 fault_name(esr), esr, addr);
>>   
>> +	/*
>> +	 * synchronize_rcu() will wait for nmi_exit(), so no need to
>> +	 * rcu_read_lock().
>> +	 */
> This comment should go against the use of RCU in ghes_notify_sea(), but there
> should be something here to explain the surprise use of nmi. Something like:
> Synchronous aborts may interrupt code which had interrupts masked. Before
> calling out into the wider kernel tell the interested subsystems.
Sounds good. I'll update the comments.
> This should be wrapped in:
> if (IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
>
>> +	nmi_enter();
>> +	ghes_notify_sea();
>> +	nmi_exit();
> }
>
> To avoid breaking systems that don't have SEA configured.
Yes, I'll add that check in the next version.
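
The shape of that check can be sketched in user space as follows. This is
only an approximation: a plain 0/1 macro stands in for
IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA), and the demo_* functions are
hypothetical stubs that record the call order rather than doing any real
NMI or GHES work.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA). */
#define DEMO_HAVE_ACPI_APEI_SEA	1

static_assert(DEMO_HAVE_ACPI_APEI_SEA == 0 || DEMO_HAVE_ACPI_APEI_SEA == 1,
	      "the demo option is a simple on/off switch");

/* Records which steps ran, one letter per step. */
static char demo_log[16];

static void demo_record(char c)
{
	size_t n = strlen(demo_log);

	if (n + 1 < sizeof(demo_log)) {
		demo_log[n] = c;
		demo_log[n + 1] = '\0';
	}
}

/* Stubs for nmi_enter(), ghes_notify_sea(), nmi_exit() and the SIGBUS path. */
static void demo_nmi_enter(void)       { demo_record('E'); }
static void demo_ghes_notify_sea(void) { demo_record('N'); }
static void demo_nmi_exit(void)        { demo_record('X'); }
static void demo_send_sigbus(void)     { demo_record('S'); }

static void demo_do_sea(void)
{
	/*
	 * The condition is a compile-time constant, so the compiler can drop
	 * the whole branch when the option is off, while the code inside is
	 * still parsed and type-checked in every configuration.
	 */
	if (DEMO_HAVE_ACPI_APEI_SEA) {
		demo_nmi_enter();
		demo_ghes_notify_sea();
		demo_nmi_exit();
	}

	/* The signal is delivered whether or not GHES was notified. */
	demo_send_sigbus();
}
```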
>> +
>>   	info.si_signo = SIGBUS;
>>   	info.si_errno = 0;
>>   	info.si_code  = 0;
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index b0140c8..3786ff1 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
>>   config HAVE_ACPI_APEI_NMI
>>   	bool
>>   
>> +config HAVE_ACPI_APEI_SEA
>> +	bool "APEI Synchronous External Abort logging/recovering support"
>> +	depends on ARM64
> depends on ACPI_APEI_GHES
> ?
>
> (I think this is what the kbuild robot has managed to miss out!)
Will do.
>> +	help
>> +	  This option should be enabled if the system supports
>> +	  firmware first handling of SEA (Synchronous External Abort).
>> +	  SEA happens with certain faults of data abort or instruction
>> +	  abort synchronous exceptions on ARMv8 systems. If a system
>> +	  supports firmware first handling of SEA, the platform analyzes
>> +	  and handles hardware error notifications with SEA, and it may then
> Nit: notifications from SEA,
Will do.
>> +	  form a HW error record for the OS to parse and handle. This
>> +	  option allows the OS to look for such HW error record, and
> Nit: 'HW'->hardware. This is spelled out for the other seven uses in the file.
Will do.
>> +	  take appropriate action.
>> +
>>   config ACPI_APEI
>>   	bool "ACPI Platform Error Interface (APEI)"
>>   	select MISC_FILESYSTEMS
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index b25e7cf..8756172 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -114,11 +114,7 @@
>>    * Two virtual pages are used, one for IRQ/PROCESS context, the other for
>>    * NMI context (optionally).
>>    */
>> -#ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>   #define GHES_IOREMAP_PAGES           2
>> -#else
>> -#define GHES_IOREMAP_PAGES           1
>> -#endif
>>   #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
>>   #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
>>   
>> @@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
>>   
>>   static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>>   {
>> -	unsigned long vaddr;
>> +	unsigned long vaddr, paddr;
>> +	pgprot_t prot;
>>   
>>   	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
>> -	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
>> -			   pfn << PAGE_SHIFT, PAGE_KERNEL);
>> +
>> +	paddr = pfn << PAGE_SHIFT;
> Physical addresses might not always fit in 'unsigned long'. phys_addr_t exists
> to hide this nasty detail!
>
>  From arch/x86/Kconfig:
>> config ARCH_PHYS_ADDR_T_64BIT
>> 	def_bool y
>> 	depends on X86_64 || X86_PAE
> 32bit x86 kernels configured with PAE define phys_addr_t to be u64.
I'll make that change.

ghes_ioremap_pfn_irq() should probably be updated in a separate change, because
it also uses unsigned long for its paddr... I just mimicked that code here :)
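
The truncation James describes can be reproduced in user space with
fixed-width types. In this sketch uint32_t models unsigned long on a
32-bit kernel and uint64_t models phys_addr_t with PAE; both typedefs, the
4 KiB page size and the demo_* names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12	/* assumes 4 KiB pages */

typedef uint32_t demo_ulong32;		/* models unsigned long on 32-bit */
typedef uint64_t demo_phys_addr_t;	/* models a 64-bit phys_addr_t */

static_assert(sizeof(demo_phys_addr_t) > sizeof(demo_ulong32),
	      "the wide type must hold addresses the narrow one cannot");

/* Shift in 64 bits, then store into the narrow type: high bits are lost. */
static demo_ulong32 demo_paddr_narrow(uint64_t pfn)
{
	return (demo_ulong32)(pfn << DEMO_PAGE_SHIFT);
}

/* Keep the full 64-bit result. */
static demo_phys_addr_t demo_paddr_wide(uint64_t pfn)
{
	return (demo_phys_addr_t)pfn << DEMO_PAGE_SHIFT;
}
```

The first page frame above 4 GiB (pfn 0x100000) maps to address 0 in the
narrow version, while frames below that boundary are unaffected.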
>> +	prot = arch_apei_get_mem_attribute(paddr);
>> +	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
>>   
>>   	return (void __iomem *)vaddr;
>>   }
>> @@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
>>   	.notifier_call = ghes_notify_sci,
>>   };
>>   
>> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
>> +static LIST_HEAD(ghes_sea);
>> +
>> +void ghes_notify_sea(void)
>> +{
>> +	struct ghes *ghes;
>> +
> 	/*
> 	 * synchronize_rcu() will wait for nmi_exit(), so no need to
> 	 * rcu_read_lock().
> 	 */
>
>> +	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
>> +		ghes_proc(ghes);
>> +	}
>> +}
>> +
>> +static int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	mutex_lock(&ghes_list_mutex);
>> +	list_add_rcu(&ghes->list, &ghes_sea);
>> +	mutex_unlock(&ghes_list_mutex);
>> +	return 0;
> This function returns 0 or -ENOTSUPP, depending on CONFIG_HAVE_ACPI_APEI_SEA,
> but ...
>
>
>> +}
>> +
>> +static void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	mutex_lock(&ghes_list_mutex);
>> +	list_del_rcu(&ghes->list);
>> +	mutex_unlock(&ghes_list_mutex);
>> +	synchronize_rcu();
>> +}
>> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
>> +static inline int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +	return -ENOTSUPP;
>> +}
>> +
>> +static inline void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +}
>> +#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
>> +
>>   #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>   /*
>>    * printk is not safe in NMI context.  So in NMI handler, we allocate
>> @@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   	case ACPI_HEST_NOTIFY_EXTERNAL:
>>   	case ACPI_HEST_NOTIFY_SCI:
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
>> +			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
>> +				generic->header.source_id);
>> +			rc = -ENOTSUPP;
>> +			goto err;
> ... here we jump out of ghes_probe() if NOTIFY_SEA is used but the kernel wasn't
> built with CONFIG_HAVE_ACPI_APEI_SEA....
>
>
>> +		}
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>>   			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
>> @@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
>>   			   generic->header.source_id);
>>   		goto err;
>> +	case ACPI_HEST_NOTIFY_GPIO:
>> +	case ACPI_HEST_NOTIFY_SEI:
>> +	case ACPI_HEST_NOTIFY_GSIV:
>> +		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
>> +			generic->header.source_id, generic->notify.type);
>> +		rc = -ENOTSUPP;
>> +		goto err;
>>   	default:
>>   		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
>>   			   generic->notify.type, generic->header.source_id);
>> @@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   		list_add_rcu(&ghes->list, &ghes_sci);
>>   		mutex_unlock(&ghes_list_mutex);
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		rc = ghes_sea_add(ghes);
> ... so this error handling will never be needed.
> ghes_nmi_add() returns void.
>
> I guess the not-configured versions of the symbols need to exist for older
> compilers that can't work out that this can never be called.
I can remove the extra error handling here.
>> +		if (rc)
>> +			goto err_edac_unreg;
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		ghes_nmi_add(ghes);
>>   		break;
>> @@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>>   			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>>   		mutex_unlock(&ghes_list_mutex);
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		ghes_sea_remove(ghes);
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		ghes_nmi_remove(ghes);
>>   		break;
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
>> index 6ae318b..adf5455 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
>>   		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
>>   		gdata + 1;
>>   }
>> +
>> +void ghes_notify_sea(void);
>>
> This header file ought to have an include guard, could you add one?
>
> I think the kbuild-robot has managed to configure SEA on, but GHES off so ghes.c
> isn't included in the kernel. I will dig some more into this on Monday.
Yes, I can add that.
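
For reference, an include guard has the shape sketched below. The header body
is pasted twice into one file to imitate a double inclusion; the guard name
DEMO_GHES_H, the struct and the inline helper are hypothetical and only stand
in for whatever include/acpi/ghes.h ends up declaring.

```c
#include <assert.h>

/* First "inclusion": the guard macro is not defined yet, so the body is used. */
#ifndef DEMO_GHES_H
#define DEMO_GHES_H

struct demo_ghes_stats {
	int sea_notifications;
};

static inline void demo_ghes_notify_sea(struct demo_ghes_stats *stats)
{
	stats->sea_notifications++;
}

#endif /* DEMO_GHES_H */

/*
 * Second "inclusion": the guard is now defined, so the preprocessor skips
 * the body. Without the guard this would be a struct redefinition error.
 */
#ifndef DEMO_GHES_H
#define DEMO_GHES_H

struct demo_ghes_stats {
	int sea_notifications;
};

static inline void demo_ghes_notify_sea(struct demo_ghes_stats *stats)
{
	stats->sea_notifications++;
}

#endif /* DEMO_GHES_H */

static_assert(sizeof(struct demo_ghes_stats) >= sizeof(int),
	      "the struct is defined exactly once and is complete here");
```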

Thanks for all the great reviewing :)

Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 06/10] acpi: apei: panic OS with fatal error status block
  2017-02-01 17:16   ` Tyler Baicar
  (?)
@ 2017-02-09 10:48     ` James Morse
  -1 siblings, 0 replies; 97+ messages in thread
From: James Morse @ 2017-02-09 10:48 UTC (permalink / raw)
  To: Tyler Baicar, zjzhang
  Cc: linux-efi, kvm, matt, catalin.marinas, will.deacon, robert.moore,
	paul.gortmaker, lv.zheng, kvmarm, fu.wei, linux, linux-acpi,
	eun.taik.lee, shijie.huang, labbott, lenb, harba, john.garry,
	marc.zyngier, punit.agrawal, rostedt, nkaje, sandeepa.s.prabhu,
	linux-arm-kernel, devel, rjw, rruigrok, linux-kernel, astone,
	hanjun.guo, pbonzini, akpm, bristot, shiju.jose

Hi Jonathan, Tyler,

On 01/02/17 17:16, Tyler Baicar wrote:
> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
> 
> Even if an error status block's severity is fatal, the kernel does not
> honor the severity level and panic.
> 
> With the firmware first model, the platform could inform the OS about a
> fatal hardware error through the non-NMI GHES notification type. The OS
> should panic when a hardware error record is received with this
> severity.
> 
> Call panic() after CPER data in error status block is printed if
> severity is fatal, before each error section is handled.

> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 8756172..86c1f15 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -687,6 +689,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
>  	return rc;
>  }
>  
> +static void __ghes_call_panic(void)
> +{
> +	if (panic_timeout == 0)
> +		panic_timeout = ghes_panic_timeout;
> +	panic("Fatal hardware error!");
> +}
> +

__ghes_panic() also has:
>	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);

Which prints this estatus regardless of rate limiting and caching.

[ ... ]

> @@ -698,6 +707,10 @@ static int ghes_proc(struct ghes *ghes)
>  		if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))

ghes_print_estatus() uses some custom rate limiting '2 messages every 5
seconds', GHES_SEV_PANIC shares the same limit as GHES_SEV_RECOVERABLE.

I think it's possible to get 2 recoverable messages, then a panic in a 5 second
window. The rate limit will kick in to stop the panic estatus block being
printed, but we still go on to call panic() without the real reason being printed...

(the caching thing only seems to consider identical messages, given we would
never see two panic messages, I don't think that will cause any problems.)

>  			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>  	}
> +	if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
> +		__ghes_call_panic();
> +	}
> +

I think this ghes_severity() then panic() should go above the:
>	if (!ghes_estatus_cached(ghes->estatus)) {
and we should call __ghes_print_estatus() here too, to make sure the message
definitely got out!
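
The ordering concern can be modelled in user space with a toy rate
limiter. This is a minimal sketch under stated assumptions, not the
kernel's ghes_proc(): struct ratelimit, ratelimit_ok(), report() and
fatal_record_printed() are hypothetical names, and the '2 messages every
5 seconds' figure is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

enum sev { SEV_RECOVERABLE, SEV_PANIC };	/* simplified severities */

/* Toy rate limiter: at most `burst` messages per `interval` seconds. */
struct ratelimit {
	long interval;
	int burst;
	long window_start;
	int printed;
};

/* Returns true if a message may be printed at time `now` (seconds). */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/*
 * Returns true if the record was printed. With check_panic_first set,
 * a panic-severity record bypasses the limiter, as suggested above.
 */
static bool report(struct ratelimit *rl, enum sev sev, long now,
		   bool check_panic_first)
{
	if (check_panic_first && sev == SEV_PANIC)
		return true;	/* unconditional print, then panic() */
	return ratelimit_ok(rl, now);
}

/* Two recoverable errors, then a fatal one, all within five seconds. */
static bool fatal_record_printed(bool check_panic_first)
{
	struct ratelimit rl = { .interval = 5, .burst = 2 };
	bool first = report(&rl, SEV_RECOVERABLE, 0, check_panic_first);
	bool second = report(&rl, SEV_RECOVERABLE, 1, check_panic_first);

	assert(first && second);
	return report(&rl, SEV_PANIC, 2, check_panic_first);
}
```

In this model the fatal record is dropped when the severity check runs
after the rate-limited print, and it is printed when the check runs first.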


With that,
Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 06/10] acpi: apei: panic OS with fatal error status block
  2017-02-09 10:48     ` James Morse
  (?)
@ 2017-02-13 22:45         ` Baicar, Tyler
  -1 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-13 22:45 UTC (permalink / raw)
  To: James Morse, zjzhang
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, mark.rutland, akpm, eun.taik.lee,
	sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba, hanjun.guo,
	john.garry, shiju.jose

Hello James,


On 2/9/2017 3:48 AM, James Morse wrote:
> Hi Jonathan, Tyler,
>
> On 01/02/17 17:16, Tyler Baicar wrote:
>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>
>> Even if an error status block's severity is fatal, the kernel does not
>> honor the severity level and panic.
>>
>> With the firmware first model, the platform could inform the OS about a
>> fatal hardware error through the non-NMI GHES notification type. The OS
>> should panic when a hardware error record is received with this
>> severity.
>>
>> Call panic() after CPER data in error status block is printed if
>> severity is fatal, before each error section is handled.
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 8756172..86c1f15 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -687,6 +689,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
>>   	return rc;
>>   }
>>   
>> +static void __ghes_call_panic(void)
>> +{
>> +	if (panic_timeout == 0)
>> +		panic_timeout = ghes_panic_timeout;
>> +	panic("Fatal hardware error!");
>> +}
>> +
> __ghes_panic() also has:
>> 	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
> which prints this estatus regardless of rate limiting and caching.
>
> [ ... ]
>
>> @@ -698,6 +707,10 @@ static int ghes_proc(struct ghes *ghes)
>>   		if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))
> ghes_print_estatus() uses some custom rate limiting ('2 messages every 5
> seconds'), and GHES_SEV_PANIC shares the same limit as GHES_SEV_RECOVERABLE.
>
> I think it's possible to get 2 recoverable messages, then a panic, in a 5 second
> window. The rate limit will kick in to stop the panic estatus block being
> printed, but we still go on to call panic() without the real reason being printed...
>
> (The caching thing only seems to consider identical messages. Given we would
> never see two panic messages, I don't think that will cause any problems.)
>
>>   			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>>   	}
>> +	if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
>> +		__ghes_call_panic();
>> +	}
>> +
> I think this ghes_severity() then panic() should go above the:
>> 	if (!ghes_estatus_cached(ghes->estatus)) {
> and we should call __ghes_print_estatus() here too, to make sure the message
> definitely got out!
Okay, that makes sense. If we move this up, is there a problem with 
calling __ghes_panic() instead of making the __ghes_print_estatus() and 
__ghes_call_panic() calls here? It looks like that will just add a call 
to oops_begin() and ghes_print_queued_estatus() as well, but this is 
what ghes_notify_nmi() does if the severity is panic.
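
To spell out the two options being compared, here is a small userspace
trace model. It only records the order of steps and calls nothing real.
The step names mirror the kernel functions mentioned above, but the exact
order inside __ghes_panic() is my reading of ghes.c and should be treated
as an assumption, as should all of the model_* helper names.

```c
#include <assert.h>
#include <string.h>

/* Records the sequence of modelled steps as "step;step;..." */
static char model_trace_buf[128];

static void model_note(const char *step)
{
	strcat(model_trace_buf, step);
	strcat(model_trace_buf, ";");
}

void model_reset(void)
{
	model_trace_buf[0] = '\0';
}

const char *model_trace(void)
{
	return model_trace_buf;
}

/* Option A: print the estatus, then panic, directly in ghes_proc(). */
void model_separate_calls(void)
{
	model_note("print_estatus");
	model_note("panic");
}

/* Option B: go through __ghes_panic(), which adds two extra steps. */
void model_via_ghes_panic(void)
{
	model_note("oops_begin");
	model_note("print_queued_estatus");
	model_note("print_estatus");
	model_note("panic");
}
```

Both paths end with the estatus printed and a panic. The only difference
in this model is the two extra steps at the front of the second path.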

Thanks,
Tyler
> With that,
> Reviewed-by: James Morse <james.morse@arm.com>
>
>
> Thanks,
>
> James

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
  2017-02-01 17:16   ` Tyler Baicar
  (?)
  (?)
@ 2017-02-15  6:24     ` Zhengqiang
  -1 siblings, 0 replies; 97+ messages in thread
From: Zhengqiang @ 2017-02-15  6:24 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel

Hi Tyler:

On 2017/2/2 1:16, Tyler Baicar wrote:
> ARM APEI extension proposal added SEA (Synchronous External Abort)
> notification type for ARMv8.
> Add a new GHES error source handling function for SEA. If an error
> source's notification type is SEA, then this function can be registered
> into the SEA exception handler. That way GHES will parse and report
> SEA exceptions when they occur.
> 
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
> ---
>  arch/arm64/Kconfig        |  2 ++
>  arch/arm64/mm/fault.c     | 11 +++++++
>  drivers/acpi/apei/Kconfig | 14 +++++++++
>  drivers/acpi/apei/ghes.c  | 78 ++++++++++++++++++++++++++++++++++++++++++-----
>  include/acpi/ghes.h       |  2 ++
>  5 files changed, 100 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1117421..f92778d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -53,6 +53,8 @@ config ARM64
>  	select HANDLE_DOMAIN_IRQ
>  	select HARDIRQS_SW_RESEND
>  	select HAVE_ACPI_APEI if (ACPI && EFI)
> +	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
> +	select HAVE_NMI if HAVE_ACPI_APEI_SEA
>  	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>  	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_BITREVERSE
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 9ae7e65..5a5a096 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -30,6 +30,7 @@
>  #include <linux/highmem.h>
>  #include <linux/perf_event.h>
>  #include <linux/preempt.h>
> +#include <linux/hardirq.h>
>  
>  #include <asm/bug.h>
>  #include <asm/cpufeature.h>
> @@ -41,6 +42,8 @@
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
>  
> +#include <acpi/ghes.h>
> +
>  static const char *fault_name(unsigned int esr);
>  
>  #ifdef CONFIG_KPROBES
> @@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>  	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>  		 fault_name(esr), esr, addr);
>  
> +	/*
> +	 * synchronize_rcu() will wait for nmi_exit(), so no need to
> +	 * rcu_read_lock().
> +	 */
> +	nmi_enter();
> +	ghes_notify_sea();
> +	nmi_exit();
> +

For a fatal error:
ghes_notify_sea() -> ghes_proc() -> __ghes_call_panic() causes a panic;
do_sea() also calls arm64_notify_die(), which will cause a panic too.
Can it happen that panic() is called twice?
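
One way to reason about this: panic() does not return to its caller, so
if the GHES path panics first, the code after ghes_notify_sea() in
do_sea() should never run. Below is a userspace model of that control
flow, using setjmp()/longjmp() to imitate a call that never returns. All
model_* names are hypothetical, and the model says nothing about what
arm64_notify_die() itself does when it is reached.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf model_halted;
static int model_panic_count;
static bool model_die_reached;

/* Imitates panic(): control never comes back to the caller. */
static void model_panic(void)
{
	model_panic_count++;
	longjmp(model_halted, 1);
}

/* Imitates ghes_notify_sea() -> ghes_proc() -> __ghes_call_panic(). */
static void model_ghes_notify_sea(bool fatal)
{
	if (fatal)
		model_panic();
}

/* Imitates the tail of do_sea(): GHES first, then the notify/die step. */
void model_do_sea(bool fatal)
{
	model_panic_count = 0;
	model_die_reached = false;

	if (setjmp(model_halted) == 0) {
		model_ghes_notify_sea(fatal);
		/* Only reached when the GHES path did not panic. */
		model_die_reached = true;
	}
}

int model_panic_calls(void)
{
	return model_panic_count;
}

bool model_notify_die_reached(void)
{
	return model_die_reached;
}
```

In the fatal case the model records exactly one panic and never reaches
the second step. In the non-fatal case it records no panic from the GHES
path and falls through to the notify/die step.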

>  	info.si_signo = SIGBUS;
>  	info.si_errno = 0;
>  	info.si_code  = 0;
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index b0140c8..3786ff1 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
>  config HAVE_ACPI_APEI_NMI
>  	bool
>  
> +config HAVE_ACPI_APEI_SEA
> +	bool "APEI Synchronous External Abort logging/recovering support"
> +	depends on ARM64
> +	help
> +	  This option should be enabled if the system supports
> +	  firmware first handling of SEA (Synchronous External Abort).
> +	  SEA happens with certain faults of data abort or instruction
> +	  abort synchronous exceptions on ARMv8 systems. If a system
> +	  supports firmware first handling of SEA, the platform analyzes
> +	  and handles hardware error notifications with SEA, and it may then
> +	  form a HW error record for the OS to parse and handle. This
> +	  option allows the OS to look for such HW error record, and
> +	  take appropriate action.
> +
>  config ACPI_APEI
>  	bool "ACPI Platform Error Interface (APEI)"
>  	select MISC_FILESYSTEMS
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index b25e7cf..8756172 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -114,11 +114,7 @@
>   * Two virtual pages are used, one for IRQ/PROCESS context, the other for
>   * NMI context (optionally).
>   */
> -#ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  #define GHES_IOREMAP_PAGES           2
> -#else
> -#define GHES_IOREMAP_PAGES           1
> -#endif
>  #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
>  #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
>  
> @@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
>  
>  static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>  {
> -	unsigned long vaddr;
> +	unsigned long vaddr, paddr;
> +	pgprot_t prot;
>  
>  	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
> -	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
> -			   pfn << PAGE_SHIFT, PAGE_KERNEL);
> +
> +	paddr = pfn << PAGE_SHIFT;
> +	prot = arch_apei_get_mem_attribute(paddr);
> +	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
>  
>  	return (void __iomem *)vaddr;
>  }
> @@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
>  	.notifier_call = ghes_notify_sci,
>  };
>  
> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
> +static LIST_HEAD(ghes_sea);
> +
> +void ghes_notify_sea(void)
> +{
> +	struct ghes *ghes;
> +
> +	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
> +		ghes_proc(ghes);
> +	}
> +}
> +
> +static int ghes_sea_add(struct ghes *ghes)
> +{
> +	mutex_lock(&ghes_list_mutex);
> +	list_add_rcu(&ghes->list, &ghes_sea);
> +	mutex_unlock(&ghes_list_mutex);
> +	return 0;
> +}
> +
> +static void ghes_sea_remove(struct ghes *ghes)
> +{
> +	mutex_lock(&ghes_list_mutex);
> +	list_del_rcu(&ghes->list);
> +	mutex_unlock(&ghes_list_mutex);
> +	synchronize_rcu();
> +}
> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
> +static inline int ghes_sea_add(struct ghes *ghes)
> +{
> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
> +	       ghes->generic->header.source_id);
> +	return -ENOTSUPP;
> +}
> +
> +static inline void ghes_sea_remove(struct ghes *ghes)
> +{
> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
> +	       ghes->generic->header.source_id);
> +}
> +#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
> +
>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  /*
>   * printk is not safe in NMI context.  So in NMI handler, we allocate
> @@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  	case ACPI_HEST_NOTIFY_EXTERNAL:
>  	case ACPI_HEST_NOTIFY_SCI:
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
> +			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
> +				generic->header.source_id);
> +			rc = -ENOTSUPP;
> +			goto err;
> +		}
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>  			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
> @@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
>  			   generic->header.source_id);
>  		goto err;
> +	case ACPI_HEST_NOTIFY_GPIO:
> +	case ACPI_HEST_NOTIFY_SEI:
> +	case ACPI_HEST_NOTIFY_GSIV:
> +		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
> +			generic->header.source_id, generic->notify.type);
> +		rc = -ENOTSUPP;
> +		goto err;
>  	default:
>  		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
>  			   generic->notify.type, generic->header.source_id);
> @@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		list_add_rcu(&ghes->list, &ghes_sci);
>  		mutex_unlock(&ghes_list_mutex);
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		rc = ghes_sea_add(ghes);
> +		if (rc)
> +			goto err_edac_unreg;
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		ghes_nmi_add(ghes);
>  		break;
> @@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>  			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>  		mutex_unlock(&ghes_list_mutex);
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		ghes_sea_remove(ghes);
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		ghes_nmi_remove(ghes);
>  		break;
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 6ae318b..adf5455 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
>  		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
>  		gdata + 1;
>  }
> +
> +void ghes_notify_sea(void);
> 

Thanks,

Zhengqiang


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
@ 2017-02-15  6:24     ` Zhengqiang
  0 siblings, 0 replies; 97+ messages in thread
From: Zhengqiang @ 2017-02-15  6:24 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel

Hi Tyler:

On 2017/2/2 1:16, Tyler Baicar wrote:
> ARM APEI extension proposal added SEA (Synchronous External Abort)
> notification type for ARMv8.
> Add a new GHES error source handling function for SEA. If an error
> source's notification type is SEA, then this function can be registered
> into the SEA exception handler. That way GHES will parse and report
> SEA exceptions when they occur.
> 
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
> ---
>  arch/arm64/Kconfig        |  2 ++
>  arch/arm64/mm/fault.c     | 11 +++++++
>  drivers/acpi/apei/Kconfig | 14 +++++++++
>  drivers/acpi/apei/ghes.c  | 78 ++++++++++++++++++++++++++++++++++++++++++-----
>  include/acpi/ghes.h       |  2 ++
>  5 files changed, 100 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1117421..f92778d 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -53,6 +53,8 @@ config ARM64
>  	select HANDLE_DOMAIN_IRQ
>  	select HARDIRQS_SW_RESEND
>  	select HAVE_ACPI_APEI if (ACPI && EFI)
> +	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
> +	select HAVE_NMI if HAVE_ACPI_APEI_SEA
>  	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>  	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_BITREVERSE
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 9ae7e65..5a5a096 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -30,6 +30,7 @@
>  #include <linux/highmem.h>
>  #include <linux/perf_event.h>
>  #include <linux/preempt.h>
> +#include <linux/hardirq.h>
>  
>  #include <asm/bug.h>
>  #include <asm/cpufeature.h>
> @@ -41,6 +42,8 @@
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
>  
> +#include <acpi/ghes.h>
> +
>  static const char *fault_name(unsigned int esr);
>  
>  #ifdef CONFIG_KPROBES
> @@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>  	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>  		 fault_name(esr), esr, addr);
>  
> +	/*
> +	 * synchronize_rcu() will wait for nmi_exit(), so no need to
> +	 * rcu_read_lock().
> +	 */
> +	nmi_enter();
> +	ghes_notify_sea();
> +	nmi_exit();
> +

For a fatal error:
ghes_notify_sea() -> ghes_proc() -> __ghes_call_panic() causes a panic;
do_sea() also calls arm64_notify_die(), which causes a panic too.
Can it happen that panic is called twice?

>  	info.si_signo = SIGBUS;
>  	info.si_errno = 0;
>  	info.si_code  = 0;
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index b0140c8..3786ff1 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
>  config HAVE_ACPI_APEI_NMI
>  	bool
>  
> +config HAVE_ACPI_APEI_SEA
> +	bool "APEI Synchronous External Abort logging/recovering support"
> +	depends on ARM64
> +	help
> +	  This option should be enabled if the system supports
> +	  firmware first handling of SEA (Synchronous External Abort).
> +	  SEA happens with certain faults of data abort or instruction
> +	  abort synchronous exceptions on ARMv8 systems. If a system
> +	  supports firmware first handling of SEA, the platform analyzes
> +	  and handles hardware error notifications with SEA, and it may then
> +	  form a HW error record for the OS to parse and handle. This
> +	  option allows the OS to look for such HW error record, and
> +	  take appropriate action.
> +
>  config ACPI_APEI
>  	bool "ACPI Platform Error Interface (APEI)"
>  	select MISC_FILESYSTEMS
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index b25e7cf..8756172 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -114,11 +114,7 @@
>   * Two virtual pages are used, one for IRQ/PROCESS context, the other for
>   * NMI context (optionally).
>   */
> -#ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  #define GHES_IOREMAP_PAGES           2
> -#else
> -#define GHES_IOREMAP_PAGES           1
> -#endif
>  #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
>  #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
>  
> @@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
>  
>  static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>  {
> -	unsigned long vaddr;
> +	unsigned long vaddr, paddr;
> +	pgprot_t prot;
>  
>  	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
> -	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
> -			   pfn << PAGE_SHIFT, PAGE_KERNEL);
> +
> +	paddr = pfn << PAGE_SHIFT;
> +	prot = arch_apei_get_mem_attribute(paddr);
> +	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
>  
>  	return (void __iomem *)vaddr;
>  }
> @@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
>  	.notifier_call = ghes_notify_sci,
>  };
>  
> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
> +static LIST_HEAD(ghes_sea);
> +
> +void ghes_notify_sea(void)
> +{
> +	struct ghes *ghes;
> +
> +	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
> +		ghes_proc(ghes);
> +	}
> +}
> +
> +static int ghes_sea_add(struct ghes *ghes)
> +{
> +	mutex_lock(&ghes_list_mutex);
> +	list_add_rcu(&ghes->list, &ghes_sea);
> +	mutex_unlock(&ghes_list_mutex);
> +	return 0;
> +}
> +
> +static void ghes_sea_remove(struct ghes *ghes)
> +{
> +	mutex_lock(&ghes_list_mutex);
> +	list_del_rcu(&ghes->list);
> +	mutex_unlock(&ghes_list_mutex);
> +	synchronize_rcu();
> +}
> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
> +static inline int ghes_sea_add(struct ghes *ghes)
> +{
> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
> +	       ghes->generic->header.source_id);
> +	return -ENOTSUPP;
> +}
> +
> +static inline void ghes_sea_remove(struct ghes *ghes)
> +{
> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
> +	       ghes->generic->header.source_id);
> +}
> +#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
> +
>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  /*
>   * printk is not safe in NMI context.  So in NMI handler, we allocate
> @@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  	case ACPI_HEST_NOTIFY_EXTERNAL:
>  	case ACPI_HEST_NOTIFY_SCI:
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
> +			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
> +				generic->header.source_id);
> +			rc = -ENOTSUPP;
> +			goto err;
> +		}
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>  			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
> @@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
>  			   generic->header.source_id);
>  		goto err;
> +	case ACPI_HEST_NOTIFY_GPIO:
> +	case ACPI_HEST_NOTIFY_SEI:
> +	case ACPI_HEST_NOTIFY_GSIV:
> +		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
> +			generic->header.source_id, generic->notify.type);
> +		rc = -ENOTSUPP;
> +		goto err;
>  	default:
>  		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
>  			   generic->notify.type, generic->header.source_id);
> @@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>  		list_add_rcu(&ghes->list, &ghes_sci);
>  		mutex_unlock(&ghes_list_mutex);
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		rc = ghes_sea_add(ghes);
> +		if (rc)
> +			goto err_edac_unreg;
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		ghes_nmi_add(ghes);
>  		break;
> @@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>  			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>  		mutex_unlock(&ghes_list_mutex);
>  		break;
> +	case ACPI_HEST_NOTIFY_SEA:
> +		ghes_sea_remove(ghes);
> +		break;
>  	case ACPI_HEST_NOTIFY_NMI:
>  		ghes_nmi_remove(ghes);
>  		break;
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 6ae318b..adf5455 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
>  		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
>  		gdata + 1;
>  }
> +
> +void ghes_notify_sea(void);
> 

Thanks,

Zhengqiang


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 06/10] acpi: apei: panic OS with fatal error status block
  2017-02-13 22:45         ` Baicar, Tyler
  (?)
@ 2017-02-15 12:13             ` James Morse
  -1 siblings, 0 replies; 97+ messages in thread
From: James Morse @ 2017-02-15 12:13 UTC (permalink / raw)
  To: Baicar, Tyler, zjzhang
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, mark.rutland, akpm, eun.taik.lee,
	sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba, hanjun.guo,
	john.garry, shiju.jose

Hi Tyler,

On 13/02/17 22:45, Baicar, Tyler wrote:
> On 2/9/2017 3:48 AM, James Morse wrote:
>> On 01/02/17 17:16, Tyler Baicar wrote:
>>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>>
>>> Even if an error status block's severity is fatal, the kernel does not
>>> honor the severity level and panic.
>>>
>>> With the firmware first model, the platform could inform the OS about a
>>> fatal hardware error through the non-NMI GHES notification type. The OS
>>> should panic when a hardware error record is received with this
>>> severity.
>>>
>>> Call panic() after CPER data in error status block is printed if
>>> severity is fatal, before each error section is handled.

>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>> index 8756172..86c1f15 100644
>>> --- a/drivers/acpi/apei/ghes.c
>>> +++ b/drivers/acpi/apei/ghes.c
>>> @@ -687,6 +689,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2
>>> *generic_v2)
>>>       return rc;
>>>   }
>>>   +static void __ghes_call_panic(void)
>>> +{
>>> +    if (panic_timeout == 0)
>>> +        panic_timeout = ghes_panic_timeout;
>>> +    panic("Fatal hardware error!");
>>> +}
>>> +

>> __ghes_panic() also has:
>>>     __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
>> Which prints this estatus regardless of rate limiting and cache-ing.

[...]

>>>               ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>>>       }
>>> +    if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
>>> +        __ghes_call_panic();
>>> +    }

>> I think this ghes_severity() then panic() should go above the:
>>>     if (!ghes_estatus_cached(ghes->estatus)) {
>> and we should call __ghes_print_estatus() here too, to make sure the message
>> definitely got out!


> Okay, that makes sense. If we move this up, is there a problem with calling
> __ghes_panic() instead of making the __ghes_print_estatus() and
> __ghes_call_panic() calls here? It looks like that will just add a call to
> oops_begin() and ghes_print_queued_estatus() as well, but this is what
> ghes_notify_nmi() does if the severity is panic.


I don't think the queued stuff is relevant. Isn't that just for x86 NMI messages
that it doesn't print out directly?

A quick grep shows arm64 doesn't have oops_begin(), so you may have to add some
equivalent mechanism. Let's try and avoid that rabbit hole!

Given __ghes_panic() calls __ghes_print_estatus() too, you could try moving that
into your new __ghes_call_panic().... or whatever results in the least lines
changed!


Thanks,

James

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
  2017-02-15  6:24     ` Zhengqiang
  (?)
  (?)
@ 2017-02-15 14:58       ` Baicar, Tyler
  -1 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-15 14:58 UTC (permalink / raw)
  To: Zhengqiang
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi

Hello Zhengqiang,


On 2/14/2017 11:24 PM, Zhengqiang wrote:
> Hi Tyler:
>
> On 2017/2/2 1:16, Tyler Baicar wrote:
>> ARM APEI extension proposal added SEA (Synchronous External Abort)
>> notification type for ARMv8.
>> Add a new GHES error source handling function for SEA. If an error
>> source's notification type is SEA, then this function can be registered
>> into the SEA exception handler. That way GHES will parse and report
>> SEA exceptions when they occur.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
>> ---
>>   arch/arm64/Kconfig        |  2 ++
>>   arch/arm64/mm/fault.c     | 11 +++++++
>>   drivers/acpi/apei/Kconfig | 14 +++++++++
>>   drivers/acpi/apei/ghes.c  | 78 ++++++++++++++++++++++++++++++++++++++++++-----
>>   include/acpi/ghes.h       |  2 ++
>>   5 files changed, 100 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 1117421..f92778d 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -53,6 +53,8 @@ config ARM64
>>   	select HANDLE_DOMAIN_IRQ
>>   	select HARDIRQS_SW_RESEND
>>   	select HAVE_ACPI_APEI if (ACPI && EFI)
>> +	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
>> +	select HAVE_NMI if HAVE_ACPI_APEI_SEA
>>   	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>>   	select HAVE_ARCH_AUDITSYSCALL
>>   	select HAVE_ARCH_BITREVERSE
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 9ae7e65..5a5a096 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -30,6 +30,7 @@
>>   #include <linux/highmem.h>
>>   #include <linux/perf_event.h>
>>   #include <linux/preempt.h>
>> +#include <linux/hardirq.h>
>>   
>>   #include <asm/bug.h>
>>   #include <asm/cpufeature.h>
>> @@ -41,6 +42,8 @@
>>   #include <asm/pgtable.h>
>>   #include <asm/tlbflush.h>
>>   
>> +#include <acpi/ghes.h>
>> +
>>   static const char *fault_name(unsigned int esr);
>>   
>>   #ifdef CONFIG_KPROBES
>> @@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>   	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>   		 fault_name(esr), esr, addr);
>>   
>> +	/*
>> +	 * synchronize_rcu() will wait for nmi_exit(), so no need to
>> +	 * rcu_read_lock().
>> +	 */
>> +	nmi_enter();
>> +	ghes_notify_sea();
>> +	nmi_exit();
>> +
> For a fatal error:
> ghes_notify_sea() -> ghes_proc() -> __ghes_call_panic() causes a panic;
> do_sea() also calls arm64_notify_die(), which causes a panic too.
> Can it happen that panic is called twice?
Since the panic function never returns, it isn't possible to call it twice.
ghes_proc will only cause a panic if the severity of the error is
GHES_SEV_PANIC, so it's possible that we call panic from either the GHES code
or from arm64_notify_die() depending on the error.

Thanks,
Tyler
>>   	info.si_signo = SIGBUS;
>>   	info.si_errno = 0;
>>   	info.si_code  = 0;
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index b0140c8..3786ff1 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
>>   config HAVE_ACPI_APEI_NMI
>>   	bool
>>   
>> +config HAVE_ACPI_APEI_SEA
>> +	bool "APEI Synchronous External Abort logging/recovering support"
>> +	depends on ARM64
>> +	help
>> +	  This option should be enabled if the system supports
>> +	  firmware first handling of SEA (Synchronous External Abort).
>> +	  SEA happens with certain faults of data abort or instruction
>> +	  abort synchronous exceptions on ARMv8 systems. If a system
>> +	  supports firmware first handling of SEA, the platform analyzes
>> +	  and handles hardware error notifications with SEA, and it may then
>> +	  form a HW error record for the OS to parse and handle. This
>> +	  option allows the OS to look for such HW error record, and
>> +	  take appropriate action.
>> +
>>   config ACPI_APEI
>>   	bool "ACPI Platform Error Interface (APEI)"
>>   	select MISC_FILESYSTEMS
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index b25e7cf..8756172 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -114,11 +114,7 @@
>>    * Two virtual pages are used, one for IRQ/PROCESS context, the other for
>>    * NMI context (optionally).
>>    */
>> -#ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>   #define GHES_IOREMAP_PAGES           2
>> -#else
>> -#define GHES_IOREMAP_PAGES           1
>> -#endif
>>   #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
>>   #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
>>   
>> @@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
>>   
>>   static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>>   {
>> -	unsigned long vaddr;
>> +	unsigned long vaddr, paddr;
>> +	pgprot_t prot;
>>   
>>   	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
>> -	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
>> -			   pfn << PAGE_SHIFT, PAGE_KERNEL);
>> +
>> +	paddr = pfn << PAGE_SHIFT;
>> +	prot = arch_apei_get_mem_attribute(paddr);
>> +	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
>>   
>>   	return (void __iomem *)vaddr;
>>   }
>> @@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
>>   	.notifier_call = ghes_notify_sci,
>>   };
>>   
>> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
>> +static LIST_HEAD(ghes_sea);
>> +
>> +void ghes_notify_sea(void)
>> +{
>> +	struct ghes *ghes;
>> +
>> +	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
>> +		ghes_proc(ghes);
>> +	}
>> +}
>> +
>> +static int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	mutex_lock(&ghes_list_mutex);
>> +	list_add_rcu(&ghes->list, &ghes_sea);
>> +	mutex_unlock(&ghes_list_mutex);
>> +	return 0;
>> +}
>> +
>> +static void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	mutex_lock(&ghes_list_mutex);
>> +	list_del_rcu(&ghes->list);
>> +	mutex_unlock(&ghes_list_mutex);
>> +	synchronize_rcu();
>> +}
>> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
>> +static inline int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +	return -ENOTSUPP;
>> +}
>> +
>> +static inline void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +}
>> +#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
>> +
>>   #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>   /*
>>    * printk is not safe in NMI context.  So in NMI handler, we allocate
>> @@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   	case ACPI_HEST_NOTIFY_EXTERNAL:
>>   	case ACPI_HEST_NOTIFY_SCI:
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
>> +			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
>> +				generic->header.source_id);
>> +			rc = -ENOTSUPP;
>> +			goto err;
>> +		}
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>>   			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
>> @@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
>>   			   generic->header.source_id);
>>   		goto err;
>> +	case ACPI_HEST_NOTIFY_GPIO:
>> +	case ACPI_HEST_NOTIFY_SEI:
>> +	case ACPI_HEST_NOTIFY_GSIV:
>> +		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
>> +			generic->header.source_id, generic->notify.type);
>> +		rc = -ENOTSUPP;
>> +		goto err;
>>   	default:
>>   		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
>>   			   generic->notify.type, generic->header.source_id);
>> @@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   		list_add_rcu(&ghes->list, &ghes_sci);
>>   		mutex_unlock(&ghes_list_mutex);
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		rc = ghes_sea_add(ghes);
>> +		if (rc)
>> +			goto err_edac_unreg;
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		ghes_nmi_add(ghes);
>>   		break;
>> @@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>>   			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>>   		mutex_unlock(&ghes_list_mutex);
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		ghes_sea_remove(ghes);
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		ghes_nmi_remove(ghes);
>>   		break;
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
>> index 6ae318b..adf5455 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
>>   		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
>>   		gdata + 1;
>>   }
>> +
>> +void ghes_notify_sea(void);
>>
> Thanks,
>
> Zhengqiang
>

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
@ 2017-02-15 14:58       ` Baicar, Tyler
  0 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-15 14:58 UTC (permalink / raw)
  To: Zhengqiang
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agrawal, astone, harba, hanjun.guo,
	john.garry, shiju.jose

Hello Zhengqiang,


On 2/14/2017 11:24 PM, Zhengqiang wrote:
> Hi Tyler:
>
> On 2017/2/2 1:16, Tyler Baicar wrote:
>> ARM APEI extension proposal added SEA (Synchronous External Abort)
>> notification type for ARMv8.
>> Add a new GHES error source handling function for SEA. If an error
>> source's notification type is SEA, then this function can be registered
>> into the SEA exception handler. That way GHES will parse and report
>> SEA exceptions when they occur.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
>> ---
>>   arch/arm64/Kconfig        |  2 ++
>>   arch/arm64/mm/fault.c     | 11 +++++++
>>   drivers/acpi/apei/Kconfig | 14 +++++++++
>>   drivers/acpi/apei/ghes.c  | 78 ++++++++++++++++++++++++++++++++++++++++++-----
>>   include/acpi/ghes.h       |  2 ++
>>   5 files changed, 100 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 1117421..f92778d 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -53,6 +53,8 @@ config ARM64
>>   	select HANDLE_DOMAIN_IRQ
>>   	select HARDIRQS_SW_RESEND
>>   	select HAVE_ACPI_APEI if (ACPI && EFI)
>> +	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
>> +	select HAVE_NMI if HAVE_ACPI_APEI_SEA
>>   	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>>   	select HAVE_ARCH_AUDITSYSCALL
>>   	select HAVE_ARCH_BITREVERSE
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 9ae7e65..5a5a096 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -30,6 +30,7 @@
>>   #include <linux/highmem.h>
>>   #include <linux/perf_event.h>
>>   #include <linux/preempt.h>
>> +#include <linux/hardirq.h>
>>   
>>   #include <asm/bug.h>
>>   #include <asm/cpufeature.h>
>> @@ -41,6 +42,8 @@
>>   #include <asm/pgtable.h>
>>   #include <asm/tlbflush.h>
>>   
>> +#include <acpi/ghes.h>
>> +
>>   static const char *fault_name(unsigned int esr);
>>   
>>   #ifdef CONFIG_KPROBES
>> @@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>   	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>   		 fault_name(esr), esr, addr);
>>   
>> +	/*
>> +	 * synchronize_rcu() will wait for nmi_exit(), so no need to
>> +	 * rcu_read_lock().
>> +	 */
>> +	nmi_enter();
>> +	ghes_notify_sea();
>> +	nmi_exit();
>> +
> For a fatal error:
> ghes_notify_sea() -> ghes_proc() -> __ghes_call_panic() causes a panic;
> do_sea() also calls arm64_notify_die(), which will cause a panic too.
> Can it happen that panic is called twice?
Since panic() never returns, it cannot be called twice. ghes_proc() will
only cause a panic if the severity of the error is GHES_SEV_PANIC, so
panic may be called either from the GHES code or from arm64_notify_die(),
depending on the error.
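
As a rough illustration of that control flow, here is a small user-space
sketch. It is not kernel code: model_panic(), model_ghes_proc(),
model_do_sea(), simulate_sea() and the die_is_fatal flag are made-up names,
the two-value severity enum is a simplification, and longjmp() only stands
in for the fact that panic() does not return to its caller.

```c
#include <assert.h>
#include <setjmp.h>

/* Simplified stand-ins for the GHES severities (hypothetical names). */
enum model_sev { MODEL_SEV_RECOVERABLE, MODEL_SEV_PANIC };

/* Which call site entered the panic path in the last simulated SEA. */
enum model_site { SITE_NONE, SITE_GHES, SITE_NOTIFY_DIE };

static jmp_buf panic_env;
static int panic_calls;
static enum model_site last_site;

/* Stand-in for panic(): records the caller and never returns to it. */
static void model_panic(enum model_site site)
{
	panic_calls++;
	last_site = site;
	longjmp(panic_env, 1);
}

/* Stand-in for ghes_proc(): panics only for the fatal severity. */
static void model_ghes_proc(enum model_sev sev)
{
	if (sev == MODEL_SEV_PANIC)
		model_panic(SITE_GHES);
	/* Otherwise the error would be logged and we return normally. */
}

/*
 * Stand-in for do_sea(): GHES processing runs first. The second call
 * site is reached only if the GHES path returned. die_is_fatal is an
 * assumption modelling the case where arm64_notify_die() ends in panic.
 */
static void model_do_sea(enum model_sev sev, int die_is_fatal)
{
	model_ghes_proc(sev);
	if (die_is_fatal)
		model_panic(SITE_NOTIFY_DIE);
}

/* Run one simulated SEA and return how often the panic path was entered. */
static int simulate_sea(enum model_sev sev, int die_is_fatal)
{
	panic_calls = 0;
	last_site = SITE_NONE;
	if (setjmp(panic_env) == 0)
		model_do_sea(sev, die_is_fatal);
	/* Whatever the inputs, the panic path is entered at most once. */
	assert(panic_calls <= 1);
	return panic_calls;
}
```

In this model, simulate_sea(MODEL_SEV_PANIC, 1) returns 1 with last_site
set to SITE_GHES, while simulate_sea(MODEL_SEV_RECOVERABLE, 1) returns 1
with last_site set to SITE_NOTIFY_DIE. The count never reaches 2.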

Thanks,
Tyler
>>   	info.si_signo = SIGBUS;
>>   	info.si_errno = 0;
>>   	info.si_code  = 0;
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index b0140c8..3786ff1 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
>>   config HAVE_ACPI_APEI_NMI
>>   	bool
>>   
>> +config HAVE_ACPI_APEI_SEA
>> +	bool "APEI Synchronous External Abort logging/recovering support"
>> +	depends on ARM64
>> +	help
>> +	  This option should be enabled if the system supports
>> +	  firmware first handling of SEA (Synchronous External Abort).
>> +	  SEA happens with certain faults of data abort or instruction
>> +	  abort synchronous exceptions on ARMv8 systems. If a system
>> +	  supports firmware first handling of SEA, the platform analyzes
>> +	  and handles hardware error notifications with SEA, and it may then
>> +	  form a HW error record for the OS to parse and handle. This
>> +	  option allows the OS to look for such HW error record, and
>> +	  take appropriate action.
>> +
>>   config ACPI_APEI
>>   	bool "ACPI Platform Error Interface (APEI)"
>>   	select MISC_FILESYSTEMS
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index b25e7cf..8756172 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -114,11 +114,7 @@
>>    * Two virtual pages are used, one for IRQ/PROCESS context, the other for
>>    * NMI context (optionally).
>>    */
>> -#ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>   #define GHES_IOREMAP_PAGES           2
>> -#else
>> -#define GHES_IOREMAP_PAGES           1
>> -#endif
>>   #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
>>   #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
>>   
>> @@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
>>   
>>   static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>>   {
>> -	unsigned long vaddr;
>> +	unsigned long vaddr, paddr;
>> +	pgprot_t prot;
>>   
>>   	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
>> -	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
>> -			   pfn << PAGE_SHIFT, PAGE_KERNEL);
>> +
>> +	paddr = pfn << PAGE_SHIFT;
>> +	prot = arch_apei_get_mem_attribute(paddr);
>> +	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
>>   
>>   	return (void __iomem *)vaddr;
>>   }
>> @@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
>>   	.notifier_call = ghes_notify_sci,
>>   };
>>   
>> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
>> +static LIST_HEAD(ghes_sea);
>> +
>> +void ghes_notify_sea(void)
>> +{
>> +	struct ghes *ghes;
>> +
>> +	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
>> +		ghes_proc(ghes);
>> +	}
>> +}
>> +
>> +static int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	mutex_lock(&ghes_list_mutex);
>> +	list_add_rcu(&ghes->list, &ghes_sea);
>> +	mutex_unlock(&ghes_list_mutex);
>> +	return 0;
>> +}
>> +
>> +static void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	mutex_lock(&ghes_list_mutex);
>> +	list_del_rcu(&ghes->list);
>> +	mutex_unlock(&ghes_list_mutex);
>> +	synchronize_rcu();
>> +}
>> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
>> +static inline int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +	return -ENOTSUPP;
>> +}
>> +
>> +static inline void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +}
>> +#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
>> +
>>   #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>   /*
>>    * printk is not safe in NMI context.  So in NMI handler, we allocate
>> @@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   	case ACPI_HEST_NOTIFY_EXTERNAL:
>>   	case ACPI_HEST_NOTIFY_SCI:
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
>> +			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
>> +				generic->header.source_id);
>> +			rc = -ENOTSUPP;
>> +			goto err;
>> +		}
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>>   			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
>> @@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
>>   			   generic->header.source_id);
>>   		goto err;
>> +	case ACPI_HEST_NOTIFY_GPIO:
>> +	case ACPI_HEST_NOTIFY_SEI:
>> +	case ACPI_HEST_NOTIFY_GSIV:
>> +		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
>> +			generic->header.source_id, generic->notify.type);
>> +		rc = -ENOTSUPP;
>> +		goto err;
>>   	default:
>>   		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
>>   			   generic->notify.type, generic->header.source_id);
>> @@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   		list_add_rcu(&ghes->list, &ghes_sci);
>>   		mutex_unlock(&ghes_list_mutex);
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		rc = ghes_sea_add(ghes);
>> +		if (rc)
>> +			goto err_edac_unreg;
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		ghes_nmi_add(ghes);
>>   		break;
>> @@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>>   			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>>   		mutex_unlock(&ghes_list_mutex);
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		ghes_sea_remove(ghes);
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		ghes_nmi_remove(ghes);
>>   		break;
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
>> index 6ae318b..adf5455 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
>>   		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
>>   		gdata + 1;
>>   }
>> +
>> +void ghes_notify_sea(void);
>>
> Thanks,
>
> Zhengqiang
>

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8
@ 2017-02-15 14:58       ` Baicar, Tyler
  0 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-15 14:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Zhengqiang,


On 2/14/2017 11:24 PM, Zhengqiang wrote:
> Hi Tyler:
>
> On 2017/2/2 1:16, Tyler Baicar wrote:
>> ARM APEI extension proposal added SEA (Synchronous External Abort)
>> notification type for ARMv8.
>> Add a new GHES error source handling function for SEA. If an error
>> source's notification type is SEA, then this function can be registered
>> into the SEA exception handler. That way GHES will parse and report
>> SEA exceptions when they occur.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
>> ---
>>   arch/arm64/Kconfig        |  2 ++
>>   arch/arm64/mm/fault.c     | 11 +++++++
>>   drivers/acpi/apei/Kconfig | 14 +++++++++
>>   drivers/acpi/apei/ghes.c  | 78 ++++++++++++++++++++++++++++++++++++++++++-----
>>   include/acpi/ghes.h       |  2 ++
>>   5 files changed, 100 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 1117421..f92778d 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -53,6 +53,8 @@ config ARM64
>>   	select HANDLE_DOMAIN_IRQ
>>   	select HARDIRQS_SW_RESEND
>>   	select HAVE_ACPI_APEI if (ACPI && EFI)
>> +	select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
>> +	select HAVE_NMI if HAVE_ACPI_APEI_SEA
>>   	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>>   	select HAVE_ARCH_AUDITSYSCALL
>>   	select HAVE_ARCH_BITREVERSE
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 9ae7e65..5a5a096 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -30,6 +30,7 @@
>>   #include <linux/highmem.h>
>>   #include <linux/perf_event.h>
>>   #include <linux/preempt.h>
>> +#include <linux/hardirq.h>
>>   
>>   #include <asm/bug.h>
>>   #include <asm/cpufeature.h>
>> @@ -41,6 +42,8 @@
>>   #include <asm/pgtable.h>
>>   #include <asm/tlbflush.h>
>>   
>> +#include <acpi/ghes.h>
>> +
>>   static const char *fault_name(unsigned int esr);
>>   
>>   #ifdef CONFIG_KPROBES
>> @@ -500,6 +503,14 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>   	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>   		 fault_name(esr), esr, addr);
>>   
>> +	/*
>> +	 * synchronize_rcu() will wait for nmi_exit(), so no need to
>> +	 * rcu_read_lock().
>> +	 */
>> +	nmi_enter();
>> +	ghes_notify_sea();
>> +	nmi_exit();
>> +
> For fatal error:
> ghes_notify_sea() -> ghes_proc() -> __ghes_call_panic(),cause panic;
> do_sea() function also call arm64_notify_die(), will cause panic too;
> Does it can happen, panic will be called twice ?
Since the panic function never returns, it isn't possible to call it 
twice. ghes_proc will only
cause a panic if the severity of the error is GHES_SEV_PANIC, so it's 
possible that we call
panic from either the GHES code or from arm64_notify_die() depending on 
the error.

Thanks,
Tyler
>>   	info.si_signo = SIGBUS;
>>   	info.si_errno = 0;
>>   	info.si_code  = 0;
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index b0140c8..3786ff1 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
>>   config HAVE_ACPI_APEI_NMI
>>   	bool
>>   
>> +config HAVE_ACPI_APEI_SEA
>> +	bool "APEI Synchronous External Abort logging/recovering support"
>> +	depends on ARM64
>> +	help
>> +	  This option should be enabled if the system supports
>> +	  firmware first handling of SEA (Synchronous External Abort).
>> +	  SEA happens with certain faults of data abort or instruction
>> +	  abort synchronous exceptions on ARMv8 systems. If a system
>> +	  supports firmware first handling of SEA, the platform analyzes
>> +	  and handles hardware error notifications with SEA, and it may then
>> +	  form a HW error record for the OS to parse and handle. This
>> +	  option allows the OS to look for such HW error record, and
>> +	  take appropriate action.
>> +
>>   config ACPI_APEI
>>   	bool "ACPI Platform Error Interface (APEI)"
>>   	select MISC_FILESYSTEMS
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index b25e7cf..8756172 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -114,11 +114,7 @@
>>    * Two virtual pages are used, one for IRQ/PROCESS context, the other for
>>    * NMI context (optionally).
>>    */
>> -#ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>   #define GHES_IOREMAP_PAGES           2
>> -#else
>> -#define GHES_IOREMAP_PAGES           1
>> -#endif
>>   #define GHES_IOREMAP_IRQ_PAGE(base)	(base)
>>   #define GHES_IOREMAP_NMI_PAGE(base)	((base) + PAGE_SIZE)
>>   
>> @@ -156,11 +152,14 @@ static void ghes_ioremap_exit(void)
>>   
>>   static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn)
>>   {
>> -	unsigned long vaddr;
>> +	unsigned long vaddr, paddr;
>> +	pgprot_t prot;
>>   
>>   	vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr);
>> -	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
>> -			   pfn << PAGE_SHIFT, PAGE_KERNEL);
>> +
>> +	paddr = pfn << PAGE_SHIFT;
>> +	prot = arch_apei_get_mem_attribute(paddr);
>> +	ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot);
>>   
>>   	return (void __iomem *)vaddr;
>>   }
>> @@ -767,6 +766,48 @@ static int ghes_notify_sci(struct notifier_block *this,
>>   	.notifier_call = ghes_notify_sci,
>>   };
>>   
>> +#ifdef CONFIG_HAVE_ACPI_APEI_SEA
>> +static LIST_HEAD(ghes_sea);
>> +
>> +void ghes_notify_sea(void)
>> +{
>> +	struct ghes *ghes;
>> +
>> +	list_for_each_entry_rcu(ghes, &ghes_sea, list) {
>> +		ghes_proc(ghes);
>> +	}
>> +}
>> +
>> +static int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	mutex_lock(&ghes_list_mutex);
>> +	list_add_rcu(&ghes->list, &ghes_sea);
>> +	mutex_unlock(&ghes_list_mutex);
>> +	return 0;
>> +}
>> +
>> +static void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	mutex_lock(&ghes_list_mutex);
>> +	list_del_rcu(&ghes->list);
>> +	mutex_unlock(&ghes_list_mutex);
>> +	synchronize_rcu();
>> +}
>> +#else /* CONFIG_HAVE_ACPI_APEI_SEA */
>> +static inline int ghes_sea_add(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +	return -ENOTSUPP;
>> +}
>> +
>> +static inline void ghes_sea_remove(struct ghes *ghes)
>> +{
>> +	pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n",
>> +	       ghes->generic->header.source_id);
>> +}
>> +#endif /* CONFIG_HAVE_ACPI_APEI_SEA */
>> +
>>   #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>>   /*
>>    * printk is not safe in NMI context.  So in NMI handler, we allocate
>> @@ -1012,6 +1053,14 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   	case ACPI_HEST_NOTIFY_EXTERNAL:
>>   	case ACPI_HEST_NOTIFY_SCI:
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_SEA)) {
>> +			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
>> +				generic->header.source_id);
>> +			rc = -ENOTSUPP;
>> +			goto err;
>> +		}
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>>   			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
>> @@ -1023,6 +1072,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   		pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
>>   			   generic->header.source_id);
>>   		goto err;
>> +	case ACPI_HEST_NOTIFY_GPIO:
>> +	case ACPI_HEST_NOTIFY_SEI:
>> +	case ACPI_HEST_NOTIFY_GSIV:
>> +		pr_warn(GHES_PFX "Generic hardware error source: %d notified via notification type %u is not supported\n",
>> +			generic->header.source_id, generic->notify.type);
>> +		rc = -ENOTSUPP;
>> +		goto err;
>>   	default:
>>   		pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
>>   			   generic->notify.type, generic->header.source_id);
>> @@ -1077,6 +1133,11 @@ static int ghes_probe(struct platform_device *ghes_dev)
>>   		list_add_rcu(&ghes->list, &ghes_sci);
>>   		mutex_unlock(&ghes_list_mutex);
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		rc = ghes_sea_add(ghes);
>> +		if (rc)
>> +			goto err_edac_unreg;
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		ghes_nmi_add(ghes);
>>   		break;
>> @@ -1119,6 +1180,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
>>   			unregister_acpi_hed_notifier(&ghes_notifier_sci);
>>   		mutex_unlock(&ghes_list_mutex);
>>   		break;
>> +	case ACPI_HEST_NOTIFY_SEA:
>> +		ghes_sea_remove(ghes);
>> +		break;
>>   	case ACPI_HEST_NOTIFY_NMI:
>>   		ghes_nmi_remove(ghes);
>>   		break;
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
>> index 6ae318b..adf5455 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -95,3 +95,5 @@ static inline void *acpi_hest_generic_data_payload(struct acpi_hest_generic_data
>>   		(void *)(((struct acpi_hest_generic_data_v300 *)(gdata)) + 1) :
>>   		gdata + 1;
>>   }
>> +
>> +void ghes_notify_sea(void);
>>
> Thanks,
>
> Zhengqiang
>

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 08/10] ras: acpi / apei: generate trace event for unrecognized CPER section
  2017-02-01 17:16   ` Tyler Baicar
  (?)
@ 2017-02-15 15:52     ` Steven Rostedt
  -1 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-02-15 15:52 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, bristot, linux-arm-kernel, kvmarm,
	kvm, linux-kernel, linux-acpi, linux-efi, devel, S

On Wed,  1 Feb 2017 10:16:51 -0700
Tyler Baicar <tbaicar@codeaurora.org> wrote:

> @@ -452,11 +454,21 @@ static void ghes_do_proc(struct ghes *ghes,
>  {
>  	int sev, sec_sev;
>  	struct acpi_hest_generic_data *gdata;
> +	uuid_le sec_type;
> +	uuid_le *fru_id = &NULL_UUID_LE;
> +	char *fru_text = "";
>  
>  	sev = ghes_severity(estatus->error_severity);
>  	apei_estatus_for_each_section(estatus, gdata) {
>  		sec_sev = ghes_severity(gdata->error_severity);
> -		if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
> +		sec_type = *(uuid_le *)gdata->section_type;
> +
> +		if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
> +			fru_id = (uuid_le *)gdata->fru_id;
> +		if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
> +			fru_text = gdata->fru_text;
> +
> +		if (!uuid_le_cmp(sec_type,
>  				 CPER_SEC_PLATFORM_MEM)) {
>  			struct cper_sec_mem_err *mem_err;
>  
> @@ -467,7 +479,7 @@ static void ghes_do_proc(struct ghes *ghes,
>  			ghes_handle_memory_failure(gdata, sev);
>  		}
>  #ifdef CONFIG_ACPI_APEI_PCIEAER
> -		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
> +		else if (!uuid_le_cmp(sec_type,
>  				      CPER_SEC_PCIE)) {
>  			struct cper_sec_pcie *pcie_err;
>  
> @@ -500,6 +512,12 @@ static void ghes_do_proc(struct ghes *ghes,
>  
>  		}
>  #endif
> +		else {

As an optimization, you can add:

		else if (trace_unknown_sec_event_enabled()) {

instead, as then this won't be called unless the tracepoint is
activated. That will keep the logic from doing anything with
acpi_hest_generic_data_payload().

Note that trace_*_enabled() is implemented with a jump_label, so
there are no conditional branches involved.

-- Steve

> +			void *unknown_err = acpi_hest_generic_data_payload(gdata);
> +			trace_unknown_sec_event(&sec_type,
> +					fru_id, fru_text, sec_sev,
> +					unknown_err, gdata->error_data_length);
> +		}
>  	}
>  }
>  
>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 08/10] ras: acpi / apei: generate trace event for unrecognized CPER section
  2017-02-15 15:52     ` Steven Rostedt
  (?)
@ 2017-02-15 16:54       ` Baicar, Tyler
  -1 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-15 16:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-efi, kvm, matt, catalin.marinas, will.deacon, robert.moore,
	paul.gortmaker, lv.zheng, kvmarm, fu.wei, zjzhang, linux,
	linux-acpi, eun.taik.lee, shijie.huang, labbott, lenb, harba,
	john.garry, marc.zyngier, punit.agrawal, nkaje,
	sandeepa.s.prabhu, linux-arm-kernel, devel, rjw, rruigrok,
	linux-kernel, astone, hanjun.guo, pbonzini, akpm, bristot,
	shiju.jose

On 2/15/2017 8:52 AM, Steven Rostedt wrote:
> On Wed,  1 Feb 2017 10:16:51 -0700
> Tyler Baicar <tbaicar@codeaurora.org> wrote:
>
>> @@ -452,11 +454,21 @@ static void ghes_do_proc(struct ghes *ghes,
>>   {
>>   	int sev, sec_sev;
>>   	struct acpi_hest_generic_data *gdata;
>> +	uuid_le sec_type;
>> +	uuid_le *fru_id = &NULL_UUID_LE;
>> +	char *fru_text = "";
>>   
>>   	sev = ghes_severity(estatus->error_severity);
>>   	apei_estatus_for_each_section(estatus, gdata) {
>>   		sec_sev = ghes_severity(gdata->error_severity);
>> -		if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
>> +		sec_type = *(uuid_le *)gdata->section_type;
>> +
>> +		if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
>> +			fru_id = (uuid_le *)gdata->fru_id;
>> +		if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
>> +			fru_text = gdata->fru_text;
>> +
>> +		if (!uuid_le_cmp(sec_type,
>>   				 CPER_SEC_PLATFORM_MEM)) {
>>   			struct cper_sec_mem_err *mem_err;
>>   
>> @@ -467,7 +479,7 @@ static void ghes_do_proc(struct ghes *ghes,
>>   			ghes_handle_memory_failure(gdata, sev);
>>   		}
>>   #ifdef CONFIG_ACPI_APEI_PCIEAER
>> -		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
>> +		else if (!uuid_le_cmp(sec_type,
>>   				      CPER_SEC_PCIE)) {
>>   			struct cper_sec_pcie *pcie_err;
>>   
>> @@ -500,6 +512,12 @@ static void ghes_do_proc(struct ghes *ghes,
>>   
>>   		}
>>   #endif
>> +		else {
> As an optimization, you can add:
>
> 		else if (trace_unknown_sec_event_enabled()) {
>
> instead, as then this won't be called unless the tracepoint is
> activated. That will keep the logic from doing anything with
> acpi_hest_generic_data_payload().
>
> Note that trace_*_enabled() is implemented with a jump_label, so
> there are no conditional branches involved.
>
> -- Steve
Hello Steve,

In v9 I currently have this and the ARM trace event from this series both
wrapped in an ifdef verifying that CONFIG_RAS is enabled. This resolves the
kbuild failures and will have this code compiled out when that config isn't
enabled. Do you think I should use the ifdef or use these *_enabled()
functions?

Thanks,
Tyler
>> +			void *unknown_err = acpi_hest_generic_data_payload(gdata);
>> +			trace_unknown_sec_event(&sec_type,
>> +					fru_id, fru_text, sec_sev,
>> +					unknown_err, gdata->error_data_length);
>> +		}
>>   	}
>>   }
>>   
>>

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 08/10] ras: acpi / apei: generate trace event for unrecognized CPER section
  2017-02-15 16:54       ` Baicar, Tyler
  (?)
@ 2017-02-15 17:03         ` Steven Rostedt
  -1 siblings, 0 replies; 97+ messages in thread
From: Steven Rostedt @ 2017-02-15 17:03 UTC (permalink / raw)
  To: Baicar, Tyler
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, zjzhang, mark.rutland, james.morse, akpm,
	eun.taik.lee, sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, bristot, linux-arm-kernel, kvmarm,
	kvm, linux-kernel, linux-acpi, linux-efi, devel, S

On Wed, 15 Feb 2017 09:54:09 -0700
"Baicar, Tyler" <tbaicar@codeaurora.org> wrote:

> In v9 I currently have this and the ARM trace event from this series both
> wrapped in an ifdef verifying that CONFIG_RAS is enabled. This resolves the
> kbuild failures and will have this code compiled out when that config isn't
> enabled. Do you think I should use the ifdef or use these *_enabled()
> functions?

I believe you need both. The *_enabled() functions won't prevent build
errors. But since tracing is seldom enabled (usually only when you want
to see what's happening), the *_enabled() functions optimize the code
to not perform tasks that are only needed when tracing is enabled. When
I say "enabled" I mean actively tracing, as opposed to just being
enabled in the kernel config.

-- Steve
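
The "you need both" advice above can be sketched in plain user-space C.
This is only an illustration under stated assumptions, not the kernel
code: DEMO_CONFIG_RAS stands in for CONFIG_RAS, demo_trace_enabled()
stands in for trace_unknown_sec_event_enabled() (a static key in the
kernel, an ordinary flag here), and demo_payload() stands in for
acpi_hest_generic_data_payload(). All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_RAS: build-time switch (set to 0 to compile the trace path out). */
#ifndef DEMO_CONFIG_RAS
#define DEMO_CONFIG_RAS 1
#endif

static bool demo_tracing_active;  /* run-time state: is anyone actively tracing? */
static int demo_payload_lookups;  /* how often the trace-only helper ran */
static int demo_events_emitted;   /* how many trace events were produced */

/* Stand-in for trace_unknown_sec_event_enabled(); the kernel uses a jump label here. */
static bool demo_trace_enabled(void)
{
	return demo_tracing_active;
}

/* Stand-in for acpi_hest_generic_data_payload(): work that only tracing needs. */
static const void *demo_payload(const void *section)
{
	demo_payload_lookups++;
	return section;
}

/* Stand-in for trace_unknown_sec_event(). */
static void demo_trace_event(const void *payload, size_t len)
{
	(void)payload;
	(void)len;
	demo_events_emitted++;
}

/* Fallback branch for a section whose type was not recognized. */
void demo_handle_unknown_section(const void *section, size_t len)
{
	assert(section != NULL);
#if DEMO_CONFIG_RAS
	/* Build-time guard: this code exists only when the feature is configured. */
	if (demo_trace_enabled()) {
		/* Run-time guard: skip the payload lookup unless tracing is active. */
		const void *payload = demo_payload(section);

		demo_trace_event(payload, len);
	}
#else
	(void)len;
#endif
}
```

The #if plays the role of the ifdef (it avoids referencing symbols that
do not exist when the option is off), while the run-time check keeps the
payload lookup off the common path when nobody is tracing.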

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 08/10] ras: acpi / apei: generate trace event for unrecognized CPER section
  2017-02-15 17:03         ` Steven Rostedt
  (?)
@ 2017-02-15 17:06           ` Baicar, Tyler
  -1 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-15 17:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-efi, kvm, matt, catalin.marinas, will.deacon, robert.moore,
	paul.gortmaker, lv.zheng, kvmarm, fu.wei, zjzhang, linux,
	linux-acpi, eun.taik.lee, shijie.huang, labbott, lenb, harba,
	john.garry, marc.zyngier, punit.agrawal, nkaje,
	sandeepa.s.prabhu, linux-arm-kernel, devel, rjw, rruigrok,
	linux-kernel, astone, hanjun.guo, pbonzini, akpm, bristot,
	shiju.jose

On 2/15/2017 10:03 AM, Steven Rostedt wrote:
> On Wed, 15 Feb 2017 09:54:09 -0700
> "Baicar, Tyler" <tbaicar@codeaurora.org> wrote:
>
>> In v9 I currently have this and the ARM trace event from this series both
>> wrapped in an ifdef verifying that CONFIG_RAS is enabled. This resolves the
>> kbuild failures and will have this code compiled out when that config isn't
>> enabled. Do you think I should use the ifdef or use these *_enabled()
>> functions?
>>
> I believe you need both. The *_enabled() functions won't prevent build
> errors. But since tracing is seldom enabled (usually only when you want
> to see what's happening), the *_enabled() functions optimize the code
> to not perform tasks that are only needed when tracing is enabled. When
> I say "enabled" I mean actively tracing, as opposed to just being
> enabled in the kernel config.
>
> -- Steve
Got it, I will add both!

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH V8 06/10] acpi: apei: panic OS with fatal error status block
  2017-02-15 12:13             ` James Morse
  (?)
@ 2017-02-15 17:07                 ` Baicar, Tyler
  -1 siblings, 0 replies; 97+ messages in thread
From: Baicar, Tyler @ 2017-02-15 17:07 UTC (permalink / raw)
  To: James Morse, zjzhang
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, rjw, lenb, matt, robert.moore,
	lv.zheng, nkaje, mark.rutland, akpm, eun.taik.lee,
	sandeepa.s.prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, fu.wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, punit.agr

On 2/15/2017 5:13 AM, James Morse wrote:
> Hi Tyler,
>
> On 13/02/17 22:45, Baicar, Tyler wrote:
>> On 2/9/2017 3:48 AM, James Morse wrote:
>>> On 01/02/17 17:16, Tyler Baicar wrote:
>>>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>>>
>>>> Even if an error status block's severity is fatal, the kernel does not
>>>> honor the severity level and panic.
>>>>
>>>> With the firmware first model, the platform could inform the OS about a
>>>> fatal hardware error through the non-NMI GHES notification type. The OS
>>>> should panic when a hardware error record is received with this
>>>> severity.
>>>>
>>>> Call panic() after CPER data in error status block is printed if
>>>> severity is fatal, before each error section is handled.
>>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>>> index 8756172..86c1f15 100644
>>>> --- a/drivers/acpi/apei/ghes.c
>>>> +++ b/drivers/acpi/apei/ghes.c
>>>> @@ -687,6 +689,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2
>>>> *generic_v2)
>>>>        return rc;
>>>>    }
>>>>    +static void __ghes_call_panic(void)
>>>> +{
>>>> +    if (panic_timeout == 0)
>>>> +        panic_timeout = ghes_panic_timeout;
>>>> +    panic("Fatal hardware error!");
>>>> +}
>>>> +
>>> __ghes_panic() also has:
>>>>      __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
>>> Which prints this estatus regardless of rate limiting and caching.
> [...]
>
>>>>                ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>>>>        }
>>>> +    if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
>>>> +        __ghes_call_panic();
>>>> +    }
>>> I think this ghes_severity() then panic() should go above the:
>>>>      if (!ghes_estatus_cached(ghes->estatus)) {
>>> and we should call __ghes_print_estatus() here too, to make sure the message
>>> definitely got out!
>
>> Okay, that makes sense. If we move this up, is there a problem with calling
>> __ghes_panic() instead of making the __ghes_print_estatus() and
>> __ghes_call_panic() calls here? It looks like that will just add a call to
>> oops_begin() and ghes_print_queued_estatus() as well, but this is what
>> ghes_notify_nmi() does if the severity is panic.
>
> I don't think the queued stuff is relevant, isn't that just for x86-NMI messages
> that it doesn't print out directly?
>
> A quick grep shows arm64 doesn't have oops_begin(), you may have to add some
> equivalent mechanism. Let's try and avoid that rabbit hole!
>
> Given __ghes_panic() calls __ghes_print_estatus() too, you could try moving that
> into your new __ghes_call_panic().... or whatever results in the least lines
> changed!
Sounds good, I will just use __ghes_print_estatus() and __ghes_call_panic().
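
For illustration only, the ordering agreed above can be sketched in user
space as follows. This is not the kernel code and not the next revision of
the patch: every name below (struct estatus_sketch, print_estatus_sketch(),
call_panic_sketch(), ghes_proc_sketch(), the SEV_* values) is a hypothetical
stand-in, the panic is replaced by a flag so the program keeps running, and
the timeout value is a mock default. The point is only that a fatal record
is printed unconditionally and panics before the estatus cache check, so
rate limiting and caching cannot suppress the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the GHES severity levels (ordering matters). */
enum sev_sketch { SEV_NO, SEV_CORRECTED, SEV_RECOVERABLE, SEV_PANIC };

/* Mock error status block: a severity plus "already seen in the cache". */
struct estatus_sketch {
	enum sev_sketch severity;
	bool cached;
};

static int panic_timeout;                 /* 0: no reboot timeout configured */
static const int ghes_panic_timeout = 30; /* mock default, an assumption here */

static int printed;   /* number of times a record was printed */
static int panicked;  /* set instead of actually halting the machine */

static void print_estatus_sketch(const struct estatus_sketch *e)
{
	printed++;
	printf("estatus: severity=%d\n", (int)e->severity);
}

/* Mirrors the shape of the quoted __ghes_call_panic(), minus the real panic. */
static void call_panic_sketch(void)
{
	if (panic_timeout == 0)
		panic_timeout = ghes_panic_timeout;
	panicked = 1;
	printf("panic: Fatal hardware error! (timeout %d)\n", panic_timeout);
}

static void ghes_proc_sketch(struct estatus_sketch *e)
{
	assert(e != NULL);

	/* Fatal: print unconditionally and panic before looking at the cache. */
	if (e->severity >= SEV_PANIC) {
		print_estatus_sketch(e);
		call_panic_sketch();
		return;
	}

	/* Non-fatal: print only records that are not already cached. */
	if (!e->cached) {
		print_estatus_sketch(e);
		e->cached = true;
	}
}
```

Calling ghes_proc_sketch() on a record with severity SEV_PANIC that is
already marked as cached still prints it once and sets the panic flag, while
a cached SEV_CORRECTED record is silently skipped.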

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.



Thread overview: 97+ messages
2017-02-01 17:16 [PATCH V8 00/10] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64 Tyler Baicar
2017-02-01 17:16 ` [PATCH V8 01/10] acpi: apei: read ack upon ghes record consumption Tyler Baicar
2017-02-01 17:16 ` [PATCH V8 02/10] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1 Tyler Baicar
2017-02-01 17:16 ` [PATCH V8 03/10] efi: parse ARM processor error Tyler Baicar
2017-02-01 17:16 ` [PATCH V8 04/10] arm64: exception: handle Synchronous External Abort Tyler Baicar
2017-02-03 15:59   ` James Morse
2017-02-03 20:24     ` Baicar, Tyler
2017-02-01 17:16 ` [PATCH V8 05/10] acpi: apei: handle SEA notification type for ARMv8 Tyler Baicar
2017-02-01 22:26   ` kbuild test robot
2017-02-03 16:00   ` James Morse
2017-02-03 20:38     ` Baicar, Tyler
2017-02-15  6:24   ` Zhengqiang
2017-02-15 14:58     ` Baicar, Tyler
2017-02-01 17:16 ` [PATCH V8 06/10] acpi: apei: panic OS with fatal error status block Tyler Baicar
2017-02-09 10:48   ` James Morse
2017-02-13 22:45     ` Baicar, Tyler
2017-02-15 12:13       ` James Morse
2017-02-15 17:07         ` Baicar, Tyler
2017-02-01 17:16 ` [PATCH V8 07/10] efi: print unrecognized CPER section Tyler Baicar
2017-02-01 17:16 ` [PATCH V8 08/10] ras: acpi / apei: generate trace event for " Tyler Baicar
2017-02-01 23:20   ` kbuild test robot
2017-02-15 15:52   ` Steven Rostedt
2017-02-15 16:54     ` Baicar, Tyler
2017-02-15 17:03       ` Steven Rostedt
2017-02-15 17:06         ` Baicar, Tyler
2017-02-01 17:16 ` [PATCH V8 09/10] trace, ras: add ARM processor error trace event Tyler Baicar
2017-02-02  2:34   ` kbuild test robot
2017-02-02  3:15   ` Steven Rostedt
2017-02-03 20:18     ` Baicar, Tyler
2017-02-01 17:16 ` [PATCH V8 10/10] arm/arm64: KVM: add guest SEA support Tyler Baicar
