From mboxrd@z Thu Jan 1 00:00:00 1970 From: tbaicar@codeaurora.org (Baicar, Tyler) Date: Fri, 30 Jun 2017 10:47:17 -0600 Subject: [PATCH V17 01/11] acpi: apei: read ack upon ghes record consumption In-Reply-To: <20170630101043.GZ658@rric.localdomain> References: <1495225933-4410-1-git-send-email-tbaicar@codeaurora.org> <1495225933-4410-2-git-send-email-tbaicar@codeaurora.org> <20170630101043.GZ658@rric.localdomain> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On 6/30/2017 4:10 AM, Robert Richter wrote: > Tyler, > > On 19.05.17 14:32:03, Tyler Baicar wrote: >> A RAS (Reliability, Availability, Serviceability) controller >> may be a separate processor running in parallel with OS >> execution, and may generate error records for consumption by >> the OS. If the RAS controller produces multiple error records, >> then they may be overwritten before the OS has consumed them. >> >> The Generic Hardware Error Source (GHES) v2 structure >> introduces the capability for the OS to acknowledge the >> consumption of the error record generated by the RAS >> controller. A RAS controller supporting GHESv2 shall wait for >> the acknowledgment before writing a new error record, thus >> eliminating the race condition. >> >> Add support for parsing of GHESv2 sub-tables as well. >> >> Signed-off-by: Tyler Baicar >> CC: Jonathan (Zhixiong) Zhang >> Reviewed-by: James Morse >> --- >> drivers/acpi/apei/ghes.c | 59 +++++++++++++++++++++++++++++++++++++++++++++--- >> drivers/acpi/apei/hest.c | 7 ++++-- >> include/acpi/ghes.h | 5 +++- >> 3 files changed, 65 insertions(+), 6 deletions(-) >> static int ghes_proc(struct ghes *ghes) >> { >> int rc; >> @@ -661,6 +704,16 @@ static int ghes_proc(struct ghes *ghes) >> ghes_estatus_cache_add(ghes->generic, ghes->estatus); >> } >> ghes_do_proc(ghes, ghes->estatus); >> + >> + /* >> + * GHESv2 type HEST entries introduce support for error acknowledgment, >> + * so only acknowledge the error if this support is present. >> + */ >> + if (is_hest_type_generic_v2(ghes)) { >> + rc = ghes_ack_error(ghes->generic_v2); >> + if (rc) >> + return rc; >> + } >> out: >> ghes_clear_estatus(ghes); >> return rc; > was there any specific reason why the ack is sent before clearing the > block status? Spec says the ack should be sent at last. > > Also, the block is never cleared if ghes_ack_error() returns an error. > IMO we should fall through and clear the block status (this will > change anyway if the bloc status is cleared first). Hello Robert, Thank you for pointing this out. I will send a patch to move the ack after the ghes_clear_estatus. This is probably the right thing to do since right now if the FW populates an invalid estatus, we will fail to read the estatus, jump to 'out:', and never send the ack. Thanks, Tyler -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.