From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Baicar, Tyler" Subject: Re: [PATCH V15 04/11] efi: parse ARM processor error Date: Tue, 25 Apr 2017 10:05:31 -0600 Message-ID: References: <1492556723-9189-1-git-send-email-tbaicar@codeaurora.org> <1492556723-9189-5-git-send-email-tbaicar@codeaurora.org> <20170421175527.fjwnqd22jz7br5yu@pd.tnic> <20170424175240.3nvhbxzwicxnk6og@pd.tnic> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20170424175240.3nvhbxzwicxnk6og@pd.tnic> Content-Language: en-US List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kvmarm-bounces@lists.cs.columbia.edu Sender: kvmarm-bounces@lists.cs.columbia.edu To: Borislav Petkov Cc: linux-efi@vger.kernel.org, kvm@vger.kernel.org, matt@codeblueprint.co.uk, catalin.marinas@arm.com, will.deacon@arm.com, robert.moore@intel.com, paul.gortmaker@windriver.com, lv.zheng@intel.com, kvmarm@lists.cs.columbia.edu, fu.wei@linaro.org, rafael@kernel.org, zjzhang@codeaurora.org, linux@armlinux.org.uk, gengdongjiu@huawei.com, linux-acpi@vger.kernel.org, eun.taik.lee@samsung.com, shijie.huang@arm.com, labbott@redhat.com, lenb@kernel.org, harba@codeaurora.org, john.garry@huawei.com, marc.zyngier@arm.com, punit.agrawal@arm.com, rostedt@goodmis.org, nkaje@codeaurora.org, sandeepa.s.prabhu@gmail.com, linux-arm-kernel@lists.infradead.org, tony.luck@intel.com, rjw@rjwysocki.net, rruigrok@codeaurora.org, linux-kernel@vger.kernel.org, astone@redhat.com, hanjun.guo@linaro.org, joe@perches.com, pbonzini@redhat.com, akpm@linux-foundation.org, bristot@redhat.com, shiju.jose@huawe List-Id: linux-acpi@vger.kernel.org On 4/24/2017 11:52 AM, Borislav Petkov wrote: > On Fri, Apr 21, 2017 at 12:22:09PM -0600, Baicar, Tyler wrote: >> I guess it's not really needed. It just may be useful considering there can >> be numerous error info structures, numerous context info structures, and a >> variable length vendor information section. I can move this print to only in >> the length check failure cases. > And? Why does the user care? > > I mean, it is good for debugging when you wanna see you're parsing the > error info data properly but otherwise it doesn't improve the error > reporting one bit. I'll move this to just happen when the length check fails. >> Because these are part of the error information structure. I wouldn't think >> FW would populate error information structures that are different versions >> in the same processor error, but it could be possible from the spec (at >> least once there are different versions of the table). > Same argument as above. I can remove it then. > >> There is an error information 64 bit value in the ARM processor error >> information structure. (UEFI spec 2.6 table 261) > So that's IP-dependent and explained in the following tables. Any plans > on decoding that too? Yes, I do plan on adding further decoding for these values in the future. > >> Why's that? Dumping this vendor specific error information is similar to the >> unrecognized CPER section reporting which is also meant for vendor specific >> information https://lkml.org/lkml/2017/4/18/751 > And how do those naked bytes help the user understand the error happening? > > Even in your example you have: > > [ 140.739210] {1}[Hardware Error]: 00000000: 4d415201 4d492031 453a4d45 435f4343 .RAM1 IMEM:ECC_C > [ 140.739214] {1}[Hardware Error]: 00000010: 53515f45 44525f42 00000000 00000000 E_QSB_RD........ > > Which looks like some correctable ECC DRAM error and is actually begging > to be decoded in a human-readable form. So let's do that completely and > not dump partially decoded information. That seems like something that should be done outside of these patches (if added to the kernel at all). The decoding for this information would all be vendor specific, so I'm not sure if we want to pollute the EFI code with vendor specific error decoding. Currently we are using the RAS Daemon user space tool for the decoding of this information since vendors can easily pick up this tool and add an extension for their vendor specific parsing. These prints will only happen when the firmware supports the vendor specific error information anyway. Thanks, Tyler -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1951228AbdDYQFu (ORCPT ); Tue, 25 Apr 2017 12:05:50 -0400 Received: from smtp.codeaurora.org ([198.145.29.96]:33490 "EHLO smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1951197AbdDYQFj (ORCPT ); Tue, 25 Apr 2017 12:05:39 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 38ADD6143F Authentication-Results: pdx-caf-mail.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: pdx-caf-mail.web.codeaurora.org; spf=none smtp.mailfrom=tbaicar@codeaurora.org Subject: Re: [PATCH V15 04/11] efi: parse ARM processor error To: Borislav Petkov Cc: christoffer.dall@linaro.org, marc.zyngier@arm.com, pbonzini@redhat.com, rkrcmar@redhat.com, linux@armlinux.org.uk, catalin.marinas@arm.com, will.deacon@arm.com, rjw@rjwysocki.net, lenb@kernel.org, matt@codeblueprint.co.uk, robert.moore@intel.com, lv.zheng@intel.com, nkaje@codeaurora.org, zjzhang@codeaurora.org, mark.rutland@arm.com, james.morse@arm.com, akpm@linux-foundation.org, eun.taik.lee@samsung.com, sandeepa.s.prabhu@gmail.com, labbott@redhat.com, shijie.huang@arm.com, rruigrok@codeaurora.org, paul.gortmaker@windriver.com, tn@semihalf.com, fu.wei@linaro.org, rostedt@goodmis.org, bristot@redhat.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-efi@vger.kernel.org, Suzuki.Poulose@arm.com, punit.agrawal@arm.com, astone@redhat.com, harba@codeaurora.org, hanjun.guo@linaro.org, john.garry@huawei.com, shiju.jose@huawei.com, joe@perches.com, rafael@kernel.org, tony.luck@intel.com, gengdongjiu@huawei.com, xiexiuqi@huawei.com References: <1492556723-9189-1-git-send-email-tbaicar@codeaurora.org> <1492556723-9189-5-git-send-email-tbaicar@codeaurora.org> <20170421175527.fjwnqd22jz7br5yu@pd.tnic> <20170424175240.3nvhbxzwicxnk6og@pd.tnic> From: "Baicar, Tyler" Message-ID: Date: Tue, 25 Apr 2017 10:05:31 -0600 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.0.1 MIME-Version: 1.0 In-Reply-To: <20170424175240.3nvhbxzwicxnk6og@pd.tnic> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/24/2017 11:52 AM, Borislav Petkov wrote: > On Fri, Apr 21, 2017 at 12:22:09PM -0600, Baicar, Tyler wrote: >> I guess it's not really needed. It just may be useful considering there can >> be numerous error info structures, numerous context info structures, and a >> variable length vendor information section. I can move this print to only in >> the length check failure cases. > And? Why does the user care? > > I mean, it is good for debugging when you wanna see you're parsing the > error info data properly but otherwise it doesn't improve the error > reporting one bit. I'll move this to just happen when the length check fails. >> Because these are part of the error information structure. I wouldn't think >> FW would populate error information structures that are different versions >> in the same processor error, but it could be possible from the spec (at >> least once there are different versions of the table). > Same argument as above. I can remove it then. > >> There is an error information 64 bit value in the ARM processor error >> information structure. (UEFI spec 2.6 table 261) > So that's IP-dependent and explained in the following tables. Any plans > on decoding that too? Yes, I do plan on adding further decoding for these values in the future. > >> Why's that? Dumping this vendor specific error information is similar to the >> unrecognized CPER section reporting which is also meant for vendor specific >> information https://lkml.org/lkml/2017/4/18/751 > And how do those naked bytes help the user understand the error happening? > > Even in your example you have: > > [ 140.739210] {1}[Hardware Error]: 00000000: 4d415201 4d492031 453a4d45 435f4343 .RAM1 IMEM:ECC_C > [ 140.739214] {1}[Hardware Error]: 00000010: 53515f45 44525f42 00000000 00000000 E_QSB_RD........ > > Which looks like some correctable ECC DRAM error and is actually begging > to be decoded in a human-readable form. So let's do that completely and > not dump partially decoded information. That seems like something that should be done outside of these patches (if added to the kernel at all). The decoding for this information would all be vendor specific, so I'm not sure if we want to pollute the EFI code with vendor specific error decoding. Currently we are using the RAS Daemon user space tool for the decoding of this information since vendors can easily pick up this tool and add an extension for their vendor specific parsing. These prints will only happen when the firmware supports the vendor specific error information anyway. Thanks, Tyler -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. From mboxrd@z Thu Jan 1 00:00:00 1970 From: tbaicar@codeaurora.org (Baicar, Tyler) Date: Tue, 25 Apr 2017 10:05:31 -0600 Subject: [PATCH V15 04/11] efi: parse ARM processor error In-Reply-To: <20170424175240.3nvhbxzwicxnk6og@pd.tnic> References: <1492556723-9189-1-git-send-email-tbaicar@codeaurora.org> <1492556723-9189-5-git-send-email-tbaicar@codeaurora.org> <20170421175527.fjwnqd22jz7br5yu@pd.tnic> <20170424175240.3nvhbxzwicxnk6og@pd.tnic> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On 4/24/2017 11:52 AM, Borislav Petkov wrote: > On Fri, Apr 21, 2017 at 12:22:09PM -0600, Baicar, Tyler wrote: >> I guess it's not really needed. It just may be useful considering there can >> be numerous error info structures, numerous context info structures, and a >> variable length vendor information section. I can move this print to only in >> the length check failure cases. > And? Why does the user care? > > I mean, it is good for debugging when you wanna see you're parsing the > error info data properly but otherwise it doesn't improve the error > reporting one bit. I'll move this to just happen when the length check fails. >> Because these are part of the error information structure. I wouldn't think >> FW would populate error information structures that are different versions >> in the same processor error, but it could be possible from the spec (at >> least once there are different versions of the table). > Same argument as above. I can remove it then. > >> There is an error information 64 bit value in the ARM processor error >> information structure. (UEFI spec 2.6 table 261) > So that's IP-dependent and explained in the following tables. Any plans > on decoding that too? Yes, I do plan on adding further decoding for these values in the future. > >> Why's that? Dumping this vendor specific error information is similar to the >> unrecognized CPER section reporting which is also meant for vendor specific >> information https://lkml.org/lkml/2017/4/18/751 > And how do those naked bytes help the user understand the error happening? > > Even in your example you have: > > [ 140.739210] {1}[Hardware Error]: 00000000: 4d415201 4d492031 453a4d45 435f4343 .RAM1 IMEM:ECC_C > [ 140.739214] {1}[Hardware Error]: 00000010: 53515f45 44525f42 00000000 00000000 E_QSB_RD........ > > Which looks like some correctable ECC DRAM error and is actually begging > to be decoded in a human-readable form. So let's do that completely and > not dump partially decoded information. That seems like something that should be done outside of these patches (if added to the kernel at all). The decoding for this information would all be vendor specific, so I'm not sure if we want to pollute the EFI code with vendor specific error decoding. Currently we are using the RAS Daemon user space tool for the decoding of this information since vendors can easily pick up this tool and add an extension for their vendor specific parsing. These prints will only happen when the firmware supports the vendor specific error information anyway. Thanks, Tyler -- Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.