From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by smtp.lore.kernel.org (Postfix) with ESMTPS id DE548C001DE
 for ; Tue, 25 Jul 2023 13:10:49 +0000 (UTC)
Authentication-Results: lists.ozlabs.org;
 dkim=fail reason="signature verification failed" (1024-bit key; unprotected)
 header.d=velankanigroup.com header.i=jdhanasekar@velankanigroup.com
 header.a=rsa-sha256 header.s=zoho header.b=A51y8a3a;
 dkim-atps=neutral
Received: from boromir.ozlabs.org (localhost [IPv6:::1])
 by lists.ozlabs.org (Postfix) with ESMTP id 4R9HTN3r35z3cM2
 for ; Tue, 25 Jul 2023 23:10:48 +1000 (AEST)
Authentication-Results: lists.ozlabs.org;
 dkim=pass (1024-bit key; unprotected)
 header.d=velankanigroup.com header.i=jdhanasekar@velankanigroup.com
 header.a=rsa-sha256 header.s=zoho header.b=A51y8a3a;
 dkim-atps=neutral
Authentication-Results: lists.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=velankanigroup.com
 (client-ip=103.117.158.11; helo=sender-op-o11.zoho.in;
 envelope-from=jdhanasekar@velankanigroup.com; receiver=lists.ozlabs.org)
Received: from sender-op-o11.zoho.in (sender-op-o11.zoho.in [103.117.158.11])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange ECDHE (prime256v1) server-signature RSA-PSS (2048 bits)
 server-digest SHA256)
 (No client certificate requested)
 by lists.ozlabs.org (Postfix) with ESMTPS id 4R9HS81X76z3brX
 for ; Tue, 25 Jul 2023 23:09:43 +1000 (AEST)
ARC-Seal: i=1; a=rsa-sha256; t=1690290565; cv=none; d=zohomail.in; s=zohoarc;
 b=I1UjGc0RfWGy5g8161p881bWQh1CmGrX5jfxdOwbXUlSE3rQQa7XwvmoOH9VHE7hltaGyk0Yx8b1BtLalEAzysAV+/VqC8kKMT2C3b0WK6S9iFnSgyxRPfeQ8A8ysIu8fV2T6C4ug2ixptfbETmqM90OqOzhBtltl3K6+kh8OJ0=
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.in;
 s=zohoarc; t=1690290565;
 h=Content-Type:Cc:Date:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:To;
 bh=l5TPs/v9b9m/4KbldWWyflfHAAvVTJECMJ+slZpT0sU=;
 b=ONdMO0dW811LIRHNmkKZ6rEldxGbyJWiJfFzpa/GtAKvF4/lcSZWRZvduDIPIiF7gZdxa8aBfStLxOwTbITTjpiA95E8vbyLVpaSDVPxgAsDVCVT3oWRTAE7/o9sJDrenIbNFACkm2lovIPg7zFPe+vM31Q40HQdxXtARtEm77U=
ARC-Authentication-Results: i=1; mx.zohomail.in;
 dkim=pass header.i=velankanigroup.com;
 spf=pass smtp.mailfrom=jdhanasekar@velankanigroup.com;
 dmarc=pass header.from=
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1690290565;
 s=zoho; d=velankanigroup.com; i=jdhanasekar@velankanigroup.com;
 h=Date:Date:From:From:To:To:Cc:Cc:Message-Id:Message-Id:In-Reply-To:References:Subject:Subject:MIME-Version:Content-Type:Reply-To;
 bh=l5TPs/v9b9m/4KbldWWyflfHAAvVTJECMJ+slZpT0sU=;
 b=A51y8a3a+qpIuzdAr+crWSgfliBCf1QRWd6L/hphQSSzWoxNLlrFEbi0p6OPssnr
 RHpt2zK9DaS1c5j0rQcX8jSYygIuqTP5cdfR2G5gGX4McJsG4RVJopTOdX6PrtBZIh3
 o8jCDc4z4eneR9prQtcLiKyImzvnXk/I3P7p3x98=
Received: from mail.zoho.in by mx.zoho.in
 with SMTP id 1690290564438297.67873352007655; Tue, 25 Jul 2023 18:39:24 +0530 (IST)
Date: Tue, 25 Jul 2023 18:39:24 +0530
From: J Dhanasekar
To: "Venkatesh, Supreeth"
Message-Id: <1898d2b2d0e.4ac15546926284.5918723584994850422@velankanigroup.com>
In-Reply-To:
References: <07621845-19a4-0568-be0e-f556ba40b813@amd.com>
 <255d7c9a-ce17-bbe1-7312-990d0221cf36@amd.com>
 <65515592-8f77-1c8f-731c-165fb833344b@amd.com>
 <71a122a9-07a9-06a8-ee1a-dd108db63df3@amd.com>
 <18977ff7cd7.59a883fc562150.7689391317426675156@velankanigroup.com>
 <18987ffeff9.35c4bda1801937.8894247920197462243@velankanigroup.com>
Subject: RE: [RFC] BMC RAS Feature
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="----=_Part_2895738_559444056.1690290564369"
Importance: Medium
User-Agent: Zoho Mail
X-Mailer: Zoho Mail
X-Zoho-Virus-Status: 1
X-BeenThere: openbmc@lists.ozlabs.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Development list for
 OpenBMC
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Lei Yu, Michael Shen, openbmc, dhruvaraj S, Brad Bishop, Ed Tanous, "Dhandapani, Abinaya"
Errors-To: openbmc-bounces+openbmc=archiver.kernel.org@lists.ozlabs.org
Sender: "openbmc"

------=_Part_2895738_559444056.1690290564369
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Hi Supreeth,

I am working on SP5 servers too. SP5 servers have an ASPEED 2600 chip with
the BMC off the board, whereas EthanolX/DaytonaX have a 2500 with the BMC
on the board.
Will the algorithms or steps for implementing the functionality (SOL,
PostCode, PSU, ...) remain the same?

Thanks,
Dhanasekar

---- On Mon, 24 Jul 2023 19:44:52 +0530 Venkatesh, Supreeth wrote ---

[AMD Official Use Only - General]

Hi Dhanasekar,

DaytonaX and EthanolX platforms were only OpenBMC PoCs with limited
functionality.
We are in the process of upstreaming new AMD CRBs with OpenBMC, which have
all the functionality you mention below.
A public instance of the staging/intermediary repository, ahead of
upstreaming, is here: https://github.com/AMDESE/OpenBMC

Thanks,
Supreeth Venkatesh
System Manageability Architect | AMD Server Software

From: J Dhanasekar
Sent: Monday, July 24, 2023 8:04 AM
To: Venkatesh, Supreeth
Cc: Lei Yu; Zane Shelley; Michael Shen; openbmc; dhruvaraj S; Brad Bishop; Ed Tanous; Dhandapani, Abinaya
Subject: RE: [RFC] BMC RAS Feature

Caution: This message originated from an External Source. Use proper
caution when opening attachments, clicking links, or responding.

Hi Supreeth,

Thanks for the info. We hoped that DaytonaX would be upstreamed.
Unfortunately, it is not available.
Actually, we need to enable SOL, POST code and PSU features on Daytona.
Will we get support for enabling these features, or is there any reference
implementation available for AMD boards?

Thanks,
Dhanasekar

---- On Fri, 21 Jul 2023 19:33:41 +0530 Venkatesh, Supreeth wrote ---

Hi Dhanasekar,

It is supported for the EPYC Genoa family and beyond at this time.
Daytona uses the EPYC Milan family, which is not supported.

Thanks,
Supreeth Venkatesh
System Manageability Architect | AMD Server Software

From: J Dhanasekar
Sent: Friday, July 21, 2023 5:30 AM
To: Venkatesh, Supreeth
Cc: Zane Shelley; Lei Yu; Michael Shen; openbmc; dhruvaraj S; Brad Bishop; Ed Tanous; Dhandapani, Abinaya
Subject: Re: [RFC] BMC RAS Feature

Hi Supreeth Venkatesh,

Does this RAS feature work for the Daytona platform? I have been working
on OpenBMC development for the DaytonaX platform.
If this RAS feature works for the Daytona platform, I will include it in
my project.

Please provide your suggestions.

Thanks,
Dhanasekar

---- On Mon, 03 Apr 2023 22:06:24 +0530 Supreeth Venkatesh wrote ---

On 3/23/23 13:57, Zane Shelley wrote:
> On 2023-03-22 19:07, Supreeth Venkatesh wrote:
>> On 3/22/23 02:10, Lei Yu wrote:
>>>>> On Tue, 21 Mar 2023 at 20:38, Supreeth Venkatesh wrote:
>>>>>
>>>>>      On 3/21/23 05:40, Patrick Williams wrote:
>>>>>      > On Tue, Mar 21, 2023 at 12:14:45AM -0500, Supreeth Venkatesh wrote:
>>>>>      >
>>>>>      >> #### Alternatives Considered
>>>>>      >>
>>>>>      >> In-band mechanisms using System Management Mode (SMM) exist.
>>>>>      >>
>>>>>      >> However, out of band method to gather RAS data is processor specific.
>>>>>      >>
>>>>>      > How does this compare with existing implementations in
>>>>>      > phosphor-debug-collector.
>>>>>      Thanks for your feedback. See below.
>>>>>      > I believe there was some attempt to extend
>>>>>      > P-D-C previously to handle Intel's crashdump behavior.
>>>>>      Intel's crashdump interface uses com.intel.crashdump.
>>>>>      We have implemented com.amd.crashdump based on that reference.
>>>>>      However, can this be made generic?
>>>>>
>>>>>      PoC below:
>>>>>
>>>>>      busctl tree com.amd.crashdump
>>>>>
>>>>>      └─/com
>>>>>        └─/com/amd
>>>>>          └─/com/amd/crashdump
>>>>>            ├─/com/amd/crashdump/0
>>>>>            ├─/com/amd/crashdump/1
>>>>>            ├─/com/amd/crashdump/2
>>>>>            ├─/com/amd/crashdump/3
>>>>>            ├─/com/amd/crashdump/4
>>>>>            ├─/com/amd/crashdump/5
>>>>>            ├─/com/amd/crashdump/6
>>>>>            ├─/com/amd/crashdump/7
>>>>>            ├─/com/amd/crashdump/8
>>>>>            └─/com/amd/crashdump/9
>>>>>
>>>>>      > The repository
>>>>>      > currently handles IBM's processors, I think, or maybe that is covered by
>>>>>      > openpower-debug-collector.
>>>>>      >
>>>>>      > In any case, I think you should look at the existing D-Bus interfaces
>>>>>      > (and associated Redfish implementation) of these repositories and
>>>>>      > determine if you can use those approaches (or document why not).
>>>>>      I could not find an existing D-Bus interface for RAS in
>>>>>      xyz/openbmc_project/.
>>>>>      It would be helpful if you could point me to it.
>>>>>
>>>>> There is an interface for the dumps generated from the host, which can
>>>>> be used for these kinds of dumps
>>>>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/System.interface.yaml
>>>>>
>>>>> The fault log also provides similar dumps
>>>>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/FaultLog.interface.yaml
>>>>>
>>>> Thanks, Dhruvaraj. The interface looks useful for the purpose. However,
>>>> the current BMCWEB implementation references
>>>> https://github.com/openbmc/bmcweb/blob/master/redfish-core/lib/log_services.hpp
>>>>
>>>> [com.intel.crashdump]
>>>> constexpr char const* crashdumpPath = "/com/intel/crashdump";
>>>>
>>>> constexpr char const* crashdumpInterface = "com.intel.crashdump";
>>>> constexpr char const* crashdumpObject = "com.intel.crashdump";
>>>>
>>>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/System.interface.yaml
>>>>
>>>> or
>>>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/FaultLog.interface.yaml
>>>>
>>>> is it exercised in Redfish logservices?
>>> In our practice, a plugin `tools/dreport.d/plugins.d/acddump` is added
>>> to copy the crashdump json file to the dump tarball.
>>> The crashdump tool (Intel or AMD) could trigger a dump after the
>>> crashdump is completed, and then we could get a dump entry containing
>>> the crashdump.
>> Thanks Lei Yu for your input. We are using Redfish to retrieve the
>> CPER binary file which can then be passed through a plugin/script for
>> detailed analysis.
>> In any case, irrespective of whichever D-Bus interface we use, we need a
>> repository which will gather data from the AMD processor via APML as per
>> AMD design.
>> APML Spec: https://www.amd.com/system/files/TechDocs/57019-A0-PUB_3.00.zip
>> Can someone please help create a bmc-ras or amd-debug-collector
>> repository, as there are instances of the openpower-debug-collector
>> repository used for Open Power systems?
>>>
>>> --
>>> BRs,
>>> Lei YU
> I am interested in possibly standardizing some of this. IBM POWER has
> several related components. openpower-hw-diags is a service that will
> listen for the hardware interrupts via a GPIO pin. When an error is
> detected, it will use openpower-libhei to query hardware registers to
> determine what happened. Based on that information openpower-hw-diags
> will generate a PEL, which is an extended log in phosphor-logging, that
> is used to tell service what to replace if necessary. Afterward,
> openpower-hw-diags will initiate openpower-debug-collector, which
> gathers a significant amount of data from the hardware for additional
> debug when necessary. I wrote openpower-libhei to be fairly agnostic. It
> uses data files (currently XML, but moving to JSON) to define register
> addresses and rules for isolation. openpower-hw-diags is fairly POWER
> specific, but I can see some parts can be made generic. Dhruv would have
> to help with openpower-debug-collector.
Thank you. Let's collaborate in standardizing some aspects of it.
>
> Regarding creation of a new repository, I think we'll need to have some
> more collaboration to determine the scope before creating it. It
> certainly sounds like we are doing similar things, but we need to
> determine if enough can be abstracted to make it worth our time.
I have put in a request here:
https://github.com/openbmc/technical-oversight-forum/issues/24
Please chime in.

------=_Part_2895738_559444056.1690290564369
Content-Type: multipart/related;
 boundary="----=_Part_2895739_1302712611.1690290564381"

------=_Part_2895739_1302712611.1690290564381
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


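The PoC tree quoted above exposes ten entries, /com/amd/crashdump/0 through /com/amd/crashdump/9, under one root. As a rough sketch of how a script might pull those entry paths out of captured `busctl tree` text, assuming the output has the shape shown in the thread: the helper names `parse_tree_paths` and `crashdump_entries` are hypothetical, and the sample is a shortened copy of the quoted output, not a live query.

```python
import re

# Shortened copy of the `busctl tree com.amd.crashdump` output quoted in the
# thread (assumption: real output keeps this box-drawing layout).
SAMPLE_TREE = """\
└─/com
  └─/com/amd
    └─/com/amd/crashdump
      ├─/com/amd/crashdump/0
      ├─/com/amd/crashdump/1
      └─/com/amd/crashdump/9
"""


def parse_tree_paths(tree_text):
    """Return every D-Bus object path found in busctl tree-style text."""
    paths = []
    for line in tree_text.splitlines():
        # Skip the box-drawing prefix; the object path starts at the first '/'.
        idx = line.find("/")
        if idx != -1:
            paths.append(line[idx:].strip())
    return paths


def crashdump_entries(paths, root="/com/amd/crashdump"):
    """Map numeric entry index -> object path for direct children of `root`."""
    pattern = re.compile(re.escape(root) + r"/(\d+)$")
    entries = {}
    for path in paths:
        match = pattern.match(path)
        if match:
            entries[int(match.group(1))] = path
    return entries


if __name__ == "__main__":
    found = crashdump_entries(parse_tree_paths(SAMPLE_TREE))
    for index, path in sorted(found.items()):
        print(index, path)
```

Passing `root="/com/intel/crashdump"` applies the same lookup to the Intel service path named in the thread, which is one small way to see how little of this is vendor specific.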
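Lei Yu's approach amounts to placing the crashdump JSON file inside the dump archive. The real mechanism is a dreport plugin, which is a shell script; the following Python is only an illustration of the idea under stated assumptions. The file name `crashdump_0.json`, the archive name, and the function `add_crashdump_to_tarball` are all made up for the sketch and are not taken from phosphor-debug-collector.

```python
import json
import tarfile
import tempfile
from pathlib import Path


def add_crashdump_to_tarball(crashdump_json, tarball, arcname=None):
    """Create a gzip tarball at `tarball` containing the crashdump JSON file.

    The file is parsed first so that a truncated or malformed crashdump
    raises ValueError instead of being archived silently.
    """
    crashdump_json = Path(crashdump_json)
    with crashdump_json.open() as handle:
        json.load(handle)  # json.JSONDecodeError is a ValueError subclass
    with tarfile.open(tarball, "w:gz") as archive:
        archive.add(str(crashdump_json), arcname=arcname or crashdump_json.name)
    return Path(tarball)


def list_tarball(tarball):
    """Return the member names stored in the archive."""
    with tarfile.open(tarball, "r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "crashdump_0.json"  # hypothetical file name
        source.write_text(json.dumps({"example": True}))
        out = add_crashdump_to_tarball(source, Path(workdir) / "dump.tar.gz")
        print(list_tarball(out))
```

Validating before archiving is a design choice of this sketch, not something the thread states; it simply avoids producing a dump entry whose payload cannot be parsed later.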
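Zane describes openpower-libhei as data-driven: data files define register addresses and isolation rules, and the library queries registers to work out what happened. A toy sketch of that general idea follows. The JSON layout, the register addresses, the signature names, and the functions `load_rules` and `isolate` are all invented for illustration; none of this is libhei's actual file format or API.

```python
import json

# Hypothetical rule data: each rule names a register address, a bit position,
# and the fault signature reported when that bit is set.
RULES_JSON = """
[
  {"register": "0x1000", "bit": 0, "signature": "memory_uncorrectable"},
  {"register": "0x1000", "bit": 3, "signature": "cache_parity"},
  {"register": "0x2000", "bit": 1, "signature": "link_crc"}
]
"""


def load_rules(text):
    """Parse rule data, converting hex address strings to integers."""
    rules = []
    for raw in json.loads(text):
        rules.append({
            "register": int(raw["register"], 16),
            "bit": raw["bit"],
            "signature": raw["signature"],
        })
    return rules


def isolate(read_register, rules):
    """Return signatures whose bit is set, reading each register only once."""
    cache = {}
    active = []
    for rule in rules:
        addr = rule["register"]
        if addr not in cache:
            cache[addr] = read_register(addr)
        if cache[addr] & (1 << rule["bit"]):
            active.append(rule["signature"])
    return active


if __name__ == "__main__":
    # Stand-in for hardware access: a dict of register values.
    fake_hw = {0x1000: 0b1000, 0x2000: 0b0000}
    print(isolate(lambda addr: fake_hw.get(addr, 0), load_rules(RULES_JSON)))
```

Because the register reader is passed in as a callable, the same rule evaluation could in principle sit on top of different access paths, which is the kind of separation the standardization discussion in the thread is about.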
------=_Part_2895739_1302712611.1690290564381--

------=_Part_2895738_559444056.1690290564369--