From: J Dhanasekar <jdhanasekar@velankanigroup.com>
To: "Venkatesh, Supreeth" <Supreeth.Venkatesh@amd.com>
Cc: Lei Yu <yulei.sh@bytedance.com>,
	Zane Shelley <zshelle@imap.linux.ibm.com>,
	Michael Shen <gpgpgp@google.com>,
	openbmc <openbmc@lists.ozlabs.org>,
	dhruvaraj S <dhruvaraj@gmail.com>,
	Brad Bishop <bradleyb@fuzziesquirrel.com>,
	Ed Tanous <ed@tanous.net>,
	"Dhandapani,  Abinaya" <Abinaya.Dhandapani@amd.com>
Subject: RE: [RFC] BMC RAS Feature
Date: Mon, 24 Jul 2023 18:34:04 +0530	[thread overview]
Message-ID: <18987ffeff9.35c4bda1801937.8894247920197462243@velankanigroup.com> (raw)
In-Reply-To: <SN6PR12MB4752B1CEE5232F40EED441C8963FA@SN6PR12MB4752.namprd12.prod.outlook.com>


Hi Supreeth,



Thanks for the info. We had hoped that Daytonax would be upstreamed; unfortunately, it is not available.

We need to enable the SOL, POST code, and PSU features on Daytona. Will we get support for enabling these features, or is there a reference implementation available for AMD boards?



Thanks,

Dhanasekar







---- On Fri, 21 Jul 2023 19:33:41 +0530 Venkatesh, Supreeth <Supreeth.Venkatesh@amd.com> wrote ---

[AMD Official Use Only - General]

Hi Dhanasekar,

It is supported for the EPYC Genoa family and beyond at this time.

Daytona uses the EPYC Milan family, which is not supported.

Thanks,

Supreeth Venkatesh
System Manageability Architect | AMD Server Software

From: J Dhanasekar <jdhanasekar@velankanigroup.com>
 Sent: Friday, July 21, 2023 5:30 AM
 To: Venkatesh, Supreeth <Supreeth.Venkatesh@amd.com>
 Cc: Zane Shelley <zshelle@imap.linux.ibm.com>; Lei Yu <yulei.sh@bytedance.com>; Michael Shen <gpgpgp@google.com>; openbmc <openbmc@lists.ozlabs.org>; dhruvaraj S <dhruvaraj@gmail.com>; Brad Bishop <bradleyb@fuzziesquirrel.com>; Ed Tanous <ed@tanous.net>; Dhandapani, Abinaya <Abinaya.Dhandapani@amd.com>
 Subject: Re: [RFC] BMC RAS Feature


 



Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.

Hi Supreeth Venkatesh,

Does this RAS feature work for the Daytona platform? I have been working on OpenBMC development for the Daytonax platform.

If this RAS feature works for the Daytona platform, I will include it in my project.

Please provide your suggestions.

Thanks,

Dhanasekar

---- On Mon, 03 Apr 2023 22:06:24 +0530 Supreeth Venkatesh <supreeth.venkatesh@amd.com> wrote ---

On 3/23/23 13:57, Zane Shelley wrote:
 > Caution: This message originated from an External Source. Use proper 
 > caution when opening attachments, clicking links, or responding. 
 > 
 > 
 > On 2023-03-22 19:07, Supreeth Venkatesh wrote: 
 >> On 3/22/23 02:10, Lei Yu wrote: 
 >>> Caution: This message originated from an External Source. Use proper 
 >>> caution when opening attachments, clicking links, or responding. 
 >>> 
 >>> 
 >>>>> On Tue, 21 Mar 2023 at 20:38, Supreeth Venkatesh 
 >>>>> <supreeth.venkatesh@amd.com> wrote: 
 >>>>> 
 >>>>> 
 >>>>>      On 3/21/23 05:40, Patrick Williams wrote: 
 >>>>>      > On Tue, Mar 21, 2023 at 12:14:45AM -0500, Supreeth Venkatesh 
 >>>>> wrote: 
 >>>>>      > 
 >>>>>      >> #### Alternatives Considered 
 >>>>>      >> 
 >>>>>      >> In-band mechanisms using System Management Mode (SMM) 
 >>>>> exists. 
 >>>>>      >> 
 >>>>>      >> However, out of band method to gather RAS data is processor 
 >>>>>      specific. 
 >>>>>      >> 
 >>>>>      > How does this compare with existing implementations in 
 >>>>>      > phosphor-debug-collector. 
 >>>>>      Thanks for your feedback. See below. 
 >>>>>      > I believe there was some attempt to extend 
 >>>>>      > P-D-C previously to handle Intel's crashdump behavior. 
 >>>>>      Intel's crashdump interface uses com.intel.crashdump. 
 >>>>>      We have implemented com.amd.crashdump based on that reference. 
 >>>>>      However, 
 >>>>>      can this be made generic? 
 >>>>> 
 >>>>>      PoC below: 
 >>>>> 
 >>>>>      busctl tree com.amd.crashdump 
 >>>>> 
 >>>>>      └─/com 
 >>>>>         └─/com/amd 
 >>>>>           └─/com/amd/crashdump 
 >>>>>             ├─/com/amd/crashdump/0 
 >>>>>             ├─/com/amd/crashdump/1 
 >>>>>             ├─/com/amd/crashdump/2 
 >>>>>             ├─/com/amd/crashdump/3 
 >>>>>             ├─/com/amd/crashdump/4 
 >>>>>             ├─/com/amd/crashdump/5 
 >>>>>             ├─/com/amd/crashdump/6 
 >>>>>             ├─/com/amd/crashdump/7 
 >>>>>             ├─/com/amd/crashdump/8 
 >>>>>             └─/com/amd/crashdump/9 
 >>>>> 
 >>>>>      > The repository 
 >>>>>      > currently handles IBM's processors, I think, or maybe that is 
 >>>>>      covered by 
 >>>>>      > openpower-debug-collector. 
 >>>>>      > 
 >>>>>      > In any case, I think you should look at the existing D-Bus 
 >>>>>      interfaces 
 >>>>>      > (and associated Redfish implementation) of these repositories 
 >>>>> and 
 >>>>>      > determine if you can use those approaches (or document why 
 >>>>> now). 
 >>>>>      I could not find an existing D-Bus interface for RAS in 
 >>>>>      xyz/openbmc_project/. 
 >>>>>      It would be helpful if you could point me to it. 
 >>>>> 
 >>>>> 
 >>>>> There is an interface for the dumps generated from the host, which 
 >>>>> can 
 >>>>> be used for these kinds of dumps 
 >>>>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/System.interface.yaml 
 >>>>> 
 >>>>> 
 >>>>> The fault log also provides similar dumps 
 >>>>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/FaultLog.interface.yaml 
 >>>>> 
 >>>>> 
 >>>> Thanks, Dhruvaraj. The interface looks useful for the purpose. However, 
 >>>> the current BMCWEB implementation references 
 >>>> https://github.com/openbmc/bmcweb/blob/master/redfish-core/lib/log_services.hpp 
 >>>> 
 >>>> [com.intel.crashdump] 
 >>>> constexpr char const* crashdumpPath = "/com/intel/crashdump"; 
 >>>> 
 >>>> constexpr char const* crashdumpInterface = "com.intel.crashdump"; 
 >>>> constexpr char const* crashdumpObject = "com.intel.crashdump"; 
 >>>> 
 >>>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/System.interface.yaml 
 >>>> 
 >>>> or 
 >>>> https://github.com/openbmc/phosphor-dbus-interfaces/blob/master/yaml/xyz/openbmc_project/Dump/Entry/FaultLog.interface.yaml 
 >>>> 
 >>>> Is it exercised in Redfish LogServices? 
 >>> In our practice, a plugin `tools/dreport.d/plugins.d/acddump` is added 
 >>> to copy the crashdump json file to the dump tarball. 
 >>> The crashdump tool (Intel or AMD) could trigger a dump after the 
 >>> crashdump is completed, and then we could get a dump entry containing 
 >>> the crashdump. 
 >> Thanks Lei Yu for your input. We are using Redfish to retrieve the 
 >> CPER binary file which can then be passed through a plugin/script for 
 >> detailed analysis. 
 >> In any case irrespective of whichever Dbus interface we use, we need a 
 >> repository which will gather data from AMD processor via APML as per 
 >> AMD design. 
 >> APML 
 >> Spec: https://www.amd.com/system/files/TechDocs/57019-A0-PUB_3.00.zip 
 >> Can someone please help create bmc-ras or amd-debug-collector 
 >> repository as there are instances of openpower-debug-collector 
 >> repository used for Open Power systems? 
 >>> 
 >>> 
 >>> -- 
 >>> BRs, 
 >>> Lei YU 
 > I am interested in possibly standardizing some of this. IBM POWER has 
 > several related components. openpower-hw-diags is a service that will 
 > listen for the hardware interrupts via a GPIO pin. When an error is 
 > detected, it will use openpower-libhei to query hardware registers to 
 > determine what happened. Based on that information openpower-hw-diags 
 > will generate a PEL, which is an extended log in phosphor-logging, that 
 > is used to tell service what to replace if necessary. Afterward, 
 > openpower-hw-diags will initiate openpower-debug-collector, which 
 > gathers a significant amount of data from the hardware for additional 
 > debug when necessary. I wrote openpower-libhei to be fairly agnostic. It 
 > uses data files (currently XML, but moving to JSON) to define register 
 > addresses and rules for isolation. openpower-hw-diags is fairly POWER 
 > specific, but I can see some parts can be made generic. Dhruv would have 
 > to help with openpower-debug-collector. 
 Thank you. Let's collaborate on standardizing some aspects of it. 
 > 
 > Regarding creation of a new repository, I think we'll need to have some 
 > more collaboration to determine the scope before creating it. It 
 > certainly sounds like we are doing similar things, but we need to 
 > determine if enough can be abstracted to make it worth our time. 
 I have put in a request here: 
 https://github.com/openbmc/technical-oversight-forum/issues/24 
 Please chime in.
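
 For illustration, here is a minimal sketch of what vendor-neutral constants could look like if bmcweb queried the generic dump manager rather than com.intel.crashdump. The service, path, and interface names below follow common phosphor-debug-collector conventions but are assumptions on my part, not what bmcweb implements today:

     // Sketch only: hypothetical vendor-neutral constants for a generic
     // crashdump/fault-log path, instead of the hardcoded com.intel.crashdump.
     constexpr const char* dumpService = "xyz.openbmc_project.Dump.Manager";
     constexpr const char* faultLogDumpPath = "/xyz/openbmc_project/dump/faultlog";
     constexpr const char* faultLogEntryIface =
         "xyz.openbmc_project.Dump.Entry.FaultLog";

     // Entries could then be enumerated generically, e.g.:
     //   busctl tree xyz.openbmc_project.Dump.Manager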



 



 


Thread overview: 20+ messages
2023-03-21  5:14 [RFC] BMC RAS Feature Supreeth Venkatesh
2023-03-21 10:40 ` Patrick Williams
2023-03-21 15:07   ` Supreeth Venkatesh
2023-03-21 16:26     ` dhruvaraj S
2023-03-21 17:25       ` Supreeth Venkatesh
2023-03-22  7:10         ` Lei Yu
2023-03-23  0:07           ` Supreeth Venkatesh
2023-04-03 11:44             ` Patrick Williams
2023-04-03 16:32               ` Supreeth Venkatesh
     [not found]             ` <d65937a46b6fb4f9f94edbdef44af58e@imap.linux.ibm.com>
2023-04-03 16:36               ` Supreeth Venkatesh
2023-07-21 10:29                 ` J Dhanasekar
2023-07-21 14:03                   ` Venkatesh, Supreeth
2023-07-24 13:04                     ` J Dhanasekar [this message]
2023-07-24 14:14                       ` Venkatesh, Supreeth
2023-07-25 13:09                         ` J Dhanasekar
2023-07-25 14:02                           ` Venkatesh, Supreeth
2023-07-27 10:20                             ` J Dhanasekar
2023-07-14 22:05 ` Bills, Jason M
2023-07-15  9:01   ` dhruvaraj S
2023-07-24 14:29   ` Venkatesh, Supreeth
