From: "Venkatesh, Supreeth" <Supreeth.Venkatesh@amd.com>
To: "Bills, Jason M" <jason.m.bills@linux.intel.com>,
	"openbmc@lists.ozlabs.org" <openbmc@lists.ozlabs.org>
Subject: RE: [RFC] BMC RAS Feature
Date: Mon, 24 Jul 2023 14:29:40 +0000
Message-ID: <SN6PR12MB475228EBE67B4A125D9BD2EA9602A@SN6PR12MB4752.namprd12.prod.outlook.com>
In-Reply-To: <7d5f86f9-f39a-829f-fd64-62d244c04ef4@linux.intel.com>

Thanks for your feedback, Jason. Sorry for the delay in my response.

1. The format can be anything. [We could use phosphor-debug-collector, which collects different kinds of debug dumps.]
2. I agree with this path:
        i. Redfish -> provided by bmcweb, which pulls from
        ii. D-Bus -> provided by a new service, which looks for data stored by
        iii. processor-specific collector -> provided by separate services as needed and triggered by
        iv. platform-specific monitoring service -> provided by host-error-monitor or another service as needed.
We need a repository for the processor-specific collector.

Thanks,
Supreeth Venkatesh
System Manageability Architect  |  AMD
Server Software


-----Original Message-----
From: openbmc <openbmc-bounces+supreeth.venkatesh=amd.com@lists.ozlabs.org> On Behalf Of Bills, Jason M
Sent: Friday, July 14, 2023 5:05 PM
To: openbmc@lists.ozlabs.org
Subject: Re: [RFC] BMC RAS Feature


Sorry for missing this earlier.  Here are some of my thoughts.

On 3/20/2023 11:14 PM, Supreeth Venkatesh wrote:
>
> #### Requirements
>
> 1. Collecting RAS/Crashdump shall be processor specific. Hence the use
>     of virtual APIs to allow override for processor specific way of
>     collecting the data.
> 2. Crash data format shall be stored in common platform error record
>     (CPER) format as per UEFI specification
>     [https://uefi.org/specs/UEFI/2.10/].

Do we have to define a single output format? Could the service instead be flexible about the format of the collected crash data?

> 3. Configuration parameters of the service shall be standard with scope
>     for processor specific extensions.
>
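To make requirements 1 and 3 concrete, here is a minimal sketch of what a "virtual API" with processor-specific overrides could look like. It is written in Python only for brevity (an OpenBMC service would more likely be C++), and every name in it (CollectorConfig, CrashCollector, ExampleCollector, max_retries, force_fatal) is hypothetical and not taken from the RFC. Only the /var/lib/bmc-ras path comes from the proposal itself.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CollectorConfig:
    # Standard parameters shared by every processor family (hypothetical names).
    max_retries: int = 3
    output_dir: str = "/var/lib/bmc-ras"  # location named in the RFC
    # Processor-specific extensions are kept apart from the standard keys.
    extensions: Dict[str, str] = field(default_factory=dict)


class CrashCollector(ABC):
    """Common interface; each processor family overrides the abstract methods."""

    def __init__(self, config: CollectorConfig) -> None:
        self.config = config

    @abstractmethod
    def is_fatal(self) -> bool:
        """Return True if the processor status indicates a fatal error."""

    @abstractmethod
    def harvest(self) -> bytes:
        """Return raw crash data gathered in a processor-specific way."""

    def collect(self) -> bytes:
        # Shared flow: only harvest when the status check confirms a fatal error.
        if not self.is_fatal():
            return b""
        return self.harvest()


class ExampleCollector(CrashCollector):
    """Stand-in for a vendor collector; it does not talk to real hardware."""

    def is_fatal(self) -> bool:
        # Driven by a made-up extension key so the sketch can run anywhere.
        return self.config.extensions.get("force_fatal") == "1"

    def harvest(self) -> bytes:
        return b"\x00\x01example-crash-data"
```

The shared collect() method holds the common flow, so a new processor family only has to supply is_fatal() and harvest().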
> #### Proposed Design
>
> When one or more processors register a fatal error condition, then an
> interrupt is generated to the host processor.
>
> The host processor in the failed state asserts the signal to indicate
> to the BMC that a fatal hang has occurred. [APML_ALERT# in case of AMD
> processor family]
>
> BMC RAS application listens on the event [APML_ALERT# in case of AMD
> processor family].

The host-error-monitor application already supports listening for events and taking actions such as logging or triggering a crashdump, which may meet this requirement.


One thought may be to break this up into various layers to allow for flexibility and standardization. For example:
1. Redfish -> provided by bmcweb which pulls from
2. D-Bus -> provided by a new service which looks for data stored by
3. processor-specific collector -> provided by separate services as needed and triggered by
4. platform-specific monitoring service -> provided by host-error-monitor or other service as needed.

Ideally, we could make layer 2 a generic service.
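
As a rough illustration of the layering above, the sketch below wires four stand-ins together, with layer 2 kept generic: it stores opaque payloads tagged with a format string and knows nothing about the processor. It is plain Python with no D-Bus or Redfish calls, and all names (RecordStore, make_collector, Monitor, redfish_view, and the dictionary keys) are hypothetical and are not the bmcweb or host-error-monitor APIs.

```python
from typing import Callable, Dict, List


class RecordStore:
    """Layer 2 stand-in: generic service that exposes stored crash records."""

    def __init__(self) -> None:
        self._records: List[Dict[str, object]] = []

    def add(self, processor: str, fmt: str, payload: bytes) -> int:
        # The payload is opaque here; only the format tag is recorded.
        self._records.append(
            {"processor": processor, "format": fmt, "payload": payload}
        )
        return len(self._records) - 1

    def list_records(self) -> List[Dict[str, object]]:
        return list(self._records)


def make_collector(store: RecordStore, processor: str, fmt: str) -> Callable[[], int]:
    """Layer 3 stand-in: processor-specific collector that stores its data."""

    def collect() -> int:
        payload = f"crash data from {processor}".encode()
        return store.add(processor, fmt, payload)

    return collect


class Monitor:
    """Layer 4 stand-in: platform-specific monitor that triggers a collector."""

    def __init__(self, trigger: Callable[[], int]) -> None:
        self._trigger = trigger

    def on_fatal_event(self) -> int:
        return self._trigger()


def redfish_view(store: RecordStore) -> List[Dict[str, str]]:
    """Layer 1 stand-in: what a front end might render from layer 2."""
    return [
        {"Id": str(i), "Processor": str(r["processor"]), "Format": str(r["format"])}
        for i, r in enumerate(store.list_records())
    ]
```

Because the store only sees a format tag and bytes, swapping the collector or the output format would not require changes to layers 1 and 2 in this sketch.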

>
> Upon detection of a FATAL error event, BMC will check the status
> register of the host processor [implementation defined method] to see
> if the assertion is due to the fatal error.
>
> Upon a fatal error, BMC will attempt to harvest crash data from all
> processors [via the APML interface (mailbox) in case of AMD processor
> family].
>
> BMC will generate a single raw crashdump record and save it in the
> non-volatile location /var/lib/bmc-ras.
>
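
The flow quoted above (confirm the fatal status, attempt a harvest from every processor, write one raw record to non-volatile storage) could be sketched roughly as follows. This is an assumption-laden illustration in Python, not the proposed implementation: handle_alert, the callables passed to it, the length-prefixed framing, and the crashdump-<timestamp>.bin file name are all hypothetical. The RFC only specifies a single raw record saved under /var/lib/bmc-ras.

```python
import os
import time
from typing import Callable, Optional, Sequence


def handle_alert(
    is_fatal: Callable[[], bool],
    harvesters: Sequence[Callable[[], bytes]],
    out_dir: str,
    now: Callable[[], float] = time.time,
) -> Optional[str]:
    """Return the path of the saved record, or None if the alert was not fatal."""
    # Step 1: check the processor status before doing any work.
    if not is_fatal():
        return None

    # Step 2: attempt to harvest from every processor; one failure must not
    # discard the data already gathered from the others.
    chunks = []
    for index, harvest in enumerate(harvesters):
        try:
            data = harvest()
        except OSError:
            data = b""
        # Hypothetical framing: 1-byte processor index, 4-byte length, payload.
        chunks.append(
            index.to_bytes(1, "little") + len(data).to_bytes(4, "little") + data
        )
    record = b"".join(chunks)

    # Step 3: write a single record, using a temp file plus rename so a
    # partially written dump is never left under the final name.
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"crashdump-{int(now())}.bin")
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)
    return path
```

On a BMC this would be called with out_dir="/var/lib/bmc-ras"; the harvesters would wrap whatever processor-specific access method applies.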

