From: dhruvaraj S <dhruvaraj@gmail.com>
To: "Bills, Jason M" <jason.m.bills@linux.intel.com>
Cc: openbmc@lists.ozlabs.org
Subject: Re: [RFC] BMC RAS Feature
Date: Sat, 15 Jul 2023 14:31:07 +0530
Message-ID: <CAK7WosjqQU3uL3uof8gcnfQFn0N5AjMNaww-P3mAc-hLQWsB9w@mail.gmail.com>
In-Reply-To: <7d5f86f9-f39a-829f-fd64-62d244c04ef4@linux.intel.com>

Please find below a few comments on using phosphor-debug-collector for this.

Phosphor-debug-collector employs a set of scripts for BMC dump
collections, which can be customised per processor architecture.
Architecture-specific dump collections are appended to dump-extensions
and activated exclusively on systems that support them, identified by
their corresponding feature code.
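
To illustrate the idea of extensions that are only active on systems
supporting them, here is a minimal Python sketch. It is not the actual
phosphor-debug-collector mechanism; the registry, the feature names, and
the functions register_extension and select_collectors are all
hypothetical and only model the selection logic.

```python
from typing import Callable, Dict, List, Set

Collector = Callable[[], str]

# Hypothetical registry: feature name -> extra collectors for that feature.
EXTENSIONS: Dict[str, List[Collector]] = {}


def register_extension(feature: str, collector: Collector) -> None:
    """Attach an architecture-specific collector to a feature name."""
    EXTENSIONS.setdefault(feature, []).append(collector)


def select_collectors(base: List[Collector], enabled: Set[str]) -> List[Collector]:
    """Return the common collectors plus those for the enabled features."""
    selected = list(base)
    for feature in sorted(enabled):
        selected.extend(EXTENSIONS.get(feature, []))
    return selected
```

A system that enables no feature runs only the common collectors; one
that enables a feature also gets the collectors registered under it.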

Data Format: The data is packaged as a basic tarball or a custom
package according to host specifications.
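
As a rough sketch of the "basic tarball" case, the following Python
snippet packs a directory of collected files into a gzip-compressed tar
archive. The function name package_dump and the directory layout are
assumptions for illustration; the real collection scripts may package
the data differently.

```python
import tarfile
from pathlib import Path
from typing import List


def package_dump(src_dir: Path, out_path: Path) -> List[str]:
    """Pack every regular file under src_dir into a .tar.gz at out_path.

    Returns the member names stored in the archive, relative to src_dir.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the dump directory.
                tar.add(path, arcname=path.relative_to(src_dir).as_posix())
    # Re-open the archive to report what was actually written.
    with tarfile.open(out_path, "r:gz") as tar:
        return tar.getnames()
```

The output path should live outside src_dir so the archive does not
try to include itself.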

Event Triggering: The phosphor-debug-collector responds to specific
events to initiate dump creation. A core monitor watches a designated
directory and, when a core file appears there, generates a BMC dump
containing that core file. On IBM systems, an attention handler waits
for notifications from the processors or the host and triggers dump
creation via phosphor-debug-collector.
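
The core-monitor behaviour can be sketched as below. This is a
polling stand-in written for illustration, not the monitor's actual
implementation; the function poll_core_dir, the "core." file-name
prefix, and the create_dump callback are assumptions.

```python
from pathlib import Path
from typing import Callable, List, Set


def poll_core_dir(watch_dir: Path, seen: Set[str],
                  create_dump: Callable[[Path], None]) -> List[Path]:
    """Scan watch_dir once and request a dump for each new core file.

    seen holds the names already handled so repeated scans do not
    trigger duplicate dumps. The "core." prefix is an assumed naming
    convention.
    """
    new_files: List[Path] = []
    for path in sorted(watch_dir.iterdir()):
        if (path.is_file() and path.name.startswith("core.")
                and path.name not in seen):
            seen.add(path.name)
            create_dump(path)  # hand the core file to the dump creator
            new_files.append(path)
    return new_files
```

Calling this in a loop with a shared seen set gives one dump request per
core file, which is the behaviour described above.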

Layered Design: The phosphor-debug-collector operates as a
processor-specific collector within the proposed layered architecture,
initiated by a platform-specific monitoring service like the
host-error-monitor. The created dumps are exposed via D-Bus, which can
then be served by bmcweb via Redfish.
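
To make the layering concrete, here is a small Python model of the flow
from monitor to collector to a dump list that a web front end could
serve. It uses plain objects in place of D-Bus and Redfish; DumpStore,
on_fatal_event, list_dumps, and the field names are hypothetical and do
not reflect the real interfaces or schemas.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DumpEntry:
    dump_id: int
    source: str
    size: int


@dataclass
class DumpStore:
    """Stand-in for the D-Bus layer: records dumps reported by collectors."""
    entries: List[DumpEntry] = field(default_factory=list)

    def add(self, source: str, data: bytes) -> DumpEntry:
        entry = DumpEntry(len(self.entries) + 1, source, len(data))
        self.entries.append(entry)
        return entry


def on_fatal_event(collector: Callable[[], bytes], source: str,
                   store: DumpStore) -> DumpEntry:
    """Monitoring layer: on an event, run the collector and publish the dump."""
    return store.add(source, collector())


def list_dumps(store: DumpStore) -> List[Dict[str, object]]:
    """Front-end layer: render the stored entries for a client."""
    return [{"id": str(e.dump_id), "source": e.source, "size_bytes": e.size}
            for e in store.entries]
```

The point of the split is that only the collector passed to
on_fatal_event is processor-specific; the store and the listing stay
generic.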

Phosphor-debug-collector allows for extensions to accommodate
processor-specific parameters. This is achieved by adjusting the dump
collection scripts in line with the particular processor requirements.

The phosphor-debug-collector interacts with specific applications
during the dump collection process. For example, on IBM systems, it
invokes an IBM-specific application via the dump collection script to
retrieve the dump from the host processor.

On Sat, 15 Jul 2023 at 03:37, Bills, Jason M
<jason.m.bills@linux.intel.com> wrote:
>
> Sorry for missing this earlier.  Here are some of my thoughts.
>
> On 3/20/2023 11:14 PM, Supreeth Venkatesh wrote:
> >
> > #### Requirements
> >
> > 1. Collecting RAS/Crashdump shall be processor specific. Hence the use
> >     of virtual APIs to allow override for processor specific way of
> >     collecting the data.
> > 2. Crash data format shall be stored in common platform error record
> >     (CPER) format as per UEFI specification
> >     [https://uefi.org/specs/UEFI/2.10/].
>
> Do we have to define a single output format? Could it be made to be
> flexible with the format of the collected crash data?
>
> > 3. Configuration parameters of the service shall be standard with scope
> >     for processor specific extensions.
> >
> > #### Proposed Design
> >
> > When one or more processors register a fatal error condition, an
> > interrupt is generated to the host processor.
> >
> > The host processor in the failed state asserts the signal to indicate to
> > the BMC that a fatal hang has occurred. [APML_ALERT# in case of AMD
> > processor family]
> >
> > BMC RAS application listens on the event [APML_ALERT# in case of AMD
> > processor family].
>
> The host-error-monitor application provides support for listening for
> events and taking action such as logging or triggering a crashdump that
> may meet this requirement.
>
>
> One thought may be to break this up into various layers to allow for
> flexibility and standardization. For example:
> 1. Redfish -> provided by bmcweb which pulls from
> 2. D-Bus -> provided by a new service which looks for data stored by
> 3. processor-specific collector -> provided by separate services as
> needed and triggered by
> 4. platform-specific monitoring service -> provided by
> host-error-monitor or other service as needed.
>
> Ideally, we could make 2 a generic service.
>
> >
> > Upon detection of FATAL error event, BMC will check the status register
> > of the host processor [implementation defined method] to see
> > if the assertion is due to the fatal error.
> >
> > Upon fatal error, BMC will attempt to harvest crash data from all
> > processors. [via the APML interface (mailbox) in case of AMD processor
> > family].
> >
> > BMC will generate a single raw crashdump record and save it in
> > non-volatile location /var/lib/bmc-ras.
> >
>


-- 
--------------
Dhruvaraj S
