openbmc.lists.ozlabs.org archive mirror
From: dhruvaraj S <dhruvaraj@gmail.com>
To: Ed Tanous <ed@tanous.net>
Cc: openbmc <openbmc@lists.ozlabs.org>,
	Brad Bishop <bradleyb@fuzziesquirrel.com>,
	Gunnar Mills <gmills@linux.vnet.ibm.com>
Subject: Re: Proposal for operations on isolated hardware units using Redfish logging
Date: Fri, 11 Dec 2020 16:57:22 +0530
Message-ID: <CAK7WosjKjLKu_1vvZak7kOry9aujxNdEJ7cuuFdbqPVTnUW5Mw@mail.gmail.com>
In-Reply-To: <CACWQX8050TCOT8z5efOWQ_q7b9Ucqv6+w1X1J1NRwba9AGKq8g@mail.gmail.com>

On Thu, Dec 10, 2020 at 10:59 PM Ed Tanous <ed@tanous.net> wrote:
>
> On Thu, Dec 10, 2020 at 7:49 AM dhruvaraj S <dhruvaraj@gmail.com> wrote:
> >
> > Hi,
> > Please find below a proposal for operations on isolated hardware units
> > using Redfish logging.
> >
> >
> > Hardware Isolation
> > On systems with multiple processor units and other redundant vital
> > resources, system downtime can be prevented by isolating the faulty
> > hardware units. Most of the actions required to isolate the parts depend
> > on the architecture and are executed in the host. But the BMC needs to
> > support a few steps: providing users a method to query the units in
> > isolation, clearing isolation, isolating a suspected part, and isolating
> > a unit when the host is down due to a fault in a critical unit.
> > Since a user interface is needed for the above actions, I am proposing
> > to use the Redfish log service to carry them out.
>
> Right off the bat, LogServices seems like a strange choice for this.
> In your requirements, you're taking actions on the unit itself, not
> logging the actions that occurred, so I'm struggling to see the design
> choice here.  Can you elaborate why LogService, something intended to
> be for historical logging, would be appropriate for a design that
> needs to accept user action?

Apart from user-requested isolation, hardware units usually get isolated
because of a past event in the system. For example, if a processor core
encounters an error while performing its activities and cannot continue
in service, it will be listed as isolated. A method is needed to show
the list of such units to users. Since the log service is meant for
showing such records, I think it is suitable for this. After the repair,
once the unit is back in service, its log service entry will be marked
as resolved.
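A minimal sketch of how a client might pick out the currently isolated units from such a collection and record a repair, assuming each entry carries a boolean "Resolved" property (which LogEntry.v1_7_0 does not define) and a "Links.OriginOfCondition" pointing at the unit. The helper names and the second, already-repaired entry are hypothetical.

```python
import json
from typing import List

# Hypothetical payload shaped like the proposed IsolatedHardware Entries
# collection. "Resolved" is assumed to be a boolean; entry "2" is an
# invented, already-repaired unit added only to show the filtering.
SAMPLE = """
{
  "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries",
  "Members": [
    {"Id": "1", "Name": "Processor 1", "Resolved": false,
     "Links": {"OriginOfCondition":
               {"@odata.id": "/redfish/v1/Systems/system/Processors/cpu1"}}},
    {"Id": "2", "Name": "Processor 0", "Resolved": true,
     "Links": {"OriginOfCondition":
               {"@odata.id": "/redfish/v1/Systems/system/Processors/cpu0"}}}
  ]
}
"""


def origin_of(entry: dict) -> str:
    """Return the URI of the unit an entry points at ("" if none)."""
    return entry.get("Links", {}).get("OriginOfCondition", {}).get("@odata.id", "")


def isolated_units(collection: dict) -> List[str]:
    """List units whose isolation entries are still open (not resolved)."""
    return [
        origin_of(entry)
        for entry in collection.get("Members", [])
        if not entry.get("Resolved", False) and origin_of(entry)
    ]


def mark_repaired(collection: dict, unit_uri: str) -> int:
    """Mark every open entry for unit_uri as resolved; return how many changed."""
    changed = 0
    for entry in collection.get("Members", []):
        if origin_of(entry) == unit_uri and not entry.get("Resolved", False):
            entry["Resolved"] = True  # unit is back in service after repair
            changed += 1
    return changed


if __name__ == "__main__":
    entries = json.loads(SAMPLE)
    print(isolated_units(entries))
    mark_repaired(entries, "/redfish/v1/Systems/system/Processors/cpu1")
    print(isolated_units(entries))
```

Run directly, it prints the cpu1 URI first and an empty list once the repair is recorded. The entry is kept and only flagged, so the history of why the unit was isolated survives the repair.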

>
> >
> > Requirements
> > Isolate a hardware unit when the user requests it.
> > Get the list of all isolated resources.
> > Remove the isolation of a hardware unit.
> > Remove all existing isolations.
> >
> > Isolating a hardware unit:
> > redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware
> > {
> >   "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware",
> >   "@odata.type": "#LogService.v1_2_0.LogService",
> >   "Actions": {
> >     "#LogService.CollectDiagnosticData": {
> >       "target": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Actions/LogService.CollectDiagnosticData"
>
> What is this action intended to do?
>
> >     }
> >   },
> >   "Description": "Isolated Hardware",
> >   "Entries": {
> >     "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries"
> >   },
> >   "Id": "IsolatedHardware",
> >   "Name": "Isolated Hardware LogService",
> >   "OverWritePolicy": "WrapsWhenFull"
> > }
> >
> > Listing isolated hardware units.
> > redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware >> Entries
> > {
> >   "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries",
> >   "@odata.type": "#LogEntryCollection.LogEntryCollection",
> >   "Description": "Collection of Isolated Hardware Components",
> >   "Members": [
> >     {
> >       "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries/1",
> >       "@odata.type": "#LogEntry.v1_7_0.LogEntry",
> >       "Created": "2020-10-15T10:30:08+00:00",
> >       "EntryType": "Event",
> >       "Id": "1",
> >       "Resolved": false,
>
> LogEntry doesn't have a "Resolved" field that I can see.
>
> >       "Name": "Processor 1",
> >       "Links": {
> >         "OriginOfCondition": {
> >           "@odata.id": "/redfish/v1/Systems/system/Processors/cpu1"
> >         }
> >       },
> >       "Severity": "Critical",
> >       "SensorType": "Processor",
>
> SensorType doesn't really make sense in this case, as you're not
> reporting errors from a sensor, but from a resource.
>
> >
> >       "AdditionalDataURI": "/redfish/v1/Systems/system/LogServices/EventLog/attachement/111",
> >       "AdditionalDataSizeBytes": 1024
> >     }
> >   ],
> >   "Members@odata.count": 1,
> >   "Name": "Isolated Hardware Entries"
> > }
> >
> > Users will be able to delete any entry or all entries. If an isolated
> > unit is serviced and brought back into service, the "Resolved"
> > property of its entries will be set to true.
> > "AdditionalDataURI": a link to the error log associated with this
> > isolation action.
> > --------------
> > Dhruvaraj S
>
>
> I suspect overall you need to separate this into two different
> resources.  One for logging things that have happened in the past,
> under log service, and one for interacting directly with the system in
> its current state.  The second one would likely take the form of being
> able to set the Status property to something like "Disabled",
> "UnavailableOffline", or something similar on your Processor
> resources.

The log service is already used to generate dumps, which is a
user-initiated action in the log service, so I think user-initiated
isolation can also live in the same place. But, as you suggested,
setting Disabled/UnavailableOffline on the list of units is also a good
option; I need to look into that more.
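A minimal sketch of the split Ed suggests, assuming the log keeps the history while each Processor resource reports its current state through Status.State. The set of isolated URIs could come from the unresolved log entries; the helper names, the processor URIs, and the choice of "Disabled" over "UnavailableOffline" are hypothetical.

```python
from typing import Dict, Iterable, Set

# Assumption: an isolated unit reports "Disabled"; "UnavailableOffline"
# is the other Redfish State value under discussion.
ISOLATED_STATE = "Disabled"


def processor_status(uri: str, isolated_uris: Set[str]) -> Dict[str, object]:
    """Build the current-state fragment a Processor resource might report."""
    state = ISOLATED_STATE if uri in isolated_uris else "Enabled"
    return {"@odata.id": uri, "Status": {"State": state}}


def current_view(all_uris: Iterable[str], isolated_uris: Set[str]) -> Dict[str, dict]:
    """Map every processor URI to its current-state fragment."""
    return {uri: processor_status(uri, isolated_uris) for uri in all_uris}


if __name__ == "__main__":
    cpus = [
        "/redfish/v1/Systems/system/Processors/cpu0",
        "/redfish/v1/Systems/system/Processors/cpu1",
    ]
    view = current_view(cpus, {"/redfish/v1/Systems/system/Processors/cpu1"})
    for uri, fragment in view.items():
        print(uri, fragment["Status"]["State"])
```

With this split, clearing an isolation changes the Processor's state, while the log entry remains as the record of what happened.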

-- 
--------------
Dhruvaraj S

Thread overview: 5+ messages
2020-12-10 14:55 Proposal for operations on isolated hardware units using Redfish logging dhruvaraj S
2020-12-10 17:29 ` Ed Tanous
2020-12-11 11:27   ` dhruvaraj S [this message]
2020-12-11 16:12   ` Gunnar Mills
2020-12-23 16:54     ` dhruvaraj S
