* Proposal for operations on isolated hardware units using Redfish logging
From: dhruvaraj S @ 2020-12-10 14:55 UTC
  To: openbmc; +Cc: bradleyb, gmills

Hi,
Please find below a proposal for operations on isolated hardware units using
Redfish logging.

Hardware Isolation
On systems with multiple processor units and other redundant vital resources,
system downtime can be prevented by isolating faulty hardware units.
Most of the actions required to isolate a part depend on the architecture and
are executed in the host. But the BMC needs to support a few steps: providing
a method for users to query the units in isolation, clearing isolation,
isolating a suspected part, and isolating a unit when the host is down due to
a fault in a critical unit.
Since a user interface is needed for these actions, I am proposing to use the
Redfish log service to carry them out.

Requirements
Isolate a hardware unit when the user requests it.
Get the list of all isolated resources.
Remove the isolation of a hardware unit.
Remove all existing isolation.

Isolating a hardware unit:
redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware
{
  "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware",
  "@odata.type": "#LogService.v1_2_0.LogService",
  "Actions": {
    "#LogService.CollectDiagnosticData": {
      "target": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Actions/LogService.CollectDiagnosticData"
    }
  },
  "Description": "Isolated Hardware",
  "Entries": {
    "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries"
  },
  "Id": "IsolatedHardware",
  "Name": "Isolated Hardware LogService",
  "OverWritePolicy": "WrapsWhenFull"
}

Listing isolated hardware units:
redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware >> Entries
{
  "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries",
  "@odata.type": "#LogEntryCollection.LogEntryCollection",
  "Description": "Collection of Isolated Hardware Components",
  "Members": [
    {
      "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries/1",
      "@odata.type": "#LogEntry.v1_7_0.LogEntry",
      "Created": "2020-10-15T10:30:08+00:00",
      "EntryType": "Event",
      "Id": "1",
      "Resolved": "false",
      "Name": "Processor 1",
      "Links": {
        "OriginOfCondition": {
          "@odata.id": "/redfish/v1/Systems/system/Processors/cpu1"
        }
      },
      "Severity": "Critical",
      "SensorType": "Processor",
      "AdditionalDataURI": "/redfish/v1/Systems/system/LogServices/EventLog/attachement/111",
      "AdditionalDataSizeBytes": "1024"
    }
  ],
  "Members@odata.count": 1,
  "Name": "Isolated Hardware Entries"
}

Users will be able to delete any single entry or all entries. If an isolated
unit is serviced, that unit will be back in service; in such cases the
"Resolved" property in the entry will be marked as "true".
"AdditionalDataURI": a link to the error log associated with this isolation
action.

--------------
Dhruvaraj S
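The "list isolated units" requirement above could be consumed by a client roughly as follows. This is only a minimal Python sketch under stated assumptions: the function name `unresolved_entries` and the sample payload are hypothetical, and the helper accepts "Resolved" either as a JSON boolean or as the string form shown in the proposal.

```python
from typing import Any, Dict, List


def unresolved_entries(collection: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the members of a LogEntryCollection that are still isolated.

    Hypothetical helper: an entry counts as still isolated when its
    "Resolved" property is missing or not true.
    """
    result = []
    for entry in collection.get("Members", []):
        resolved = entry.get("Resolved", False)
        # The proposal shows "false" as a string, so accept both forms.
        if isinstance(resolved, str):
            resolved = resolved.strip().lower() == "true"
        if not resolved:
            result.append(entry)
    return result


# Illustrative payload modeled on the proposal; entry 2 is made up.
sample_collection = {
    "Members": [
        {"Id": "1", "Name": "Processor 1", "Resolved": "false"},
        {"Id": "2", "Name": "Processor 2", "Resolved": "true"},
    ],
    "Members@odata.count": 2,
}

print([entry["Id"] for entry in unresolved_entries(sample_collection)])
```

The sketch works on an already-decoded JSON document, so it makes no assumption about how the collection is fetched from the BMC.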
* Re: Proposal for operations on isolated hardware units using Redfish logging
From: Ed Tanous @ 2020-12-10 17:29 UTC
  To: dhruvaraj S; +Cc: openbmc, Brad Bishop, Gunnar Mills

On Thu, Dec 10, 2020 at 7:49 AM dhruvaraj S <dhruvaraj@gmail.com> wrote:
>
> Hi,
> Please find below a proposal for operations on isolated hardware units using
> Redfish logging.
>
> Hardware Isolation
> On systems with multiple processor units and other redundant vital resources,
> system downtime can be prevented by isolating faulty hardware units.
> Most of the actions required to isolate a part depend on the architecture and
> are executed in the host. But the BMC needs to support a few steps: providing
> a method for users to query the units in isolation, clearing isolation,
> isolating a suspected part, and isolating a unit when the host is down due to
> a fault in a critical unit.
> Since a user interface is needed for these actions, I am proposing to use the
> Redfish log service to carry them out.

Right off the bat, LogServices seems like a strange choice for this.
In your requirements, you're taking actions on the unit itself, not
logging the actions that occurred, so I'm struggling to see the design
choice here. Can you elaborate why LogService, something intended to
be for historical logging, would be appropriate for a design that
needs to accept user action?

>
> Requirements
> Isolate a hardware unit when the user requests it.
> Get the list of all isolated resources.
> Remove the isolation of a hardware unit.
> Remove all existing isolation.
>
> Isolating a hardware unit:
> redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware
> {
>   "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware",
>   "@odata.type": "#LogService.v1_2_0.LogService",
>   "Actions": {
>     "#LogService.CollectDiagnosticData": {
>       "target": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Actions/LogService.CollectDiagnosticData"

What is this action intended to do?

>     }
>   },
>   "Description": "Isolated Hardware",
>   "Entries": {
>     "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries"
>   },
>   "Id": "IsolatedHardware",
>   "Name": "Isolated Hardware LogService",
>   "OverWritePolicy": "WrapsWhenFull"
> }
>
> Listing isolated hardware units:
> redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware >> Entries
> {
>   "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries",
>   "@odata.type": "#LogEntryCollection.LogEntryCollection",
>   "Description": "Collection of Isolated Hardware Components",
>   "Members": [
>     {
>       "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries/1",
>       "@odata.type": "#LogEntry.v1_7_0.LogEntry",
>       "Created": "2020-10-15T10:30:08+00:00",
>       "EntryType": "Event",
>       "Id": "1",
>       "Resolved": "false",

LogEntry doesn't have a "Resolved" field that I can see.

>       "Name": "Processor 1",
>       "Links": {
>         "OriginOfCondition": {
>           "@odata.id": "/redfish/v1/Systems/system/Processors/cpu1"
>         }
>       },
>       "Severity": "Critical",
>       "SensorType": "Processor",

SensorType doesn't really make sense in this case, as you're not
reporting errors from a sensor, but from a resource.

>       "AdditionalDataURI": "/redfish/v1/Systems/system/LogServices/EventLog/attachement/111",
>       "AdditionalDataSizeBytes": "1024"
>     }
>   ],
>   "Members@odata.count": 1,
>   "Name": "Isolated Hardware Entries"
> }
>
> Users will be able to delete any single entry or all entries. If an isolated
> unit is serviced, that unit will be back in service; in such cases the
> "Resolved" property in the entry will be marked as "true".
> "AdditionalDataURI": a link to the error log associated with this isolation
> action.
>
> --------------
> Dhruvaraj S

I suspect overall you need to separate this into two different
resources. One for logging things that have happened in the past,
under log service, and one for interacting directly with the system in
its current state. The second one would likely take the form of being
able to set the Status property to something like "Disabled",
"UnavailableOffline", or something similar on your Processor
resources.
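The "current state" half of that split can be illustrated by deriving the isolated set from the Processor resources themselves rather than from log entries. The following is a hedged Python sketch, not an existing bmcweb or Redfish client API: the function name `isolated_processors`, the `ISOLATED_STATES` set, and the sample data are assumptions, with the two state values taken from the suggestion above.

```python
from typing import Any, Dict, List

# Assumption: these Status.State values are treated as "isolated",
# following the "Disabled" / "UnavailableOffline" suggestion.
ISOLATED_STATES = {"Disabled", "UnavailableOffline"}


def isolated_processors(processors: List[Dict[str, Any]]) -> List[str]:
    """Return the @odata.id of every processor whose state marks it isolated."""
    isolated = []
    for proc in processors:
        # Tolerate resources that carry no Status object at all.
        state = (proc.get("Status") or {}).get("State")
        if state in ISOLATED_STATES:
            isolated.append(proc["@odata.id"])
    return isolated


# Illustrative Processor resources; the values are made up.
sample_processors = [
    {
        "@odata.id": "/redfish/v1/Systems/system/Processors/cpu0",
        "Status": {"State": "Enabled", "Health": "OK"},
    },
    {
        "@odata.id": "/redfish/v1/Systems/system/Processors/cpu1",
        "Status": {"State": "Disabled", "Health": "Critical"},
    },
]

print(isolated_processors(sample_processors))
```

With this shape, the log service keeps only the history of why a unit was isolated, while the Processor resource answers whether it is isolated now.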
* Re: Proposal for operations on isolated hardware units using Redfish logging
From: dhruvaraj S @ 2020-12-11 11:27 UTC
  To: Ed Tanous; +Cc: openbmc, Brad Bishop, Gunnar Mills

On Thu, Dec 10, 2020 at 10:59 PM Ed Tanous <ed@tanous.net> wrote:
>
> On Thu, Dec 10, 2020 at 7:49 AM dhruvaraj S <dhruvaraj@gmail.com> wrote:
> >
> > Hi,
> > Please find below a proposal for operations on isolated hardware units using
> > Redfish logging.
> >
> > Hardware Isolation
> > On systems with multiple processor units and other redundant vital resources,
> > system downtime can be prevented by isolating faulty hardware units.
> > Most of the actions required to isolate a part depend on the architecture and
> > are executed in the host. But the BMC needs to support a few steps: providing
> > a method for users to query the units in isolation, clearing isolation,
> > isolating a suspected part, and isolating a unit when the host is down due to
> > a fault in a critical unit.
> > Since a user interface is needed for these actions, I am proposing to use the
> > Redfish log service to carry them out.
>
> Right off the bat, LogServices seems like a strange choice for this.
> In your requirements, you're taking actions on the unit itself, not
> logging the actions that occurred, so I'm struggling to see the design
> choice here. Can you elaborate why LogService, something intended to
> be for historical logging, would be appropriate for a design that
> needs to accept user action?

Apart from user-requested isolation, hardware units usually get isolated
because of a past event in the system. For example, if a processor core
encounters an error during operation and cannot continue in service, it will
be listed as isolated. A method is needed to show the list of such units to
users. Since the log service is meant for showing such logs, I think it is
suitable for that. After the repair, once the unit is back in service, the
log service entry will be marked as resolved.

> >
> > Requirements
> > Isolate a hardware unit when the user requests it.
> > Get the list of all isolated resources.
> > Remove the isolation of a hardware unit.
> > Remove all existing isolation.
> >
> > Isolating a hardware unit:
> > redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware
> > {
> >   "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware",
> >   "@odata.type": "#LogService.v1_2_0.LogService",
> >   "Actions": {
> >     "#LogService.CollectDiagnosticData": {
> >       "target": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Actions/LogService.CollectDiagnosticData"
>
> What is this action intended to do?
>
> >     }
> >   },
> >   "Description": "Isolated Hardware",
> >   "Entries": {
> >     "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries"
> >   },
> >   "Id": "IsolatedHardware",
> >   "Name": "Isolated Hardware LogService",
> >   "OverWritePolicy": "WrapsWhenFull"
> > }
> >
> > Listing isolated hardware units:
> > redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware >> Entries
> > {
> >   "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries",
> >   "@odata.type": "#LogEntryCollection.LogEntryCollection",
> >   "Description": "Collection of Isolated Hardware Components",
> >   "Members": [
> >     {
> >       "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries/1",
> >       "@odata.type": "#LogEntry.v1_7_0.LogEntry",
> >       "Created": "2020-10-15T10:30:08+00:00",
> >       "EntryType": "Event",
> >       "Id": "1",
> >       "Resolved": "false",
>
> LogEntry doesn't have a "Resolved" field that I can see.
>
> >       "Name": "Processor 1",
> >       "Links": {
> >         "OriginOfCondition": {
> >           "@odata.id": "/redfish/v1/Systems/system/Processors/cpu1"
> >         }
> >       },
> >       "Severity": "Critical",
> >       "SensorType": "Processor",
>
> SensorType doesn't really make sense in this case, as you're not
> reporting errors from a sensor, but from a resource.
>
> >       "AdditionalDataURI": "/redfish/v1/Systems/system/LogServices/EventLog/attachement/111",
> >       "AdditionalDataSizeBytes": "1024"
> >     }
> >   ],
> >   "Members@odata.count": 1,
> >   "Name": "Isolated Hardware Entries"
> > }
> >
> > Users will be able to delete any single entry or all entries. If an isolated
> > unit is serviced, that unit will be back in service; in such cases the
> > "Resolved" property in the entry will be marked as "true".
> > "AdditionalDataURI": a link to the error log associated with this isolation
> > action.
> >
> > --------------
> > Dhruvaraj S
>
>
> I suspect overall you need to separate this into two different
> resources. One for logging things that have happened in the past,
> under log service, and one for interacting directly with the system in
> its current state. The second one would likely take the form of being
> able to set the Status property to something like "Disabled",
> "UnavailableOffline", or something similar on your Processor
> resources.

The log service is already used to generate dumps, which is a user-initiated
action, so I was thinking user-initiated isolation could live in the same
place. But as you suggested, setting Disabled/UnavailableOffline on the units
themselves is also a good option; I need to look into that more.

--
--------------
Dhruvaraj S
* Re: Proposal for operations on isolated hardware units using Redfish logging
From: Gunnar Mills @ 2020-12-11 16:12 UTC
  To: Ed Tanous, dhruvaraj S; +Cc: openbmc, Brad Bishop

On 12/10/2020 10:29 AM, Ed Tanous wrote:
> On Thu, Dec 10, 2020 at 7:49 AM dhruvaraj S <dhruvaraj@gmail.com> wrote:
>>
>>
>> Listing isolated hardware units:
>> redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware >> Entries
>> {
>>   "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries",
>>   "@odata.type": "#LogEntryCollection.LogEntryCollection",
>>   "Description": "Collection of Isolated Hardware Components",
>>   "Members": [
>>     {
>>       "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries/1",
>>       "@odata.type": "#LogEntry.v1_7_0.LogEntry",
>>       "Created": "2020-10-15T10:30:08+00:00",
>>       "EntryType": "Event",
>>       "Id": "1",
>>       "Resolved": "false",
>
> LogEntry doesn't have a "Resolved" field that I can see.

Part of Redfish's 2020.4. Matches OpenBMC's
https://github.com/openbmc/phosphor-dbus-interfaces/blob/05dd96872560bc6f11616be48b1873f539904142/xyz/openbmc_project/Logging/Entry.interface.yaml#L29

>
>>       "Name": "Processor 1",
>>       "Links": {
>>         "OriginOfCondition": {
>>           "@odata.id": "/redfish/v1/Systems/system/Processors/cpu1"
>>         }
>>       },
>>       "Severity": "Critical",
>>       "SensorType": "Processor",
>
* Re: Proposal for operations on isolated hardware units using Redfish logging
From: dhruvaraj S @ 2020-12-23 16:54 UTC
  To: Gunnar Mills; +Cc: Brad Bishop, openbmc, Ed Tanous

Hi,

Updated proposal: instead of using LogService.CollectDiagnosticData to
manually isolate the hardware, the new proposal is to set the property
ReadyToRemove to "True".

redfish » v1 » Systems » system » Processors » CPU1
{
  "@odata.type": "#Processor.v1_7_0.Processor",
  "Id": "CPU1",
  "Name": "Processor",
  "Socket": "CPU 1",
  "ProcessorType": "CPU",
  "ProcessorId": {
    "VendorId": "XXXX",
    "IdentificationRegisters": "XXXX"
  },
  "MaxSpeedMHz": 3700,
  "TotalCores": 8,
  "TotalThreads": 16,
  "Status": {
    "State": "Enabled",
    "Health": "OK",
    "ReadyToRemove": "True"    <---
  },
  "@odata.id": "/redfish/v1/Systems/system/Processors/CPU1"
}

On Fri, Dec 11, 2020 at 9:42 PM Gunnar Mills <gmills@linux.vnet.ibm.com> wrote:
>
> On 12/10/2020 10:29 AM, Ed Tanous wrote:
> > On Thu, Dec 10, 2020 at 7:49 AM dhruvaraj S <dhruvaraj@gmail.com> wrote:
> >>
> >>
> >> Listing isolated hardware units:
> >> redfish >> v1 >> Systems >> system >> LogServices >> IsolatedHardware >> Entries
> >> {
> >>   "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries",
> >>   "@odata.type": "#LogEntryCollection.LogEntryCollection",
> >>   "Description": "Collection of Isolated Hardware Components",
> >>   "Members": [
> >>     {
> >>       "@odata.id": "/redfish/v1/Systems/system/LogServices/IsolatedHardware/Entries/1",
> >>       "@odata.type": "#LogEntry.v1_7_0.LogEntry",
> >>       "Created": "2020-10-15T10:30:08+00:00",
> >>       "EntryType": "Event",
> >>       "Id": "1",
> >>       "Resolved": "false",
> >
> > LogEntry doesn't have a "Resolved" field that I can see.
>
> Part of Redfish's 2020.4. Matches OpenBMC's
> https://github.com/openbmc/phosphor-dbus-interfaces/blob/05dd96872560bc6f11616be48b1873f539904142/xyz/openbmc_project/Logging/Entry.interface.yaml#L29
>
> >
> >>       "Name": "Processor 1",
> >>       "Links": {
> >>         "OriginOfCondition": {
> >>           "@odata.id": "/redfish/v1/Systems/system/Processors/cpu1"
> >>         }
> >>       },
> >>       "Severity": "Critical",
> >>       "SensorType": "Processor",
> >
>

--
--------------
Dhruvaraj S
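For illustration, the updated proposal could translate into a client request along these lines. This is a sketch only, using the Python standard library: the helper name `build_isolation_patch` and the host `https://bmc.example` are hypothetical, the request is constructed but never sent, and it assumes that "ReadyToRemove" sits under "Status" as in the example above, is writable via PATCH, and is encoded as a JSON boolean rather than the string "True".

```python
import json
import urllib.request


def build_isolation_patch(base_url: str, processor_id: str,
                          ready_to_remove: bool = True) -> urllib.request.Request:
    """Build (but do not send) a PATCH that sets Status.ReadyToRemove.

    Assumption: the property name and its placement under "Status" follow
    the proposal in this thread, not a published Redfish schema.
    """
    url = f"{base_url.rstrip('/')}/redfish/v1/Systems/system/Processors/{processor_id}"
    body = json.dumps({"Status": {"ReadyToRemove": ready_to_remove}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical BMC address, used only to show the resulting request.
request = build_isolation_patch("https://bmc.example", "CPU1")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing `ready_to_remove=False` would produce the corresponding request to clear the isolation, under the same assumptions.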