* BMC and Host State Management Refactor
@ 2016-11-18 20:31 Andrew Geissler
  2016-11-21  5:55 ` Joel Stanley
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Geissler @ 2016-11-18 20:31 UTC (permalink / raw)
  To: OpenBMC Maillist

Josh and I are working two stories this sprint that deal with
refactoring the bmc and host state management code out of skeleton
(#772/#783).  Here’s the proposal on this work.

A new repository will be created, phosphor-state-manager.  Within this
repo we will support both the BMC and Host state management functions.
The bmc and host function will be broken into separate objects within
the repo for cases where people only want one or the other.

For host management, we will be moving skeleton/op-hostctl into this
new repo.  The primary responsibility of this code is to initiate a
host power on/off/reboot via systemd and to monitor systemd targets to
maintain a coherent state of the host.  We will also be moving a
portion of op-hostctl into its own standalone application that will
be run in a systemd target and do the FSI bit bang magic to start the
SBE/Hostboot.  Note that we will be moving a portion of the
skeleton/pychassisctl out as well (specifically the power
on/off/reboot and query functions).

For the BMC state management work we will be moving skeleton/bmcctl
into this new repo.  The primary responsibility of this code will be
to provide reboot and current state information on the BMC.

Note the actual state management function is still contained within
systemd and the corresponding units we've associated with it.  This
code we're writing here just initiates the systemd targets on user
requests and tracks the defined states by monitoring systemd progress.

The overall design for both state management objects is that they will
provide a set of properties on which to operate.
- DesiredState
- CurrentState

CurrentState will be a read only property.

The host control will have these additional properties:
- DesiredPowerState
- CurrentPowerState

Valid states to request for OpenBMC (DesiredState)
- READY, REBOOT
Valid states to be in for OpenBMC (CurrentState)
- NOT_READY, READY (note this proposal removes BASE_APPS and BMC_STARTING).

READY implies all services started and running successfully (i.e. we
reached obmc-standby.target)

Valid states to request for Host (DesiredState)
- OFF, ON, REBOOT
Valid states to be in for Host (CurrentState)
- OFF, BOOTING, BOOTED

BOOTED implies petitboot has reached its "ready" state.

Valid power states to request for DesiredPowerState
- OFF, ON
Valid states to be in for CurrentPowerState
- OFF, TRANSITION, ON

Patrick and I had some discussion on providing more details on the
host booted state (petitboot vs. os booted) but that can be an
extension in the future if needed.

Yaml for review will be available shortly in gerrit.

Comments always welcome!
Andrew and Josh

^ permalink raw reply	[flat|nested] 11+ messages in thread
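To make the property layout in the proposal above easier to picture,
here is a minimal Python sketch of one state object with a writable
DesiredState and a read-only CurrentState.  It is only an
illustration: the class names (HostRequest, HostState, StateObject)
are made up for this example, and the real implementation is expected
to be C++ on top of D-Bus, not Python.

```python
from enum import Enum


class HostRequest(Enum):
    """Values a user may write to the host DesiredState."""
    OFF = "OFF"
    ON = "ON"
    REBOOT = "REBOOT"


class HostState(Enum):
    """Values the host CurrentState may report."""
    OFF = "OFF"
    BOOTING = "BOOTING"
    BOOTED = "BOOTED"


class StateObject:
    """Hypothetical holder for DesiredState and CurrentState."""

    def __init__(self, request_type, initial_state):
        self._request_type = request_type
        self._current = initial_state
        self._desired = None

    @property
    def current_state(self):
        # No setter is defined, so callers cannot assign CurrentState.
        return self._current

    @property
    def desired_state(self):
        return self._desired

    @desired_state.setter
    def desired_state(self, value):
        # Reject anything outside the set of requestable values.
        if not isinstance(value, self._request_type):
            raise ValueError(f"invalid request: {value!r}")
        self._desired = value


host = StateObject(HostRequest, HostState.OFF)
host.desired_state = HostRequest.ON
```

Assigning to host.current_state raises AttributeError, which mirrors
the "CurrentState will be a read only property" rule; writing a value
such as HostState.BOOTED to desired_state is rejected with ValueError.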
* Re: BMC and Host State Management Refactor
  2016-11-18 20:31 BMC and Host State Management Refactor Andrew Geissler
@ 2016-11-21  5:55 ` Joel Stanley
  2016-11-22  2:28   ` Andrew Geissler
  0 siblings, 1 reply; 11+ messages in thread
From: Joel Stanley @ 2016-11-21  5:55 UTC (permalink / raw)
  To: Andrew Geissler, jdking; +Cc: OpenBMC Maillist

Hi Andrew and Josh,

On Sat, Nov 19, 2016 at 7:01 AM, Andrew Geissler <geissonator@gmail.com> wrote:
> Josh and I are working two stories this sprint that deal with
> refactoring the bmc and host state management code out of skeleton
> (#772/#783).  Here’s the proposal on this work.

Thanks for sending out your plan, this is great. I have a few comments
that came up as I was reading.

> The overall design for both state management objects is that they will
> provide a set of properties on which to operate.
> - DesiredState
> - CurrentState
>
> CurrentState will be a read only property.

You've chosen to make the desired and current states be separate,
which works. Another option would be to have them be the same list of
states, so you know that when current==desired you're not waiting on
anything to happen. What do you think?

>
> The host control will have these additional properties:
> - DesiredPowerState
> - CurrentPowerState
>
> Valid states to request for OpenBMC (DesiredState)
> - READY, REBOOT
> Valid states to be in for OpenBMC (CurrentState)
> - NOT_READY, READY (note this proposal removes BASE_APPS and BMC_STARTING).

Does this need to consider states where the device is updating? That
is, it is not attempting to be ready to control the host, but it is
turned on and able to accept new firmware?

>
> READY implies all services started and running successfully (i.e. we
> reached obmc-standby.target)
>
>
> Valid states to request for Host (DesiredState)
> - OFF, ON, REBOOT

This might need to distinguish between soft-off/soft-reboot and
printer-on-fire OFF requests.

> Validate states to be in for Host (CurrentState)
> - OFF, BOOTING, BOOTED

STANDBY, BOOTING, RUNNING?

The bikeshed should be blue.

>
> BOOTED implies petitboot has reached it's "ready" state.
>
>
> Valid power states to request for DesiredPowerState
> - OFF, ON
> Validate states to be in for CurrentPowerState
> - OFF, TRANSITION, ON

Does TRANSITION need to distinguish between coming up and going down?

>
> Patrick and I had some discussion on providing more details on the
> host booted state (petitboot vs. os booted) but that can be an
> extension in the future if needed.

I think baking this in from the start is a worthy goal. This then
feeds into the user-facing APIs, whether that is an IPMI sensor or a
REST endpoint, that can be queried to ask "what is that damn computer
up to". If we can make it informative enough that no one ever feels
the need to watch the console output when the machine is booting then
we have done a good job.

Cheers,

Joel

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: BMC and Host State Management Refactor
  2016-11-21  5:55 ` Joel Stanley
@ 2016-11-22  2:28   ` Andrew Geissler
  2016-11-22  3:40     ` Andrew Jeffery
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Geissler @ 2016-11-22  2:28 UTC (permalink / raw)
  To: Joel Stanley; +Cc: jdking, OpenBMC Maillist

On Sun, Nov 20, 2016 at 11:55 PM, Joel Stanley <joel@jms.id.au> wrote:
> Hi Andrew and Josh,
>
> On Sat, Nov 19, 2016 at 7:01 AM, Andrew Geissler <geissonator@gmail.com> wrote:
>> Josh and I are working two stories this sprint that deal with
>> refactoring the bmc and host state management code out of skeleton
>> (#772/#783).  Here’s the proposal on this work.
>
> Thanks for sending out your plan, this is great. I have a few comments
> that came up as I was reading.
>
>> The overall design for both state management objects is that they will
>> provide a set of properties on which to operate.
>> - DesiredState
>> - CurrentState
>>
>> CurrentState will be a read only property.
>
> You've chosen to make the desired and current states be separate,
> which works. Another option would be to have them be the same list of
> states, so you know that when current==desired you're not waiting on
> anything to happen. What do you think?
>

Hmmm, I'm thinking from a DBUS/REST api perspective here.  2 seems
more intuitive, but also I don't think I understand your proposal
fully :)  What would this look like from an implementation perspective
and DBUS/REST interfaces?

>>
>> The host control will have these additional properties:
>> - DesiredPowerState
>> - CurrentPowerState
>>
>> Valid states to request for OpenBMC (DesiredState)
>> - READY, REBOOT
>> Valid states to be in for OpenBMC (CurrentState)
>> - NOT_READY, READY (note this proposal removes BASE_APPS and BMC_STARTING).
>
> Does this need to consider states where the device is updating? That
> is, it is not attempting to be ready to control the host, but it is
> turned on and able to accept new firmware?
>

I think READY implies available for code update?  But good point on
code updates, when a code update is being performed, a FW_UPDATE state
seems reasonable.

>>
>> READY implies all services started and running successfully (i.e. we
>> reached obmc-standby.target)
>>
>>
>> Valid states to request for Host (DesiredState)
>> - OFF, ON, REBOOT
>
> This might need to distinguish between soft-off/soft-reboot and
> printer-on-fire OFF requests.
>

Yeah, the focus of this story was to keep similar function as what's
in place now, but refactor into C++ and the new sdbusplus interfaces.
We do probably need more work done with things like you say here.
We've tended to call the printer-on-fire off's EMERGENCY_ type event
requests.  At a minimum, I'll be sure these are tracked in future
stories.

>> Validate states to be in for Host (CurrentState)
>> - OFF, BOOTING, BOOTED
>
> STANDBY, BOOTING, RUNNING?
>
> The bikeshed should be blue.
>

Agree

>>
>> BOOTED implies petitboot has reached it's "ready" state.
>>
>>
>> Valid power states to request for DesiredPowerState
>> - OFF, ON
>> Validate states to be in for CurrentPowerState
>> - OFF, TRANSITION, ON
>
> Does TRANSITION need to distinguish between coming up and going down?
>

I'd prefer to keep it simple and let the user determine this by
looking at the DesiredPowerState if it's needed.

>>
>> Patrick and I had some discussion on providing more details on the
>> host booted state (petitboot vs. os booted) but that can be an
>> extension in the future if needed.
>
> I think baking this in from the start is a worthy goal. This then
> feeds into the user-facing APIs, weather that is an IPMI sensor or a
> REST endpoint, that can be queried to ask "what is that damn computer
> up to". If we can make it informative enough that no one ever feels
> the need to watch the console output when the machine is booting then
> we have done a good job.
>

Agree but please see above comment about the scope of this story.  I'm
concerned this opens up a can of worms on extra stuff we'll need from
hostboot and petitboot.  I'll def make sure to write this down for
future stories and there won't be anything in this base design that
will limit that later.

> Cheers,
>
> Joel

Thanks for the feedback Joel!
Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread
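The reply above suggests that a client can work out which way a
TRANSITION is heading by also reading DesiredPowerState.  A small
Python sketch of that client-side check follows.  The function name
power_direction and the labels it returns are hypothetical and not
part of any proposed interface.

```python
def power_direction(current_power_state, desired_power_state):
    """Infer where chassis power is heading from the two properties."""
    if current_power_state != "TRANSITION":
        # OFF or ON: nothing is in flight.
        return "STABLE"
    if desired_power_state == "ON":
        return "POWERING_ON"
    if desired_power_state == "OFF":
        return "POWERING_OFF"
    raise ValueError(f"unexpected DesiredPowerState: {desired_power_state!r}")


print(power_direction("TRANSITION", "ON"))
```

This keeps CurrentPowerState at three values, at the cost of every
client needing to read two properties to answer the question.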
* Re: BMC and Host State Management Refactor 2016-11-22 2:28 ` Andrew Geissler @ 2016-11-22 3:40 ` Andrew Jeffery 2016-11-22 17:23 ` Andrew Geissler 0 siblings, 1 reply; 11+ messages in thread From: Andrew Jeffery @ 2016-11-22 3:40 UTC (permalink / raw) To: Andrew Geissler, Joel Stanley; +Cc: jdking, OpenBMC Maillist [-- Attachment #1: Type: text/plain, Size: 3996 bytes --] On Mon, 2016-11-21 at 20:28 -0600, Andrew Geissler wrote: > > On Sun, Nov 20, 2016 at 11:55 PM, Joel Stanley <joel@jms.id.au> wrote: > > Hi Andrew and Josh, > > > > On Sat, Nov 19, 2016 at 7:01 AM, Andrew Geissler <geissonator@gmail.com> wrote: > > > Josh and I are working two stories this sprint that deal with > > > refactoring the bmc and host state management code out of skeleton > > > (#772/#783). Here’s the proposal on this work. > > > > Thanks for sending out your plan, this is great. I have a few comments > > that came up as I was reading. > > > > > The overall design for both state management objects is that they will > > > provide a set of properties on which to operate. > > > - DesiredState > > > - CurrentState > > > > > > CurrentState will be a read only property. > > > > You've chosen to make the desired and current states be separate, > > which works. Another option would be to have them be the same list of > > states, so you know that when current==desired you're not waiting on > > anything to happen. What do you think? > > > > Hmmm, I'm thinking from a DBUS/REST api perspective here. 2 seems > more intuitive, but also I don't think I understand your proposal > fully :) I think you might be misinterpreting. I don't think Joel was suggesting you eliminate one of the DesiredState or CurrentState "variables", rather that the /types/ of the CurrentState and DesiredState variables be equal. That is, that the same set of states can be assigned to both. > What would this look like from an implementation perspective > and DBUS/REST interfaces? 
No different, just the set of possible states would be consistent between the two variables. Alternatively you could define transitions on the states and rename DesiredState to Transition, or something similar. This might better align with what you're proposing (a state machine). > > > > > > > The host control will have these additional properties: > > > - DesiredPowerState > > > - CurrentPowerState > > > > > > Valid states to request for OpenBMC (DesiredState) > > > - READY, REBOOT > > > Valid states to be in for OpenBMC (CurrentState) > > > - NOT_READY, READY (note this proposal removes BASE_APPS and BMC_STARTING). What does it mean to request a READY state? When can you do this? Can you request it whilst the BMC is in NOT_READY? Requesting READY in READY doesn't seem productive. If there are restrictions on combinations of source and target state it might be better to make use of the transition concept to spell it out (i.e. describe a formal state machine). > > > > Does this need to consider states where the device is updating? That > > is, it is not attempting to be ready to control the host, but it is > > turned on and able to accept new firmware? > > > > I think READY implies available for code update? But good point on > code updates, when a code update is being performed, a FW_UPDATE state > seems reasonable. > > > > > > > READY implies all services started and running successfully (i.e. we > > > reached obmc-standby.target) > > > > > > > > > Valid states to request for Host (DesiredState) > > > - OFF, ON, REBOOT > > > > This might need to distinguish between soft-off/soft-reboot and > > printer-on-fire OFF requests. > > > > Yeah, the focus of this story was to keep similar function as what's > in place now, but refactor into c++ and the new sdbusplus interfaces. > We do probably need more work done with things like you say here. > We've tended to call the printer-on-fire off's EMERGENCY_ type event > requests. 
At a minimum, I'll be sure these are tracked in future > stories. > > > > Validate states to be in for Host (CurrentState) > > > - OFF, BOOTING, BOOTED > > > > STANDBY, BOOTING, RUNNING? > > > > The bikeshed should be blue. > > > > Agree Okay so the bikeshed is blue, but what about the state names?? ;) Andrew [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
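One way to read the "formal state machine" suggestion in the message
above is as an explicit table of (current state, requested
transition) pairs.  The Python sketch below assumes a particular set
of allowed host transitions purely for illustration; the thread does
not define which combinations are valid, and the names
ALLOWED_HOST_TRANSITIONS and request_transition are hypothetical.

```python
# Assumed table: (current state, requested transition) -> next state.
# Any pair that is not listed is rejected.
ALLOWED_HOST_TRANSITIONS = {
    ("OFF", "ON"): "BOOTING",
    ("BOOTING", "OFF"): "OFF",
    ("BOOTED", "OFF"): "OFF",
    ("BOOTING", "REBOOT"): "BOOTING",
    ("BOOTED", "REBOOT"): "BOOTING",
}


def request_transition(current_state, transition):
    """Return the next state, or raise if the request makes no sense."""
    try:
        return ALLOWED_HOST_TRANSITIONS[(current_state, transition)]
    except KeyError:
        raise ValueError(
            f"{transition} is not valid while the host is {current_state}"
        ) from None


print(request_transition("OFF", "ON"))
```

With a table like this, a request such as OFF while already OFF is
rejected up front rather than silently accepted, which is the kind of
restriction the message asks to have spelled out.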
* Re: BMC and Host State Management Refactor 2016-11-22 3:40 ` Andrew Jeffery @ 2016-11-22 17:23 ` Andrew Geissler 2016-11-23 1:07 ` Andrew Jeffery 0 siblings, 1 reply; 11+ messages in thread From: Andrew Geissler @ 2016-11-22 17:23 UTC (permalink / raw) To: Andrew Jeffery; +Cc: Joel Stanley, jdking, OpenBMC Maillist On Mon, Nov 21, 2016 at 9:40 PM, Andrew Jeffery <andrew@aj.id.au> wrote: > On Mon, 2016-11-21 at 20:28 -0600, Andrew Geissler wrote: >> > On Sun, Nov 20, 2016 at 11:55 PM, Joel Stanley <joel@jms.id.au> wrote: >> > Hi Andrew and Josh, >> > >> > On Sat, Nov 19, 2016 at 7:01 AM, Andrew Geissler <geissonator@gmail.com> wrote: >> > > Josh and I are working two stories this sprint that deal with >> > > refactoring the bmc and host state management code out of skeleton >> > > (#772/#783). Here’s the proposal on this work. >> > >> > Thanks for sending out your plan, this is great. I have a few comments >> > that came up as I was reading. >> > >> > > The overall design for both state management objects is that they will >> > > provide a set of properties on which to operate. >> > > - DesiredState >> > > - CurrentState >> > > >> > > CurrentState will be a read only property. >> > >> > You've chosen to make the desired and current states be separate, >> > which works. Another option would be to have them be the same list of >> > states, so you know that when current==desired you're not waiting on >> > anything to happen. What do you think? >> > >> >> Hmmm, I'm thinking from a DBUS/REST api perspective here. 2 seems >> more intuitive, but also I don't think I understand your proposal >> fully :) > > I think you might be misinterpreting. I don't think Joel was suggesting > you eliminate one of the DesiredState or CurrentState "variables", > rather that the /types/ of the CurrentState and DesiredState variables > be equal. That is, that the same set of states can be assigned to both. > I see now. 
I'm still not seeing any huge advantages on either proposal over my original. I think I'm just going to stick with it for now since there are times where the valid states associated with each (Desired vs. Current) are different and I think having the two as I've defined is a bit more user friendly. >> What would this look like from an implementation perspective >> and DBUS/REST interfaces? > > No different, just the set of possible states would be consistent > between the two variables. > > Alternatively you could define transitions on the states and rename > DesiredState to Transition, or something similar. This might better > align with what you're proposing (a state machine). > >> >> > > >> > > The host control will have these additional properties: >> > > - DesiredPowerState >> > > - CurrentPowerState >> > > >> > > Valid states to request for OpenBMC (DesiredState) >> > > - READY, REBOOT >> > > Valid states to be in for OpenBMC (CurrentState) >> > > - NOT_READY, READY (note this proposal removes BASE_APPS and BMC_STARTING). > > What does it mean to request a READY state? When can you do this? Can > you request it whilst the BMC is in NOT_READY? Requesting READY in > READY doesn't seem productive. > > If there are restrictions on combinations of source and target state it > might be better to make use of the transition concept to spell it out > (i.e. describe a formal state machine). > Yeah good point, READY is not a valid state to request. We get to that via systemd, so the only valid DesiredState would be REBOOT. I'm trying to keep things as simple as possible right now, and to utilize as much of systemd's built in function for state transitions. >> > >> > Does this need to consider states where the device is updating? That >> > is, it is not attempting to be ready to control the host, but it is >> > turned on and able to accept new firmware? >> > >> >> I think READY implies available for code update? 
But good point on >> code updates, when a code update is being performed, a FW_UPDATE state >> seems reasonable. >> >> > > >> > > READY implies all services started and running successfully (i.e. we >> > > reached obmc-standby.target) >> > > >> > > >> > > Valid states to request for Host (DesiredState) >> > > - OFF, ON, REBOOT >> > >> > This might need to distinguish between soft-off/soft-reboot and >> > printer-on-fire OFF requests. >> > >> >> Yeah, the focus of this story was to keep similar function as what's >> in place now, but refactor into c++ and the new sdbusplus interfaces. >> We do probably need more work done with things like you say here. >> We've tended to call the printer-on-fire off's EMERGENCY_ type event >> requests. At a minimum, I'll be sure these are tracked in future >> stories. >> >> > > Validate states to be in for Host (CurrentState) >> > > - OFF, BOOTING, BOOTED >> > >> > STANDBY, BOOTING, RUNNING? >> > >> > The bikeshed should be blue. >> > >> >> Agree > > Okay so the bikeshed is blue, but what about the state names?? ;) I finally googled this... http://bikeshed.org/ nice > > Andrew ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: BMC and Host State Management Refactor 2016-11-22 17:23 ` Andrew Geissler @ 2016-11-23 1:07 ` Andrew Jeffery 2016-11-28 2:30 ` Andrew Geissler 0 siblings, 1 reply; 11+ messages in thread From: Andrew Jeffery @ 2016-11-23 1:07 UTC (permalink / raw) To: Andrew Geissler; +Cc: Joel Stanley, jdking, OpenBMC Maillist [-- Attachment #1: Type: text/plain, Size: 2703 bytes --] On Tue, 2016-11-22 at 11:23 -0600, Andrew Geissler wrote: > > On Mon, Nov 21, 2016 at 9:40 PM, Andrew Jeffery <andrew@aj.id.au> wrote: > > On Mon, 2016-11-21 at 20:28 -0600, Andrew Geissler wrote: > > > > > > > > On Sun, Nov 20, 2016 at 11:55 PM, Joel Stanley <joel@jms.id.au> wrote: > > > > Hi Andrew and Josh, > > > > > > > > On Sat, Nov 19, 2016 at 7:01 AM, Andrew Geissler <geissonator@gmail.com> wrote: > > > > > Josh and I are working two stories this sprint that deal with > > > > > refactoring the bmc and host state management code out of skeleton > > > > > (#772/#783). Here’s the proposal on this work. > > > > > > > > Thanks for sending out your plan, this is great. I have a few comments > > > > that came up as I was reading. > > > > > > > > > The overall design for both state management objects is that they will > > > > > provide a set of properties on which to operate. > > > > > - DesiredState > > > > > - CurrentState > > > > > > > > > > CurrentState will be a read only property. > > > > > > > > You've chosen to make the desired and current states be separate, > > > > which works. Another option would be to have them be the same list of > > > > states, so you know that when current==desired you're not waiting on > > > > anything to happen. What do you think? > > > > > > > > > > Hmmm, I'm thinking from a DBUS/REST api perspective here. 2 seems > > > more intuitive, but also I don't think I understand your proposal > > > fully :) > > > > I think you might be misinterpreting. 
I don't think Joel was suggesting > > you eliminate one of the DesiredState or CurrentState "variables", > > rather that the /types/ of the CurrentState and DesiredState variables > > be equal. That is, that the same set of states can be assigned to both. > > > > I see now. I'm still not seeing any huge advantages on either > proposal over my original. The advantage I see in Joel's proposal is that we have fewer types involved in the problem. The alternative (as mentioned below) is you rename DesiredState to Transition, in which case I think what you are suggesting is okay. Transitions and states are distinct and well defined concepts. I don't like the idea of "desiring" a state that doesn't exist. Joel's initial question suggests he thinks along these lines as well. > I think I'm just going to stick with it > for now since there are times where the valid states associated with > each (Desired vs. Current) are different Can you expand on this to make it clear what you are arguing for? > and I think having the two as > I've defined is a bit more user friendly. In what way? Cheers, Andrew [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: BMC and Host State Management Refactor
  2016-11-23  1:07           ` Andrew Jeffery
@ 2016-11-28  2:30             ` Andrew Geissler
  2017-01-03 22:24               ` Andrew Geissler
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Geissler @ 2016-11-28  2:30 UTC (permalink / raw)
  To: Andrew Jeffery; +Cc: Joel Stanley, jdking, OpenBMC Maillist

On Tue, Nov 22, 2016 at 7:07 PM, Andrew Jeffery <andrew@aj.id.au> wrote:
> On Tue, 2016-11-22 at 11:23 -0600, Andrew Geissler wrote:
>> > On Mon, Nov 21, 2016 at 9:40 PM, Andrew Jeffery <andrew@aj.id.au> wrote:
>> > On Mon, 2016-11-21 at 20:28 -0600, Andrew Geissler wrote:
>> > > > On Sun, Nov 20, 2016 at 11:55 PM, Joel Stanley <joel@jms.id.au> wrote:
>> > > > Hi Andrew and Josh,
>> > > >
>> > > > On Sat, Nov 19, 2016 at 7:01 AM, Andrew Geissler <geissonator@gmail.com> wrote:
>> > > > > Josh and I are working two stories this sprint that deal with
>> > > > > refactoring the bmc and host state management code out of skeleton
>> > > > > (#772/#783).  Here’s the proposal on this work.
>> > > >
>> > > > Thanks for sending out your plan, this is great. I have a few comments
>> > > > that came up as I was reading.
>> > > >
>> > > > > The overall design for both state management objects is that they will
>> > > > > provide a set of properties on which to operate.
>> > > > > - DesiredState
>> > > > > - CurrentState
>> > > > >
>> > > > > CurrentState will be a read only property.
>> > > >
>> > > > You've chosen to make the desired and current states be separate,
>> > > > which works. Another option would be to have them be the same list of
>> > > > states, so you know that when current==desired you're not waiting on
>> > > > anything to happen. What do you think?
>> > > >
>> > >
>> > > Hmmm, I'm thinking from a DBUS/REST api perspective here.  2 seems
>> > > more intuitive, but also I don't think I understand your proposal
>> > > fully :)
>> >
>> > I think you might be misinterpreting. I don't think Joel was suggesting
>> > you eliminate one of the DesiredState or CurrentState "variables",
>> > rather that the /types/ of the CurrentState and DesiredState variables
>> > be equal. That is, that the same set of states can be assigned to both.
>> >
>>
>> I see now.  I'm still not seeing any huge advantages on either
>> proposal over my original.
>
> The advantage I see in Joel's proposal is that we have fewer types
> involved in the problem. The alternative (as mentioned below) is you
> rename DesiredState to Transition, in which case I think what you are
> suggesting is okay. Transitions and states are distinct and well
> defined concepts.
>
> I don't like the idea of "desiring" a state that doesn't exist. Joel's
> initial question suggests he thinks along these lines as well.
>

Ahh, ok, I see your point now.  I could def rename the Desired
variables to something like DesiredHostTransition.  Maybe even make
their values verbs (TURN_ON, TURN_OFF, REBOOT)?  I could even knock
off the "Desired" part (i.e. HostTransition)?  I don't feel strongly
about it either way.

>> I think I'm just going to stick with it
>> for now since there are times where the valid states associated with
>> each (Desired vs. Current) are different
>
> Can you expand on this to make it clear what you are arguing for?
>
>> and I think having the two as
>> I've defined is a bit more user friendly.
>
> In what way?
>
> Cheers,
>
> Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread
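The renaming floated in the message above (verbs for what is
requested, nouns for what is reported) can be sketched in Python as
two separate types plus a mapping between them.  The mapping
EXPECTED_END_STATE and the helper is_settled are assumptions made for
this example; the thread only proposes the names HostTransition,
TURN_ON, TURN_OFF and REBOOT.

```python
from enum import Enum


class HostTransition(Enum):
    """Verbs a user may request."""
    TURN_ON = "TURN_ON"
    TURN_OFF = "TURN_OFF"
    REBOOT = "REBOOT"


class HostState(Enum):
    """States the host may be reported in."""
    OFF = "OFF"
    BOOTING = "BOOTING"
    BOOTED = "BOOTED"


# Assumed mapping: the state each transition is expected to end in.
EXPECTED_END_STATE = {
    HostTransition.TURN_ON: HostState.BOOTED,
    HostTransition.TURN_OFF: HostState.OFF,
    HostTransition.REBOOT: HostState.BOOTED,
}


def is_settled(requested, current):
    """True once the host has reached the state the transition implies."""
    return EXPECTED_END_STATE[requested] is current


print(is_settled(HostTransition.TURN_ON, HostState.BOOTING))
```

Keeping the two types apart means nobody can "desire" a value such as
BOOTING.  One limitation of this naive check is that a REBOOT
requested while the host is already BOOTED looks settled straight
away, so a real implementation would need to track that the reboot
actually happened.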
* Re: BMC and Host State Management Refactor
  2016-11-28  2:30             ` Andrew Geissler
@ 2017-01-03 22:24               ` Andrew Geissler
  2017-01-04  0:07                 ` Stewart Smith
  2017-01-04 16:28                 ` Patrick Williams
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Geissler @ 2017-01-03 22:24 UTC (permalink / raw)
  To: Andrew Jeffery; +Cc: Joel Stanley, jdking, OpenBMC Maillist

Happy 2017 Everyone!!

As I’ve been implementing the host and chassis state control, I ran
into an issue when moving some of our existing applications over to
the new interfaces.

In skeleton/hostcheckstop/host_checkstop_obj.c there’s an assumption
that a “Reboot” request will do a hard power off (i.e. no host
notification) and then a fresh boot of the system.  However, per the
design discussion of my new code, I’m implementing reboot to do a soft
power off (i.e. host notification) which obviously won’t work if the
host has checkstopped.

I see a few options, with my favorite last.

1. Have the checkstop code emit a checkstop signal, have the new host
state code monitor for it, if a reboot is requested after the
checkstop then the host code is smart enough to just power off the
chassis and do a power on (i.e. no soft power off)
- I’m not a big fan of the potential race conditions here on checkstop
signal vs reboot (I’m not sure if DBUS guarantees in-order messages)
nor do I really like all this logic in the host state code.

2. Have the checkstop code issue the chassis power off, which will be
detected by the host state code, and then have the checkstop code
issue a power on to the host state code once the power off is
complete.
- This fits with our original plan, put the onus on the caller, but I
don’t really like putting the state logic in the checkstop code.  It
would have to issue a command, wait for a signal that we’re powered
off, then issue the power on.

3. Put some logic into the soft power off code,
phosphor-host-ipmid/host-services.c, to know if the host is up or not
and act accordingly
- Doesn’t really seem like the right place for this logic

4. Provide a softReboot and hardReboot option in the host state code.
The hardReboot would do the chassis power off (hard power off) and
then power on.  The softReboot will work as expected and issue the
soft power down command to the host.
- Seems like a happy compromise in where the logic goes.  Checkstop is
smart enough to know it needs a hardReboot and host state code knows
how to do it.

Current Interfaces:
https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/xyz/openbmc_project/State

Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread
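Option 4 from the message above can be pictured as the host state
code owning both reboot sequences, with the checkstop handler only
choosing between them.  This Python sketch is an assumption-heavy
illustration: the step names and the functions reboot_steps and
on_checkstop are invented here, and the soft sequence is reduced to
"soft power off, then power on" without modelling the host's
acknowledgement.

```python
def reboot_steps(kind):
    """Return the ordered steps the host state code would run."""
    if kind == "soft":
        # Notify the host first so it can shut down cleanly.
        return ["host_soft_power_off", "host_power_on"]
    if kind == "hard":
        # No host notification: cut chassis power, then boot again.
        return ["chassis_power_off", "host_power_on"]
    raise ValueError(f"unknown reboot kind: {kind!r}")


def on_checkstop():
    """A checkstopped host cannot answer a soft power off request."""
    return reboot_steps("hard")


print(on_checkstop())
```

The caller decides which kind of reboot it needs, and the sequencing
stays in one place, which is the split of responsibility option 4
describes.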
* Re: BMC and Host State Management Refactor
  2017-01-03 22:24 ` Andrew Geissler
@ 2017-01-04 0:07   ` Stewart Smith
  2017-01-04 16:28   ` Patrick Williams
  1 sibling, 0 replies; 11+ messages in thread
From: Stewart Smith @ 2017-01-04 0:07 UTC
To: Andrew Geissler, Andrew Jeffery; +Cc: jdking, OpenBMC Maillist

Andrew Geissler <geissonator@gmail.com> writes:
> 4. Provide a softReboot and hardReboot option in the host state code.
> The hardReboot would do the chassis power off (hard power off) and
> then power on. The softReboot will work as expected and issue the
> soft power down command to the host.
> - Seems like a happy compromise in where the logic goes. Checkstop is
> smart enough to know it needs a hardReboot and host state code knows
> how to do it.

So, we also have fast reset, which is different again, mostly in that
it has slightly different security implications.

So, we have power (off|cycle) with hard/soft shutdown and we have reboot
with hard/soft shutdown.

At least on other platforms, I think there's a similar concept, as I've
noticed that on many x86 systems if you don't cut the power on reboot,
you skip some of the POST.... thoughts?

-- 
Stewart Smith
OPAL Architect, IBM.
* Re: BMC and Host State Management Refactor
  2017-01-03 22:24 ` Andrew Geissler
  2017-01-04 0:07   ` Stewart Smith
@ 2017-01-04 16:28   ` Patrick Williams
  2017-01-04 22:44     ` Andrew Geissler
  1 sibling, 1 reply; 11+ messages in thread
From: Patrick Williams @ 2017-01-04 16:28 UTC
To: Andrew Geissler; +Cc: Andrew Jeffery, jdking, OpenBMC Maillist

On Tue, Jan 03, 2017 at 04:24:00PM -0600, Andrew Geissler wrote:
> Happy 2017 Everyone!!
>
> As I’ve been implementing the host and chassis state control, I ran
> into an issue when moving some of our existing applications over to
> the new interfaces.
>
> In skeleton/hostcheckstop/host_checkstop_obj.c there’s an assumption
> that a “Reboot” request will do a hard power off (i.e. no host
> notification) and then a fresh boot of the system. However, per the
> design discussion of my new code, I’m implementing reboot to do a soft
> power off (i.e. host notification) which obviously won’t work if the
> host has checkstopped.

I don't think this is unique to "checkstop". Any time the host has
crashed the soft power off won't work. Aren't there two phases to a
"soft power off"?
    1) Send SMS alert to host, have a short timeout for them to accept
       the SMS alert.
    2) Have a long timeout for them to acknowledge they are ready for
       the reboot.

The "short timeout" should be on the order of seconds, so adding that to
the checkstop path doesn't really seem that unreasonable to me. There
are going to be other cases (pgood fault, host kernel panic, clock
failure) where the host has similarly died and not all of them yield a
checkstop signal.

>
> I see a few options, I have my favorite last.
>
> 1. Have the checkstop code emit a checkstop signal, have the new host
> state code monitor for it, if a reboot is requested after the
> checkstop then the host code is smart enough to just power of the
> chassis and do a power on (i.e. no soft power off)
> - I’m not a big fan of the potential race conditions here on checkstop
> single vs reboot (I’m not sure if DBUS guarantees in-order messages)
> nor do I really like all this logic in the host state code.

"Checkstop" is a Power-specific concept anyhow. I don't think a signal
is all that useful.

>
> 2. Have the checkstop code issue the chassis power off, which will be
> detected by the host state code, and then have the checkstop code
> issue a power on to the host state code once the power off is
> complete.
> - This fits with our original plan, put the owness on the caller, but
> I don’t really like putting the state logic in the checkstop code. It
> would have to issue a command, wait for a signal that we’re powered
> off, then issue the power on.

Not a fan of this either for similar reasons.

But, along those lines, can we have the checkstop code force the host
state to "not running" / "failed"? Then the reboot request can / should
skip any of the host activity. This does give us a similar pattern to
follow for the other possible failure conditions.

> 3. Put some logic into the soft power off code,
> phosphor-host-ipmid/host-services.c, to know if the host is up or not
> and act accordingly
> - Doesn’t really seem like the right place for this logic

I suspect it needs to have this anyhow, per my earlier comment.

> 4. Provide a softReboot and hardReboot option in the host state code.
> The hardReboot would do the chassis power off (hard power off) and
> then power on. The softReboot will work as expected and issue the
> soft power down command to the host.
> - Seems like a happy compromise in where the logic goes. Checkstop is
> smart enough to know it needs a hardReboot and host state code knows
> how to do it.

Other than failure conditions, I don't see a reason for a user to
request a "hard reboot", being a reboot that keeps AC up but does not
gracefully shutdown the OS, so I'd rather keep it simple. FSP code has
9000 different boot-types and this is a slippery slope towards that.

> Current Interfaces:
> https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/xyz/openbmc_project/State
>
> Andrew

-- 
Patrick Williams
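The two-phase soft power off described in the message above (a short wait for the host to accept the SMS alert, then a long wait for it to report ready) can be sketched as follows. This is a minimal illustration, not the phosphor-host-ipmid implementation: the three callables, the function name, and the default timeout values are all hypothetical. The thread only says the short timeout should be "on the order of seconds".

```python
import time
from typing import Callable


def soft_power_off(
    send_sms_alert: Callable[[], None],   # hypothetical: raise SMS attention
    host_accepted: Callable[[], bool],    # hypothetical: host took the alert
    host_ready: Callable[[], bool],       # hypothetical: host acked shutdown
    short_timeout_s: float = 5.0,         # assumed value
    long_timeout_s: float = 600.0,        # assumed value
    poll_s: float = 0.1,
) -> bool:
    """Return True if the host shut down gracefully, False if it timed out."""

    def wait_for(cond: Callable[[], bool], timeout_s: float) -> bool:
        # Poll until the condition holds or the deadline passes.
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            if cond():
                return True
            time.sleep(poll_s)
        return cond()

    send_sms_alert()
    # Phase 1: a dead host (checkstop, kernel panic, ...) fails here quickly.
    if not wait_for(host_accepted, short_timeout_s):
        return False
    # Phase 2: a live host gets a longer window to finish shutting down.
    return wait_for(host_ready, long_timeout_s)
```

A caller that gets False back would fall through to a hard chassis power off, so a crashed host costs only the short timeout before the reboot proceeds.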
* Re: BMC and Host State Management Refactor
  2017-01-04 16:28 ` Patrick Williams
@ 2017-01-04 22:44   ` Andrew Geissler
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Geissler @ 2017-01-04 22:44 UTC
To: Patrick Williams; +Cc: Andrew Jeffery, jdking, OpenBMC Maillist

On Wed, Jan 4, 2017 at 10:28 AM, Patrick Williams <patrick@stwcx.xyz> wrote:
> On Tue, Jan 03, 2017 at 04:24:00PM -0600, Andrew Geissler wrote:
>> Happy 2017 Everyone!!
>>
>> As I’ve been implementing the host and chassis state control, I ran
>> into an issue when moving some of our existing applications over to
>> the new interfaces.
>>
>> In skeleton/hostcheckstop/host_checkstop_obj.c there’s an assumption
>> that a “Reboot” request will do a hard power off (i.e. no host
>> notification) and then a fresh boot of the system. However, per the
>> design discussion of my new code, I’m implementing reboot to do a soft
>> power off (i.e. host notification) which obviously won’t work if the
>> host has checkstopped.
>
> I don't think this is unique to "checkstop". Any time the host has
> crashed the soft power off won't work. Aren't there two phases to a
> "soft power off"?
>     1) Send SMS alert to host, have a short timeout for them to accept
>        the SMS alert.
>     2) Have a long timeout for them to acknowledge they are ready for
>        the reboot.

I haven't gotten to the bottom of this code flow yet. But yes, based on
debug data, this seems to be the current flow. Although current code
does not appear to have implemented any type of safeguard timeouts. If
you issue a soft power off and the host is not running, we will
currently just hang in whatever state we're in. I'll be sure we get the
final details documented somewhere once we have a plan.

> The "short timeout" should be on the order of seconds, so adding that to
> the checkstop path doesn't really seem that unreasonable to me. There
> are going to be other cases (pgood fault, host kernel panic, clock
> failure) where the host has similarly died and not all of them yield a
> checkstop signal.
>

So you want to add this short timeout to each application that has to
handle a bad host state? That doesn't make a lot of sense to me, seems
like we want that logic in one place.

>>
>> I see a few options, I have my favorite last.
>>
>> 1. Have the checkstop code emit a checkstop signal, have the new host
>> state code monitor for it, if a reboot is requested after the
>> checkstop then the host code is smart enough to just power of the
>> chassis and do a power on (i.e. no soft power off)
>> - I’m not a big fan of the potential race conditions here on checkstop
>> single vs reboot (I’m not sure if DBUS guarantees in-order messages)
>> nor do I really like all this logic in the host state code.
>
> "Checkstop" is a Power-specific concept anyhow. I don't think a signal
> is all that useful.
>
>>
>> 2. Have the checkstop code issue the chassis power off, which will be
>> detected by the host state code, and then have the checkstop code
>> issue a power on to the host state code once the power off is
>> complete.
>> - This fits with our original plan, put the owness on the caller, but
>> I don’t really like putting the state logic in the checkstop code. It
>> would have to issue a command, wait for a signal that we’re powered
>> off, then issue the power on.
>
> Not a fan of this either for similar reasons.
>
> But, along those lines, can we have the checkstop code force the host
> state to "not running" / "failed"? Then the reboot request can / should
> skip any of the host activity. This does give us a similar pattern to
> follow for the other possible failure conditions.

Yes, definitely a possibility to do something like this.

>> 3. Put some logic into the soft power off code,
>> phosphor-host-ipmid/host-services.c, to know if the host is up or not
>> and act accordingly
>> - Doesn’t really seem like the right place for this logic
>
> I suspect it needs to have this anyhow, per my earlier comment.
>
>> 4. Provide a softReboot and hardReboot option in the host state code.
>> The hardReboot would do the chassis power off (hard power off) and
>> then power on. The softReboot will work as expected and issue the
>> soft power down command to the host.
>> - Seems like a happy compromise in where the logic goes. Checkstop is
>> smart enough to know it needs a hardReboot and host state code knows
>> how to do it.
>
> Other than failure conditions, I don't see a reason for a user to
> request a "hard reboot", being a reboot that keeps AC up but does not
> gracefully shutdown the OS, so I'd rather keep it simple. FSP code has
> 9000 different boot-types and this is a slippery slope towards that.

I'm misusing the soft/hard terms I think. For this discussion, my plan
was a reboot always removes power and then reapplies it. A soft OS
reboot where we leave AC power on is a future item. When I said a hard
reboot, I was just thinking of a reboot where we don't send the SMS
alert to the host, we just cut AC and then boot it back up.

Seems like we've come down to these points:
- If the host is up, give it a chance to power down properly, but
  monitor it with 2 timeouts, 1 fast one where they ack they're alive
  and processing the request and another longer one where they do the
  shutdown.
- Allow an application to tell the host state code when it knows the
  host is dead (i.e. checkstop, power failure, watchdog) and in those
  cases the state code should just power off/on the system on reboot
  requests. The host power off code needs to honor the above rule.

Seems like you were thinking the timeout logic for this should go into
the host ipmi code? The IPMI code will have better access to the SMS
attention and general host status so seems ok.

But it also seems out of this discussion we're thinking a "host reboot"
when the host is running should really just be a reboot of the host
where we keep the AC applied? The current "chassis reboot" which I'm
trying to port over (and is the use case in the checkstop code) was for
the whole chassis.

Final code logic would look something like this on a "reboot" request to
the host dbus object:

Cec power  OS State  Action
on         off       remove cec power, power back on (i.e. checkstop
                     code has told host state code host is off)
on         on        send reboot to host, verify it's processed
                     (on fail, power down and then up)
off        off       just power on system

If a user really wants a full "hard" reboot, they can call the chassis
code to cut power and then boot the system again.

>> Current Interfaces:
>> https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/xyz/openbmc_project/State
>>
>> Andrew

Andrew
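The reboot logic listed in the message above maps directly onto a small decision function. The sketch below is illustrative only: plan_reboot, the action strings, and the host_reboot_ok callback are hypothetical names, and rejecting the "power off, OS on" combination is an assumption since the thread does not list that case.

```python
from typing import Callable, List


def plan_reboot(
    cec_power_on: bool,
    os_running: bool,
    host_reboot_ok: Callable[[], bool] = lambda: True,
) -> List[str]:
    """Return the ordered actions for a "reboot" request to the host object.

    host_reboot_ok is a hypothetical check that the host processed the
    reboot request within its timeouts.
    """
    if not cec_power_on:
        if os_running:
            # Not in the thread's table; assumed to be an invalid combination.
            raise ValueError("OS cannot be running with CEC power off")
        # off / off: just power on the system.
        return ["power_on"]
    if not os_running:
        # on / off: the host is known dead (e.g. checkstop), skip host steps.
        return ["power_off", "power_on"]
    # on / on: ask the host first, fall back to a power cycle on failure.
    if host_reboot_ok():
        return ["host_reboot"]
    return ["host_reboot", "power_off", "power_on"]
```

For example, plan_reboot(True, False) yields a plain power cycle, which is the path the checkstop code would take after marking the host as not running.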
end of thread

Thread overview: 11+ messages
2016-11-18 20:31 BMC and Host State Management Refactor Andrew Geissler
2016-11-21 5:55 ` Joel Stanley
2016-11-22 2:28   ` Andrew Geissler
2016-11-22 3:40     ` Andrew Jeffery
2016-11-22 17:23       ` Andrew Geissler
2016-11-23 1:07         ` Andrew Jeffery
2016-11-28 2:30           ` Andrew Geissler
2017-01-03 22:24             ` Andrew Geissler
2017-01-04 0:07               ` Stewart Smith
2017-01-04 16:28               ` Patrick Williams
2017-01-04 22:44                 ` Andrew Geissler