All of lore.kernel.org
 help / color / mirror / Atom feed
* OpenBMC on RCS platforms
@ 2021-04-23 14:30 Timothy Pearson
  2021-04-23 17:11 ` Patrick Williams
  2021-04-23 17:23 ` Ed Tanous
  0 siblings, 2 replies; 11+ messages in thread
From: Timothy Pearson @ 2021-04-23 14:30 UTC (permalink / raw)
  To: openbmc

All,

I'm reaching out after some internal discussion on how we can better integrate our platforms with the OpenBMC project.  As many of you may know, we have been using OpenBMC in our lineup of OpenPOWER-based server and desktop products, with a number of custom patches on top to better serve our target markets.

While we have had fairly good success with OpenBMC in the server / datacenter space, reception has been lukewarm at best in the desktop space.  This is not too surprising, given OpenBMC's historical focus on datacenter applications, but it is also becoming an expensive technical and PR pain point for us as the years go by.  To make matters worse, we're still shielding our desktop / workstation customer base to some degree from certain design decisions that persist in upstream OpenBMC, and we'd like to open discussion on all of these topics to see if a resolution can be found with minimal wasted effort from all sides.

Roughly speaking, we see issues in OpenBMC in 5 main areas:


== Fan control ==

Out of all of the various pain points we've dealt with over the years, this has proven the most costly and is responsible on its own for the lack of RCS platforms upstream in OpenBMC.

To be perfectly frank, OpenBMC's current fan control subsystem is a technical embarrassment, and not up to the high quality seen elsewhere in the project.  Worse, this multi-daemon DBUS-interconnected Rube Goldberg contraption has somehow managed to persist over the past 4+ years, likely because it reached a complexity level where it is both tightly integrated with the rest of the OpenBMC system and extremely difficult to understand, therefore it is equally difficult to replace.  Furthering the lack of progress is the fact that it is mostly "working" for datacenter applications, so there may be a "don't touch what isn't broken" mentality in play.  From a technical perspective, it is indirected to a sufficient level as to be nearly incomprehensible to most people, with the source spread across multiple different projects and repositories, yet somehow it remains rigid / fragile enough to not support basic features like runtime (or even post-compile) fan configuration for a given server.

What we need is a much simpler, more robust fan control daemon.  Ideally this would be one self-contained process, not multiple interconnected processes where a single failure causes the entire system to go into safe mode.

Our requirements:
1.) True PID control with tunable constants.  Trying to do things with PWM/temp maps alone may have made sense in the resource-constrained environments common in the 1970s, but it makes no sense on modern, powerful BMC silicon with hard floating point instructions.  Even the stock fan daemon implements a sort of bespoke integrator-by-another-name, but without the P and D components it does a terrible job outside of a constant-temperature datacenter environment.
2.) Tunable PID constants, tunable temperature thresholds, tunable min/max fan speeds, and arbitrary linkage between temp inputs (zones) and fan outputs (also zoned).
3.) Configurable zones -- both temperature and PWMs, as well as installed / not installed fans and temperature sensors.
4.) Configurable failure behavior.  A single failed or uninstalled chassis fan should NOT cause the entire platform to go into failsafe mode!
5.) A decent GUI to configure all of this, and the ability to export / import the settings.

To be fair, we've only been able to implement 1, 2, 3, and 4 above at compile time -- we don't have the runtime configuration options due to the way the fan systems work in OpenBMC right now, and the sheer amount of work needed to overhaul the GUI in the out-of-tree situation we remain stuck in.  With all that said, however, we point out that our competition, especially on x86 platforms, has all of these features and more, all neatly contained in a nice user-friendly point+click GUI.  OpenBMC should be able to easily match or exceed that functionality, but for some reason it seems stuck in datacenter-only mode with archaic hardcoded tables and constants.

== Local firmware updates ==

This is right behind fan control in terms of cost and PR damage for us vs. competing platforms.  While OpenBMC's firmware update support is very well tuned for datacenter operations (we use a simple SSH + pflash method on our large clusters, for example) it's absolutely terrible for desktop and workstation applications where a second PC is not guaranteed to be available, and where wired Ethernet even exists DHCP is either non-existent or provided by a consumer cable box.  Some method of flashing -- and recovering -- the BMC and host firmware right from the local machine is badly needed, especially for the WiFi-only environments we're starting to see more of in the wild.  Ideally this would be a command line tool / library such that we can integrate it with our bootloader or a GUI as desired.

== BMC boot time ==

This is self explanatory.  Other vendors' solutions allow the host to be powered on within seconds of power application from the wall, and even our own Kestrel soft BMC allows the host to begin booting less than 10 seconds after power is applied.  Several *minutes* for OpenBMC to reach a point where it can even start to boot the host is a major issue outside of datacenter applications.

== Host boot status indications ==

Any ODM that makes server products has had to deal with the psychological "dead server effect", where lack of visible progress during boot causes spurious callouts / RMAs.  It's even worse on desktop, especially if server-type hardware is used inside the machine.  We've worked around this a few times with our "IPL observer" services, and really do need this functionality in OpenBMC.  The current version we have is both front panel lights and a progress bar on the BMC boot monitor (VGA/HDMI), and this is something we're willing to contribute upstream.

== IPMI / BMC permissions ==

An item that's come up recently is that, at least on our older OpenBMC versions, there's a complete disconnect between the BMC's shell user database and the IPMI user database.  Resetting the BMC root password isn't possible from IPMI on the host, and setting up IPMI doesn't seem possible from the BMC shell.  If IPMI support is something OpenBMC provides alongside Redfish, it needs to be better integrated -- we're dealing with multiple locked-out BMC issues at the moment at various customer sites, and the recovery method is painful at best when it should be as simple as an ipmitool command from the host terminal.


If there is interest, I'd suggest we all work on getting some semblance of a modern fan control system and the boot status indication framework into upstream OpenBMC.  This would allow Raptor to start upstreaming base support for RCS product lines without risking severe regressions in user pain points like noisy fans -- perceived high noise levels are always a great way to kill sales of office products, and as a result the fan control functionality is something we're quite sensitive about.  The main problem is that with the existing fan control system's tentacles snaking everywhere including the UI, this will need to be a concerted effort by multiple organizations including the maintainers of the UI and the other ODMs currently using the existing fan control functionality.  We're willing to make yet another attempt *if* there's enough buy-in from the various stakeholders to ensure a prompt merge and update of the other components.

Finally, some of you may also be aware of our Kestrel project [1], which eschews the typical BMC ASICs, Linux, and OpenBMC itself.  I'd like to point out that this is not a direct competitor to OpenBMC, it is designed specifically for certain target applications with unique requirements surrounding overall size, functionality, speed, auditability, transparency, etc.  Why we have gone to those lengths will become apparent later this year, but suffice it to say we're considering Kestrel to be used in spaces where OpenBMC is not practical and vice versa.  In fact, we'd like to see OpenBMC run on the Kestrel SoCs (or a derivative thereof) at some point in the future, once the performance concerns above are sufficiently mitigated to make that practical.

[1] https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-litex/litex-boards/-/blob/master/README.md

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: OpenBMC on RCS platforms
  2021-04-23 14:30 OpenBMC on RCS platforms Timothy Pearson
@ 2021-04-23 17:11 ` Patrick Williams
  2021-04-23 18:46   ` Timothy Pearson
  2021-04-23 17:23 ` Ed Tanous
  1 sibling, 1 reply; 11+ messages in thread
From: Patrick Williams @ 2021-04-23 17:11 UTC (permalink / raw)
  To: Timothy Pearson; +Cc: openbmc

[-- Attachment #1: Type: text/plain, Size: 16388 bytes --]

On Fri, Apr 23, 2021 at 09:30:00AM -0500, Timothy Pearson wrote:
> All,
> 
> I'm reaching out after some internal discussion on how we can better integrate our platforms with the OpenBMC project.  As many of you may know, we have been using OpenBMC in our lineup of OpenPOWER-based server and desktop products, with a number of custom patches on top to better serve our target markets.

Hi Timothy,

Good to hear from your team again and hope there is some ways we can
work together on solving some of these issues.

> Roughly speaking, we see issues in OpenBMC in 5 main areas:

We might want to fork this into 5 different discussion threads and/or
design documents, but let's see how this goes...

> == Fan control ==
> 
> To be perfectly frank, OpenBMC's current fan control subsystem is a technical 
> embarrassment, and not up to the high quality seen elsewhere in the project.  
> Worse, this multi-daemon DBUS-interconnected Rube Goldberg contraption has 
> somehow managed to persist over the past 4+ years, likely because it reached
> a complexity level where it is both tightly integrated with the rest of the 
> OpenBMC system and extremely difficult to understand, therefore it is 
> equally difficult to replace.

This is, to me, a pretty unfair assessment of the situation, but I hear
you that the code is likely not very usable outside of datacenter use-cases.
Certainly there is some work that can be done to improve that and I
think we'd be receptive to have partners on it.  The vast majority of
developers on the project *are* working on datacenter use-cases though,
so I don't know if there is anyone actively taking up the mantle on
this.  This could be a good area of expertise and contribution from your
team (I personally don't really know enough of where to start at making
a desktop-friendly fan control algorithm).

I'm not sure what you mean by "this multi-daemon DBUS-interconnected Rube
Goldberg contraption" though.  There are really 3 dbus interfaces around
Fan control:
    - xyz.openbmc_project.Sensor.Value
    - xyz.openbmc_project.Control.FanPwm
    - xyz.openbmc_project.Control.FanSpeed

I don't like that we ended up with FanPwm and FanSpeed, but the fact is
that there are two different hardware models for controlling fans and
the people working with PWM-based fans didn't want to put in the effort
to control them with target speeds.  (I think there was an argument that
*their* fan control algorithm experts liked %-of-PWM as a calibration,
and weren't able to come to consensus otherwise)

I don't know what you mean by Rube Goldberg here or how to make the
sitation any better.  All sensors are read by sensor daemons using
common APIs like Linux HWMon and there is a similar "set the fan speed in
hardware" daemon.  Perhaps you could eliminate the Control.Fan*
interfaces (and merge them into a fan control daemon directly) but there
were some people who wanted to be able to manually control fan speeds in
some scenarios anyhow.  Anyhow, I'm not really seeing a lot of
simplification that could even be done, but certainly not undue
excessive mechanisations that would classify as "Rube Goldberg".

There is a xyz.openbmc_project.Control.FanRedundancy interface, but I
suspect that is used outside the use cases you intend anyhow and really
it is optional to fan control.  Similarly, anything under Inventory is
just that... Inventory; it is not critical to fan control.

> Furthering the lack of progress is the fact that it is mostly "working" for
> datacenter applications, so there may be a "don't touch what isn't broken"
> mentality in play.  

As I hinted at above, I think it is a lack of necessity and not a fear
of breaking.  In general, as a community we should not be afraid of
change.  We have plenty of test cases to qualify code that is changing
and if there isn't test cases for a functional area then it is fair game
to change without worry, in my opinion.

> From a technical perspective, it is indirected to a sufficient level as to
> be nearly incomprehensible to most people, with the source spread across
> multiple different projects and repositories, yet somehow it remains 
> rigid / fragile enough to not support basic features like runtime (or even
> post-compile) fan configuration for a given server.

There are two different fan control implementations presently:
    - phosphor-pid-control (swampd)
    - phosphor-fan-presence (phosphor-fan-control)

Which of these are you having issue with?  They are intended to serve
drastically different purposes.

I don't think anyone outside of IBM uses phosphor-fan-control.  It seems
to be explicitly designed for their systems with their own requirements.
Unless they speak up, I don't know how we intend anyone else to use this
code and it probably should be renamed 'ibm-fan-control'.

> What we need is a much simpler, more robust fan control daemon.  Ideally this would be one self-contained process, not multiple interconnected processes where a single failure causes the entire system to go into safe mode.
> 
> Our requirements:
> 1.) True PID control with tunable constants.  Trying to do things with PWM/temp maps alone may have made sense in the resource-constrained environments common in the 1970s, but it makes no sense on modern, powerful BMC silicon with hard floating point instructions.  Even the stock fan daemon implements a sort of bespoke integrator-by-another-name, but without the P and D components it does a terrible job outside of a constant-temperature datacenter environment.

Isn't phosphor-pid-control this already?

> 2.) Tunable PID constants, tunable temperature thresholds, tunable min/max fan speeds, and arbitrary linkage between temp inputs (zones) and fan outputs (also zoned).
> 3.) Configurable zones -- both temperature and PWMs, as well as installed / not installed fans and temperature sensors.

I think these two features are the part that are more interesting to
non-datacenter use cases and so nobody has put effort into it.  As much
as you seem to dislike dbus mechanization, this sounds like we would
need a few interfaces defined for these so that Redfish has something to
poke at.  

I do know some BIOS vendors provide this for desktops already.  Is there
anything at an IPMI level that could facilitate the hand-off of this?

> 4.) Configurable failure behavior.  A single failed or uninstalled chassis fan should NOT cause the entire platform to go into failsafe mode!

phosphor-fan-presence does provide some of this, but again, I feel like
it is tuned to IBM's needs.  It appears that phosphor-pid-control has
some amount of implementation of Control.FanRedundancy that I mentioned
earlier.  Are you sure that phosphor-pid-control causes the system to go
into fail-safe mode from a single fan failure though?  I've not heard
this.

> 5.) A decent GUI to configure all of this, and the ability to export / import the settings.

Sure...

> To be fair, we've only been able to implement 1, 2, 3, and 4 above at
> compile time -- we don't have the runtime configuration options due to the
> way the fan systems work in OpenBMC right now, and the sheer amount of 
> work needed to overhaul the GUI in the out-of-tree situation we remain
> stuck in.  With all that said, however, we point out that our competition,
> especially on x86 platforms, has all of these features and more, all neatly
> contained in a nice user-friendly point+click GUI.  OpenBMC should be able 
> to easily match or exceed that functionality, but for some reason it seems
> stuck in datacenter-only mode with archaic hardcoded tables and constants.

So, if you've done 1-4, are there any commits in Gerrit?  Which fan
control daemon were they done against?

There is a certain air to what you wrote that rubs me the wrong way.
We're not a product that you've paid for here to do what you, the
customer, is asking of us.  This is an open source community and one
that most of us are paid to work on by our employer.  We don't do work
to make you happy, but do work because our bosses are asking for certain
features out of us.  As I said above, almost everyone here is working on
"datacenter-only systems", so why would anyone else invest in this use
case?

This is *your* business model.  We'd certainly love to have contributions
from your team and most of us would even spend some of our time to help 
you in your efforts.  But, if you want this, as they often say:
                    "Patches Welcome!"

I do see some code from early 2020 from you against phosphor-hwmon and
phosphor-fan-presence.  All of the phosphor-hwmon commits failed CI test
so nobody every looked at them.  All of the phosphor-fan-presence commits
received timely feedback, to which you never responded, and seemed to be
missing an updated CCLA from Raptor?  Is there something we should have
done as a community to keep this work going?

> == Local firmware updates ==
> 
> This is right behind fan control in terms of cost and PR damage for us vs. competing platforms.  While OpenBMC's firmware update support is very well tuned for datacenter operations (we use a simple SSH + pflash method on our large clusters, for example) it's absolutely terrible for desktop and workstation applications where a second PC is not guaranteed to be available, and where wired Ethernet even exists DHCP is either non-existent or provided by a consumer cable box.  Some method of flashing -- and recovering -- the BMC and host firmware right from the local machine is badly needed, especially for the WiFi-only environments we're starting to see more of in the wild.  Ideally this would be a command line tool / library such that we can integrate it with our bootloader or a GUI as desired.

This sounds to me pretty easily obtainable and what I have in mind is
actually a valid data center use case for many of us.  When all else
fails, you should be able to use a USB key to update the system
(assuming the image you're updating with is trusted for whatever your
system determines is trust-worthy).  I'm pretty sure our OCP systems can
be updated with a magic combination of a USB-key and an OCP debug
card(*).  I don't think that is currently implemented on openbmc/openbmc,
but it is on our list of pending features.

For your specific users, the OCP debug card is probably not a good
requirement, but you could likely automate the update whenever a USB-key
plus text file is added?  (I'm just brainstorming how you'd know to kick
it off).  The current software update code probably isn't too far off
from being able to facilitate this for you.

https://www.opencompute.org/documents/facebook-ocp-debug-card-with-lcd-spec_v1p0

> == BMC boot time ==
> 
> This is self explanatory.  Other vendors' solutions allow the host to be powered on within seconds of power application from the wall, and even our own Kestrel soft BMC allows the host to begin booting less than 10 seconds after power is applied.  Several *minutes* for OpenBMC to reach a point where it can even start to boot the host is a major issue outside of datacenter applications.

Some of this is, to me, an artifact of the Power architecture and not an
artifact of OpenBMC explicitly.  On x86 systems we have a little code in
u-boot that wiggles a GPIO and gets the Host power sequence going while
the BMC is booting up.  This overlaps quite a bit of the memory testing
of the Host with the BMC boot time.  The "well-known proprietary BMC"
also does this same trick.

Power requires the BMC to be up in order to serve out the virtual PNOR,
from my recollection.  It seems like this could be solved in other ways,
such as a SPI-mux on a physical SPI-NOR so that the BMC can take the NOR
at specific times during update but otherwise it is given to the host
CPUs.  This is exactly what we do on x86 systems.

Having said all of that, there is certainly some performance
improvements that can be done, but nobody has taken up the torch on it.
A big low-hanging fruit in my mind is the file system compression being
xz or gzip is very computationally intensive.  I did some work, with
Nick Terrell, to switch to zstd on our systems for both the kernel
initramfs and UBI and saw significant boot time improvements.  The
upstream enablement for this appears to have landed as of v5.9 so we
could certainly start enabling it here now.

https://lore.kernel.org/linux-kbuild/20200730190841.2071656-7-nickrterrell@gmail.com/

> == Host boot status indications ==
> 
> Any ODM that makes server products has had to deal with the psychological "dead server effect", where lack of visible progress during boot causes spurious callouts / RMAs.  It's even worse on desktop, especially if server-type hardware is used inside the machine.  We've worked around this a few times with our "IPL observer" services, and really do need this functionality in OpenBMC.  The current version we have is both front panel lights and a progress bar on the BMC boot monitor (VGA/HDMI), and this is something we're willing to contribute upstream.

Great!  Let's get that merged!

I do think some others have support for a 7-seg display with the
postcodes going to it already.  I think this is along those same lines.
It might just be another back-end for our existing post code daemon to
replicate them to the VGA and/or blink morse code on an LED.

> == IPMI / BMC permissions ==
> 
> An item that's come up recently is that, at least on our older OpenBMC versions, there's a complete disconnect between the BMC's shell user database and the IPMI user database.  Resetting the BMC root password isn't possible from IPMI on the host, and setting up IPMI doesn't seem possible from the BMC shell.  If IPMI support is something OpenBMC provides alongside Redfish, it needs to be better integrated -- we're dealing with multiple locked-out BMC issues at the moment at various customer sites, and the recovery method is painful at best when it should be as simple as an ipmitool command from the host terminal.

I suspect most of this is a matter of IPMI command support and/or enabling
those commands to the host IPMI path.  Most of us are fairly untrusting
of IPMI (and the Host itself), so there hasn't been work to do anything
here.  As long as whatever you're proposing can be disabled for models
where we distrust the Host, it seems like these would be accepted as
well.

> If there is interest, I'd suggest we all work on getting some semblance of a modern fan control system and the boot status indication framework into upstream OpenBMC.  This would allow Raptor to start upstreaming base support for RCS product lines without risking severe regressions in user pain points like noisy fans -- perceived high noise levels are always a great way to kill sales of office products, and as a result the fan control functionality is something we're quite sensitive about.  The main problem is that with the existing fan control system's tentacles snaking everywhere including the UI, this will need to be a concerted effort by multiple organizations including the maintainers of the UI and the other ODMs currently using the existing fan control functionality.  We're willing to make yet another attempt *if* there's enough buy-in from the various stakeholders to ensure a prompt merge and update of the other components.

This would be great.  Hopefully nothing I wrote here was too harsh or
turned you off.  One piece of advice though...

Even if you find that some of the changes you propose are met with some
resistance, it would be good to get your base system support upstreamed
and continue to hold your extra sauce off on the side.  I know there
has been complaints by some owners of Raptor hardware that they cannot
use upstream code improvements on their own hardware because of the
forked nature of your code base.  The way forward, to me, is to get
your hardware configuration upstreamed first and work on these extra
features separately.  If one of your customers wants to use upstream,
with the caveat that they lose out on a few super awesome features, they
can make that decision, but the important thing is that your machine
doesn't get "left behind".

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: OpenBMC on RCS platforms
  2021-04-23 14:30 OpenBMC on RCS platforms Timothy Pearson
  2021-04-23 17:11 ` Patrick Williams
@ 2021-04-23 17:23 ` Ed Tanous
  2021-04-23 19:00   ` Timothy Pearson
  1 sibling, 1 reply; 11+ messages in thread
From: Ed Tanous @ 2021-04-23 17:23 UTC (permalink / raw)
  To: Timothy Pearson; +Cc: openbmc

On Fri, Apr 23, 2021 at 7:36 AM Timothy Pearson
<tpearson@raptorengineering.com> wrote:
>

First off, this is great feedback, and despite some of my comments
below, I do really appreciate you putting it out there.

> All,
>
> I'm reaching out after some internal discussion on how we can better integrate our platforms with the OpenBMC project.  As many of you may know, we have been using OpenBMC in our lineup of OpenPOWER-based server and desktop products, with a number of custom patches on top to better serve our target markets.
>
> While we have had fairly good success with OpenBMC in the server / datacenter space, reception has been lukewarm at best in the desktop space.  This is not too surprising, given OpenBMC's historical focus on datacenter applications, but it is also becoming an expensive technical and PR pain point for us as the years go by.  To make matters worse, we're still shielding our desktop / workstation customer base to some degree from certain design decisions that persist in upstream OpenBMC, and we'd like to open discussion on all of these topics to see if a resolution can be found with minimal wasted effort from all sides.
>
> Roughly speaking, we see issues in OpenBMC in 5 main areas:
>
>
> == Fan control ==
>
> Out of all of the various pain points we've dealt with over the years, this has proven the most costly and is responsible on its own for the lack of RCS platforms upstream in OpenBMC.
>
> To be perfectly frank, OpenBMC's current fan control subsystem is a technical embarrassment, and not up to the high quality seen elsewhere in the project.

Which fan control subsystem are you referring to?  Phosphor-fans or
phosphor-pid-control?

>  Worse, this multi-daemon DBUS-interconnected Rube Goldberg contraption has somehow managed to persist over the past 4+ years, likely because it reached a complexity level where it is both tightly integrated with the rest of the OpenBMC system and extremely difficult to understand, therefore it is equally difficult to replace.  Furthering the lack of progress is the fact that it is mostly "working" for datacenter applications, so there may be a "don't touch what isn't broken" mentality in play.

I'm not really sure I agree with that.  If someone came with a design
for "We should replace dbus with X", had good technical foundations
for why X was better, and was putting forward the monumental effort to
do the work, I know that I personally wouldn't be opposed.  For the
record, I agree with you about the complexity here, but most of the
ideas I've heard to make it better were "Throw everything out and
start over", which, if that's what you want to do, by all means do,
but I don't think the community is willing to redo all of the untold
hours of engineering effort spent over the years the project has
existed.

FWIW, u-bmc was a project that took the existing kernel, threw out all
the userspace and started over.  From my view outside the project,
they seem to have failed to gain traction, and only support a couple
of platforms.

>  From a technical perspective, it is indirected to a sufficient level as to be nearly incomprehensible to most people, with the source spread across multiple different projects and repositories, yet somehow it remains rigid / fragile enough to not support basic features like runtime (or even post-compile) fan configuration for a given server.

With respect, this statement is incorrect.  On an entity-manager
enabled system + phosphor-pid-control, all of the fan control
parameters are fully modifiable at runtime either from within the
system (through dbus) or through Redfish out of band through the
OEMManager API.  If you haven't ported your systems to entity-manager
yet, there's quite a bit of people doing it at the moment and are
discussing this stuff on discord basically every day that I'm sure
would be able to give you some direction on where to start getting
your systems moved over.

>
> What we need is a much simpler, more robust fan control daemon.  Ideally this would be one self-contained process, not multiple interconnected processes where a single failure causes the entire system to go into safe mode.

in phosphor-pid-control, the failure modes are configurable per zone,
and includes things like N failures to failsafe, or adjusted fan floor
on failsafe.  If what's there doesn't meet your needs, I'm sure we can
discuss adding something else (I know there's at least one feature in
review in this area that you might check out on gerrit.)

>
> Our requirements:
> 1.) True PID control with tunable constants.  Trying to do things with PWM/temp maps alone may have made sense in the resource-constrained environments common in the 1970s, but it makes no sense on modern, powerful BMC silicon with hard floating point instructions.  Even the stock fan daemon implements a sort of bespoke integrator-by-another-name, but without the P and D components it does a terrible job outside of a constant-temperature datacenter environment.

phosphor-pid-control implements PI based fan control.  If you really
wanted to add D, it would be an easy addition, but in practice, most
server control loops have enough noise, and a low enough loop
bandwidth that a D component isn't useful, so it was omitted from the
initial version.

> 2.) Tunable PID constants, tunable temperature thresholds, tunable min/max fan speeds, and arbitrary linkage between temp inputs (zones) and fan outputs (also zoned).

All of this exists in phosphor-pid-control.  Example:
https://github.com/openbmc/entity-manager/blob/a5a716dadfbf97b601577276cc699af8f662beeb/configurations/WFT%20Baseboard.json#L1100

> 3.) Configurable zones -- both temperature and PWMs, as well as installed / not installed fans and temperature sensors.

Also exists in phosphor-pid-control.  Example:
https://github.com/openbmc/entity-manager/blob/ec98491a00c5dcffae6be362e483380c807f234c/configurations/R2000%20Chassis.json#L411

> 4.) Configurable failure behavior.  A single failed or uninstalled chassis fan should NOT cause the entire platform to go into failsafe mode!

Also exists in phosphor-pid-control.  Example of allowing single rotor
failures to not cause the system to hit failsafe:
https://github.com/openbmc/entity-manager/blob/ec98491a00c5dcffae6be362e483380c807f234c/configurations/R1000%20Chassis.json#L303

> 5.) A decent GUI to configure all of this, and the ability to export / import the settings.

Doesn't exist, but considering we already have the Redfish API for
this, it should be relatively easy to execute within webui-vue.  With
that said, I've had this on my "Great idea for an intern project" list
for some time now.  If you have engineers to spare (or you're
interested in implementing this yourself) feel free to hop on discord
and I can help get you ramped on getting this started and how those
interfaces work.

>
> To be fair, we've only been able to implement 1, 2, 3, and 4 above at compile time -- we don't have the runtime configuration options due to the way the fan systems work in OpenBMC right now, and the sheer amount of work needed to overhaul the GUI in the out-of-tree situation we remain stuck in.  With all that said, however, we point out that our competition, especially on x86 platforms, has all of these features and more, all neatly contained in a nice user-friendly point+click GUI.  OpenBMC should be able to easily match or exceed that functionality, but for some reason it seems stuck in datacenter-only mode with archaic hardcoded tables and constants.
>
> == Local firmware updates ==
>
> This is right behind fan control in terms of cost and PR damage for us vs. competing platforms.  While OpenBMC's firmware update support is very well tuned for datacenter operations (we use a simple SSH + pflash method on our large clusters, for example) it's absolutely terrible for desktop and workstation applications where a second PC is not guaranteed to be available, and where wired Ethernet even exists DHCP is either non-existent or provided by a consumer cable box.  Some method of flashing -- and recovering -- the BMC and host firmware right from the local machine is badly needed, especially for the WiFi-only environments we're starting to see more of in the wild.  Ideally this would be a command line tool / library such that we can integrate it with our bootloader or a GUI as desired.

You might check Intels openbmc fork;  I believe they had u-boot
patches to do this that you might consider upstreaming, or working
with them to upstream them.

>
> == BMC boot time ==
>
> This is self explanatory.  Other vendors' solutions allow the host to be powered on within seconds of power application from the wall, and even our own Kestrel soft BMC allows the host to begin booting less than 10 seconds after power is applied.  Several *minutes* for OpenBMC to reach a point where it can even start to boot the host is a major issue outside of datacenter applications.

While this is great information to have, it's a little disingenuous to
the fact that we've significantly reduced the boot time in the last
few years with things like dropping python, and porting the mapper to
a compiled language.  We can always do better, but unless you have
concrete ideas on how we can continue reducing this, there's very
little OpenBMC can do.

>
> == Host boot status indications ==
>
> Any ODM that makes server products has had to deal with the psychological "dead server effect", where lack of visible progress during boot causes spurious callouts / RMAs.  It's even worse on desktop, especially if server-type hardware is used inside the machine.  We've worked around this a few times with our "IPL observer" services, and really do need this functionality in OpenBMC.  The current version we have is both front panel lights and a progress bar on the BMC boot monitor (VGA/HDMI), and this is something we're willing to contribute upstream.

For some reason I thought we already had code to allow the BMC to post
a splash screen ahead of processor boot, but I'm not recalling what it
was called, as I've never had this requirement myself.

>
> == IPMI / BMC permissions ==
>
> An item that's come up recently is that, at least on our older OpenBMC versions, there's a complete disconnect between the BMC's shell user database and the IPMI user database.  Resetting the BMC root password isn't possible from IPMI on the host, and setting up IPMI doesn't seem possible from the BMC shell.  If IPMI support is something OpenBMC provides alongside Redfish, it needs to be better integrated -- we're dealing with multiple locked-out BMC issues at the moment at various customer sites, and the recovery method is painful at best when it should be as simple as an ipmitool command from the host terminal.

I thought this was fixed long ago.  User passwords and user accounts
are common between redfish, ipmi, and ssh.  Do you think you could try
a more recent build and see if this is still an issue for you?

>
>
> If there is interest, I'd suggest we all work on getting some semblance of a modern fan control system and the boot status indication framework into upstream OpenBMC.  This would allow Raptor to start upstreaming base support for RCS product lines without risking severe regressions in user pain points like noisy fans -- perceived high noise levels are always a great way to kill sales of office products, and as a result the fan control functionality is something we're quite sensitive about.  The main problem is that with the existing fan control system's tentacles snaking everywhere including the UI, this will need to be a concerted effort by multiple organizations including the maintainers of the UI and the other ODMs currently using the existing fan control functionality.  We're willing to make yet another attempt *if* there's enough buy-in from the various stakeholders to ensure a prompt merge and update of the other components.

I'd really prefer you look at what already exists.  I think most of
your concerns are covered in phosphor-pid-control today, and if they
aren't, I suspect we can add new parts to the control loop where
needed.

>
> Finally, some of you may also be aware of our Kestrel project [1], which eschews the typical BMC ASICs, Linux, and OpenBMC itself.  I'd like to point out that this is not a direct competitor to OpenBMC, it is designed specifically for certain target applications with unique requirements surrounding overall size, functionality, speed, auditability, transparency, etc.  Why we have gone to those lengths will become apparent later this year, but suffice it to say we're considering Kestrel to be used in spaces where OpenBMC is not practical and vice versa.  In fact, we'd like to see OpenBMC run on the Kestrel SoCs (or a derivative thereof) at some point in the future, once the performance concerns above are sufficiently mitigated to make that practical.
>
> [1] https://gitlab.raptorengineering.com/kestrel-collaboration/kestrel-litex/litex-boards/-/blob/master/README.md

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: OpenBMC on RCS platforms
  2021-04-23 17:11 ` Patrick Williams
@ 2021-04-23 18:46   ` Timothy Pearson
  2021-04-26 21:42     ` Milton Miller II
  0 siblings, 1 reply; 11+ messages in thread
From: Timothy Pearson @ 2021-04-23 18:46 UTC (permalink / raw)
  To: Patrick Williams; +Cc: openbmc



----- Original Message -----
> From: "Patrick Williams" <patrick@stwcx.xyz>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "openbmc" <openbmc@lists.ozlabs.org>
> Sent: Friday, April 23, 2021 12:11:26 PM
> Subject: Re: OpenBMC on RCS platforms

> On Fri, Apr 23, 2021 at 09:30:00AM -0500, Timothy Pearson wrote:
>> All,
>> 
>> I'm reaching out after some internal discussion on how we can better integrate
>> our platforms with the OpenBMC project.  As many of you may know, we have been
>> using OpenBMC in our lineup of OpenPOWER-based server and desktop products,
>> with a number of custom patches on top to better serve our target markets.
> 
> Hi Timothy,
> 
> Good to hear from your team again and hope there is some ways we can
> work together on solving some of these issues.
> 
>> Roughly speaking, we see issues in OpenBMC in 5 main areas:
> 
> We might want to fork this into 5 different discussion threads and/or
> design documents, but let's see how this goes...
> 
>> == Fan control ==
>> 
>> To be perfectly frank, OpenBMC's current fan control subsystem is a technical
>> embarrassment, and not up to the high quality seen elsewhere in the project.
>> Worse, this multi-daemon DBUS-interconnected Rube Goldberg contraption has
>> somehow managed to persist over the past 4+ years, likely because it reached
>> a complexity level where it is both tightly integrated with the rest of the
>> OpenBMC system and extremely difficult to understand, therefore it is
>> equally difficult to replace.
> 
> This is, to me, a pretty unfair assessment of the situation, but I hear
> you that the code is likely not very usable outside of datacenter use-cases.
> Certainly there is some work that can be done to improve that and I
> think we'd be receptive to have partners on it.  The vast majority of
> developers on the project *are* working on datacenter use-cases though,
> so I don't know if there is anyone actively taking up the mantle on
> this.  This could be a good area of expertise and contribution from your
> team (I personally don't really know enough of where to start at making
> a desktop-friendly fan control algorithm).
> 
> I'm not sure what you mean by "this multi-daemon DBUS-interconnected Rube
> Goldberg contraption" though.  There are really 3 dbus interfaces around
> Fan control:
>    - xyz.openbmc_project.Sensor.Value
>    - xyz.openbmc_project.Control.FanPwm
>    - xyz.openbmc_project.Control.FanSpeed
> 
> I don't like that we ended up with FanPwm and FanSpeed, but the fact is
> that there are two different hardware models for controlling fans and
> the people working with PWM-based fans didn't want to put in the effort
> to control them with target speeds.  (I think there was an argument that
> *their* fan control algorithm experts liked %-of-PWM as a calibration,
> and weren't able to come to consensus otherwise)
> 
> I don't know what you mean by Rube Goldberg here or how to make the
> sitation any better.  All sensors are read by sensor daemons using
> common APIs like Linux HWMon and there is a similar "set the fan speed in
> hardware" daemon.  Perhaps you could eliminate the Control.Fan*
> interfaces (and merge them into a fan control daemon directly) but there
> were some people who wanted to be able to manually control fan speeds in
> some scenarios anyhow.  Anyhow, I'm not really seeing a lot of
> simplification that could even be done, but certainly not undue
> excessive mechanisations that would classify as "Rube Goldberg".
> 
> There is a xyz.openbmc_project.Control.FanRedundancy interface, but I
> suspect that is used outside the use cases you intend anyhow and really
> it is optional to fan control.  Similarly, anything under Inventory is
> just that... Inventory; it is not critical to fan control.

I admit I was a bit harsh here, but in this particular case I think I may be justified.  Hear me out.. :)

Looking at one of our systems, the following daemons and scripts all have to run just to provide basic fan control:

fan_control.exe
phosphor-fan-presence-tach
phosphor-fan-monitor
phosphor-fan-control
phosphor-hwmon-readd (multiple instances)
openpower-occ-control
occ-active.sh

We don't have anything against DBUS per se, but what I do see here is that DBUS has been used as a crutch to easily glue together four (!) different services that are so closely linked that they really should be all integrated into one dedicated service.  Also on display here is a bit of the haphazard design that afflicts the fan control system -- the separate daemons to provide raw sensor values may well make sense as a sort of HAL, but there's no equivalent to interface with the raw PWM settings.

When I next look at how these seven services are configured, I see an overly complex configuration scheme involving both build and run time files.  The run time files are generated at compile time from yet other input files, most of them YAML, some of them conf files and a few straight up shell snippets:

fans/phosphor-fan-control-events-config-native/events.yaml
fans/phosphor-fan-control-fan-config-native/fans.yaml
fans/phosphor-fan-control-zone-conditions-config-native/zone_conditions.yaml
fans/phosphor-fan-control-zone-config-native/zones.yaml
fans/phosphor-fan-monitor-config-native/monitor.yaml
fans/phosphor-fan-presence-config-native/config.yaml
fans/talos-thermal-policy/thermal-policy.yaml
fans/talos-fan-policy/air-cooled.yaml
fans/talos-fan-policy/fan-errors.yaml
fans/talos-fan-policy/water-cooled.yaml
fans/phosphor-fan/obmc/phosphor-fan/phosphor-cooling-type-0.conf
fans/talos-fan-watchdog/obmc/talos-fan-watchdog/fan-watchdog.conf
fans/talos-fan-watchdog/obmc/talos-fan-watchdog/reset-fan-watchdog.conf
sensors/phosphor-hwmon/obmc/hwmon/ahb/apb/bus@1e78a000/i2c-bus@440/max31785@52.conf
sensors/phosphor-hwmon/obmc/hwmon/ahb/apb/bus@1e78a000/i2c-bus@440/w83773g@4c.conf
sensors/phosphor-hwmon/obmc/hwmon/devices/platform/gpio-fsi/fsi0/slave@00--00/00--00--00--06/sbefifo1-dev0/occ-hwmon.1.conf
sensors/phosphor-hwmon/obmc/hwmon/devices/platform/gpio-fsi/fsi0/slave@00--00/00--00--00--0a/fsi1/slave@01--00/01--01--00--06/sbefifo2-dev0/occ-hwmon.2.conf

I'm sure there are more, but I'm not motivated to find them at the moment.  All of that configuration mess is required for a single platform with the simple design of six fans, six tachs, and five temperature sources, grouped into three zones.  None of it is runtime configurable, all of those YAML files go through a bunch of preprocessing and eventually end up as source code that is hard compiled into the fan daemons.  If the user wants to alter so much as a single PID constant, the entire stack has to be recompiled with the new settings and reflashed.

>> Furthering the lack of progress is the fact that it is mostly "working" for
>> datacenter applications, so there may be a "don't touch what isn't broken"
>> mentality in play.
> 
> As I hinted at above, I think it is a lack of necessity and not a fear
> of breaking.  In general, as a community we should not be afraid of
> change.  We have plenty of test cases to qualify code that is changing
> and if there isn't test cases for a functional area then it is fair game
> to change without worry, in my opinion.
> 
>> From a technical perspective, it is indirected to a sufficient level as to
>> be nearly incomprehensible to most people, with the source spread across
>> multiple different projects and repositories, yet somehow it remains
>> rigid / fragile enough to not support basic features like runtime (or even
>> post-compile) fan configuration for a given server.
> 
> There are two different fan control implementations presently:
>    - phosphor-pid-control (swampd)
>    - phosphor-fan-presence (phosphor-fan-control)
> 
> Which of these are you having issue with?  They are intended to serve
> drastically different purposes.
> 
> I don't think anyone outside of IBM uses phosphor-fan-control.  It seems
> to be explicitly designed for their systems with their own requirements.
> Unless they speak up, I don't know how we intend anyone else to use this
> code and it probably should be renamed 'ibm-fan-control'.
> 
>> What we need is a much simpler, more robust fan control daemon.  Ideally this
>> would be one self-contained process, not multiple interconnected processes
>> where a single failure causes the entire system to go into safe mode.
>> 
>> Our requirements:
>> 1.) True PID control with tunable constants.  Trying to do things with PWM/temp
>> maps alone may have made sense in the resource-constrained environments common
>> in the 1970s, but it makes no sense on modern, powerful BMC silicon with hard
>> floating point instructions.  Even the stock fan daemon implements a sort of
>> bespoke integrator-by-another-name, but without the P and D components it does
>> a terrible job outside of a constant-temperature datacenter environment.
> 
> Isn't phosphor-pid-control this already?

No.  It suffers from the exact same hardcoded YAML mess as above, with no tunability at runtime.

>> 2.) Tunable PID constants, tunable temperature thresholds, tunable min/max fan
>> speeds, and arbitrary linkage between temp inputs (zones) and fan outputs (also
>> zoned).
>> 3.) Configurable zones -- both temperature and PWMs, as well as installed / not
>> installed fans and temperature sensors.
> 
> I think these two features are the part that are more interesting to
> non-datacenter use cases and so nobody has put effort into it.  As much
> as you seem to dislike dbus mechanization, this sounds like we would
> need a few interfaces defined for these so that Redfish has something to
> poke at.
> 
> I do know some BIOS vendors provide this for desktops already.  Is there
> anything at an IPMI level that could facilitate the hand-off of this?
> 
>> 4.) Configurable failure behavior.  A single failed or uninstalled chassis fan
>> should NOT cause the entire platform to go into failsafe mode!
> 
> phosphor-fan-presence does provide some of this, but again, I feel like
> it is tuned to IBM's needs.  It appears that phosphor-pid-control has
> some amount of implementation of Control.FanRedundancy that I mentioned
> earlier.  Are you sure that phosphor-pid-control causes the system to go
> into fail-safe mode from a single fan failure though?  I've not heard
> this.

It's not pid-control, it's a separate monitor daemon and some shell scripts.  The "control" part of the fan control daemon stack is put into a failure mode IIRC if a problem is found, and that's a pretty coarse switch that has no ability to take into account the location of the problem or whether the other fans are able to take over with just a moderate speed increase.

>> 5.) A decent GUI to configure all of this, and the ability to export / import
>> the settings.
> 
> Sure...
> 
>> To be fair, we've only been able to implement 1, 2, 3, and 4 above at
>> compile time -- we don't have the runtime configuration options due to the
>> way the fan systems work in OpenBMC right now, and the sheer amount of
>> work needed to overhaul the GUI in the out-of-tree situation we remain
>> stuck in.  With all that said, however, we point out that our competition,
>> especially on x86 platforms, has all of these features and more, all neatly
>> contained in a nice user-friendly point+click GUI.  OpenBMC should be able
>> to easily match or exceed that functionality, but for some reason it seems
>> stuck in datacenter-only mode with archaic hardcoded tables and constants.
> 
> So, if you've done 1-4, are there any commits in Gerrit?  Which fan
> control daemon were they done against?

This is where the history of the projects starts to come into play.  We've re-implemented the same functionality several times as OpenBMC continues to churn, I think the last version required less work than before but phosphor-fan-presence and phosphor-hwmon still needed our patches to enable the raw-mode PID loops.

> There is a certain air to what you wrote that rubs me the wrong way.
> We're not a product that you've paid for here to do what you, the
> customer, is asking of us.  This is an open source community and one
> that most of us are paid to work on by our employer.  We don't do work
> to make you happy, but do work because our bosses are asking for certain
> features out of us.  As I said above, almost everyone here is working on
> "datacenter-only systems", so why would anyone else invest in this use
> case?
> 
> This is *your* business model.  We'd certainly love to have contributions
> from your team and most of us would even spend some of our time to help
> you in your efforts.  But, if you want this, as they often say:
>                    "Patches Welcome!"

Don't mean to rub anyone here the wrong way.  The main reason I'm here now is that Raptor was recently called out for creating a more limited, contained system that better matches what we need in a BMC, and chastised for not fixing the problems in OpenBMC instead.  We have not yet decided how we are officially going to proceed with our future product lines, and are evaluation options.  Part of that evaluation involves me seeing what degree of acceptance or resistance there would be in OpenBMC to merging and maintaining patches that have no real use for other ODMs in the server-only solution space.  I think the OpenBMC stack is impressive in many areas, and would prefer to use it where practical, but at the same time when I see low level design decisions like running 14 different processes all linked over DBUS just to read sensors and set corresponding fan speeds, I also realize it's simply not going to be practical to use it as-is on low end BMC silicon.

At the end of the day, we're backed into a bit of a corner here for exactly the reasons you mention above.  I have to justify development costs in order for any large-scale (non-hobby / "gratis") development project to be approved, and that means cost in vs. results out is the primary factor.  If, for the same investment, we can create a smaller BMC solution for our desktop products that actually does what our customers need, while not providing 90% of the server features said desktop users won't use in the first place, that's the direction I'll be told to go vs. attempting to extend OpenBMC in a way that no one else needs or wants.

> I do see some code from early 2020 from you against phosphor-hwmon and
> phosphor-fan-presence.  All of the phosphor-hwmon commits failed CI test
> so nobody every looked at them.  All of the phosphor-fan-presence commits
> received timely feedback, to which you never responded, and seemed to be
> missing an updated CCLA from Raptor?  Is there something we should have
> done as a community to keep this work going?

Good question!  As always, there's a bit of backstory on that as well, but really it came down to a bad combination of OpenBMC churn and reviewer delay causing a need for significant refactoring, and the needed developer resources being reassigned to more pressing projects on Raptor's side.  If the OpenBMC churn is anywhere near where it was at that point today, then the best advice I'd have is to merge as fast as possible and clean up any small issues in later commits -- this is where the fact that changing one functional item (fan control) requires coincident changes in multiple different repositories really hurts vs. a single source tree / single daemon that provides that specific function.

>> == Local firmware updates ==
>> 
>> This is right behind fan control in terms of cost and PR damage for us vs.
>> competing platforms.  While OpenBMC's firmware update support is very well
>> tuned for datacenter operations (we use a simple SSH + pflash method on our
>> large clusters, for example) it's absolutely terrible for desktop and
>> workstation applications where a second PC is not guaranteed to be available,
>> and where wired Ethernet even exists DHCP is either non-existent or provided by
>> a consumer cable box.  Some method of flashing -- and recovering -- the BMC and
>> host firmware right from the local machine is badly needed, especially for the
>> WiFi-only environments we're starting to see more of in the wild.  Ideally this
>> would be a command line tool / library such that we can integrate it with our
>> bootloader or a GUI as desired.
> 
> This sounds to me pretty easily obtainable and what I have in mind is
> actually a valid data center use case for many of us.  When all else
> fails, you should be able to use a USB key to update the system
> (assuming the image you're updating with is trusted for whatever your
> system determines is trust-worthy).  I'm pretty sure our OCP systems can
> be updated with a magic combination of a USB-key and an OCP debug
> card(*).  I don't think that is currently implemented on openbmc/openbmc,
> but it is on our list of pending features.
> 
> For your specific users, the OCP debug card is probably not a good
> requirement, but you could likely automate the update whenever a USB-key
> plus text file is added?  (I'm just brainstorming how you'd know to kick
> it off).  The current software update code probably isn't too far off
> from being able to facilitate this for you.
> 
> https://www.opencompute.org/documents/facebook-ocp-debug-card-with-lcd-spec_v1p0

At first glance, that's another overly complex solution for a simple problem that would cause a degraded user experience vs. other platforms.

We have an 800Mhz Linux-based computer with 512MB of RAM, serial and video out support already integrated into every one of our products.  It can receive data via PCIe and via USB from an active host.  Why isn't there a mechanism to send a signed container to it over one of these existing channels for self-update?

A potential user story looks like this:

=====

I want to update the firmware on my Blackbird desktop to fix a problem I'm having with a new control widget I've plugged in.  To make things more interesting, I'm on an oil rig in the Gulf, and the desktop only connects via intermittent WiFi.  Spare parts are weeks away, and I have next to no electronic diagnostic equipment available to me.  There's one or two USB ports I can normally use because I have administrative privileges, but I was able to grab the upgrade file over WiFi instead, saving myself some time cleaning accumulated gunk out of the ports.

I can update my <large vendor> standard PC firmware just by running a tool on Windows, but the Blackbird was selected because it controls a critical process that needed to be malware-resistant.

Fortunately, OpenBMC implemented a quality firmware update process.  I just need to launch a GUI tool with host administrative privileges, select the upgrade file, and queue an upgrade to happen when I reboot the machine.  I queue the update, start the reboot, and stick around to see the upgrade progress on the screen while it's booting back up.  Because I can see the status on the screen, I know what is happening and don't pull the power plug due to only seeing a black screen and power LED for 10 minutes.  Finally, the machine loads the OS and I verify the new control widget is working properly.

=====

Is there a technical / architectural reason this can't be done, or some other reason it's a bad idea?

>> == BMC boot time ==
>> 
>> This is self explanatory.  Other vendors' solutions allow the host to be powered
>> on within seconds of power application from the wall, and even our own Kestrel
>> soft BMC allows the host to begin booting less than 10 seconds after power is
>> applied.  Several *minutes* for OpenBMC to reach a point where it can even
>> start to boot the host is a major issue outside of datacenter applications.
> 
> Some of this is, to me, an artifact of the Power architecture and not an
> artifact of OpenBMC explicitly.  On x86 systems we have a little code in
> u-boot that wiggles a GPIO and gets the Host power sequence going while
> the BMC is booting up.  This overlaps quite a bit of the memory testing
> of the Host with the BMC boot time.  The "well-known proprietary BMC"
> also does this same trick.

I think we're talking about two different well know proprietary BMCs, but that's not important for this discussion other than no, the one I have in mind doesn't resort to such tricks.  What it does do is start up its core services rapidly enough where this isn't a problem, and lets the rest of the BMC stack start up at its own pace later on.
 
> Power requires the BMC to be up in order to serve out the virtual PNOR,
> from my recollection.  It seems like this could be solved in other ways,
> such as a SPI-mux on a physical SPI-NOR so that the BMC can take the NOR
> at specific times during update but otherwise it is given to the host
> CPUs.  This is exactly what we do on x86 systems.

Ouch.  So on x86 boxen you might actually have two "BMCs" -- the proprietary one inside the CPU that starts in seconds and provides base services like SPI Flash mapping to CPU address space, and the external OpenBMC one that can run in parallel without interfering with host start.  Adding a mux is then a hack needed on top, since you can't really communicate with the proprietary stack in the required manner.

For systems like POWER that lack the proprietary internal "BMC", I guess there are a few ways we could address the problem:

1.) Speed up OpenBMC load -- this sounds like it would end up being completely supported by one or two vendors alone, and subject to breakage from the other vendors that simply don't have any concerns around OpenBMC start time since their platforms aren't visibly affected by it.  It's also unlikely to come into the desired sub-10s range.

2.) Split the BMC into "essential" and "nice to have" services, much like the other platforms.  Painful, as it now requires even more parts on the mainboard.

3.) Keep the single BMC device, but split it into two software stacks, one that can load nearly instantly and start providing essential services, and another than can load more slowly.  This would effectively require two separate CPUs inside the BMC, which we actually do have in the AST2500.  I haven't done any digging though to see if the second CPU is powerful enough to implement the HIOMAP protocol at speed.

> Having said all of that, there is certainly some performance
> improvements that can be done, but nobody has taken up the torch on it.
> A big low-hanging fruit in my mind is the file system compression being
> xz or gzip is very computationally intensive.  I did some work, with
> Nick Terrell, to switch to zstd on our systems for both the kernel
> initramfs and UBI and saw significant boot time improvements.  The
> upstream enablement for this appears to have landed as of v5.9 so we
> could certainly start enabling it here now.
> 
> https://lore.kernel.org/linux-kbuild/20200730190841.2071656-7-nickrterrell@gmail.com/
> 
>> == Host boot status indications ==
>> 
>> Any ODM that makes server products has had to deal with the psychological "dead
>> server effect", where lack of visible progress during boot causes spurious
>> callouts / RMAs.  It's even worse on desktop, especially if server-type
>> hardware is used inside the machine.  We've worked around this a few times with
>> our "IPL observer" services, and really do need this functionality in OpenBMC.
>> The current version we have is both front panel lights and a progress bar on
>> the BMC boot monitor (VGA/HDMI), and this is something we're willing to
>> contribute upstream.
> 
> Great!  Let's get that merged!

Sounds good!  The files aren't too complex:

https://git.raptorcs.com/git/blackbird-skeleton/tree/pyiplobserver
https://git.raptorcs.com/git/blackbird-skeleton/tree/pyiplledmonitor

Is the skeleton repository the best place for a merge request?

> I do think some others have support for a 7-seg display with the
> postcodes going to it already.  I think this is along those same lines.
> It might just be another back-end for our existing post code daemon to
> replicate them to the VGA and/or blink morse code on an LED.

OK, so this is what we ran into before.  Where is this support in-tree, and do we need to reimplement our system to match what already exists (by extension, extending the other vendor code since our observer is more detailed in terms of status etc.), or would we be allowed to provide a competing solution to this other support, letting ODMs pick which one they wanted?

>> == IPMI / BMC permissions ==
>> 
>> An item that's come up recently is that, at least on our older OpenBMC versions,
>> there's a complete disconnect between the BMC's shell user database and the
>> IPMI user database.  Resetting the BMC root password isn't possible from IPMI
>> on the host, and setting up IPMI doesn't seem possible from the BMC shell.  If
>> IPMI support is something OpenBMC provides alongside Redfish, it needs to be
>> better integrated -- we're dealing with multiple locked-out BMC issues at the
>> moment at various customer sites, and the recovery method is painful at best
>> when it should be as simple as an ipmitool command from the host terminal.
> 
> I suspect most of this is a matter of IPMI command support and/or enabling
> those commands to the host IPMI path.  Most of us are fairly untrusting
> of IPMI (and the Host itself), so there hasn't been work to do anything
> here.  As long as whatever you're proposing can be disabled for models
> where we distrust the Host, it seems like these would be accepted as
> well.
> 
>> If there is interest, I'd suggest we all work on getting some semblance of a
>> modern fan control system and the boot status indication framework into
>> upstream OpenBMC.  This would allow Raptor to start upstreaming base support
>> for RCS product lines without risking severe regressions in user pain points
>> like noisy fans -- perceived high noise levels are always a great way to kill
>> sales of office products, and as a result the fan control functionality is
>> something we're quite sensitive about.  The main problem is that with the
>> existing fan control system's tentacles snaking everywhere including the UI,
>> this will need to be a concerted effort by multiple organizations including the
>> maintainers of the UI and the other ODMs currently using the existing fan
>> control functionality.  We're willing to make yet another attempt *if* there's
>> enough buy-in from the various stakeholders to ensure a prompt merge and update
>> of the other components.
> 
> This would be great.  Hopefully nothing I wrote here was too harsh or
> turned you off.  One piece of advice though...
> 
> Even if you find that some of the changes you propose are met with some
> resistance, it would be good to get your base system support upstreamed
> and continue to hold your extra sauce off on the side.  I know there
> has been complaints by some owners of Raptor hardware that they cannot
> use upstream code improvements on their own hardware because of the
> forked nature of your code base.  The way forward, to me, is to get
> your hardware configuration upstreamed first and work on these extra
> features separately.  If one of your customers wants to use upstream,
> with the caveat that they lose out on a few super awesome features, they
> can make that decision, but the important thing is that your machine
> doesn't get "left behind".

So this is where we run into an interesting intersection of perceptual issues surrounding POWER, marketing, and resistance to fixing the fan controls in particular.

At the end of the day, we *need* reliable fan control in-tree before we'll upstream the platform support.  POWER is still, rightly or wrongly, perceived as power hungry, inefficient, hot, and noisy.  Even a small percentage of users that load upstream for various fixes, only to have the system fans scream at them all the time, will severely damage our brand image.  I'm aware that an older OpenBMC tree is also causing some issues, but the perception is the fan control issue is more important given the specific headwinds surrounding POWER.  I simply don't have access to the resources required to break the deadlock; I've tried to make the case to the key decision makers but so far I've been met with stiff resistance.  This is in no small part due to the lack of results from previous attempts to get a workable fan control solution merged; the cost/benefit just keeps coming up as not something we want to throw funding at right now.

I hope this makes some sense, and thank you for the response!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: OpenBMC on RCS platforms
  2021-04-23 17:23 ` Ed Tanous
@ 2021-04-23 19:00   ` Timothy Pearson
  2021-04-23 19:23     ` Ed Tanous
  0 siblings, 1 reply; 11+ messages in thread
From: Timothy Pearson @ 2021-04-23 19:00 UTC (permalink / raw)
  To: Ed Tanous; +Cc: openbmc



----- Original Message -----
> From: "Ed Tanous" <ed@tanous.net>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "openbmc" <openbmc@lists.ozlabs.org>
> Sent: Friday, April 23, 2021 12:23:23 PM
> Subject: Re: OpenBMC on RCS platforms

> On Fri, Apr 23, 2021 at 7:36 AM Timothy Pearson
> <tpearson@raptorengineering.com> wrote:
>>
> 
> First off, this is great feedback, and despite some of my comments
> below, I do really appreciate you putting it out there.
> 
>> All,
>>
>> I'm reaching out after some internal discussion on how we can better integrate
>> our platforms with the OpenBMC project.  As many of you may know, we have been
>> using OpenBMC in our lineup of OpenPOWER-based server and desktop products,
>> with a number of custom patches on top to better serve our target markets.
>>
>> While we have had fairly good success with OpenBMC in the server / datacenter
>> space, reception has been lukewarm at best in the desktop space.  This is not
>> too surprising, given OpenBMC's historical focus on datacenter applications,
>> but it is also becoming an expensive technical and PR pain point for us as the
>> years go by.  To make matters worse, we're still shielding our desktop /
>> workstation customer base to some degree from certain design decisions that
>> persist in upstream OpenBMC, and we'd like to open discussion on all of these
>> topics to see if a resolution can be found with minimal wasted effort from all
>> sides.
>>
>> Roughly speaking, we see issues in OpenBMC in 5 main areas:
>>
>>
>> == Fan control ==
>>
>> Out of all of the various pain points we've dealt with over the years, this has
>> proven the most costly and is responsible on its own for the lack of RCS
>> platforms upstream in OpenBMC.
>>
>> To be perfectly frank, OpenBMC's current fan control subsystem is a technical
>> embarrassment, and not up to the high quality seen elsewhere in the project.
> 
> Which fan control subsystem are you referring to?  Phosphor-fans or
> phosphor-pid-control?
> 
>>  Worse, this multi-daemon DBUS-interconnected Rube Goldberg contraption has
>>  somehow managed to persist over the past 4+ years, likely because it reached a
>>  complexity level where it is both tightly integrated with the rest of the
>>  OpenBMC system and extremely difficult to understand, therefore it is equally
>>  difficult to replace.  Furthering the lack of progress is the fact that it is
>>  mostly "working" for datacenter applications, so there may be a "don't touch
>>  what isn't broken" mentality in play.
> 
> I'm not really sure I agree with that.  If someone came with a design
> for "We should replace dbus with X", had good technical foundations
> for why X was better, and was putting forward the monumental effort to
> do the work, I know that I personally wouldn't be opposed.  For the
> record, I agree with you about the complexity here, but most of the
> ideas I've heard to make it better were "Throw everything out and
> start over", which, if that's what you want to do, by all means do,
> but I don't think the community is willing to redo all of the untold
> hours of engineering effort spent over the years the project has
> existed.
> 
> FWIW, u-bmc was a project that took the existing kernel, threw out all
> the userspace and started over.  From my view outside the project,
> they seem to have failed to gain traction, and only support a couple
> of platforms.
> 
>>  From a technical perspective, it is indirected to a sufficient level as to be
>>  nearly incomprehensible to most people, with the source spread across multiple
>>  different projects and repositories, yet somehow it remains rigid / fragile
>>  enough to not support basic features like runtime (or even post-compile) fan
>>  configuration for a given server.
> 
> With respect, this statement is incorrect.  On an entity-manager
> enabled system + phosphor-pid-control, all of the fan control
> parameters are fully modifiable at runtime either from within the
> system (through dbus) or through Redfish out of band through the
> OEMManager API.  If you haven't ported your systems to entity-manager
> yet, there's quite a bit of people doing it at the moment and are
> discussing this stuff on discord basically every day that I'm sure
> would be able to give you some direction on where to start getting
> your systems moved over.

<snip>

Interesting.  I assume entity-manager is pretty new still?  A year ago there was zero solution to the problem of runtime configuration, and when I checked several weeks ago the bug report on it [1] had no meaningful progress.  Looks like that's finally changing.

Is the entity manager fairly stable API-wise at this point?  That might be enough of a game changer for me to go back and get approval for what will effectively be our fourth port of the Talos II systems to OpenBMC.

[1] https://github.com/openbmc/openbmc/issues/3595

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: OpenBMC on RCS platforms
  2021-04-23 19:00   ` Timothy Pearson
@ 2021-04-23 19:23     ` Ed Tanous
  0 siblings, 0 replies; 11+ messages in thread
From: Ed Tanous @ 2021-04-23 19:23 UTC (permalink / raw)
  To: Timothy Pearson; +Cc: openbmc, Ed Tanous

On Fri, Apr 23, 2021 at 12:00 PM Timothy Pearson
<tpearson@raptorengineering.com> wrote:
>
>
>
> ----- Original Message -----
> > From: "Ed Tanous" <ed@tanous.net>
> > To: "Timothy Pearson" <tpearson@raptorengineering.com>
> > Cc: "openbmc" <openbmc@lists.ozlabs.org>
> > Sent: Friday, April 23, 2021 12:23:23 PM
> > Subject: Re: OpenBMC on RCS platforms
>
> > On Fri, Apr 23, 2021 at 7:36 AM Timothy Pearson
> > <tpearson@raptorengineering.com> wrote:
> >>
> >
> > First off, this is great feedback, and despite some of my comments
> > below, I do really appreciate you putting it out there.
> >
> >> All,
> >>
> >> I'm reaching out after some internal discussion on how we can better integrate
> >> our platforms with the OpenBMC project.  As many of you may know, we have been
> >> using OpenBMC in our lineup of OpenPOWER-based server and desktop products,
> >> with a number of custom patches on top to better serve our target markets.
> >>
> >> While we have had fairly good success with OpenBMC in the server / datacenter
> >> space, reception has been lukewarm at best in the desktop space.  This is not
> >> too surprising, given OpenBMC's historical focus on datacenter applications,
> >> but it is also becoming an expensive technical and PR pain point for us as the
> >> years go by.  To make matters worse, we're still shielding our desktop /
> >> workstation customer base to some degree from certain design decisions that
> >> persist in upstream OpenBMC, and we'd like to open discussion on all of these
> >> topics to see if a resolution can be found with minimal wasted effort from all
> >> sides.
> >>
> >> Roughly speaking, we see issues in OpenBMC in 5 main areas:
> >>
> >>
> >> == Fan control ==
> >>
> >> Out of all of the various pain points we've dealt with over the years, this has
> >> proven the most costly and is responsible on its own for the lack of RCS
> >> platforms upstream in OpenBMC.
> >>
> >> To be perfectly frank, OpenBMC's current fan control subsystem is a technical
> >> embarrassment, and not up to the high quality seen elsewhere in the project.
> >
> > Which fan control subsystem are you referring to?  Phosphor-fans or
> > phosphor-pid-control?
> >
> >>  Worse, this multi-daemon DBUS-interconnected Rube Goldberg contraption has
> >>  somehow managed to persist over the past 4+ years, likely because it reached a
> >>  complexity level where it is both tightly integrated with the rest of the
> >>  OpenBMC system and extremely difficult to understand, therefore it is equally
> >>  difficult to replace.  Furthering the lack of progress is the fact that it is
> >>  mostly "working" for datacenter applications, so there may be a "don't touch
> >>  what isn't broken" mentality in play.
> >
> > I'm not really sure I agree with that.  If someone came with a design
> > for "We should replace dbus with X", had good technical foundations
> > for why X was better, and was putting forward the monumental effort to
> > do the work, I know that I personally wouldn't be opposed.  For the
> > record, I agree with you about the complexity here, but most of the
> > ideas I've heard to make it better were "Throw everything out and
> > start over", which, if that's what you want to do, by all means do,
> > but I don't think the community is willing to redo all of the untold
> > hours of engineering effort spent over the years the project has
> > existed.
> >
> > FWIW, u-bmc was a project that took the existing kernel, threw out all
> > the userspace and started over.  From my view outside the project,
> > they seem to have failed to gain traction, and only support a couple
> > of platforms.
> >
> >>  From a technical perspective, it is indirected to a sufficient level as to be
> >>  nearly incomprehensible to most people, with the source spread across multiple
> >>  different projects and repositories, yet somehow it remains rigid / fragile
> >>  enough to not support basic features like runtime (or even post-compile) fan
> >>  configuration for a given server.
> >
> > With respect, this statement is incorrect.  On an entity-manager
> > enabled system + phosphor-pid-control, all of the fan control
> > parameters are fully modifiable at runtime either from within the
> > system (through dbus) or through Redfish out of band through the
> > OEMManager API.  If you haven't ported your systems to entity-manager
> > yet, there's quite a bit of people doing it at the moment and are
> > discussing this stuff on discord basically every day that I'm sure
> > would be able to give you some direction on where to start getting
> > your systems moved over.
>
> <snip>
>
> Interesting.  I assume entity-manager is pretty new still?

It's a couple years old at this point (I think first commit was in
2018?).  It has certainly gotten more momentum over time though.

>  A year ago there was zero solution to the problem of runtime configuration, and when I checked several weeks ago the bug report on it [1] had no meaningful progress.

Bug reports aren't generally the best way to get answers in my
experience, especially if it's not a "bug" but an enhancement you want
to make to the overall architecture. The mailing list or discord tends
to get better responses (as you've seen here in this thread).

For what it's worth, Redfish configurable PID loops were checked in
back in October of 2018, so about 2 and a half years old now.
https://github.com/openbmc/bmcweb/commit/af996fe4d12668d1a096e36e791c49690e54c9bb

>  Looks like that's finally changing.
>
> Is the entity manager fairly stable API-wise at this point?

While we do our best to not make backward incompatible configuration
changes (I can't think of any we've done yet) we don't guarantee it,
and certainly can't make any stability guarantees about code we can't
see.  The best way to keep your systems stable is to get them
upstreamed, so when we need to make "might break things" type changes,
we'll have a good idea if anyone is actually using the features in
question, and which systems we should ask maintainers to test changes
against.

More details under "Intent" heading item #3 here:
https://github.com/openbmc/entity-manager/blob/21608383661285e63e97c0457f55817f6e1d6b92/CONFIG_FORMAT.md

>  That might be enough of a game changer for me to go back and get approval for what will effectively be our fourth port of the Talos II systems to OpenBMC.

Glad to see you're interested.

>
> [1] https://github.com/openbmc/openbmc/issues/3595

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: OpenBMC on RCS platforms
  2021-04-23 18:46   ` Timothy Pearson
@ 2021-04-26 21:42     ` Milton Miller II
  2021-04-28 20:21       ` Timothy Pearson
  2021-04-29  7:54       ` OpenBMC on RCS platforms Milton Miller II
  0 siblings, 2 replies; 11+ messages in thread
From: Milton Miller II @ 2021-04-26 21:42 UTC (permalink / raw)
  To: Timothy Pearson; +Cc: openbmc

Timothy Pearson <tpearson@raptorengineering.com> wrote:
>----- Original Message -----
>> From: "Patrick Williams" <patrick@stwcx.xyz>
>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>> Cc: "openbmc" <openbmc@lists.ozlabs.org>
>> Sent: Friday, April 23, 2021 12:11:26 PM
>> Subject: Re: OpenBMC on RCS platforms
>
>> On Fri, Apr 23, 2021 at 09:30:00AM -0500, Timothy Pearson wrote:
>>> All,
>>> 
>>> I'm reaching out after some internal discussion on how we can
>>> better integrate
>>> our platforms with the OpenBMC project.  As many of you may know,
>>> we have been
>>> using OpenBMC in our lineup of OpenPOWER-based server and desktop
>>> products,
>>> with a number of custom patches on top to better serve our target
>>> markets.
>> 
>> Hi Timothy,
>> 
>> Good to hear from your team again and hope there is some ways we
>> can
>> work together on solving some of these issues.
>> 
>>> Roughly speaking, we see issues in OpenBMC in 5 main areas:
>> 
>> We might want to fork this into 5 different discussion threads
>> and/or
>> design documents, but let's see how this goes...
>> 

[ some issues trimmed, including fan ]

>>> == Local firmware updates ==
>>> 
>>> This is right behind fan control in terms of cost and PR damage
>>> for us vs.
>>> competing platforms.  While OpenBMC's firmware update support is
>>> very well
>>> tuned for datacenter operations (we use a simple SSH + pflash
>>> method on our
>>> large clusters, for example) it's absolutely terrible for desktop
>>> and
>>> workstation applications where a second PC is not guaranteed to be
>>> available,
>>> and where wired Ethernet even exists DHCP is either non-existent
>>> or provided by
>>> a consumer cable box.  Some method of flashing -- and recovering
>>> -- the BMC and
>>> host firmware right from the local machine is badly needed,
>>> especially for the
>>> WiFi-only environments we're starting to see more of in the wild.
>>> Ideally this
>>> would be a command line tool / library such that we can integrate
>>> it with our
>>> bootloader or a GUI as desired.
>> 
>> This sounds to me pretty easily obtainable and what I have in mind
>> is
>> actually a valid data center use case for many of us.  When all
>> else
>> fails, you should be able to use a USB key to update the system
>> (assuming the image you're updating with is trusted for whatever
>> your
>> system determines is trust-worthy).  I'm pretty sure our OCP
>> systems can
>> be updated with a magic combination of a USB-key and an OCP debug
>> card(*).  I don't think that is currently implemented on
>> openbmc/openbmc,
>> but it is on our list of pending features.
>> 
>> For your specific users, the OCP debug card is probably not a good
>> requirement, but you could likely automate the update whenever a
>> USB-key
>> plus text file is added?  (I'm just brainstorming how you'd know to
>> kick
>> it off).  The current software update code probably isn't too far
>> off
>> from being able to facilitate this for you.
>> 
>> https://www.opencompute.org/documents/facebook-ocp-debug-card-with-lcd-spec_v1p0
>
>At first glance, that's another overly complex solution for a simple
>problem that would cause a degraded user experience vs. other
>platforms.
>

I have to agree, both overly complex and probably not useful in that
its just a port interface for control.

>We have an 800Mhz Linux-based computer with 512MB of RAM, serial and
>video out support already integrated into every one of our products.
>It can receive data via PCIe and via USB from an active host.  Why
>isn't there a mechanism to send a signed container to it over one of
>these existing channels for self-update?
>
>A potential user story looks like this:
>
>=====
>
>I want to update the firmware on my Blackbird desktop to fix a
>problem I'm having with a new control widget I've plugged in.  To
>make things more interesting, I'm on an oil rig in the Gulf, and the
>desktop only connects via intermittent WiFi.  Spare parts are weeks
>away, and I have next to no electronic diagnostic equipment available
>to me.  There's one or two USB ports I can normally use because I
>have administrative privileges, but I was able to grab the upgrade
>file over WiFi instead, saving myself some time cleaning accumulated
>gunk out of the ports.
>
>I can update my <large vendor> standard PC firmware just by running a
>tool on Windows, but the Blackbird was selected because it controls a
>critical process that needed to be malware-resistant.
>
>Fortunately, OpenBMC implemented a quality firmware update process.
>I just need to launch a GUI tool with host administrative privileges,
>select the upgrade file, and queue an upgrade to happen when I reboot
>the machine.  I queue the update, start the reboot, and stick around
>to see the upgrade progress on the screen while it's booting back up.
> Because I can see the status on the screen, I know what is happening
>and don't pull the power plug due to only seeing a black screen and
>power LED for 10 minutes.  Finally, the machine loads the OS and I
>verify the new control widget is working properly.
>
>=====
>
>Is there a technical / architectural reason this can't be done, or
>some other reason it's a bad idea?
>

I ended up writing this twice or thrice.  Also what I call
phosphor-initfs is actually the package obmc-phosphor-initfs.bb
found in meta-phosphor/recipies-phosphor/initrdscripts/.


There are two issues.  One is that there is no graphics
library or console code for the aspeed bmc.  I understand a 
text rendering library was added for boot monitoring). But 
if you are starting from the host up, then use the host to 
drive the GUI and just establish a command session (network, 
USB to host, or serial).  

The biggest limitation is we use squashfs for file system 
for space efficency.  This is a read-only filesystem that 
contains references between different pieces that is loaded
and decompressed by the kernel on demand.  That means you can
not be running on the copy in flash while trying to update
that copy in the flash.

If you have space for two copies then you can update the
second copy while the primary is online.  This is supported
in the UBI and eMMC layouts upstream.

If you only have flash space for one copy then you have to
arrange for something more limited.  Either way you are 
subject to bricking on interrupted flash unless you do
something exotic like repurpose the host chip as a backup
BMC during the process.   But if its just the feedback
then the upstream code has help that isn't in the Redfish
flow.


====
Once

The "static" mtd layout with phosphor-initfs has support 
for both loading the static flash content into RAM, allowing 
the update to occur with full services running, and as  a 
backup on shutdown it will apply the update on bmc reboot 
by switching back to the initramfs and performing the flash 
from there.  The status of the later update is only visible 
on the console, which might be hidden on an internal serial 
cable by default.

Unfortunately the "prepare for update" method that was in 
the original update instructions and tells the BMC init 
"hey, load all this content into ram, so that you can write 
over the flash" got lost in the "we must be limited to what 
RedFish can support".  The code is still in the low level 
scripts but the fancy rest api is missing.  Also with the 
addition of code verification the actual flash progress 
was hidden.

The phosphor-initfs scripts also allow a new filesystem 
image to be downloaded over the network if you wish to test.
This doesn't have signature checking code, and it can be
disabled by build options.

All of the options to phosphor-initfs can be set by u-boot 
environment variables (one of which is cleared by a systemd
unit each boot, on that is not) and by the kernel command 
line.

Note: I highly suggest not to use image-bmc (for the whole
flash) as this erases the entire flash (although we try to
write back the u-boot environment), but instead use image-kernel, 
image-rofs, etc to allow the prior rwfs and u-boot to persist.
Some bad assertions may have migrated into the code-update 
rest endpoints and we should accept patches.

Bottom Line:

Put the BMC in maintence mode and you can update the image
while the stack is running.  You can then use ssh to 
display the flash progress.  If you need a fancy gui and 
not the internal serial then use the host, or write the 
rest of the graphics stack.

If you need the reliable backout then you need space for 
a second image, even if its smaller due to being emergency
servies only.


PS:  There were some flashes we tried early that had 
horrible erase times -- over 20 minutes for a full
erase.  Check the specs for the parts you provide vs 
others in the market, the better ones erase in a few
minutes.

PPS:  The reason we added UBI was its feature to use
the whole flash for wear leveling (minus the bootloader
that is outside the UBI partition).

=======================================
Twice: Going back to the scenerio again

>I just need to launch a GUI tool with host administrative privileges,
>select the upgrade file, and queue an upgrade to happen when I reboot
>the machine.  I queue the update, start the reboot, and stick around
>to see the upgrade progress on the screen while it's booting back up.
> Because I can see the status on the screen, I know what is happening
>and don't pull the power plug due to only seeing a black screen and
>power LED for 10 minutes.  Finally, the machine loads the OS and I
>verify the new control widget is working properly.

If the gui is on the host, with todays stock phosphor-initfs, you need
1) a connection from the host to the bmc
   ethernet, serial, usb ethernet etc  
   (to copy files from host to BMC RAM and to monitor command output)

2) hardware ability to reboot bmc with host surviving
 - all userspace has to be replaced with those on the filesystem in RAM
 - can be shortened slightly by preloading image in BMC before shuting
   down services if the current kernel is compatible.  This can be the
   old or new image.

 - or -
 
 Boot the host for GUI support with the BMC in an optimized
 update mode.

  This can be before or after the file is downloaded to the
  host.


3) Once the bmc is running from a squashfs in RAM (and if you want
to clean the rwfs overlay, persist on clean reboot/shutdown mode),

- copy the image to the bmc 
- validate as required (preferably somewhere under /run)
- move imgage-rofs , kernel, etc as needed to /run/initramfs
- /run/initramfs/update 
    (which checks the fs is not obviously mounted,
     runs flashcp, which has status on stdout
     moves files successfully written
     and then writes selected overlay content back to rwfs
- check the images were all written
- reboot

=================
Option Three:
This might be a better experience but needs some software work
to enable kexec on the 2500.   


Transfer the FS and kernel to the BMC RAM, and kexec the kernel
(note patches on the list for 2600 need to test and maybe a bit of
coding for the 2500).  Optionally this can contain the virt pnor
image too.  After the BMC boots from the system in RAM boot the
host from vpnor image in RAM then use the host to drive the GUI
to acknoledge and initiate the flash as desired.

The hooks are in phosphor-initfs to flash the image after the 
host is up, and to boot with the image in RAM.  

As an alternative to kexec, if the new file system supports the 
old BMC kernel then the shutdown script can easily be edited to
restart the exec script with the images in /run.  Alternatively 
if the new kernel supports the old user space then it can be 
flashed first, then on the next boot the prior case applies as
it is the updated kernel.  Note: I did this flow several times
in developement but decided not to put code in the shutdown 
script because its a script that is executed from /run/initramfs
and can easily be edited there when alternative flow is required.
(there are comments that show where to edit).


>>> == BMC boot time ==
>>> 
>>> This is self explanatory.  Other vendors' solutions allow the host
>>> to be powered
>>> on within seconds of power application from the wall, and even our
>>> own Kestrel
>>> soft BMC allows the host to begin booting less than 10 seconds
>>> after power is
>>> applied.  Several *minutes* for OpenBMC to reach a point where it
>>> can even
>>> start to boot the host is a major issue outside of datacenter
>>> applications.
>> 
>> Some of this is, to me, an artifact of the Power architecture and
>> not an
>> artifact of OpenBMC explicitly.  On x86 systems we have a little
>> code in
>> u-boot that wiggles a GPIO and gets the Host power sequence going
>> while
>> the BMC is booting up.  This overlaps quite a bit of the memory
>> testing
>> of the Host with the BMC boot time.  The "well-known proprietary
>> BMC"
>> also does this same trick.
>
>I think we're talking about two different well know proprietary BMCs,
>but that's not important for this discussion other than no, the one I
>have in mind doesn't resort to such tricks.  What it does do is start
>up its core services rapidly enough where this isn't a problem, and
>lets the rest of the BMC stack start up at its own pace later on.
> 
>> Power requires the BMC to be up in order to serve out the virtual
>> PNOR,
>> from my recollection.  It seems like this could be solved in other
>> ways,
>> such as a SPI-mux on a physical SPI-NOR so that the BMC can take
>> the NOR
>> at specific times during update but otherwise it is given to the
>> host
>> CPUs.  This is exactly what we do on x86 systems.
>
>Ouch.  So on x86 boxen you might actually have two "BMCs" -- the
>proprietary one inside the CPU that starts in seconds and provides
>base services like SPI Flash mapping to CPU address space, and the
>external OpenBMC one that can run in parallel without interfering
>with host start.  Adding a mux is then a hack needed on top, since
>you can't really communicate with the proprietary stack in the
>required manner.
>

I'd say their cpu doesn't require the bmc to boot, it also means
they trust their system to not melt without bmc monitoring.

>For systems like POWER that lack the proprietary internal "BMC", I
>guess there are a few ways we could address the problem:
>
>1.) Speed up OpenBMC load -- this sounds like it would end up being
>completely supported by one or two vendors alone, and subject to
>breakage from the other vendors that simply don't have any concerns
>around OpenBMC start time since their platforms aren't visibly
>affected by it.  It's also unlikely to come into the desired sub-10s
>range.
>
>2.) Split the BMC into "essential" and "nice to have" services, much
>like the other platforms.  Painful, as it now requires even more
>parts on the mainboard.
>
>3.) Keep the single BMC device, but split it into two software
>stacks, one that can load nearly instantly and start providing
>essential services, and another than can load more slowly.  This
>would effectively require two separate CPUs inside the BMC, which we
>actually do have in the AST2500.  I haven't done any digging though
>to see if the second CPU is powerful enough to implement the HIOMAP
>protocol at speed.
>
>> Having said all of that, there is certainly some performance
>> improvements that can be done, but nobody has taken up the torch on
>> it.
>> A big low-hanging fruit in my mind is the file system compression
>> being
>> xz or gzip is very computationally intensive.  I did some work,
>> with
>> Nick Terrell, to switch to zstd on our systems for both the kernel
>> initramfs and UBI and saw significant boot time improvements.  The
>> upstream enablement for this appears to have landed as of v5.9 so
>> we
>> could certainly start enabling it here now.
>> 
>>
>INVALID URI REMOVED
>linux-2Dkbuild_20200730190841.2071656-2D7-2Dnickrterrell-40gmail.com_
>&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8As
>2xrXeO9ZSo4&m=2O37p_XR8IO9jl4psZwnU-fmhndTW41NpqMXsT9Or6w&s=DF7yGqfSE
>-V5_j_DgmASLOgLpkfjcJpCK5xsJW3avqY&e= 
>> 

In addition to compression options there are tradeoffs on how much is 
copied to ram vs how much is read from the flash possibly repeatedly.
If you add secure boot the time goes up.

>>> == Host boot status indications ==
>>> 
>>> Any ODM that makes server products has had to deal with the
>>> psychological "dead
>>> server effect", where lack of visible progress during boot causes
>>> spurious
>>> callouts / RMAs.  It's even worse on desktop, especially if
>>> server-type
>>> hardware is used inside the machine.  We've worked around this a
>>> few times with
>>> our "IPL observer" services, and really do need this functionality
>>> in OpenBMC.
>>> The current version we have is both front panel lights and a
>>> progress bar on
>>> the BMC boot monitor (VGA/HDMI), and this is something we're
>>> willing to
>>> contribute upstream.
>> 
>> Great!  Let's get that merged!
>
>Sounds good!  The files aren't too complex:
>
>INVALID URI REMOVED
>_git_blackbird-2Dskeleton_tree_pyiplobserver&d=DwIFaQ&c=jf_iaSHvJObTb
>x-siA1ZOg&r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8As2xrXeO9ZSo4&m=2O37p_XR8IO
>9jl4psZwnU-fmhndTW41NpqMXsT9Or6w&s=zLtrjaE2hHjV3z9ar0gcJVvZ9Uzwxinfed
>AOMEWs04s&e= 
>INVALID URI REMOVED
>_git_blackbird-2Dskeleton_tree_pyiplledmonitor&d=DwIFaQ&c=jf_iaSHvJOb
>Tbx-siA1ZOg&r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8As2xrXeO9ZSo4&m=2O37p_XR8
>IO9jl4psZwnU-fmhndTW41NpqMXsT9Or6w&s=AOWB1Ja82thvSZFO81WfIj7MJtg5TeZN
>8wpT_EpG_Zo&e= 
>
>Is the skeleton repository the best place for a merge request?

hmm, as prototype code in python, maybe.   I don't think many current
systems ship python.  Also upstream Yocto removed all support for 
python 2.  

In addition I see a mix of "copy the data" and "transform the data"
in the same script, such as 

updateIPLLeds(self, initial_start, status_changed)

with 
            # Show major ISTEP on LED bank
            # On Talos we only have three LEDs plus a fourth indicator modification 
            # bit, but the major ISTEPs range from 2 to 21
            # Try to condense that down to something more readily displayable


[ After some thought, its ok to be in the output code, as it's 
formatting the data for the display. ]


The upstream post interface logs the post codes, and display is
a separate function.  The ipl_status_monitor seems to mix monitoring 
the port 80 snoops with other logic to determine the system state 
eg is the host up?.

Also both scripts extensivly use popen to handle device communication
and some communication to other services (kill to post code).


>
>> I do think some others have support for a 7-seg display with the
>> postcodes going to it already.  I think this is along those same
>> lines.
>> It might just be another back-end for our existing post code daemon
>> to
>> replicate them to the VGA and/or blink morse code on an LED.
>
>OK, so this is what we ran into before.  Where is this support
>in-tree, and do we need to reimplement our system to match what
>already exists (by extension, extending the other vendor code since
>our observer is more detailed in terms of status etc.), or would we
>be allowed to provide a competing solution to this other support,
>letting ODMs pick which one they wanted?
>

Our upstream code is at https://github.com/openbmc/phosphor-host-postd
for the snoop readers and the LED segment drivers, and the history 
and Dbus owner is https://github.com/openbmc/phosphor-post-code-manager.

To catalog the source of the host and bmc there is
https://github.com/openbmc/phosphor-state-manager/blob/master/obmcutil

In addition to phosphor-misc for "one file projects" there is 
openbmc-tools for handy tools which may be more developer focused.

>>> == IPMI / BMC permissions ==
>>> 
>>> An item that's come up recently is that, at least on our older
>>> OpenBMC versions,
>>> there's a complete disconnect between the BMC's shell user
>>> database and the
>>> IPMI user database.  

Mostly true, in part because the IPMI password for RCMP+ must be
stored on the BMC (reversiably encrypted for our implementation).
Note improper storage of this was an area of one or more CVEs.

In addition it has a limit of 20 characters in a password and 8
users.

>>> Resetting the BMC root password isn't possible from IPMI
>>> on the host, and setting up IPMI doesn't seem possible from the
>>>>BMC shell.  If

In our current code we have pam hooks that save the password 
during a change, if the user is in the ipmi group and the 
password is short enough (or returns an error).

>>> IPMI support is something OpenBMC provides alongside Redfish, it
>>> needs to be
>>> better integrated -- we're dealing with multiple locked-out BMC
>>> issues at the
>>> moment at various customer sites, and the recovery method is
>>> painful at best
>>> when it should be as simple as an ipmitool command from the host
>>> terminal.
>> 
>> I suspect most of this is a matter of IPMI command support and/or
>> enabling
>> those commands to the host IPMI path.  Most of us are fairly
>> untrusting
>> of IPMI (and the Host itself), so there hasn't been work to do
>> anything
>> here.  As long as whatever you're proposing can be disabled for
>> models
>> where we distrust the Host, it seems like these would be accepted
>> as
>> well.


Our current Redfish has multiple users and can enable and 
disable users to have ipmi access and set their password.


Of course this just moves the goal posts to the Redfish 
admin login, but in addition to mTLS certificate based 
trust (which should be customized to the customer), 

Redfish has the concept of a host firmware and os logins
including a binding for EFI to specify adapter path and
network in addition to read-once magic efi variables.  I 
know OpenPOWER boxes don't have EFI but the information
could be exposed in a similar fashion.  As far as I know 
we have not yet implemented these users in our Redfish
server.  



Or designate a physical jumper to tell the BMC to install
a known password.  Where's that turbo button again? :-)

milton


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: OpenBMC on RCS platforms
  2021-04-26 21:42     ` Milton Miller II
@ 2021-04-28 20:21       ` Timothy Pearson
  2021-04-28 21:24         ` OpenBMC on RCS platforms - remote media Joseph Reynolds
  2021-04-29  7:54       ` OpenBMC on RCS platforms Milton Miller II
  1 sibling, 1 reply; 11+ messages in thread
From: Timothy Pearson @ 2021-04-28 20:21 UTC (permalink / raw)
  To: Milton Miller II; +Cc: openbmc



----- Original Message -----
> From: "Milton Miller II" <miltonm@us.ibm.com>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "Patrick Williams" <patrick@stwcx.xyz>, "openbmc" <openbmc@lists.ozlabs.org>
> Sent: Monday, April 26, 2021 4:42:16 PM
> Subject: RE: OpenBMC on RCS platforms

[snip]

>>At first glance, that's another overly complex solution for a simple
>>problem that would cause a degraded user experience vs. other
>>platforms.
>>
> 
> I have to agree, both overly complex and probably not useful in that
> its just a port interface for control.
> 
>>We have an 800Mhz Linux-based computer with 512MB of RAM, serial and
>>video out support already integrated into every one of our products.
>>It can receive data via PCIe and via USB from an active host.  Why
>>isn't there a mechanism to send a signed container to it over one of
>>these existing channels for self-update?
>>
>>A potential user story looks like this:
>>
>>=====
>>
>>I want to update the firmware on my Blackbird desktop to fix a
>>problem I'm having with a new control widget I've plugged in.  To
>>make things more interesting, I'm on an oil rig in the Gulf, and the
>>desktop only connects via intermittent WiFi.  Spare parts are weeks
>>away, and I have next to no electronic diagnostic equipment available
>>to me.  There's one or two USB ports I can normally use because I
>>have administrative privileges, but I was able to grab the upgrade
>>file over WiFi instead, saving myself some time cleaning accumulated
>>gunk out of the ports.
>>
>>I can update my <large vendor> standard PC firmware just by running a
>>tool on Windows, but the Blackbird was selected because it controls a
>>critical process that needed to be malware-resistant.
>>
>>Fortunately, OpenBMC implemented a quality firmware update process.
>>I just need to launch a GUI tool with host administrative privileges,
>>select the upgrade file, and queue an upgrade to happen when I reboot
>>the machine.  I queue the update, start the reboot, and stick around
>>to see the upgrade progress on the screen while it's booting back up.
>> Because I can see the status on the screen, I know what is happening
>>and don't pull the power plug due to only seeing a black screen and
>>power LED for 10 minutes.  Finally, the machine loads the OS and I
>>verify the new control widget is working properly.
>>
>>=====
>>
>>Is there a technical / architectural reason this can't be done, or
>>some other reason it's a bad idea?
>>
> 
> I ended up writing this twice or thrice.  Also what I call
> phosphor-initfs is actually the package obmc-phosphor-initfs.bb
> found in meta-phosphor/recipies-phosphor/initrdscripts/.
> 
> 
> There are two issues.  One is that there is no graphics
> library or console code for the aspeed bmc.  I understand a
> text rendering library was added for boot monitoring). But
> if you are starting from the host up, then use the host to
> drive the GUI and just establish a command session (network,
> USB to host, or serial).
> 
> The biggest limitation is we use squashfs for file system
> for space efficency.  This is a read-only filesystem that
> contains references between different pieces that is loaded
> and decompressed by the kernel on demand.  That means you can
> not be running on the copy in flash while trying to update
> that copy in the flash.
> 
> If you have space for two copies then you can update the
> second copy while the primary is online.  This is supported
> in the UBI and eMMC layouts upstream.
> 
> If you only have flash space for one copy then you have to
> arrange for something more limited.  Either way you are
> subject to bricking on interrupted flash unless you do
> something exotic like repurpose the host chip as a backup
> BMC during the process.   But if its just the feedback
> then the upstream code has help that isn't in the Redfish
> flow.

Most of these systems also have a significant amount of RAM available, enough to hold both the update file and the existing BMC Flash contents while the system remains online.  Is there any way we could copy the existing Flash into RAM, then "pivot" the running system to use the copy in RAM as the backing store?

Bricking on power cut is, well, expected during a BMC update without a backup Flash chip.  Not cutting power during a low level firmware update is I think still ingrained sufficiently in the average PC users psyche not to be a significant issue, especially if several warnings are given before and during the update process regarding ensuring power is not cut.  Even if it is cut, the BMC Flash is socketed for a reason.

All that said, ideally, longer term, a recovery partition could be added to the Flash -- basically, a normal BMC update would only update the rofs partition, leaving u-boot, kernel, and the recovery partition alone.  The recovery partition would contain a very small userspace, just enough to accept some kind of network connection for e.g. TFTP upload of a new firmware (similar to how various embedded devices and even small PCs can be recovered).

> 
> ====
> Once
> 
> The "static" mtd layout with phosphor-initfs has support
> for both loading the static flash content into RAM, allowing
> the update to occur with full services running, and as  a
> backup on shutdown it will apply the update on bmc reboot
> by switching back to the initramfs and performing the flash
> from there.  The status of the later update is only visible
> on the console, which might be hidden on an internal serial
> cable by default.
> 
> Unfortunately the "prepare for update" method that was in
> the original update instructions and tells the BMC init
> "hey, load all this content into ram, so that you can write
> over the flash" got lost in the "we must be limited to what
> RedFish can support".  The code is still in the low level
> scripts but the fancy rest api is missing.  Also with the
> addition of code verification the actual flash progress
> was hidden.
> 
> The phosphor-initfs scripts also allow a new filesystem
> image to be downloaded over the network if you wish to test.
> This doesn't have signature checking code, and it can be
> disabled by build options.
> 
> All of the options to phosphor-initfs can be set by u-boot
> environment variables (one of which is cleared by a systemd
> unit each boot, on that is not) and by the kernel command
> line.
> 
> Note: I highly suggest not to use image-bmc (for the whole
> flash) as this erases the entire flash (although we try to
> write back the u-boot environment), but instead use image-kernel,
> image-rofs, etc to allow the prior rwfs and u-boot to persist.
> Some bad assertions may have migrated into the code-update
> rest endpoints and we should accept patches.
> 
> Bottom Line:
> 
> Put the BMC in maintence mode and you can update the image
> while the stack is running.  You can then use ssh to
> display the flash progress.  If you need a fancy gui and
> not the internal serial then use the host, or write the
> rest of the graphics stack.

That's all over external network again, though.  Point is we want to do this from the host -- the host in general is unable to connect to the BMC when the BMC is piggybacking on a host network port (all of our products do this, and a lot of other vendors use the same design).

If we were assured of external BMC network access, updates become very simple.  In this kind of deployment though, there is no external network access to the BMC.

> If you need the reliable backout then you need space for
> a second image, even if its smaller due to being emergency
> servies only.
> 
> 
> PS:  There were some flashes we tried early that had
> horrible erase times -- over 20 minutes for a full
> erase.  Check the specs for the parts you provide vs
> others in the market, the better ones erase in a few
> minutes.

We use the better-specced ones for both BMC and PNOR.

> PPS:  The reason we added UBI was its feature to use
> the whole flash for wear leveling (minus the bootloader
> that is outside the UBI partition).
> 
> =======================================
> Twice: Going back to the scenerio again
> 
>>I just need to launch a GUI tool with host administrative privileges,
>>select the upgrade file, and queue an upgrade to happen when I reboot
>>the machine.  I queue the update, start the reboot, and stick around
>>to see the upgrade progress on the screen while it's booting back up.
>> Because I can see the status on the screen, I know what is happening
>>and don't pull the power plug due to only seeing a black screen and
>>power LED for 10 minutes.  Finally, the machine loads the OS and I
>>verify the new control widget is working properly.
> 
> If the gui is on the host, with todays stock phosphor-initfs, you need
> 1) a connection from the host to the bmc
>   ethernet, serial, usb ethernet etc
>   (to copy files from host to BMC RAM and to monitor command output)

Precisely.  USB would be an interesting control channel, but I don't think OpenBMC currently supports this kind of access?

> 2) hardware ability to reboot bmc with host surviving
> - all userspace has to be replaced with those on the filesystem in RAM
> - can be shortened slightly by preloading image in BMC before shuting
>   down services if the current kernel is compatible.  This can be the
>   old or new image.
> 
> - or -
> 
> Boot the host for GUI support with the BMC in an optimized
> update mode.
> 
>  This can be before or after the file is downloaded to the
>  host.
> 
> 
> 3) Once the bmc is running from a squashfs in RAM (and if you want
> to clean the rwfs overlay, persist on clean reboot/shutdown mode),
> 
> - copy the image to the bmc
> - validate as required (preferably somewhere under /run)
> - move imgage-rofs , kernel, etc as needed to /run/initramfs
> - /run/initramfs/update
>    (which checks the fs is not obviously mounted,
>     runs flashcp, which has status on stdout
>     moves files successfully written
>     and then writes selected overlay content back to rwfs
> - check the images were all written
> - reboot
> 
> =================
> Option Three:
> This might be a better experience but needs some software work
> to enable kexec on the 2500.
> 
> 
> Transfer the FS and kernel to the BMC RAM, and kexec the kernel
> (note patches on the list for 2600 need to test and maybe a bit of
> coding for the 2500).  Optionally this can contain the virt pnor
> image too.  After the BMC boots from the system in RAM boot the
> host from vpnor image in RAM then use the host to drive the GUI
> to acknoledge and initiate the flash as desired.
> 
> The hooks are in phosphor-initfs to flash the image after the
> host is up, and to boot with the image in RAM.
> 
> As an alternative to kexec, if the new file system supports the
> old BMC kernel then the shutdown script can easily be edited to
> restart the exec script with the images in /run.  Alternatively
> if the new kernel supports the old user space then it can be
> flashed first, then on the next boot the prior case applies as
> it is the updated kernel.  Note: I did this flow several times
> in developement but decided not to put code in the shutdown
> script because its a script that is executed from /run/initramfs
> and can easily be edited there when alternative flow is required.
> (there are comments that show where to edit).
> 
> 
>>>> == BMC boot time ==
>>>> 
>>>> This is self explanatory.  Other vendors' solutions allow the host
>>>> to be powered
>>>> on within seconds of power application from the wall, and even our
>>>> own Kestrel
>>>> soft BMC allows the host to begin booting less than 10 seconds
>>>> after power is
>>>> applied.  Several *minutes* for OpenBMC to reach a point where it
>>>> can even
>>>> start to boot the host is a major issue outside of datacenter
>>>> applications.
>>> 
>>> Some of this is, to me, an artifact of the Power architecture and
>>> not an
>>> artifact of OpenBMC explicitly.  On x86 systems we have a little
>>> code in
>>> u-boot that wiggles a GPIO and gets the Host power sequence going
>>> while
>>> the BMC is booting up.  This overlaps quite a bit of the memory
>>> testing
>>> of the Host with the BMC boot time.  The "well-known proprietary
>>> BMC"
>>> also does this same trick.
>>
>>I think we're talking about two different well know proprietary BMCs,
>>but that's not important for this discussion other than no, the one I
>>have in mind doesn't resort to such tricks.  What it does do is start
>>up its core services rapidly enough where this isn't a problem, and
>>lets the rest of the BMC stack start up at its own pace later on.
>> 
>>> Power requires the BMC to be up in order to serve out the virtual
>>> PNOR,
>>> from my recollection.  It seems like this could be solved in other
>>> ways,
>>> such as a SPI-mux on a physical SPI-NOR so that the BMC can take
>>> the NOR
>>> at specific times during update but otherwise it is given to the
>>> host
>>> CPUs.  This is exactly what we do on x86 systems.
>>
>>Ouch.  So on x86 boxen you might actually have two "BMCs" -- the
>>proprietary one inside the CPU that starts in seconds and provides
>>base services like SPI Flash mapping to CPU address space, and the
>>external OpenBMC one that can run in parallel without interfering
>>with host start.  Adding a mux is then a hack needed on top, since
>>you can't really communicate with the proprietary stack in the
>>required manner.
>>
> 
> I'd say their cpu doesn't require the bmc to boot, it also means
> they trust their system to not melt without bmc monitoring.

I'd argue it's really a bit of semantics. :)  x86 systems have a sort of proto-BMC built right in to every single CPU, in the form of the ME/PSP and its associated firmware, that can provide various functions including (IIRC) thermal control.  On the ARM side, you're probably right, they're a bit more primitive in terms of just mapping Flash directly to the CPU address space on low end parts, though I think (?) the modern higher end parts are back to a sort of "security manager" BMC-analogue providing these basic services to the host CPU.

Regardless, POWER does stick out like a sore thumb for shoving these low level functions into the high level "full-stack" BMC.  Architecturally, it may not have been the best decision, but I do understand it sped time to market etc.   Fortunately, it's also something we can work to fix.

>>For systems like POWER that lack the proprietary internal "BMC", I
>>guess there are a few ways we could address the problem:
>>
>>1.) Speed up OpenBMC load -- this sounds like it would end up being
>>completely supported by one or two vendors alone, and subject to
>>breakage from the other vendors that simply don't have any concerns
>>around OpenBMC start time since their platforms aren't visibly
>>affected by it.  It's also unlikely to come into the desired sub-10s
>>range.
>>
>>2.) Split the BMC into "essential" and "nice to have" services, much
>>like the other platforms.  Painful, as it now requires even more
>>parts on the mainboard.
>>
>>3.) Keep the single BMC device, but split it into two software
>>stacks, one that can load nearly instantly and start providing
>>essential services, and another than can load more slowly.  This
>>would effectively require two separate CPUs inside the BMC, which we
>>actually do have in the AST2500.  I haven't done any digging though
>>to see if the second CPU is powerful enough to implement the HIOMAP
>>protocol at speed.
>>
>>> Having said all of that, there is certainly some performance
>>> improvements that can be done, but nobody has taken up the torch on
>>> it.
>>> A big low-hanging fruit in my mind is the file system compression
>>> being
>>> xz or gzip is very computationally intensive.  I did some work,
>>> with
>>> Nick Terrell, to switch to zstd on our systems for both the kernel
>>> initramfs and UBI and saw significant boot time improvements.  The
>>> upstream enablement for this appears to have landed as of v5.9 so
>>> we
>>> could certainly start enabling it here now.
>>> 
>>>
>>INVALID URI REMOVED
>>linux-2Dkbuild_20200730190841.2071656-2D7-2Dnickrterrell-40gmail.com_
>>&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8As
>>2xrXeO9ZSo4&m=2O37p_XR8IO9jl4psZwnU-fmhndTW41NpqMXsT9Or6w&s=DF7yGqfSE
>>-V5_j_DgmASLOgLpkfjcJpCK5xsJW3avqY&e=
>>> 
> 
> In addition to compression options there are tradeoffs on how much is
> copied to ram vs how much is read from the flash possibly repeatedly.
> If you add secure boot the time goes up.

Yeah, I'm really coming around to the idea that we need to embrace the split architecture every other system uses.  The LPC bridge and base power / fan controls really should be running independently on the ColdFire core, not on the main "full stack" BMC ARM core, and even for Kestrel we're exploring something similar (though in that case, it's mainly so that the host doesn't die if we accidentally crash the main CPU).

>>>> == Host boot status indications ==
>>>> 
>>>> Any ODM that makes server products has had to deal with the
>>>> psychological "dead
>>>> server effect", where lack of visible progress during boot causes
>>>> spurious
>>>> callouts / RMAs.  It's even worse on desktop, especially if
>>>> server-type
>>>> hardware is used inside the machine.  We've worked around this a
>>>> few times with
>>>> our "IPL observer" services, and really do need this functionality
>>>> in OpenBMC.
>>>> The current version we have is both front panel lights and a
>>>> progress bar on
>>>> the BMC boot monitor (VGA/HDMI), and this is something we're
>>>> willing to
>>>> contribute upstream.
>>> 
>>> Great!  Let's get that merged!
>>
>>Sounds good!  The files aren't too complex:
>>
>>INVALID URI REMOVED
>>_git_blackbird-2Dskeleton_tree_pyiplobserver&d=DwIFaQ&c=jf_iaSHvJObTb
>>x-siA1ZOg&r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8As2xrXeO9ZSo4&m=2O37p_XR8IO
>>9jl4psZwnU-fmhndTW41NpqMXsT9Or6w&s=zLtrjaE2hHjV3z9ar0gcJVvZ9Uzwxinfed
>>AOMEWs04s&e=
>>INVALID URI REMOVED
>>_git_blackbird-2Dskeleton_tree_pyiplledmonitor&d=DwIFaQ&c=jf_iaSHvJOb
>>Tbx-siA1ZOg&r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8As2xrXeO9ZSo4&m=2O37p_XR8
>>IO9jl4psZwnU-fmhndTW41NpqMXsT9Or6w&s=AOWB1Ja82thvSZFO81WfIj7MJtg5TeZN
>>8wpT_EpG_Zo&e=
>>
>>Is the skeleton repository the best place for a merge request?
> 
> hmm, as prototype code in python, maybe.   I don't think many current
> systems ship python.  Also upstream Yocto removed all support for
> python 2.
> 
> In addition I see a mix of "copy the data" and "transform the data"
> in the same script, such as
> 
> updateIPLLeds(self, initial_start, status_changed)
> 
> with
>            # Show major ISTEP on LED bank
>            # On Talos we only have three LEDs plus a fourth indicator modification
>            # bit, but the major ISTEPs range from 2 to 21
>            # Try to condense that down to something more readily displayable
> 
> 
> [ After some thought, its ok to be in the output code, as it's
> formatting the data for the display. ]
> 
> 
> The upstream post interface logs the post codes, and display is
> a separate function.  The ipl_status_monitor seems to mix monitoring
> the port 80 snoops with other logic to determine the system state
> eg is the host up?.
> 
> Also both scripts extensivly use popen to handle device communication
> and some communication to other services (kill to post code).
> 
> 
>>
>>> I do think some others have support for a 7-seg display with the
>>> postcodes going to it already.  I think this is along those same
>>> lines.
>>> It might just be another back-end for our existing post code daemon
>>> to
>>> replicate them to the VGA and/or blink morse code on an LED.
>>
>>OK, so this is what we ran into before.  Where is this support
>>in-tree, and do we need to reimplement our system to match what
>>already exists (by extension, extending the other vendor code since
>>our observer is more detailed in terms of status etc.), or would we
>>be allowed to provide a competing solution to this other support,
>>letting ODMs pick which one they wanted?
>>
> 
> Our upstream code is at https://github.com/openbmc/phosphor-host-postd
> for the snoop readers and the LED segment drivers, and the history
> and Dbus owner is https://github.com/openbmc/phosphor-post-code-manager.
> 
> To catalog the source of the host and bmc there is
> https://github.com/openbmc/phosphor-state-manager/blob/master/obmcutil
> 
> In addition to phosphor-misc for "one file projects" there is
> openbmc-tools for handy tools which may be more developer focused.

So it sounds like we'd need to rewrite this as a set of patches for phosphor-post-code-manager?  Would they actually be merged or would we run into resistance to extending the functionality of that system for our use case?

>>>> == IPMI / BMC permissions ==
>>>> 
>>>> An item that's come up recently is that, at least on our older
>>>> OpenBMC versions,
>>>> there's a complete disconnect between the BMC's shell user
>>>> database and the
>>>> IPMI user database.
> 
> Mostly true, in part because the IPMI password for RCMP+ must be
> stored on the BMC (reversiably encrypted for our implementation).
> Note improper storage of this was an area of one or more CVEs.
> 
> In addition it has a limit of 20 characters in a password and 8
> users.
> 
>>>> Resetting the BMC root password isn't possible from IPMI
>>>> on the host, and setting up IPMI doesn't seem possible from the
>>>>>BMC shell.  If
> 
> In our current code we have pam hooks that save the password
> during a change, if the user is in the ipmi group and the
> password is short enough (or returns an error).
> 
>>>> IPMI support is something OpenBMC provides alongside Redfish, it
>>>> needs to be
>>>> better integrated -- we're dealing with multiple locked-out BMC
>>>> issues at the
>>>> moment at various customer sites, and the recovery method is
>>>> painful at best
>>>> when it should be as simple as an ipmitool command from the host
>>>> terminal.
>>> 
>>> I suspect most of this is a matter of IPMI command support and/or
>>> enabling
>>> those commands to the host IPMI path.  Most of us are fairly
>>> untrusting
>>> of IPMI (and the Host itself), so there hasn't been work to do
>>> anything
>>> here.  As long as whatever you're proposing can be disabled for
>>> models
>>> where we distrust the Host, it seems like these would be accepted
>>> as
>>> well.
> 
> 
> Our current Redfish has multiple users and can enable and
> disable users to have ipmi access and set their password.
> 
> 
> Of course this just moves the goal posts to the Redfish
> admin login, but in addition to mTLS certificate based
> trust (which should be customized to the customer),
> 
> Redfish has the concept of a host firmware and os logins
> including a binding for EFI to specify adapter path and
> network in addition to read-once magic efi variables.  I
> know OpenPOWER boxes don't have EFI but the information
> could be exposed in a similar fashion.  As far as I know
> we have not yet implemented these users in our Redfish
> server.

Honestly Redfish is something that we might just want to move to, and officially / formally drop network IPMI support.  Probably the biggest issue with that comes right back down to needing communication between the host and BMC, however -- ipmitool shortcuts the whole BMC/host network isolation problem (described above) by using the USB interface.  Is there a way to use Redfish over USB in a similar manner?

Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: OpenBMC on RCS platforms - remote media
  2021-04-28 20:21       ` Timothy Pearson
@ 2021-04-28 21:24         ` Joseph Reynolds
  2021-06-03 12:29           ` Konstantin Klubnichkin
  0 siblings, 1 reply; 11+ messages in thread
From: Joseph Reynolds @ 2021-04-28 21:24 UTC (permalink / raw)
  To: Timothy Pearson, Milton Miller II; +Cc: openbmc

On 4/28/21 3:21 PM, Timothy Pearson wrote:
>
> ----- Original Message -----
>> From: "Milton Miller II" <miltonm@us.ibm.com>
>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>> Cc: "Patrick Williams" <patrick@stwcx.xyz>, "openbmc" <openbmc@lists.ozlabs.org>
>> Sent: Monday, April 26, 2021 4:42:16 PM
>> Subject: RE: OpenBMC on RCS platforms
> [snip]
>
>

...snip...

>>> I just need to launch a GUI tool with host administrative privileges,
>>> select the upgrade file, and queue an upgrade to happen when I reboot
>>> the machine.  I queue the update, start the reboot, and stick around
>>> to see the upgrade progress on the screen while it's booting back up.
>>> Because I can see the status on the screen, I know what is happening
>>> and don't pull the power plug due to only seeing a black screen and
>>> power LED for 10 minutes.  Finally, the machine loads the OS and I
>>> verify the new control widget is working properly.
>> If the gui is on the host, with todays stock phosphor-initfs, you need
>> 1) a connection from the host to the bmc
>>    ethernet, serial, usb ethernet etc
>>    (to copy files from host to BMC RAM and to monitor command output)
> Precisely.  USB would be an interesting control channel, but I don't think OpenBMC currently supports this kind of access?

If (if) I am following correctly, you want the OpenBMC virtual media 
(aka remote media) implementation?
https://github.com/openbmc/docs/blob/master/designs/virtual-media.md

Is there an implementation?  I didn't find one listed here:
https://github.com/openbmc/docs/blob/master/features.md

- Joseph


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: OpenBMC on RCS platforms
  2021-04-26 21:42     ` Milton Miller II
  2021-04-28 20:21       ` Timothy Pearson
@ 2021-04-29  7:54       ` Milton Miller II
  1 sibling, 0 replies; 11+ messages in thread
From: Milton Miller II @ 2021-04-29  7:54 UTC (permalink / raw)
  To: Timothy Pearson; +Cc: openbmc



-----Timothy Pearson <tpearson@raptorengineering.com> wrote: -----

>To: Milton Miller II <miltonm@us.ibm.com>
>From: Timothy Pearson <tpearson@raptorengineering.com>
>Date: 04/28/2021 03:22PM
>Cc: Patrick Williams <patrick@stwcx.xyz>, openbmc
><openbmc@lists.ozlabs.org>
>Subject: [EXTERNAL] Re: OpenBMC on RCS platforms
>
>
>----- Original Message -----
>> From: "Milton Miller II" <miltonm@us.ibm.com>
>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>> Cc: "Patrick Williams" <patrick@stwcx.xyz>, "openbmc"
><openbmc@lists.ozlabs.org>
>> Sent: Monday, April 26, 2021 4:42:16 PM
>> Subject: RE: OpenBMC on RCS platforms
>
>[snip]
>
>>>At first glance, that's another overly complex solution for a
>>>simple
>>>problem that would cause a degraded user experience vs. other
>>>platforms.
>>>
>> 
>> I have to agree, both overly complex and probably not useful in
>> that
>> its just a port interface for control.
>> 
>>>We have an 800Mhz Linux-based computer with 512MB of RAM, serial
>>>and
>>>video out support already integrated into every one of our
>>>products.
>>>It can receive data via PCIe and via USB from an active host.  Why
>>>isn't there a mechanism to send a signed container to it over one
>>>of
>>>these existing channels for self-update?
>>>
>>>A potential user story looks like this:
>>>
>>>=====
>>>
>>>I want to update the firmware on my Blackbird desktop to fix a
>>>problem I'm having with a new control widget I've plugged in.  To
>>>make things more interesting, I'm on an oil rig in the Gulf, and
>>>the
>>>desktop only connects via intermittent WiFi.  Spare parts are weeks
>>>away, and I have next to no electronic diagnostic equipment
>>>available
>>>to me.  There's one or two USB ports I can normally use because I
>>>have administrative privileges, but I was able to grab the upgrade
>>>file over WiFi instead, saving myself some time cleaning
>>>accumulated
>>>gunk out of the ports.
>>>
>>>I can update my <large vendor> standard PC firmware just by running
>>>a
>>>tool on Windows, but the Blackbird was selected because it controls
>>>a
>>>critical process that needed to be malware-resistant.
>>>
>>>Fortunately, OpenBMC implemented a quality firmware update process.
>>>I just need to launch a GUI tool with host administrative
>>>privileges,
>>>select the upgrade file, and queue an upgrade to happen when I
>>>reboot
>>>the machine.  I queue the update, start the reboot, and stick
>>>around
>>>to see the upgrade progress on the screen while it's booting back
>>>up.
>>> Because I can see the status on the screen, I know what is
>>>happening
>>>and don't pull the power plug due to only seeing a black screen and
>>>power LED for 10 minutes.  Finally, the machine loads the OS and I
>>>verify the new control widget is working properly.
>>>
>>>=====
>>>
>>>Is there a technical / architectural reason this can't be done, or
>>>some other reason it's a bad idea?
>>>
>> 
>> I ended up writing this twice or thrice.  Also what I call
>> phosphor-initfs is actually the package obmc-phosphor-initfs.bb
>> found in meta-phosphor/recipies-phosphor/initrdscripts/.
>> 
>> 
>> There are two issues.  One is that there is no graphics
>> library or console code for the aspeed bmc.  I understand a
>> text rendering library was added for boot monitoring). But
>> if you are starting from the host up, then use the host to
>> drive the GUI and just establish a command session (network,
>> USB to host, or serial).
>> 
>> The biggest limitation is we use squashfs for file system
>> for space efficency.  This is a read-only filesystem that
>> contains references between different pieces that is loaded
>> and decompressed by the kernel on demand.  That means you can
>> not be running on the copy in flash while trying to update
>> that copy in the flash.
>> 
>> If you have space for two copies then you can update the
>> second copy while the primary is online.  This is supported
>> in the UBI and eMMC layouts upstream.
>> 
>> If you only have flash space for one copy then you have to
>> arrange for something more limited.  Either way you are
>> subject to bricking on interrupted flash unless you do
>> something exotic like repurpose the host chip as a backup
>> BMC during the process.   But if its just the feedback
>> then the upstream code has help that isn't in the Redfish
>> flow.
>
>Most of these systems also have a significant amount of RAM
>available, enough to hold both the update file and the existing BMC
>Flash contents while the system remains online.  Is there any way we
>could copy the existing Flash into RAM, then "pivot" the running
>system to use the copy in RAM as the backing store?

[See also the Thrice description ...]

There is no version of filesystem that I am aware of that 
says "instead of using layer x, start using layer y that 
will have the same content".

The existing init script has a config option to copy the 
contents from the flash to RAM then loop mount the file.  
Of course this will likely increase the boot time because 
all content had to be copied from the flash before starting 
any userspace from the volume.   Also the copy uses all
space alloced to the rofs layer; it is not smart enough to
only copy the length of the squashfs contents even though 
that is in the filesystem header.

Thinking a bit this evening, squashfs uses a block device 
for storage so one could use DM to create a 1-member 
degraded raid1 on the mtdblock device, and add a ramdisk 
block drive (rd) as the mirror.  The ramdisk can be added 
as a degraded volume after boot to avoid having the kernel
spending time coping the data instead of starting the real
userspace.  After the rd copy is synced, one could remove 
the mtdblock volume from the raid1.

This requires access to dm-tools to setup the raid unless 
the in-kernel raid metadata would work on a mtdblock 
volume.   The md layerprobably wants to update the 
superblock of the good volume or something.

The above got the rofs, but didn't address the rwfs.  We
use jffs2 today.  While there are other options, the mtd
writable filesystems understand the large erase blocks but
the general block device file systems do not.

The existing init does have an option to copy designated 
files from the host to a tmpfs, and from the tmpfs back 
to the rwfs.  It also can erase the rwfs partition.  This
can be used for freeing the rwfs space during the firmware
update but on an abnormal shutdown the updates to the
rwfs are lost, be they logs or configuration updates.


>
>Bricking on power cut is, well, expected during a BMC update without
>a backup Flash chip.  Not cutting power during a low level firmware
>update is I think still ingrained sufficiently in the average PC
>users psyche not to be a significant issue, especially if several
>warnings are given before and during the update process regarding
>ensuring power is not cut.  Even if it is cut, the BMC Flash is
>socketed for a reason.
>
>All that said, ideally, longer term, a recovery partition could be
>added to the Flash -- basically, a normal BMC update would only
>update the rofs partition, leaving u-boot, kernel, and the recovery
>partition alone.  The recovery partition would contain a very small
>userspace, just enough to accept some kind of network connection for
>e.g. TFTP upload of a new firmware (similar to how various embedded
>devices and even small PCs can be recovered).
>
>> 
>> ====
>> Once
>> 
>> The "static" mtd layout with phosphor-initfs has support
>> for both loading the static flash content into RAM, allowing
>> the update to occur with full services running, and as  a
>> backup on shutdown it will apply the update on bmc reboot
>> by switching back to the initramfs and performing the flash
>> from there.  The status of the later update is only visible
>> on the console, which might be hidden on an internal serial
>> cable by default.
>> 
>> Unfortunately the "prepare for update" method that was in
>> the original update instructions and tells the BMC init
>> "hey, load all this content into ram, so that you can write
>> over the flash" got lost in the "we must be limited to what
>> RedFish can support".  The code is still in the low level
>> scripts but the fancy rest api is missing.  Also with the
>> addition of code verification the actual flash progress
>> was hidden.
>> 
>> The phosphor-initfs scripts also allow a new filesystem
>> image to be downloaded over the network if you wish to test.
>> This doesn't have signature checking code, and it can be
>> disabled by build options.
>> 
>> All of the options to phosphor-initfs can be set by u-boot
>> environment variables (one of which is cleared by a systemd
>> unit each boot, on that is not) and by the kernel command
>> line.
>> 
>> Note: I highly suggest not to use image-bmc (for the whole
>> flash) as this erases the entire flash (although we try to
>> write back the u-boot environment), but instead use image-kernel,
>> image-rofs, etc to allow the prior rwfs and u-boot to persist.
>> Some bad assertions may have migrated into the code-update
>> rest endpoints and we should accept patches.
>> 
>> Bottom Line:
>> 
>> Put the BMC in maintence mode and you can update the image
>> while the stack is running.  You can then use ssh to
>> display the flash progress.  If you need a fancy gui and
>> not the internal serial then use the host, or write the
>> rest of the graphics stack.
>
>That's all over external network again, though.  Point is we want to
>do this from the host -- the host in general is unable to connect to
>the BMC when the BMC is piggybacking on a host network port (all of
>our products do this, and a lot of other vendors use the same
>design).

Well, Intel i210 has a bmc controlled mode to control if the host
can see the network, the bmc, or both.   However, it also allows 
the bmc to redirect any traffic to itself, so that is another can
of worms.

Point is, can your customized firmware add BMC to Host networking?

>
>If we were assured of external BMC network access, updates become
>very simple.  In this kind of deployment though, there is no external
>network access to the BMC.
>
>> If you need the reliable backout then you need space for
>> a second image, even if its smaller due to being emergency
>> servies only.
>> 
>> 
>> PS:  There were some flashes we tried early that had
>> horrible erase times -- over 20 minutes for a full
>> erase.  Check the specs for the parts you provide vs
>> others in the market, the better ones erase in a few
>> minutes.
>
>We use the better-specced ones for both BMC and PNOR.
>
>> PPS:  The reason we added UBI was its feature to use
>> the whole flash for wear leveling (minus the bootloader
>> that is outside the UBI partition).
>> 
>> =======================================
>> Twice: Going back to the scenerio again
>> 
>>>I just need to launch a GUI tool with host administrative
>privileges,
>>>select the upgrade file, and queue an upgrade to happen when I
>reboot
>>>the machine.  I queue the update, start the reboot, and stick
>around
>>>to see the upgrade progress on the screen while it's booting back
>up.
>>> Because I can see the status on the screen, I know what is
>happening
>>>and don't pull the power plug due to only seeing a black screen and
>>>power LED for 10 minutes.  Finally, the machine loads the OS and I
>>>verify the new control widget is working properly.
>> 
>> If the gui is on the host, with todays stock phosphor-initfs, you
>need
>> 1) a connection from the host to the bmc
>>   ethernet, serial, usb ethernet etc
>>   (to copy files from host to BMC RAM and to monitor command
>output)
>
>Precisely.  USB would be an interesting control channel, but I don't
>think OpenBMC currently supports this kind of access?
>

Actually the current usb-ctrl script has an option to configure the
ecm gadget, and there are patches to update the script to use 
defined mac addresses.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/Kconfig?h=v5.12#n281
https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-misc/+/42280/

>> 2) hardware ability to reboot bmc with host surviving
>> - all userspace has to be replaced with those on the filesystem in
>> RAM
>> - can be shortened slightly by preloading image in BMC before
>> shuting
>>   down services if the current kernel is compatible.  This can be
>> the
>>   old or new image.
>> 
>> - or -
>> 
>> Boot the host for GUI support with the BMC in an optimized
>> update mode.
>> 
>>  This can be before or after the file is downloaded to the
>>  host.
>> 
>> 
>> 3) Once the bmc is running from a squashfs in RAM (and if you want
>> to clean the rwfs overlay, persist on clean reboot/shutdown mode),
>> 
>> - copy the image to the bmc
>> - validate as required (preferably somewhere under /run)
>> - move imgage-rofs , kernel, etc as needed to /run/initramfs
>> - /run/initramfs/update
>>    (which checks the fs is not obviously mounted,
>>     runs flashcp, which has status on stdout
>>     moves files successfully written
>>     and then writes selected overlay content back to rwfs
>> - check the images were all written
>> - reboot
>> 
>> =================
>> Option Three:
>> This might be a better experience but needs some software work
>> to enable kexec on the 2500.
>> 
>> 
>> Transfer the FS and kernel to the BMC RAM, and kexec the kernel
>> (note patches on the list for 2600 need to test and maybe a bit of
>> coding for the 2500).  Optionally this can contain the virt pnor
>> image too.  After the BMC boots from the system in RAM boot the
>> host from vpnor image in RAM then use the host to drive the GUI
>> to acknoledge and initiate the flash as desired.
>> 
>> The hooks are in phosphor-initfs to flash the image after the
>> host is up, and to boot with the image in RAM.
>> 
>> As an alternative to kexec, if the new file system supports the
>> old BMC kernel then the shutdown script can easily be edited to
>> restart the exec script with the images in /run.  Alternatively
>> if the new kernel supports the old user space then it can be
>> flashed first, then on the next boot the prior case applies as
>> it is the updated kernel.  Note: I did this flow several times
>> in developement but decided not to put code in the shutdown
>> script because its a script that is executed from /run/initramfs
>> and can easily be edited there when alternative flow is required.
>> (there are comments that show where to edit).
>> 
>> 
>>>>> == BMC boot time ==
>>>>> 
>>>>> This is self explanatory.  Other vendors' solutions allow the
>host
>>>>> to be powered
>>>>> on within seconds of power application from the wall, and even
>>>>> our
>>>>> own Kestrel
>>>>> soft BMC allows the host to begin booting less than 10 seconds
>>>>> after power is
>>>>> applied.  Several *minutes* for OpenBMC to reach a point where
>>>>> it
>>>>> can even
>>>>> start to boot the host is a major issue outside of datacenter
>>>>> applications.
>>>> 
>>>> Some of this is, to me, an artifact of the Power architecture and
>>>> not an
>>>> artifact of OpenBMC explicitly.  On x86 systems we have a little
>>>> code in
>>>> u-boot that wiggles a GPIO and gets the Host power sequence going
>>>> while
>>>> the BMC is booting up.  This overlaps quite a bit of the memory
>>>> testing
>>>> of the Host with the BMC boot time.  The "well-known proprietary
>>>> BMC"
>>>> also does this same trick.
>>>
>>>I think we're talking about two different well know proprietary
>>>BMCs,
>>>but that's not important for this discussion other than no, the one
>>>I
>>>have in mind doesn't resort to such tricks.  What it does do is
>>>start
>>>up its core services rapidly enough where this isn't a problem, and
>>>lets the rest of the BMC stack start up at its own pace later on.
>>> 
>>>> Power requires the BMC to be up in order to serve out the virtual
>>>> PNOR,
>>>> from my recollection.  It seems like this could be solved in
>>>> other
>>>> ways,
>>>> such as a SPI-mux on a physical SPI-NOR so that the BMC can take
>>>> the NOR
>>>> at specific times during update but otherwise it is given to the
>>>> host
>>>> CPUs.  This is exactly what we do on x86 systems.
>>>
>>>Ouch.  So on x86 boxen you might actually have two "BMCs" -- the
>>>proprietary one inside the CPU that starts in seconds and provides
>>>base services like SPI Flash mapping to CPU address space, and the
>>>external OpenBMC one that can run in parallel without interfering
>>>with host start.  Adding a mux is then a hack needed on top, since
>>>you can't really communicate with the proprietary stack in the
>>>required manner.
>>>
>> 
>> I'd say their cpu doesn't require the bmc to boot, it also means
>> they trust their system to not melt without bmc monitoring.
>
>I'd argue it's really a bit of semantics. :)  x86 systems have a sort
>of proto-BMC built right in to every single CPU, in the form of the
>ME/PSP and its associated firmware, that can provide various
>functions including (IIRC) thermal control.  On the ARM side, you're
>probably right, they're a bit more primitive in terms of just mapping
>Flash directly to the CPU address space on low end parts, though I
>think (?) the modern higher end parts are back to a sort of "security
>manager" BMC-analogue providing these basic services to the host CPU.
>
>Regardless, POWER does stick out like a sore thumb for shoving these
>low level functions into the high level "full-stack" BMC.
>Architecturally, it may not have been the best decision, but I do
>understand it sped time to market etc.   Fortunately, it's also
>something we can work to fix.

Hostboot can probably boot a decent way up with just a readonly
mapping of the flash.  Either copy the image to ram or just the
ioctl to the flash chip if the pnor flash is full image.


>
>>>For systems like POWER that lack the proprietary internal "BMC", I
>>>guess there are a few ways we could address the problem:
>>>
>>>1.) Speed up OpenBMC load -- this sounds like it would end up being
>>>completely supported by one or two vendors alone, and subject to
>>>breakage from the other vendors that simply don't have any concerns
>>>around OpenBMC start time since their platforms aren't visibly
>>>affected by it.  It's also unlikely to come into the desired
>>>>sub-10s
>>>range.
>>>
>>>2.) Split the BMC into "essential" and "nice to have" services,
>>>>much
>>>like the other platforms.  Painful, as it now requires even more
>>>parts on the mainboard.
>>>
>>>3.) Keep the single BMC device, but split it into two software
>>>stacks, one that can load nearly instantly and start providing
>>>essential services, and another than can load more slowly.  This
>>>would effectively require two separate CPUs inside the BMC, which
>>>we
>>>actually do have in the AST2500.  I haven't done any digging though
>>>to see if the second CPU is powerful enough to implement the HIOMAP
>>>protocol at speed.
X>>>
>>>> Having said all of that, there is certainly some performance
>>>> improvements that can be done, but nobody has taken up the torch
>>>> on
>>>> it.
>>>> A big low-hanging fruit in my mind is the file system compression
>>>> being
>>>> xz or gzip is very computationally intensive.  I did some work,
>>>> with
>>>> Nick Terrell, to switch to zstd on our systems for both the
>>>> kernel
>>>> initramfs and UBI and saw significant boot time improvements.
>>>> The
>>>> upstream enablement for this appears to have landed as of v5.9 so
>>>> we
>>>> could certainly start enabling it here now.
>>>> 
>>>>
>>>INVALID URI REMOVED
>>>linux-2Dkbuild_20200730190841.2071656-2D7-2Dnickrterrell-40gmail.co
>m_
>>>&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8
>As
>>>2xrXeO9ZSo4&m=2O37p_XR8IO9jl4psZwnU-fmhndTW41NpqMXsT9Or6w&s=DF7yGqf
>SE
>>>-V5_j_DgmASLOgLpkfjcJpCK5xsJW3avqY&e=
>>>> 
>> 
>> In addition to compression options there are tradeoffs on how much
>> is
>> copied to ram vs how much is read from the flash possibly
>> repeatedly.
>> If you add secure boot the time goes up.
>
>Yeah, I'm really coming around to the idea that we need to embrace
>the split architecture every other system uses.  The LPC bridge and
>base power / fan controls really should be running independently on
>the ColdFire core, not on the main "full stack" BMC ARM core, and
>even for Kestrel we're exploring something similar (though in that
>case, it's mainly so that the host doesn't die if we accidentally
>crash the main CPU).

Have you looked at starting hiomap early?  and telling hostboot
assume the whole image is there until you need to write?

Can you get by with a fixed fan through memory post where 
hostboot is running a single core?

>
>>>>> == Host boot status indications ==
>>>>> 
>>>>> Any ODM that makes server products has had to deal with the
>>>>> psychological "dead
>>>>> server effect", where lack of visible progress during boot
>>>>> causes
>>>>> spurious
>>>>> callouts / RMAs.  It's even worse on desktop, especially if
>>>>> server-type
>>>>> hardware is used inside the machine.  We've worked around this a
>>>>> few times with
>>>>> our "IPL observer" services, and really do need this
>>>>> functionality
>>>>> in OpenBMC.
>>>>> The current version we have is both front panel lights and a
>>>>> progress bar on
>>>>> the BMC boot monitor (VGA/HDMI), and this is something we're
>>>>> willing to
>>>>> contribute upstream.
>>>> 
>>>> Great!  Let's get that merged!
>>>
>>>Sounds good!  The files aren't too complex:
>>>
>>>INVALID URI REMOVED
>>>_git_blackbird-2Dskeleton_tree_pyiplobserver&d=DwIFaQ&c=jf_iaSHvJOb
>Tb
>>>x-siA1ZOg&r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8As2xrXeO9ZSo4&m=2O37p_XR8
>IO
>>>9jl4psZwnU-fmhndTW41NpqMXsT9Or6w&s=zLtrjaE2hHjV3z9ar0gcJVvZ9Uzwxinf
>ed
>>>AOMEWs04s&e=
>>>INVALID URI REMOVED
>>>_git_blackbird-2Dskeleton_tree_pyiplledmonitor&d=DwIFaQ&c=jf_iaSHvJ
>Ob
>>>Tbx-siA1ZOg&r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8As2xrXeO9ZSo4&m=2O37p_X
>R8
>>>IO9jl4psZwnU-fmhndTW41NpqMXsT9Or6w&s=AOWB1Ja82thvSZFO81WfIj7MJtg5Te
>ZN
>>>8wpT_EpG_Zo&e=
>>>
>>>Is the skeleton repository the best place for a merge request?
>> 
>> hmm, as prototype code in python, maybe.   I don't think many
>> current
>> systems ship python.  Also upstream Yocto removed all support for
>> python 2.
>> 
>> In addition I see a mix of "copy the data" and "transform the data"
>> in the same script, such as
>> 
>> updateIPLLeds(self, initial_start, status_changed)
>> 
>> with
>>            # Show major ISTEP on LED bank
>>            # On Talos we only have three LEDs plus a fourth
>> indicator modification
>>            # bit, but the major ISTEPs range from 2 to 21
>>            # Try to condense that down to something more readily
>> displayable
>> 
>> 
>> [ After some thought, its ok to be in the output code, as it's
>> formatting the data for the display. ]
>> 
>> 
>> The upstream post interface logs the post codes, and display is
>> a separate function.  The ipl_status_monitor seems to mix
>> monitoring
>> the port 80 snoops with other logic to determine the system state
>> eg is the host up?.
>> 
>> Also both scripts extensivly use popen to handle device
>> communication
>> and some communication to other services (kill to post code).
>> 
>> 
>>>
>>>> I do think some others have support for a 7-seg display with the
>>>> postcodes going to it already.  I think this is along those same
>>>> lines.
>>>> It might just be another back-end for our existing post code
>>>> daemon
>>>> to
>>>> replicate them to the VGA and/or blink morse code on an LED.
>>>
>>>OK, so this is what we ran into before.  Where is this support
>>>in-tree, and do we need to reimplement our system to match what
>>>already exists (by extension, extending the other vendor code since
>>>our observer is more detailed in terms of status etc.), or would we
>>>be allowed to provide a competing solution to this other support,
>>>letting ODMs pick which one they wanted?
>>>
>> 
>> Our upstream code is at
>> INVALID URI REMOVED
>> mc_phosphor-2Dhost-2Dpostd&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=bvv7AJ
>> EECoRKBU02rcu4F5DWd-EwX8As2xrXeO9ZSo4&m=xjFkq8WLDkwI0WAERUAgRudWVVcq4
>> 10qqWVjl-Ka4-M&s=VJweaMcJOe7tEVYUZkBMijTisbfRMkTVN9ga4s-d8Xw&e= 
>> for the snoop readers and the LED segment drivers, and the history
>> and Dbus owner is
>> INVALID URI REMOVED
>> mc_phosphor-2Dpost-2Dcode-2Dmanager&d=DwICaQ&c=jf_iaSHvJObTbx-siA1ZOg
>> &r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8As2xrXeO9ZSo4&m=xjFkq8WLDkwI0WAERUAg
>> RudWVVcq410qqWVjl-Ka4-M&s=esXQVng9zvQlGkdhvk0dH12NrhHkOGgAsIqi_MLaPRY
>> &e= .
>> 
>> To catalog the source of the host and bmc there is
>>
>INVALID URI REMOVED
>mc_phosphor-2Dstate-2Dmanager_blob_master_obmcutil&d=DwICaQ&c=jf_iaSH
>vJObTbx-siA1ZOg&r=bvv7AJEECoRKBU02rcu4F5DWd-EwX8As2xrXeO9ZSo4&m=xjFkq
>8WLDkwI0WAERUAgRudWVVcq410qqWVjl-Ka4-M&s=brrhp8N0QwEZNOkddBmqUH6OoufV
>H5DQxo4GAPBCMbo&e= 
>> 
>> In addition to phosphor-misc for "one file projects" there is
>> openbmc-tools for handy tools which may be more developer focused.
>
>So it sounds like we'd need to rewrite this as a set of patches for
>phosphor-post-code-manager?  Would they actually be merged or would
>we run into resistance to extending the functionality of that system
>for our use case?

Actually I think the manager would stay, and you might be adding
an appliation similar to the 7 segemnt led driver in the the
phosphor-host-postd to take the data snooped from port 80 and
formatting it for your display.

>
>>>>> == IPMI / BMC permissions ==
>>>>> 
>>>>> An item that's come up recently is that, at least on our older
>>>>> OpenBMC versions,
>>>>> there's a complete disconnect between the BMC's shell user
>>>>> database and the
>>>>> IPMI user database.
>> 
>> Mostly true, in part because the IPMI password for RCMP+ must be
>> stored on the BMC (reversiably encrypted for our implementation).
>> Note improper storage of this was an area of one or more CVEs.
>> 
>> In addition it has a limit of 20 characters in a password and 8
>> users.
>> 
>>>>> Resetting the BMC root password isn't possible from IPMI
>>>>> on the host, and setting up IPMI doesn't seem possible from the
>>>>>>BMC shell.  If
>> 
>> In our current code we have pam hooks that save the password
>> during a change, if the user is in the ipmi group and the
>> password is short enough (or returns an error).
>> 
>>>>> IPMI support is something OpenBMC provides alongside Redfish, it
>>>>> needs to be
>>>>> better integrated -- we're dealing with multiple locked-out BMC
>>>>> issues at the
>>>>> moment at various customer sites, and the recovery method is
>>>>> painful at best
>>>>> when it should be as simple as an ipmitool command from the host
>>>>> terminal.
>>>> 
>>>> I suspect most of this is a matter of IPMI command support and/or
>>>> enabling
>>>> those commands to the host IPMI path.  Most of us are fairly
>>>> untrusting
>>>> of IPMI (and the Host itself), so there hasn't been work to do
>>>> anything
>>>> here.  As long as whatever you're proposing can be disabled for
>>>> models
>>>> where we distrust the Host, it seems like these would be accepted
>>>> as
>>>> well.
>> 
>> 
>> Our current Redfish has multiple users and can enable and
>> disable users to have ipmi access and set their password.
>> 
>> 
>> Of course this just moves the goal posts to the Redfish
>> admin login, but in addition to mTLS certificate based
>> trust (which should be customized to the customer),
>> 
>> Redfish has the concept of a host firmware and os logins
>> including a binding for EFI to specify adapter path and
>> network in addition to read-once magic efi variables.  I
>> know OpenPOWER boxes don't have EFI but the information
>> could be exposed in a similar fashion.  As far as I know
>> we have not yet implemented these users in our Redfish
>> server.
>
>Honestly Redfish is something that we might just want to move to, and
>officially / formally drop network IPMI support.  Probably the
>biggest issue with that comes right back down to needing
>communication between the host and BMC, however -- ipmitool shortcuts
>the whole BMC/host network isolation problem (described above) by
>using the USB interface.  Is there a way to use Redfish over USB in a
>similar manner?

DEPRECATED ===== skip this for below

As I mentioned, the Redfish specification explictly 
talks about having a login for the firmware and the 
booted OS, and requirements for the admin to allow 
or disallow the IDs.  In addition it talks about how 
the information is presented to a EFI boot.  The model 
generates a unique password for each boot using special 
EFI variables, include designation of network path 
including the concept of USB network or pci slot and 
function, IP information, etc.  It uses special 
read-once efi variables to protect the password from 
casual snooping.

I don't think we (OpenBMC) have that implemented this  
magic user, but would anticipate that it would be 
accepted.

Also, for OpenPOWER we would likely want to define 
an OF binding.   Thinking about this, due to the 
desire to clear after fetch something like the 
SYSPARMS api that can request a value optinally 
from the service processor might be approprate, 
even though that is currently a FSP only interface. 

https://github.com/open-power/skiboot/blob/master/doc/device-tree/ibm%2Copal/sysparams.rst
https://github.com/open-power/skiboot/blob/master/doc/opal-api/opal-param-89-90.rst

Another alternative would be the secvar interface 
if that could be common with userspace expecting 
the efi variables, but that would have to be 
multiplexed with the current secvar backend for 
secure boot management. 

DEPRECATED === END ===

The Redfish spec was updated to have a IPMI 
command to create a Bootstrapping credential 
that can then be used until disabled and will 
be invalidated by a Host Reset or Service 
Reset.   The expectation is this temporary 
role will be used to create a permanent 
account.  This service is only available on a 
designated interface and can be disabled or 
enabled from the Redfish HostInterface 
representation.

I believe this too would have to be 
implimented, but exposing the information
to an openfirmware client is much easier 
as it could be a few properties in the 
device tree path.  USB networks are 
identified by vendor, device, and serial
and the ecm device serial is generated
using the bmc machine-id.

=====

For in-box communication, IPMI is being replaced with 
PLDM and MCTP, as 
IPMI is being replaced in future stacks by PLDM and MCTP, 
as Redfish is expecting a reliable transport and is 
string based so quite verbose.


https://github.com/openbmc/docs/blob/master/designs/pldm-stack.md
https://github.com/openbmc/docs/blob/master/designs/mctp/

>
>Thanks!
>
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: OpenBMC on RCS platforms - remote media
  2021-04-28 21:24         ` OpenBMC on RCS platforms - remote media Joseph Reynolds
@ 2021-06-03 12:29           ` Konstantin Klubnichkin
  0 siblings, 0 replies; 11+ messages in thread
From: Konstantin Klubnichkin @ 2021-06-03 12:29 UTC (permalink / raw)
  To: Joseph Reynolds, Timothy Pearson, Milton Miller II; +Cc: openbmc

[-- Attachment #1: Type: text/html, Size: 2479 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-06-03 12:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-23 14:30 OpenBMC on RCS platforms Timothy Pearson
2021-04-23 17:11 ` Patrick Williams
2021-04-23 18:46   ` Timothy Pearson
2021-04-26 21:42     ` Milton Miller II
2021-04-28 20:21       ` Timothy Pearson
2021-04-28 21:24         ` OpenBMC on RCS platforms - remote media Joseph Reynolds
2021-06-03 12:29           ` Konstantin Klubnichkin
2021-04-29  7:54       ` OpenBMC on RCS platforms Milton Miller II
2021-04-23 17:23 ` Ed Tanous
2021-04-23 19:00   ` Timothy Pearson
2021-04-23 19:23     ` Ed Tanous

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.