openbmc.lists.ozlabs.org archive mirror
* Redundant BMC's
@ 2023-12-13 20:43 Andrew Geissler
From: Andrew Geissler @ 2023-12-13 20:43 UTC
  To: OpenBMC Maillist

Greetings,

We at IBM are looking at implementing a server with redundant BMCs. The idea of
redundant BMCs is that if one fails (software or hardware related), the other
BMC takes over and there is no impact to the owner of the server (enterprise,
high availability market). One BMC is the "Active" BMC and the other is the
"Passive" BMC.

At a high level, you have 2 or more chassis in a single server. 2 of those
chassis have BMCs running OpenBMC. The BMCs negotiate on startup which one will
be the Active BMC and which one will be the Passive. Both BMCs have full access
to the server hardware (fans, power supplies, VPD chips, ...) but only one can
access the hardware at a time (via a hardware mux).
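
To make the startup negotiation concrete, here is a minimal sketch of one way
two BMCs could agree on roles. We have not settled on a mechanism, so
everything here is an assumption for discussion: the `BmcState` fields (`slot`,
`healthy`, `was_active`) and the tie-break order are hypothetical. The point is
only that if both BMCs evaluate the same deterministic function on the same
exchanged state, they reach the same answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BmcState:
    slot: int         # hypothetical chassis position, unique per BMC
    healthy: bool     # result of a hypothetical local self-test
    was_active: bool  # role held before the last reboot, if known


def pick_active(a: BmcState, b: BmcState) -> BmcState:
    """Return the BMC that should become Active.

    Both BMCs run this on the same pair of states, so the result does not
    depend on which side evaluates it or on argument order.
    """
    def rank(s: BmcState):
        # Lower tuple wins: healthy first, then previously Active,
        # then the lowest slot number as a final tie-break.
        return (not s.healthy, not s.was_active, s.slot)

    return min((a, b), key=rank)
```

Preferring the previously Active BMC is one way to avoid an unnecessary role
swap (and resync) after a simultaneous reboot; other orderings are just as
plausible.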

The Passive BMC will run a subset of OpenBMC services. Because it needs to
support firmware update and other basic features, it will have bmcweb running.
Other services, like fan or power control, would not run on the Passive.

The Active BMC will utilize bmcweb aggregation to provide basic information
about the Passive BMC. Server management can only occur via the Active BMC.

As the user changes settings (BIOS, certificates, system policy, ...) via the
Active BMC, we need to ensure we replicate these settings over to the Passive.
We've done a bit of initial exploration into using corosync/pacemaker. It has
some potential but also feels a bit heavy for what we need. The thought is that
a role change, where the Passive BMC becomes the Active BMC and the Active
becomes the Passive, is mostly driven by our external software managers. There's
potential for some cases where the BMCs themselves drive the role changes, but
most of our use cases are situations where something in the BMC hardware (or its
connections to the server) has failed and the BIOS firmware or a Redfish
management client directs the Passive BMC to become the Active.
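
As a rough illustration of an externally directed role change, the sketch
below models the swap as a small in-memory state transition. The class and
function names (`Bmc`, `failover`, `RoleError`) are made up for this example
and are not existing OpenBMC interfaces. It also assumes the old Active can
still be demoted, which will not hold when it has failed outright; in that
case the hardware mux would have to be what keeps it off the hardware.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "Active"
    PASSIVE = "Passive"


class RoleError(Exception):
    """Raised when a role change is requested from an unexpected state."""


class Bmc:
    def __init__(self, name: str, role: Role):
        self.name = name
        self.role = role


def failover(active: Bmc, passive: Bmc, requested_by: str) -> str:
    """Swap roles at the request of an external manager.

    `requested_by` is a free-form label, e.g. the BIOS firmware or a
    Redfish management client.
    """
    if active.role is not Role.ACTIVE or passive.role is not Role.PASSIVE:
        raise RoleError("expected exactly one Active and one Passive BMC")
    # Demote first so that this model never has two Active BMCs at once.
    active.role = Role.PASSIVE
    passive.role = Role.ACTIVE
    return f"{requested_by}: {passive.name} is now Active"
```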

A roll-our-own data synchronization daemon (utilizing rsync) that monitors for
file changes, with some basic rules on when to sync (immediately, or at sync
points), doesn't seem all that bad, but there are probably a lot of unknown
pitfalls that something like corosync/pacemaker already handles.
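
To give a feel for what the "basic rules" could look like, here is a minimal
sketch of the rule matching and queuing, not a working daemon. The paths in
`RULES`, the peer host name `passive-bmc`, and the `SyncQueue` class are all
hypothetical. The sketch only builds the rsync command line (using the
standard `-a` and `--relative` options) and leaves file monitoring and actually
running the command out of scope.

```python
import fnmatch

IMMEDIATE = "immediate"
SYNC_POINT = "sync-point"

# Hypothetical rules; the first matching pattern wins.
RULES = [
    ("/etc/ssl/certs/*", IMMEDIATE),
    ("/var/lib/bios-settings/*", IMMEDIATE),
    ("/var/lib/*", SYNC_POINT),
]


def policy_for(path, rules=RULES):
    """Return the sync policy for a changed path, or None if not replicated."""
    for pattern, policy in rules:
        if fnmatch.fnmatch(path, pattern):
            return policy
    return None


def rsync_command(paths, peer="passive-bmc"):
    """Build (but do not run) an rsync invocation for the given paths."""
    # --relative recreates the full source path under the destination root.
    return ["rsync", "-a", "--relative", *paths, f"{peer}:/"]


class SyncQueue:
    """Collects sync-point changes; immediate changes bypass the queue."""

    def __init__(self):
        self.pending = []

    def on_change(self, path):
        policy = policy_for(path)
        if policy == IMMEDIATE:
            return rsync_command([path])
        if policy == SYNC_POINT and path not in self.pending:
            self.pending.append(path)
        return None

    def on_sync_point(self):
        if not self.pending:
            return None
        cmd = rsync_command(self.pending)
        self.pending = []
        return cmd
```

Even this small version leaves out the cases that are likely to be the real
pitfalls: a peer that is unreachable, a file that changes mid-transfer, and a
role change that happens while changes are still queued.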

Just throwing this out there in case anyone is also working on this or has any
opinions on direction here.

Thanks,
Andrew
