* Managing heterogeneous systems
@ 2019-12-09 21:26 Neeraj Ladkani
  2019-12-09 23:02 ` Richard Hanley
  0 siblings, 1 reply; 10+ messages in thread
From: Neeraj Ladkani @ 2019-12-09 21:26 UTC (permalink / raw)
  To: openbmc

[-- Attachment #1: Type: text/plain, Size: 310 bytes --]

Are there any standards for managing heterogeneous systems? For example, in a rack there may be a compute node (with its own BMC) and a storage node (with its own BMC) connected using a PCIe switch. How are these two BMCs represented as one system? Are there any standards for BMC-to-BMC communication?


Neeraj


[-- Attachment #2: Type: text/html, Size: 2388 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Managing heterogeneous systems
  2019-12-09 21:26 Managing heterogeneous systems Neeraj Ladkani
@ 2019-12-09 23:02 ` Richard Hanley
  2019-12-10  8:59   ` vishwa
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Hanley @ 2019-12-09 23:02 UTC (permalink / raw)
  To: Neeraj Ladkani; +Cc: openbmc

[-- Attachment #1: Type: text/plain, Size: 2127 bytes --]

Hi Neeraj,

This is an open question that I've been looking into as well.

For BMC to BMC communication there are a few options.

   1. If you have network connectivity, you can communicate using Redfish
   (a rough sketch of this option follows the list).
   2. If you only have a PCIe connection, you'll have to use either the
   in-band connection or the sideband I2C*.  PLDM and MCTP are protocols
   defined to handle this use case, although I'm not sure whether the
   OpenBMC implementations have been used in production.
   3. There is always IPMI, which has its own pros and cons.
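
As a rough sketch of option 1 (the host name and credentials below are
placeholders, and verify=False is only there because many BMCs ship
self-signed certificates), reading a peer BMC's Redfish service root is
just:

# Sketch: fetch a peer BMC's Redfish service root over the LAN.
import requests

resp = requests.get(
    "https://peer-bmc.example/redfish/v1/",
    auth=("admin", "password"),   # placeholder; session/token auth in practice
    verify=False,                 # illustration only; validate certs for real
    timeout=5,
)
resp.raise_for_status()
root = resp.json()
print(root.get("Systems", {}).get("@odata.id"))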

For taking several BMCs and aggregating them into a single logical
interface that is exposed to the outside world, there are a few things
happening on that front.  DMTF has been working on an aggregation protocol
for Redfish.  However, it's my understanding that their proposal is more
directed at the client level, as opposed to within a single "system".

I just recently joined the community, but I've been thinking about how a
proxy layer could merge two Redfish services together.  Since Redfish is
fairly strongly typed and has a well defined mechanism for OEM extensions,
this should be pretty generally applicable.  I am planning on having a
white paper on the issue sometime after the holidays.
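
To make the proxy idea a little more concrete, here is a deliberately
naive sketch of merging two Systems collections into one view.  The node
URLs and the "/nodes/<name>" prefixing scheme are my own assumptions for
illustration, not anything DMTF or bmcweb defines:

# Naive merge of two Redfish Systems collections; URL scheme is invented.
import requests

NODES = {"node0": "https://bmc0.example", "node1": "https://bmc1.example"}

def merged_systems():
    members = []
    for name, base in NODES.items():
        body = requests.get(f"{base}/redfish/v1/Systems",
                            verify=False, timeout=5).json()
        for member in body.get("Members", []):
            # Rewrite each member id so the aggregate has unique URLs.
            members.append(
                {"@odata.id": f"/nodes/{name}{member['@odata.id']}"})
    return {
        "@odata.id": "/redfish/v1/Systems",
        "Name": "Aggregated Computer System Collection",
        "Members": members,
        "Members@odata.count": len(members),
    }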

Another thing to note: DMTF recently released a spec for running a binary
encoding of Redfish over PLDM, called RDE (Redfish Device Enablement).
That might be a useful way of tying all these concepts together.

I'd be curious about your thoughts and use cases here.  Would either PLDM
or Redfish fit your use case?

Regards,
Richard

*I've heard of some proposals that run a network interface over PCIe.  I
don't know enough about PCIe to know if this is a good idea.

On Mon, Dec 9, 2019 at 1:27 PM Neeraj Ladkani <neladk@microsoft.com> wrote:

> Are there any standards in managing heterogeneous systems? For example in
> a rack if there is a compute node( with its own BMC) and storage node( with
> its own BMC) connected using a PCIe switch.  How these two BMC represented
> as one system ?  are there any standards for BMC – BMC communication?
>
>
>
>
>
> Neeraj
>
>
>

[-- Attachment #2: Type: text/html, Size: 2877 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Managing heterogeneous systems
  2019-12-09 23:02 ` Richard Hanley
@ 2019-12-10  8:59   ` vishwa
  2019-12-10  9:50     ` [EXTERNAL] " Neeraj Ladkani
  0 siblings, 1 reply; 10+ messages in thread
From: vishwa @ 2019-12-10  8:59 UTC (permalink / raw)
  To: Richard Hanley, Neeraj Ladkani
  Cc: openbmc, sgundura, kusripat, shahjsha, vikantan

[-- Attachment #1: Type: text/plain, Size: 3632 bytes --]

Hi Richard / Neeraj,

Thanks for bringing this up. It's one of the interesting topics for IBM.

Some thoughts here...

When we have multiple BMCs as part of a single system, there are three
main parts to it:

1/. Discovering the peer BMCs and assigning roles
2/. Monitoring the existence of peer BMCs - heartbeat
3/. On losing the master, detecting that via #2 and then
reassigning the role

Depending on how we want to establish the roles, we could have
Single-Master/Many-Slave or Multi-Master/Multi-Slave, etc.

One of the teams here is trying to do a POC for a multi-BMC architecture
and is still at a very early stage.
The team is currently studying/evaluating the available solutions -
Corosync / Heartbeat / Pacemaker.
Corosync works nicely with clusters, but we need to see if we can trim
it down for the BMC.

If we cannot use Corosync for some reason, then we need to see if we can
do the discovery using PLDM (probably using the terminus IDs)
and come up with custom rules for assigning Master-Slave roles.

If we choose to have Single-Master and Many-Slave, we could have that
Single-Master acting as the Point of Contact for external requests; it
would then orchestrate with the needed BMCs internally to get the job
done.
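
Just to illustrate the kind of thing I mean - a toy heartbeat plus a
"lowest alive address is master" rule. This is nowhere near what Corosync
gives in terms of membership/quorum, and the addresses, port and timeouts
are invented purely for illustration:

# Toy heartbeat + master-selection loop; all values are placeholders.
import socket
import time

PEERS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # assumed BMC addresses
MY_IP = "10.0.0.11"
PORT, INTERVAL, TIMEOUT = 5005, 2.0, 6.0

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((MY_IP, PORT))
sock.settimeout(INTERVAL)
last_seen = {peer: time.monotonic() for peer in PEERS}

while True:
    for peer in PEERS:                       # announce ourselves
        if peer != MY_IP:
            sock.sendto(b"heartbeat", (peer, PORT))
    try:                                     # record who is still alive
        _, (addr, _port) = sock.recvfrom(32)
        last_seen[addr] = time.monotonic()
    except socket.timeout:
        pass
    now = time.monotonic()
    alive = [p for p in PEERS
             if p == MY_IP or now - last_seen[p] < TIMEOUT]
    if min(alive) == MY_IP:
        pass  # we are the master / Point-of-Contact BMC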

I would be happy to know if there are alternatives that suit a BMC kind
of architecture.

!! Vishwa !!

On 12/10/19 4:32 AM, Richard Hanley wrote:
> Hi Neeraj,
>
> This is an open question that I've been looking into as well.
>
> For BMC to BMC communication there are a few options.
>
>  1. If you have network connectivity you can communicate using Redfish.
>  2. If you only have a PCIe connection, you'll have to use either the
>     inband connection or the side band I2C*.  PLDM and MCTP are
>     protocols that defined to handle this use case, although I'm not
>     sure if the OpenBMC implementations have been used in production.
>  3. There is always IPMI, which has its own pros/cons.
>
> For taking several BMCs and aggregating them into a single logical 
> interface that is exposed to the outside world, there are a few things 
> happening on that front.  DMTF has been working on an aggregation 
> protocol for Redfish. However, it's my understanding that their 
> proposal is more directed at the client level, as opposed to within a 
> single "system".
>
> I just recently joined the community, but I've been thinking about how 
> a proxy layer could merge two Redfish services together.  Since 
> Redfish is fairly strongly typed and has a well defined mechanism for 
> OEM extensions, this should be pretty generally applicable.  I am 
> planning on having a white paper on the issue sometime after the holidays.
>
> Another thing to note, recently DMTF released a spec for running a 
> binary Redfish over PLDM called RDE.  That might be a useful way of 
> tying all these concepts together.
>
> I'd be curious about your thoughts and use cases here. Would either 
> PLDM or Redfish fit your use case?
>
> Regards,
> Richard
>
> *I've heard of some proposals that run a network interface over PCIe.  
> I don't know enough about PCIe to know if this is a good idea.
>
> On Mon, Dec 9, 2019 at 1:27 PM Neeraj Ladkani <neladk@microsoft.com 
> <mailto:neladk@microsoft.com>> wrote:
>
>     Are there any standards in managing heterogeneous systems? For
>     example in a rack if there is a compute node( with its own BMC)
>     and storage node( with its own BMC) connected using a PCIe
>     switch.  How these two BMC represented as one system ?  are there
>     any standards for BMC – BMC communication?
>
>     Neeraj
>

[-- Attachment #2: Type: text/html, Size: 5729 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [EXTERNAL] Re: Managing heterogeneous systems
  2019-12-10  8:59   ` vishwa
@ 2019-12-10  9:50     ` Neeraj Ladkani
  2019-12-12  6:59       ` vishwa
  0 siblings, 1 reply; 10+ messages in thread
From: Neeraj Ladkani @ 2019-12-10  9:50 UTC (permalink / raw)
  To: vishwa, Richard Hanley; +Cc: openbmc, sgundura, kusripat, shahjsha, vikantan

[-- Attachment #1: Type: text/plain, Size: 5643 bytes --]

Great discussion.

The problem is not the physical interface, as they can communicate over the LAN. The problem is entity binding, since one compute node can be connected to one or more storage nodes. How can we have one view of the system from an operational perspective: power on/off, SEL logs, telemetry?

Some of the problems:


  1.  Power operations: power/resets need to be coordinated across all nodes in a system (a rough sketch follows this list).
  2.  Telemetry: the OS runs only on the head node, so requests to read telemetry (SEL logs, sensor values) must gather it from all the nodes.
  3.  Firmware update.
  4.  RAS: memory errors are logged by UEFI SMM into the head node, but the corresponding DIMM temperature and inlet temperature are logged on a secondary node and are not mapped.
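
To make problem 1 concrete, a rough sketch of what the head node could
drive over standard Redfish (the BMC addresses are placeholders, and real
code needs ordering rules and error handling):

# Sketch: coordinate a reset across every node's BMC via the standard
# Redfish ComputerSystem.Reset action.
import requests

NODE_BMCS = ["https://bmc-compute.example", "https://bmc-storage.example"]

def reset_all(reset_type="ForceRestart"):
    for base in NODE_BMCS:
        systems = requests.get(f"{base}/redfish/v1/Systems",
                               verify=False, timeout=5).json()
        for member in systems.get("Members", []):
            action = (f"{base}{member['@odata.id']}"
                      "/Actions/ComputerSystem.Reset")
            requests.post(action, json={"ResetType": reset_type},
                          verify=False, timeout=5)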


I have been exploring a couple of routes:


  1.  LUN discovery and routing: this is similar to IPMI, but I am working on an architecture to extend it to support multiple LUNs and route them from the head node (we would need LUN routing over the LAN).
  2.  Redfish hierarchy for systems, e.g. this service-root fragment:

    "Systems": { "@odata.id": "/redfish/v1/Systems" },
    "Chassis": { "@odata.id": "/redfish/v1/Chassis" },
    "Managers": { "@odata.id": "/redfish/v1/Managers" },
    "AccountService": { "@odata.id": "/redfish/v1/AccountService" },
    "SessionService": { "@odata.id": "/redfish/v1/SessionService" },
    "Links": {
        "Sessions": { "@odata.id": "/redfish/v1/SessionService/Sessions" }
    }

  3.  Custom messaging over LAN (pub/sub); a rough sketch follows.
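
For route 3, a bare-bones sketch of pub/sub over the LAN using UDP
multicast (the group address, port and topic names are all invented for
illustration):

# Minimal publish/subscribe over UDP multicast; values are placeholders.
import json
import socket
import struct

GROUP, PORT = "239.255.0.100", 6000

def publish(topic, payload):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(json.dumps({"topic": topic, "data": payload}).encode(),
                (GROUP, PORT))

def subscribe():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        msg, _addr = sock.recvfrom(4096)
        yield json.loads(msg)

# e.g. publish("sel/compute-0", {"severity": "warning", "sensor": "DIMM3"})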

I am also working on a whitepaper in the same area ☺. Happy to work with you all if you have any ideas on how we can standardize this.

Neeraj

From: vishwa <vishwa@linux.vnet.ibm.com>
Sent: Tuesday, December 10, 2019 1:00 AM
To: Richard Hanley <rhanley@google.com>; Neeraj Ladkani <neladk@microsoft.com>
Cc: openbmc@lists.ozlabs.org; sgundura@in.ibm.com; kusripat@in.ibm.com; shahjsha@in.ibm.com; vikantan@in.ibm.com
Subject: [EXTERNAL] Re: Managing heterogeneous systems


Hi Richard / Neeraj,

Thanks for bringing this up. It's one of the interesting topic for IBM.

Some of the thoughts here.....

When we have multiple BMCs as part of a single system, then there are 3 main parts into it.

1/. Discovering the peer BMCs and role assignment
2/. Monitoring the existence of peer BMCs - heartbeat
3/. In the event of loosing the master, detect so using #2 and then reassign the role

Depending on how we want to establish the roles, we could have Single-Master, Many-slave or Multi-Master, Multi-Slave. etc

One of the team here is trying to do a POC for Multi BMC architecture and is still in the very beginning stage.
The team is currently studying/evaluating the available solution - Corosync / Heartbeat / Pacemaker".
Corosync works nice with the clusters, but we need to see if we can trim it down for BMC.

If we can not use corosync for some reason, then need to see if we can use the discovery using PLDM ( probably use the terminus IDs )
and come up with custom rules for assigning Master-Slave roles.

If we choose to have Single-Master and Many-Slave, we could have that Single-Master as an entity acting as a Point of Contact for external request and then could orchestrate with the needed BMCs internally to get the job done

I will be happy to know if there are alternatives that suit BMC kind of an architecture

!! Vishwa !!
On 12/10/19 4:32 AM, Richard Hanley wrote:
Hi Neeraj,

This is an open question that I've been looking into as well.

For BMC to BMC communication there are a few options.

  1.  If you have network connectivity you can communicate using Redfish.
  2.  If you only have a PCIe connection, you'll have to use either the inband connection or the side band I2C*.  PLDM and MCTP are protocols that defined to handle this use case, although I'm not sure if the OpenBMC implementations have been used in production.
  3.  There is always IPMI, which has its own pros/cons.
For taking several BMCs and aggregating them into a single logical interface that is exposed to the outside world, there are a few things happening on that front.  DMTF has been working on an aggregation protocol for Redfish.  However, it's my understanding that their proposal is more directed at the client level, as opposed to within a single "system".

I just recently joined the community, but I've been thinking about how a proxy layer could merge two Redfish services together.  Since Redfish is fairly strongly typed and has a well defined mechanism for OEM extensions, this should be pretty generally applicable.  I am planning on having a white paper on the issue sometime after the holidays.

Another thing to note, recently DMTF released a spec for running a binary Redfish over PLDM called RDE.  That might be a useful way of tying all these concepts together.

I'd be curious about your thoughts and use cases here.  Would either PLDM or Redfish fit your use case?

Regards,
Richard

*I've heard of some proposals that run a network interface over PCIe.  I don't know enough about PCIe to know if this is a good idea.

On Mon, Dec 9, 2019 at 1:27 PM Neeraj Ladkani <neladk@microsoft.com<mailto:neladk@microsoft.com>> wrote:
Are there any standards in managing heterogeneous systems? For example in a rack if there is a compute node( with its own BMC) and storage node( with its own BMC) connected using a PCIe switch.  How these two BMC represented as one system ?  are there any standards for BMC – BMC communication?


Neeraj


[-- Attachment #2: Type: text/html, Size: 18772 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [EXTERNAL] Re: Managing heterogeneous systems
  2019-12-10  9:50     ` [EXTERNAL] " Neeraj Ladkani
@ 2019-12-12  6:59       ` vishwa
  2019-12-12  7:33         ` Neeraj Ladkani
  0 siblings, 1 reply; 10+ messages in thread
From: vishwa @ 2019-12-12  6:59 UTC (permalink / raw)
  To: Neeraj Ladkani
  Cc: openbmc, sgundura, kusripat, shahjsha, vikantan, Richard Hanley

[-- Attachment #1: Type: text/plain, Size: 6467 bytes --]

On 12/10/19 3:20 PM, Neeraj Ladkani wrote:
>
> Great discussion.
>
> The problem is not physical interface as they can communicate using 
> LAN. The problem is entity binding as one compute node can be 
> connected to 1 or more storage nodes. How can we have one view of 
> system from operational perspective? Power on/off, SEL logs, telemetry?
>


Correct. This is where I mentioned the "Primary BMC acting as Point Of
Contact" for external requests.
Depending on how we want to service the request, we could either orchestrate
it via the PoC BMC, or tell external requesters where they can get the
data so that they connect to those BMCs directly.

!! Vishwa !!

> Some of problems :
>
>  1. Power operations : Power/resets/ need to be coordinated in all
>     nodes in a system
>  2. Telemetry : OS runs only on head node so if there are requests to
>     read telemetry, it should get telemetry ( SEL logs, Sensor Values
>     ) from all the nodes.
>  3. Firmware Update
>  4. RAS: Memory errors are logged by UEFI SMM in to head node but
>     corresponding DIMM temperature , inlet temperature are logged on
>     secondary node which are not mapped.
>
> I have been exploring couple of routes
>
>  1. LUN  discovery and routing: this is similar to IPMI but I am
>     working on architecture to extend this to support multiple LUNs
>     and route them from Head node. ( we would need LUN routing over LAN )
>  2. Redfish hierarchy for systems
>
>    "Systems": {
>         "@odata.id": "/redfish/v1/Systems"
>     },
>     "Chassis": {
>         "@odata.id": "/redfish/v1/Chassis"
>     },
>     "Managers": {
>         "@odata.id": "/redfish/v1/Managers"
>     },
>     "AccountService": {
>         "@odata.id": "/redfish/v1/AccountService"
>     },
>     "SessionService": {
>         "@odata.id": "/redfish/v1/SessionService"
>     },
>     "Links": {
>         "Sessions": {
>             "@odata.id": "/redfish/v1/SessionService/Sessions"
>         }
> 3.Custom Messaging over LAN ( PubSub)
>
> I am also working on a whitepaper on same area J.  Happy to work with 
> you guys if you have any ideas on how can we standardize this.
>
> Neeraj
>
> *From:*vishwa <vishwa@linux.vnet.ibm.com>
> *Sent:* Tuesday, December 10, 2019 1:00 AM
> *To:* Richard Hanley <rhanley@google.com>; Neeraj Ladkani 
> <neladk@microsoft.com>
> *Cc:* openbmc@lists.ozlabs.org; sgundura@in.ibm.com; 
> kusripat@in.ibm.com; shahjsha@in.ibm.com; vikantan@in.ibm.com
> *Subject:* [EXTERNAL] Re: Managing heterogeneous systems
>
> Hi Richard / Neeraj,
>
> Thanks for bringing this up. It's one of the interesting topic for IBM.
>
> Some of the thoughts here.....
>
> When we have multiple BMCs as part of a single system, then there are 
> 3 main parts into it.
>
> 1/. Discovering the peer BMCs and role assignment
> 2/. Monitoring the existence of peer BMCs - heartbeat
> 3/. In the event of loosing the master, detect so using #2 and then 
> reassign the role
>
> Depending on how we want to establish the roles, we could have 
> Single-Master, Many-slave or Multi-Master, Multi-Slave. etc
>
> One of the team here is trying to do a POC for Multi BMC architecture 
> and is still in the very beginning stage.
> The team is currently studying/evaluating the available solution - 
> Corosync / Heartbeat / Pacemaker".
> Corosync works nice with the clusters, but we need to see if we can 
> trim it down for BMC.
>
> If we can not use corosync for some reason, then need to see if we can 
> use the discovery using PLDM ( probably use the terminus IDs )
> and come up with custom rules for assigning Master-Slave roles.
>
> If we choose to have Single-Master and Many-Slave, we could have that 
> Single-Master as an entity acting as a Point of Contact for external 
> request and then could orchestrate with the needed BMCs internally to 
> get the job done
>
> I will be happy to know if there are alternatives that suit BMC kind 
> of an architecture
>
> !! Vishwa !!
>
> On 12/10/19 4:32 AM, Richard Hanley wrote:
>
>     Hi Neeraj,
>
>     This is an open question that I've been looking into as well.
>
>     For BMC to BMC communication there are a few options.
>
>      1. If you have network connectivity you can communicate using
>         Redfish.
>      2. If you only have a PCIe connection, you'll have to use either
>         the inband connection or the side band I2C*. PLDM and MCTP are
>         protocols that defined to handle this use case, although I'm
>         not sure if the OpenBMC implementations have been used in
>         production.
>      3. There is always IPMI, which has its own pros/cons.
>
>     For taking several BMCs and aggregating them into a single logical
>     interface that is exposed to the outside world, there are a few
>     things happening on that front.  DMTF has been working on an
>     aggregation protocol for Redfish.  However, it's my understanding
>     that their proposal is more directed at the client level, as
>     opposed to within a single "system".
>
>     I just recently joined the community, but I've been thinking about
>     how a proxy layer could merge two Redfish services together. 
>     Since Redfish is fairly strongly typed and has a well defined
>     mechanism for OEM extensions, this should be pretty generally
>     applicable.  I am planning on having a white paper on the issue
>     sometime after the holidays.
>
>     Another thing to note, recently DMTF released a spec for running a
>     binary Redfish over PLDM called RDE.  That might be a useful way
>     of tying all these concepts together.
>
>     I'd be curious about your thoughts and use cases here.  Would
>     either PLDM or Redfish fit your use case?
>
>     Regards,
>
>     Richard
>
>     *I've heard of some proposals that run a network interface over
>     PCIe.  I don't know enough about PCIe to know if this is a good idea.
>
>     On Mon, Dec 9, 2019 at 1:27 PM Neeraj Ladkani
>     <neladk@microsoft.com <mailto:neladk@microsoft.com>> wrote:
>
>         Are there any standards in managing heterogeneous systems? For
>         example in a rack if there is a compute node( with its own
>         BMC) and storage node( with its own BMC) connected using a
>         PCIe switch.  How these two BMC represented as one system ?
>          are there any standards for BMC – BMC communication?
>
>         Neeraj
>

[-- Attachment #2: Type: text/html, Size: 22449 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [EXTERNAL] Re: Managing heterogeneous systems
  2019-12-12  6:59       ` vishwa
@ 2019-12-12  7:33         ` Neeraj Ladkani
  2019-12-12 20:02           ` Richard Hanley
  0 siblings, 1 reply; 10+ messages in thread
From: Neeraj Ladkani @ 2019-12-12  7:33 UTC (permalink / raw)
  To: vishwa; +Cc: openbmc, sgundura, kusripat, shahjsha, vikantan, Richard Hanley

[-- Attachment #1: Type: text/plain, Size: 6671 bytes --]

Sure. How do we want to enable BMC-to-BMC communication? Standard Redfish/IPMI?

Neeraj


From: vishwa <vishwa@linux.vnet.ibm.com>
Sent: Wednesday, December 11, 2019 10:59 PM
To: Neeraj Ladkani <neladk@microsoft.com>
Cc: openbmc@lists.ozlabs.org; sgundura@in.ibm.com; kusripat@in.ibm.com; shahjsha@in.ibm.com; vikantan@in.ibm.com; Richard Hanley <rhanley@google.com>
Subject: Re: [EXTERNAL] Re: Managing heterogeneous systems

On 12/10/19 3:20 PM, Neeraj Ladkani wrote:
Great discussion.

The problem is not physical interface as they can communicate using LAN. The problem is entity binding as one compute node can be connected to 1 or more storage nodes. How can we have one view of system from operational perspective? Power on/off, SEL logs, telemetry?


Correct. This is where I mentioned about "Primary BMC acting as Point Of Contact" for external requests.
Depending on how we want to service the request, we could orchestrate that via PoC BMC or respond to external requesters on where they can get the data and they connect to 'em directly.

!! Vishwa !!

Some of problems :


  1.  Power operations : Power/resets/ need to be coordinated in all nodes in a system
  2.  Telemetry : OS runs only on head node so if there are requests to read telemetry, it should get telemetry ( SEL logs, Sensor Values ) from all the nodes.
  3.  Firmware Update
  4.  RAS: Memory errors are logged by UEFI SMM in to head node but corresponding DIMM temperature , inlet temperature are logged on secondary node which are not mapped.


I have been exploring couple of routes


  1.  LUN  discovery and routing: this is similar to IPMI but I am working on architecture to extend this to support multiple LUNs and route them from Head node. ( we would need LUN routing over LAN )
  2.  Redfish hierarchy for systems

   "Systems": {

        "@odata.id": "/redfish/v1/Systems"

    },

    "Chassis": {

        "@odata.id": "/redfish/v1/Chassis"

    },

    "Managers": {

        "@odata.id": "/redfish/v1/Managers"

    },

    "AccountService": {

        "@odata.id": "/redfish/v1/AccountService"

    },

    "SessionService": {

        "@odata.id": "/redfish/v1/SessionService"

    },

    "Links": {

        "Sessions": {

            "@odata.id": "/redfish/v1/SessionService/Sessions"

        }

3.  Custom Messaging over LAN ( PubSub)

I am also working on a whitepaper on same area ☺.  Happy to work with you guys if you have any ideas on how can we standardize this.

Neeraj

From: vishwa <vishwa@linux.vnet.ibm.com><mailto:vishwa@linux.vnet.ibm.com>
Sent: Tuesday, December 10, 2019 1:00 AM
To: Richard Hanley <rhanley@google.com><mailto:rhanley@google.com>; Neeraj Ladkani <neladk@microsoft.com><mailto:neladk@microsoft.com>
Cc: openbmc@lists.ozlabs.org<mailto:openbmc@lists.ozlabs.org>; sgundura@in.ibm.com<mailto:sgundura@in.ibm.com>; kusripat@in.ibm.com<mailto:kusripat@in.ibm.com>; shahjsha@in.ibm.com<mailto:shahjsha@in.ibm.com>; vikantan@in.ibm.com<mailto:vikantan@in.ibm.com>
Subject: [EXTERNAL] Re: Managing heterogeneous systems


Hi Richard / Neeraj,

Thanks for bringing this up. It's one of the interesting topic for IBM.

Some of the thoughts here.....

When we have multiple BMCs as part of a single system, then there are 3 main parts into it.

1/. Discovering the peer BMCs and role assignment
2/. Monitoring the existence of peer BMCs - heartbeat
3/. In the event of loosing the master, detect so using #2 and then reassign the role

Depending on how we want to establish the roles, we could have Single-Master, Many-slave or Multi-Master, Multi-Slave. etc

One of the team here is trying to do a POC for Multi BMC architecture and is still in the very beginning stage.
The team is currently studying/evaluating the available solution - Corosync / Heartbeat / Pacemaker".
Corosync works nice with the clusters, but we need to see if we can trim it down for BMC.

If we can not use corosync for some reason, then need to see if we can use the discovery using PLDM ( probably use the terminus IDs )
and come up with custom rules for assigning Master-Slave roles.

If we choose to have Single-Master and Many-Slave, we could have that Single-Master as an entity acting as a Point of Contact for external request and then could orchestrate with the needed BMCs internally to get the job done

I will be happy to know if there are alternatives that suit BMC kind of an architecture

!! Vishwa !!
On 12/10/19 4:32 AM, Richard Hanley wrote:
Hi Neeraj,

This is an open question that I've been looking into as well.

For BMC to BMC communication there are a few options.

  1.  If you have network connectivity you can communicate using Redfish.
  2.  If you only have a PCIe connection, you'll have to use either the inband connection or the side band I2C*.  PLDM and MCTP are protocols that defined to handle this use case, although I'm not sure if the OpenBMC implementations have been used in production.
  3.  There is always IPMI, which has its own pros/cons.
For taking several BMCs and aggregating them into a single logical interface that is exposed to the outside world, there are a few things happening on that front.  DMTF has been working on an aggregation protocol for Redfish.  However, it's my understanding that their proposal is more directed at the client level, as opposed to within a single "system".

I just recently joined the community, but I've been thinking about how a proxy layer could merge two Redfish services together.  Since Redfish is fairly strongly typed and has a well defined mechanism for OEM extensions, this should be pretty generally applicable.  I am planning on having a white paper on the issue sometime after the holidays.

Another thing to note, recently DMTF released a spec for running a binary Redfish over PLDM called RDE.  That might be a useful way of tying all these concepts together.

I'd be curious about your thoughts and use cases here.  Would either PLDM or Redfish fit your use case?

Regards,
Richard

*I've heard of some proposals that run a network interface over PCIe.  I don't know enough about PCIe to know if this is a good idea.

On Mon, Dec 9, 2019 at 1:27 PM Neeraj Ladkani <neladk@microsoft.com<mailto:neladk@microsoft.com>> wrote:
Are there any standards in managing heterogeneous systems? For example in a rack if there is a compute node( with its own BMC) and storage node( with its own BMC) connected using a PCIe switch.  How these two BMC represented as one system ?  are there any standards for BMC – BMC communication?


Neeraj


[-- Attachment #2: Type: text/html, Size: 20937 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [EXTERNAL] Re: Managing heterogeneous systems
  2019-12-12  7:33         ` Neeraj Ladkani
@ 2019-12-12 20:02           ` Richard Hanley
  2019-12-19 10:17             ` vishwa
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Hanley @ 2019-12-12 20:02 UTC (permalink / raw)
  To: Neeraj Ladkani; +Cc: vishwa, openbmc, sgundura, kusripat, shahjsha, vikantan

[-- Attachment #1: Type: text/plain, Size: 10062 bytes --]

In our case we are working to migrate away from IPMI to Redfish.  Most of
the solutions I've been thinking about have leaned pretty heavily into that.

In my mind I've sliced this project up into a few different areas.

*Merging/Transforming Redfish Resources*
Let's say that there are several Redfish services.  They will have
collections of Systems, Chassis, and Managers that need to be merged.  In
the simplest uses this would be just an HTTP proxy cache with some URL
cleaning.

However, this could end up being a pretty deep merge in cases where some
resources are split across multiple management domains.  Memory errors
being on one node, but the temperature sensor being on a separate node is a
good example. Another example would be the "ContainedBy" link.  These links
might reach across different BMC boundaries, and would need to be inserted
by the primary node.

*Aggregating Services and Actions*
This is where I think the DMTF proposals for Redfish aggregation (located
here
<https://members.dmtf.org/apps/org/workgroup/redfish/document.php?document_id=91811>)
provide the most insight.  My reading of this proposal is that an
aggregation service would be used to tie actions together.  For example,
there may be individual chassis reset action embedded in the chassis
resources, and then aggregated action for a full reset.

DMTF seems to be leaving the arbiter of the aggregation up to the
implementation.  I'd imagine that some implementations would provide a
static aggregation service, while others would allow clients to create
their own dynamic aggregates.

*Discovery, Negotiation, and Error Recovery*
This is an area where I'd like to hear more about your requirements,
Vishwa.  Would you expect the BMC cluster to be hot-swappable?  Is there a
particular reason that it has to be peer to peer? What kind of error
recovery should be supported when a node fails?

At a high level, the idea that has been suggested internally is to have a
designated master node at install time.  That node would discover any other
Redfish services on the LAN and begin aggregating them.  The master node
would keep an in-memory cache of the other services, and reload resources
on demand.  If a node goes down, then the error is propagated using HTTP
return codes.  If the master node goes down, then the entire aggregate will
go down.  In theory a client could talk to individual nodes if it needed to.

*Authentication and Authorization*
This is an area where I think Redfish is a little hands-off.  In an ideal
world, ACLs could be set up without proliferating username/passwords across
nodes.  As an aside, we've been thinking about how to use Redfish without
any usernames or passwords.  By using a combination of certificates and
authorization tokens, it should be possible to extend a security zone to a
small cluster of BMCs.
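
To make that last point a little more concrete, here is a sketch of a
password-less BMC-to-BMC call authenticated with a client certificate.
The file paths, CA bundle, and certificate provisioning are assumptions
for illustration, not a description of an existing OpenBMC feature:

# Sketch: mutual-TLS request between BMCs, no username/password involved.
import requests

resp = requests.get(
    "https://peer-bmc.example/redfish/v1/Managers",
    cert=("/etc/ssl/bmc-client.crt", "/etc/ssl/bmc-client.key"),
    verify="/etc/ssl/bmc-ca.pem",     # CA that signed the peer's server cert
    timeout=5,
)
print(resp.status_code)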

Regards,
Richard

On Wed, Dec 11, 2019 at 11:33 PM Neeraj Ladkani <neladk@microsoft.com>
wrote:

> Sure, how do we want to enable BMC-BMC communication? Standard
> redfish/IPMI ?
>
>
>
> Neeraj
>
>
>
>
>
> *From:* vishwa <vishwa@linux.vnet.ibm.com>
> *Sent:* Wednesday, December 11, 2019 10:59 PM
> *To:* Neeraj Ladkani <neladk@microsoft.com>
> *Cc:* openbmc@lists.ozlabs.org; sgundura@in.ibm.com; kusripat@in.ibm.com;
> shahjsha@in.ibm.com; vikantan@in.ibm.com; Richard Hanley <
> rhanley@google.com>
> *Subject:* Re: [EXTERNAL] Re: Managing heterogeneous systems
>
>
>
> On 12/10/19 3:20 PM, Neeraj Ladkani wrote:
>
> Great discussion.
>
>
>
> The problem is not physical interface as they can communicate using LAN.
> The problem is entity binding as one compute node can be connected to 1 or
> more storage nodes. How can we have one view of system from operational
> perspective? Power on/off, SEL logs, telemetry?
>
>
>
>
> Correct. This is where I mentioned about "Primary BMC acting as Point Of
> Contact" for external requests.
> Depending on how we want to service the request, we could orchestrate that
> via PoC BMC or respond to external requesters on where they can get the
> data and they connect to 'em directly.
>
>
> !! Vishwa !!
>
>
>
> Some of problems :
>
>
>
>    1. Power operations : Power/resets/ need to be coordinated in all
>    nodes in a system
>    2. Telemetry : OS runs only on head node so if there are requests to
>    read telemetry, it should get telemetry ( SEL logs, Sensor Values ) from
>    all the nodes.
>    3. Firmware Update
>    4. RAS: Memory errors are logged by UEFI SMM in to head node but
>    corresponding DIMM temperature , inlet temperature are logged on secondary
>    node which are not mapped.
>
>
>
>
>
> I have been exploring couple of routes
>
>
>
>    1. LUN  discovery and routing: this is similar to IPMI but I am
>    working on architecture to extend this to support multiple LUNs and route
>    them from Head node. ( we would need LUN routing over LAN )
>    2. Redfish hierarchy for systems
>
>    "Systems": {
>
>         "@odata.id": "/redfish/v1/Systems"
>
>     },
>
>     "Chassis": {
>
>         "@odata.id": "/redfish/v1/Chassis"
>
>     },
>
>     "Managers": {
>
>         "@odata.id": "/redfish/v1/Managers"
>
>     },
>
>     "AccountService": {
>
>         "@odata.id": "/redfish/v1/AccountService"
>
>     },
>
>     "SessionService": {
>
>         "@odata.id": "/redfish/v1/SessionService"
>
>     },
>
>     "Links": {
>
>         "Sessions": {
>
>             "@odata.id": "/redfish/v1/SessionService/Sessions"
>
>         }
>
> 3.  Custom Messaging over LAN ( PubSub)
>
>
>
> I am also working on a whitepaper on same area J.  Happy to work with you
> guys if you have any ideas on how can we standardize this.
>
>
>
> Neeraj
>
>
>
> *From:* vishwa <vishwa@linux.vnet.ibm.com> <vishwa@linux.vnet.ibm.com>
> *Sent:* Tuesday, December 10, 2019 1:00 AM
> *To:* Richard Hanley <rhanley@google.com> <rhanley@google.com>; Neeraj
> Ladkani <neladk@microsoft.com> <neladk@microsoft.com>
> *Cc:* openbmc@lists.ozlabs.org; sgundura@in.ibm.com; kusripat@in.ibm.com;
> shahjsha@in.ibm.com; vikantan@in.ibm.com
> *Subject:* [EXTERNAL] Re: Managing heterogeneous systems
>
>
>
> Hi Richard / Neeraj,
>
> Thanks for bringing this up. It's one of the interesting topic for IBM.
>
> Some of the thoughts here.....
>
> When we have multiple BMCs as part of a single system, then there are 3
> main parts into it.
>
> 1/. Discovering the peer BMCs and role assignment
> 2/. Monitoring the existence of peer BMCs - heartbeat
> 3/. In the event of loosing the master, detect so using #2 and then
> reassign the role
>
> Depending on how we want to establish the roles, we could have
> Single-Master, Many-slave or Multi-Master, Multi-Slave. etc
>
> One of the team here is trying to do a POC for Multi BMC architecture and
> is still in the very beginning stage.
> The team is currently studying/evaluating the available solution -
> Corosync / Heartbeat / Pacemaker".
> Corosync works nice with the clusters, but we need to see if we can trim
> it down for BMC.
>
> If we can not use corosync for some reason, then need to see if we can use
> the discovery using PLDM ( probably use the terminus IDs )
> and come up with custom rules for assigning Master-Slave roles.
>
> If we choose to have Single-Master and Many-Slave, we could have that
> Single-Master as an entity acting as a Point of Contact for external
> request and then could orchestrate with the needed BMCs internally to get
> the job done
>
> I will be happy to know if there are alternatives that suit BMC kind of an
> architecture
>
> !! Vishwa !!
>
> On 12/10/19 4:32 AM, Richard Hanley wrote:
>
> Hi Neeraj,
>
>
>
> This is an open question that I've been looking into as well.
>
>
>
> For BMC to BMC communication there are a few options.
>
>    1. If you have network connectivity you can communicate using Redfish.
>    2. If you only have a PCIe connection, you'll have to use either the
>    inband connection or the side band I2C*.  PLDM and MCTP are protocols that
>    defined to handle this use case, although I'm not sure if the OpenBMC
>    implementations have been used in production.
>    3. There is always IPMI, which has its own pros/cons.
>
> For taking several BMCs and aggregating them into a single logical
> interface that is exposed to the outside world, there are a few things
> happening on that front.  DMTF has been working on an aggregation protocol
> for Redfish.  However, it's my understanding that their proposal is more
> directed at the client level, as opposed to within a single "system".
>
>
>
> I just recently joined the community, but I've been thinking about how a
> proxy layer could merge two Redfish services together.  Since Redfish is
> fairly strongly typed and has a well defined mechanism for OEM extensions,
> this should be pretty generally applicable.  I am planning on having a
> white paper on the issue sometime after the holidays.
>
>
>
> Another thing to note, recently DMTF released a spec for running a binary
> Redfish over PLDM called RDE.  That might be a useful way of tying all
> these concepts together.
>
>
>
> I'd be curious about your thoughts and use cases here.  Would either PLDM
> or Redfish fit your use case?
>
>
>
> Regards,
>
> Richard
>
>
>
> *I've heard of some proposals that run a network interface over PCIe.  I
> don't know enough about PCIe to know if this is a good idea.
>
>
>
> On Mon, Dec 9, 2019 at 1:27 PM Neeraj Ladkani <neladk@microsoft.com>
> wrote:
>
> Are there any standards in managing heterogeneous systems? For example in
> a rack if there is a compute node( with its own BMC) and storage node( with
> its own BMC) connected using a PCIe switch.  How these two BMC represented
> as one system ?  are there any standards for BMC – BMC communication?
>
>
>
>
>
> Neeraj
>
>
>
>

[-- Attachment #2: Type: text/html, Size: 18409 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [EXTERNAL] Re: Managing heterogeneous systems
  2019-12-12 20:02           ` Richard Hanley
@ 2019-12-19 10:17             ` vishwa
  2020-01-09 22:21               ` Richard Hanley
  0 siblings, 1 reply; 10+ messages in thread
From: vishwa @ 2019-12-19 10:17 UTC (permalink / raw)
  To: Richard Hanley, Neeraj Ladkani
  Cc: openbmc, sgundura, kusripat, shahjsha, vikantan

[-- Attachment #1: Type: text/plain, Size: 13175 bytes --]

Richard,

Thanks for putting it together.

On 12/13/19 1:32 AM, Richard Hanley wrote:
> In our case we are working to migrate away from IPMI to Redfish.  Most 
> of the solutions I've been thinking about have leaned pretty heavily 
> into that.
>
> In my mind I've sliced this project up into a few different areas.
>
> *Merging/Transforming Redfish Resources*
> Let's say that there are several Redfish services.  They will have 
> collections of Systems, Chassis, and Managers that need to be merged.  
> In the simplest uses this would be just an HTTP proxy cache with some 
> URL cleaning.
>
> However, this could end up being a pretty deep merge in cases where 
> some resources are split across multiple management domains.  Memory 
> errors being on one node, but the temperature sensor being on a 
> separate node is a good example. Another example would be the 
> "ContainedBy" link. These links might reach across different BMC 
> boundaries, and would need to be inserted by the primary node.
>
> *Aggregating Services and Actions*
> This is where I think the DMTF proposals for Redfish aggregation 
> (located here 
> <https://members.dmtf.org/apps/org/workgroup/redfish/document.php?document_id=91811>) 
> provide the most insight.  My reading of this proposal is that an 
> aggregation service would be used to tie actions together. For 
> example, there may be individual chassis reset action embedded in the 
> chassis resources, and then aggregated action for a full reset.
>
> DMTF seems to be leaving the arbiter of the aggregation up to the 
> implementation.  I'd imagine that some implementations would provide a 
> static aggregation service, while others would allow clients to create 
> their own dynamic aggregates.
> *
> *
> *Discovery, Negotiation, and Error Recovery*
> This is an area where I'd like to hear more about your requirements, 
> Vishwa.  Would you expect the BMC cluster to be hot-swappable?  Is 
> there a particular reason that it has to be peer to peer? What kind of 
> error recovery should be supported when a node fails?
>
> At a high level, the idea that has been suggested internally is to 
> have a designated master node at install time.  That node would 
> discover any other Redfish services on the LAN, and begin aggregating 
> them.  The master node would keep any in memory cache of the other 
> services, and reload resources on demand.  If a node goes down, then 
> there error is propagated using HTTP return codes.  If the master node 
> goes down, then the entire aggregate will go down.  In theory a client 
> could talk to individual nodes if it needed to.
> *
> *

Case-1:
.......

Consider a hypothetical case where I have 4 compute nodes, each having a
BMC in it, and that BMC is responsible for initiating power-on and other
services for that node, getting the debug data out of that node, etc.

We would want an external Management Console (MC) to manage this rack.
Instead of going to the 4 nodes separately, the MC can ask one BMC that I
am calling the "Point Of Contact" (PoC) / Primary BMC for that rack. It is
the job of that BMC to do whatever is needed to return the result.

Similarly, when the POC goes down, we would need another POC.

I believe Redfish discovery can be used to discover each of the BMCs. But
how does the heartbeat work between the discovered BMCs?
Also, when the PoC goes down, how can we sense that and make some other
BMC the PoC using the Redfish framework?
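
(For reference, by "Redfish discovery" I mean roughly the SSDP search
below; the search target URN comes from the Redfish spec, everything else
is a simplified sketch - the heartbeat/failover part is the open question.)

# Sketch: SSDP M-SEARCH for Redfish services on the local network.
import socket

MSEARCH = (
    "M-SEARCH * HTTP/1.1\r\n"
    "HOST: 239.255.255.250:1900\r\n"
    'MAN: "ssdp:discover"\r\n'
    "MX: 2\r\n"
    "ST: urn:dmtf-org:service:redfish-rest:1\r\n\r\n"
)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(3)
sock.sendto(MSEARCH.encode(), ("239.255.255.250", 1900))
try:
    while True:
        data, addr = sock.recvfrom(1024)
        print(addr[0], data.decode(errors="replace").splitlines()[0])
except socket.timeout:
    pass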


Case-2:
.......

I have a control node that houses 2 BMCs. One can be Primary and the
other can be Slave. Each BMC has a complete view of the whole system.

I am assuming we could still discover the other BMC using Redfish. But
again, how do we exchange heartbeats and do failover operations?

Thanks,

!! Vishwa !!

> *Authentication and Authorization*
> This is an area where I think Redfish is a little hands off.  In an 
> ideal world ACLs could be setup without proliferating 
> username/passwords across nodes.  As an aside, we've been thinking 
> about how to use Redfish without any usernames or passwords.  By using 
> a combination of certificates and authorization tokens it should be 
> possible to extend a security zone to a small cluster of BMCs.
>
> Regards,
> Richard
>
> On Wed, Dec 11, 2019 at 11:33 PM Neeraj Ladkani <neladk@microsoft.com 
> <mailto:neladk@microsoft.com>> wrote:
>
>     Sure, how do we want to enable BMC-BMC communication? Standard
>     redfish/IPMI ?
>
>     Neeraj
>
>     *From:*vishwa <vishwa@linux.vnet.ibm.com
>     <mailto:vishwa@linux.vnet.ibm.com>>
>     *Sent:* Wednesday, December 11, 2019 10:59 PM
>     *To:* Neeraj Ladkani <neladk@microsoft.com
>     <mailto:neladk@microsoft.com>>
>     *Cc:* openbmc@lists.ozlabs.org <mailto:openbmc@lists.ozlabs.org>;
>     sgundura@in.ibm.com <mailto:sgundura@in.ibm.com>;
>     kusripat@in.ibm.com <mailto:kusripat@in.ibm.com>;
>     shahjsha@in.ibm.com <mailto:shahjsha@in.ibm.com>;
>     vikantan@in.ibm.com <mailto:vikantan@in.ibm.com>; Richard Hanley
>     <rhanley@google.com <mailto:rhanley@google.com>>
>     *Subject:* Re: [EXTERNAL] Re: Managing heterogeneous systems
>
>     On 12/10/19 3:20 PM, Neeraj Ladkani wrote:
>
>         Great discussion.
>
>         The problem is not physical interface as they can communicate
>         using LAN. The problem is entity binding as one compute node
>         can be connected to 1 or more storage nodes. How can we have
>         one view of system from operational perspective? Power on/off,
>         SEL logs, telemetry?
>
>
>     Correct. This is where I mentioned about "Primary BMC acting as
>     Point Of Contact" for external requests.
>     Depending on how we want to service the request, we could
>     orchestrate that via PoC BMC or respond to external requesters on
>     where they can get the data and they connect to 'em directly.
>
>
>     !! Vishwa !!
>
>         Some of problems :
>
>          1. Power operations : Power/resets/ need to be coordinated in
>             all nodes in a system
>          2. Telemetry : OS runs only on head node so if there are
>             requests to read telemetry, it should get telemetry ( SEL
>             logs, Sensor Values ) from all the nodes.
>          3. Firmware Update
>          4. RAS: Memory errors are logged by UEFI SMM in to head node
>             but corresponding DIMM temperature , inlet temperature are
>             logged on secondary node which are not mapped.
>
>         I have been exploring couple of routes
>
>          1. LUN  discovery and routing: this is similar to IPMI but I
>             am working on architecture to extend this to support
>             multiple LUNs and route them from Head node. ( we would
>             need LUN routing over LAN )
>          2. Redfish hierarchy for systems
>
>            "Systems": {
>
>                 "@odata.id <http://odata.id>": "/redfish/v1/Systems"
>
>             },
>
>             "Chassis": {
>
>                 "@odata.id <http://odata.id>": "/redfish/v1/Chassis"
>
>             },
>
>             "Managers": {
>
>                 "@odata.id <http://odata.id>": "/redfish/v1/Managers"
>
>             },
>
>             "AccountService": {
>
>                 "@odata.id <http://odata.id>":
>         "/redfish/v1/AccountService"
>
>             },
>
>             "SessionService": {
>
>                 "@odata.id <http://odata.id>":
>         "/redfish/v1/SessionService"
>
>             },
>
>             "Links": {
>
>                 "Sessions": {
>
>                     "@odata.id <http://odata.id>":
>         "/redfish/v1/SessionService/Sessions"
>
>                 }
>
>         3.Custom Messaging over LAN ( PubSub)
>
>         I am also working on a whitepaper on same area J.  Happy to
>         work with you guys if you have any ideas on how can we
>         standardize this.
>
>         Neeraj
>
>         *From:*vishwa <vishwa@linux.vnet.ibm.com>
>         <mailto:vishwa@linux.vnet.ibm.com>
>         *Sent:* Tuesday, December 10, 2019 1:00 AM
>         *To:* Richard Hanley <rhanley@google.com>
>         <mailto:rhanley@google.com>; Neeraj Ladkani
>         <neladk@microsoft.com> <mailto:neladk@microsoft.com>
>         *Cc:* openbmc@lists.ozlabs.org
>         <mailto:openbmc@lists.ozlabs.org>; sgundura@in.ibm.com
>         <mailto:sgundura@in.ibm.com>; kusripat@in.ibm.com
>         <mailto:kusripat@in.ibm.com>; shahjsha@in.ibm.com
>         <mailto:shahjsha@in.ibm.com>; vikantan@in.ibm.com
>         <mailto:vikantan@in.ibm.com>
>         *Subject:* [EXTERNAL] Re: Managing heterogeneous systems
>
>         Hi Richard / Neeraj,
>
>         Thanks for bringing this up. It's one of the interesting topic
>         for IBM.
>
>         Some of the thoughts here.....
>
>         When we have multiple BMCs as part of a single system, then
>         there are 3 main parts into it.
>
>         1/. Discovering the peer BMCs and role assignment
>         2/. Monitoring the existence of peer BMCs - heartbeat
>         3/. In the event of loosing the master, detect so using #2 and
>         then reassign the role
>
>         Depending on how we want to establish the roles, we could have
>         Single-Master, Many-slave or Multi-Master, Multi-Slave. etc
>
>         One of the team here is trying to do a POC for Multi BMC
>         architecture and is still in the very beginning stage.
>         The team is currently studying/evaluating the available
>         solution - Corosync / Heartbeat / Pacemaker".
>         Corosync works nice with the clusters, but we need to see if
>         we can trim it down for BMC.
>
>         If we can not use corosync for some reason, then need to see
>         if we can use the discovery using PLDM ( probably use the
>         terminus IDs )
>         and come up with custom rules for assigning Master-Slave roles.
>
>         If we choose to have Single-Master and Many-Slave, we could
>         have that Single-Master as an entity acting as a Point of
>         Contact for external request and then could orchestrate with
>         the needed BMCs internally to get the job done
>
>         I will be happy to know if there are alternatives that suit
>         BMC kind of an architecture
>
>         !! Vishwa !!
>
>         On 12/10/19 4:32 AM, Richard Hanley wrote:
>
>             Hi Neeraj,
>
>             This is an open question that I've been looking into as well.
>
>             For BMC to BMC communication there are a few options.
>
>              1. If you have network connectivity you can communicate
>                 using Redfish.
>              2. If you only have a PCIe connection, you'll have to use
>                 either the inband connection or the side band I2C*. 
>                 PLDM and MCTP are protocols that defined to handle
>                 this use case, although I'm not sure if the OpenBMC
>                 implementations have been used in production.
>              3. There is always IPMI, which has its own pros/cons.
>
>             For taking several BMCs and aggregating them into a single
>             logical interface that is exposed to the outside world,
>             there are a few things happening on that front.  DMTF has
>             been working on an aggregation protocol for Redfish. 
>             However, it's my understanding that their proposal is more
>             directed at the client level, as opposed to within a
>             single "system".
>
>             I just recently joined the community, but I've been
>             thinking about how a proxy layer could merge two Redfish
>             services together.  Since Redfish is fairly strongly typed
>             and has a well defined mechanism for OEM extensions, this
>             should be pretty generally applicable.  I am planning
>             on having a white paper on the issue sometime after the
>             holidays.
>
>             Another thing to note, recently DMTF released a spec for
>             running a binary Redfish over PLDM called RDE.  That might
>             be a useful way of tying all these concepts together.
>
>             I'd be curious about your thoughts and use cases here. 
>             Would either PLDM or Redfish fit your use case?
>
>             Regards,
>
>             Richard
>
>             *I've heard of some proposals that run a network interface
>             over PCIe.  I don't know enough about PCIe to know if this
>             is a good idea.
>
>             On Mon, Dec 9, 2019 at 1:27 PM Neeraj Ladkani
>             <neladk@microsoft.com <mailto:neladk@microsoft.com>> wrote:
>
>                 Are there any standards in managing heterogeneous
>                 systems? For example in a rack if there is a compute
>                 node( with its own BMC) and storage node( with its own
>                 BMC) connected using a PCIe switch.  How these two BMC
>                 represented as one system ?  are there any standards
>                 for BMC – BMC communication?
>
>                 Neeraj
>

[-- Attachment #2: Type: text/html, Size: 27272 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [EXTERNAL] Re: Managing heterogeneous systems
  2019-12-19 10:17             ` vishwa
@ 2020-01-09 22:21               ` Richard Hanley
  2020-01-10  1:08                 ` Neeraj Ladkani
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Hanley @ 2020-01-09 22:21 UTC (permalink / raw)
  To: vishwa; +Cc: Neeraj Ladkani, openbmc, sgundura, kusripat, shahjsha, vikantan

[-- Attachment #1: Type: text/plain, Size: 12736 bytes --]

I'm going to resurrect this thread for the new year.

It sounds like there is a decent need for some type of aggregator.  Would
anyone be interested in setting up a meeting to try and synthesize our use
cases into some broadly applicable requirements?

I'm located on the West Coast, but I have a pretty flexible schedule for
other time zones next week.

Some topics for us to discuss (either in a meeting or offline) include:

1) Layer 2/3 discovery and negotiation
2) Caching, proxy, and consistency requirements
3) Target hardware, performance requirements, and scale of aggregation
4) Tooling and infrastructure improvements needed to support an aggregator
5) Amount of configuration and knowledge an aggregator needs to know a
priori.

Any ideas on what else we can cover?  Is there a preferred format or medium
that would work best to gather these higher level requirements?

Regards,
Richard

On Thu, Dec 19, 2019 at 2:17 AM vishwa <vishwa@linux.vnet.ibm.com> wrote:

> Richard,
>
> Thanks for putting it together.
> On 12/13/19 1:32 AM, Richard Hanley wrote:
>
> In our case we are working to migrate away from IPMI to Redfish.  Most of
> the solutions I've been thinking about have leaned pretty heavily into
> that.
>
> In my mind I've sliced this project up into a few different areas.
>
> *Merging/Transforming Redfish Resources*
> Let's say that there are several Redfish services.  They will have
> collections of Systems, Chassis, and Managers that need to be merged.  In
> the simplest uses this would be just an HTTP proxy cache with some URL
> cleaning.
>
> However, this could end up being a pretty deep merge in cases where some
> resources are split across multiple management domains.  Memory errors
> being on one node, but the temperature sensor being on a separate node is a
> good example. Another example would be the "ContainedBy" link.  These links
> might reach across different BMC boundaries, and would need to be inserted
> by the primary node.
>
> *Aggregating Services and Actions*
> This is where I think the DMTF proposals for Redfish aggregation (located
> here
> <https://members.dmtf.org/apps/org/workgroup/redfish/document.php?document_id=91811>)
> provide the most insight.  My reading of this proposal is that an
> aggregation service would be used to tie actions together.  For example,
> there may be individual chassis reset action embedded in the chassis
> resources, and then aggregated action for a full reset.
>
> DMTF seems to be leaving the arbiter of the aggregation up to the
> implementation.  I'd imagine that some implementations would provide a
> static aggregation service, while others would allow clients to create
> their own dynamic aggregates.
>
> *Discovery, Negotiation, and Error Recovery*
> This is an area where I'd like to hear more about your requirements,
> Vishwa.  Would you expect the BMC cluster to be hot-swappable?  Is there a
> particular reason that it has to be peer to peer? What kind of error
> recovery should be supported when a node fails?
>
> At a high level, the idea that has been suggested internally is to have a
> designated master node at install time.  That node would discover any other
> Redfish services on the LAN, and begin aggregating them.  The master node
> would keep any in memory cache of the other services, and reload resources
> on demand.  If a node goes down, then there error is propagated using HTTP
> return codes.  If the master node goes down, then the entire aggregate will
> go down.  In theory a client could talk to individual nodes if it needed to.
>
> Case-1:
> .......
>
> Consider a hypothetical case where I have 4 compute nodes, each having BMC
> in it and that BMC is responsible for initiating power-on and other
> services for that node / getting the debug data out of that node / etc...
>
> We would want an external Management Console(MC) to manage this rack.
> Instead of going to 4 nodes separately, MC can ask 1 BMC that I am calling
> as "Point Of Contact" BMC / Primary BMC for that rack. It is the job of
> that BMC to do whatever is needed to return the result.
>
> Similarly, when the POC goes down, we would need another POC.
>
> I believe, Redfish discovery can be used to discover each BMCs. But how
> does the heart beat work between discovered BMCs ?
> Also, when the POC goes down, how can we sense that and make some other
> BMC as POC using Redfish framework ?
>
>
> Case-2:
> .......
>
> I have a control node that is housing 2 BMCs. One can be Primary and other
> can be Slave. Each BMC has the complete view of the whole systems.
>
> I am assuming, we could still discover the other BMC using Redfish.. But
> again, how do we exchange heartbeat and do failover operations ?
>
> Thanks,
>
> !! Vishwa !!
>
> * Authentication and Authorization*
> This is an area where I think Redfish is a little hands off.  In an ideal
> world ACLs could be setup without proliferating username/passwords across
> nodes.  As an aside, we've been thinking about how to use Redfish without
> any usernames or passwords.  By using a combination of certificates and
> authorization tokens it should be possible to extend a security zone to a
> small cluster of BMCs.
>
> Regards,
> Richard
>
> On Wed, Dec 11, 2019 at 11:33 PM Neeraj Ladkani <neladk@microsoft.com>
> wrote:
>
>> Sure, how do we want to enable BMC-BMC communication? Standard
>> redfish/IPMI ?
>>
>>
>>
>> Neeraj
>>
>>
>>
>>
>>
>> *From:* vishwa <vishwa@linux.vnet.ibm.com>
>> *Sent:* Wednesday, December 11, 2019 10:59 PM
>> *To:* Neeraj Ladkani <neladk@microsoft.com>
>> *Cc:* openbmc@lists.ozlabs.org; sgundura@in.ibm.com; kusripat@in.ibm.com;
>> shahjsha@in.ibm.com; vikantan@in.ibm.com; Richard Hanley <
>> rhanley@google.com>
>> *Subject:* Re: [EXTERNAL] Re: Managing heterogeneous systems
>>
>>
>>
>> On 12/10/19 3:20 PM, Neeraj Ladkani wrote:
>>
>> Great discussion.
>>
>>
>>
>> The problem is not the physical interface, as they can communicate using LAN.
>> The problem is entity binding, as one compute node can be connected to 1 or
>> more storage nodes. How can we have one view of the system from an operational
>> perspective? Power on/off, SEL logs, telemetry?
>>
>>
>>
>>
>> Correct. This is where I mentioned the "Primary BMC acting as Point Of
>> Contact" for external requests.
>> Depending on how we want to service the request, we could orchestrate
>> that via PoC BMC or respond to external requesters on where they can get
>> the data and they connect to 'em directly.
>>
>>
>> !! Vishwa !!
>>
>>
>>
>> Some of the problems:
>>
>>
>>
>>    1. Power operations: power/resets need to be coordinated across all
>>    nodes in a system.
>>    2. Telemetry: the OS runs only on the head node, so if there are requests to
>>    read telemetry, it should get telemetry (SEL logs, sensor values) from
>>    all the nodes.
>>    3. Firmware update.
>>    4. RAS: memory errors are logged by UEFI SMM into the head node, but the
>>    corresponding DIMM temperature and inlet temperature are logged on the
>>    secondary node and are not mapped.
>>
>>
>>
>>
>>
>> I have been exploring a couple of routes:
>>
>>
>>
>>    1. LUN discovery and routing: this is similar to IPMI, but I am
>>    working on an architecture to extend this to support multiple LUNs and route
>>    them from the head node. (We would need LUN routing over LAN.)
>>    2. Redfish hierarchy for systems
>>
>>    "Systems": {
>>
>>         "@odata.id": "/redfish/v1/Systems"
>>
>>     },
>>
>>     "Chassis": {
>>
>>         "@odata.id": "/redfish/v1/Chassis"
>>
>>     },
>>
>>     "Managers": {
>>
>>         "@odata.id": "/redfish/v1/Managers"
>>
>>     },
>>
>>     "AccountService": {
>>
>>         "@odata.id": "/redfish/v1/AccountService"
>>
>>     },
>>
>>     "SessionService": {
>>
>>         "@odata.id": "/redfish/v1/SessionService"
>>
>>     },
>>
>>     "Links": {
>>
>>         "Sessions": {
>>
>>             "@odata.id": "/redfish/v1/SessionService/Sessions"
>>
>>         }
>>
>> 3.  Custom Messaging over LAN ( PubSub)
>>
>>
>>
>> I am also working on a whitepaper on the same area ☺.  Happy to work with
>> you guys if you have any ideas on how we can standardize this.
>>
>>
>>
>> Neeraj
>>
>>
>>
>> *From:* vishwa <vishwa@linux.vnet.ibm.com> <vishwa@linux.vnet.ibm.com>
>> *Sent:* Tuesday, December 10, 2019 1:00 AM
>> *To:* Richard Hanley <rhanley@google.com> <rhanley@google.com>; Neeraj
>> Ladkani <neladk@microsoft.com> <neladk@microsoft.com>
>> *Cc:* openbmc@lists.ozlabs.org; sgundura@in.ibm.com; kusripat@in.ibm.com;
>> shahjsha@in.ibm.com; vikantan@in.ibm.com
>> *Subject:* [EXTERNAL] Re: Managing heterogeneous systems
>>
>>
>>
>> Hi Richard / Neeraj,
>>
>> Thanks for bringing this up. It's one of the interesting topics for IBM.
>>
>> Some of the thoughts here.....
>>
>> When we have multiple BMCs as part of a single system, there are 3
>> main parts to it.
>>
>> 1/. Discovering the peer BMCs and role assignment
>> 2/. Monitoring the existence of peer BMCs - heartbeat
>> 3/. In the event of losing the master, detect that using #2 and then
>> reassign the role
>>
>> Depending on how we want to establish the roles, we could have
>> Single-Master/Many-Slave or Multi-Master/Multi-Slave, etc.
>>
>> One of the teams here is trying to do a POC for a multi-BMC architecture and
>> is still in the very early stages.
>> The team is currently studying/evaluating the available solutions -
>> Corosync / Heartbeat / Pacemaker.
>> Corosync works nicely with clusters, but we need to see if we can trim
>> it down for the BMC.
>>
>> If we cannot use Corosync for some reason, then we need to see if we can
>> do the discovery using PLDM (probably using the terminus IDs)
>> and come up with custom rules for assigning Master-Slave roles.
>>
>> If we choose to have Single-Master and Many-Slave, we could have that
>> Single-Master as an entity acting as a Point of Contact for external
>> requests and then orchestrate with the needed BMCs internally to get
>> the job done.
>>
>> I will be happy to know if there are alternatives that suit a BMC kind of
>> architecture.
>>
>> !! Vishwa !!
>>
>> On 12/10/19 4:32 AM, Richard Hanley wrote:
>>
>> Hi Neeraj,
>>
>>
>>
>> This is an open question that I've been looking into as well.
>>
>>
>>
>> For BMC to BMC communication there are a few options.
>>
>>    1. If you have network connectivity you can communicate using Redfish.
>>    2. If you only have a PCIe connection, you'll have to use either the
>>    inband connection or the side band I2C*.  PLDM and MCTP are protocols that
>>    defined to handle this use case, although I'm not sure if the OpenBMC
>>    implementations have been used in production.
>>    3. There is always IPMI, which has its own pros/cons.
>>
>> For taking several BMCs and aggregating them into a single logical
>> interface that is exposed to the outside world, there are a few things
>> happening on that front.  DMTF has been working on an aggregation protocol
>> for Redfish.  However, it's my understanding that their proposal is more
>> directed at the client level, as opposed to within a single "system".
>>
>>
>>
>> I just recently joined the community, but I've been thinking about how a
>> proxy layer could merge two Redfish services together.  Since Redfish is
>> fairly strongly typed and has a well defined mechanism for OEM extensions,
>> this should be pretty generally applicable.  I am planning on having a
>> white paper on the issue sometime after the holidays.
>>
>>
>>
>> Another thing to note, recently DMTF released a spec for running a binary
>> Redfish over PLDM called RDE.  That might be a useful way of tying all
>> these concepts together.
>>
>>
>>
>> I'd be curious about your thoughts and use cases here.  Would either PLDM
>> or Redfish fit your use case?
>>
>>
>>
>> Regards,
>>
>> Richard
>>
>>
>>
>> *I've heard of some proposals that run a network interface over PCIe.  I
>> don't know enough about PCIe to know if this is a good idea.
>>
>>
>>
>> On Mon, Dec 9, 2019 at 1:27 PM Neeraj Ladkani <neladk@microsoft.com>
>> wrote:
>>
>> Are there any standards in managing heterogeneous systems? For example in
>> a rack if there is a compute node( with its own BMC) and storage node( with
>> its own BMC) connected using a PCIe switch.  How these two BMC represented
>> as one system ?  are there any standards for BMC – BMC communication?
>>
>>
>>
>>
>>
>> Neeraj
>>
>>
>>
>>

[-- Attachment #2: Type: text/html, Size: 27697 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [EXTERNAL] Re: Managing heterogeneous systems
  2020-01-09 22:21               ` Richard Hanley
@ 2020-01-10  1:08                 ` Neeraj Ladkani
  0 siblings, 0 replies; 10+ messages in thread
From: Neeraj Ladkani @ 2020-01-10  1:08 UTC (permalink / raw)
  To: Richard Hanley, vishwa; +Cc: openbmc, sgundura, kusripat, shahjsha, vikantan

Thanks Richard for reviving this thread! 

Please count me in for the discussion. I will add a few more points to your agenda:

1) Layer 2/3 discovery and negotiation
2) Caching, proxy, and consistency requirements
3) Target hardware, performance requirements, and scale of aggregation
4) Tooling and infrastructure improvements needed to support an aggregator
5) Amount of configuration and knowledge an aggregator needs to know a priori.
6) Data presentation from Inband?
7) Security  

I am also in PST and flexible on other time zones as well. 

Neeraj

From: Richard Hanley <rhanley@google.com> 
Sent: Thursday, January 9, 2020 2:21 PM
To: vishwa <vishwa@linux.vnet.ibm.com>
Cc: Neeraj Ladkani <neladk@microsoft.com>; openbmc@lists.ozlabs.org; sgundura@in.ibm.com; kusripat@in.ibm.com; shahjsha@in.ibm.com; vikantan@in.ibm.com
Subject: Re: [EXTERNAL] Re: Managing heterogeneous systems

I'm going to resurrect this thread for the new year.

It sounds like there is a decent need for some type of aggregator.  Would anyone be interested in setting up a meeting to try and synthesize our use cases into some broadly applicable requirements?

I'm located on the West Coast, but I have a pretty flexible schedule for other time zones next week.

Some topics for us to discuss (either in a meeting or offline) include:

1) Layer 2/3 discovery and negotiation
2) Caching, proxy, and consistency requirements
3) Target hardware, performance requirements, and scale of aggregation
4) Tooling and infrastructure improvements needed to support an aggregator
5) Amount of configuration and knowledge an aggregator needs to know a priori.

Any ideas on what else we can cover?  Is there a preferred format or medium that would work best to gather these higher level requirements?

Regards,
Richard

On Thu, Dec 19, 2019 at 2:17 AM vishwa <vishwa@linux.vnet.ibm.com> wrote:
Richard, 
Thanks for putting it together.
On 12/13/19 1:32 AM, Richard Hanley wrote:
In our case we are working to migrate away from IPMI to Redfish.  Most of the solutions I've been thinking about have leaned pretty heavily into that. 

In my mind I've sliced this project up into a few different areas. 

Merging/Transforming Redfish Resources
Let's say that there are several Redfish services.  They will have collections of Systems, Chassis, and Managers that need to be merged.  In the simplest uses this would be just an HTTP proxy cache with some URL cleaning.

However, this could end up being a pretty deep merge in cases where some resources are split across multiple management domains.  Memory errors being on one node, but the temperature sensor being on a separate node is a good example. Another example would be the "ContainedBy" link.  These links might reach across different BMC boundaries, and would need to be inserted by the primary node. 
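
To make the "deep merge" concrete, here is a rough Python sketch of how a proxy could fold the Chassis collections of two BMCs into one namespace; the hostnames and the node-prefix naming scheme are illustrative assumptions, not anything DMTF defines or OpenBMC implements today.

import requests

# Hypothetical BMC addresses; the "node0_" prefixing below is only one way to
# keep the merged namespace unique, not a DMTF convention.
NODES = {"node0": "https://bmc-compute.example", "node1": "https://bmc-storage.example"}

def merged_chassis_collection():
    members = []
    for prefix, base in NODES.items():
        r = requests.get(f"{base}/redfish/v1/Chassis", verify=False, timeout=5)
        r.raise_for_status()
        for m in r.json().get("Members", []):
            local = m["@odata.id"].rsplit("/", 1)[-1]
            # e.g. /redfish/v1/Chassis/node0_Baseboard in the aggregate view
            members.append({"@odata.id": f"/redfish/v1/Chassis/{prefix}_{local}"})
    return {
        "@odata.id": "/redfish/v1/Chassis",
        "@odata.type": "#ChassisCollection.ChassisCollection",
        "Name": "Chassis Collection",
        "Members": members,
        "Members@odata.count": len(members),
    }

The reverse mapping (aggregate URL back to the owning BMC) is what a GET on an individual chassis would use, and it is also where cross-node links such as "ContainedBy" would be rewritten.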

Aggregating Services and Actions
This is where I think the DMTF proposals for Redfish aggregation (located at https://members.dmtf.org/apps/org/workgroup/redfish/document.php?document_id=91811) provide the most insight.  My reading of this proposal is that an aggregation service would be used to tie actions together.  For example, there may be an individual chassis reset action embedded in the chassis resources, and then an aggregated action for a full reset.

DMTF seems to be leaving the arbiter of the aggregation up to the implementation.  I'd imagine that some implementations would provide a static aggregation service, while others would allow clients to create their own dynamic aggregates.
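
As a sketch of what an aggregated action could look like (the member URIs are placeholders and the simple fan-out policy is my assumption, not wording from the proposal), an aggregator might just issue the standard ComputerSystem.Reset action to every member and collect the per-node status:

import requests

# Placeholder member systems behind the aggregator.
MEMBER_SYSTEMS = [
    "https://bmc-0.example/redfish/v1/Systems/system",
    "https://bmc-1.example/redfish/v1/Systems/system",
]

def aggregated_reset(reset_type="ForceRestart"):
    """Fan the standard ComputerSystem.Reset action out to every member."""
    results = {}
    for sys_uri in MEMBER_SYSTEMS:
        r = requests.post(f"{sys_uri}/Actions/ComputerSystem.Reset",
                          json={"ResetType": reset_type},
                          verify=False, timeout=10)
        results[sys_uri] = r.status_code
    return results

Whether such a fan-out lives in a static aggregation service or in client-created aggregates is exactly the choice the DMTF proposal leaves open.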

Discovery, Negotiation, and Error Recovery
This is an area where I'd like to hear more about your requirements, Vishwa.  Would you expect the BMC cluster to be hot-swappable?  Is there a particular reason that it has to be peer to peer? What kind of error recovery should be supported when a node fails? 

At a high level, the idea that has been suggested internally is to have a designated master node at install time.  That node would discover any other Redfish services on the LAN, and begin aggregating them.  The master node would keep an in-memory cache of the other services, and reload resources on demand.  If a node goes down, the error is propagated using HTTP return codes.  If the master node goes down, then the entire aggregate will go down.  In theory a client could talk to individual nodes if it needed to.
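
For the discovery step, Redfish already defines an SSDP search target (urn:dmtf-org:service:redfish-rest:1), so the master node's LAN sweep could be as small as the sketch below. Note that SSDP is optional in the spec and may not be enabled on every BMC stack; the timeout values are arbitrary.

import socket

MSEARCH = "\r\n".join([
    "M-SEARCH * HTTP/1.1",
    "HOST: 239.255.255.250:1900",
    'MAN: "ssdp:discover"',
    "ST: urn:dmtf-org:service:redfish-rest:1",
    "MX: 2", "", "",
]).encode()

def discover_redfish(timeout=3):
    """Return (address, raw SSDP response) pairs for Redfish services on the LAN."""
    services = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(MSEARCH, ("239.255.255.250", 1900))
        try:
            while True:
                data, addr = s.recvfrom(2048)
                # The AL header in the response should point at the service
                # root, e.g. https://<bmc>/redfish/v1
                services.append((addr[0], data.decode(errors="replace")))
        except socket.timeout:
            pass
    return services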

Case-1:
.......
Consider a hypothetical case where I have 4 compute nodes, each having BMC in it and that BMC is responsible for initiating power-on and other services for that node / getting the debug data out of that node / etc...
We would want an external Management Console (MC) to manage this rack. Instead of going to 4 nodes separately, the MC can ask 1 BMC that I am calling the "Point Of Contact" (PoC) BMC / Primary BMC for that rack. It is the job of that BMC to do whatever is needed to return the result.
Similarly, when the POC goes down, we would need another POC.
I believe Redfish discovery can be used to discover each BMC. But how does the heartbeat work between discovered BMCs?
Also, when the PoC goes down, how can we sense that and make some other BMC the PoC using the Redfish framework?
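
One possible shape for that heartbeat, sketched purely as an assumption (nothing like this exists in OpenBMC today): every BMC polls the current PoC's service root, and the next BMC in a static priority list promotes itself after a few consecutive failures. A real design would also need fencing so that two PoCs never coexist, plus a way to re-advertise the new PoC to the Management Console.

import time
import requests

# Hypothetical static priority list; in practice this would come from
# discovery or configuration.
PRIORITY = ["https://bmc-a.example", "https://bmc-b.example", "https://bmc-c.example"]
MY_URL = "https://bmc-b.example"
FAILS_BEFORE_TAKEOVER = 3

def poc_is_alive(poc):
    try:
        return requests.get(f"{poc}/redfish/v1", verify=False, timeout=2).ok
    except requests.RequestException:
        return False

def watch_poc():
    failures = 0
    while True:
        failures = 0 if poc_is_alive(PRIORITY[0]) else failures + 1
        if failures >= FAILS_BEFORE_TAKEOVER and PRIORITY[1] == MY_URL:
            # Real code would fence the old PoC and take over its advertised address.
            print("PoC unreachable; promoting myself")
            break
        time.sleep(5)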

Case-2:
.......
I have a control node that is housing 2 BMCs. One can be Primary and the other can be Slave. Each BMC has the complete view of the whole system.
I am assuming we could still discover the other BMC using Redfish. But again, how do we exchange heartbeat and do failover operations?
Thanks,
!! Vishwa !!
Authentication and Authorization
This is an area where I think Redfish is a little hands off.  In an ideal world ACLs could be set up without proliferating username/passwords across nodes.  As an aside, we've been thinking about how to use Redfish without any usernames or passwords.  By using a combination of certificates and authorization tokens it should be possible to extend a security zone to a small cluster of BMCs.
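
A minimal sketch of that direction, with placeholder credentials and an assumed private CA: the PoC opens one standard Redfish session per member over client-certificate TLS and afterwards carries only the X-Auth-Token. (The spec today still bootstraps the session from a username/password; certificate-only session creation is the part that would need extending.)

import requests

def open_session(base, user, password, client_cert=("poc.crt", "poc.key")):
    """Create a Redfish session and return its token and URI."""
    r = requests.post(f"{base}/redfish/v1/SessionService/Sessions",
                      json={"UserName": user, "Password": password},
                      cert=client_cert,            # mutual-TLS client identity (assumed)
                      verify="internal-ca.pem",    # assumed private CA for the BMC zone
                      timeout=5)
    r.raise_for_status()
    return r.headers["X-Auth-Token"], r.headers["Location"]

# Subsequent requests carry only the token, never the password:
# requests.get(f"{base}/redfish/v1/Chassis", headers={"X-Auth-Token": token}, ...)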

Regards,
Richard

On Wed, Dec 11, 2019 at 11:33 PM Neeraj Ladkani <neladk@microsoft.com> wrote:
Sure, how do we want to enable BMC-BMC communication? Standard redfish/IPMI ? 
 
Neeraj
 
 
From: vishwa <vishwa@linux.vnet.ibm.com>
Sent: Wednesday, December 11, 2019 10:59 PM
To: Neeraj Ladkani <neladk@microsoft.com>
Cc: openbmc@lists.ozlabs.org; sgundura@in.ibm.com; kusripat@in.ibm.com; shahjsha@in.ibm.com; vikantan@in.ibm.com; Richard Hanley <rhanley@google.com>
Subject: Re: [EXTERNAL] Re: Managing heterogeneous systems
 
On 12/10/19 3:20 PM, Neeraj Ladkani wrote:
Great discussion. 
 
The problem is not the physical interface, as they can communicate using LAN. The problem is entity binding, as one compute node can be connected to 1 or more storage nodes. How can we have one view of the system from an operational perspective? Power on/off, SEL logs, telemetry?
 

Correct. This is where I mentioned the "Primary BMC acting as Point Of Contact" for external requests.
Depending on how we want to service the request, we could orchestrate that via PoC BMC or respond to external requesters on where they can get the data and they connect to 'em directly.

!! Vishwa !!
 
Some of the problems:
 
1. Power operations: power/resets need to be coordinated across all nodes in a system.
2. Telemetry: the OS runs only on the head node, so if there are requests to read telemetry, it should get telemetry (SEL logs, sensor values) from all the nodes.
3. Firmware update.
4. RAS: memory errors are logged by UEFI SMM into the head node, but the corresponding DIMM temperature and inlet temperature are logged on the secondary node and are not mapped.
 
 
I have been exploring a couple of routes:
 
1. LUN discovery and routing: this is similar to IPMI, but I am working on an architecture to extend this to support multiple LUNs and route them from the head node. (We would need LUN routing over LAN.)
2. Redfish hierarchy for systems 
   "Systems": {
        "@https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fodata.id&data=02%7C01%7Cneladk%40microsoft.com%7C252332bdcf5a42f86b0408d795524ba2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637142053009963532&sdata=xsL6Al0D3ddfNEBu4sI3MmXEHqNhTzaSLxsmvjffwXA%3D&reserved=0": "/redfish/v1/Systems"
    },
    "Chassis": {
        "@https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fodata.id&data=02%7C01%7Cneladk%40microsoft.com%7C252332bdcf5a42f86b0408d795524ba2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637142053009963532&sdata=xsL6Al0D3ddfNEBu4sI3MmXEHqNhTzaSLxsmvjffwXA%3D&reserved=0": "/redfish/v1/Chassis"
    },
    "Managers": {
        "@https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fodata.id&data=02%7C01%7Cneladk%40microsoft.com%7C252332bdcf5a42f86b0408d795524ba2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637142053009973528&sdata=nTXfOlL%2Ff0d5ALUa28%2BsrYSskTwYwihZ6yjCsaUsc%2Fk%3D&reserved=0": "/redfish/v1/Managers"
    },
    "AccountService": {
        "@https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fodata.id&data=02%7C01%7Cneladk%40microsoft.com%7C252332bdcf5a42f86b0408d795524ba2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637142053009973528&sdata=nTXfOlL%2Ff0d5ALUa28%2BsrYSskTwYwihZ6yjCsaUsc%2Fk%3D&reserved=0": "/redfish/v1/AccountService"
    },
    "SessionService": {
        "@https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fodata.id&data=02%7C01%7Cneladk%40microsoft.com%7C252332bdcf5a42f86b0408d795524ba2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637142053009983521&sdata=MuxyWv3I1qCNIfXIFHWTGPd66U6DXRqZNTzIViK1908%3D&reserved=0": "/redfish/v1/SessionService"
    },
    "Links": {
        "Sessions": {
            "@https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fodata.id&data=02%7C01%7Cneladk%40microsoft.com%7C252332bdcf5a42f86b0408d795524ba2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637142053009983521&sdata=MuxyWv3I1qCNIfXIFHWTGPd66U6DXRqZNTzIViK1908%3D&reserved=0": "/redfish/v1/SessionService/Sessions"
        }
3.  Custom messaging over LAN (PubSub)
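
For option 3, one largely standard alternative to a fully custom pub/sub is the Redfish EventService: the head node subscribes to each secondary BMC and receives SEL/telemetry events as HTTP pushes. A rough sketch, where the listener URL and token handling are assumptions:

import requests

def subscribe(secondary_bmc, listener_url, token):
    """Ask a secondary BMC to push its events to the head node's listener."""
    r = requests.post(f"{secondary_bmc}/redfish/v1/EventService/Subscriptions",
                      headers={"X-Auth-Token": token},
                      json={"Destination": listener_url,  # e.g. https://head-bmc.example/events
                            "Protocol": "Redfish"},
                      verify=False, timeout=5)
    r.raise_for_status()
    return r.headers.get("Location")  # URI of the new subscription resource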
 
I am also working on a whitepaper on the same area ☺.  Happy to work with you guys if you have any ideas on how we can standardize this.
 
Neeraj
 
From: vishwa <vishwa@linux.vnet.ibm.com>
Sent: Tuesday, December 10, 2019 1:00 AM
To: Richard Hanley <rhanley@google.com>; Neeraj Ladkani <neladk@microsoft.com>
Cc: openbmc@lists.ozlabs.org; sgundura@in.ibm.com; kusripat@in.ibm.com; shahjsha@in.ibm.com; vikantan@in.ibm.com
Subject: [EXTERNAL] Re: Managing heterogeneous systems
 
Hi Richard / Neeraj,
Thanks for bringing this up. It's one of the interesting topics for IBM.
Some of the thoughts here.....
When we have multiple BMCs as part of a single system, there are 3 main parts to it.
1/. Discovering the peer BMCs and role assignment
2/. Monitoring the existence of peer BMCs - heartbeat 
3/. In the event of losing the master, detect that using #2 and then reassign the role
Depending on how we want to establish the roles, we could have Single-Master/Many-Slave or Multi-Master/Multi-Slave, etc.
One of the teams here is trying to do a POC for a multi-BMC architecture and is still in the very early stages.
The team is currently studying/evaluating the available solutions - Corosync / Heartbeat / Pacemaker.
Corosync works nicely with clusters, but we need to see if we can trim it down for the BMC.
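
For reference, a trimmed two-node corosync.conf is not large; something along these lines (addresses, transport, and names are purely illustrative) would be a starting point for evaluating the footprint on a BMC:

totem {
    version: 2
    cluster_name: bmc-cluster
    # corosync 3.x would use transport: knet
    transport: udpu
}

nodelist {
    node {
        ring0_addr: 10.0.0.11
        name: bmc-a
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.12
        name: bmc-b
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}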

If we cannot use Corosync for some reason, then we need to see if we can do the discovery using PLDM (probably using the terminus IDs)
and come up with custom rules for assigning Master-Slave roles.
If we choose to have Single-Master and Many-Slave, we could have that Single-Master as an entity acting as a Point of Contact for external requests and then orchestrate with the needed BMCs internally to get the job done.
I will be happy to know if there are alternatives that suit a BMC kind of architecture.
!! Vishwa !!
On 12/10/19 4:32 AM, Richard Hanley wrote:
Hi Neeraj, 
 
This is an open question that I've been looking into as well.  
 
For BMC to BMC communication there are a few options.
1. If you have network connectivity you can communicate using Redfish.
2. If you only have a PCIe connection, you'll have to use either the inband connection or the side band I2C*.  PLDM and MCTP are protocols that defined to handle this use case, although I'm not sure if the OpenBMC implementations have been used in production.
3. There is always IPMI, which has its own pros/cons.
For taking several BMCs and aggregating them into a single logical interface that is exposed to the outside world, there are a few things happening on that front.  DMTF has been working on an aggregation protocol for Redfish.  However, it's my understanding that their proposal is more directed at the client level, as opposed to within a single "system".
 
I just recently joined the community, but I've been thinking about how a proxy layer could merge two Redfish services together.  Since Redfish is fairly strongly typed and has a well defined mechanism for OEM extensions, this should be pretty generally applicable.  I am planning on having a white paper on the issue sometime after the holidays.
 
Another thing to note, recently DMTF released a spec for running a binary Redfish over PLDM called RDE.  That might be a useful way of tying all these concepts together.  
 
I'd be curious about your thoughts and use cases here.  Would either PLDM or Redfish fit your use case?
 
Regards,
Richard
 
*I've heard of some proposals that run a network interface over PCIe.  I don't know enough about PCIe to know if this is a good idea.
 
On Mon, Dec 9, 2019 at 1:27 PM Neeraj Ladkani <neladk@microsoft.com> wrote:
Are there any standards in managing heterogeneous systems? For example in a rack if there is a compute node( with its own BMC) and storage node( with its own BMC) connected using a PCIe switch.  How these two BMC represented as one system ?  are there any standards for BMC – BMC communication? 
 
 
Neeraj
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-01-10  1:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-09 21:26 Managing heterogeneous systems Neeraj Ladkani
2019-12-09 23:02 ` Richard Hanley
2019-12-10  8:59   ` vishwa
2019-12-10  9:50     ` [EXTERNAL] " Neeraj Ladkani
2019-12-12  6:59       ` vishwa
2019-12-12  7:33         ` Neeraj Ladkani
2019-12-12 20:02           ` Richard Hanley
2019-12-19 10:17             ` vishwa
2020-01-09 22:21               ` Richard Hanley
2020-01-10  1:08                 ` Neeraj Ladkani

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.