All of lore.kernel.org
 help / color / mirror / Atom feed
* Adding support for custom SEL records
@ 2020-12-07  7:35 Lei Yu
  2022-10-18 20:09 ` Brad Bishop
  0 siblings, 1 reply; 19+ messages in thread
From: Lei Yu @ 2020-12-07  7:35 UTC (permalink / raw)
  To: openbmc

This mail is to propose some changes in openbmc to support custom SEL records.

* Background
The phosphor-sel-logger is the service to provide methods (IpmiSelAdd,
IpmiSelAddOem) to add custom SEL records.
However:
1. It only sends the log into the journal;
2. It depends on
meta-intel/meta-common/recipes-extended/rsyslog/rsyslog/rsyslog.conf
to filter the journal log and write to "/var/log/ipmi_sel"
3. It depends on intel-ipmi-oem to parse the "/var/log/ipmi_sel" to
provide the SEL entries.

In short, phosphor-sel-logger only works with the meta-intel layer and
intel-ipmi-oem.

To make it support general usage, several changes are submitted to gerrit:
https://gerrit.openbmc-project.xyz/q/topic:%22dev-add-custom-sel%22+(status:open%20OR%20status:merged)

Some details:
1. It adds SEL.errors.yaml in PDI.
2. With SEL_LOGGER_SEND_TO_LOGGING_SERVICE option, phosphor-sel-logger
creates entries in the logging service without callout path, instead
of sending to the journal.
3.  In phosphor-host-ipmid, it adds extra logic to parse the logging
entries and convert to SEL entries.

Comments are welcome.

-- 
BRs,
Lei YU

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2020-12-07  7:35 Adding support for custom SEL records Lei Yu
@ 2022-10-18 20:09 ` Brad Bishop
  2022-10-19  2:05   ` Lei Yu
  0 siblings, 1 reply; 19+ messages in thread
From: Brad Bishop @ 2022-10-18 20:09 UTC (permalink / raw)
  To: Lei Yu, openbmc

Thanks Lei YU for adding the support desribed below.  Sorry for the
necropost but I had a couple questions (below) for anyone that feels
like indulging me with answers.

On Mon, 2020-12-07 at 15:35 +0800, Lei Yu wrote:
> This mail is to propose some changes in openbmc to support custom SEL
> records.
> 
> * Background
> The phosphor-sel-logger is the service to provide methods (IpmiSelAdd,
> IpmiSelAddOem) to add custom SEL records.
> However:
> 1. It only sends the log into the journal;
> 2. It depends on
> meta-intel/meta-common/recipes-extended/rsyslog/rsyslog/rsyslog.conf
> to filter the journal log and write to "/var/log/ipmi_sel"
> 3. It depends on intel-ipmi-oem to parse the "/var/log/ipmi_sel" to
> provide the SEL entries.
> 
> In short, phosphor-sel-logger only works with the meta-intel layer and
> intel-ipmi-oem.
> 
> To make it support general usage, several changes are submitted to
> gerrit:
> https://gerrit.openbmc-project.xyz/q/topic:%22dev-add-custom-sel%22+(status:open%20OR%20status:merged)

To make it support general usage, I think we could have just moved the
rsyslog.conf to meta-phosphor and moved the handlers from intel-ipmi-oem
to phosphor-host-ipmid, correct?  Instead, we invented something
completely new.  My question is, why?

Whatever the reason, OpenBMC now has two ways of doing SELs, and I am
tasked with picking one.  Knowing what was wrong with the rsyslog 
approach that led you to invent something new to replace it would help
me, (and others I suspect), make that decision.

Thanks!
brad

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-18 20:09 ` Brad Bishop
@ 2022-10-19  2:05   ` Lei Yu
  2022-10-19 14:43     ` Brad Bishop
  0 siblings, 1 reply; 19+ messages in thread
From: Lei Yu @ 2022-10-19  2:05 UTC (permalink / raw)
  To: Brad Bishop; +Cc: openbmc

On Wed, Oct 19, 2022 at 4:09 AM Brad Bishop <bradleyb@fuzziesquirrel.com> wrote:
>
> Thanks Lei YU for adding the support desribed below.  Sorry for the
> necropost but I had a couple questions (below) for anyone that feels
> like indulging me with answers.
>
> On Mon, 2020-12-07 at 15:35 +0800, Lei Yu wrote:
> > This mail is to propose some changes in openbmc to support custom SEL
> > records.
> >
> > * Background
> > The phosphor-sel-logger is the service to provide methods (IpmiSelAdd,
> > IpmiSelAddOem) to add custom SEL records.
> > However:
> > 1. It only sends the log into the journal;
> > 2. It depends on
> > meta-intel/meta-common/recipes-extended/rsyslog/rsyslog/rsyslog.conf
> > to filter the journal log and write to "/var/log/ipmi_sel"
> > 3. It depends on intel-ipmi-oem to parse the "/var/log/ipmi_sel" to
> > provide the SEL entries.
> >
> > In short, phosphor-sel-logger only works with the meta-intel layer and
> > intel-ipmi-oem.
> >
> > To make it support general usage, several changes are submitted to
> > gerrit:
> > https://gerrit.openbmc-project.xyz/q/topic:%22dev-add-custom-sel%22+(status:open%20OR%20status:merged)
>
> To make it support general usage, I think we could have just moved the
> rsyslog.conf to meta-phosphor and moved the handlers from intel-ipmi-oem
> to phosphor-host-ipmid, correct?  Instead, we invented something
> completely new.  My question is, why?

There are several reasons:
1. The rsyslog way is undocumented and the handler is not in upstream OpenBMC.
2. The rsyslog way puts the SEL in a file and thus there are no DBus
objects, which makes it harder to work with other services.
3. The above SEL file needs extra configs to preserve, otherwise it's
lost after BMC reboot. (In upstream, the /var/log points to
/var/volatile/log, Intel's fork makes it persistent)

>
> Whatever the reason, OpenBMC now has two ways of doing SELs, and I am
> tasked with picking one.  Knowing what was wrong with the rsyslog
> approach that led you to invent something new to replace it would help
> me, (and others I suspect), make that decision.

Indeed, but the rsyslog way is not really (and fully) upstream.
With `SEL.errors.yaml` interface and the
SEL_LOGGER_SEND_TO_LOGGING_SERVICE option, in phosphor-sel-logger, we
have the ability to add logging entries to phosphor-logging, and let
ipmid to parse it as SEL.
There is one drawback though, the logging entries that are not
"SEL.error" could not be parsed as normal SEL, the current behavior of
ipmid parses it as an "undetermined system error".

-- 
BRs,
Lei YU

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-19  2:05   ` Lei Yu
@ 2022-10-19 14:43     ` Brad Bishop
  2022-10-19 15:50       ` Bills, Jason M
  0 siblings, 1 reply; 19+ messages in thread
From: Brad Bishop @ 2022-10-19 14:43 UTC (permalink / raw)
  To: Lei Yu, Jason Bills; +Cc: openbmc

Thanks for the reply Lei Yu.

On Wed, 2022-10-19 at 10:05 +0800, Lei Yu wrote:
> 
> 2. The rsyslog way puts the SEL in a file and thus there are no DBus
> objects, which makes it harder to work with other services.

Are there other services that work with IPMI sels?  I know there is a
Redfish SEL log.  Anything else?

> Indeed, but the rsyslog way is not really (and fully) upstream.

I'm trying to determine which implementation is a better fit for me
based on the technical merits of the solution, not based on what
repositories the source code is in.  If that ends up being the rsyslog
approach, I'd consider helping to move the code and make it fully
upstream.

In the hopes that it generates additional information about the
motivations behind the differing implementations, allow me to ask a
somewhat rhetorical question.  Jason, to avoid confusing OpenBMC users
by having to select from two different SEL implementations with pros and
cons of each that are not obvious, would you accept patches that remove
the rsyslog based implementation from intel-ipmi-oem (provided the Intel
metadata is also updated to use the alternative)?  If not, why not?

Thanks,
brad

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-19 14:43     ` Brad Bishop
@ 2022-10-19 15:50       ` Bills, Jason M
  2022-10-19 17:10         ` Brad Bishop
                           ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Bills, Jason M @ 2022-10-19 15:50 UTC (permalink / raw)
  To: openbmc



On 10/19/2022 8:43 AM, Brad Bishop wrote:
> Thanks for the reply Lei Yu.
> 
> On Wed, 2022-10-19 at 10:05 +0800, Lei Yu wrote:
>>
>> 2. The rsyslog way puts the SEL in a file and thus there are no DBus
>> objects, which makes it harder to work with other services.
> 
> Are there other services that work with IPMI sels?  I know there is a
> Redfish SEL log.  Anything else?

bmcweb has a build flag to choose between D-Bus- or journal-based logging.
> 
>> Indeed, but the rsyslog way is not really (and fully) upstream.
> 
> I'm trying to determine which implementation is a better fit for me
> based on the technical merits of the solution, not based on what
> repositories the source code is in.  If that ends up being the rsyslog
> approach, I'd consider helping to move the code and make it fully
> upstream.
> 
> In the hopes that it generates additional information about the
> motivations behind the differing implementations, allow me to ask a
> somewhat rhetorical question.  Jason, to avoid confusing OpenBMC users
> by having to select from two different SEL implementations with pros and
> cons of each that are not obvious, would you accept patches that remove
> the rsyslog based implementation from intel-ipmi-oem (provided the Intel
> metadata is also updated to use the alternative)?  If not, why not?

Intel had a requirement to support storing at least 4000 log entries. 
At the time, we were able to get about 400 entries on D-Bus before D-Bus 
performance became unusable.

That was before dbus-broker, so it could perhaps be better today.  But 
I'm guessing there is still a performance impact and arbitrary log limit 
placed on a system by storing the logs on D-Bus.

This log limit is what will make D-Bus log storage a non-starter for Intel.

I'd also be curious about the reverse question.  Is there any benefit to 
storing logs on D-Bus that makes it a better solution?

At the risk of complicating things more (https://xkcd.com/927/), D-Bus 
was the primary solution when Intel joined.  We created the rsyslog 
approach because of the limitation imposed by D-Bus.  But I know there 
are still those who don't like the rsyslog approach.  Is there a way we 
can now get together and define a new logging solution that is fully 
upstream and avoids the drawbacks of both existing solutions?

> 
> Thanks,
> brad

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-19 15:50       ` Bills, Jason M
@ 2022-10-19 17:10         ` Brad Bishop
  2022-10-19 18:05           ` Bills, Jason M
  2022-10-21 20:14         ` Patrick Williams
  2022-10-21 20:34         ` Patrick Williams
  2 siblings, 1 reply; 19+ messages in thread
From: Brad Bishop @ 2022-10-19 17:10 UTC (permalink / raw)
  To: Bills, Jason M; +Cc: openbmc

Thanks Jason

On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:

>Intel had a requirement to support storing at least 4000 log entries. 

Ok.  So is it fair to assume anyone using the DBus backend does not have 
this requirement?

>At the time, we were able to get about 400 entries on D-Bus before 
>D-Bus performance became unusable.

To anyone using the DBus backend - have you observed similar performance 
issues?

Jason is there a testcase or scenario I can execute to highlighht the 
issues you refer to concretely?  Maybe something like "create 4000 sels, 
run ipmitool and see how long it takes?"

>I'd also be curious about the reverse question.  Is there any benefit 
>to storing logs on D-Bus that makes it a better solution?

Yes, this is exactly the question I've been trying to ask.  The answer 
seems only to be that the code is in meta-intel/intel-ipmi-oem - but 
that is easily fixed by moving the code to 
meta-phosphor/phosphor-host-ipmid.

>At the risk of complicating things more (https://xkcd.com/927/), D-Bus 
>was the primary solution when Intel joined.  We created the rsyslog 
>approach because of the limitation imposed by D-Bus.  But I know there 
>are still those who don't like the rsyslog approach.  Is there a way 
>we can now get together and define a new logging solution that is 
>fully upstream and avoids the drawbacks of both existing solutions?

I hope so, because doing that would make things a lot easier for our 
users adopting OpenBMC.

Thanks,
brad

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-19 17:10         ` Brad Bishop
@ 2022-10-19 18:05           ` Bills, Jason M
  2022-10-19 20:20             ` Brad Bishop
                               ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Bills, Jason M @ 2022-10-19 18:05 UTC (permalink / raw)
  To: openbmc



On 10/19/2022 11:10 AM, Brad Bishop wrote:
> Thanks Jason
> 
> On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:
> 
>> Intel had a requirement to support storing at least 4000 log entries. 
> 
> Ok.  So is it fair to assume anyone using the DBus backend does not have 
> this requirement?

That is my assumption, yes.
> 
>> At the time, we were able to get about 400 entries on D-Bus before 
>> D-Bus performance became unusable.
> 
> To anyone using the DBus backend - have you observed similar performance 
> issues?
> 
> Jason is there a testcase or scenario I can execute to highlighht the 
> issues you refer to concretely?  Maybe something like "create 4000 sels, 
> run ipmitool and see how long it takes?"

To clarify, my understanding is the D-Bus performance issues were not 
isolated to just IPMI.  All of D-Bus for every BMC service was impacted.

If I remember correctly, Ed Tanous is who did the initial evaluation, so 
he may have more detail.  But I think it was similar to what you 
suggest: Create 4000 logs on D-Bus and check the performance.  This 
could be done with ipmitool.
> 
>> I'd also be curious about the reverse question.  Is there any benefit 
>> to storing logs on D-Bus that makes it a better solution?
> 
> Yes, this is exactly the question I've been trying to ask.  The answer 
> seems only to be that the code is in meta-intel/intel-ipmi-oem - but 
> that is easily fixed by moving the code to 
> meta-phosphor/phosphor-host-ipmid.
> 
>> At the risk of complicating things more (https://xkcd.com/927/), D-Bus 
>> was the primary solution when Intel joined.  We created the rsyslog 
>> approach because of the limitation imposed by D-Bus.  But I know there 
>> are still those who don't like the rsyslog approach.  Is there a way 
>> we can now get together and define a new logging solution that is 
>> fully upstream and avoids the drawbacks of both existing solutions?
> 
> I hope so, because doing that would make things a lot easier for our 
> users adopting OpenBMC.

My main requirements are to store many logs (at least 4000 was the 
original number, but I can try to get an updated number if needed) and 
have them persist across BMC reboots.

We currently accomplish this using rsyslog to extract logs from the 
journal and store them in a persistent text file.

How is best to approach starting a new design discussion?  Should we 
continue discussing in this thread?  Start a design doc review? 
Something else?
> 
> Thanks,
> brad

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-19 18:05           ` Bills, Jason M
@ 2022-10-19 20:20             ` Brad Bishop
  2022-10-20 13:24             ` Lei Yu
  2022-10-24 17:59             ` Ed Tanous
  2 siblings, 0 replies; 19+ messages in thread
From: Brad Bishop @ 2022-10-19 20:20 UTC (permalink / raw)
  To: Bills, Jason M; +Cc: openbmc

On Wed, Oct 19, 2022 at 12:05:03PM -0600, Bills, Jason M wrote:

>If I remember correctly, Ed Tanous is who did the initial evaluation, 
>so he may have more detail.  But I think it was similar to what you 
>suggest: Create 4000 logs on D-Bus and check the performance.  

Ok.  It would probably help if performance could be qualified more.  
What was checked to determine that performace had been impacted?  
http/ipmitool response latency?

>My main requirements are to store many logs (at least 4000 was the 
>original number, but I can try to get an updated number if needed) and 
>have them persist across BMC reboots.

These requirements are obviously quite reasonable, but they are already 
satisfied by both implementations.  So the requirements that are not
satisfied by the DBus implementation would be helpful to the discussion 
I think.  I'm assuming those would be certain reponse time targets for 
different types of queries, and possibly some other general performance 
metrics around resource utilization?

>How is best to approach starting a new design discussion?  Should we

For those using the rsyslog backend, I think sharing (performance) 
requirements would be a good start, as that seems to be the major 
differentiation feature.  For those using the DBus backend, it would be 
helpful to know if there are any technical reasons why they have not 
moved to the more performant solution (assuming it actually is).

>continue discussing in this thread?  Start a design doc review? 
>Something else?

I'm happy to keep the discussion going here...

Thanks,
brad

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-19 18:05           ` Bills, Jason M
  2022-10-19 20:20             ` Brad Bishop
@ 2022-10-20 13:24             ` Lei Yu
  2022-10-20 14:39               ` Deng Tyler
  2022-10-24 17:59             ` Ed Tanous
  2 siblings, 1 reply; 19+ messages in thread
From: Lei Yu @ 2022-10-20 13:24 UTC (permalink / raw)
  To: Brad Bishop; +Cc: openbmc, Bills, Jason M

On Thu, Oct 20, 2022 at 2:05 AM Bills, Jason M
<jason.m.bills@linux.intel.com> wrote:
>
>
>
> On 10/19/2022 11:10 AM, Brad Bishop wrote:
> > Thanks Jason
> >
> > On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:
> >
> >> Intel had a requirement to support storing at least 4000 log entries.
> >

Bytedance has a requirement of 1000 log entries.

> > Ok.  So is it fair to assume anyone using the DBus backend does not have
> > this requirement?
>
> That is my assumption, yes.
> >
> >> At the time, we were able to get about 400 entries on D-Bus before
> >> D-Bus performance became unusable.
> >
> > To anyone using the DBus backend - have you observed similar performance
> > issues?
> >

We did hit the performance issue, specifically, it is extremely slow
during BMC boot, when log-manager restore the log entries and put them
on DBus.
That's when I start the discussion about
https://gerrit.openbmc.org/c/openbmc/phosphor-logging/+/52445 and
https://lore.kernel.org/openbmc/CAGm54UHU9s0bTq-AR9tJunoX2Wa9tQ0PH_zWJ2QrYdR3SRqcvg@mail.gmail.com/

Later we resolved the issue by:
* Applying the patch
https://gerrit.openbmc.org/c/openbmc/phosphor-objmgr/+/53904
* Implement the SEL cache in ipmid that is already upstreamed
* Improve the SEL cache by serialization (not upstreamed)

Eventually we get fair performance on SEL handling (with 1000
entries), it should handle 4000 as well.

> > Jason is there a testcase or scenario I can execute to highlighht the
> > issues you refer to concretely?  Maybe something like "create 4000 sels,
> > run ipmitool and see how long it takes?"
>
> To clarify, my understanding is the D-Bus performance issues were not
> isolated to just IPMI.  All of D-Bus for every BMC service was impacted.
>
> If I remember correctly, Ed Tanous is who did the initial evaluation, so
> he may have more detail.  But I think it was similar to what you
> suggest: Create 4000 logs on D-Bus and check the performance.  This
> could be done with ipmitool.
> >
> >> I'd also be curious about the reverse question.  Is there any benefit
> >> to storing logs on D-Bus that makes it a better solution?
> >
> > Yes, this is exactly the question I've been trying to ask.  The answer
> > seems only to be that the code is in meta-intel/intel-ipmi-oem - but
> > that is easily fixed by moving the code to
> > meta-phosphor/phosphor-host-ipmid.
> >
> >> At the risk of complicating things more (https://xkcd.com/927/), D-Bus
> >> was the primary solution when Intel joined.  We created the rsyslog
> >> approach because of the limitation imposed by D-Bus.  But I know there
> >> are still those who don't like the rsyslog approach.  Is there a way
> >> we can now get together and define a new logging solution that is
> >> fully upstream and avoids the drawbacks of both existing solutions?
> >
> > I hope so, because doing that would make things a lot easier for our
> > users adopting OpenBMC.
>
> My main requirements are to store many logs (at least 4000 was the
> original number, but I can try to get an updated number if needed) and
> have them persist across BMC reboots.
>
> We currently accomplish this using rsyslog to extract logs from the
> journal and store them in a persistent text file.
>
> How is best to approach starting a new design discussion?  Should we
> continue discussing in this thread?  Start a design doc review?
> Something else?
> >
> > Thanks,
> > brad

I would like to add several notes (possibly limitations) about
rsyslog's SEL in intel-ipmi-oem, please correct if I was wrong.
* It handles the SELs from phosphor-sel-logger, mostly it only
contains the threshold events.
* It iterates the sel files, and convert the file content into SEL
data every time on a request, which does not seem optimal
* The "add sel entry" does not really add a sel log, it adds an event
entry to Redfish instead.
* With above behavior, it basically has two separate types of logs,
SEL logs that are from rsyslog, and redfish event logs that are done
by "add sel entry". Thus the implementation seems to only support SELs
for sensor threshold events, but not for discrete sensors.

In bytedance we need a "full" SEL feature that supports both
thresholds and discrete sensors.
The whole solution is based on the DBus logging, but it involves
different repos (ipmid, phosphor-logging, fault-monitor). Part of the
implementation is upstreamed but some are internal for now.
I would like to share the details when I have bandwidth :)

-- 
BRs,
Lei YU

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-20 13:24             ` Lei Yu
@ 2022-10-20 14:39               ` Deng Tyler
  0 siblings, 0 replies; 19+ messages in thread
From: Deng Tyler @ 2022-10-20 14:39 UTC (permalink / raw)
  To: Lei Yu; +Cc: openbmc, Brad Bishop, Bills, Jason M

[-- Attachment #1: Type: text/plain, Size: 5153 bytes --]

Hi Lei:
    I encounter a SEL catch sync issue. If a SEL generated while ipmd
collecting all log enttry from file in SEL cache initialized progress then
the SEL wouldn't be exist in SEL cache. Do you ever encounter this issue?

Tyler

Lei Yu <yulei.sh@bytedance.com> 於 2022年10月20日 週四 晚上9:26寫道:

> On Thu, Oct 20, 2022 at 2:05 AM Bills, Jason M
> <jason.m.bills@linux.intel.com> wrote:
> >
> >
> >
> > On 10/19/2022 11:10 AM, Brad Bishop wrote:
> > > Thanks Jason
> > >
> > > On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:
> > >
> > >> Intel had a requirement to support storing at least 4000 log entries.
> > >
>
> Bytedance has a requirement of 1000 log entries.
>
> > > Ok.  So is it fair to assume anyone using the DBus backend does not
> have
> > > this requirement?
> >
> > That is my assumption, yes.
> > >
> > >> At the time, we were able to get about 400 entries on D-Bus before
> > >> D-Bus performance became unusable.
> > >
> > > To anyone using the DBus backend - have you observed similar
> performance
> > > issues?
> > >
>
> We did hit the performance issue, specifically, it is extremely slow
> during BMC boot, when log-manager restore the log entries and put them
> on DBus.
> That's when I start the discussion about
> https://gerrit.openbmc.org/c/openbmc/phosphor-logging/+/52445 and
>
> https://lore.kernel.org/openbmc/CAGm54UHU9s0bTq-AR9tJunoX2Wa9tQ0PH_zWJ2QrYdR3SRqcvg@mail.gmail.com/
>
> Later we resolved the issue by:
> * Applying the patch
> https://gerrit.openbmc.org/c/openbmc/phosphor-objmgr/+/53904
> * Implement the SEL cache in ipmid that is already upstreamed
> * Improve the SEL cache by serialization (not upstreamed)
>
> Eventually we get fair performance on SEL handling (with 1000
> entries), it should handle 4000 as well.
>
> > > Jason is there a testcase or scenario I can execute to highlighht the
> > > issues you refer to concretely?  Maybe something like "create 4000
> sels,
> > > run ipmitool and see how long it takes?"
> >
> > To clarify, my understanding is the D-Bus performance issues were not
> > isolated to just IPMI.  All of D-Bus for every BMC service was impacted.
> >
> > If I remember correctly, Ed Tanous is who did the initial evaluation, so
> > he may have more detail.  But I think it was similar to what you
> > suggest: Create 4000 logs on D-Bus and check the performance.  This
> > could be done with ipmitool.
> > >
> > >> I'd also be curious about the reverse question.  Is there any benefit
> > >> to storing logs on D-Bus that makes it a better solution?
> > >
> > > Yes, this is exactly the question I've been trying to ask.  The answer
> > > seems only to be that the code is in meta-intel/intel-ipmi-oem - but
> > > that is easily fixed by moving the code to
> > > meta-phosphor/phosphor-host-ipmid.
> > >
> > >> At the risk of complicating things more (https://xkcd.com/927/),
> D-Bus
> > >> was the primary solution when Intel joined.  We created the rsyslog
> > >> approach because of the limitation imposed by D-Bus.  But I know there
> > >> are still those who don't like the rsyslog approach.  Is there a way
> > >> we can now get together and define a new logging solution that is
> > >> fully upstream and avoids the drawbacks of both existing solutions?
> > >
> > > I hope so, because doing that would make things a lot easier for our
> > > users adopting OpenBMC.
> >
> > My main requirements are to store many logs (at least 4000 was the
> > original number, but I can try to get an updated number if needed) and
> > have them persist across BMC reboots.
> >
> > We currently accomplish this using rsyslog to extract logs from the
> > journal and store them in a persistent text file.
> >
> > How is best to approach starting a new design discussion?  Should we
> > continue discussing in this thread?  Start a design doc review?
> > Something else?
> > >
> > > Thanks,
> > > brad
>
> I would like to add several notes (possibly limitations) about
> rsyslog's SEL in intel-ipmi-oem, please correct if I was wrong.
> * It handles the SELs from phosphor-sel-logger, mostly it only
> contains the threshold events.
> * It iterates the sel files, and convert the file content into SEL
> data every time on a request, which does not seem optimal
> * The "add sel entry" does not really add a sel log, it adds an event
> entry to Redfish instead.
> * With above behavior, it basically has two separate types of logs,
> SEL logs that are from rsyslog, and redfish event logs that are done
> by "add sel entry". Thus the implementation seems to only support SELs
> for sensor threshold events, but not for discrete sensors.
>
> In bytedance we need a "full" SEL feature that supports both
> thresholds and discrete sensors.
> The whole solution is based on the DBus logging, but it involves
> different repos (ipmid, phosphor-logging, fault-monitor). Part of the
> implementation is upstreamed but some are internal for now.
> I would like to share the details when I have bandwidth :)
>
> --
> BRs,
> Lei YU
>

[-- Attachment #2: Type: text/html, Size: 6598 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-19 15:50       ` Bills, Jason M
  2022-10-19 17:10         ` Brad Bishop
@ 2022-10-21 20:14         ` Patrick Williams
  2022-10-24 22:44           ` Vernon Mauery
  2022-10-21 20:34         ` Patrick Williams
  2 siblings, 1 reply; 19+ messages in thread
From: Patrick Williams @ 2022-10-21 20:14 UTC (permalink / raw)
  To: Bills, Jason M; +Cc: openbmc

[-- Attachment #1: Type: text/plain, Size: 3377 bytes --]

On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:

> Intel had a requirement to support storing at least 4000 log entries. 
> At the time, we were able to get about 400 entries on D-Bus before D-Bus 
> performance became unusable.
> 
> That was before dbus-broker, so it could perhaps be better today.

I was surprised that there would be much performance impact to dbus as a
whole because there should not be any impact to the bus because one
process decided to host a bunch of objects.  I can understand _that_
process becoming slower if there are algorithmic problems with it.

I did an experiment on an AST2600 system where I modified phosphor-logging
to support 20k log entries and then created 10k of them.

```
$ cat meta-facebook/recipes-phosphor/logging/phosphor-logging_%.bbappend
FILESEXTRAPATHS:prepend := "${THISDIR}/${PN}:"

EXTRA_OEMESON += "-Derror_cap=20000"
```

What I've found can be summarized as follows:

   - Overall there is no impact to the dbus by creating a large number
     of log entries.  Interactions with other daemons happen just fine
     with no performance impact.

   - Creating 10k log entries does take a while.  Most of this time is
     observed (via top) in jffs2 but there is also some peaky objmgr
     activity.  There might be some improvements to be had there, but I
     don't think anyone is intending to create 10k events in the span of
     a minute.

   - Dumping all the events from phosphor-logging is slow when there are
     10k of them.  It took 8-11s.  I didn't have `strace` installed, but
     it seemed like much of this was coming from `busctl` processing the
     result and not from the bus transfer itself, but more investigation
     would be required.

   - Deleting all 10k of the events timed out at a dbus level, but still
     succeeded.  Almost all of the time was spent in jffs2.

I'm not suggesting there aren't optimizations we can do to
phosphor-logging, but it doesn't seem like dbus itself is a direct
concern.

```
    # busctl call xyz.openbmc_project.Logging \
            /xyz/openbmc_project/logging \
            xyz.openbmc_project.Collection.DeleteAll \
            DeleteAll

    # time busctl call xyz.openbmc_project.Logging \
            /xyz/openbmc_project/logging \
            org.freedesktop.DBus.ObjectManager \
            GetManagedObjects > /dev/null

    real    0m0.017s
    user    0m0.015s
    sys    0m0.001s

    # time busctl tree xyz.openbmc_project.VirtualSensor > /dev/null

    real    0m0.025s
    user    0m0.019s
    sys    0m0.001s    

    ### Create 10000 log entries.
    # for i in $(seq 0 10000) ; do busctl call \
        xyz.openbmc_project.Logging /xyz/openbmc_project/logging \
        xyz.openbmc_project.Logging.Create Create ssa{ss} \
        "Hello.$i" \
        "xyz.openbmc_project.Logging.Entry.Level.Warning" 0 ;
      done

    # time busctl call xyz.openbmc_project.Logging \
            /xyz/openbmc_project/logging \
            org.freedesktop.DBus.ObjectManager \
            GetManagedObjects > /dev/null

    real    0m11.722s
    user    0m9.130s
    sys    0m0.071s

    # time busctl tree xyz.openbmc_project.VirtualSensor > /dev/null

    real    0m0.024s
    user    0m0.019s
    sys    0m0.001s
```

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-19 15:50       ` Bills, Jason M
  2022-10-19 17:10         ` Brad Bishop
  2022-10-21 20:14         ` Patrick Williams
@ 2022-10-21 20:34         ` Patrick Williams
  2022-10-25 20:37           ` Bills, Jason M
  2 siblings, 1 reply; 19+ messages in thread
From: Patrick Williams @ 2022-10-21 20:34 UTC (permalink / raw)
  To: Bills, Jason M; +Cc: openbmc

[-- Attachment #1: Type: text/plain, Size: 2834 bytes --]

On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:

> I'd also be curious about the reverse question.  Is there any benefit to 
> storing logs on D-Bus that makes it a better solution?
...
> Is there a way we can now get together and define a new logging solution
> that is fully upstream and avoids the drawbacks of both existing solutions?

First and foremost I'd like to see consistency come out of this.  If
there is another proposal for how to do it that we can all consolidate
on (and people are willing to put in effort to get there) then I'm
on-board.  It seems to me like the lowest friction way to get there, with
the best maintainability, is to use the phosphor-logging APIs even if we
end up not putting them into d-bus entries.

It happens that phosphor-logging stores the instances on d-bus, but the
more important aspect to me is that we have a more consistent API for
defining and creating errors and events.  The "rsyslog-way" is that you
make very specific journal entries that the rsyslog magic knows about,
but there are a few issues with it:

    1. We don't have any consistency in what, when, and how events are
       logged.  We even have cases within the same repository (looking at
       dbus-sensors) where some of the implementations make the magic
       SEL records and others do not.  Additionally, they're not required
       to be the same format.  Some maintainers have even outright
       rejected patches with the "magic log statements".

    2. There is no way to generate something like a Redfish message
       registry for the events, because they're just arbitrary strings
       that are sprinkled around.  It isn't even easy to programatically
       search the code for them because there are 4 different approaches
       to that: cout/cerr, direct journald, phosphor-logging "v1", and
       phosphor-logging lg2.

    3. Any kind of automation around these is more at the whim of
       whatever the developers / maintainers decide to change.  It is,
       for example, really difficult for me to write data center tooling
       that reacts to events like "we just lost pgood to the host"
       because I have to read through the code to find the specific text
       and hope it never changes.

Conversely, the phosphor-logging APIs leverage YAML-based error specifiers,
which can be easily transposed into a Redfish message registry, and happen
to also be the same structure we use for inter-process errors on d-bus calls.
While I have to review the implementations to make sure they're
appropriately created, I have far less concern about them disappearing
or changing once they are in place (and I can review the changes to the YAML
specifiers to keep tabs on what changes their might be).

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-19 18:05           ` Bills, Jason M
  2022-10-19 20:20             ` Brad Bishop
  2022-10-20 13:24             ` Lei Yu
@ 2022-10-24 17:59             ` Ed Tanous
  2022-10-24 19:03               ` Brad Bishop
  2 siblings, 1 reply; 19+ messages in thread
From: Ed Tanous @ 2022-10-24 17:59 UTC (permalink / raw)
  To: Bills, Jason M; +Cc: openbmc

On Wed, Oct 19, 2022 at 11:05 AM Bills, Jason M
<jason.m.bills@linux.intel.com> wrote:
>
>
>
> On 10/19/2022 11:10 AM, Brad Bishop wrote:
> > Thanks Jason
> >
> > On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:
> >
> >> Intel had a requirement to support storing at least 4000 log entries.
> >
> > Ok.  So is it fair to assume anyone using the DBus backend does not have
> > this requirement?
>
> That is my assumption, yes.
> >
> >> At the time, we were able to get about 400 entries on D-Bus before
> >> D-Bus performance became unusable.
> >
> > To anyone using the DBus backend - have you observed similar performance
> > issues?
> >
> > Jason is there a testcase or scenario I can execute to highlighht the
> > issues you refer to concretely?  Maybe something like "create 4000 sels,
> > run ipmitool and see how long it takes?"
>
> To clarify, my understanding is the D-Bus performance issues were not
> isolated to just IPMI.  All of D-Bus for every BMC service was impacted.
>
> If I remember correctly, Ed Tanous is who did the initial evaluation, so
> he may have more detail.  But I think it was similar to what you
> suggest: Create 4000 logs on D-Bus and check the performance.  This
> could be done with ipmitool.


From what I recall, the requirements were:
- Ability to store 4000 logs in a rotating buffer (original desire was
10,000, but 4000 was picked as a middle-ground that could be
implemented).
- Ability to log 100+ entries per second, including when buffers get
overwritten.
- (abstract) Log storage should be aware of hardware limitations (SPI
flash cell write endurance) and allow writing N logs per minute for
the lifetime of the machine without hardware failure.  (I forget the
value of N).
- "ipmitool sensor list" should return the results from a full sel log
in less than 1 second (renegotiated from 200ms, the faster the
better).
- The logging implementation should be able to support a well-formed,
version controlled, Redfish MessageRegistry to the DMTF
specifications.
- The logging implementation should be able to implement a
well-formed, stable, and ACID compliant IPMI SEL command
implementation.

I don't believe the current DBus implementation can meet the previous
requirements, but if it's capable of that these days, it would be
excellent to consolidate.

> >
> >> I'd also be curious about the reverse question.  Is there any benefit
> >> to storing logs on D-Bus that makes it a better solution?
> >
> > Yes, this is exactly the question I've been trying to ask.  The answer
> > seems only to be that the code is in meta-intel/intel-ipmi-oem - but
> > that is easily fixed by moving the code to
> > meta-phosphor/phosphor-host-ipmid.
> >
> >> At the risk of complicating things more (https://xkcd.com/927/), D-Bus
> >> was the primary solution when Intel joined.  We created the rsyslog
> >> approach because of the limitation imposed by D-Bus.  But I know there
> >> are still those who don't like the rsyslog approach.  Is there a way
> >> we can now get together and define a new logging solution that is
> >> fully upstream and avoids the drawbacks of both existing solutions?
> >
> > I hope so, because doing that would make things a lot easier for our
> > users adopting OpenBMC.
>
> My main requirements are to store many logs (at least 4000 was the
> original number, but I can try to get an updated number if needed) and
> have them persist across BMC reboots.
>
> We currently accomplish this using rsyslog to extract logs from the
> journal and store them in a persistent text file.
>
> How is best to approach starting a new design discussion?  Should we
> continue discussing in this thread?  Start a design doc review?
> Something else?
> >
> > Thanks,
> > brad

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-24 17:59             ` Ed Tanous
@ 2022-10-24 19:03               ` Brad Bishop
  2022-10-24 20:19                 ` Ed Tanous
  2022-10-24 22:56                 ` Vernon Mauery
  0 siblings, 2 replies; 19+ messages in thread
From: Brad Bishop @ 2022-10-24 19:03 UTC (permalink / raw)
  To: Ed Tanous; +Cc: Bills, Jason M, openbmc

This is helpful, thanks Ed.

On Mon, Oct 24, 2022 at 10:59:51AM -0700, Ed Tanous wrote:
>From what I recall, the requirements were:
>- Ability to store 4000 logs in a rotating buffer (original desire was
>10,000, but 4000 was picked as a middle-ground that could be
>implemented).

A DBus object based implementation could meet this requirement, right?

>- Ability to log 100+ entries per second, including when buffers get
>overwritten.

I guess I would not be shocked if DBus objects + serialization might not 
be able to sustain this rate of incoming logs.  Maybe it depends on the 
filesystem or how the data is serialized in the filesystem.  The DBus 
approach creates many files.  Obviously the syslog approach is only 
using a couple of files.

Do you think this kind of requirement is typical?  Quoting Patrick from 
another thread here:

>don't think anyone is intending to create 10k events in the span of
>a minute

100/s is only 6k in a minute but that is getting pretty close...

>- (abstract) Log storage should be aware of hardware limitations (SPI
>flash cell write endurance) and allow writing N logs per minute for
>the lifetime of the machine without hardware failure.  (I forget the
>value of N).

Do you think the rsyslog implementation does better at this?  Why?

>- "ipmitool sensor list" should return the results from a full sel log
>in less than 1 second (renegotiated from 200ms, the faster the
>better).

Ok, again I would not be shocked if DBus objects weren't able to deliver 
on this.

>- The logging implementation should be able to support a well-formed,
>version controlled, Redfish MessageRegistry to the DMTF
>specifications.

Do you think a DBus object based implementation could meet this 
requirement?

>- The logging implementation should be able to implement a
>well-formed, stable, and ACID compliant IPMI SEL command
>implementation.

Do you think a DBus object based implementation could meet this 
requirement?

>
>I don't believe the current DBus implementation can meet the previous
>requirements, 

The motivation of my questions above is to understand which requirements 
cannot be met by something based on DBus objects.

Thanks,
brad

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-24 19:03               ` Brad Bishop
@ 2022-10-24 20:19                 ` Ed Tanous
  2022-10-25 20:18                   ` Bills, Jason M
  2022-10-24 22:56                 ` Vernon Mauery
  1 sibling, 1 reply; 19+ messages in thread
From: Ed Tanous @ 2022-10-24 20:19 UTC (permalink / raw)
  To: Brad Bishop; +Cc: Bills, Jason M, openbmc

On Mon, Oct 24, 2022 at 12:03 PM Brad Bishop
<bradleyb@fuzziesquirrel.com> wrote:
>
> This is helpful, thanks Ed.
>
> On Mon, Oct 24, 2022 at 10:59:51AM -0700, Ed Tanous wrote:
> >From what I recall, the requirements were:
> >- Ability to store 4000 logs in a rotating buffer (original desire was
> >10,000, but 4000 was picked as a middle-ground that could be
> >implemented).
>
> A DBus object based implementation could meet this requirement, right?
>
> >- Ability to log 100+ entries per second, including when buffers get
> >overwritten.
>
> I guess I would not be shocked if DBus objects + serialization might not
> be able to sustain this rate of incoming logs.  Maybe it depends on the
> filesystem or how the data is serialized in the filesystem.  The DBus
> approach creates many files.  Obviously the syslog approach is only
> using a couple of files.
>
> Do you think this kind of requirement is typical?  Quoting Patrick from
> another thread here:

There are cases where every sensor will cross a threshold at the same
moment (generally because a threshold was changed).  Is this a
"typical" use case, probably not, but if it's trivial to DOS the BMC
with events with a couple commands, it's not great.

>
> >don't think anyone is intending to create 10k events in the span of
> >a minute

Intending to, no.  Created in an error case, or in the case of a
failing piece of hardware, it can happen.

>
> 100/s is only 6k in a minute but that is getting pretty close...
>
> >- (abstract) Log storage should be aware of hardware limitations (SPI
> >flash cell write endurance) and allow writing N logs per minute for
> >the lifetime of the machine without hardware failure.  (I forget the
> >value of N).
>
> Do you think the rsyslog implementation does better at this?  Why?

@Jason?  I know there were a lot of worries about write expansion (N
byte sel log turning into 100XN bytes of JSON) wearing out flash
faster, and having some interactions with jffs2.

>
> >- "ipmitool sensor list" should return the results from a full sel log
> >in less than 1 second (renegotiated from 200ms, the faster the
> >better).
>
> Ok, again I would not be shocked if DBus objects weren't able to deliver
> on this.
>
> >- The logging implementation should be able to support a well-formed,
> >version controlled, Redfish MessageRegistry to the DMTF
> >specifications.
>
> Do you think a DBus object based implementation could meet this
> requirement?

Today, as defined, without duplicating the logging strings, no, but
DBus is just an IPC;  I'm fairly certain we could define a DBus based
logging mechanism that met these requirements, but the key is that the
Redfish instance (bmcweb in this case) needs a-priori knowledge of
every possible log that the system can publish, and version them with
DMTF-appropriate semver (existing messages changes rev the patch
version, new messages revs the subminor version).  The existing
rsyslog implementation places the log strings into bmcweb, so the only
thing being transferred over the IPC is the MessageId and the
specific-instance arguments, which in theory, increases the
performance quite a bit, and avoids duplicating the strings in two
places.

>
> >- The logging implementation should be able to implement a
> >well-formed, stable, and ACID compliant IPMI SEL command
> >implementation.
>
> Do you think a DBus object based implementation could meet this
> requirement?

Same answer, with today's code, no.  DBus is just an IPC, we could
potentially define one.

>
> >
> >I don't believe the current DBus implementation can meet the previous
> >requirements,
>
> The motivation of my questions above is to understand which requirements
> cannot be met by something based on DBus objects.

Cool.


>
> Thanks,
> brad

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-21 20:14         ` Patrick Williams
@ 2022-10-24 22:44           ` Vernon Mauery
  0 siblings, 0 replies; 19+ messages in thread
From: Vernon Mauery @ 2022-10-24 22:44 UTC (permalink / raw)
  To: Patrick Williams; +Cc: Bills, Jason M, openbmc

On 21-Oct-2022 03:14 PM, Patrick Williams wrote:
>On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:
>
>> Intel had a requirement to support storing at least 4000 log entries.
>> At the time, we were able to get about 400 entries on D-Bus before D-Bus
>> performance became unusable.
>>
>> That was before dbus-broker, so it could perhaps be better today.
>
>I was surprised that there would be much performance impact to dbus as a
>whole because there should not be any impact to the bus because one
>process decided to host a bunch of objects.  I can understand _that_
>process becoming slower if there are algorithmic problems with it.

I suspect that it was a combination of the switch to dbus-broker in 
addition to rewriting the mapper, which made the number of SEL entries 
phosphor-logging is capable of handling go up from what it was before.

>I did an experiment on an AST2600 system where I modified phosphor-logging
>to support 20k log entries and then created 10k of them.
>
>```
>$ cat meta-facebook/recipes-phosphor/logging/phosphor-logging_%.bbappend
>FILESEXTRAPATHS:prepend := "${THISDIR}/${PN}:"
>
>EXTRA_OEMESON += "-Derror_cap=20000"
>```
>
>What I've found can be summarized as follows:
>
>   - Overall there is no impact to the dbus by creating a large number
>     of log entries.  Interactions with other daemons happen just fine
>     with no performance impact.

Right, I don't expect that once they are created that they would have a 
meaningful impact other than maybe some sort of memory footprint, 
though, that footprint is magnified from just phosphor-logging by the 
mapper.

>   - Creating 10k log entries does take a while.  Most of this time is
>     observed (via top) in jffs2 but there is also some peaky objmgr
>     activity.  There might be some improvements to be had there, but I
>     don't think anyone is intending to create 10k events in the span of
>     a minute.

This is really my biggest concern at this point. The OpenBMC is already 
the slowest-to-boot BMC firmware that I have worked on in the past 10 
years, and that is in the face of faster BMC processors and faster RAM. 
Delay the 'fully booted' state of the BMC for this will cause validation 
bugs because the BMC is changing behavior even though it should be at a 
stable state.

>   - Dumping all the events from phosphor-logging is slow when there are
>     10k of them.  It took 8-11s.  I didn't have `strace` installed, but
>     it seemed like much of this was coming from `busctl` processing the
>     result and not from the bus transfer itself, but more investigation
>     would be required.

I think more investigation needs to be done here. We should be limited 
by the network, not by accessing the items.

>   - Deleting all 10k of the events timed out at a dbus level, but still
>     succeeded.  Almost all of the time was spent in jffs2.

This would be a lot faster if all the items were in a single file, which 
is a change that could be made independently of whether or not the 
individual log entries are hosted as dbus objects.

--Vernon

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-24 19:03               ` Brad Bishop
  2022-10-24 20:19                 ` Ed Tanous
@ 2022-10-24 22:56                 ` Vernon Mauery
  1 sibling, 0 replies; 19+ messages in thread
From: Vernon Mauery @ 2022-10-24 22:56 UTC (permalink / raw)
  To: Brad Bishop; +Cc: Ed Tanous, Bills, Jason M, openbmc

On 24-Oct-2022 03:03 PM, Brad Bishop wrote:
>This is helpful, thanks Ed.
>
>On Mon, Oct 24, 2022 at 10:59:51AM -0700, Ed Tanous wrote:
>>From what I recall, the requirements were:
>>- Ability to store 4000 logs in a rotating buffer (original desire was
>>10,000, but 4000 was picked as a middle-ground that could be
>>implemented).
>
>A DBus object based implementation could meet this requirement, right?
>
>>- Ability to log 100+ entries per second, including when buffers get
>>overwritten.
>
>I guess I would not be shocked if DBus objects + serialization might 
>not be able to sustain this rate of incoming logs.  Maybe it depends 
>on the filesystem or how the data is serialized in the filesystem.  
>The DBus approach creates many files.  Obviously the syslog approach 
>is only using a couple of files.
>
>Do you think this kind of requirement is typical?  Quoting Patrick 
>from another thread here:
>
>>don't think anyone is intending to create 10k events in the span of
>>a minute
>
>100/s is only 6k in a minute but that is getting pretty close...

I think 100/s is a pretty high rate. Actually creating events at that 
rate seems like a signal that the system is in a bit of distress. 
Really, I think the gating factor is how fast they get processed at boot 
time. If loading the events onto D-Bus as objects at boot time is the 
same cost as creating them to start with, 100/s is insufficient for a 
large number of log entries.

>>- (abstract) Log storage should be aware of hardware limitations (SPI
>>flash cell write endurance) and allow writing N logs per minute for
>>the lifetime of the machine without hardware failure.  (I forget the
>>value of N).
>
>Do you think the rsyslog implementation does better at this?  Why?

One file with jffs2 uses fewer bytes than many files. When we were 
trying to figure out how to do a file-based implementation to start 
with, we tried individual files and found we ran out of space quickly 
even though the total size of the files was not that great. We even 
tried using empty files, encoding the data into the filenames themselves 
in hopes to only use inodes but that seemed to suffer the same issue. A 
single file that held all the entries was fast, small, and easy to 
handle.

>>- "ipmitool sensor list" should return the results from a full sel log
>>in less than 1 second (renegotiated from 200ms, the faster the
>>better).
>
>Ok, again I would not be shocked if DBus objects weren't able to 
>deliver on this.

I imagine dbus-broker could handle it, especially on a dual-core system. 
I am not sure the systemd dbus daemon was able to handle it on the 
ast2500.

>>- The logging implementation should be able to support a well-formed,
>>version controlled, Redfish MessageRegistry to the DMTF
>>specifications.
>
>Do you think a DBus object based implementation could meet this 
>requirement?
>
>>- The logging implementation should be able to implement a
>>well-formed, stable, and ACID compliant IPMI SEL command
>>implementation.
>
>Do you think a DBus object based implementation could meet this 
>requirement?
>
>>
>>I don't believe the current DBus implementation can meet the previous
>>requirements,
>
>The motivation of my questions above is to understand which 
>requirements cannot be met by something based on DBus objects.

I think it is possible to come up with a dbus interface that supports 
the various requirements in this thread. I don't believe that the 
current implementation is sufficient though.


--Vernon

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-24 20:19                 ` Ed Tanous
@ 2022-10-25 20:18                   ` Bills, Jason M
  0 siblings, 0 replies; 19+ messages in thread
From: Bills, Jason M @ 2022-10-25 20:18 UTC (permalink / raw)
  To: openbmc



On 10/24/2022 2:19 PM, Ed Tanous wrote:
> On Mon, Oct 24, 2022 at 12:03 PM Brad Bishop
> <bradleyb@fuzziesquirrel.com> wrote:
>>
>> This is helpful, thanks Ed.
>>
>> On Mon, Oct 24, 2022 at 10:59:51AM -0700, Ed Tanous wrote:
>> >From what I recall, the requirements were:
>>> - Ability to store 4000 logs in a rotating buffer (original desire was
>>> 10,000, but 4000 was picked as a middle-ground that could be
>>> implemented).
>>
>> A DBus object based implementation could meet this requirement, right?
>>
>>> - Ability to log 100+ entries per second, including when buffers get
>>> overwritten.
>>
>> I guess I would not be shocked if DBus objects + serialization might not
>> be able to sustain this rate of incoming logs.  Maybe it depends on the
>> filesystem or how the data is serialized in the filesystem.  The DBus
>> approach creates many files.  Obviously the syslog approach is only
>> using a couple of files.
>>
>> Do you think this kind of requirement is typical?  Quoting Patrick from
>> another thread here:
> 
> There are cases where every sensor will cross a threshold at the same
> moment (generally because a threshold was changed).  Is this a
> "typical" use case, probably not, but if it's trivial to DOS the BMC
> with events with a couple commands, it's not great.
> 
>>
>>> don't think anyone is intending to create 10k events in the span of
>>> a minute
> 
> Intending to, no.  Created in an error case, or in the case of a
> failing piece of hardware, it can happen.
> 
>>
>> 100/s is only 6k in a minute but that is getting pretty close...
>>
>>> - (abstract) Log storage should be aware of hardware limitations (SPI
>>> flash cell write endurance) and allow writing N logs per minute for
>>> the lifetime of the machine without hardware failure.  (I forget the
>>> value of N).
>>
>> Do you think the rsyslog implementation does better at this?  Why?
> 
> @Jason?  I know there were a lot of worries about write expansion (N
> byte sel log turning into 100XN bytes of JSON) wearing out flash
> faster, and having some interactions with jffs2.
> 

Yes.  We ran into this issue both with trying to use individual files 
per SEL entry and with persisting the journal.  At the time, we only had 
4MB of SPI flash space for persistent storage, and both of those methods 
quickly filled that up.

We now have 8MB, but only a small part of that can be dedicated to 
persisting logs, so we still have a requirement that persisting many 
logs should not consume too much persistent storage space.

>>
>>> - "ipmitool sensor list" should return the results from a full sel log
>>> in less than 1 second (renegotiated from 200ms, the faster the
>>> better).
>>
>> Ok, again I would not be shocked if DBus objects weren't able to deliver
>> on this.
>>
>>> - The logging implementation should be able to support a well-formed,
>>> version controlled, Redfish MessageRegistry to the DMTF
>>> specifications.
>>
>> Do you think a DBus object based implementation could meet this
>> requirement?
> 
> Today, as defined, without duplicating the logging strings, no, but
> DBus is just an IPC;  I'm fairly certain we could define a DBus based
> logging mechanism that met these requirements, but the key is that the
> Redfish instance (bmcweb in this case) needs a-priori knowledge of
> every possible log that the system can publish, and version them with
> DMTF-appropriate semver (existing messages changes rev the patch
> version, new messages revs the subminor version).  The existing
> rsyslog implementation places the log strings into bmcweb, so the only
> thing being transferred over the IPC is the MessageId and the
> specific-instance arguments, which in theory, increases the
> performance quite a bit, and avoids duplicating the strings in two
> places.
> 
>>
>>> - The logging implementation should be able to implement a
>>> well-formed, stable, and ACID compliant IPMI SEL command
>>> implementation.
>>
>> Do you think a DBus object based implementation could meet this
>> requirement?
> 
> Same answer, with today's code, no.  DBus is just an IPC, we could
> potentially define one.
> 
>>
>>>
>>> I don't believe the current DBus implementation can meet the previous
>>> requirements,
>>
>> The motivation of my questions above is to understand which requirements
>> cannot be met by something based on DBus objects.
> 
> Cool.
> 
> 
>>
>> Thanks,
>> brad

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Adding support for custom SEL records
  2022-10-21 20:34         ` Patrick Williams
@ 2022-10-25 20:37           ` Bills, Jason M
  0 siblings, 0 replies; 19+ messages in thread
From: Bills, Jason M @ 2022-10-25 20:37 UTC (permalink / raw)
  To: openbmc



On 10/21/2022 2:34 PM, Patrick Williams wrote:
> On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote:
> 
>> I'd also be curious about the reverse question.  Is there any benefit to
>> storing logs on D-Bus that makes it a better solution?
> ...
>> Is there a way we can now get together and define a new logging solution
>> that is fully upstream and avoids the drawbacks of both existing solutions?
> 
> First and foremost I'd like to see consistency come out of this.  If
> there is another proposal for how to do it that we can all consolidate
> on (and people are willing to put in effort to get there) then I'm
> on-board.  It seems to me like the lowest friction way to get there, with
> the best maintainability, is to use the phosphor-logging APIs even if we
> end up not putting them into d-bus entries.
> 
I agree that phosphor-logging seems like the right place, so I think 
looking at that but changing the back-end storage away from D-Bus 
objects may be a good direction to consider.

> It happens that phosphor-logging stores the instances on d-bus, but the
> more important aspect to me is that we have a more consistent API for
> defining and creating errors and events.  The "rsyslog-way" is that you
> make very specific journal entries that the rsyslog magic knows about,
> but there are a few issues with it:
> 
>      1. We don't have any consistency in what, when, and how events are
>         logged.  We even have cases within the same repository (looking at
>         dbus-sensors) where some of the implementations make the magic
>         SEL records and others do not.  Additionally, they're not required
>         to be the same format.  Some maintainers have even outright
>         rejected patches with the "magic log statements".
> 
Yes, this consistency would be good.  I tried to add SEL logging into 
phosphor-logging, but the patch didn't make it through review: 
https://gerrit.openbmc.org/c/openbmc/phosphor-logging/+/13956.

>      2. There is no way to generate something like a Redfish message
>         registry for the events, because they're just arbitrary strings
>         that are sprinkled around.  It isn't even easy to programatically
>         search the code for them because there are 4 different approaches
>         to that: cout/cerr, direct journald, phosphor-logging "v1", and
>         phosphor-logging lg2.
> 
I think Redfish is a more difficult case to handle, but if we can do it 
through the same or similar phosphor-logging API as IPMI, then I'm on board.

As for searching, it's true that different methods are used to get the 
log into the journal, but the Redfish MessageId is consistent in all 
cases and can be programatically searched.

>      3. Any kind of automation around these is more at the whim of
>         whatever the developers / maintainers decide to change.  It is,
>         for example, really difficult for me to write data center tooling
>         that reacts to events like "we just lost pgood to the host"
>         because I have to read through the code to find the specific text
>         and hope it never changes.
> 
Doesn't Redfish solve this issue by guaranteeing the same Message and 
MessageId are used for all the same events?

> Conversely, the phosphor-logging APIs leverage YAML-based error specifiers,
> which can be easily transposed into a Redfish message registry, and happen
> to also be the same structure we use for inter-process errors on d-bus calls.
> While I have to review the implementations to make sure they're
> appropriately created, I have far less concern about them disappearing
> or changing once they are in place (and I can review the changes to the YAML
> specifiers to keep tabs on what changes their might be).
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-10-25 20:38 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-07  7:35 Adding support for custom SEL records Lei Yu
2022-10-18 20:09 ` Brad Bishop
2022-10-19  2:05   ` Lei Yu
2022-10-19 14:43     ` Brad Bishop
2022-10-19 15:50       ` Bills, Jason M
2022-10-19 17:10         ` Brad Bishop
2022-10-19 18:05           ` Bills, Jason M
2022-10-19 20:20             ` Brad Bishop
2022-10-20 13:24             ` Lei Yu
2022-10-20 14:39               ` Deng Tyler
2022-10-24 17:59             ` Ed Tanous
2022-10-24 19:03               ` Brad Bishop
2022-10-24 20:19                 ` Ed Tanous
2022-10-25 20:18                   ` Bills, Jason M
2022-10-24 22:56                 ` Vernon Mauery
2022-10-21 20:14         ` Patrick Williams
2022-10-24 22:44           ` Vernon Mauery
2022-10-21 20:34         ` Patrick Williams
2022-10-25 20:37           ` Bills, Jason M

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.