* Adding support for custom SEL records @ 2020-12-07 7:35 Lei Yu 2022-10-18 20:09 ` Brad Bishop 0 siblings, 1 reply; 19+ messages in thread From: Lei Yu @ 2020-12-07 7:35 UTC (permalink / raw) To: openbmc This mail is to propose some changes in openbmc to support custom SEL records. * Background The phosphor-sel-logger is the service to provide methods (IpmiSelAdd, IpmiSelAddOem) to add custom SEL records. However: 1. It only sends the log into the journal; 2. It depends on meta-intel/meta-common/recipes-extended/rsyslog/rsyslog/rsyslog.conf to filter the journal log and write to "/var/log/ipmi_sel" 3. It depends on intel-ipmi-oem to parse the "/var/log/ipmi_sel" to provide the SEL entries. In short, phosphor-sel-logger only works with the meta-intel layer and intel-ipmi-oem. To make it support general usage, several changes are submitted to gerrit: https://gerrit.openbmc-project.xyz/q/topic:%22dev-add-custom-sel%22+(status:open%20OR%20status:merged) Some details: 1. It adds SEL.errors.yaml in PDI. 2. With SEL_LOGGER_SEND_TO_LOGGING_SERVICE option, phosphor-sel-logger creates entries in the logging service without callout path, instead of sending to the journal. 3. In phosphor-host-ipmid, it adds extra logic to parse the logging entries and convert to SEL entries. Comments are welcome. -- BRs, Lei YU ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2020-12-07 7:35 Adding support for custom SEL records Lei Yu @ 2022-10-18 20:09 ` Brad Bishop 2022-10-19 2:05 ` Lei Yu 0 siblings, 1 reply; 19+ messages in thread From: Brad Bishop @ 2022-10-18 20:09 UTC (permalink / raw) To: Lei Yu, openbmc Thanks Lei YU for adding the support desribed below. Sorry for the necropost but I had a couple questions (below) for anyone that feels like indulging me with answers. On Mon, 2020-12-07 at 15:35 +0800, Lei Yu wrote: > This mail is to propose some changes in openbmc to support custom SEL > records. > > * Background > The phosphor-sel-logger is the service to provide methods (IpmiSelAdd, > IpmiSelAddOem) to add custom SEL records. > However: > 1. It only sends the log into the journal; > 2. It depends on > meta-intel/meta-common/recipes-extended/rsyslog/rsyslog/rsyslog.conf > to filter the journal log and write to "/var/log/ipmi_sel" > 3. It depends on intel-ipmi-oem to parse the "/var/log/ipmi_sel" to > provide the SEL entries. > > In short, phosphor-sel-logger only works with the meta-intel layer and > intel-ipmi-oem. > > To make it support general usage, several changes are submitted to > gerrit: > https://gerrit.openbmc-project.xyz/q/topic:%22dev-add-custom-sel%22+(status:open%20OR%20status:merged) To make it support general usage, I think we could have just moved the rsyslog.conf to meta-phosphor and moved the handlers from intel-ipmi-oem to phosphor-host-ipmid, correct? Instead, we invented something completely new. My question is, why? Whatever the reason, OpenBMC now has two ways of doing SELs, and I am tasked with picking one. Knowing what was wrong with the rsyslog approach that led you to invent something new to replace it would help me, (and others I suspect), make that decision. Thanks! brad ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-18 20:09 ` Brad Bishop @ 2022-10-19 2:05 ` Lei Yu 2022-10-19 14:43 ` Brad Bishop 0 siblings, 1 reply; 19+ messages in thread From: Lei Yu @ 2022-10-19 2:05 UTC (permalink / raw) To: Brad Bishop; +Cc: openbmc On Wed, Oct 19, 2022 at 4:09 AM Brad Bishop <bradleyb@fuzziesquirrel.com> wrote: > > Thanks Lei YU for adding the support desribed below. Sorry for the > necropost but I had a couple questions (below) for anyone that feels > like indulging me with answers. > > On Mon, 2020-12-07 at 15:35 +0800, Lei Yu wrote: > > This mail is to propose some changes in openbmc to support custom SEL > > records. > > > > * Background > > The phosphor-sel-logger is the service to provide methods (IpmiSelAdd, > > IpmiSelAddOem) to add custom SEL records. > > However: > > 1. It only sends the log into the journal; > > 2. It depends on > > meta-intel/meta-common/recipes-extended/rsyslog/rsyslog/rsyslog.conf > > to filter the journal log and write to "/var/log/ipmi_sel" > > 3. It depends on intel-ipmi-oem to parse the "/var/log/ipmi_sel" to > > provide the SEL entries. > > > > In short, phosphor-sel-logger only works with the meta-intel layer and > > intel-ipmi-oem. > > > > To make it support general usage, several changes are submitted to > > gerrit: > > https://gerrit.openbmc-project.xyz/q/topic:%22dev-add-custom-sel%22+(status:open%20OR%20status:merged) > > To make it support general usage, I think we could have just moved the > rsyslog.conf to meta-phosphor and moved the handlers from intel-ipmi-oem > to phosphor-host-ipmid, correct? Instead, we invented something > completely new. My question is, why? There are several reasons: 1. The rsyslog way is undocumented and the handler is not in upstream OpenBMC. 2. The rsyslog way puts the SEL in a file and thus there are no DBus objects, which makes it harder to work with other services. 3. The above SEL file needs extra configs to preserve, otherwise it's lost after BMC reboot. (In upstream, the /var/log points to /var/volatile/log, Intel's fork makes it persistent) > > Whatever the reason, OpenBMC now has two ways of doing SELs, and I am > tasked with picking one. Knowing what was wrong with the rsyslog > approach that led you to invent something new to replace it would help > me, (and others I suspect), make that decision. Indeed, but the rsyslog way is not really (and fully) upstream. With `SEL.errors.yaml` interface and the SEL_LOGGER_SEND_TO_LOGGING_SERVICE option, in phosphor-sel-logger, we have the ability to add logging entries to phosphor-logging, and let ipmid to parse it as SEL. There is one drawback though, the logging entries that are not "SEL.error" could not be parsed as normal SEL, the current behavior of ipmid parses it as an "undetermined system error". -- BRs, Lei YU ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-19 2:05 ` Lei Yu @ 2022-10-19 14:43 ` Brad Bishop 2022-10-19 15:50 ` Bills, Jason M 0 siblings, 1 reply; 19+ messages in thread From: Brad Bishop @ 2022-10-19 14:43 UTC (permalink / raw) To: Lei Yu, Jason Bills; +Cc: openbmc Thanks for the reply Lei Yu. On Wed, 2022-10-19 at 10:05 +0800, Lei Yu wrote: > > 2. The rsyslog way puts the SEL in a file and thus there are no DBus > objects, which makes it harder to work with other services. Are there other services that work with IPMI sels? I know there is a Redfish SEL log. Anything else? > Indeed, but the rsyslog way is not really (and fully) upstream. I'm trying to determine which implementation is a better fit for me based on the technical merits of the solution, not based on what repositories the source code is in. If that ends up being the rsyslog approach, I'd consider helping to move the code and make it fully upstream. In the hopes that it generates additional information about the motivations behind the differing implementations, allow me to ask a somewhat rhetorical question. Jason, to avoid confusing OpenBMC users by having to select from two different SEL implementations with pros and cons of each that are not obvious, would you accept patches that remove the rsyslog based implementation from intel-ipmi-oem (provided the Intel metadata is also updated to use the alternative)? If not, why not? Thanks, brad ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-19 14:43 ` Brad Bishop @ 2022-10-19 15:50 ` Bills, Jason M 2022-10-19 17:10 ` Brad Bishop ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Bills, Jason M @ 2022-10-19 15:50 UTC (permalink / raw) To: openbmc On 10/19/2022 8:43 AM, Brad Bishop wrote: > Thanks for the reply Lei Yu. > > On Wed, 2022-10-19 at 10:05 +0800, Lei Yu wrote: >> >> 2. The rsyslog way puts the SEL in a file and thus there are no DBus >> objects, which makes it harder to work with other services. > > Are there other services that work with IPMI sels? I know there is a > Redfish SEL log. Anything else? bmcweb has a build flag to choose between D-Bus- or journal-based logging. > >> Indeed, but the rsyslog way is not really (and fully) upstream. > > I'm trying to determine which implementation is a better fit for me > based on the technical merits of the solution, not based on what > repositories the source code is in. If that ends up being the rsyslog > approach, I'd consider helping to move the code and make it fully > upstream. > > In the hopes that it generates additional information about the > motivations behind the differing implementations, allow me to ask a > somewhat rhetorical question. Jason, to avoid confusing OpenBMC users > by having to select from two different SEL implementations with pros and > cons of each that are not obvious, would you accept patches that remove > the rsyslog based implementation from intel-ipmi-oem (provided the Intel > metadata is also updated to use the alternative)? If not, why not? Intel had a requirement to support storing at least 4000 log entries. At the time, we were able to get about 400 entries on D-Bus before D-Bus performance became unusable. That was before dbus-broker, so it could perhaps be better today. But I'm guessing there is still a performance impact and arbitrary log limit placed on a system by storing the logs on D-Bus. This log limit is what will make D-Bus log storage a non-starter for Intel. I'd also be curious about the reverse question. Is there any benefit to storing logs on D-Bus that makes it a better solution? At the risk of complicating things more (https://xkcd.com/927/), D-Bus was the primary solution when Intel joined. We created the rsyslog approach because of the limitation imposed by D-Bus. But I know there are still those who don't like the rsyslog approach. Is there a way we can now get together and define a new logging solution that is fully upstream and avoids the drawbacks of both existing solutions? > > Thanks, > brad ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-19 15:50 ` Bills, Jason M @ 2022-10-19 17:10 ` Brad Bishop 2022-10-19 18:05 ` Bills, Jason M 2022-10-21 20:14 ` Patrick Williams 2022-10-21 20:34 ` Patrick Williams 2 siblings, 1 reply; 19+ messages in thread From: Brad Bishop @ 2022-10-19 17:10 UTC (permalink / raw) To: Bills, Jason M; +Cc: openbmc Thanks Jason On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote: >Intel had a requirement to support storing at least 4000 log entries. Ok. So is it fair to assume anyone using the DBus backend does not have this requirement? >At the time, we were able to get about 400 entries on D-Bus before >D-Bus performance became unusable. To anyone using the DBus backend - have you observed similar performance issues? Jason is there a testcase or scenario I can execute to highlighht the issues you refer to concretely? Maybe something like "create 4000 sels, run ipmitool and see how long it takes?" >I'd also be curious about the reverse question. Is there any benefit >to storing logs on D-Bus that makes it a better solution? Yes, this is exactly the question I've been trying to ask. The answer seems only to be that the code is in meta-intel/intel-ipmi-oem - but that is easily fixed by moving the code to meta-phosphor/phosphor-host-ipmid. >At the risk of complicating things more (https://xkcd.com/927/), D-Bus >was the primary solution when Intel joined. We created the rsyslog >approach because of the limitation imposed by D-Bus. But I know there >are still those who don't like the rsyslog approach. Is there a way >we can now get together and define a new logging solution that is >fully upstream and avoids the drawbacks of both existing solutions? I hope so, because doing that would make things a lot easier for our users adopting OpenBMC. Thanks, brad ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-19 17:10 ` Brad Bishop @ 2022-10-19 18:05 ` Bills, Jason M 2022-10-19 20:20 ` Brad Bishop ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Bills, Jason M @ 2022-10-19 18:05 UTC (permalink / raw) To: openbmc On 10/19/2022 11:10 AM, Brad Bishop wrote: > Thanks Jason > > On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote: > >> Intel had a requirement to support storing at least 4000 log entries. > > Ok. So is it fair to assume anyone using the DBus backend does not have > this requirement? That is my assumption, yes. > >> At the time, we were able to get about 400 entries on D-Bus before >> D-Bus performance became unusable. > > To anyone using the DBus backend - have you observed similar performance > issues? > > Jason is there a testcase or scenario I can execute to highlighht the > issues you refer to concretely? Maybe something like "create 4000 sels, > run ipmitool and see how long it takes?" To clarify, my understanding is the D-Bus performance issues were not isolated to just IPMI. All of D-Bus for every BMC service was impacted. If I remember correctly, Ed Tanous is who did the initial evaluation, so he may have more detail. But I think it was similar to what you suggest: Create 4000 logs on D-Bus and check the performance. This could be done with ipmitool. > >> I'd also be curious about the reverse question. Is there any benefit >> to storing logs on D-Bus that makes it a better solution? > > Yes, this is exactly the question I've been trying to ask. The answer > seems only to be that the code is in meta-intel/intel-ipmi-oem - but > that is easily fixed by moving the code to > meta-phosphor/phosphor-host-ipmid. > >> At the risk of complicating things more (https://xkcd.com/927/), D-Bus >> was the primary solution when Intel joined. We created the rsyslog >> approach because of the limitation imposed by D-Bus. But I know there >> are still those who don't like the rsyslog approach. Is there a way >> we can now get together and define a new logging solution that is >> fully upstream and avoids the drawbacks of both existing solutions? > > I hope so, because doing that would make things a lot easier for our > users adopting OpenBMC. My main requirements are to store many logs (at least 4000 was the original number, but I can try to get an updated number if needed) and have them persist across BMC reboots. We currently accomplish this using rsyslog to extract logs from the journal and store them in a persistent text file. How is best to approach starting a new design discussion? Should we continue discussing in this thread? Start a design doc review? Something else? > > Thanks, > brad ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-19 18:05 ` Bills, Jason M @ 2022-10-19 20:20 ` Brad Bishop 2022-10-20 13:24 ` Lei Yu 2022-10-24 17:59 ` Ed Tanous 2 siblings, 0 replies; 19+ messages in thread From: Brad Bishop @ 2022-10-19 20:20 UTC (permalink / raw) To: Bills, Jason M; +Cc: openbmc On Wed, Oct 19, 2022 at 12:05:03PM -0600, Bills, Jason M wrote: >If I remember correctly, Ed Tanous is who did the initial evaluation, >so he may have more detail. But I think it was similar to what you >suggest: Create 4000 logs on D-Bus and check the performance. Ok. It would probably help if performance could be qualified more. What was checked to determine that performace had been impacted? http/ipmitool response latency? >My main requirements are to store many logs (at least 4000 was the >original number, but I can try to get an updated number if needed) and >have them persist across BMC reboots. These requirements are obviously quite reasonable, but they are already satisfied by both implementations. So the requirements that are not satisfied by the DBus implementation would be helpful to the discussion I think. I'm assuming those would be certain reponse time targets for different types of queries, and possibly some other general performance metrics around resource utilization? >How is best to approach starting a new design discussion? Should we For those using the rsyslog backend, I think sharing (performance) requirements would be a good start, as that seems to be the major differentiation feature. For those using the DBus backend, it would be helpful to know if there are any technical reasons why they have not moved to the more performant solution (assuming it actually is). >continue discussing in this thread? Start a design doc review? >Something else? I'm happy to keep the discussion going here... Thanks, brad ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-19 18:05 ` Bills, Jason M 2022-10-19 20:20 ` Brad Bishop @ 2022-10-20 13:24 ` Lei Yu 2022-10-20 14:39 ` Deng Tyler 2022-10-24 17:59 ` Ed Tanous 2 siblings, 1 reply; 19+ messages in thread From: Lei Yu @ 2022-10-20 13:24 UTC (permalink / raw) To: Brad Bishop; +Cc: openbmc, Bills, Jason M On Thu, Oct 20, 2022 at 2:05 AM Bills, Jason M <jason.m.bills@linux.intel.com> wrote: > > > > On 10/19/2022 11:10 AM, Brad Bishop wrote: > > Thanks Jason > > > > On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote: > > > >> Intel had a requirement to support storing at least 4000 log entries. > > Bytedance has a requirement of 1000 log entries. > > Ok. So is it fair to assume anyone using the DBus backend does not have > > this requirement? > > That is my assumption, yes. > > > >> At the time, we were able to get about 400 entries on D-Bus before > >> D-Bus performance became unusable. > > > > To anyone using the DBus backend - have you observed similar performance > > issues? > > We did hit the performance issue, specifically, it is extremely slow during BMC boot, when log-manager restore the log entries and put them on DBus. That's when I start the discussion about https://gerrit.openbmc.org/c/openbmc/phosphor-logging/+/52445 and https://lore.kernel.org/openbmc/CAGm54UHU9s0bTq-AR9tJunoX2Wa9tQ0PH_zWJ2QrYdR3SRqcvg@mail.gmail.com/ Later we resolved the issue by: * Applying the patch https://gerrit.openbmc.org/c/openbmc/phosphor-objmgr/+/53904 * Implement the SEL cache in ipmid that is already upstreamed * Improve the SEL cache by serialization (not upstreamed) Eventually we get fair performance on SEL handling (with 1000 entries), it should handle 4000 as well. > > Jason is there a testcase or scenario I can execute to highlighht the > > issues you refer to concretely? Maybe something like "create 4000 sels, > > run ipmitool and see how long it takes?" > > To clarify, my understanding is the D-Bus performance issues were not > isolated to just IPMI. All of D-Bus for every BMC service was impacted. > > If I remember correctly, Ed Tanous is who did the initial evaluation, so > he may have more detail. But I think it was similar to what you > suggest: Create 4000 logs on D-Bus and check the performance. This > could be done with ipmitool. > > > >> I'd also be curious about the reverse question. Is there any benefit > >> to storing logs on D-Bus that makes it a better solution? > > > > Yes, this is exactly the question I've been trying to ask. The answer > > seems only to be that the code is in meta-intel/intel-ipmi-oem - but > > that is easily fixed by moving the code to > > meta-phosphor/phosphor-host-ipmid. > > > >> At the risk of complicating things more (https://xkcd.com/927/), D-Bus > >> was the primary solution when Intel joined. We created the rsyslog > >> approach because of the limitation imposed by D-Bus. But I know there > >> are still those who don't like the rsyslog approach. Is there a way > >> we can now get together and define a new logging solution that is > >> fully upstream and avoids the drawbacks of both existing solutions? > > > > I hope so, because doing that would make things a lot easier for our > > users adopting OpenBMC. > > My main requirements are to store many logs (at least 4000 was the > original number, but I can try to get an updated number if needed) and > have them persist across BMC reboots. > > We currently accomplish this using rsyslog to extract logs from the > journal and store them in a persistent text file. > > How is best to approach starting a new design discussion? Should we > continue discussing in this thread? Start a design doc review? > Something else? > > > > Thanks, > > brad I would like to add several notes (possibly limitations) about rsyslog's SEL in intel-ipmi-oem, please correct if I was wrong. * It handles the SELs from phosphor-sel-logger, mostly it only contains the threshold events. * It iterates the sel files, and convert the file content into SEL data every time on a request, which does not seem optimal * The "add sel entry" does not really add a sel log, it adds an event entry to Redfish instead. * With above behavior, it basically has two separate types of logs, SEL logs that are from rsyslog, and redfish event logs that are done by "add sel entry". Thus the implementation seems to only support SELs for sensor threshold events, but not for discrete sensors. In bytedance we need a "full" SEL feature that supports both thresholds and discrete sensors. The whole solution is based on the DBus logging, but it involves different repos (ipmid, phosphor-logging, fault-monitor). Part of the implementation is upstreamed but some are internal for now. I would like to share the details when I have bandwidth :) -- BRs, Lei YU ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-20 13:24 ` Lei Yu @ 2022-10-20 14:39 ` Deng Tyler 0 siblings, 0 replies; 19+ messages in thread From: Deng Tyler @ 2022-10-20 14:39 UTC (permalink / raw) To: Lei Yu; +Cc: openbmc, Brad Bishop, Bills, Jason M [-- Attachment #1: Type: text/plain, Size: 5153 bytes --] Hi Lei: I encounter a SEL catch sync issue. If a SEL generated while ipmd collecting all log enttry from file in SEL cache initialized progress then the SEL wouldn't be exist in SEL cache. Do you ever encounter this issue? Tyler Lei Yu <yulei.sh@bytedance.com> 於 2022年10月20日 週四 晚上9:26寫道: > On Thu, Oct 20, 2022 at 2:05 AM Bills, Jason M > <jason.m.bills@linux.intel.com> wrote: > > > > > > > > On 10/19/2022 11:10 AM, Brad Bishop wrote: > > > Thanks Jason > > > > > > On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote: > > > > > >> Intel had a requirement to support storing at least 4000 log entries. > > > > > Bytedance has a requirement of 1000 log entries. > > > > Ok. So is it fair to assume anyone using the DBus backend does not > have > > > this requirement? > > > > That is my assumption, yes. > > > > > >> At the time, we were able to get about 400 entries on D-Bus before > > >> D-Bus performance became unusable. > > > > > > To anyone using the DBus backend - have you observed similar > performance > > > issues? > > > > > We did hit the performance issue, specifically, it is extremely slow > during BMC boot, when log-manager restore the log entries and put them > on DBus. > That's when I start the discussion about > https://gerrit.openbmc.org/c/openbmc/phosphor-logging/+/52445 and > > https://lore.kernel.org/openbmc/CAGm54UHU9s0bTq-AR9tJunoX2Wa9tQ0PH_zWJ2QrYdR3SRqcvg@mail.gmail.com/ > > Later we resolved the issue by: > * Applying the patch > https://gerrit.openbmc.org/c/openbmc/phosphor-objmgr/+/53904 > * Implement the SEL cache in ipmid that is already upstreamed > * Improve the SEL cache by serialization (not upstreamed) > > Eventually we get fair performance on SEL handling (with 1000 > entries), it should handle 4000 as well. > > > > Jason is there a testcase or scenario I can execute to highlighht the > > > issues you refer to concretely? Maybe something like "create 4000 > sels, > > > run ipmitool and see how long it takes?" > > > > To clarify, my understanding is the D-Bus performance issues were not > > isolated to just IPMI. All of D-Bus for every BMC service was impacted. > > > > If I remember correctly, Ed Tanous is who did the initial evaluation, so > > he may have more detail. But I think it was similar to what you > > suggest: Create 4000 logs on D-Bus and check the performance. This > > could be done with ipmitool. > > > > > >> I'd also be curious about the reverse question. Is there any benefit > > >> to storing logs on D-Bus that makes it a better solution? > > > > > > Yes, this is exactly the question I've been trying to ask. The answer > > > seems only to be that the code is in meta-intel/intel-ipmi-oem - but > > > that is easily fixed by moving the code to > > > meta-phosphor/phosphor-host-ipmid. > > > > > >> At the risk of complicating things more (https://xkcd.com/927/), > D-Bus > > >> was the primary solution when Intel joined. We created the rsyslog > > >> approach because of the limitation imposed by D-Bus. But I know there > > >> are still those who don't like the rsyslog approach. Is there a way > > >> we can now get together and define a new logging solution that is > > >> fully upstream and avoids the drawbacks of both existing solutions? > > > > > > I hope so, because doing that would make things a lot easier for our > > > users adopting OpenBMC. > > > > My main requirements are to store many logs (at least 4000 was the > > original number, but I can try to get an updated number if needed) and > > have them persist across BMC reboots. > > > > We currently accomplish this using rsyslog to extract logs from the > > journal and store them in a persistent text file. > > > > How is best to approach starting a new design discussion? Should we > > continue discussing in this thread? Start a design doc review? > > Something else? > > > > > > Thanks, > > > brad > > I would like to add several notes (possibly limitations) about > rsyslog's SEL in intel-ipmi-oem, please correct if I was wrong. > * It handles the SELs from phosphor-sel-logger, mostly it only > contains the threshold events. > * It iterates the sel files, and convert the file content into SEL > data every time on a request, which does not seem optimal > * The "add sel entry" does not really add a sel log, it adds an event > entry to Redfish instead. > * With above behavior, it basically has two separate types of logs, > SEL logs that are from rsyslog, and redfish event logs that are done > by "add sel entry". Thus the implementation seems to only support SELs > for sensor threshold events, but not for discrete sensors. > > In bytedance we need a "full" SEL feature that supports both > thresholds and discrete sensors. > The whole solution is based on the DBus logging, but it involves > different repos (ipmid, phosphor-logging, fault-monitor). Part of the > implementation is upstreamed but some are internal for now. > I would like to share the details when I have bandwidth :) > > -- > BRs, > Lei YU > [-- Attachment #2: Type: text/html, Size: 6598 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-19 18:05 ` Bills, Jason M 2022-10-19 20:20 ` Brad Bishop 2022-10-20 13:24 ` Lei Yu @ 2022-10-24 17:59 ` Ed Tanous 2022-10-24 19:03 ` Brad Bishop 2 siblings, 1 reply; 19+ messages in thread From: Ed Tanous @ 2022-10-24 17:59 UTC (permalink / raw) To: Bills, Jason M; +Cc: openbmc On Wed, Oct 19, 2022 at 11:05 AM Bills, Jason M <jason.m.bills@linux.intel.com> wrote: > > > > On 10/19/2022 11:10 AM, Brad Bishop wrote: > > Thanks Jason > > > > On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote: > > > >> Intel had a requirement to support storing at least 4000 log entries. > > > > Ok. So is it fair to assume anyone using the DBus backend does not have > > this requirement? > > That is my assumption, yes. > > > >> At the time, we were able to get about 400 entries on D-Bus before > >> D-Bus performance became unusable. > > > > To anyone using the DBus backend - have you observed similar performance > > issues? > > > > Jason is there a testcase or scenario I can execute to highlighht the > > issues you refer to concretely? Maybe something like "create 4000 sels, > > run ipmitool and see how long it takes?" > > To clarify, my understanding is the D-Bus performance issues were not > isolated to just IPMI. All of D-Bus for every BMC service was impacted. > > If I remember correctly, Ed Tanous is who did the initial evaluation, so > he may have more detail. But I think it was similar to what you > suggest: Create 4000 logs on D-Bus and check the performance. This > could be done with ipmitool. From what I recall, the requirements were: - Ability to store 4000 logs in a rotating buffer (original desire was 10,000, but 4000 was picked as a middle-ground that could be implemented). - Ability to log 100+ entries per second, including when buffers get overwritten. - (abstract) Log storage should be aware of hardware limitations (SPI flash cell write endurance) and allow writing N logs per minute for the lifetime of the machine without hardware failure. (I forget the value of N). - "ipmitool sensor list" should return the results from a full sel log in less than 1 second (renegotiated from 200ms, the faster the better). - The logging implementation should be able to support a well-formed, version controlled, Redfish MessageRegistry to the DMTF specifications. - The logging implementation should be able to implement a well-formed, stable, and ACID compliant IPMI SEL command implementation. I don't believe the current DBus implementation can meet the previous requirements, but if it's capable of that these days, it would be excellent to consolidate. > > > >> I'd also be curious about the reverse question. Is there any benefit > >> to storing logs on D-Bus that makes it a better solution? > > > > Yes, this is exactly the question I've been trying to ask. The answer > > seems only to be that the code is in meta-intel/intel-ipmi-oem - but > > that is easily fixed by moving the code to > > meta-phosphor/phosphor-host-ipmid. > > > >> At the risk of complicating things more (https://xkcd.com/927/), D-Bus > >> was the primary solution when Intel joined. We created the rsyslog > >> approach because of the limitation imposed by D-Bus. But I know there > >> are still those who don't like the rsyslog approach. Is there a way > >> we can now get together and define a new logging solution that is > >> fully upstream and avoids the drawbacks of both existing solutions? > > > > I hope so, because doing that would make things a lot easier for our > > users adopting OpenBMC. > > My main requirements are to store many logs (at least 4000 was the > original number, but I can try to get an updated number if needed) and > have them persist across BMC reboots. > > We currently accomplish this using rsyslog to extract logs from the > journal and store them in a persistent text file. > > How is best to approach starting a new design discussion? Should we > continue discussing in this thread? Start a design doc review? > Something else? > > > > Thanks, > > brad ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-24 17:59 ` Ed Tanous @ 2022-10-24 19:03 ` Brad Bishop 2022-10-24 20:19 ` Ed Tanous 2022-10-24 22:56 ` Vernon Mauery 0 siblings, 2 replies; 19+ messages in thread From: Brad Bishop @ 2022-10-24 19:03 UTC (permalink / raw) To: Ed Tanous; +Cc: Bills, Jason M, openbmc This is helpful, thanks Ed. On Mon, Oct 24, 2022 at 10:59:51AM -0700, Ed Tanous wrote: >From what I recall, the requirements were: >- Ability to store 4000 logs in a rotating buffer (original desire was >10,000, but 4000 was picked as a middle-ground that could be >implemented). A DBus object based implementation could meet this requirement, right? >- Ability to log 100+ entries per second, including when buffers get >overwritten. I guess I would not be shocked if DBus objects + serialization might not be able to sustain this rate of incoming logs. Maybe it depends on the filesystem or how the data is serialized in the filesystem. The DBus approach creates many files. Obviously the syslog approach is only using a couple of files. Do you think this kind of requirement is typical? Quoting Patrick from another thread here: >don't think anyone is intending to create 10k events in the span of >a minute 100/s is only 6k in a minute but that is getting pretty close... >- (abstract) Log storage should be aware of hardware limitations (SPI >flash cell write endurance) and allow writing N logs per minute for >the lifetime of the machine without hardware failure. (I forget the >value of N). Do you think the rsyslog implementation does better at this? Why? >- "ipmitool sensor list" should return the results from a full sel log >in less than 1 second (renegotiated from 200ms, the faster the >better). Ok, again I would not be shocked if DBus objects weren't able to deliver on this. >- The logging implementation should be able to support a well-formed, >version controlled, Redfish MessageRegistry to the DMTF >specifications. Do you think a DBus object based implementation could meet this requirement? >- The logging implementation should be able to implement a >well-formed, stable, and ACID compliant IPMI SEL command >implementation. Do you think a DBus object based implementation could meet this requirement? > >I don't believe the current DBus implementation can meet the previous >requirements, The motivation of my questions above is to understand which requirements cannot be met by something based on DBus objects. Thanks, brad ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-24 19:03 ` Brad Bishop @ 2022-10-24 20:19 ` Ed Tanous 2022-10-25 20:18 ` Bills, Jason M 2022-10-24 22:56 ` Vernon Mauery 1 sibling, 1 reply; 19+ messages in thread From: Ed Tanous @ 2022-10-24 20:19 UTC (permalink / raw) To: Brad Bishop; +Cc: Bills, Jason M, openbmc On Mon, Oct 24, 2022 at 12:03 PM Brad Bishop <bradleyb@fuzziesquirrel.com> wrote: > > This is helpful, thanks Ed. > > On Mon, Oct 24, 2022 at 10:59:51AM -0700, Ed Tanous wrote: > >From what I recall, the requirements were: > >- Ability to store 4000 logs in a rotating buffer (original desire was > >10,000, but 4000 was picked as a middle-ground that could be > >implemented). > > A DBus object based implementation could meet this requirement, right? > > >- Ability to log 100+ entries per second, including when buffers get > >overwritten. > > I guess I would not be shocked if DBus objects + serialization might not > be able to sustain this rate of incoming logs. Maybe it depends on the > filesystem or how the data is serialized in the filesystem. The DBus > approach creates many files. Obviously the syslog approach is only > using a couple of files. > > Do you think this kind of requirement is typical? Quoting Patrick from > another thread here: There are cases where every sensor will cross a threshold at the same moment (generally because a threshold was changed). Is this a "typical" use case, probably not, but if it's trivial to DOS the BMC with events with a couple commands, it's not great. > > >don't think anyone is intending to create 10k events in the span of > >a minute Intending to, no. Created in an error case, or in the case of a failing piece of hardware, it can happen. > > 100/s is only 6k in a minute but that is getting pretty close... > > >- (abstract) Log storage should be aware of hardware limitations (SPI > >flash cell write endurance) and allow writing N logs per minute for > >the lifetime of the machine without hardware failure. (I forget the > >value of N). > > Do you think the rsyslog implementation does better at this? Why? @Jason? I know there were a lot of worries about write expansion (N byte sel log turning into 100XN bytes of JSON) wearing out flash faster, and having some interactions with jffs2. > > >- "ipmitool sensor list" should return the results from a full sel log > >in less than 1 second (renegotiated from 200ms, the faster the > >better). > > Ok, again I would not be shocked if DBus objects weren't able to deliver > on this. > > >- The logging implementation should be able to support a well-formed, > >version controlled, Redfish MessageRegistry to the DMTF > >specifications. > > Do you think a DBus object based implementation could meet this > requirement? Today, as defined, without duplicating the logging strings, no, but DBus is just an IPC; I'm fairly certain we could define a DBus based logging mechanism that met these requirements, but the key is that the Redfish instance (bmcweb in this case) needs a-priori knowledge of every possible log that the system can publish, and version them with DMTF-appropriate semver (existing messages changes rev the patch version, new messages revs the subminor version). The existing rsyslog implementation places the log strings into bmcweb, so the only thing being transferred over the IPC is the MessageId and the specific-instance arguments, which in theory, increases the performance quite a bit, and avoids duplicating the strings in two places. > > >- The logging implementation should be able to implement a > >well-formed, stable, and ACID compliant IPMI SEL command > >implementation. > > Do you think a DBus object based implementation could meet this > requirement? Same answer, with today's code, no. DBus is just an IPC, we could potentially define one. > > > > >I don't believe the current DBus implementation can meet the previous > >requirements, > > The motivation of my questions above is to understand which requirements > cannot be met by something based on DBus objects. Cool. > > Thanks, > brad ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-24 20:19 ` Ed Tanous @ 2022-10-25 20:18 ` Bills, Jason M 0 siblings, 0 replies; 19+ messages in thread From: Bills, Jason M @ 2022-10-25 20:18 UTC (permalink / raw) To: openbmc On 10/24/2022 2:19 PM, Ed Tanous wrote: > On Mon, Oct 24, 2022 at 12:03 PM Brad Bishop > <bradleyb@fuzziesquirrel.com> wrote: >> >> This is helpful, thanks Ed. >> >> On Mon, Oct 24, 2022 at 10:59:51AM -0700, Ed Tanous wrote: >> >From what I recall, the requirements were: >>> - Ability to store 4000 logs in a rotating buffer (original desire was >>> 10,000, but 4000 was picked as a middle-ground that could be >>> implemented). >> >> A DBus object based implementation could meet this requirement, right? >> >>> - Ability to log 100+ entries per second, including when buffers get >>> overwritten. >> >> I guess I would not be shocked if DBus objects + serialization might not >> be able to sustain this rate of incoming logs. Maybe it depends on the >> filesystem or how the data is serialized in the filesystem. The DBus >> approach creates many files. Obviously the syslog approach is only >> using a couple of files. >> >> Do you think this kind of requirement is typical? Quoting Patrick from >> another thread here: > > There are cases where every sensor will cross a threshold at the same > moment (generally because a threshold was changed). Is this a > "typical" use case, probably not, but if it's trivial to DOS the BMC > with events with a couple commands, it's not great. > >> >>> don't think anyone is intending to create 10k events in the span of >>> a minute > > Intending to, no. Created in an error case, or in the case of a > failing piece of hardware, it can happen. > >> >> 100/s is only 6k in a minute but that is getting pretty close... >> >>> - (abstract) Log storage should be aware of hardware limitations (SPI >>> flash cell write endurance) and allow writing N logs per minute for >>> the lifetime of the machine without hardware failure. (I forget the >>> value of N). >> >> Do you think the rsyslog implementation does better at this? Why? > > @Jason? I know there were a lot of worries about write expansion (N > byte sel log turning into 100XN bytes of JSON) wearing out flash > faster, and having some interactions with jffs2. > Yes. We ran into this issue both with trying to use individual files per SEL entry and with persisting the journal. At the time, we only had 4MB of SPI flash space for persistent storage, and both of those methods quickly filled that up. We now have 8MB, but only a small part of that can be dedicated to persisting logs, so we still have a requirement that persisting many logs should not consume too much persistent storage space. >> >>> - "ipmitool sensor list" should return the results from a full sel log >>> in less than 1 second (renegotiated from 200ms, the faster the >>> better). >> >> Ok, again I would not be shocked if DBus objects weren't able to deliver >> on this. >> >>> - The logging implementation should be able to support a well-formed, >>> version controlled, Redfish MessageRegistry to the DMTF >>> specifications. >> >> Do you think a DBus object based implementation could meet this >> requirement? > > Today, as defined, without duplicating the logging strings, no, but > DBus is just an IPC; I'm fairly certain we could define a DBus based > logging mechanism that met these requirements, but the key is that the > Redfish instance (bmcweb in this case) needs a-priori knowledge of > every possible log that the system can publish, and version them with > DMTF-appropriate semver (existing messages changes rev the patch > version, new messages revs the subminor version). The existing > rsyslog implementation places the log strings into bmcweb, so the only > thing being transferred over the IPC is the MessageId and the > specific-instance arguments, which in theory, increases the > performance quite a bit, and avoids duplicating the strings in two > places. > >> >>> - The logging implementation should be able to implement a >>> well-formed, stable, and ACID compliant IPMI SEL command >>> implementation. >> >> Do you think a DBus object based implementation could meet this >> requirement? > > Same answer, with today's code, no. DBus is just an IPC, we could > potentially define one. > >> >>> >>> I don't believe the current DBus implementation can meet the previous >>> requirements, >> >> The motivation of my questions above is to understand which requirements >> cannot be met by something based on DBus objects. > > Cool. > > >> >> Thanks, >> brad ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-24 19:03 ` Brad Bishop 2022-10-24 20:19 ` Ed Tanous @ 2022-10-24 22:56 ` Vernon Mauery 1 sibling, 0 replies; 19+ messages in thread From: Vernon Mauery @ 2022-10-24 22:56 UTC (permalink / raw) To: Brad Bishop; +Cc: Ed Tanous, Bills, Jason M, openbmc On 24-Oct-2022 03:03 PM, Brad Bishop wrote: >This is helpful, thanks Ed. > >On Mon, Oct 24, 2022 at 10:59:51AM -0700, Ed Tanous wrote: >>From what I recall, the requirements were: >>- Ability to store 4000 logs in a rotating buffer (original desire was >>10,000, but 4000 was picked as a middle-ground that could be >>implemented). > >A DBus object based implementation could meet this requirement, right? > >>- Ability to log 100+ entries per second, including when buffers get >>overwritten. > >I guess I would not be shocked if DBus objects + serialization might >not be able to sustain this rate of incoming logs. Maybe it depends >on the filesystem or how the data is serialized in the filesystem. >The DBus approach creates many files. Obviously the syslog approach >is only using a couple of files. > >Do you think this kind of requirement is typical? Quoting Patrick >from another thread here: > >>don't think anyone is intending to create 10k events in the span of >>a minute > >100/s is only 6k in a minute but that is getting pretty close... I think 100/s is a pretty high rate. Actually creating events at that rate seems like a signal that the system is in a bit of distress. Really, I think the gating factor is how fast they get processed at boot time. If loading the events onto D-Bus as objects at boot time is the same cost as creating them to start with, 100/s is insufficient for a large number of log entries. >>- (abstract) Log storage should be aware of hardware limitations (SPI >>flash cell write endurance) and allow writing N logs per minute for >>the lifetime of the machine without hardware failure. (I forget the >>value of N). > >Do you think the rsyslog implementation does better at this? Why? One file with jffs2 uses fewer bytes than many files. When we were trying to figure out how to do a file-based implementation to start with, we tried individual files and found we ran out of space quickly even though the total size of the files was not that great. We even tried using empty files, encoding the data into the filenames themselves in hopes to only use inodes but that seemed to suffer the same issue. A single file that held all the entries was fast, small, and easy to handle. >>- "ipmitool sensor list" should return the results from a full sel log >>in less than 1 second (renegotiated from 200ms, the faster the >>better). > >Ok, again I would not be shocked if DBus objects weren't able to >deliver on this. I imagine dbus-broker could handle it, especially on a dual-core system. I am not sure the systemd dbus daemon was able to handle it on the ast2500. >>- The logging implementation should be able to support a well-formed, >>version controlled, Redfish MessageRegistry to the DMTF >>specifications. > >Do you think a DBus object based implementation could meet this >requirement? > >>- The logging implementation should be able to implement a >>well-formed, stable, and ACID compliant IPMI SEL command >>implementation. > >Do you think a DBus object based implementation could meet this >requirement? > >> >>I don't believe the current DBus implementation can meet the previous >>requirements, > >The motivation of my questions above is to understand which >requirements cannot be met by something based on DBus objects. I think it is possible to come up with a dbus interface that supports the various requirements in this thread. I don't believe that the current implementation is sufficient though. --Vernon ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-19 15:50 ` Bills, Jason M 2022-10-19 17:10 ` Brad Bishop @ 2022-10-21 20:14 ` Patrick Williams 2022-10-24 22:44 ` Vernon Mauery 2022-10-21 20:34 ` Patrick Williams 2 siblings, 1 reply; 19+ messages in thread From: Patrick Williams @ 2022-10-21 20:14 UTC (permalink / raw) To: Bills, Jason M; +Cc: openbmc [-- Attachment #1: Type: text/plain, Size: 3377 bytes --] On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote: > Intel had a requirement to support storing at least 4000 log entries. > At the time, we were able to get about 400 entries on D-Bus before D-Bus > performance became unusable. > > That was before dbus-broker, so it could perhaps be better today. I was surprised that there would be much performance impact to dbus as a whole because there should not be any impact to the bus because one process decided to host a bunch of objects. I can understand _that_ process becoming slower if there are algorithmic problems with it. I did an experiment on an AST2600 system where I modified phosphor-logging to support 20k log entries and then created 10k of them. ``` $ cat meta-facebook/recipes-phosphor/logging/phosphor-logging_%.bbappend FILESEXTRAPATHS:prepend := "${THISDIR}/${PN}:" EXTRA_OEMESON += "-Derror_cap=20000" ``` What I've found can be summarized as follows: - Overall there is no impact to the dbus by creating a large number of log entries. Interactions with other daemons happen just fine with no performance impact. - Creating 10k log entries does take a while. Most of this time is observed (via top) in jffs2 but there is also some peaky objmgr activity. There might be some improvements to be had there, but I don't think anyone is intending to create 10k events in the span of a minute. - Dumping all the events from phosphor-logging is slow when there are 10k of them. It took 8-11s. I didn't have `strace` installed, but it seemed like much of this was coming from `busctl` processing the result and not from the bus transfer itself, but more investigation would be required. - Deleting all 10k of the events timed out at a dbus level, but still succeeded. Almost all of the time was spent in jffs2. I'm not suggesting there aren't optimizations we can do to phosphor-logging, but it doesn't seem like dbus itself is a direct concern. ``` # busctl call xyz.openbmc_project.Logging \ /xyz/openbmc_project/logging \ xyz.openbmc_project.Collection.DeleteAll \ DeleteAll # time busctl call xyz.openbmc_project.Logging \ /xyz/openbmc_project/logging \ org.freedesktop.DBus.ObjectManager \ GetManagedObjects > /dev/null real 0m0.017s user 0m0.015s sys 0m0.001s # time busctl tree xyz.openbmc_project.VirtualSensor > /dev/null real 0m0.025s user 0m0.019s sys 0m0.001s ### Create 10000 log entries. # for i in $(seq 0 10000) ; do busctl call \ xyz.openbmc_project.Logging /xyz/openbmc_project/logging \ xyz.openbmc_project.Logging.Create Create ssa{ss} \ "Hello.$i" \ "xyz.openbmc_project.Logging.Entry.Level.Warning" 0 ; done # time busctl call xyz.openbmc_project.Logging \ /xyz/openbmc_project/logging \ org.freedesktop.DBus.ObjectManager \ GetManagedObjects > /dev/null real 0m11.722s user 0m9.130s sys 0m0.071s # time busctl tree xyz.openbmc_project.VirtualSensor > /dev/null real 0m0.024s user 0m0.019s sys 0m0.001s ``` -- Patrick Williams [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-21 20:14 ` Patrick Williams @ 2022-10-24 22:44 ` Vernon Mauery 0 siblings, 0 replies; 19+ messages in thread From: Vernon Mauery @ 2022-10-24 22:44 UTC (permalink / raw) To: Patrick Williams; +Cc: Bills, Jason M, openbmc On 21-Oct-2022 03:14 PM, Patrick Williams wrote: >On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote: > >> Intel had a requirement to support storing at least 4000 log entries. >> At the time, we were able to get about 400 entries on D-Bus before D-Bus >> performance became unusable. >> >> That was before dbus-broker, so it could perhaps be better today. > >I was surprised that there would be much performance impact to dbus as a >whole because there should not be any impact to the bus because one >process decided to host a bunch of objects. I can understand _that_ >process becoming slower if there are algorithmic problems with it. I suspect that it was a combination of the switch to dbus-broker in addition to rewriting the mapper, which made the number of SEL entries phosphor-logging is capable of handling go up from what it was before. >I did an experiment on an AST2600 system where I modified phosphor-logging >to support 20k log entries and then created 10k of them. > >``` >$ cat meta-facebook/recipes-phosphor/logging/phosphor-logging_%.bbappend >FILESEXTRAPATHS:prepend := "${THISDIR}/${PN}:" > >EXTRA_OEMESON += "-Derror_cap=20000" >``` > >What I've found can be summarized as follows: > > - Overall there is no impact to the dbus by creating a large number > of log entries. Interactions with other daemons happen just fine > with no performance impact. Right, I don't expect that once they are created that they would have a meaningful impact other than maybe some sort of memory footprint, though, that footprint is magnified from just phosphor-logging by the mapper. > - Creating 10k log entries does take a while. Most of this time is > observed (via top) in jffs2 but there is also some peaky objmgr > activity. There might be some improvements to be had there, but I > don't think anyone is intending to create 10k events in the span of > a minute. This is really my biggest concern at this point. The OpenBMC is already the slowest-to-boot BMC firmware that I have worked on in the past 10 years, and that is in the face of faster BMC processors and faster RAM. Delay the 'fully booted' state of the BMC for this will cause validation bugs because the BMC is changing behavior even though it should be at a stable state. > - Dumping all the events from phosphor-logging is slow when there are > 10k of them. It took 8-11s. I didn't have `strace` installed, but > it seemed like much of this was coming from `busctl` processing the > result and not from the bus transfer itself, but more investigation > would be required. I think more investigation needs to be done here. We should be limited by the network, not by accessing the items. > - Deleting all 10k of the events timed out at a dbus level, but still > succeeded. Almost all of the time was spent in jffs2. This would be a lot faster if all the items were in a single file, which is a change that could be made independently of whether or not the individual log entries are hosted as dbus objects. --Vernon ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-19 15:50 ` Bills, Jason M 2022-10-19 17:10 ` Brad Bishop 2022-10-21 20:14 ` Patrick Williams @ 2022-10-21 20:34 ` Patrick Williams 2022-10-25 20:37 ` Bills, Jason M 2 siblings, 1 reply; 19+ messages in thread From: Patrick Williams @ 2022-10-21 20:34 UTC (permalink / raw) To: Bills, Jason M; +Cc: openbmc [-- Attachment #1: Type: text/plain, Size: 2834 bytes --] On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote: > I'd also be curious about the reverse question. Is there any benefit to > storing logs on D-Bus that makes it a better solution? ... > Is there a way we can now get together and define a new logging solution > that is fully upstream and avoids the drawbacks of both existing solutions? First and foremost I'd like to see consistency come out of this. If there is another proposal for how to do it that we can all consolidate on (and people are willing to put in effort to get there) then I'm on-board. It seems to me like the lowest friction way to get there, with the best maintainability, is to use the phosphor-logging APIs even if we end up not putting them into d-bus entries. It happens that phosphor-logging stores the instances on d-bus, but the more important aspect to me is that we have a more consistent API for defining and creating errors and events. The "rsyslog-way" is that you make very specific journal entries that the rsyslog magic knows about, but there are a few issues with it: 1. We don't have any consistency in what, when, and how events are logged. We even have cases within the same repository (looking at dbus-sensors) where some of the implementations make the magic SEL records and others do not. Additionally, they're not required to be the same format. Some maintainers have even outright rejected patches with the "magic log statements". 2. There is no way to generate something like a Redfish message registry for the events, because they're just arbitrary strings that are sprinkled around. It isn't even easy to programatically search the code for them because there are 4 different approaches to that: cout/cerr, direct journald, phosphor-logging "v1", and phosphor-logging lg2. 3. Any kind of automation around these is more at the whim of whatever the developers / maintainers decide to change. It is, for example, really difficult for me to write data center tooling that reacts to events like "we just lost pgood to the host" because I have to read through the code to find the specific text and hope it never changes. Conversely, the phosphor-logging APIs leverage YAML-based error specifiers, which can be easily transposed into a Redfish message registry, and happen to also be the same structure we use for inter-process errors on d-bus calls. While I have to review the implementations to make sure they're appropriately created, I have far less concern about them disappearing or changing once they are in place (and I can review the changes to the YAML specifiers to keep tabs on what changes their might be). -- Patrick Williams [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Adding support for custom SEL records 2022-10-21 20:34 ` Patrick Williams @ 2022-10-25 20:37 ` Bills, Jason M 0 siblings, 0 replies; 19+ messages in thread From: Bills, Jason M @ 2022-10-25 20:37 UTC (permalink / raw) To: openbmc On 10/21/2022 2:34 PM, Patrick Williams wrote: > On Wed, Oct 19, 2022 at 09:50:47AM -0600, Bills, Jason M wrote: > >> I'd also be curious about the reverse question. Is there any benefit to >> storing logs on D-Bus that makes it a better solution? > ... >> Is there a way we can now get together and define a new logging solution >> that is fully upstream and avoids the drawbacks of both existing solutions? > > First and foremost I'd like to see consistency come out of this. If > there is another proposal for how to do it that we can all consolidate > on (and people are willing to put in effort to get there) then I'm > on-board. It seems to me like the lowest friction way to get there, with > the best maintainability, is to use the phosphor-logging APIs even if we > end up not putting them into d-bus entries. > I agree that phosphor-logging seems like the right place, so I think looking at that but changing the back-end storage away from D-Bus objects may be a good direction to consider. > It happens that phosphor-logging stores the instances on d-bus, but the > more important aspect to me is that we have a more consistent API for > defining and creating errors and events. The "rsyslog-way" is that you > make very specific journal entries that the rsyslog magic knows about, > but there are a few issues with it: > > 1. We don't have any consistency in what, when, and how events are > logged. We even have cases within the same repository (looking at > dbus-sensors) where some of the implementations make the magic > SEL records and others do not. Additionally, they're not required > to be the same format. Some maintainers have even outright > rejected patches with the "magic log statements". > Yes, this consistency would be good. I tried to add SEL logging into phosphor-logging, but the patch didn't make it through review: https://gerrit.openbmc.org/c/openbmc/phosphor-logging/+/13956. > 2. There is no way to generate something like a Redfish message > registry for the events, because they're just arbitrary strings > that are sprinkled around. It isn't even easy to programatically > search the code for them because there are 4 different approaches > to that: cout/cerr, direct journald, phosphor-logging "v1", and > phosphor-logging lg2. > I think Redfish is a more difficult case to handle, but if we can do it through the same or similar phosphor-logging API as IPMI, then I'm on board. As for searching, it's true that different methods are used to get the log into the journal, but the Redfish MessageId is consistent in all cases and can be programatically searched. > 3. Any kind of automation around these is more at the whim of > whatever the developers / maintainers decide to change. It is, > for example, really difficult for me to write data center tooling > that reacts to events like "we just lost pgood to the host" > because I have to read through the code to find the specific text > and hope it never changes. > Doesn't Redfish solve this issue by guaranteeing the same Message and MessageId are used for all the same events? > Conversely, the phosphor-logging APIs leverage YAML-based error specifiers, > which can be easily transposed into a Redfish message registry, and happen > to also be the same structure we use for inter-process errors on d-bus calls. > While I have to review the implementations to make sure they're > appropriately created, I have far less concern about them disappearing > or changing once they are in place (and I can review the changes to the YAML > specifiers to keep tabs on what changes their might be). > ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2022-10-25 20:38 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-12-07 7:35 Adding support for custom SEL records Lei Yu 2022-10-18 20:09 ` Brad Bishop 2022-10-19 2:05 ` Lei Yu 2022-10-19 14:43 ` Brad Bishop 2022-10-19 15:50 ` Bills, Jason M 2022-10-19 17:10 ` Brad Bishop 2022-10-19 18:05 ` Bills, Jason M 2022-10-19 20:20 ` Brad Bishop 2022-10-20 13:24 ` Lei Yu 2022-10-20 14:39 ` Deng Tyler 2022-10-24 17:59 ` Ed Tanous 2022-10-24 19:03 ` Brad Bishop 2022-10-24 20:19 ` Ed Tanous 2022-10-25 20:18 ` Bills, Jason M 2022-10-24 22:56 ` Vernon Mauery 2022-10-21 20:14 ` Patrick Williams 2022-10-24 22:44 ` Vernon Mauery 2022-10-21 20:34 ` Patrick Williams 2022-10-25 20:37 ` Bills, Jason M
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.