openbmc.lists.ozlabs.org archive mirror
* Thoughts on performance profiling and tools for OpenBMC
@ 2021-03-22 22:05 Sui Chen
  2021-03-23 15:00 ` Joseph Reynolds
  2021-03-25  0:28 ` Andrew Geissler
  0 siblings, 2 replies; 4+ messages in thread
From: Sui Chen @ 2021-03-22 22:05 UTC (permalink / raw)
  To: OpenBMC Maillist

Hello OpenBMC Mailing List,

This email discusses some thoughts and work in progress regarding
BMC performance. We are aware that performance has been brought up a
few times in the past, so this document covers and keeps track of some
recent work. The following is written according to the design doc
format, but it may still have some way to go before becoming a
concrete "set of benchmarks for OpenBMC". As such, any feedback is
appreciated. Thanks for reading!

[ Problem Description ]

Writing benchmarks and studying profiling results is not only good for
learning the basic APIs and constructs, but also sometimes useful for
debugging complicated interactions between multiple moving parts of
the system.

When developers worked on devices with specs similar to BMCs, such as
smartphones from a few years back, they had performance profiling
support from developer tools.

BMCs have many interesting aspects: kernel drivers, hardware
interfaces, multi-threading, modern programming language features, and
open-source development, all packed into very tight hardware and
software constraints and a build workflow that compiles code from
scratch. During debugging, many steps may be needed to recreate the
scene where a performance-related problem arises. Having benchmarks in
this scenario makes the process easier.

As BMCs become more versatile and run more workloads, performance
issues may become more pressing.

[ Background and References ]

1. BMC performance problems are encountered and asked about, and
benchmarks and tools may help with them. Related posts:
   - “ObjectMapper - quantity limitations?” [1]
   - “dbus-broker caused the system OOM issue” [2]
   - “Issue about (polling every second) makes Entity Manager get stuck” [3]
   - “Performance implication of Sensor Value PropertiesChanged Events” [4]

2. People have started to find solutions for existing and potential
problems. Examples are:
   - io_uring vs epoll [5]
   - shmapper [6]

3. BMC workloads have their own characteristics, namely, the extensive
use of DBus, and the numerous I/O buses, among many others. Some of
these may not have been captured by existing benchmarks on Linux.
These reasons might justify spending effort on a BMC-specific set of
benchmarks.

4. There have been proposals for adding performance testing to the CI
[9]. A baseline and a way to measure performance are needed. This
document tries to partially address the measurement question.

[ Requirements ]

The benchmarks and tools should report basic metrics such as latency
and throughput. The performance profiling overhead should not distort
performance results.
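To make the "basic metrics" requirement concrete, a measurement loop
might look like the following sketch (plain Python with hypothetical
names; this is not an existing OpenBMC tool). It collects per-operation
latencies with a monotonic clock and reduces them to throughput and
percentiles:

```python
import time

def measure(op, iterations):
    """Time `op` repeatedly; return per-call latencies in seconds."""
    latencies = []
    for _ in range(iterations):
        start = time.monotonic()  # monotonic clock is immune to wall-clock jumps
        op()
        latencies.append(time.monotonic() - start)
    return latencies

def summarize(latencies):
    """Reduce raw latencies to the basic metrics: throughput and percentiles."""
    ordered = sorted(latencies)
    total = sum(ordered)

    def pct(p):
        return ordered[min(len(ordered) - 1, int(p * len(ordered)))]

    return {
        "calls_per_sec": len(ordered) / total if total > 0 else float("inf"),
        "p50_sec": pct(0.50),
        "p99_sec": pct(0.99),
    }
```

Keeping the probe to two clock reads per operation also speaks to the
second requirement: the profiling overhead stays small relative to the
operations being measured, so it is less likely to distort the results.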

The contents of the benchmark can evolve quickly to keep pace with
the rest of the BMC ecosystem, which also evolves quickly. This is
comparable to unit tests aimed at getting code coverage for
incremental additions to the code base, or to hardware manufacturers
updating their drivers with performance tuning parameters for newly
released software.

Benchmarks and results should be easy to learn and use, help newcomers
learn the basics, and aid seasoned developers where needed.


[ Proposed Design ]

1. Continue the previous effort [7] on a sensor-reading performance
benchmark for the BMC. This will naturally lead to investigation into
the lower levels such as I2C and async processing.

2. Try the community’s ideas on performance optimization in benchmarks
and measure performance difference. If an optimization generates
performance gain, attempt to land it in OpenBMC code.

3. Distill ideas and observations into performance tools. For example,
enhance or expand the existing DBus visualizer tool [8].

4. Repeat the process in other areas of BMC performance, such as web
request processing.
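As an illustration of item 1, a first sensor-reading benchmark could
time raw hwmon reads from sysfs. The sketch below assumes the standard
Linux hwmon convention (temperatures reported as millidegrees Celsius
in `temp*_input` files); it is not the benchmark from [7], just a
starting point:

```python
import glob
import time

def read_millis(path):
    """hwmon exposes temperatures as millidegrees Celsius in plain text."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0

def bench_sensors(pattern="/sys/class/hwmon/hwmon*/temp*_input", rounds=100):
    """Time repeated reads of every matching sensor attribute.

    Returns a map of path -> mean seconds per read.
    """
    samples = {}
    for path in glob.glob(pattern):
        start = time.monotonic()
        for _ in range(rounds):
            read_millis(path)
        samples[path] = (time.monotonic() - start) / rounds
    return samples
```

Comparing these raw sysfs latencies against end-to-end D-Bus sensor
reads would show how much of the cost sits below versus above the
kernel interface.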

[ Alternatives Considered ]

Rather than benchmarking real hardware, it might be possible to
directly measure a cycle-accurate full-system timing simulator (such
as gem5). This approach suffers from relatively slow simulation speed
compared to running on real hardware, and device support may affect
the feasibility of certain experiments. As such, writing benchmarks
and running them on real hardware might be more feasible in the short
term.

[ References ]

[1] https://lists.ozlabs.org/pipermail/openbmc/2021-February/024978.html
[2] https://lists.ozlabs.org/pipermail/openbmc/2021-February/024895.html
[3] https://lists.ozlabs.org/pipermail/openbmc/2021-February/024914.html
[4] https://lists.ozlabs.org/pipermail/openbmc/2021-February/024889.html
[5] https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.6-IO-uring-Tests
[6] https://lists.ozlabs.org/pipermail/openbmc/2021-February/024908.html
[7] https://gerrit.openbmc-project.xyz/c/openbmc/openbmc-tools/+/35387
[8] https://github.com/openbmc/webui-vue/issues/41
[9] https://github.com/ibm-openbmc/dev/issues/73


* Re: Thoughts on performance profiling and tools for OpenBMC
  2021-03-22 22:05 Thoughts on performance profiling and tools for OpenBMC Sui Chen
@ 2021-03-23 15:00 ` Joseph Reynolds
  2021-03-25  0:28 ` Andrew Geissler
  1 sibling, 0 replies; 4+ messages in thread
From: Joseph Reynolds @ 2021-03-23 15:00 UTC (permalink / raw)
  To: Sui Chen, OpenBMC Maillist

On 3/22/21 5:05 PM, Sui Chen wrote:
> Hello OpenBMC Mailing List,
>
> This email discusses some thoughts and work in progress regarding
> BMC performance. We are aware that performance has been brought up a
> few times in the past, so this document covers and keeps track of some
> recent work. The following is written according to the design doc
> format, but it may still have some way to go before becoming a
> concrete "set of benchmarks for OpenBMC". As such, any feedback is
> appreciated. Thanks for reading!

Sui,

I believe there are tie-ins between performance and security.  For 
example, if a BMC user can cause very bad performance or cause the BMC 
to crash, the BMC will not be able to perform its primary function.  
That constitutes a [denial of service][], a security issue.  So I am 
interested in the outcome of BMC performance profiling (but don't have 
resources to contribute).

More specifically, I believe there are tie-ins between the performance 
profiling work and the threat modeling work.  Threat modeling needs an 
architectural model of the interfaces within the BMC, for example, the 
D-Bus and daemon layers.  The [security working group][] has started 
modeling these interfaces, but making progress is hard (search for 
"model" or "threat model").  I understand performance work also needs 
similar models.  I am interested to see any architectural work you have 
in this area.

Thank you!

Joseph

[denial of service]: https://en.wikipedia.org/wiki/Denial-of-service_attack
[security working group]: 
https://github.com/openbmc/openbmc/wiki/Security-working-group


> [ Problem Description ]
>
> Writing benchmarks and studying profiling results is not only good for
> learning the basic APIs and constructs, but also sometimes useful for
> debugging complicated interactions between multiple moving parts of
> the system.
>
> When developers worked on devices with specs similar to BMCs, such as
> smartphones from a few years back, they had performance profiling
> support from developer tools.
>
> BMCs have many interesting aspects: kernel drivers, hardware
> interfaces, multi-threading, modern programming language features, and
> open-source development, all packed into very tight hardware and
> software constraints and a build workflow that compiles code from
> scratch. During debugging, many steps may be needed to recreate the
> scene where a performance-related problem arises. Having benchmarks in
> this scenario makes the process easier.
>
> As BMCs become more versatile and run more workloads, performance
> issues may become more pressing.
>
> [ Background and References ]
>
> 1. BMC performance problems are encountered and asked about, and
> benchmarks and tools may help with them. Related posts:
>     - “ObjectMapper - quantity limitations?” [1]
>     - “dbus-broker caused the system OOM issue” [2]
>     - “Issue about (polling every second) makes Entity Manager get stuck” [3]
>     - “Performance implication of Sensor Value PropertiesChanged Events” [4]
>
> 2. People have started to find solutions for existing and potential
> problems. Examples are:
>     - io_uring vs epoll [5]
>     - shmapper [6]
>
> 3. BMC workloads have their own characteristics, namely, the extensive
> use of DBus, and the numerous I/O buses, among many others. Some of
> these may not have been captured by existing benchmarks on Linux.
> These reasons might justify spending effort on a BMC-specific set of
> benchmarks.
>
> 4. There have been proposals for adding performance testing to the CI
> [9]. A baseline and a way to measure performance are needed. This
> document tries to partially address the measurement question.
>
> [ Requirements ]
>
> The benchmarks and tools should report basic metrics such as latency
> and throughput. The performance profiling overhead should not distort
> performance results.
>
> The contents of the benchmark can evolve quickly to keep pace with
> the rest of the BMC ecosystem, which also evolves quickly. This is
> comparable to unit tests aimed at getting code coverage for
> incremental additions to the code base, or to hardware manufacturers
> updating their drivers with performance tuning parameters for newly
> released software.
>
> Benchmarks and results should be easy to learn and use, help newcomers
> learn the basics, and aid seasoned developers where needed.
>
>
> [ Proposed Design ]
>
> 1. Continue the previous effort [7] on a sensor-reading performance
> benchmark for the BMC. This will naturally lead to investigation into
> the lower levels such as I2C and async processing.
>
> 2. Try the community’s ideas on performance optimization in benchmarks
> and measure performance difference. If an optimization generates
> performance gain, attempt to land it in OpenBMC code.
>
> 3. Distill ideas and observations into performance tools. For example,
> enhance or expand the existing DBus visualizer tool [8].
>
> 4. Repeat the process in other areas of BMC performance, such as web
> request processing.
>
> [ Alternatives Considered ]
>
> Rather than benchmarking real hardware, it might be possible to
> directly measure a cycle-accurate full-system timing simulator (such
> as gem5). This approach suffers from relatively slow simulation speed
> compared to running on real hardware, and device support may affect
> the feasibility of certain experiments. As such, writing benchmarks
> and running them on real hardware might be more feasible in the short
> term.
>
> [ References ]
>
> [1] https://lists.ozlabs.org/pipermail/openbmc/2021-February/024978.html
> [2] https://lists.ozlabs.org/pipermail/openbmc/2021-February/024895.html
> [3] https://lists.ozlabs.org/pipermail/openbmc/2021-February/024914.html
> [4] https://lists.ozlabs.org/pipermail/openbmc/2021-February/024889.html
> [5] https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.6-IO-uring-Tests
> [6] https://lists.ozlabs.org/pipermail/openbmc/2021-February/024908.html
> [7] https://gerrit.openbmc-project.xyz/c/openbmc/openbmc-tools/+/35387
> [8] https://github.com/openbmc/webui-vue/issues/41
> [9] https://github.com/ibm-openbmc/dev/issues/73



* Re: Thoughts on performance profiling and tools for OpenBMC
  2021-03-22 22:05 Thoughts on performance profiling and tools for OpenBMC Sui Chen
  2021-03-23 15:00 ` Joseph Reynolds
@ 2021-03-25  0:28 ` Andrew Geissler
  2021-04-12  3:12   ` Andrew Jeffery
  1 sibling, 1 reply; 4+ messages in thread
From: Andrew Geissler @ 2021-03-25  0:28 UTC (permalink / raw)
  To: Sui Chen; +Cc: OpenBMC Maillist



> On Mar 22, 2021, at 5:05 PM, Sui Chen <suichen@google.com> wrote:
> 
<snip>
> 
> [ Proposed Design ]
> 
> 1. Continue the previous effort [7] on a sensor-reading performance
> benchmark for the BMC. This will naturally lead to investigation into
> the lower levels such as I2C and async processing.
> 
> 2. Try the community’s ideas on performance optimization in benchmarks
> and measure performance difference. If an optimization generates
> performance gain, attempt to land it in OpenBMC code.
> 
> 3. Distill ideas and observations into performance tools. For example,
> enhance or expand the existing DBus visualizer tool [8].
> 
> 4. Repeat the process in other areas of BMC performance, such as web
> request processing.

I had to work around a lot of performance issues in our first
AST2500-based systems. A lot of the issues were early in the boot of
the BMC, when systemd was starting up all of the different services in
parallel and things like mapper were introspecting all new D-Bus
objects showing up on the bus.

Moving from Python to C++ applications helped a lot. Changing
application nice levels was not helpful (too many D-Bus calls between
apps, so if one had a higher priority, like mapper, it would time out
waiting for lower-priority apps).

AndrewJ and I tried to track some of the issues and tools out on
this wiki:
https://github.com/openbmc/openbmc/wiki/Performance-Profiling-in-OpenBMC

We’ve gotten a bit of a reprieve with our move to the AST2600 but
it’s only a matter of time :)

I’m always a fan of trying to improve existing tools vs. rolling our
own, but recognize that’s not always an option.

I’m all for anything and everything we can do in this area! Thanks
for taking the initiative Sui.  

Andrew


* Re: Thoughts on performance profiling and tools for OpenBMC
  2021-03-25  0:28 ` Andrew Geissler
@ 2021-04-12  3:12   ` Andrew Jeffery
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Jeffery @ 2021-04-12  3:12 UTC (permalink / raw)
  To: Andrew Geissler, Sui Chen; +Cc: OpenBMC Maillist



On Thu, 25 Mar 2021, at 10:58, Andrew Geissler wrote:
> 
> 
> > On Mar 22, 2021, at 5:05 PM, Sui Chen <suichen@google.com> wrote:
> > 
> <snip>
> > 
> > [ Proposed Design ]
> > 
> > 1. Continue the previous effort [7] on a sensor-reading performance
> > benchmark for the BMC. This will naturally lead to investigation into
> > the lower levels such as I2C and async processing.
> > 
> > 2. Try the community’s ideas on performance optimization in benchmarks
> > and measure performance difference. If an optimization generates
> > performance gain, attempt to land it in OpenBMC code.
> > 
> > 3. Distill ideas and observations into performance tools. For example,
> > enhance or expand the existing DBus visualizer tool [8].
> > 
> > 4. Repeat the process in other areas of BMC performance, such as web
> > request processing.
> 
> I had to work around a lot of performance issues in our first
> AST2500-based systems. A lot of the issues were early in the boot of
> the BMC, when systemd was starting up all of the different services in
> parallel and things like mapper were introspecting all new D-Bus
> objects showing up on the bus.
> 
> Moving from Python to C++ applications helped a lot. Changing
> application nice levels was not helpful (too many D-Bus calls between
> apps, so if one had a higher priority, like mapper, it would time out
> waiting for lower-priority apps).
> 
> AndrewJ and I tried to track some of the issues and tools out on
> this wiki:
> https://github.com/openbmc/openbmc/wiki/Performance-Profiling-in-OpenBMC

Some rambling thoughts:

The wiki page makes a start on this, but I suspect what could be helpful
is a list of tools for capturing and inspecting behaviour at different
levels of the stack. Cribbing from the wiki page a bit:

# Application- and Kernel- Level behaviour
* `strace`
* `perf probe` / `perf record -e ...` (tracepoints, kprobes, uprobes)
* `perf record`: Hot-spot analysis
* Flamegraphs[1]: More hot-spot analysis

[1] http://www.brendangregg.com/flamegraphs.html

# Scheduler behaviour
* `perf sched record`
* `perf timechart`

# Service behaviour
* `systemd-analyze`
* `systemd-bootchart`

# D-Bus behaviour
* `busctl capture`
* `wireshark`
* `dbus-pcap`
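
As a complement to the capture tools above, a first-order probe of bus
latency can be made by timing calls from outside the daemons. This is a
hypothetical sketch, not an existing tool; `busctl call` and the
`org.freedesktop.DBus.ListNames` method are standard, so it needs no
OpenBMC-specific service to be running:

```python
import subprocess
import time

def time_command(argv, runs=10):
    """Run `argv` repeatedly, returning each run's wall-clock latency in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        latencies.append(time.monotonic() - start)
    return latencies

# Round-trip a trivial method call through the bus daemon, e.g.
# time_command(BUSCTL_LISTNAMES) on the BMC.
BUSCTL_LISTNAMES = [
    "busctl", "call", "org.freedesktop.DBus", "/org/freedesktop/DBus",
    "org.freedesktop.DBus", "ListNames",
]
```

Process startup dominates the cost of each `busctl` invocation, so this
only answers coarse questions ("did bus round-trips get 10x slower?");
finer measurement would hold a single connection open instead.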

`perf timechart` is a great place to start when you fail to meet
timing requirements in a complex system.

I'm not sure much of this could be integrated into e.g. the visualiser
tool, but I think making OpenBMC easy to instrument is a step in the
right direction.

Andrew

