[RFC PATCH 00/10] perf: add workqueue library and use it in synthetic-events

* [RFC PATCH 00/10] perf: add workqueue library and use it in synthetic-events
@ 2021-07-13 12:11 Riccardo Mancini
  2021-07-13 12:11 ` [RFC PATCH 01/10] perf workqueue: threadpool creation and destruction Riccardo Mancini
                   ` (11 more replies)
  0 siblings, 12 replies; 33+ messages in thread
From: Riccardo Mancini @ 2021-07-13 12:11 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Jiri Olsa, linux-kernel, linux-perf-users,
	Riccardo Mancini

This patchset introduces a new utility library inside perf/util, which
provides a work queue abstraction, which loosely follows the Kernel
workqueue API.

The workqueue abstraction is made up by two components:
 - threadpool: which takes care of managing a pool of threads. It is
   inspired by the prototype for threaded trace in perf-record from Alexey:
   https://lore.kernel.org/lkml/cover.1625227739.git.alexey.v.bayduraev@linux.intel.com/
 - workqueue: manages a shared queue and provides the workers implementation.

On top of the workqueue, a simple parallel-for utility is implemented
which is then showcased in synthetic-events.c, replacing the previous
manual pthread-created threads.

Through some experiments with perf bench, I can see how the new 
workqueue has a higher overhead compared to manual creation of threads, 
but is able to more effectively partition work among threads, yielding 
a better result with more threads.
Furthermore, the overhead could be configured by changing the
`work_size` (currently 1), aka the number of dirents that are 
processed by a thread before grabbing a lock to get the new work item.
I experimented with different sizes but, while bigger sizes reduce overhead
as expected, they do not scale as well to more threads.

I tried to keep the patchset as simple as possible, deferring possible
improvements and features to future work.
Naming a few:
 - in order to achieve a better performance, we could consider using 
   work-stealing instead of a common queue.
 - affinities in the thread pool, as in Alexey prototype for
   perf-record. Doing so would enable reusing the same threadpool for
   different purposes (evlist open, threaded trace, synthetic threads),
   avoiding having to spin up threads multiple times.
 - resizable threadpool, e.g. for lazy spawining of threads.

@Arnaldo
Since I wanted the workqueue to provide a similar API to the Kernel's
workqueue, I followed the naming style I found there, instead of the
usual object__method style that is typically found in perf. 
Let me know if you'd like me to follow perf style instead.

Thanks,
Riccardo

Riccardo Mancini (10):
  perf workqueue: threadpool creation and destruction
  perf tests: add test for workqueue
  perf workqueue: add threadpool start and stop functions
  perf workqueue: add threadpool execute and wait functions
  perf workqueue: add sparse annotation header
  perf workqueue: introduce workqueue struct
  perf workqueue: implement worker thread and management
  perf workqueue: add queue_work and flush_workqueue functions
  perf workqueue: add utility to execute a for loop in parallel
  perf synthetic-events: use workqueue parallel_for

 tools/perf/tests/Build                 |   1 +
 tools/perf/tests/builtin-test.c        |   9 +
 tools/perf/tests/tests.h               |   3 +
 tools/perf/tests/workqueue.c           | 453 +++++++++++++++++
 tools/perf/util/Build                  |   1 +
 tools/perf/util/synthetic-events.c     | 131 +++--
 tools/perf/util/workqueue/Build        |   2 +
 tools/perf/util/workqueue/sparse.h     |  21 +
 tools/perf/util/workqueue/threadpool.c | 516 ++++++++++++++++++++
 tools/perf/util/workqueue/threadpool.h |  29 ++
 tools/perf/util/workqueue/workqueue.c  | 642 +++++++++++++++++++++++++
 tools/perf/util/workqueue/workqueue.h  |  38 ++
 12 files changed, 1771 insertions(+), 75 deletions(-)
 create mode 100644 tools/perf/tests/workqueue.c
 create mode 100644 tools/perf/util/workqueue/Build
 create mode 100644 tools/perf/util/workqueue/sparse.h
 create mode 100644 tools/perf/util/workqueue/threadpool.c
 create mode 100644 tools/perf/util/workqueue/threadpool.h
 create mode 100644 tools/perf/util/workqueue/workqueue.c
 create mode 100644 tools/perf/util/workqueue/workqueue.h

-- 
2.31.1

^ permalink raw reply	[flat|nested] 33+ messages in thread