linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v3 00/15] perf: add workqueue library and use it in synthetic-events
@ 2021-08-20 10:53 Riccardo Mancini
From: Riccardo Mancini @ 2021-08-20 10:53 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Jiri Olsa, linux-kernel, linux-perf-users,
	Alexey Bayduraev, Riccardo Mancini

Changes in v3:
 - improved separation of threadpool and threadpool_entry methods
 - replaced shared workqueue with per-thread workqueue. This should
   improve the performance on big machines (Jiri noticed in his
   experiments a significant performance degradation after 15 threads
   with the shared queue).
 - improved error reporting in both threadpool and workqueue
 - added lazy spinup of threads in workqueue [9/15]
 - added global workqueue [10/15]
 - set up global workqueue in perf record, top and synthesize bench
   [12-14/15] and used it in synthetic events

v2: https://lore.kernel.org/lkml/cover.1627643744.git.rickyman7@gmail.com/
    https://lore.kernel.org/lkml/4f0cd6c8e77c0b4f4d4b8d553a7032757b976e61.1627657061.git.rickyman7@gmail.com/
(something went wrong when sending it and the cover letter has a wrong
Message-Id)

Changes in v2:
 - rename threadpool_struct and its functions to adhere to naming style
 - use ERR_PTR instead of returning NULL
 - add *__strerror functions, removing pr_err from library code
 - wait for threads after creation of all threads, instead of waiting
   after each creation
 - use intention-revealing macros in test code instead of 0 and -1
 - use readn/writen functions

v1: https://lkml.kernel.org/lkml/cover.1626177381.git.rickyman7@gmail.com/

This patchset introduces a new utility library inside perf/util that
provides a work queue abstraction modeled on the kernel workqueue API.

The workqueue abstraction is made up of two components:
 - threadpool: takes care of managing a pool of threads. It is
   inspired by the prototype for threaded trace in perf-record from Alexey:
   https://lore.kernel.org/lkml/cover.1625227739.git.alexey.v.bayduraev@linux.intel.com/
 - workqueue: manages the workers in the threadpool and assigns the work
   items to the thread-local queues.
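
To illustrate the per-thread queue idea, here is a minimal standalone
sketch using plain pthreads. It is not the code from this series: every
name in it (struct work, struct worker, struct wq, wq_start, wq_queue,
wq_stop, square) is hypothetical, and the round-robin assignment policy
and the single-producer restriction are assumptions made only to keep the
example short. Each worker owns a FIFO protected by its own lock, so
workers never contend with each other when fetching work.

```c
#include <assert.h>
#include <pthread.h>

/* All names are hypothetical; this is not the perf workqueue API. */
struct work {
	struct work *next;
	void (*func)(void *arg);
	void *arg;
};

struct worker {
	pthread_t thread;
	pthread_mutex_t lock;		/* protects head, tail and stop */
	pthread_cond_t cond;		/* signalled on new work or stop */
	struct work *head, *tail;	/* this worker's private FIFO */
	int stop;
};

struct wq {
	struct worker *workers;		/* caller-owned, zero-initialized */
	int nr_workers;
	int next_worker;		/* round-robin cursor (assumed policy) */
};

static void *worker_fn(void *arg)
{
	struct worker *wk = arg;
	struct work *w;

	pthread_mutex_lock(&wk->lock);
	while ((w = wk->head) || !wk->stop) {
		if (!w) {
			pthread_cond_wait(&wk->cond, &wk->lock);
			continue;
		}
		wk->head = w->next;
		pthread_mutex_unlock(&wk->lock);	/* run without the lock */
		w->func(w->arg);
		pthread_mutex_lock(&wk->lock);
	}
	pthread_mutex_unlock(&wk->lock);
	return NULL;
}

/* Let each worker drain its queue, then join it. */
static void wq_stop(struct wq *q)
{
	for (int i = 0; i < q->nr_workers; i++) {
		struct worker *wk = &q->workers[i];

		pthread_mutex_lock(&wk->lock);
		wk->stop = 1;
		pthread_cond_signal(&wk->cond);
		pthread_mutex_unlock(&wk->lock);
		pthread_join(wk->thread, NULL);
		pthread_mutex_destroy(&wk->lock);
		pthread_cond_destroy(&wk->cond);
	}
}

/* Returns 0 on success, -1 if a thread could not be created. */
static int wq_start(struct wq *q, struct worker *workers, int nr_workers)
{
	assert(nr_workers > 0);
	q->workers = workers;
	q->next_worker = 0;
	for (q->nr_workers = 0; q->nr_workers < nr_workers; q->nr_workers++) {
		struct worker *wk = &workers[q->nr_workers];

		pthread_mutex_init(&wk->lock, NULL);
		pthread_cond_init(&wk->cond, NULL);
		if (pthread_create(&wk->thread, NULL, worker_fn, wk)) {
			wq_stop(q);
			return -1;
		}
	}
	return 0;
}

/* Single-producer sketch: next_worker is not protected by a lock. */
static void wq_queue(struct wq *q, struct work *w)
{
	struct worker *wk = &q->workers[q->next_worker];

	q->next_worker = (q->next_worker + 1) % q->nr_workers;
	w->next = NULL;
	pthread_mutex_lock(&wk->lock);
	if (wk->head)
		wk->tail->next = w;
	else
		wk->head = w;
	wk->tail = w;
	pthread_cond_signal(&wk->cond);
	pthread_mutex_unlock(&wk->lock);
}

/* Demo callback: squares the int that arg points to. */
static void square(void *arg) { *(int *)arg *= *(int *)arg; }
```

A caller would zero-initialize an array of struct worker, call
wq_start(), queue items with wq_queue() and finally call wq_stop(), which
returns only after every queued item has run. The work items must stay
allocated until then.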

On top of the workqueue, a simple parallel-for utility is implemented.
It is then showcased in synthetic-events.c, where it replaces the
previous manually created pthreads.

In my experiments with perf bench, the new workqueue has a slightly
higher overhead than manually created threads, but it partitions work
among threads more effectively, yielding better results overall (see the
last patch for benchmark details on my machine).
Since I'm not able to test it on a bigger machine, it would be helpful if
someone could also test it and report back their results (thanks to Jiri
and Arnaldo, who have already helped me by running some tests).

Furthermore, the overhead could be reduced by changing the
`work_size` (currently 1), i.e. the number of dirents that a thread
processes before grabbing a lock to get the next work item.
I experimented with different sizes but, while bigger sizes reduce overhead
as expected, they do not scale as well to more threads.
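
The work_size trade-off can be sketched with a standalone pthread
example. This is not the parallel_for from the series: the names
(struct chunked_for, chunked_parallel_for, chunk_worker, mark_seen,
MAX_THREADS) are hypothetical, and a single shared cursor behind a mutex
is assumed as the way work is handed out. Each thread takes the lock
once per chunk and claims up to work_size consecutive indices.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64	/* arbitrary cap for this sketch */

/* All names are hypothetical; this is not the perf parallel-for API. */
typedef void (*item_fn)(size_t idx, void *arg);

struct chunked_for {
	pthread_mutex_t lock;	/* protects next */
	size_t next;		/* first index not yet handed out */
	size_t nr_items;
	size_t work_size;	/* items claimed per lock acquisition */
	item_fn fn;
	void *arg;
};

static void *chunk_worker(void *data)
{
	struct chunked_for *cf = data;

	for (;;) {
		size_t begin, end;

		/* One lock round-trip claims up to work_size items. */
		pthread_mutex_lock(&cf->lock);
		begin = cf->next;
		end = begin + cf->work_size;
		if (end > cf->nr_items || end < begin)	/* clamp, incl. overflow */
			end = cf->nr_items;
		cf->next = end;
		pthread_mutex_unlock(&cf->lock);

		if (begin == end)
			return NULL;	/* nothing left to claim */
		for (size_t i = begin; i < end; i++)
			cf->fn(i, cf->arg);
	}
}

/* Returns 0 if at least one thread ran, else the pthread_create() error. */
static int chunked_parallel_for(size_t nr_threads, size_t nr_items,
				size_t work_size, item_fn fn, void *arg)
{
	struct chunked_for cf = {
		.nr_items = nr_items,
		.work_size = work_size,
		.fn = fn,
		.arg = arg,
	};
	pthread_t threads[MAX_THREADS];
	size_t started;
	int err = 0;

	assert(nr_threads > 0 && nr_threads <= MAX_THREADS && work_size > 0);
	pthread_mutex_init(&cf.lock, NULL);
	for (started = 0; started < nr_threads; started++) {
		err = pthread_create(&threads[started], NULL, chunk_worker, &cf);
		if (err)
			break;
	}
	/* Threads that did start still drain every item between them. */
	for (size_t i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	pthread_mutex_destroy(&cf.lock);
	return started ? 0 : err;
}

/* Demo callback: counts how many times each index is visited. */
static void mark_seen(size_t idx, void *arg)
{
	((int *)arg)[idx]++;
}
```

With work_size set to 1 the lock is taken once per item, which balances
load as finely as possible but maximizes locking overhead. Larger values
take the lock less often, at the cost of coarser chunks that may leave
some threads idle near the end of the loop.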

Soon I will also send another patchset applying the workqueue to evlist
operations (open, enable, disable, close).

Thanks,
Riccardo

Riccardo Mancini (15):
  perf workqueue: threadpool creation and destruction
  perf tests: add test for workqueue
  perf workqueue: add threadpool start and stop functions
  perf workqueue: add threadpool execute and wait functions
  tools: add sparse context/locking annotations in compiler-types.h
  perf workqueue: introduce workqueue struct
  perf workqueue: implement worker thread and management
  perf workqueue: add queue_work and flush_workqueue functions
  perf workqueue: spinup threads when needed
  perf workqueue: create global workqueue
  perf workqueue: add utility to execute a for loop in parallel
  perf record: setup global workqueue
  perf top: setup global workqueue
  perf test/synthesis: setup global workqueue
  perf synthetic-events: use workqueue parallel_for

 tools/include/linux/compiler_types.h   |  18 +
 tools/perf/bench/synthesize.c          |  28 +-
 tools/perf/builtin-kvm.c               |   2 +-
 tools/perf/builtin-record.c            |  18 +-
 tools/perf/builtin-top.c               |  19 +-
 tools/perf/builtin-trace.c             |   3 +-
 tools/perf/tests/Build                 |   1 +
 tools/perf/tests/builtin-test.c        |   9 +
 tools/perf/tests/mmap-thread-lookup.c  |   2 +-
 tools/perf/tests/tests.h               |   3 +
 tools/perf/tests/workqueue.c           | 453 +++++++++++++
 tools/perf/util/Build                  |   1 +
 tools/perf/util/synthetic-events.c     | 135 ++--
 tools/perf/util/synthetic-events.h     |   8 +-
 tools/perf/util/workqueue/Build        |   2 +
 tools/perf/util/workqueue/threadpool.c | 619 +++++++++++++++++
 tools/perf/util/workqueue/threadpool.h |  41 ++
 tools/perf/util/workqueue/workqueue.c  | 901 +++++++++++++++++++++++++
 tools/perf/util/workqueue/workqueue.h  | 104 +++
 19 files changed, 2253 insertions(+), 114 deletions(-)
 create mode 100644 tools/perf/tests/workqueue.c
 create mode 100644 tools/perf/util/workqueue/Build
 create mode 100644 tools/perf/util/workqueue/threadpool.c
 create mode 100644 tools/perf/util/workqueue/threadpool.h
 create mode 100644 tools/perf/util/workqueue/workqueue.c
 create mode 100644 tools/perf/util/workqueue/workqueue.h

-- 
2.31.1


Thread overview: 24+ messages
2021-08-20 10:53 [RFC PATCH v3 00/15] perf: add workqueue library and use it in synthetic-events Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 01/15] perf workqueue: threadpool creation and destruction Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 02/15] perf tests: add test for workqueue Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 03/15] perf workqueue: add threadpool start and stop functions Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 04/15] perf workqueue: add threadpool execute and wait functions Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 05/15] tools: add sparse context/locking annotations in compiler-types.h Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 06/15] perf workqueue: introduce workqueue struct Riccardo Mancini
2021-08-24 19:27   ` Namhyung Kim
2021-08-31 16:13     ` Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 07/15] perf workqueue: implement worker thread and management Riccardo Mancini
2021-08-30  7:22   ` Jiri Olsa
2021-08-20 10:53 ` [RFC PATCH v3 08/15] perf workqueue: add queue_work and flush_workqueue functions Riccardo Mancini
2021-08-24 19:40   ` Namhyung Kim
2021-08-31 16:23     ` Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 09/15] perf workqueue: spinup threads when needed Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 10/15] perf workqueue: create global workqueue Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 11/15] perf workqueue: add utility to execute a for loop in parallel Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 12/15] perf record: setup global workqueue Riccardo Mancini
2021-08-20 10:53 ` [RFC PATCH v3 13/15] perf top: " Riccardo Mancini
2021-08-20 10:54 ` [RFC PATCH v3 14/15] perf test/synthesis: " Riccardo Mancini
2021-08-20 10:54 ` [RFC PATCH v3 15/15] perf synthetic-events: use workqueue parallel_for Riccardo Mancini
2021-08-29 21:59 ` [RFC PATCH v3 00/15] perf: add workqueue library and use it in synthetic-events Jiri Olsa
2021-08-31 15:46   ` Jiri Olsa
2021-08-31 16:57     ` Riccardo Mancini
