linux-kernel.vger.kernel.org archive mirror
From: Jiri Olsa <jolsa@redhat.com>
To: Riccardo Mancini <rickyman7@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Subject: Re: [RFC PATCH v2 10/10] perf synthetic-events: use workqueue parallel_for
Date: Mon, 9 Aug 2021 14:04:48 +0200
Message-ID: <YREZ4G1xzncpdsVk@krava>
In-Reply-To: <0e9bdbcb903b24b95841e09bbae180841b6311ca.1627657061.git.rickyman7@gmail.com>

On Fri, Jul 30, 2021 at 05:34:17PM +0200, Riccardo Mancini wrote:
> To generate synthetic events, perf has the option to use multiple
> threads. These threads are created manually using pthread_create.
> 
> This patch replaces the manual pthread_create with a workqueue,
> using the parallel_for utility.

hi,
I really like this new interface

> 
> Experimental results show that workqueue has a slightly higher overhead,
> but this is repaid by the improved work balancing among threads.

how did you measure that balancing improvement?
are there fewer kernel cycles spent?

I ran the benchmark and if I'm reading the results correctly I see
a performance drop for high thread counts (full list attached below).


old perf:                                                                 new perf:

[jolsa@dell-r440-01 perf]$ ./perf.old bench internals synthesize -t       [jolsa@dell-r440-01 perf]$ ./perf bench internals synthesize -t
...
  Number of synthesis threads: 40                                           Number of synthesis threads: 40
    Average synthesis took: 2489.400 usec (+- 49.832 usec)                    Average synthesis took: 4576.500 usec (+- 75.278 usec)
    Average num. events: 956.800 (+- 6.721)                                   Average num. events: 1020.000 (+- 0.000)
    Average time per event 2.602 usec                                         Average time per event 4.487 usec

maybe profiling will show what's going on?

thanks,
jirka


---
[jolsa@dell-r440-01 perf]$ ./perf.old bench internals synthesize -t       [jolsa@dell-r440-01 perf]$ ./perf bench internals synthesize -t
# Running 'internals/synthesize' benchmark:                               # Running 'internals/synthesize' benchmark:
Computing performance of multi threaded perf event synthesis by           Computing performance of multi threaded perf event synthesis by
synthesizing events on CPU 0:                                             synthesizing events on CPU 0:
  Number of synthesis threads: 1                                            Number of synthesis threads: 1 
    Average synthesis took: 7907.100 usec (+- 197.363 usec)                   Average synthesis took: 7972.900 usec (+- 198.158 usec)
    Average num. events: 956.000 (+- 0.000)                                   Average num. events: 936.000 (+- 0.000)
    Average time per event 8.271 usec                                         Average time per event 8.518 usec
  Number of synthesis threads: 2                                            Number of synthesis threads: 2 
    Average synthesis took: 5616.800 usec (+- 61.253 usec)                    Average synthesis took: 5844.700 usec (+- 87.219 usec)
    Average num. events: 958.800 (+- 0.327)                                   Average num. events: 940.000 (+- 0.000)
    Average time per event 5.858 usec                                         Average time per event 6.218 usec
  Number of synthesis threads: 3                                            Number of synthesis threads: 3 
    Average synthesis took: 4274.000 usec (+- 93.293 usec)                    Average synthesis took: 4019.700 usec (+- 67.354 usec)
    Average num. events: 962.000 (+- 0.000)                                   Average num. events: 942.000 (+- 0.000)
    Average time per event 4.443 usec                                         Average time per event 4.267 usec
  Number of synthesis threads: 4                                            Number of synthesis threads: 4 
    Average synthesis took: 3425.700 usec (+- 43.044 usec)                    Average synthesis took: 3382.200 usec (+- 74.652 usec)
    Average num. events: 959.600 (+- 0.933)                                   Average num. events: 944.000 (+- 0.000)
    Average time per event 3.570 usec                                         Average time per event 3.583 usec
  Number of synthesis threads: 5                                            Number of synthesis threads: 5 
    Average synthesis took: 2958.000 usec (+- 82.951 usec)                    Average synthesis took: 3086.500 usec (+- 48.213 usec)
    Average num. events: 966.000 (+- 0.000)                                   Average num. events: 946.000 (+- 0.000)
    Average time per event 3.062 usec                                         Average time per event 3.263 usec
  Number of synthesis threads: 6                                            Number of synthesis threads: 6 
    Average synthesis took: 2808.400 usec (+- 66.868 usec)                    Average synthesis took: 2752.200 usec (+- 56.411 usec)
    Average num. events: 956.800 (+- 0.327)                                   Average num. events: 948.000 (+- 0.000)
    Average time per event 2.935 usec                                         Average time per event 2.903 usec
  Number of synthesis threads: 7                                            Number of synthesis threads: 7 
    Average synthesis took: 2622.900 usec (+- 83.524 usec)                    Average synthesis took: 2548.200 usec (+- 48.042 usec)
    Average num. events: 958.400 (+- 0.267)                                   Average num. events: 950.000 (+- 0.000)
    Average time per event 2.737 usec                                         Average time per event 2.682 usec
  Number of synthesis threads: 8                                            Number of synthesis threads: 8 
    Average synthesis took: 2271.600 usec (+- 29.181 usec)                    Average synthesis took: 2486.600 usec (+- 47.862 usec)
    Average num. events: 972.000 (+- 0.000)                                   Average num. events: 952.000 (+- 0.000)
    Average time per event 2.337 usec                                         Average time per event 2.612 usec
  Number of synthesis threads: 9                                            Number of synthesis threads: 9 
    Average synthesis took: 2372.000 usec (+- 95.495 usec)                    Average synthesis took: 2347.300 usec (+- 23.959 usec)
    Average num. events: 959.200 (+- 0.952)                                   Average num. events: 954.000 (+- 0.000)
    Average time per event 2.473 usec                                         Average time per event 2.460 usec
  Number of synthesis threads: 10                                           Number of synthesis threads: 10
    Average synthesis took: 2544.600 usec (+- 107.569 usec)                   Average synthesis took: 2328.800 usec (+- 14.234 usec)
    Average num. events: 968.400 (+- 3.124)                                   Average num. events: 957.400 (+- 0.306)
    Average time per event 2.628 usec                                         Average time per event 2.432 usec
  Number of synthesis threads: 11                                           Number of synthesis threads: 11
    Average synthesis took: 2299.300 usec (+- 57.597 usec)                    Average synthesis took: 2340.300 usec (+- 34.638 usec)
    Average num. events: 956.000 (+- 0.000)                                   Average num. events: 960.000 (+- 0.000)
    Average time per event 2.405 usec                                         Average time per event 2.438 usec
  Number of synthesis threads: 12                                           Number of synthesis threads: 12
    Average synthesis took: 2545.500 usec (+- 69.557 usec)                    Average synthesis took: 2318.700 usec (+- 15.803 usec)
    Average num. events: 974.800 (+- 0.611)                                   Average num. events: 963.800 (+- 0.200)
    Average time per event 2.611 usec                                         Average time per event 2.406 usec
  Number of synthesis threads: 13                                           Number of synthesis threads: 13
    Average synthesis took: 2386.400 usec (+- 79.244 usec)                    Average synthesis took: 2408.700 usec (+- 27.071 usec)
    Average num. events: 950.500 (+- 5.726)                                   Average num. events: 966.000 (+- 0.000)
    Average time per event 2.511 usec                                         Average time per event 2.493 usec
  Number of synthesis threads: 14                                           Number of synthesis threads: 14 
    Average synthesis took: 2466.600 usec (+- 57.893 usec)                    Average synthesis took: 2547.200 usec (+- 53.445 usec)
    Average num. events: 957.600 (+- 0.718)                                   Average num. events: 968.000 (+- 0.000)
    Average time per event 2.576 usec                                         Average time per event 2.631 usec
  Number of synthesis threads: 15                                           Number of synthesis threads: 15 
    Average synthesis took: 2249.700 usec (+- 64.026 usec)                    Average synthesis took: 2647.900 usec (+- 79.014 usec)
    Average num. events: 956.000 (+- 0.000)                                   Average num. events: 970.000 (+- 0.000)
    Average time per event 2.353 usec                                         Average time per event 2.730 usec
  Number of synthesis threads: 16                                           Number of synthesis threads: 16 
    Average synthesis took: 2311.700 usec (+- 64.304 usec)                    Average synthesis took: 2676.200 usec (+- 34.824 usec)
    Average num. events: 955.000 (+- 0.907)                                   Average num. events: 972.000 (+- 0.000)
    Average time per event 2.421 usec                                         Average time per event 2.753 usec
  Number of synthesis threads: 17                                           Number of synthesis threads: 17 
    Average synthesis took: 2174.100 usec (+- 36.673 usec)                    Average synthesis took: 2580.100 usec (+- 45.414 usec)
    Average num. events: 971.600 (+- 3.124)                                   Average num. events: 974.000 (+- 0.000)
    Average time per event 2.238 usec                                         Average time per event 2.649 usec
  Number of synthesis threads: 18                                           Number of synthesis threads: 18 
    Average synthesis took: 2294.200 usec (+- 63.657 usec)                    Average synthesis took: 2810.200 usec (+- 49.113 usec)
    Average num. events: 953.200 (+- 0.611)                                   Average num. events: 976.000 (+- 0.000)
    Average time per event 2.407 usec                                         Average time per event 2.879 usec
  Number of synthesis threads: 19                                           Number of synthesis threads: 19 
    Average synthesis took: 2410.700 usec (+- 120.169 usec)                   Average synthesis took: 2862.400 usec (+- 36.982 usec)
    Average num. events: 953.400 (+- 0.306)                                   Average num. events: 978.000 (+- 0.000)
    Average time per event 2.529 usec                                         Average time per event 2.927 usec
  Number of synthesis threads: 20                                           Number of synthesis threads: 20 
    Average synthesis took: 2387.000 usec (+- 91.051 usec)                    Average synthesis took: 2908.800 usec (+- 36.404 usec)
    Average num. events: 952.800 (+- 0.800)                                   Average num. events: 978.600 (+- 0.306)
    Average time per event 2.505 usec                                         Average time per event 2.972 usec
  Number of synthesis threads: 21                                           Number of synthesis threads: 21 
    Average synthesis took: 2275.700 usec (+- 39.815 usec)                    Average synthesis took: 3141.100 usec (+- 30.896 usec)
    Average num. events: 954.600 (+- 0.306)                                   Average num. events: 980.000 (+- 0.000)
    Average time per event 2.384 usec                                         Average time per event 3.205 usec
  Number of synthesis threads: 22                                           Number of synthesis threads: 22 
    Average synthesis took: 2373.200 usec (+- 89.528 usec)                    Average synthesis took: 3342.400 usec (+- 112.115 usec)
    Average num. events: 949.100 (+- 5.843)                                   Average num. events: 982.000 (+- 0.000)
    Average time per event 2.500 usec                                         Average time per event 3.404 usec
  Number of synthesis threads: 23                                           Number of synthesis threads: 23 
    Average synthesis took: 2318.300 usec (+- 39.395 usec)                    Average synthesis took: 3269.700 usec (+- 55.215 usec)
    Average num. events: 954.600 (+- 0.427)                                   Average num. events: 984.000 (+- 0.000)
    Average time per event 2.429 usec                                         Average time per event 3.323 usec
  Number of synthesis threads: 24                                           Number of synthesis threads: 24
    Average synthesis took: 2241.900 usec (+- 52.577 usec)                    Average synthesis took: 3379.500 usec (+- 56.380 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 986.000 (+- 0.000)
    Average time per event 2.350 usec                                         Average time per event 3.427 usec
  Number of synthesis threads: 25                                           Number of synthesis threads: 25
    Average synthesis took: 2343.400 usec (+- 101.611 usec)                   Average synthesis took: 3382.500 usec (+- 51.535 usec)
    Average num. events: 956.200 (+- 1.009)                                   Average num. events: 988.000 (+- 0.000)
    Average time per event 2.451 usec                                         Average time per event 3.424 usec
  Number of synthesis threads: 26                                           Number of synthesis threads: 26
    Average synthesis took: 2260.700 usec (+- 18.863 usec)                    Average synthesis took: 3391.600 usec (+- 44.053 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 990.000 (+- 0.000)
    Average time per event 2.370 usec                                         Average time per event 3.426 usec
  Number of synthesis threads: 27                                           Number of synthesis threads: 27
    Average synthesis took: 2373.800 usec (+- 74.213 usec)                    Average synthesis took: 3659.200 usec (+- 113.176 usec)
    Average num. events: 955.000 (+- 0.803)                                   Average num. events: 992.000 (+- 0.000)
    Average time per event 2.486 usec                                         Average time per event 3.689 usec
  Number of synthesis threads: 28                                           Number of synthesis threads: 28
    Average synthesis took: 2335.500 usec (+- 49.480 usec)                    Average synthesis took: 3625.000 usec (+- 90.131 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 994.000 (+- 0.000)
    Average time per event 2.448 usec                                         Average time per event 3.647 usec
  Number of synthesis threads: 29                                           Number of synthesis threads: 29
    Average synthesis took: 2182.100 usec (+- 41.649 usec)                    Average synthesis took: 3708.400 usec (+- 103.717 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 996.000 (+- 0.000)
    Average time per event 2.287 usec                                         Average time per event 3.723 usec
  Number of synthesis threads: 30                                           Number of synthesis threads: 30
    Average synthesis took: 2246.100 usec (+- 58.252 usec)                    Average synthesis took: 3820.500 usec (+- 95.282 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 998.000 (+- 0.000)
    Average time per event 2.354 usec                                         Average time per event 3.828 usec
  Number of synthesis threads: 31                                           Number of synthesis threads: 31
    Average synthesis took: 2156.900 usec (+- 26.141 usec)                    Average synthesis took: 3881.400 usec (+- 36.277 usec)
    Average num. events: 948.300 (+- 5.700)                                   Average num. events: 1000.000 (+- 0.000)
    Average time per event 2.274 usec                                         Average time per event 3.881 usec
  Number of synthesis threads: 32                                           Number of synthesis threads: 32
    Average synthesis took: 2295.300 usec (+- 41.538 usec)                    Average synthesis took: 4191.700 usec (+- 149.780 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 1002.000 (+- 0.000)
    Average time per event 2.406 usec                                         Average time per event 4.183 usec
  Number of synthesis threads: 33                                           Number of synthesis threads: 33
    Average synthesis took: 2249.100 usec (+- 59.135 usec)                    Average synthesis took: 3988.200 usec (+- 25.015 usec)
    Average num. events: 948.500 (+- 5.726)                                   Average num. events: 1004.000 (+- 0.000)
    Average time per event 2.371 usec                                         Average time per event 3.972 usec
  Number of synthesis threads: 34                                           Number of synthesis threads: 34
    Average synthesis took: 2270.400 usec (+- 65.011 usec)                    Average synthesis took: 4064.600 usec (+- 44.158 usec)
    Average num. events: 954.200 (+- 0.200)                                   Average num. events: 1006.000 (+- 0.000)
    Average time per event 2.379 usec                                         Average time per event 4.040 usec
  Number of synthesis threads: 35                                           Number of synthesis threads: 35
    Average synthesis took: 2259.200 usec (+- 44.287 usec)                    Average synthesis took: 4145.700 usec (+- 37.297 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 1008.000 (+- 0.000)
    Average time per event 2.368 usec                                         Average time per event 4.113 usec
  Number of synthesis threads: 36                                           Number of synthesis threads: 36
    Average synthesis took: 2294.100 usec (+- 38.693 usec)                    Average synthesis took: 4234.900 usec (+- 81.904 usec)
    Average num. events: 954.000 (+- 0.000)                                   Average num. events: 1010.400 (+- 0.267)
    Average time per event 2.405 usec                                         Average time per event 4.191 usec
  Number of synthesis threads: 37                                           Number of synthesis threads: 37
    Average synthesis took: 2338.900 usec (+- 80.346 usec)                    Average synthesis took: 4337.900 usec (+- 30.071 usec)
    Average num. events: 954.400 (+- 0.267)                                   Average num. events: 1014.000 (+- 0.000)
    Average time per event 2.451 usec                                         Average time per event 4.278 usec
  Number of synthesis threads: 38                                           Number of synthesis threads: 38
    Average synthesis took: 2406.300 usec (+- 57.140 usec)                    Average synthesis took: 4426.600 usec (+- 27.035 usec)
    Average num. events: 938.400 (+- 7.730)                                   Average num. events: 1016.000 (+- 0.000)
    Average time per event 2.564 usec                                         Average time per event 4.357 usec
  Number of synthesis threads: 39                                           Number of synthesis threads: 39
    Average synthesis took: 2371.000 usec (+- 35.676 usec)                    Average synthesis took: 5979.000 usec (+- 1518.855 usec)
    Average num. events: 963.000 (+- 0.000)                                   Average num. events: 1018.000 (+- 0.000)
    Average time per event 2.462 usec                                         Average time per event 5.873 usec
  Number of synthesis threads: 40                                           Number of synthesis threads: 40
    Average synthesis took: 2489.400 usec (+- 49.832 usec)                    Average synthesis took: 4576.500 usec (+- 75.278 usec)
    Average num. events: 956.800 (+- 6.721)                                   Average num. events: 1020.000 (+- 0.000)
    Average time per event 2.602 usec                                         Average time per event 4.487 usec

