linux-kernel.vger.kernel.org archive mirror
* [PATCH v3] perf sched: Fix sched latency analysis incorrect
@ 2023-03-17  6:32 Chunxin Zang
  2023-03-21  7:32 ` Mike Galbraith
  2023-03-27 19:00 ` Namhyung Kim
  0 siblings, 2 replies; 5+ messages in thread
From: Chunxin Zang @ 2023-03-17  6:32 UTC (permalink / raw)
  To: namhyung, peterz, mingo, acme
  Cc: mark.rutland, alexander.shishkin, jolsa, irogers,
	linux-perf-users, linux-kernel, Chunxin Zang, Jerry Zhou

[-- Attachment #1: Type: text/plain, Size: 2228 bytes --]

'perf sched latency' reports incorrect process scheduling latencies when it
uses 'sched:sched_wakeup' events to analyze perf.data.

Since commit d566a9c2d482 ("perf sched: Prefer sched_waking event when it
exists"), 'perf sched record' prefers 'sched:sched_waking' over
'sched:sched_wakeup', because the waking time is the more reasonable point
from which to evaluate scheduling latency.

Similarly, update sched latency/map/replay to use sched_waking events.

I used 'perf record -e "sched:*"' to record both waking and wakeup events
and analyzed them with the fixed perf; the result is correct. With this
change, latency_wakeup_event() sets atom->state to THREAD_WAIT_CPU on the
waking event, so the later wakeup event hits the 'atom->state !=
THREAD_SLEEPING' check and returns early instead of being counted again.
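
A condensed sketch of the check involved (stand-in types and a stripped-down
handler; the real logic lives in latency_wakeup_event() in
tools/perf/builtin-sched.c and does more bookkeeping):

#include <stdint.h>

/* Stand-ins for the perf-internal state; names follow builtin-sched.c. */
enum thread_state { THREAD_SLEEPING = 0, THREAD_WAIT_CPU, THREAD_SCHED_IN };

struct work_atom {
	enum thread_state state;
	uint64_t wake_up_time;		/* when the task became runnable */
	uint64_t sched_in_time;		/* when it actually got the CPU */
};

/* Handler shared by sched:sched_waking and sched:sched_wakeup with this patch. */
static int wakeup_event_sketch(struct work_atom *atom, uint64_t timestamp)
{
	/*
	 * Only a task last seen going to sleep starts a new runqueue wait
	 * here.  Once sched_waking has set THREAD_WAIT_CPU, the later
	 * sched_wakeup event returns early and is not counted again.
	 */
	if (atom->state != THREAD_SLEEPING)
		return 0;

	atom->state = THREAD_WAIT_CPU;
	atom->wake_up_time = timestamp;	/* latency = sched_in_time - wake_up_time */
	return 0;
}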

Signed-off-by: Chunxin Zang <zangchunxin@lixiang.com>
Signed-off-by: Jerry Zhou <zhouchunhua@lixiang.com>
---

	changelogs in v3:
	1) fix non-ASCII characters in commit log.

	changelogs in v2:
	1) fix email address disappearing in 'signed off by'

 tools/perf/builtin-sched.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 86e18575c9be..1af4ec1ac824 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1819,6 +1819,7 @@ static int perf_sched__read_events(struct perf_sched *sched)
 	const struct evsel_str_handler handlers[] = {
 		{ "sched:sched_switch",	      process_sched_switch_event, },
 		{ "sched:sched_stat_runtime", process_sched_runtime_event, },
+		{ "sched:sched_waking",	      process_sched_wakeup_event, },
 		{ "sched:sched_wakeup",	      process_sched_wakeup_event, },
 		{ "sched:sched_wakeup_new",   process_sched_wakeup_event, },
 		{ "sched:sched_migrate_task", process_sched_migrate_task_event, },
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] perf sched: Fix sched latency analysis incorrect
  2023-03-17  6:32 [PATCH v3] perf sched: Fix sched latency analysis incorrect Chunxin Zang
@ 2023-03-21  7:32 ` Mike Galbraith
  2023-03-21 11:05   ` Chunxin Zang
  2023-03-27 19:00 ` Namhyung Kim
  1 sibling, 1 reply; 5+ messages in thread
From: Mike Galbraith @ 2023-03-21  7:32 UTC (permalink / raw)
  To: Chunxin Zang, namhyung, peterz, mingo, acme
  Cc: mark.rutland, alexander.shishkin, jolsa, irogers,
	linux-perf-users, linux-kernel, Jerry Zhou

On Fri, 2023-03-17 at 14:32 +0800, Chunxin Zang wrote:
> 'perf sched latency' reports incorrect process scheduling latencies when it
> uses 'sched:sched_wakeup' events to analyze perf.data.
>
> Since commit d566a9c2d482 ("perf sched: Prefer sched_waking event when it
> exists"), 'perf sched record' prefers 'sched:sched_waking' over
> 'sched:sched_wakeup', because the waking time is the more reasonable point
> from which to evaluate scheduling latency.

Yeah, that makes a.. not so tiny difference.  Thanks.
(off to flush large pile of.. not data)

su - mikeg -c 'google-chrome-stable https://www.youtube.com/watch?v=aqz-KE-bpKQ > /dev/null 2>&1'&
massive_intr 8 9999&
# allow time to select display size/res
sleep 10
perf sched record -a -- cyclictest -Smqi 1234 -d 0&
sleep 300
killall cyclictest
sleep 1
killall massive_intr chrome

cyclictest output
T: 0 (10255) P: 0 I:1234 C: 194367 Min:      4 Act:   55 Avg:  484 Max:   22776
T: 1 (10256) P: 0 I:1234 C: 194346 Min:      6 Act:   57 Avg:  480 Max:   18589
T: 2 (10257) P: 0 I:1234 C: 194034 Min:      5 Act: 5214 Avg:  482 Max:   18530
T: 3 (10258) P: 0 I:1234 C: 192263 Min:      5 Act:   54 Avg:  513 Max:   19385
T: 4 (10259) P: 0 I:1234 C: 193350 Min:      6 Act:  728 Avg:  492 Max:   15951
T: 5 (10260) P: 0 I:1234 C: 196233 Min:      6 Act:   56 Avg:  469 Max:   17905
T: 6 (10261) P: 0 I:1234 C: 194181 Min:      5 Act:   67 Avg:  482 Max:   17997
T: 7 (10262) P: 0 I:1234 C: 196756 Min:      7 Act:  284 Avg:  468 Max:   16287

cyclictest bits of perf sched lat -p output, sorted ala cyclictest (local hackletlet sums delay)
 -----------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
 -----------------------------------------------------------------------------------------------------------
  cyclictest:10255      |   1280.359 ms |      294 | avg:   0.238 ms | max:   4.929 ms | sum:   69.921 ms |
  cyclictest:10256      |   1293.976 ms |      163 | avg:   0.463 ms | max:   6.619 ms | sum:   75.497 ms |
  cyclictest:10257      |   1259.701 ms |      206 | avg:   0.371 ms | max:   5.659 ms | sum:   76.453 ms |
  cyclictest:10258      |   1299.337 ms |      171 | avg:   0.440 ms | max:   4.210 ms | sum:   75.267 ms |
  cyclictest:10259      |   1436.543 ms |      181 | avg:   0.472 ms | max:   8.466 ms | sum:   85.473 ms |
  cyclictest:10260      |   1401.824 ms |      210 | avg:   0.386 ms | max:   7.007 ms | sum:   81.036 ms |
  cyclictest:10261      |   1336.621 ms |      190 | avg:   0.387 ms | max:   4.244 ms | sum:   73.593 ms |
  cyclictest:10262      |   1393.249 ms |      283 | avg:   0.268 ms | max:   4.381 ms | sum:   75.778 ms |
+patchlet
  cyclictest:10252      |    218.546 ms |    28657 | avg:   0.437 ms | max:  12.616 ms | sum:12514.777 ms | <== *poof*
  cyclictest:10255      |   1280.359 ms |   194661 | avg:   0.431 ms | max:  22.722 ms | sum:83954.384 ms |
  cyclictest:10256      |   1293.976 ms |   194509 | avg:   0.427 ms | max:  18.537 ms | sum:83115.895 ms |
  cyclictest:10257      |   1259.701 ms |   194240 | avg:   0.429 ms | max:  18.509 ms | sum:83408.424 ms |
  cyclictest:10258      |   1299.337 ms |   192434 | avg:   0.460 ms | max:  19.331 ms | sum:88555.775 ms |
  cyclictest:10259      |   1436.543 ms |   193531 | avg:   0.439 ms | max:  15.897 ms | sum:84989.121 ms |
  cyclictest:10260      |   1401.824 ms |   196443 | avg:   0.417 ms | max:  17.852 ms | sum:81818.898 ms |
  cyclictest:10261      |   1336.621 ms |   194371 | avg:   0.429 ms | max:  17.944 ms | sum:83443.732 ms |
  cyclictest:10262      |   1393.249 ms |   197039 | avg:   0.415 ms | max:  16.232 ms | sum:81752.473 ms |


	-Mike

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] perf sched: Fix sched latency analysis incorrect
  2023-03-21  7:32 ` Mike Galbraith
@ 2023-03-21 11:05   ` Chunxin Zang
  0 siblings, 0 replies; 5+ messages in thread
From: Chunxin Zang @ 2023-03-21 11:05 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Namhyung Kim, peterz, mingo, acme, mark.rutland,
	alexander.shishkin, jolsa, irogers, linux-perf-users,
	linux-kernel, Jerry Zhou

[-- Attachment #1: Type: text/plain, Size: 22658 bytes --]

> On Mar 21, 2023, at 15:32, Mike Galbraith <efault@gmx.de> wrote:
> 
> On Fri, 2023-03-17 at 14:32 +0800, Chunxin Zang wrote:
>> 'perf sched latency' reports incorrect process scheduling latencies when it
>> uses 'sched:sched_wakeup' events to analyze perf.data.
>>
>> Since commit d566a9c2d482 ("perf sched: Prefer sched_waking event when it
>> exists"), 'perf sched record' prefers 'sched:sched_waking' over
>> 'sched:sched_wakeup', because the waking time is the more reasonable point
>> from which to evaluate scheduling latency.
> 
> Yeah, that makes a.. not so tiny difference.  Thanks.
> (off to flush large pile of.. not data)
> 
> su - mikeg -c 'google-chrome-stable https://www.youtube.com/watch?v=aqz-KE-bpKQ > /dev/null 2>&1'&
> massive_intr 8 9999&
> # allow time to select display size/res
> sleep 10
> perf sched record -a -- cyclictest -Smqi 1234 -d 0&
> sleep 300
> killall cyclictest
> sleep 1
> killall massive_intr chrome
> 
> cyclictest output
> T: 0 (10255) P: 0 I:1234 C: 194367 Min:      4 Act:   55 Avg:  484 Max:   22776
> T: 1 (10256) P: 0 I:1234 C: 194346 Min:      6 Act:   57 Avg:  480 Max:   18589
> T: 2 (10257) P: 0 I:1234 C: 194034 Min:      5 Act: 5214 Avg:  482 Max:   18530
> T: 3 (10258) P: 0 I:1234 C: 192263 Min:      5 Act:   54 Avg:  513 Max:   19385
> T: 4 (10259) P: 0 I:1234 C: 193350 Min:      6 Act:  728 Avg:  492 Max:   15951
> T: 5 (10260) P: 0 I:1234 C: 196233 Min:      6 Act:   56 Avg:  469 Max:   17905
> T: 6 (10261) P: 0 I:1234 C: 194181 Min:      5 Act:   67 Avg:  482 Max:   17997
> T: 7 (10262) P: 0 I:1234 C: 196756 Min:      7 Act:  284 Avg:  468 Max:   16287
> 
> cyclictest bits of perf sched lat -p output, sorted ala cyclictest (local hackletlet sums delay)
> -----------------------------------------------------------------------------------------------------------
>  Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Sum delay ms     |
> -----------------------------------------------------------------------------------------------------------
>  cyclictest:10255      |   1280.359 ms |      294 | avg:   0.238 ms | max:   4.929 ms | sum:   69.921 ms |
>  cyclictest:10256      |   1293.976 ms |      163 | avg:   0.463 ms | max:   6.619 ms | sum:   75.497 ms |
>  cyclictest:10257      |   1259.701 ms |      206 | avg:   0.371 ms | max:   5.659 ms | sum:   76.453 ms |
>  cyclictest:10258      |   1299.337 ms |      171 | avg:   0.440 ms | max:   4.210 ms | sum:   75.267 ms |
>  cyclictest:10259      |   1436.543 ms |      181 | avg:   0.472 ms | max:   8.466 ms | sum:   85.473 ms |
>  cyclictest:10260      |   1401.824 ms |      210 | avg:   0.386 ms | max:   7.007 ms | sum:   81.036 ms |
>  cyclictest:10261      |   1336.621 ms |      190 | avg:   0.387 ms | max:   4.244 ms | sum:   73.593 ms |
>  cyclictest:10262      |   1393.249 ms |      283 | avg:   0.268 ms | max:   4.381 ms | sum:   75.778 ms |
> +patchlet
>  cyclictest:10252      |    218.546 ms |    28657 | avg:   0.437 ms | max:  12.616 ms | sum:12514.777 ms | <== *poof*

The difference I noticed is that the 'cyclictest:10252' data only exists in the patched output.
So I ran 'perf sched record -a -- cyclictest -Smqi 1234 -d 0' to find the reason,
and got the following data.

$: perf sched record -a -- ./cyclictest -Smqi 1234 -d 0

# /dev/cpu_dma_latency set to 0us
^C[ perf record: Woken up 41 times to write data ]
T: 0 (1188076) P: 0 I:1234 C:   5655 Min:      7 Act:   57 Avg:   57 Max:    1394
T: 1 (1188077) P: 0 I:1234 C:   5656 Min:      7 Act:   57 Avg:   56 Max:      86
T: 2 (1188078) P: 0 I:1234 C:   5654 Min:      6 Act:   57 Avg:   57 Max:    1133
T: 3 (1188079) P: 0 I:1234 C:   5653 Min:      7 Act:   58 Avg:   57 Max:     575
T: 4 (1188080) P: 0 I:1234 C:   5649 Min:      8 Act:   57 Avg:   57 Max:    2878
T: 5 (1188081) P: 0 I:1234 C:   5650 Min:      9 Act:   57 Avg:   56 Max:     420
T: 6 (1188082) P: 0 I:1234 C:   5642 Min:      8 Act:   57 Avg:   58 Max:    7944
T: 7 (1188083) P: 0 I:1234 C:   5647 Min:      7 Act:   57 Avg:   56 Max:      64
T: 8 (1188084) P: 0 I:1234 C:   5612 Min:      8 Act:   57 Avg:   66 Max:   10684
T: 9 (1188085) P: 0 I:1234 C:   5644 Min:      8 Act:   57 Avg:   56 Max:     127
T:10 (1188086) P: 0 I:1234 C:   5643 Min:      8 Act:   57 Avg:   56 Max:     118
T:11 (1188087) P: 0 I:1234 C:   5642 Min:      7 Act:   57 Avg:   56 Max:      64
T:12 (1188088) P: 0 I:1234 C:   5616 Min:      7 Act:   58 Avg:   65 Max:    9961
T:13 (1188089) P: 0 I:1234 C:   5638 Min:      7 Act:   57 Avg:   56 Max:    1545
T:14 (1188090) P: 0 I:1234 C:   5637 Min:      7 Act:   57 Avg:   57 Max:    1394
T:15 (1188091) P: 0 I:1234 C:   5636 Min:      7 Act:   57 Avg:   56 Max:      75
T:16 (1188092) P: 0 I:1234 C:   5634 Min:      5 Act:   55 Avg:   53 Max:    1528
T:17 (1188093) P: 0 I:1234 C:   5633 Min:      4 Act:   54 Avg:   53 Max:     315
T:18 (1188094) P: 0 I:1234 C:   5632 Min:      5 Act:   54 Avg:   53 Max:      85
T:19 (1188095) P: 0 I:1234 C:   5630 Min:      6 Act:  138 Avg:   38 Max:    1356
[ perf record: Captured and wrote 105.442 MB perf.data (923494 samples) ]

The old perf tool's output.
$: perf sched latency -p | grep cyclictest | sort
  cyclictest:1152095    |      3.343 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152097    |     25.832 ms |        4 | avg:   0.003 ms | max:   0.004 ms | max start: 7371180.979819 s | max end: 7371180.979823 s
  cyclictest:1152098    |     25.594 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152099    |     25.558 ms |        2 | avg:   0.003 ms | max:   0.006 ms | max start: 7371180.984052 s | max end: 7371180.984058 s
  cyclictest:1152100    |     15.465 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152101    |     26.697 ms |        2 | avg:   0.003 ms | max:   0.005 ms | max start: 7371180.988301 s | max end: 7371180.988306 s
  cyclictest:1152102    |     26.330 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152103    |     26.380 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152104    |     25.926 ms |        2 | avg:   0.003 ms | max:   0.006 ms | max start: 7371180.990549 s | max end: 7371180.990554 s
  cyclictest:1152105    |     27.589 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152106    |     26.510 ms |        2 | avg:   0.003 ms | max:   0.005 ms | max start: 7371180.995015 s | max end: 7371180.995020 s
  cyclictest:1152107    |     25.325 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152108    |     25.275 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152109    |     15.515 ms |        2 | avg:   0.002 ms | max:   0.004 ms | max start: 7371182.862877 s | max end: 7371182.862881 s
  cyclictest:1152110    |     26.835 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152111    |     25.849 ms |        2 | avg:   0.001 ms | max:   0.002 ms | max start: 7371180.977996 s | max end: 7371180.977998 s
  cyclictest:1152112    |     26.192 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152113    |     18.339 ms |        2 | avg:   0.002 ms | max:   0.005 ms | max start: 7371180.986046 s | max end: 7371180.986051 s
  cyclictest:1152114    |     18.341 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152115    |     17.949 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1152116    |     19.291 ms |        4 | avg:   0.003 ms | max:   0.005 ms | max start: 7371180.363769 s | max end: 7371180.363774 s
  cyclictest:1188074    |     38.122 ms |        1 | avg:   0.000 ms | max:   0.000 ms | max start:     0.000000 s | max end:     0.000000 s
  cyclictest:1188076    |     26.020 ms |        2 | avg:   0.004 ms | max:   0.004 ms | max start: 7371180.984074 s | max end: 7371180.984078 s
  cyclictest:1188077    |     25.949 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.884309 s | max end: 7371175.884313 s
  cyclictest:1188078    |     26.184 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.886010 s | max end: 7371175.886014 s
  cyclictest:1188079    |     28.886 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.887703 s | max end: 7371175.887707 s
  cyclictest:1188080    |     25.228 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.889413 s | max end: 7371175.889417 s
  cyclictest:1188081    |     25.239 ms |        3 | avg:   0.007 ms | max:   0.011 ms | max start: 7371180.965735 s | max end: 7371180.965746 s
  cyclictest:1188082    |     25.371 ms |        2 | avg:   0.005 ms | max:   0.006 ms | max start: 7371180.965845 s | max end: 7371180.965851 s
  cyclictest:1188083    |     26.238 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.894465 s | max end: 7371175.894469 s
  cyclictest:1188084    |     25.658 ms |        3 | avg:   0.005 ms | max:   0.005 ms | max start: 7371180.967776 s | max end: 7371180.967781 s
  cyclictest:1188085    |     26.492 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.897930 s | max end: 7371175.897934 s
  cyclictest:1188086    |     26.397 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.899613 s | max end: 7371175.899617 s
  cyclictest:1188087    |     26.481 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.901338 s | max end: 7371175.901342 s
  cyclictest:1188088    |     29.591 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.903018 s | max end: 7371175.903022 s
  cyclictest:1188089    |     25.619 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.904778 s | max end: 7371175.904782 s
  cyclictest:1188090    |     30.952 ms |        4 | avg:   0.015 ms | max:   0.050 ms | max start: 7371180.975854 s | max end: 7371180.975903 s
  cyclictest:1188091    |     26.220 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.908250 s | max end: 7371175.908254 s
  cyclictest:1188092    |     18.931 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.909970 s | max end: 7371175.909974 s
  cyclictest:1188093    |     19.135 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.911677 s | max end: 7371175.911681 s
  cyclictest:1188094    |     19.145 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 7371175.913420 s | max end: 7371175.913424 s
  cyclictest:1188095    |     19.193 ms |        1 | avg:   0.005 ms | max:   0.005 ms | max start: 7371175.915139 s | max end: 7371175.915144 s

The patched perf's output.
$: /root/perf sched latency -i ./perf.data -p | grep cyclictest | sort
  cyclictest:1152095    |      3.343 ms |      694 | avg:   0.003 ms | max:   0.206 ms | max start: 7371181.289279 s | max end: 7371181.289486 s
  cyclictest:1152097    |     25.832 ms |     5663 | avg:   0.004 ms | max:   0.806 ms | max start: 7371180.245354 s | max end: 7371180.246160 s
  cyclictest:1152098    |     25.594 ms |     5659 | avg:   0.003 ms | max:   0.016 ms | max start: 7371180.466516 s | max end: 7371180.466532 s
  cyclictest:1152099    |     25.558 ms |     5660 | avg:   0.004 ms | max:   0.929 ms | max start: 7371180.962911 s | max end: 7371180.963840 s
  cyclictest:1152100    |     15.465 ms |     5659 | avg:   0.006 ms | max:   0.525 ms | max start: 7371180.979260 s | max end: 7371180.979785 s
  cyclictest:1152101    |     26.697 ms |     5659 | avg:   0.004 ms | max:   2.952 ms | max start: 7371180.250317 s | max end: 7371180.253269 s
  cyclictest:1152102    |     26.330 ms |     5659 | avg:   0.003 ms | max:   0.359 ms | max start: 7371180.977488 s | max end: 7371180.977847 s
  cyclictest:1152103    |     26.380 ms |     5652 | avg:   0.005 ms | max:   8.230 ms | max start: 7371180.227552 s | max end: 7371180.235782 s
  cyclictest:1152104    |     25.926 ms |     5660 | avg:   0.003 ms | max:   0.060 ms | max start: 7371180.990483 s | max end: 7371180.990543 s
  cyclictest:1152105    |     27.589 ms |     5620 | avg:   0.015 ms | max:  10.972 ms | max start: 7371180.804812 s | max end: 7371180.815785 s
  cyclictest:1152106    |     26.510 ms |     5660 | avg:   0.003 ms | max:   0.158 ms | max start: 7371180.994849 s | max end: 7371180.995007 s
  cyclictest:1152107    |     25.325 ms |     5659 | avg:   0.003 ms | max:   0.947 ms | max start: 7371180.986531 s | max end: 7371180.987478 s
  cyclictest:1152108    |     25.275 ms |     5659 | avg:   0.003 ms | max:   0.129 ms | max start: 7371176.369231 s | max end: 7371176.369360 s
  cyclictest:1152109    |     15.515 ms |     5627 | avg:   0.017 ms | max:   9.912 ms | max start: 7371180.228257 s | max end: 7371180.238169 s
  cyclictest:1152110    |     26.835 ms |     5658 | avg:   0.003 ms | max:   1.647 ms | max start: 7371180.258250 s | max end: 7371180.259897 s
  cyclictest:1152111    |     25.849 ms |     5659 | avg:   0.004 ms | max:   1.679 ms | max start: 7371180.242512 s | max end: 7371180.244191 s
  cyclictest:1152112    |     26.192 ms |     5659 | avg:   0.003 ms | max:   0.009 ms | max start: 7371177.911768 s | max end: 7371177.911778 s
  cyclictest:1152113    |     18.339 ms |     5660 | avg:   0.002 ms | max:   0.944 ms | max start: 7371180.246854 s | max end: 7371180.247797 s
  cyclictest:1152114    |     18.341 ms |     5659 | avg:   0.002 ms | max:   0.423 ms | max start: 7371180.248383 s | max end: 7371180.248806 s
  cyclictest:1152115    |     17.949 ms |     5659 | avg:   0.002 ms | max:   0.036 ms | max start: 7371175.960511 s | max end: 7371175.960548 s
  cyclictest:1152116    |     19.291 ms |     5661 | avg:   0.005 ms | max:   1.324 ms | max start: 7371180.257638 s | max end: 7371180.258961 s
  cyclictest:1188074    |     38.122 ms |      692 | avg:   0.003 ms | max:   0.010 ms | max start: 7371175.995600 s | max end: 7371175.995610 s
  cyclictest:1188076    |     26.020 ms |     5656 | avg:   0.004 ms | max:   1.341 ms | max start: 7371180.244813 s | max end: 7371180.246154 s
  cyclictest:1188077    |     25.949 ms |     5656 | avg:   0.003 ms | max:   0.033 ms | max start: 7371180.268781 s | max end: 7371180.268814 s
  cyclictest:1188078    |     26.184 ms |     5654 | avg:   0.003 ms | max:   1.079 ms | max start: 7371180.962756 s | max end: 7371180.963835 s
  cyclictest:1188079    |     28.886 ms |     5653 | avg:   0.004 ms | max:   0.521 ms | max start: 7371180.979258 s | max end: 7371180.979779 s
  cyclictest:1188080    |     25.228 ms |     5650 | avg:   0.003 ms | max:   2.824 ms | max start: 7371180.250439 s | max end: 7371180.253263 s
  cyclictest:1188081    |     25.239 ms |     5652 | avg:   0.003 ms | max:   0.349 ms | max start: 7371180.965380 s | max end: 7371180.965729 s
  cyclictest:1188082    |     25.371 ms |     5643 | avg:   0.005 ms | max:   7.891 ms | max start: 7371180.227887 s | max end: 7371180.235778 s
  cyclictest:1188083    |     26.238 ms |     5647 | avg:   0.003 ms | max:   0.010 ms | max start: 7371181.073635 s | max end: 7371181.073645 s
  cyclictest:1188084    |     25.658 ms |     5614 | avg:   0.013 ms | max:  10.631 ms | max start: 7371180.805141 s | max end: 7371180.815771 s
  cyclictest:1188085    |     26.492 ms |     5645 | avg:   0.003 ms | max:   0.073 ms | max start: 7371180.988249 s | max end: 7371180.988322 s
  cyclictest:1188086    |     26.397 ms |     5643 | avg:   0.003 ms | max:   0.064 ms | max start: 7371177.544604 s | max end: 7371177.544667 s
  cyclictest:1188087    |     26.481 ms |     5642 | avg:   0.003 ms | max:   0.011 ms | max start: 7371180.211768 s | max end: 7371180.211778 s
  cyclictest:1188088    |     29.591 ms |     5617 | avg:   0.012 ms | max:   9.908 ms | max start: 7371180.228256 s | max end: 7371180.238165 s
  cyclictest:1188089    |     25.619 ms |     5638 | avg:   0.003 ms | max:   1.490 ms | max start: 7371180.258401 s | max end: 7371180.259890 s
  cyclictest:1188090    |     30.952 ms |     5640 | avg:   0.004 ms | max:   1.339 ms | max start: 7371180.242847 s | max end: 7371180.244186 s
  cyclictest:1188091    |     26.220 ms |     5636 | avg:   0.003 ms | max:   0.022 ms | max start: 7371180.712279 s | max end: 7371180.712301 s
  cyclictest:1188092    |     18.931 ms |     5634 | avg:   0.002 ms | max:   1.473 ms | max start: 7371180.246318 s | max end: 7371180.247791 s
  cyclictest:1188093    |     19.135 ms |     5634 | avg:   0.002 ms | max:   0.259 ms | max start: 7371180.243086 s | max end: 7371180.243345 s
  cyclictest:1188094    |     19.145 ms |     5632 | avg:   0.002 ms | max:   0.031 ms | max start: 7371180.244828 s | max end: 7371180.244859 s
  cyclictest:1188095    |     19.193 ms |     5630 | avg:   0.003 ms | max:   1.314 ms | max start: 7371180.257641 s | max end: 7371180.258955 s

Obviously, some tasks end up counted with zero delay when the old perf tool analyzes
the same perf.data, such as 'cyclictest:1152098'.
I used 'perf script' and got the following info.

         swapper     0 [001] 7371175.879738:       sched:sched_waking: comm=cyclictest pid=1152098 prio=125 target_cpu=001
         swapper     0 [001] 7371175.879741:       sched:sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=1152098 next_prio=125
      cyclictest 1152098 [001] 7371175.879744: sched:sched_stat_runtime: comm=cyclictest pid=1152098 runtime=5279 [ns] vruntime=100770994135 [ns]
            perf 1187996 [015] 7371175.879744: sched:sched_stat_runtime: comm=perf pid=1187996 runtime=24807 [ns] vruntime=28338406886 [ns]
      cyclictest 1152098 [001] 7371175.879745:       sched:sched_switch: prev_comm=cyclictest prev_pid=1152098 prev_prio=125 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
            perf 1187996 [015] 7371175.879746:       sched:sched_waking: comm=migration/15 pid=103 prio=0 target_cpu=015

	... ...

         swapper     0 [001] 7371175.880972:       sched:sched_waking: comm=cyclictest pid=1152098 prio=125 target_cpu=001
         swapper     0 [001] 7371175.880975:       sched:sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=1152098 next_prio=125
      cyclictest 1152098 [001] 7371175.880978: sched:sched_stat_runtime: comm=cyclictest pid=1152098 runtime=5481 [ns] vruntime=100771010888 [ns]
      cyclictest 1152098 [001] 7371175.880979:       sched:sched_switch: prev_comm=cyclictest prev_pid=1152098 prev_prio=125 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120

In fact, 'cyclictest:1152098' got a sched_waking at 7371175.880972 and a sched_switch
at 7371175.880975 in the second excerpt, so its delay cannot be zero.
The reason is that the old perf tool does not use the 'sched_waking' events when analyzing
the data: since the state of 'cyclictest:1152098' was 'S' at the previous sched_switch at
7371175.879745, the old tool never finds the moment when 'cyclictest:1152098' was put back
on the runqueue.

      cyclictest 1152097 [000] 7371180.979817:       sched:sched_waking: comm=kworker/0:2 pid=1074142 prio=120 target_cpu=000
             cut 1188136 [012] 7371180.979818: sched:sched_stat_runtime: comm=cut pid=1188136 runtime=2024 [ns] vruntime=232844433506260 [ns]
      cyclictest 1152097 [000] 7371180.979818: sched:sched_stat_runtime: comm=cyclictest pid=1152097 runtime=4303 [ns] vruntime=505223436640 [ns]
      cyclictest 1152097 [000] 7371180.979819:       sched:sched_switch: prev_comm=cyclictest prev_pid=1152097 prev_prio=125 prev_state=R ==> next_comm=kworker/0:2 next_pid=1074142 next_prio=120
             :-1    -1 [012] 7371180.979822:       sched:sched_waking: comm=migration/2 pid=25 prio=0 target_cpu=002
 kworker/0:2-eve 1074142 [000] 7371180.979822: sched:sched_stat_runtime: comm=kworker/0:2 pid=1074142 runtime=5032 [ns] vruntime=113645059997114 [ns]
 kworker/0:2-eve 1074142 [000] 7371180.979823:       sched:sched_switch: prev_comm=kworker/0:2 prev_pid=1074142 prev_prio=120 prev_state=I ==> next_comm=cyclictest next_pid=1152097 next_prio=125
             :-1    -1 [012] 7371180.979823:       sched:sched_switch: prev_comm=cut prev_pid=1188136 prev_prio=120 prev_state=X ==> next_comm=swapper/12 next_pid=0 next_prio=120

The 'cyclictest:1152097' case is not counted as zero because its state was 'R' at the
previous sched_switch at 7371180.979819, so even the old perf tool finds the moment
when it was put back on the runqueue.

That's the issue I want to fix, and it is probably also why you see the difference in
your results.
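
For reference, a condensed sketch of the sched-out side of that state machine
(stand-in types; not the literal code from tools/perf/builtin-sched.c):

#include <stdint.h>

enum thread_state { THREAD_SLEEPING = 0, THREAD_WAIT_CPU, THREAD_SCHED_IN };

struct work_atom {
	enum thread_state state;
	uint64_t sched_out_time;
	uint64_t wake_up_time;
};

/* prev_state is the single-letter state from sched:sched_switch ('R', 'S', 'D', ...). */
static void sched_out_sketch(struct work_atom *atom, char prev_state, uint64_t ts)
{
	atom->sched_out_time = ts;

	if (prev_state == 'R') {
		/*
		 * Preempted while runnable: the runqueue wait continues from
		 * now, so the delay can be measured even without any
		 * waking/wakeup event.  This is the 'cyclictest:1152097' case.
		 */
		atom->state = THREAD_WAIT_CPU;
		atom->wake_up_time = ts;
	} else {
		/*
		 * Went to sleep: the wait only starts at the next waking or
		 * wakeup event.  If the tool ignores sched_waking and no
		 * sched_wakeup was recorded, the delay stays zero.  This is
		 * the 'cyclictest:1152098' case.
		 */
		atom->state = THREAD_SLEEPING;
	}
}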

Best wishes
-Chunxin

>  cyclictest:10255      |   1280.359 ms |   194661 | avg:   0.431 ms | max:  22.722 ms | sum:83954.384 ms |
>  cyclictest:10256      |   1293.976 ms |   194509 | avg:   0.427 ms | max:  18.537 ms | sum:83115.895 ms |
>  cyclictest:10257      |   1259.701 ms |   194240 | avg:   0.429 ms | max:  18.509 ms | sum:83408.424 ms |
>  cyclictest:10258      |   1299.337 ms |   192434 | avg:   0.460 ms | max:  19.331 ms | sum:88555.775 ms |
>  cyclictest:10259      |   1436.543 ms |   193531 | avg:   0.439 ms | max:  15.897 ms | sum:84989.121 ms |
>  cyclictest:10260      |   1401.824 ms |   196443 | avg:   0.417 ms | max:  17.852 ms | sum:81818.898 ms |
>  cyclictest:10261      |   1336.621 ms |   194371 | avg:   0.429 ms | max:  17.944 ms | sum:83443.732 ms |
>  cyclictest:10262      |   1393.249 ms |   197039 | avg:   0.415 ms | max:  16.232 ms | sum:81752.473 ms |
> 
> 
> -Mike

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] perf sched: Fix sched latency analysis incorrect
  2023-03-17  6:32 [PATCH v3] perf sched: Fix sched latency analysis incorrect Chunxin Zang
  2023-03-21  7:32 ` Mike Galbraith
@ 2023-03-27 19:00 ` Namhyung Kim
  2023-03-28  3:24   ` Chunxin Zang
  1 sibling, 1 reply; 5+ messages in thread
From: Namhyung Kim @ 2023-03-27 19:00 UTC (permalink / raw)
  To: Chunxin Zang
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	irogers, linux-perf-users, linux-kernel, Jerry Zhou

On Thu, Mar 16, 2023 at 11:33 PM Chunxin Zang <zangchunxin@lixiang.com> wrote:
>
> 'perf sched latency' reports incorrect process scheduling latencies when it
> uses 'sched:sched_wakeup' events to analyze perf.data.
>
> Since commit d566a9c2d482 ("perf sched: Prefer sched_waking event when it
> exists"), 'perf sched record' prefers 'sched:sched_waking' over
> 'sched:sched_wakeup', because the waking time is the more reasonable point
> from which to evaluate scheduling latency.
>
> Similarly, update sched latency/map/replay to use sched_waking events.

Have you checked perf sched replay (add_sched_event_wakeup)?  I think
it'd make unnecessary sched atoms for sched_wakeup.

Can we check the wakeup and waking events and disable the wakeup
if the waking is found like in perf_sched__timehist?

Thanks,
Namhyung


>
> I used 'perf record -e "sched:*"' to record both waking and wakeup events
> and analyzed them with the fixed perf; the result is correct. With this
> change, latency_wakeup_event() sets atom->state to THREAD_WAIT_CPU on the
> waking event, so the later wakeup event hits the 'atom->state !=
> THREAD_SLEEPING' check and returns early instead of being counted again.
>
> Signed-off-by: Chunxin Zang <zangchunxin@lixiang.com>
> Signed-off-by: Jerry Zhou <zhouchunhua@lixiang.com>
> ---
>
>         changelogs in v3:
>         1) fix non-ASCII characters in commit log.
>
>         changelogs in v2:
>         1) fix email address disappearing in 'signed off by'
>
>  tools/perf/builtin-sched.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index 86e18575c9be..1af4ec1ac824 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -1819,6 +1819,7 @@ static int perf_sched__read_events(struct perf_sched *sched)
>         const struct evsel_str_handler handlers[] = {
>                 { "sched:sched_switch",       process_sched_switch_event, },
>                 { "sched:sched_stat_runtime", process_sched_runtime_event, },
> +               { "sched:sched_waking",       process_sched_wakeup_event, },
>                 { "sched:sched_wakeup",       process_sched_wakeup_event, },
>                 { "sched:sched_wakeup_new",   process_sched_wakeup_event, },
>                 { "sched:sched_migrate_task", process_sched_migrate_task_event, },
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] perf sched: Fix sched latency analysis incorrect
  2023-03-27 19:00 ` Namhyung Kim
@ 2023-03-28  3:24   ` Chunxin Zang
  0 siblings, 0 replies; 5+ messages in thread
From: Chunxin Zang @ 2023-03-28  3:24 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	irogers, linux-perf-users, linux-kernel, Jerry Zhou

[-- Attachment #1: Type: text/plain, Size: 3410 bytes --]

> On Mar 28, 2023, at 03:00, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> On Thu, Mar 16, 2023 at 11:33 PM Chunxin Zang <zangchunxin@lixiang.com> wrote:
>> 
>> 'perf sched latency' reports incorrect process scheduling latencies when it
>> uses 'sched:sched_wakeup' events to analyze perf.data.
>>
>> Since commit d566a9c2d482 ("perf sched: Prefer sched_waking event when it
>> exists"), 'perf sched record' prefers 'sched:sched_waking' over
>> 'sched:sched_wakeup', because the waking time is the more reasonable point
>> from which to evaluate scheduling latency.
>> 
>> Similarly, update sched latency/map/replay to use sched_waking events.
> 
> Have you checked perf sched replay (add_sched_event_wakeup)?  I think
> it'd make unnecessary sched atoms for sched_wakeup.
> 
> Can we check the wakeup and waking events and disable the wakeup
> if the waking is found like in perf_sched__timehist?

That's a good idea, I will do that in v4.
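
As a rough, self-contained sketch of the idea (toy names and types only; the real
change would swap the handler in the table that perf_sched__read_events() passes to
perf_session__set_tracepoints_handlers(), the same way perf_sched__timehist() already
prefers sched_waking, and the exact shape may differ in v4):

#include <stdio.h>
#include <string.h>

/* Toy stand-ins for struct evsel_str_handler and its handler type. */
typedef int (*tp_handler)(const char *event);

struct str_handler {
	const char *name;
	tp_handler handler;
};

static int handle_wakeup(const char *ev)        { printf("account %s\n", ev); return 0; }
static int handle_wakeup_ignore(const char *ev) { (void)ev; return 0; }

/* If sched_waking was recorded, route sched_wakeup to a no-op handler so
 * each wakeup is only accounted once. */
static void prefer_waking(struct str_handler *handlers, int n, int have_waking)
{
	int i;

	if (!have_waking)
		return;
	for (i = 0; i < n; i++)
		if (!strcmp(handlers[i].name, "sched:sched_wakeup"))
			handlers[i].handler = handle_wakeup_ignore;
}

int main(void)
{
	struct str_handler handlers[] = {
		{ "sched:sched_waking", handle_wakeup },
		{ "sched:sched_wakeup", handle_wakeup },
	};

	prefer_waking(handlers, 2, /* have_waking = */ 1);

	handlers[0].handler(handlers[0].name);	/* accounted */
	handlers[1].handler(handlers[1].name);	/* ignored   */
	return 0;
}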

Best regards
Chunxin

> 
> Thanks,
> Namhyung
> 
> 
>> 
>> I used 'perf record -e "sched:*"' to record both waking and wakeup events
>> and analyzed them with the fixed perf; the result is correct. With this
>> change, latency_wakeup_event() sets atom->state to THREAD_WAIT_CPU on the
>> waking event, so the later wakeup event hits the 'atom->state !=
>> THREAD_SLEEPING' check and returns early instead of being counted again.
>> 
>> Signed-off-by: Chunxin Zang <zangchunxin@lixiang.com>
>> Signed-off-by: Jerry Zhou <zhouchunhua@lixiang.com>
>> ---
>> 
>>        changelogs in v3:
>>        1) fix non-ASCII characters in commit log.
>> 
>>        changelogs in v2:
>>        1) fix email address disappearing in 'signed off by'
>> 
>> tools/perf/builtin-sched.c | 1 +
>> 1 file changed, 1 insertion(+)
>> 
>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>> index 86e18575c9be..1af4ec1ac824 100644
>> --- a/tools/perf/builtin-sched.c
>> +++ b/tools/perf/builtin-sched.c
>> @@ -1819,6 +1819,7 @@ static int perf_sched__read_events(struct perf_sched *sched)
>>        const struct evsel_str_handler handlers[] = {
>>                { "sched:sched_switch",       process_sched_switch_event, },
>>                { "sched:sched_stat_runtime", process_sched_runtime_event, },
>> +               { "sched:sched_waking",       process_sched_wakeup_event, },
>>                { "sched:sched_wakeup",       process_sched_wakeup_event, },
>>                { "sched:sched_wakeup_new",   process_sched_wakeup_event, },
>>                { "sched:sched_migrate_task", process_sched_migrate_task_event, },
>> --
>> 2.25.1
>> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-03-28  3:25 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-17  6:32 [PATCH v3] perf sched: Fix sched latency analysis incorrect Chunxin Zang
2023-03-21  7:32 ` Mike Galbraith
2023-03-21 11:05   ` Chunxin Zang
2023-03-27 19:00 ` Namhyung Kim
2023-03-28  3:24   ` Chunxin Zang
