All of lore.kernel.org
 help / color / mirror / Atom feed
* perf tools: software regression in d0a0a511493d269514fcbd852481cdca32c95350
@ 2022-07-13  7:30 Adrián Herrera Arcila
  2022-07-14 12:46 ` Leo Yan
  0 siblings, 1 reply; 6+ messages in thread
From: Adrián Herrera Arcila @ 2022-07-13  7:30 UTC (permalink / raw)
  To: tmricht, acme, linux-perf-users; +Cc: James.Clark, nd

Hello all,

The regression affects the perf command "stat" used with the option "-D 
msecs, --delay msecs". The change was introduced in v5.18 and was 
backported up to v5.15.33 of 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git. In 
v5.18, the behaviour of the affected option is described in 
tools/perf/Documentation/perf-stat.txt as follows:

"After starting the program, wait msecs before measuring (-1: start with 
events disabled). This is useful to filter out the startup phase of the 
program, which is often very different."

The program is expected to start without delay. The observed behaviour 
in v5.18 does not correspond, as demonstrated by the following experiments:

time perf stat --delay 2000 --quiet sleep 2
v5.18 output:
Events disabled
Events enabled

real    0m4.065s
user    0m0.019s
sys    0m0.028s
v5.17 output:
Events disabled
Events enabled

real    0m2.028s
user    0m0.001s
sys    0m0.008s

perf stat --delay 2000 --quiet echo "marking"
v5.18 output:
Events disabled
Events enabled
marking
v5.17 output:
Events disabled
marking
Events enabled

I think there are two behaviours that are valuable to maintain to enable 
two uses: (1) if no delay is specified, the program should start after 
counters are enabled, to prevent the unexpected behaviour that Thomas 
fixed with this change; (2) if delay is specified, the program should 
start before the delay is engaged.

Two possible solutions are: (1) though conditionals; if delay > 0, start 
the workload, then call enable_counters, which will usleep for the 
delay; if no delay, then behaviour as it is in v5.18; (2) replace 
blocking usleep with asynchronous callback, if that exists in Linux, 
that enables counters after the delay.

Any suggestions from more experienced Linux developers on which solution 
to pursue will be appreciated.

Kind regards,
Adrián.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf tools: software regression in d0a0a511493d269514fcbd852481cdca32c95350
  2022-07-13  7:30 perf tools: software regression in d0a0a511493d269514fcbd852481cdca32c95350 Adrián Herrera Arcila
@ 2022-07-14 12:46 ` Leo Yan
  2022-07-28 12:44   ` Adrián Herrera Arcila
  0 siblings, 1 reply; 6+ messages in thread
From: Leo Yan @ 2022-07-14 12:46 UTC (permalink / raw)
  To: Adrián Herrera Arcila
  Cc: tmricht, acme, linux-perf-users, James.Clark, nd

Hi Adrián,

On Wed, Jul 13, 2022 at 08:30:37AM +0100, Adrián Herrera Arcila wrote:

[...]

> I think there are two behaviours that are valuable to maintain to enable two
> uses: (1) if no delay is specified, the program should start after counters
> are enabled, to prevent the unexpected behaviour that Thomas fixed with this
> change; (2) if delay is specified, the program should start before the delay
> is engaged.
> 
> Two possible solutions are: (1) though conditionals; if delay > 0, start the
> workload, then call enable_counters, which will usleep for the delay; if no
> delay, then behaviour as it is in v5.18; (2) replace blocking usleep with
> asynchronous callback, if that exists in Linux, that enables counters after
> the delay.

If we look at other perf sub tools (like 'perf record'), they uses
your mentioned solution (1) to handle the delay and enabling counters,
based on the condition checking for 'stat_config.initial_delay', 'perf
record' enables counters at different time point.

Alternatively, we can start workload based on the condition checking
for 'stat_config.initial_delay', so we can get a below patch, this patch
should can fix the issue, but the change is a bit urgly for me, so
welcome a better fixing; if no, I will send a patch in my tomorrow
morning.

Thanks,
Leo

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index d2ecd4d29624..046686a7c8d6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -970,10 +970,15 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	 * Enable counters and exec the command:
 	 */
 	if (forks) {
+		if (stat_config.initial_delay > 0)
+			evlist__start_workload(evsel_list);
+
 		err = enable_counters();
 		if (err)
 			return -1;
-		evlist__start_workload(evsel_list);
+
+		if (stat_config.initial_delay <= 0)
+			evlist__start_workload(evsel_list);
 
 		t0 = rdclock();
 		clock_gettime(CLOCK_MONOTONIC, &ref_time);

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: perf tools: software regression in d0a0a511493d269514fcbd852481cdca32c95350
  2022-07-14 12:46 ` Leo Yan
@ 2022-07-28 12:44   ` Adrián Herrera Arcila
  2022-07-28 14:07     ` Leo Yan
  0 siblings, 1 reply; 6+ messages in thread
From: Adrián Herrera Arcila @ 2022-07-28 12:44 UTC (permalink / raw)
  To: Leo Yan; +Cc: tmricht, acme, linux-perf-users, James.Clark, nd, songliubraving

Thank you Leo.

I think the source code fragment in perf-record is more readable 
(logical) than in perf-stat, so replicating it would be better, also for 
consistency. I have prepared a patch for that which I will send for review.

perf-stat differs from perf-record in the presence of BPF counters; 
these are enabled inside enable_counters immediately, independent of the 
delay; I respected that behaviour, but if it is not a problem to delay 
their enablement as well, the code would be slightly simplified. I have 
added Song Liu (author of BPF counters), in case of any comments on this.

Note Thomas issue arised when using "-a"; without it, events are enabled 
on exec of the workload/command, so the instruction he mentioned would 
be captured. The code structure in perf-record, and in the patch I 
prepared, relies implicitly on enable_on_exec; I tried to clarify it in 
the comments.

Kind regards,
Adrián.

On 14/07/2022 13:46, Leo Yan wrote:
> Hi Adri�n,
>
> On Wed, Jul 13, 2022 at 08:30:37AM +0100, Adri�n Herrera Arcila wrote:
>
> [...]
>
>> I think there are two behaviours that are valuable to maintain to enable two
>> uses: (1) if no delay is specified, the program should start after counters
>> are enabled, to prevent the unexpected behaviour that Thomas fixed with this
>> change; (2) if delay is specified, the program should start before the delay
>> is engaged.
>>
>> Two possible solutions are: (1) though conditionals; if delay > 0, start the
>> workload, then call enable_counters, which will usleep for the delay; if no
>> delay, then behaviour as it is in v5.18; (2) replace blocking usleep with
>> asynchronous callback, if that exists in Linux, that enables counters after
>> the delay.
> If we look at other perf sub tools (like 'perf record'), they uses
> your mentioned solution (1) to handle the delay and enabling counters,
> based on the condition checking for 'stat_config.initial_delay', 'perf
> record' enables counters at different time point.
>
> Alternatively, we can start workload based on the condition checking
> for 'stat_config.initial_delay', so we can get a below patch, this patch
> should can fix the issue, but the change is a bit urgly for me, so
> welcome a better fixing; if no, I will send a patch in my tomorrow
> morning.
>
> Thanks,
> Leo
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index d2ecd4d29624..046686a7c8d6 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -970,10 +970,15 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>   	 * Enable counters and exec the command:
>   	 */
>   	if (forks) {
> +		if (stat_config.initial_delay > 0)
> +			evlist__start_workload(evsel_list);
> +
>   		err = enable_counters();
>   		if (err)
>   			return -1;
> -		evlist__start_workload(evsel_list);
> +
> +		if (stat_config.initial_delay <= 0)
> +			evlist__start_workload(evsel_list);
>   
>   		t0 = rdclock();
>   		clock_gettime(CLOCK_MONOTONIC, &ref_time);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf tools: software regression in d0a0a511493d269514fcbd852481cdca32c95350
  2022-07-28 12:44   ` Adrián Herrera Arcila
@ 2022-07-28 14:07     ` Leo Yan
  2022-07-28 22:38       ` Song Liu
  0 siblings, 1 reply; 6+ messages in thread
From: Leo Yan @ 2022-07-28 14:07 UTC (permalink / raw)
  To: Adrián Herrera Arcila
  Cc: tmricht, acme, linux-perf-users, James.Clark, nd, songliubraving

Hi Adrián,

On Thu, Jul 28, 2022 at 01:44:57PM +0100, Adrián Herrera Arcila wrote:
> Thank you Leo.

Welcome, I recognized that I am not the best person to answer
your question :)

> I think the source code fragment in perf-record is more readable (logical)
> than in perf-stat, so replicating it would be better, also for consistency.

Makes sense for me.

> I have prepared a patch for that which I will send for review.

Very good!  I will review and test your patch.

> perf-stat differs from perf-record in the presence of BPF counters; these
> are enabled inside enable_counters immediately, independent of the delay; I
> respected that behaviour, but if it is not a problem to delay their
> enablement as well, the code would be slightly simplified. I have added Song
> Liu (author of BPF counters), in case of any comments on this.

Yeah, I also noticed that BPF counter enabling in perf-stat.

To be honest, I am also wandering why BPF counter cannot be enabled at
the same point with other events.  Would leave this question if Song
has chance to confirm for this.

> Note Thomas issue arised when using "-a"; without it, events are enabled on
> exec of the workload/command, so the instruction he mentioned would be
> captured. The code structure in perf-record, and in the patch I prepared,
> relies implicitly on enable_on_exec; I tried to clarify it in the comments.

Seems to me, using the flag 'enable_on_exec' is right thing to do.
I looked the code in "perf record", it uses a tricky to enable dummy
event when the workload process is forked for system wide tracing,
PMU events are deferred to be enabled for "initial_delay".

Look forward to seeing your fixing!

Thanks,
Leo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: perf tools: software regression in d0a0a511493d269514fcbd852481cdca32c95350
  2022-07-28 14:07     ` Leo Yan
@ 2022-07-28 22:38       ` Song Liu
  0 siblings, 0 replies; 6+ messages in thread
From: Song Liu @ 2022-07-28 22:38 UTC (permalink / raw)
  To: Leo Yan
  Cc: Adrián Herrera Arcila, tmricht, acme, linux-perf-users,
	James.Clark, nd

Hi Adrián and Leo,

Thanks for the CC. 

> On Jul 28, 2022, at 7:07 AM, Leo Yan <leo.yan@linaro.org> wrote:
> 
> Hi Adrián,
> 
> On Thu, Jul 28, 2022 at 01:44:57PM +0100, Adrián Herrera Arcila wrote:
>> Thank you Leo.
> 
> Welcome, I recognized that I am not the best person to answer
> your question :)
> 
>> I think the source code fragment in perf-record is more readable (logical)
>> than in perf-stat, so replicating it would be better, also for consistency.
> 
> Makes sense for me.
> 
>> I have prepared a patch for that which I will send for review.
> 
> Very good!  I will review and test your patch.
> 
>> perf-stat differs from perf-record in the presence of BPF counters; these
>> are enabled inside enable_counters immediately, independent of the delay; I
>> respected that behaviour, but if it is not a problem to delay their
>> enablement as well, the code would be slightly simplified. I have added Song
>> Liu (author of BPF counters), in case of any comments on this.
> 
> Yeah, I also noticed that BPF counter enabling in perf-stat.
> 
> To be honest, I am also wandering why BPF counter cannot be enabled at
> the same point with other events.  Would leave this question if Song
> has chance to confirm for this.

I don't recall very good reason why bpf counters are enabled in 
enable_counters, so I think it is OK to make it aware of initial_delay. 

Thanks,
Song


^ permalink raw reply	[flat|nested] 6+ messages in thread

* perf tools: software regression in d0a0a511493d269514fcbd852481cdca32c95350
@ 2022-07-09 12:57 Adrián Herrera Arcila
  0 siblings, 0 replies; 6+ messages in thread
From: Adrián Herrera Arcila @ 2022-07-09 12:57 UTC (permalink / raw)
  To: tmricht, acme, linux-perf-users; +Cc: James.Clark

Hello all,

The regression affects the perf command "stat" used with the option "-D
msecs, --delay msecs". The change was introduced in v5.18 and was
backported up to v5.15.33 of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git. In
v5.18, the behaviour of the affected option is described in
tools/perf/Documentation/perf-stat.txt as follows:

"After starting the program, wait msecs before measuring (-1: start with
events disabled). This is useful to filter out the startup phase of the
program, which is often very different."

The program is expected to start without delay. The observed behaviour
in v5.18 does not correspond, as demonstrated by the following experiments:

time perf stat --delay 2000 --quiet sleep 2
v5.18 output:
Events disabled
Events enabled

real    0m4.065s
user    0m0.019s
sys    0m0.028s
v5.17 output:
Events disabled
Events enabled

real    0m2.028s
user    0m0.001s
sys    0m0.008s

perf stat --delay 2000 --quiet echo "marking"
v5.18 output:
Events disabled
Events enabled
marking
v5.17 output:
Events disabled
marking
Events enabled

I think there are two behaviours that are valuable to maintain to enable
two uses: (1) if no delay is specified, the program should start after
counters are enabled, to prevent the unexpected behaviour that Thomas
fixed with this change; (2) if delay is specified, the program should
start before the delay is engaged.

Two possible solutions are: (1) though conditionals; if delay > 0, start
the workload, then call enable_counters, which will usleep for the
delay; if no delay, then behaviour as it is in v5.18; (2) replace
blocking usleep with asynchronous callback, if that exists in Linux,
that enables counters after the delay.

Any suggestions from more experienced Linux developers on which solution
to pursue will be appreciated.

Kind regards,
Adrián.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-07-28 22:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-13  7:30 perf tools: software regression in d0a0a511493d269514fcbd852481cdca32c95350 Adrián Herrera Arcila
2022-07-14 12:46 ` Leo Yan
2022-07-28 12:44   ` Adrián Herrera Arcila
2022-07-28 14:07     ` Leo Yan
2022-07-28 22:38       ` Song Liu
  -- strict thread matches above, loose matches on Subject: below --
2022-07-09 12:57 Adrián Herrera Arcila

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.