* [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization
@ 2021-10-11 11:50 Yang Jihong
  2021-10-14  3:02 ` Steven Rostedt
  2021-10-14 14:32 ` Steven Rostedt
  0 siblings, 2 replies; 5+ messages in thread
From: Yang Jihong @ 2021-10-11 11:50 UTC (permalink / raw)
  To: rostedt, mingo, linux-kernel; +Cc: yangjihong1

Commit 85f726a35e504418 switched from memcpy to strncpy when copying comm.
On ARM64 machines, this change causes a performance degradation.

For a task that already exists in savedcmd, there is no need to call
set_cmdline and execute strncpy again. Run set_cmdline only if the task
does not yet exist in savedcmd.

I have written an example (an extreme case) in which trace_sched_switch
is invoked 1000 times, as follows:

  for (int i = 0; i < 1000; i++) {
          trace_sched_switch(true, current, current);
  }

On an ARM64 machine, the data before and after the optimization compare as follows:
+---------------------+------------------------------+------------------------+
|                     | Total number of instructions | Total number of cycles |
+---------------------+------------------------------+------------------------+
| Before optimization |           1107367            |          658491        |
+---------------------+------------------------------+------------------------+
| After optimization  |            869367            |          520171        |
+---------------------+------------------------------+------------------------+
As shown above, there is nearly a 26% performance improvement.
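
The message does not say how the percentage is derived from the table. As
a rough check, the sketch below computes two common readings of the same
counters: how much larger the "before" count is than the "after" count,
and how much of the "before" count was saved. The helper names are
hypothetical and only for illustration. On the cycle counts, the first
reading gives about 26.6% and the second about 21.0%.

```c
#include <stdio.h>

/* Counter values copied from the table above. */
#define INSNS_BEFORE  1107367.0
#define INSNS_AFTER    869367.0
#define CYCLES_BEFORE  658491.0
#define CYCLES_AFTER   520171.0

/* How much larger "before" is than "after", in percent. */
static double gain_percent(double before, double after)
{
	return (before / after - 1.0) * 100.0;
}

/* How much of "before" was saved, in percent. */
static double reduction_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Print both readings for instructions and for cycles. */
static void print_summary(void)
{
	printf("instructions: before is %.1f%% larger, %.1f%% saved\n",
	       gain_percent(INSNS_BEFORE, INSNS_AFTER),
	       reduction_percent(INSNS_BEFORE, INSNS_AFTER));
	printf("cycles:       before is %.1f%% larger, %.1f%% saved\n",
	       gain_percent(CYCLES_BEFORE, CYCLES_AFTER),
	       reduction_percent(CYCLES_BEFORE, CYCLES_AFTER));
}
```

Calling print_summary() from a small main() prints both readings side by
side, which makes it clear which one a quoted percentage refers to.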

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
---
 kernel/trace/trace.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7896d30d90f7..a795610a3b37 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
 		savedcmd->cmdline_idx = idx;
 	}
 
-	savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
-	set_cmdline(idx, tsk->comm);
+	/* save cmdline only when task does not exist in savecmd */
+	if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
+		savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
+		set_cmdline(idx, tsk->comm);
+	}
 
 	arch_spin_unlock(&trace_cmdline_lock);
 
-- 
2.30.GIT



* Re: [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization
  2021-10-11 11:50 [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization Yang Jihong
@ 2021-10-14  3:02 ` Steven Rostedt
  2021-10-14  8:09   ` Yang Jihong
  2021-10-14 14:32 ` Steven Rostedt
  1 sibling, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2021-10-14  3:02 UTC (permalink / raw)
  To: Yang Jihong; +Cc: mingo, linux-kernel

On Mon, 11 Oct 2021 19:50:18 +0800
Yang Jihong <yangjihong1@huawei.com> wrote:

> commit 85f726a35e504418 use strncpy instead of memcpy when copying comm,
> on ARM64 machine, this commit causes performance degradation.
> 
> For the task that already exists in savecmd, it is unnecessary to call
> set_cmdline to execute strncpy once, run set_cmdline only if the task does
> not exist in savecmd.
> 
> I have written an example (which is an extreme case) in which trace sched switch
> is invoked for 1000 times, as shown in the following:
> 
>   for (int i = 0; i < 1000; i++) {
>           trace_sched_switch(true, current, current);
>  }

Well, that's a pretty unrealistic benchmark.

> 
> On ARM64 machine, compare the data before and after the optimization:
> +---------------------+------------------------------+------------------------+
> |                     | Total number of instructions | Total number of cycles |
> +---------------------+------------------------------+------------------------+
> | Before optimization |           1107367            |          658491        |
> +---------------------+------------------------------+------------------------+
> | After optimization  |            869367            |          520171        |
> +---------------------+------------------------------+------------------------+
> As shown above, there is nearly 26% performance

I'd prefer to see a more realistic benchmark.

> 
> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
> ---
>  kernel/trace/trace.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 7896d30d90f7..a795610a3b37 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
>  		savedcmd->cmdline_idx = idx;
>  	}
>  
> -	savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
> -	set_cmdline(idx, tsk->comm);
> +	/* save cmdline only when task does not exist in savecmd */
> +	if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
> +		savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
> +		set_cmdline(idx, tsk->comm);
> +	}

I'm not against adding this. Just for kicks I ran the following before
and after this patch:

  # trace-cmd start -e sched
  # perf stat -r 100 hackbench 50

Before:

 Performance counter stats for '/work/c/hackbench 50' (100 runs):

          6,261.26 msec task-clock                #    6.126 CPUs utilized            ( +-  0.12% )
            93,519      context-switches          #   14.936 K/sec                    ( +-  1.12% )
            13,725      cpu-migrations            #    2.192 K/sec                    ( +-  1.16% )
            47,266      page-faults               #    7.549 K/sec                    ( +-  0.54% )
    22,911,885,026      cycles                    #    3.659 GHz                      ( +-  0.11% )
    15,171,250,777      stalled-cycles-frontend   #   66.22% frontend cycles idle     ( +-  0.13% )
    18,330,841,604      instructions              #    0.80  insn per cycle
                                                  #    0.83  stalled cycles per insn  ( +-  0.11% )
     4,027,904,559      branches                  #  643.306 M/sec                    ( +-  0.11% )
        31,327,782      branch-misses             #    0.78% of all branches          ( +-  0.20% )

           1.02201 +- 0.00158 seconds time elapsed  ( +-  0.15% )
After:

 Performance counter stats for '/work/c/hackbench 50' (100 runs):

          6,216.47 msec task-clock                #    6.124 CPUs utilized            ( +-  0.10% )
            93,311      context-switches          #   15.010 K/sec                    ( +-  0.91% )
            13,719      cpu-migrations            #    2.207 K/sec                    ( +-  1.09% )
            47,085      page-faults               #    7.574 K/sec                    ( +-  0.49% )
    22,746,703,318      cycles                    #    3.659 GHz                      ( +-  0.09% )
    15,012,911,121      stalled-cycles-frontend   #   66.00% frontend cycles idle     ( +-  0.11% )
    18,275,147,949      instructions              #    0.80  insn per cycle
                                                  #    0.82  stalled cycles per insn  ( +-  0.08% )
     4,017,673,788      branches                  #  646.295 M/sec                    ( +-  0.08% )
        31,313,459      branch-misses             #    0.78% of all branches          ( +-  0.17% )

           1.01506 +- 0.00150 seconds time elapsed  ( +-  0.15% )

Really it's all in the noise, so adding this doesn't seem to hurt.

-- Steve



>  
>  	arch_spin_unlock(&trace_cmdline_lock);
>  



* Re: [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization
  2021-10-14  3:02 ` Steven Rostedt
@ 2021-10-14  8:09   ` Yang Jihong
  0 siblings, 0 replies; 5+ messages in thread
From: Yang Jihong @ 2021-10-14  8:09 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, linux-kernel

Hi Steven,

On 2021/10/14 11:02, Steven Rostedt wrote:
> On Mon, 11 Oct 2021 19:50:18 +0800
> Yang Jihong <yangjihong1@huawei.com> wrote:
> 
>> commit 85f726a35e504418 use strncpy instead of memcpy when copying comm,
>> on ARM64 machine, this commit causes performance degradation.
>>
>> For the task that already exists in savecmd, it is unnecessary to call
>> set_cmdline to execute strncpy once, run set_cmdline only if the task does
>> not exist in savecmd.
>>
>> I have written an example (which is an extreme case) in which trace sched switch
>> is invoked for 1000 times, as shown in the following:
>>
>>    for (int i = 0; i < 1000; i++) {
>>            trace_sched_switch(true, current, current);
>>   }
> 
> Well that's a pretty non realistic benchmark.
> 
>>
>> On ARM64 machine, compare the data before and after the optimization:
>> +---------------------+------------------------------+------------------------+
>> |                     | Total number of instructions | Total number of cycles |
>> +---------------------+------------------------------+------------------------+
>> | Before optimization |           1107367            |          658491        |
>> +---------------------+------------------------------+------------------------+
>> | After optimization  |            869367            |          520171        |
>> +---------------------+------------------------------+------------------------+
>> As shown above, there is nearly 26% performance
> 
> I'd prefer to see a more realistic benchmark.
> 
>>
>> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
>> ---
>>   kernel/trace/trace.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index 7896d30d90f7..a795610a3b37 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
>>   		savedcmd->cmdline_idx = idx;
>>   	}
>>   
>> -	savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
>> -	set_cmdline(idx, tsk->comm);
>> +	/* save cmdline only when task does not exist in savecmd */
>> +	if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
>> +		savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
>> +		set_cmdline(idx, tsk->comm);
>> +	}
> 
> I'm not against adding this. Just for kicks I ran the following before
> and after this patch:
> 
>    # trace-cmd start -e sched
>    # perf stat -r 100 hackbench 50
> 
> Before:
> 
>   Performance counter stats for '/work/c/hackbench 50' (100 runs):
> 
>            6,261.26 msec task-clock                #    6.126 CPUs utilized            ( +-  0.12% )
>              93,519      context-switches          #   14.936 K/sec                    ( +-  1.12% )
>              13,725      cpu-migrations            #    2.192 K/sec                    ( +-  1.16% )
>              47,266      page-faults               #    7.549 K/sec                    ( +-  0.54% )
>      22,911,885,026      cycles                    #    3.659 GHz                      ( +-  0.11% )
>      15,171,250,777      stalled-cycles-frontend   #   66.22% frontend cycles idle     ( +-  0.13% )
>      18,330,841,604      instructions              #    0.80  insn per cycle
>                                                    #    0.83  stalled cycles per insn  ( +-  0.11% )
>       4,027,904,559      branches                  #  643.306 M/sec                    ( +-  0.11% )
>          31,327,782      branch-misses             #    0.78% of all branches          ( +-  0.20% )
> 
>             1.02201 +- 0.00158 seconds time elapsed  ( +-  0.15% )
> After:
> 
>   Performance counter stats for '/work/c/hackbench 50' (100 runs):
> 
>            6,216.47 msec task-clock                #    6.124 CPUs utilized            ( +-  0.10% )
>              93,311      context-switches          #   15.010 K/sec                    ( +-  0.91% )
>              13,719      cpu-migrations            #    2.207 K/sec                    ( +-  1.09% )
>              47,085      page-faults               #    7.574 K/sec                    ( +-  0.49% )
>      22,746,703,318      cycles                    #    3.659 GHz                      ( +-  0.09% )
>      15,012,911,121      stalled-cycles-frontend   #   66.00% frontend cycles idle     ( +-  0.11% )
>      18,275,147,949      instructions              #    0.80  insn per cycle
>                                                    #    0.82  stalled cycles per insn  ( +-  0.08% )
>       4,017,673,788      branches                  #  646.295 M/sec                    ( +-  0.08% )
>          31,313,459      branch-misses             #    0.78% of all branches          ( +-  0.17% )
> 
>             1.01506 +- 0.00150 seconds time elapsed  ( +-  0.15% )
> 
> Really it's all in the noise, so adding this doesn't seem to hurt.
> 
Thanks very much for the benchmark data. :)
Indeed, the effect of this modification is not noticeable in scenarios
where tasks are repeatedly created. It shows up only when scheduling
repeatedly switches among a limited number of tasks.
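
To illustrate the difference between the two scenarios, here is a minimal
userspace sketch. It is a simplified model, not the kernel code: the slot
array, the slot count, the pid-to-slot mapping and the function name are
all assumptions made for the example. It counts how many comm copies
happen over a run of switches, with and without the pid check.

```c
#include <stdio.h>

#define MODEL_SLOTS 128	/* arbitrary slot count for this model */

/*
 * Count comm copies over nr_switches switches that cycle through
 * nr_tasks distinct pids. With check != 0, a copy is skipped when the
 * slot already records the same pid (the behaviour the patch proposes).
 */
static unsigned long count_copies(int nr_switches, int nr_tasks, int check)
{
	int slot_pid[MODEL_SLOTS];
	unsigned long copies = 0;
	int i;

	for (i = 0; i < MODEL_SLOTS; i++)
		slot_pid[i] = -1;	/* empty slot */

	for (i = 0; i < nr_switches; i++) {
		int pid = 1 + i % nr_tasks;
		int idx = pid % MODEL_SLOTS;	/* simplified pid-to-slot mapping */

		if (check && slot_pid[idx] == pid)
			continue;	/* same pid already recorded: no copy */

		slot_pid[idx] = pid;
		copies++;		/* stands in for set_cmdline() */
	}
	return copies;
}

/* Print the copy counts for a few-task and a many-task workload. */
static void print_scenarios(void)
{
	printf("4 tasks, check:       %lu copies\n", count_copies(1000, 4, 1));
	printf("4 tasks, no check:    %lu copies\n", count_copies(1000, 4, 0));
	printf("1000 tasks, check:    %lu copies\n", count_copies(1000, 1000, 1));
	printf("1000 tasks, no check: %lu copies\n", count_copies(1000, 1000, 0));
}
```

In this model, a workload that keeps switching among four tasks skips
almost every copy when the check is enabled, while a workload where every
switch involves a new pid copies every time regardless of the check.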

Thanks,
Jihong
> -- Steve
> 
> 
> 
>>   
>>   	arch_spin_unlock(&trace_cmdline_lock);
>>   
> 
> .
> 


* Re: [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization
  2021-10-11 11:50 [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization Yang Jihong
  2021-10-14  3:02 ` Steven Rostedt
@ 2021-10-14 14:32 ` Steven Rostedt
  2021-10-15  1:27   ` Yang Jihong
  1 sibling, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2021-10-14 14:32 UTC (permalink / raw)
  To: Yang Jihong; +Cc: mingo, linux-kernel

On Mon, 11 Oct 2021 19:50:18 +0800
Yang Jihong <yangjihong1@huawei.com> wrote:

> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 7896d30d90f7..a795610a3b37 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
>  		savedcmd->cmdline_idx = idx;
>  	}
>  
> -	savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
> -	set_cmdline(idx, tsk->comm);
> +	/* save cmdline only when task does not exist in savecmd */
> +	if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
> +		savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
> +		set_cmdline(idx, tsk->comm);
> +	}
>  

I now remember why I never did it this way. This breaks saving the command
line when we do an exec.

That is, just because mapped_pid == tsk->pid does not mean that the comm is
the same.
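
The exec problem can be reproduced with a small userspace model. This is
a sketch under stated assumptions, not the code in kernel/trace/trace.c:
the struct, the slot count, the linear slot search and all model_* names
are hypothetical. The skip_if_same_pid flag selects between the existing
behaviour (always copy) and the behaviour proposed in the patch (copy
only when the recorded pid differs).

```c
#include <stdio.h>
#include <string.h>

#define MODEL_COMM_LEN 16
#define MODEL_SLOTS    4
#define MODEL_NO_PID   (-1)

struct model_cmdlines {
	int  slot_pid[MODEL_SLOTS];	/* plays the role of map_cmdline_to_pid */
	char slot_comm[MODEL_SLOTS][MODEL_COMM_LEN];
	int  next;			/* next slot to reuse */
};

static void model_init(struct model_cmdlines *s)
{
	int i;

	for (i = 0; i < MODEL_SLOTS; i++) {
		s->slot_pid[i] = MODEL_NO_PID;
		s->slot_comm[i][0] = '\0';
	}
	s->next = 0;
}

/* Return the slot that records pid, or -1 if there is none. */
static int model_find(const struct model_cmdlines *s, int pid)
{
	int i;

	for (i = 0; i < MODEL_SLOTS; i++)
		if (s->slot_pid[i] == pid)
			return i;
	return -1;
}

/* Record pid and comm; optionally skip the copy when the pid matches. */
static void model_save(struct model_cmdlines *s, int pid, const char *comm,
		       int skip_if_same_pid)
{
	int idx = model_find(s, pid);

	if (idx < 0) {
		idx = s->next;
		s->next = (s->next + 1) % MODEL_SLOTS;
		s->slot_pid[idx] = MODEL_NO_PID;	/* evict the old entry */
	}

	if (skip_if_same_pid && s->slot_pid[idx] == pid)
		return;		/* proposed check: pid matches, comm not refreshed */

	s->slot_pid[idx] = pid;
	strncpy(s->slot_comm[idx], comm, MODEL_COMM_LEN - 1);
	s->slot_comm[idx][MODEL_COMM_LEN - 1] = '\0';
}

/* Return 1 if the recorded comm is stale after a simulated exec. */
static int model_stale_after_exec(int skip_if_same_pid)
{
	struct model_cmdlines s;
	int idx;

	model_init(&s);
	model_save(&s, 1234, "bash", skip_if_same_pid);
	/* exec: same pid, new comm */
	model_save(&s, 1234, "hackbench", skip_if_same_pid);

	idx = model_find(&s, 1234);
	return idx < 0 || strcmp(s.slot_comm[idx], "hackbench") != 0;
}
```

With the check enabled, model_stale_after_exec(1) reports a stale entry
because the slot still holds "bash". Without it, model_stale_after_exec(0)
reports a fresh one.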

-- Steve


* Re: [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization
  2021-10-14 14:32 ` Steven Rostedt
@ 2021-10-15  1:27   ` Yang Jihong
  0 siblings, 0 replies; 5+ messages in thread
From: Yang Jihong @ 2021-10-15  1:27 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: mingo, linux-kernel

Hi Steven,

On 2021/10/14 22:32, Steven Rostedt wrote:
> On Mon, 11 Oct 2021 19:50:18 +0800
> Yang Jihong <yangjihong1@huawei.com> wrote:
> 
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index 7896d30d90f7..a795610a3b37 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
>>   		savedcmd->cmdline_idx = idx;
>>   	}
>>   
>> -	savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
>> -	set_cmdline(idx, tsk->comm);
>> +	/* save cmdline only when task does not exist in savecmd */
>> +	if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
>> +		savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
>> +		set_cmdline(idx, tsk->comm);
>> +	}
>>   
> 
> I now remember why I never did it this way. This breaks saving the command
> line when we do an exec.
> 
> That is, just because mapped_pid == tsk->pid does not mean that the comm is
> the same.
> 
If a task does an exec, the original process image is replaced with a new
binary, so the command changes while tsk->pid stays the same. That is why
the cmdline has to be saved again here, right?

Okay, I see. Thank you for the detailed explanation. :)

Thanks,
Jihong
> -- Steve
> .
> 

