linux-kernel.vger.kernel.org archive mirror
From: Yang Jihong <yangjihong1@huawei.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: <mingo@redhat.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] tracing: save cmdline only when task does not exist in savecmd for optimization
Date: Thu, 14 Oct 2021 16:09:34 +0800
Message-ID: <8977468f-26d1-1945-0d11-e56b10ff47c0@huawei.com>
In-Reply-To: <20211013230201.0f777564@oasis.local.home>

Hi Steven,

On 2021/10/14 11:02, Steven Rostedt wrote:
> On Mon, 11 Oct 2021 19:50:18 +0800
> Yang Jihong <yangjihong1@huawei.com> wrote:
> 
>> commit 85f726a35e504418 use strncpy instead of memcpy when copying comm,
>> on ARM64 machine, this commit causes performance degradation.
>>
>> For the task that already exists in savecmd, it is unnecessary to call
>> set_cmdline to execute strncpy once, run set_cmdline only if the task does
>> not exist in savecmd.
>>
>> I have written an example (which is an extreme case) in which trace sched switch
>> is invoked for 1000 times, as shown in the following:
>>
>>    for (int i = 0; i < 1000; i++) {
>>            trace_sched_switch(true, current, current);
>>   }
> 
> Well that's a pretty non realistic benchmark.
> 
>>
>> On ARM64 machine, compare the data before and after the optimization:
>> +---------------------+------------------------------+------------------------+
>> |                     | Total number of instructions | Total number of cycles |
>> +---------------------+------------------------------+------------------------+
>> | Before optimization |           1107367            |          658491        |
>> +---------------------+------------------------------+------------------------+
>> | After optimization  |            869367            |          520171        |
>> +---------------------+------------------------------+------------------------+
>> As shown above, there is nearly 26% performance
> 
> I'd prefer to see a more realistic benchmark.
> 
>>
>> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
>> ---
>>   kernel/trace/trace.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index 7896d30d90f7..a795610a3b37 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -2427,8 +2427,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
>>   		savedcmd->cmdline_idx = idx;
>>   	}
>>   
>> -	savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
>> -	set_cmdline(idx, tsk->comm);
>> +	/* save cmdline only when task does not exist in savecmd */
>> +	if (savedcmd->map_cmdline_to_pid[idx] != tsk->pid) {
>> +		savedcmd->map_cmdline_to_pid[idx] = tsk->pid;
>> +		set_cmdline(idx, tsk->comm);
>> +	}
> 
> I'm not against adding this. Just for kicks I ran the following before
> and after this patch:
> 
>    # trace-cmd start -e sched
>    # perf stat -r 100 hackbench 50
> 
> Before:
> 
>   Performance counter stats for '/work/c/hackbench 50' (100 runs):
> 
>            6,261.26 msec task-clock                #    6.126 CPUs utilized            ( +-  0.12% )
>              93,519      context-switches          #   14.936 K/sec                    ( +-  1.12% )
>              13,725      cpu-migrations            #    2.192 K/sec                    ( +-  1.16% )
>              47,266      page-faults               #    7.549 K/sec                    ( +-  0.54% )
>      22,911,885,026      cycles                    #    3.659 GHz                      ( +-  0.11% )
>      15,171,250,777      stalled-cycles-frontend   #   66.22% frontend cycles idle     ( +-  0.13% )
>      18,330,841,604      instructions              #    0.80  insn per cycle
>                                                    #    0.83  stalled cycles per insn  ( +-  0.11% )
>       4,027,904,559      branches                  #  643.306 M/sec                    ( +-  0.11% )
>          31,327,782      branch-misses             #    0.78% of all branches          ( +-  0.20% )
> 
>             1.02201 +- 0.00158 seconds time elapsed  ( +-  0.15% )
> After:
> 
>   Performance counter stats for '/work/c/hackbench 50' (100 runs):
> 
>            6,216.47 msec task-clock                #    6.124 CPUs utilized            ( +-  0.10% )
>              93,311      context-switches          #   15.010 K/sec                    ( +-  0.91% )
>              13,719      cpu-migrations            #    2.207 K/sec                    ( +-  1.09% )
>              47,085      page-faults               #    7.574 K/sec                    ( +-  0.49% )
>      22,746,703,318      cycles                    #    3.659 GHz                      ( +-  0.09% )
>      15,012,911,121      stalled-cycles-frontend   #   66.00% frontend cycles idle     ( +-  0.11% )
>      18,275,147,949      instructions              #    0.80  insn per cycle
>                                                    #    0.82  stalled cycles per insn  ( +-  0.08% )
>       4,017,673,788      branches                  #  646.295 M/sec                    ( +-  0.08% )
>          31,313,459      branch-misses             #    0.78% of all branches          ( +-  0.17% )
> 
>             1.01506 +- 0.00150 seconds time elapsed  ( +-  0.15% )
> 
> Really it's all in the noise, so adding this doesn't seem to hurt.
> 
Thanks very much for the benchmark data. :)
Indeed, the effect of this change is not noticeable in workloads where
tasks are repeatedly created. It only shows up when scheduling keeps
switching among a limited number of tasks.
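
The difference between the two workloads can be sketched in user space.
This is only a rough model under stated assumptions, not the kernel code:
NR_SLOTS, save_cmdline(), run() and the "pid % NR_SLOTS" slot choice are
hypothetical stand-ins for the saved-cmdlines table in kernel/trace/trace.c,
and the "copies" counter simply counts how often the strncpy path is taken.

```c
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16
#define NR_SLOTS 4              /* hypothetical; the real table is larger */

static int map_cmdline_to_pid[NR_SLOTS];
static char saved_cmdlines[NR_SLOTS][TASK_COMM_LEN];
static unsigned long copies;    /* number of times comm was copied */

/* Copy comm into the slot, as set_cmdline() does with strncpy. */
static void set_cmdline(int idx, const char *comm)
{
	strncpy(saved_cmdlines[idx], comm, TASK_COMM_LEN);
	saved_cmdlines[idx][TASK_COMM_LEN - 1] = '\0';
	copies++;
}

/* With skip_if_present set, mimic the patched check on the pid map. */
static void save_cmdline(int idx, int pid, const char *comm,
			 int skip_if_present)
{
	if (skip_if_present && map_cmdline_to_pid[idx] == pid)
		return;
	map_cmdline_to_pid[idx] = pid;
	set_cmdline(idx, comm);
}

/* Round-robin "switches" among nr_tasks tasks; return copies made. */
static unsigned long run(int nr_tasks, int switches, int skip_if_present)
{
	char comm[TASK_COMM_LEN];
	int i;

	copies = 0;
	for (i = 0; i < NR_SLOTS; i++)
		map_cmdline_to_pid[i] = -1;     /* no pid recorded yet */

	for (i = 0; i < switches; i++) {
		int pid = 1 + i % nr_tasks;

		snprintf(comm, sizeof(comm), "task-%d", pid);
		/* assumed slot choice, for illustration only */
		save_cmdline(pid % NR_SLOTS, pid, comm, skip_if_present);
	}
	return copies;
}
```

In this model, run(2, 1000, 0) copies 1000 times and run(2, 1000, 1) copies
only twice, because both tasks keep their slots. run(8, 1000, 1) still
copies 1000 times, since more tasks than slots means every slot is
overwritten before its owner runs again, which is roughly the situation
when tasks are constantly created.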

Thanks,
Jihong
> -- Steve
> 
> 
> 
>>   
>>   	arch_spin_unlock(&trace_cmdline_lock);
>>   
> 
> .
> 

