From: Ingo Molnar <mingo@kernel.org>
To: Qais Yousef <qyousef@layalina.io>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Christian Loehle <christian.loehle@arm.com>,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] sched: Consolidate cpufreq updates
Date: Tue, 26 Mar 2024 09:20:31 +0100
Message-ID: <ZgKFT5b423hfQdl9@gmail.com>
In-Reply-To: <20240324020139.1032473-1-qyousef@layalina.io>


* Qais Yousef <qyousef@layalina.io> wrote:

> Results of `perf stat --repeat 10 perf bench sched pipe` on AMD 3900X to
> verify any potential overhead because of the addition at context switch
> 
> Before:
> -------
> 
> 	Performance counter stats for 'perf bench sched pipe' (10 runs):
> 
> 		 16,839.74 msec task-clock:u              #    1.158 CPUs utilized            ( +-  0.52% )
> 			 0      context-switches:u        #    0.000 /sec
> 			 0      cpu-migrations:u          #    0.000 /sec
> 		     1,390      page-faults:u             #   83.903 /sec                     ( +-  0.06% )
> 	       333,773,107      cycles:u                  #    0.020 GHz                      ( +-  0.70% )  (83.72%)
> 		67,050,466      stalled-cycles-frontend:u #   19.94% frontend cycles idle     ( +-  2.99% )  (83.23%)
> 		37,763,775      stalled-cycles-backend:u  #   11.23% backend cycles idle      ( +-  2.18% )  (83.09%)
> 		84,456,137      instructions:u            #    0.25  insn per cycle
> 							  #    0.83  stalled cycles per insn  ( +-  0.02% )  (83.01%)
> 		34,097,544      branches:u                #    2.058 M/sec                    ( +-  0.02% )  (83.52%)
> 		 8,038,902      branch-misses:u           #   23.59% of all branches          ( +-  0.03% )  (83.44%)
> 
> 		   14.5464 +- 0.0758 seconds time elapsed  ( +-  0.52% )
> 
> After:
> -------
> 
> 	Performance counter stats for 'perf bench sched pipe' (10 runs):
> 
> 		 16,219.58 msec task-clock:u              #    1.130 CPUs utilized            ( +-  0.80% )
> 			 0      context-switches:u        #    0.000 /sec
> 			 0      cpu-migrations:u          #    0.000 /sec
> 		     1,391      page-faults:u             #   85.163 /sec                     ( +-  0.06% )
> 	       342,768,312      cycles:u                  #    0.021 GHz                      ( +-  0.63% )  (83.36%)
> 		66,231,208      stalled-cycles-frontend:u #   18.91% frontend cycles idle     ( +-  2.34% )  (83.95%)
> 		39,055,410      stalled-cycles-backend:u  #   11.15% backend cycles idle      ( +-  1.80% )  (82.73%)
> 		84,475,662      instructions:u            #    0.24  insn per cycle
> 							  #    0.82  stalled cycles per insn  ( +-  0.02% )  (83.05%)
> 		34,067,160      branches:u                #    2.086 M/sec                    ( +-  0.02% )  (83.67%)
> 		 8,042,888      branch-misses:u           #   23.60% of all branches          ( +-  0.07% )  (83.25%)
> 
> 		    14.358 +- 0.116 seconds time elapsed  ( +-  0.81% )

Noise caused by too many counters & the vagaries of multi-CPU scheduling is 
drowning out any results here.

I'd suggest something like this to measure same-CPU context-switching 
overhead:

    taskset 1 perf stat --repeat 10 -e cycles,instructions,task-clock perf bench sched pipe
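
(Here 'taskset 1' is a CPU affinity bitmask, i.e. CPU #0 only; if an 
explicit CPU list reads better, this should be equivalent:

    taskset -c 0 perf stat --repeat 10 -e cycles,instructions,task-clock perf bench sched pipe

Both pipe tasks inherit the affinity, so every context switch stays on that 
one CPU.)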

... and make sure the cpufreq governor is at 'performance' first:

    for ((cpu=0; cpu < $(nproc); cpu++)); do echo performance > /sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor; done
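
To double check that the change took effect you can print the governor of 
every CPU, assuming the usual cpufreq sysfs layout:

    grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor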

With that approach you should see much, much lower noise levels even with 
just 3 runs:

 Performance counter stats for 'perf bench sched pipe' (3 runs):

    51,616,501,297      cycles                           #    3.188 GHz                         ( +-  0.05% )
    37,523,641,203      instructions                     #    0.73  insn per cycle              ( +-  0.08% )
         16,191.01 msec task-clock                       #    0.999 CPUs utilized               ( +-  0.04% )

          16.20511 +- 0.00578 seconds time elapsed  ( +-  0.04% )
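
Once you are done, remember to flip the governor back, e.g. assuming 
'schedutil' was the previous setting:

    for ((cpu=0; cpu < $(nproc); cpu++)); do echo schedutil > /sys/devices/system/cpu/cpu$cpu/cpufreq/scaling_governor; done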

Thanks,

	Ingo
