From: Doug Smythies <dsmythies@telus.net>
To: Feng Tang <feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
"paulmck@kernel.org" <paulmck@kernel.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>,
"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
srinivas pandruvada <srinivas.pandruvada@linux.intel.com>,
dsmythies <dsmythies@telus.net>
Subject: Re: CPU excessively long times between frequency scaling driver calls - bisected
Date: Tue, 8 Feb 2022 22:23:13 -0800 [thread overview]
Message-ID: <CAAYoRsXkyWf0vmEE2HvjF6pzCC4utxTF=7AFx1PJv4Evh=C+Ow@mail.gmail.com> (raw)
In-Reply-To: <20220208091525.GA7898@shbuild999.sh.intel.com>
On Tue, Feb 8, 2022 at 1:15 AM Feng Tang <feng.tang@intel.com> wrote:
> On Mon, Feb 07, 2022 at 11:13:00PM -0800, Doug Smythies wrote:
> > > >
> > > > Since kernel 5.16-rc4 and commit: b50db7095fe002fa3e16605546cba66bf1b68a3e
> > > > " x86/tsc: Disable clocksource watchdog for TSC on qualified platorms"
> > > >
> > > > There are now occasions where the time between calls to the driver
> > > > can be hundreds of seconds, which can leave the CPU frequency
> > > > unnecessarily high for extended periods.
> > > >
> > > > From the number of clock cycles executed during these long
> > > > durations, one can tell that the CPU was running code, yet
> > > > the driver never got called.
> > > >
> > > > Attached are some graphs of trace data acquired using
> > > > intel_pstate_tracer.py. The system is idle from about 42 seconds to
> > > > well over 200 seconds of elapsed time, yet the driver is never called
> > > > for CPU10 (which would have reduced its pstate request) until an
> > > > elapsed time of 167.616 seconds, 126 seconds after the previous call.
> > > > The CPU frequency never does go to minimum.
> > > >
> > > > For reference, a similar CPU frequency graph with the commit
> > > > reverted is also attached. There, the CPU frequency drops to
> > > > minimum within about 10 to 15 seconds.
> > >
> > > commit b50db7095fe0 essentially disables the clocksource watchdog,
> > > which literally doesn't have much to do with cpufreq code.
> > >
> > > One thing I can think of is that, without the patch, there is a
> > > periodic clocksource watchdog timer running every 500 ms, which
> > > loops over all CPUs in turn. Your HW has 12 CPUs (from the graph),
> > > so each CPU gets a timer (backed by a HW timer interrupt) every 6
> > > seconds. Could this affect the cpufreq governor's work flow? (I just
> > > quickly read some cpufreq code, and it seems there is
> > > irq_work/workqueue involved.)
> >
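A quick sanity check of the arithmetic quoted above: with the watchdog
enabled, one verification timer fires every 500 ms and rotates across the
CPUs, so each CPU is visited once per (number of CPUs * 500) ms.

```shell
# Per-CPU watchdog visit period: 12 CPUs, one timer firing every 500 ms,
# rotated round-robin, so each CPU is hit once per NCPUS * 500 ms.
NCPUS=12
WATCHDOG_MS=500
echo "$(( NCPUS * WATCHDOG_MS / 1000 )) seconds per CPU"   # 6 seconds per CPU
```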
> > 6 seconds is the longest duration I have ever seen on this
> > processor before commit b50db7095fe0.
> >
> > I said "the times between calls to the driver have never
> > exceeded 10 seconds" originally, but that involved other processors.
> >
> > I also did longer, 9000 second tests:
> >
> > For a reverted kernel, the driver was called 131,743 times,
> > and the duration was never longer than 6.1 seconds.
> >
> > For a non-reverted kernel, the driver was called 110,241 times;
> > 1397 of those durations were longer than 6.1 seconds,
> > and the maximum duration was 303.6 seconds.
>
> Thanks for the data, which shows this is related to the removal of the
> clocksource watchdog timers. Under this specific configuration, the
> cpufreq work flow has some dependence on those watchdog timers.
>
> Also, could you share your kernel config, boot messages and some
> system settings, such as the tickless mode, so that other people can
> try to reproduce? Thanks.
I borrow the kernel configuration file from the Ubuntu mainline PPA
[1], the one they call "lowlatency", i.e. 1000 Hz tick. I make these
changes before compiling:
scripts/config --disable DEBUG_INFO
scripts/config --disable SYSTEM_TRUSTED_KEYS
scripts/config --disable SYSTEM_REVOCATION_KEYS
I also send you the config and dmesg files in an off-list email.
This test is on an idle system with only very low periodic loads.
My test computer has no GUI and very few services running.
Notice that I have not yet used the word "regression" in this thread,
because I don't know for certain that it is one. In the end, we don't
care about CPU frequency as such; we care about wasting energy.
It is definitely a change, and I am able to measure small increases
in energy use, but this is all at the low end of the power curve.
So far I have not found a significant example of increased power
use, but I also have not looked very hard.
During any test, many monitoring tools might shorten the durations.
For example, if I run turbostat, say:

sudo turbostat --Summary --quiet --show Busy%,Bzy_MHz,IRQ,PkgWatt,PkgTmp,RAMWatt,GFXWatt,CorWatt --interval 2.5

then, yes, the maximum duration would be 2.5 seconds, because
turbostat wakes up each CPU to inquire about things, causing a call
to the CPU scaling driver. (I tested this, for about 900 seconds.)
For my power tests I use a sample interval of >= 300 seconds.
For duration-only tests, turbostat is not run at the same time.
My grub line:
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 consoleblank=314
intel_pstate=active intel_pstate=no_hwp msr.allow_writes=on
cpuidle.governor=teo"
A typical pstate tracer command (with the script copied to the
directory where I run this stuff):
sudo ./intel_pstate_tracer.py --interval 600 --name vnew02 --memory 800000
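The long-duration counts quoted earlier can be extracted from the tracer's
per-CPU output with something like the following sketch. The file name and
the assumption of a comma-separated file with a duration-in-milliseconds
second column are mine, not the tracer's documented format; check the
actual output layout first.

```shell
# Hypothetical sketch: count driver-call gaps longer than 6.1 seconds.
# Assumes a CSV with a header row and duration in milliseconds in
# column 2; the real intel_pstate_tracer.py output may differ.
awk -F, 'NR > 1 && $2/1000 > 6.1 { n++ } END { print n+0 }' cpu10.csv
```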
>
> > > Can you try one test that keep all the current setting and change
> > > the irq affinity of disk/network-card to 0xfff to let interrupts
> > > from them be distributed to all CPUs?
> >
> > I am willing to do the test, but I do not know how to change the
> > irq affinity.
>
> I may have said that too soon. Long ago I used
> "echo fff > /proc/irq/xxx/smp_affinity" (xxx is the irq number of a
> device) to let interrupts be distributed to all CPUs, but it doesn't
> work on the 2 desktops I have at hand. It seems recent kernels only
> support one-CPU irq affinity.
>
> You can still try that command, though it may not work.
I did not try this yet.
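For reference, the change Feng suggests would look something like the
sketch below. The irq number ("xxx") is a placeholder to be looked up in
/proc/interrupts, and, as Feng notes, recent kernels may reject a
multi-CPU mask.

```shell
# Build an all-CPUs affinity mask for a 12-CPU system:
# (1 << 12) - 1 = 0xfff, one bit per CPU.
NCPUS=12
MASK=$(printf '%x' $(( (1 << NCPUS) - 1 )))
echo "$MASK"   # fff

# Then, as root, for a device irq number xxx from /proc/interrupts
# (may not take effect on kernels that only accept one-CPU affinity):
#   echo "$MASK" > /proc/irq/xxx/smp_affinity
```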
[1] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.17-rc3/
Thread overview: 46+ messages
[not found] <003f01d81c8c$d20ee3e0$762caba0$@telus.net>
2022-02-08 2:39 ` CPU excessively long times between frequency scaling driver calls - bisected Feng Tang
2022-02-08 7:13 ` Doug Smythies
2022-02-08 9:15 ` Feng Tang
2022-02-09 6:23 ` Doug Smythies [this message]
2022-02-10 7:45 ` Zhang, Rui
2022-02-13 18:54 ` Doug Smythies
2022-02-14 15:17 ` srinivas pandruvada
2022-02-15 21:35 ` Doug Smythies
2022-02-22 7:34 ` Feng Tang
2022-02-22 18:04 ` Rafael J. Wysocki
2022-02-23 0:07 ` Doug Smythies
2022-02-23 0:32 ` srinivas pandruvada
2022-02-23 0:40 ` Feng Tang
2022-02-23 14:23 ` Rafael J. Wysocki
2022-02-24 8:08 ` Feng Tang
2022-02-24 14:44 ` Paul E. McKenney
2022-02-24 16:29 ` Doug Smythies
2022-02-24 16:58 ` Paul E. McKenney
2022-02-25 0:29 ` Feng Tang
2022-02-25 1:06 ` Paul E. McKenney
2022-02-25 17:45 ` Rafael J. Wysocki
2022-02-26 0:36 ` Doug Smythies
2022-02-28 4:12 ` Feng Tang
2022-02-28 19:36 ` Rafael J. Wysocki
2022-03-01 5:52 ` Feng Tang
2022-03-01 11:58 ` Rafael J. Wysocki
2022-03-01 17:18 ` Doug Smythies
2022-03-01 17:34 ` Rafael J. Wysocki
2022-03-02 4:06 ` Doug Smythies
2022-03-02 19:00 ` Rafael J. Wysocki
2022-03-03 23:00 ` Doug Smythies
2022-03-04 6:59 ` Doug Smythies
2022-03-16 15:54 ` Doug Smythies
2022-03-17 12:30 ` Rafael J. Wysocki
2022-03-17 13:58 ` Doug Smythies
2022-03-24 14:04 ` Doug Smythies
2022-03-24 18:17 ` Rafael J. Wysocki
2022-03-25 0:03 ` Doug Smythies
2022-03-03 5:27 ` Feng Tang
2022-03-03 12:02 ` Rafael J. Wysocki
2022-03-04 5:13 ` Feng Tang
2022-03-04 16:23 ` Paul E. McKenney
2022-02-23 2:49 ` Feng Tang
2022-02-23 14:11 ` Rafael J. Wysocki
2022-02-23 9:40 ` Thomas Gleixner
2022-02-23 14:23 ` Rafael J. Wysocki