From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Viresh Kumar" <viresh.kumar@linaro.org>,
	"Rafael Wysocki" <rjw@rjwysocki.net>,
	"Russell King" <linux@armlinux.org.uk>,
	"David S. Miller" <davem@davemloft.net>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Linux PM" <linux-pm@vger.kernel.org>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Linux ARM" <linux-arm-kernel@lists.infradead.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	sparclinux@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH V2] cpufreq: Call transition notifier only once for each policy
Date: Mon, 18 Mar 2019 11:45:00 +0100
Message-ID: <CAJZ5v0g=VYSKAHuncceD-8a1AY+isR7-TMX2dbdMzpZgVkRfBg@mail.gmail.com>
In-Reply-To: <20190315122952.GF6058@hirez.programming.kicks-ass.net>

On Fri, Mar 15, 2019 at 1:30 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Mar 15, 2019 at 02:43:07PM +0530, Viresh Kumar wrote:
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 3fae23834069..cff8779fc0d2 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -956,28 +956,38 @@ static int time_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
> >                               void *data)
> >  {
> >       struct cpufreq_freqs *freq = data;
> > -     unsigned long *lpj;
> > -
> > -     lpj = &boot_cpu_data.loops_per_jiffy;
> > -#ifdef CONFIG_SMP
> > -     if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> > -             lpj = &cpu_data(freq->cpu).loops_per_jiffy;
> > -#endif
> > +     struct cpumask *cpus = freq->policy->cpus;
> > +     bool boot_cpu = !IS_ENABLED(CONFIG_SMP) || freq->flags & CPUFREQ_CONST_LOOPS;
> > +     unsigned long lpj;
> > +     int cpu;
> >
> >       if (!ref_freq) {
> >               ref_freq = freq->old;
> > -             loops_per_jiffy_ref = *lpj;
> >               tsc_khz_ref = tsc_khz;
> > +
> > +             if (boot_cpu)
> > +                     loops_per_jiffy_ref = boot_cpu_data.loops_per_jiffy;
> > +             else
> > +                     loops_per_jiffy_ref = cpu_data(cpumask_first(cpus)).loops_per_jiffy;
> >       }
> > +
> >       if ((val == CPUFREQ_PRECHANGE  && freq->old < freq->new) ||
> >                       (val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) {
> > -             *lpj = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
> > -
> > +             lpj = cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new);
> >               tsc_khz = cpufreq_scale(tsc_khz_ref, ref_freq, freq->new);
> > +
> >               if (!(freq->flags & CPUFREQ_CONST_LOOPS))
> >                       mark_tsc_unstable("cpufreq changes");
> >
> > -             set_cyc2ns_scale(tsc_khz, freq->cpu, rdtsc());
> > +             if (boot_cpu) {
> > +                     boot_cpu_data.loops_per_jiffy = lpj;
> > +             } else {
> > +                     for_each_cpu(cpu, cpus)
> > +                             cpu_data(cpu).loops_per_jiffy = lpj;
> > +             }
> > +
> > +             for_each_cpu(cpu, cpus)
> > +                     set_cyc2ns_scale(tsc_khz, cpu, rdtsc());
>
> This code doesn't make sense, the rdtsc() _must_ be called on the CPU in
> question.

Well, strictly speaking the TSC value here comes from the CPU running the code.

The original code has this problem too, though (as Viresh said), so
the patch really doesn't make it worse in that respect. :-)

I'm not going to defend the original code (I didn't invent it
anyway), but it clearly assumes that different CPUs cannot run at
different frequencies and that kind of explains what happens in it.

> That's part of the whole problem here, TSC isn't sync'ed when
> it's subject to CPUFREQ.

So what would you recommend us to do here?

Obviously, this won't run on any new hardware.  Frankly, I'm not even
sure what the most recent HW where this hack would make a difference
is (the comment talking about Opterons suggests early 2000s), so this
clearly falls into the "legacy" bucket to me.

Does it make sense to try to preserve it, or can we simply make
cpufreq init fail on the systems where the TSC rate depends on the
frequency?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
