All of lore.kernel.org
 help / color / mirror / Atom feed
From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] cpufreq: intel_pstate: Fix for HWP interrupt before driver is ready
Date: Mon, 06 Sep 2021 11:14:05 -0700	[thread overview]
Message-ID: <584a4fad09048e3ea0dbc3515b2e909b745177dd.camel@linux.intel.com> (raw)
In-Reply-To: <CAJZ5v0jLmziZZEqEk-D+b6jD7UUPmeb7MQW1ZptdHTk-2c9nMg@mail.gmail.com>

On Mon, 2021-09-06 at 19:54 +0200, Rafael J. Wysocki wrote:
> On Mon, Sep 6, 2021 at 7:23 PM Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> > 
> > On Mon, 2021-09-06 at 18:58 +0200, Rafael J. Wysocki wrote:
> > > On Mon, Sep 6, 2021 at 6:55 PM Srinivas Pandruvada
> > > <srinivas.pandruvada@linux.intel.com> wrote:
> > > > 
> > > > On Mon, 2021-09-06 at 10:43 -0600, Jens Axboe wrote:
> > > > > On 9/6/21 10:17 AM, Rafael J. Wysocki wrote:
> > > > > > On Sat, Sep 4, 2021 at 7:37 AM Srinivas Pandruvada
> > > > > > <srinivas.pandruvada@linux.intel.com> wrote:
> > > > > > > 
> > > > > > > In Lenovo X1 gen9 laptop, HWP interrupts arrive before
> > > > > > > driver
> > > > > > > is
> > > > > > > ready
> > > > > > > to handle on that CPU. Basically didn't even allocated
> > > > > > > memory
> > > > > > > for
> > > > > > > per
> > > > > > > cpu data structure and not even started interrupt enable
> > > > > > > process
> > > > > > > on that
> > > > > > > CPU. So interrupt handler observes a NULL pointer to
> > > > > > > schedule
> > > > > > > work.
> > > > > > > 
> > > > > > > This interrupt was probably for SMM, but since it is
> > > > > > > redirected
> > > > > > > to
> > > > > > > OS by OSC call, OS receives it, but not ready to handle.
> > > > > > > That
> > > > > > > redirection
> > > > > > > of interrupt to OS was also done to solve one SMM crash on
> > > > > > > Yoga
> > > > > > > 260 for
> > > > > > > HWP interrupt a while back.
> > > > > > > 
> > > > > > > To solve this the HWP interrupt handler should ignore such
> > > > > > > request if the
> > > > > > > driver is not ready. This will require some flag to wait
> > > > > > > till
> > > > > > > the
> > > > > > > driver
> > > > > > > setup a workqueue to handle on a CPU. We can't simply
> > > > > > > assume
> > > > > > > cpudata to
> > > > > > > be NULL and avoid processing as it may not be NULL but data
> > > > > > > structure is
> > > > > > > not in consistent state.
> > > > > > > 
> > > > > > > So created a cpumask which sets the CPU on which interrupt
> > > > > > > was
> > > > > > > setup. If
> > > > > > > not setup, simply clear the interrupt status and return.
> > > > > > > Since
> > > > > > > the
> > > > > > > similar issue can happen during S3 resume, clear the bit
> > > > > > > during
> > > > > > > offline.
> > > > > > > 
> > > > > > > Since interrupt timing may be before HWP is enabled, use
> > > > > > > safe
> > > > > > > MSR
> > > > > > > read
> > > > > > > writes as before the change for HWP interrupt.
> > > > > > > 
> > > > > > > Fixes: d0e936adbd22 ("cpufreq: intel_pstate: Process HWP
> > > > > > > Guaranteed change notification")
> > > > > > > Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
> > > > > > > Signed-off-by: Srinivas Pandruvada <
> > > > > > > srinivas.pandruvada@linux.intel.com>
> > > > > > > ---
> > > > > > >  drivers/cpufreq/intel_pstate.c | 23
> > > > > > > ++++++++++++++++++++++-
> > > > > > >  1 file changed, 22 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/cpufreq/intel_pstate.c
> > > > > > > b/drivers/cpufreq/intel_pstate.c
> > > > > > > index b4ffe6c8a0d0..5ac86bfa1080 100644
> > > > > > > --- a/drivers/cpufreq/intel_pstate.c
> > > > > > > +++ b/drivers/cpufreq/intel_pstate.c
> > > > > > > @@ -298,6 +298,8 @@ static bool hwp_boost __read_mostly;
> > > > > > > 
> > > > > > >  static struct cpufreq_driver *intel_pstate_driver
> > > > > > > __read_mostly;
> > > > > > > 
> > > > > > > +static cpumask_t hwp_intr_enable_mask;
> > > > > > > +
> > > > > > >  #ifdef CONFIG_ACPI
> > > > > > >  static bool acpi_ppc;
> > > > > > >  #endif
> > > > > > > @@ -1067,11 +1069,15 @@ static void
> > > > > > > intel_pstate_hwp_set(unsigned
> > > > > > > int cpu)
> > > > > > >         wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
> > > > > > >  }
> > > > > > > 
> > > > > > > +static void intel_pstate_disable_hwp_interrupt(struct
> > > > > > > cpudata
> > > > > > > *cpudata);
> > > > > > > +
> > > > > > >  static void intel_pstate_hwp_offline(struct cpudata *cpu)
> > > > > > >  {
> > > > > > >         u64 value = READ_ONCE(cpu->hwp_req_cached);
> > > > > > >         int min_perf;
> > > > > > > 
> > > > > > > +       intel_pstate_disable_hwp_interrupt(cpu);
> > > > > > > +
> > > > > > >         if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
> > > > > > >                 /*
> > > > > > >                  * In case the EPP has been set to
> > > > > > > "performance"
> > > > > > > by the
> > > > > > > @@ -1645,20 +1651,35 @@ void notify_hwp_interrupt(void)
> > > > > > >         if (!hwp_active ||
> > > > > > > !boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
> > > > > > >                 return;
> > > > > > > 
> > > > > > > -       rdmsrl(MSR_HWP_STATUS, value);
> > > > > > > +       rdmsrl_safe(MSR_HWP_STATUS, &value);
> > > > > > >         if (!(value & 0x01))
> > > > > > >                 return;
> > > > > > > 
> > > > > > > +       if (!cpumask_test_cpu(this_cpu,
> > > > > > > &hwp_intr_enable_mask)) {
> > > > > > > +               wrmsrl_safe(MSR_HWP_STATUS, 0);
> > > > > > > +               return;
> > > > > > > +       }
> > > > > > 
> > > > > > Without additional locking, there is a race between this and
> > > > > > intel_pstate_disable_hwp_interrupt().
> > > > > > 
> > > > > > 1. notify_hwp_interrupt() checks hwp_intr_enable_mask() and
> > > > > > the
> > > > > > target
> > > > > > CPU is in there, so it will go for scheduling the delayed
> > > > > > work.
> > > > > > 2. intel_pstate_disable_hwp_interrupt() runs between the
> > > > > > check
> > > > > > and
> > > > > > the
> > > > > > cpudata load below.
> > > > > > 3. hwp_notify_work is scheduled on the CPU that isn't there
> > > > > > in
> > > > > > the
> > > > > > mask any more.
> > > > > 
> > > > > I noticed that too, not clear to me how much of an issue that
> > > > > is
> > > > > in
> > > > > practice. But there's definitely a race there.
> > > > Glad to see how this is possible from code running in ISR
> > > > context.
> > > 
> > > intel_pstate_disable_hwp_interrupt() may very well run on a
> > > different
> > > CPU in parallel with the interrupt handler running on this CPU.  Or
> > > is
> > > this not possible for some reason?
> > I see the offline callback is called from cpufreq core from hotplug
> > online/offline callback. So this should run the call on the target
> > CPU.
> > From Documentation
> > "The states CPUHP_AP_OFFLINE … CPUHP_AP_ONLINE are invoked just the
> > after the CPU has been brought up. The interrupts are off and the
> > scheduler is not yet active on this CPU. Starting with
> > CPUHP_AP_OFFLINE
> > the callbacks are invoked on the target CPU."
> > 
> > The only other place it is called is from subsys remove callback. Not
> > sure how can you remove cpufreq subsys on fly.
> 
> cpufreq_unregister_driver() causes this to happen.
> 
> > Let's say it is possible:
> > While running ISR on a local CPU, how can someone pull the CPU before
> > its completion? If the CPU is going away after that, the workqueue is
> > unbounded. So it will run on some other CPU, here if that happens it
> > will call cpufreq update policy, which will be harmless.
> 
> Well, it looks to me like if you are super-unlucky, the ISR may run on
> the local CPU in parallel with intel_pstate_update_status() running on
> a remote one and so dereferencing cpudata from it is generally unsafe.
> In theory.  In practice it is unlikely to become problematic for
> timing reasons AFAICS.
> 
> 
We are handling offline for other thermal interrupt sources from same
interrupt in therm-throt.c, where we do similar in offline path (by
TGLX). If cpufreq offline can cause such issue of changing CPU, I can
call intel_pstate_disable_hwp_interrupt() via override from
https://elixir.bootlin.com/linux/latest/C/ident/thermal_throttle_offline
after masking APIC interrupt.

Thanks,
Srinivas

> Anyway, I would consider using RCU here to stay on the safe side.




  reply	other threads:[~2021-09-06 18:14 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-04  5:37 [PATCH] cpufreq: intel_pstate: Fix for HWP interrupt before driver is ready Srinivas Pandruvada
2021-09-06 16:17 ` Rafael J. Wysocki
2021-09-06 16:43   ` Jens Axboe
2021-09-06 16:54     ` Srinivas Pandruvada
2021-09-06 16:58       ` Rafael J. Wysocki
2021-09-06 17:23         ` Srinivas Pandruvada
2021-09-06 17:54           ` Rafael J. Wysocki
2021-09-06 18:14             ` Srinivas Pandruvada [this message]
2021-09-06 18:25               ` Rafael J. Wysocki
2021-09-06 19:56                 ` Srinivas Pandruvada
2021-09-07 13:41                   ` Rafael J. Wysocki
2021-09-06 16:53   ` Srinivas Pandruvada

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=584a4fad09048e3ea0dbc3515b2e909b745177dd.camel@linux.intel.com \
    --to=srinivas.pandruvada@linux.intel.com \
    --cc=axboe@kernel.dk \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.