linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: Jens Axboe <axboe@kernel.dk>, LKML <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Len Brown <lenb@kernel.org>,
	linux-pm@vger.kernel.org
Subject: Re: Bug: d0e936adbd22 crashes at boot
Date: Fri, 03 Sep 2021 11:00:31 -0700	[thread overview]
Message-ID: <d6bf08cbfd9f29ddb8cf29f522d68efc5c676624.camel@linux.intel.com> (raw)
In-Reply-To: <767fe00f-bf31-1eb0-09cc-1be91c633bb4@kernel.dk>

[-- Attachment #1: Type: text/plain, Size: 2834 bytes --]

Hi Axobe,

On Fri, 2021-09-03 at 09:00 -0600, Jens Axboe wrote:
> On 9/3/21 8:38 AM, Srinivas Pandruvada wrote:
> > On Fri, 2021-09-03 at 08:15 -0600, Jens Axboe wrote:
> > > On 9/3/21 8:13 AM, Srinivas Pandruvada wrote:
> > > > Hi Axboe,
> > > > 
> > > > Thanks for reporting.
> > > > On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
> > > > > Hi,
> > > > > 
> > > > > Booting Linus's tree causes a crash on my laptop, an x1 gen9.
> > > > > This
> > > > > was
> > > > > a bit
> > > > > difficult to pin down as it crashes before the display is up,
> > > > > but I
> > > > > managed
> > > > > to narrow it down to:
> > > > > 
> > > > > commit d0e936adbd2250cb03f2e840c6651d18edc22ace
> > > > > Author: Srinivas Pandruvada < 
> > > > > srinivas.pandruvada@linux.intel.com>
> > > > > Date:   Thu Aug 19 19:40:06 2021 -0700
> > > > > 
> > > > >     cpufreq: intel_pstate: Process HWP Guaranteed change
> > > > > notification
> > > > > 
> > > > > which crashes with a NULL pointer deref in
> > > > > notify_hwp_interrupt() -
> > > > > > 
> > > > > queue_delayed_work_on().
> > > > > 
> > > > > Reverting this change makes the laptop boot fine again.
> > > > > 
> > > > Does this change fixes your issue?
> > > 
> > > I would assume so, as it's crashing on cpudata == NULL :-)
> > > 
> > > But why is it NULL? Happy to test patches, but the below doesn't
> > > look
> > > like
> > > a real fix and more of a work-around.
> > 

Please try the attached.

Thanks,
Srinivas

> > This platform is sending an HWP interrupt on a CPU which we didn't
> > yet
> > bring it up for pstate control. So somehow firmware decided to send
> > very early during boot, which previously we would have ignored it
> > 
> > Actually try this, with more prevention
> 
> I can give this a whirl.
> 
> > diff --git a/drivers/cpufreq/intel_pstate.c
> > b/drivers/cpufreq/intel_pstate.c
> > index b4ffe6c8a0d0..6ee88d7640ea 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -1645,12 +1645,24 @@ void notify_hwp_interrupt(void)
> >         if (!hwp_active || !boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
> >                 return;
> >  
> > -       rdmsrl(MSR_HWP_STATUS, value);
> > +       rdmsrl_safe(MSR_HWP_STATUS, &value);
> >         if (!(value & 0x01))
> >                 return;
> >  
> > +       /*
> > +        * After hwp_active is set and all_cpu_data is allocated,
> > there
> > +        * is small window.
> > +        */
> > +       if (!all_cpu_data) {
> > +               wrmsrl_safe(MSR_HWP_STATUS, 0);
> > +               return;
> > +       }
> 
> What synchronizes the all_cpu_data setup and the interrupt? Can the
> interrupt come in while it's still being setup?
> 


[-- Attachment #2: 0001-cpufreq-intel_pstate-Fix-for-HWP-interrupt-before-dr.patch --]
[-- Type: text/x-patch, Size: 3055 bytes --]

From 380d5a340ebeb172c93a878fd84a12e7bfea9cff Mon Sep 17 00:00:00 2001
From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Date: Fri, 3 Sep 2021 10:41:35 -0700
Subject: [TEST PATCH] cpufreq: intel_pstate: Fix for HWP interrupt before
 driver is ready

In x1 gen9 laptop, one HWP interrupt arrives before driver is ready
to handle on that CPU. Here firmware is enabling and sending an
interrupt for guarantee change. Since driver didn't have cpudata
initialized it will cause NULL pointer when trying to schedule
processing of interrupt in a workwqueue.

To avoid this set a cpumask of CPUs for which driver has initialized
interrupts. If not initialized simply clear the HWP status.

Since the same thing may happen during S3 resume, clear the cpumask
during offline and let it recreate it during online.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/cpufreq/intel_pstate.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index b4ffe6c8a0d0..5ac86bfa1080 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -298,6 +298,8 @@ static bool hwp_boost __read_mostly;
 
 static struct cpufreq_driver *intel_pstate_driver __read_mostly;
 
+static cpumask_t hwp_intr_enable_mask;
+
 #ifdef CONFIG_ACPI
 static bool acpi_ppc;
 #endif
@@ -1067,11 +1069,15 @@ static void intel_pstate_hwp_set(unsigned int cpu)
 	wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
 }
 
+static void intel_pstate_disable_hwp_interrupt(struct cpudata *cpudata);
+
 static void intel_pstate_hwp_offline(struct cpudata *cpu)
 {
 	u64 value = READ_ONCE(cpu->hwp_req_cached);
 	int min_perf;
 
+	intel_pstate_disable_hwp_interrupt(cpu);
+
 	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
 		/*
 		 * In case the EPP has been set to "performance" by the
@@ -1645,20 +1651,35 @@ void notify_hwp_interrupt(void)
 	if (!hwp_active || !boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
 		return;
 
-	rdmsrl(MSR_HWP_STATUS, value);
+	rdmsrl_safe(MSR_HWP_STATUS, &value);
 	if (!(value & 0x01))
 		return;
 
+	if (!cpumask_test_cpu(this_cpu, &hwp_intr_enable_mask)) {
+		wrmsrl_safe(MSR_HWP_STATUS, 0);
+		return;
+	}
+
 	cpudata = all_cpu_data[this_cpu];
 	schedule_delayed_work_on(this_cpu, &cpudata->hwp_notify_work, msecs_to_jiffies(10));
 }
 
+static void intel_pstate_disable_hwp_interrupt(struct cpudata *cpudata)
+{
+
+	if (cpumask_test_and_clear_cpu(cpudata->cpu, &hwp_intr_enable_mask)) {
+		wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00);
+		cancel_delayed_work_sync(&cpudata->hwp_notify_work);
+	}
+}
+
 static void intel_pstate_enable_hwp_interrupt(struct cpudata *cpudata)
 {
 	/* Enable HWP notification interrupt for guaranteed performance change */
 	if (boot_cpu_has(X86_FEATURE_HWP_NOTIFY)) {
 		INIT_DELAYED_WORK(&cpudata->hwp_notify_work, intel_pstate_notify_work);
 		wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x01);
+		cpumask_set_cpu(cpudata->cpu, &hwp_intr_enable_mask);
 	}
 }
 
-- 
2.17.1


  parent reply	other threads:[~2021-09-03 18:00 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-03 13:36 Bug: d0e936adbd22 crashes at boot Jens Axboe
2021-09-03 14:13 ` Srinivas Pandruvada
2021-09-03 14:15   ` Jens Axboe
2021-09-03 14:38     ` Srinivas Pandruvada
2021-09-03 15:00       ` Jens Axboe
2021-09-03 15:11         ` Srinivas Pandruvada
2021-09-03 18:00         ` Srinivas Pandruvada [this message]
2021-09-03 20:41           ` Jens Axboe
2021-09-03 20:57             ` Jens Axboe
2021-09-03 22:38               ` Srinivas Pandruvada

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d6bf08cbfd9f29ddb8cf29f522d68efc5c676624.camel@linux.intel.com \
    --to=srinivas.pandruvada@linux.intel.com \
    --cc=axboe@kernel.dk \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rafael.j.wysocki@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).