linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Bug: d0e936adbd22 crashes at boot
@ 2021-09-03 13:36 Jens Axboe
  2021-09-03 14:13 ` Srinivas Pandruvada
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2021-09-03 13:36 UTC (permalink / raw)
  To: LKML, Srinivas Pandruvada, Rafael J. Wysocki, Len Brown, inux-pm

Hi,

Booting Linus's tree causes a crash on my laptop, an x1 gen9. This was a bit
difficult to pin down as it crashes before the display is up, but I managed
to narrow it down to:

commit d0e936adbd2250cb03f2e840c6651d18edc22ace
Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Date:   Thu Aug 19 19:40:06 2021 -0700

    cpufreq: intel_pstate: Process HWP Guaranteed change notification

which crashes with a NULL pointer deref in notify_hwp_interrupt() ->
queue_delayed_work_on().

Reverting this change makes the laptop boot fine again.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug: d0e936adbd22 crashes at boot
  2021-09-03 13:36 Bug: d0e936adbd22 crashes at boot Jens Axboe
@ 2021-09-03 14:13 ` Srinivas Pandruvada
  2021-09-03 14:15   ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Srinivas Pandruvada @ 2021-09-03 14:13 UTC (permalink / raw)
  To: Jens Axboe, LKML, Rafael J. Wysocki, Len Brown, inux-pm

Hi Axboe,

Thanks for reporting.
On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
> Hi,
> 
> Booting Linus's tree causes a crash on my laptop, an x1 gen9. This was
> a bit
> difficult to pin down as it crashes before the display is up, but I
> managed
> to narrow it down to:
> 
> commit d0e936adbd2250cb03f2e840c6651d18edc22ace
> Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Date:   Thu Aug 19 19:40:06 2021 -0700
> 
>     cpufreq: intel_pstate: Process HWP Guaranteed change notification
> 
> which crashes with a NULL pointer deref in notify_hwp_interrupt() ->
> queue_delayed_work_on().
> 
> Reverting this change makes the laptop boot fine again.
> 
Does this change fixes your issue?

diff --git a/drivers/cpufreq/intel_pstate.c
b/drivers/cpufreq/intel_pstate.c
index b4ffe6c8a0d0..6a3c6f60ad12 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1650,7 +1650,10 @@ void notify_hwp_interrupt(void)
                return;
 
        cpudata = all_cpu_data[this_cpu];
-       schedule_delayed_work_on(this_cpu, &cpudata->hwp_notify_work,
msecs_to_jiffies(10));
+       if (cpudata)
+               schedule_delayed_work_on(this_cpu, &cpudata-
>hwp_notify_work, msecs_to_jiffies(10));
+       else
+               wrmsrl(MSR_HWP_STATUS, 0);
 }
 
 static void intel_pstate_enable_hwp_interrupt(struct cpudata *cpudata)


Thanks,
Srinivas


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug: d0e936adbd22 crashes at boot
  2021-09-03 14:13 ` Srinivas Pandruvada
@ 2021-09-03 14:15   ` Jens Axboe
  2021-09-03 14:38     ` Srinivas Pandruvada
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2021-09-03 14:15 UTC (permalink / raw)
  To: Srinivas Pandruvada, LKML, Rafael J. Wysocki, Len Brown, inux-pm

On 9/3/21 8:13 AM, Srinivas Pandruvada wrote:
> Hi Axboe,
> 
> Thanks for reporting.
> On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
>> Hi,
>>
>> Booting Linus's tree causes a crash on my laptop, an x1 gen9. This was
>> a bit
>> difficult to pin down as it crashes before the display is up, but I
>> managed
>> to narrow it down to:
>>
>> commit d0e936adbd2250cb03f2e840c6651d18edc22ace
>> Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>> Date:   Thu Aug 19 19:40:06 2021 -0700
>>
>>     cpufreq: intel_pstate: Process HWP Guaranteed change notification
>>
>> which crashes with a NULL pointer deref in notify_hwp_interrupt() ->
>> queue_delayed_work_on().
>>
>> Reverting this change makes the laptop boot fine again.
>>
> Does this change fixes your issue?

I would assume so, as it's crashing on cpudata == NULL :-)

But why is it NULL? Happy to test patches, but the below doesn't look like
a real fix and more of a work-around.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug: d0e936adbd22 crashes at boot
  2021-09-03 14:15   ` Jens Axboe
@ 2021-09-03 14:38     ` Srinivas Pandruvada
  2021-09-03 15:00       ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Srinivas Pandruvada @ 2021-09-03 14:38 UTC (permalink / raw)
  To: Jens Axboe, LKML, Rafael J. Wysocki, Len Brown, inux-pm

On Fri, 2021-09-03 at 08:15 -0600, Jens Axboe wrote:
> On 9/3/21 8:13 AM, Srinivas Pandruvada wrote:
> > Hi Axboe,
> > 
> > Thanks for reporting.
> > On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
> > > Hi,
> > > 
> > > Booting Linus's tree causes a crash on my laptop, an x1 gen9. This
> > > was
> > > a bit
> > > difficult to pin down as it crashes before the display is up, but I
> > > managed
> > > to narrow it down to:
> > > 
> > > commit d0e936adbd2250cb03f2e840c6651d18edc22ace
> > > Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > > Date:   Thu Aug 19 19:40:06 2021 -0700
> > > 
> > >     cpufreq: intel_pstate: Process HWP Guaranteed change
> > > notification
> > > 
> > > which crashes with a NULL pointer deref in notify_hwp_interrupt() -
> > > >
> > > queue_delayed_work_on().
> > > 
> > > Reverting this change makes the laptop boot fine again.
> > > 
> > Does this change fixes your issue?
> 
> I would assume so, as it's crashing on cpudata == NULL :-)
> 
> But why is it NULL? Happy to test patches, but the below doesn't look
> like
> a real fix and more of a work-around.

This platform is sending an HWP interrupt on a CPU which we didn't yet
bring it up for pstate control. So somehow firmware decided to send
very early during boot, which previously we would have ignored it

Actually try this, with more prevention

diff --git a/drivers/cpufreq/intel_pstate.c
b/drivers/cpufreq/intel_pstate.c
index b4ffe6c8a0d0..6ee88d7640ea 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1645,12 +1645,24 @@ void notify_hwp_interrupt(void)
        if (!hwp_active || !boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
                return;
 
-       rdmsrl(MSR_HWP_STATUS, value);
+       rdmsrl_safe(MSR_HWP_STATUS, &value);
        if (!(value & 0x01))
                return;
 
+       /*
+        * After hwp_active is set and all_cpu_data is allocated, there
+        * is small window.
+        */
+       if (!all_cpu_data) {
+               wrmsrl_safe(MSR_HWP_STATUS, 0);
+               return;
+       }
+
        cpudata = all_cpu_data[this_cpu];
-       schedule_delayed_work_on(this_cpu, &cpudata->hwp_notify_work,
msecs_to_jiffies(10));
+       if (cpudata)
+               schedule_delayed_work_on(this_cpu, &cpudata-
>hwp_notify_work, msecs_to_jiffies(10));
+       else
+               wrmsrl_safe(MSR_HWP_STATUS, 0);
 }
 


> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug: d0e936adbd22 crashes at boot
  2021-09-03 14:38     ` Srinivas Pandruvada
@ 2021-09-03 15:00       ` Jens Axboe
  2021-09-03 15:11         ` Srinivas Pandruvada
  2021-09-03 18:00         ` Srinivas Pandruvada
  0 siblings, 2 replies; 10+ messages in thread
From: Jens Axboe @ 2021-09-03 15:00 UTC (permalink / raw)
  To: Srinivas Pandruvada, LKML, Rafael J. Wysocki, Len Brown, inux-pm

On 9/3/21 8:38 AM, Srinivas Pandruvada wrote:
> On Fri, 2021-09-03 at 08:15 -0600, Jens Axboe wrote:
>> On 9/3/21 8:13 AM, Srinivas Pandruvada wrote:
>>> Hi Axboe,
>>>
>>> Thanks for reporting.
>>> On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
>>>> Hi,
>>>>
>>>> Booting Linus's tree causes a crash on my laptop, an x1 gen9. This
>>>> was
>>>> a bit
>>>> difficult to pin down as it crashes before the display is up, but I
>>>> managed
>>>> to narrow it down to:
>>>>
>>>> commit d0e936adbd2250cb03f2e840c6651d18edc22ace
>>>> Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>>>> Date:   Thu Aug 19 19:40:06 2021 -0700
>>>>
>>>>     cpufreq: intel_pstate: Process HWP Guaranteed change
>>>> notification
>>>>
>>>> which crashes with a NULL pointer deref in notify_hwp_interrupt() -
>>>>>
>>>> queue_delayed_work_on().
>>>>
>>>> Reverting this change makes the laptop boot fine again.
>>>>
>>> Does this change fixes your issue?
>>
>> I would assume so, as it's crashing on cpudata == NULL :-)
>>
>> But why is it NULL? Happy to test patches, but the below doesn't look
>> like
>> a real fix and more of a work-around.
> 
> This platform is sending an HWP interrupt on a CPU which we didn't yet
> bring it up for pstate control. So somehow firmware decided to send
> very early during boot, which previously we would have ignored it
> 
> Actually try this, with more prevention

I can give this a whirl.

> diff --git a/drivers/cpufreq/intel_pstate.c
> b/drivers/cpufreq/intel_pstate.c
> index b4ffe6c8a0d0..6ee88d7640ea 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -1645,12 +1645,24 @@ void notify_hwp_interrupt(void)
>         if (!hwp_active || !boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
>                 return;
>  
> -       rdmsrl(MSR_HWP_STATUS, value);
> +       rdmsrl_safe(MSR_HWP_STATUS, &value);
>         if (!(value & 0x01))
>                 return;
>  
> +       /*
> +        * After hwp_active is set and all_cpu_data is allocated, there
> +        * is small window.
> +        */
> +       if (!all_cpu_data) {
> +               wrmsrl_safe(MSR_HWP_STATUS, 0);
> +               return;
> +       }

What synchronizes the all_cpu_data setup and the interrupt? Can the
interrupt come in while it's still being setup?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug: d0e936adbd22 crashes at boot
  2021-09-03 15:00       ` Jens Axboe
@ 2021-09-03 15:11         ` Srinivas Pandruvada
  2021-09-03 18:00         ` Srinivas Pandruvada
  1 sibling, 0 replies; 10+ messages in thread
From: Srinivas Pandruvada @ 2021-09-03 15:11 UTC (permalink / raw)
  To: Jens Axboe, LKML, Rafael J. Wysocki, Len Brown, inux-pm

On Fri, 2021-09-03 at 09:00 -0600, Jens Axboe wrote:
> On 9/3/21 8:38 AM, Srinivas Pandruvada wrote:
> > On Fri, 2021-09-03 at 08:15 -0600, Jens Axboe wrote:
> > > On 9/3/21 8:13 AM, Srinivas Pandruvada wrote:
> > > > Hi Axboe,
> > > > 
> > > > Thanks for reporting.
> > > > On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
> > > > > Hi,
> > > > > 
> > > > > Booting Linus's tree causes a crash on my laptop, an x1 gen9.
> > > > > This
> > > > > was
> > > > > a bit
> > > > > difficult to pin down as it crashes before the display is up,
> > > > > but I
> > > > > managed
> > > > > to narrow it down to:
> > > > > 
> > > > > commit d0e936adbd2250cb03f2e840c6651d18edc22ace
> > > > > Author: Srinivas Pandruvada < 
> > > > > srinivas.pandruvada@linux.intel.com>
> > > > > Date:   Thu Aug 19 19:40:06 2021 -0700
> > > > > 
> > > > >     cpufreq: intel_pstate: Process HWP Guaranteed change
> > > > > notification
> > > > > 
> > > > > which crashes with a NULL pointer deref in
> > > > > notify_hwp_interrupt() -
> > > > > > 
> > > > > queue_delayed_work_on().
> > > > > 
> > > > > Reverting this change makes the laptop boot fine again.
> > > > > 
> > > > Does this change fixes your issue?
> > > 
> > > I would assume so, as it's crashing on cpudata == NULL :-)
> > > 
> > > But why is it NULL? Happy to test patches, but the below doesn't
> > > look
> > > like
> > > a real fix and more of a work-around.
> > 
> > This platform is sending an HWP interrupt on a CPU which we didn't
> > yet
> > bring it up for pstate control. So somehow firmware decided to send
> > very early during boot, which previously we would have ignored it
> > 
> > Actually try this, with more prevention
> 
> I can give this a whirl.
> 
> > diff --git a/drivers/cpufreq/intel_pstate.c
> > b/drivers/cpufreq/intel_pstate.c
> > index b4ffe6c8a0d0..6ee88d7640ea 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -1645,12 +1645,24 @@ void notify_hwp_interrupt(void)
> >         if (!hwp_active || !boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
> >                 return;
> >  
> > -       rdmsrl(MSR_HWP_STATUS, value);
> > +       rdmsrl_safe(MSR_HWP_STATUS, &value);
> >         if (!(value & 0x01))
> >                 return;
> >  
> > +       /*
> > +        * After hwp_active is set and all_cpu_data is allocated,
> > there
> > +        * is small window.
> > +        */
> > +       if (!all_cpu_data) {
> > +               wrmsrl_safe(MSR_HWP_STATUS, 0);
> > +               return;
> > +       }
> 
> What synchronizes the all_cpu_data setup and the interrupt? Can the
> interrupt come in while it's still being setup?
Yes.
I am working on a change to simulate that case and fix.

Thanks,
Srinivas
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug: d0e936adbd22 crashes at boot
  2021-09-03 15:00       ` Jens Axboe
  2021-09-03 15:11         ` Srinivas Pandruvada
@ 2021-09-03 18:00         ` Srinivas Pandruvada
  2021-09-03 20:41           ` Jens Axboe
  1 sibling, 1 reply; 10+ messages in thread
From: Srinivas Pandruvada @ 2021-09-03 18:00 UTC (permalink / raw)
  To: Jens Axboe, LKML, Rafael J. Wysocki, Len Brown, linux-pm

[-- Attachment #1: Type: text/plain, Size: 2834 bytes --]

Hi Axobe,

On Fri, 2021-09-03 at 09:00 -0600, Jens Axboe wrote:
> On 9/3/21 8:38 AM, Srinivas Pandruvada wrote:
> > On Fri, 2021-09-03 at 08:15 -0600, Jens Axboe wrote:
> > > On 9/3/21 8:13 AM, Srinivas Pandruvada wrote:
> > > > Hi Axboe,
> > > > 
> > > > Thanks for reporting.
> > > > On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
> > > > > Hi,
> > > > > 
> > > > > Booting Linus's tree causes a crash on my laptop, an x1 gen9.
> > > > > This
> > > > > was
> > > > > a bit
> > > > > difficult to pin down as it crashes before the display is up,
> > > > > but I
> > > > > managed
> > > > > to narrow it down to:
> > > > > 
> > > > > commit d0e936adbd2250cb03f2e840c6651d18edc22ace
> > > > > Author: Srinivas Pandruvada < 
> > > > > srinivas.pandruvada@linux.intel.com>
> > > > > Date:   Thu Aug 19 19:40:06 2021 -0700
> > > > > 
> > > > >     cpufreq: intel_pstate: Process HWP Guaranteed change
> > > > > notification
> > > > > 
> > > > > which crashes with a NULL pointer deref in
> > > > > notify_hwp_interrupt() -
> > > > > > 
> > > > > queue_delayed_work_on().
> > > > > 
> > > > > Reverting this change makes the laptop boot fine again.
> > > > > 
> > > > Does this change fixes your issue?
> > > 
> > > I would assume so, as it's crashing on cpudata == NULL :-)
> > > 
> > > But why is it NULL? Happy to test patches, but the below doesn't
> > > look
> > > like
> > > a real fix and more of a work-around.
> > 

Please try the attached.

Thanks,
Srinivas

> > This platform is sending an HWP interrupt on a CPU which we didn't
> > yet
> > bring it up for pstate control. So somehow firmware decided to send
> > very early during boot, which previously we would have ignored it
> > 
> > Actually try this, with more prevention
> 
> I can give this a whirl.
> 
> > diff --git a/drivers/cpufreq/intel_pstate.c
> > b/drivers/cpufreq/intel_pstate.c
> > index b4ffe6c8a0d0..6ee88d7640ea 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -1645,12 +1645,24 @@ void notify_hwp_interrupt(void)
> >         if (!hwp_active || !boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
> >                 return;
> >  
> > -       rdmsrl(MSR_HWP_STATUS, value);
> > +       rdmsrl_safe(MSR_HWP_STATUS, &value);
> >         if (!(value & 0x01))
> >                 return;
> >  
> > +       /*
> > +        * After hwp_active is set and all_cpu_data is allocated,
> > there
> > +        * is small window.
> > +        */
> > +       if (!all_cpu_data) {
> > +               wrmsrl_safe(MSR_HWP_STATUS, 0);
> > +               return;
> > +       }
> 
> What synchronizes the all_cpu_data setup and the interrupt? Can the
> interrupt come in while it's still being setup?
> 


[-- Attachment #2: 0001-cpufreq-intel_pstate-Fix-for-HWP-interrupt-before-dr.patch --]
[-- Type: text/x-patch, Size: 3055 bytes --]

From 380d5a340ebeb172c93a878fd84a12e7bfea9cff Mon Sep 17 00:00:00 2001
From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Date: Fri, 3 Sep 2021 10:41:35 -0700
Subject: [TEST PATCH] cpufreq: intel_pstate: Fix for HWP interrupt before
 driver is ready

In x1 gen9 laptop, one HWP interrupt arrives before driver is ready
to handle on that CPU. Here firmware is enabling and sending an
interrupt for guarantee change. Since driver didn't have cpudata
initialized it will cause NULL pointer when trying to schedule
processing of interrupt in a workwqueue.

To avoid this set a cpumask of CPUs for which driver has initialized
interrupts. If not initialized simply clear the HWP status.

Since the same thing may happen during S3 resume, clear the cpumask
during offline and let it recreate it during online.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/cpufreq/intel_pstate.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index b4ffe6c8a0d0..5ac86bfa1080 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -298,6 +298,8 @@ static bool hwp_boost __read_mostly;
 
 static struct cpufreq_driver *intel_pstate_driver __read_mostly;
 
+static cpumask_t hwp_intr_enable_mask;
+
 #ifdef CONFIG_ACPI
 static bool acpi_ppc;
 #endif
@@ -1067,11 +1069,15 @@ static void intel_pstate_hwp_set(unsigned int cpu)
 	wrmsrl_on_cpu(cpu, MSR_HWP_REQUEST, value);
 }
 
+static void intel_pstate_disable_hwp_interrupt(struct cpudata *cpudata);
+
 static void intel_pstate_hwp_offline(struct cpudata *cpu)
 {
 	u64 value = READ_ONCE(cpu->hwp_req_cached);
 	int min_perf;
 
+	intel_pstate_disable_hwp_interrupt(cpu);
+
 	if (boot_cpu_has(X86_FEATURE_HWP_EPP)) {
 		/*
 		 * In case the EPP has been set to "performance" by the
@@ -1645,20 +1651,35 @@ void notify_hwp_interrupt(void)
 	if (!hwp_active || !boot_cpu_has(X86_FEATURE_HWP_NOTIFY))
 		return;
 
-	rdmsrl(MSR_HWP_STATUS, value);
+	rdmsrl_safe(MSR_HWP_STATUS, &value);
 	if (!(value & 0x01))
 		return;
 
+	if (!cpumask_test_cpu(this_cpu, &hwp_intr_enable_mask)) {
+		wrmsrl_safe(MSR_HWP_STATUS, 0);
+		return;
+	}
+
 	cpudata = all_cpu_data[this_cpu];
 	schedule_delayed_work_on(this_cpu, &cpudata->hwp_notify_work, msecs_to_jiffies(10));
 }
 
+static void intel_pstate_disable_hwp_interrupt(struct cpudata *cpudata)
+{
+
+	if (cpumask_test_and_clear_cpu(cpudata->cpu, &hwp_intr_enable_mask)) {
+		wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x00);
+		cancel_delayed_work_sync(&cpudata->hwp_notify_work);
+	}
+}
+
 static void intel_pstate_enable_hwp_interrupt(struct cpudata *cpudata)
 {
 	/* Enable HWP notification interrupt for guaranteed performance change */
 	if (boot_cpu_has(X86_FEATURE_HWP_NOTIFY)) {
 		INIT_DELAYED_WORK(&cpudata->hwp_notify_work, intel_pstate_notify_work);
 		wrmsrl_on_cpu(cpudata->cpu, MSR_HWP_INTERRUPT, 0x01);
+		cpumask_set_cpu(cpudata->cpu, &hwp_intr_enable_mask);
 	}
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug: d0e936adbd22 crashes at boot
  2021-09-03 18:00         ` Srinivas Pandruvada
@ 2021-09-03 20:41           ` Jens Axboe
  2021-09-03 20:57             ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2021-09-03 20:41 UTC (permalink / raw)
  To: Srinivas Pandruvada, LKML, Rafael J. Wysocki, Len Brown, linux-pm

On 9/3/21 12:00 PM, Srinivas Pandruvada wrote:
> Hi Axobe,
> 
> On Fri, 2021-09-03 at 09:00 -0600, Jens Axboe wrote:
>> On 9/3/21 8:38 AM, Srinivas Pandruvada wrote:
>>> On Fri, 2021-09-03 at 08:15 -0600, Jens Axboe wrote:
>>>> On 9/3/21 8:13 AM, Srinivas Pandruvada wrote:
>>>>> Hi Axboe,
>>>>>
>>>>> Thanks for reporting.
>>>>> On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Booting Linus's tree causes a crash on my laptop, an x1 gen9.
>>>>>> This
>>>>>> was
>>>>>> a bit
>>>>>> difficult to pin down as it crashes before the display is up,
>>>>>> but I
>>>>>> managed
>>>>>> to narrow it down to:
>>>>>>
>>>>>> commit d0e936adbd2250cb03f2e840c6651d18edc22ace
>>>>>> Author: Srinivas Pandruvada < 
>>>>>> srinivas.pandruvada@linux.intel.com>
>>>>>> Date:   Thu Aug 19 19:40:06 2021 -0700
>>>>>>
>>>>>>     cpufreq: intel_pstate: Process HWP Guaranteed change
>>>>>> notification
>>>>>>
>>>>>> which crashes with a NULL pointer deref in
>>>>>> notify_hwp_interrupt() -
>>>>>>>
>>>>>> queue_delayed_work_on().
>>>>>>
>>>>>> Reverting this change makes the laptop boot fine again.
>>>>>>
>>>>> Does this change fixes your issue?
>>>>
>>>> I would assume so, as it's crashing on cpudata == NULL :-)
>>>>
>>>> But why is it NULL? Happy to test patches, but the below doesn't
>>>> look
>>>> like
>>>> a real fix and more of a work-around.
>>>
> 
> Please try the attached.

I'll give it a test spin right now. Please do add a Reported-by tag,
though. That's always prudent.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug: d0e936adbd22 crashes at boot
  2021-09-03 20:41           ` Jens Axboe
@ 2021-09-03 20:57             ` Jens Axboe
  2021-09-03 22:38               ` Srinivas Pandruvada
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2021-09-03 20:57 UTC (permalink / raw)
  To: Srinivas Pandruvada, LKML, Rafael J. Wysocki, Len Brown, linux-pm

On 9/3/21 2:41 PM, Jens Axboe wrote:
> On 9/3/21 12:00 PM, Srinivas Pandruvada wrote:
>> Hi Axobe,
>>
>> On Fri, 2021-09-03 at 09:00 -0600, Jens Axboe wrote:
>>> On 9/3/21 8:38 AM, Srinivas Pandruvada wrote:
>>>> On Fri, 2021-09-03 at 08:15 -0600, Jens Axboe wrote:
>>>>> On 9/3/21 8:13 AM, Srinivas Pandruvada wrote:
>>>>>> Hi Axboe,
>>>>>>
>>>>>> Thanks for reporting.
>>>>>> On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Booting Linus's tree causes a crash on my laptop, an x1 gen9.
>>>>>>> This
>>>>>>> was
>>>>>>> a bit
>>>>>>> difficult to pin down as it crashes before the display is up,
>>>>>>> but I
>>>>>>> managed
>>>>>>> to narrow it down to:
>>>>>>>
>>>>>>> commit d0e936adbd2250cb03f2e840c6651d18edc22ace
>>>>>>> Author: Srinivas Pandruvada < 
>>>>>>> srinivas.pandruvada@linux.intel.com>
>>>>>>> Date:   Thu Aug 19 19:40:06 2021 -0700
>>>>>>>
>>>>>>>     cpufreq: intel_pstate: Process HWP Guaranteed change
>>>>>>> notification
>>>>>>>
>>>>>>> which crashes with a NULL pointer deref in
>>>>>>> notify_hwp_interrupt() -
>>>>>>>>
>>>>>>> queue_delayed_work_on().
>>>>>>>
>>>>>>> Reverting this change makes the laptop boot fine again.
>>>>>>>
>>>>>> Does this change fixes your issue?
>>>>>
>>>>> I would assume so, as it's crashing on cpudata == NULL :-)
>>>>>
>>>>> But why is it NULL? Happy to test patches, but the below doesn't
>>>>> look
>>>>> like
>>>>> a real fix and more of a work-around.
>>>>
>>
>> Please try the attached.
> 
> I'll give it a test spin right now. Please do add a Reported-by tag,
> though. That's always prudent.

And you can add Tested-by as well, it works for me.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bug: d0e936adbd22 crashes at boot
  2021-09-03 20:57             ` Jens Axboe
@ 2021-09-03 22:38               ` Srinivas Pandruvada
  0 siblings, 0 replies; 10+ messages in thread
From: Srinivas Pandruvada @ 2021-09-03 22:38 UTC (permalink / raw)
  To: Jens Axboe, LKML, Rafael J. Wysocki, Len Brown, linux-pm

On Fri, 2021-09-03 at 14:57 -0600, Jens Axboe wrote:
> On 9/3/21 2:41 PM, Jens Axboe wrote:
> > On 9/3/21 12:00 PM, Srinivas Pandruvada wrote:
> > > Hi Axobe,
> > > 
> > > On Fri, 2021-09-03 at 09:00 -0600, Jens Axboe wrote:
> > > > On 9/3/21 8:38 AM, Srinivas Pandruvada wrote:
> > > > > On Fri, 2021-09-03 at 08:15 -0600, Jens Axboe wrote:
> > > > > > On 9/3/21 8:13 AM, Srinivas Pandruvada wrote:
> > > > > > > Hi Axboe,
> > > > > > > 
> > > > > > > Thanks for reporting.
> > > > > > > On Fri, 2021-09-03 at 07:36 -0600, Jens Axboe wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > Booting Linus's tree causes a crash on my laptop, an x1
> > > > > > > > gen9.
> > > > > > > > This
> > > > > > > > was
> > > > > > > > a bit
> > > > > > > > difficult to pin down as it crashes before the display
> > > > > > > > is up,
> > > > > > > > but I
> > > > > > > > managed
> > > > > > > > to narrow it down to:
> > > > > > > > 
> > > > > > > > commit d0e936adbd2250cb03f2e840c6651d18edc22ace
> > > > > > > > Author: Srinivas Pandruvada < 
> > > > > > > > srinivas.pandruvada@linux.intel.com>
> > > > > > > > Date:   Thu Aug 19 19:40:06 2021 -0700
> > > > > > > > 
> > > > > > > >     cpufreq: intel_pstate: Process HWP Guaranteed
> > > > > > > > change
> > > > > > > > notification
> > > > > > > > 
> > > > > > > > which crashes with a NULL pointer deref in
> > > > > > > > notify_hwp_interrupt() -
> > > > > > > > > 
> > > > > > > > queue_delayed_work_on().
> > > > > > > > 
> > > > > > > > Reverting this change makes the laptop boot fine again.
> > > > > > > > 
> > > > > > > Does this change fixes your issue?
> > > > > > 
> > > > > > I would assume so, as it's crashing on cpudata == NULL :-)
> > > > > > 
> > > > > > But why is it NULL? Happy to test patches, but the below
> > > > > > doesn't
> > > > > > look
> > > > > > like
> > > > > > a real fix and more of a work-around.
> > > > > 
> > > 
> > > Please try the attached.
> > 
> > I'll give it a test spin right now. Please do add a Reported-by
> > tag,
> > though. That's always prudent.
> 
> And you can add Tested-by as well, it works for me.

Thanks. I will add both.

-Srinivas
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-09-03 22:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-03 13:36 Bug: d0e936adbd22 crashes at boot Jens Axboe
2021-09-03 14:13 ` Srinivas Pandruvada
2021-09-03 14:15   ` Jens Axboe
2021-09-03 14:38     ` Srinivas Pandruvada
2021-09-03 15:00       ` Jens Axboe
2021-09-03 15:11         ` Srinivas Pandruvada
2021-09-03 18:00         ` Srinivas Pandruvada
2021-09-03 20:41           ` Jens Axboe
2021-09-03 20:57             ` Jens Axboe
2021-09-03 22:38               ` Srinivas Pandruvada

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).