* [PATCH] hwmon: (coretemp) Handle frozen hotplug state correctly
@ 2017-05-10 14:30 Thomas Gleixner
  2017-05-10 19:16 ` Tommi Rantala
  2017-05-10 20:09 ` Guenter Roeck
  0 siblings, 2 replies; 5+ messages in thread
From: Thomas Gleixner @ 2017-05-10 14:30 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

The recent conversion to the hotplug state machine missed that the original
hotplug notifiers did not execute in the frozen state, which is used on
suspend and resume.

This does not matter on single-socket machines, but on multi-socket systems
it breaks: when the last CPU of a non-boot socket is brought offline, the
device for that socket is removed, and that removal locks up the machine
hard without any debug output.

Prevent executing the hotplug callbacks when cpuhp_tasks_frozen is true.

Thanks to Tommi for providing debug information patiently while I failed to
spot the obvious.

Fixes: e00ca5df37ad ("hwmon: (coretemp) Convert to hotplug state machine")
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 drivers/hwmon/coretemp.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -605,6 +605,13 @@ static int coretemp_cpu_online(unsigned
 	struct platform_data *pdata;
 
 	/*
+	 * Don't execute this on resume as the offline callback did
+	 * not get executed on suspend.
+	 */
+	if (cpuhp_tasks_frozen)
+		return 0;
+
+	/*
 	 * CPUID.06H.EAX[0] indicates whether the CPU has thermal
 	 * sensors. We check this bit only, all the early CPUs
 	 * without thermal sensors will be filtered out.
@@ -654,6 +661,13 @@ static int coretemp_cpu_offline(unsigned
 	struct temp_data *tdata;
 	int indx, target;
 
+	/*
+	 * Don't execute this on suspend as the device remove locks
+	 * up the machine.
+	 */
+	if (cpuhp_tasks_frozen)
+		return 0;
+
 	/* If the physical CPU device does not exist, just return */
 	if (!pdev)
 		return 0;

* Re: [PATCH] hwmon: (coretemp) Handle frozen hotplug state correctly
  2017-05-10 14:30 [PATCH] hwmon: (coretemp) Handle frozen hotplug state correctly Thomas Gleixner
@ 2017-05-10 19:16 ` Tommi Rantala
  2017-05-10 20:09   ` Guenter Roeck
  2017-05-10 20:09 ` Guenter Roeck
  1 sibling, 1 reply; 5+ messages in thread
From: Tommi Rantala @ 2017-05-10 19:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

2017-05-10 17:30 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> [...]

Many thanks, I can confirm that it works well!

-Tommi

* Re: [PATCH] hwmon: (coretemp) Handle frozen hotplug state correctly
  2017-05-10 14:30 [PATCH] hwmon: (coretemp) Handle frozen hotplug state correctly Thomas Gleixner
  2017-05-10 19:16 ` Tommi Rantala
@ 2017-05-10 20:09 ` Guenter Roeck
  1 sibling, 0 replies; 5+ messages in thread
From: Guenter Roeck @ 2017-05-10 20:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tommi Rantala, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

On Wed, May 10, 2017 at 04:30:12PM +0200, Thomas Gleixner wrote:
> [...]

Applied, and thanks a lot for fixing the problem!

Guenter

* Re: [PATCH] hwmon: (coretemp) Handle frozen hotplug state correctly
  2017-05-10 19:16 ` Tommi Rantala
@ 2017-05-10 20:09   ` Guenter Roeck
  2017-05-11  5:57     ` Tommi Rantala
  0 siblings, 1 reply; 5+ messages in thread
From: Guenter Roeck @ 2017-05-10 20:09 UTC (permalink / raw)
  To: Tommi Rantala
  Cc: Thomas Gleixner, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

On Wed, May 10, 2017 at 10:16:33PM +0300, Tommi Rantala wrote:
> 2017-05-10 17:30 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
> > [...]
> 
> Many thanks, I can confirm that it works well!
> 
Ok if I add your Tested-by: ?

Thanks,
Guenter

* Re: [PATCH] hwmon: (coretemp) Handle frozen hotplug state correctly
  2017-05-10 20:09   ` Guenter Roeck
@ 2017-05-11  5:57     ` Tommi Rantala
  0 siblings, 0 replies; 5+ messages in thread
From: Tommi Rantala @ 2017-05-11  5:57 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Thomas Gleixner, LKML, Fenghua Yu, Jean Delvare, linux-hwmon,
	Sebastian Siewior, Peter Zijlstra, x86

2017-05-10 23:09 GMT+03:00 Guenter Roeck <linux@roeck-us.net>:
> On Wed, May 10, 2017 at 10:16:33PM +0300, Tommi Rantala wrote:
>> 2017-05-10 17:30 GMT+03:00 Thomas Gleixner <tglx@linutronix.de>:
>> > [...]
>>
>> Many thanks, I can confirm that it works well!
>>
> Ok if I add your Tested-by: ?

Sure!

Tested-by: Tommi Rantala <tt.rantala@gmail.com>
