[PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
@ 2016-12-19 18:19 Jochen Hein
  2016-12-19 19:43 ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread
From: Jochen Hein @ 2016-12-19 18:19 UTC (permalink / raw)
  To: Len Brown; +Cc: Vincent Gerris, linux-pm


There are frequent hangs on Baytrail CPUs according to
https://bugzilla.kernel.org/show_bug.cgi?id=109051.
This patch works around the errata by disabling C6.

According to the discussion in the bug and private mail it might be
useful to include the workaround in mainline.

Signed-off-by: Jochen Hein <jochen@jochen.org>
Cc: stable@vger.kernel.org

--- linux-4.7.5/drivers/idle/intel_idle.c.orig	2016-09-24 10:10:18.000000000 +0200
+++ linux-4.7.5/drivers/idle/intel_idle.c	2016-10-16 07:36:51.142862573 +0200
@@ -1210,6 +1210,34 @@
 
 }
 /*
+ * byt_idle_state_table_update(void)
+ *
+ * On BYT, we have errata VLP52 and disable C6.
+ * https://bugzilla.kernel.org/show_bug.cgi?id=109051A
+ * http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-n3520-j2850-celeron-n2920-n2820-n2815-n2806-j1850-j1750-spec-update.pdf
+ * VLP52 EOI Transactions May Not be Sent if Software Enters Core C6 During an Interrupt Service Routine.
+
+Problem:
+If core C6 is entered after the start of an interrupt service routine but before a write
+to the APIC EOI (End of Interrupt) register, and the core is woken up by an event
+other than a fixed interrupt source the core may drop the EOI transaction the next
+time APIC EOI register is written and further interrupts from the same or lower
+priority level will be blocked.
+
+Implication:
+EOI transactions may be lost and interrupts may be blocked when core C6 is used
+during interrupt service routines.
+
+Workaround:
+It is possible for the firmware to contain a workaround for this erratum.
+ */
+static void byt_idle_state_table_update(void)
+{
+	printk(PREFIX "byt_idle_state_table_update reached\n");
+	byt_cstates[1].disabled = 1;	/* C6N-BYT */
+	byt_cstates[2].disabled = 1;	/* C6S-BYT */
+}
+/*
  * sklh_idle_state_table_update(void)
  *
  * On SKL-H (model 0x5e) disable C8 and C9 if:
@@ -1264,6 +1292,10 @@
 	case 0x3e: /* IVT */
 		ivt_idle_state_table_update();
 		break;
+	case 0x37: /* BYT */
+		printk(PREFIX "intel_idle_state_table_update BYT 0x37 reached\n");
+		byt_idle_state_table_update();
+		break;
 	case 0x5c: /* BXT */
 		bxt_idle_state_table_update();
 		break;

-- 
The only problem with troubleshooting is that the trouble shoots back.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
  2016-12-19 18:19 [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs Jochen Hein
@ 2016-12-19 19:43 ` Rafael J. Wysocki
       [not found]   ` <CA+8K-g=fXN9TWWxkV1AEEjXJLhwJEQ2PB434gKQ9Z7bL=yf+zA@mail.gmail.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2016-12-19 19:43 UTC (permalink / raw)
  To: Jochen Hein; +Cc: Len Brown, Vincent Gerris, Linux PM, Jacob Pan

CC Jacob Pan.

On Mon, Dec 19, 2016 at 7:19 PM, Jochen Hein <jochen@jochen.org> wrote:
>
> There are frequent hangs on Baytrail CPUs according to
> https://bugzilla.kernel.org/show_bug.cgi?id=109051.
> This patch works around the errata by disabling C6.
>
> According to the discussion in the bug and private mail it might be
> useful to include the workaround in mainline.
>
> Signed-off-by: Jochen Hein <jochen@jochen.org>
> Cc: stable@vger.kernel.org
>
> --- linux-4.7.5/drivers/idle/intel_idle.c.orig  2016-09-24 10:10:18.000000000 +0200
> +++ linux-4.7.5/drivers/idle/intel_idle.c       2016-10-16 07:36:51.142862573 +0200
> @@ -1210,6 +1210,34 @@
>
>  }
>  /*
> + * byt_idle_state_table_update(void)
> + *
> + * On BYT, we have errata VLP52 and disable C6.
> + * https://bugzilla.kernel.org/show_bug.cgi?id=109051A
> + * http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-n3520-j2850-celeron-n2920-n2820-n2815-n2806-j1850-j1750-spec-update.pdf
> + * VLP52 EOI Transactions May Not be Sent if Software Enters Core C6 During an Interrupt Service Routine.
> +
> +Problem:
> +If core C6 is entered after the start of an interrupt service routine but before a write
> +to the APIC EOI (End of Interrupt) register, and the core is woken up by an event
> +other than a fixed interrupt source the core may drop the EOI transaction the next
> +time APIC EOI register is written and further interrupts from the same or lower
> +priority level will be blocked.
> +
> +Implication:
> +EOI transactions may be lost and interrupts may be blocked when core C6 is used
> +during interrupt service routines.
> +
> +Workaround:
> +It is possible for the firmware to contain a workaround for this erratum.
> + */
> +static void byt_idle_state_table_update(void)
> +{
> +       printk(PREFIX "byt_idle_state_table_update reached\n");
> +       byt_cstates[1].disabled = 1;    /* C6N-BYT */
> +       byt_cstates[2].disabled = 1;    /* C6S-BYT */
> +}
> +/*
>   * sklh_idle_state_table_update(void)
>   *
>   * On SKL-H (model 0x5e) disable C8 and C9 if:
> @@ -1264,6 +1292,10 @@
>         case 0x3e: /* IVT */
>                 ivt_idle_state_table_update();
>                 break;
> +       case 0x37: /* BYT */
> +               printk(PREFIX "intel_idle_state_table_update BYT 0x37 reached\n");
> +               byt_idle_state_table_update();
> +               break;
>         case 0x5c: /* BXT */
>                 bxt_idle_state_table_update();
>                 break;
>
> --
> The only problem with troubleshooting is that the trouble shoots back.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
       [not found]   ` <CA+8K-g=fXN9TWWxkV1AEEjXJLhwJEQ2PB434gKQ9Z7bL=yf+zA@mail.gmail.com>
@ 2016-12-27 17:24     ` Vincent Gerris
  2016-12-27 20:37       ` Len Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Vincent Gerris @ 2016-12-27 17:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jochen Hein, Len Brown, Linux PM, Jacob Pan, Hans de Goede

Hi,

Apologies, I apparently send some HTML in the email.
Here is another attempt.

Kind regards,
Vincent

---------- Forwarded message ----------
From: "Vincent Gerris" <vgerris@gmail.com>
Date: Dec 27, 2016 11:47
Subject: Re: [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Jochen Hein" <jochen@jochen.org>, "Len Brown" <lenb@kernel.org>,
"Linux PM" <linux-pm@vger.kernel.org>, "Jacob Pan"
<jacob.jun.pan@linux.intel.com>, "Hans de Goede" <hdegoede@redhat.com>


>
> > Hi all,

>
> Thank you Jochen Hein for creating the patch.
> The patch fixed my problem and at least one other person with a N3530 processor.
>
> I attached a patch for 4.8.x (needed some changes to constants) which also can be found here:
> https://bugzilla.kernel.org/attachment.cgi?id=247621
> I also added Hans de Goede in de CC, he supplied patches for the Cherry trail platform.
>
> Is there anything we can do to accelerate adoption in the main kernel?
> Maybe Jacob can confirm it deals with the errata properly?
>
> Thanks everyone,
>
> Kind regards,
> Vincent
>
>
> On Mon, Dec 19, 2016 at 8:43 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:

>

> >>

>> CC Jacob Pan.
>>
>> On Mon, Dec 19, 2016 at 7:19 PM, Jochen Hein <jochen@jochen.org> wrote:
>> >
>> > There are frequent hangs on Baytrail CPUs according to
>> > https://bugzilla.kernel.org/show_bug.cgi?id=109051.
>> > This patch works around the errata by disabling C6.
>> >
>> > According to the discussion in the bug and private mail it might be
>> > useful to include the workaround in mainline.
>> >
>> > Signed-off-by: Jochen Hein <jochen@jochen.org>
>> > Cc: stable@vger.kernel.org
>> >
>> > --- linux-4.7.5/drivers/idle/intel_idle.c.orig  2016-09-24 10:10:18.000000000 +0200
>> > +++ linux-4.7.5/drivers/idle/intel_idle.c       2016-10-16 07:36:51.142862573 +0200
>> > @@ -1210,6 +1210,34 @@
>> >
>> >  }
>> >  /*
>> > + * byt_idle_state_table_update(void)
>> > + *
>> > + * On BYT, we have errata VLP52 and disable C6.
>> > + * https://bugzilla.kernel.org/show_bug.cgi?id=109051A
>> > + * http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-n3520-j2850-celeron-n2920-n2820-n2815-n2806-j1850-j1750-spec-update.pdf
>> > + * VLP52 EOI Transactions May Not be Sent if Software Enters Core C6 During an Interrupt Service Routine.
>> > +
>> > +Problem:
>> > +If core C6 is entered after the start of an interrupt service routine but before a write
>> > +to the APIC EOI (End of Interrupt) register, and the core is woken up by an event
>> > +other than a fixed interrupt source the core may drop the EOI transaction the next
>> > +time APIC EOI register is written and further interrupts from the same or lower
>> > +priority level will be blocked.
>> > +
>> > +Implication:
>> > +EOI transactions may be lost and interrupts may be blocked when core C6 is used
>> > +during interrupt service routines.
>> > +
>> > +Workaround:
>> > +It is possible for the firmware to contain a workaround for this erratum.
>> > + */
>> > +static void byt_idle_state_table_update(void)
>> > +{
>> > +       printk(PREFIX "byt_idle_state_table_update reached\n");
>> > +       byt_cstates[1].disabled = 1;    /* C6N-BYT */
>> > +       byt_cstates[2].disabled = 1;    /* C6S-BYT */
>> > +}
>> > +/*
>> >   * sklh_idle_state_table_update(void)
>> >   *
>> >   * On SKL-H (model 0x5e) disable C8 and C9 if:
>> > @@ -1264,6 +1292,10 @@
>> >         case 0x3e: /* IVT */
>> >                 ivt_idle_state_table_update();
>> >                 break;
>> > +       case 0x37: /* BYT */
>> > +               printk(PREFIX "intel_idle_state_table_update BYT 0x37 reached\n");
>> > +               byt_idle_state_table_update();
>> > +               break;
>> >         case 0x5c: /* BXT */
>> >                 bxt_idle_state_table_update();
>> >                 break;
>> >
>> > --
>> > The only problem with troubleshooting is that the trouble shoots back.
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-pm" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
  2016-12-27 17:24     ` Vincent Gerris
@ 2016-12-27 20:37       ` Len Brown
  2016-12-27 20:44         ` Jochen Hein
  0 siblings, 1 reply; 10+ messages in thread
From: Len Brown @ 2016-12-27 20:37 UTC (permalink / raw)
  To: Vincent Gerris
  Cc: Rafael J. Wysocki, Jochen Hein, Linux PM, Jacob Pan, Hans de Goede

>>> On Mon, Dec 19, 2016 at 7:19 PM, Jochen Hein <jochen@jochen.org> wrote:
>>> >
>>> > There are frequent hangs on Baytrail CPUs according to
>>> > https://bugzilla.kernel.org/show_bug.cgi?id=109051.
>>> > This patch works around the errata by disabling C6.
>>> >
>>> > According to the discussion in the bug and private mail it might be
>>> > useful to include the workaround in mainline.
>>> >
>>> > Signed-off-by: Jochen Hein <jochen@jochen.org>
>>> > Cc: stable@vger.kernel.org
>>> >
>>> > --- linux-4.7.5/drivers/idle/intel_idle.c.orig  2016-09-24 10:10:18.000000000 +0200
>>> > +++ linux-4.7.5/drivers/idle/intel_idle.c       2016-10-16 07:36:51.142862573 +0200
>>> > @@ -1210,6 +1210,34 @@
>>> >
>>> >  }
>>> >  /*
>>> > + * byt_idle_state_table_update(void)
>>> > + *
>>> > + * On BYT, we have errata VLP52 and disable C6.
>>> > + * https://bugzilla.kernel.org/show_bug.cgi?id=109051A
>>> > + * http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-n3520-j2850-celeron-n2920-n2820-n2815-n2806-j1850-j1750-spec-update.pdf
>>> > + * VLP52 EOI Transactions May Not be Sent if Software Enters Core C6 During an Interrupt Service Routine.
>>> > +
>>> > +Problem:
>>> > +If core C6 is entered after the start of an interrupt service routine but before a write
>>> > +to the APIC EOI (End of Interrupt) register, and the core is woken up by an event
>>> > +other than a fixed interrupt source the core may drop the EOI transaction the next
>>> > +time APIC EOI register is written and further interrupts from the same or lower
>>> > +priority level will be blocked.
>>> > +
>>> > +Implication:
>>> > +EOI transactions may be lost and interrupts may be blocked when core C6 is used
>>> > +during interrupt service routines.

Exactly how is it possible for Linux to enter idle and issue an MWAIT
from _within_ an interrupt handler?

>>> > +Workaround:
>>> > +It is possible for the firmware to contain a workaround for this erratum.
>>> > + */
>>> > +static void byt_idle_state_table_update(void)
>>> > +{
>>> > +       printk(PREFIX "byt_idle_state_table_update reached\n");
>>> > +       byt_cstates[1].disabled = 1;    /* C6N-BYT */
>>> > +       byt_cstates[2].disabled = 1;    /* C6S-BYT */
>>> > +}
>>> > +/*
>>> >   * sklh_idle_state_table_update(void)
>>> >   *
>>> >   * On SKL-H (model 0x5e) disable C8 and C9 if:
>>> > @@ -1264,6 +1292,10 @@
>>> >         case 0x3e: /* IVT */
>>> >                 ivt_idle_state_table_update();
>>> >                 break;
>>> > +       case 0x37: /* BYT */
>>> > +               printk(PREFIX "intel_idle_state_table_update BYT 0x37 reached\n");
>>> > +               byt_idle_state_table_update();
>>> > +               break;

If the right strategy were to disable C6 for all of BYT, then the
right implementation
would be to delete those states from byt_cstates[], rather than for a
routine to mark
them as disabled.  Note that a user can not later enable a state that is marked
as disabled here, it is never registered with cpuidle, and thus the effect
is exactly the same as if the entry were never in the table in the first place.

>>> >         case 0x5c: /* BXT */
>>> >                 bxt_idle_state_table_update();
>>> >                 break;
>>> >

thanks,
Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
  2016-12-27 20:37       ` Len Brown
@ 2016-12-27 20:44         ` Jochen Hein
  2016-12-27 21:27           ` Len Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Jochen Hein @ 2016-12-27 20:44 UTC (permalink / raw)
  To: Len Brown
  Cc: Vincent Gerris, Rafael J. Wysocki, Linux PM, Jacob Pan, Hans de Goede


Hi Len,

Len Brown <lenb@kernel.org> writes:

>>>> On Mon, Dec 19, 2016 at 7:19 PM, Jochen Hein <jochen@jochen.org> wrote:
>>>> >
>>>> > There are frequent hangs on Baytrail CPUs according to
>>>> > https://bugzilla.kernel.org/show_bug.cgi?id=109051.
>>>> > This patch works around the errata by disabling C6.

>>>> > +Problem:
>>>> > +If core C6 is entered after the start of an interrupt service routine but before a write
>>>> > +to the APIC EOI (End of Interrupt) register, and the core is woken up by an event
>>>> > +other than a fixed interrupt source the core may drop the EOI transaction the next
>>>> > +time APIC EOI register is written and further interrupts from the same or lower
>>>> > +priority level will be blocked.
>>>> > +
>>>> > +Implication:
>>>> > +EOI transactions may be lost and interrupts may be blocked when core C6 is used
>>>> > +during interrupt service routines.
>
> Exactly how is it possible for Linux to enter idle and issue an MWAIT
> from _within_ an interrupt handler?

I really have no idea - all I can say is that for all Kernels < 4.9 I
had to disable C6 to have a stable system.
4.9 seems stable for me now.

>>>> > +Workaround:
>>>> > +It is possible for the firmware to contain a workaround for this erratum.
>>>> > + */
>>>> > +static void byt_idle_state_table_update(void)
>>>> > +{
>>>> > +       printk(PREFIX "byt_idle_state_table_update reached\n");
>>>> > +       byt_cstates[1].disabled = 1;    /* C6N-BYT */
>>>> > +       byt_cstates[2].disabled = 1;    /* C6S-BYT */
>>>> > +}
>>>> > +/*
>>>> >   * sklh_idle_state_table_update(void)
>>>> >   *
>>>> >   * On SKL-H (model 0x5e) disable C8 and C9 if:
>>>> > @@ -1264,6 +1292,10 @@
>>>> >         case 0x3e: /* IVT */
>>>> >                 ivt_idle_state_table_update();
>>>> >                 break;
>>>> > +       case 0x37: /* BYT */
>>>> > +               printk(PREFIX "intel_idle_state_table_update BYT 0x37 reached\n");
>>>> > +               byt_idle_state_table_update();
>>>> > +               break;
>
> If the right strategy were to disable C6 for all of BYT, then the
> right implementation
> would be to delete those states from byt_cstates[], rather than for a
> routine to mark
> them as disabled.  Note that a user can not later enable a state that is marked
> as disabled here, it is never registered with cpuidle, and thus the effect
> is exactly the same as if the entry were never in the table in the first place.

Would that be a useful workaround for older stable kernels? I think we
should try to get stable systems to the affected users.

Jochen

-- 
The only problem with troubleshooting is that the trouble shoots back.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
  2016-12-27 20:44         ` Jochen Hein
@ 2016-12-27 21:27           ` Len Brown
  2016-12-28 12:56             ` Vincent Gerris
  0 siblings, 1 reply; 10+ messages in thread
From: Len Brown @ 2016-12-27 21:27 UTC (permalink / raw)
  To: Jochen Hein
  Cc: Vincent Gerris, Rafael J. Wysocki, Linux PM, Jacob Pan, Hans de Goede

It is not possible for Linux to enter idle from within an interrupt handler.
Thus, it is not possible for the cited errata to be the root cause of
this problem.

> - all I can say is that for all Kernels < 4.9 I
> had to disable C6 to have a stable system.
> 4.9 seems stable for me now.

Linux-4.9 does not contain the patch do disable c6 on BYT.
And so that patch can not be the reason for the reported change in stability.

I understand that this stability problem is difficult to get one's
hands around --
many have reported that it was gone, only to later report that it
wasn't really gone...

If you have something that can quickly determine if the problem is present
or if the problem is gone, that would be extremely valuable.

It would also be very interesting to know what C-states are actually
being used on your particular platform.  For example, your patch
disabled c6, but did not disable c7.  I would be interested to know
on a ststem such as yours, if disabling c67 was sufficient, and
if the platform was still able to make use of c7.

You can see if you are actually using c7 by running turbostat on an idle system:

# turbostat --debug -o ts.out sleep 10
and show the ts.out

>> If the right strategy were to disable C6 for all of BYT, then the
>> right implementation
>> would be to delete those states from byt_cstates[], rather than for a
>> routine to mark
>> them as disabled.  Note that a user can not later enable a state that is marked
>> as disabled here, it is never registered with cpuidle, and thus the effect
>> is exactly the same as if the entry were never in the table in the first place.
>
> Would that be a useful workaround for older stable kernels? I think we
> should try to get stable systems to the affected users.

This workaround is already available in the form of cmdline:
intel_idle.max_cstate=1

not all systems need/want that workaround.

thanks,
Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
  2016-12-27 21:27           ` Len Brown
@ 2016-12-28 12:56             ` Vincent Gerris
  2016-12-28 18:47               ` Jacob Pan
  2017-01-11 23:14               ` Len Brown
  0 siblings, 2 replies; 10+ messages in thread
From: Vincent Gerris @ 2016-12-28 12:56 UTC (permalink / raw)
  To: Len Brown
  Cc: Hans de Goede, Rafael J. Wysocki, Jochen Hein, Jacob Pan, Linux PM

Hi,

First of all I want to thank you for getting involved, it seems intel
is the one that knows best how to address this and get to the root
cause.

On Dec 27, 2016 22:28, "Len Brown" <lenb@kernel.org> wrote:
>
> It is not possible for Linux to enter idle from within an interrupt handler.
> Thus, it is not possible for the cited errata to be the root cause of
> this problem.
>
> > - all I can say is that for all Kernels < 4.9 I
> > had to disable C6 to have a stable system.
> > 4.9 seems stable for me now.

For me 4.9.0 is not stable either and at least someone with gentoo on
4.9.0 rc1 reported freezes too.
Although it takes longer before it happens, it still freezes (N3520 on
Lenovo Yoga 2 11 with latest BIOS).

> Linux-4.9 does not contain the patch do disable c6 on BYT.
> And so that patch can not be the reason for the reported change in stability.
>
> I understand that this stability problem is difficult to get one's
> hands around --
> many have reported that it was gone, only to later report that it
> wasn't really gone...
>
> If you have something that can quickly determine if the problem is present
> or if the problem is gone, that would be extremely valuable.

For me I can usually trigger the freeze by a combination playing video
and network/bluetooth traffic.
What usually freezes it in minutes, is playing video from a network
share, copying that file at the same time and stream audio over
bluetooth.
>
> It would also be very interesting to know what C-states are actually
> being used on your particular platform.  For example, your patch
> disabled c6, but did not disable c7.  I would be interested to know
> on a ststem such as yours, if disabling c67 was sufficient, and
> if the platform was still able to make use of c7.
>
> You can see if you are actually using c7 by running turbostat on an idle system:

If I am not mistaken, this is output from a patched 4.8 kernel from
powertop using Jochens patch :
          Package   |             Core    |            CPU 0
                    |                     | C0 active   0,8%
                    |                     | POLL        0,0%    0,0 ms
                    | C1 (cc1)   14,7%    | C1-BYT     10,4%    4,3 ms
C2 (pc2)    0,0%    |                     |
C3 (pc3)    0,0%    |                     |
C6 (pc6)    0,0%    | C6 (cc6)   81,7%    |
                    |                     | C7S-BYT     3,1%    6,5 ms

                    |             Core    |            CPU 1
                    |                     | C0 active   1,6%
                    |                     | POLL        0,0%    0,0 ms
                    | C1 (cc1)   89,1%    | C1-BYT     56,2%    2,7 ms
                    |                     |
                    |                     |
                    | C6 (cc6)    4,1%    |
                    |                     | C7S-BYT     0,0%    0,0 ms

                    |             Core    |            CPU 2
                    |                     | C0 active   0,3%
                    |                     | POLL        0,0%    0,0 ms
                    | C1 (cc1)   10,7%    | C1-BYT      5,8%    8,9 ms
                    |                     |
                    |                     |
                    | C6 (cc6)   88,1%    |
                    |                     | C7S-BYT    19,5%   31,9 ms

                    |             Core    |            CPU 3
                    |                     | C0 active   0,2%
                    |                     | POLL        0,0%    0,0 ms
                    | C1 (cc1)   17,3%    | C1-BYT     10,2%   11,3 ms
                    |                     |
                    |                     |
                    | C6 (cc6)   82,0%    |

>
> # turbostat --debug -o ts.out sleep 10
> and show the ts.out
>
> >> If the right strategy were to disable C6 for all of BYT, then the
> >> right implementation
> >> would be to delete those states from byt_cstates[], rather than for a
> >> routine to mark
> >> them as disabled.  Note that a user can not later enable a state that is marked
> >> as disabled here, it is never registered with cpuidle, and thus the effect
> >> is exactly the same as if the entry were never in the table in the first place.
I don't fully grasp this, but if the errata causes this, wouldn't it
be more transparent to mark it as such so disable it?
Are we really sure the errata does not cause this? If so, is there a
recommended way from intel on how to patch this?

> >
> > Would that be a useful workaround for older stable kernels? I think we
> > should try to get stable systems to the affected users.
>
> This workaround is already available in the form of cmdline:
> intel_idle.max_cstate=1
>
> not all systems need/want that workaround.

I think there are 2 main reasons why we do not want kernel parameters:
1. normal users will have difficulties doing this (I don't like it either :) )
2. it's not a structural solution

>
> thanks,
> Len Brown, Intel Open Source Technology Center

I saw your patch for auto-demotion, I will test it and report within a few days.
I will also do further testing with the 4.9 kernel. I hope we can get
things stable one way or the other :).

Thanks a lot,
Vincent Gerris

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
  2016-12-28 12:56             ` Vincent Gerris
@ 2016-12-28 18:47               ` Jacob Pan
  2017-01-11 23:14               ` Len Brown
  1 sibling, 0 replies; 10+ messages in thread
From: Jacob Pan @ 2016-12-28 18:47 UTC (permalink / raw)
  To: Vincent Gerris
  Cc: Len Brown, Hans de Goede, Rafael J. Wysocki, Jochen Hein,
	Linux PM, jacob.jun.pan

On Wed, 28 Dec 2016 13:56:25 +0100
Vincent Gerris <vgerris@gmail.com> wrote:

> > >> If the right strategy were to disable C6 for all of BYT, then the
> > >> right implementation
> > >> would be to delete those states from byt_cstates[], rather than
> > >> for a routine to mark
> > >> them as disabled.  Note that a user can not later enable a state
> > >> that is marked as disabled here, it is never registered with
> > >> cpuidle, and thus the effect is exactly the same as if the entry
> > >> were never in the table in the first place.  
> I don't fully grasp this, but if the errata causes this, wouldn't it
> be more transparent to mark it as such so disable it?
> Are we really sure the errata does not cause this? If so, is there a
> recommended way from intel on how to patch this?
I don't think the EOI errata is relevant to this issue. As Len pointed
out, it is impossible to enter C6 or run idle task during ISR.

However, this errata might be the cause of the issue you seen, as you
mentioned you encounter the issue while playing media with audio i
presume? While I am looking for a fix, could you try without audio see
if it still hangs?

"VLP70. LPE Audio Playback May Result in System Hang
Problem: Extended audio playback with the LPE (Low Power Engine) may
result in a system hang if the SOC concurrently enters C6 or deeper
sleep states. Implication: The system may hang when this erratum occurs.
Workaround: It is possible for the driver to contain a workaround for
this erratum."

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
  2016-12-28 12:56             ` Vincent Gerris
  2016-12-28 18:47               ` Jacob Pan
@ 2017-01-11 23:14               ` Len Brown
  2017-01-11 23:22                 ` Jacob Pan
  1 sibling, 1 reply; 10+ messages in thread
From: Len Brown @ 2017-01-11 23:14 UTC (permalink / raw)
  To: Vincent Gerris
  Cc: Hans de Goede, Rafael J. Wysocki, Jochen Hein, Jacob Pan, Linux PM

Re: powertop

turbostat is more useful.  The pre-release I uploaded to the bug
report includes some baytrail specific support:

https://bugzilla.kernel.org/show_bug.cgi?id=109051#c661

> I saw your patch for auto-demotion, I will test it and report within a few days.
> I will also do further testing with the 4.9 kernel. I hope we can get
> things stable one way or the other :).

Current thinking is that we have multiple issues here, and enabling
auto-demotion
simply hides the actual root causes.

Sound is one known root cause, as Jacob pointed out.
Hopefully we'll have a fix for that one shortly.

I'm testing w/o sound, and still can provoke a failure in under 30 minutes.

Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs
  2017-01-11 23:14               ` Len Brown
@ 2017-01-11 23:22                 ` Jacob Pan
  0 siblings, 0 replies; 10+ messages in thread
From: Jacob Pan @ 2017-01-11 23:22 UTC (permalink / raw)
  To: Len Brown
  Cc: Vincent Gerris, Hans de Goede, Rafael J. Wysocki, Jochen Hein,
	Linux PM, jacob.jun.pan, Vinod Koul, Prusty, Subhransu S

On Wed, 11 Jan 2017 18:14:12 -0500
Len Brown <lenb@kernel.org> wrote:

> Re: powertop
> 
> turbostat is more useful.  The pre-release I uploaded to the bug
> report includes some baytrail specific support:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=109051#c661
> 
> > I saw your patch for auto-demotion, I will test it and report
> > within a few days. I will also do further testing with the 4.9
> > kernel. I hope we can get things stable one way or the other :).  
> 
> Current thinking is that we have multiple issues here, and enabling
> auto-demotion
> simply hides the actual root causes.
> 
> Sound is one known root cause, as Jacob pointed out.
> Hopefully we'll have a fix for that one shortly.
> 
+Vinod & Subhransu
Could you send out Baytrail VLP70 SST driver workaround patch for
testing?

> I'm testing w/o sound, and still can provoke a failure in under 30
> minutes.
> 
> Len Brown, Intel Open Source Technology Center

[

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-01-11 23:21 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-19 18:19 [PATCH] intel_idle: work around errate VLP52 on Baytrail CPUs Jochen Hein
2016-12-19 19:43 ` Rafael J. Wysocki
     [not found]   ` <CA+8K-g=fXN9TWWxkV1AEEjXJLhwJEQ2PB434gKQ9Z7bL=yf+zA@mail.gmail.com>
2016-12-27 17:24     ` Vincent Gerris
2016-12-27 20:37       ` Len Brown
2016-12-27 20:44         ` Jochen Hein
2016-12-27 21:27           ` Len Brown
2016-12-28 12:56             ` Vincent Gerris
2016-12-28 18:47               ` Jacob Pan
2017-01-11 23:14               ` Len Brown
2017-01-11 23:22                 ` Jacob Pan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.