xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86: extend coverage of HLE "bad page" workaround
@ 2020-05-26  6:49 Jan Beulich
  2020-05-26 11:17 ` Andrew Cooper
  2023-03-21 14:42 ` Roger Pau Monné
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Beulich @ 2020-05-26  6:49 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Respective Core Gen10 processor lines are affected, too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
     case 0x000506e0: /* errata SKL167 / SKW159 */
     case 0x000806e0: /* erratum KBL??? */
     case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
+    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
+    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
         *array_size = (cpuid_eax(0) >= 7 && !cpu_has_hypervisor &&
                        (cpuid_count_ebx(7, 0) & cpufeat_mask(X86_FEATURE_HLE)));
         return &hle_bad_page;


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2020-05-26  6:49 [PATCH] x86: extend coverage of HLE "bad page" workaround Jan Beulich
@ 2020-05-26 11:17 ` Andrew Cooper
  2020-05-26 13:35   ` Jan Beulich
  2023-03-21 14:42 ` Roger Pau Monné
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2020-05-26 11:17 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 26/05/2020 07:49, Jan Beulich wrote:
> Respective Core Gen10 processor lines are affected, too.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>      case 0x000506e0: /* errata SKL167 / SKW159 */
>      case 0x000806e0: /* erratum KBL??? */
>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */

This is marred in complexity.

The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
no longer exists/works.  This applies to IceLake systems, but possibly
not their initial release configuration (hence, via a later microcode
update).

HLE is also disabled in microcode on all older parts for errata reasons,
so in practice it doesn't exist anywhere now.

I think it is safe to drop this workaround, and this does seem a more
simple option than encoding which microcode turned HLE off (which sadly
isn't covered by the spec updates, as even when turned off, HLE is still
functioning according to its spec of "may speed things up, may do
nothing"), or the interactions with the CPUID hiding capabilities of
MSR_TSX_CTRL.

~Andrew

>          *array_size = (cpuid_eax(0) >= 7 && !cpu_has_hypervisor &&
>                         (cpuid_count_ebx(7, 0) & cpufeat_mask(X86_FEATURE_HLE)));
>          return &hle_bad_page;



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2020-05-26 11:17 ` Andrew Cooper
@ 2020-05-26 13:35   ` Jan Beulich
  2020-05-26 15:01     ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2020-05-26 13:35 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Roger Pau Monné

On 26.05.2020 13:17, Andrew Cooper wrote:
> On 26/05/2020 07:49, Jan Beulich wrote:
>> Respective Core Gen10 processor lines are affected, too.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>      case 0x000806e0: /* erratum KBL??? */
>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
> 
> This is marred in complexity.
> 
> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
> no longer exists/works.  This applies to IceLake systems, but possibly
> not their initial release configuration (hence, via a later microcode
> update).
> 
> HLE is also disabled in microcode on all older parts for errata reasons,
> so in practice it doesn't exist anywhere now.
> 
> I think it is safe to drop this workaround, and this does seem a more
> simple option than encoding which microcode turned HLE off (which sadly
> isn't covered by the spec updates, as even when turned off, HLE is still
> functioning according to its spec of "may speed things up, may do
> nothing"), or the interactions with the CPUID hiding capabilities of
> MSR_TSX_CTRL.

I'm afraid I don't fully follow: For one, does what you say imply HLE is
no longer enumerated in CPUID? If so, and if we assume all CPU models
listed have had suitable ucode updates issued, we could indeed drop the
workaround (as taking effect only when HLE is enumerated). But then this
erratum does not have the usual text effectively meaning that an ucode
update is or will be available to address the issue; instead it says
that BIOS or VMM can reserve the respective address range. This -
assuming the alternative you describe is indeed viable - then is surely
a much more intrusive workaround than needed. Which I wouldn't assume
they would suggest in such a case.

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2020-05-26 13:35   ` Jan Beulich
@ 2020-05-26 15:01     ` Andrew Cooper
  2020-05-26 16:40       ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Cooper @ 2020-05-26 15:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Roger Pau Monné

On 26/05/2020 14:35, Jan Beulich wrote:
> On 26.05.2020 13:17, Andrew Cooper wrote:
>> On 26/05/2020 07:49, Jan Beulich wrote:
>>> Respective Core Gen10 processor lines are affected, too.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>      case 0x000806e0: /* erratum KBL??? */
>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>> This is marred in complexity.
>>
>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
>> no longer exists/works.  This applies to IceLake systems, but possibly
>> not their initial release configuration (hence, via a later microcode
>> update).
>>
>> HLE is also disabled in microcode on all older parts for errata reasons,
>> so in practice it doesn't exist anywhere now.
>>
>> I think it is safe to drop this workaround, and this does seem a more
>> simple option than encoding which microcode turned HLE off (which sadly
>> isn't covered by the spec updates, as even when turned off, HLE is still
>> functioning according to its spec of "may speed things up, may do
>> nothing"), or the interactions with the CPUID hiding capabilities of
>> MSR_TSX_CTRL.
> I'm afraid I don't fully follow: For one, does what you say imply HLE is
> no longer enumerated in CPUID?

No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
microcode fiasco", the HLE bit will continue to exist and be set. 
(Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)

It was always a weird CPUID bit.  You were supposed to put
XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
on old hardware and go faster on newer hardware.

There is nothing runtime code needs to look at the HLE bit for, except
perhaps for UI reporting purposes.

> But then this
> erratum does not have the usual text effectively meaning that an ucode
> update is or will be available to address the issue; instead it says
> that BIOS or VMM can reserve the respective address range.

This is not surprising at all.  Turning off HLE was an unrelated
activity, and I bet the link went unnoticed.

> This - assuming the alternative you describe is indeed viable - then is surely
> a much more intrusive workaround than needed. Which I wouldn't assume
> they would suggest in such a case.

My suggestion was to drop the workaround, not to complicated it with a
microcode revision matrix.

~Andrew


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2020-05-26 15:01     ` Andrew Cooper
@ 2020-05-26 16:40       ` Jan Beulich
  2023-03-17 11:39         ` Roger Pau Monné
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2020-05-26 16:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Roger Pau Monné

On 26.05.2020 17:01, Andrew Cooper wrote:
> On 26/05/2020 14:35, Jan Beulich wrote:
>> On 26.05.2020 13:17, Andrew Cooper wrote:
>>> On 26/05/2020 07:49, Jan Beulich wrote:
>>>> Respective Core Gen10 processor lines are affected, too.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>>      case 0x000806e0: /* erratum KBL??? */
>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>>> This is marred in complexity.
>>>
>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
>>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
>>> no longer exists/works.  This applies to IceLake systems, but possibly
>>> not their initial release configuration (hence, via a later microcode
>>> update).
>>>
>>> HLE is also disabled in microcode on all older parts for errata reasons,
>>> so in practice it doesn't exist anywhere now.
>>>
>>> I think it is safe to drop this workaround, and this does seem a more
>>> simple option than encoding which microcode turned HLE off (which sadly
>>> isn't covered by the spec updates, as even when turned off, HLE is still
>>> functioning according to its spec of "may speed things up, may do
>>> nothing"), or the interactions with the CPUID hiding capabilities of
>>> MSR_TSX_CTRL.
>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
>> no longer enumerated in CPUID?
> 
> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
> microcode fiasco", the HLE bit will continue to exist and be set. 
> (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)
> 
> It was always a weird CPUID bit.  You were supposed to put
> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
> on old hardware and go faster on newer hardware.
> 
> There is nothing runtime code needs to look at the HLE bit for, except
> perhaps for UI reporting purposes.

Do you know of some public Intel doc I could reference for all of this,
which I would kind of need in the description of a patch ...

>> But then this
>> erratum does not have the usual text effectively meaning that an ucode
>> update is or will be available to address the issue; instead it says
>> that BIOS or VMM can reserve the respective address range.
> 
> This is not surprising at all.  Turning off HLE was an unrelated
> activity, and I bet the link went unnoticed.
> 
>> This - assuming the alternative you describe is indeed viable - then is surely
>> a much more intrusive workaround than needed. Which I wouldn't assume
>> they would suggest in such a case.
> 
> My suggestion was to drop the workaround, not to complicated it with a
> microcode revision matrix.

... doing this? I don't think I've seen any of this in writing so far,
except by you. (I don't understand how this reply of yours relates to
what I was saying about the spec update. I understand what you are
suggesting. I merely tried to express that I'd have expected Intel to
point out the much easier workaround, rather than just a pretty involved
one.) Otherwise, may I suggest you make such a patch, to make sure it
has an adequate description?

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2020-05-26 16:40       ` Jan Beulich
@ 2023-03-17 11:39         ` Roger Pau Monné
  2023-03-20  9:24           ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2023-03-17 11:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel, Wei Liu

On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
> On 26.05.2020 17:01, Andrew Cooper wrote:
> > On 26/05/2020 14:35, Jan Beulich wrote:
> >> On 26.05.2020 13:17, Andrew Cooper wrote:
> >>> On 26/05/2020 07:49, Jan Beulich wrote:
> >>>> Respective Core Gen10 processor lines are affected, too.
> >>>>
> >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>>>
> >>>> --- a/xen/arch/x86/mm.c
> >>>> +++ b/xen/arch/x86/mm.c
> >>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
> >>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
> >>>>      case 0x000806e0: /* erratum KBL??? */
> >>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
> >>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
> >>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
> >>> This is marred in complexity.
> >>>
> >>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
> >>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
> >>> no longer exists/works.  This applies to IceLake systems, but possibly
> >>> not their initial release configuration (hence, via a later microcode
> >>> update).
> >>>
> >>> HLE is also disabled in microcode on all older parts for errata reasons,
> >>> so in practice it doesn't exist anywhere now.
> >>>
> >>> I think it is safe to drop this workaround, and this does seem a more
> >>> simple option than encoding which microcode turned HLE off (which sadly
> >>> isn't covered by the spec updates, as even when turned off, HLE is still
> >>> functioning according to its spec of "may speed things up, may do
> >>> nothing"), or the interactions with the CPUID hiding capabilities of
> >>> MSR_TSX_CTRL.
> >> I'm afraid I don't fully follow: For one, does what you say imply HLE is
> >> no longer enumerated in CPUID?
> > 
> > No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
> > microcode fiasco", the HLE bit will continue to exist and be set. 
> > (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)
> > 
> > It was always a weird CPUID bit.  You were supposed to put
> > XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
> > on old hardware and go faster on newer hardware.
> > 
> > There is nothing runtime code needs to look at the HLE bit for, except
> > perhaps for UI reporting purposes.
> 
> Do you know of some public Intel doc I could reference for all of this,
> which I would kind of need in the description of a patch ...
> 
> >> But then this
> >> erratum does not have the usual text effectively meaning that an ucode
> >> update is or will be available to address the issue; instead it says
> >> that BIOS or VMM can reserve the respective address range.
> > 
> > This is not surprising at all.  Turning off HLE was an unrelated
> > activity, and I bet the link went unnoticed.
> > 
> >> This - assuming the alternative you describe is indeed viable - then is surely
> >> a much more intrusive workaround than needed. Which I wouldn't assume
> >> they would suggest in such a case.
> > 
> > My suggestion was to drop the workaround, not to complicated it with a
> > microcode revision matrix.
> 
> ... doing this? I don't think I've seen any of this in writing so far,
> except by you. (I don't understand how this reply of yours relates to
> what I was saying about the spec update. I understand what you are
> suggesting. I merely tried to express that I'd have expected Intel to
> point out the much easier workaround, rather than just a pretty involved
> one.) Otherwise, may I suggest you make such a patch, to make sure it
> has an adequate description?

Seeing as there seems to be some data missing to justify the commit -
was has Linux done with those erratas?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2023-03-17 11:39         ` Roger Pau Monné
@ 2023-03-20  9:24           ` Jan Beulich
  2023-03-21 14:51             ` Andrew Cooper
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2023-03-20  9:24 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, xen-devel, Wei Liu

On 17.03.2023 12:39, Roger Pau Monné wrote:
> On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
>> On 26.05.2020 17:01, Andrew Cooper wrote:
>>> On 26/05/2020 14:35, Jan Beulich wrote:
>>>> On 26.05.2020 13:17, Andrew Cooper wrote:
>>>>> On 26/05/2020 07:49, Jan Beulich wrote:
>>>>>> Respective Core Gen10 processor lines are affected, too.
>>>>>>
>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>
>>>>>> --- a/xen/arch/x86/mm.c
>>>>>> +++ b/xen/arch/x86/mm.c
>>>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>>>>      case 0x000806e0: /* erratum KBL??? */
>>>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>>>>> This is marred in complexity.
>>>>>
>>>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
>>>>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
>>>>> no longer exists/works.  This applies to IceLake systems, but possibly
>>>>> not their initial release configuration (hence, via a later microcode
>>>>> update).
>>>>>
>>>>> HLE is also disabled in microcode on all older parts for errata reasons,
>>>>> so in practice it doesn't exist anywhere now.
>>>>>
>>>>> I think it is safe to drop this workaround, and this does seem a more
>>>>> simple option than encoding which microcode turned HLE off (which sadly
>>>>> isn't covered by the spec updates, as even when turned off, HLE is still
>>>>> functioning according to its spec of "may speed things up, may do
>>>>> nothing"), or the interactions with the CPUID hiding capabilities of
>>>>> MSR_TSX_CTRL.
>>>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
>>>> no longer enumerated in CPUID?
>>>
>>> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
>>> microcode fiasco", the HLE bit will continue to exist and be set. 
>>> (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)
>>>
>>> It was always a weird CPUID bit.  You were supposed to put
>>> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
>>> on old hardware and go faster on newer hardware.
>>>
>>> There is nothing runtime code needs to look at the HLE bit for, except
>>> perhaps for UI reporting purposes.
>>
>> Do you know of some public Intel doc I could reference for all of this,
>> which I would kind of need in the description of a patch ...
>>
>>>> But then this
>>>> erratum does not have the usual text effectively meaning that an ucode
>>>> update is or will be available to address the issue; instead it says
>>>> that BIOS or VMM can reserve the respective address range.
>>>
>>> This is not surprising at all.  Turning off HLE was an unrelated
>>> activity, and I bet the link went unnoticed.
>>>
>>>> This - assuming the alternative you describe is indeed viable - then is surely
>>>> a much more intrusive workaround than needed. Which I wouldn't assume
>>>> they would suggest in such a case.
>>>
>>> My suggestion was to drop the workaround, not to complicated it with a
>>> microcode revision matrix.
>>
>> ... doing this? I don't think I've seen any of this in writing so far,
>> except by you. (I don't understand how this reply of yours relates to
>> what I was saying about the spec update. I understand what you are
>> suggesting. I merely tried to express that I'd have expected Intel to
>> point out the much easier workaround, rather than just a pretty involved
>> one.) Otherwise, may I suggest you make such a patch, to make sure it
>> has an adequate description?
> 
> Seeing as there seems to be some data missing to justify the commit -
> was has Linux done with those erratas?

While they deal with the SNB erratum in a similar way, I'm afraid I'm
unaware of Linux having or having had a workaround for the errata here.
Which, granted, is a little surprising when we did actually even issue
an XSA for this.

In fact I find Andrew's request even more surprising with that fact (us
having issued XSA-282 for it) in mind, which originally I don't think I
had paid attention to (nor recalled).

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2020-05-26  6:49 [PATCH] x86: extend coverage of HLE "bad page" workaround Jan Beulich
  2020-05-26 11:17 ` Andrew Cooper
@ 2023-03-21 14:42 ` Roger Pau Monné
  2023-03-21 15:51   ` Jan Beulich
  1 sibling, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2023-03-21 14:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Tue, May 26, 2020 at 08:49:52AM +0200, Jan Beulich wrote:
> Respective Core Gen10 processor lines are affected, too.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>      case 0x000506e0: /* errata SKL167 / SKW159 */
>      case 0x000806e0: /* erratum KBL??? */
>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */

I think this is errata CML101, I would add that at the end of the
comment.

Also you seem to be missing the '806ec' model (806e0 case)? (listed as
'U 4+2 V1')

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2023-03-20  9:24           ` Jan Beulich
@ 2023-03-21 14:51             ` Andrew Cooper
  2023-03-21 15:59               ` Roger Pau Monné
  2023-03-21 16:31               ` Jan Beulich
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Cooper @ 2023-03-21 14:51 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné; +Cc: xen-devel, Wei Liu

On 20/03/2023 9:24 am, Jan Beulich wrote:
> On 17.03.2023 12:39, Roger Pau Monné wrote:
>> On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
>>> On 26.05.2020 17:01, Andrew Cooper wrote:
>>>> On 26/05/2020 14:35, Jan Beulich wrote:
>>>>> On 26.05.2020 13:17, Andrew Cooper wrote:
>>>>>> On 26/05/2020 07:49, Jan Beulich wrote:
>>>>>>> Respective Core Gen10 processor lines are affected, too.
>>>>>>>
>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>>
>>>>>>> --- a/xen/arch/x86/mm.c
>>>>>>> +++ b/xen/arch/x86/mm.c
>>>>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>>>>>      case 0x000806e0: /* erratum KBL??? */
>>>>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>>>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>>>>>> This is marred in complexity.
>>>>>>
>>>>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
>>>>>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
>>>>>> no longer exists/works.  This applies to IceLake systems, but possibly
>>>>>> not their initial release configuration (hence, via a later microcode
>>>>>> update).
>>>>>>
>>>>>> HLE is also disabled in microcode on all older parts for errata reasons,
>>>>>> so in practice it doesn't exist anywhere now.
>>>>>>
>>>>>> I think it is safe to drop this workaround, and this does seem a more
>>>>>> simple option than encoding which microcode turned HLE off (which sadly
>>>>>> isn't covered by the spec updates, as even when turned off, HLE is still
>>>>>> functioning according to its spec of "may speed things up, may do
>>>>>> nothing"), or the interactions with the CPUID hiding capabilities of
>>>>>> MSR_TSX_CTRL.
>>>>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
>>>>> no longer enumerated in CPUID?
>>>> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
>>>> microcode fiasco", the HLE bit will continue to exist and be set. 
>>>> (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)
>>>>
>>>> It was always a weird CPUID bit.  You were supposed to put
>>>> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
>>>> on old hardware and go faster on newer hardware.
>>>>
>>>> There is nothing runtime code needs to look at the HLE bit for, except
>>>> perhaps for UI reporting purposes.
>>> Do you know of some public Intel doc I could reference for all of this,
>>> which I would kind of need in the description of a patch ...
>>>
>>>>> But then this
>>>>> erratum does not have the usual text effectively meaning that an ucode
>>>>> update is or will be available to address the issue; instead it says
>>>>> that BIOS or VMM can reserve the respective address range.
>>>> This is not surprising at all.  Turning off HLE was an unrelated
>>>> activity, and I bet the link went unnoticed.
>>>>
>>>>> This - assuming the alternative you describe is indeed viable - then is surely
>>>>> a much more intrusive workaround than needed. Which I wouldn't assume
>>>>> they would suggest in such a case.
>>>> My suggestion was to drop the workaround, not to complicated it with a
>>>> microcode revision matrix.
>>> ... doing this? I don't think I've seen any of this in writing so far,
>>> except by you. (I don't understand how this reply of yours relates to
>>> what I was saying about the spec update. I understand what you are
>>> suggesting. I merely tried to express that I'd have expected Intel to
>>> point out the much easier workaround, rather than just a pretty involved
>>> one.) Otherwise, may I suggest you make such a patch, to make sure it
>>> has an adequate description?
>> Seeing as there seems to be some data missing to justify the commit -
>> was has Linux done with those erratas?
> While they deal with the SNB erratum in a similar way, I'm afraid I'm
> unaware of Linux having or having had a workaround for the errata here.
> Which, granted, is a little surprising when we did actually even issue
> an XSA for this.
>
> In fact I find Andrew's request even more surprising with that fact (us
> having issued XSA-282 for it) in mind, which originally I don't think I
> had paid attention to (nor recalled).

No - I'm aware of it.  It probably was the right move at the time.

But, Intel have subsequently killed HLE in microcode updates update in
all CPUs it ever existed in (to fix a memory ordering erratum), and
removed it from the architecture moving forwards (the enumeration of
TSX_CTRL means HLE architecturally doesn't exist even if it is enumerated).

These errata no longer exist when you've got up to date microcode, and
if you've not got up to date microcode on these CPUs, you've got far
worse security problems.

~Andrew


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2023-03-21 14:42 ` Roger Pau Monné
@ 2023-03-21 15:51   ` Jan Beulich
  2023-03-21 15:58     ` Roger Pau Monné
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2023-03-21 15:51 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 21.03.2023 15:42, Roger Pau Monné wrote:
> On Tue, May 26, 2020 at 08:49:52AM +0200, Jan Beulich wrote:
>> Respective Core Gen10 processor lines are affected, too.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>      case 0x000806e0: /* erratum KBL??? */
>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
> 
> I think this is errata CML101, I would add that at the end of the
> comment.

Indeed in the current version of the document CML prefix exist. The older
document I've been looking at has no such letter acronyms in front of the
errata numbers. I can certainly update.

> Also you seem to be missing the '806ec' model (806e0 case)? (listed as
> 'U 4+2 V1')

Isn't that pre-existing (see 2nd line of context above)?

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2023-03-21 15:51   ` Jan Beulich
@ 2023-03-21 15:58     ` Roger Pau Monné
  2023-03-21 16:03       ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2023-03-21 15:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu

On Tue, Mar 21, 2023 at 04:51:43PM +0100, Jan Beulich wrote:
> On 21.03.2023 15:42, Roger Pau Monné wrote:
> > On Tue, May 26, 2020 at 08:49:52AM +0200, Jan Beulich wrote:
> >> Respective Core Gen10 processor lines are affected, too.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>
> >> --- a/xen/arch/x86/mm.c
> >> +++ b/xen/arch/x86/mm.c
> >> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
> >>      case 0x000506e0: /* errata SKL167 / SKW159 */
> >>      case 0x000806e0: /* erratum KBL??? */
> >>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
> >> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
> >> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
> > 
> > I think this is errata CML101, I would add that at the end of the
> > comment.
> 
> Indeed in the current version of the document CML prefix exist. The older
> document I've been looking at has no such letter acronyms in front of the
> errata numbers. I can certainly update.
> 
> > Also you seem to be missing the '806ec' model (806e0 case)? (listed as
> > 'U 4+2 V1')
> 
> Isn't that pre-existing (see 2nd line of context above)?

Oh, indeed.  Would you mind also adding a reference to CML101 for
0x000806e0 then?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2023-03-21 14:51             ` Andrew Cooper
@ 2023-03-21 15:59               ` Roger Pau Monné
  2023-03-21 16:14                 ` Andrew Cooper
  2023-03-21 16:31               ` Jan Beulich
  1 sibling, 1 reply; 15+ messages in thread
From: Roger Pau Monné @ 2023-03-21 15:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, xen-devel, Wei Liu

On Tue, Mar 21, 2023 at 02:51:30PM +0000, Andrew Cooper wrote:
> On 20/03/2023 9:24 am, Jan Beulich wrote:
> > On 17.03.2023 12:39, Roger Pau Monné wrote:
> >> On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
> >>> On 26.05.2020 17:01, Andrew Cooper wrote:
> >>>> On 26/05/2020 14:35, Jan Beulich wrote:
> >>>>> On 26.05.2020 13:17, Andrew Cooper wrote:
> >>>>>> On 26/05/2020 07:49, Jan Beulich wrote:
> >>>>>>> Respective Core Gen10 processor lines are affected, too.
> >>>>>>>
> >>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>>>>>>
> >>>>>>> --- a/xen/arch/x86/mm.c
> >>>>>>> +++ b/xen/arch/x86/mm.c
> >>>>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
> >>>>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
> >>>>>>>      case 0x000806e0: /* erratum KBL??? */
> >>>>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
> >>>>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
> >>>>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
> >>>>>> This is marred in complexity.
> >>>>>>
> >>>>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
> >>>>>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
> >>>>>> no longer exists/works.  This applies to IceLake systems, but possibly
> >>>>>> not their initial release configuration (hence, via a later microcode
> >>>>>> update).
> >>>>>>
> >>>>>> HLE is also disabled in microcode on all older parts for errata reasons,
> >>>>>> so in practice it doesn't exist anywhere now.
> >>>>>>
> >>>>>> I think it is safe to drop this workaround, and this does seem a more
> >>>>>> simple option than encoding which microcode turned HLE off (which sadly
> >>>>>> isn't covered by the spec updates, as even when turned off, HLE is still
> >>>>>> functioning according to its spec of "may speed things up, may do
> >>>>>> nothing"), or the interactions with the CPUID hiding capabilities of
> >>>>>> MSR_TSX_CTRL.
> >>>>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
> >>>>> no longer enumerated in CPUID?
> >>>> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
> >>>> microcode fiasco", the HLE bit will continue to exist and be set. 
> >>>> (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)
> >>>>
> >>>> It was always a weird CPUID bit.  You were supposed to put
> >>>> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
> >>>> on old hardware and go faster on newer hardware.
> >>>>
> >>>> There is nothing runtime code needs to look at the HLE bit for, except
> >>>> perhaps for UI reporting purposes.
> >>> Do you know of some public Intel doc I could reference for all of this,
> >>> which I would kind of need in the description of a patch ...
> >>>
> >>>>> But then this
> >>>>> erratum does not have the usual text effectively meaning that an ucode
> >>>>> update is or will be available to address the issue; instead it says
> >>>>> that BIOS or VMM can reserve the respective address range.
> >>>> This is not surprising at all.  Turning off HLE was an unrelated
> >>>> activity, and I bet the link went unnoticed.
> >>>>
> >>>>> This - assuming the alternative you describe is indeed viable - then is surely
> >>>>> a much more intrusive workaround than needed. Which I wouldn't assume
> >>>>> they would suggest in such a case.
> >>>> My suggestion was to drop the workaround, not to complicated it with a
> >>>> microcode revision matrix.
> >>> ... doing this? I don't think I've seen any of this in writing so far,
> >>> except by you. (I don't understand how this reply of yours relates to
> >>> what I was saying about the spec update. I understand what you are
> >>> suggesting. I merely tried to express that I'd have expected Intel to
> >>> point out the much easier workaround, rather than just a pretty involved
> >>> one.) Otherwise, may I suggest you make such a patch, to make sure it
> >>> has an adequate description?
> >> Seeing as there seems to be some data missing to justify the commit -
> >> was has Linux done with those erratas?
> > While they deal with the SNB erratum in a similar way, I'm afraid I'm
> > unaware of Linux having or having had a workaround for the errata here.
> > Which, granted, is a little surprising when we did actually even issue
> > an XSA for this.
> >
> > In fact I find Andrew's request even more surprising with that fact (us
> > having issued XSA-282 for it) in mind, which originally I don't think I
> > had paid attention to (nor recalled).
> 
> No - I'm aware of it.  It probably was the right move at the time.
> 
> But, Intel have subsequently killed HLE in microcode updates update in
> all CPUs it ever existed in (to fix a memory ordering erratum), and
> removed it from the architecture moving forwards (the enumeration of
> TSX_CTRL means HLE architecturally doesn't exist even if it is enumerated).

Should we then check for TSX_CTRL in order to check whether to engage
the workaround?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2023-03-21 15:58     ` Roger Pau Monné
@ 2023-03-21 16:03       ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2023-03-21 16:03 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu

On 21.03.2023 16:58, Roger Pau Monné wrote:
> On Tue, Mar 21, 2023 at 04:51:43PM +0100, Jan Beulich wrote:
>> On 21.03.2023 15:42, Roger Pau Monné wrote:
>>> On Tue, May 26, 2020 at 08:49:52AM +0200, Jan Beulich wrote:
>>>> Respective Core Gen10 processor lines are affected, too.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> --- a/xen/arch/x86/mm.c
>>>> +++ b/xen/arch/x86/mm.c
>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>>      case 0x000806e0: /* erratum KBL??? */
>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>>>
>>> I think this is errata CML101, I would add that at the end of the
>>> comment.
>>
>> Indeed in the current version of the document CML prefix exist. The older
>> document I've been looking at has no such letter acronyms in front of the
>> errata numbers. I can certainly update.
>>
>>> Also you seem to be missing the '806ec' model (806e0 case)? (listed as
>>> 'U 4+2 V1')
>>
>> Isn't that pre-existing (see 2nd line of context above)?
> 
> Oh, indeed.  Would you mind also adding a reference to CML101 for
> 0x000806e0 then?

Already done. Provided there'll be a v2 in the first place, considering
Andrew's comments (which I still need to properly consume).

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2023-03-21 15:59               ` Roger Pau Monné
@ 2023-03-21 16:14                 ` Andrew Cooper
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Cooper @ 2023-03-21 16:14 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Jan Beulich, xen-devel, Wei Liu

On 21/03/2023 3:59 pm, Roger Pau Monné wrote:
> On Tue, Mar 21, 2023 at 02:51:30PM +0000, Andrew Cooper wrote:
>> On 20/03/2023 9:24 am, Jan Beulich wrote:
>>> On 17.03.2023 12:39, Roger Pau Monné wrote:
>>>> On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
>>>>> On 26.05.2020 17:01, Andrew Cooper wrote:
>>>>>> On 26/05/2020 14:35, Jan Beulich wrote:
>>>>>>> On 26.05.2020 13:17, Andrew Cooper wrote:
>>>>>>>> On 26/05/2020 07:49, Jan Beulich wrote:
>>>>>>>>> Respective Core Gen10 processor lines are affected, too.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>>>>
>>>>>>>>> --- a/xen/arch/x86/mm.c
>>>>>>>>> +++ b/xen/arch/x86/mm.c
>>>>>>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>>>>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>>>>>>>      case 0x000806e0: /* erratum KBL??? */
>>>>>>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>>>>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>>>>>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>>>>>>>> This is marred in complexity.
>>>>>>>>
>>>>>>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
>>>>>>>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
>>>>>>>> no longer exists/works.  This applies to IceLake systems, but possibly
>>>>>>>> not their initial release configuration (hence, via a later microcode
>>>>>>>> update).
>>>>>>>>
>>>>>>>> HLE is also disabled in microcode on all older parts for errata reasons,
>>>>>>>> so in practice it doesn't exist anywhere now.
>>>>>>>>
>>>>>>>> I think it is safe to drop this workaround, and this does seem a more
>>>>>>>> simple option than encoding which microcode turned HLE off (which sadly
>>>>>>>> isn't covered by the spec updates, as even when turned off, HLE is still
>>>>>>>> functioning according to its spec of "may speed things up, may do
>>>>>>>> nothing"), or the interactions with the CPUID hiding capabilities of
>>>>>>>> MSR_TSX_CTRL.
>>>>>>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
>>>>>>> no longer enumerated in CPUID?
>>>>>> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
>>>>>> microcode fiasco", the HLE bit will continue to exist and be set. 
>>>>>> (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)
>>>>>>
>>>>>> It was always a weird CPUID bit.  You were supposed to put
>>>>>> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
>>>>>> on old hardware and go faster on newer hardware.
>>>>>>
>>>>>> There is nothing runtime code needs to look at the HLE bit for, except
>>>>>> perhaps for UI reporting purposes.
>>>>> Do you know of some public Intel doc I could reference for all of this,
>>>>> which I would kind of need in the description of a patch ...
>>>>>
>>>>>>> But then this
>>>>>>> erratum does not have the usual text effectively meaning that an ucode
>>>>>>> update is or will be available to address the issue; instead it says
>>>>>>> that BIOS or VMM can reserve the respective address range.
>>>>>> This is not surprising at all.  Turning off HLE was an unrelated
>>>>>> activity, and I bet the link went unnoticed.
>>>>>>
>>>>>>> This - assuming the alternative you describe is indeed viable - then is surely
>>>>>>> a much more intrusive workaround than needed. Which I wouldn't assume
>>>>>>> they would suggest in such a case.
>>>>>> My suggestion was to drop the workaround, not to complicated it with a
>>>>>> microcode revision matrix.
>>>>> ... doing this? I don't think I've seen any of this in writing so far,
>>>>> except by you. (I don't understand how this reply of yours relates to
>>>>> what I was saying about the spec update. I understand what you are
>>>>> suggesting. I merely tried to express that I'd have expected Intel to
>>>>> point out the much easier workaround, rather than just a pretty involved
>>>>> one.) Otherwise, may I suggest you make such a patch, to make sure it
>>>>> has an adequate description?
>>>> Seeing as there seems to be some data missing to justify the commit -
>>>> was has Linux done with those erratas?
>>> While they deal with the SNB erratum in a similar way, I'm afraid I'm
>>> unaware of Linux having or having had a workaround for the errata here.
>>> Which, granted, is a little surprising when we did actually even issue
>>> an XSA for this.
>>>
>>> In fact I find Andrew's request even more surprising with that fact (us
>>> having issued XSA-282 for it) in mind, which originally I don't think I
>>> had paid attention to (nor recalled).
>> No - I'm aware of it.  It probably was the right move at the time.
>>
>> But, Intel have subsequently killed HLE in microcode updates update in
>> all CPUs it ever existed in (to fix a memory ordering erratum), and
>> removed it from the architecture moving forwards (the enumeration of
>> TSX_CTRL means HLE architecturally doesn't exist even if it is enumerated).
> Should we then check for TSX_CTRL in order to check whether to engage
> the workaround?

By the looks of the current model list, TSX_CTRL doesn't exist on any of
those CPUs.

https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#tsx

It was the March 2019 ucode which turned off HLE everywhere, which was
only shortly after we released XSA-282.

~Andrew


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] x86: extend coverage of HLE "bad page" workaround
  2023-03-21 14:51             ` Andrew Cooper
  2023-03-21 15:59               ` Roger Pau Monné
@ 2023-03-21 16:31               ` Jan Beulich
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2023-03-21 16:31 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Roger Pau Monné

On 21.03.2023 15:51, Andrew Cooper wrote:
> On 20/03/2023 9:24 am, Jan Beulich wrote:
>> On 17.03.2023 12:39, Roger Pau Monné wrote:
>>> On Tue, May 26, 2020 at 06:40:16PM +0200, Jan Beulich wrote:
>>>> On 26.05.2020 17:01, Andrew Cooper wrote:
>>>>> On 26/05/2020 14:35, Jan Beulich wrote:
>>>>>> On 26.05.2020 13:17, Andrew Cooper wrote:
>>>>>>> On 26/05/2020 07:49, Jan Beulich wrote:
>>>>>>>> Respective Core Gen10 processor lines are affected, too.
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>>>
>>>>>>>> --- a/xen/arch/x86/mm.c
>>>>>>>> +++ b/xen/arch/x86/mm.c
>>>>>>>> @@ -6045,6 +6045,8 @@ const struct platform_bad_page *__init g
>>>>>>>>      case 0x000506e0: /* errata SKL167 / SKW159 */
>>>>>>>>      case 0x000806e0: /* erratum KBL??? */
>>>>>>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>>>>>> +    case 0x000a0650: /* erratum Core Gen10 U/H/S 101 */
>>>>>>>> +    case 0x000a0660: /* erratum Core Gen10 U/H/S 101 */
>>>>>>> This is marred in complexity.
>>>>>>>
>>>>>>> The enumeration of MSR_TSX_CTRL (from the TAA fix, but architectural
>>>>>>> moving forwards on any TSX-enabled CPU) includes a confirmation that HLE
>>>>>>> no longer exists/works.  This applies to IceLake systems, but possibly
>>>>>>> not their initial release configuration (hence, via a later microcode
>>>>>>> update).
>>>>>>>
>>>>>>> HLE is also disabled in microcode on all older parts for errata reasons,
>>>>>>> so in practice it doesn't exist anywhere now.
>>>>>>>
>>>>>>> I think it is safe to drop this workaround, and this does seem a more
>>>>>>> simple option than encoding which microcode turned HLE off (which sadly
>>>>>>> isn't covered by the spec updates, as even when turned off, HLE is still
>>>>>>> functioning according to its spec of "may speed things up, may do
>>>>>>> nothing"), or the interactions with the CPUID hiding capabilities of
>>>>>>> MSR_TSX_CTRL.
>>>>>> I'm afraid I don't fully follow: For one, does what you say imply HLE is
>>>>>> no longer enumerated in CPUID?
>>>>> No - sadly not.  For reasons of "not repeating the Haswell/Broadwell
>>>>> microcode fiasco", the HLE bit will continue to exist and be set. 
>>>>> (Although on CascadeLake and later, you can turn it off with MSR_TSX_CTRL.)
>>>>>
>>>>> It was always a weird CPUID bit.  You were supposed to put
>>>>> XACQUIRE/XRELEASE prefixes on your legacy locking, and it would be a nop
>>>>> on old hardware and go faster on newer hardware.
>>>>>
>>>>> There is nothing runtime code needs to look at the HLE bit for, except
>>>>> perhaps for UI reporting purposes.
>>>> Do you know of some public Intel doc I could reference for all of this,
>>>> which I would kind of need in the description of a patch ...
>>>>
>>>>>> But then this
>>>>>> erratum does not have the usual text effectively meaning that an ucode
>>>>>> update is or will be available to address the issue; instead it says
>>>>>> that BIOS or VMM can reserve the respective address range.
>>>>> This is not surprising at all.  Turning off HLE was an unrelated
>>>>> activity, and I bet the link went unnoticed.
>>>>>
>>>>>> This - assuming the alternative you describe is indeed viable - then is surely
>>>>>> a much more intrusive workaround than needed. Which I wouldn't assume
>>>>>> they would suggest in such a case.
>>>>> My suggestion was to drop the workaround, not to complicated it with a
>>>>> microcode revision matrix.
>>>> ... doing this? I don't think I've seen any of this in writing so far,
>>>> except by you. (I don't understand how this reply of yours relates to
>>>> what I was saying about the spec update. I understand what you are
>>>> suggesting. I merely tried to express that I'd have expected Intel to
>>>> point out the much easier workaround, rather than just a pretty involved
>>>> one.) Otherwise, may I suggest you make such a patch, to make sure it
>>>> has an adequate description?
>>> Seeing as there seems to be some data missing to justify the commit -
>>> was has Linux done with those erratas?
>> While they deal with the SNB erratum in a similar way, I'm afraid I'm
>> unaware of Linux having or having had a workaround for the errata here.
>> Which, granted, is a little surprising when we did actually even issue
>> an XSA for this.
>>
>> In fact I find Andrew's request even more surprising with that fact (us
>> having issued XSA-282 for it) in mind, which originally I don't think I
>> had paid attention to (nor recalled).
> 
> No - I'm aware of it.  It probably was the right move at the time.
> 
> But, Intel have subsequently killed HLE in microcode updates update in
> all CPUs it ever existed in (to fix a memory ordering erratum), and
> removed it from the architecture moving forwards (the enumeration of
> TSX_CTRL means HLE architecturally doesn't exist even if it is enumerated).
> 
> These errata no longer exist when you've got up to date microcode, and
> if you've not got up to date microcode on these CPUs, you've got far
> worse security problems.

Are you sure such ucode was issued for _all_ affected CPUs? I think I
agree for the models being added by the patch here, so I guess I can
accept the argument for this change to not really be needed. But for
removing the workaround I'd expect that suitable microcode exists for
the respective models. Yet at the very least 06-55-05 - consisting of
just a single blob - has a date of 2018-Nov-16, i.e. older than March
2019. That's Xeon D-2100 and perhaps at least one other product (I
didn't go hunt further to see where SKZ is used as the acronym).

Jan


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-03-21 16:31 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-26  6:49 [PATCH] x86: extend coverage of HLE "bad page" workaround Jan Beulich
2020-05-26 11:17 ` Andrew Cooper
2020-05-26 13:35   ` Jan Beulich
2020-05-26 15:01     ` Andrew Cooper
2020-05-26 16:40       ` Jan Beulich
2023-03-17 11:39         ` Roger Pau Monné
2023-03-20  9:24           ` Jan Beulich
2023-03-21 14:51             ` Andrew Cooper
2023-03-21 15:59               ` Roger Pau Monné
2023-03-21 16:14                 ` Andrew Cooper
2023-03-21 16:31               ` Jan Beulich
2023-03-21 14:42 ` Roger Pau Monné
2023-03-21 15:51   ` Jan Beulich
2023-03-21 15:58     ` Roger Pau Monné
2023-03-21 16:03       ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).