linux-kernel.vger.kernel.org archive mirror
* Problems with Zen under Xen and recent Linux kernel improvements
@ 2018-07-31  1:14 Adam Novak
  2018-07-31 11:58 ` Juergen Gross
  0 siblings, 1 reply; 4+ messages in thread
From: Adam Novak @ 2018-07-31  1:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: boris.ostrovsky, jgross, x86

Hello,

I was advised to take this here, and to Boris Ostrovsky and Juergen
Gross, by Thomas Gleixner.

I am having some trouble with the new speculation control code that
has been added to the Linux kernel, for AMD Zen CPUs. I am running an
AMD Ryzen 7 1700, and I am running Linux as a Xen dom0 (which is part
of the problem; the code seems to work fine running outside of Xen).

I started having trouble on Ubuntu's commit
3f6a3b035f91a22c0d3bd27630bf61eac9c8cf6c, "x86/speculation: Handle HT
correctly on AMD", which appears to be cherry-picked from
1f50ddb4f4189243c05926b842dc1a0332195f31. Since that commit, my system
hangs during the boot process; it begins starting services, mounting
filesystems, and printing "[OK]" messages, but then fairly early in
the boot process the kernel complains that it is "unable to handle
kernel NULL pointer dereference at 000...0008".

On my Ubuntu bug:

https://bugs.launchpad.net/bugs/1777338

I have a "Screenshot of the null pointer dereference message". It is
running into trouble during a spin lock in the new
speculative_store_bypass_update().
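
One possible reading of the 000...0008 address, sketched in user-space
Python: if the code takes a lock that lives inside a structure reached
through a pointer, and that pointer is still NULL, the faulting address
is just the lock's offset within the structure. The field names and
layout below are assumptions made for illustration, not something
confirmed in this report.

```python
import ctypes


class SsbState(ctypes.Structure):
    # ASSUMED layout, for illustration only: a pointer to state shared
    # between sibling threads, followed by a lock word.
    _fields_ = [
        ("shared_state", ctypes.c_void_p),  # stays NULL if never initialized
        ("lock", ctypes.c_uint32),          # stand-in for a spinlock
        ("disable_state", ctypes.c_uint32),
        ("local_state", ctypes.c_ulong),
    ]


def lock_address(shared_state_ptr):
    """Address that locking shared_state->lock would touch."""
    return (shared_state_ptr or 0) + SsbState.lock.offset


if __name__ == "__main__":
    # With a NULL pointer the access lands at the lock's offset; on a
    # 64-bit platform that is one pointer width past address zero.
    print(hex(lock_address(None)))
```

Under that assumed layout, a NULL shared pointer would produce a fault
at offset 8 on a 64-bit system, which at least matches the address in
the message.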

Has anyone else seen this behavior on these CPUs under Xen (I am using 4.9)?

Since the commit that started the problem has to do with sibling CPU
cores, I suspect that the problem may have something to do with how
Xen handles hyperthreading. Namely, Xen seems to hide hyperthreading
from the VMs running under it (including from dom0). Instead of having
8 cores with 2 threads each, my Linux running under Xen on my 8-core
Ryzen chip sees 16 virtual CPU cores, all of which still report
themselves as being the Ryzen 7 1700 processor.

For reference, my /proc/cpuinfo looks like this at the tail end:

processor    : 14
vendor_id    : AuthenticAMD
cpu family    : 23
model        : 1
model name    : AMD Ryzen 7 1700 Eight-Core Processor
stepping    : 1
microcode    : 0x8001137
cpu MHz        : 2994.027
cache size    : 512 KB
physical id    : 0
siblings    : 16
core id        : 0
cpu cores    : 16
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu de tsc msr pae mce cx8 apic mca cmov pat clflush
mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm constant_tsc
rep_good nopl nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma
cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor
lahf_lm cmp_legacy abm sse4a misalignsse 3dnowprefetch bpext cpb
vmmcall fsgsbase bmi1 avx2 bmi2 rdseed adx clflushopt sha_ni xsaveopt
xsavec xgetbv1 clzero ibpb arat ssbd
bugs        : fxsave_leak null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips    : 5989.03
TLB size    : 2560 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 48 bits physical, 48 bits virtual
power management:

processor    : 15
vendor_id    : AuthenticAMD
cpu family    : 23
model        : 1
model name    : AMD Ryzen 7 1700 Eight-Core Processor
stepping    : 1
microcode    : 0x8001137
cpu MHz        : 2994.027
cache size    : 512 KB
physical id    : 0
siblings    : 16
core id        : 0
cpu cores    : 16
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu de tsc msr pae mce cx8 apic mca cmov pat clflush
mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm constant_tsc
rep_good nopl nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma
cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor
lahf_lm cmp_legacy abm sse4a misalignsse 3dnowprefetch bpext cpb
vmmcall fsgsbase bmi1 avx2 bmi2 rdseed adx clflushopt sha_ni xsaveopt
xsavec xgetbv1 clzero ibpb arat ssbd
bugs        : fxsave_leak null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips    : 5989.03
TLB size    : 2560 4K pages
clflush size    : 64
cache_alignment    : 64
address sizes    : 48 bits physical, 48 bits virtual
power management:

All the cores have core ID 0, and the CPU says it has 16 cores. When
booted outside of Xen, I still have processors 0-15 in /proc/cpuinfo,
but they come in pairs with core IDs 0-7, and "cpu cores" is 8.

If it looks like this during the boot process, and the new
sibling-thread-aware code is looking for hyperthreading that Xen
doesn't expose, maybe that is causing the problem?
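
The topology difference described above can be checked with a short
script. This is only a sketch: the function names are made up for
illustration, and it assumes the x86 field names "siblings", "core id",
and "cpu cores" as they appear in the dump above.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines without a "key : value" shape
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            cpus.append(fields)
    return cpus


def summarize_topology(cpus):
    """Summarize what the kernel reports about cores and threads."""
    first = cpus[0]
    siblings = int(first["siblings"])
    cores = int(first["cpu cores"])
    core_ids = sorted({int(cpu["core id"]) for cpu in cpus})
    return {
        "logical_cpus": len(cpus),
        "core_ids": core_ids,
        "cpu_cores": cores,
        "siblings": siblings,
        # 2 on a bare-metal SMT system, 1 when threads are presented as cores
        "threads_per_core": siblings // cores,
        # False when every CPU claims the same core ID, as in the dump above
        "core_ids_match_core_count": len(core_ids) == cores,
    }
```

Calling summarize_topology(parse_cpuinfo(open("/proc/cpuinfo").read()))
on the values shown above would give one thread per core and a single
core ID; on the bare-metal boot described above it should give two
threads per core and eight core IDs.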

Thanks,
-Adam


* Re: Problems with Zen under Xen and recent Linux kernel improvements
  2018-07-31  1:14 Problems with Zen under Xen and recent Linux kernel improvements Adam Novak
@ 2018-07-31 11:58 ` Juergen Gross
  2018-08-05 19:27   ` Adam Novak
  0 siblings, 1 reply; 4+ messages in thread
From: Juergen Gross @ 2018-07-31 11:58 UTC (permalink / raw)
  To: Adam Novak, linux-kernel; +Cc: boris.ostrovsky, x86

On 31/07/18 03:14, Adam Novak wrote:
> Has anyone else seen this behavior on these CPUs under Xen (I am using 4.9)?

You want at least 4.9.112, especially due to the missing patches
"x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths" and
"x86/cpu: Re-apply forced caps every time CPU caps are re-read".

Juergen


* Re: Problems with Zen under Xen and recent Linux kernel improvements
  2018-07-31 11:58 ` Juergen Gross
@ 2018-08-05 19:27   ` Adam Novak
  2018-08-05 21:35     ` Adam Novak
  0 siblings, 1 reply; 4+ messages in thread
From: Adam Novak @ 2018-08-05 19:27 UTC (permalink / raw)
  To: Juergen Gross; +Cc: linux-kernel, boris.ostrovsky, x86

Sorry, I am using Xen version 4.9.2, specifically 4.9.2-0ubuntu1.

I am seeing the bug with *kernel* version 4.15.0, and specifically
Ubuntu's tag Ubuntu-4.15.0-23.25. That appears to have the "x86/cpu:
Re-apply forced caps every time CPU caps are re-read" patch, but not
"x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths".

I can try cherry-picking that commit. Are there other commits in
particular that might need to be pulled into the Ubuntu kernel to get
it to work?
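
A check like the one described above (does a given tag already carry
both patches?) could be scripted roughly as follows. This is a sketch
under assumptions: it matches on exact subject lines, which only works
if the backport kept the upstream subject unchanged, and the helper
names are hypothetical.

```python
import subprocess

# Subjects of the two patches named earlier in this thread.
REQUIRED_SUBJECTS = [
    "x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths",
    "x86/cpu: Re-apply forced caps every time CPU caps are re-read",
]


def subjects_in_history(repo, rev):
    """Return the set of commit subject lines reachable from rev."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%s", rev],
        check=True,
        capture_output=True,
        text=True,
    )
    return set(result.stdout.splitlines())


def missing_patches(history_subjects, required=REQUIRED_SUBJECTS):
    """List required subjects that do not appear in the history."""
    return [subject for subject in required if subject not in history_subjects]
```

For example, missing_patches(subjects_in_history(repo, "Ubuntu-4.15.0-23.25")),
where repo is the path to a local clone of the Ubuntu kernel tree,
would list whichever of the two subjects is absent from that tag.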



* Re: Problems with Zen under Xen and recent Linux kernel improvements
  2018-08-05 19:27   ` Adam Novak
@ 2018-08-05 21:35     ` Adam Novak
  0 siblings, 0 replies; 4+ messages in thread
From: Adam Novak @ 2018-08-05 21:35 UTC (permalink / raw)
  To: Juergen Gross; +Cc: linux-kernel, boris.ostrovsky, x86

OK, I pulled that commit, 74899d92e66663dc7671a8017b3146dcd4735f3b,
into the Ubuntu kernel and it seems to solve the problem. Now I just
need to get Ubuntu to ship it.

Thanks!

