* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
@ 2018-01-29 22:29 David Dunn
2018-01-29 22:41 ` Andi Kleen
` (2 more replies)
0 siblings, 3 replies; 75+ messages in thread
From: David Dunn @ 2018-01-29 22:29 UTC (permalink / raw)
To: Eduardo Habkost
Cc: Arjan van de Ven, KarimAllah Ahmed, Wilson, Matt, linux-kernel,
Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
Dr. David Alan Gilbert, Fred Jacobs, Jim Mattson,
David Woodhouse
On Mon, 2018-01-29 at 13:45:07 -0800, Eduardo Habkost wrote:
> Maybe a generic "family/model/stepping/microcode really matches
> the CPU you are running on" bit would be useful. The bit could
> be enabled only on host-passthrough (aka "-cpu host") mode.
>
> If we really want to be able to migrate to host with different
> CPU models (except Skylake), we could add a more specific "we
> promise the host CPU is never going to be Skylake" bit.
>
> Now, if the hypervisor is not providing any of those bits, I
> would advise against trusting family/model/stepping/microcode
> under a hypervisor. Using a pre-defined CPU model (that doesn't
> necessarily match the host) is very common when using KVM VM
> management stacks.
>
Eduardo,
I don't see how this is possible in a modern virtualization environment.
Under VMware, a VM will be migrated to SkyLake if one is in the cluster and supports the features exposed to the VM. This can occur for suspend/resume as well.
The migration pool isn't a constant. Hosts can be added to a cluster and VMs can be instructed to move across clusters. So there doesn't need to be a SkyLake around when the VM powers on in order for it to eventually end up on a SkyLake.
Even if we expose a bit to indicate that FMS matches the underlying host, when does the guest know to query that? The VM can be moved at any point in time, including after the guest asks if FMS matches the host.
My apologies for posting onto the mailing list out of the blue. Someone asked my opinion on this suggestion. I'm definitely interested in figuring out whether Linux can fully mitigate the SkyLake RSB problem in virtual environments, but it's not clear how best to achieve that.
Thanks,
David Dunn
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-29 22:29 [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure David Dunn
@ 2018-01-29 22:41 ` Andi Kleen
2018-01-29 22:49 ` Jim Mattson
2018-01-29 23:51 ` Fred Jacobs
2018-01-30 1:08 ` Eduardo Habkost
2 siblings, 1 reply; 75+ messages in thread
From: Andi Kleen @ 2018-01-29 22:41 UTC (permalink / raw)
To: David Dunn
Cc: Eduardo Habkost, Arjan van de Ven, KarimAllah Ahmed, Wilson, Matt,
linux-kernel, Andrea Arcangeli, Andy Lutomirski, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
Dr. David Alan Gilbert, Fred Jacobs, Jim Mattson,
David Woodhouse
> Even if we expose a bit to indicate that FMS matches the underlying host, when does the guest know to query that? The VM can be moved at any point in time, including after the guest asks if FMS matches the host.
There's no way to enable these mitigations later, so you always have to enable the superset of all the mitigations for all the hosts you might be migrating to.
Currently that means if you ever want to migrate to Skylake, you should set the Skylake model number and you're good.
-Andi
^ permalink raw reply [flat|nested] 75+ messages in thread
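The "superset of all the mitigations" rule from the message above can be sketched as a union over the hosts in the migration pool. This is only an illustration under stated assumptions: the flag names and `pool_mitigations()` are hypothetical and do not correspond to any kernel or hypervisor interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mitigation flags; the names are illustrative only. */
enum {
	MITIGATE_RETPOLINE    = 1u << 0,
	MITIGATE_RSB_STUFFING = 1u << 1,	/* the Skylake-era RSB handling */
	MITIGATE_IBRS         = 1u << 2,
};

/*
 * Union of the mitigations needed by every host the VM may ever run on.
 * The guest has to boot with this whole set, because it cannot enable
 * additional mitigations after it has been migrated.
 */
static uint32_t pool_mitigations(const uint32_t *host_needs, size_t nr_hosts)
{
	uint32_t all = 0;

	for (size_t i = 0; i < nr_hosts; i++)
		all |= host_needs[i];
	return all;
}
```

The sketch also shows the weakness raised elsewhere in the thread: the result is only as good as the list of hosts, and that list is not constant.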
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-29 22:41 ` Andi Kleen
@ 2018-01-29 22:49 ` Jim Mattson
2018-01-30 1:10 ` Eduardo Habkost
0 siblings, 1 reply; 75+ messages in thread
From: Jim Mattson @ 2018-01-29 22:49 UTC (permalink / raw)
To: Andi Kleen
Cc: David Dunn, Eduardo Habkost, Arjan van de Ven, KarimAllah Ahmed,
Wilson, Matt, linux-kernel, Andrea Arcangeli, Andy Lutomirski,
Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
Dr. David Alan Gilbert, Fred Jacobs, David Woodhouse
And if we expect to introduce Cascade Lake into the pool in the future, we use a Cascade Lake model number?
It sounds like you are suggesting that we set the model number to the highest model number that will ever be introduced into the pool, at any time in the future. That approach would also fail the 'is_skylake_era()' test. (Not to mention that we have no idea what Intel's highest compatible model number will be.)
On Mon, Jan 29, 2018 at 2:41 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> Even if we expose a bit to indicate that FMS matches the underlying host, when does the guest know to query that? The VM can be moved at any point in time, including after the guest asks if FMS matches the host.
>
> There's no way to enable these mitigations later, so you always
> have to enable the superset of all the mitigations for all the hosts you
> might be migrating to.
>
> Currently that means if you ever want to migrate to Skylake, you should
> set the Skylake model number and you're good.
>
> -Andi
^ permalink raw reply [flat|nested] 75+ messages in thread
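To see why a "highest model number" would fail the test, here is a minimal userspace sketch of a model-list check in the spirit of 'is_skylake_era()'. The five model numbers are the family 6 Skylake and Kaby Lake values as I recall them from the kernel's intel-family.h of that period, so treat them as an assumption; `MODEL_FUTURE_PLACEHOLDER` is a made-up value standing in for a model that does not exist yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Family 6 model numbers (assumed, recalled from intel-family.h). */
#define MODEL_SKYLAKE_MOBILE	0x4E
#define MODEL_SKYLAKE_DESKTOP	0x5E
#define MODEL_SKYLAKE_X		0x55
#define MODEL_KABYLAKE_MOBILE	0x8E
#define MODEL_KABYLAKE_DESKTOP	0x9E

/* Hypothetical stand-in for some future model number. */
#define MODEL_FUTURE_PLACEHOLDER	0xFE

/* True only for models on the explicit list; there is no ">=" comparison. */
static bool is_skylake_era_model(bool vendor_intel, uint8_t family, uint8_t model)
{
	if (!vendor_intel || family != 6)
		return false;

	switch (model) {
	case MODEL_SKYLAKE_MOBILE:
	case MODEL_SKYLAKE_DESKTOP:
	case MODEL_SKYLAKE_X:
	case MODEL_KABYLAKE_MOBILE:
	case MODEL_KABYLAKE_DESKTOP:
		return true;
	default:
		return false;
	}
}
```

Because the check is a list lookup, advertising a newer model than anything on the list makes the guest skip the Skylake handling rather than enable it.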
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-29 22:49 ` Jim Mattson
@ 2018-01-30 1:10 ` Eduardo Habkost
2018-01-30 1:20 ` David Dunn
0 siblings, 1 reply; 75+ messages in thread
From: Eduardo Habkost @ 2018-01-30 1:10 UTC (permalink / raw)
To: Jim Mattson
Cc: Andi Kleen, David Dunn, Arjan van de Ven, KarimAllah Ahmed,
Wilson, Matt, linux-kernel, Andrea Arcangeli, Andy Lutomirski,
Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
Dr. David Alan Gilbert, Fred Jacobs, David Woodhouse
On Mon, Jan 29, 2018 at 02:49:51PM -0800, Jim Mattson wrote:
> And if we expect to introduce Cascade Lake into the pool in the
> future, we use a Cascade Lake model number?
>
> It sounds like you are suggesting that we set the model number to the
> highest model number that will ever be introduced into the pool, at
> any time in the future. That approach would also fail the
> 'is_skylake_era()' test. (Not to mention that we have no idea what
> Intel's highest compatible model number will be.)
Exactly, that's why virtualization and live-migration break the model of just checking f/m/s/microcode: the guest needs to work around not just the bugs that are present in the current host, but the set of bugs that could appear on any future host it can run on.
>
> On Mon, Jan 29, 2018 at 2:41 PM, Andi Kleen <ak@linux.intel.com> wrote:
> >> Even if we expose a bit to indicate that FMS matches the underlying host, when does the guest know to query that? The VM can be moved at any point in time, including after the guest asks if FMS matches the host.
> >
> > There's no way to enable these mitigations later, so you always
> > have to enable the superset of all the mitigations for all the hosts you
> > might be migrating to.
> >
> > Currently that means if you ever want to migrate to Skylake, you should
> > set the Skylake model number and you're good.
> >
> > -Andi
--
Eduardo
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-30 1:10 ` Eduardo Habkost
@ 2018-01-30 1:20 ` David Dunn
2018-01-30 1:30 ` Eduardo Habkost
0 siblings, 1 reply; 75+ messages in thread
From: David Dunn @ 2018-01-30 1:20 UTC (permalink / raw)
To: Eduardo Habkost, Jim Mattson
Cc: Andi Kleen, Arjan van de Ven, KarimAllah Ahmed, Wilson, Matt,
linux-kernel, Andrea Arcangeli, Andy Lutomirski, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, Jorgensen, Bryan, kvm, x86,
Dr. David Alan Gilbert, Fred Jacobs, David Woodhouse
Eduardo,
This is why it would be good to have a CPUID bit that says: "apply SkyLake RSB stuffing." That's preferable to "trust FMS" for VMware.
If Intel defines such a feature flag, sets it on SkyLake, and Linux uses it... that would be very helpful for VMware.
I won't speak for GCE and AWS. But hopefully they can indicate whether it would help them as well.
If Intel cannot define/implement such a flag on SkyLake, then maybe the engineers on this email could define a flag in the hypervisor specific CPUID space. Linux would need to query that flag if it sees CPUID[1].ECX[31] set. That's not as nice since it makes detection on bare metal and virtualization platforms different, but it is better than keying off FMS.
David Dunn
On 1/29/18, 5:11 PM, "Eduardo Habkost" <ehabkost@redhat.com> wrote:
    On Mon, Jan 29, 2018 at 02:49:51PM -0800, Jim Mattson wrote:
    > And if we expect to introduce Cascade Lake into the pool in the
    > future, we use a Cascade Lake model number?
    >
    > It sounds like you are suggesting that we set the model number to the
    > highest model number that will ever be introduced into the pool, at
    > any time in the future. That approach would also fail the
    > 'is_skylake_era()' test. (Not to mention that we have no idea what
    > Intel's highest compatible model number will be.)
    Exactly, that's why virtualization and live-migration break the model of just checking f/m/s/microcode: the guest needs to work around not just the bugs that are present in the current host, but the set of bugs that could appear on any future host it can run on.
    >
    > On Mon, Jan 29, 2018 at 2:41 PM, Andi Kleen <ak@linux.intel.com> wrote:
    > >> Even if we expose a bit to indicate that FMS matches the underlying host, when does the guest know to query that? The VM can be moved at any point in time, including after the guest asks if FMS matches the host.
    > >
    > > There's no way to enable these mitigations later, so you always
    > > have to enable the superset of all the mitigations for all the hosts you
    > > might be migrating to.
    > >
    > > Currently that means if you ever want to migrate to Skylake, you should
    > > set the Skylake model number and you're good.
    > >
    > > -Andi
    --
    Eduardo
^ permalink raw reply [flat|nested] 75+ messages in thread
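The detection order described in this message (check CPUID[1].ECX[31], then consult a hypervisor-defined flag, otherwise fall back to FMS on bare metal) might look roughly like the sketch below. It works on register values passed in by the caller so it stays portable. `HV_FEATURE_SKYLAKE_RSB` and its bit position are purely hypothetical, since no such flag is defined anywhere in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID[1].ECX[31]: set when running under a hypervisor. */
#define CPUID_1_ECX_HYPERVISOR	(1u << 31)

/* Hypothetical flag in the hypervisor-specific CPUID space. */
#define HV_FEATURE_SKYLAKE_RSB	(1u << 0)

/*
 * cpuid1_ecx:     ECX as returned by CPUID leaf 1
 * hv_features:    feature word read from the (assumed) hypervisor leaf
 * fms_is_skylake: result of the family/model check, used on bare metal only
 */
static bool want_skylake_rsb_stuffing(uint32_t cpuid1_ecx, uint32_t hv_features,
				      bool fms_is_skylake)
{
	if (cpuid1_ecx & CPUID_1_ECX_HYPERVISOR)
		return (hv_features & HV_FEATURE_SKYLAKE_RSB) != 0;

	/* Bare metal: family/model/stepping describes the real CPU. */
	return fms_is_skylake;
}
```

Note that with this polarity a hypervisor that predates the flag reports 0, which the guest reads as "no stuffing needed".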
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-30 1:20 ` David Dunn
@ 2018-01-30 1:30 ` Eduardo Habkost
0 siblings, 0 replies; 75+ messages in thread
From: Eduardo Habkost @ 2018-01-30 1:30 UTC (permalink / raw)
To: David Dunn
Cc: Jim Mattson, Andi Kleen, Arjan van de Ven, KarimAllah Ahmed,
Wilson, Matt, linux-kernel, Andrea Arcangeli, Andy Lutomirski,
Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, Jorgensen, Bryan, kvm, x86,
Dr. David Alan Gilbert, Fred Jacobs, David Woodhouse
On Tue, Jan 30, 2018 at 01:20:52AM +0000, David Dunn wrote:
> Eduardo,
>
> This is why it would be good to have a CPUID bit that says:
> "apply SkyLake RSB stuffing." That's preferable to "trust FMS"
> for VMware.
Agreed it would be more useful than "trust FMS". However, I believe a "no need to apply Skylake RSB stuffing" bit (which I called "we promise we won't migrate to Skylake" previously) would allow guests to enable safer behavior by default under older hypervisors that don't support this bit.
>
> If Intel defines such a feature flag, sets it on SkyLake, and
> Linux uses it... that would be very helpful for VMware.
>
> I won't speak for GCE and AWS. But hopefully they can indicate
> whether it would help them as well.
I agree that having a standard flag on the CPUID space to specify that would be very helpful.
>
> If Intel cannot define/implement such a flag on SkyLake, then
> maybe the engineers on this email could define a flag in the
> hypervisor specific CPUID space. Linux would need to query
> that flag if it sees CPUID[1].ECX[31] set. That's not as nice
> since it makes detection on bare metal and virtualization
> platforms different, but it is better than keying off FMS.
Agreed.
--
Eduardo
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-29 22:29 [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure David Dunn
2018-01-29 22:41 ` Andi Kleen
@ 2018-01-29 23:51 ` Fred Jacobs
2018-01-30 1:08 ` Eduardo Habkost
2 siblings, 0 replies; 75+ messages in thread
From: Fred Jacobs @ 2018-01-29 23:51 UTC (permalink / raw)
To: David Dunn
Cc: Eduardo Habkost, Arjan van de Ven, KarimAllah Ahmed, Wilson, Matt,
linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
Dr. David Alan Gilbert, Jim Mattson, David Woodhouse
(Apologies as I was brought into this thread late, but I believe I have context).
Could a new "feature" be enumerated on Skylake and beyond which specifies that a particular problem exists which requires different mitigation than on previous processors? Perhaps a CPUID bit enumerating this feature (alongside IBRS, IBPB and STIBP) could be exposed on only the newer CPUs. System software could then query this to know what form of mitigation is necessary.
This could be over-reported in virtualized environments (e.g. a Nehalem CPU could be represented as needing the Skylake mitigation), such that sometimes the heavier Skylake+ mitigation would be applied on older CPUs. This is correct, just slower.
I'm just suggesting this rather than keying on Family/Model/Stepping to avoid breaking virtual machine migration, et cetera.
Thanks,
Fred, sticking his neck out.
On Jan 29 2:29PM, David Dunn wrote:
> On Mon, 2018-01-29 at 13:45:07 -0800, Eduardo Habkost wrote:
>
> > Maybe a generic "family/model/stepping/microcode really matches
> > the CPU you are running on" bit would be useful. The bit could
> > be enabled only on host-passthrough (aka "-cpu host") mode.
> >
> > If we really want to be able to migrate to host with different
> > CPU models (except Skylake), we could add a more specific "we
> > promise the host CPU is never going to be Skylake" bit.
> >
> > Now, if the hypervisor is not providing any of those bits, I
> > would advise against trusting family/model/stepping/microcode
> > under a hypervisor. Using a pre-defined CPU model (that doesn't
> > necessarily match the host) is very common when using KVM VM
> > management stacks.
> >
>
> Eduardo,
>
> I don't see how this is possible in a modern virtualization environment.
>
> Under VMware, a VM will be migrated to SkyLake if one is in the cluster and supports the features exposed to the VM. This can occur for suspend/resume as well.
>
> The migration pool isn't a constant. Hosts can be added to a cluster and VMs can be instructed to move across clusters. So there doesn't need to be a SkyLake around when the VM powers on in order for it to eventually end up on a SkyLake.
>
> Even if we expose a bit to indicate that FMS matches the underlying host, when does the guest know to query that? The VM can be moved at any point in time, including after the guest asks if FMS matches the host.
>
> My apologies for posting onto the mailing list out of the blue. Someone asked my opinion on this suggestion. I'm definitely interested in figuring out whether Linux can fully mitigate the SkyLake RSB problem in virtual environments, but it's not clear how best to achieve that.
>
> Thanks,
>
> David Dunn
>
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-29 22:29 [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure David Dunn
2018-01-29 22:41 ` Andi Kleen
2018-01-29 23:51 ` Fred Jacobs
@ 2018-01-30 1:08 ` Eduardo Habkost
2 siblings, 0 replies; 75+ messages in thread
From: Eduardo Habkost @ 2018-01-30 1:08 UTC (permalink / raw)
To: David Dunn
Cc: Arjan van de Ven, KarimAllah Ahmed, Wilson, Matt, linux-kernel,
Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86,
Dr. David Alan Gilbert, Fred Jacobs, Jim Mattson,
David Woodhouse
On Mon, Jan 29, 2018 at 10:29:28PM +0000, David Dunn wrote:
> On Mon, 2018-01-29 at 13:45:07 -0800, Eduardo Habkost wrote:
>
> > Maybe a generic "family/model/stepping/microcode really matches
> > the CPU you are running on" bit would be useful. The bit could
> > be enabled only on host-passthrough (aka "-cpu host") mode.
> >
> > If we really want to be able to migrate to host with different
> > CPU models (except Skylake), we could add a more specific "we
> > promise the host CPU is never going to be Skylake" bit.
> >
> > Now, if the hypervisor is not providing any of those bits, I
> > would advise against trusting family/model/stepping/microcode
> > under a hypervisor. Using a pre-defined CPU model (that doesn't
> > necessarily match the host) is very common when using KVM VM
> > management stacks.
> >
>
> Eduardo,
>
> I don't see how this is possible in a modern virtualization
> environment.
>
> Under VMware, a VM will be migrated to SkyLake if one is in the
> cluster and supports the features exposed to the VM. This can
> occur for suspend/resume as well.
>
> The migration pool isn't a constant. Hosts can be added to a
> cluster and VMs can be instructed to move across clusters. So
> there doesn't need to be a SkyLake around when the VM powers on
> in order for it to eventually end up on a SkyLake.
If this is the case for your deployment, this means the guest must never assume it won't run on a Skylake host (even if f/m/s is not Skylake), doesn't it? Then the hypervisor won't set the "we promise the host CPU is never going to be Skylake" bit.
>
> Even if we expose a bit to indicate that FMS matches the
> underlying host, when does the guest know to query that? The
> VM can be moved at any point in time, including after the guest
> asks if FMS matches the host.
If the VM can be moved at any point in time to a different model of host CPU, this means you won't tell the guest it can trust f/m/s because it doesn't represent the underlying host. You won't set the "f/m/s/m really matches the host CPU" bit.
In both scenarios you describe above, it sounds like Linux must assume it could be migrated to a Skylake host at any moment. This is exactly why I'm proposing those extra bits.
--
Eduardo
^ permalink raw reply [flat|nested] 75+ messages in thread
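The guest-side reading of the two proposed bits can be sketched as a small decision function. Both bit names and their positions are hypothetical (the thread only describes them in words), and the sketch takes the hypervisor-present state and the FMS result as plain inputs rather than executing CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical paravirtual bits; neither is defined by any specification. */
#define HV_FMS_MATCHES_HOST	(1u << 0)	/* "f/m/s/m really matches the host CPU" */
#define HV_NEVER_SKYLAKE	(1u << 1)	/* "the host CPU is never going to be Skylake" */

/* Should the guest behave as if it may run on a Skylake-era host? */
static bool must_assume_skylake(bool under_hypervisor, uint32_t hv_bits,
				bool fms_is_skylake)
{
	if (!under_hypervisor)
		return fms_is_skylake;		/* bare metal: FMS is real */

	if (hv_bits & HV_FMS_MATCHES_HOST)
		return fms_is_skylake;		/* host-passthrough, no migration across models */

	if (hv_bits & HV_NEVER_SKYLAKE)
		return false;			/* explicit promise from the hypervisor */

	/* No promise at all, e.g. an older hypervisor: take the safe default. */
	return true;
}
```

A deployment like the one described for VMware would set neither bit, so the guest ends up on the last line and keeps the Skylake handling enabled, which matches the conclusion of the message.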
* [RFC 00/10] Speculation Control feature support
@ 2018-01-20 19:22 KarimAllah Ahmed
2018-01-20 19:22 ` [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure KarimAllah Ahmed
0 siblings, 1 reply; 75+ messages in thread
From: KarimAllah Ahmed @ 2018-01-20 19:22 UTC (permalink / raw)
To: linux-kernel
Cc: KarimAllah Ahmed, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman,
H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu,
Paolo Bonzini, Peter Zijlstra, Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86
Start using the newly-added microcode features for speculation control on both Intel and AMD CPUs to protect against Spectre v2.
This patch series covers interrupts, system calls, context switching between processes, and context switching between VMs. It also exposes Indirect Branch Prediction Barrier MSR, aka IBPB MSR, to KVM guests.
TODO:
- Introduce a microcode blacklist to disable the feature for broken microcodes.
- Restrict/Unrestrict the speculation (by toggling IBRS) around VMExit and VMEnter for KVM and expose IBRS to guests.
Ashok Raj (1):
  x86/kvm: Add IBPB support
David Woodhouse (1):
  x86/speculation: Add basic IBRS support infrastructure
KarimAllah Ahmed (1):
  x86: Simplify spectre_v2 command line parsing
Thomas Gleixner (4):
  x86/speculation: Add basic support for IBPB
  x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  x86/speculation: Add inlines to control Indirect Branch Speculation
  x86/idle: Control Indirect Branch Speculation in idle
Tim Chen (3):
  x86/mm: Only flush indirect branches when switching into non dumpable process
  x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation
  x86/enter: Use IBRS on syscall and interrupts
 Documentation/admin-guide/kernel-parameters.txt |   1 +
 arch/x86/entry/calling.h                        |  73 ++++++++++
 arch/x86/entry/entry_64.S                       |  35 ++++-
 arch/x86/entry/entry_64_compat.S                |  21 ++-
 arch/x86/include/asm/cpufeatures.h              |   2 +
 arch/x86/include/asm/mwait.h                    |  14 ++
 arch/x86/include/asm/nospec-branch.h            |  54 ++++++-
 arch/x86/kernel/cpu/bugs.c                      | 183 +++++++++++++++---------
 arch/x86/kernel/process.c                       |  14 ++
 arch/x86/kvm/svm.c                              |  14 ++
 arch/x86/kvm/vmx.c                              |   4 +
 arch/x86/mm/tlb.c                               |  21 ++-
 12 files changed, 359 insertions(+), 77 deletions(-)
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
--
2.7.4
^ permalink raw reply [flat|nested] 75+ messages in thread
* [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-20 19:22 [RFC 00/10] Speculation Control feature support KarimAllah Ahmed @ 2018-01-20 19:22 ` KarimAllah Ahmed 2018-01-21 14:31 ` Thomas Gleixner ` (2 more replies) 0 siblings, 3 replies; 75+ messages in thread From: KarimAllah Ahmed @ 2018-01-20 19:22 UTC (permalink / raw) To: linux-kernel Cc: KarimAllah Ahmed, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86 From: David Woodhouse <dwmw@amazon.co.uk> Not functional yet; just add the handling for it in the Spectre v2 mitigation selection, and the X86_FEATURE_IBRS flag which will control the code to be added in later patches. Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS mode will want that too. For now we are auto-selecting IBRS on Skylake. We will probably end up changing that but for now let's default to the safest option. XX: Do we want a microcode blacklist? 
[karahmed: simplify the switch block and get rid of all the magic] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> --- Documentation/admin-guide/kernel-parameters.txt | 1 + arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/nospec-branch.h | 2 - arch/x86/kernel/cpu/bugs.c | 108 +++++++++++++++--------- 4 files changed, 68 insertions(+), 44 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 8122b5f..e597650 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3932,6 +3932,7 @@ retpoline - replace indirect branches retpoline,generic - google's original retpoline retpoline,amd - AMD-specific minimal thunk + ibrs - Intel: Indirect Branch Restricted Speculation Not specifying this option is equivalent to spectre_v2=auto. diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 8ec9588..ae86ad9 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -211,6 +211,7 @@ #define X86_FEATURE_AMD_PRED_CMD ( 7*32+17) /* Prediction Command MSR (AMD) */ #define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */ #define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */ +#define X86_FEATURE_IBRS ( 7*32+21) /* Use IBRS for Spectre v2 safety */ /* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index c333c95..8759449 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -205,7 +205,6 @@ extern char __indirect_thunk_end[]; */ static inline void vmexit_fill_RSB(void) { -#ifdef CONFIG_RETPOLINE unsigned long loops; asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE @@ -215,7 +214,6 @@ static inline 
void vmexit_fill_RSB(void) "910:" : "=r" (loops), ASM_CALL_CONSTRAINT : : "memory" ); -#endif } static inline void indirect_branch_prediction_barrier(void) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 96548ff..1d5e12f 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -79,6 +79,7 @@ enum spectre_v2_mitigation_cmd { SPECTRE_V2_CMD_RETPOLINE, SPECTRE_V2_CMD_RETPOLINE_GENERIC, SPECTRE_V2_CMD_RETPOLINE_AMD, + SPECTRE_V2_CMD_IBRS, }; static const char *spectre_v2_strings[] = { @@ -87,6 +88,7 @@ static const char *spectre_v2_strings[] = { [SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline", [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline", + [SPECTRE_V2_IBRS] = "Mitigation: Indirect Branch Restricted Speculation", }; #undef pr_fmt @@ -132,9 +134,17 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void) spec2_print_if_secure("force enabled on command line."); return SPECTRE_V2_CMD_FORCE; } else if (match_option(arg, ret, "retpoline")) { + if (!IS_ENABLED(CONFIG_RETPOLINE)) { + pr_err("retpoline selected but not compiled in. Switching to AUTO select\n"); + return SPECTRE_V2_CMD_AUTO; + } spec2_print_if_insecure("retpoline selected on command line."); return SPECTRE_V2_CMD_RETPOLINE; } else if (match_option(arg, ret, "retpoline,amd")) { + if (!IS_ENABLED(CONFIG_RETPOLINE)) { + pr_err("retpoline,amd selected but not compiled in. Switching to AUTO select\n"); + return SPECTRE_V2_CMD_AUTO; + } if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) { pr_err("retpoline,amd selected but CPU is not AMD. 
Switching to AUTO select\n");
 			return SPECTRE_V2_CMD_AUTO;
@@ -142,8 +152,19 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 		spec2_print_if_insecure("AMD retpoline selected on command line.");
 		return SPECTRE_V2_CMD_RETPOLINE_AMD;
 	} else if (match_option(arg, ret, "retpoline,generic")) {
+		if (!IS_ENABLED(CONFIG_RETPOLINE)) {
+			pr_err("retpoline,generic selected but not compiled in. Switching to AUTO select\n");
+			return SPECTRE_V2_CMD_AUTO;
+		}
 		spec2_print_if_insecure("generic retpoline selected on command line.");
 		return SPECTRE_V2_CMD_RETPOLINE_GENERIC;
+	} else if (match_option(arg, ret, "ibrs")) {
+		if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
+			pr_err("IBRS selected but no CPU support. Switching to AUTO select\n");
+			return SPECTRE_V2_CMD_AUTO;
+		}
+		spec2_print_if_insecure("IBRS seleted on command line.");
+		return SPECTRE_V2_CMD_IBRS;
 	} else if (match_option(arg, ret, "auto")) {
 		return SPECTRE_V2_CMD_AUTO;
 	}
@@ -156,7 +177,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 	return SPECTRE_V2_CMD_NONE;
 }
 
-/* Check for Skylake-like CPUs (for RSB handling) */
+/* Check for Skylake-like CPUs (for RSB and IBRS handling) */
 static bool __init is_skylake_era(void)
 {
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
@@ -178,55 +199,58 @@ static void __init spectre_v2_select_mitigation(void)
 	enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
 	enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
 
-	/*
-	 * If the CPU is not affected and the command line mode is NONE or AUTO
-	 * then nothing to do.
-	 */
-	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2) &&
-	    (cmd == SPECTRE_V2_CMD_NONE || cmd == SPECTRE_V2_CMD_AUTO))
-		return;
-
 	switch (cmd) {
 	case SPECTRE_V2_CMD_NONE:
+		if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+			pr_err("kernel not compiled with retpoline; no mitigation available!");
 		return;
-
-	case SPECTRE_V2_CMD_FORCE:
-		/* FALLTRHU */
-	case SPECTRE_V2_CMD_AUTO:
-		goto retpoline_auto;
-
-	case SPECTRE_V2_CMD_RETPOLINE_AMD:
-		if (IS_ENABLED(CONFIG_RETPOLINE))
-			goto retpoline_amd;
-		break;
-	case SPECTRE_V2_CMD_RETPOLINE_GENERIC:
-		if (IS_ENABLED(CONFIG_RETPOLINE))
-			goto retpoline_generic;
+	case SPECTRE_V2_CMD_IBRS:
+		mode = SPECTRE_V2_IBRS;
+		setup_force_cpu_cap(X86_FEATURE_IBRS);
 		break;
+	case SPECTRE_V2_CMD_AUTO:
+		if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+			return;
+		/* Fall through */
+	case SPECTRE_V2_CMD_FORCE:
+		/*
+		 * If we have IBRS support, and either Skylake or !RETPOLINE,
+		 * then that's what we do.
+		 */
+		if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
+		    (is_skylake_era() || !retp_compiler())) {
+			mode = SPECTRE_V2_IBRS;
+			setup_force_cpu_cap(X86_FEATURE_IBRS);
+			break;
+		}
+		/* Fall through */
 	case SPECTRE_V2_CMD_RETPOLINE:
-		if (IS_ENABLED(CONFIG_RETPOLINE))
-			goto retpoline_auto;
-		break;
-	}
-	pr_err("kernel not compiled with retpoline; no mitigation available!");
-	return;
+	case SPECTRE_V2_CMD_RETPOLINE_AMD:
+		if (IS_ENABLED(CONFIG_RETPOLINE) &&
+		    boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+			if (boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
+				mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
+							 SPECTRE_V2_RETPOLINE_MINIMAL_AMD;
+				setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
+				setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+				break;
+			}
 
-retpoline_auto:
-	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
-	retpoline_amd:
-		if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
 			pr_err("LFENCE not serializing. Switching to generic retpoline\n");
-			goto retpoline_generic;
 		}
-		mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
-					 SPECTRE_V2_RETPOLINE_MINIMAL_AMD;
-		setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
-		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
-	} else {
-	retpoline_generic:
-		mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
-					 SPECTRE_V2_RETPOLINE_MINIMAL;
-		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+		/* Fall through */
+	case SPECTRE_V2_CMD_RETPOLINE_GENERIC:
+		if (IS_ENABLED(CONFIG_RETPOLINE)) {
+			mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
+						 SPECTRE_V2_RETPOLINE_MINIMAL;
+			setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
+			break;
+		}
+		/* Fall through */
+	default:
+		if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
+			pr_err("kernel not compiled with retpoline; no mitigation available!");
+		return;
 	}
 
 	spectre_v2_enabled = mode;
-- 
2.7.4
^ permalink raw reply related [flat|nested] 75+ messages in thread
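The fall-through ordering in the reworked switch above may be easier to follow in isolation. The following is only a userspace sketch of that decision order, not the kernel code: the enum names, `struct cpu_state`, and `select_mitigation()` are hypothetical stand-ins for the kernel's enums and boot-time feature checks, and the MINIMAL retpoline variants are folded into their full counterparts for brevity.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel enums in the patch. */
enum cmd {
	CMD_NONE, CMD_AUTO, CMD_FORCE, CMD_IBRS,
	CMD_RETPOLINE, CMD_RETPOLINE_AMD, CMD_RETPOLINE_GENERIC
};

enum mode { MODE_NONE, MODE_IBRS, MODE_RETPOLINE_AMD, MODE_RETPOLINE_GENERIC };

/* Hypothetical bundle of the conditions the patch queries at boot. */
struct cpu_state {
	bool affected;           /* boot_cpu_has_bug(X86_BUG_SPECTRE_V2) */
	bool has_spec_ctrl;      /* boot_cpu_has(X86_FEATURE_SPEC_CTRL) */
	bool skylake_era;        /* is_skylake_era() */
	bool retp_compiler;      /* retp_compiler() */
	bool retpoline_built;    /* IS_ENABLED(CONFIG_RETPOLINE) */
	bool is_amd;             /* x86_vendor == X86_VENDOR_AMD */
	bool lfence_serializing; /* boot_cpu_has(X86_FEATURE_LFENCE_RDTSC) */
};

/* Each case tries its own mitigation, then falls through to the next weaker one. */
static enum mode select_mitigation(enum cmd cmd, const struct cpu_state *s)
{
	switch (cmd) {
	case CMD_NONE:
		return MODE_NONE;
	case CMD_IBRS:
		return MODE_IBRS;
	case CMD_AUTO:
		if (!s->affected)
			return MODE_NONE;
		/* Fall through */
	case CMD_FORCE:
		/* IBRS wins on Skylake-era parts or without a retpoline compiler. */
		if (s->has_spec_ctrl && (s->skylake_era || !s->retp_compiler))
			return MODE_IBRS;
		/* Fall through */
	case CMD_RETPOLINE:
	case CMD_RETPOLINE_AMD:
		if (s->retpoline_built && s->is_amd && s->lfence_serializing)
			return MODE_RETPOLINE_AMD;
		/* Fall through */
	case CMD_RETPOLINE_GENERIC:
		if (s->retpoline_built)
			return MODE_RETPOLINE_GENERIC;
		/* Fall through */
	default:
		return MODE_NONE;
	}
}
```

Under these assumptions, `CMD_AUTO` on an affected Skylake-era CPU with SPEC_CTRL yields `MODE_IBRS`, while an AMD CPU whose LFENCE is not serializing drops through to the generic retpoline.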
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-20 19:22 ` [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure KarimAllah Ahmed
@ 2018-01-21 14:31 ` Thomas Gleixner
2018-01-21 14:56 ` Borislav Petkov
` (2 more replies)
2018-01-29 20:14 ` [RFC,05/10] " Eduardo Habkost
2018-01-31 10:03 ` [RFC 05/10] " Christophe de Dinechin
2 siblings, 3 replies; 75+ messages in thread
From: Thomas Gleixner @ 2018-01-21 14:31 UTC (permalink / raw)
To: KarimAllah Ahmed
Cc: linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman,
H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu,
Paolo Bonzini, Peter Zijlstra, Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Sat, 20 Jan 2018, KarimAllah Ahmed wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Not functional yet; just add the handling for it in the Spectre v2
> mitigation selection, and the X86_FEATURE_IBRS flag which will control
> the code to be added in later patches.
>
> Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS
> mode will want that too.
>
> For now we are auto-selecting IBRS on Skylake. We will probably end up
> changing that but for now let's default to the safest option.
>
> XX: Do we want a microcode blacklist?

Oh yes, we want a microcode blacklist. Ideally we refuse to load the
affected microcode in the first place and if its already loaded then at
least avoid to use the borked features.

PR texts promising that Intel is committed to transparency in this matter
are not sufficient. Intel, please provide the facts, i.e. a proper list of
micro codes and affected SKUs, ASAP.

Thanks,

tglx
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-21 14:31 ` Thomas Gleixner
@ 2018-01-21 14:56 ` Borislav Petkov
2018-01-22 9:51 ` Peter Zijlstra
2018-01-21 15:25 ` David Woodhouse
2018-01-23 20:58 ` David Woodhouse
2 siblings, 1 reply; 75+ messages in thread
From: Borislav Petkov @ 2018-01-21 14:56 UTC (permalink / raw)
To: Thomas Gleixner
Cc: KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli,
Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick,
Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman,
H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu,
Paolo Bonzini, Peter Zijlstra, Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Sun, Jan 21, 2018 at 03:31:28PM +0100, Thomas Gleixner wrote:
> Oh yes, we want a microcode blacklist. Ideally we refuse to load the
> affected microcode in the first place and if its already loaded then at
> least avoid to use the borked features.
>
> PR texts promising that Intel is committed to transparency in this matter
> are not sufficient. Intel, please provide the facts, i.e. a proper list of
> micro codes and affected SKUs, ASAP.

If we have to do blacklisting, then we need to blacklist microcode
revisions and fixed ones should be incremented. I.e., we need a way to
*detect* the faulty microcode revision at load time.

Also, blacklisting microcode for early loading will become an ugly dance
so I'd like to avoid it if possible.

Thus, it would be much much easier if dracut/initrd creation thing
already filters those blacklisted blobs by looking at the revision in
the header. Which is much easier.

Yeah, something like that.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-21 14:56 ` Borislav Petkov
@ 2018-01-22 9:51 ` Peter Zijlstra
2018-01-22 12:06 ` Borislav Petkov
0 siblings, 1 reply; 75+ messages in thread
From: Peter Zijlstra @ 2018-01-22 9:51 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen,
Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
Asit Mallick, Dan Williams, Dave Hansen, David Woodhouse,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Sun, Jan 21, 2018 at 03:56:55PM +0100, Borislav Petkov wrote:
> Also, blacklisting microcode for early loading will become an ugly dance
> so I'd like to avoid it if possible.
>
> Thus, it would be much much easier if dracut/initrd creation thing
> already filters those blacklisted blobs by looking at the revision in
> the header. Which is much easier.

That wouldn't be enough; AFAIU there's people with this stuff already
flashed in their BIOS. So the kernel needs to deal with it one way or
another.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-22 9:51 ` Peter Zijlstra
@ 2018-01-22 12:06 ` Borislav Petkov
2018-01-22 13:30 ` Greg Kroah-Hartman
0 siblings, 1 reply; 75+ messages in thread
From: Borislav Petkov @ 2018-01-22 12:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen,
Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
Asit Mallick, Dan Williams, Dave Hansen, David Woodhouse,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Mon, Jan 22, 2018 at 10:51:53AM +0100, Peter Zijlstra wrote:
> That wouldn't be enough; AFAIU there's people with this stuff already
> flashed in their BIOS. So the kernel needs to deal with it one way or
> another.

Not a lot we can do there except maybe disable IBRS on those and users
can go and complain to their BIOS vendor to give them a downgrade or
they can downgrade themselves.

If we had free BIOS, this would've been a whole different story...

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-22 12:06 ` Borislav Petkov
@ 2018-01-22 13:30 ` Greg Kroah-Hartman
2018-01-22 13:37 ` Woodhouse, David
0 siblings, 1 reply; 75+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-22 13:30 UTC (permalink / raw)
To: Borislav Petkov
Cc: Peter Zijlstra, Thomas Gleixner, KarimAllah Ahmed, linux-kernel,
Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
Ashok Raj, Asit Mallick, Dan Williams, Dave Hansen,
David Woodhouse, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Mon, Jan 22, 2018 at 01:06:18PM +0100, Borislav Petkov wrote:
> On Mon, Jan 22, 2018 at 10:51:53AM +0100, Peter Zijlstra wrote:
> > That wouldn't be enough; AFAIU there's people with this stuff already
> > flashed in their BIOS. So the kernel needs to deal with it one way or
> > another.
>
> Not a lot we can do there except maybe disable IBRS on those and users
> can go and complain to their BIOS vendor to give them a downgrade or
> they can downgrade themselves.
>
> If we had free BIOS, this would've been a whole different story...

We kind of do, you can submit patches to UEFI, but I doubt that the
processor-specific portions are actually present in the Tianocore code
to be able to be patched.

What about LinuxBoot <https://linuxboot.org>, does it too take over too
late in the boot process to control this?

thanks,

greg k-h
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-22 13:30 ` Greg Kroah-Hartman
@ 2018-01-22 13:37 ` Woodhouse, David
0 siblings, 0 replies; 75+ messages in thread
From: Woodhouse, David @ 2018-01-22 13:37 UTC (permalink / raw)
To: gregkh, bp
Cc: kvm, linux-kernel, peterz, arjan, Raslan, KarimAllah, ashok.raj,
tglx, Janakarajan.Natarajan, tim.c.chen, ak, joro,
dan.j.williams, x86, hpa, aarcange, mingo, luto, torvalds,
pbonzini, dave.hansen, mhiramat, thomas.lendacky,
asit.k.mallick, jun.nakajima, labbott, rkrcmar
On Mon, 2018-01-22 at 14:30 +0100, Greg Kroah-Hartman wrote:
> We kind of do, you can submit patches to UEFI, but I doubt that the
> processor-specific portions are actually present in the Tianocore code
> to be able to be patched.

This is just about which microcode your BIOS loads into the CPU before
booting the OS. It's not "processor-specific portions in the Tianocore
code"; more a data blob, just like when Linux updates microcode.

> What about LinuxBoot <https://linuxboot.org>, does it too take over too
> late in the boot process to control this?

Yes, I believe microcode updates are done in PEI which is before
LinuxBoot takes over.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-21 14:31 ` Thomas Gleixner
2018-01-21 14:56 ` Borislav Petkov
@ 2018-01-21 15:25 ` David Woodhouse
2018-01-23 20:58 ` David Woodhouse
2 siblings, 0 replies; 75+ messages in thread
From: David Woodhouse @ 2018-01-21 15:25 UTC (permalink / raw)
To: Thomas Gleixner
Cc: KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli,
Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick,
Borislav Petkov, Dan Williams, Dave Hansen, David Woodhouse,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
"Radim Krčmář",
Tim Chen, Tom Lendacky, kvm, x86
> On Sat, 20 Jan 2018, KarimAllah Ahmed wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> Not functional yet; just add the handling for it in the Spectre v2
>> mitigation selection, and the X86_FEATURE_IBRS flag which will control
>> the code to be added in later patches.
>>
>> Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS
>> mode will want that too.
>>
>> For now we are auto-selecting IBRS on Skylake. We will probably end up
>> changing that but for now let's default to the safest option.
>>
>> XX: Do we want a microcode blacklist?
>
> Oh yes, we want a microcode blacklist. Ideally we refuse to load the
> affected microcode in the first place and if its already loaded then at
> least avoid to use the borked features.
>
> PR texts promising that Intel is committed to transparency in this matter
> are not sufficient. Intel, please provide the facts, i.e. a proper list of
> micro codes and affected SKUs, ASAP.

Perhaps we could start with the list already published by VMware at
https://kb.vmware.com/s/article/52345

-- 
dwmw2
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-21 14:31 ` Thomas Gleixner
2018-01-21 14:56 ` Borislav Petkov
2018-01-21 15:25 ` David Woodhouse
@ 2018-01-23 20:58 ` David Woodhouse
2018-01-23 22:43 ` Johannes Erdfelt
2018-01-24 8:47 ` Peter Zijlstra
2 siblings, 2 replies; 75+ messages in thread
From: David Woodhouse @ 2018-01-23 20:58 UTC (permalink / raw)
To: Thomas Gleixner, KarimAllah Ahmed
Cc: linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski,
Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov,
Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin,
Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima,
Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
Peter Zijlstra, Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Sun, 2018-01-21 at 15:31 +0100, Thomas Gleixner wrote:
> >
> > XX: Do we want a microcode blacklist?
>
> Oh yes, we want a microcode blacklist. Ideally we refuse to load the
> affected microcode in the first place and if its already loaded then at
> least avoid to use the borked features.
>
> PR texts promising that Intel is committed to transparency in this matter
> are not sufficient. Intel, please provide the facts, i.e. a proper list of
> micro codes and affected SKUs, ASAP.

They've finally published one, at
https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/microcode-update-guidance.pdf

For shits and giggles, you can compare it with the one at
https://kb.vmware.com/s/article/52345

Intel's seems to be a bit rushed. For example for Broadwell-EX 406F1
they say "0x25, 0x23" are bad, but VMware's list says 0x0B000025 and I
have a CPU with 0x0B0000xx. So I've "corrected" their numbers in
attempt at a blacklist patch accordingly, and likewise for some Skylake
SKUs. But there are others in Intel's list that I can't easily
proofread for them right now.

Am I missing something?

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index b720dacac051..52855d1a4f9a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -102,6 +102,57 @@ static void probe_xeon_phi_r3mwait(struct cpuinfo_x86 *c)
 		ELF_HWCAP2 |= HWCAP2_RING3MWAIT;
 }
 
+/*
+ * Early microcode releases for the Spectre v2 mitigation were broken:
+ * https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/microcode-update-guidance.pdf
+ * VMware also has a list at https://kb.vmware.com/s/article/52345
+ */
+struct sku_microcode {
+	u8 model;
+	u8 stepping;
+	u32 microcode;
+};
+static const struct sku_microcode spectre_bad_microcodes[] = {
+	{ INTEL_FAM6_KABYLAKE_DESKTOP, 0x0B, 0x80 },
+	{ INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
+	{ INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
+	{ INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x80 },
+	{ INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
+	{ INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003C },
+	{ INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0x000000C2 },
+	{ INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0x000000C2 },
+	{ INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
+	{ INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x0000001B },
+	{ INTEL_FAM6_HASWELL_ULT, 0x01, 0x21 },
+	{ INTEL_FAM6_HASWELL_GT3E, 0x01, 0x18 },
+	{ INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
+	{ INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a },
+	{ INTEL_FAM6_HASWELL_X, 0x02, 0x3b },
+	{ INTEL_FAM6_HASWELL_X, 0x04, 0x10 },
+	{ INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
+	{ INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
+	{ INTEL_FAM6_BROADWELL_XEON_D, 0x03, 0x7000011 },
+	{ INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x0000001B },
+	/* For 406F1 Intel says "0x25, 0x23" while VMware says 0x0B000025
+	 * and a real CPU has a firmware in the 0x0B0000xx range. So: */
+	{ INTEL_FAM6_BROADWELL_X, 0x01, 0x0b000025 },
+	{ INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
+	{ INTEL_FAM6_SKYLAKE_X, 0x03, 0x100013e },
+	{ INTEL_FAM6_SKYLAKE_X, 0x04, 0x200003c },
+};
+
+static int bad_spectre_microcode(struct cpuinfo_x86 *c)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
+		if (c->x86_model == spectre_bad_microcodes[i].model &&
+		    c->x86_mask == spectre_bad_microcodes[i].stepping)
+			return (c->microcode <= spectre_bad_microcodes[i].microcode);
+	}
+	return 0;
+}
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;
@@ -122,6 +173,18 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64))
 		c->microcode = intel_get_microcode_revision();
 
+	if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
+	     cpu_has(c, X86_FEATURE_AMD_SPEC_CTRL) ||
+	     cpu_has(c, X86_FEATURE_AMD_PRED_CMD) ||
+	     cpu_has(c, X86_FEATURE_AMD_STIBP)) && bad_spectre_microcode(c)) {
+		pr_warn("Intel Spectre v2 broken microcode detected; disabling SPEC_CTRL\n");
+		clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
+		clear_cpu_cap(c, X86_FEATURE_STIBP);
+		clear_cpu_cap(c, X86_FEATURE_AMD_SPEC_CTRL);
+		clear_cpu_cap(c, X86_FEATURE_AMD_PRED_CMD);
+		clear_cpu_cap(c, X86_FEATURE_AMD_STIBP);
+	}
+
 	/*
 	 * Atom erratum AAE44/AAF40/AAG38/AAH41:
 	 *
^ permalink raw reply related [flat|nested] 75+ messages in thread
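For readers who want to try the lookup outside the kernel, here is a minimal userspace sketch of the same "revision at or below the listed value is bad" rule. It is an illustration under stated assumptions, not the patch itself: the `MODEL_*` constants are local stand-ins for the `INTEL_FAM6_*` macros, only three of the table rows from the patch are reproduced, and the function takes plain integers instead of a `struct cpuinfo_x86`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins (assumptions) for the kernel's INTEL_FAM6_* model macros. */
#define MODEL_BROADWELL_X 0x4F
#define MODEL_SKYLAKE_X   0x55

struct sku_microcode {
	uint8_t model;
	uint8_t stepping;
	uint32_t microcode;	/* highest revision treated as bad */
};

/* A small excerpt of the table from the patch, for illustration only. */
static const struct sku_microcode bad_microcodes[] = {
	{ MODEL_SKYLAKE_X,   0x03, 0x0100013e },
	{ MODEL_SKYLAKE_X,   0x04, 0x0200003c },
	{ MODEL_BROADWELL_X, 0x01, 0x0b000025 },
};

/* Returns true if (model, stepping) is listed and rev is not newer than the entry. */
static bool bad_spectre_microcode(uint8_t model, uint8_t stepping, uint32_t rev)
{
	size_t i;

	for (i = 0; i < sizeof(bad_microcodes) / sizeof(bad_microcodes[0]); i++) {
		if (model == bad_microcodes[i].model &&
		    stepping == bad_microcodes[i].stepping)
			return rev <= bad_microcodes[i].microcode;
	}
	return false;	/* unlisted SKUs are assumed fine in blacklist mode */
}
```

Because the comparison is `<=`, the first matching row decides the result, which is one reason duplicate rows in such a table are harmless but redundant.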
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-23 20:58 ` David Woodhouse
@ 2018-01-23 22:43 ` Johannes Erdfelt
2018-01-24 8:47 ` Peter Zijlstra
1 sibling, 0 replies; 75+ messages in thread
From: Johannes Erdfelt @ 2018-01-23 22:43 UTC (permalink / raw)
To: David Woodhouse
Cc: Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen,
Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Tue, Jan 23, 2018, David Woodhouse <dwmw2@infradead.org> wrote:
> +	{ INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
> +	{ INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },

> +	{ INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
> +	{ INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },

> +	{ INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003C },
> +	{ INTEL_FAM6_SKYLAKE_X, 0x04, 0x200003c },

> +	{ INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x0000001B },
> +	{ INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x0000001B },

> +	{ INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
> +	{ INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },

There appear to be a handful of duplicates in this list.

JE
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-23 20:58 ` David Woodhouse
2018-01-23 22:43 ` Johannes Erdfelt
@ 2018-01-24 8:47 ` Peter Zijlstra
2018-01-24 9:02 ` David Woodhouse
2018-01-24 12:14 ` David Woodhouse
1 sibling, 2 replies; 75+ messages in thread
From: Peter Zijlstra @ 2018-01-24 8:47 UTC (permalink / raw)
To: David Woodhouse
Cc: Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen,
Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Tue, Jan 23, 2018 at 08:58:36PM +0000, David Woodhouse wrote:
> +static const struct sku_microcode spectre_bad_microcodes[] = {
> +	{ INTEL_FAM6_KABYLAKE_DESKTOP, 0x0B, 0x80 },
> +	{ INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
> +	{ INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 },
> +	{ INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x80 },
> +	{ INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
> +	{ INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003C },
> +	{ INTEL_FAM6_SKYLAKE_MOBILE, 0x03, 0x000000C2 },
> +	{ INTEL_FAM6_SKYLAKE_DESKTOP, 0x03, 0x000000C2 },
> +	{ INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 },
> +	{ INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x0000001B },
> +	{ INTEL_FAM6_HASWELL_ULT, 0x01, 0x21 },
> +	{ INTEL_FAM6_HASWELL_GT3E, 0x01, 0x18 },
> +	{ INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
> +	{ INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a },
> +	{ INTEL_FAM6_HASWELL_X, 0x02, 0x3b },
> +	{ INTEL_FAM6_HASWELL_X, 0x04, 0x10 },
> +	{ INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 },
> +	{ INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 },
> +	{ INTEL_FAM6_BROADWELL_XEON_D, 0x03, 0x7000011 },
> +	{ INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x0000001B },
> +	/* For 406F1 Intel says "0x25, 0x23" while VMware says 0x0B000025
> +	 * and a real CPU has a firmware in the 0x0B0000xx range. So: */
> +	{ INTEL_FAM6_BROADWELL_X, 0x01, 0x0b000025 },
> +	{ INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 },
> +	{ INTEL_FAM6_SKYLAKE_X, 0x03, 0x100013e },
> +	{ INTEL_FAM6_SKYLAKE_X, 0x04, 0x200003c },
> +};

Typically tglx likes to use x86_match_cpu() for these things; see also
commit: bd9240a18edfb ("x86/apic: Add TSC_DEADLINE quirk due to
errata").

> +
> +static int bad_spectre_microcode(struct cpuinfo_x86 *c)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
> +		if (c->x86_model == spectre_bad_microcodes[i].model &&
> +		    c->x86_mask == spectre_bad_microcodes[i].stepping)
> +			return (c->microcode <= spectre_bad_microcodes[i].microcode);
> +	}
> +	return 0;
> +}

The above is Intel only, you should check vendor too I think.

>  static void early_init_intel(struct cpuinfo_x86 *c)
>  {
>  	u64 misc_enable;
> @@ -122,6 +173,18 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>  	if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64))
>  		c->microcode = intel_get_microcode_revision();
>
> +	if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
> +	     cpu_has(c, X86_FEATURE_AMD_SPEC_CTRL) ||
> +	     cpu_has(c, X86_FEATURE_AMD_PRED_CMD) ||
> +	     cpu_has(c, X86_FEATURE_AMD_STIBP)) && bad_spectre_microcode(c)) {
> +		pr_warn("Intel Spectre v2 broken microcode detected; disabling SPEC_CTRL\n");
> +		clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
> +		clear_cpu_cap(c, X86_FEATURE_STIBP);
> +		clear_cpu_cap(c, X86_FEATURE_AMD_SPEC_CTRL);
> +		clear_cpu_cap(c, X86_FEATURE_AMD_PRED_CMD);
> +		clear_cpu_cap(c, X86_FEATURE_AMD_STIBP);
> +	}

And since its Intel only, what are those AMD features doing there?
^ permalink raw reply [flat|nested] 75+ messages in thread
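The two review points above (use a match table, and check the vendor) can be pictured with a sentinel-terminated table that carries the vendor in each row. This is a userspace sketch only, loosely modelled on the idea behind x86_match_cpu() and not on its actual API: `struct cpu_id`, `struct cpu_match`, `match_cpu()`, and `has_bad_microcode()` are hypothetical names, and the stepping field and the use of `driver_data` to hold the last bad revision are assumptions made for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum vendor { VENDOR_NONE = 0, VENDOR_INTEL, VENDOR_AMD };

/* Hypothetical description of the running CPU. */
struct cpu_id {
	enum vendor vendor;
	uint8_t family;
	uint8_t model;
	uint8_t stepping;
};

/* Hypothetical match row; driver_data carries the last known-bad revision. */
struct cpu_match {
	enum vendor vendor;
	uint8_t family;
	uint8_t model;
	uint8_t stepping;
	uint32_t driver_data;
};

/* Table ends with an all-zero sentinel, so no separate length is needed. */
static const struct cpu_match bad_ucode_ids[] = {
	{ VENDOR_INTEL, 6, 0x55, 0x04, 0x0200003c },
	{ VENDOR_INTEL, 6, 0x4F, 0x01, 0x0b000025 },
	{ VENDOR_NONE, 0, 0, 0, 0 }
};

/* Walk the table until the sentinel; return the first row matching every field. */
static const struct cpu_match *match_cpu(const struct cpu_match *table,
					  const struct cpu_id *cpu)
{
	for (const struct cpu_match *m = table; m->vendor != VENDOR_NONE; m++) {
		if (m->vendor == cpu->vendor && m->family == cpu->family &&
		    m->model == cpu->model && m->stepping == cpu->stepping)
			return m;
	}
	return NULL;
}

static bool has_bad_microcode(const struct cpu_id *cpu, uint32_t rev)
{
	const struct cpu_match *m = match_cpu(bad_ucode_ids, cpu);

	return m != NULL && rev <= m->driver_data;
}
```

Keeping the vendor in the row means the same helper stays correct even if it is ever called from code that is not vendor-specific.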
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-24 8:47 ` Peter Zijlstra
@ 2018-01-24 9:02 ` David Woodhouse
2018-01-24 9:10 ` Greg Kroah-Hartman
` (2 more replies)
2018-01-24 12:14 ` David Woodhouse
1 sibling, 3 replies; 75+ messages in thread
From: David Woodhouse @ 2018-01-24 9:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen,
Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Wed, 2018-01-24 at 09:47 +0100, Peter Zijlstra wrote:
> Typically tglx likes to use x86_match_cpu() for these things; see also
> commit: bd9240a18edfb ("x86/apic: Add TSC_DEADLINE quirk due to
> errata").

Thanks, will fix. I think we might also end up in whitelist mode,
adding "known good" microcodes to the list as they get released or
retroactively blessed.

I would really have liked a new bit in IA32_ARCH_CAPABILITIES to say
that it's safe, but that's not possible for *existing* microcode which
actually turns out to be OK in the end.

That means the whitelist ends up basically empty right now. Should I
add a command line parameter to override it? Otherwise we end up having
to rebuild the kernel every time there's a microcode release which
covers a new CPU SKU (which is why I kind of hate the whitelist, but
Arjan is very insistent...)

I'm kind of tempted to turn it into a whitelist just by adding 1 to the
microcode revision in each table entry. Sure, that N+1 might be another
microcode build that also has issues but never saw the light of day...
but that's OK as long it never *does*. And yes we'd have to tweak it if
revisions that are blacklisted in the Intel doc are subsequently
cleared. But at least it'd require *less* tweaking.

> >
> > +
> > +static int bad_spectre_microcode(struct cpuinfo_x86 *c)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
> > +		if (c->x86_model == spectre_bad_microcodes[i].model &&
> > +		    c->x86_mask == spectre_bad_microcodes[i].stepping)
> > +			return (c->microcode <= spectre_bad_microcodes[i].microcode);
> > +	}
> > +	return 0;
> > +}
> The above is Intel only, you should check vendor too I think.

It's in intel.c, called from early_init_intel(). Isn't that sufficient?

> >
> >  static void early_init_intel(struct cpuinfo_x86 *c)
> >  {
> >  	u64 misc_enable;
> > @@ -122,6 +173,18 @@ static void early_init_intel(struct cpuinfo_x86 *c)
> >  	if (c->x86 >= 6 && !cpu_has(c, X86_FEATURE_IA64))
> >  		c->microcode = intel_get_microcode_revision();
> >
> > +	if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) ||
> > +	     cpu_has(c, X86_FEATURE_AMD_SPEC_CTRL) ||
> > +	     cpu_has(c, X86_FEATURE_AMD_PRED_CMD) ||
> > +	     cpu_has(c, X86_FEATURE_AMD_STIBP)) && bad_spectre_microcode(c)) {
> > +		pr_warn("Intel Spectre v2 broken microcode detected; disabling SPEC_CTRL\n");
> > +		clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
> > +		clear_cpu_cap(c, X86_FEATURE_STIBP);
> > +		clear_cpu_cap(c, X86_FEATURE_AMD_SPEC_CTRL);
> > +		clear_cpu_cap(c, X86_FEATURE_AMD_PRED_CMD);
> > +		clear_cpu_cap(c, X86_FEATURE_AMD_STIBP);
> > +	}
> And since its Intel only, what are those AMD features doing there?

Hypervisors which only want to expose PRED_CMD may do so using the AMD
feature bit. SPEC_CTRL requires save/restore and live migration
support, and isn't needed with retpoline anyway (since guests won't be
calling directly into firmware).
^ permalink raw reply [flat|nested] 75+ messages in thread
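The "add 1 to each revision" idea floated above, together with a command-line override, could look roughly like the sketch below. It is a userspace illustration under several assumptions: `struct sku_min_microcode`, `spec_ctrl_trusted()`, and the `force_override` flag are hypothetical names, the two rows are derived from blacklist values quoted earlier in the thread, and treating unlisted SKUs as untrusted is one possible reading of "whitelist mode", not something the thread settles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sku_min_microcode {
	uint8_t model;
	uint8_t stepping;
	uint32_t min_good;	/* last known-bad revision + 1 */
};

/* Derived from two blacklist entries quoted in the thread (illustrative only). */
static const struct sku_min_microcode min_good_microcodes[] = {
	{ 0x55, 0x04, 0x0200003c + 1 },	/* Skylake-X, stepping 4 */
	{ 0x4F, 0x01, 0x0b000025 + 1 },	/* Broadwell-X, stepping 1 */
};

/*
 * Whitelist-style check: a listed SKU is trusted from min_good upward,
 * an unlisted SKU is trusted only when the (hypothetical) override is set.
 */
static bool spec_ctrl_trusted(uint8_t model, uint8_t stepping, uint32_t rev,
			      bool force_override)
{
	size_t i;

	if (force_override)
		return true;

	for (i = 0; i < sizeof(min_good_microcodes) / sizeof(min_good_microcodes[0]); i++) {
		if (model == min_good_microcodes[i].model &&
		    stepping == min_good_microcodes[i].stepping)
			return rev >= min_good_microcodes[i].min_good;
	}
	return false;
}
```

The override is what spares users a kernel rebuild when a new SKU gets fixed microcode before the table is updated, which is the maintenance cost raised in the message above.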
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-24 9:02 ` David Woodhouse
@ 2018-01-24 9:10 ` Greg Kroah-Hartman
2018-01-24 15:09 ` Arjan van de Ven
2018-01-24 9:34 ` Peter Zijlstra
2018-01-24 10:49 ` Henrique de Moraes Holschuh
2 siblings, 1 reply; 75+ messages in thread
From: Greg Kroah-Hartman @ 2018-01-24 9:10 UTC (permalink / raw)
To: David Woodhouse
Cc: Peter Zijlstra, Thomas Gleixner, KarimAllah Ahmed, linux-kernel,
Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven,
Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams,
Dave Hansen, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan,
Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds,
Masami Hiramatsu, Paolo Bonzini, Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Wed, Jan 24, 2018 at 09:02:21AM +0000, David Woodhouse wrote:
> On Wed, 2018-01-24 at 09:47 +0100, Peter Zijlstra wrote:
> > Typically tglx likes to use x86_match_cpu() for these things; see also
> > commit: bd9240a18edfb ("x86/apic: Add TSC_DEADLINE quirk due to
> > errata").
>
> Thanks, will fix. I think we might also end up in whitelist mode,
> adding "known good" microcodes to the list as they get released or
> retroactively blessed.
>
> I would really have liked a new bit in IA32_ARCH_CAPABILITIES to say
> that it's safe, but that's not possible for *existing* microcode which
> actually turns out to be OK in the end.
>
> That means the whitelist ends up basically empty right now. Should I
> add a command line parameter to override it? Otherwise we end up having
> to rebuild the kernel every time there's a microcode release which
> covers a new CPU SKU (which is why I kind of hate the whitelist, but
> Arjan is very insistent...)

Ick, no, whitelists are a pain for everyone involved. Don't do that
unless it is absolutely the only way it will ever work.

Arjan, why do you think this can only be done as a whitelist? It's much
easier to just mark the "bad" microcode versions as those _should_ be a
much smaller list that Intel knows about today. And of course, any
future microcode updates will not be "bad" because they know how to
properly test for this now before they are released :)

thanks,

greg k-h
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-24 9:10 ` Greg Kroah-Hartman
@ 2018-01-24 15:09 ` Arjan van de Ven
2018-01-24 15:18 ` David Woodhouse
0 siblings, 1 reply; 75+ messages in thread
From: Arjan van de Ven @ 2018-01-24 15:09 UTC (permalink / raw)
To: Greg Kroah-Hartman, David Woodhouse
Cc: Peter Zijlstra, Thomas Gleixner, KarimAllah Ahmed, linux-kernel,
Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu,
Paolo Bonzini, Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On 1/24/2018 1:10 AM, Greg Kroah-Hartman wrote:
>
>> That means the whitelist ends up basically empty right now. Should I
>> add a command line parameter to override it? Otherwise we end up having
>> to rebuild the kernel every time there's a microcode release which
>> covers a new CPU SKU (which is why I kind of hate the whitelist, but
>> Arjan is very insistent...)
>
> Ick, no, whitelists are a pain for everyone involved. Don't do that
> unless it is absolutely the only way it will ever work.
>
> Arjan, why do you think this can only be done as a whitelist?

I suggested a minimum version list for those cpus that need it.

microcode versions are tricky (and we've released betas etc etc with
their own numbers) and as a result there might be several numbers that
have those issues with their IBRS for the same F/M/S
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-24 15:09 ` Arjan van de Ven
@ 2018-01-24 15:18 ` David Woodhouse
0 siblings, 0 replies; 75+ messages in thread
From: David Woodhouse @ 2018-01-24 15:18 UTC (permalink / raw)
To: Arjan van de Ven, Greg Kroah-Hartman
Cc: Peter Zijlstra, Thomas Gleixner, KarimAllah Ahmed, linux-kernel,
Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu,
Paolo Bonzini, Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86
On Wed, 2018-01-24 at 07:09 -0800, Arjan van de Ven wrote:
> On 1/24/2018 1:10 AM, Greg Kroah-Hartman wrote:
> > Arjan, why do you think this can only be done as a whitelist?
>
> I suggested a minimum version list for those cpus that need it.
>
> microcode versions are tricky (and we've released betas etc etc with their own numbers)
> and as a result there might be several numbers that have those issues with their IBRS for the same F/M/S

I really think that's fine. Anyone who uses beta microcodes, should be
perfectly prepared to deal with the results. And probably *wanted* to
be able to actually test them, instead of having the kernel refuse to
do so.

So if there are beta microcodes floating around with numbers higher
than in Intel's currently-published list, which are not yet known to be
safe (or even if they're known not to be), that's absolutely OK.

If you're telling me that there will be *publicly* released microcodes
with version numbers higher than those in the list, which still have
the same issues... well, then I think Mr Shouty is going to come for
another visit.
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-24 9:02 ` David Woodhouse
2018-01-24 9:10 ` Greg Kroah-Hartman
@ 2018-01-24 9:34 ` Peter Zijlstra
2018-01-24 10:49 ` Henrique de Moraes Holschuh
2 siblings, 0 replies; 75+ messages in thread
From: Peter Zijlstra @ 2018-01-24 9:34 UTC (permalink / raw)
To: David Woodhouse
Cc: Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen,
Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86

> > > +	for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) {
> > > +		if (c->x86_model == spectre_bad_microcodes[i].model &&
> > > +		    c->x86_mask == spectre_bad_microcodes[i].stepping)
> > > +			return (c->microcode <= spectre_bad_microcodes[i].microcode);
> > > +	}
> > > +	return 0;
> > > +}
> > The above is Intel only, you should check vendor too I think.
>
> It's in intel.c, called from early_init_intel(). Isn't that sufficient?

Duh, so much for reading skillz on my end ;-)

> > > +		pr_warn("Intel Spectre v2 broken microcode detected; disabling SPEC_CTRL\n");
> > > +		clear_cpu_cap(c, X86_FEATURE_SPEC_CTRL);
> > > +		clear_cpu_cap(c, X86_FEATURE_STIBP);
> > > +		clear_cpu_cap(c, X86_FEATURE_AMD_SPEC_CTRL);
> > > +		clear_cpu_cap(c, X86_FEATURE_AMD_PRED_CMD);
> > > +		clear_cpu_cap(c, X86_FEATURE_AMD_STIBP);
> > > +	}
> > And since its Intel only, what are those AMD features doing there?
>
> Hypervisors which only want to expose PRED_CMD may do so using the AMD
> feature bit. SPEC_CTRL requires save/restore and live migration
> support, and isn't needed with retpoline anyway (since guests won't be
> calling directly into firmware).

Egads, I suppose that makes some sense, but it does make a horrible
muddle of things.
^ permalink raw reply [flat|nested] 75+ messages in thread
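
[Editor's sketch, not part of the original message] The lookup quoted in the message above can be tried outside the kernel with a small stand-alone version. This is only a sketch under stated assumptions: struct cpu_ident is a hypothetical stand-in for the three struct cpuinfo_x86 fields the patch reads, and the table values are placeholders chosen for illustration (the first reuses the SKX revision mentioned elsewhere in this thread), not Intel's published list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cpuinfo_x86 fields the quoted patch reads. */
struct cpu_ident {
	uint8_t  model;
	uint8_t  stepping;
	uint32_t microcode;
};

/* One blacklist entry: revisions <= microcode are treated as bad. */
struct bad_microcode {
	uint8_t  model;
	uint8_t  stepping;
	uint32_t microcode;
};

/* Placeholder values for illustration only; NOT Intel's published list. */
static const struct bad_microcode bad_table[] = {
	{ 0x55, 0x04, 0x0200003c },
	{ 0x4f, 0x01, 0x0b000025 },
};

/*
 * Same shape as the quoted loop: find the entry matching model and
 * stepping, then compare revisions. CPUs with no entry are not flagged.
 */
static bool is_bad_microcode(const struct cpu_ident *c)
{
	size_t i;

	for (i = 0; i < sizeof(bad_table) / sizeof(bad_table[0]); i++) {
		if (c->model == bad_table[i].model &&
		    c->stepping == bad_table[i].stepping)
			return c->microcode <= bad_table[i].microcode;
	}
	return false;
}
```

Because the comparison is "<=", any revision newer than the listed one is accepted without a table update, which is the property the whitelist-versus-blacklist argument in this thread turns on.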
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-24 9:02 ` David Woodhouse 2018-01-24 9:10 ` Greg Kroah-Hartman 2018-01-24 9:34 ` Peter Zijlstra @ 2018-01-24 10:49 ` Henrique de Moraes Holschuh 2018-01-24 12:30 ` David Woodhouse 2 siblings, 1 reply; 75+ messages in thread From: Henrique de Moraes Holschuh @ 2018-01-24 10:49 UTC (permalink / raw) To: David Woodhouse Cc: Peter Zijlstra, Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Radim Krčmář, Tim Chen, Tom Lendacky, kvm, x86 On Wed, 24 Jan 2018, David Woodhouse wrote: > I'm kind of tempted to turn it into a whitelist just by adding 1 to the > microcode revision in each table entry. Sure, that N+1 might be another > microcode build that also has issues but never saw the light of day... Watch out for the (AFAIK) still not properly documented where it should be (i.e. the microcode chapter of the Intel SDM) weirdness in Skylake+ microcode revision. Actually, this is related to SGX, so anything that has SGX. When it has SGX inside, Intel will release microcode only with even revision numbers, but the processor may report it as odd (and will do so by subtracting 1, so microcode 0xb0 is the same as microcode 0xaf) when the update is loaded by the processor itself from FIT (as opposed as being loaded by WRMSR from BIOS/UEFI/OS). So, you could see N-1 from within Linux if we did not update the microcode, and fail to trigger a whitelist (or mistrigger a blacklist). -- Henrique Holschuh ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-24 10:49 ` Henrique de Moraes Holschuh @ 2018-01-24 12:30 ` David Woodhouse 0 siblings, 0 replies; 75+ messages in thread From: David Woodhouse @ 2018-01-24 12:30 UTC (permalink / raw) To: Henrique de Moraes Holschuh Cc: Peter Zijlstra, Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Radim Krčmář, Tim Chen, Tom Lendacky, kvm, x86 [-- Attachment #1: Type: text/plain, Size: 1938 bytes --] On Wed, 2018-01-24 at 08:49 -0200, Henrique de Moraes Holschuh wrote: > On Wed, 24 Jan 2018, David Woodhouse wrote: > > > > I'm kind of tempted to turn it into a whitelist just by adding 1 to the > > microcode revision in each table entry. Sure, that N+1 might be another > > microcode build that also has issues but never saw the light of day... > Watch out for the (AFAIK) still not properly documented where it should > be (i.e. the microcode chapter of the Intel SDM) weirdness in Skylake+ > microcode revision. Actually, this is related to SGX, so anything that > has SGX. > > When it has SGX inside, Intel will release microcode only with even > revision numbers, but the processor may report it as odd (and will do so > by subtracting 1, so microcode 0xb0 is the same as microcode 0xaf) when > the update is loaded by the processor itself from FIT (as opposed as > being loaded by WRMSR from BIOS/UEFI/OS). > > So, you could see N-1 from within Linux if we did not update the > microcode, and fail to trigger a whitelist (or mistrigger a blacklist). That's OK. 
If they ship a fixed 0x0200003E firmware for SKX, for example, which appears as 0x0200003D when it's loaded from FIT, that's still >= 0x0200003C *and* !(<0x0200003D) if we were to do that.

In fact, the code for the "whitelist X+1" vs. "blacklist X" approach is *entirely* equivalent; it's purely a cosmetic change. Because !(≤ X) ≡ ≥ (X+1)

The *real* change here is that for ∀ SKU, we are being asked to blacklist all microcode revisions <= 0xFFFFFFFF¹ for now, and change that only once new microcode is actually released. Every time, and then get people to rebuild their kernels because they can *use* the features from the new microcode.

¹(OK, *there's* a functional difference between whitelist and blacklist approach. But we'll never actually see 0xffffffff so that's not important right now :)

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply [flat|nested] 75+ messages in thread
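
[Editor's sketch, not part of the original message] The equivalence argued above rests on an integer identity: for unsigned revisions, "not (rev <= X)" is the same test as "rev >= X + 1". The two helpers below are hypothetical names written for illustration, not kernel functions. They also show one way to read the footnote: with 32-bit arithmetic, X + 1 wraps to 0 when X is 0xFFFFFFFF, so the two forms disagree only at that value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Blacklist form: bad when the running revision is <= the listed one. */
static bool blacklisted(uint32_t rev, uint32_t last_bad)
{
	return rev <= last_bad;
}

/* Whitelist form: good when the revision reaches last_bad + 1. */
static bool whitelisted(uint32_t rev, uint32_t last_bad)
{
	/* Unsigned arithmetic: wraps to 0 when last_bad == 0xFFFFFFFF. */
	uint32_t first_good = last_bad + 1;

	return rev >= first_good;
}
```

For every last_bad below 0xFFFFFFFF, whitelisted(rev, last_bad) equals !blacklisted(rev, last_bad). At last_bad == 0xFFFFFFFF the blacklist rejects every revision while this naive whitelist accepts every revision, so a "block everything for now" entry cannot be expressed by simply adding 1.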
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-24 8:47 ` Peter Zijlstra
2018-01-24 9:02 ` David Woodhouse
@ 2018-01-24 12:14 ` David Woodhouse
2018-01-24 12:29 ` Peter Zijlstra
1 sibling, 1 reply; 75+ messages in thread
From: David Woodhouse @ 2018-01-24 12:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen,
Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86

[-- Attachment #1: Type: text/plain, Size: 1217 bytes --]

On Wed, 2018-01-24 at 09:47 +0100, Peter Zijlstra wrote:
>
> Typically tglx likes to use x86_match_cpu() for these things; see also
> commit: bd9240a18edfb ("x86/apic: Add TSC_DEADLINE quirk due to
> errata").

Ewww.

static u32 hsx_deadline_rev(void)
{
	switch (boot_cpu_data.x86_mask) {
	case 0x02: return 0x3a; /* EP */
	case 0x04: return 0x0f; /* EX */
	}

	return ~0U;
}
...
static const struct x86_cpu_id deadline_match[] = {
	DEADLINE_MODEL_MATCH_FUNC( INTEL_FAM6_HASWELL_X,	hsx_deadline_rev),
	DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_BROADWELL_X,	0x0b000020),
	DEADLINE_MODEL_MATCH_FUNC( INTEL_FAM6_BROADWELL_XEON_D,	bdx_deadline_rev),
	DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_X,	0x02000014),
...

	/*
	 * Function pointers will have the MSB set due to address layout,
	 * immediate revisions will not.
	 */
	if ((long)m->driver_data < 0)
		rev = ((u32 (*)(void))(m->driver_data))();
	else
		rev = (u32)m->driver_data;

EWWWW!

Shan't.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-24 12:14 ` David Woodhouse
@ 2018-01-24 12:29 ` Peter Zijlstra
2018-01-24 12:58 ` David Woodhouse
0 siblings, 1 reply; 75+ messages in thread
From: Peter Zijlstra @ 2018-01-24 12:29 UTC (permalink / raw)
To: David Woodhouse
Cc: Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen,
Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Linus Torvalds, Masami Hiramatsu, Paolo Bonzini,
Radim Krčmář,
Tim Chen, Tom Lendacky, kvm, x86

On Wed, Jan 24, 2018 at 12:14:51PM +0000, David Woodhouse wrote:
> On Wed, 2018-01-24 at 09:47 +0100, Peter Zijlstra wrote:
> >
> > Typically tglx likes to use x86_match_cpu() for these things; see also
> > commit: bd9240a18edfb ("x86/apic: Add TSC_DEADLINE quirk due to
> > errata").
>
> Ewww.
>
> static u32 hsx_deadline_rev(void)
> {
> 	switch (boot_cpu_data.x86_mask) {
> 	case 0x02: return 0x3a; /* EP */
> 	case 0x04: return 0x0f; /* EX */
> 	}
>
> 	return ~0U;
> }
> ...
> static const struct x86_cpu_id deadline_match[] = {
> 	DEADLINE_MODEL_MATCH_FUNC( INTEL_FAM6_HASWELL_X,	hsx_deadline_rev),
> 	DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_BROADWELL_X,	0x0b000020),
> 	DEADLINE_MODEL_MATCH_FUNC( INTEL_FAM6_BROADWELL_XEON_D,	bdx_deadline_rev),
> 	DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_X,	0x02000014),
> ...
>
> 	/*
> 	 * Function pointers will have the MSB set due to address layout,
> 	 * immediate revisions will not.
> 	 */
> 	if ((long)m->driver_data < 0)
> 		rev = ((u32 (*)(void))(m->driver_data))();
> 	else
> 		rev = (u32)m->driver_data;
>
> EWWWW!
>

Yes :/

We could look at extending x86_cpu_id and x86_match_cpu with a stepping
option I suppose, but that might be lots of churn.

Thomas?

^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-24 12:29 ` Peter Zijlstra @ 2018-01-24 12:58 ` David Woodhouse 0 siblings, 0 replies; 75+ messages in thread From: David Woodhouse @ 2018-01-24 12:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Thomas Gleixner, KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Radim Krčmář, Tim Chen, Tom Lendacky, kvm, x86 [-- Attachment #1: Type: text/plain, Size: 697 bytes --] On Wed, 2018-01-24 at 13:29 +0100, Peter Zijlstra wrote: > > Yes :/ > > We could look at extending x86_cpu_id and x86_match_cpu with a stepping > option I suppose, but that might be lots of churn. That goes all the way to mod_deviceinfo, and would be horrid. We could add an x86_match_cpu_stepping() function, I suppose? But I'm mostly trying to avoid depending on other stuff like that, for patches which are going to need to be backported to all the stable kernels. I'd much rather do it this way and then if we see another use case for it (that commit you mentioned could be nicer, I suppose), consolidate into a single stepping-capable lookup function in a later "cleanup". [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5213 bytes --] ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-20 19:22 ` [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure KarimAllah Ahmed 2018-01-21 14:31 ` Thomas Gleixner @ 2018-01-29 20:14 ` Eduardo Habkost 2018-01-29 20:17 ` David Woodhouse 2018-01-31 10:03 ` [RFC 05/10] " Christophe de Dinechin 2 siblings, 1 reply; 75+ messages in thread From: Eduardo Habkost @ 2018-01-29 20:14 UTC (permalink / raw) To: KarimAllah Ahmed Cc: linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86 On Sat, Jan 20, 2018 at 08:22:56PM +0100, KarimAllah Ahmed wrote: > From: David Woodhouse <dwmw@amazon.co.uk> > > Not functional yet; just add the handling for it in the Spectre v2 > mitigation selection, and the X86_FEATURE_IBRS flag which will control > the code to be added in later patches. > > Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS > mode will want that too. > > For now we are auto-selecting IBRS on Skylake. We will probably end up > changing that but for now let's default to the safest option. > > XX: Do we want a microcode blacklist? > > [karahmed: simplify the switch block and get rid of all the magic] > > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> > Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> [...] > + case SPECTRE_V2_CMD_FORCE: > + /* > + * If we have IBRS support, and either Skylake or !RETPOLINE, > + * then that's what we do. > + */ > + if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) && > + (is_skylake_era() || !retp_compiler())) { Sorry for being confused here, as probably the answer is buried on a LKML thread somewhere. 
The comment explains what the code does, but not why. Why exactly IBRS is preferred on Skylake? I'm asking this because I would like to understand the risks involved when running under a hypervisor exposing CPUID data that don't match the host CPU. e.g.: what happens if a VM is migrated from a Broadwell host to a Skylake host? > + mode = SPECTRE_V2_IBRS; > + setup_force_cpu_cap(X86_FEATURE_IBRS); > + break; > + } > + /* Fall through */ > case SPECTRE_V2_CMD_RETPOLINE: [...] -- Eduardo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 20:14 ` [RFC,05/10] " Eduardo Habkost @ 2018-01-29 20:17 ` David Woodhouse 2018-01-29 20:42 ` Eduardo Habkost 0 siblings, 1 reply; 75+ messages in thread From: David Woodhouse @ 2018-01-29 20:17 UTC (permalink / raw) To: Eduardo Habkost, KarimAllah Ahmed Cc: linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86 [-- Attachment #1: Type: text/plain, Size: 589 bytes --] On Mon, 2018-01-29 at 18:14 -0200, Eduardo Habkost wrote: > > Sorry for being confused here, as probably the answer is buried > on a LKML thread somewhere. The comment explains what the code > does, but not why. Why exactly IBRS is preferred on Skylake? > > I'm asking this because I would like to understand the risks > involved when running under a hypervisor exposing CPUID data that > don't match the host CPU. e.g.: what happens if a VM is migrated > from a Broadwell host to a Skylake host? https://lkml.org/lkml/2018/1/22/598 should cover most of that, I think. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5213 bytes --] ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 20:17 ` David Woodhouse @ 2018-01-29 20:42 ` Eduardo Habkost 2018-01-29 20:44 ` Arjan van de Ven 0 siblings, 1 reply; 75+ messages in thread From: Eduardo Habkost @ 2018-01-29 20:42 UTC (permalink / raw) To: David Woodhouse Cc: KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86, Dr. David Alan Gilbert On Mon, Jan 29, 2018 at 08:17:02PM +0000, David Woodhouse wrote: > On Mon, 2018-01-29 at 18:14 -0200, Eduardo Habkost wrote: > > > > Sorry for being confused here, as probably the answer is buried > > on a LKML thread somewhere. The comment explains what the code > > does, but not why. Why exactly IBRS is preferred on Skylake? > > > > I'm asking this because I would like to understand the risks > > involved when running under a hypervisor exposing CPUID data that > > don't match the host CPU. e.g.: what happens if a VM is migrated > > from a Broadwell host to a Skylake host? > > https://lkml.org/lkml/2018/1/22/598 should cover most of that, I think. Thanks, it does answer some of my questions. So, it sounds like live-migration of a VM from a non-Skylake to a Skylake host will make the guest unsafe, unless the guest was explicitly configured to use IBRS. In a perfect world, Linux would never look at CPU family/model/stepping/microcode if running under a hypervisor, to take any decision. If Linux knows it's running under a hypervisor, it would be safer to assume retpolines aren't enough, unless the hypervisor is telling us otherwise. The question is how the hypervisor could tell that to the guest. 
If Intel doesn't give us a CPUID bit that can be used to tell that retpolines are enough, maybe we should use a hypervisor CPUID bit for that? -- Eduardo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 20:42 ` Eduardo Habkost @ 2018-01-29 20:44 ` Arjan van de Ven 2018-01-29 21:02 ` David Woodhouse 0 siblings, 1 reply; 75+ messages in thread From: Arjan van de Ven @ 2018-01-29 20:44 UTC (permalink / raw) To: Eduardo Habkost, David Woodhouse Cc: KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86, Dr. David Alan Gilbert On 1/29/2018 12:42 PM, Eduardo Habkost wrote: > The question is how the hypervisor could tell that to the guest. > If Intel doesn't give us a CPUID bit that can be used to tell > that retpolines are enough, maybe we should use a hypervisor > CPUID bit for that? the objective is to have retpoline be safe everywhere and never use IBRS (Linus was also pretty clear about that) so I'm confused by your question ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 20:44 ` Arjan van de Ven @ 2018-01-29 21:02 ` David Woodhouse 2018-01-29 21:37 ` Jim Mattson ` (3 more replies) 0 siblings, 4 replies; 75+ messages in thread From: David Woodhouse @ 2018-01-29 21:02 UTC (permalink / raw) To: Arjan van de Ven, Eduardo Habkost Cc: KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86, Dr. David Alan Gilbert [-- Attachment #1: Type: text/plain, Size: 1339 bytes --] On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote: > On 1/29/2018 12:42 PM, Eduardo Habkost wrote: > > > > The question is how the hypervisor could tell that to the guest. > > If Intel doesn't give us a CPUID bit that can be used to tell > > that retpolines are enough, maybe we should use a hypervisor > > CPUID bit for that? > > the objective is to have retpoline be safe everywhere and never use IBRS > (Linus was also pretty clear about that) so I'm confused by your question The question is about all the additional RSB-frobbing and call depth counting and other bits that don't really even exist for Skylake yet in a coherent form. If a guest doesn't have those, because it's running some future kernel where they *are* implemented but not enabled because at *boot* time it discovered it wasn't on Skylake, the question is what happens if that guest is subsequently migrated to a Skylake-class machine. To which the answer is obviously "oops, sucks to be you". So yes, *maybe* we want a way to advertise "you might be migrated to Skylake" if you're booted on a pre-SKL box in a migration pool where such is possible. 
That question is a reasonable one, and the answer possibly the same, regardless of whether the plan for Skylake is to use IBRS, or all the hypothetical other extra stuff. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5213 bytes --] ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 21:02 ` David Woodhouse @ 2018-01-29 21:37 ` Jim Mattson 2018-01-29 21:50 ` Eduardo Habkost 2018-01-29 21:37 ` Andi Kleen ` (2 subsequent siblings) 3 siblings, 1 reply; 75+ messages in thread From: Jim Mattson @ 2018-01-29 21:37 UTC (permalink / raw) To: David Woodhouse Cc: Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, LKML, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm list, the arch/x86 maintainers, Dr. David Alan Gilbert For GCE, "you might be migrated to Skylake" is pretty much a certainty. Even if you're in a zone that doesn't currently have Skylake machines, chances are pretty good that it will have Skylake machines some day in the not-too-distant future. In general, making these kinds of decisions based on F/M/S is probably unwise when running in a VM. On Mon, Jan 29, 2018 at 1:02 PM, David Woodhouse <dwmw2@infradead.org> wrote: > > > On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote: >> On 1/29/2018 12:42 PM, Eduardo Habkost wrote: >> > >> > The question is how the hypervisor could tell that to the guest. >> > If Intel doesn't give us a CPUID bit that can be used to tell >> > that retpolines are enough, maybe we should use a hypervisor >> > CPUID bit for that? >> >> the objective is to have retpoline be safe everywhere and never use IBRS >> (Linus was also pretty clear about that) so I'm confused by your question > > The question is about all the additional RSB-frobbing and call depth > counting and other bits that don't really even exist for Skylake yet in > a coherent form. 
> > If a guest doesn't have those, because it's running some future kernel > where they *are* implemented but not enabled because at *boot* time it > discovered it wasn't on Skylake, the question is what happens if that > guest is subsequently migrated to a Skylake-class machine. > > To which the answer is obviously "oops, sucks to be you". So yes, > *maybe* we want a way to advertise "you might be migrated to Skylake" > if you're booted on a pre-SKL box in a migration pool where such is > possible. > > That question is a reasonable one, and the answer possibly the same, > regardless of whether the plan for Skylake is to use IBRS, or all the > hypothetical other extra stuff. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 21:37 ` Jim Mattson @ 2018-01-29 21:50 ` Eduardo Habkost 2018-01-29 22:12 ` Jim Mattson 2018-01-29 22:25 ` Andi Kleen 0 siblings, 2 replies; 75+ messages in thread From: Eduardo Habkost @ 2018-01-29 21:50 UTC (permalink / raw) To: Jim Mattson Cc: David Woodhouse, Arjan van de Ven, KarimAllah Ahmed, LKML, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm list, the arch/x86 maintainers, Dr. David Alan Gilbert On Mon, Jan 29, 2018 at 01:37:05PM -0800, Jim Mattson wrote: > For GCE, "you might be migrated to Skylake" is pretty much a > certainty. Even if you're in a zone that doesn't currently have > Skylake machines, chances are pretty good that it will have Skylake > machines some day in the not-too-distant future. This kind of scenario is why I suggest a "we promise you're not going to be migrated to Skylake" bit instead a "you may be migrated to Skylake" bit. The hypervisor could prevent migration to Skylake hosts if management software chose to enable this bit, and guests would choose the safest option (i.e. assume the worst) if running on older hypervisors that don't set the bit. > > In general, making these kinds of decisions based on F/M/S is probably > unwise when running in a VM. Certainly. That's why I suggest not trusting f/m/s unless the hypervisor is explicitly saying it's accurate. -- Eduardo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 21:50 ` Eduardo Habkost @ 2018-01-29 22:12 ` Jim Mattson 2018-01-30 1:22 ` Eduardo Habkost 2018-01-29 22:25 ` Andi Kleen 1 sibling, 1 reply; 75+ messages in thread From: Jim Mattson @ 2018-01-29 22:12 UTC (permalink / raw) To: Eduardo Habkost Cc: David Woodhouse, Arjan van de Ven, KarimAllah Ahmed, LKML, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm list, the arch/x86 maintainers, Dr. David Alan Gilbert On Mon, Jan 29, 2018 at 1:50 PM, Eduardo Habkost <ehabkost@redhat.com> wrote: > On Mon, Jan 29, 2018 at 01:37:05PM -0800, Jim Mattson wrote: >> For GCE, "you might be migrated to Skylake" is pretty much a >> certainty. Even if you're in a zone that doesn't currently have >> Skylake machines, chances are pretty good that it will have Skylake >> machines some day in the not-too-distant future. > > This kind of scenario is why I suggest a "we promise you're not > going to be migrated to Skylake" bit instead a "you may be > migrated to Skylake" bit. The hypervisor could prevent migration > to Skylake hosts if management software chose to enable this bit, > and guests would choose the safest option (i.e. assume the worst) > if running on older hypervisors that don't set the bit. Giving customers this option promises the logistical nightmare of provisioning sufficient pre-Skylake-era machines in all pools until sufficient post-Skylake-era machines can be deployed to replace them. >> In general, making these kinds of decisions based on F/M/S is probably >> unwise when running in a VM. > > Certainly. 
That's why I suggest not trusting f/m/s unless the > hypervisor is explicitly saying it's accurate. > > -- > Eduardo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 22:12 ` Jim Mattson @ 2018-01-30 1:22 ` Eduardo Habkost 0 siblings, 0 replies; 75+ messages in thread From: Eduardo Habkost @ 2018-01-30 1:22 UTC (permalink / raw) To: Jim Mattson Cc: David Woodhouse, Arjan van de Ven, KarimAllah Ahmed, LKML, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm list, the arch/x86 maintainers, Dr. David Alan Gilbert On Mon, Jan 29, 2018 at 02:12:02PM -0800, Jim Mattson wrote: > On Mon, Jan 29, 2018 at 1:50 PM, Eduardo Habkost <ehabkost@redhat.com> wrote: > > On Mon, Jan 29, 2018 at 01:37:05PM -0800, Jim Mattson wrote: > >> For GCE, "you might be migrated to Skylake" is pretty much a > >> certainty. Even if you're in a zone that doesn't currently have > >> Skylake machines, chances are pretty good that it will have Skylake > >> machines some day in the not-too-distant future. > > > > This kind of scenario is why I suggest a "we promise you're not > > going to be migrated to Skylake" bit instead a "you may be > > migrated to Skylake" bit. The hypervisor could prevent migration > > to Skylake hosts if management software chose to enable this bit, > > and guests would choose the safest option (i.e. assume the worst) > > if running on older hypervisors that don't set the bit. > > Giving customers this option promises the logistical nightmare of > provisioning sufficient pre-Skylake-era machines in all pools until > sufficient post-Skylake-era machines can be deployed to replace them. If this is not practical, the hypervisor can simply choose to never make any of those promises to the guest OS. Never implementing any of those bits is also an option. 
But then guest OSes must be aware that the hypervisor can _not_ promise that f/m/s matches the host CPU, and can _not_ promise that the VM will never be migrated to Skylake CPUs. -- Eduardo ^ permalink raw reply [flat|nested] 75+ messages in thread
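
[Editor's sketch, not part of the original message] The guest-side policy proposed in this subthread ("trust f/m/s only when the hypervisor says it is accurate, otherwise assume the worst") can be written down as a small decision function. Everything here is an assumption made for illustration: the two hint bits are hypothetical names for the bits proposed in the thread and are not defined CPUID bits, and the order in which the hints are consulted is one possible choice rather than anything the thread settles.

```c
#include <stdbool.h>

/* Hypothetical hypervisor hints; the thread only proposes such bits. */
struct hv_hints {
	bool running_under_hv;	/* guest detected a hypervisor */
	bool fms_matches_host;	/* "f/m/s/microcode really match the host" */
	bool never_skylake;	/* "the host CPU is never going to be Skylake" */
};

/*
 * Returns true when the guest should enable the Skylake-specific
 * workaround. fms_says_skylake is what CPUID family/model/stepping
 * reports, which may not describe the real host CPU inside a VM.
 */
static bool want_skylake_workaround(const struct hv_hints *h,
				    bool fms_says_skylake)
{
	if (!h->running_under_hv)
		return fms_says_skylake;	/* bare metal: trust f/m/s */
	if (h->fms_matches_host)
		return fms_says_skylake;	/* hypervisor vouches for f/m/s */
	if (h->never_skylake)
		return false;			/* explicit promise from the host */
	return true;				/* no promise: assume the worst */
}
```

A hypervisor that sets neither bit, which is the option described in the message above, lands in the last branch: the guest enables the workaround regardless of the model number it sees.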
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 21:50 ` Eduardo Habkost 2018-01-29 22:12 ` Jim Mattson @ 2018-01-29 22:25 ` Andi Kleen 2018-01-30 1:37 ` Eduardo Habkost 1 sibling, 1 reply; 75+ messages in thread From: Andi Kleen @ 2018-01-29 22:25 UTC (permalink / raw) To: Eduardo Habkost Cc: Jim Mattson, David Woodhouse, Arjan van de Ven, KarimAllah Ahmed, LKML, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm list, the arch/x86 maintainers, Dr. David Alan Gilbert I agree with your point that the common hypervisor practice to fake old model numbers will break some of the workarounds. Hypervisors may need to revisit their practice. > > In general, making these kinds of decisions based on F/M/S is probably > > unwise when running in a VM. > > Certainly. That's why I suggest not trusting f/m/s unless the > hypervisor is explicitly saying it's accurate. This would be only useful if there's an useful result of this non trust. But there isn't. Except for panic there's nothing you could do. And I don't think panic would be reasonable. The "Skylake bit " or "not skylake bit" doesn't make any sense to me. If a hypervisor wants to enable Skylake workarounds they need to provide the Skylake model number. If they don't think they need them because the VM can never be migrated to Skylake, then they don't need to set that model number. So there isn't any need for inventing any new bits, it's all already possible. -Andi ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-29 22:25 ` Andi Kleen
@ 2018-01-30 1:37 ` Eduardo Habkost
0 siblings, 0 replies; 75+ messages in thread
From: Eduardo Habkost @ 2018-01-30 1:37 UTC (permalink / raw)
To: Andi Kleen
Cc: Jim Mattson, David Woodhouse, Arjan van de Ven, KarimAllah Ahmed, LKML,
Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick,
Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman,
H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu,
Paolo Bonzini, Peter Zijlstra, Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, kvm list,
the arch/x86 maintainers, Dr. David Alan Gilbert
On Mon, Jan 29, 2018 at 02:25:12PM -0800, Andi Kleen wrote:
>
> I agree with your point that the common hypervisor practice to fake
> old model numbers will break some of the workarounds. Hypervisors
> may need to revisit their practice.
>
> > > In general, making these kinds of decisions based on F/M/S is probably
> > > unwise when running in a VM.
> >
> > Certainly. That's why I suggest not trusting f/m/s unless the
> > hypervisor is explicitly saying it's accurate.
>
> This would be only useful if there's an useful result of this
> non trust.
>
> But there isn't. Except for panic there's nothing you could do.
> And I don't think panic would be reasonable.
Why isn't it a useful result to enable the Skylake workaround when the guest is unsure about the host CPU?
>
> The "Skylake bit " or "not skylake bit" doesn't make any sense
> to me. If a hypervisor wants to enable Skylake workarounds
> they need to provide the Skylake model number. If they don't
> think they need them because the VM can never be migrated
> to Skylake, then they don't need to set that model
> number.
>
> So there isn't any need for inventing any new bits, it's
> all already possible.
It's already possible, until we find another bug in another CPU model that also needs to be worked around. We can't represent "please work around bugs in both Skylake and Westmere" in f/m/s. -- Eduardo ^ permalink raw reply [flat|nested] 75+ messages in thread
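To make the limitation above concrete, here is a minimal sketch in C. It assumes a hypothetical per-workaround bitmask; the `HYP_WA_*` names and the "Westmere" erratum are invented for illustration (the email itself uses Westmere only as an example), and only the family 6 model numbers are real. A single advertised model can select at most one model's workarounds, whereas a bitmask can carry the union needed for a mixed migration pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical workaround flags; illustrative names, not kernel identifiers. */
#define HYP_WA_SKYLAKE_RSB   (1u << 0) /* the Skylake RSB issue from this thread */
#define HYP_WA_WESTMERE_DEMO (1u << 1) /* made-up second erratum, as in the email */

/* Intel family 6 model numbers. */
#define MODEL_SKYLAKE_MOBILE  0x4E
#define MODEL_SKYLAKE_DESKTOP 0x5E
#define MODEL_SKYLAKE_X       0x55
#define MODEL_WESTMERE        0x25

/* Workarounds a guest would pick if it only looked at one advertised model. */
uint32_t workarounds_from_model(unsigned family, unsigned model)
{
    if (family != 6)
        return 0;
    switch (model) {
    case MODEL_SKYLAKE_MOBILE:
    case MODEL_SKYLAKE_DESKTOP:
    case MODEL_SKYLAKE_X:
        return HYP_WA_SKYLAKE_RSB;
    case MODEL_WESTMERE:
        return HYP_WA_WESTMERE_DEMO;
    default:
        return 0;
    }
}

/* Workarounds needed for a whole migration pool: the union over all hosts.
 * No single family/model/stepping value can encode this result. */
uint32_t workarounds_for_pool(const unsigned *models, int n)
{
    uint32_t mask = 0;

    assert(n >= 0 && (models != NULL || n == 0));
    for (int i = 0; i < n; i++)
        mask |= workarounds_from_model(6, models[i]);
    return mask;
}
```

For a pool containing both a Westmere and a Skylake host, `workarounds_for_pool` returns both flags, while any one model number passed to `workarounds_from_model` yields only one of them.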
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 21:02 ` David Woodhouse 2018-01-29 21:37 ` Jim Mattson @ 2018-01-29 21:37 ` Andi Kleen 2018-01-29 21:44 ` Eduardo Habkost 2018-01-30 0:23 ` Linus Torvalds 3 siblings, 0 replies; 75+ messages in thread From: Andi Kleen @ 2018-01-29 21:37 UTC (permalink / raw) To: David Woodhouse Cc: Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, linux-kernel, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86, Dr. David Alan Gilbert > The question is about all the additional RSB-frobbing and call depth > counting and other bits that don't really even exist for Skylake yet in > a coherent form. We have had several patch kits posted that all are in a "coherent form" That was the original one http://lkml.iu.edu/hypermail/linux/kernel/1801.1/05556.html and that's the newer one with only interrupt stuffing https://marc.info/?l=linux-kernel&m=151674718914504 We don't have generic deep chain handling yet, but everything else is there. -Andi ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 21:02 ` David Woodhouse 2018-01-29 21:37 ` Jim Mattson 2018-01-29 21:37 ` Andi Kleen @ 2018-01-29 21:44 ` Eduardo Habkost 2018-01-29 22:10 ` Konrad Rzeszutek Wilk 2018-01-30 0:23 ` Linus Torvalds 3 siblings, 1 reply; 75+ messages in thread From: Eduardo Habkost @ 2018-01-29 21:44 UTC (permalink / raw) To: David Woodhouse Cc: Arjan van de Ven, KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86, Dr. David Alan Gilbert On Mon, Jan 29, 2018 at 09:02:39PM +0000, David Woodhouse wrote: > > > On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote: > > On 1/29/2018 12:42 PM, Eduardo Habkost wrote: > > > > > > The question is how the hypervisor could tell that to the guest. > > > If Intel doesn't give us a CPUID bit that can be used to tell > > > that retpolines are enough, maybe we should use a hypervisor > > > CPUID bit for that? > > > > the objective is to have retpoline be safe everywhere and never use IBRS > > (Linus was also pretty clear about that) so I'm confused by your question > > The question is about all the additional RSB-frobbing and call depth > counting and other bits that don't really even exist for Skylake yet in > a coherent form. > > If a guest doesn't have those, because it's running some future kernel > where they *are* implemented but not enabled because at *boot* time it > discovered it wasn't on Skylake, the question is what happens if that > guest is subsequently migrated to a Skylake-class machine. > > To which the answer is obviously "oops, sucks to be you". 
So yes, > *maybe* we want a way to advertise "you might be migrated to Skylake" > if you're booted on a pre-SKL box in a migration pool where such is > possible. > > That question is a reasonable one, and the answer possibly the same, > regardless of whether the plan for Skylake is to use IBRS, or all the > hypothetical other extra stuff. Maybe a generic "family/model/stepping/microcode really matches the CPU you are running on" bit would be useful. The bit could be enabled only on host-passthrough (aka "-cpu host") mode. If we really want to be able to migrate to host with different CPU models (except Skylake), we could add a more specific "we promise the host CPU is never going to be Skylake" bit. Now, if the hypervisor is not providing any of those bits, I would advise against trusting family/model/stepping/microcode under a hypervisor. Using a pre-defined CPU model (that doesn't necessarily match the host) is very common when using KVM VM management stacks. -- Eduardo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 21:44 ` Eduardo Habkost @ 2018-01-29 22:10 ` Konrad Rzeszutek Wilk 2018-01-30 1:12 ` Eduardo Habkost 0 siblings, 1 reply; 75+ messages in thread From: Konrad Rzeszutek Wilk @ 2018-01-29 22:10 UTC (permalink / raw) To: Eduardo Habkost Cc: David Woodhouse, Arjan van de Ven, KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86, Dr. David Alan Gilbert On Mon, Jan 29, 2018 at 07:44:21PM -0200, Eduardo Habkost wrote: > On Mon, Jan 29, 2018 at 09:02:39PM +0000, David Woodhouse wrote: > > > > > > On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote: > > > On 1/29/2018 12:42 PM, Eduardo Habkost wrote: > > > > > > > > The question is how the hypervisor could tell that to the guest. > > > > If Intel doesn't give us a CPUID bit that can be used to tell > > > > that retpolines are enough, maybe we should use a hypervisor > > > > CPUID bit for that? > > > > > > the objective is to have retpoline be safe everywhere and never use IBRS > > > (Linus was also pretty clear about that) so I'm confused by your question > > > > The question is about all the additional RSB-frobbing and call depth > > counting and other bits that don't really even exist for Skylake yet in > > a coherent form. > > > > If a guest doesn't have those, because it's running some future kernel > > where they *are* implemented but not enabled because at *boot* time it > > discovered it wasn't on Skylake, the question is what happens if that > > guest is subsequently migrated to a Skylake-class machine. > > > > To which the answer is obviously "oops, sucks to be you". 
So yes, > > *maybe* we want a way to advertise "you might be migrated to Skylake" > > if you're booted on a pre-SKL box in a migration pool where such is > > possible. > > > > That question is a reasonable one, and the answer possibly the same, > > regardless of whether the plan for Skylake is to use IBRS, or all the > > hypothetical other extra stuff. > > Maybe a generic "family/model/stepping/microcode really matches > the CPU you are running on" bit would be useful. The bit could > be enabled only on host-passthrough (aka "-cpu host") mode. > > If we really want to be able to migrate to host with different > CPU models (except Skylake), we could add a more specific "we > promise the host CPU is never going to be Skylake" bit. > > Now, if the hypervisor is not providing any of those bits, I > would advise against trusting family/model/stepping/microcode > under a hypervisor. Using a pre-defined CPU model (that doesn't The migration code could be 'tickled' (when arrived at the destination) to recheck the CPUID and do the alternative logic to turn the proper bits on. And this tickling could be as simple as an ACPI DSDT/AML code specific to KVM PnP devices (say the CPUs?) to tell the guest to resample its environment? > necessarily match the host) is very common when using KVM VM > management stacks. > > -- > Eduardo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 22:10 ` Konrad Rzeszutek Wilk @ 2018-01-30 1:12 ` Eduardo Habkost 0 siblings, 0 replies; 75+ messages in thread From: Eduardo Habkost @ 2018-01-30 1:12 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: David Woodhouse, Arjan van de Ven, KarimAllah Ahmed, linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86, Dr. David Alan Gilbert On Mon, Jan 29, 2018 at 05:10:11PM -0500, Konrad Rzeszutek Wilk wrote: [...] > The migration code could be 'tickled' (when arrived at the destination) > to recheck the CPUID and do the alternative logic to turn the > proper bits on. > > And this tickling could be as simple as an ACPI DSDT/AML code > specific to KVM PnP devices (say the CPUs?) to tell the guest to > resample its environment? This would be nice to have for other CPU features, but if I understood a previous message from Andi on this thread correctly, it wouldn't be useful for the Spectre mitigations. -- Eduardo ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-29 21:02 ` David Woodhouse ` (2 preceding siblings ...) 2018-01-29 21:44 ` Eduardo Habkost @ 2018-01-30 0:23 ` Linus Torvalds 2018-01-30 1:03 ` Jim Mattson ` (6 more replies) 3 siblings, 7 replies; 75+ messages in thread From: Linus Torvalds @ 2018-01-30 0:23 UTC (permalink / raw) To: David Woodhouse Cc: Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert On Mon, Jan 29, 2018 at 1:02 PM, David Woodhouse <dwmw2@infradead.org> wrote: > > On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote: >> >> the objective is to have retpoline be safe everywhere and never use IBRS >> (Linus was also pretty clear about that) so I'm confused by your question Note on the unhappiness with some of the patches involved: what I do *not* want to see is the "on every kernel entry" kind of garbage. So my unhappiness with the intel microcode patches is two-fold: (a) the interface is nasty and wrong, and I absolutely detest how Intel did it. (b) the write to random MSR's on every kernel entry/exit is wrong but that doesn't mean that I will necessarily end up NAK'ing every single IBRS/IBPB patch. My concern with (a) is that unlike meltdown, the intel work-around isn't forward-looking, and doesn't have a "we fixed it" bit. Instead, it has a "we have a nasty workaround that may or may not be horribly expensive" bit, and isn't all that well-defined. My dislike of (b) comes from "we have retpoline and various wondrous RSB filling crud already, we're smarter than that". 
So it's not that I refuse any IBRS/IBPB use, I refuse the stupid and _mindless_ kind of use.
> The question is about all the additional RSB-frobbing and call depth
> counting and other bits that don't really even exist for Skylake yet in
> a coherent form.
>
> If a guest doesn't have those, because it's running some future kernel
> where they *are* implemented but not enabled because at *boot* time it
> discovered it wasn't on Skylake, the question is what happens if that
> guest is subsequently migrated to a Skylake-class machine.
So I actually have a _different_ question to the virtualization people. This includes the vmware people, but it also obviously includes the Amazon AWS kind of usage.
When you're a hypervisor (whether vmware or Amazon), why do you even end up caring about these things so much? You're protected from meltdown thanks to the virtual environment already having separate page tables. And the "big hammer" approach to spectre would seem to be to just make sure the BTB and RSB are flushed at vmexit time - and even then you might decide that you really want to just move it to vmenter time, and only do it if the VM has changed since last time (per CPU).
Why do you even _care_ about the guest, and how it acts wrt Skylake? What you should care about is not so much the guests (which do their own thing) but protect guests from each other, no?
So I'm a bit mystified by some of this discussion within the context of virtual machines. I think that is separate from any measures that the guest machine may then decide to partake in.
If you are ever going to migrate to Skylake, I think you should just always tell the guests that you're running on Skylake. That way the guests will always assume the worst case situation wrt Spectre.
Maybe that mystification comes from me missing something.
Linus
^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 0:23 ` Linus Torvalds @ 2018-01-30 1:03 ` Jim Mattson 2018-01-30 3:13 ` Andi Kleen 2018-01-30 1:32 ` Arjan van de Ven ` (5 subsequent siblings) 6 siblings, 1 reply; 75+ messages in thread From: Jim Mattson @ 2018-01-30 1:03 UTC (permalink / raw) To: Linus Torvalds Cc: David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert The guest OS is responsible for protecting itself from intra-guest attacks. The hypervisor can't do that. We want to give the guest OS the tools it needs to make reasonable decisions about the intra-guest protections it wants to enable, in an environment where the virtual processor and the physical processor may not actually have the same F/M/S (and in fact, where the physical processor may change at any time). Right now, we are dealing with one workaround, which is tied to Skylake-era model numbers. Yes, we could report a Skylake model number, and Linux guests would use IBRS instead of retpoline. But this approach doesn't scale. What happens when someone introduces a workaround tied to some other model numbers? 
On Mon, Jan 29, 2018 at 4:23 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, Jan 29, 2018 at 1:02 PM, David Woodhouse <dwmw2@infradead.org> wrote: >> >> On Mon, 2018-01-29 at 12:44 -0800, Arjan van de Ven wrote: >>> >>> the objective is to have retpoline be safe everywhere and never use IBRS >>> (Linus was also pretty clear about that) so I'm confused by your question > > Note on the unhappiness with some of the patches involved: what I do > *not* want to see is the "on every kernel entry" kind of garbage. > > So my unhappiness with the intel microcode patches is two-fold: > > (a) the interface is nasty and wrong, and I absolutely detest how Intel did it. > > (b) the write to random MSR's on every kernel entry/exit is wrong > > but that doesn't mean that I will necessarily end up NAK'ing every > single IBRS/IBPB patch. > > My concern with (a) is that unlike meltdown, the intel work-around > isn't forward-looking, and doesn't have a "we fixed it" bit. Instead, > it has a "we have a nasty workaround that may or may not be horribly > expensive" bit, and isn't all that well-defined. > > My dislike of (b) comes from "we have retpoline and various wondrous > RSB filling crud already, we're smarter than that". So it's not that I > refuse any IBRS/IBPB use, I refuse the stupid and _mindless_ kind of > use. > >> The question is about all the additional RSB-frobbing and call depth >> counting and other bits that don't really even exist for Skylake yet in >> a coherent form. >> >> If a guest doesn't have those, because it's running some future kernel >> where they *are* implemented but not enabled because at *boot* time it >> discovered it wasn't on Skylake, the question is what happens if that >> guest is subsequently migrated to a Skylake-class machine. > > So I actually have a _different_ question to the virtualization > people. This includes the vmware people, but it also obviously > incldues the Amazon AWS kind of usage. 
> > When you're a hypervisor (whether vmware or Amazon), why do you even > end up caring about these things so much? You're protected from > meltdown thanks to the virtual environment already having separate > page tables. And the "big hammer" approach to spectre would seem to > be to just make sure the BTB and RSB are flushed at vmexit time - and > even then you might decide that you really want to just move it to > vmenter time, and only do it if the VM has changed since last time > (per CPU). > > Why do you even _care_ about the guest, and how it acts wrt Skylake? > What you should care about is not so much the guests (which do their > own thing) but protect guests from each other, no? > > So I'm a bit mystified by some of this discussion within the context > of virtual machines. I think that is separate from any measures that > the guest machine may then decide to partake in. > > If you are ever going to migrate to Skylake, I think you should just > always tell the guests that you're running on Skylake. That way the > guests will always assume the worst case situation wrt Specte. > > Maybe that mystification comes from me missing something. > > Linus ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 1:03 ` Jim Mattson @ 2018-01-30 3:13 ` Andi Kleen 2018-01-31 15:03 ` Paolo Bonzini 0 siblings, 1 reply; 75+ messages in thread From: Andi Kleen @ 2018-01-30 3:13 UTC (permalink / raw) To: Jim Mattson Cc: Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert > Right now, we are dealing with one workaround, which is tied to > Skylake-era model numbers. Yes, we could report a Skylake model > number, and Linux guests would use IBRS instead of retpoline. But this Nobody is planning to use IBRS and Linus has rejected it. > approach doesn't scale. What happens when someone introduces a > workaround tied to some other model numbers? There are already many of those in the tree for other issues and features. So far you managed to survive without. Likely that will be true in the future too. -Andi ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 3:13 ` Andi Kleen @ 2018-01-31 15:03 ` Paolo Bonzini 2018-01-31 15:07 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 75+ messages in thread From: Paolo Bonzini @ 2018-01-31 15:03 UTC (permalink / raw) To: Andi Kleen, Jim Mattson Cc: Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert On 29/01/2018 22:13, Andi Kleen wrote: >> What happens when someone introduces a >> workaround tied to some other model numbers? > There are already many of those in the tree for other issues and features. > So far you managed to survive without. Likely that will be true > in the future too. "Guests have to live with processor fuckups" is actually a much better answer than "Hypervisors may need to revisit their practice", since at least it's clear where the blame lies. Because really it's just plain luck. It just happens that most errata are for functionality that is not available to a virtual machine (e.g. perfmon and monitor workarounds or buggy TSC deadline timer that hypervisors emulate anyway), that only needs a chicken bit to be set in the host, or the bugs are there only for old hardware that doesn't have virtualization (X86_BUG_F00F, X86_BUGS_SWAPGS_FENCE). CPUID flags are guaranteed to never change---never come, never go away. For anything that doesn't map nicely to a CPUID flag, you cannot really express it. Also if something is not architectural, you can pretty much assume that you cannot know it under virtualization. 
f/m/s is not architectural; family, model and stepping mean absolutely nothing when running in virtualization, because the host CPU model can change under your feet at any time. We force guest vendor == host vendor just because otherwise too much stuff breaks, but that's it. Paolo ^ permalink raw reply [flat|nested] 75+ messages in thread
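For readers unfamiliar with where f/m/s comes from, the following C sketch decodes family, model and stepping from the EAX value of CPUID leaf 1, following the Intel SDM rules for the extended fields. The struct and function names are invented for this example. Under virtualization the input value is whatever the hypervisor chooses to report, which is why the decoded result says nothing reliable about the physical host.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded CPU signature; the struct name is made up for this sketch. */
struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode CPUID.01H:EAX using the Intel SDM display family/model rules.
 * In a guest, 'eax' is the value the hypervisor decided to expose. */
struct cpu_sig decode_cpuid1_eax(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;
    struct cpu_sig sig;

    sig.stepping = eax & 0xF;

    /* The extended family is added only when the base family is 0xF. */
    sig.family = base_family;
    if (base_family == 0xF)
        sig.family += ext_family;

    /* The extended model supplies the high nibble for families 6 and 0xF. */
    sig.model = base_model;
    if (base_family == 0x6 || base_family == 0xF)
        sig.model |= ext_model << 4;

    assert(sig.model <= 0xFF);
    return sig;
}
```

As a usage example, the signature `0x000506E3` decodes to family 6, model `0x5E`, stepping 3, which is a Skylake desktop part. A guest configured with a pre-defined CPU model would decode that model's signature regardless of the host it is currently running on.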
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-31 15:03 ` Paolo Bonzini @ 2018-01-31 15:07 ` Dr. David Alan Gilbert 0 siblings, 0 replies; 75+ messages in thread From: Dr. David Alan Gilbert @ 2018-01-31 15:07 UTC (permalink / raw) To: Paolo Bonzini Cc: Andi Kleen, Jim Mattson, Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers * Paolo Bonzini (pbonzini@redhat.com) wrote: > On 29/01/2018 22:13, Andi Kleen wrote: > >> What happens when someone introduces a > >> workaround tied to some other model numbers? > > There are already many of those in the tree for other issues and features. > > So far you managed to survive without. Likely that will be true > > in the future too. > > "Guests have to live with processor fuckups" is actually a much better > answer than "Hypervisors may need to revisit their practice", since at > least it's clear where the blame lies. > > Because really it's just plain luck. It just happens that most errata > are for functionality that is not available to a virtual machine (e.g. > perfmon and monitor workarounds or buggy TSC deadline timer that > hypervisors emulate anyway), that only needs a chicken bit to be set in > the host, or the bugs are there only for old hardware that doesn't have > virtualization (X86_BUG_F00F, X86_BUGS_SWAPGS_FENCE). > > CPUID flags are guaranteed to never change---never come, never go away. > For anything that doesn't map nicely to a CPUID flag, you cannot really > express it. 
Also if something is not architectural, you can pretty much > assume that you cannot know it under virtualization. f/m/s is not > architectural; family, model and stepping mean absolutely nothing when > running in virtualization, because the host CPU model can change under > your feet at any time. We force guest vendor == host vendor just > because otherwise too much stuff breaks, but that's it. In some ways we've been luckiest on x86; my understanding is ARM have a similar set of architecture-specific errata and aren't really sure how to expose this to guests either. Dave > Paolo -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-30 0:23 ` Linus Torvalds
2018-01-30 1:03 ` Jim Mattson
@ 2018-01-30 1:32 ` Arjan van de Ven
2018-01-30 3:32 ` Linus Torvalds
2018-01-30 8:22 ` David Woodhouse
` (4 subsequent siblings)
6 siblings, 1 reply; 75+ messages in thread
From: Arjan van de Ven @ 2018-01-30 1:32 UTC (permalink / raw)
To: Linus Torvalds, David Woodhouse
Cc: Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List,
Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
the arch/x86 maintainers, Dr. David Alan Gilbert
On 1/29/2018 4:23 PM, Linus Torvalds wrote:
>
> Why do you even _care_ about the guest, and how it acts wrt Skylake?
> What you should care about is not so much the guests (which do their
> own thing) but protect guests from each other, no?
The simplest solution is that we set the internal feature bit in Linux to turn on the "stuff the RSB" workaround if we're on a SKL *or* running as a guest in a VM.
The stuffing is not free, but it's also not insane either... so if it's turned on in guests, the impact is still limited, while bare metal doesn't need it at all
^ permalink raw reply [flat|nested] 75+ messages in thread
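The rule proposed in this message can be sketched as a small C predicate. This is only an illustration under stated assumptions: the function names are invented, the Skylake model list is deliberately short and may be incomplete, and the guest check relies on the hypervisor-present bit (CPUID leaf 1, ECX bit 31), which hypervisors conventionally set for their guests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 31 is reserved for hypervisors to signal their presence. */
#define CPUID1_ECX_HYPERVISOR (1u << 31)

/* Illustrative, possibly incomplete list of Skylake family 6 models. */
bool is_skylake_model(unsigned family, unsigned model)
{
    assert(model <= 0xFF);
    if (family != 6)
        return false;
    return model == 0x4E || model == 0x5E || model == 0x55;
}

/* Sketch of the proposed policy: stuff the RSB on Skylake, or whenever we
 * run as a guest and therefore cannot trust the advertised model. */
bool want_rsb_stuffing(unsigned family, unsigned model, uint32_t cpuid1_ecx)
{
    bool in_guest = (cpuid1_ecx & CPUID1_ECX_HYPERVISOR) != 0;

    return in_guest || is_skylake_model(family, model);
}
```

With this policy a guest advertising a Haswell model (family 6, model `0x3C`) still enables the stuffing, while the same model on bare metal does not. The price is some overhead in guests that will never see a Skylake host.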
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 1:32 ` Arjan van de Ven @ 2018-01-30 3:32 ` Linus Torvalds 2018-01-30 12:04 ` Eduardo Habkost 2018-01-30 13:54 ` Arjan van de Ven 0 siblings, 2 replies; 75+ messages in thread From: Linus Torvalds @ 2018-01-30 3:32 UTC (permalink / raw) To: Arjan van de Ven Cc: David Woodhouse, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert On Mon, Jan 29, 2018 at 5:32 PM, Arjan van de Ven <arjan@linux.intel.com> wrote: > > the most simple solution is that we set the internal feature bit in Linux > to turn on the "stuff the RSB" workaround is we're on a SKL *or* as a guest > in a VM. That sounds reasonable. However, wouldn't it be even better to extend on the current cpuid model, and actually have some real architectural bits in there. Maybe it could be a bit in that IA32_ARCH_CAPABILITIES MSR. Say, add a bit #2 that says "ret falls back on BTB". Then that bit basically becomes the "Skylake bit". Hmm? Linus ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-30 3:32 ` Linus Torvalds
@ 2018-01-30 12:04 ` Eduardo Habkost
2018-01-30 13:54 ` Arjan van de Ven
1 sibling, 0 replies; 75+ messages in thread
From: Eduardo Habkost @ 2018-01-30 12:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: Arjan van de Ven, David Woodhouse, KarimAllah Ahmed,
Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli,
Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov,
Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin,
Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima,
Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
the arch/x86 maintainers, Dr. David Alan Gilbert
On Mon, Jan 29, 2018 at 07:32:06PM -0800, Linus Torvalds wrote:
> On Mon, Jan 29, 2018 at 5:32 PM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> >
> > the most simple solution is that we set the internal feature bit in Linux
> > to turn on the "stuff the RSB" workaround is we're on a SKL *or* as a guest
> > in a VM.
>
> That sounds reasonable.
>
> However, wouldn't it be even better to extend on the current cpuid
> model, and actually have some real architectural bits in there.
If Intel could do that, it would be great.
>
> Maybe it could be a bit in that IA32_ARCH_CAPABILITIES MSR. Say, add a
> bit #2 that says "ret falls back on BTB".
>
> Then that bit basically becomes the "Skylake bit". Hmm?
Yes. But note that the OS needs to be able to differentiate "old Skylake that doesn't support the new bit" from "newer Skylake that doesn't fall back on BTB". That's why I suggest a "non-Skylake bit" instead of a "Skylake bit".
--
Eduardo
^ permalink raw reply [flat|nested] 75+ messages in thread
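The polarity argument in this message can be illustrated with a short C sketch. Everything here is hypothetical: the bit position follows the "bit #2" floated in the quoted text, and the macro and function names are invented. The point is only what a guest concludes when the bit reads as zero, which is what existing hardware and microcode without the new bit would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit in IA32_ARCH_CAPABILITIES; two alternative meanings. */
#define HYP_ARCH_CAP_BIT2 (1ull << 2)

/* Convention A, the "Skylake bit": set means "ret may fall back on BTB".
 * A zero value reads as "not affected". */
bool needs_workaround_skylake_bit(uint64_t arch_cap)
{
    return (arch_cap & HYP_ARCH_CAP_BIT2) != 0;
}

/* Convention B, the "non-Skylake bit": set means "ret never uses the BTB".
 * A zero value reads as "assume affected". */
bool needs_workaround_non_skylake_bit(uint64_t arch_cap)
{
    return (arch_cap & HYP_ARCH_CAP_BIT2) == 0;
}

/* Usage example: an old Skylake that predates the bit reports zero. */
void polarity_example(void)
{
    uint64_t old_skylake_caps = 0;

    /* Convention A wrongly concludes that no workaround is needed. */
    assert(!needs_workaround_skylake_bit(old_skylake_caps));
    /* Convention B stays on the safe side. */
    assert(needs_workaround_non_skylake_bit(old_skylake_caps));
}
```

Under convention B the default for unknown hardware is the conservative one, at the cost of unaffected older CPUs paying for the workaround unless some other mechanism clears them.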
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 3:32 ` Linus Torvalds 2018-01-30 12:04 ` Eduardo Habkost @ 2018-01-30 13:54 ` Arjan van de Ven 1 sibling, 0 replies; 75+ messages in thread From: Arjan van de Ven @ 2018-01-30 13:54 UTC (permalink / raw) To: Linus Torvalds Cc: David Woodhouse, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert On 1/29/2018 7:32 PM, Linus Torvalds wrote: > On Mon, Jan 29, 2018 at 5:32 PM, Arjan van de Ven <arjan@linux.intel.com> wrote: >> >> the most simple solution is that we set the internal feature bit in Linux >> to turn on the "stuff the RSB" workaround is we're on a SKL *or* as a guest >> in a VM. > > That sounds reasonable. > > However, wouldn't it be even better to extend on the current cpuid > model, and actually have some real architectural bits in there. > > Maybe it could be a bit in that IA32_ARCH_CAPABILITIES MSR. Say, add a > bit #2 that says "ret falls back on BTB". > > Then that bit basically becomes the "Skylake bit". Hmm? we can try to do that, but existing systems don't have that, and then we get in another long thread here about weird lists of stuff ;-) ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 0:23 ` Linus Torvalds 2018-01-30 1:03 ` Jim Mattson 2018-01-30 1:32 ` Arjan van de Ven @ 2018-01-30 8:22 ` David Woodhouse 2018-01-30 11:35 ` David Woodhouse ` (3 subsequent siblings) 6 siblings, 0 replies; 75+ messages in thread From: David Woodhouse @ 2018-01-30 8:22 UTC (permalink / raw) To: Linus Torvalds Cc: Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert [-- Attachment #1: Type: text/plain, Size: 1634 bytes --] On Mon, 2018-01-29 at 16:23 -0800, Linus Torvalds wrote: > And the "big hammer" approach to spectre would seem to > be to just make sure the BTB and RSB are flushed at vmexit time - and > even then you might decide that you really want to just move it to > vmenter time, and only do it if the VM has changed since last time > (per CPU). The IBPB which flushes the BTB is *expensive*; we really want to reduce the amount we do that. For VM guests it's not so bad — we do it only on VMPTRLD which is sufficient to ensure it's done between running one vCPU and the next. And if vCPUs are pinned to pCPUs that means we basically never do it. Even for userspace we've mostly settled on a heuristic where we only do the IBPB flush for non-dumpable processes, precisely because it's so expensive. > Why do you even _care_ about the guest, and how it acts wrt Skylake? > What you should care about is not so much the guests (which do their > own thing) but protect guests from each other, no? Well yes, that's the part we had to fix before anyone was allowed to sleep. 
But customers kind of care about security *within* their part too, and
we care about customers. :)

Sure, the cloud *enables* a model where a given VM guest is just a
single-tenant standalone compute job, and the kernel is effectively just
a library to provide services to the application. In some sense it's all
about the app, and you might as well be using uClinux from the security
point of view. So *some* (perhaps even *many*) guests don't need to care.

But there are still plenty who *do* need to care, for various reasons.
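The "only on VMPTRLD" policy described above can be sketched in C as follows. This is an illustration rather than KVM code: `struct vcpu`, `last_vcpu[]`, `NR_PCPUS`, `issue_ibpb()` and `vcpu_load()` are hypothetical names, and the barrier is replaced by a counter so that the logic can run in user space.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PCPUS 4  /* illustrative number of physical CPUs */

struct vcpu {
	int id;
};

/* Last vCPU that ran on each physical CPU; NULL means none yet. */
static const struct vcpu *last_vcpu[NR_PCPUS];
static unsigned long ibpb_count;

/*
 * Stand-in for the real barrier. On hardware, IBPB is issued by
 * writing bit 0 of the IA32_PRED_CMD MSR (0x49).
 */
static void issue_ibpb(void)
{
	ibpb_count++;
}

/*
 * Called when a vCPU is loaded on a physical CPU. The expensive flush
 * is skipped when the same vCPU ran here last, so a pinned vCPU only
 * pays for it once. Returns true if a flush was issued.
 * The caller is assumed to pass pcpu < NR_PCPUS.
 */
static bool vcpu_load(unsigned int pcpu, const struct vcpu *v)
{
	if (last_vcpu[pcpu] == v)
		return false;
	last_vcpu[pcpu] = v;
	issue_ibpb();
	return true;
}
```

Loading the same vCPU twice on one physical CPU flushes once; switching to a different vCPU, or loading a vCPU on a CPU it has not run on, flushes again.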
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 0:23 ` Linus Torvalds ` (2 preceding siblings ...) 2018-01-30 8:22 ` David Woodhouse @ 2018-01-30 11:35 ` David Woodhouse 2018-01-30 11:56 ` Dr. David Alan Gilbert ` (2 subsequent siblings) 6 siblings, 0 replies; 75+ messages in thread From: David Woodhouse @ 2018-01-30 11:35 UTC (permalink / raw) To: Linus Torvalds Cc: Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert [-- Attachment #1: Type: text/plain, Size: 2813 bytes --] On Mon, 2018-01-29 at 16:23 -0800, Linus Torvalds wrote: > > Note on the unhappiness with some of the patches involved: what I do > *not* want to see is the "on every kernel entry" kind of garbage. > > So my unhappiness with the intel microcode patches is two-fold: > > (a) the interface is nasty and wrong, and I absolutely detest how Intel did it. > > (b) the write to random MSR's on every kernel entry/exit is wrong > > but that doesn't mean that I will necessarily end up NAK'ing every > single IBRS/IBPB patch. > > My concern with (a) is that unlike meltdown, the intel work-around > isn't forward-looking, and doesn't have a "we fixed it" bit. Instead, > it has a "we have a nasty workaround that may or may not be horribly > expensive" bit, and isn't all that well-defined. The lack of a "we fixed it" bit is certainly problematic. But as an interim hack for the upcoming hardware, IBRS_ALL isn't so badly defined. Sure, the reassurances about performance all got ripped out before the document saw the light of day — quelle surprise? 
— but my understanding is that it *will* be fast. It is expected to be fast enough that we can ALTERNATIVE away the retpolines, set it once and leave it set. The reason it isn't just a "we fixed it" bit is because we'll still need the IBPB on context/vCPU switches. I suspect they managed to tag BTB entries with VMX mode and ring, but *not* the full VMID/PCID tagging (and associated automatic flushing) that they'd need to truly say "we fixed it". I seriously hope they're working on a complete fix for the subsequent generation, and just neglected to mention it in their public documentation that far in advance. > My dislike of (b) comes from "we have retpoline and various wondrous > RSB filling crud already, we're smarter than that". So it's not that I > refuse any IBRS/IBPB use, I refuse the stupid and _mindless_ kind of > use. Well... for Skylake we probably need something like Ingo's cunning plan to abuse function tracing to count call depth. I won't be utterly shocked if, by the time we have all that pulled together, it ends up being fairly much as fugly as the IBRS version — for less complete protection. But we'll see. :) It may also be that some of the last remaining holes can be declared just too unlikely for us to jump through fugly hoops for. In fact that *has* to be our answer for the SMI issue if we're not using IBRS on Skylake, so now it's just a question of degree — how many of the *other* theoretical holes are we happy to do the same thing for? That's a genuine question, not a rhetorical device arguing for IBRS. I just haven't seen a clear analysis, other than some hand-waving, of how feasible some of those attack vectors really are. I'd like to. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 5213 bytes --] ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 0:23 ` Linus Torvalds ` (3 preceding siblings ...) 2018-01-30 11:35 ` David Woodhouse @ 2018-01-30 11:56 ` Dr. David Alan Gilbert 2018-01-30 12:11 ` Christian Borntraeger 2018-01-30 20:46 ` Alan Cox 6 siblings, 0 replies; 75+ messages in thread From: Dr. David Alan Gilbert @ 2018-01-30 11:56 UTC (permalink / raw) To: Linus Torvalds Cc: David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers * Linus Torvalds (torvalds@linux-foundation.org) wrote: > Why do you even _care_ about the guest, and how it acts wrt Skylake? > What you should care about is not so much the guests (which do their > own thing) but protect guests from each other, no? > > So I'm a bit mystified by some of this discussion within the context > of virtual machines. I think that is separate from any measures that > the guest machine may then decide to partake in. Because you'd never want to be the cause of the guest making the wrong decision and thus being less secure than it was on real hardware. > If you are ever going to migrate to Skylake, I think you should just > always tell the guests that you're running on Skylake. That way the > guests will always assume the worst case situation wrt Specte. Say you've got a pile of Ivybridge, all running lots of VMs, the guests see that they're running on Ivybridge. Now you need some more hosts, so you buy the latest Skylake boxes, and add them into your cluster. 
Previously it was fine to live migrate a VM to the Skylake box and the
VM still sees that it's running on Ivybridge; and you can migrate that
VM back and forth. The rule was that as long as the CPU type you told
the guest was old enough then it could migrate to any newer box.

You can't tell the VMs running on Ivybridge they're running on Skylake,
otherwise they'll start trying to use Skylake features (OK, they should
be checking flags, but that's a separate story).

Dave

> Maybe that mystification comes from me missing something.
>
> Linus
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 0:23 ` Linus Torvalds ` (4 preceding siblings ...) 2018-01-30 11:56 ` Dr. David Alan Gilbert @ 2018-01-30 12:11 ` Christian Borntraeger 2018-01-30 14:46 ` Christophe de Dinechin 2018-01-30 20:46 ` Alan Cox 6 siblings, 1 reply; 75+ messages in thread From: Christian Borntraeger @ 2018-01-30 12:11 UTC (permalink / raw) To: Linus Torvalds, David Woodhouse Cc: Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert On 01/30/2018 01:23 AM, Linus Torvalds wrote: [...] > > So I actually have a _different_ question to the virtualization > people. This includes the vmware people, but it also obviously > incldues the Amazon AWS kind of usage. > > When you're a hypervisor (whether vmware or Amazon), why do you even > end up caring about these things so much? You're protected from > meltdown thanks to the virtual environment already having separate > page tables. And the "big hammer" approach to spectre would seem to > be to just make sure the BTB and RSB are flushed at vmexit time - and > even then you might decide that you really want to just move it to > vmenter time, and only do it if the VM has changed since last time > (per CPU). > > Why do you even _care_ about the guest, and how it acts wrt Skylake? > What you should care about is not so much the guests (which do their > own thing) but protect guests from each other, no? > > So I'm a bit mystified by some of this discussion within the context > of virtual machines. 
> I think that is separate from any measures that
> the guest machine may then decide to partake in.
>
> If you are ever going to migrate to Skylake, I think you should just
> always tell the guests that you're running on Skylake. That way the
> guests will always assume the worst case situation wrt Specte.
>
> Maybe that mystification comes from me missing something.

I can only speak for KVM, but I think the hypervisor issues come from
the fact that for migration purposes the hypervisor "lies" to the guest
with regard to what kind of CPU it is running on (it has to lie, see
below).

This is to avoid random guest crashes by not announcing features that
some hosts lack. For example, if you want to migrate back and forth
between a system that has AVX512 and another one that does not, you must
tell the guest that AVX512 is not available, even if it runs on the
capable system.

To protect against new features the hypervisor only announces features
that it understands.
So you essentially start a VM in QEMU of a given CPU type that is
constructed of a base CPU type plus extra features. Before migration,
it is checked whether the target system can run a guest of the given
type; otherwise migration is rejected.

The management stack also knows things like baselining: basically
creating the best possible guest CPU given a set of hosts.

The problem now is: if you have, let's say, Broadwell and Skylake hosts,
what kind of CPU type do you tell your guest? If you claim Broadwell
but run on Skylake then you prevent the guest from protecting itself,
because the guest does not know that it should do something special.
If you say Skylake the guest might start using features that Broadwell
does not understand.

So I think what we have here is that the current (guest) CPU model
for hypervisors was always designed for architectural features.
Presenting the microarchitectural knowledge needed for workarounds
does not seem to be well integrated into hypervisors.


PS: For a list of potential cpus/features look at
https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/cpu/cpu_map.xml
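The baselining and pre-migration check described above can be sketched in C with feature bitmasks. This is only an illustration of the idea, not QEMU or libvirt code: the `FEAT_*` values are made-up flags rather than real CPUID bit positions, and `baseline()` and `can_migrate()` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative feature flags; NOT real CPUID bit positions. */
enum {
	FEAT_SSE42   = 1u << 0,
	FEAT_AVX2    = 1u << 1,
	FEAT_AVX512F = 1u << 2
};

/* Baselining: the best guest CPU is the set of features every host has. */
static uint32_t baseline(const uint32_t *hosts, size_t n)
{
	uint32_t common = n ? hosts[0] : 0;

	for (size_t i = 1; i < n; i++)
		common &= hosts[i];
	return common;
}

/* Migration is allowed only if the target offers every announced feature. */
static bool can_migrate(uint32_t guest_model, uint32_t target_host)
{
	return (guest_model & ~target_host) == 0;
}
```

A guest built from the baseline of a Broadwell-like and a Skylake-like host can run on both, but the mask only carries architectural features. Nothing in it tells the guest which host's microarchitectural quirks it has to work around, which is the gap described above.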
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 12:11 ` Christian Borntraeger @ 2018-01-30 14:46 ` Christophe de Dinechin 2018-01-30 14:52 ` Christian Borntraeger 0 siblings, 1 reply; 75+ messages in thread From: Christophe de Dinechin @ 2018-01-30 14:46 UTC (permalink / raw) To: Christian Borntraeger Cc: Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert > On 30 Jan 2018, at 13:11, Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > > On 01/30/2018 01:23 AM, Linus Torvalds wrote: > [...] >> >> So I actually have a _different_ question to the virtualization >> people. This includes the vmware people, but it also obviously >> incldues the Amazon AWS kind of usage. >> >> When you're a hypervisor (whether vmware or Amazon), why do you even >> end up caring about these things so much? You're protected from >> meltdown thanks to the virtual environment already having separate >> page tables. And the "big hammer" approach to spectre would seem to >> be to just make sure the BTB and RSB are flushed at vmexit time - and >> even then you might decide that you really want to just move it to >> vmenter time, and only do it if the VM has changed since last time >> (per CPU). >> >> Why do you even _care_ about the guest, and how it acts wrt Skylake? >> What you should care about is not so much the guests (which do their >> own thing) but protect guests from each other, no? >> >> So I'm a bit mystified by some of this discussion within the context >> of virtual machines. 
I think that is separate from any measures that >> the guest machine may then decide to partake in. >> >> If you are ever going to migrate to Skylake, I think you should just >> always tell the guests that you're running on Skylake. That way the >> guests will always assume the worst case situation wrt Specte. >> >> Maybe that mystification comes from me missing something. > > I can only speak for KVM, but I think the hypervisor issues come from > the fact that for migration purposes the hypervisor "lies" to the guest > in regard to what kind of CPU is running. (it has to lie, see below). > > This is to avoid random guest crashes by not announcing features. For > example if you want to migrate forth and back between a system that > has AVX512 and another one that has not you must tell the guest that > AVX512 is not available - even if it runs on the capable system. > > To protect against new features the hypervisor only announces features > that it understands. > So you essentially start a VM in QEMU of a given CPU type that is > constructed of a base cpu type plus extra features. Before migration, > it is checked if he target system can run a guest of given type - > otherwise migration is rejected. > > The management stack also knows things like baselining - basically > creating the best possible guest CPU given a set of hosts. > > The problem now is: If you have lets say Broadwell and Skylakes. > What kind of CPU type are you telling your guest? If you claim > broadwell but run on skylake then you prevent that the guest can > protect itself, because the guest does not know that it should do > something special. If you say skylake the guest might start using > features that broadwell does not understand. I believe that Linus’ question was whether it makes sense to defer the entirety of the protection to the host kernel, although I was a bit confused by his suggestion to always assume Skylake. 
In other words, is it safe enough to rely on the host kernel countermeasures
to protect guest kernels and their applications? In which case having
the guest believe it runs on Broadwell would not be that problematic.

Aren’t there enough vmexits on the guest kernel context switch
to enforce protection on its behalf? Even if it’s

a) some old kernel without mitigation code

or

b) some new kernel that thinks it runs on an old CPU and has disabled mitigation


Christophe

>
> So I think what we have here is that the current (guest) cpu model
> for hypervisors was always designed for architectural features.
> Presenting a microarchitectural knowledge for workarounds does
> not seem to be well integrated into hypervisors.
>
>
> PS: For a list of potential cpus/features look at
> https://libvirt.org/git/?p=libvirt.git;a=blob;f=src/cpu/cpu_map.xml
>
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 14:46 ` Christophe de Dinechin @ 2018-01-30 14:52 ` Christian Borntraeger 2018-01-30 14:56 ` Christophe de Dinechin 0 siblings, 1 reply; 75+ messages in thread From: Christian Borntraeger @ 2018-01-30 14:52 UTC (permalink / raw) To: Christophe de Dinechin Cc: Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert On 01/30/2018 03:46 PM, Christophe de Dinechin wrote: > > >> On 30 Jan 2018, at 13:11, Christian Borntraeger <borntraeger@de.ibm.com> wrote: >> >> >> >> On 01/30/2018 01:23 AM, Linus Torvalds wrote: >> [...] >>> >>> So I actually have a _different_ question to the virtualization >>> people. This includes the vmware people, but it also obviously >>> incldues the Amazon AWS kind of usage. >>> >>> When you're a hypervisor (whether vmware or Amazon), why do you even >>> end up caring about these things so much? You're protected from >>> meltdown thanks to the virtual environment already having separate >>> page tables. And the "big hammer" approach to spectre would seem to >>> be to just make sure the BTB and RSB are flushed at vmexit time - and >>> even then you might decide that you really want to just move it to >>> vmenter time, and only do it if the VM has changed since last time >>> (per CPU). >>> >>> Why do you even _care_ about the guest, and how it acts wrt Skylake? >>> What you should care about is not so much the guests (which do their >>> own thing) but protect guests from each other, no? 
>>> >>> So I'm a bit mystified by some of this discussion within the context >>> of virtual machines. I think that is separate from any measures that >>> the guest machine may then decide to partake in. >>> >>> If you are ever going to migrate to Skylake, I think you should just >>> always tell the guests that you're running on Skylake. That way the >>> guests will always assume the worst case situation wrt Specte. >>> >>> Maybe that mystification comes from me missing something. >> >> I can only speak for KVM, but I think the hypervisor issues come from >> the fact that for migration purposes the hypervisor "lies" to the guest >> in regard to what kind of CPU is running. (it has to lie, see below). >> >> This is to avoid random guest crashes by not announcing features. For >> example if you want to migrate forth and back between a system that >> has AVX512 and another one that has not you must tell the guest that >> AVX512 is not available - even if it runs on the capable system. >> >> To protect against new features the hypervisor only announces features >> that it understands. >> So you essentially start a VM in QEMU of a given CPU type that is >> constructed of a base cpu type plus extra features. Before migration, >> it is checked if he target system can run a guest of given type - >> otherwise migration is rejected. >> >> The management stack also knows things like baselining - basically >> creating the best possible guest CPU given a set of hosts. >> >> The problem now is: If you have lets say Broadwell and Skylakes. >> What kind of CPU type are you telling your guest? If you claim >> broadwell but run on skylake then you prevent that the guest can >> protect itself, because the guest does not know that it should do >> something special. If you say skylake the guest might start using >> features that broadwell does not understand. 
> > I believe that Linus’ question was whether it makes sense to defer > the entirety of the protection to the host kernel, although I was a bit > confused by his suggestion to always assume Skylake. > > In other words, is it safe enough to rely on the host kernel countermeasure > to protect guest kernels and their applications? In which case having > the guest believe it runs on Broadwell would not be that problematic. > > Aren’t there enough vmexits on the guest kernel context switch > to enforce protection on its behalf? Even if it’s > > a) some old kernel that without mitigation code > > or > > b) some new kernel that thinks it runs on an old CPU and disabled mitigation > I think it is not safe to just protect the host. CPU bound workload in the guest will switch a lot between guest user and guest kernel without triggering an exit. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 14:52 ` Christian Borntraeger @ 2018-01-30 14:56 ` Christophe de Dinechin 2018-01-30 15:33 ` Christian Borntraeger 0 siblings, 1 reply; 75+ messages in thread From: Christophe de Dinechin @ 2018-01-30 14:56 UTC (permalink / raw) To: Christian Borntraeger Cc: Christophe de Dinechin, Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert > On 30 Jan 2018, at 15:52, Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > > On 01/30/2018 03:46 PM, Christophe de Dinechin wrote: >> >> >>> On 30 Jan 2018, at 13:11, Christian Borntraeger <borntraeger@de.ibm.com> wrote: >>> >>> >>> >>> On 01/30/2018 01:23 AM, Linus Torvalds wrote: >>> [...] >>>> >>>> So I actually have a _different_ question to the virtualization >>>> people. This includes the vmware people, but it also obviously >>>> incldues the Amazon AWS kind of usage. >>>> >>>> When you're a hypervisor (whether vmware or Amazon), why do you even >>>> end up caring about these things so much? You're protected from >>>> meltdown thanks to the virtual environment already having separate >>>> page tables. And the "big hammer" approach to spectre would seem to >>>> be to just make sure the BTB and RSB are flushed at vmexit time - and >>>> even then you might decide that you really want to just move it to >>>> vmenter time, and only do it if the VM has changed since last time >>>> (per CPU). >>>> >>>> Why do you even _care_ about the guest, and how it acts wrt Skylake? 
>>>> What you should care about is not so much the guests (which do their >>>> own thing) but protect guests from each other, no? >>>> >>>> So I'm a bit mystified by some of this discussion within the context >>>> of virtual machines. I think that is separate from any measures that >>>> the guest machine may then decide to partake in. >>>> >>>> If you are ever going to migrate to Skylake, I think you should just >>>> always tell the guests that you're running on Skylake. That way the >>>> guests will always assume the worst case situation wrt Specte. >>>> >>>> Maybe that mystification comes from me missing something. >>> >>> I can only speak for KVM, but I think the hypervisor issues come from >>> the fact that for migration purposes the hypervisor "lies" to the guest >>> in regard to what kind of CPU is running. (it has to lie, see below). >>> >>> This is to avoid random guest crashes by not announcing features. For >>> example if you want to migrate forth and back between a system that >>> has AVX512 and another one that has not you must tell the guest that >>> AVX512 is not available - even if it runs on the capable system. >>> >>> To protect against new features the hypervisor only announces features >>> that it understands. >>> So you essentially start a VM in QEMU of a given CPU type that is >>> constructed of a base cpu type plus extra features. Before migration, >>> it is checked if he target system can run a guest of given type - >>> otherwise migration is rejected. >>> >>> The management stack also knows things like baselining - basically >>> creating the best possible guest CPU given a set of hosts. >>> >>> The problem now is: If you have lets say Broadwell and Skylakes. >>> What kind of CPU type are you telling your guest? If you claim >>> broadwell but run on skylake then you prevent that the guest can >>> protect itself, because the guest does not know that it should do >>> something special. 
If you say skylake the guest might start using >>> features that broadwell does not understand. >> >> I believe that Linus’ question was whether it makes sense to defer >> the entirety of the protection to the host kernel, although I was a bit >> confused by his suggestion to always assume Skylake. >> >> In other words, is it safe enough to rely on the host kernel countermeasure >> to protect guest kernels and their applications? In which case having >> the guest believe it runs on Broadwell would not be that problematic. >> >> Aren’t there enough vmexits on the guest kernel context switch >> to enforce protection on its behalf? Even if it’s >> >> a) some old kernel that without mitigation code >> >> or >> >> b) some new kernel that thinks it runs on an old CPU and disabled mitigation >> > I think it is not safe to just protect the host. CPU bound workload in the guest > will switch a lot between guest user and guest kernel without triggering an > exit. But that’s only if the guest does not take any page faults. Is it possible to run any of the known approaches to spectre and meltdown without ever faulting? If the workload is not faulting, then it’s reading only stuff it’s allowed to, isn’t it? Christophe ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-30 14:56 ` Christophe de Dinechin @ 2018-01-30 15:33 ` Christian Borntraeger 0 siblings, 0 replies; 75+ messages in thread From: Christian Borntraeger @ 2018-01-30 15:33 UTC (permalink / raw) To: Christophe de Dinechin Cc: Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert On 01/30/2018 03:56 PM, Christophe de Dinechin wrote: > > >> On 30 Jan 2018, at 15:52, Christian Borntraeger <borntraeger@de.ibm.com> wrote: >> >> >> >> On 01/30/2018 03:46 PM, Christophe de Dinechin wrote: >>> >>> >>>> On 30 Jan 2018, at 13:11, Christian Borntraeger <borntraeger@de.ibm.com> wrote: >>>> >>>> >>>> >>>> On 01/30/2018 01:23 AM, Linus Torvalds wrote: >>>> [...] >>>>> >>>>> So I actually have a _different_ question to the virtualization >>>>> people. This includes the vmware people, but it also obviously >>>>> incldues the Amazon AWS kind of usage. >>>>> >>>>> When you're a hypervisor (whether vmware or Amazon), why do you even >>>>> end up caring about these things so much? You're protected from >>>>> meltdown thanks to the virtual environment already having separate >>>>> page tables. And the "big hammer" approach to spectre would seem to >>>>> be to just make sure the BTB and RSB are flushed at vmexit time - and >>>>> even then you might decide that you really want to just move it to >>>>> vmenter time, and only do it if the VM has changed since last time >>>>> (per CPU). >>>>> >>>>> Why do you even _care_ about the guest, and how it acts wrt Skylake? 
>>>>> What you should care about is not so much the guests (which do their >>>>> own thing) but protect guests from each other, no? >>>>> >>>>> So I'm a bit mystified by some of this discussion within the context >>>>> of virtual machines. I think that is separate from any measures that >>>>> the guest machine may then decide to partake in. >>>>> >>>>> If you are ever going to migrate to Skylake, I think you should just >>>>> always tell the guests that you're running on Skylake. That way the >>>>> guests will always assume the worst case situation wrt Specte. >>>>> >>>>> Maybe that mystification comes from me missing something. >>>> >>>> I can only speak for KVM, but I think the hypervisor issues come from >>>> the fact that for migration purposes the hypervisor "lies" to the guest >>>> in regard to what kind of CPU is running. (it has to lie, see below). >>>> >>>> This is to avoid random guest crashes by not announcing features. For >>>> example if you want to migrate forth and back between a system that >>>> has AVX512 and another one that has not you must tell the guest that >>>> AVX512 is not available - even if it runs on the capable system. >>>> >>>> To protect against new features the hypervisor only announces features >>>> that it understands. >>>> So you essentially start a VM in QEMU of a given CPU type that is >>>> constructed of a base cpu type plus extra features. Before migration, >>>> it is checked if he target system can run a guest of given type - >>>> otherwise migration is rejected. >>>> >>>> The management stack also knows things like baselining - basically >>>> creating the best possible guest CPU given a set of hosts. >>>> >>>> The problem now is: If you have lets say Broadwell and Skylakes. >>>> What kind of CPU type are you telling your guest? If you claim >>>> broadwell but run on skylake then you prevent that the guest can >>>> protect itself, because the guest does not know that it should do >>>> something special. 
>>>> If you say skylake the guest might start using
>>>> features that broadwell does not understand.
>>>
>>> I believe that Linus’ question was whether it makes sense to defer
>>> the entirety of the protection to the host kernel, although I was a bit
>>> confused by his suggestion to always assume Skylake.
>>>
>>> In other words, is it safe enough to rely on the host kernel countermeasure
>>> to protect guest kernels and their applications? In which case having
>>> the guest believe it runs on Broadwell would not be that problematic.
>>>
>>> Aren’t there enough vmexits on the guest kernel context switch
>>> to enforce protection on its behalf? Even if it’s
>>>
>>> a) some old kernel that without mitigation code
>>>
>>> or
>>>
>>> b) some new kernel that thinks it runs on an old CPU and disabled mitigation
>>>
>> I think it is not safe to just protect the host. CPU bound workload in the guest
>> will switch a lot between guest user and guest kernel without triggering an
>> exit.
>
> But that’s only if the guest does not take any page faults. Is it possible to run any
> of the known approaches to spectre and meltdown without ever faulting?

Sure. After you have faulted in everything, you can still flush the cache
without refaulting. And if you do need a fault, it will be a GUEST fault
with no hypervisor involvement. Anything else would be too slow and is pre-NPT.

> If the workload is not faulting, then it’s reading only stuff it’s allowed to, isn’t it?

The point is: the hypervisor will not try to protect guest userspace against
guest kernel space or against other guest userspaces. This is clearly the task
of the guest operating system (you are also not asking the hypervisor to build
a guest KPTI if the guest is too old).
The hypervisor's task is to isolate guests against other guests and against
the host. At the same time the hypervisor will try to _enable_ the guest to
also protect itself.
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-30 0:23 ` Linus Torvalds
` (5 preceding siblings ...)
2018-01-30 12:11 ` Christian Borntraeger
@ 2018-01-30 20:46 ` Alan Cox
2018-01-31 10:05 ` Christophe de Dinechin
6 siblings, 1 reply; 75+ messages in thread
From: Alan Cox @ 2018-01-30 20:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed,
Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli,
Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov,
Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin,
Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima,
Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
the arch/x86 maintainers, Dr. David Alan Gilbert

> If you are ever going to migrate to Skylake, I think you should just
> always tell the guests that you're running on Skylake. That way the
> guests will always assume the worst case situation wrt Spectre.

Unfortunately, if you do that then the guest may also decide to use other
Skylake hardware features and pop its clogs when it finds out it's actually
running on Westmere or SandyBridge.

So you need to be able to both lie to the OS and user space via cpuid and
also have a second 'but do skylake protections' flag that only
mitigation-aware software knows about.

Alan
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-30 20:46 ` Alan Cox
@ 2018-01-31 10:05 ` Christophe de Dinechin
2018-01-31 10:15 ` Thomas Gleixner
0 siblings, 1 reply; 75+ messages in thread
From: Christophe de Dinechin @ 2018-01-31 10:05 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost,
KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen,
Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick,
Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman,
H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini,
Peter Zijlstra, Radim Krčmář,
Thomas Gleixner, Tim Chen, Tom Lendacky, KVM list,
the arch/x86 maintainers, Dr. David Alan Gilbert

> On 30 Jan 2018, at 21:46, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote:
>
>> If you are ever going to migrate to Skylake, I think you should just
>> always tell the guests that you're running on Skylake. That way the
>> guests will always assume the worst case situation wrt Spectre.
>
> Unfortunately if you do that then guest may also decide to use other
> Skylake hardware features and pop its clogs when it finds out its actually
> running on Westmere or SandyBridge.
>
> So you need to be able to both lie to the OS and user space via cpuid and
> also have a second 'but do skylake protections' that only mitigation
> aware software knows about.

Yes. The most desirable lie is different depending on whether you want to
allow virtualization features such as migration (where you’d gravitate
towards a CPU with fewer features) or whether you want to allow mitigation
(where you’d rather present the most fragile CPUID, probably Skylake).

Looking at some recent patches, I’m concerned that the code being added
often assumes that the CPUID is the correct way to get that info.
I do not think this is correct. You really want specific information about
the host CPUID, not whatever KVM CPUID emulation makes up.
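
Editorial sketch (not part of the original message): the two conflicting "lies" can be read as two different reductions over the same pool of hosts. The features shown to the guest are the intersection of what every host offers, while the mitigation decision has to follow the most fragile host, which is a union. The C fragment below only illustrates that asymmetry; `struct host_desc`, the `POOL_FEAT_*` bits and both function names are hypothetical and do not correspond to any QEMU, KVM or libvirt interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative feature bits; placeholders, not real CPUID bit positions. */
#define POOL_FEAT_AVX2   (1ULL << 0)
#define POOL_FEAT_AVX512 (1ULL << 1)

/* Hypothetical description of one host in a migration pool. */
struct host_desc {
    uint64_t features;         /* features this host can offer to a guest */
    bool needs_skylake_mitig;  /* host CPU needs the Skylake-style handling */
};

/* A feature may be exposed only if every host has it (intersection). */
uint64_t pool_baseline_features(const struct host_desc *hosts, size_t n)
{
    uint64_t common = n ? ~0ULL : 0;

    for (size_t i = 0; i < n; i++)
        common &= hosts[i].features;
    return common;
}

/* Mitigations must assume the most fragile host in the pool (union). */
bool pool_needs_skylake_mitig(const struct host_desc *hosts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (hosts[i].needs_skylake_mitig)
            return true;
    return false;
}
```

With one Broadwell-like host and one Skylake-like host in the pool, the first function drops whatever only the newer host has, while the second still reports that the stricter mitigations are wanted. That is why a single CPUID model cannot carry both answers.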
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-31 10:05 ` Christophe de Dinechin
@ 2018-01-31 10:15 ` Thomas Gleixner
2018-01-31 11:04 ` Dr. David Alan Gilbert
` (3 more replies)
0 siblings, 4 replies; 75+ messages in thread
From: Thomas Gleixner @ 2018-01-31 10:15 UTC (permalink / raw)
To: Christophe de Dinechin
Cc: Alan Cox, Linus Torvalds, David Woodhouse, Arjan van de Ven,
Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List,
Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers,
Dr. David Alan Gilbert

On Wed, 31 Jan 2018, Christophe de Dinechin wrote:
> > On 30 Jan 2018, at 21:46, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote:
> >
> >> If you are ever going to migrate to Skylake, I think you should just
> >> always tell the guests that you're running on Skylake. That way the
> >> guests will always assume the worst case situation wrt Spectre.
> >
> > Unfortunately if you do that then guest may also decide to use other
> > Skylake hardware features and pop its clogs when it finds out its actually
> > running on Westmere or SandyBridge.
> >
> > So you need to be able to both lie to the OS and user space via cpuid and
> > also have a second 'but do skylake protections' that only mitigation
> > aware software knows about.
>
> Yes. The most desirable lie is different depending on whether you want to
> allow virtualization features such as migration (where you’d gravitate
> towards a CPU with less features) or whether you want to allow mitigation
> (where you’d rather present the most fragile CPUID, probably Skylake).
>
> Looking at some recent patches, I’m concerned that the code being added
> often assumes that the CPUID is the correct way to get that info.
> I do not think this is correct. You really want specific information about
> the host CPUID, not whatever KVM CPUID emulation makes up.

That won't cut it. If you have a heterogeneous farm of systems, then you need:

- All CPUs have to support IBRS/IBPB, or at least the hypervisor has to
pretend they do by providing fake MSRs for that

- Have a 'force IBRS/IBPB' mechanism so the guests don't discard it due
to missing CPU feature bits.

Though this gets worse. You have to make sure that the guest keeps _ALL_
sorts of mitigation mechanisms enabled and does not decide to disable
retpolines because IBRS/IBPB are "available".

Good luck with making all that work.

Thanks,

tglx
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-31 10:15 ` Thomas Gleixner @ 2018-01-31 11:04 ` Dr. David Alan Gilbert 2018-01-31 11:52 ` Borislav Petkov 2018-01-31 11:07 ` Christophe de Dinechin ` (2 subsequent siblings) 3 siblings, 1 reply; 75+ messages in thread From: Dr. David Alan Gilbert @ 2018-01-31 11:04 UTC (permalink / raw) To: Thomas Gleixner Cc: Christophe de Dinechin, Alan Cox, Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers * Thomas Gleixner (tglx@linutronix.de) wrote: > On Wed, 31 Jan 2018, Christophe de Dinechin wrote: > > > On 30 Jan 2018, at 21:46, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote: > > > > > >> If you are ever going to migrate to Skylake, I think you should just > > >> always tell the guests that you're running on Skylake. That way the > > >> guests will always assume the worst case situation wrt Specte. > > > > > > Unfortunately if you do that then guest may also decide to use other > > > Skylake hardware features and pop its clogs when it finds out its actually > > > running on Westmere or SandyBridge. > > > > > > So you need to be able to both lie to the OS and user space via cpuid and > > > also have a second 'but do skylake protections' that only mitigation > > > aware software knows about. > > > > Yes. 
The most desirable lie is different depending on whether you want to > > allow virtualization features such as migration (where you’d gravitate > > towards a CPU with less features) or whether you want to allow mitigation > > (where you’d rather present the most fragile CPUID, probably Skylake). > > > > Looking at some recent patches, I’m concerned that the code being added > > often assumes that the CPUID is the correct way to get that info. > > I do not think this is correct. You really want specific information about > > the host CPUID, not whatever KVM CPUID emulation makes up. > > That wont cut it. If you have a heterogenous farm of systems, then you need: > > - All CPUs have to support IBRS/IBPB or at least hte hypervisor has to > pretend they do by providing fake MRS for that > > - Have a 'force IBRS/IBPB' mechanism so the guests don't discard it due > to missing CPU feature bits. That half is the easy bit, we've already got that (thanks to Eduardo), QEMU has -IBRS variants of CPU types, so if you start a VM with -cpu Broadwell-IBRS it'll get advertised to the guest as having IBRS; and (with appropriate flags) the management layers will only allow that to be started on hosts that support IBRS and wont allow migration between hosts with and without it. > Though this gets worse. You have to make sure that the guest keeps _ALL_ > sorts of mitigation mechanisms enabled and does not decide to disable > retpolines because IBRS/IBPB are "available". This is what's different with this set; it's all coming down to sets of heuristics which include CPU model etc, rather than just a 'we've got a feature, use it'. Dave > Good luck with making all that work. > > Thanks, > > tglx -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 75+ messages in thread
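
Editorial sketch (not part of the original message): the start and migration check described above amounts to a subset test. A host qualifies only if it provides every feature that was advertised to the guest. The fragment below is a minimal illustration under that assumption; the `MIG_FEAT_*` bits and `host_can_run_guest()` are made-up names, not what QEMU or a management layer actually uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bits; placeholders, not real CPUID bit positions. */
#define MIG_FEAT_IBRS (1ULL << 0)
#define MIG_FEAT_AVX2 (1ULL << 1)

/*
 * A host may start or receive a guest only if it provides every feature
 * that was advertised to that guest. Extra host features are harmless
 * because the guest never saw them.
 */
bool host_can_run_guest(uint64_t guest_features, uint64_t host_features)
{
    return (guest_features & ~host_features) == 0;
}
```

A guest started with the IBRS bit set can therefore move between two IBRS-capable hosts, and is refused by a host without it.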
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-31 11:04 ` Dr. David Alan Gilbert @ 2018-01-31 11:52 ` Borislav Petkov 2018-01-31 12:30 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 75+ messages in thread From: Borislav Petkov @ 2018-01-31 11:52 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Thomas Gleixner, Christophe de Dinechin, Alan Cox, Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers On Wed, Jan 31, 2018 at 11:04:07AM +0000, Dr. David Alan Gilbert wrote: > That half is the easy bit, we've already got that (thanks to Eduardo), > QEMU has -IBRS variants of CPU types, so if you start a VM with > -cpu Broadwell-IBRS Eww, a CPU model with a specific feature bit. I hope you guys don't add a model like that for every CPU feature. -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) -- ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-31 11:52 ` Borislav Petkov @ 2018-01-31 12:30 ` Dr. David Alan Gilbert 2018-01-31 13:18 ` Borislav Petkov 0 siblings, 1 reply; 75+ messages in thread From: Dr. David Alan Gilbert @ 2018-01-31 12:30 UTC (permalink / raw) To: Borislav Petkov Cc: Thomas Gleixner, Christophe de Dinechin, Alan Cox, Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers * Borislav Petkov (bp@suse.de) wrote: > On Wed, Jan 31, 2018 at 11:04:07AM +0000, Dr. David Alan Gilbert wrote: > > That half is the easy bit, we've already got that (thanks to Eduardo), > > QEMU has -IBRS variants of CPU types, so if you start a VM with > > -cpu Broadwell-IBRS > > Eww, a CPU model with a specific feature bit. I hope you guys don't add > a model like that for every CPU feature. Indeed, it's only for this weird case where you suddenly need to change it. Dave > -- > Regards/Gruss, > Boris. > > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) > -- -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-31 12:30 ` Dr. David Alan Gilbert @ 2018-01-31 13:18 ` Borislav Petkov 2018-01-31 14:04 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 75+ messages in thread From: Borislav Petkov @ 2018-01-31 13:18 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Thomas Gleixner, Christophe de Dinechin, Alan Cox, Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers On Wed, Jan 31, 2018 at 12:30:36PM +0000, Dr. David Alan Gilbert wrote: > Indeed, it's only for this weird case where you suddenly need to change > it. No, there's more: .name = "Broadwell-noTSX", .name = "Haswell-noTSX", -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) -- ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-31 13:18 ` Borislav Petkov @ 2018-01-31 14:04 ` Dr. David Alan Gilbert 2018-01-31 14:44 ` Eduardo Habkost 0 siblings, 1 reply; 75+ messages in thread From: Dr. David Alan Gilbert @ 2018-01-31 14:04 UTC (permalink / raw) To: Borislav Petkov Cc: Thomas Gleixner, Christophe de Dinechin, Alan Cox, Linus Torvalds, David Woodhouse, Arjan van de Ven, Eduardo Habkost, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers * Borislav Petkov (bp@suse.de) wrote: > On Wed, Jan 31, 2018 at 12:30:36PM +0000, Dr. David Alan Gilbert wrote: > > Indeed, it's only for this weird case where you suddenly need to change > > it. > > No, there's more: > > .name = "Broadwell-noTSX", > .name = "Haswell-noTSX", Haswell came out and we made the CPU definition, and then got a microcode update that removed the feature. So the common feature of noTSX and IBRS is that they're the only two cases where a CPU has released and then the flags have changed later. Dave > -- > Regards/Gruss, > Boris. > > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) > -- -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-31 14:04 ` Dr. David Alan Gilbert
@ 2018-01-31 14:44 ` Eduardo Habkost
2018-01-31 16:28 ` Borislav Petkov
0 siblings, 1 reply; 75+ messages in thread
From: Eduardo Habkost @ 2018-01-31 14:44 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Borislav Petkov, Thomas Gleixner, Christophe de Dinechin, Alan Cox,
Linus Torvalds, David Woodhouse, Arjan van de Ven, KarimAllah Ahmed,
Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli,
Andy Lutomirski, Ashok Raj, Asit Mallick, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers

On Wed, Jan 31, 2018 at 02:04:49PM +0000, Dr. David Alan Gilbert wrote:
> * Borislav Petkov (bp@suse.de) wrote:
> > On Wed, Jan 31, 2018 at 12:30:36PM +0000, Dr. David Alan Gilbert wrote:
> > > Indeed, it's only for this weird case where you suddenly need to change
> > > it.
> >
> > No, there's more:
> >
> > .name = "Broadwell-noTSX",
> > .name = "Haswell-noTSX",
>
> Haswell came out and we made the CPU definition, and then got a
> microcode update that removed the feature.
>
> So the common feature of noTSX and IBRS is that they're the only two
> cases where a CPU has released and then the flags have changed later.

Also, if anybody doesn't like it, users can already specify, e.g.,
"Broadwell,-hle,-rtm" or "Skylake,+spec_ctrl".

QEMU only has the -noTSX and -IBRS CPU models for the convenience of
management systems that don't know how to check/configure
individual CPU features. We're working with libvirt and
OpenStack folks to make this kind of trick unnecessary.

--
Eduardo
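
Editorial sketch (not part of the original message): the "model,+feature,-feature" notation can be read as a base feature mask followed by a list of set and clear operations. The fragment below illustrates that reading only. It is not QEMU's parser: the feature table, its bit values and `apply_cpu_spec()` are hypothetical, and unknown feature names are silently ignored here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical feature table: names taken from the thread, bits made up. */
struct feat_name {
    const char *name;
    uint64_t bit;
};

static const struct feat_name feat_table[] = {
    { "hle",       1ULL << 0 },
    { "rtm",       1ULL << 1 },
    { "spec_ctrl", 1ULL << 2 },
};

/* Return the bit for a feature name of the given length, or 0 if unknown. */
static uint64_t feat_lookup(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof(feat_table) / sizeof(feat_table[0]); i++)
        if (strlen(feat_table[i].name) == len &&
            strncmp(feat_table[i].name, name, len) == 0)
            return feat_table[i].bit;
    return 0;
}

/*
 * Apply the ",+feat" / ",-feat" modifiers of a spec such as
 * "Broadwell,-hle,-rtm" to the feature mask of the base model.
 */
uint64_t apply_cpu_spec(const char *spec, uint64_t model_features)
{
    const char *p = strchr(spec, ',');   /* skip the model name */

    while (p) {
        const char *item = p + 1;
        const char *end = strchr(item, ',');
        size_t len = end ? (size_t)(end - item) : strlen(item);

        if (len > 1 && item[0] == '+')
            model_features |= feat_lookup(item + 1, len - 1);
        else if (len > 1 && item[0] == '-')
            model_features &= ~feat_lookup(item + 1, len - 1);
        p = end;
    }
    return model_features;
}
```

Read this way, a name like "Broadwell-noTSX" is shorthand for the base model with the two TSX bits cleared, which is the point being made about the named variants existing only for convenience.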
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-31 14:44 ` Eduardo Habkost @ 2018-01-31 16:28 ` Borislav Petkov 0 siblings, 0 replies; 75+ messages in thread From: Borislav Petkov @ 2018-01-31 16:28 UTC (permalink / raw) To: Eduardo Habkost Cc: Dr. David Alan Gilbert, Thomas Gleixner, Christophe de Dinechin, Alan Cox, Linus Torvalds, David Woodhouse, Arjan van de Ven, KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers On Wed, Jan 31, 2018 at 12:44:41PM -0200, Eduardo Habkost wrote: > Also, if anybody don't like it, users can already specify, e.g., > "Broadwell,-hle,-rtm" or "Skylake,+spec_ctrl". > > QEMU only adds have the -noTSX and -IBRS CPU for convenience of > management systems that don't know how to check/configure > individual CPU features. We're working with libvirt and > OpenStack folks to make this kind of trick unnecessary. Yeah, defining separate CPU models just for that seems hacky. The +/-<feature> specification looks like the Right Thing(tm) to do. -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) -- ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-31 10:15 ` Thomas Gleixner 2018-01-31 11:04 ` Dr. David Alan Gilbert @ 2018-01-31 11:07 ` Christophe de Dinechin 2018-01-31 15:00 ` Eduardo Habkost 2018-01-31 15:11 ` Arjan van de Ven 3 siblings, 0 replies; 75+ messages in thread From: Christophe de Dinechin @ 2018-01-31 11:07 UTC (permalink / raw) To: Thomas Gleixner, Eduardo Habkost, KarimAllah Ahmed Cc: Christophe de Dinechin, Alan Cox, Linus Torvalds, David Woodhouse, Arjan van de Ven, Linux Kernel Mailing List, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers, Dr. David Alan Gilbert > On 31 Jan 2018, at 11:15, Thomas Gleixner <tglx@linutronix.de> wrote: > > On Wed, 31 Jan 2018, Christophe de Dinechin wrote: >>> On 30 Jan 2018, at 21:46, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote: >>> >>>> If you are ever going to migrate to Skylake, I think you should just >>>> always tell the guests that you're running on Skylake. That way the >>>> guests will always assume the worst case situation wrt Specte. >>> >>> Unfortunately if you do that then guest may also decide to use other >>> Skylake hardware features and pop its clogs when it finds out its actually >>> running on Westmere or SandyBridge. >>> >>> So you need to be able to both lie to the OS and user space via cpuid and >>> also have a second 'but do skylake protections' that only mitigation >>> aware software knows about. >> >> Yes. 
The most desirable lie is different depending on whether you want to >> allow virtualization features such as migration (where you’d gravitate >> towards a CPU with less features) or whether you want to allow mitigation >> (where you’d rather present the most fragile CPUID, probably Skylake). >> >> Looking at some recent patches, I’m concerned that the code being added >> often assumes that the CPUID is the correct way to get that info. >> I do not think this is correct. You really want specific information about >> the host CPUID, not whatever KVM CPUID emulation makes up. > > That wont cut it. If you have a heterogenous farm of systems, then you need: > > - All CPUs have to support IBRS/IBPB or at least hte hypervisor has to > pretend they do by providing fake MRS for that > > - Have a 'force IBRS/IBPB' mechanism so the guests don't discard it due > to missing CPU feature bits. > > Though this gets worse. You have to make sure that the guest keeps _ALL_ > sorts of mitigation mechanisms enabled and does not decide to disable > retpolines because IBRS/IBPB are "available”. What you are saying is that it’s one thing to test at boot time, but (at least) migration events should also cause a re-check. Agreed. The alternative is to pessimistically enable mitigation in VMs. I believe this is the current “state of the art”, i.e. enable IBRS statically via a CPU type variant. What is the best place to re-check anyway? (Just out of curiosity: there are no non-symmetric systems that mix CPUs of different generation, right?) > > Good luck with making all that work. :-) > > Thanks, > > tglx ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-31 10:15 ` Thomas Gleixner
2018-01-31 11:04 ` Dr. David Alan Gilbert
2018-01-31 11:07 ` Christophe de Dinechin
@ 2018-01-31 15:00 ` Eduardo Habkost
2018-01-31 15:11 ` Arjan van de Ven
3 siblings, 0 replies; 75+ messages in thread
From: Eduardo Habkost @ 2018-01-31 15:00 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Christophe de Dinechin, Alan Cox, Linus Torvalds, David Woodhouse,
Arjan van de Ven, KarimAllah Ahmed, Linux Kernel Mailing List,
Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Ashok Raj,
Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen,
Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar,
Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott,
Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra,
Radim Krčmář,
Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers,
Dr. David Alan Gilbert

On Wed, Jan 31, 2018 at 11:15:50AM +0100, Thomas Gleixner wrote:
> On Wed, 31 Jan 2018, Christophe de Dinechin wrote:
> > > On 30 Jan 2018, at 21:46, Alan Cox <gnomes@lxorguk.ukuu.org.uk> wrote:
> > >
> > >> If you are ever going to migrate to Skylake, I think you should just
> > >> always tell the guests that you're running on Skylake. That way the
> > >> guests will always assume the worst case situation wrt Spectre.
> > >
> > > Unfortunately if you do that then guest may also decide to use other
> > > Skylake hardware features and pop its clogs when it finds out its actually
> > > running on Westmere or SandyBridge.
> > >
> > > So you need to be able to both lie to the OS and user space via cpuid and
> > > also have a second 'but do skylake protections' that only mitigation
> > > aware software knows about.
> >
> > Yes. The most desirable lie is different depending on whether you want to
> > allow virtualization features such as migration (where you’d gravitate
> > towards a CPU with less features) or whether you want to allow mitigation
> > (where you’d rather present the most fragile CPUID, probably Skylake).
> >
> > Looking at some recent patches, I’m concerned that the code being added
> > often assumes that the CPUID is the correct way to get that info.
> > I do not think this is correct. You really want specific information about
> > the host CPUID, not whatever KVM CPUID emulation makes up.
>
> That won't cut it. If you have a heterogeneous farm of systems, then you need:
>
> - All CPUs have to support IBRS/IBPB, or at least the hypervisor has to
> pretend they do by providing fake MSRs for that
>
> - Have a 'force IBRS/IBPB' mechanism so the guests don't discard it due
> to missing CPU feature bits.

If all your hosts have IBRS/IBPB, you enable it. If some of your
hosts don't have IBRS/IBPB, you don't expose it to the guest (and
deal with the consequences of not applying updates to your
hardware). Where's the problem?

>
> Though this gets worse. You have to make sure that the guest keeps _ALL_
> sorts of mitigation mechanisms enabled and does not decide to disable
> retpolines because IBRS/IBPB are "available".

If IBRS/IBPB are reported as available to the guest, the VM
management system will ensure the VM won't be migrated to a host
that doesn't have it. That's a pretty basic feature of VM
management stacks.

Exactly the same could happen to a "(non-)skylake bit". The
host reports a feature (or a bug fix) as available to a guest,
and then the system ensures you won't migrate to a host that
doesn't provide that feature.

The problem I see here is that Linux guests currently have no way
to tell whether they need to enable Skylake-specific mitigations
or not. Unless you make Linux always enable Skylake mitigations
whenever it sees the hypervisor bit, you will need the hypervisor
to provide more useful information than f/m/s.

--
Eduardo
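
Editorial sketch (not part of the original message): one way to read the decision described above is as a three-step rule for the guest. On bare metal, f/m/s can be trusted. Under a hypervisor, an explicit statement from the hypervisor wins. With no such statement, the guest assumes the worst case. The fragment below only illustrates that rule; `struct cpu_view` and its fields are hypothetical, and the "hypervisor hint" is the not-yet-existing bit that the thread is asking for, not a real interface.

```c
#include <stdbool.h>

/* Hypothetical inputs a guest kernel could consult at boot. */
struct cpu_view {
    bool under_hypervisor;  /* e.g. the CPUID "hypervisor present" bit */
    bool hv_hint_valid;     /* hypervisor made an explicit statement ... */
    bool hv_hint_skylake;   /* ... that Skylake-class mitigations are needed */
    bool fms_is_skylake;    /* family/model/stepping looks like Skylake */
};

/* Decide whether the Skylake-specific mitigations should be enabled. */
bool want_skylake_mitigations(const struct cpu_view *v)
{
    if (!v->under_hypervisor)
        return v->fms_is_skylake;   /* bare metal: f/m/s is trustworthy */
    if (v->hv_hint_valid)
        return v->hv_hint_skylake;  /* an explicit hint overrides f/m/s */
    return true;                    /* no hint: assume the worst case */
}
```

The last branch is the pessimistic "always enable under a hypervisor" fallback. The middle branch is what a dedicated bit would buy: the guest can skip the extra cost when the hypervisor promises it will never land on such a host.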
* Re: [RFC,05/10] x86/speculation: Add basic IBRS support infrastructure
2018-01-31 10:15 ` Thomas Gleixner
` (2 preceding siblings ...)
2018-01-31 15:00 ` Eduardo Habkost
@ 2018-01-31 15:11 ` Arjan van de Ven
3 siblings, 0 replies; 75+ messages in thread
From: Arjan van de Ven @ 2018-01-31 15:11 UTC (permalink / raw)
To: Thomas Gleixner, Christophe de Dinechin
Cc: Alan Cox, Linus Torvalds, David Woodhouse, Eduardo Habkost,
KarimAllah Ahmed, Linux Kernel Mailing List, Andi Kleen,
Andrea Arcangeli, Andy Lutomirski, Ashok Raj, Asit Mallick,
Borislav Petkov, Dan Williams, Dave Hansen, Greg Kroah-Hartman,
H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel,
Jun Nakajima, Laura Abbott, Masami Hiramatsu, Paolo Bonzini,
Peter Zijlstra, Radim Krčmář,
Tim Chen, Tom Lendacky, KVM list, the arch/x86 maintainers,
Dr. David Alan Gilbert

On 1/31/2018 2:15 AM, Thomas Gleixner wrote:
> Good luck with making all that work.

On the Intel side we're checking what we can do that works and doesn't break
things right now; hopefully we just end up with a bit in the arch capabilities
MSR for "you should do RSB stuffing", and then the HVs can emulate that.

(People sometimes think that should be a 5 minute thing, but we need to check
many CPU models etc. to make sure a bit we pick is really free, which makes it
take longer than some folks have patience for.)
* Re: [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure 2018-01-20 19:22 ` [RFC 05/10] x86/speculation: Add basic IBRS support infrastructure KarimAllah Ahmed 2018-01-21 14:31 ` Thomas Gleixner 2018-01-29 20:14 ` [RFC,05/10] " Eduardo Habkost @ 2018-01-31 10:03 ` Christophe de Dinechin 2 siblings, 0 replies; 75+ messages in thread From: Christophe de Dinechin @ 2018-01-31 10:03 UTC (permalink / raw) To: KarimAllah Ahmed Cc: linux-kernel, Andi Kleen, Andrea Arcangeli, Andy Lutomirski, Arjan van de Ven, Ashok Raj, Asit Mallick, Borislav Petkov, Dan Williams, Dave Hansen, David Woodhouse, Greg Kroah-Hartman, H . Peter Anvin, Ingo Molnar, Janakarajan Natarajan, Joerg Roedel, Jun Nakajima, Laura Abbott, Linus Torvalds, Masami Hiramatsu, Paolo Bonzini, Peter Zijlstra, Radim Krčmář, Thomas Gleixner, Tim Chen, Tom Lendacky, kvm, x86 KarimAllah Ahmed writes: > From: David Woodhouse <dwmw@amazon.co.uk> > > Not functional yet; just add the handling for it in the Spectre v2 > mitigation selection, and the X86_FEATURE_IBRS flag which will control > the code to be added in later patches. > > Also take the #ifdef CONFIG_RETPOLINE from around the RSB-stuffing; IBRS > mode will want that too. > > For now we are auto-selecting IBRS on Skylake. We will probably end up > changing that but for now let's default to the safest option. > > XX: Do we want a microcode blacklist? 
>
> [karahmed: simplify the switch block and get rid of all the magic]
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |   1 +
>  arch/x86/include/asm/cpufeatures.h              |   1 +
>  arch/x86/include/asm/nospec-branch.h            |   2 -
>  arch/x86/kernel/cpu/bugs.c                      | 108 +++++++++++++++---------
>  4 files changed, 68 insertions(+), 44 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 8122b5f..e597650 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3932,6 +3932,7 @@
>  			retpoline	  - replace indirect branches
>  			retpoline,generic - google's original retpoline
>  			retpoline,amd     - AMD-specific minimal thunk
> +			ibrs		  - Intel: Indirect Branch Restricted Speculation
>
>  			Not specifying this option is equivalent to
>  			spectre_v2=auto.
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 8ec9588..ae86ad9 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -211,6 +211,7 @@
>  #define X86_FEATURE_AMD_PRED_CMD	( 7*32+17) /* Prediction Command MSR (AMD) */
>  #define X86_FEATURE_MBA			( 7*32+18) /* Memory Bandwidth Allocation */
>  #define X86_FEATURE_RSB_CTXSW		( 7*32+19) /* Fill RSB on context switches */
> +#define X86_FEATURE_IBRS		( 7*32+21) /* Use IBRS for Spectre v2 safety */
>
>  /* Virtualization flags: Linux defined, word 8 */
>  #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index c333c95..8759449 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -205,7 +205,6 @@ extern char __indirect_thunk_end[];
>   */
>  static inline void vmexit_fill_RSB(void)
>  {
> -#ifdef CONFIG_RETPOLINE
>  	unsigned long loops;
>
>  	asm volatile (ANNOTATE_NOSPEC_ALTERNATIVE
> @@ -215,7 +214,6 @@ static inline void vmexit_fill_RSB(void)
>  		      "910:"
>  		      : "=r" (loops), ASM_CALL_CONSTRAINT
>  		      : : "memory" );
> -#endif
>  }
>
>  static inline void indirect_branch_prediction_barrier(void)
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 96548ff..1d5e12f 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -79,6 +79,7 @@ enum spectre_v2_mitigation_cmd {
>  	SPECTRE_V2_CMD_RETPOLINE,
>  	SPECTRE_V2_CMD_RETPOLINE_GENERIC,
>  	SPECTRE_V2_CMD_RETPOLINE_AMD,
> +	SPECTRE_V2_CMD_IBRS,
>  };
>
>  static const char *spectre_v2_strings[] = {
> @@ -87,6 +88,7 @@ static const char *spectre_v2_strings[] = {
>  	[SPECTRE_V2_RETPOLINE_MINIMAL_AMD]	= "Vulnerable: Minimal AMD ASM retpoline",
>  	[SPECTRE_V2_RETPOLINE_GENERIC]		= "Mitigation: Full generic retpoline",
>  	[SPECTRE_V2_RETPOLINE_AMD]		= "Mitigation: Full AMD retpoline",
> +	[SPECTRE_V2_IBRS]			= "Mitigation: Indirect Branch Restricted Speculation",
>  };
>
>  #undef pr_fmt
> @@ -132,9 +134,17 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
>  			spec2_print_if_secure("force enabled on command line.");
>  			return SPECTRE_V2_CMD_FORCE;
>  		} else if (match_option(arg, ret, "retpoline")) {
> +			if (!IS_ENABLED(CONFIG_RETPOLINE)) {
> +				pr_err("retpoline selected but not compiled in. Switching to AUTO select\n");
> +				return SPECTRE_V2_CMD_AUTO;
> +			}
>  			spec2_print_if_insecure("retpoline selected on command line.");
>  			return SPECTRE_V2_CMD_RETPOLINE;
>  		} else if (match_option(arg, ret, "retpoline,amd")) {
> +			if (!IS_ENABLED(CONFIG_RETPOLINE)) {
> +				pr_err("retpoline,amd selected but not compiled in. Switching to AUTO select\n");
> +				return SPECTRE_V2_CMD_AUTO;
> +			}
>  			if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) {
>  				pr_err("retpoline,amd selected but CPU is not AMD. Switching to AUTO select\n");
>  				return SPECTRE_V2_CMD_AUTO;
> @@ -142,8 +152,19 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
>  			spec2_print_if_insecure("AMD retpoline selected on command line.");
>  			return SPECTRE_V2_CMD_RETPOLINE_AMD;
>  		} else if (match_option(arg, ret, "retpoline,generic")) {
> +			if (!IS_ENABLED(CONFIG_RETPOLINE)) {
> +				pr_err("retpoline,generic selected but not compiled in. Switching to AUTO select\n");
> +				return SPECTRE_V2_CMD_AUTO;
> +			}
>  			spec2_print_if_insecure("generic retpoline selected on command line.");
>  			return SPECTRE_V2_CMD_RETPOLINE_GENERIC;
> +		} else if (match_option(arg, ret, "ibrs")) {
> +			if (!boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> +				pr_err("IBRS selected but no CPU support. Switching to AUTO select\n");
> +				return SPECTRE_V2_CMD_AUTO;
> +			}
> +			spec2_print_if_insecure("IBRS seleted on command line.");
> +			return SPECTRE_V2_CMD_IBRS;
>  		} else if (match_option(arg, ret, "auto")) {
>  			return SPECTRE_V2_CMD_AUTO;
>  		}
> @@ -156,7 +177,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
>  	return SPECTRE_V2_CMD_NONE;
>  }
>
> -/* Check for Skylake-like CPUs (for RSB handling) */
> +/* Check for Skylake-like CPUs (for RSB and IBRS handling) */
>  static bool __init is_skylake_era(void)
>  {
>  	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> @@ -178,55 +199,58 @@ static void __init spectre_v2_select_mitigation(void)
>  	enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
>  	enum spectre_v2_mitigation mode = SPECTRE_V2_NONE;
>
> -	/*
> -	 * If the CPU is not affected and the command line mode is NONE or AUTO
> -	 * then nothing to do.
> -	 */
> -	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2) &&
> -	    (cmd == SPECTRE_V2_CMD_NONE || cmd == SPECTRE_V2_CMD_AUTO))
> -		return;
> -
>  	switch (cmd) {
>  	case SPECTRE_V2_CMD_NONE:
> +		if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
> +			pr_err("kernel not compiled with retpoline; no mitigation available!");
>  		return;
> -
> -	case SPECTRE_V2_CMD_FORCE:
> -		/* FALLTRHU */
> -	case SPECTRE_V2_CMD_AUTO:
> -		goto retpoline_auto;
> -
> -	case SPECTRE_V2_CMD_RETPOLINE_AMD:
> -		if (IS_ENABLED(CONFIG_RETPOLINE))
> -			goto retpoline_amd;
> -		break;
> -	case SPECTRE_V2_CMD_RETPOLINE_GENERIC:
> -		if (IS_ENABLED(CONFIG_RETPOLINE))
> -			goto retpoline_generic;
> +	case SPECTRE_V2_CMD_IBRS:
> +		mode = SPECTRE_V2_IBRS;
> +		setup_force_cpu_cap(X86_FEATURE_IBRS);
>  		break;
> +	case SPECTRE_V2_CMD_AUTO:
> +		if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
> +			return;
> +		/* Fall through */
> +	case SPECTRE_V2_CMD_FORCE:
> +		/*
> +		 * If we have IBRS support, and either Skylake or !RETPOLINE,
> +		 * then that's what we do.
> +		 */
> +		if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
> +		    (is_skylake_era() || !retp_compiler())) {

As per Eduardo's comments and the follow-ups, it's unclear whether this
will play well under virtualization. Consider putting this check in a
separate function whose name makes it clear that what we care about is
the host CPU, not the guest CPU. Under virtualization, you may want to
force is_skylake_era() to return true (unless there is a way to get a
more precise answer about the host CPU at that stage?).

> +			mode = SPECTRE_V2_IBRS;
> +			setup_force_cpu_cap(X86_FEATURE_IBRS);
> +			break;
> +		}
> +		/* Fall through */

Given the complexity of the decision and the number of fall-through
cases, it's probably a good idea to add some printouts for system
management or debugging.
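
For illustration only, here is a minimal userspace sketch of both
suggestions above. It is not kernel code and has not been tested against
this series. The names host_may_be_skylake_era(), select_auto() and
struct cpu_info are hypothetical, the struct merely stands in for
boot_cpu_data and the feature bits, and the model numbers are assumed
for the example. The idea is that a guest treats the host as possibly
Skylake, and that every decision logs its reason.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for boot_cpu_data and the relevant feature bits. */
struct cpu_info {
	bool intel;		/* vendor is Intel */
	int family;		/* CPU family as reported by CPUID */
	int model;		/* CPU model as reported by CPUID */
	bool hypervisor;	/* running as a guest (CPUID hypervisor bit) */
	bool spec_ctrl;		/* IBRS available (X86_FEATURE_SPEC_CTRL) */
};

enum mitigation { MIT_NONE, MIT_IBRS, MIT_RETPOLINE };

/*
 * Hypothetical host-oriented variant of is_skylake_era().
 * A guest cannot trust the reported family/model and may be migrated
 * later, so it conservatively assumes the host may be Skylake.
 */
static bool host_may_be_skylake_era(const struct cpu_info *c, const char **why)
{
	assert(c && why);

	if (c->hypervisor) {
		*why = "guest, host model unknown";
		return true;
	}
	if (c->intel && c->family == 6) {
		switch (c->model) {
		/* Model numbers assumed here for illustration. */
		case 0x4E: case 0x5E: case 0x55: case 0x8E: case 0x9E:
			*why = "Skylake-era model";
			return true;
		}
	}
	*why = "not a Skylake-era model";
	return false;
}

/* Mirrors the AUTO/FORCE branch of the patch, printing each decision. */
static enum mitigation select_auto(const struct cpu_info *c,
				   bool retpoline_config, bool retp_compiler)
{
	const char *why;
	bool skl = host_may_be_skylake_era(c, &why);

	if (c->spec_ctrl && (skl || !retp_compiler)) {
		printf("Spectre V2: IBRS selected (%s)\n",
		       skl ? why : "no retpoline-capable compiler");
		return MIT_IBRS;
	}
	printf("Spectre V2: IBRS not selected (%s, SPEC_CTRL %s)\n",
	       why, c->spec_ctrl ? "present" : "absent");

	if (retpoline_config) {
		printf("Spectre V2: falling back to retpoline\n");
		return MIT_RETPOLINE;
	}
	printf("Spectre V2: no mitigation available\n");
	return MIT_NONE;
}
```

With this shape, a guest that reports a pre-Skylake model still ends up
on IBRS when SPEC_CTRL is exposed, and the log line says why. Whether
forcing the conservative answer for every guest is acceptable is exactly
the open question in this thread.
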

>  	case SPECTRE_V2_CMD_RETPOLINE:
> -		if (IS_ENABLED(CONFIG_RETPOLINE))
> -			goto retpoline_auto;
> -		break;
> -	}
> -	pr_err("kernel not compiled with retpoline; no mitigation available!");
> -	return;
> +	case SPECTRE_V2_CMD_RETPOLINE_AMD:
> +		if (IS_ENABLED(CONFIG_RETPOLINE) &&
> +		    boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> +			if (boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
> +				mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
> +							 SPECTRE_V2_RETPOLINE_MINIMAL_AMD;
> +				setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
> +				setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> +				break;
> +			}
>
> -retpoline_auto:
> -	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
> -	retpoline_amd:
> -		if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
>  			pr_err("LFENCE not serializing. Switching to generic retpoline\n");
> -			goto retpoline_generic;
>  		}
> -		mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
> -					 SPECTRE_V2_RETPOLINE_MINIMAL_AMD;
> -		setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD);
> -		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> -	} else {
> -	retpoline_generic:
> -		mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
> -					 SPECTRE_V2_RETPOLINE_MINIMAL;
> -		setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> +		/* Fall through */
> +	case SPECTRE_V2_CMD_RETPOLINE_GENERIC:
> +		if (IS_ENABLED(CONFIG_RETPOLINE)) {
> +			mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC :
> +						 SPECTRE_V2_RETPOLINE_MINIMAL;
> +			setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
> +			break;
> +		}
> +		/* Fall through */
> +	default:
> +		if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2))
> +			pr_err("kernel not compiled with retpoline; no mitigation available!");
> +		return;
>  	}
>
>  	spectre_v2_enabled = mode;

--
Cheers,
Christophe de Dinechin (IRC c3d)