* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 15:17 ` Mathieu Desnoyers
0 siblings, 0 replies; 52+ messages in thread
From: Mathieu Desnoyers @ 2017-11-14 15:17 UTC (permalink / raw)
To: Avi Kivity
Cc: Linus Torvalds, Andy Lutomirski, linux-kernel, linux-api,
Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andrew Hunter,
maged michael, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann
----- On Nov 14, 2017, at 9:53 AM, Avi Kivity avi@scylladb.com wrote:
> On 11/13/2017 06:56 PM, Mathieu Desnoyers wrote:
>> ----- On Nov 10, 2017, at 4:57 PM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>>
>>> ----- On Nov 10, 2017, at 4:36 PM, Linus Torvalds torvalds@linux-foundation.org
>>> wrote:
>>>
>>>> On Fri, Nov 10, 2017 at 1:12 PM, Mathieu Desnoyers
>>>> <mathieu.desnoyers@efficios.com> wrote:
>>>>> x86 can return to user-space through sysexit and sysretq, which are not
>>>>> core serializing. This breaks expectations from user-space about
>>>>> sequential consistency from a single-threaded self-modifying program
>>>>> point of view in specific migration patterns.
>>>>>
>>>>> Feedback is welcome,
>>>> We should check with Intel. I would actually be surprised if the I$
>>>> can be out of sync with the D$ after a sysretq. It would actually
>>>> break things like "read code from disk" too in theory.
>>> That core serializing instruction is not that much about I$ vs D$
>>> consistency, but rather about the processor speculatively executing code
>>> ahead of its retirement point. Ref. Intel Architecture Software Developer's
>>> Manual, Volume 3: System Programming.
>>>
>>> 7.1.3. "Handling Self- and Cross-Modifying Code":
>>>
>>> "The act of a processor writing data into a currently executing code
>>> segment with the intent of executing that data as code is called
>>> self-modifying code. Intel Architecture processors exhibit model-specific
>>> behavior when executing self-modified code, depending upon how far ahead
>>> of the current execution pointer the code has been modified. As processor
>>> architectures become more complex and start to speculatively execute code
>>> ahead of the retirement point (as in the P6 family processors), the rules
>>> regarding which code should execute, pre- or post-modification, become
>>> blurred. [...]"
>>>
>>> AFAIU, this core serializing instruction seems to be needed for use-cases of
>>> self-modifying code, but not for the initial load of a program from disk,
>>> as the processor has no way to have speculatively executed any of its
>>> instructions.
>> I figured out what you're pointing to: if exec() is executed by a previously
>> running thread, and there is no core serializing instruction between program
>> load and return to user-space, the kernel ends up acting like a JIT, indeed.
>
> I think that's safe. The kernel has to execute a MOV CR3 instruction
> before it can execute code loaded by exec, and that is a serializing
> instruction. Loading and unloading shared libraries is made safe by the
> IRET executed by page faults (loading) and TLB shootdown IPIs (unloading).
Very good points! Perhaps those guarantees should be documented somewhere?
>
> Directly modifying code in userspace is unsafe if there is some
> non-coherent instruction cache. Instruction fetch and speculative
> execution are non-coherent, but they're probably too short (in current
> processors) to matter. Trace caches are probably large enough, but I
> don't know whether they are coherent or not.
Android guys at Google have reproducers of context synchronization issues
on arm64 in JIT scenarios. Based on the information I got, flushing the
instruction caches is not enough: they also need to issue a context
synchronizing instruction.
Current Intel processors may have a short enough speculative execution
window and small enough trace caches, but relying on this without
a clear statement from Intel seems fragile.
I've tried to create a small single-threaded self-modifying loop in
user-space to trigger a trace cache or speculative execution quirk,
but I have not succeeded yet. I suspect that I would need to know
more about the internals of the processor architecture to create the
right stalls that would allow speculative execution to move further
ahead, and trigger an incoherent execution flow. Ideas on how to
trigger this would be welcome.
Thanks,
Mathieu
>
>
>>
>> Therefore, we'd also need to invoke sync_core_before_usermode() after loading
>> the program.
>>
>> Let's wait to hear back from hpa,
>>
>> Thanks,
>>
>> Mathieu
>>
>>
>>> Hopefully hpa can tell us more about this,
>>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>>
>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> http://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 15:42 ` Avi Kivity
0 siblings, 0 replies; 52+ messages in thread
From: Avi Kivity @ 2017-11-14 15:42 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Linus Torvalds, Andy Lutomirski, linux-kernel, linux-api,
Peter Zijlstra, Paul E. McKenney, Boqun Feng, Andrew Hunter,
maged michael, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On 11/14/2017 05:17 PM, Mathieu Desnoyers wrote:
> ----- On Nov 14, 2017, at 9:53 AM, Avi Kivity avi@scylladb.com wrote:
>
>> On 11/13/2017 06:56 PM, Mathieu Desnoyers wrote:
>>> ----- On Nov 10, 2017, at 4:57 PM, Mathieu Desnoyers
>>> mathieu.desnoyers@efficios.com wrote:
>>>
>>>> ----- On Nov 10, 2017, at 4:36 PM, Linus Torvalds torvalds@linux-foundation.org
>>>> wrote:
>>>>
>>>>> On Fri, Nov 10, 2017 at 1:12 PM, Mathieu Desnoyers
>>>>> <mathieu.desnoyers@efficios.com> wrote:
>>>>>> x86 can return to user-space through sysexit and sysretq, which are not
>>>>>> core serializing. This breaks expectations from user-space about
>>>>>> sequential consistency from a single-threaded self-modifying program
>>>>>> point of view in specific migration patterns.
>>>>>>
>>>>>> Feedback is welcome,
>>>>> We should check with Intel. I would actually be surprised if the I$
>>>>> can be out of sync with the D$ after a sysretq. It would actually
>>>>> break things like "read code from disk" too in theory.
>>>> That core serializing instruction is not that much about I$ vs D$
>>>> consistency, but rather about the processor speculatively executing code
>>>> ahead of its retirement point. Ref. Intel Architecture Software Developer's
>>>> Manual, Volume 3: System Programming.
>>>>
>>>> 7.1.3. "Handling Self- and Cross-Modifying Code":
>>>>
>>>> "The act of a processor writing data into a currently executing code
>>>> segment with the intent of executing that data as code is called
>>>> self-modifying code. Intel Architecture processors exhibit model-specific
>>>> behavior when executing self-modified code, depending upon how far ahead
>>>> of the current execution pointer the code has been modified. As processor
>>>> architectures become more complex and start to speculatively execute code
>>>> ahead of the retirement point (as in the P6 family processors), the rules
>>>> regarding which code should execute, pre- or post-modification, become
>>>> blurred. [...]"
>>>>
>>>> AFAIU, this core serializing instruction seems to be needed for use-cases of
>>>> self-modifying code, but not for the initial load of a program from disk,
>>>> as the processor has no way to have speculatively executed any of its
>>>> instructions.
>>> I figured out what you're pointing to: if exec() is executed by a previously
>>> running thread, and there is no core serializing instruction between program
>>> load and return to user-space, the kernel ends up acting like a JIT, indeed.
>> I think that's safe. The kernel has to execute a MOV CR3 instruction
>> before it can execute code loaded by exec, and that is a serializing
>> instruction. Loading and unloading shared libraries is made safe by the
>> IRET executed by page faults (loading) and TLB shootdown IPIs (unloading).
> Very good points! Perhaps those guarantees should be documented somewhere?
>
>> Directly modifying code in userspace is unsafe if there is some
>> non-coherent instruction cache. Instruction fetch and speculative
>> execution are non-coherent, but they're probably too short (in current
>> processors) to matter. Trace caches are probably large enough, but I
>> don't know whether they are coherent or not.
> Android guys at Google have reproducers of context synchronization issues
> on arm64 in JIT scenarios. Based on the information I got, flushing the
> instruction caches is not enough: they also need to issue a context
> synchronizing instruction.
>
> Current Intel processors may have a short enough speculative execution
> window and small enough trace caches, but relying on this without
> a clear statement from Intel seems fragile.
A small trace cache is still vulnerable; the question is whether it is
coherent or not.
> I've tried to create a small single-threaded self-modifying loop in
> user-space to trigger a trace cache or speculative execution quirk,
> but I have not succeeded yet. I suspect that I would need to know
> more about the internals of the processor architecture to create the
> right stalls that would allow speculative execution to move further
> ahead, and trigger an incoherent execution flow. Ideas on how to
> trigger this would be welcome.
>
>
Intel CPUs resynchronize as soon as you jump (in single-threaded execution),
so you need to update code ahead of the current instruction pointer to see
anything. Not sure what quirk you're interested in seeing: executing
the old code? That's not very exciting.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 16:05 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2017-11-14 16:05 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Avi Kivity, Linus Torvalds, Andy Lutomirski, linux-kernel,
linux-api, Paul E. McKenney, Boqun Feng, Andrew Hunter,
maged michael, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
> I've tried to create a small single-threaded self-modifying loop in
> user-space to trigger a trace cache or speculative execution quirk,
> but I have not succeeded yet. I suspect that I would need to know
> more about the internals of the processor architecture to create the
> right stalls that would allow speculative execution to move further
> ahead, and trigger an incoherent execution flow. Ideas on how to
> trigger this would be welcome.
I thought the whole problem was per definition multi-threaded.
Single-threaded stuff can't get out of sync with itself; you'll always
observe your own stores.
And ISTR the JIT scenario being something like the JIT overwriting
previously executed but supposedly no longer used code. And in this
scenario you'd want to guarantee all CPUs observe the new code before
jumping into it.
The current approach is using mprotect(), except that on a number of
platforms the TLB invalidate from that is not guaranteed to be strong
enough to sync for code changes.
On x86 the mprotect() should work just fine, since we broadcast IPIs for
the TLB invalidate and the IRET from those will get the things synced up
again (if nothing else; very likely we'll have done a MOV-CR3 which will
of course also have sufficient syncness on it).
But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
and don't guarantee their TLB invalidate sync against execution units
are left broken by this scheme.
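A single-threaded sketch of the mprotect()-based publication scheme described above, assuming Linux on x86-64 or arm64; `emit_return_const()` and `jit_publish_and_call()` are hypothetical names, not an existing JIT API. The code is written while the page is RW, the page is flipped to RX with `mprotect()`, and the GCC/Clang builtin `__builtin___clear_cache()` performs the local cache maintenance arm64 needs (it compiles to nothing on x86). The cross-CPU part of the argument is not exercised here: on x86 it would come from the TLB-shootdown IPIs that `mprotect()` triggers, while on the other architectures mentioned this sequence alone does not guarantee it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*ret_fn)(void);

/* Hypothetical emitter: writes "return value;" as machine code into buf.
 * Returns the number of bytes written, or 0 on an unsupported target. */
static size_t emit_return_const(unsigned char *buf, int32_t value)
{
#if defined(__x86_64__) || defined(__i386__)
    buf[0] = 0xB8;                            /* mov eax, imm32 */
    memcpy(buf + 1, &value, sizeof(value));
    buf[5] = 0xC3;                            /* ret */
    return 6;
#elif defined(__aarch64__)
    if (value < 0 || value > 0xFFFF)
        return 0;
    uint32_t insn[2] = {
        0x52800000u | ((uint32_t)value << 5), /* movz w0, #value */
        0xD65F03C0u,                          /* ret */
    };
    memcpy(buf, insn, sizeof(insn));
    return sizeof(insn);
#else
    (void)buf;
    (void)value;
    return 0;
#endif
}

/* Write while RW, flip to RX with mprotect(), then execute.
 * Returns the generated function's result, or -1 on failure. */
static int jit_publish_and_call(int32_t value)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    size_t len = (size_t)pagesz;

    unsigned char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    size_t code_len = emit_return_const(page, value);
    if (code_len == 0 || mprotect(page, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, len);
        return -1;
    }
    /* Local I$/D$ maintenance: required on arm64, nothing on x86.
     * Other CPUs would still need their own context synchronization. */
    __builtin___clear_cache((char *)page, (char *)page + code_len);

    int result = ((ret_fn)(void *)page)();
    munmap(page, len);
    return result;
}
```

Flipping the whole page to RX keeps the mapping W^X at all times, which is the property JITs usually want; a multi-threaded JIT would additionally need every other CPU to be serialized before it jumps into the new code.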
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 16:08 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2017-11-14 16:08 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Avi Kivity, Linus Torvalds, Andy Lutomirski, linux-kernel,
linux-api, Paul E. McKenney, Boqun Feng, Andrew Hunter,
maged michael, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On Tue, Nov 14, 2017 at 05:05:41PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
> > I've tried to create a small single-threaded self-modifying loop in
> > user-space to trigger a trace cache or speculative execution quirk,
> > but I have not succeeded yet. I suspect that I would need to know
> > more about the internals of the processor architecture to create the
> > right stalls that would allow speculative execution to move further
> > ahead, and trigger an incoherent execution flow. Ideas on how to
> > trigger this would be welcome.
>
> I thought the whole problem was per definition multi-threaded.
>
> Single-threaded stuff can't get out of sync with itself; you'll always
> observe your own stores.
And even if you could, you can always execute a local serializing
instruction like CPUID to force things.
> And ISTR the JIT scenario being something like the JIT overwriting
> previously executed but supposedly no longer used code. And in this
> scenario you'd want to guarantee all CPUs observe the new code before
> jumping into it.
>
> The current approach is using mprotect(), except that on a number of
> platforms the TLB invalidate from that is not guaranteed to be strong
> enough to sync for code changes.
>
> On x86 the mprotect() should work just fine, since we broadcast IPIs for
> the TLB invalidate and the IRET from those will get the things synced up
> again (if nothing else; very likely we'll have done a MOV-CR3 which will
> of course also have sufficient syncness on it).
>
> But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
> and don't guarantee their TLB invalidate sync against execution units
> are left broken by this scheme.
>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
2017-11-14 16:08 ` Peter Zijlstra
@ 2017-11-14 16:49 ` Mathieu Desnoyers
-1 siblings, 0 replies; 52+ messages in thread
From: Mathieu Desnoyers @ 2017-11-14 16:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Avi Kivity, Linus Torvalds, Andy Lutomirski, linux-kernel,
linux-api, Paul E. McKenney, Boqun Feng, Andrew Hunter,
maged michael, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
----- On Nov 14, 2017, at 11:08 AM, Peter Zijlstra peterz@infradead.org wrote:
> On Tue, Nov 14, 2017 at 05:05:41PM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>> > I've tried to create a small single-threaded self-modifying loop in
>> > user-space to trigger a trace cache or speculative execution quirk,
>> > but I have not succeeded yet. I suspect that I would need to know
>> > more about the internals of the processor architecture to create the
>> > right stalls that would allow speculative execution to move further
>> > ahead, and trigger an incoherent execution flow. Ideas on how to
>> > trigger this would be welcome.
>>
>> I thought the whole problem was per definition multi-threaded.
>>
>> Single-threaded stuff can't get out of sync with itself; you'll always
>> observe your own stores.
>
> And even if you could, you can always execute a local serializing
> instruction like CPUID to force things.
What I'm trying to reproduce is something that breaks in the single-threaded
case if I explicitly leave out the CPUID core serializing instruction
when modifying upcoming code in a loop.
AFAIU, Intel requires a core serializing instruction to be issued even
in single-threaded scenarios between code update and execution, to ensure
that speculative execution does not observe incoherent code. Now the
question we all have for Intel is: is this requirement too strong, or is it
required by reality?
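A minimal sketch of such a loop, assuming x86 Linux and an `mmap()` that allows a writable and executable anonymous page; `smc_loop()` and `serialize_cpuid()` are hypothetical names. Each iteration stores a new immediate into a `mov eax, imm32; ret` stub (code written as data), optionally executes CPUID, then calls the stub and counts calls that return the stale value. Because the call through the function pointer is itself a branch, this sketch is not expected to observe stale code on current Intel parts; it only illustrates the store/serialize/execute sequence. A real reproducer would have to modify instructions ahead of the current instruction pointer with no intervening jump.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*ret_fn)(void);

/* Hypothetical helper: CPUID is a serializing instruction on x86. */
static inline void serialize_cpuid(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
#endif
}

/*
 * Rewrites the immediate of a "mov eax, imm32; ret" stub before each call.
 * Returns how many calls returned a stale value, or -1 if the experiment
 * cannot run (non-x86 target, or RWX mapping refused).
 */
static int smc_loop(int iterations, int use_cpuid)
{
    assert(iterations > 0);
#if defined(__x86_64__) || defined(__i386__)
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *code = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (code == MAP_FAILED)
        return -1;

    code[0] = 0xB8;              /* mov eax, imm32 */
    memset(code + 1, 0, 4);      /* imm32, rewritten in the loop */
    code[5] = 0xC3;              /* ret */
    ret_fn fn = (ret_fn)(void *)code;

    int stale = 0;
    for (int32_t i = 1; i <= iterations; i++) {
        memcpy(code + 1, &i, sizeof(i));   /* store modified code as data */
        if (use_cpuid)
            serialize_cpuid();             /* serialize before executing it */
        if (fn() != i)                     /* branch into the stub */
            stale++;
    }
    munmap(code, len);
    return stale;
#else
    (void)use_cpuid;
    return -1;
#endif
}
```

Comparing `smc_loop(n, 1)` against `smc_loop(n, 0)` is the experiment; with CPUID the SDM guarantees a stale count of zero.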
Thanks,
Mathieu
>
>> And ISTR the JIT scenario being something like the JIT overwriting
>> previously executed but supposedly no longer used code. And in this
>> scenario you'd want to guarantee all CPUs observe the new code before
>> jumping into it.
>>
>> The current approach is using mprotect(), except that on a number of
>> platforms the TLB invalidate from that is not guaranteed to be strong
>> enough to sync for code changes.
>>
>> On x86 the mprotect() should work just fine, since we broadcast IPIs for
>> the TLB invalidate and the IRET from those will get the things synced up
>> again (if nothing else; very likely we'll have done a MOV-CR3 which will
>> of course also have sufficient syncness on it).
>>
>> But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
>> and don't guarantee their TLB invalidate sync against execution units
>> are left broken by this scheme.
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
2017-11-14 16:49 ` Mathieu Desnoyers
@ 2017-11-14 17:03 ` Avi Kivity
-1 siblings, 0 replies; 52+ messages in thread
From: Avi Kivity @ 2017-11-14 17:03 UTC (permalink / raw)
To: Mathieu Desnoyers, Peter Zijlstra
Cc: Linus Torvalds, Andy Lutomirski, linux-kernel, linux-api,
Paul E. McKenney, Boqun Feng, Andrew Hunter, maged michael,
Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
Dave Watson, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Andrea Parri, Russell King, ARM Linux, Greg Hackmann,
Will Deacon, David Sehr, x86
On 11/14/2017 06:49 PM, Mathieu Desnoyers wrote:
> ----- On Nov 14, 2017, at 11:08 AM, Peter Zijlstra peterz@infradead.org wrote:
>
>> On Tue, Nov 14, 2017 at 05:05:41PM +0100, Peter Zijlstra wrote:
>>> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>>>> I've tried to create a small single-threaded self-modifying loop in
>>>> user-space to trigger a trace cache or speculative execution quirk,
>>>> but I have not succeeded yet. I suspect that I would need to know
>>>> more about the internals of the processor architecture to create the
>>>> right stalls that would allow speculative execution to move further
>>>> ahead, and trigger an incoherent execution flow. Ideas on how to
>>>> trigger this would be welcome.
>>> I thought the whole problem was per definition multi-threaded.
>>>
>>> Single-threaded stuff can't get out of sync with itself; you'll always
>>> observe your own stores.
>> And even if you could, you can always execute a local serializing
>> instruction like CPUID to force things.
> What I'm trying to reproduce is something that breaks in single-threaded
> case if I explicitly leave out the CPUID core serializing instruction
> when doing code modification on upcoming code, in a loop.
>
> AFAIU, Intel requires a core serializing instruction to be issued even
> in single-threaded scenarios between code update and execution, to ensure
> that speculative execution does not observe incoherent code. Now the
> question we all have for Intel is: is this requirement too strong, or
> required by reality ?
>
In single-threaded execution, a jump is enough.

"As processor microarchitectures become more complex and start to
speculatively execute code ahead of the retirement point (as in P6 and
more recent processor families), the rules regarding which code should
execute, pre- or post-modification, become blurred. To write
self-modifying code and ensure that it is compliant with current and
future versions of the IA-32 architectures, use one of the following
coding options:

(* OPTION 1 *)
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;"
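Below is a minimal user-space sketch of the SDM's OPTION 1 sequence. It assumes x86-64 Linux and a kernel that permits anonymous RWX mappings. The helper names (emit_return_imm, alloc_code_page, store_jump_execute) are hypothetical. The six hand-assembled bytes encode `mov eax, imm32; ret`, and the indirect call serves as the "jump to new code" step. The sketch is single-threaded, so it does not exercise the migration case discussed in this thread.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*int_fn)(void);

/* Hypothetical helper: encode "mov eax, imm32; ret" (B8 imm32 C3, 6 bytes).
 * x86-64 only. */
static void emit_return_imm(uint8_t *p, int32_t imm)
{
    p[0] = 0xB8;                      /* mov eax, imm32 */
    memcpy(p + 1, &imm, sizeof(imm)); /* little-endian immediate */
    p[5] = 0xC3;                      /* ret */
}

/* Hypothetical helper: one anonymous RWX page, or NULL on failure. */
static uint8_t *alloc_code_page(void)
{
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* OPTION 1: store modified code as data, jump to it, execute it. */
static int store_jump_execute(uint8_t *code, int32_t imm)
{
    emit_return_imm(code, imm);       /* store modified code (as data) */
    /* Needed on architectures without a coherent I$; a no-op on x86. */
    __builtin___clear_cache((char *)code, (char *)code + 6);
    int_fn fn = (int_fn)(void *)code;
    return fn();                      /* jump to and execute new code */
}
```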
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 17:10 ` Mathieu Desnoyers
0 siblings, 0 replies; 52+ messages in thread
From: Mathieu Desnoyers @ 2017-11-14 17:10 UTC (permalink / raw)
To: Avi Kivity
Cc: Peter Zijlstra, Linus Torvalds, Andy Lutomirski, linux-kernel,
linux-api, Paul E. McKenney, Boqun Feng, Andrew Hunter,
maged michael, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
----- On Nov 14, 2017, at 12:03 PM, Avi Kivity avi@scylladb.com wrote:
> On 11/14/2017 06:49 PM, Mathieu Desnoyers wrote:
>> ----- On Nov 14, 2017, at 11:08 AM, Peter Zijlstra peterz@infradead.org wrote:
>>
>>> On Tue, Nov 14, 2017 at 05:05:41PM +0100, Peter Zijlstra wrote:
>>>> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>>>>> I've tried to create a small single-threaded self-modifying loop in
>>>>> user-space to trigger a trace cache or speculative execution quirk,
>>>>> but I have not succeeded yet. I suspect that I would need to know
>>>>> more about the internals of the processor architecture to create the
>>>>> right stalls that would allow speculative execution to move further
>>>>> ahead, and trigger an incoherent execution flow. Ideas on how to
>>>>> trigger this would be welcome.
>>>> I thought the whole problem was per definition multi-threaded.
>>>>
>>>> Single-threaded stuff can't get out of sync with itself; you'll always
>>>> observe your own stores.
>>> And even if you could, you can always execute a local serializing
>>> instruction like CPUID to force things.
>> What I'm trying to reproduce is something that breaks in single-threaded
>> case if I explicitly leave out the CPUID core serializing instruction
>> when doing code modification on upcoming code, in a loop.
>>
>> AFAIU, Intel requires a core serializing instruction to be issued even
>> in single-threaded scenarios between code update and execution, to ensure
>> that speculative execution does not observe incoherent code. Now the
>> question we all have for Intel is: is this requirement too strong, or
>> required by reality ?
>>
>
> In single-threaded execution, a jump is enough.
>
> "As processor microarchitectures become more complex and start to
> speculatively execute code ahead of the retirement point (as in P6 and
> more recent processor families), the rules regarding which code should
> execute, pre- or post-modification, become blurred. To write
> self-modifying code and ensure that it is compliant with current and
> future versions of the IA-32 architectures, use one of the following
> coding options:
>
> (* OPTION 1 *)
> Store modified code (as data) into code segment;
> Jump to new code or an intermediate location;
> Execute new code;"
Good point, so this is likely why I was having trouble reproducing the
single-threaded self-modifying code incoherent case. I did have a branch
in there.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
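The following sketches the kind of loop Mathieu describes: it patches upcoming code on every iteration and optionally issues CPUID, the core-serializing instruction Peter mentions, before executing the patched code. It assumes x86-64 Linux, RWX anonymous mappings, and GCC/Clang inline asm, and every name in it is hypothetical. Calling the patched code through a function pointer is itself a branch. Per Avi's quote, the use_cpuid == 0 path therefore still satisfies OPTION 1, which is presumably why such a loop does not show incoherence.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*int_fn)(void);

/* Hypothetical helper: encode "mov eax, imm32; ret" (x86-64). */
static void emit_return_imm(uint8_t *p, int32_t imm)
{
    p[0] = 0xB8;
    memcpy(p + 1, &imm, sizeof(imm));
    p[5] = 0xC3;
}

/* Hypothetical helper: one anonymous RWX page, or NULL on failure. */
static uint8_t *alloc_code_page(void)
{
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* CPUID (leaf 0) used as a core-serializing instruction. "volatile" and
 * the "memory" clobber keep the compiler from dropping or moving it. */
static inline void serialize_cpuid(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
}

/* Modify the upcoming code each iteration, optionally serialize, execute
 * it, and count results that differ from the value just stored. */
static long smc_loop(uint8_t *code, int iterations, int use_cpuid)
{
    long mismatches = 0;
    for (int i = 0; i < iterations; i++) {
        emit_return_imm(code, i);
        if (use_cpuid)
            serialize_cpuid();
        /* This indirect call is also a branch (SDM OPTION 1). */
        if (((int_fn)(void *)code)() != i)
            mismatches++;
    }
    return mismatches;
}
```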
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 17:31 ` Linus Torvalds
0 siblings, 0 replies; 52+ messages in thread
From: Linus Torvalds @ 2017-11-14 17:31 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Avi Kivity, Peter Zijlstra, Andy Lutomirski, linux-kernel,
linux-api, Paul E. McKenney, Boqun Feng, Andrew Hunter,
maged michael, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Dave Watson, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On Tue, Nov 14, 2017 at 9:10 AM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>> (* OPTION 1 *)
>> Store modified code (as data) into code segment;
>> Jump to new code or an intermediate location;
>> Execute new code;"
>
> Good point, so this is likely why I was having trouble reproducing the
> single-threaded self-modifying code incoherent case. I did have a branch
> in there.
Actually, even *without* the branch, Intel has been very good at
having precise I$ coherency. I think you can literally store to the
next instruction, and Intel CPUs after the Pentium Pro would notice,
take a micro-fault, and handle it correctly (the i486 and Pentium did
not have that level of coherency, but a taken branch would flush the
fetch buffer).
An in-order Atom probably has the old Pentium behavior, and you
could see it there.
But starting with the P6, and OoO execution, the "taken branch" thing
meant very little, so Intel started instead just doing the
"store-vs-instruction fetch" coherency explicitly, which causes it to
be precise.
Afaik, the only way to show incoherent I$ fairly easily is to use
virtual aliasing, and store to a different virtual address, because
the fetch buffer coherency is done by virtual address.
But even then, it's only the fetch buffer (and it's been called
different things over the years, now it's a uop loop cache), not the
L1 caches, so you get a very limited window of instructions.
And that fetch buffer is also where any cross-cpu incoherency would
be, for the exact same reason.
Linus
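This sketch shows the virtual-aliasing setup Linus describes. The same physical page is mapped twice, once writable and once executable, so stores and instruction fetches use different virtual addresses. It assumes x86-64 Linux with glibc 2.27 or later (for memfd_create()), and the helper names are hypothetical. The CPUID call keeps the example well-defined. Dropping it is where one would probe for the fetch-buffer window, which this sketch does not attempt to measure.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*int_fn)(void);

/* Two virtual views of one physical page. */
struct alias_pair {
    uint8_t *rw; /* stores go through this address */
    uint8_t *rx; /* instruction fetch goes through this one */
};

/* CPUID (leaf 0) as a core-serializing instruction. */
static inline void serialize_cpuid(void)
{
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
}

/* Hypothetical helper: map one memfd page twice. Returns 0 on success. */
static int map_aliases(struct alias_pair *ap)
{
    int fd = memfd_create("smc-alias", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4096) != 0) {
        close(fd);
        return -1;
    }
    void *rw = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *rx = mmap(NULL, 4096, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd); /* the mappings keep the page alive */
    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED)
            munmap(rw, 4096);
        if (rx != MAP_FAILED)
            munmap(rx, 4096);
        return -1;
    }
    ap->rw = rw;
    ap->rx = rx;
    return 0;
}

/* Store "mov eax, imm32; ret" via the RW alias, execute via the RX alias. */
static int store_alias_execute(const struct alias_pair *ap, int32_t imm)
{
    ap->rw[0] = 0xB8;
    memcpy(ap->rw + 1, &imm, sizeof(imm));
    ap->rw[5] = 0xC3;
    serialize_cpuid(); /* remove to probe the fetch-buffer window */
    return ((int_fn)(void *)ap->rx)();
}
```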
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 16:10 ` Andy Lutomirski
0 siblings, 0 replies; 52+ messages in thread
From: Andy Lutomirski @ 2017-11-14 16:10 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mathieu Desnoyers, Avi Kivity, Linus Torvalds, Andy Lutomirski,
linux-kernel, linux-api, Paul E. McKenney, Boqun Feng,
Andrew Hunter, maged michael, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, Dave Watson, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, Andrea Parri, Russell King,
ARM Linux, Greg Hackmann, Will Deacon, David Sehr, x86
On Tue, Nov 14, 2017 at 8:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>> I've tried to create a small single-threaded self-modifying loop in
>> user-space to trigger a trace cache or speculative execution quirk,
>> but I have not succeeded yet. I suspect that I would need to know
>> more about the internals of the processor architecture to create the
>> right stalls that would allow speculative execution to move further
>> ahead, and trigger an incoherent execution flow. Ideas on how to
>> trigger this would be welcome.
>
> I thought the whole problem was per definition multi-threaded.
>
> Single-threaded stuff can't get out of sync with itself; you'll always
> observe your own stores.
>
> And ISTR the JIT scenario being something like the JIT overwriting
> previously executed but supposedly no longer used code. And in this
> scenario you'd want to guarantee all CPUs observe the new code before
> jumping into it.
>
> The current approach is using mprotect(), except that on a number of
> platforms the TLB invalidate from that is not guaranteed to be strong
> enough to sync for code changes.
>
> On x86 the mprotect() should work just fine, since we broadcast IPIs for
> the TLB invalidate and the IRET from those will get the things synced up
> again (if nothing else; very likely we'll have done a MOV-CR3 which will
> of course also have sufficient syncness on it).
>
> But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
> and don't guarantee their TLB invalidate sync against execution units
> are left broken by this scheme.
>
On x86 single-thread, you can still get in trouble, I think. Do a
store, get migrated, execute the stored code. There's no actual
guarantee that the new CPU does a CR3 load due to laziness.
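Here is a single-threaded sketch of the JIT publication pattern Peter describes. The JIT writes the code while the page is mapped read/write, then calls mprotect() to make it read/execute before jumping into it. It assumes x86-64 Linux, and jit_alloc(), jit_publish() and jit_call() are hypothetical names. Per Peter, the cross-CPU guarantee on x86 comes from the TLB-invalidate IPIs behind mprotect(), which a single thread cannot demonstrate. He also notes that the same sequence is not sufficient on PowerPC, s390 or ARM.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*int_fn)(void);

static size_t page_size(void)
{
    return (size_t)sysconf(_SC_PAGESIZE);
}

/* Hypothetical helper: one page for JIT output, initially read/write. */
static uint8_t *jit_alloc(void)
{
    void *p = mmap(NULL, page_size(), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* Hypothetical helper: overwrite previously executed code with
 * "mov eax, imm32; ret", then flip the page to read/execute (W^X). */
static int jit_publish(uint8_t *page, int32_t imm)
{
    if (mprotect(page, page_size(), PROT_READ | PROT_WRITE) != 0)
        return -1;
    page[0] = 0xB8;
    memcpy(page + 1, &imm, sizeof(imm));
    page[5] = 0xC3;
    if (mprotect(page, page_size(), PROT_READ | PROT_EXEC) != 0)
        return -1;
    return 0;
}

/* Jump into the published code. */
static int jit_call(uint8_t *page)
{
    return ((int_fn)(void *)page)();
}
```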
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 16:13 ` Thomas Gleixner
0 siblings, 0 replies; 52+ messages in thread
From: Thomas Gleixner @ 2017-11-14 16:13 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Peter Zijlstra, Mathieu Desnoyers, Avi Kivity, Linus Torvalds,
linux-kernel, linux-api, Paul E. McKenney, Boqun Feng,
Andrew Hunter, maged michael, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On Tue, 14 Nov 2017, Andy Lutomirski wrote:
> On Tue, Nov 14, 2017 at 8:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
> >> I've tried to create a small single-threaded self-modifying loop in
> >> user-space to trigger a trace cache or speculative execution quirk,
> >> but I have not succeeded yet. I suspect that I would need to know
> >> more about the internals of the processor architecture to create the
> >> right stalls that would allow speculative execution to move further
> >> ahead, and trigger an incoherent execution flow. Ideas on how to
> >> trigger this would be welcome.
> >
> > I thought the whole problem was per definition multi-threaded.
> >
> > Single-threaded stuff can't get out of sync with itself; you'll always
> > observe your own stores.
> >
> > And ISTR the JIT scenario being something like the JIT overwriting
> > previously executed but supposedly no longer used code. And in this
> > scenario you'd want to guarantee all CPUs observe the new code before
> > jumping into it.
> >
> > The current approach is using mprotect(), except that on a number of
> > platforms the TLB invalidate from that is not guaranteed to be strong
> > enough to sync for code changes.
> >
> > On x86 the mprotect() should work just fine, since we broadcast IPIs for
> > the TLB invalidate and the IRET from those will get the things synced up
> > again (if nothing else; very likely we'll have done a MOV-CR3 which will
> > of course also have sufficient syncness on it).
> >
> > But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
> > and don't guarantee their TLB invalidate sync against execution units
> > are left broken by this scheme.
> >
>
> On x86 single-thread, you can still get in trouble, I think. Do a
> store, get migrated, execute the stored code. There's no actual
> guarantee that the new CPU does a CR3 load due to laziness.
The migration IPI will probably prevent that.
Thanks,
tglx
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
2017-11-14 16:13 ` Thomas Gleixner
@ 2017-11-14 16:16 ` Andy Lutomirski
-1 siblings, 0 replies; 52+ messages in thread
From: Andy Lutomirski @ 2017-11-14 16:16 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Andy Lutomirski, Peter Zijlstra, Mathieu Desnoyers, Avi Kivity,
Linus Torvalds, linux-kernel, linux-api, Paul E. McKenney,
Boqun Feng, Andrew Hunter, maged michael, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On Tue, Nov 14, 2017 at 8:13 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 14 Nov 2017, Andy Lutomirski wrote:
>> On Tue, Nov 14, 2017 at 8:05 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>> >> I've tried to create a small single-threaded self-modifying loop in
>> >> user-space to trigger a trace cache or speculative execution quirk,
>> >> but I have not succeeded yet. I suspect that I would need to know
>> >> more about the internals of the processor architecture to create the
>> >> right stalls that would allow speculative execution to move further
>> >> ahead, and trigger an incoherent execution flow. Ideas on how to
>> >> trigger this would be welcome.
>> >
>> > I thought the whole problem was per definition multi-threaded.
>> >
>> > Single-threaded stuff can't get out of sync with itself; you'll always
>> > observe your own stores.
>> >
>> > And ISTR the JIT scenario being something like the JIT overwriting
>> > previously executed but supposedly no longer used code. And in this
>> > scenario you'd want to guarantee all CPUs observe the new code before
>> > jumping into it.
>> >
>> > The current approach is using mprotect(), except that on a number of
>> > platforms the TLB invalidate from that is not guaranteed to be strong
>> > enough to sync for code changes.
>> >
>> > On x86 the mprotect() should work just fine, since we broadcast IPIs for
>> > the TLB invalidate and the IRET from those will get the things synced up
>> > again (if nothing else; very likely we'll have done a MOV-CR3 which will
>> > of course also have sufficient syncness on it).
>> >
>> > But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
>> > and don't guarantee their TLB invalidate sync against execution units
>> > are left broken by this scheme.
>> >
>>
>> On x86 single-thread, you can still get in trouble, I think. Do a
>> store, get migrated, execute the stored code. There's no actual
>> guarantee that the new CPU does a CR3 load due to laziness.
>
> The migration IPI will probably prevent that.
What guarantees that there's an IPI? Do we never do a syscall, get
migrated during syscall processing (due to cond_resched(), for
example), and land on another CPU that just happened to already be
scheduling?
--Andy
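The following is a hypothetical harness for Andy's scenario, using sched_setaffinity() to force the migrations. It runs the old code on cpu_b, stores new code on cpu_a, gets migrated back to cpu_b, and executes with no user-space serializing instruction. It assumes x86-64 Linux and that CPUs 0 and 1 are both usable, and all names are hypothetical. Whether the kernel path taken here happens to be core serializing is exactly the open question, so a correct result on one machine proves nothing either way.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*int_fn)(void);

/* Hypothetical helper: encode "mov eax, imm32; ret" (x86-64). */
static void emit_return_imm(uint8_t *p, int32_t imm)
{
    p[0] = 0xB8;
    memcpy(p + 1, &imm, sizeof(imm));
    p[5] = 0xC3;
}

/* Hypothetical helper: one anonymous RWX page, or NULL on failure. */
static uint8_t *alloc_code_page(void)
{
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* Pin the calling thread to one CPU; the kernel migrates it if needed. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Warm cpu_b with old code, store new code on cpu_a, migrate back to
 * cpu_b and execute. Returns 0 and fills *result, or -1 if a CPU could
 * not be used. */
static int store_migrate_execute(uint8_t *code, int32_t old_imm,
                                 int32_t new_imm, int cpu_a, int cpu_b,
                                 int *result)
{
    emit_return_imm(code, old_imm);
    if (pin_to_cpu(cpu_b) != 0)
        return -1;
    (void)((int_fn)(void *)code)();   /* cpu_b has fetched the old code */
    if (pin_to_cpu(cpu_a) != 0)
        return -1;
    emit_return_imm(code, new_imm);   /* store on cpu_a */
    if (pin_to_cpu(cpu_b) != 0)       /* get migrated back */
        return -1;
    *result = ((int_fn)(void *)code)(); /* no user-space CPUID here */
    return 0;
}
```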
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 16:31 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2017-11-14 16:31 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Thomas Gleixner, Mathieu Desnoyers, Avi Kivity, Linus Torvalds,
linux-kernel, linux-api, Paul E. McKenney, Boqun Feng,
Andrew Hunter, maged michael, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On Tue, Nov 14, 2017 at 08:16:09AM -0800, Andy Lutomirski wrote:
> What guarantees that there's an IPI? Do we never do a syscall, get
> migrated during syscall processing (due to cond_resched(), for
> example), and land on another CPU that just happened to already be
> scheduling?
Possible; the other CPU could've pulled the task because it went idle.
No IPIs involved in that scenario.
And if it was running a different thread of the same process prior to
that, we'll also not do switch_mm().
So yes, it is possible to construct a migration scenario without core
serializing instructions (of the CPUID/MOV-CR kind, not the LOCK prefix
kind).
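To make the distinction concrete, here is a minimal user-space sketch of a CPUID-kind core serializing instruction. It assumes x86 and GCC/Clang extended inline asm, and the helper name `core_serialize()` is hypothetical. The Intel SDM lists CPUID among the non-privileged serializing instructions. A LOCK-prefixed instruction only orders memory accesses and does not serialize instruction execution. The non-x86 fallback is only a memory barrier; it is not core serializing and exists just so the sketch compiles.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: execute CPUID (leaf 0), a non-privileged
 * serializing instruction on x86. Returns 1 when a real core serializing
 * instruction was issued, 0 for the fallback path.
 */
static int core_serialize(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx; /* results unused */
    return 1;
#else
    /* Full memory barrier only: orders memory, NOT core serializing. */
    __sync_synchronize();
    return 0;
#endif
}
```

Issuing this from user space only helps on the CPU that will execute the freshly written code. A migration between the store and the jump undermines exactly that, which is why the thread asks whether the kernel provides such an instruction on migration.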
Note that that still requires a multi-threaded process.
There is another scenario, where the NOHZ load balancer moves the task
such that the NOHZ load-balancing CPU is a third CPU. In that case there
is an interrupt (to effect the load balancing), but it will not land on
the CPU that's going to run the task.
This could happen for a single-threaded task, since I suppose the NOHZ
idle CPU that's going to be the victim could have run our task last and
still lazily have the mm.
Very tricky to make work, not to mention that I suspect actually going
idle will kill a whole bunch of state real quick.
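The scenarios above can be summarized in a deliberately simplified, hypothetical model. The struct and function names are illustrative, not kernel code. The model tracks only two sources of core serialization on the destination CPU before the migrated task resumes:

- the IRET of an interrupt taken on that CPU;
- the MOV to CR3 done by switch_mm() when a different mm was loaded there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one migration, as seen from the destination CPU. */
struct migration {
    const void *task_mm;        /* mm of the task being migrated */
    const void *dest_loaded_mm; /* mm (lazily) still loaded on the destination */
    bool irq_on_dest;           /* did an IPI/interrupt land on the destination? */
};

/*
 * True if the model sees a core serializing event on the destination CPU:
 * an interrupt taken there (its IRET serializes), or a CR3 reload because
 * switch_mm() found a different mm loaded. Same mm means no CR3 write.
 */
static bool dest_serializes(const struct migration *m)
{
    bool cr3_reload = (m->dest_loaded_mm != m->task_mm);
    return m->irq_on_dest || cr3_reload;
}
```

Both scenarios described above (the idle pull by a CPU that last ran the same mm, and the NOHZ kick delivered to a third CPU) correspond to `irq_on_dest = false` with `dest_loaded_mm == task_mm`. For that input, `dest_serializes()` returns false.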
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 17:17 ` Daniel Bristot de Oliveira
0 siblings, 0 replies; 52+ messages in thread
From: Daniel Bristot de Oliveira @ 2017-11-14 17:17 UTC (permalink / raw)
To: Peter Zijlstra, Andy Lutomirski
Cc: Thomas Gleixner, Mathieu Desnoyers, Avi Kivity, Linus Torvalds,
linux-kernel, linux-api, Paul E. McKenney, Boqun Feng,
Andrew Hunter, maged michael, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On 11/14/2017 05:31 PM, Peter Zijlstra wrote:
> On Tue, Nov 14, 2017 at 08:16:09AM -0800, Andy Lutomirski wrote:
>> What guarantees that there's an IPI? Do we never do a syscall, get
>> migrated during syscall processing (due to cond_resched(), for
>> example), and land on another CPU that just happened to already be
>> scheduling?
>
> Possible, the other CPU could've pulled the task because it went idle.
> No IPIs involved in that scenario.
>
> And if it was running a different thread of the same process prior to
> that, we'll also not do switch_mm().
>
> So yes, it is possible to construct a migration scenario without core
> serializing instructions (of the CPUID/MOV-CR kind, not the LOCK prefix
> kind).
>
> Note that that still requires a multi-threaded process.
>
> There is another scenario, where the NOHZ load balancer moves the task
> such that the NOHZ load-balancing CPU is a third CPU. In that case there
> is an interrupt (to effect the load balancing), but it will not land on
> the CPU that's going to run the task.
>
> This could happen for a single-threaded task, since I suppose the NOHZ
> idle CPU that's going to be the victim could have run our task last and
> still lazily have the mm.
>
> Very tricky to make work, not to mention that I suspect actually going
> idle will kill a whole bunch of state real quick.
>
IIRC, if the dest CPU is idle and the system is running with idle=poll, no
IPI is fired either, but that is not a very common case.
-- Daniel
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 17:40 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2017-11-14 17:40 UTC (permalink / raw)
To: Daniel Bristot de Oliveira
Cc: Andy Lutomirski, Thomas Gleixner, Mathieu Desnoyers, Avi Kivity,
Linus Torvalds, linux-kernel, linux-api, Paul E. McKenney,
Boqun Feng, Andrew Hunter, maged michael, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On Tue, Nov 14, 2017 at 06:17:13PM +0100, Daniel Bristot de Oliveira wrote:
> IIRC, if the dest CPU is idle and the system is running with idle=poll, no
> IPI is fired either, but that is not a very common case.
You're thinking about wake from idle? That is almost always without IPI,
even without idle=poll.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 18:01 ` Daniel Bristot de Oliveira
0 siblings, 0 replies; 52+ messages in thread
From: Daniel Bristot de Oliveira @ 2017-11-14 18:01 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andy Lutomirski, Thomas Gleixner, Mathieu Desnoyers, Avi Kivity,
Linus Torvalds, linux-kernel, linux-api, Paul E. McKenney,
Boqun Feng, Andrew Hunter, maged michael, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On 11/14/2017 06:40 PM, Peter Zijlstra wrote:
> On Tue, Nov 14, 2017 at 06:17:13PM +0100, Daniel Bristot de Oliveira wrote:
>
>> IIRC, if the dest CPU is idle and the system is running with idle=poll, no
>> IPI is fired either, but that is not a very common case.
>
> You're thinking about wake from idle? That is almost always without IPI,
> even without idle=poll.
>
I meant the resched_curr(rq) of an rq on another CPU. If the dest is
idle && idle=poll, the IPI will not be sent.
-- Daniel
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 18:17 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2017-11-14 18:17 UTC (permalink / raw)
To: Daniel Bristot de Oliveira
Cc: Andy Lutomirski, Thomas Gleixner, Mathieu Desnoyers, Avi Kivity,
Linus Torvalds, linux-kernel, linux-api, Paul E. McKenney,
Boqun Feng, Andrew Hunter, maged michael, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On Tue, Nov 14, 2017 at 07:01:55PM +0100, Daniel Bristot de Oliveira wrote:
> On 11/14/2017 06:40 PM, Peter Zijlstra wrote:
> > On Tue, Nov 14, 2017 at 06:17:13PM +0100, Daniel Bristot de Oliveira wrote:
> >
> >> IIRC, if the dest CPU is idle and the system is running with idle=poll, no
> >> IPI is fired either, but that is not a very common case.
> >
> > You're thinking about wake from idle? That is almost always without IPI,
> > even without idle=poll.
> >
>
> I meant the resched_curr(rq) of an rq on another CPU. If the dest is
> idle && idle=poll, the IPI will not be sent.
I'm saying the IPI will not be sent even without idle=poll. MWAIT based
idle will also have TIF_POLLING_NRFLAG set.
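The point can be illustrated with a small, hypothetical user-space model. It loosely follows the idea behind the kernel's set_nr_and_not_polling() helper used by resched_curr(). The waker atomically sets a need-resched flag and sends an IPI only if the target was not advertising that it polls that flag. Since both idle=poll and MWAIT-based idle set TIF_POLLING_NRFLAG, neither gets an IPI. The flag values and `model_*` names are assumptions for the sketch, not the kernel's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit values; the kernel's TIF_* numbers are arch-specific. */
#define MODEL_NEED_RESCHED   (1u << 0)
#define MODEL_POLLING_NRFLAG (1u << 1)

struct model_thread_info {
    atomic_uint flags;
};

static void model_ti_init(struct model_thread_info *ti, unsigned int flags)
{
    atomic_init(&ti->flags, flags);
}

/*
 * Set NEED_RESCHED on the remote CPU's current (idle) task and report
 * whether a reschedule IPI is still needed. A CPU advertising
 * POLLING_NRFLAG (idle=poll loop, or MWAIT monitoring the flags word)
 * notices the store by itself, so no IPI is sent.
 */
static bool model_resched_needs_ipi(struct model_thread_info *ti)
{
    unsigned int old = atomic_fetch_or(&ti->flags, MODEL_NEED_RESCHED);
    return !(old & MODEL_POLLING_NRFLAG);
}
```

For this thread, the consequence is that an idle CPU can be told to pick up the migrated task without taking any interrupt. No IRET-based serialization can be assumed on that path.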
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
@ 2017-11-14 18:24 ` Daniel Bristot de Oliveira
0 siblings, 0 replies; 52+ messages in thread
From: Daniel Bristot de Oliveira @ 2017-11-14 18:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andy Lutomirski, Thomas Gleixner, Mathieu Desnoyers, Avi Kivity,
Linus Torvalds, linux-kernel, linux-api, Paul E. McKenney,
Boqun Feng, Andrew Hunter, maged michael, Benjamin Herrenschmidt,
Paul Mackerras, Michael Ellerman, Dave Watson, Ingo Molnar,
H. Peter Anvin, Andrea Parri, Russell King, ARM Linux,
Greg Hackmann, Will Deacon, David Sehr, x86
On 11/14/2017 07:17 PM, Peter Zijlstra wrote:
> On Tue, Nov 14, 2017 at 07:01:55PM +0100, Daniel Bristot de Oliveira wrote:
>> On 11/14/2017 06:40 PM, Peter Zijlstra wrote:
>>> On Tue, Nov 14, 2017 at 06:17:13PM +0100, Daniel Bristot de Oliveira wrote:
>>>
>>>> IIRC, if the dest CPU is idle and the system is running with idle=poll, no
>>>> IPI is fired either, but that is not a very common case.
>>>
>>> You're thinking about wake from idle? That is almost always without IPI,
>>> even without idle=poll.
>>>
>>
>> I meant the resched_curr(rq) of an rq on another CPU. If the dest is
>> idle && idle=poll, the IPI will not be sent.
>
> I'm saying the IPI will not be sent even without idle=poll. MWAIT based
> idle will also have TIF_POLLING_NRFLAG set.
>
Yeah, you are right! I missed that point... sorry :-)
-- Daniel
^ permalink raw reply [flat|nested] 52+ messages in thread