* [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey @ 2016-01-25  0:49 UTC
  To: LKML
  Cc: Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman

If I single step with the kgdb, kdb, or mdb kernel debuggers over
a sysret instruction anywhere in the OS, the system hard hangs in
smp_call_function_single after the debugger releases the system and it
resumes normal operation.  The specific place the kernel hangs is in
the loop below.  The softlockup detector will periodically detect
this condition when it occurs, but not always; most of the time the
system is just hung and unresponsive.

(2)> u smp_call_function_single+d8
<<<< hard hang in this loop with EDX=3
0xffffffff810fce48 8B55E0          mov    edx,DWORD PTR [rbp-32]=0xCE037DC0
0xffffffff810fce4b 83E201          and    edx,0x1
0xffffffff810fce4e 75F6            jne    smp_call_function_single+0xd6 (0xffffffff810fce46) (up)
<<<<<
0xffffffff810fce50 EBC3            jmp    smp_call_function_single+0xa5 (0xffffffff810fce15) (up)
0xffffffff810fce52 8B05E08EC700    mov    eax,[oops_in_progress]=0x0
0xffffffff810fce58 85C0            test   eax,eax
0xffffffff810fce5a 7585            jne    smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
0xffffffff810fce5c 803D8E0C9D0000  cmp    [__warned.20610]=0x00,0x0
0xffffffff810fce63 0F8578FFFFFF    jne    smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
0xffffffff810fce69 BE24010000      mov    esi,0x124
0xffffffff810fce6e 48C7C796B08C81  mov    rdi,0xffffffff818cb096
0xffffffff810fce75 894DBC          mov    DWORD PTR [rbp-68]=0x0,ecx
0xffffffff810fce78 488955C0        mov    QWORD PTR [rbp-64]=0xFFFFFFFFFFFFFF10,rdx
0xffffffff810fce7c E8FF21F8FF      call   warn_slowpath_null
0xffffffff810fce81 C605690C9D0001  mov    [__warned.20610]=0x00,0x1
0xffffffff810fce88 8B4DBC          mov    ecx,DWORD PTR [rbp-68]=0x0
0xffffffff810fce8b 488B55C0        mov    rdx,QWORD PTR [rbp-64]=0xFFFFFFFFFFFFFF10
0xffffffff810fce8f E94DFFFFFF      jmp    smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
0xffffffff810fce94 E8A71EF8FF      call   __stack_chk_fail
0xffffffff810fce99 0F1F8000000000  nop    DWORD PTR [rax]=0x0
(2)> g
(2)> g


The stack backtrace when the bug occurs is:

smp_call_function_single+0xd8
unmap_page_range+0x613
flush_tlb_func+0x0
smp_call_function_many+215
native_flush_tlb_others+0x118
flush_tlb_mm_range+0x61
tlb_flush_mmu_tlbonly+0x6b
tlb_finish_mmu+0x14
unmap_region+0xe2
vma_rb_erase+0x10f
do_unmap+0x217
vm_unmap+0x41
SyS_munmap+0x22
entry_SYSCALL_64_fastpath+0x12

I traced through this code a bunch of times during normal operation
without triggering the bug, to get a feel for what it normally sees
in EDX, and it looks like someone has coded a looping function that
always has EDX=0 in every case I saw, except when this bug occurs.

So the exact C code this maps to, from objdump of kernel/smp.o, is:

 469:	e8 62 fe ff ff       	callq  2d0 <generic_exec_single>
 46e:	8b 55 e0             	mov    -0x20(%rbp),%edx
 * previous function call. For multi-cpu calls its even more interesting
 * as we'll have to ensure no other cpu is observing our csd.
 */
static void csd_lock_wait(struct call_single_data *csd)
{
	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
 471:	83 e2 01             	and    $0x1,%edx
 474:	74 cf                	je     445 <smp_call_function_single+0xa5>
 476:	f3 90                	pause
<<<<<<<<<<
 478:	8b 55 e0             	mov    -0x20(%rbp),%edx
 47b:	83 e2 01             	and    $0x1,%edx
 47e:	75 f6                	jne    476 <smp_call_function_single+0xd6>
<<<<<<<<<<<
 480:	eb c3                	jmp    445 <smp_call_function_single+0xa5>
	 * Can deadlock when called with interrupts disabled.
	 * We allow cpu's that are not yet online though, as no one else can
	 * send smp call function interrupt to this cpu and as such deadlocks
	 * can't happen.
	 */
	WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()

Each time this bug occurs csd->flags is set to a value of 3 and never
changes.  When the system is running normally it seems to be 0 the
rest of the time.  Setting EDX=0 from the debugger console clears the
hang condition and the system seems to recover, except that the
console reports this error when you attempt to load programs,
indicating the system's ability to load shared objects is fritzed:

#
# ls -l
/lib64/libc.so.6 version GLI not found   << this error, and no shared objects will load
#
#
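
For context, here is a minimal sketch of the wait loop and the flag
bits it is polling, as I read them out of kernel/smp.c (the names
match the source; treat the exact layout as my reconstruction, not a
verbatim quote):

enum {
	CSD_FLAG_LOCK        = 0x01,
	CSD_FLAG_SYNCHRONOUS = 0x02,
};

static void csd_lock_wait(struct call_single_data *csd)
{
	/* Spins until the target cpu runs the callback and clears
	 * CSD_FLAG_LOCK.  flags == 3 is a synchronous call whose
	 * target never ran the callback, so the loop never exits. */
	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
		cpu_relax();
}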

Jeff


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey @ 2016-01-25  1:43 UTC
  To: LKML
  Cc: Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman

On 1/24/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> If I single step with the kgdb, kdb, or mdb kernel debuggers over
> a sysret instruction anywhere in the OS, the system hard hangs in
> smp_call_function_single after the debugger releases the system and it
> resumes normal operation.
> [...]

I am running down a trace of the MSR values for swapgs.  Looks like it
got nested somewhere down in the entry_64 code.   If so, then this is
just a symptom and not the sickness.

Jeff


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey @ 2016-01-25 23:06 UTC
  To: LKML
  Cc: Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman

On 1/24/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/24/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> [...]
>
> I am running down a trace of the MSR values for swapgs.  Looks like it
> got nested somewhere down in the entry_64 code.   If so, then this is
> just a symptom and not the sickness.
>
> Jeff
>

I got to the bottom of this one, and it's related to the loop
described above locking up the system when code is allowed to run on
one processor but not the others, and one of them is trying to send
an IPI.

You might want to consider some sort of timeout logic for that
function, or the ability for it to recover.  The problem is caused by
holding all the processors except the target processor being traced,
and letting the target run for a trace session while the other
processors are held suspended.  If I release all the processors each
time I step with the target processor, the problem goes away.  If I
hold the processors while stepping over a section of code, this
lockup will occur and the processor the bug occurs on apparently
never recovers.

In the process of tracking down this bug I traced all of the swapgs
usage in all of the exception handlers, and it works well.  The check
in paranoid_entry and paranoid_exit seems to catch the cases where the
debugger is stepping, or has stepped, over a swapgs instruction
pointing to a gs in userspace for both int1 and int3 exceptions, and
recovers correctly through paranoid_entry and paranoid_exit with the
correct gs and MSR value in all the cases I traced.
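
(For reference, the check in question -- paranoid_entry's GS probe in
arch/x86/entry/entry_64.S, quoted from memory, so verify against the
tree:

	movl	$MSR_GS_BASE, %ecx
	rdmsr
	testl	%edx, %edx		/* negative -> kernel GS base */
	js	1f
	SWAPGS
	xorl	%ebx, %ebx
1:	ret

It keys off the sign of the GS base MSR to decide whether the entry
path needs a swapgs.)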

I coded around this issue, but it's probably a bug to use that while
loop without some way to determine that a processor has gone away
temporarily.  The solution is to let the processors run between each
and every int1 trap and not hold a processor in focus mode; then this
code snippet that locks up in smp.c does not get hit.
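
A rough sketch of the stepping policy that avoids the hang -- the
helper names here are invented for illustration; this is the shape of
the fix, not actual mdb code:

extern void hold_other_cpus(int target_cpu);
extern void release_other_cpus(int target_cpu);
extern void arm_single_step(int target_cpu);     /* sets EFLAGS.TF */
extern void wait_for_debug_trap(int target_cpu); /* waits for int1 */

static void debugger_step(int target_cpu)
{
	/* Let the other cpus run so any smp_call_function() IPI they
	 * have posted to the target can complete, and csd_lock_wait()
	 * on the sending cpu can see CSD_FLAG_LOCK clear. */
	release_other_cpus(target_cpu);
	arm_single_step(target_cpu);
	wait_for_debug_trap(target_cpu);

	/* Re-freeze everyone for register and memory inspection. */
	hold_other_cpus(target_cpu);
}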

Jeff


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey @ 2016-01-26  4:05 UTC
  To: LKML
  Cc: Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman

On 1/25/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> [...]
>
> I coded around this issue, but it's probably a bug to use that while
> loop without some way to determine that a processor has gone away
> temporarily.  The solution is to let the processors run between each
> and every int1 trap and not hold a processor in focus mode; then this
> code snippet that locks up in smp.c does not get hit.

I guess Linus has already been trying to fix this problem, since he
was the last person to touch that code.  It looks like there is
another case where csd_unlock does not get called or is delayed.

https://github.com/torvalds/linux/commit/8053871d0f7f67c7efb7f226ef031f78877d6625

I think I'll work on this one some more and see if I can fix it.  It
still shows up after stepping through a sysret even with my changes,
it's just harder to trigger.  It crashes at exactly the same place in
the stack trace, from the same source, so there's a clue to start
with.

Jeff


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey @ 2016-01-30  8:23 UTC
  To: LKML
  Cc: Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman

On 1/25/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> [...]
>
> I think I'll work on this one some more and see if I can fix it.  It
> still shows up after stepping through a sysret even with my changes,
> it's just harder to trigger.  It crashes at exactly the same place in
> the stack trace, from the same source, so there's a clue to start
> with.

This bug is confirmed to be an int1 recursion that happens because the
sysret code in entry_64.S allows a return to userspace with the trap
flag set.  The last person to try to fix this was Ingo, in this
commit --

https://github.com/torvalds/linux/commit/4d7321381e5c7102a3d3faf0a0a0035a09619612

for this reported problem:

http://lkml.iu.edu/hypermail/linux/kernel/1504.0/00594.html

I verified that I am seeing the same problem here; it also exposes a
bug in Linus's code in smp.c.  I have that fixed as well -- use a
spinlock and don't try to guess what order the hardware will post bus
transactions for memory reads and writes.  That code in smp.c is
busted on some systems.

I have a fix for this problem, and Ingo needs to review this code,
because I have set the trap flag and watched it cycle through this
code in entry_64.S with TF set, and still end up executing a sysret
when it's not supposed to, instead of taking the other return path.

Jeff


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey @ 2016-01-30  8:41 UTC
  To: LKML
  Cc: Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman, tglx, mingo, hpa, x86, luto

Here is an MDB debugger trace of the code in question.  Please note
that the flags being compared don't match what's in r11, and the
comparison bits are wrong.

(3)>

Break at 0xFFFFFFFF81680022 due to - Proceed (single step)
RAX: 0000000000000080 RBX: 0000000000000002 RCX: 00007FC9877F2A30
RDX: 0000000000000000 RSI: FFFF8800BFD9BC00 RDI: FFFF88011FCD6C80
RSP: FFFF8800CD6C7F58 RBP: 00007FC988119000  R8: FFFF8800CD6C4000
 R9: 0000017C85499D0E R10: FFFF8800C17BB8F0 R11: 0000000000000246  << WRONG!!!
R12: 00007FC987AC6400 R13: 0000000000000002 R14: 0000000000000001
R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 0018
 IP: FFFFFFFF81680022 FLAGS: 0000000000000146  (PF ZF TF) << real flags
0xffffffff81680022 49F7C300010100  test   r11,0x10100   << comparison bits correct, r11 is WRONG!!!
(3)>

Break at 0xFFFFFFFF81680029 due to - Proceed (single step)
RAX: 0000000000000080 RBX: 0000000000000002 RCX: 00007FC9877F2A30
RDX: 0000000000000000 RSI: FFFF8800BFD9BC00 RDI: FFFF88011FCD6C80
RSP: FFFF8800CD6C7F58 RBP: 00007FC988119000  R8: FFFF8800CD6C4000
 R9: 0000017C85499D0E R10: FFFF8800C17BB8F0 R11: 0000000000000246
R12: 00007FC987AC6400 R13: 0000000000000002 R14: 0000000000000001
R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 0018
 IP: FFFFFFFF81680029 FLAGS: 0000000000000146  (PF ZF TF)
0xffffffff81680029 753C            jne    opportunistic_sysret_failed (0xffffffff81680067) (down)
(3)>

Maps to this code in entry_64.S

	movq	R11(%rsp), %r11           << picks up BOGUS flags here
	cmpq	%r11, EFLAGS(%rsp)		/* R11 == RFLAGS */
	jne	opportunistic_sysret_failed

	/*
	 * SYSRET can't restore RF.  SYSRET can restore TF, but unlike IRET,
	 * restoring TF results in a trap from userspace immediately after
	 * SYSRET.  This would cause an infinite loop whenever #DB happens
	 * with register state that satisfies the opportunistic SYSRET
	 * conditions.  For example, single-stepping this user code:
	 *
	 *           movq	$stuck_here, %rcx
	 *           pushfq
	 *           popq %r11
	 *   stuck_here:
	 *
	 * would never get past 'stuck_here'.
	 */
	testq	$(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
	jnz	opportunistic_sysret_failed
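
For reference, the 0x10100 mask in that test is the OR of these two
EFLAGS bits (values as defined in
arch/x86/include/uapi/asm/processor-flags.h):

#define X86_EFLAGS_TF	0x00000100	/* Trap Flag (single step) */
#define X86_EFLAGS_RF	0x00010000	/* Resume Flag */

/* Decoding the trace above with the architectural bit values:
 *   saved %r11  = 0x246 = IF(0x200) | ZF(0x40) | PF(0x4) | bit 1
 *   live RFLAGS = 0x146 = TF(0x100) | ZF(0x40) | PF(0x4) | bit 1
 * The live flags carry TF from the single step; the saved r11
 * image does not. */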


Anyway, there is your bug.  Should Andy go back and fix this?  Probably.

Jeff


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Andy Lutomirski @ 2016-01-30 16:53 UTC
  To: Jeff Merkey
  Cc: LKML, Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Andrew Lutomirski

On Sat, Jan 30, 2016 at 12:41 AM, Jeff Merkey <linux.mdb@gmail.com> wrote:
> Here is an MDB debugger trace of the code in question.  Please note
> that the flags being compared don't match what's in r11, and the
> comparison bits are wrong.
>
> (3)>
>
> Break at 0xFFFFFFFF81680022 due to - Proceed (single step)
> RAX: 0000000000000080 RBX: 0000000000000002 RCX: 00007FC9877F2A30
> RDX: 0000000000000000 RSI: FFFF8800BFD9BC00 RDI: FFFF88011FCD6C80
> RSP: FFFF8800CD6C7F58 RBP: 00007FC988119000  R8: FFFF8800CD6C4000
>  R9: 0000017C85499D0E R10: FFFF8800C17BB8F0 R11: 0000000000000246  << WRONG!!!
> R12: 00007FC987AC6400 R13: 0000000000000002 R14: 0000000000000001
> R15: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 0018
>  IP: FFFFFFFF81680022 FLAGS: 0000000000000146  (PF ZF TF) << real flags
> 0xffffffff81680022 49F7C300010100  test   r11,0x10100   << comparison bits correct, r11 is WRONG!!!
> (3)>

I have no idea what bug you're talking about, and I have no idea how
this code could cause a soft lockup in smp_call_function_single (at
worst it could potentially enter userspace with invalid state, thus
alternating between user and kernel without making progress in user
mode).

And the HW flags register has no particular reason to match r11 or, in
fact, anything saved in pt_regs at all.

--Andy


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey @ 2016-01-30 17:53 UTC
  To: Andy Lutomirski
  Cc: LKML, Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Andrew Lutomirski

On 1/30/16, Andy Lutomirski <luto@amacapital.net> wrote:
> On Sat, Jan 30, 2016 at 12:41 AM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> [...]
>
> I have no idea what bug you're talking about, and I have no idea how
> this code could cause a soft lockup in smp_call_function_single (at
> worst it could potentially enter userspace with invalid state, thus
> alternating between user and kernel without making progress in user
> mode).
>
> And the HW flags register has no particular reason to match r11 or, in
> fact, anything saved in pt_regs at all.
>
> --Andy
>

Hi Andy,

There are two cases to handle here with the trap flag and sysret, and
you are handling just one of them in your fix.  There is the case
where you are going to use sysret to load the flags after the
instruction executes, and that's the case you coded for.  The other
case, which is not being handled, is the one where someone is single
stepping through this code, the trap flag gets set, and then sysret
gets called.

From what I can tell, sysret is a broken instruction which will just
hang if someone calls it with the trap flag set.  It does not act
like this on ia32, just x86_64.  The answer is to not use sysret and
to use your iret return for all syscalls.

So:

TF set -> call sysret = hang
Load previous flags -> call sysret (pops TF into flags) = hang

Two cases to handle.

The smp_call_function_single bug is just a symptom that appears when
this other hang condition shows up.

Jeff


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Andy Lutomirski @ 2016-01-30 17:58 UTC
  To: Jeff Merkey
  Cc: LKML, Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Andrew Lutomirski

On Sat, Jan 30, 2016 at 9:53 AM, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/30/16, Andy Lutomirski <luto@amacapital.net> wrote:
>> [...]
>
> [...]
>
> From what I can tell, sysret is a broken instruction which will just
> hang if someone calls it with the trap flag set.  It does not act
> like this on ia32, just x86_64.  The answer is to not use sysret and
> to use your iret return for all syscalls.
>

Just so you know, I have no intention of supporting this use case.  In
fact, I'm planning to eventually stop using IST for #DB entirely, at
which point the kernel will crash terribly if this code is
single-stepped (except when using a hypervisor to do this single
stepping, which is a much more sensible way to handle it).

So MDB may just need to force the slow syscall exit path
unconditionally, and it'll have to do something else clever to handle
SYSCALL, because that's going to crash, too.

I will *not* insert a pushfq into the syscall return path.  That would
slow everything down for the sole benefit of an in-kernel debugger.

--Andy


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey @ 2016-01-30 18:05 UTC
  To: Andy Lutomirski
  Cc: LKML, Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Andrew Lutomirski

On 1/30/16, Andy Lutomirski <luto@amacapital.net> wrote:
> [...]
>
> Just so you know, I have no intention of supporting this use case.  In
> fact, I'm planning to eventually stop using IST for #DB entirely, at
> which point the kernel will crash terribly if this code is
> single-stepped (except when using a hypervisor to do this single
> stepping, which is a much more sensible way to handle it).
>
> So MDB may just need to force the slow syscall exit path
> unconditionally, and it'll have to do something else clever to handle
> SYSCALL, because that's going to crash, too.
>
> I will *not* insert a pushfq into the syscall return path.  That would
> slow everything down for the sole benefit of an in-kernel debugger.
>
> --Andy
>

Yep, now you see it.  I'll carry this fix locally in my patch series.

Jeff


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey @ 2016-01-30 18:19 UTC
  To: Andy Lutomirski
  Cc: LKML, Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Andrew Lutomirski

This lockless, memory-based synchronization in csd_lock_wait just
doesn't work on all SMP systems, because not all of them properly
implement these fancy memory fencing instructions.  I've run into this
before trying to do lockless queueing on a range of SMP systems.
About the only thing guaranteed to work is a lock assertion, because
that tells the processor to flush its pipeline.  So this csd code is
busted on some systems.

This other hang condition with the sysret just makes it show up as a
deadlock in csd_lock_wait.
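
A rough sketch of the kind of change I mean -- serializing the flag
access with a spinlock instead of relying on acquire/release ordering
(illustrative only, not a tested patch; CSD_FLAG_LOCK as in
kernel/smp.c):

static DEFINE_SPINLOCK(csd_state_lock);

static void csd_lock_wait_locked(struct call_single_data *csd)
{
	for (;;) {
		unsigned int flags;

		/* The locked section forces full ordering on every
		 * poll instead of trusting smp_load_acquire() alone. */
		spin_lock(&csd_state_lock);
		flags = csd->flags;
		spin_unlock(&csd_state_lock);

		if (!(flags & CSD_FLAG_LOCK))
			break;
		cpu_relax();
	}
}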

Jeff


* Re: [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8
From: Jeff Merkey @ 2016-01-30 18:50 UTC
  To: Andy Lutomirski
  Cc: LKML, Ingo Molnar, Andrew Morton, Vlastimil Babka,
	Peter Zijlstra (Intel),
	Mel Gorman, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
	Andrew Lutomirski

>
> Just so you know, I have no intention of supporting this use case.  In
> fact, I'm planning to eventually stop using IST for #DB entirely, at
> which point the kernel will crash terribly if this code is
> single-stepped (except when using a hypervisor to do this single
> stepping, which is a much more sensible way to handle it).

I don't know how using the userspace trap code is going to help with
this; a hypervisor will crash too with sysret being used there -- I
just tested it.

So it breaks all debuggers, even the faux ones that run as user space
apps.  Your other suggestions will cause some mayhem for debuggers
too.  If you break something, I'll just unpatch it in my tree, so have
fun.  LOL

Jeff


end of thread

Thread overview: 12+ messages
2016-01-25  0:49 [BUG REPORT] Soft Lockup in smp_call_function_single+0xD8 Jeff Merkey
2016-01-25  1:43 ` Jeff Merkey
2016-01-25 23:06   ` Jeff Merkey
2016-01-26  4:05     ` Jeff Merkey
2016-01-30  8:23       ` Jeff Merkey
2016-01-30  8:41         ` Jeff Merkey
2016-01-30 16:53           ` Andy Lutomirski
2016-01-30 17:53             ` Jeff Merkey
2016-01-30 17:58               ` Andy Lutomirski
2016-01-30 18:05                 ` Jeff Merkey
2016-01-30 18:19                   ` Jeff Merkey
2016-01-30 18:50                 ` Jeff Merkey
