* [BUG REPORT] ktime_get_ts64 causes Hard Lockup
@ 2016-01-19  1:59 Jeff Merkey
  2016-01-19  2:16 ` Jeff Merkey
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-19  1:59 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Thomas Gleixner

If I suspend all processors in the NMI handler while in the debugger
console at an active breakpoint, and then I leave the system in the
console for about 20 minutes with the processors spinning, then exit
and release the processors, ktime_get_ts64 enters an infinite loop and
triggers the Hard Lockup detector.   Nice having a debugger to just
step into the NMI handlers and find this stuff now.

The offending code is:

(2)> u ktime_get_ts64+9d
<<<  this section gets stuck in an infinite loop.
0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
0xffffffff810ede23 83C201          add    edx,0x1
0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
(0xffffffff810ede1d) (up)
<<<
0xffffffff810ede2e 4801CA          add    rdx,rcx
0xffffffff810ede31 48894308        mov    QWORD PTR
[rbx+8]=0xFFFFFFFFA0774E01,rax
0xffffffff810ede35 488913          mov    QWORD PTR [rbx]=0xFFFFFFFFA0742000,rdx
0xffffffff810ede38 5B              pop    rbx
0xffffffff810ede39 415C            pop    r12
0xffffffff810ede3b 415D            pop    r13
0xffffffff810ede3d 5D              pop    rbp
0xffffffff810ede3e C3              ret
0xffffffff810ede3f 31D2            xor    edx,edx
0xffffffff810ede41 EBEB            jmp    ktime_get_ts64+0xae
(0xffffffff810ede2e) (up)
0xffffffff810ede43 BE11030000      mov    esi,0x311
0xffffffff810ede48 48C7C751C38D81  mov    rdi,0xffffffff818dc351
0xffffffff810ede4f E82C12F9FF      call   warn_slowpath_null
0xffffffff810ede54 E946FFFFFF      jmp    ktime_get_ts64+0x1f
(0xffffffff810edd9f) (up)
0xffffffff810ede59 F390            pause
0xffffffff810ede5b E942FFFFFF      jmp    ktime_get_ts64+0x22
(0xffffffff810edda2) (up)

Same code in GDB format

(2)> id ktime_get_ts64+9d
<<<
0xffffffff810ede1d ktime_get_ts64+0x9d:    sub    $0x3b9aca00,%rax
0xffffffff810ede23 ktime_get_ts64+0xa3:    add    $0x1,%edx
0xffffffff810ede26 ktime_get_ts64+0xa6:    cmp    $0x3b9ac9ff,%rax
0xffffffff810ede2c ktime_get_ts64+0xac:    ja     0xffffffff810ede1d
ktime_get_ts64+0x9d (up)
<<<
0xffffffff810ede2e ktime_get_ts64+0xae:    add    %rcx,%rdx
0xffffffff810ede31 ktime_get_ts64+0xb1:    mov    %rax,0x8(%rbx)
0xffffffff810ede35 ktime_get_ts64+0xb5:    mov    %rdx,(%rbx)
0xffffffff810ede38 ktime_get_ts64+0xb8:    pop    %rbx
0xffffffff810ede39 ktime_get_ts64+0xb9:    pop    %r12
0xffffffff810ede3b ktime_get_ts64+0xbb:    pop    %r13
0xffffffff810ede3d ktime_get_ts64+0xbd:    pop    %rbp
0xffffffff810ede3e ktime_get_ts64+0xbe:    retq
0xffffffff810ede3f ktime_get_ts64+0xbf:    xor    %edx,%edx
0xffffffff810ede41 ktime_get_ts64+0xc1:    jmp    0xffffffff810ede2e
ktime_get_ts64+0xae (up)
0xffffffff810ede43 ktime_get_ts64+0xc3:    mov    $0x311,%esi
0xffffffff810ede48 ktime_get_ts64+0xc8:    mov    $0xffffffff818dc351,%rdi
0xffffffff810ede4f ktime_get_ts64+0xcf:    callq  0xffffffff8107f080
warn_slowpath_null
0xffffffff810ede54 ktime_get_ts64+0xd4:    jmpq   0xffffffff810edd9f
ktime_get_ts64+0x1f (up)
0xffffffff810ede59 ktime_get_ts64+0xd9:    pause
0xffffffff810ede5b ktime_get_ts64+0xdb:    jmpq   0xffffffff810edda2
ktime_get_ts64+0x22 (up)
(2)> g


What is strange is the math it's doing.  It subtracts a fixed
value from rax and then compares the result.  It looks like this is a case
where the value may have wrapped and the code just wasn't set up to
handle it.

0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
0xffffffff810ede23 83C201          add    edx,0x1
0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
(0xffffffff810ede1d) (up)

The C code is:


/**
 * ktime_get_ts64 - get the monotonic clock in timespec64 format
 * @ts:		pointer to timespec variable
 *
 * The function calculates the monotonic clock from the realtime
 * clock and the wall_to_monotonic offset and stores the result
 * in normalized timespec64 format in the variable pointed to by @ts.
 */
void ktime_get_ts64(struct timespec64 *ts)
{
	struct timekeeper *tk = &tk_core.timekeeper;
	struct timespec64 tomono;
	s64 nsec;
	unsigned int seq;

	WARN_ON(timekeeping_suspended);

	do {
		seq = read_seqcount_begin(&tk_core.seq);
		ts->tv_sec = tk->xtime_sec;
		nsec = timekeeping_get_ns(&tk->tkr_mono);
		tomono = tk->wall_to_monotonic;
<<<
	} while (read_seqcount_retry(&tk_core.seq, seq));
<<<
	ts->tv_sec += tomono.tv_sec;
	ts->tv_nsec = 0;
	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
}
EXPORT_SYMBOL_GPL(ktime_get_ts64);

Any ideas how to fix this problem?  That do {} while gets stuck there.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19  1:59 [BUG REPORT] ktime_get_ts64 causes Hard Lockup Jeff Merkey
@ 2016-01-19  2:16 ` Jeff Merkey
  2016-01-19  2:31   ` Jeff Merkey
  2016-01-19  9:50 ` Thomas Gleixner
  2016-01-19 15:37 ` Jeff Merkey
  2 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-19  2:16 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Thomas Gleixner

On 1/18/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> If I suspend all processors in the NMI handler while in the debugger
> console at an active breakpoint, and then I leave the system in the
> console for about 20 minutes with the processors spinning, then exit
> and release the processors, ktime_get_ts64 enters an infinite loop and
> triggers the Hard Lockup detector.   Nice having a debugger to just
> step into the NMI handlers and find this stuff now.
>
> The offending code is:
>
> (2)> u ktime_get_ts64+9d
> <<<  this section gets stuck in an infinite loop.
> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
> 0xffffffff810ede23 83C201          add    edx,0x1
> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
> (0xffffffff810ede1d) (up)
> <<<
> 0xffffffff810ede2e 4801CA          add    rdx,rcx
> 0xffffffff810ede31 48894308        mov    QWORD PTR
> [rbx+8]=0xFFFFFFFFA0774E01,rax
> 0xffffffff810ede35 488913          mov    QWORD PTR
> [rbx]=0xFFFFFFFFA0742000,rdx
> 0xffffffff810ede38 5B              pop    rbx
> 0xffffffff810ede39 415C            pop    r12
> 0xffffffff810ede3b 415D            pop    r13
> 0xffffffff810ede3d 5D              pop    rbp
> 0xffffffff810ede3e C3              ret
> 0xffffffff810ede3f 31D2            xor    edx,edx
> 0xffffffff810ede41 EBEB            jmp    ktime_get_ts64+0xae
> (0xffffffff810ede2e) (up)
> 0xffffffff810ede43 BE11030000      mov    esi,0x311
> 0xffffffff810ede48 48C7C751C38D81  mov    rdi,0xffffffff818dc351
> 0xffffffff810ede4f E82C12F9FF      call   warn_slowpath_null
> 0xffffffff810ede54 E946FFFFFF      jmp    ktime_get_ts64+0x1f
> (0xffffffff810edd9f) (up)
> 0xffffffff810ede59 F390            pause
> 0xffffffff810ede5b E942FFFFFF      jmp    ktime_get_ts64+0x22
> (0xffffffff810edda2) (up)
>
> Same code in GDB format
>
> (2)> id ktime_get_ts64+9d
> <<<
> 0xffffffff810ede1d ktime_get_ts64+0x9d:    sub    $0x3b9aca00,%rax
> 0xffffffff810ede23 ktime_get_ts64+0xa3:    add    $0x1,%edx
> 0xffffffff810ede26 ktime_get_ts64+0xa6:    cmp    $0x3b9ac9ff,%rax
> 0xffffffff810ede2c ktime_get_ts64+0xac:    ja     0xffffffff810ede1d
> ktime_get_ts64+0x9d (up)
> <<<
> 0xffffffff810ede2e ktime_get_ts64+0xae:    add    %rcx,%rdx
> 0xffffffff810ede31 ktime_get_ts64+0xb1:    mov    %rax,0x8(%rbx)
> 0xffffffff810ede35 ktime_get_ts64+0xb5:    mov    %rdx,(%rbx)
> 0xffffffff810ede38 ktime_get_ts64+0xb8:    pop    %rbx
> 0xffffffff810ede39 ktime_get_ts64+0xb9:    pop    %r12
> 0xffffffff810ede3b ktime_get_ts64+0xbb:    pop    %r13
> 0xffffffff810ede3d ktime_get_ts64+0xbd:    pop    %rbp
> 0xffffffff810ede3e ktime_get_ts64+0xbe:    retq
> 0xffffffff810ede3f ktime_get_ts64+0xbf:    xor    %edx,%edx
> 0xffffffff810ede41 ktime_get_ts64+0xc1:    jmp    0xffffffff810ede2e
> ktime_get_ts64+0xae (up)
> 0xffffffff810ede43 ktime_get_ts64+0xc3:    mov    $0x311,%esi
> 0xffffffff810ede48 ktime_get_ts64+0xc8:    mov    $0xffffffff818dc351,%rdi
> 0xffffffff810ede4f ktime_get_ts64+0xcf:    callq  0xffffffff8107f080
> warn_slowpath_null
> 0xffffffff810ede54 ktime_get_ts64+0xd4:    jmpq   0xffffffff810edd9f
> ktime_get_ts64+0x1f (up)
> 0xffffffff810ede59 ktime_get_ts64+0xd9:    pause
> 0xffffffff810ede5b ktime_get_ts64+0xdb:    jmpq   0xffffffff810edda2
> ktime_get_ts64+0x22 (up)
> (2)> g
>
>
> What is strange is the math its doing.  It is subtracting a fixed
> value from rax then comparing the value.  It looks like this is case
> where the value may have wrapped and the code just wasn;t setup to
> handle it.
>
> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
> 0xffffffff810ede23 83C201          add    edx,0x1
> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
> (0xffffffff810ede1d) (up)
>
> The C code is:
>
>
> /**
>  * ktime_get_ts64 - get the monotonic clock in timespec64 format
>  * @ts:		pointer to timespec variable
>  *
>  * The function calculates the monotonic clock from the realtime
>  * clock and the wall_to_monotonic offset and stores the result
>  * in normalized timespec64 format in the variable pointed to by @ts.
>  */
> void ktime_get_ts64(struct timespec64 *ts)
> {
> 	struct timekeeper *tk = &tk_core.timekeeper;
> 	struct timespec64 tomono;
> 	s64 nsec;
> 	unsigned int seq;
>
> 	WARN_ON(timekeeping_suspended);
>
> 	do {
> 		seq = read_seqcount_begin(&tk_core.seq);
> 		ts->tv_sec = tk->xtime_sec;
> 		nsec = timekeeping_get_ns(&tk->tkr_mono);
> 		tomono = tk->wall_to_monotonic;
> <<<
> 	} while (read_seqcount_retry(&tk_core.seq, seq));
> <<<
> 	ts->tv_sec += tomono.tv_sec;
> 	ts->tv_nsec = 0;
> 	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
> }
> EXPORT_SYMBOL_GPL(ktime_get_ts64);
>
> Any ideas how to fix this problem?  That do {} while gets stuck there.
>
> Jeff
>

I just verified that both kgdb and kdb trigger this bug as well if you
hold the processors suspended for about 20-45 minutes.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19  2:16 ` Jeff Merkey
@ 2016-01-19  2:31   ` Jeff Merkey
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-19  2:31 UTC (permalink / raw)
  To: LKML

Output from objdump is provided below:

static __always_inline u32
__iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
{
	u32 ret = 0;
     27b:	31 d2                	xor    %edx,%edx
	while (dividend >= divisor) {
		/* The following asm() prevents the compiler from
		   optimising this loop into a modulo operation.  */
		asm("" : "+rm"(dividend));

		dividend -= divisor;
     27d:	48 2d 00 ca 9a 3b    	sub    $0x3b9aca00,%rax
		ret++;
     283:	83 c2 01             	add    $0x1,%edx
static __always_inline u32
__iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
{
	u32 ret = 0;

	while (dividend >= divisor) {
     286:	48 3d ff c9 9a 3b    	cmp    $0x3b9ac9ff,%rax
     28c:	77 ef                	ja     27d <ktime_get_ts64+0x9d>
 * This must always be inlined because its used from the x86-64 vdso,
 * which cannot call other kernel functions.
 */

I guess if dividend and divisor get out of whack, this function loops
forever.  So I sent this to the maintainers, which is the list of
"engineers" who actually wrote and understand this section of code.  I
would like to get this fixed.
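
For what it's worth, the loop above is just an iterative 64/32 division.  A
plain-C sketch of what it computes (illustrative only, not the kernel
implementation) would be:

static inline u32 iter_div_equiv(u64 dividend, u32 divisor, u64 *remainder)
{
	/* same result as the subtract loop: quotient returned,
	   remainder stored through the pointer */
	*remainder = dividend % divisor;
	return (u32)(dividend / divisor);
}

so with a dividend far above the divisor, the subtract-and-compare version
has to iterate roughly dividend / divisor times before it can exit.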

Thanks

Jeff

On 1/18/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/18/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> If I suspend all processors in the NMI handler while in the debugger
>> console at an active breakpoint, and then I leave the system in the
>> console for about 20 minutes with the processors spinning, then exit
>> and release the processors, ktime_get_ts64 enters an infinite loop and
>> triggers the Hard Lockup detector.   Nice having a debugger to just
>> step into the NMI handlers and find this stuff now.
>>
>> The offending code is:
>>
>> (2)> u ktime_get_ts64+9d
>> <<<  this section gets stuck in an infinite loop.
>> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
>> 0xffffffff810ede23 83C201          add    edx,0x1
>> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
>> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
>> (0xffffffff810ede1d) (up)
>> <<<
>> 0xffffffff810ede2e 4801CA          add    rdx,rcx
>> 0xffffffff810ede31 48894308        mov    QWORD PTR
>> [rbx+8]=0xFFFFFFFFA0774E01,rax
>> 0xffffffff810ede35 488913          mov    QWORD PTR
>> [rbx]=0xFFFFFFFFA0742000,rdx
>> 0xffffffff810ede38 5B              pop    rbx
>> 0xffffffff810ede39 415C            pop    r12
>> 0xffffffff810ede3b 415D            pop    r13
>> 0xffffffff810ede3d 5D              pop    rbp
>> 0xffffffff810ede3e C3              ret
>> 0xffffffff810ede3f 31D2            xor    edx,edx
>> 0xffffffff810ede41 EBEB            jmp    ktime_get_ts64+0xae
>> (0xffffffff810ede2e) (up)
>> 0xffffffff810ede43 BE11030000      mov    esi,0x311
>> 0xffffffff810ede48 48C7C751C38D81  mov    rdi,0xffffffff818dc351
>> 0xffffffff810ede4f E82C12F9FF      call   warn_slowpath_null
>> 0xffffffff810ede54 E946FFFFFF      jmp    ktime_get_ts64+0x1f
>> (0xffffffff810edd9f) (up)
>> 0xffffffff810ede59 F390            pause
>> 0xffffffff810ede5b E942FFFFFF      jmp    ktime_get_ts64+0x22
>> (0xffffffff810edda2) (up)
>>
>> Same code in GDB format
>>
>> (2)> id ktime_get_ts64+9d
>> <<<
>> 0xffffffff810ede1d ktime_get_ts64+0x9d:    sub    $0x3b9aca00,%rax
>> 0xffffffff810ede23 ktime_get_ts64+0xa3:    add    $0x1,%edx
>> 0xffffffff810ede26 ktime_get_ts64+0xa6:    cmp    $0x3b9ac9ff,%rax
>> 0xffffffff810ede2c ktime_get_ts64+0xac:    ja     0xffffffff810ede1d
>> ktime_get_ts64+0x9d (up)
>> <<<
>> 0xffffffff810ede2e ktime_get_ts64+0xae:    add    %rcx,%rdx
>> 0xffffffff810ede31 ktime_get_ts64+0xb1:    mov    %rax,0x8(%rbx)
>> 0xffffffff810ede35 ktime_get_ts64+0xb5:    mov    %rdx,(%rbx)
>> 0xffffffff810ede38 ktime_get_ts64+0xb8:    pop    %rbx
>> 0xffffffff810ede39 ktime_get_ts64+0xb9:    pop    %r12
>> 0xffffffff810ede3b ktime_get_ts64+0xbb:    pop    %r13
>> 0xffffffff810ede3d ktime_get_ts64+0xbd:    pop    %rbp
>> 0xffffffff810ede3e ktime_get_ts64+0xbe:    retq
>> 0xffffffff810ede3f ktime_get_ts64+0xbf:    xor    %edx,%edx
>> 0xffffffff810ede41 ktime_get_ts64+0xc1:    jmp    0xffffffff810ede2e
>> ktime_get_ts64+0xae (up)
>> 0xffffffff810ede43 ktime_get_ts64+0xc3:    mov    $0x311,%esi
>> 0xffffffff810ede48 ktime_get_ts64+0xc8:    mov
>> $0xffffffff818dc351,%rdi
>> 0xffffffff810ede4f ktime_get_ts64+0xcf:    callq  0xffffffff8107f080
>> warn_slowpath_null
>> 0xffffffff810ede54 ktime_get_ts64+0xd4:    jmpq   0xffffffff810edd9f
>> ktime_get_ts64+0x1f (up)
>> 0xffffffff810ede59 ktime_get_ts64+0xd9:    pause
>> 0xffffffff810ede5b ktime_get_ts64+0xdb:    jmpq   0xffffffff810edda2
>> ktime_get_ts64+0x22 (up)
>> (2)> g
>>
>>
>> What is strange is the math its doing.  It is subtracting a fixed
>> value from rax then comparing the value.  It looks like this is case
>> where the value may have wrapped and the code just wasn;t setup to
>> handle it.
>>
>> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
>> 0xffffffff810ede23 83C201          add    edx,0x1
>> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
>> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
>> (0xffffffff810ede1d) (up)
>>
>> The C code is:
>>
>>
>> /**
>>  * ktime_get_ts64 - get the monotonic clock in timespec64 format
>>  * @ts:		pointer to timespec variable
>>  *
>>  * The function calculates the monotonic clock from the realtime
>>  * clock and the wall_to_monotonic offset and stores the result
>>  * in normalized timespec64 format in the variable pointed to by @ts.
>>  */
>> void ktime_get_ts64(struct timespec64 *ts)
>> {
>> 	struct timekeeper *tk = &tk_core.timekeeper;
>> 	struct timespec64 tomono;
>> 	s64 nsec;
>> 	unsigned int seq;
>>
>> 	WARN_ON(timekeeping_suspended);
>>
>> 	do {
>> 		seq = read_seqcount_begin(&tk_core.seq);
>> 		ts->tv_sec = tk->xtime_sec;
>> 		nsec = timekeeping_get_ns(&tk->tkr_mono);
>> 		tomono = tk->wall_to_monotonic;
>> <<<
>> 	} while (read_seqcount_retry(&tk_core.seq, seq));
>> <<<
>> 	ts->tv_sec += tomono.tv_sec;
>> 	ts->tv_nsec = 0;
>> 	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
>> }
>> EXPORT_SYMBOL_GPL(ktime_get_ts64);
>>
>> Any ideas how to fix this problem?  That do {} while gets stuck there.
>>
>> Jeff
>>
>
> I just verified that both kgdb and kdb trigger this bug as well if you
> hold the processors suspended for about 20-45 minutes.
>
> Jeff
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19  1:59 [BUG REPORT] ktime_get_ts64 causes Hard Lockup Jeff Merkey
  2016-01-19  2:16 ` Jeff Merkey
@ 2016-01-19  9:50 ` Thomas Gleixner
  2016-01-19 15:37 ` Jeff Merkey
  2 siblings, 0 replies; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-19  9:50 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

On Mon, 18 Jan 2016, Jeff Merkey wrote:
> What is strange is the math its doing.  It is subtracting a fixed
> value from rax then comparing the value.  It looks like this is case
> where the value may have wrapped and the code just wasn;t setup to
> handle it.

Well, the worst case is that it loops for another full round.
If you subtract 1e9 from rax often enough, it will become smaller than 1e9
no matter whether it wrapped or not. It just takes more iterations.
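
Back of the envelope: even if rax had wrapped all the way to the maximum u64
value, the loop terminates after at most

	0xFFFFFFFFFFFFFFFF / 0x3b9aca00  ~=  1.8e10 iterations

which is long, but finite.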

> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
> 0xffffffff810ede23 83C201          add    edx,0x1
> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
> (0xffffffff810ede1d) (up)
> 
> The C code is:
> 
> 
> /**
>  * ktime_get_ts64 - get the monotonic clock in timespec64 format
>  * @ts:		pointer to timespec variable
>  *
>  * The function calculates the monotonic clock from the realtime
>  * clock and the wall_to_monotonic offset and stores the result
>  * in normalized timespec64 format in the variable pointed to by @ts.
>  */
> void ktime_get_ts64(struct timespec64 *ts)
> {
> 	struct timekeeper *tk = &tk_core.timekeeper;
> 	struct timespec64 tomono;
> 	s64 nsec;
> 	unsigned int seq;
> 
> 	WARN_ON(timekeeping_suspended);
> 
> 	do {
> 		seq = read_seqcount_begin(&tk_core.seq);
> 		ts->tv_sec = tk->xtime_sec;
> 		nsec = timekeeping_get_ns(&tk->tkr_mono);
> 		tomono = tk->wall_to_monotonic;
> <<<
> 	} while (read_seqcount_retry(&tk_core.seq, seq));
> <<<
> 	ts->tv_sec += tomono.tv_sec;
> 	ts->tv_nsec = 0;
> 	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
> }
> EXPORT_SYMBOL_GPL(ktime_get_ts64);
> 
> Any ideas how to fix this problem?  That do {} while gets stuck there.

So now you are pointing to that do { } while. That has absolutely nothing to
do with timespec64_add_ns() to which you are referring above.

That do {} while loop gets stuck when the timekeeper sequence counter has
changed while we were reading the time and the offset.
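
For reference, the read side of that seqcount works roughly like this
(simplified sketch, not the exact implementation):

	do {
		/* read_seqcount_begin() spins while a writer is active
		   (odd sequence) and then returns the even sequence */
		seq = read_seqcount_begin(&tk_core.seq);
		/* ... read xtime_sec, nsec, wall_to_monotonic ... */
		/* read_seqcount_retry() is true if the sequence changed,
		   i.e. a writer updated the timekeeper during the read */
	} while (read_seqcount_retry(&tk_core.seq, seq));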

So where exactly is it stuck? Please provide backtraces from the
hardlockup detector.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19  1:59 [BUG REPORT] ktime_get_ts64 causes Hard Lockup Jeff Merkey
  2016-01-19  2:16 ` Jeff Merkey
  2016-01-19  9:50 ` Thomas Gleixner
@ 2016-01-19 15:37 ` Jeff Merkey
  2016-01-19 22:00   ` Jeff Merkey
  2 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-19 15:37 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Thomas Gleixner

On 1/18/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> If I suspend all processors in the NMI handler while in the debugger
> console at an active breakpoint, and then I leave the system in the
> console for about 20 minutes with the processors spinning, then exit
> and release the processors, ktime_get_ts64 enters an infinite loop and
> triggers the Hard Lockup detector.   Nice having a debugger to just
> step into the NMI handlers and find this stuff now.
>
> The offending code is:
>
> (2)> u ktime_get_ts64+9d
> <<<  this section gets stuck in an infinite loop.
> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
> 0xffffffff810ede23 83C201          add    edx,0x1
> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
> (0xffffffff810ede1d) (up)
> <<<
> 0xffffffff810ede2e 4801CA          add    rdx,rcx
> 0xffffffff810ede31 48894308        mov    QWORD PTR
> [rbx+8]=0xFFFFFFFFA0774E01,rax
> 0xffffffff810ede35 488913          mov    QWORD PTR
> [rbx]=0xFFFFFFFFA0742000,rdx
> 0xffffffff810ede38 5B              pop    rbx
> 0xffffffff810ede39 415C            pop    r12
> 0xffffffff810ede3b 415D            pop    r13
> 0xffffffff810ede3d 5D              pop    rbp
> 0xffffffff810ede3e C3              ret
> 0xffffffff810ede3f 31D2            xor    edx,edx
> 0xffffffff810ede41 EBEB            jmp    ktime_get_ts64+0xae
> (0xffffffff810ede2e) (up)
> 0xffffffff810ede43 BE11030000      mov    esi,0x311
> 0xffffffff810ede48 48C7C751C38D81  mov    rdi,0xffffffff818dc351
> 0xffffffff810ede4f E82C12F9FF      call   warn_slowpath_null
> 0xffffffff810ede54 E946FFFFFF      jmp    ktime_get_ts64+0x1f
> (0xffffffff810edd9f) (up)
> 0xffffffff810ede59 F390            pause
> 0xffffffff810ede5b E942FFFFFF      jmp    ktime_get_ts64+0x22
> (0xffffffff810edda2) (up)
>
> Same code in GDB format
>
> (2)> id ktime_get_ts64+9d
> <<<
> 0xffffffff810ede1d ktime_get_ts64+0x9d:    sub    $0x3b9aca00,%rax
> 0xffffffff810ede23 ktime_get_ts64+0xa3:    add    $0x1,%edx
> 0xffffffff810ede26 ktime_get_ts64+0xa6:    cmp    $0x3b9ac9ff,%rax
> 0xffffffff810ede2c ktime_get_ts64+0xac:    ja     0xffffffff810ede1d
> ktime_get_ts64+0x9d (up)
> <<<
> 0xffffffff810ede2e ktime_get_ts64+0xae:    add    %rcx,%rdx
> 0xffffffff810ede31 ktime_get_ts64+0xb1:    mov    %rax,0x8(%rbx)
> 0xffffffff810ede35 ktime_get_ts64+0xb5:    mov    %rdx,(%rbx)
> 0xffffffff810ede38 ktime_get_ts64+0xb8:    pop    %rbx
> 0xffffffff810ede39 ktime_get_ts64+0xb9:    pop    %r12
> 0xffffffff810ede3b ktime_get_ts64+0xbb:    pop    %r13
> 0xffffffff810ede3d ktime_get_ts64+0xbd:    pop    %rbp
> 0xffffffff810ede3e ktime_get_ts64+0xbe:    retq
> 0xffffffff810ede3f ktime_get_ts64+0xbf:    xor    %edx,%edx
> 0xffffffff810ede41 ktime_get_ts64+0xc1:    jmp    0xffffffff810ede2e
> ktime_get_ts64+0xae (up)
> 0xffffffff810ede43 ktime_get_ts64+0xc3:    mov    $0x311,%esi
> 0xffffffff810ede48 ktime_get_ts64+0xc8:    mov    $0xffffffff818dc351,%rdi
> 0xffffffff810ede4f ktime_get_ts64+0xcf:    callq  0xffffffff8107f080
> warn_slowpath_null
> 0xffffffff810ede54 ktime_get_ts64+0xd4:    jmpq   0xffffffff810edd9f
> ktime_get_ts64+0x1f (up)
> 0xffffffff810ede59 ktime_get_ts64+0xd9:    pause
> 0xffffffff810ede5b ktime_get_ts64+0xdb:    jmpq   0xffffffff810edda2
> ktime_get_ts64+0x22 (up)
> (2)> g
>
>
> What is strange is the math its doing.  It is subtracting a fixed
> value from rax then comparing the value.  It looks like this is case
> where the value may have wrapped and the code just wasn;t setup to
> handle it.
>
> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
> 0xffffffff810ede23 83C201          add    edx,0x1
> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
> (0xffffffff810ede1d) (up)
>
> The C code is:
>
>
> /**
>  * ktime_get_ts64 - get the monotonic clock in timespec64 format
>  * @ts:		pointer to timespec variable
>  *
>  * The function calculates the monotonic clock from the realtime
>  * clock and the wall_to_monotonic offset and stores the result
>  * in normalized timespec64 format in the variable pointed to by @ts.
>  */
> void ktime_get_ts64(struct timespec64 *ts)
> {
> 	struct timekeeper *tk = &tk_core.timekeeper;
> 	struct timespec64 tomono;
> 	s64 nsec;
> 	unsigned int seq;
>
> 	WARN_ON(timekeeping_suspended);
>
> 	do {
> 		seq = read_seqcount_begin(&tk_core.seq);
> 		ts->tv_sec = tk->xtime_sec;
> 		nsec = timekeeping_get_ns(&tk->tkr_mono);
> 		tomono = tk->wall_to_monotonic;
> <<<
> 	} while (read_seqcount_retry(&tk_core.seq, seq));
> <<<
> 	ts->tv_sec += tomono.tv_sec;
> 	ts->tv_nsec = 0;
> 	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
> }
> EXPORT_SYMBOL_GPL(ktime_get_ts64);
>
> Any ideas how to fix this problem?  That do {} while gets stuck there.
>
> Jeff
>

I'll try to get a better trace of this problem today and post it.  I
have a target system set up to catch this now.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19 15:37 ` Jeff Merkey
@ 2016-01-19 22:00   ` Jeff Merkey
  2016-01-20  0:59     ` Jeff Merkey
  2016-01-20  9:21     ` Thomas Gleixner
  0 siblings, 2 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-19 22:00 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Thomas Gleixner

Nasty bug, but there is a trivial fix for this.  What happens here is that
RAX (nsecs) gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed
through a bunch of macros to __iter_div_u64_rem().  It doesn't loop
infinitely with a number that big, just for about 4 minutes, but that is
long enough for the hard lockup detector to fire off.  Setting RAX=0 from
the debugger console when this function gets stuck in the loop clears the
problem and the system recovers fine.

The problem code is in:

include/linux/math64.h:117
static __always_inline u32
__iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
{
	u32 ret = 0;

	while (dividend >= divisor) {
		/* The following asm() prevents the compiler from
		   optimising this loop into a modulo operation.  */
		asm("" : "+rm"(dividend));

		dividend -= divisor;
		ret++;
	}
	*remainder = dividend;

	return ret;
}

which is called by
include/linux/time.h:233
static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
{
	a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
	a->tv_nsec = ns;
}

It's defined so that timespec64_add_ns just points to this function,
which is called by ktime_get_ts64.  tv_nsec + ns comes in as a huge
number (RAX = 0x17AE7F57C671EA7D) with a very small divisor, and the
loop just sits there iterating.
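
Rough numbers (my arithmetic, not measured): with that RAX value the loop has
to iterate about

	0x17AE7F57C671EA7D / 1000000000  ~=  1.7e9 times

before the dividend drops below NSEC_PER_SEC, which matches it being stuck
for minutes rather than truly forever.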

I will submit a patch to fix this after I regression test it.  Since it
makes no sense to loop on a simple calculation, the fix should be:

static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
{
	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
	a->tv_nsec = ns;
}

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19 22:00   ` Jeff Merkey
@ 2016-01-20  0:59     ` Jeff Merkey
  2016-01-20  9:21     ` Thomas Gleixner
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20  0:59 UTC (permalink / raw)
  To: LKML

I have done some more research on this, and due to the way the vdso works,
any fix must also be an inline macro or rely only on inline macros.

Jeff

On 1/19/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
> a bunch of macros to __iter_div_u64_rem().  It doesn't loop infinitely
> with a number that big, just for about 4 minutes but long enough for
> the hard lockup detector to fire off.  Setting RAX=0 when this bug
> occurs clears the problem from the debugger console when this function
> gets stuck in this loop and the system recovers fine.
>
> The problem code is in:
>
> include/linux/math64.h:117
> static __always_inline u32
> __iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
> {
> 	u32 ret = 0;
>
> 	while (dividend >= divisor) {
> 		/* The following asm() prevents the compiler from
> 		   optimising this loop into a modulo operation.  */
> 		asm("" : "+rm"(dividend));
>
> 		dividend -= divisor;
> 		ret++;
> 	}
> 	*remainder = dividend;
>
> 	return ret;
> }
>
> which is called by
> include/linux/time.h:233
> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> 	a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
> 	a->tv_nsec = ns;
> }
>
> it's defined so that timespec64_add_ns just points to this function
> which is called by ktime_get_ts64.  tv_nsec + ns gets passed as a huge
> number (RAX = 0x17AE7F57C671EA7D) with a very small divisor and just
> sits there and loops.
>
> Submitting a patch to fix this after I regress and test it.   Since it
> makes no sense to loop on a simple calculation, fix should be:
>
> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> 	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
> 	a->tv_nsec = ns;
> }
>
> Jeff
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19 22:00   ` Jeff Merkey
  2016-01-20  0:59     ` Jeff Merkey
@ 2016-01-20  9:21     ` Thomas Gleixner
  2016-01-20 14:26       ` Thomas Gleixner
  2016-01-20 16:53       ` Jeff Merkey
  1 sibling, 2 replies; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-20  9:21 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

On Tue, 19 Jan 2016, Jeff Merkey wrote:
> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through

And how exactly does that happen?

0x17AE7F57C671EA7D = 1.70644e+18  nsec
		   = 1.70644e+09  sec
		   = 2.84407e+07  min
		   = 474011	  hrs
		   = 19750.5	  days
		   = 54.1109	  years

That's the real issue, not what you are trying to 'fix' in timespec_add_ns()

> Submitting a patch to fix this after I regress and test it.   Since it
> makes no sense to loop on a simple calculation, fix should be:
> 
> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> 	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
> 	a->tv_nsec = ns;
> }

No. It's not that simple, because div64_u64_rem() is expensive on 32bit
architectures which have no hardware 64/32 division. And that's going to hurt
for the normal tick case where we have at max one iteration.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20  9:21     ` Thomas Gleixner
@ 2016-01-20 14:26       ` Thomas Gleixner
  2016-01-20 16:40         ` Jeff Merkey
  2016-01-20 16:53       ` Jeff Merkey
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-20 14:26 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

Jeff,

On Wed, 20 Jan 2016, Thomas Gleixner wrote:
> On Tue, 19 Jan 2016, Jeff Merkey wrote:
> > Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
> > gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
> 
> And how exactly does that happen?
> 
> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
> 		   = 1.70644e+09  sec
> 		   = 2.84407e+07  min
> 		   = 474011	  hrs
> 		   = 19750.5	  days
> 		   = 54.1109	  years
> 
> That's the real issue, not what you are trying to 'fix' in timespec_add_ns()

And that's caused by stopping the whole machine for 20 minutes. It violates
the assumption of the timekeeping core that the maximum time between two
updates of the core is < 5-10 min. So that insanely large number is caused by
a multiplication overflow when converting the time delta to nanoseconds.
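
Roughly, the conversion looks like this (a sketch with field names from the
4.x timekeeper, not the exact kernel source):

	/* cycles -> nanoseconds via the clocksource mult/shift pair */
	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;

and that 64 bit multiply is only valid as long as delta stays below the
clocksource's max_cycles, which is what the max_idle_ns value below reflects.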

You can find that limit via:

# dmesg | grep tsc | grep max_idle_ns
[    5.242683] clocksource tsc: mask: 0xffffffffffffffff max_cycles: 0x21139a22526, max_idle_ns: 440795252169 ns

So on that machine the limit is:

   440795252169 nsec
   440.795	sec
   7.34659	min

And before you ask or come up with patches: No, we are not going to add
anything to the core timekeeping code to work around this limitation, simply
because it's going to add overhead to a performance-sensitive code path for
very limited value.

Keeping a machine stopped for 20 minutes will make a lot of other things
unhappy, so introducing a 'fix' for that particular issue is just silly.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 14:26       ` Thomas Gleixner
@ 2016-01-20 16:40         ` Jeff Merkey
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 16:40 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> Jeff,
>
> On Wed, 20 Jan 2016, Thomas Gleixner wrote:
>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>> > Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>> > gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>
>> And how exactly does that happen?
>>
>> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>> 		   = 1.70644e+09  sec
>> 		   = 2.84407e+07  min
>> 		   = 474011	  hrs
>> 		   = 19750.5	  days
>> 		   = 54.1109	  years
>>
>> That's the real issue, not what you are trying to 'fix' in
>> timespec_add_ns()
>
> And that's caused by stopping the whole machine for 20 minutes. It violates
> the assumption of the timekeeping core, that the maximum time which is
> between
> two updates of the core is < 5-10min. So that insane large number is caused
> by a
> mult overrun when converting the time delta to nanoseconds.
>
> You can find that limit via:
>
> # dmesg | grep tsc | grep max_idle_ns
> [    5.242683] clocksource tsc: mask: 0xffffffffffffffff max_cycles:
> 0x21139a22526, max_idle_ns: 440795252169 ns
>
> So on that machine the limit is:
>
>    440795252169 nsec
>    440.795	sec
>    7.34659	min
>
> And before you ask or come up with patches: No, we are not going to add
> anything to the core timekeeping code to work around this limitation simply
> because its going to add overhead to a performance sensitive code path for
> a
> very limited value.

Given how fragile that code appears to be, this is reasonable.

>
> Keeping a machine stopped for 20 minutes will make a lot of other things
> unhappy, so introducing a 'fix' for that particular issue is just silly.
>

You know, what's needed here is some form of touch function to keep this
system updated while spinning in the debugger.  That would solve it, and I
can maintain a fix for that locally.  I debugged the soft hang in systemd
last night, and I discovered that it's all related to this function
returning bogus time (systemd was doing a system call that eventually made
its way to ktime_get_ts64 and got garbage returned).  When this wraps it
causes all sorts of bad stuff.

Do you have any suggestions on how a touch function could be coded to keep
this subsystem updated while the debugger is active?  There are already a
few of them I have to call, as do kgdb and kdb, to get around some of this.

void mdb_watchdogs(void)
{
    touch_softlockup_watchdog_sync();
    clocksource_touch_watchdog();

#if defined(CONFIG_TREE_RCU)
    rcu_cpu_stall_reset();
#endif

    touch_nmi_watchdog();
#ifdef CONFIG_HARDLOCKUP_DETECTOR
    touch_hardlockup_watchdog();
#endif
    return;
}

As you can see, there are already quite a few subsystems that manage this
problem of debuggers holding the system in stasis.

Jeff

> Thanks,
>
> 	tglx
>

Well, that explains it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20  9:21     ` Thomas Gleixner
  2016-01-20 14:26       ` Thomas Gleixner
@ 2016-01-20 16:53       ` Jeff Merkey
  2016-01-20 17:16         ` Jeff Merkey
  2016-01-20 17:21         ` Thomas Gleixner
  1 sibling, 2 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 16:53 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>
> And how exactly does that happen?
>
> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
> 		   = 1.70644e+09  sec
> 		   = 2.84407e+07  min
> 		   = 474011	  hrs
> 		   = 19750.5	  days
> 		   = 54.1109	  years
>
> That's the real issue, not what you are trying to 'fix' in
> timespec_add_ns()
>
>> Submitting a patch to fix this after I regress and test it.   Since it
>> makes no sense to loop on a simple calculation, fix should be:
>>
>> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
>> {
>> 	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
>> 	a->tv_nsec = ns;
>> }
>
> No. It's not that simple, because div64_u64_rem() is expensive on 32bit
> architectures which have no hardware 64/32 division. And that's going to
> hurt
> for the normal tick case where we have at max one iteration.
>

It's less expensive than a hard-coded loop that subtracts in place of
dividing, which is what is there now.  What a busted piece of shit .... LOL



> Thanks,
>
> 	tglx
>
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 16:53       ` Jeff Merkey
@ 2016-01-20 17:16         ` Jeff Merkey
  2016-01-20 17:32           ` John Stultz
  2016-01-20 17:21         ` Thomas Gleixner
  1 sibling, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 17:16 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/20/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>>> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>
>> And how exactly does that happen?
>>
>> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>> 		   = 1.70644e+09  sec
>> 		   = 2.84407e+07  min
>> 		   = 474011	  hrs
>> 		   = 19750.5	  days
>> 		   = 54.1109	  years
>>
>> That's the real issue, not what you are trying to 'fix' in
>> timespec_add_ns()
>>

I guess I am going to have to become an expert on the timekeeper and
learn this subsystem backwards and forwards to code a touch function
to keep it from crashing the system.

On the 2.6 series kernels (and 2.2) this problem did not exist.  I
noticed a lot of these changes came in during the late 2.6 cycles.  Before
that time, I could leave the debugger spinning for days and Linux
worked fine.

For people who have to pay developers to develop code on Linux, a debugger
is almost an essential tool, since it saves hundreds of thousands of dollars
in development costs.  Not everyone wants to pay their employees and
engineers to sit around and code review every problem - customers just want
their problems fixed -- and fast.  That being said, I am having no lack of
people who download and use this debugger, and I'm certain kgdb is heavily
used by folks doing development.  If kernel development is too hard, people
move to something else based on simple economics.

That being said, I need to get this fixed.  There is no good reason a
debugger shouldn't be able to stop the system and leave it suspended
for days if necessary to run down a bug.  I wrote a debugger on SMP
Netware that worked that way.  The earliest versions of MDB worked
that way.

kgdb is broken right now because of this.  I am not certain it affects
all systems out there, but it needs to be fixed.

If you have any ideas on how to code a touch function, please send me a
patch or suggest how it could be done non-obtrusively; otherwise I'll have
to dive into the timekeeper, learn yet another subsystem of Linux, and fix
its bugs myself.  A subsystem that crashes because the timer tick is skewed
or returns garbage is poorly designed IMHO.

It should either have a touch function to keep it updated or have the
ability to recover.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 16:53       ` Jeff Merkey
  2016-01-20 17:16         ` Jeff Merkey
@ 2016-01-20 17:21         ` Thomas Gleixner
  2016-01-20 17:33           ` Jeff Merkey
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-20 17:21 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

On Wed, 20 Jan 2016, Jeff Merkey wrote:
> On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Tue, 19 Jan 2016, Jeff Merkey wrote:
> >> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
> >> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
> >
> > And how exactly does that happen?
> >
> > 0x17AE7F57C671EA7D = 1.70644e+18  nsec
> > 		   = 1.70644e+09  sec
> > 		   = 2.84407e+07  min
> > 		   = 474011	  hrs
> > 		   = 19750.5	  days
> > 		   = 54.1109	  years
> >
> > That's the real issue, not what you are trying to 'fix' in
> > timespec_add_ns()
> >
> >> Submitting a patch to fix this after I regress and test it.   Since it
> >> makes no sense to loop on a simple calculation, fix should be:
> >>
> >> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
> >> {
> >> 	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
> >> 	a->tv_nsec = ns;
> >> }
> >
> > No. It's not that simple, because div64_u64_rem() is expensive on 32bit
> > architectures which have no hardware 64/32 division. And that's going to
> > hurt
> > for the normal tick case where we have at max one iteration.
> >
> 
> It's less expensive than a hard coded loop that subtracts in a looping
> function as a substitute for dividing which is what is there.  What a
> busted piece of shit .... LOL

Let's talk about shit.

timespec[64]_add_ns() is used for timekeeping, and in all normal use cases
the nsec part is less than 1e9 nsec. Even on 64 bit a divide is more
expensive than the single-iteration while loop, and it's insanely expensive
on 32bit machines which do not have a 64/32 division in hardware.

The while loop is there for a few corner cases which are a bit larger than 1e9
nsecs, but that's not the case we optimize for.

The case you are creating with your debugger is something completely different
and we never thought about it nor cared about it. Why? Because so far nobody
complained and I never cared about kernel debuggers at all.

What's worse is that your 'fix' does not resolve the underlying issue at
all. Why? Simply because you tried to fix the symptom and not the root cause.

I explained the root cause to you, and I explained to you why that while()
loop is more efficient than a divide for the case it was written and
optimized for.

Instead of reading and understanding what I wrote, you teach me that your
divide is more efficient and call the code a busted piece of shit.

Sure, you are free to call it a busted piece of shit, but you shouldn't
expect the people who wrote, maintain and understand that code to put up
with your attitude.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:16         ` Jeff Merkey
@ 2016-01-20 17:32           ` John Stultz
  2016-01-20 17:36             ` Jeff Merkey
  2016-01-20 17:42             ` Thomas Gleixner
  0 siblings, 2 replies; 29+ messages in thread
From: John Stultz @ 2016-01-20 17:32 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: Thomas Gleixner, LKML

On Wed, Jan 20, 2016 at 9:16 AM, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/20/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>>>> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>>>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>>
>>> And how exactly does that happen?
>>>
>>> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>>>                 = 1.70644e+09  sec
>>>                 = 2.84407e+07  min
>>>                 = 474011       hrs
>>>                 = 19750.5      days
>>>                 = 54.1109      years
>>>
>>> That's the real issue, not what you are trying to 'fix' in
>>> timespec_add_ns()
>>>
>
> I guess I am going to have to become an expert on the timekeeper and
> learn this subsystem backwards and forwards to code a touch function
> to keep it from crashing the system.
>
> On the 2.6 series kernels (and 2.2) this problem did not exist.  I
> noticed a lot of these changes came in in the late 2.6 cycles.  Before
> that time, I could leave the debugger spinning for days and linux
> worked fine.
>
> For people who have to pay developers to develop code on Linux a
> debugger is almost
> an essential tool since it saves hundreds of thousands of dollars in
> development costs.  Not everyone wants to spend money for their
> employees and engineers to sit around and code review every problem -
> customers just want their problems fixed -- and fast.  That being
> said, I am having no lack of people who download and use this debugger
> and I'm certain kgdb is heavily used by folks doing development.   If
> kernel development is too hard, people move to something else based on
> simple economics.
>
> That being said, I need to get this fixed.  There is no good reason a
> debugger shouldn't be able to stop the system and leave it suspended
> for days if necessary to run down a bug.  I wrote a debugger on SMP
> Netware that worked that way.  The earliest versions of MDB worked
> that way.
>
> kgdb is broken right now because of this.  I am not certain it affects
> all systems out there, but it needs to be fixed.
>
> If you have any ideas on how to code a touch function please send me a
> patch or suggest how it could be done non-obstrusively, otherwise I'll
> have to dive into the timekeeper and fix it myself and learn yet
> another subsystem of Linux and fix it bugs.  A code subsystem that
> crashes because the timer tick is skewed or returns garbage is poorly
> designed IMHO.

Ehrm.  A more productive route in solving this might be to cap the
cycle delta we return from timekeeping_get_delta().

We already do this in the CONFIG_DEBUG_TIMEKEEPING case, but adding a
simple check to the non-debug case should be doable w/o adding too
much overhead to this very hot path.

Something like:
if (delta > tkr->clock->max_cycles)
    delta = tkr->clock->max_cycles;

return delta;
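
i.e. roughly, as a sketch against the current read path (field names assumed
from the 4.x tk_read_base layout, not a tested patch):

static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
{
	cycle_t cycle_now, delta;

	/* read the clocksource and compute cycles since the last update */
	cycle_now = tkr->read(tkr->clock);
	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);

	/* cap the delta so a long stop cannot overflow the ns conversion */
	if (delta > tkr->clock->max_cycles)
		delta = tkr->clock->max_cycles;

	return delta;
}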

thanks
-john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:21         ` Thomas Gleixner
@ 2016-01-20 17:33           ` Jeff Merkey
  2016-01-20 19:34             ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 17:33 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 20 Jan 2016, Jeff Merkey wrote:
>> On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Tue, 19 Jan 2016, Jeff Merkey wrote:
>> >> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>> >> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>> >
>> > And how exactly does that happen?
>> >
>> > 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>> > 		   = 1.70644e+09  sec
>> > 		   = 2.84407e+07  min
>> > 		   = 474011	  hrs
>> > 		   = 19750.5	  days
>> > 		   = 54.1109	  years
>> >
>> > That's the real issue, not what you are trying to 'fix' in
>> > timespec_add_ns()
>> >
>> >> Submitting a patch to fix this after I regress and test it.   Since it
>> >> makes no sense to loop on a simple calculation, fix should be:
>> >>
>> >> static __always_inline void timespec_add_ns(struct timespec *a, u64
>> >> ns)
>> >> {
>> >> 	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
>> >> 	a->tv_nsec = ns;
>> >> }
>> >
>> > No. It's not that simple, because div64_u64_rem() is expensive on 32bit
>> > architectures which have no hardware 64/32 division. And that's going
>> > to
>> > hurt
>> > for the normal tick case where we have at max one iteration.
>> >
>>
>> It's less expensive than a hard coded loop that subtracts in a looping
>> function as a substitute for dividing which is what is there.  What a
>> busted piece of shit .... LOL
>
> Let's talk about shit.
>
> timespec[64]_add_ns() is used for timekeeping and in all normal use cases
> the
> nsec part is less than 1e9 nsec. Even on 64 bit a divide is more expensive
> than the sinlge iteration while loop and its insane expensive on 32bit
> machines which do not have a 64/32 divison in hardware.
>
> The while loop is there for a few corner cases which are a bit larger than
> 1e9
> nsecs, but that's not the case we optimize for.
>
> The case you are creating with your debugger is something completely
> different
> and we never thought about it nor cared about it. Why? Because so far
> nobody
> complained and I never cared about kernel debuggers at all.
>
> What's worse is that your 'fix' does not resolve the underlying issue at
> all. Why? Simply because you tried to fix the symptom and not the root
> cause.
>
> I explained you the root cause and I explained you why that while() loop is
> more efficient than a divide for the case it was written and optimized for.
>
> Instead of reading and understanding what I wrote you teach me that your
> divide is more efficient and call it a busted piece of shit.
>
> Sure you are free to call that a busted piece of shit, but you don't have
> to
> expect that the people who wrote, maintain and understand that code are
> going
> to put up with your attitude.
>
> Thanks,
>
> 	tglx
>
>
>
>

Thomas, so far the only thing I've gotten from you is whining
diatribes about email etiquette, and today is the FIRST time you have
responded to me with any arguments that have technical merit and
actually address the problem.  That's what I am here for, not
anything else.  As Linus has said repeatedly, being nice doesn't seem
to get results.  Being direct and honest does.

I am here to make MDB run as well as I can make it on Linux and to
help make it run as well as possible on x86_64 and ia32.  That's my
only objective here.  That being said, I still want to get this fixed.

Will you help me?

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:32           ` John Stultz
@ 2016-01-20 17:36             ` Jeff Merkey
  2016-01-20 17:42             ` Thomas Gleixner
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 17:36 UTC (permalink / raw)
  To: John Stultz; +Cc: Thomas Gleixner, LKML

On 1/20/16, John Stultz <john.stultz@linaro.org> wrote:
> On Wed, Jan 20, 2016 at 9:16 AM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> On 1/20/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
>>> On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>>>>> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>>>>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>>>
>>>> And how exactly does that happen?
>>>>
>>>> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>>>>                 = 1.70644e+09  sec
>>>>                 = 2.84407e+07  min
>>>>                 = 474011       hrs
>>>>                 = 19750.5      days
>>>>                 = 54.1109      years
>>>>
>>>> That's the real issue, not what you are trying to 'fix' in
>>>> timespec_add_ns()
>>>>
>>
>> I guess I am going to have to become an expert on the timekeeper and
>> learn this subsystem backwards and forwards to code a touch function
>> to keep it from crashing the system.
>>
>> On the 2.6 series kernels (and 2.2) this problem did not exist.  I
>> noticed a lot of these changes came in in the late 2.6 cycles.  Before
>> that time, I could leave the debugger spinning for days and linux
>> worked fine.
>>
>> For people who have to pay developers to develop code on Linux a
>> debugger is almost
>> an essential tool since it saves hundreds of thousands of dollars in
>> development costs.  Not everyone wants to spend money for their
>> employees and engineers to sit around and code review every problem -
>> customers just want their problems fixed -- and fast.  That being
>> said, I am having no lack of people who download and use this debugger
>> and I'm certain kgdb is heavily used by folks doing development.   If
>> kernel development is too hard, people move to something else based on
>> simple economics.
>>
>> That being said, I need to get this fixed.  There is no good reason a
>> debugger shouldn't be able to stop the system and leave it suspended
>> for days if necessary to run down a bug.  I wrote a debugger on SMP
>> Netware that worked that way.  The earliest versions of MDB worked
>> that way.
>>
>> kgdb is broken right now because of this.  I am not certain it affects
>> all systems out there, but it needs to be fixed.
>>
>> If you have any ideas on how to code a touch function please send me a
>> patch or suggest how it could be done non-obstrusively, otherwise I'll
>> have to dive into the timekeeper and fix it myself and learn yet
>> another subsystem of Linux and fix it bugs.  A code subsystem that
>> crashes because the timer tick is skewed or returns garbage is poorly
>> designed IMHO.
>
> Ehrm.  A more productive route in solving this might be to cap the
> cycle delta we return from timekeeping_get_delta().
>
> We already do this in the CONFIG_DEBUG_TIMEKEEPING, but adding a
> simple check it to the non-debug case should be doable w/o adding too
> much overhead to this very hot path.
>
> Something like:
> if (delta > tkr->clock->max_cycles)
>     delta = tkr->clock->max_cycles;
>
> return delta;
>
> thanks
> -john
>


Thank you John.  This is helpful.  Can you send me a patch for this
and I'll test it.  Then I am not touching this code and you guys can
put it in.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:32           ` John Stultz
  2016-01-20 17:36             ` Jeff Merkey
@ 2016-01-20 17:42             ` Thomas Gleixner
  2016-01-20 17:59               ` John Stultz
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-20 17:42 UTC (permalink / raw)
  To: John Stultz; +Cc: Jeff Merkey, LKML

On Wed, 20 Jan 2016, John Stultz wrote:
> Ehrm.  A more productive route in solving this might be to cap the
> cycle delta we return from timekeeping_get_delta().
> 
> We already do this in the CONFIG_DEBUG_TIMEKEEPING, but adding a
> simple check it to the non-debug case should be doable w/o adding too
> much overhead to this very hot path.
> 
> Something like:
> if (delta > tkr->clock->max_cycles)
>     delta = tkr->clock->max_cycles;
> 
> return delta;

Well, you can make CONFIG_KDB select CONFIG_DEBUG_TIMEKEEPING.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:42             ` Thomas Gleixner
@ 2016-01-20 17:59               ` John Stultz
  2016-01-20 18:03                 ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: John Stultz @ 2016-01-20 17:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Jeff Merkey, LKML

On Wed, Jan 20, 2016 at 9:42 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 20 Jan 2016, John Stultz wrote:
>> Ehrm.  A more productive route in solving this might be to cap the
>> cycle delta we return from timekeeping_get_delta().
>>
>> We already do this in the CONFIG_DEBUG_TIMEKEEPING, but adding a
>> simple check it to the non-debug case should be doable w/o adding too
>> much overhead to this very hot path.
>>
>> Something like:
>> if (delta > tkr->clock->max_cycles)
>>     delta = tkr->clock->max_cycles;
>>
>> return delta;
>
> Well, you can make CONFIG_KDB select CONFIG_DEBUG_TIMEKEEPING.

True.  And turning on DEBUG_TIMEKEEPING is probably the easiest thing
for Jeff to try.

Though, there's still the same issue w/ paused VMs. Most of the design
for the timekeeping code has been that it can't properly function if
you block update_wall_time() calls, but it shouldn't kill the box.
With most clocksources, the issue is the counter wraps and we lose
time. But in this case with the TSC it's the *very* large cycle delta
turning into an unexpectedly large nanosecond value.

Hrm.. I do also wonder: the logarithmic accumulation chews through
large cycle deltas efficiently, but it does have some design limits,
so it might also hit the rails and take a while to spin accumulating
time with such large offsets.
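
A much-simplified standalone sketch of that accumulation idea (illustrative
names and types only -- a toy model, not the kernel's actual accumulation
code): consume the pending offset in power-of-two multiples of the base
interval, largest chunk first, so a long stall is eaten in a handful of
iterations rather than one interval at a time.

#include <stdint.h>

static uint64_t accumulate_offset(uint64_t offset, uint64_t interval)
{
	int shift = 0;

	/* find the largest chunk that still fits into the offset */
	while ((interval << (shift + 1)) <= offset)
		shift++;

	while (offset >= interval) {
		while ((interval << shift) > offset)
			shift--;		/* drop to a chunk that fits */
		offset -= interval << shift;	/* consume one chunk */
	}
	return offset;	/* leftover cycles for the next update */
}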

Jeff: Can you try the config option above to let me know if that
avoids the issue? And if not, can you provide some analysis of what
else is going on?

thanks
-john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:59               ` John Stultz
@ 2016-01-20 18:03                 ` Jeff Merkey
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 18:03 UTC (permalink / raw)
  To: John Stultz; +Cc: Thomas Gleixner, LKML

On 1/20/16, John Stultz <john.stultz@linaro.org> wrote:
> On Wed, Jan 20, 2016 at 9:42 AM, Thomas Gleixner <tglx@linutronix.de>
> wrote:
>> On Wed, 20 Jan 2016, John Stultz wrote:
>>> Ehrm.  A more productive route in solving this might be to cap the
>>> cycle delta we return from timekeeping_get_delta().
>>>
>>> We already do this in the CONFIG_DEBUG_TIMEKEEPING, but adding a
>>> simple check it to the non-debug case should be doable w/o adding too
>>> much overhead to this very hot path.
>>>
>>> Something like:
>>> if (delta > tkr->clock->max_cycles)
>>>     delta = tkr->clock->max_cycles;
>>>
>>> return delta;
>>
>> Well, you can make CONFIG_KDB select CONFIG_DEBUG_TIMEKEEPING.
>
> True.  And turning on DEBUG_TIMEKEEPING is probably the easiest thing
> for Jeff to try.
>
> Though, there's still the same issue w/ paused VMs. Most of the design
> for the timekeeping code has been that it can't properly function if
> you block update_wall_time() calls, but it shouldn't kill the box.
> With most clocksources, the issue is the counter wraps and we lose
> time. But in this case with the TSC its the *very* large cycle delta
> turning into a unexpectedly large nanosecond value.
>
> Hrm.. I do also wonder: the logarithmic accumulation chews through
> large cycle deltas efficiently, but it does have some design limits,
> so it  might also hit the rails and take awhile to spin accumulating
> time with such large offsets.
>
> Jeff: Can you try the config option above to let me know if that
> avoids the issue? And if not, can you provide some analysis of what
> else is going on?
>
> thanks
> -john
>

Yes sir.  I am changing the code and preparing to test this right now.
It will be about 4 hours before I have the results.

Thanks to you and Thomas for the help.  I appreciate it.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:33           ` Jeff Merkey
@ 2016-01-20 19:34             ` Thomas Gleixner
  2016-01-20 19:59               ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-20 19:34 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

Jeff,

On Wed, 20 Jan 2016, Jeff Merkey wrote:
> Thomas, so far the only thing I've gotten from you are whining
> diatribes about email etiquette and today is the FIRST time you have
> responded to me with any arguments that have technical merit and
> actually address the problem.    That's what I am here for, not
> anything else.  As Linus has said repeatedly, being nice doesn't seem
> to get results.    Being direct and honest does.
> 
> I am here to make MDB run as well as I can make it on Linux and to
> help make it run as well as possible on c86_64 and ia32.  That's my
> only objective here.   That being said,  I am still wanting to get
> this fixed.
> 
> Will you help me?

If you accept that you do not get to set the tone of the conversation as you
see fit.

I started to give you technical answers way before you started your completely
unjustified ranting. And I have done so before.

I'm neither going to cope with random emails in my private inbox nor with
top-posting and non-trimmed replies simply because that wastes my time. And
certainly I'm not going to cope with your assumption that not being nice is
the right way to go.

It's your decision, not mine.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 19:34             ` Thomas Gleixner
@ 2016-01-20 19:59               ` Jeff Merkey
  2016-01-21  7:09                 ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 19:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> Jeff,
>
> On Wed, 20 Jan 2016, Jeff Merkey wrote:
>> Thomas, so far the only thing I've gotten from you are whining
>> diatribes about email ettiquette and today is the FIRST time you have
>> responded to me with any arguments that have technical merit and
>> actually address the problem.    That's what I am here for, not
>> anything else.  As Linus has said repeatedly, being nice doesn't seem
>> to get results.    Being direct and honest does.
>>
>> I am here to make MDB run as well as I can make it on Linux and to
>> help make it run as well as possible on c86_64 and ia32.  That's my
>> only objective here.   That being said,  I am still wanting to get
>> this fixed.
>>
>> Will you help me?
>
> If you accept, that you are not setting the tone of the conversation as you
> think it fits you.
>
> I started to give you technical answers way before you started your
> completely
> unjustified ranting. And I have done so before.
>
> I'm neither going to cope with random emails in my private inbox nor with
> top-posting and non-trimmed replies simply because that wastes my time. And
> certainly I'm not going to cope with your assumption that not being nice is
> the right way to go.
>
> It's your decision, not mine.
>
> Thanks,
>
> 	tglx
>

Oh good grief -- I guess we all have our days, me included.  I am sorry
to have offended you.  Turn the page, next chapter.  I am already
past this.  BTW, I am preparing my report.  Results are the same as before
with CONFIG_DEBUG_TIMEKEEPING: Hard Lockup again.  I am running a
trace right now to see just how big RAX was this time with the
nanoseconds count.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 19:59               ` Jeff Merkey
@ 2016-01-21  7:09                 ` Jeff Merkey
  2016-01-21  8:18                   ` Jeff Merkey
  2016-01-21 10:12                   ` Thomas Gleixner
  0 siblings, 2 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21  7:09 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

Ok, here's what I found after several hours of debugging and reviewing
this subsystem:

This subsystem plays it pretty loose in doing its math on 64-bit
registers.  I traced through ktime_get_ts64 hundreds of times and
sampled data running through it, and from what I saw, just normal
operation comes dangerously close to causing the RAX register to
wrap.   If the delta gets too big it does wrap, and I observed it
happening with the debugger tracing through the code.  It wraps
because of a sar instruction generated from the inline macros.

The wrap happens in this inline function.

static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
{
	cycle_t delta;
	s64 nsec;

	delta = timekeeping_get_delta(tkr);

	nsec = delta * tkr->mult + tkr->xtime_nsec;
	nsec >>= tkr->shift;    << wrap caused here

	/* If arch requires, add in get_arch_timeoffset() */
	return nsec + arch_gettimeoffset();
}

You only have 64 bits of register and the numbers being calculated
here are big.   By way of example, I observed the following during
normal operations:

delta  (RAX)   |     tkr->mult (RDX)

0x157876            0x65ee27
0xf1855               0x65f158
0x16cf05             0x65f408
0x303bc3              0x65f154

When this bug occurs, it is a different story.

delta  (RAX)    |     tkr->mult (RDX)

0x243283994b8     0x65233

So it goes like this:

	nsec = delta * tkr->mult + tkr->xtime_nsec;
        0x243283994b8 * 0x65233
        imul   rax,rdx = 0xE6A2Ce1f1ea690a8

	nsec >>= tkr->shift;    << wrap caused here
        sar    rax,cl  =  0xFFFFFFE6BFB3B7C3

the sar instruction doesn't just shift, it backfills the signedness of
the value, so this instruction is not doing what the C code is asking
it to do.  I am guessing that somewhere in this mass of macros,
something may have gotten declared wrong or incomplete (declared
signed ?).

The assembler output for this section that calls the macro to
calculate nsecs shows the sar instruction:

	delta = timekeeping_get_delta(tkr);

	nsec = delta * tkr->mult + tkr->xtime_nsec;
     29b:	48 0f af c2          	imul   %rdx,%rax
     29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
<ktime_get_ts64+0xc6>
	nsec >>= tkr->shift;
     2a6:	48 d3 f8             	sar    %cl,%rax
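
A standalone illustration (not kernel code) of how the signedness of the
left-hand operand decides which shift instruction gcc emits on x86-64:

#include <stdint.h>

int64_t shift_signed(int64_t nsec, unsigned int shift)
{
	/* arithmetic shift: gcc emits sar, the sign bit is replicated */
	return nsec >> shift;
}

uint64_t shift_unsigned(uint64_t nsec, unsigned int shift)
{
	/* logical shift: gcc emits shr, the upper bits are zero-filled */
	return nsec >> shift;
}

So a sar here is what one would expect when the value being shifted is a
signed 64-bit variable; the compiler has to assume it may be negative.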


There is another problem with the tkr->read returning an unchanging,
unclearable number when this bug occurs for the delta value.  It
appears that for whatever reason the clock has gone to sleep or gone away
and is no longer updating its counters.

static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
{
	cycle_t cycle_now, delta;

	/* read clocksource */
	cycle_now = tkr->read(tkr->clock); << returns the same value after
this bug happens

	/* calculate the delta since the last update_wall_time */
	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
cycle last is also the same value.

	return delta;
}
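
For reference, clocksource_delta() as used just above boils down to a masked
subtraction -- a sketch of the common non-debug variant, with cycle_t being a
u64 here:

static inline cycle_t clocksource_delta(cycle_t now, cycle_t last, cycle_t mask)
{
	return (now - last) & mask;
}

With the TSC's effectively full 64-bit mask, whatever gap sits between
cycle_now and the stale cycle_last comes straight back as the delta.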

This problem appears to have several things happening at once.
Probably the most concerning is that the assembler output is making
some assumptions about the SIGNEDNESS of the values being shifted and
using sar instead of shr instructions.

I am also concerned about the tkr->read function returning an
unchanging value when this problem shows up.

This subsystem plays it fast and loose with its math, and if the clock
gets delayed or out of sync, it will wrap in the above function and it
will trigger the Hard Lockup detector if the value is large enough in
RAX.  The sanity check for CONFIG_DEBUG_TIMEKEEPING does not catch the
code path where this delta value gets set, because the function that
computes the delta is called from more than just the one place that
checks for an overflow, and the wrap happens underneath it.

I would check how these structs are defined and the vars in them to
see if somewhere they are declared as signed values to the compiler,
because that's what it thinks it was given to compile.

I am still debugging the tkr->read issue.  I have determined the cause
of the wrap in the assembler.  As to why the gcc compiler is outputting
this instruction here is something to be determined.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21  7:09                 ` Jeff Merkey
@ 2016-01-21  8:18                   ` Jeff Merkey
  2016-01-21  9:08                     ` Jeff Merkey
  2016-01-21 10:12                   ` Thomas Gleixner
  1 sibling, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21  8:18 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/21/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> Ok, here's what I found after several hours of debugging and reviewing
> this subsystem:
>
> This subsystem plays is pretty loose in doing its math on 64 bit
> registers.  I traced through ktime_get_ts64 hundreds of times and
> sampled data running through it and from what I saw, just normal
> operations comes dangerously close to causing the RAX register to
> wrap.   If the delta gets too big it does wrap and I observed it
> happening with the debugger tracing through the code.  It wraps
> because of a sar instruction generated from the inline macros.
>
> The wrap happens in this inline function.
>
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
> 	cycle_t delta;
> 	s64 nsec;
>
> 	delta = timekeeping_get_delta(tkr);
>
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
> 	nsec >>= tkr->shift;    << wrap caused here
>
> 	/* If arch requires, add in get_arch_timeoffset() */
> 	return nsec + arch_gettimeoffset();
> }
>
> You only have 64 bits of register and the numbers being calculated
> here are big.   By way of example, I observed the following during
> normal operations:
>
> delta  (RAX)   |     tkr->mult (RDX)
>
> 0x157876            0x65ee27
> 0xf1855               0x65f158
> 0x16cf05             0x65f408
> 303bc3                0x65f154
>
> When this bug occurs different story.
>
> delta  (RAX)    |     tkr->mult (RDX)
>
> 0x243283994b8     0x65233
>
> So it goes like this:
>
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>         0x243283994b8 * 0x65233
>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
>
> 	nsec >>= tkr->shift;    << wrap caused here
>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3
>
> the sar instruction doesn't just shift, it backfills the signedness of
> the value, so this instruction is not doing what the C code is asking
> it to do.  I am guessing that somewhere in this mass of macros,
> something may have gotten declared wrong or incomplete (declared
> signed ?).
>
> The assembler output for this section that calls the macro to
> calculate nsecs shows the sar instruction:
>
> 	delta = timekeeping_get_delta(tkr);
>
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>      29b:	48 0f af c2          	imul   %rdx,%rax
>      29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
> <ktime_get_ts64+0xc6>
> 	nsec >>= tkr->shift;
>      2a6:	48 d3 f8             	sar    %cl,%rax
>
>
> There is another problem with the tkr->read returning an unchanging,
> unclearable number when this bug occurs for the delta value.  I
> appears for whatever reason the clock has gone to sleep or gone away
> and is no longer updating its counters.
>
> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
> {
> 	cycle_t cycle_now, delta;
>
> 	/* read clocksource */
> 	cycle_now = tkr->read(tkr->clock); << returns the same value after
> this bug happens
>
> 	/* calculate the delta since the last update_wall_time */
> 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
> cycle last is also the same value.
>
> 	return delta;
> }
>
> This problem appears to have several things happening at once.
> Probably the most concerning is that the assembler output is making
> some assumptions about the SIGNEDNESS of the values being shifted and
> using sar instead of shl instructions.
>
> I am also concerned about the thr->read function returning an
> unchanging value when this problem shows up.
>
> This subsystem plays it fast and loose with its math, and if the clock
> gets delayed or out of sync, it will wrap in the above function and it
> will trigger the Hard Lockup detector if the value is large enough in
> RAX.  The sanity check for CONFIG_DEBUG_TIMEKEEPER does not catch the
> code path where this delta value gets set because the function to
> update the delta is called in more then just in that function that
> checks for an overflow and the wrap case happens underneath it.
>
> I would check how these structs are defined and the vars in them to
> see if somewhere they are declared as signed values to the compiler,
> because that's what it thinks it was given to compile.
>
> I am still debugging the thr->read issue.  I have determined the cause
> of the wrap in the assembler.  As to why the gcc compiler is outputing
> this instruction here is something to be determined.
>
> Jeff
>

s64 = signed 64.    right in the code.  wow.

The iter_div64 functions all assume unsigned values, so this needs to
be patched.  You can't do a signed shift on a value that the rest of
the math will then interpret as unsigned -- it just makes a huge number
that wraps.  This patch is a one-liner after all.
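
For context, the iterative divide in question follows roughly this pattern --
a simplified standalone sketch modeled on the __iter_div_u64_rem() style of
helper the remark above presumably refers to, not copied from the tree.  Feed
it a sar-mangled "nanosecond" value like the 0xFFFFFFE6... one seen earlier,
reinterpreted as unsigned, and the loop needs on the order of 10^10 passes,
which is what the hard lockup detector then catches:

#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* simplified sketch of the iterative seconds/nanoseconds split */
static uint64_t iter_div_ns(uint64_t nsec, uint64_t *remainder)
{
	uint64_t secs = 0;

	/* one subtraction per pass, so huge inputs spin for a very long time */
	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		secs++;
	}
	*remainder = nsec;
	return secs;
}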

:-)

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21  8:18                   ` Jeff Merkey
@ 2016-01-21  9:08                     ` Jeff Merkey
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21  9:08 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

This bug has been fixed as of linux-next-20160121.  I just checked, so
it looks like it's handled.

	delta = timekeeping_get_delta(tkr);

	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
     251:	8b 15 00 00 00 00    	mov    0x0(%rip),%edx        # 257
<ktime_get_ts64+0x77>
     257:	49 0f 45 c4          	cmovne %r12,%rax
     25b:	48 0f af c2          	imul   %rdx,%rax
     25f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 266
<ktime_get_ts64+0x86>
     266:	48 d3 e8             	shr    %cl,%rax    << correct
	

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21  7:09                 ` Jeff Merkey
  2016-01-21  8:18                   ` Jeff Merkey
@ 2016-01-21 10:12                   ` Thomas Gleixner
  2016-01-21 15:46                     ` Jeff Merkey
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-21 10:12 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

Jeff,

On Thu, 21 Jan 2016, Jeff Merkey wrote:
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
> 	cycle_t delta;
> 	s64 nsec;
> 
> 	delta = timekeeping_get_delta(tkr);
> 
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
> 	nsec >>= tkr->shift;    << wrap caused here
> 
> 	/* If arch requires, add in get_arch_timeoffset() */
> 	return nsec + arch_gettimeoffset();
> }
> 
> You only have 64 bits of register and the numbers being calculated
> here are big.   By way of example, I observed the following during
> normal operations:
> 
> delta  (RAX)   |     tkr->mult (RDX)
> 
> 0x157876            0x65ee27
> 0xf1855               0x65f158
> 0x16cf05             0x65f408
> 303bc3                0x65f154
> 
> When this bug occurs different story.
> 
> delta  (RAX)    |     tkr->mult (RDX)
> 
> 0x243283994b8     0x65233
> 
> So it goes like this:
> 
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>         0x243283994b8 * 0x65233
>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
> 
> 	nsec >>= tkr->shift;    << wrap caused here
>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3

That SAR is simply wrong here. It must be an SHR, and it is at least when I'm
looking at the assembly on my machine.

> the sar instruction doesn't just shift, it backfills the signedness of
> the value, so this instruction is not doing what the C code is asking
> it to do.  I am guessing that somewhere in this mass of macros,
> something may have gotten declared wrong or incomplete (declared
> signed ?).

There is no macro involved.

timekeeping_get_ns
{
	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
}

> The assembler output for this section that calls the macro to
> calculate nsecs shows the sar instruction:
> 
> 	delta = timekeeping_get_delta(tkr);
> 
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>      29b:	48 0f af c2          	imul   %rdx,%rax
>      29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
> <ktime_get_ts64+0xc6>
> 	nsec >>= tkr->shift;
>      2a6:	48 d3 f8             	sar    %cl,%rax

And this is fundamentally wrong. Why is the compiler emitting SAR instead of
SHR here? Here is the assembly output from my kernel:

	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
     27e:	48 0f af c5          	imul   %rbp,%rax
     282:	48 01 d8             	add    %rbx,%rax
     285:	48 d3 e8             	shr    %cl,%rax

	} while (read_seqcount_retry(&tk_core.seq, seq));


So the first thing which needs to be figured out is WHY this results in a SAR
on your compiler.

> There is another problem with the tkr->read returning an unchanging,
> unclearable number when this bug occurs for the delta value.  I
> appears for whatever reason the clock has gone to sleep or gone away
> and is no longer updating its counters.
> 
> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
> {
> 	cycle_t cycle_now, delta;
> 
> 	/* read clocksource */
> 	cycle_now = tkr->read(tkr->clock); << returns the same value after
> this bug happens
> 
> 	/* calculate the delta since the last update_wall_time */
> 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
> cycle last is also the same value.
> 
> 	return delta;
> }

If that value does not change, then the timekeeping update is not
running. That might happen because the timer interrupt is not happening or
whatever got wrecked.
 
> I would check how these structs are defined and the vars in them to
> see if somewhere they are declared as signed values to the compiler,
> because that's what it thinks it was given to compile.

Sure. Here you go:

	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;

delta, mult, xtime_nsec and shift are unsigned. The only signed value is nsec.

Does that issue go away if you apply the patch below?

Thanks,

	tglx

8<-----------
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 34b4cedfa80d..d405bcdf9d40 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return 0; }
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
 	cycle_t delta;
-	s64 nsec;
+	u64 nsec;
 
 	delta = timekeeping_get_delta(tkr);
 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21 10:12                   ` Thomas Gleixner
@ 2016-01-21 15:46                     ` Jeff Merkey
  2016-01-21 16:57                       ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21 15:46 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

Hi Thomas

That patch works and so does the patch currently in linux-next.  Yeah!

The code on your machine is apparently from the linux-next tree.  The
code in v4.4 is busted, and that was the build I was using since that's
the last build I have MDB released on.  If you compare v4.4 vs. linux-next
you will see someone has already patched this problem -- it's just not in
Linus' tree yet.

v4.4

static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
{
	cycle_t delta;
	s64 nsec;

	delta = timekeeping_get_delta(tkr);

	nsec = delta * tkr->mult + tkr->xtime_nsec;
	nsec >>= tkr->shift;  << causes sar to be used

	/* If arch requires, add in get_arch_timeoffset() */
	return nsec + arch_gettimeoffset();
}

linux-next

static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
{
	cycle_t delta;
	u64 nsec;

	delta = timekeeping_get_delta(tkr);

	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift; << shr used here

	/* If arch requires, add in get_arch_timeoffset() */
	return (s64)(nsec + arch_gettimeoffset());
}
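
In C terms the difference between the two forms is just which operand the
shift sees.  An illustrative standalone snippet (hypothetical parameter
names, mirroring the two versions above):

#include <stdint.h>

int64_t v4_4_form(uint64_t delta, uint32_t mult, uint64_t xtime_nsec, uint32_t shift)
{
	int64_t nsec = delta * mult + xtime_nsec;
	return nsec >> shift;		/* signed operand: arithmetic shift (sar) */
}

int64_t next_form(uint64_t delta, uint32_t mult, uint64_t xtime_nsec, uint32_t shift)
{
	/* shift applied to the unsigned expression: logical shift (shr) */
	uint64_t nsec = (delta * mult + xtime_nsec) >> shift;
	return (int64_t)nsec;
}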


This bug goes way back in the Linux revisions, so I have to backport
this patch to v3.6 in the MDB patch series.  This bug was
introduced in the late linux 3.6 kernels and exists in all of them up
to v4.4.

Jeff


On 1/21/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> Jeff,
>
> On Thu, 21 Jan 2016, Jeff Merkey wrote:
>> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>> {
>> 	cycle_t delta;
>> 	s64 nsec;
>>
>> 	delta = timekeeping_get_delta(tkr);
>>
>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>> 	nsec >>= tkr->shift;    << wrap caused here
>>
>> 	/* If arch requires, add in get_arch_timeoffset() */
>> 	return nsec + arch_gettimeoffset();
>> }
>>
>> You only have 64 bits of register and the numbers being calculated
>> here are big.   By way of example, I observed the following during
>> normal operations:
>>
>> delta  (RAX)   |     tkr->mult (RDX)
>>
>> 0x157876            0x65ee27
>> 0xf1855               0x65f158
>> 0x16cf05             0x65f408
>> 303bc3                0x65f154
>>
>> When this bug occurs different story.
>>
>> delta  (RAX)    |     tkr->mult (RDX)
>>
>> 0x243283994b8     0x65233
>>
>> So it goes like this:
>>
>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>         0x243283994b8 * 0x65233
>>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
>>
>> 	nsec >>= tkr->shift;    << wrap caused here
>>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3
>
> That SAR is siomply wrong here. It must be an SHR and it is at least when
> I'm
> looking at the assembly of my machine.
>
>> the sar instruction doesn't just shift, it backfills the signedness of
>> the value, so this instruction is not doing what the C code is asking
>> it to do.  I am guessing that somewhere in this mass of macros,
>> something may have gotten declared wrong or incomplete (declared
>> signed ?).
>
> There is no macro involved.
>
> timekeeping_get_ns
> {
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
> }
>
>> The assembler output for this section that calls the macro to
>> calculate nsecs shows the sar instruction:
>>
>> 	delta = timekeeping_get_delta(tkr);
>>
>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>      29b:	48 0f af c2          	imul   %rdx,%rax
>>      29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
>> <ktime_get_ts64+0xc6>
>> 	nsec >>= tkr->shift;
>>      2a6:	48 d3 f8             	sar    %cl,%rax
>
> And this is fundamentally wrong. Why is the compiler emitting SAR instead
> of
> SHR here? Here is the assembly output from my kernel:
>
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>      27e:	48 0f af c5          	imul   %rbp,%rax
>      282:	48 01 d8             	add    %rbx,%rax
>      285:	48 d3 e8             	shr    %cl,%rax
>
> 	} while (read_seqcount_retry(&tk_core.seq, seq));
>
>
> So the first thing which needs to be figured out is WHY this results in a
> SAR
> on your compiler.
>
>> There is another problem with the tkr->read returning an unchanging,
>> unclearable number when this bug occurs for the delta value.  I
>> appears for whatever reason the clock has gone to sleep or gone away
>> and is no longer updating its counters.
>>
>> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
>> {
>> 	cycle_t cycle_now, delta;
>>
>> 	/* read clocksource */
>> 	cycle_now = tkr->read(tkr->clock); << returns the same value after
>> this bug happens
>>
>> 	/* calculate the delta since the last update_wall_time */
>> 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
>> cycle last is also the same value.
>>
>> 	return delta;
>> }
>
> If that value does not change, then the timekeeping update is not
> running. That might happen because the timer interrupt is not happening or
> whatever got wreckaged.
>
>> I would check how these structs are defined and the vars in them to
>> see if somewhere they are declared as signed values to the compiler,
>> because that's what it thinks it was given to compile.
>
> Sure. Here you go:
>
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>
> delta, mult, xtime_nsec and shift are unsigned. The only signed value is
> nsec.
>
> Does that issue go away if you apply the patch below?
>
> Thanks,
>
> 	tglx
>
> 8<-----------
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 34b4cedfa80d..d405bcdf9d40 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return 0;
> }
>  static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>  {
>  	cycle_t delta;
> -	s64 nsec;
> +	u64 nsec;
>
>  	delta = timekeeping_get_delta(tkr);
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21 15:46                     ` Jeff Merkey
@ 2016-01-21 16:57                       ` Jeff Merkey
  2016-01-21 17:00                         ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21 16:57 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

One other item worth mentioning is the effect of this bug on the user
space daemons.  Since this bug is in all kernels from v3.6 through v4.4,
folks will see the following behavior from systemd when this bug fires
off and the value wraps:

id 0a0c809bcdc0812145d9c35456e46335650601b7
reason:         systemd-journald killed by SIGABRT
time:           Sun 10 Jan 2016 11:56:16 AM MST
cmdline:        /usr/lib/systemd/systemd-journald
package:        systemd-219-19.el7
uid:            0 (root)
count:          2
Directory:      /var/tmp/abrt/ccpp-2016-01-10-11:56:16-6281

id 38f7bb372d0c14cb912e9d225fb474b931889e77
reason:         __epoll_wait_nocancel(): systemd-logind killed by SIGABRT
time:           Fri 08 Jan 2016 01:33:42 PM MST
cmdline:        /usr/lib/systemd/systemd-logind
package:        systemd-219-19.el7
uid:            0 (root)
count:          39
Directory:      /var/tmp/abrt/ccpp-2016-01-08-13:33:42-630
Reported:
https://retrace.fedoraproject.org/faf/reports/bthash/307b26a77cc6d5005ce2fdf18ff010fe3dc94401

So if folks see aborts in systemd that look like the above, it's this
bug.  I traced the systemd crash as well, and it's caused by the
kernel returning garbage to systemd via system calls when that RAX
value wraps.

Jeff.



On 1/21/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> Hi Thomas
>
> That patch works and so does the patch currently in linux-next.  Yeah !
>
> The code on your machine is apparently from the linux-next tree. In
> 4.4 is busted and that was the build I was using since that's the last
> build I have MDB released on.  If you compare v4.4 vs. linux-next you
> will see someone already has patched this problem -- its just not in
> Linus tree yet.
>
> v4.4
>
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
> 	cycle_t delta;
> 	s64 nsec;
>
> 	delta = timekeeping_get_delta(tkr);
>
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
> 	nsec >>= tkr->shift;  << causes sar to be used
>
> 	/* If arch requires, add in get_arch_timeoffset() */
> 	return nsec + arch_gettimeoffset();
> }
>
> linux-next
>
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
> 	cycle_t delta;
> 	u64 nsec;
>
> 	delta = timekeeping_get_delta(tkr);
>
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift; << shr used
> here
>
> 	/* If arch requires, add in get_arch_timeoffset() */
> 	return (s64)(nsec + arch_gettimeoffset());
> }
>
>
> This bug goes way back in the linux revs, so I have to retro apply
> this patch back to v3.6 in the MDB patch series.  This bug was
> introduced in the late linux 3.6 kernels and exists in all of them up
> to v4.4.
>
> Jeff
>
>
> On 1/21/16, Thomas Gleixner <tglx@linutronix.de> wrote:
>> Jeff,
>>
>> On Thu, 21 Jan 2016, Jeff Merkey wrote:
>>> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>>> {
>>> 	cycle_t delta;
>>> 	s64 nsec;
>>>
>>> 	delta = timekeeping_get_delta(tkr);
>>>
>>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>> 	nsec >>= tkr->shift;    << wrap caused here
>>>
>>> 	/* If arch requires, add in get_arch_timeoffset() */
>>> 	return nsec + arch_gettimeoffset();
>>> }
>>>
>>> You only have 64 bits of register and the numbers being calculated
>>> here are big.   By way of example, I observed the following during
>>> normal operations:
>>>
>>> delta  (RAX)   |     tkr->mult (RDX)
>>>
>>> 0x157876            0x65ee27
>>> 0xf1855               0x65f158
>>> 0x16cf05             0x65f408
>>> 303bc3                0x65f154
>>>
>>> When this bug occurs different story.
>>>
>>> delta  (RAX)    |     tkr->mult (RDX)
>>>
>>> 0x243283994b8     0x65233
>>>
>>> So it goes like this:
>>>
>>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>>         0x243283994b8 * 0x65233
>>>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
>>>
>>> 	nsec >>= tkr->shift;    << wrap caused here
>>>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3
>>
>> That SAR is siomply wrong here. It must be an SHR and it is at least when
>> I'm
>> looking at the assembly of my machine.
>>
>>> the sar instruction doesn't just shift, it backfills the signedness of
>>> the value, so this instruction is not doing what the C code is asking
>>> it to do.  I am guessing that somewhere in this mass of macros,
>>> something may have gotten declared wrong or incomplete (declared
>>> signed ?).
>>
>> There is no macro involved.
>>
>> timekeeping_get_ns
>> {
>> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>> }
>>
>>> The assembler output for this section that calls the macro to
>>> calculate nsecs shows the sar instruction:
>>>
>>> 	delta = timekeeping_get_delta(tkr);
>>>
>>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>>      29b:	48 0f af c2          	imul   %rdx,%rax
>>>      29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
>>> <ktime_get_ts64+0xc6>
>>> 	nsec >>= tkr->shift;
>>>      2a6:	48 d3 f8             	sar    %cl,%rax
>>
>> And this is fundamentally wrong. Why is the compiler emitting SAR instead
>> of
>> SHR here? Here is the assembly output from my kernel:
>>
>> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>>      27e:	48 0f af c5          	imul   %rbp,%rax
>>      282:	48 01 d8             	add    %rbx,%rax
>>      285:	48 d3 e8             	shr    %cl,%rax
>>
>> 	} while (read_seqcount_retry(&tk_core.seq, seq));
>>
>>
>> So the first thing which needs to be figured out is WHY this results in a
>> SAR
>> on your compiler.
>>
>>> There is another problem with the tkr->read returning an unchanging,
>>> unclearable number when this bug occurs for the delta value.  I
>>> appears for whatever reason the clock has gone to sleep or gone away
>>> and is no longer updating its counters.
>>>
>>> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
>>> {
>>> 	cycle_t cycle_now, delta;
>>>
>>> 	/* read clocksource */
>>> 	cycle_now = tkr->read(tkr->clock); << returns the same value after
>>> this bug happens
>>>
>>> 	/* calculate the delta since the last update_wall_time */
>>> 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
>>> cycle last is also the same value.
>>>
>>> 	return delta;
>>> }
>>
>> If that value does not change, then the timekeeping update is not
>> running. That might happen because the timer interrupt is not happening
>> or
>> whatever got wreckaged.
>>
>>> I would check how these structs are defined and the vars in them to
>>> see if somewhere they are declared as signed values to the compiler,
>>> because that's what it thinks it was given to compile.
>>
>> Sure. Here you go:
>>
>> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>>
>> delta, mult, xtime_nsec and shift are unsigned. The only signed value is
>> nsec.
>>
>> Does that issue go away if you apply the patch below?
>>
>> Thanks,
>>
>> 	tglx
>>
>> 8<-----------
>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>> index 34b4cedfa80d..d405bcdf9d40 100644
>> --- a/kernel/time/timekeeping.c
>> +++ b/kernel/time/timekeeping.c
>> @@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return
>> 0;
>> }
>>  static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>>  {
>>  	cycle_t delta;
>> -	s64 nsec;
>> +	u64 nsec;
>>
>>  	delta = timekeeping_get_delta(tkr);
>>
>>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21 16:57                       ` Jeff Merkey
@ 2016-01-21 17:00                         ` Jeff Merkey
  2016-01-21 17:07                           ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21 17:00 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

I am switching email clients from gmail.  This stupid email client of
theirs constantly top posts and sends multiple email replies when you
try to send a reply.  I apologize yet again for the top posting.  It's
gmail doing it.  Unless I delete the entire previous message it will
always top post.

Sorry about that; I finally figured out that bug too.

:-)

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21 17:00                         ` Jeff Merkey
@ 2016-01-21 17:07                           ` Jeff Merkey
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21 17:07 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

Hi Thomas,

I figured out where the top-posting is coming from -- gmail web client
does it automatically.  Switching email clients.  I will still use
gmail for SMTP forwarding but their busted piece of crap web email
client automatically top posts when you use the "reply" window and
it's gone.

I am not intentionally top posting, I am switching clients today so
this problem does not happen again.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-01-21 17:07 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-19  1:59 [BUG REPORT] ktime_get_ts64 causes Hard Lockup Jeff Merkey
2016-01-19  2:16 ` Jeff Merkey
2016-01-19  2:31   ` Jeff Merkey
2016-01-19  9:50 ` Thomas Gleixner
2016-01-19 15:37 ` Jeff Merkey
2016-01-19 22:00   ` Jeff Merkey
2016-01-20  0:59     ` Jeff Merkey
2016-01-20  9:21     ` Thomas Gleixner
2016-01-20 14:26       ` Thomas Gleixner
2016-01-20 16:40         ` Jeff Merkey
2016-01-20 16:53       ` Jeff Merkey
2016-01-20 17:16         ` Jeff Merkey
2016-01-20 17:32           ` John Stultz
2016-01-20 17:36             ` Jeff Merkey
2016-01-20 17:42             ` Thomas Gleixner
2016-01-20 17:59               ` John Stultz
2016-01-20 18:03                 ` Jeff Merkey
2016-01-20 17:21         ` Thomas Gleixner
2016-01-20 17:33           ` Jeff Merkey
2016-01-20 19:34             ` Thomas Gleixner
2016-01-20 19:59               ` Jeff Merkey
2016-01-21  7:09                 ` Jeff Merkey
2016-01-21  8:18                   ` Jeff Merkey
2016-01-21  9:08                     ` Jeff Merkey
2016-01-21 10:12                   ` Thomas Gleixner
2016-01-21 15:46                     ` Jeff Merkey
2016-01-21 16:57                       ` Jeff Merkey
2016-01-21 17:00                         ` Jeff Merkey
2016-01-21 17:07                           ` Jeff Merkey
