* [BUG REPORT] ktime_get_ts64 causes Hard Lockup
@ 2016-01-19  1:59 Jeff Merkey
  2016-01-19  2:16 ` Jeff Merkey
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-19  1:59 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Thomas Gleixner

If I suspend all processors in the NMI handler while in the debugger
console at an active breakpoint, and then I leave the system in the
console for about 20 minutes with the processors spinning, then exit
and release the processors, ktime_get_ts64 enters an infinite loop and
triggers the Hard Lockup detector.   Nice having a debugger to just
step into the NMI handlers and find this stuff now.

The offending code is:

(2)> u ktime_get_ts64+9d
<<<  this section gets stuck in an infinite loop.
0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
0xffffffff810ede23 83C201          add    edx,0x1
0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
(0xffffffff810ede1d) (up)
<<<
0xffffffff810ede2e 4801CA          add    rdx,rcx
0xffffffff810ede31 48894308        mov    QWORD PTR
[rbx+8]=0xFFFFFFFFA0774E01,rax
0xffffffff810ede35 488913          mov    QWORD PTR [rbx]=0xFFFFFFFFA0742000,rdx
0xffffffff810ede38 5B              pop    rbx
0xffffffff810ede39 415C            pop    r12
0xffffffff810ede3b 415D            pop    r13
0xffffffff810ede3d 5D              pop    rbp
0xffffffff810ede3e C3              ret
0xffffffff810ede3f 31D2            xor    edx,edx
0xffffffff810ede41 EBEB            jmp    ktime_get_ts64+0xae
(0xffffffff810ede2e) (up)
0xffffffff810ede43 BE11030000      mov    esi,0x311
0xffffffff810ede48 48C7C751C38D81  mov    rdi,0xffffffff818dc351
0xffffffff810ede4f E82C12F9FF      call   warn_slowpath_null
0xffffffff810ede54 E946FFFFFF      jmp    ktime_get_ts64+0x1f
(0xffffffff810edd9f) (up)
0xffffffff810ede59 F390            pause
0xffffffff810ede5b E942FFFFFF      jmp    ktime_get_ts64+0x22
(0xffffffff810edda2) (up)

Same code in GDB format

(2)> id ktime_get_ts64+9d
<<<
0xffffffff810ede1d ktime_get_ts64+0x9d:    sub    $0x3b9aca00,%rax
0xffffffff810ede23 ktime_get_ts64+0xa3:    add    $0x1,%edx
0xffffffff810ede26 ktime_get_ts64+0xa6:    cmp    $0x3b9ac9ff,%rax
0xffffffff810ede2c ktime_get_ts64+0xac:    ja     0xffffffff810ede1d
ktime_get_ts64+0x9d (up)
<<<
0xffffffff810ede2e ktime_get_ts64+0xae:    add    %rcx,%rdx
0xffffffff810ede31 ktime_get_ts64+0xb1:    mov    %rax,0x8(%rbx)
0xffffffff810ede35 ktime_get_ts64+0xb5:    mov    %rdx,(%rbx)
0xffffffff810ede38 ktime_get_ts64+0xb8:    pop    %rbx
0xffffffff810ede39 ktime_get_ts64+0xb9:    pop    %r12
0xffffffff810ede3b ktime_get_ts64+0xbb:    pop    %r13
0xffffffff810ede3d ktime_get_ts64+0xbd:    pop    %rbp
0xffffffff810ede3e ktime_get_ts64+0xbe:    retq
0xffffffff810ede3f ktime_get_ts64+0xbf:    xor    %edx,%edx
0xffffffff810ede41 ktime_get_ts64+0xc1:    jmp    0xffffffff810ede2e
ktime_get_ts64+0xae (up)
0xffffffff810ede43 ktime_get_ts64+0xc3:    mov    $0x311,%esi
0xffffffff810ede48 ktime_get_ts64+0xc8:    mov    $0xffffffff818dc351,%rdi
0xffffffff810ede4f ktime_get_ts64+0xcf:    callq  0xffffffff8107f080
warn_slowpath_null
0xffffffff810ede54 ktime_get_ts64+0xd4:    jmpq   0xffffffff810edd9f
ktime_get_ts64+0x1f (up)
0xffffffff810ede59 ktime_get_ts64+0xd9:    pause
0xffffffff810ede5b ktime_get_ts64+0xdb:    jmpq   0xffffffff810edda2
ktime_get_ts64+0x22 (up)
(2)> g


What is strange is the math it's doing.  It subtracts a fixed
value from rax and then compares the result.  It looks like this is a case
where the value may have wrapped and the code just wasn't set up to
handle it.

0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
0xffffffff810ede23 83C201          add    edx,0x1
0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
(0xffffffff810ede1d) (up)

The C code is:


/**
 * ktime_get_ts64 - get the monotonic clock in timespec64 format
 * @ts:		pointer to timespec variable
 *
 * The function calculates the monotonic clock from the realtime
 * clock and the wall_to_monotonic offset and stores the result
 * in normalized timespec64 format in the variable pointed to by @ts.
 */
void ktime_get_ts64(struct timespec64 *ts)
{
	struct timekeeper *tk = &tk_core.timekeeper;
	struct timespec64 tomono;
	s64 nsec;
	unsigned int seq;

	WARN_ON(timekeeping_suspended);

	do {
		seq = read_seqcount_begin(&tk_core.seq);
		ts->tv_sec = tk->xtime_sec;
		nsec = timekeeping_get_ns(&tk->tkr_mono);
		tomono = tk->wall_to_monotonic;
<<<
	} while (read_seqcount_retry(&tk_core.seq, seq));
<<<
	ts->tv_sec += tomono.tv_sec;
	ts->tv_nsec = 0;
	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
}
EXPORT_SYMBOL_GPL(ktime_get_ts64);

Any ideas how to fix this problem?  That do {} while gets stuck there.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19  1:59 [BUG REPORT] ktime_get_ts64 causes Hard Lockup Jeff Merkey
@ 2016-01-19  2:16 ` Jeff Merkey
  2016-01-19  2:31   ` Jeff Merkey
  2016-01-19  9:50 ` Thomas Gleixner
  2016-01-19 15:37 ` Jeff Merkey
  2 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-19  2:16 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Thomas Gleixner

On 1/18/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> If I suspend all processors in the NMI handler while in the debugger
> console at an active breakpoint, and then I leave the system in the
> console for about 20 minutes with the processors spinning, then exit
> and release the processors, ktime_get_ts64 enters an infinite loop and
> triggers the Hard Lockup detector.   Nice having a debugger to just
> step into the NMI handlers and find this stuff now.
>
> The offending code is:
>
> (2)> u ktime_get_ts64+9d
> <<<  this section gets stuck in an infinite loop.
> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
> 0xffffffff810ede23 83C201          add    edx,0x1
> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
> (0xffffffff810ede1d) (up)
> <<<
> 0xffffffff810ede2e 4801CA          add    rdx,rcx
> 0xffffffff810ede31 48894308        mov    QWORD PTR
> [rbx+8]=0xFFFFFFFFA0774E01,rax
> 0xffffffff810ede35 488913          mov    QWORD PTR
> [rbx]=0xFFFFFFFFA0742000,rdx
> 0xffffffff810ede38 5B              pop    rbx
> 0xffffffff810ede39 415C            pop    r12
> 0xffffffff810ede3b 415D            pop    r13
> 0xffffffff810ede3d 5D              pop    rbp
> 0xffffffff810ede3e C3              ret
> 0xffffffff810ede3f 31D2            xor    edx,edx
> 0xffffffff810ede41 EBEB            jmp    ktime_get_ts64+0xae
> (0xffffffff810ede2e) (up)
> 0xffffffff810ede43 BE11030000      mov    esi,0x311
> 0xffffffff810ede48 48C7C751C38D81  mov    rdi,0xffffffff818dc351
> 0xffffffff810ede4f E82C12F9FF      call   warn_slowpath_null
> 0xffffffff810ede54 E946FFFFFF      jmp    ktime_get_ts64+0x1f
> (0xffffffff810edd9f) (up)
> 0xffffffff810ede59 F390            pause
> 0xffffffff810ede5b E942FFFFFF      jmp    ktime_get_ts64+0x22
> (0xffffffff810edda2) (up)
>
> Same code in GDB format
>
> (2)> id ktime_get_ts64+9d
> <<<
> 0xffffffff810ede1d ktime_get_ts64+0x9d:    sub    $0x3b9aca00,%rax
> 0xffffffff810ede23 ktime_get_ts64+0xa3:    add    $0x1,%edx
> 0xffffffff810ede26 ktime_get_ts64+0xa6:    cmp    $0x3b9ac9ff,%rax
> 0xffffffff810ede2c ktime_get_ts64+0xac:    ja     0xffffffff810ede1d
> ktime_get_ts64+0x9d (up)
> <<<
> 0xffffffff810ede2e ktime_get_ts64+0xae:    add    %rcx,%rdx
> 0xffffffff810ede31 ktime_get_ts64+0xb1:    mov    %rax,0x8(%rbx)
> 0xffffffff810ede35 ktime_get_ts64+0xb5:    mov    %rdx,(%rbx)
> 0xffffffff810ede38 ktime_get_ts64+0xb8:    pop    %rbx
> 0xffffffff810ede39 ktime_get_ts64+0xb9:    pop    %r12
> 0xffffffff810ede3b ktime_get_ts64+0xbb:    pop    %r13
> 0xffffffff810ede3d ktime_get_ts64+0xbd:    pop    %rbp
> 0xffffffff810ede3e ktime_get_ts64+0xbe:    retq
> 0xffffffff810ede3f ktime_get_ts64+0xbf:    xor    %edx,%edx
> 0xffffffff810ede41 ktime_get_ts64+0xc1:    jmp    0xffffffff810ede2e
> ktime_get_ts64+0xae (up)
> 0xffffffff810ede43 ktime_get_ts64+0xc3:    mov    $0x311,%esi
> 0xffffffff810ede48 ktime_get_ts64+0xc8:    mov    $0xffffffff818dc351,%rdi
> 0xffffffff810ede4f ktime_get_ts64+0xcf:    callq  0xffffffff8107f080
> warn_slowpath_null
> 0xffffffff810ede54 ktime_get_ts64+0xd4:    jmpq   0xffffffff810edd9f
> ktime_get_ts64+0x1f (up)
> 0xffffffff810ede59 ktime_get_ts64+0xd9:    pause
> 0xffffffff810ede5b ktime_get_ts64+0xdb:    jmpq   0xffffffff810edda2
> ktime_get_ts64+0x22 (up)
> (2)> g
>
>
> What is strange is the math its doing.  It is subtracting a fixed
> value from rax then comparing the value.  It looks like this is case
> where the value may have wrapped and the code just wasn;t setup to
> handle it.
>
> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
> 0xffffffff810ede23 83C201          add    edx,0x1
> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
> (0xffffffff810ede1d) (up)
>
> The C code is:
>
>
> /**
>  * ktime_get_ts64 - get the monotonic clock in timespec64 format
>  * @ts:		pointer to timespec variable
>  *
>  * The function calculates the monotonic clock from the realtime
>  * clock and the wall_to_monotonic offset and stores the result
>  * in normalized timespec64 format in the variable pointed to by @ts.
>  */
> void ktime_get_ts64(struct timespec64 *ts)
> {
> 	struct timekeeper *tk = &tk_core.timekeeper;
> 	struct timespec64 tomono;
> 	s64 nsec;
> 	unsigned int seq;
>
> 	WARN_ON(timekeeping_suspended);
>
> 	do {
> 		seq = read_seqcount_begin(&tk_core.seq);
> 		ts->tv_sec = tk->xtime_sec;
> 		nsec = timekeeping_get_ns(&tk->tkr_mono);
> 		tomono = tk->wall_to_monotonic;
> <<<
> 	} while (read_seqcount_retry(&tk_core.seq, seq));
> <<<
> 	ts->tv_sec += tomono.tv_sec;
> 	ts->tv_nsec = 0;
> 	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
> }
> EXPORT_SYMBOL_GPL(ktime_get_ts64);
>
> Any ideas how to fix this problem?  That do {} while gets stuck there.
>
> Jeff
>

I just verified that both kgdb and kdb trigger this bug as well if you
hold the processors suspended for about 20-45 minutes.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19  2:16 ` Jeff Merkey
@ 2016-01-19  2:31   ` Jeff Merkey
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-19  2:31 UTC (permalink / raw)
  To: LKML

Output from objdump is provided below:

static __always_inline u32
__iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
{
	u32 ret = 0;
     27b:	31 d2                	xor    %edx,%edx
	while (dividend >= divisor) {
		/* The following asm() prevents the compiler from
		   optimising this loop into a modulo operation.  */
		asm("" : "+rm"(dividend));

		dividend -= divisor;
     27d:	48 2d 00 ca 9a 3b    	sub    $0x3b9aca00,%rax
		ret++;
     283:	83 c2 01             	add    $0x1,%edx
static __always_inline u32
__iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
{
	u32 ret = 0;

	while (dividend >= divisor) {
     286:	48 3d ff c9 9a 3b    	cmp    $0x3b9ac9ff,%rax
     28c:	77 ef                	ja     27d <ktime_get_ts64+0x9d>
 * This must always be inlined because its used from the x86-64 vdso,
 * which cannot call other kernel functions.
 */

I guess if dividend and divisor get out of whack, this function loops
forever.  So I sent this to the maintainers, which is the list of
"engineers" who actually wrote and understand this section of code.  I
would like to get this fixed.
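
For what it's worth, the loop above is just an iterative 64/32 division.  A
plain-C sketch of what it computes (illustrative only, not the kernel
implementation) would be:

static inline u32 iter_div_equiv(u64 dividend, u32 divisor, u64 *remainder)
{
	/* same result as the subtract loop: quotient returned,
	   remainder stored through the pointer */
	*remainder = dividend % divisor;
	return (u32)(dividend / divisor);
}

so with a dividend far above the divisor, the subtract-and-compare version
has to iterate roughly dividend / divisor times before it can exit.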

Thanks

Jeff

On 1/18/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/18/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> If I suspend all processors in the NMI handler while in the debugger
>> console at an active breakpoint, and then I leave the system in the
>> console for about 20 minutes with the processors spinning, then exit
>> and release the processors, ktime_get_ts64 enters an infinite loop and
>> triggers the Hard Lockup detector.   Nice having a debugger to just
>> step into the NMI handlers and find this stuff now.
>>
>> The offending code is:
>>
>> (2)> u ktime_get_ts64+9d
>> <<<  this section gets stuck in an infinite loop.
>> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
>> 0xffffffff810ede23 83C201          add    edx,0x1
>> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
>> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
>> (0xffffffff810ede1d) (up)
>> <<<
>> 0xffffffff810ede2e 4801CA          add    rdx,rcx
>> 0xffffffff810ede31 48894308        mov    QWORD PTR
>> [rbx+8]=0xFFFFFFFFA0774E01,rax
>> 0xffffffff810ede35 488913          mov    QWORD PTR
>> [rbx]=0xFFFFFFFFA0742000,rdx
>> 0xffffffff810ede38 5B              pop    rbx
>> 0xffffffff810ede39 415C            pop    r12
>> 0xffffffff810ede3b 415D            pop    r13
>> 0xffffffff810ede3d 5D              pop    rbp
>> 0xffffffff810ede3e C3              ret
>> 0xffffffff810ede3f 31D2            xor    edx,edx
>> 0xffffffff810ede41 EBEB            jmp    ktime_get_ts64+0xae
>> (0xffffffff810ede2e) (up)
>> 0xffffffff810ede43 BE11030000      mov    esi,0x311
>> 0xffffffff810ede48 48C7C751C38D81  mov    rdi,0xffffffff818dc351
>> 0xffffffff810ede4f E82C12F9FF      call   warn_slowpath_null
>> 0xffffffff810ede54 E946FFFFFF      jmp    ktime_get_ts64+0x1f
>> (0xffffffff810edd9f) (up)
>> 0xffffffff810ede59 F390            pause
>> 0xffffffff810ede5b E942FFFFFF      jmp    ktime_get_ts64+0x22
>> (0xffffffff810edda2) (up)
>>
>> Same code in GDB format
>>
>> (2)> id ktime_get_ts64+9d
>> <<<
>> 0xffffffff810ede1d ktime_get_ts64+0x9d:    sub    $0x3b9aca00,%rax
>> 0xffffffff810ede23 ktime_get_ts64+0xa3:    add    $0x1,%edx
>> 0xffffffff810ede26 ktime_get_ts64+0xa6:    cmp    $0x3b9ac9ff,%rax
>> 0xffffffff810ede2c ktime_get_ts64+0xac:    ja     0xffffffff810ede1d
>> ktime_get_ts64+0x9d (up)
>> <<<
>> 0xffffffff810ede2e ktime_get_ts64+0xae:    add    %rcx,%rdx
>> 0xffffffff810ede31 ktime_get_ts64+0xb1:    mov    %rax,0x8(%rbx)
>> 0xffffffff810ede35 ktime_get_ts64+0xb5:    mov    %rdx,(%rbx)
>> 0xffffffff810ede38 ktime_get_ts64+0xb8:    pop    %rbx
>> 0xffffffff810ede39 ktime_get_ts64+0xb9:    pop    %r12
>> 0xffffffff810ede3b ktime_get_ts64+0xbb:    pop    %r13
>> 0xffffffff810ede3d ktime_get_ts64+0xbd:    pop    %rbp
>> 0xffffffff810ede3e ktime_get_ts64+0xbe:    retq
>> 0xffffffff810ede3f ktime_get_ts64+0xbf:    xor    %edx,%edx
>> 0xffffffff810ede41 ktime_get_ts64+0xc1:    jmp    0xffffffff810ede2e
>> ktime_get_ts64+0xae (up)
>> 0xffffffff810ede43 ktime_get_ts64+0xc3:    mov    $0x311,%esi
>> 0xffffffff810ede48 ktime_get_ts64+0xc8:    mov
>> $0xffffffff818dc351,%rdi
>> 0xffffffff810ede4f ktime_get_ts64+0xcf:    callq  0xffffffff8107f080
>> warn_slowpath_null
>> 0xffffffff810ede54 ktime_get_ts64+0xd4:    jmpq   0xffffffff810edd9f
>> ktime_get_ts64+0x1f (up)
>> 0xffffffff810ede59 ktime_get_ts64+0xd9:    pause
>> 0xffffffff810ede5b ktime_get_ts64+0xdb:    jmpq   0xffffffff810edda2
>> ktime_get_ts64+0x22 (up)
>> (2)> g
>>
>>
>> What is strange is the math its doing.  It is subtracting a fixed
>> value from rax then comparing the value.  It looks like this is case
>> where the value may have wrapped and the code just wasn;t setup to
>> handle it.
>>
>> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
>> 0xffffffff810ede23 83C201          add    edx,0x1
>> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
>> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
>> (0xffffffff810ede1d) (up)
>>
>> The C code is:
>>
>>
>> /**
>>  * ktime_get_ts64 - get the monotonic clock in timespec64 format
>>  * @ts:		pointer to timespec variable
>>  *
>>  * The function calculates the monotonic clock from the realtime
>>  * clock and the wall_to_monotonic offset and stores the result
>>  * in normalized timespec64 format in the variable pointed to by @ts.
>>  */
>> void ktime_get_ts64(struct timespec64 *ts)
>> {
>> 	struct timekeeper *tk = &tk_core.timekeeper;
>> 	struct timespec64 tomono;
>> 	s64 nsec;
>> 	unsigned int seq;
>>
>> 	WARN_ON(timekeeping_suspended);
>>
>> 	do {
>> 		seq = read_seqcount_begin(&tk_core.seq);
>> 		ts->tv_sec = tk->xtime_sec;
>> 		nsec = timekeeping_get_ns(&tk->tkr_mono);
>> 		tomono = tk->wall_to_monotonic;
>> <<<
>> 	} while (read_seqcount_retry(&tk_core.seq, seq));
>> <<<
>> 	ts->tv_sec += tomono.tv_sec;
>> 	ts->tv_nsec = 0;
>> 	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
>> }
>> EXPORT_SYMBOL_GPL(ktime_get_ts64);
>>
>> Any ideas how to fix this problem?  That do {} while gets stuck there.
>>
>> Jeff
>>
>
> I just verified that both kgdb and kdb trigger this bug as well if you
> hold the processors suspended for about 20-45 minutes.
>
> Jeff
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19  1:59 [BUG REPORT] ktime_get_ts64 causes Hard Lockup Jeff Merkey
  2016-01-19  2:16 ` Jeff Merkey
@ 2016-01-19  9:50 ` Thomas Gleixner
  2016-01-19 15:37 ` Jeff Merkey
  2 siblings, 0 replies; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-19  9:50 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

On Mon, 18 Jan 2016, Jeff Merkey wrote:
> What is strange is the math its doing.  It is subtracting a fixed
> value from rax then comparing the value.  It looks like this is case
> where the value may have wrapped and the code just wasn;t setup to
> handle it.

Well, the worst case is that it loops for another full round.
If you subtract 1e9 from rax often enough, it will become smaller than 1e9
no matter whether it wrapped or not. It just takes more iterations.
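
Back of the envelope: even if rax had wrapped all the way to the maximum u64
value, the loop terminates after at most

	0xFFFFFFFFFFFFFFFF / 0x3b9aca00  ~=  1.8e10 iterations

which is long, but finite.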

> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
> 0xffffffff810ede23 83C201          add    edx,0x1
> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
> (0xffffffff810ede1d) (up)
> 
> The C code is:
> 
> 
> /**
>  * ktime_get_ts64 - get the monotonic clock in timespec64 format
>  * @ts:		pointer to timespec variable
>  *
>  * The function calculates the monotonic clock from the realtime
>  * clock and the wall_to_monotonic offset and stores the result
>  * in normalized timespec64 format in the variable pointed to by @ts.
>  */
> void ktime_get_ts64(struct timespec64 *ts)
> {
> 	struct timekeeper *tk = &tk_core.timekeeper;
> 	struct timespec64 tomono;
> 	s64 nsec;
> 	unsigned int seq;
> 
> 	WARN_ON(timekeeping_suspended);
> 
> 	do {
> 		seq = read_seqcount_begin(&tk_core.seq);
> 		ts->tv_sec = tk->xtime_sec;
> 		nsec = timekeeping_get_ns(&tk->tkr_mono);
> 		tomono = tk->wall_to_monotonic;
> <<<
> 	} while (read_seqcount_retry(&tk_core.seq, seq));
> <<<
> 	ts->tv_sec += tomono.tv_sec;
> 	ts->tv_nsec = 0;
> 	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
> }
> EXPORT_SYMBOL_GPL(ktime_get_ts64);
> 
> Any ideas how to fix this problem?  That do {} while gets stuck there.

So now you are pointing to that do { } while. That has absolutely nothing to
do with timespec64_add_ns() to which you are referring above.

That do {} while loop gets stuck when the timekeeper sequence counter has
changed while we were reading the time and the offset.
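
For reference, the read side of that seqcount works roughly like this
(simplified sketch, not the exact implementation):

	do {
		/* read_seqcount_begin() spins while a writer is active
		   (odd sequence) and then returns the even sequence */
		seq = read_seqcount_begin(&tk_core.seq);
		/* ... read xtime_sec, nsec, wall_to_monotonic ... */
		/* read_seqcount_retry() is true if the sequence changed,
		   i.e. a writer updated the timekeeper during the read */
	} while (read_seqcount_retry(&tk_core.seq, seq));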

So where exactly is it stuck? Please provide backtraces from the
hardlockup detector.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19  1:59 [BUG REPORT] ktime_get_ts64 causes Hard Lockup Jeff Merkey
  2016-01-19  2:16 ` Jeff Merkey
  2016-01-19  9:50 ` Thomas Gleixner
@ 2016-01-19 15:37 ` Jeff Merkey
  2016-01-19 22:00   ` Jeff Merkey
  2 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-19 15:37 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Thomas Gleixner

On 1/18/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> If I suspend all processors in the NMI handler while in the debugger
> console at an active breakpoint, and then I leave the system in the
> console for about 20 minutes with the processors spinning, then exit
> and release the processors, ktime_get_ts64 enters an infinite loop and
> triggers the Hard Lockup detector.   Nice having a debugger to just
> step into the NMI handlers and find this stuff now.
>
> The offending code is:
>
> (2)> u ktime_get_ts64+9d
> <<<  this section gets stuck in an infinite loop.
> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
> 0xffffffff810ede23 83C201          add    edx,0x1
> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
> (0xffffffff810ede1d) (up)
> <<<
> 0xffffffff810ede2e 4801CA          add    rdx,rcx
> 0xffffffff810ede31 48894308        mov    QWORD PTR
> [rbx+8]=0xFFFFFFFFA0774E01,rax
> 0xffffffff810ede35 488913          mov    QWORD PTR
> [rbx]=0xFFFFFFFFA0742000,rdx
> 0xffffffff810ede38 5B              pop    rbx
> 0xffffffff810ede39 415C            pop    r12
> 0xffffffff810ede3b 415D            pop    r13
> 0xffffffff810ede3d 5D              pop    rbp
> 0xffffffff810ede3e C3              ret
> 0xffffffff810ede3f 31D2            xor    edx,edx
> 0xffffffff810ede41 EBEB            jmp    ktime_get_ts64+0xae
> (0xffffffff810ede2e) (up)
> 0xffffffff810ede43 BE11030000      mov    esi,0x311
> 0xffffffff810ede48 48C7C751C38D81  mov    rdi,0xffffffff818dc351
> 0xffffffff810ede4f E82C12F9FF      call   warn_slowpath_null
> 0xffffffff810ede54 E946FFFFFF      jmp    ktime_get_ts64+0x1f
> (0xffffffff810edd9f) (up)
> 0xffffffff810ede59 F390            pause
> 0xffffffff810ede5b E942FFFFFF      jmp    ktime_get_ts64+0x22
> (0xffffffff810edda2) (up)
>
> Same code in GDB format
>
> (2)> id ktime_get_ts64+9d
> <<<
> 0xffffffff810ede1d ktime_get_ts64+0x9d:    sub    $0x3b9aca00,%rax
> 0xffffffff810ede23 ktime_get_ts64+0xa3:    add    $0x1,%edx
> 0xffffffff810ede26 ktime_get_ts64+0xa6:    cmp    $0x3b9ac9ff,%rax
> 0xffffffff810ede2c ktime_get_ts64+0xac:    ja     0xffffffff810ede1d
> ktime_get_ts64+0x9d (up)
> <<<
> 0xffffffff810ede2e ktime_get_ts64+0xae:    add    %rcx,%rdx
> 0xffffffff810ede31 ktime_get_ts64+0xb1:    mov    %rax,0x8(%rbx)
> 0xffffffff810ede35 ktime_get_ts64+0xb5:    mov    %rdx,(%rbx)
> 0xffffffff810ede38 ktime_get_ts64+0xb8:    pop    %rbx
> 0xffffffff810ede39 ktime_get_ts64+0xb9:    pop    %r12
> 0xffffffff810ede3b ktime_get_ts64+0xbb:    pop    %r13
> 0xffffffff810ede3d ktime_get_ts64+0xbd:    pop    %rbp
> 0xffffffff810ede3e ktime_get_ts64+0xbe:    retq
> 0xffffffff810ede3f ktime_get_ts64+0xbf:    xor    %edx,%edx
> 0xffffffff810ede41 ktime_get_ts64+0xc1:    jmp    0xffffffff810ede2e
> ktime_get_ts64+0xae (up)
> 0xffffffff810ede43 ktime_get_ts64+0xc3:    mov    $0x311,%esi
> 0xffffffff810ede48 ktime_get_ts64+0xc8:    mov    $0xffffffff818dc351,%rdi
> 0xffffffff810ede4f ktime_get_ts64+0xcf:    callq  0xffffffff8107f080
> warn_slowpath_null
> 0xffffffff810ede54 ktime_get_ts64+0xd4:    jmpq   0xffffffff810edd9f
> ktime_get_ts64+0x1f (up)
> 0xffffffff810ede59 ktime_get_ts64+0xd9:    pause
> 0xffffffff810ede5b ktime_get_ts64+0xdb:    jmpq   0xffffffff810edda2
> ktime_get_ts64+0x22 (up)
> (2)> g
>
>
> What is strange is the math its doing.  It is subtracting a fixed
> value from rax then comparing the value.  It looks like this is case
> where the value may have wrapped and the code just wasn;t setup to
> handle it.
>
> 0xffffffff810ede1d 482D00CA9A3B    sub    rax,0x3b9aca00
> 0xffffffff810ede23 83C201          add    edx,0x1
> 0xffffffff810ede26 483DFFC99A3B    cmp    rax,0x3b9ac9ff
> 0xffffffff810ede2c 77EF            ja     ktime_get_ts64+0x9d
> (0xffffffff810ede1d) (up)
>
> The C code is:
>
>
> /**
>  * ktime_get_ts64 - get the monotonic clock in timespec64 format
>  * @ts:		pointer to timespec variable
>  *
>  * The function calculates the monotonic clock from the realtime
>  * clock and the wall_to_monotonic offset and stores the result
>  * in normalized timespec64 format in the variable pointed to by @ts.
>  */
> void ktime_get_ts64(struct timespec64 *ts)
> {
> 	struct timekeeper *tk = &tk_core.timekeeper;
> 	struct timespec64 tomono;
> 	s64 nsec;
> 	unsigned int seq;
>
> 	WARN_ON(timekeeping_suspended);
>
> 	do {
> 		seq = read_seqcount_begin(&tk_core.seq);
> 		ts->tv_sec = tk->xtime_sec;
> 		nsec = timekeeping_get_ns(&tk->tkr_mono);
> 		tomono = tk->wall_to_monotonic;
> <<<
> 	} while (read_seqcount_retry(&tk_core.seq, seq));
> <<<
> 	ts->tv_sec += tomono.tv_sec;
> 	ts->tv_nsec = 0;
> 	timespec64_add_ns(ts, nsec + tomono.tv_nsec);
> }
> EXPORT_SYMBOL_GPL(ktime_get_ts64);
>
> Any ideas how to fix this problem?  That do {} while gets stuck there.
>
> Jeff
>

I'll try to get a better trace of this problem today and post it.  I
have a target system set up to catch this now.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19 15:37 ` Jeff Merkey
@ 2016-01-19 22:00   ` Jeff Merkey
  2016-01-20  0:59     ` Jeff Merkey
  2016-01-20  9:21     ` Thomas Gleixner
  0 siblings, 2 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-19 22:00 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Thomas Gleixner

Nasty bug, but there is a trivial fix for this.  What happens here is that
RAX (nsecs) gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed
through a bunch of macros to __iter_div_u64_rem().  It doesn't loop
infinitely with a number that big, just for about 4 minutes, but that is
long enough for the hard lockup detector to fire off.  Setting RAX=0 from
the debugger console when this function gets stuck in the loop clears the
problem and the system recovers fine.

The problem code is in:

include/linux/math64.h:117
static __always_inline u32
__iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
{
	u32 ret = 0;

	while (dividend >= divisor) {
		/* The following asm() prevents the compiler from
		   optimising this loop into a modulo operation.  */
		asm("" : "+rm"(dividend));

		dividend -= divisor;
		ret++;
	}
	*remainder = dividend;

	return ret;
}

which is called by
include/linux/time.h:233
static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
{
	a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
	a->tv_nsec = ns;
}

It's defined so that timespec64_add_ns just points to this function,
which is called by ktime_get_ts64.  tv_nsec + ns comes in as a huge
number (RAX = 0x17AE7F57C671EA7D) with a very small divisor, and the
loop just sits there iterating.
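
Rough numbers (my arithmetic, not measured): with that RAX value the loop has
to iterate about

	0x17AE7F57C671EA7D / 1000000000  ~=  1.7e9 times

before the dividend drops below NSEC_PER_SEC, which matches it being stuck
for minutes rather than truly forever.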

I will submit a patch to fix this after I regression test it.  Since it
makes no sense to loop on a simple calculation, the fix should be:

static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
{
	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
	a->tv_nsec = ns;
}

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19 22:00   ` Jeff Merkey
@ 2016-01-20  0:59     ` Jeff Merkey
  2016-01-20  9:21     ` Thomas Gleixner
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20  0:59 UTC (permalink / raw)
  To: LKML

I have done some more research on this, and due to the way the vdso works,
any fix must also be an inline macro or rely only on inline macros.

Jeff

On 1/19/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
> a bunch of macros to __iter_div_u64_rem().  It doesn't loop infinitely
> with a number that big, just for about 4 minutes but long enough for
> the hard lockup detector to fire off.  Setting RAX=0 when this bug
> occurs clears the problem from the debugger console when this function
> gets stuck in this loop and the system recovers fine.
>
> The problem code is in:
>
> include/linux/math64.h:117
> static __always_inline u32
> __iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
> {
> 	u32 ret = 0;
>
> 	while (dividend >= divisor) {
> 		/* The following asm() prevents the compiler from
> 		   optimising this loop into a modulo operation.  */
> 		asm("" : "+rm"(dividend));
>
> 		dividend -= divisor;
> 		ret++;
> 	}
> 	*remainder = dividend;
>
> 	return ret;
> }
>
> which is called by
> include/linux/time.h:233
> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> 	a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
> 	a->tv_nsec = ns;
> }
>
> it's defined so that timespec64_add_ns just points to this function
> which is called by ktime_get_ts64.  tv_nsec + ns gets passed as a huge
> number (RAX = 0x17AE7F57C671EA7D) with a very small divisor and just
> sits there and loops.
>
> Submitting a patch to fix this after I regress and test it.   Since it
> makes no sense to loop on a simple calculation, fix should be:
>
> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> 	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
> 	a->tv_nsec = ns;
> }
>
> Jeff
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-19 22:00   ` Jeff Merkey
  2016-01-20  0:59     ` Jeff Merkey
@ 2016-01-20  9:21     ` Thomas Gleixner
  2016-01-20 14:26       ` Thomas Gleixner
  2016-01-20 16:53       ` Jeff Merkey
  1 sibling, 2 replies; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-20  9:21 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

On Tue, 19 Jan 2016, Jeff Merkey wrote:
> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through

And how exactly does that happen?

0x17AE7F57C671EA7D = 1.70644e+18  nsec
		   = 1.70644e+09  sec
		   = 2.84407e+07  min
		   = 474011	  hrs
		   = 19750.5	  days
		   = 54.1109	  years

That's the real issue, not what you are trying to 'fix' in timespec_add_ns()

> Submitting a patch to fix this after I regress and test it.   Since it
> makes no sense to loop on a simple calculation, fix should be:
> 
> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
> {
> 	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
> 	a->tv_nsec = ns;
> }

No. It's not that simple, because div64_u64_rem() is expensive on 32bit
architectures which have no hardware 64/32 division. And that's going to hurt
for the normal tick case where we have at max one iteration.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20  9:21     ` Thomas Gleixner
@ 2016-01-20 14:26       ` Thomas Gleixner
  2016-01-20 16:40         ` Jeff Merkey
  2016-01-20 16:53       ` Jeff Merkey
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-20 14:26 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

Jeff,

On Wed, 20 Jan 2016, Thomas Gleixner wrote:
> On Tue, 19 Jan 2016, Jeff Merkey wrote:
> > Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
> > gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
> 
> And how exactly does that happen?
> 
> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
> 		   = 1.70644e+09  sec
> 		   = 2.84407e+07  min
> 		   = 474011	  hrs
> 		   = 19750.5	  days
> 		   = 54.1109	  years
> 
> That's the real issue, not what you are trying to 'fix' in timespec_add_ns()

And that's caused by stopping the whole machine for 20 minutes. It violates
the assumption of the timekeeping core that the maximum time between two
updates of the core is < 5-10 min. So that insanely large number is caused by
a multiplication overflow when converting the time delta to nanoseconds.
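
Roughly, the conversion looks like this (a sketch with field names from the
4.x timekeeper, not the exact kernel source):

	/* cycles -> nanoseconds via the clocksource mult/shift pair */
	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;

and that 64 bit multiply is only valid as long as delta stays below the
clocksource's max_cycles, which is what the max_idle_ns value below reflects.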

You can find that limit via:

# dmesg | grep tsc | grep max_idle_ns
[    5.242683] clocksource tsc: mask: 0xffffffffffffffff max_cycles: 0x21139a22526, max_idle_ns: 440795252169 ns

So on that machine the limit is:

   440795252169 nsec
   440.795	sec
   7.34659	min

And before you ask or come up with patches: No, we are not going to add
anything to the core timekeeping code to work around this limitation, simply
because it's going to add overhead to a performance-sensitive code path for
very limited value.

Keeping a machine stopped for 20 minutes will make a lot of other things
unhappy, so introducing a 'fix' for that particular issue is just silly.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 14:26       ` Thomas Gleixner
@ 2016-01-20 16:40         ` Jeff Merkey
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 16:40 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> Jeff,
>
> On Wed, 20 Jan 2016, Thomas Gleixner wrote:
>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>> > Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>> > gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>
>> And how exactly does that happen?
>>
>> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>> 		   = 1.70644e+09  sec
>> 		   = 2.84407e+07  min
>> 		   = 474011	  hrs
>> 		   = 19750.5	  days
>> 		   = 54.1109	  years
>>
>> That's the real issue, not what you are trying to 'fix' in
>> timespec_add_ns()
>
> And that's caused by stopping the whole machine for 20 minutes. It violates
> the assumption of the timekeeping core, that the maximum time which is
> between
> two updates of the core is < 5-10min. So that insane large number is caused
> by a
> mult overrun when converting the time delta to nanoseconds.
>
> You can find that limit via:
>
> # dmesg | grep tsc | grep max_idle_ns
> [    5.242683] clocksource tsc: mask: 0xffffffffffffffff max_cycles:
> 0x21139a22526, max_idle_ns: 440795252169 ns
>
> So on that machine the limit is:
>
>    440795252169 nsec
>    440.795	sec
>    7.34659	min
>
> And before you ask or come up with patches: No, we are not going to add
> anything to the core timekeeping code to work around this limitation simply
> because its going to add overhead to a performance sensitive code path for
> a
> very limited value.

Given how fragile that code appears to be, this is reasonable.

>
> Keeping a machine stopped for 20 minutes will make a lot of other things
> unhappy, so introducing a 'fix' for that particular issue is just silly.
>

You know, what's needed here is some form of touch function to keep this
system updated while spinning in the debugger.  That would solve it, and I
can maintain a fix for that locally.  I debugged the soft hang in systemd
last night, and I discovered that it's all related to this function
returning bogus time (systemd was doing a system call that eventually made
its way to ktime_get_ts64 and got garbage returned).  When this wraps it
causes all sorts of bad stuff.

Do you have any suggestions on how a touch function could be coded to keep
this subsystem updated while the debugger is active?  There are already a
few of them I have to call, as do kgdb and kdb, to get around some of this.

void mdb_watchdogs(void)
{
    touch_softlockup_watchdog_sync();
    clocksource_touch_watchdog();

#if defined(CONFIG_TREE_RCU)
    rcu_cpu_stall_reset();
#endif

    touch_nmi_watchdog();
#ifdef CONFIG_HARDLOCKUP_DETECTOR
    touch_hardlockup_watchdog();
#endif
    return;
}

As you can see, there are already quite a few subsystems that manage this
problem of debuggers holding the system in stasis.

Jeff

> Thanks,
>
> 	tglx
>

Well, that explains it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20  9:21     ` Thomas Gleixner
  2016-01-20 14:26       ` Thomas Gleixner
@ 2016-01-20 16:53       ` Jeff Merkey
  2016-01-20 17:16         ` Jeff Merkey
  2016-01-20 17:21         ` Thomas Gleixner
  1 sibling, 2 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 16:53 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>
> And how exactly does that happen?
>
> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
> 		   = 1.70644e+09  sec
> 		   = 2.84407e+07  min
> 		   = 474011	  hrs
> 		   = 19750.5	  days
> 		   = 54.1109	  years
>
> That's the real issue, not what you are trying to 'fix' in
> timespec_add_ns()
>
>> Submitting a patch to fix this after I regress and test it.   Since it
>> makes no sense to loop on a simple calculation, fix should be:
>>
>> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
>> {
>> 	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
>> 	a->tv_nsec = ns;
>> }
>
> No. It's not that simple, because div64_u64_rem() is expensive on 32bit
> architectures which have no hardware 64/32 division. And that's going to
> hurt
> for the normal tick case where we have at max one iteration.
>

It's less expensive than a hard-coded loop that subtracts in place of
dividing, which is what is there now.  What a busted piece of shit .... LOL



> Thanks,
>
> 	tglx
>
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 16:53       ` Jeff Merkey
@ 2016-01-20 17:16         ` Jeff Merkey
  2016-01-20 17:32           ` John Stultz
  2016-01-20 17:21         ` Thomas Gleixner
  1 sibling, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 17:16 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/20/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>>> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>
>> And how exactly does that happen?
>>
>> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>> 		   = 1.70644e+09  sec
>> 		   = 2.84407e+07  min
>> 		   = 474011	  hrs
>> 		   = 19750.5	  days
>> 		   = 54.1109	  years
>>
>> That's the real issue, not what you are trying to 'fix' in
>> timespec_add_ns()
>>

I guess I am going to have to become an expert on the timekeeper and
learn this subsystem backwards and forwards to code a touch function
to keep it from crashing the system.

On the 2.6 series kernels (and 2.2) this problem did not exist.  I
noticed a lot of these changes came in during the late 2.6 cycles.  Before
that time, I could leave the debugger spinning for days and Linux
worked fine.

For people who have to pay developers to develop code on Linux, a debugger
is almost an essential tool, since it saves hundreds of thousands of dollars
in development costs.  Not everyone wants to pay their employees and
engineers to sit around and code review every problem - customers just want
their problems fixed -- and fast.  That being said, I am having no lack of
people who download and use this debugger, and I'm certain kgdb is heavily
used by folks doing development.  If kernel development is too hard, people
move to something else based on simple economics.

That being said, I need to get this fixed.  There is no good reason a
debugger shouldn't be able to stop the system and leave it suspended
for days if necessary to run down a bug.  I wrote a debugger on SMP
Netware that worked that way.  The earliest versions of MDB worked
that way.

kgdb is broken right now because of this.  I am not certain it affects
all systems out there, but it needs to be fixed.

If you have any ideas on how to code a touch function, please send me a
patch or suggest how it could be done non-obtrusively; otherwise I'll have
to dive into the timekeeper, learn yet another subsystem of Linux, and fix
its bugs myself.  A subsystem that crashes because the timer tick is skewed
or returns garbage is poorly designed IMHO.

It should either have a touch function to keep it updated or have the
ability to recover.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 16:53       ` Jeff Merkey
  2016-01-20 17:16         ` Jeff Merkey
@ 2016-01-20 17:21         ` Thomas Gleixner
  2016-01-20 17:33           ` Jeff Merkey
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-20 17:21 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

On Wed, 20 Jan 2016, Jeff Merkey wrote:
> On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Tue, 19 Jan 2016, Jeff Merkey wrote:
> >> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
> >> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
> >
> > And how exactly does that happen?
> >
> > 0x17AE7F57C671EA7D = 1.70644e+18  nsec
> > 		   = 1.70644e+09  sec
> > 		   = 2.84407e+07  min
> > 		   = 474011	  hrs
> > 		   = 19750.5	  days
> > 		   = 54.1109	  years
> >
> > That's the real issue, not what you are trying to 'fix' in
> > timespec_add_ns()
> >
> >> Submitting a patch to fix this after I regress and test it.   Since it
> >> makes no sense to loop on a simple calculation, fix should be:
> >>
> >> static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
> >> {
> >> 	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
> >> 	a->tv_nsec = ns;
> >> }
> >
> > No. It's not that simple, because div64_u64_rem() is expensive on 32bit
> > architectures which have no hardware 64/32 division. And that's going to
> > hurt
> > for the normal tick case where we have at max one iteration.
> >
> 
> It's less expensive than a hard coded loop that subtracts in a looping
> function as a substitute for dividing which is what is there.  What a
> busted piece of shit .... LOL

Let's talk about shit.

timespec[64]_add_ns() is used for timekeeping, and in all normal use cases
the nsec part is less than 1e9 nsec. Even on 64 bit a divide is more
expensive than the single-iteration while loop, and it's insanely expensive
on 32bit machines which do not have a 64/32 division in hardware.

The while loop is there for a few corner cases which are a bit larger than 1e9
nsecs, but that's not the case we optimize for.

The case you are creating with your debugger is something completely different
and we never thought about it nor cared about it. Why? Because so far nobody
complained and I never cared about kernel debuggers at all.

What's worse is that your 'fix' does not resolve the underlying issue at
all. Why? Simply because you tried to fix the symptom and not the root cause.

I explained the root cause to you, and I explained to you why that while()
loop is more efficient than a divide for the case it was written and
optimized for.

Instead of reading and understanding what I wrote, you teach me that your
divide is more efficient and call the code a busted piece of shit.

Sure, you are free to call it a busted piece of shit, but you shouldn't
expect the people who wrote, maintain and understand that code to put up
with your attitude.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:16         ` Jeff Merkey
@ 2016-01-20 17:32           ` John Stultz
  2016-01-20 17:36             ` Jeff Merkey
  2016-01-20 17:42             ` Thomas Gleixner
  0 siblings, 2 replies; 29+ messages in thread
From: John Stultz @ 2016-01-20 17:32 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: Thomas Gleixner, LKML

On Wed, Jan 20, 2016 at 9:16 AM, Jeff Merkey <linux.mdb@gmail.com> wrote:
> On 1/20/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>>>> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>>>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>>
>>> And how exactly does that happen?
>>>
>>> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>>>                 = 1.70644e+09  sec
>>>                 = 2.84407e+07  min
>>>                 = 474011       hrs
>>>                 = 19750.5      days
>>>                 = 54.1109      years
>>>
>>> That's the real issue, not what you are trying to 'fix' in
>>> timespec_add_ns()
>>>
>
> I guess I am going to have to become an expert on the timekeeper and
> learn this subsystem backwards and forwards to code a touch function
> to keep it from crashing the system.
>
> On the 2.6 series kernels (and 2.2) this problem did not exist.  I
> noticed a lot of these changes came in in the late 2.6 cycles.  Before
> that time, I could leave the debugger spinning for days and linux
> worked fine.
>
> For people who have to pay developers to develop code on Linux a
> debugger is almost
> an essential tool since it saves hundreds of thousands of dollars in
> development costs.  Not everyone wants to spend money for their
> employees and engineers to sit around and code review every problem -
> customers just want their problems fixed -- and fast.  That being
> said, I am having no lack of people who download and use this debugger
> and I'm certain kgdb is heavily used by folks doing development.   If
> kernel development is too hard, people move to something else based on
> simple economics.
>
> That being said, I need to get this fixed.  There is no good reason a
> debugger shouldn't be able to stop the system and leave it suspended
> for days if necessary to run down a bug.  I wrote a debugger on SMP
> Netware that worked that way.  The earliest versions of MDB worked
> that way.
>
> kgdb is broken right now because of this.  I am not certain it affects
> all systems out there, but it needs to be fixed.
>
> If you have any ideas on how to code a touch function please send me a
> patch or suggest how it could be done non-obstrusively, otherwise I'll
> have to dive into the timekeeper and fix it myself and learn yet
> another subsystem of Linux and fix it bugs.  A code subsystem that
> crashes because the timer tick is skewed or returns garbage is poorly
> designed IMHO.

Ehrm.  A more productive route in solving this might be to cap the
cycle delta we return from timekeeping_get_delta().

We already do this in the CONFIG_DEBUG_TIMEKEEPING case, but adding a
simple check to the non-debug case should be doable w/o adding too
much overhead to this very hot path.

Something like:
if (delta > tkr->clock->max_cycles)
    delta = tkr->clock->max_cycles;

return delta;
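
i.e. roughly, as a sketch against the current read path (field names assumed
from the 4.x tk_read_base layout, not a tested patch):

static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
{
	cycle_t cycle_now, delta;

	/* read the clocksource and compute cycles since the last update */
	cycle_now = tkr->read(tkr->clock);
	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);

	/* cap the delta so a long stop cannot overflow the ns conversion */
	if (delta > tkr->clock->max_cycles)
		delta = tkr->clock->max_cycles;

	return delta;
}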

thanks
-john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:21         ` Thomas Gleixner
@ 2016-01-20 17:33           ` Jeff Merkey
  2016-01-20 19:34             ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 17:33 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 20 Jan 2016, Jeff Merkey wrote:
>> On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > On Tue, 19 Jan 2016, Jeff Merkey wrote:
>> >> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>> >> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>> >
>> > And how exactly does that happen?
>> >
>> > 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>> > 		   = 1.70644e+09  sec
>> > 		   = 2.84407e+07  min
>> > 		   = 474011	  hrs
>> > 		   = 19750.5	  days
>> > 		   = 54.1109	  years
>> >
>> > That's the real issue, not what you are trying to 'fix' in
>> > timespec_add_ns()
>> >
>> >> Submitting a patch to fix this after I regress and test it.   Since it
>> >> makes no sense to loop on a simple calculation, fix should be:
>> >>
>> >> static __always_inline void timespec_add_ns(struct timespec *a, u64
>> >> ns)
>> >> {
>> >> 	a->tv_sec += div64_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
>> >> 	a->tv_nsec = ns;
>> >> }
>> >
>> > No. It's not that simple, because div64_u64_rem() is expensive on 32bit
>> > architectures which have no hardware 64/32 division. And that's going
>> > to
>> > hurt
>> > for the normal tick case where we have at max one iteration.
>> >
>>
>> It's less expensive than a hard coded loop that subtracts in a looping
>> function as a substitute for dividing which is what is there.  What a
>> busted piece of shit .... LOL
>
> Let's talk about shit.
>
> timespec[64]_add_ns() is used for timekeeping and in all normal use cases
> the
> nsec part is less than 1e9 nsec. Even on 64 bit a divide is more expensive
> than the sinlge iteration while loop and its insane expensive on 32bit
> machines which do not have a 64/32 divison in hardware.
>
> The while loop is there for a few corner cases which are a bit larger than
> 1e9
> nsecs, but that's not the case we optimize for.
>
> The case you are creating with your debugger is something completely
> different
> and we never thought about it nor cared about it. Why? Because so far
> nobody
> complained and I never cared about kernel debuggers at all.
>
> What's worse is that your 'fix' does not resolve the underlying issue at
> all. Why? Simply because you tried to fix the symptom and not the root
> cause.
>
> I explained you the root cause and I explained you why that while() loop is
> more efficient than a divide for the case it was written and optimized for.
>
> Instead of reading and understanding what I wrote you teach me that your
> divide is more efficient and call it a busted piece of shit.
>
> Sure you are free to call that a busted piece of shit, but you don't have
> to
> expect that the people who wrote, maintain and understand that code are
> going
> to put up with your attitude.
>
> Thanks,
>
> 	tglx
>
>
>
>

Thomas, so far the only thing I've gotten from you is whining
diatribes about email etiquette, and today is the FIRST time you have
responded to me with any arguments that have technical merit and
actually address the problem.  That's what I am here for, not
anything else.  As Linus has said repeatedly, being nice doesn't seem
to get results.  Being direct and honest does.

I am here to make MDB run as well as I can make it on Linux and to
help make it run as well as possible on x86_64 and ia32.  That's my
only objective here.  That being said, I still want to get this fixed.

Will you help me?

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:32           ` John Stultz
@ 2016-01-20 17:36             ` Jeff Merkey
  2016-01-20 17:42             ` Thomas Gleixner
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 17:36 UTC (permalink / raw)
  To: John Stultz; +Cc: Thomas Gleixner, LKML

On 1/20/16, John Stultz <john.stultz@linaro.org> wrote:
> On Wed, Jan 20, 2016 at 9:16 AM, Jeff Merkey <linux.mdb@gmail.com> wrote:
>> On 1/20/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
>>> On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>> On Tue, 19 Jan 2016, Jeff Merkey wrote:
>>>>> Nasty bug but trivial fix for this.  What happens here is RAX (nsecs)
>>>>> gets set to a huge value (RAX = 0x17AE7F57C671EA7D) and passed through
>>>>
>>>> And how exactly does that happen?
>>>>
>>>> 0x17AE7F57C671EA7D = 1.70644e+18  nsec
>>>>                 = 1.70644e+09  sec
>>>>                 = 2.84407e+07  min
>>>>                 = 474011       hrs
>>>>                 = 19750.5      days
>>>>                 = 54.1109      years
>>>>
>>>> That's the real issue, not what you are trying to 'fix' in
>>>> timespec_add_ns()
>>>>
>>
>> I guess I am going to have to become an expert on the timekeeper and
>> learn this subsystem backwards and forwards to code a touch function
>> to keep it from crashing the system.
>>
>> On the 2.6 series kernels (and 2.2) this problem did not exist.  I
>> noticed a lot of these changes came in in the late 2.6 cycles.  Before
>> that time, I could leave the debugger spinning for days and linux
>> worked fine.
>>
>> For people who have to pay developers to develop code on Linux a
>> debugger is almost
>> an essential tool since it saves hundreds of thousands of dollars in
>> development costs.  Not everyone wants to spend money for their
>> employees and engineers to sit around and code review every problem -
>> customers just want their problems fixed -- and fast.  That being
>> said, I am having no lack of people who download and use this debugger
>> and I'm certain kgdb is heavily used by folks doing development.   If
>> kernel development is too hard, people move to something else based on
>> simple economics.
>>
>> That being said, I need to get this fixed.  There is no good reason a
>> debugger shouldn't be able to stop the system and leave it suspended
>> for days if necessary to run down a bug.  I wrote a debugger on SMP
>> Netware that worked that way.  The earliest versions of MDB worked
>> that way.
>>
>> kgdb is broken right now because of this.  I am not certain it affects
>> all systems out there, but it needs to be fixed.
>>
>> If you have any ideas on how to code a touch function please send me a
>> patch or suggest how it could be done non-obstrusively, otherwise I'll
>> have to dive into the timekeeper and fix it myself and learn yet
>> another subsystem of Linux and fix it bugs.  A code subsystem that
>> crashes because the timer tick is skewed or returns garbage is poorly
>> designed IMHO.
>
> Ehrm.  A more productive route in solving this might be to cap the
> cycle delta we return from timekeeping_get_delta().
>
> We already do this in the CONFIG_DEBUG_TIMEKEEPING, but adding a
> simple check it to the non-debug case should be doable w/o adding too
> much overhead to this very hot path.
>
> Something like:
> if (delta > tkr->clock->max_cycles)
>     delta = tkr->clock->max_cycles;
>
> return delta;
>
> thanks
> -john
>


Thank you John.  This is helpful.  Can you send me a patch for this
and I'll test it.  Then I am not touching this code and you guys can
put it in.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:32           ` John Stultz
  2016-01-20 17:36             ` Jeff Merkey
@ 2016-01-20 17:42             ` Thomas Gleixner
  2016-01-20 17:59               ` John Stultz
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-20 17:42 UTC (permalink / raw)
  To: John Stultz; +Cc: Jeff Merkey, LKML

On Wed, 20 Jan 2016, John Stultz wrote:
> Ehrm.  A more productive route in solving this might be to cap the
> cycle delta we return from timekeeping_get_delta().
> 
> We already do this in the CONFIG_DEBUG_TIMEKEEPING, but adding a
> simple check it to the non-debug case should be doable w/o adding too
> much overhead to this very hot path.
> 
> Something like:
> if (delta > tkr->clock->max_cycles)
>     delta = tkr->clock->max_cycles;
> 
> return delta;

Well, you can make CONFIG_KDB select CONFIG_DEBUG_TIMEKEEPING.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:42             ` Thomas Gleixner
@ 2016-01-20 17:59               ` John Stultz
  2016-01-20 18:03                 ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: John Stultz @ 2016-01-20 17:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Jeff Merkey, LKML

On Wed, Jan 20, 2016 at 9:42 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 20 Jan 2016, John Stultz wrote:
>> Ehrm.  A more productive route in solving this might be to cap the
>> cycle delta we return from timekeeping_get_delta().
>>
>> We already do this in the CONFIG_DEBUG_TIMEKEEPING, but adding a
>> simple check it to the non-debug case should be doable w/o adding too
>> much overhead to this very hot path.
>>
>> Something like:
>> if (delta > tkr->clock->max_cycles)
>>     delta = tkr->clock->max_cycles;
>>
>> return delta;
>
> Well, you can make CONFIG_KDB select CONFIG_DEBUG_TIMEKEEPING.

True.  And turning on DEBUG_TIMEKEEPING is probably the easiest thing
for Jeff to try.

Though, there's still the same issue w/ paused VMs. Most of the design
for the timekeeping code has been that it can't properly function if
you block update_wall_time() calls, but it shouldn't kill the box.
With most clocksources, the issue is the counter wraps and we lose
time. But in this case with the TSC it's the *very* large cycle delta
turning into an unexpectedly large nanosecond value.

Hrm.. I do also wonder: the logarithmic accumulation chews through
large cycle deltas efficiently, but it does have some design limits,
so it might also hit the rails and take a while to spin accumulating
time with such large offsets.
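
A much-simplified standalone sketch of that accumulation idea (illustrative
names and types only -- a toy model, not the kernel's actual accumulation
code): consume the pending offset in power-of-two multiples of the base
interval, largest chunk first, so a long stall is eaten in a handful of
iterations rather than one interval at a time.

#include <stdint.h>

static uint64_t accumulate_offset(uint64_t offset, uint64_t interval)
{
	int shift = 0;

	/* find the largest chunk that still fits into the offset */
	while ((interval << (shift + 1)) <= offset)
		shift++;

	while (offset >= interval) {
		while ((interval << shift) > offset)
			shift--;		/* drop to a chunk that fits */
		offset -= interval << shift;	/* consume one chunk */
	}
	return offset;	/* leftover cycles for the next update */
}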

Jeff: Can you try the config option above to let me know if that
avoids the issue? And if not, can you provide some analysis of what
else is going on?

thanks
-john

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:59               ` John Stultz
@ 2016-01-20 18:03                 ` Jeff Merkey
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 18:03 UTC (permalink / raw)
  To: John Stultz; +Cc: Thomas Gleixner, LKML

On 1/20/16, John Stultz <john.stultz@linaro.org> wrote:
> On Wed, Jan 20, 2016 at 9:42 AM, Thomas Gleixner <tglx@linutronix.de>
> wrote:
>> On Wed, 20 Jan 2016, John Stultz wrote:
>>> Ehrm.  A more productive route in solving this might be to cap the
>>> cycle delta we return from timekeeping_get_delta().
>>>
>>> We already do this in the CONFIG_DEBUG_TIMEKEEPING, but adding a
>>> simple check it to the non-debug case should be doable w/o adding too
>>> much overhead to this very hot path.
>>>
>>> Something like:
>>> if (delta > tkr->clock->max_cycles)
>>>     delta = tkr->clock->max_cycles;
>>>
>>> return delta;
>>
>> Well, you can make CONFIG_KDB select CONFIG_DEBUG_TIMEKEEPING.
>
> True.  And turning on DEBUG_TIMEKEEPING is probably the easiest thing
> for Jeff to try.
>
> Though, there's still the same issue w/ paused VMs. Most of the design
> for the timekeeping code has been that it can't properly function if
> you block update_wall_time() calls, but it shouldn't kill the box.
> With most clocksources, the issue is the counter wraps and we lose
> time. But in this case with the TSC its the *very* large cycle delta
> turning into a unexpectedly large nanosecond value.
>
> Hrm.. I do also wonder: the logarithmic accumulation chews through
> large cycle deltas efficiently, but it does have some design limits,
> so it  might also hit the rails and take awhile to spin accumulating
> time with such large offsets.
>
> Jeff: Can you try the config option above to let me know if that
> avoids the issue? And if not, can you provide some analysis of what
> else is going on?
>
> thanks
> -john
>

Yes sir.  I am changing the code and preparing to test this right now.
It will be about 4 hours before I have the results.

Thanks to you and Thomas for the help.  I appreciate it.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 17:33           ` Jeff Merkey
@ 2016-01-20 19:34             ` Thomas Gleixner
  2016-01-20 19:59               ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-20 19:34 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

Jeff,

On Wed, 20 Jan 2016, Jeff Merkey wrote:
> Thomas, so far the only thing I've gotten from you are whining
> diatribes about email etiquette and today is the FIRST time you have
> responded to me with any arguments that have technical merit and
> actually address the problem.    That's what I am here for, not
> anything else.  As Linus has said repeatedly, being nice doesn't seem
> to get results.    Being direct and honest does.
> 
> I am here to make MDB run as well as I can make it on Linux and to
> help make it run as well as possible on c86_64 and ia32.  That's my
> only objective here.   That being said,  I am still wanting to get
> this fixed.
> 
> Will you help me?

If you accept that you do not get to set the tone of the conversation as you
see fit.

I started to give you technical answers way before you started your completely
unjustified ranting. And I have done so before.

I'm neither going to cope with random emails in my private inbox nor with
top-posting and non-trimmed replies simply because that wastes my time. And
certainly I'm not going to cope with your assumption that not being nice is
the right way to go.

It's your decision, not mine.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 19:34             ` Thomas Gleixner
@ 2016-01-20 19:59               ` Jeff Merkey
  2016-01-21  7:09                 ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-20 19:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/20/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> Jeff,
>
> On Wed, 20 Jan 2016, Jeff Merkey wrote:
>> Thomas, so far the only thing I've gotten from you are whining
>> diatribes about email ettiquette and today is the FIRST time you have
>> responded to me with any arguments that have technical merit and
>> actually address the problem.    That's what I am here for, not
>> anything else.  As Linus has said repeatedly, being nice doesn't seem
>> to get results.    Being direct and honest does.
>>
>> I am here to make MDB run as well as I can make it on Linux and to
>> help make it run as well as possible on c86_64 and ia32.  That's my
>> only objective here.   That being said,  I am still wanting to get
>> this fixed.
>>
>> Will you help me?
>
> If you accept, that you are not setting the tone of the conversation as you
> think it fits you.
>
> I started to give you technical answers way before you started your
> completely
> unjustified ranting. And I have done so before.
>
> I'm neither going to cope with random emails in my private inbox nor with
> top-posting and non-trimmed replies simply because that wastes my time. And
> certainly I'm not going to cope with your assumption that not being nice is
> the right way to go.
>
> It's your decision, not mine.
>
> Thanks,
>
> 	tglx
>

Oh good grief -- I guess we all have our days, me included.  I am sorry
to have offended you.  Turn the page, next chapter.  I am already
past this.  BTW, I am preparing my report.  Results are the same as before
with CONFIG_DEBUG_TIMEKEEPING: Hard Lockup again.  I am running a
trace right now to see just how big RAX was this time with the
nanoseconds count.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-20 19:59               ` Jeff Merkey
@ 2016-01-21  7:09                 ` Jeff Merkey
  2016-01-21  8:18                   ` Jeff Merkey
  2016-01-21 10:12                   ` Thomas Gleixner
  0 siblings, 2 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21  7:09 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

Ok, here's what I found after several hours of debugging and reviewing
this subsystem:

This subsystem plays it pretty loose in doing its math on 64-bit
registers.  I traced through ktime_get_ts64 hundreds of times and
sampled data running through it, and from what I saw, just normal
operation comes dangerously close to causing the RAX register to
wrap.   If the delta gets too big it does wrap, and I observed it
happening with the debugger tracing through the code.  It wraps
because of a sar instruction generated from the inline macros.

The wrap happens in this inline function.

static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
{
	cycle_t delta;
	s64 nsec;

	delta = timekeeping_get_delta(tkr);

	nsec = delta * tkr->mult + tkr->xtime_nsec;
	nsec >>= tkr->shift;    << wrap caused here

	/* If arch requires, add in get_arch_timeoffset() */
	return nsec + arch_gettimeoffset();
}

You only have 64 bits of register and the numbers being calculated
here are big.   By way of example, I observed the following during
normal operations:

delta  (RAX)   |     tkr->mult (RDX)

0x157876            0x65ee27
0xf1855               0x65f158
0x16cf05             0x65f408
0x303bc3              0x65f154

When this bug occurs, it is a different story.

delta  (RAX)    |     tkr->mult (RDX)

0x243283994b8     0x65233

So it goes like this:

	nsec = delta * tkr->mult + tkr->xtime_nsec;
        0x243283994b8 * 0x65233
        imul   rax,rdx = 0xE6A2Ce1f1ea690a8

	nsec >>= tkr->shift;    << wrap caused here
        sar    rax,cl  =  0xFFFFFFE6BFB3B7C3

the sar instruction doesn't just shift, it backfills the signedness of
the value, so this instruction is not doing what the C code is asking
it to do.  I am guessing that somewhere in this mass of macros,
something may have gotten declared wrong or incomplete (declared
signed ?).

The assembler output for this section that calls the macro to
calculate nsecs shows the sar instruction:

	delta = timekeeping_get_delta(tkr);

	nsec = delta * tkr->mult + tkr->xtime_nsec;
     29b:	48 0f af c2          	imul   %rdx,%rax
     29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
<ktime_get_ts64+0xc6>
	nsec >>= tkr->shift;
     2a6:	48 d3 f8             	sar    %cl,%rax
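
A standalone illustration (not kernel code) of how the signedness of the
left-hand operand decides which shift instruction gcc emits on x86-64:

#include <stdint.h>

int64_t shift_signed(int64_t nsec, unsigned int shift)
{
	/* arithmetic shift: gcc emits sar, the sign bit is replicated */
	return nsec >> shift;
}

uint64_t shift_unsigned(uint64_t nsec, unsigned int shift)
{
	/* logical shift: gcc emits shr, the upper bits are zero-filled */
	return nsec >> shift;
}

So a sar here is what one would expect when the value being shifted is a
signed 64-bit variable; the compiler has to assume it may be negative.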


There is another problem with the tkr->read returning an unchanging,
unclearable number when this bug occurs for the delta value.  It
appears that for whatever reason the clock has gone to sleep or gone away
and is no longer updating its counters.

static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
{
	cycle_t cycle_now, delta;

	/* read clocksource */
	cycle_now = tkr->read(tkr->clock); << returns the same value after
this bug happens

	/* calculate the delta since the last update_wall_time */
	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
cycle last is also the same value.

	return delta;
}
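
For reference, clocksource_delta() as used just above boils down to a masked
subtraction -- a sketch of the common non-debug variant, with cycle_t being a
u64 here:

static inline cycle_t clocksource_delta(cycle_t now, cycle_t last, cycle_t mask)
{
	return (now - last) & mask;
}

With the TSC's effectively full 64-bit mask, whatever gap sits between
cycle_now and the stale cycle_last comes straight back as the delta.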

This problem appears to have several things happening at once.
Probably the most concerning is that the assembler output is making
some assumptions about the SIGNEDNESS of the values being shifted and
using sar instead of shr instructions.

I am also concerned about the tkr->read function returning an
unchanging value when this problem shows up.

This subsystem plays it fast and loose with its math, and if the clock
gets delayed or out of sync, it will wrap in the above function and it
will trigger the Hard Lockup detector if the value is large enough in
RAX.  The sanity check for CONFIG_DEBUG_TIMEKEEPING does not catch the
code path where this delta value gets set, because the function that
computes the delta is called from more than just the one place that
checks for an overflow, and the wrap happens underneath it.

I would check how these structs are defined and the vars in them to
see if somewhere they are declared as signed values to the compiler,
because that's what it thinks it was given to compile.

I am still debugging the tkr->read issue.  I have determined the cause
of the wrap in the assembler.  As to why the gcc compiler is outputting
this instruction here is something to be determined.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21  7:09                 ` Jeff Merkey
@ 2016-01-21  8:18                   ` Jeff Merkey
  2016-01-21  9:08                     ` Jeff Merkey
  2016-01-21 10:12                   ` Thomas Gleixner
  1 sibling, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21  8:18 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

On 1/21/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> Ok, here's what I found after several hours of debugging and reviewing
> this subsystem:
>
> This subsystem plays is pretty loose in doing its math on 64 bit
> registers.  I traced through ktime_get_ts64 hundreds of times and
> sampled data running through it and from what I saw, just normal
> operations comes dangerously close to causing the RAX register to
> wrap.   If the delta gets too big it does wrap and I observed it
> happening with the debugger tracing through the code.  It wraps
> because of a sar instruction generated from the inline macros.
>
> The wrap happens in this inline function.
>
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
> 	cycle_t delta;
> 	s64 nsec;
>
> 	delta = timekeeping_get_delta(tkr);
>
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
> 	nsec >>= tkr->shift;    << wrap caused here
>
> 	/* If arch requires, add in get_arch_timeoffset() */
> 	return nsec + arch_gettimeoffset();
> }
>
> You only have 64 bits of register and the numbers being calculated
> here are big.   By way of example, I observed the following during
> normal operations:
>
> delta  (RAX)   |     tkr->mult (RDX)
>
> 0x157876            0x65ee27
> 0xf1855               0x65f158
> 0x16cf05             0x65f408
> 303bc3                0x65f154
>
> When this bug occurs different story.
>
> delta  (RAX)    |     tkr->mult (RDX)
>
> 0x243283994b8     0x65233
>
> So it goes like this:
>
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>         0x243283994b8 * 0x65233
>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
>
> 	nsec >>= tkr->shift;    << wrap caused here
>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3
>
> the sar instruction doesn't just shift, it backfills the signedness of
> the value, so this instruction is not doing what the C code is asking
> it to do.  I am guessing that somewhere in this mass of macros,
> something may have gotten declared wrong or incomplete (declared
> signed ?).
>
> The assembler output for this section that calls the macro to
> calculate nsecs shows the sar instruction:
>
> 	delta = timekeeping_get_delta(tkr);
>
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>      29b:	48 0f af c2          	imul   %rdx,%rax
>      29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
> <ktime_get_ts64+0xc6>
> 	nsec >>= tkr->shift;
>      2a6:	48 d3 f8             	sar    %cl,%rax
>
>
> There is another problem with the tkr->read returning an unchanging,
> unclearable number when this bug occurs for the delta value.  I
> appears for whatever reason the clock has gone to sleep or gone away
> and is no longer updating its counters.
>
> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
> {
> 	cycle_t cycle_now, delta;
>
> 	/* read clocksource */
> 	cycle_now = tkr->read(tkr->clock); << returns the same value after
> this bug happens
>
> 	/* calculate the delta since the last update_wall_time */
> 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
> cycle last is also the same value.
>
> 	return delta;
> }
>
> This problem appears to have several things happening at once.
> Probably the most concerning is that the assembler output is making
> some assumptions about the SIGNEDNESS of the values being shifted and
> using sar instead of shl instructions.
>
> I am also concerned about the thr->read function returning an
> unchanging value when this problem shows up.
>
> This subsystem plays it fast and loose with its math, and if the clock
> gets delayed or out of sync, it will wrap in the above function and it
> will trigger the Hard Lockup detector if the value is large enough in
> RAX.  The sanity check for CONFIG_DEBUG_TIMEKEEPER does not catch the
> code path where this delta value gets set because the function to
> update the delta is called in more then just in that function that
> checks for an overflow and the wrap case happens underneath it.
>
> I would check how these structs are defined and the vars in them to
> see if somewhere they are declared as signed values to the compiler,
> because that's what it thinks it was given to compile.
>
> I am still debugging the thr->read issue.  I have determined the cause
> of the wrap in the assembler.  As to why the gcc compiler is outputing
> this instruction here is something to be determined.
>
> Jeff
>

s64 = signed 64.    right in the code.  wow.

The iter_div64 functions all assume unsigned values, so this needs to
be patched.  You can't do a signed shift on a value that the rest of
the math will then interpret as unsigned -- it just makes a huge number
that wraps.  This patch is a one-liner after all.
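
For context, the iterative divide in question follows roughly this pattern --
a simplified standalone sketch modeled on the __iter_div_u64_rem() style of
helper the remark above presumably refers to, not copied from the tree.  Feed
it a sar-mangled "nanosecond" value like the 0xFFFFFFE6... one seen earlier,
reinterpreted as unsigned, and the loop needs on the order of 10^10 passes,
which is what the hard lockup detector then catches:

#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* simplified sketch of the iterative seconds/nanoseconds split */
static uint64_t iter_div_ns(uint64_t nsec, uint64_t *remainder)
{
	uint64_t secs = 0;

	/* one subtraction per pass, so huge inputs spin for a very long time */
	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		secs++;
	}
	*remainder = nsec;
	return secs;
}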

:-)

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21  8:18                   ` Jeff Merkey
@ 2016-01-21  9:08                     ` Jeff Merkey
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21  9:08 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

This bug has been fixed as of linux-next-20160121.  I just checked, so
it looks like it's handled.

	delta = timekeeping_get_delta(tkr);

	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
     251:	8b 15 00 00 00 00    	mov    0x0(%rip),%edx        # 257
<ktime_get_ts64+0x77>
     257:	49 0f 45 c4          	cmovne %r12,%rax
     25b:	48 0f af c2          	imul   %rdx,%rax
     25f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 266
<ktime_get_ts64+0x86>
     266:	48 d3 e8             	shr    %cl,%rax    << correct
	

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21  7:09                 ` Jeff Merkey
  2016-01-21  8:18                   ` Jeff Merkey
@ 2016-01-21 10:12                   ` Thomas Gleixner
  2016-01-21 15:46                     ` Jeff Merkey
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-01-21 10:12 UTC (permalink / raw)
  To: Jeff Merkey; +Cc: LKML, John Stultz

Jeff,

On Thu, 21 Jan 2016, Jeff Merkey wrote:
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
> 	cycle_t delta;
> 	s64 nsec;
> 
> 	delta = timekeeping_get_delta(tkr);
> 
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
> 	nsec >>= tkr->shift;    << wrap caused here
> 
> 	/* If arch requires, add in get_arch_timeoffset() */
> 	return nsec + arch_gettimeoffset();
> }
> 
> You only have 64 bits of register and the numbers being calculated
> here are big.   By way of example, I observed the following during
> normal operations:
> 
> delta  (RAX)   |     tkr->mult (RDX)
> 
> 0x157876            0x65ee27
> 0xf1855               0x65f158
> 0x16cf05             0x65f408
> 303bc3                0x65f154
> 
> When this bug occurs different story.
> 
> delta  (RAX)    |     tkr->mult (RDX)
> 
> 0x243283994b8     0x65233
> 
> So it goes like this:
> 
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>         0x243283994b8 * 0x65233
>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
> 
> 	nsec >>= tkr->shift;    << wrap caused here
>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3

That SAR is simply wrong here. It must be an SHR, and it is at least when I'm
looking at the assembly on my machine.

> the sar instruction doesn't just shift, it backfills the signedness of
> the value, so this instruction is not doing what the C code is asking
> it to do.  I am guessing that somewhere in this mass of macros,
> something may have gotten declared wrong or incomplete (declared
> signed ?).

There is no macro involved.

timekeeping_get_ns
{
	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
}

> The assembler output for this section that calls the macro to
> calculate nsecs shows the sar instruction:
> 
> 	delta = timekeeping_get_delta(tkr);
> 
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>      29b:	48 0f af c2          	imul   %rdx,%rax
>      29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
> <ktime_get_ts64+0xc6>
> 	nsec >>= tkr->shift;
>      2a6:	48 d3 f8             	sar    %cl,%rax

And this is fundamentally wrong. Why is the compiler emitting SAR instead of
SHR here? Here is the assembly output from my kernel:

	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
     27e:	48 0f af c5          	imul   %rbp,%rax
     282:	48 01 d8             	add    %rbx,%rax
     285:	48 d3 e8             	shr    %cl,%rax

	} while (read_seqcount_retry(&tk_core.seq, seq));


So the first thing which needs to be figured out is WHY this results in a SAR
on your compiler.

> There is another problem with the tkr->read returning an unchanging,
> unclearable number when this bug occurs for the delta value.  I
> appears for whatever reason the clock has gone to sleep or gone away
> and is no longer updating its counters.
> 
> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
> {
> 	cycle_t cycle_now, delta;
> 
> 	/* read clocksource */
> 	cycle_now = tkr->read(tkr->clock); << returns the same value after
> this bug happens
> 
> 	/* calculate the delta since the last update_wall_time */
> 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
> cycle last is also the same value.
> 
> 	return delta;
> }

If that value does not change, then the timekeeping update is not
running. That might happen because the timer interrupt is not happening or
whatever got wrecked.
 
> I would check how these structs are defined and the vars in them to
> see if somewhere they are declared as signed values to the compiler,
> because that's what it thinks it was given to compile.

Sure. Here you go:

	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;

delta, mult, xtime_nsec and shift are unsigned. The only signed value is nsec.

Does that issue go away if you apply the patch below?

Thanks,

	tglx

8<-----------
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 34b4cedfa80d..d405bcdf9d40 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return 0; }
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
 	cycle_t delta;
-	s64 nsec;
+	u64 nsec;
 
 	delta = timekeeping_get_delta(tkr);
 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21 10:12                   ` Thomas Gleixner
@ 2016-01-21 15:46                     ` Jeff Merkey
  2016-01-21 16:57                       ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21 15:46 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

Hi Thomas

That patch works and so does the patch currently in linux-next.  Yeah!

The code on your machine is apparently from the linux-next tree.  The
code in v4.4 is busted, and that was the build I was using since that's
the last build I have MDB released on.  If you compare v4.4 vs. linux-next
you will see someone has already patched this problem -- it's just not in
Linus' tree yet.

v4.4

static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
{
	cycle_t delta;
	s64 nsec;

	delta = timekeeping_get_delta(tkr);

	nsec = delta * tkr->mult + tkr->xtime_nsec;
	nsec >>= tkr->shift;  << causes sar to be used

	/* If arch requires, add in get_arch_timeoffset() */
	return nsec + arch_gettimeoffset();
}

linux-next

static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
{
	cycle_t delta;
	u64 nsec;

	delta = timekeeping_get_delta(tkr);

	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift; << shr used here

	/* If arch requires, add in get_arch_timeoffset() */
	return (s64)(nsec + arch_gettimeoffset());
}
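
In C terms the difference between the two forms is just which operand the
shift sees.  An illustrative standalone snippet (hypothetical parameter
names, mirroring the two versions above):

#include <stdint.h>

int64_t v4_4_form(uint64_t delta, uint32_t mult, uint64_t xtime_nsec, uint32_t shift)
{
	int64_t nsec = delta * mult + xtime_nsec;
	return nsec >> shift;		/* signed operand: arithmetic shift (sar) */
}

int64_t next_form(uint64_t delta, uint32_t mult, uint64_t xtime_nsec, uint32_t shift)
{
	/* shift applied to the unsigned expression: logical shift (shr) */
	uint64_t nsec = (delta * mult + xtime_nsec) >> shift;
	return (int64_t)nsec;
}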


This bug goes way back in the Linux revisions, so I have to backport
this patch to v3.6 in the MDB patch series.  This bug was
introduced in the late linux 3.6 kernels and exists in all of them up
to v4.4.

Jeff


On 1/21/16, Thomas Gleixner <tglx@linutronix.de> wrote:
> Jeff,
>
> On Thu, 21 Jan 2016, Jeff Merkey wrote:
>> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>> {
>> 	cycle_t delta;
>> 	s64 nsec;
>>
>> 	delta = timekeeping_get_delta(tkr);
>>
>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>> 	nsec >>= tkr->shift;    << wrap caused here
>>
>> 	/* If arch requires, add in get_arch_timeoffset() */
>> 	return nsec + arch_gettimeoffset();
>> }
>>
>> You only have 64 bits of register and the numbers being calculated
>> here are big.   By way of example, I observed the following during
>> normal operations:
>>
>> delta  (RAX)   |     tkr->mult (RDX)
>>
>> 0x157876            0x65ee27
>> 0xf1855               0x65f158
>> 0x16cf05             0x65f408
>> 303bc3                0x65f154
>>
>> When this bug occurs different story.
>>
>> delta  (RAX)    |     tkr->mult (RDX)
>>
>> 0x243283994b8     0x65233
>>
>> So it goes like this:
>>
>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>         0x243283994b8 * 0x65233
>>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
>>
>> 	nsec >>= tkr->shift;    << wrap caused here
>>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3
>
> That SAR is siomply wrong here. It must be an SHR and it is at least when
> I'm
> looking at the assembly of my machine.
>
>> the sar instruction doesn't just shift, it backfills the signedness of
>> the value, so this instruction is not doing what the C code is asking
>> it to do.  I am guessing that somewhere in this mass of macros,
>> something may have gotten declared wrong or incomplete (declared
>> signed ?).
>
> There is no macro involved.
>
> timekeeping_get_ns
> {
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
> }
>
>> The assembler output for this section that calls the macro to
>> calculate nsecs shows the sar instruction:
>>
>> 	delta = timekeeping_get_delta(tkr);
>>
>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>      29b:	48 0f af c2          	imul   %rdx,%rax
>>      29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
>> <ktime_get_ts64+0xc6>
>> 	nsec >>= tkr->shift;
>>      2a6:	48 d3 f8             	sar    %cl,%rax
>
> And this is fundamentally wrong. Why is the compiler emitting SAR instead
> of
> SHR here? Here is the assembly output from my kernel:
>
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>      27e:	48 0f af c5          	imul   %rbp,%rax
>      282:	48 01 d8             	add    %rbx,%rax
>      285:	48 d3 e8             	shr    %cl,%rax
>
> 	} while (read_seqcount_retry(&tk_core.seq, seq));
>
>
> So the first thing which needs to be figured out is WHY this results in a
> SAR
> on your compiler.
>
>> There is another problem with the tkr->read returning an unchanging,
>> unclearable number when this bug occurs for the delta value.  I
>> appears for whatever reason the clock has gone to sleep or gone away
>> and is no longer updating its counters.
>>
>> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
>> {
>> 	cycle_t cycle_now, delta;
>>
>> 	/* read clocksource */
>> 	cycle_now = tkr->read(tkr->clock); << returns the same value after
>> this bug happens
>>
>> 	/* calculate the delta since the last update_wall_time */
>> 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
>> cycle last is also the same value.
>>
>> 	return delta;
>> }
>
> If that value does not change, then the timekeeping update is not
> running. That might happen because the timer interrupt is not happening or
> whatever got wreckaged.
>
>> I would check how these structs are defined and the vars in them to
>> see if somewhere they are declared as signed values to the compiler,
>> because that's what it thinks it was given to compile.
>
> Sure. Here you go:
>
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>
> delta, mult, xtime_nsec and shift are unsigned. The only signed value is
> nsec.
>
> Does that issue go away if you apply the patch below?
>
> Thanks,
>
> 	tglx
>
> 8<-----------
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 34b4cedfa80d..d405bcdf9d40 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return 0;
> }
>  static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>  {
>  	cycle_t delta;
> -	s64 nsec;
> +	u64 nsec;
>
>  	delta = timekeeping_get_delta(tkr);
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21 15:46                     ` Jeff Merkey
@ 2016-01-21 16:57                       ` Jeff Merkey
  2016-01-21 17:00                         ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21 16:57 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

One other item worth mentioning is the effect of this bug on the user
space daemons.  Since this bug is in all kernels from v3.6 through v4.4,
folks will see the following behavior from systemd when this bug fires
off and the value wraps:

id 0a0c809bcdc0812145d9c35456e46335650601b7
reason:         systemd-journald killed by SIGABRT
time:           Sun 10 Jan 2016 11:56:16 AM MST
cmdline:        /usr/lib/systemd/systemd-journald
package:        systemd-219-19.el7
uid:            0 (root)
count:          2
Directory:      /var/tmp/abrt/ccpp-2016-01-10-11:56:16-6281

id 38f7bb372d0c14cb912e9d225fb474b931889e77
reason:         __epoll_wait_nocancel(): systemd-logind killed by SIGABRT
time:           Fri 08 Jan 2016 01:33:42 PM MST
cmdline:        /usr/lib/systemd/systemd-logind
package:        systemd-219-19.el7
uid:            0 (root)
count:          39
Directory:      /var/tmp/abrt/ccpp-2016-01-08-13:33:42-630
Reported:
https://retrace.fedoraproject.org/faf/reports/bthash/307b26a77cc6d5005ce2fdf18ff010fe3dc94401

So if folks see aborts in systemd that look like the above, it's this
bug.  I traced the systemd crash as well, and it's caused by the
kernel returning garbage to systemd via system calls when that RAX
value wraps.

Jeff.



On 1/21/16, Jeff Merkey <linux.mdb@gmail.com> wrote:
> Hi Thomas
>
> That patch works and so does the patch currently in linux-next.  Yeah !
>
> The code on your machine is apparently from the linux-next tree. In
> 4.4 is busted and that was the build I was using since that's the last
> build I have MDB released on.  If you compare v4.4 vs. linux-next you
> will see someone already has patched this problem -- its just not in
> Linus tree yet.
>
> v4.4
>
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
> 	cycle_t delta;
> 	s64 nsec;
>
> 	delta = timekeeping_get_delta(tkr);
>
> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
> 	nsec >>= tkr->shift;  << causes sar to be used
>
> 	/* If arch requires, add in get_arch_timeoffset() */
> 	return nsec + arch_gettimeoffset();
> }
>
> linux-next
>
> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> {
> 	cycle_t delta;
> 	u64 nsec;
>
> 	delta = timekeeping_get_delta(tkr);
>
> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift; << shr used
> here
>
> 	/* If arch requires, add in get_arch_timeoffset() */
> 	return (s64)(nsec + arch_gettimeoffset());
> }
>
>
> This bug goes way back in the linux revs, so I have to retro apply
> this patch back to v3.6 in the MDB patch series.  This bug was
> introduced in the late linux 3.6 kernels and exists in all of them up
> to v4.4.
>
> Jeff
>
>
> On 1/21/16, Thomas Gleixner <tglx@linutronix.de> wrote:
>> Jeff,
>>
>> On Thu, 21 Jan 2016, Jeff Merkey wrote:
>>> static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>>> {
>>> 	cycle_t delta;
>>> 	s64 nsec;
>>>
>>> 	delta = timekeeping_get_delta(tkr);
>>>
>>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>> 	nsec >>= tkr->shift;    << wrap caused here
>>>
>>> 	/* If arch requires, add in get_arch_timeoffset() */
>>> 	return nsec + arch_gettimeoffset();
>>> }
>>>
>>> You only have 64 bits of register and the numbers being calculated
>>> here are big.   By way of example, I observed the following during
>>> normal operations:
>>>
>>> delta  (RAX)   |     tkr->mult (RDX)
>>>
>>> 0x157876            0x65ee27
>>> 0xf1855               0x65f158
>>> 0x16cf05             0x65f408
>>> 303bc3                0x65f154
>>>
>>> When this bug occurs different story.
>>>
>>> delta  (RAX)    |     tkr->mult (RDX)
>>>
>>> 0x243283994b8     0x65233
>>>
>>> So it goes like this:
>>>
>>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>>         0x243283994b8 * 0x65233
>>>         imul   rax,rdx = 0xE6A2Ce1f1ea690a8
>>>
>>> 	nsec >>= tkr->shift;    << wrap caused here
>>>         sar    rax,cl  =  0xFFFFFFE6BFB3B7C3
>>
>> That SAR is siomply wrong here. It must be an SHR and it is at least when
>> I'm
>> looking at the assembly of my machine.
>>
>>> the sar instruction doesn't just shift, it backfills the signedness of
>>> the value, so this instruction is not doing what the C code is asking
>>> it to do.  I am guessing that somewhere in this mass of macros,
>>> something may have gotten declared wrong or incomplete (declared
>>> signed ?).
>>
>> There is no macro involved.
>>
>> timekeeping_get_ns
>> {
>> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>> }
>>
>>> The assembler output for this section that calls the macro to
>>> calculate nsecs shows the sar instruction:
>>>
>>> 	delta = timekeeping_get_delta(tkr);
>>>
>>> 	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>>      29b:	48 0f af c2          	imul   %rdx,%rax
>>>      29f:	48 03 05 00 00 00 00 	add    0x0(%rip),%rax        # 2a6
>>> <ktime_get_ts64+0xc6>
>>> 	nsec >>= tkr->shift;
>>>      2a6:	48 d3 f8             	sar    %cl,%rax
>>
>> And this is fundamentally wrong. Why is the compiler emitting SAR instead
>> of
>> SHR here? Here is the assembly output from my kernel:
>>
>> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>>      27e:	48 0f af c5          	imul   %rbp,%rax
>>      282:	48 01 d8             	add    %rbx,%rax
>>      285:	48 d3 e8             	shr    %cl,%rax
>>
>> 	} while (read_seqcount_retry(&tk_core.seq, seq));
>>
>>
>> So the first thing which needs to be figured out is WHY this results in a
>> SAR
>> on your compiler.
>>
>>> There is another problem with the tkr->read returning an unchanging,
>>> unclearable number when this bug occurs for the delta value.  I
>>> appears for whatever reason the clock has gone to sleep or gone away
>>> and is no longer updating its counters.
>>>
>>> static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
>>> {
>>> 	cycle_t cycle_now, delta;
>>>
>>> 	/* read clocksource */
>>> 	cycle_now = tkr->read(tkr->clock); << returns the same value after
>>> this bug happens
>>>
>>> 	/* calculate the delta since the last update_wall_time */
>>> 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); <<
>>> cycle last is also the same value.
>>>
>>> 	return delta;
>>> }
>>
>> If that value does not change, then the timekeeping update is not
>> running. That might happen because the timer interrupt is not happening
>> or
>> whatever got wreckaged.
>>
>>> I would check how these structs are defined and the vars in them to
>>> see if somewhere they are declared as signed values to the compiler,
>>> because that's what it thinks it was given to compile.
>>
>> Sure. Here you go:
>>
>> 	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>>
>> delta, mult, xtime_nsec and shift are unsigned. The only signed value is
>> nsec.
>>
>> Does that issue go away if you apply the patch below?
>>
>> Thanks,
>>
>> 	tglx
>>
>> 8<-----------
>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>> index 34b4cedfa80d..d405bcdf9d40 100644
>> --- a/kernel/time/timekeeping.c
>> +++ b/kernel/time/timekeeping.c
>> @@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return
>> 0;
>> }
>>  static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>>  {
>>  	cycle_t delta;
>> -	s64 nsec;
>> +	u64 nsec;
>>
>>  	delta = timekeeping_get_delta(tkr);
>>
>>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21 16:57                       ` Jeff Merkey
@ 2016-01-21 17:00                         ` Jeff Merkey
  2016-01-21 17:07                           ` Jeff Merkey
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21 17:00 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

I am switching email clients from gmail.  This stupid email client of
theirs constantly top posts and sends multiple email replies when you
try to send a reply.  I apologize yet again for the top posting.  It's
gmail doing it.  Unless I delete the entire previous message it will
always top post.

Sorry about that; I finally figured out that bug too.

:-)

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [BUG REPORT] ktime_get_ts64 causes Hard Lockup
  2016-01-21 17:00                         ` Jeff Merkey
@ 2016-01-21 17:07                           ` Jeff Merkey
  0 siblings, 0 replies; 29+ messages in thread
From: Jeff Merkey @ 2016-01-21 17:07 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, John Stultz

Hi Thomas,

I figured out where the top-posting is coming from -- gmail web client
does it automatically.  Switching email clients.  I will still use
gmail for SMTP forwarding but their busted piece of crap web email
client automatically top posts when you use the "reply" window and
it's gone.

I am not intentionally top posting, I am switching clients today so
this problem does not happen again.

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-01-21 17:07 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-19  1:59 [BUG REPORT] ktime_get_ts64 causes Hard Lockup Jeff Merkey
2016-01-19  2:16 ` Jeff Merkey
2016-01-19  2:31   ` Jeff Merkey
2016-01-19  9:50 ` Thomas Gleixner
2016-01-19 15:37 ` Jeff Merkey
2016-01-19 22:00   ` Jeff Merkey
2016-01-20  0:59     ` Jeff Merkey
2016-01-20  9:21     ` Thomas Gleixner
2016-01-20 14:26       ` Thomas Gleixner
2016-01-20 16:40         ` Jeff Merkey
2016-01-20 16:53       ` Jeff Merkey
2016-01-20 17:16         ` Jeff Merkey
2016-01-20 17:32           ` John Stultz
2016-01-20 17:36             ` Jeff Merkey
2016-01-20 17:42             ` Thomas Gleixner
2016-01-20 17:59               ` John Stultz
2016-01-20 18:03                 ` Jeff Merkey
2016-01-20 17:21         ` Thomas Gleixner
2016-01-20 17:33           ` Jeff Merkey
2016-01-20 19:34             ` Thomas Gleixner
2016-01-20 19:59               ` Jeff Merkey
2016-01-21  7:09                 ` Jeff Merkey
2016-01-21  8:18                   ` Jeff Merkey
2016-01-21  9:08                     ` Jeff Merkey
2016-01-21 10:12                   ` Thomas Gleixner
2016-01-21 15:46                     ` Jeff Merkey
2016-01-21 16:57                       ` Jeff Merkey
2016-01-21 17:00                         ` Jeff Merkey
2016-01-21 17:07                           ` Jeff Merkey
