[GIT PULL][PATCH 0/2] Timekeeping items for 4.8 via tip/timers/urgent

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [GIT PULL][PATCH 0/2] Timekeeping items for 4.8 via tip/timers/urgent
@ 2016-08-23 23:08 John Stultz
  2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz
  2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz
  0 siblings, 2 replies; 13+ messages in thread
From: John Stultz @ 2016-08-23 23:08 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner

Hey Thomas, Ingo,
  Wanted to submit the following changes for review and inclusion for
tip/timers/urgent. The first is a fix for an NMI safety issue Steven
pointed out last week and the other is a potential memory corruption
issue.

Let me know if you have any objections.

thanks
-john

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>

If desired, the patches can be pulled as follows:

The following changes since commit be5769e2061ac40b32daa83e28e1c4aac7133511:

  clocksource/drivers/mips-gic-timer: Make gic_clocksource_of_init() return int (2016-08-17 13:08:32 +0200)

are available in the git repository at:

  https://git.linaro.org/people/john.stultz/linux.git fortglx/4.8/time

for you to fetch changes up to 30a56a04134382bd8f0fa51db19b94af7f74f9ee:

  timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times (2016-08-23 16:03:15 -0700)

----------------------------------------------------------------
John Stultz (2):
  timekeeping: Avoid taking lock in NMI path with
    CONFIG_DEBUG_TIMEKEEPING
  timekeeping: Cap array access in timekeeping_debug to protect against
    invalid sleep times

 kernel/time/timekeeping.c       | 5 ++++-
 kernel/time/timekeeping_debug.c | 9 +++++++--
 2 files changed, 11 insertions(+), 3 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
  2016-08-23 23:08 [GIT PULL][PATCH 0/2] Timekeeping items for 4.8 via tip/timers/urgent John Stultz
@ 2016-08-23 23:08 ` John Stultz
  2016-08-24  7:40   ` [tip:timers/urgent] " tip-bot for John Stultz
  2016-10-05  0:30   ` [PATCH 1/2] " Alexei Starovoitov
  2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz
  1 sibling, 2 replies; 13+ messages in thread
From: John Stultz @ 2016-08-23 23:08 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, stable

When I added some extra sanity checking in timekeeping_get_ns() under
CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
method was using timekeeping_get_ns().

Thus the locking added to the debug checks broke the NMI-safety of
__ktime_get_fast_ns().

This patch open-codes the timekeeping_get_ns() logic for
__ktime_get_fast_ns(), so can avoid any deadlocks in NMI.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable <stable@vger.kernel.org> # 4.1+
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3b65746..e07fb09 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -401,7 +401,10 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 	do {
 		seq = raw_read_seqcount_latch(&tkf->seq);
 		tkr = tkf->base + (seq & 0x01);
-		now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr);
+		now = ktime_to_ns(tkr->base);
+
+		now += clocksource_delta(tkr->read(tkr->clock),
+					 tkr->cycle_last, tkr->mask);
 	} while (read_seqcount_retry(&tkf->seq, seq));
 
 	return now;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:timers/urgent] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
  2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz
@ 2016-08-24  7:40   ` tip-bot for John Stultz
  2016-10-05  0:30   ` [PATCH 1/2] " Alexei Starovoitov
  1 sibling, 0 replies; 13+ messages in thread
From: tip-bot for John Stultz @ 2016-08-24  7:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, rostedt, mingo, peterz, linux-kernel, hpa, john.stultz, stable

Commit-ID:  27727df240c7cc84f2ba6047c6f18d5addfd25ef
Gitweb:     http://git.kernel.org/tip/27727df240c7cc84f2ba6047c6f18d5addfd25ef
Author:     John Stultz <john.stultz@linaro.org>
AuthorDate: Tue, 23 Aug 2016 16:08:21 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 24 Aug 2016 09:34:31 +0200

timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING

When I added some extra sanity checking in timekeeping_get_ns() under
CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
method was using timekeeping_get_ns().

Thus the locking added to the debug checks broke the NMI-safety of
__ktime_get_fast_ns().

This patch open-codes the timekeeping_get_ns() logic for
__ktime_get_fast_ns(), so can avoid any deadlocks in NMI.

Fixes: 4ca22c2648f9 "timekeeping: Add warnings when overflows or underflows are observed"
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: stable <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/timekeeping.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3b65746..e07fb09 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -401,7 +401,10 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 	do {
 		seq = raw_read_seqcount_latch(&tkf->seq);
 		tkr = tkf->base + (seq & 0x01);
-		now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr);
+		now = ktime_to_ns(tkr->base);
+
+		now += clocksource_delta(tkr->read(tkr->clock),
+					 tkr->cycle_last, tkr->mask);
 	} while (read_seqcount_retry(&tkf->seq, seq));
 
 	return now;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
  2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz
  2016-08-24  7:40   ` [tip:timers/urgent] " tip-bot for John Stultz
@ 2016-10-05  0:30   ` Alexei Starovoitov
  2016-10-05  0:50     ` John Stultz
  2016-10-05  2:55     ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz
  1 sibling, 2 replies; 13+ messages in thread
From: Alexei Starovoitov @ 2016-10-05  0:30 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, stable, Brendan Gregg

On Tue, Aug 23, 2016 at 04:08:21PM -0700, John Stultz wrote:
> When I added some extra sanity checking in timekeeping_get_ns() under
> CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
> method was using timekeeping_get_ns().
> 
> Thus the locking added to the debug checks broke the NMI-safety of
> __ktime_get_fast_ns().
> 
> This patch open-codes the timekeeping_get_ns() logic for
> __ktime_get_fast_ns(), so can avoid any deadlocks in NMI.
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable <stable@vger.kernel.org> # 4.1+
> Reported-by: Steven Rostedt <rostedt@goodmis.org>
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  kernel/time/timekeeping.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 3b65746..e07fb09 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -401,7 +401,10 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
>  	do {
>  		seq = raw_read_seqcount_latch(&tkf->seq);
>  		tkr = tkf->base + (seq & 0x01);
> -		now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr);
> +		now = ktime_to_ns(tkr->base);
> +
> +		now += clocksource_delta(tkr->read(tkr->clock),
> +					 tkr->cycle_last, tkr->mask);

we're seeing the time jumping backwards between __ktime_get_fast_ns calls.
and looks like this patch broke it, since delta is being added to ns.
It seems it should be:
now += timekeeping_delta_to_ns(clocksource_delta(...));
or better fix possible?

Reported-by: Brendan Gregg <bgregg@netflix.com>

# perf record -F 9 --cpu 0 --clockid CLOCK_MONOTONIC -- sleep 1
# perf script
         swapper     0 [000] 2942480.468981:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000] 2942480.471468:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000] 2942480.473959:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000] 2942480.475202:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000] 2942480.477691:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000] 2942481.470226:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000] 2942481.472569:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000] 2942481.476460:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000] 2942481.478943:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)

Note the timestamp jump.

fixed (rolled back patch):

# perf record -F 9 --cpu 0 --clockid CLOCK_MONOTONIC -- sleep 1
# perf script
         swapper     0 [000]   589.729663:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000]   589.840772:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000]   589.951883:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000]   590.062994:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000]   590.174075:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000]   590.285217:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000]   590.396328:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000]   590.507439:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)
         swapper     0 [000]   590.618550:  111111111 cpu-clock:  ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
  2016-10-05  0:30   ` [PATCH 1/2] " Alexei Starovoitov
@ 2016-10-05  0:50     ` John Stultz
  2016-10-05  2:55     ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz
  1 sibling, 0 replies; 13+ messages in thread
From: John Stultz @ 2016-10-05  0:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: lkml, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, stable, Brendan Gregg

On Tue, Oct 4, 2016 at 5:30 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Aug 23, 2016 at 04:08:21PM -0700, John Stultz wrote:
>> When I added some extra sanity checking in timekeeping_get_ns() under
>> CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
>> method was using timekeeping_get_ns().
>>
>> Thus the locking added to the debug checks broke the NMI-safety of
>> __ktime_get_fast_ns().
>>
>> This patch open-codes the timekeeping_get_ns() logic for
>> __ktime_get_fast_ns(), so can avoid any deadlocks in NMI.
>>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: stable <stable@vger.kernel.org> # 4.1+
>> Reported-by: Steven Rostedt <rostedt@goodmis.org>
>> Reported-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: John Stultz <john.stultz@linaro.org>
>> ---
>>  kernel/time/timekeeping.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>> index 3b65746..e07fb09 100644
>> --- a/kernel/time/timekeeping.c
>> +++ b/kernel/time/timekeeping.c
>> @@ -401,7 +401,10 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
>>       do {
>>               seq = raw_read_seqcount_latch(&tkf->seq);
>>               tkr = tkf->base + (seq & 0x01);
>> -             now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr);
>> +             now = ktime_to_ns(tkr->base);
>> +
>> +             now += clocksource_delta(tkr->read(tkr->clock),
>> +                                      tkr->cycle_last, tkr->mask);
>
> we're seeing the time jumping backwards between __ktime_get_fast_ns calls.
> and looks like this patch broke it, since delta is being added to ns.
> It seems it should be:
> now += timekeeping_delta_to_ns(clocksource_delta(...));
> or better fix possible?

Gah! Yes. Quite right. I'm not sure how I missed that. Thanks for
catching it quickly and apologies!

I've got an initial fix but I'm sitting down for dinner so I'll send
it out a bit later after I can test it.

thanks
-john

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression
  2016-10-05  0:30   ` [PATCH 1/2] " Alexei Starovoitov
  2016-10-05  0:50     ` John Stultz
@ 2016-10-05  2:55     ` John Stultz
  2016-10-05 10:57       ` Mathieu Desnoyers
  2016-10-05 13:48       ` [tip:timers/urgent] " tip-bot for John Stultz
  1 sibling, 2 replies; 13+ messages in thread
From: John Stultz @ 2016-10-05  2:55 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, stable, Brendan Gregg, Alexei Starovoitov

In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.

This would result in bogus perf timestamps like:
 swapper     0 [000]   253.427536:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426573:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426687:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426800:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426905:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427022:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427127:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427239:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427346:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427463:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   255.426572:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:
 swapper     0 [000]    39.953768:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.064839:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.175956:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.287103:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.398217:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.509324:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.620437:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.731546:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.842654:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.953772:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    41.064881:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

This patch adds the proper use of timekeeping_delta_to_ns()
to convert the cycle delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.

Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable <stable@vger.kernel.org>
Cc: Brendan Gregg <bgregg@netflix.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg <bgregg@netflix.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
Thomas/Ingo:
I've reproduced the issue and validated this fixes it, but given my limited perf
usage so far, I'd appreciate waiting for a Tested-by: from one of the reporters
before considering for tip/timers/urgent.

 kernel/time/timekeeping.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e07fb09..37dec7e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -403,8 +403,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 		tkr = tkf->base + (seq & 0x01);
 		now = ktime_to_ns(tkr->base);
 
-		now += clocksource_delta(tkr->read(tkr->clock),
-					 tkr->cycle_last, tkr->mask);
+		now += timekeeping_delta_to_ns(tkr,
+				clocksource_delta(
+					tkr->read(tkr->clock),
+					tkr->cycle_last,
+					tkr->mask));
 	} while (read_seqcount_retry(&tkf->seq, seq));
 
 	return now;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression
  2016-10-05  2:55     ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz
@ 2016-10-05 10:57       ` Mathieu Desnoyers
  2016-10-05 13:48       ` [tip:timers/urgent] " tip-bot for John Stultz
  1 sibling, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2016-10-05 10:57 UTC (permalink / raw)
  To: John Stultz
  Cc: linux-kernel, rostedt, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, stable, Brendan Gregg, Alexei Starovoitov,
	Linus Torvalds

----- On Oct 4, 2016, at 10:55 PM, John Stultz john.stultz@linaro.org wrote:

> In commit 27727df240c7 ("Avoid taking lock in NMI path with
> CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
> the timekeeping_get_ns() function, but I forgot to include
> the unit conversion from cycles to nanoseconds, breaking the
> function's output, which impacts users like perf.

It also impacts LTTng. I've had reports of timestamp borkage from
LTTng end users yesterday with kernels 4.7.5 and 4.7.4.

Some stable branches sleuthing indicates that the following
kernel versions are affected: 4.8, 4.7.4+, 4.4.20+, 4.1.32+

I plan to add kernel version range checks in lttng-modules to
work-around this mess, but I need an upper bound. Is there any way
we can get this upstream and cherry-picked into stable branches
ASAP ?

Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

LTTng kernel traces with Linux 4.8:

[06:37:40.248222037] (+0.000001057) compudjdev power_cpu_idle: { cpu_id = 10 }, { state = 1, cpu_id = 10 }
[06:37:40.248222037] (+0.000000000) compudjdev x86_irq_vectors_local_timer_entry: { cpu_id = 10 }, { vector = 239 }
[06:37:40.248222037] (+0.000000000) compudjdev timer_hrtimer_cancel: { cpu_id = 10 }, { hrtimer = 0xFFFF880FC1692700 }
[06:37:40.248222037] (+0.000000000) compudjdev timer_hrtimer_expire_entry: { cpu_id = 10 }, { hrtimer = 0xFFFF880FC1692700, now = 211036043790, function = 0xFFFFFFFF8116FD40 }
[06:37:40.248222037] (+0.000000000) compudjdev irq_softirq_raise: { cpu_id = 10 }, { vec = 1 }
[06:37:40.248222037] (+0.000000000) compudjdev rcu_utilization: { cpu_id = 10 }, { s = "Start scheduler-tick" }
[06:37:40.248222037] (+0.000000000) compudjdev irq_softirq_raise: { cpu_id = 10 }, { vec = 9 }
[...]

% lttng view | grep "cpu_id = 0" | grep "(+0.000000000)" |wc -l
76156

LTTng kernel traces with Linux 4.8 + this fix applied:

[06:46:52.648061087] (+0.000001455) compudjdev power_cpu_idle: { cpu_id = 23 }, { state = 1, cpu_id = 23 }
[06:46:52.652011654] (+0.003950567) compudjdev x86_irq_vectors_local_timer_entry: { cpu_id = 9 }, { vector = 239 }
[06:46:52.652012324] (+0.000000670) compudjdev timer_hrtimer_cancel: { cpu_id = 9 }, { hrtimer = 0xFFFF880FC1652700 }
[06:46:52.652012654] (+0.000000330) compudjdev timer_hrtimer_expire_entry: { cpu_id = 9 }, { hrtimer = 0xFFFF880FC1652700, now = 91808012323, function = 0xFFFFFFFF8116FD80 }
[06:46:52.652013354] (+0.000000700) compudjdev irq_softirq_raise: { cpu_id = 9 }, { vec = 1 }
[06:46:52.652013784] (+0.000000430) compudjdev rcu_utilization: { cpu_id = 9 }, { s = "Start scheduler-tick" }

% lttng view | grep "cpu_id = 0" | grep "(+0.000000000)" |wc -l
2

Which confirms that this fix addresses the issue.

This issue seems to affect perf, ftrace "mono" clock source, and
eBPF in Linux.

Thanks,

Mathieu

> 
> This would result in bogus perf timestamps like:
> swapper     0 [000]   253.427536:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]   254.426573:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]   254.426687:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]   254.426800:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]   254.426905:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]   254.427022:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]   254.427127:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]   254.427239:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]   254.427346:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]   254.427463:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]   255.426572:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> 
> Instead of more reasonable expected timestamps like:
> swapper     0 [000]    39.953768:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]    40.064839:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]    40.175956:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]    40.287103:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]    40.398217:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]    40.509324:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]    40.620437:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]    40.731546:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]    40.842654:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]    40.953772:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper     0 [000]    41.064881:  111111111 cpu-clock:  ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> 
> This patch adds the proper use of timekeeping_delta_to_ns()
> to convert the cycle delta to nanoseconds as needed.
> 
> Thanks to Brendan and Alexei for finding this quickly after
> the v4.8 release. Unfortunately the problematic commit has
> landed in some -stable trees so they'll need this fix as
> well.
> 
> Many apologies for this mistake. I'll be looking to add a
> perf-clock sanity test to the kselftest timers tests soon.
> 
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable <stable@vger.kernel.org>
> Cc: Brendan Gregg <bgregg@netflix.com>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with
> CONFIG_DEBUG_TIMEKEEPING"
> Reported-by: Brendan Gregg <bgregg@netflix.com>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
> Thomas/Ingo:
> I've reproduced the issue and validated this fixes it, but given my limited perf
> usage so far, I'd appreciate waiting for a Tested-by: from one of the reporters
> before considering for tip/timers/urgent.
> 
> kernel/time/timekeeping.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index e07fb09..37dec7e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -403,8 +403,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct
> tk_fast *tkf)
> 		tkr = tkf->base + (seq & 0x01);
> 		now = ktime_to_ns(tkr->base);
> 
> -		now += clocksource_delta(tkr->read(tkr->clock),
> -					 tkr->cycle_last, tkr->mask);
> +		now += timekeeping_delta_to_ns(tkr,
> +				clocksource_delta(
> +					tkr->read(tkr->clock),
> +					tkr->cycle_last,
> +					tkr->mask));
> 	} while (read_seqcount_retry(&tkf->seq, seq));
> 
> 	return now;
> --
> 1.9.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [tip:timers/urgent] timekeeping: Fix __ktime_get_fast_ns() regression
  2016-10-05  2:55     ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz
  2016-10-05 10:57       ` Mathieu Desnoyers
@ 2016-10-05 13:48       ` tip-bot for John Stultz
  1 sibling, 0 replies; 13+ messages in thread
From: tip-bot for John Stultz @ 2016-10-05 13:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, john.stultz, peterz, rostedt, alexei.starovoitov,
	mathieu.desnoyers, tglx, mingo, linux-kernel, bgregg, stable

Commit-ID:  58bfea9532552d422bde7afa207e1a0f08dffa7d
Gitweb:     http://git.kernel.org/tip/58bfea9532552d422bde7afa207e1a0f08dffa7d
Author:     John Stultz <john.stultz@linaro.org>
AuthorDate: Tue, 4 Oct 2016 19:55:48 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 5 Oct 2016 15:44:46 +0200

timekeeping: Fix __ktime_get_fast_ns() regression

In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.

This results in bogus perf timestamps like:
 swapper     0 [000]   253.427536:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426573:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426687:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426800:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426905:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427022:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427127:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427239:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427346:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427463:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   255.426572:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:
 swapper     0 [000]    39.953768:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.064839:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.175956:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.287103:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.398217:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.509324:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.620437:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.731546:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.842654:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.953772:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    41.064881:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Add the proper use of timekeeping_delta_to_ns() to convert
the cycle delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.

Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.

Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg <bgregg@netflix.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Tested-and-reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable <stable@vger.kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/timekeeping.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e07fb09..37dec7e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -403,8 +403,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 		tkr = tkf->base + (seq & 0x01);
 		now = ktime_to_ns(tkr->base);
 
-		now += clocksource_delta(tkr->read(tkr->clock),
-					 tkr->cycle_last, tkr->mask);
+		now += timekeeping_delta_to_ns(tkr,
+				clocksource_delta(
+					tkr->read(tkr->clock),
+					tkr->cycle_last,
+					tkr->mask));
 	} while (read_seqcount_retry(&tkf->seq, seq));
 
 	return now;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times
  2016-08-23 23:08 [GIT PULL][PATCH 0/2] Timekeeping items for 4.8 via tip/timers/urgent John Stultz
  2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz
@ 2016-08-23 23:08 ` John Stultz
  2016-08-24  0:58   ` Chen Yu
                     ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: John Stultz @ 2016-08-23 23:08 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Rafael J. Wysocki, Janek Kozicki, Chen Yu, Xunlei Pang,
	Zhang Rui, linux-pm, stable

It was reported that hibernation could fail on the 2nd attempt,
where the system hangs at hibernate() -> syscore_resume() ->
i8237A_resume() -> claim_dma_lock(), because the lock has
already been taken.

However there is actually no other process would like to grab
this lock on that problematic platform.

Further investigation showed that the problem is triggered by
setting /sys/power/pm_trace to 1 before the 1st hibernation.

Since once pm_trace is enabled, the rtc becomes unmeaningful
after suspend, and meanwhile some BIOSes would like to adjust
the 'invalid' tsc(e.g, smaller than 1970) to the release date
of that motherboard during POST stage, thus after resumed, it
may seem that the system had a significant long sleep time might
due to meaningless tsc or RTC delta.

Then in timekeeping_resume -> tk_debug_account_sleep_time, if
the bit31 of the sleep time happened to be set to 1, the fls
returns 32 and then we add 1 to sleep_time_bin[32], which
caused a memory overwritten.

As depicted by System.map:
0xffffffff81c9d080 b sleep_time_bin
0xffffffff81c9d100 B dma_spin_lock
the dma_spin_lock.val is set to 1, which caused this problem.

This patch adds a sanity check in tk_debug_account_sleep_time()
to ensure we don't index past the sleep_time_bin array.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Janek Kozicki <cosurgi@gmail.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Xunlei Pang <xpang@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: stable <stable@vger.kernel.org>
Reported-by: Janek Kozicki <cosurgi@gmail.com>
Reported-by: Chen Yu <yu.c.chen@intel.com>
[jstultz: Problem diagnosed and original patch by Chen Yu, I've
 solved the issue slightly differently, but borrowed his excelent
 explanation of of the issue here.]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping_debug.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index f6bd652..107310a6 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -23,7 +23,9 @@

 #include "timekeeping_internal.h"

-static unsigned int sleep_time_bin[32] = {0};
+#define NUM_BINS 32
+
+static unsigned int sleep_time_bin[NUM_BINS] = {0};

 static int tk_debug_show_sleep_time(struct seq_file *s, void *data)
 {
@@ -69,6 +71,9 @@ late_initcall(tk_debug_sleep_time_init);

 void tk_debug_account_sleep_time(struct timespec64 *t)
 {
-	sleep_time_bin[fls(t->tv_sec)]++;
+	/* Cap bin index so we don't overflow the array */
+	int bin = min(fls(t->tv_sec), NUM_BINS-1);
+
+	sleep_time_bin[bin]++;
 }

-- 
1.9.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times
  2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz
@ 2016-08-24  0:58   ` Chen Yu
  2016-08-24  2:54     ` John Stultz
  2016-08-24  2:56   ` [RFC][PATCH v2] " John Stultz
  2016-08-24  7:41   ` [tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug tip-bot for John Stultz
  2 siblings, 1 reply; 13+ messages in thread
From: Chen Yu @ 2016-08-24  0:58 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Rafael J. Wysocki, Janek Kozicki, Xunlei Pang, Zhang Rui,
	linux-pm, stable

Hi John, some small typos below, others should be OK.

On Tue, Aug 23, 2016 at 04:08:22PM -0700, John Stultz wrote:
> It was reported that hibernation could fail on the 2nd attempt,
> where the system hangs at hibernate() -> syscore_resume() ->
> i8237A_resume() -> claim_dma_lock(), because the lock has
> already been taken.
> 
> However there is actually no other process would like to grab
> this lock on that problematic platform.
> 
> Further investigation showed that the problem is triggered by
> setting /sys/power/pm_trace to 1 before the 1st hibernation.
> 
> Since once pm_trace is enabled, the rtc becomes unmeaningful
> after suspend, and meanwhile some BIOSes would like to adjust
> the 'invalid' tsc(e.g, smaller than 1970) to the release date
I checked the previous commit log, and I have made a mistake, it
should be:
s/tsc/RTC
> of that motherboard during POST stage, thus after resumed, it
> may seem that the system had a significant long sleep time might
> due to meaningless tsc or RTC delta.
s/tsc or RTC/RTC
> 
> Then in timekeeping_resume -> tk_debug_account_sleep_time, if
> the bit31 of the sleep time happened to be set to 1, the fls
> returns 32 and then we add 1 to sleep_time_bin[32], which
> caused a memory overwritten.
> 
> As depicted by System.map:
> 0xffffffff81c9d080 b sleep_time_bin
> 0xffffffff81c9d100 B dma_spin_lock
> the dma_spin_lock.val is set to 1, which caused this problem.
> 
> This patch adds a sanity check in tk_debug_account_sleep_time()
> to ensure we don't index past the sleep_time_bin array.
> 
BTW, I've also post a fix to deal with pm_trace which might break
timekeeping at:
https://patchwork.kernel.org/patch/9287347/
could you please hel take a glance? thanks.

Yu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times
  2016-08-24  0:58   ` Chen Yu
@ 2016-08-24  2:54     ` John Stultz
  0 siblings, 0 replies; 13+ messages in thread
From: John Stultz @ 2016-08-24  2:54 UTC (permalink / raw)
  To: Chen Yu
  Cc: lkml, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Rafael J. Wysocki, Janek Kozicki, Xunlei Pang, Zhang Rui,
	Linux PM list, stable

On Tue, Aug 23, 2016 at 5:58 PM, Chen Yu <yu.c.chen@intel.com> wrote:
> Hi John, some small typos below, others should be OK.

Thanks. I realize you mailed me before on this, but I forgot to
integrate the changes. Apologies.

I've now integrated the changes and updated the git branch.  The HEAD
for the pull request is now: 4efd3c8faf189674eeafb0c14ff4c556ed6adcf7.

I'll send an updated v2 patch here shortly.

thanks
-john

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC][PATCH v2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times
  2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz
  2016-08-24  0:58   ` Chen Yu
@ 2016-08-24  2:56   ` John Stultz
  2016-08-24  7:41   ` [tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug tip-bot for John Stultz
  2 siblings, 0 replies; 13+ messages in thread
From: John Stultz @ 2016-08-24  2:56 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Rafael J. Wysocki, Janek Kozicki, Chen Yu, Xunlei Pang,
	Zhang Rui, linux-pm, stable

It was reported that hibernation could fail on the 2nd attempt,
where the system hangs at hibernate() -> syscore_resume() ->
i8237A_resume() -> claim_dma_lock(), because the lock has
already been taken.

However there is actually no other process would like to grab
this lock on that problematic platform.

Further investigation showed that the problem is triggered by
setting /sys/power/pm_trace to 1 before the 1st hibernation.

Since once pm_trace is enabled, the RTC becomes unmeaningful
after suspend, and meanwhile some BIOSes would like to adjust
the 'invalid' RTC (e.g, smaller than 1970) to the release date
of that motherboard during POST stage, thus after resumed, it
may seem that the system had a significant long sleep time might
due to meaningless RTC delta.

Then in timekeeping_resume -> tk_debug_account_sleep_time, if
the bit31 of the sleep time happened to be set to 1, the fls
returns 32 and then we add 1 to sleep_time_bin[32], which
caused a memory overwritten.

As depicted by System.map:
0xffffffff81c9d080 b sleep_time_bin
0xffffffff81c9d100 B dma_spin_lock
the dma_spin_lock.val is set to 1, which caused this problem.

This patch adds a sanity check in tk_debug_account_sleep_time()
to ensure we don't index past the sleep_time_bin array.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Janek Kozicki <cosurgi@gmail.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Xunlei Pang <xpang@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: stable <stable@vger.kernel.org>
Reported-by: Janek Kozicki <cosurgi@gmail.com>
Reported-by: Chen Yu <yu.c.chen@intel.com>
[jstultz: Problem diagnosed and original patch by Chen Yu, I've
 solved the issue slightly differently, but borrowed his excelent
 explanation of of the issue here.]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
v2: Minor tweaks to the commit message

 kernel/time/timekeeping_debug.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index f6bd652..107310a6 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -23,7 +23,9 @@

 #include "timekeeping_internal.h"

-static unsigned int sleep_time_bin[32] = {0};
+#define NUM_BINS 32
+
+static unsigned int sleep_time_bin[NUM_BINS] = {0};

 static int tk_debug_show_sleep_time(struct seq_file *s, void *data)
 {
@@ -69,6 +71,9 @@ late_initcall(tk_debug_sleep_time_init);

 void tk_debug_account_sleep_time(struct timespec64 *t)
 {
-	sleep_time_bin[fls(t->tv_sec)]++;
+	/* Cap bin index so we don't overflow the array */
+	int bin = min(fls(t->tv_sec), NUM_BINS-1);
+
+	sleep_time_bin[bin]++;
 }

-- 
1.9.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug
  2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz
  2016-08-24  0:58   ` Chen Yu
  2016-08-24  2:56   ` [RFC][PATCH v2] " John Stultz
@ 2016-08-24  7:41   ` tip-bot for John Stultz
  2 siblings, 0 replies; 13+ messages in thread
From: tip-bot for John Stultz @ 2016-08-24  7:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: rjw, peterz, stable, linux-kernel, yu.c.chen, hpa, tglx,
	john.stultz, xpang, mingo, rui.zhang, cosurgi

Commit-ID:  a4f8f6667f099036c88f231dcad4cf233652c824
Gitweb:     http://git.kernel.org/tip/a4f8f6667f099036c88f231dcad4cf233652c824
Author:     John Stultz <john.stultz@linaro.org>
AuthorDate: Tue, 23 Aug 2016 16:08:22 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 24 Aug 2016 09:34:32 +0200

timekeeping: Cap array access in timekeeping_debug

It was reported that hibernation could fail on the 2nd attempt, where the
system hangs at hibernate() -> syscore_resume() -> i8237A_resume() ->
claim_dma_lock(), because the lock has already been taken.

However there is actually no other process would like to grab this lock on
that problematic platform.

Further investigation showed that the problem is triggered by setting
/sys/power/pm_trace to 1 before the 1st hibernation.

Since once pm_trace is enabled, the rtc becomes unmeaningful after suspend,
and meanwhile some BIOSes would like to adjust the 'invalid' RTC (e.g, smaller
than 1970) to the release date of that motherboard during POST stage, thus
after resumed, it may seem that the system had a significant long sleep time
which is a completely meaningless value.

Then in timekeeping_resume -> tk_debug_account_sleep_time, if the bit31 of the
sleep time happened to be set to 1, fls() returns 32 and we add 1 to
sleep_time_bin[32], which causes an out of bounds array access and therefor
memory being overwritten.

As depicted by System.map:
0xffffffff81c9d080 b sleep_time_bin
0xffffffff81c9d100 B dma_spin_lock
the dma_spin_lock.val is set to 1, which caused this problem.

This patch adds a sanity check in tk_debug_account_sleep_time()
to ensure we don't index past the sleep_time_bin array.

[jstultz: Problem diagnosed and original patch by Chen Yu, I've solved the
 issue slightly differently, but borrowed his excelent explanation of the
 issue here.]

Fixes: 5c83545f24ab "power: Add option to log time spent in suspend"
Reported-by: Janek Kozicki <cosurgi@gmail.com>
Reported-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xunlei Pang <xpang@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: stable <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/1471993702-29148-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/time/timekeeping_debug.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index f6bd652..107310a 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -23,7 +23,9 @@

 #include "timekeeping_internal.h"

-static unsigned int sleep_time_bin[32] = {0};
+#define NUM_BINS 32
+
+static unsigned int sleep_time_bin[NUM_BINS] = {0};

 static int tk_debug_show_sleep_time(struct seq_file *s, void *data)
 {
@@ -69,6 +71,9 @@ late_initcall(tk_debug_sleep_time_init);

 void tk_debug_account_sleep_time(struct timespec64 *t)
 {
-	sleep_time_bin[fls(t->tv_sec)]++;
+	/* Cap bin index so we don't overflow the array */
+	int bin = min(fls(t->tv_sec), NUM_BINS-1);
+
+	sleep_time_bin[bin]++;
 }

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-10-05 13:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-23 23:08 [GIT PULL][PATCH 0/2] Timekeeping items for 4.8 via tip/timers/urgent John Stultz
2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz
2016-08-24  7:40   ` [tip:timers/urgent] " tip-bot for John Stultz
2016-10-05  0:30   ` [PATCH 1/2] " Alexei Starovoitov
2016-10-05  0:50     ` John Stultz
2016-10-05  2:55     ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz
2016-10-05 10:57       ` Mathieu Desnoyers
2016-10-05 13:48       ` [tip:timers/urgent] " tip-bot for John Stultz
2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz
2016-08-24  0:58   ` Chen Yu
2016-08-24  2:54     ` John Stultz
2016-08-24  2:56   ` [RFC][PATCH v2] " John Stultz
2016-08-24  7:41   ` [tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug tip-bot for John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).