* [GIT PULL][PATCH 0/2] Timekeeping items for 4.8 via tip/timers/urgent @ 2016-08-23 23:08 John Stultz 2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz 2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz 0 siblings, 2 replies; 13+ messages in thread From: John Stultz @ 2016-08-23 23:08 UTC (permalink / raw) To: lkml Cc: John Stultz, Steven Rostedt, Peter Zijlstra, Ingo Molnar, Thomas Gleixner Hey Thomas, Ingo, Wanted to submit the following changes for review and inclusion for tip/timers/urgent. The first is a fix for an NMI safety issue Steven pointed out last week and the other is a potential memory corruption issue. Let me know if you have any objections. thanks -john Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> If desired, the patches can be pulled as follows: The following changes since commit be5769e2061ac40b32daa83e28e1c4aac7133511: clocksource/drivers/mips-gic-timer: Make gic_clocksource_of_init() return int (2016-08-17 13:08:32 +0200) are available in the git repository at: https://git.linaro.org/people/john.stultz/linux.git fortglx/4.8/time for you to fetch changes up to 30a56a04134382bd8f0fa51db19b94af7f74f9ee: timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times (2016-08-23 16:03:15 -0700) ---------------------------------------------------------------- John Stultz (2): timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times kernel/time/timekeeping.c | 5 ++++- kernel/time/timekeeping_debug.c | 9 +++++++-- 2 files changed, 11 insertions(+), 3 deletions(-) -- 1.9.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING 2016-08-23 23:08 [GIT PULL][PATCH 0/2] Timekeeping items for 4.8 via tip/timers/urgent John Stultz @ 2016-08-23 23:08 ` John Stultz 2016-08-24 7:40 ` [tip:timers/urgent] " tip-bot for John Stultz 2016-10-05 0:30 ` [PATCH 1/2] " Alexei Starovoitov 2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz 1 sibling, 2 replies; 13+ messages in thread From: John Stultz @ 2016-08-23 23:08 UTC (permalink / raw) To: lkml Cc: John Stultz, Steven Rostedt, Peter Zijlstra, Ingo Molnar, Thomas Gleixner, stable When I added some extra sanity checking in timekeeping_get_ns() under CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns() method was using timekeeping_get_ns(). Thus the locking added to the debug checks broke the NMI-safety of __ktime_get_fast_ns(). This patch open-codes the timekeeping_get_ns() logic for __ktime_get_fast_ns(), so can avoid any deadlocks in NMI. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable <stable@vger.kernel.org> # 4.1+ Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: John Stultz <john.stultz@linaro.org> --- kernel/time/timekeeping.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 3b65746..e07fb09 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -401,7 +401,10 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf) do { seq = raw_read_seqcount_latch(&tkf->seq); tkr = tkf->base + (seq & 0x01); - now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr); + now = ktime_to_ns(tkr->base); + + now += clocksource_delta(tkr->read(tkr->clock), + tkr->cycle_last, tkr->mask); } while (read_seqcount_retry(&tkf->seq, seq)); return now; -- 1.9.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [tip:timers/urgent] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING 2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz @ 2016-08-24 7:40 ` tip-bot for John Stultz 2016-10-05 0:30 ` [PATCH 1/2] " Alexei Starovoitov 1 sibling, 0 replies; 13+ messages in thread From: tip-bot for John Stultz @ 2016-08-24 7:40 UTC (permalink / raw) To: linux-tip-commits Cc: tglx, rostedt, mingo, peterz, linux-kernel, hpa, john.stultz, stable Commit-ID: 27727df240c7cc84f2ba6047c6f18d5addfd25ef Gitweb: http://git.kernel.org/tip/27727df240c7cc84f2ba6047c6f18d5addfd25ef Author: John Stultz <john.stultz@linaro.org> AuthorDate: Tue, 23 Aug 2016 16:08:21 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Wed, 24 Aug 2016 09:34:31 +0200 timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING When I added some extra sanity checking in timekeeping_get_ns() under CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns() method was using timekeeping_get_ns(). Thus the locking added to the debug checks broke the NMI-safety of __ktime_get_fast_ns(). This patch open-codes the timekeeping_get_ns() logic for __ktime_get_fast_ns(), so can avoid any deadlocks in NMI. Fixes: 4ca22c2648f9 "timekeeping: Add warnings when overflows or underflows are observed" Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: stable <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/time/timekeeping.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 3b65746..e07fb09 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -401,7 +401,10 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf) do { seq = raw_read_seqcount_latch(&tkf->seq); tkr = tkf->base + (seq & 0x01); - now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr); + now = ktime_to_ns(tkr->base); + + now += clocksource_delta(tkr->read(tkr->clock), + tkr->cycle_last, tkr->mask); } while (read_seqcount_retry(&tkf->seq, seq)); return now; ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING 2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz 2016-08-24 7:40 ` [tip:timers/urgent] " tip-bot for John Stultz @ 2016-10-05 0:30 ` Alexei Starovoitov 2016-10-05 0:50 ` John Stultz 2016-10-05 2:55 ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz 1 sibling, 2 replies; 13+ messages in thread From: Alexei Starovoitov @ 2016-10-05 0:30 UTC (permalink / raw) To: John Stultz Cc: lkml, Steven Rostedt, Peter Zijlstra, Ingo Molnar, Thomas Gleixner, stable, Brendan Gregg On Tue, Aug 23, 2016 at 04:08:21PM -0700, John Stultz wrote: > When I added some extra sanity checking in timekeeping_get_ns() under > CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns() > method was using timekeeping_get_ns(). > > Thus the locking added to the debug checks broke the NMI-safety of > __ktime_get_fast_ns(). > > This patch open-codes the timekeeping_get_ns() logic for > __ktime_get_fast_ns(), so can avoid any deadlocks in NMI. > > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: stable <stable@vger.kernel.org> # 4.1+ > Reported-by: Steven Rostedt <rostedt@goodmis.org> > Reported-by: Peter Zijlstra <peterz@infradead.org> > Signed-off-by: John Stultz <john.stultz@linaro.org> > --- > kernel/time/timekeeping.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index 3b65746..e07fb09 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -401,7 +401,10 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf) > do { > seq = raw_read_seqcount_latch(&tkf->seq); > tkr = tkf->base + (seq & 0x01); > - now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr); > + now = ktime_to_ns(tkr->base); > + > + now += clocksource_delta(tkr->read(tkr->clock), > + tkr->cycle_last, tkr->mask); we're seeing the time jumping backwards between __ktime_get_fast_ns calls. and looks like this patch broke it, since delta is being added to ns. It seems it should be: now += timekeeping_delta_to_ns(clocksource_delta(...)); or better fix possible? Reported-by: Brendan Gregg <bgregg@netflix.com> # perf record -F 9 --cpu 0 --clockid CLOCK_MONOTONIC -- sleep 1 # perf script swapper 0 [000] 2942480.468981: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 2942480.471468: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 2942480.473959: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 2942480.475202: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 2942480.477691: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 2942481.470226: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 2942481.472569: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 2942481.476460: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 2942481.478943: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) Note the timestamp jump. fixed (rolled back patch): # perf record -F 9 --cpu 0 --clockid CLOCK_MONOTONIC -- sleep 1 # perf script swapper 0 [000] 589.729663: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 589.840772: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 589.951883: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 590.062994: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 590.174075: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 590.285217: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 590.396328: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 590.507439: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) swapper 0 [000] 590.618550: 111111111 cpu-clock: ffffffff810013aa xen_hypercall_sched_op+0xa (vmlinux) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING 2016-10-05 0:30 ` [PATCH 1/2] " Alexei Starovoitov @ 2016-10-05 0:50 ` John Stultz 2016-10-05 2:55 ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz 1 sibling, 0 replies; 13+ messages in thread From: John Stultz @ 2016-10-05 0:50 UTC (permalink / raw) To: Alexei Starovoitov Cc: lkml, Steven Rostedt, Peter Zijlstra, Ingo Molnar, Thomas Gleixner, stable, Brendan Gregg On Tue, Oct 4, 2016 at 5:30 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Tue, Aug 23, 2016 at 04:08:21PM -0700, John Stultz wrote: >> When I added some extra sanity checking in timekeeping_get_ns() under >> CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns() >> method was using timekeeping_get_ns(). >> >> Thus the locking added to the debug checks broke the NMI-safety of >> __ktime_get_fast_ns(). >> >> This patch open-codes the timekeeping_get_ns() logic for >> __ktime_get_fast_ns(), so can avoid any deadlocks in NMI. >> >> Cc: Steven Rostedt <rostedt@goodmis.org> >> Cc: Peter Zijlstra <peterz@infradead.org> >> Cc: Ingo Molnar <mingo@kernel.org> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: stable <stable@vger.kernel.org> # 4.1+ >> Reported-by: Steven Rostedt <rostedt@goodmis.org> >> Reported-by: Peter Zijlstra <peterz@infradead.org> >> Signed-off-by: John Stultz <john.stultz@linaro.org> >> --- >> kernel/time/timekeeping.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c >> index 3b65746..e07fb09 100644 >> --- a/kernel/time/timekeeping.c >> +++ b/kernel/time/timekeeping.c >> @@ -401,7 +401,10 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf) >> do { >> seq = raw_read_seqcount_latch(&tkf->seq); >> tkr = tkf->base + (seq & 0x01); >> - now = ktime_to_ns(tkr->base) + timekeeping_get_ns(tkr); >> + now = ktime_to_ns(tkr->base); >> + >> + now += clocksource_delta(tkr->read(tkr->clock), >> + tkr->cycle_last, tkr->mask); > > we're seeing the time jumping backwards between __ktime_get_fast_ns calls. > and looks like this patch broke it, since delta is being added to ns. > It seems it should be: > now += timekeeping_delta_to_ns(clocksource_delta(...)); > or better fix possible? Gah! Yes. Quite right. I'm not sure how I missed that. Thanks for catching it quickly and apologies! I've got an initial fix but I'm sitting down for dinner so I'll send it out a bit later after I can test it. thanks -john ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression 2016-10-05 0:30 ` [PATCH 1/2] " Alexei Starovoitov 2016-10-05 0:50 ` John Stultz @ 2016-10-05 2:55 ` John Stultz 2016-10-05 10:57 ` Mathieu Desnoyers 2016-10-05 13:48 ` [tip:timers/urgent] " tip-bot for John Stultz 1 sibling, 2 replies; 13+ messages in thread From: John Stultz @ 2016-10-05 2:55 UTC (permalink / raw) To: lkml Cc: John Stultz, Steven Rostedt, Peter Zijlstra, Ingo Molnar, Thomas Gleixner, stable, Brendan Gregg, Alexei Starovoitov In commit 27727df240c7 ("Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code the timekeeping_get_ns() function, but I forgot to include the unit conversion from cycles to nanoseconds, breaking the function's output, which impacts users like perf. This would result in bogus perf timestamps like: swapper 0 [000] 253.427536: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426573: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426687: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426800: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426905: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427022: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427127: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427239: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427346: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427463: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 255.426572: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) Instead of more reasonable expected timestamps like: swapper 0 [000] 39.953768: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.064839: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.175956: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.287103: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.398217: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.509324: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.620437: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.731546: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.842654: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.953772: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 41.064881: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) This patch adds the proper use of timekeeping_delta_to_ns() to convert the cycle delta to nanoseconds as needed. Thanks to Brendan and Alexei for finding this quickly after the v4.8 release. Unfortunately the problematic commit has landed in some -stable trees so they'll need this fix as well. Many apologies for this mistake. I'll be looking to add a perf-clock sanity test to the kselftest timers tests soon. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable <stable@vger.kernel.org> Cc: Brendan Gregg <bgregg@netflix.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING" Reported-by: Brendan Gregg <bgregg@netflix.com> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> --- Thomas/Ingo: I've reproduced the issue and validated this fixes it, but given my limited perf usage so far, I'd appreciate waiting for a Tested-by: from one of the reporters before considering for tip/timers/urgent. kernel/time/timekeeping.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index e07fb09..37dec7e 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -403,8 +403,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf) tkr = tkf->base + (seq & 0x01); now = ktime_to_ns(tkr->base); - now += clocksource_delta(tkr->read(tkr->clock), - tkr->cycle_last, tkr->mask); + now += timekeeping_delta_to_ns(tkr, + clocksource_delta( + tkr->read(tkr->clock), + tkr->cycle_last, + tkr->mask)); } while (read_seqcount_retry(&tkf->seq, seq)); return now; -- 1.9.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression 2016-10-05 2:55 ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz @ 2016-10-05 10:57 ` Mathieu Desnoyers 2016-10-05 13:48 ` [tip:timers/urgent] " tip-bot for John Stultz 1 sibling, 0 replies; 13+ messages in thread From: Mathieu Desnoyers @ 2016-10-05 10:57 UTC (permalink / raw) To: John Stultz Cc: linux-kernel, rostedt, Peter Zijlstra, Ingo Molnar, Thomas Gleixner, stable, Brendan Gregg, Alexei Starovoitov, Linus Torvalds ----- On Oct 4, 2016, at 10:55 PM, John Stultz john.stultz@linaro.org wrote: > In commit 27727df240c7 ("Avoid taking lock in NMI path with > CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code > the timekeeping_get_ns() function, but I forgot to include > the unit conversion from cycles to nanoseconds, breaking the > function's output, which impacts users like perf. It also impacts LTTng. I've had reports of timestamp borkage from LTTng end users yesterday with kernels 4.7.5 and 4.7.4. Some stable branches sleuthing indicates that the following kernel versions are affected: 4.8, 4.7.4+, 4.4.20+, 4.1.32+ I plan to add kernel version range checks in lttng-modules to work-around this mess, but I need an upper bound. Is there any way we can get this upstream and cherry-picked into stable branches ASAP ? Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LTTng kernel traces with Linux 4.8: [06:37:40.248222037] (+0.000001057) compudjdev power_cpu_idle: { cpu_id = 10 }, { state = 1, cpu_id = 10 } [06:37:40.248222037] (+0.000000000) compudjdev x86_irq_vectors_local_timer_entry: { cpu_id = 10 }, { vector = 239 } [06:37:40.248222037] (+0.000000000) compudjdev timer_hrtimer_cancel: { cpu_id = 10 }, { hrtimer = 0xFFFF880FC1692700 } [06:37:40.248222037] (+0.000000000) compudjdev timer_hrtimer_expire_entry: { cpu_id = 10 }, { hrtimer = 0xFFFF880FC1692700, now = 211036043790, function = 0xFFFFFFFF8116FD40 } [06:37:40.248222037] (+0.000000000) compudjdev irq_softirq_raise: { cpu_id = 10 }, { vec = 1 } [06:37:40.248222037] (+0.000000000) compudjdev rcu_utilization: { cpu_id = 10 }, { s = "Start scheduler-tick" } [06:37:40.248222037] (+0.000000000) compudjdev irq_softirq_raise: { cpu_id = 10 }, { vec = 9 } [...] % lttng view | grep "cpu_id = 0" | grep "(+0.000000000)" |wc -l 76156 LTTng kernel traces with Linux 4.8 + this fix applied: [06:46:52.648061087] (+0.000001455) compudjdev power_cpu_idle: { cpu_id = 23 }, { state = 1, cpu_id = 23 } [06:46:52.652011654] (+0.003950567) compudjdev x86_irq_vectors_local_timer_entry: { cpu_id = 9 }, { vector = 239 } [06:46:52.652012324] (+0.000000670) compudjdev timer_hrtimer_cancel: { cpu_id = 9 }, { hrtimer = 0xFFFF880FC1652700 } [06:46:52.652012654] (+0.000000330) compudjdev timer_hrtimer_expire_entry: { cpu_id = 9 }, { hrtimer = 0xFFFF880FC1652700, now = 91808012323, function = 0xFFFFFFFF8116FD80 } [06:46:52.652013354] (+0.000000700) compudjdev irq_softirq_raise: { cpu_id = 9 }, { vec = 1 } [06:46:52.652013784] (+0.000000430) compudjdev rcu_utilization: { cpu_id = 9 }, { s = "Start scheduler-tick" } % lttng view | grep "cpu_id = 0" | grep "(+0.000000000)" |wc -l 2 Which confirms that this fix addresses the issue. This issue seems to affect perf, ftrace "mono" clock source, and eBPF in Linux. Thanks, Mathieu > > This would result in bogus perf timestamps like: > swapper 0 [000] 253.427536: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 254.426573: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 254.426687: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 254.426800: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 254.426905: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 254.427022: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 254.427127: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 254.427239: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 254.427346: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 254.427463: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 255.426572: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > > Instead of more reasonable expected timestamps like: > swapper 0 [000] 39.953768: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 40.064839: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 40.175956: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 40.287103: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 40.398217: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 40.509324: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 40.620437: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 40.731546: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 40.842654: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 40.953772: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > swapper 0 [000] 41.064881: 111111111 cpu-clock: ffffffff810a0de6 > native_safe_halt+0x6 ([kernel.kallsyms]) > > This patch adds the proper use of timekeeping_delta_to_ns() > to convert the cycle delta to nanoseconds as needed. > > Thanks to Brendan and Alexei for finding this quickly after > the v4.8 release. Unfortunately the problematic commit has > landed in some -stable trees so they'll need this fix as > well. > > Many apologies for this mistake. I'll be looking to add a > perf-clock sanity test to the kselftest timers tests soon. > > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: stable <stable@vger.kernel.org> > Cc: Brendan Gregg <bgregg@netflix.com> > Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> > Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with > CONFIG_DEBUG_TIMEKEEPING" > Reported-by: Brendan Gregg <bgregg@netflix.com> > Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> > Signed-off-by: John Stultz <john.stultz@linaro.org> > --- > Thomas/Ingo: > I've reproduced the issue and validated this fixes it, but given my limited perf > usage so far, I'd appreciate waiting for a Tested-by: from one of the reporters > before considering for tip/timers/urgent. > > kernel/time/timekeeping.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c > index e07fb09..37dec7e 100644 > --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c > @@ -403,8 +403,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct > tk_fast *tkf) > tkr = tkf->base + (seq & 0x01); > now = ktime_to_ns(tkr->base); > > - now += clocksource_delta(tkr->read(tkr->clock), > - tkr->cycle_last, tkr->mask); > + now += timekeeping_delta_to_ns(tkr, > + clocksource_delta( > + tkr->read(tkr->clock), > + tkr->cycle_last, > + tkr->mask)); > } while (read_seqcount_retry(&tkf->seq, seq)); > > return now; > -- > 1.9.1 -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* [tip:timers/urgent] timekeeping: Fix __ktime_get_fast_ns() regression 2016-10-05 2:55 ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz 2016-10-05 10:57 ` Mathieu Desnoyers @ 2016-10-05 13:48 ` tip-bot for John Stultz 1 sibling, 0 replies; 13+ messages in thread From: tip-bot for John Stultz @ 2016-10-05 13:48 UTC (permalink / raw) To: linux-tip-commits Cc: hpa, john.stultz, peterz, rostedt, alexei.starovoitov, mathieu.desnoyers, tglx, mingo, linux-kernel, bgregg, stable Commit-ID: 58bfea9532552d422bde7afa207e1a0f08dffa7d Gitweb: http://git.kernel.org/tip/58bfea9532552d422bde7afa207e1a0f08dffa7d Author: John Stultz <john.stultz@linaro.org> AuthorDate: Tue, 4 Oct 2016 19:55:48 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Wed, 5 Oct 2016 15:44:46 +0200 timekeeping: Fix __ktime_get_fast_ns() regression In commit 27727df240c7 ("Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code the timekeeping_get_ns() function, but I forgot to include the unit conversion from cycles to nanoseconds, breaking the function's output, which impacts users like perf. This results in bogus perf timestamps like: swapper 0 [000] 253.427536: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426573: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426687: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426800: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.426905: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427022: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427127: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427239: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427346: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 254.427463: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 255.426572: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) Instead of more reasonable expected timestamps like: swapper 0 [000] 39.953768: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.064839: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.175956: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.287103: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.398217: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.509324: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.620437: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.731546: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.842654: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 40.953772: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) swapper 0 [000] 41.064881: 111111111 cpu-clock: ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms]) Add the proper use of timekeeping_delta_to_ns() to convert the cycle delta to nanoseconds as needed. Thanks to Brendan and Alexei for finding this quickly after the v4.8 release. Unfortunately the problematic commit has landed in some -stable trees so they'll need this fix as well. Many apologies for this mistake. I'll be looking to add a perf-clock sanity test to the kselftest timers tests soon. Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING" Reported-by: Brendan Gregg <bgregg@netflix.com> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Tested-and-reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable <stable@vger.kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/time/timekeeping.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index e07fb09..37dec7e 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -403,8 +403,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf) tkr = tkf->base + (seq & 0x01); now = ktime_to_ns(tkr->base); - now += clocksource_delta(tkr->read(tkr->clock), - tkr->cycle_last, tkr->mask); + now += timekeeping_delta_to_ns(tkr, + clocksource_delta( + tkr->read(tkr->clock), + tkr->cycle_last, + tkr->mask)); } while (read_seqcount_retry(&tkf->seq, seq)); return now; ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times 2016-08-23 23:08 [GIT PULL][PATCH 0/2] Timekeeping items for 4.8 via tip/timers/urgent John Stultz 2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz @ 2016-08-23 23:08 ` John Stultz 2016-08-24 0:58 ` Chen Yu ` (2 more replies) 1 sibling, 3 replies; 13+ messages in thread From: John Stultz @ 2016-08-23 23:08 UTC (permalink / raw) To: lkml Cc: John Stultz, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Janek Kozicki, Chen Yu, Xunlei Pang, Zhang Rui, linux-pm, stable It was reported that hibernation could fail on the 2nd attempt, where the system hangs at hibernate() -> syscore_resume() -> i8237A_resume() -> claim_dma_lock(), because the lock has already been taken. However there is actually no other process would like to grab this lock on that problematic platform. Further investigation showed that the problem is triggered by setting /sys/power/pm_trace to 1 before the 1st hibernation. Since once pm_trace is enabled, the rtc becomes unmeaningful after suspend, and meanwhile some BIOSes would like to adjust the 'invalid' tsc(e.g, smaller than 1970) to the release date of that motherboard during POST stage, thus after resumed, it may seem that the system had a significant long sleep time might due to meaningless tsc or RTC delta. Then in timekeeping_resume -> tk_debug_account_sleep_time, if the bit31 of the sleep time happened to be set to 1, the fls returns 32 and then we add 1 to sleep_time_bin[32], which caused a memory overwritten. As depicted by System.map: 0xffffffff81c9d080 b sleep_time_bin 0xffffffff81c9d100 B dma_spin_lock the dma_spin_lock.val is set to 1, which caused this problem. This patch adds a sanity check in tk_debug_account_sleep_time() to ensure we don't index past the sleep_time_bin array. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Janek Kozicki <cosurgi@gmail.com> Cc: Chen Yu <yu.c.chen@intel.com> Cc: Xunlei Pang <xpang@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: linux-pm@vger.kernel.org Cc: stable <stable@vger.kernel.org> Reported-by: Janek Kozicki <cosurgi@gmail.com> Reported-by: Chen Yu <yu.c.chen@intel.com> [jstultz: Problem diagnosed and original patch by Chen Yu, I've solved the issue slightly differently, but borrowed his excelent explanation of of the issue here.] Signed-off-by: John Stultz <john.stultz@linaro.org> --- kernel/time/timekeeping_debug.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c index f6bd652..107310a6 100644 --- a/kernel/time/timekeeping_debug.c +++ b/kernel/time/timekeeping_debug.c @@ -23,7 +23,9 @@ #include "timekeeping_internal.h" -static unsigned int sleep_time_bin[32] = {0}; +#define NUM_BINS 32 + +static unsigned int sleep_time_bin[NUM_BINS] = {0}; static int tk_debug_show_sleep_time(struct seq_file *s, void *data) { @@ -69,6 +71,9 @@ late_initcall(tk_debug_sleep_time_init); void tk_debug_account_sleep_time(struct timespec64 *t) { - sleep_time_bin[fls(t->tv_sec)]++; + /* Cap bin index so we don't overflow the array */ + int bin = min(fls(t->tv_sec), NUM_BINS-1); + + sleep_time_bin[bin]++; } -- 1.9.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times 2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz @ 2016-08-24 0:58 ` Chen Yu 2016-08-24 2:54 ` John Stultz 2016-08-24 2:56 ` [RFC][PATCH v2] " John Stultz 2016-08-24 7:41 ` [tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug tip-bot for John Stultz 2 siblings, 1 reply; 13+ messages in thread From: Chen Yu @ 2016-08-24 0:58 UTC (permalink / raw) To: John Stultz Cc: lkml, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Janek Kozicki, Xunlei Pang, Zhang Rui, linux-pm, stable Hi John, some small typos below, others should be OK. On Tue, Aug 23, 2016 at 04:08:22PM -0700, John Stultz wrote: > It was reported that hibernation could fail on the 2nd attempt, > where the system hangs at hibernate() -> syscore_resume() -> > i8237A_resume() -> claim_dma_lock(), because the lock has > already been taken. > > However there is actually no other process would like to grab > this lock on that problematic platform. > > Further investigation showed that the problem is triggered by > setting /sys/power/pm_trace to 1 before the 1st hibernation. > > Since once pm_trace is enabled, the rtc becomes unmeaningful > after suspend, and meanwhile some BIOSes would like to adjust > the 'invalid' tsc(e.g, smaller than 1970) to the release date I checked the previous commit log, and I have made a mistake, it should be: s/tsc/RTC > of that motherboard during POST stage, thus after resumed, it > may seem that the system had a significant long sleep time might > due to meaningless tsc or RTC delta. s/tsc or RTC/RTC > > Then in timekeeping_resume -> tk_debug_account_sleep_time, if > the bit31 of the sleep time happened to be set to 1, the fls > returns 32 and then we add 1 to sleep_time_bin[32], which > caused a memory overwritten. > > As depicted by System.map: > 0xffffffff81c9d080 b sleep_time_bin > 0xffffffff81c9d100 B dma_spin_lock > the dma_spin_lock.val is set to 1, which caused this problem. > > This patch adds a sanity check in tk_debug_account_sleep_time() > to ensure we don't index past the sleep_time_bin array. > BTW, I've also post a fix to deal with pm_trace which might break timekeeping at: https://patchwork.kernel.org/patch/9287347/ could you please hel take a glance? thanks. Yu ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times 2016-08-24 0:58 ` Chen Yu @ 2016-08-24 2:54 ` John Stultz 0 siblings, 0 replies; 13+ messages in thread From: John Stultz @ 2016-08-24 2:54 UTC (permalink / raw) To: Chen Yu Cc: lkml, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Janek Kozicki, Xunlei Pang, Zhang Rui, Linux PM list, stable On Tue, Aug 23, 2016 at 5:58 PM, Chen Yu <yu.c.chen@intel.com> wrote: > Hi John, some small typos below, others should be OK. Thanks. I realize you mailed me before on this, but I forgot to integrate the changes. Apologies. I've now integrated the changes and updated the git branch. The HEAD for the pull request is now: 4efd3c8faf189674eeafb0c14ff4c556ed6adcf7. I'll send an updated v2 patch here shortly. thanks -john ^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC][PATCH v2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times 2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz 2016-08-24 0:58 ` Chen Yu @ 2016-08-24 2:56 ` John Stultz 2016-08-24 7:41 ` [tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug tip-bot for John Stultz 2 siblings, 0 replies; 13+ messages in thread From: John Stultz @ 2016-08-24 2:56 UTC (permalink / raw) To: lkml Cc: John Stultz, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Janek Kozicki, Chen Yu, Xunlei Pang, Zhang Rui, linux-pm, stable It was reported that hibernation could fail on the 2nd attempt, where the system hangs at hibernate() -> syscore_resume() -> i8237A_resume() -> claim_dma_lock(), because the lock has already been taken. However there is actually no other process would like to grab this lock on that problematic platform. Further investigation showed that the problem is triggered by setting /sys/power/pm_trace to 1 before the 1st hibernation. Since once pm_trace is enabled, the RTC becomes unmeaningful after suspend, and meanwhile some BIOSes would like to adjust the 'invalid' RTC (e.g, smaller than 1970) to the release date of that motherboard during POST stage, thus after resumed, it may seem that the system had a significant long sleep time might due to meaningless RTC delta. Then in timekeeping_resume -> tk_debug_account_sleep_time, if the bit31 of the sleep time happened to be set to 1, the fls returns 32 and then we add 1 to sleep_time_bin[32], which caused a memory overwritten. As depicted by System.map: 0xffffffff81c9d080 b sleep_time_bin 0xffffffff81c9d100 B dma_spin_lock the dma_spin_lock.val is set to 1, which caused this problem. This patch adds a sanity check in tk_debug_account_sleep_time() to ensure we don't index past the sleep_time_bin array. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Janek Kozicki <cosurgi@gmail.com> Cc: Chen Yu <yu.c.chen@intel.com> Cc: Xunlei Pang <xpang@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: linux-pm@vger.kernel.org Cc: stable <stable@vger.kernel.org> Reported-by: Janek Kozicki <cosurgi@gmail.com> Reported-by: Chen Yu <yu.c.chen@intel.com> [jstultz: Problem diagnosed and original patch by Chen Yu, I've solved the issue slightly differently, but borrowed his excelent explanation of of the issue here.] Signed-off-by: John Stultz <john.stultz@linaro.org> --- v2: Minor tweaks to the commit message kernel/time/timekeeping_debug.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c index f6bd652..107310a6 100644 --- a/kernel/time/timekeeping_debug.c +++ b/kernel/time/timekeeping_debug.c @@ -23,7 +23,9 @@ #include "timekeeping_internal.h" -static unsigned int sleep_time_bin[32] = {0}; +#define NUM_BINS 32 + +static unsigned int sleep_time_bin[NUM_BINS] = {0}; static int tk_debug_show_sleep_time(struct seq_file *s, void *data) { @@ -69,6 +71,9 @@ late_initcall(tk_debug_sleep_time_init); void tk_debug_account_sleep_time(struct timespec64 *t) { - sleep_time_bin[fls(t->tv_sec)]++; + /* Cap bin index so we don't overflow the array */ + int bin = min(fls(t->tv_sec), NUM_BINS-1); + + sleep_time_bin[bin]++; } -- 1.9.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug 2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz 2016-08-24 0:58 ` Chen Yu 2016-08-24 2:56 ` [RFC][PATCH v2] " John Stultz @ 2016-08-24 7:41 ` tip-bot for John Stultz 2 siblings, 0 replies; 13+ messages in thread From: tip-bot for John Stultz @ 2016-08-24 7:41 UTC (permalink / raw) To: linux-tip-commits Cc: rjw, peterz, stable, linux-kernel, yu.c.chen, hpa, tglx, john.stultz, xpang, mingo, rui.zhang, cosurgi Commit-ID: a4f8f6667f099036c88f231dcad4cf233652c824 Gitweb: http://git.kernel.org/tip/a4f8f6667f099036c88f231dcad4cf233652c824 Author: John Stultz <john.stultz@linaro.org> AuthorDate: Tue, 23 Aug 2016 16:08:22 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Wed, 24 Aug 2016 09:34:32 +0200 timekeeping: Cap array access in timekeeping_debug It was reported that hibernation could fail on the 2nd attempt, where the system hangs at hibernate() -> syscore_resume() -> i8237A_resume() -> claim_dma_lock(), because the lock has already been taken. However there is actually no other process would like to grab this lock on that problematic platform. Further investigation showed that the problem is triggered by setting /sys/power/pm_trace to 1 before the 1st hibernation. Since once pm_trace is enabled, the rtc becomes unmeaningful after suspend, and meanwhile some BIOSes would like to adjust the 'invalid' RTC (e.g, smaller than 1970) to the release date of that motherboard during POST stage, thus after resumed, it may seem that the system had a significant long sleep time which is a completely meaningless value. Then in timekeeping_resume -> tk_debug_account_sleep_time, if the bit31 of the sleep time happened to be set to 1, fls() returns 32 and we add 1 to sleep_time_bin[32], which causes an out of bounds array access and therefor memory being overwritten. As depicted by System.map: 0xffffffff81c9d080 b sleep_time_bin 0xffffffff81c9d100 B dma_spin_lock the dma_spin_lock.val is set to 1, which caused this problem. This patch adds a sanity check in tk_debug_account_sleep_time() to ensure we don't index past the sleep_time_bin array. [jstultz: Problem diagnosed and original patch by Chen Yu, I've solved the issue slightly differently, but borrowed his excelent explanation of the issue here.] Fixes: 5c83545f24ab "power: Add option to log time spent in suspend" Reported-by: Janek Kozicki <cosurgi@gmail.com> Reported-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: linux-pm@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xunlei Pang <xpang@redhat.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: stable <stable@vger.kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Link: http://lkml.kernel.org/r/1471993702-29148-3-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/time/timekeeping_debug.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c index f6bd652..107310a 100644 --- a/kernel/time/timekeeping_debug.c +++ b/kernel/time/timekeeping_debug.c @@ -23,7 +23,9 @@ #include "timekeeping_internal.h" -static unsigned int sleep_time_bin[32] = {0}; +#define NUM_BINS 32 + +static unsigned int sleep_time_bin[NUM_BINS] = {0}; static int tk_debug_show_sleep_time(struct seq_file *s, void *data) { @@ -69,6 +71,9 @@ late_initcall(tk_debug_sleep_time_init); void tk_debug_account_sleep_time(struct timespec64 *t) { - sleep_time_bin[fls(t->tv_sec)]++; + /* Cap bin index so we don't overflow the array */ + int bin = min(fls(t->tv_sec), NUM_BINS-1); + + sleep_time_bin[bin]++; } ^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-10-05 13:51 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-08-23 23:08 [GIT PULL][PATCH 0/2] Timekeeping items for 4.8 via tip/timers/urgent John Stultz 2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz 2016-08-24 7:40 ` [tip:timers/urgent] " tip-bot for John Stultz 2016-10-05 0:30 ` [PATCH 1/2] " Alexei Starovoitov 2016-10-05 0:50 ` John Stultz 2016-10-05 2:55 ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz 2016-10-05 10:57 ` Mathieu Desnoyers 2016-10-05 13:48 ` [tip:timers/urgent] " tip-bot for John Stultz 2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz 2016-08-24 0:58 ` Chen Yu 2016-08-24 2:54 ` John Stultz 2016-08-24 2:56 ` [RFC][PATCH v2] " John Stultz 2016-08-24 7:41 ` [tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug tip-bot for John Stultz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).