Potential Issue in Tracing Ring Buffer

* Potential Issue in Tracing Ring Buffer
@ 2020-11-24 22:39 J. Avila
  2020-11-25  0:56 ` Steven Rostedt
  2020-11-26  5:52 ` Greg KH
  0 siblings, 2 replies; 8+ messages in thread
From: J. Avila @ 2020-11-24 22:39 UTC (permalink / raw)
  To: rostedt, mingo; +Cc: gregkh, john.stultz, linux-kernel, kernel-team

Hello,

In the ftrace logs we've collected internally, we have found that there are
situations where time seems to go backwards; this breaks userspace tools which
expect time to always go forward in these logs. For example, in this snippet
from a db845c running a 5.10-rc4 kernel[1] (thanks for getting us the trace,
John!), we see:

      android.bg-568     [002] dN.1        26247: rcu_utilization: Start context switch
           <...>-589     [003] ....        26247: sys_enter: NR 73 (b400006f4898c520, 2, 0, 0, 0, 11)
 PowerManagerSer-578     [001] d..2        26248: sched_switch: prev_comm=PowerManagerSer prev_pid=578 prev_prio=116 prev_state=S ==> next_comm=Binder:381_2 next_pid=395 next_prio=120
      android.bg-568     [002] dN.1        26248: rcu_utilization: End context switch
           <...>-589     [003] d..1        26249: rcu_utilization: Start context switch
           <...>-589     [003] d..1        26250: rcu_utilization: End context switch
    Binder:381_2-395     [001] ....        26251: sys_exit: NR 98 = 0
      android.bg-568     [002] dN.2        26251: sched_stat_wait: comm=kworker/u16:3 pid=164 delay=0 [ns]
           <...>-589     [003] d..2        26252: sched_stat_runtime: comm=statsd.writer pid=589 runtime=611094 [ns] vruntime=24367153868 [ns]
      android.bg-568     [002] d..2        26254: sched_switch: prev_comm=android.bg prev_pid=568 prev_prio=130 prev_state=R+ ==> next_comm=kworker/u16:3 next_pid=164 next_prio=120
   kworker/u16:3-164     [002] ....        26256: workqueue_execute_start: work struct 00000000ab3fe95e: function ufshcd_clk_scaling_resume_work
    Binder:381_2-395     [001] ....        26257: sys_enter: NR 98 (b400006fb8984bf0, 89, 185c, 0, 0, ffffffff)
    Binder:381_2-395     [001] d..1        26258: rcu_utilization: Start context switch
   kworker/u16:3-164     [002] ....        26258: workqueue_execute_end: work struct 00000000ab3fe95e: function ufshcd_clk_scaling_resume_work
    Binder:381_2-395     [001] d..1        26260: rcu_utilization: End context switch
   kworker/u16:3-164     [002] d..1        26261: rcu_utilization: Start context switch
    Binder:381_2-395     [001] d..2        26262: sched_stat_runtime: comm=Binder:381_2 pid=395 runtime=100989 [ns] vruntime=40144094287 [ns]
   kworker/u16:3-164     [002] d..1        26263: rcu_utilization: End context switch
   kworker/u16:3-164     [002] d..2        26237: sched_stat_runtime: comm=kworker/u16:3 pid=164 runtime=71614 [ns] vruntime=32851170186 [ns]
   kworker/u16:3-164     [002] d..2        26240: sched_stat_wait: comm=android.bg pid=568 delay=99635 [ns]
   kworker/u16:3-164     [002] d..2        26241: sched_switch: prev_comm=kworker/u16:3 prev_pid=164 prev_prio=120 prev_state=I ==> next_comm=android.bg next_pid=568 next_prio=130

In this trace, we switched the trace clock to counter to rule out hardware
issues, but this has been seen with default settings as well. This is
consistently reproducible - we see it regularly when collecting any trace for
~60 seconds. This seems like it could be a problem in the upstream code, as this
kernel had no modifications made to the ftrace code. Is this a known problem in
upstream? Is there any additional information I could provide to verify this is
an upstream issue? Is anyone else observing this behavior, and if so, do they
happen to be working on a fix? If not, do you have any pointers on how we can
address this?

Thanks,

Avila

[1] Taken from https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline

^ permalink raw reply	[flat|nested] 8+ messages in thread