From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S932988AbcI3NtY (ORCPT ); Fri, 30 Sep 2016 09:49:24 -0400
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:58656 "EHLO
    mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1751741AbcI3NtO (ORCPT ); Fri, 30 Sep 2016 09:49:14 -0400
Date: Fri, 30 Sep 2016 06:48:35 -0700
From: "Paul E. McKenney"
To: Thomas Gleixner
Cc: Rich Felker, Daniel Lezcano, devicetree@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-sh@vger.kernel.org, Rob Herring,
    Mark Rutland
Subject: Re: [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver
Reply-To: paulmck@linux.vnet.ibm.com
References: <22c1ee0f908fe3bf8b70f5e87d659ceb29af1434.1474693319.git.dalias@libc.org>
    <20160926210716.GA12855@brightrain.aerifal.cx>
    <4b02ba7d-4a31-297a-bbbd-be26da615e7b@linaro.org>
    <20160926222352.GE19318@brightrain.aerifal.cx>
    <20160927004258.GF19318@brightrain.aerifal.cx>
    <20160927220820.GH19318@brightrain.aerifal.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20160930134835.GT14933@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 30, 2016 at 03:15:11PM +0200, Thomas Gleixner wrote:
> On Tue, 27 Sep 2016, Rich Felker wrote:
> > I've managed to get a trace with a stall. I'm not sure what the best
> > way to share the full thing is, since it's large, but here are the
> > potentially interesting parts.

[ . . . ]

Some RCU commentary, on the off-chance that it helps...

> So that should kick rcu_sched-7 in 10ms, latest 20ms from now and CPU1 goes
> into a NOHZ idle sleep.
>
> > <idle>-0 [001] d... 109.953436: tick_stop: success=1 dependency=NONE
> > <idle>-0 [001] d... 109.953617: hrtimer_cancel: hrtimer=109f449c
> > <idle>-0 [001] d... 109.953818: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109880000000 softexpires=109880000000
>
> which is (using the 0.087621s delta between the trace clock and clock
> MONO) at: 109.880 + 0.087621 = 109.968
>
> Which is about correct as we expect the RCU timer to fire at:
>
>       109.952633 + 0.01 = 109.962633
>
> or latest at
>
>       109.952633 + 0.02 = 109.972633
>
> There is another caveat. That nohz stuff can queue the rcu timer on CPU0, which
> it did not because:

Just for annoying completeness, the location of the timer depends on
how the rcuo callback-offload kthreads are constrained. And yes, in
the most constrained case where all CPUs except for CPU 0 are nohz
CPUs, they will by default all run on CPU 0.

> > rcu_sched-7 [001] d... 109.952633: timer_start: timer=160a9eb0 function=process_timeout expires=4294948284 [timeout=1] flags=0x00000001
>
> The CPU nr encoded in flags is: 1
>
> Now we cancel and restart the timer w/o seeing the interrupt expiring
> it. And that expiry should have happened at 109.968000 !?!
>
> > <idle>-0 [001] d... 109.968225: hrtimer_cancel: hrtimer=109f449c
> > <idle>-0 [001] d... 109.968526: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109890000000 softexpires=109890000000
>
> So this advances the next tick even further out. And CPU 0 sets the timer to
> the exact same value:
>
> > <idle>-0 [000] d.h. 109.969104: hrtimer_start: hrtimer=109e949c function=tick_sched_timer expires=109890000000 softexpires=109890000000
> >
> > <idle>-0 [000] d.h. 109.977690: irq_handler_entry: irq=16 name=jcore_pit
> > <idle>-0 [000] d.h. 109.977911: hrtimer_cancel: hrtimer=109e949c
> > <idle>-0 [000] d.h. 109.978053: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109890434160
>
> Which expires here. And CPU1 instead of getting an interrupt and expiring
> the timer does the cancel/restart to the next jiffy again:
>
> > <idle>-0 [001] d... 109.978206: hrtimer_cancel: hrtimer=109f449c
> > <idle>-0 [001] d... 109.978495: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109900000000 softexpires=109900000000
>
> And this repeats;
>
> > <idle>-0 [000] d.h. 109.987726: irq_handler_entry: irq=16 name=jcore_pit
> > <idle>-0 [000] d.h. 109.987954: hrtimer_cancel: hrtimer=109e949c
> > <idle>-0 [000] d.h. 109.988095: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109900474620
>
> > <idle>-0 [001] d... 109.988243: hrtimer_cancel: hrtimer=109f449c
> > <idle>-0 [001] d... 109.988537: hrtimer_start: hrtimer=109f449c fun9c
>
> There is something badly wrong here.
>
> > <idle>-0 [000] ..s. 110.019443: softirq_entry: vec=1 [action=TIMER]
> > <idle>-0 [000] ..s. 110.019617: softirq_exit: vec=1 [action=TIMER]
> > <idle>-0 [000] ..s. 110.019730: softirq_entry: vec=7 [action=SCHED]
> > <idle>-0 [000] ..s. 110.020174: softirq_exit: vec=7 [action=SCHED]
> > <idle>-0 [000] d.h. 110.027674: irq_handler_entry: irq=16 name=jcore_pit
> >
> > The rcu_sched process does not run again after the tick_stop until
> > 132s, and only a few RCU softirqs happen (all shown above).
> > During this time, cpu1 has no interrupt activity and nothing in the
> > trace except the above hrtimer_cancel/hrtimer_start pairs (not sure how
> > they're happening without any interrupts).
>
> If you drop out of the arch idle into the core idle loop then you might end
> up with this. You want to add a few trace points or trace_printk()s to the
> involved functions: tick_nohz_restart(), tick_nohz_stop_sched_tick(),
> tick_nohz_restart_sched_tick() and the idle code should be a good starting
> point.
>
> > This pattern repeats until almost 131s, where cpu1 goes into a frenzy
> > of hrtimer_cancel/start:
>
> It's not a frenzy. It's the same pattern as above. It arms the timer to the
> next tick, but that timer never ever fires. And it does that every tick ....
>
> Please put a tracepoint into your set_next_event() callback as well. So
> this changes here:
>
> > <idle>-0 [001] d... 132.198170: hrtimer_cancel: hrtimer=109f449c
> > <idle>-0 [001] d... 132.198451: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=132120000000 softexpires=132120000000
>
> > <idle>-0 [001] dnh. 132.205860: irq_handler_entry: irq=20 name=ipi
> > <idle>-0 [001] dnh. 132.206041: irq_handler_exit: irq=20 ret=handle
>
> So CPU1 gets an IPI
>
> > <idle>-0 [001] dn.. 132.206650: hrtimer_cancel: hrtimer=109f449c
> 49c function=tick_sched_timer now=132119115200
> > <idle>-0 [001] dn.. 132.206936: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=132120000000 softexpires=132120000000
>
> And rcu_sched-7 gets running magically, but we don't know what woke it
> up. Definitely not the timer, because that did not fire.
>
> > rcu_sched-7 [001] d... 132.207710: timer_cancel: timer=160a9eb0

It could have been an explicit wakeup at the end of a grace period.
That would explain its cancelling the timer, anyway.

> > - During the whole sequence, hrtimer expiration times are being set to
> >   exact jiffies (@ 100 Hz), whereas before it they're quite arbitrary.
>
> When a CPU goes into NOHZ idle and the next event (timer/hrtimer) is farther
> out than the next tick, then tick_sched_timer is set to this next event which
> can be far out. So that's expected.
>
> > - The CLOCK_MONOTONIC hrtimer times do not match up with the
> >   timestamps; they're off by about 0.087s. I assume this is just
> >   sched_clock vs clocksource time and not a big deal.
>
> Yes. You can tell the tracer to use clock monotonic so then they should match.
>
> > - The rcu_sched process is sleeping with timeout=1. This seems
> >   odd/excessive.
>
> Why is that odd? That's one tick, i.e. 10ms in your case. And that's not
> the problem at all. The problem is your timer not firing, but the cpu is
> obviously getting out of idle and then moving the tick ahead for some
> unknown reason.

And a one-jiffy timeout is in fact expected behavior when HZ=100.
You have to be running HZ=250 or better to have two-jiffy timeouts,
and HZ=500 or better for three-jiffy timeouts.

							Thanx, Paul
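
The clock-offset arithmetic used in the quoted analysis can be reproduced
with a short script. This is only a minimal sketch: the two sample events
are copied from the trace quoted above (with the task prefix dropped), and
the helper names clock_offsets() and to_trace_time() are made up for this
illustration and are not part of any kernel tooling. It derives the trace
clock vs CLOCK_MONOTONIC offset from hrtimer_expire_entry events and maps
the expires= value armed on CPU1 back to trace time.

```python
import re

# Two hrtimer_expire_entry events copied from the trace quoted above.
# The task prefix is dropped because the parser does not need it.
SAMPLE = """\
[000] d.h. 109.978053: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109890434160
[000] d.h. 109.988095: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109900474620
"""

# Group 1: trace timestamp in seconds.
# Group 2: CLOCK_MONOTONIC "now" value in nanoseconds.
EVENT = re.compile(r"(\d+\.\d+): hrtimer_expire_entry: .* now=(\d+)")


def clock_offsets(text):
    """Return (trace clock - CLOCK_MONOTONIC) in seconds per expiry event."""
    offsets = []
    for line in text.splitlines():
        match = EVENT.search(line)
        if match:
            trace_s = float(match.group(1))
            mono_s = int(match.group(2)) / 1e9
            offsets.append(trace_s - mono_s)
    return offsets


def to_trace_time(mono_ns, offset_s):
    """Convert a CLOCK_MONOTONIC value in nanoseconds to trace-clock seconds."""
    return mono_ns / 1e9 + offset_s


offsets = clock_offsets(SAMPLE)
mean_offset = sum(offsets) / len(offsets)
print(f"trace clock - CLOCK_MONOTONIC: {mean_offset:.6f} s")

# CPU1's tick was armed with expires=109880000000 (nanoseconds, MONOTONIC).
expected = to_trace_time(109880000000, mean_offset)
print(f"expected expiry in trace time: {expected:.3f}")
```

Running the same calculation over a full trace would show whether the
offset stays constant, which is what the "sched_clock vs clocksource"
explanation above assumes.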