From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
John Stultz <john.stultz@linaro.org>,
Eric Dumazet <edumazet@google.com>,
Anna-Maria Gleixner <anna-maria@linutronix.de>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
linux-pm@vger.kernel.org, Arjan van de Ven <arjan@infradead.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Rik van Riel <riel@redhat.com>
Subject: Re: [patch V2 00/10] timer: Move from a push remote at enqueue to a pull at expiry model
Date: Fri, 21 Apr 2017 12:28:53 -0700 [thread overview]
Message-ID: <20170421192853.GD3956@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170418111102.490432548@linutronix.de>
On Tue, Apr 18, 2017 at 01:11:02PM +0200, Thomas Gleixner wrote:
> Placing timers at enqueue time on a target CPU based on dubious heuristics
> does not make any sense:
>
> 1) Most timer wheel timers are canceled or rearmed before they expire.
>
> 2) The heuristics to predict which CPU will be busy when the timer expires
> are wrong by definition.
>
> So we waste precious cycles to place timers at enqueue time.
>
> The proper solution to this problem is to always queue the timers on the
> local CPU and allow the non pinned timers to be pulled onto a busy CPU at
> expiry time.
>
> To achieve this the timer storage has been split into local pinned and
> global timers. Local pinned timers are always expired on the CPU on which
> they have been queued. Global timers can be expired on any CPU.
>
> As long as a CPU is busy it expires both local and global timers. When a
> CPU goes idle it arms for the first expiring local timer. If the first
> expiring pinned (local) timer is before the first expiring movable timer,
> then no action is required because the CPU will wake up before the first
> movable timer expires. If the first expiring movable timer is before the
> first expiring pinned (local) timer, then this timer is queued into a idle
> timerqueue and eventually expired by some other active CPU.
>
> To avoid global locking the timerqueues are implemented as a hierarchy. The
> lowest level of the hierarchy holds the CPUs. The CPUs are associated to
> groups of 8, which are seperated per node. If more than one CPU group
> exist, then a second level in the hierarchy collects the groups. Depending
> on the size of the system more than 2 levels are required. Each group has a
> "migrator" which checks the timerqueue during the tick for remote expirable
> timers.
>
> If the last CPU in a group goes idle it reports the first expiring event in
> the group up to the next group(s) in the hierarchy. If the last CPU goes
> idle it arms its timer for the first system wide expiring timer to ensure
> that no timer event is missed.
OK, after several attempts this week, I formally state that I officially
suck at reviewing, though I must confess that I am quite partial to use
of hierarchies to avoid scalability bottlenecks. I therefore applied
these patches and ran rcutorture on them. Two of 19 scenarios resulted
in failures, namely TREE04 and TREE07. I have posted their .config and
console.log files here:
http://www2.rdrop.com/users/paulmck/submission/TREE04.2017.04.21a.config
http://www2.rdrop.com/users/paulmck/submission/TREE04.2017.04.21a.console.log
http://www2.rdrop.com/users/paulmck/submission/TREE07.2017.04.21a.config
http://www2.rdrop.com/users/paulmck/submission/TREE07.2017.04.21a.console.log
The first splat from TREE04 is:
[ 3.513964] WARNING: CPU: 1 PID: 755 at /home/paulmck/public_git/linux-rcu/kernel/time/timer_migration.c:387 tmigr_set_cpu_active+0xd7/0xf0
[ 3.517320] Modules linked in:
[ 3.518173] CPU: 1 PID: 755 Comm: rcu_torture_rea Not tainted 4.11.0-rc2+ #1
[ 3.520134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 3.522692] Call Trace:
[ 3.523354] <IRQ>
[ 3.523953] dump_stack+0x4d/0x65
[ 3.524846] __warn+0xc6/0xe0
[ 3.525680] warn_slowpath_null+0x18/0x20
[ 3.526787] tmigr_set_cpu_active+0xd7/0xf0
[ 3.527866] tmigr_cpu_activate+0x36/0x40
[ 3.528872] tick_nohz_stop_sched_tick+0x248/0x330
[ 3.530103] tick_nohz_irq_exit+0x109/0x140
[ 3.531214] irq_exit+0x74/0xf0
[ 3.532068] smp_apic_timer_interrupt+0x38/0x50
[ 3.533239] apic_timer_interrupt+0x90/0xa0
[ 3.534363] RIP: 0010:__local_bh_enable_ip+0x3e/0x80
[ 3.535722] RSP: 0000:ffffc900014ffe60 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
[ 3.537640] RAX: 0000000000000000 RBX: ffffffff8319c0e0 RCX: 000000000000000a
[ 3.539461] RDX: ffffffff81e46b40 RSI: 0000000000000200 RDI: ffffffff810b0630
[ 3.541307] RBP: ffffc900014ffe60 R08: 0000000000000001 R09: 0000000000000101
[ 3.543497] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffee5
[ 3.545556] R13: 0000000000000000 R14: 00000000dfa1ef74 R15: fffffffffffffee6
[ 3.547186] </IRQ>
[ 3.547689] ? rcu_bh_torture_deferred_free+0x20/0x20
[ 3.548837] rcu_bh_torture_read_unlock+0x15/0x20
[ 3.549861] rcu_torture_reader+0xe4/0x2b0
[ 3.550801] ? rcu_torture_reader+0x2b0/0x2b0
[ 3.551768] kthread+0xfc/0x130
[ 3.552465] ? rcu_torture_fqs+0xe0/0xe0
[ 3.553387] ? kthread_create_on_node+0x40/0x40
[ 3.554460] ret_from_fork+0x29/0x40
Line 387 is this line in tmigr_set_cpu_active():
if (WARN_ON(group->active == group->num_childs)) {
The splat from TREE07 looks quite similar, although it has three splats
rather than only two.
Thanx, Paul
> The series is also available from git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.timers
>
> Changes vs. V1:
> - Add missing inline stubs
> - Bail out when running on UP
> - Don't compile migration code for SMP=n
> - Reorder trace point storage
>
> Thanks,
>
> tglx
> ---
> /timer_migration.h | 173 ++++++++++
> b/kernel/time/timer_migration.c | 666 ++++++++++++++++++++++++++++++++++++++++
> b/kernel/time/timer_migration.h | 83 ++++
> include/linux/cpuhotplug.h | 1
> kernel/time/Makefile | 3
> kernel/time/tick-internal.h | 4
> kernel/time/tick-sched.c | 121 ++++++-
> kernel/time/tick-sched.h | 3
> kernel/time/timer.c | 239 +++++++++-----
> lib/timerqueue.c | 8
> 10 files changed, 1205 insertions(+), 96 deletions(-)
>
>
>
>
prev parent reply other threads:[~2017-04-21 19:29 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-18 11:11 [patch V2 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
2017-04-18 11:11 ` [patch V2 01/10] timer: Invoke timer_start_debug() where it makes sense Thomas Gleixner
2017-04-18 11:11 ` [patch V2 02/10] timerqueue: Document return values of timerqueue_add/del() Thomas Gleixner
2017-04-18 11:11 ` [patch V2 03/10] timers: Rework idle logic Thomas Gleixner
2017-04-19 6:50 ` Peter Zijlstra
2017-04-21 14:43 ` Frederic Weisbecker
2017-04-18 11:11 ` [patch V2 04/10] timer: Keep the pinned timers separate from the others Thomas Gleixner
2017-04-18 11:11 ` [patch V2 05/10] timer: Retrieve next expiry of pinned/non-pinned timers seperately Thomas Gleixner
2017-04-19 7:05 ` Peter Zijlstra
2017-04-19 9:56 ` Thomas Gleixner
2017-04-18 11:11 ` [patch V2 06/10] timer: Restructure internal locking Thomas Gleixner
2017-04-19 7:07 ` Peter Zijlstra
2017-04-18 11:11 ` [patch V2 07/10] tick/sched: Split out jiffies update helper function Thomas Gleixner
2017-04-18 11:11 ` [patch V2 08/10] timer: Implement the hierarchical pull model Thomas Gleixner
2017-04-19 7:20 ` Peter Zijlstra
2017-04-19 7:24 ` Peter Zijlstra
2017-04-19 7:34 ` Peter Zijlstra
2017-04-19 7:38 ` Peter Zijlstra
2017-04-19 8:11 ` Peter Zijlstra
2017-04-19 8:31 ` Thomas Gleixner
2017-04-19 8:36 ` Peter Zijlstra
2017-04-19 9:03 ` Thomas Gleixner
2017-04-19 8:52 ` Peter Zijlstra
2017-04-19 9:09 ` Peter Zijlstra
2017-04-19 9:43 ` Thomas Gleixner
2017-04-19 9:52 ` Peter Zijlstra
2017-04-19 9:44 ` Peter Zijlstra
2017-04-19 9:53 ` Peter Zijlstra
2017-04-19 9:20 ` Peter Zijlstra
2017-04-19 10:22 ` Peter Zijlstra
2017-04-18 11:11 ` [patch V2 09/10] timer_migration: Add tracepoints Thomas Gleixner
2017-04-18 11:11 ` [patch V2 10/10] timer: Always queue timers on the local CPU Thomas Gleixner
2017-04-21 19:28 ` Paul E. McKenney [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170421192853.GD3956@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=anna-maria@linutronix.de \
--cc=arjan@infradead.org \
--cc=edumazet@google.com \
--cc=fweisbec@gmail.com \
--cc=john.stultz@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=rafael.j.wysocki@intel.com \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).