linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	John Stultz <john.stultz@linaro.org>,
	Eric Dumazet <edumazet@google.com>,
	Anna-Maria Gleixner <anna-maria@linutronix.de>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	linux-pm@vger.kernel.org, Arjan van de Ven <arjan@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [patch V2 00/10] timer: Move from a push remote at enqueue to a pull at expiry model
Date: Fri, 21 Apr 2017 12:28:53 -0700	[thread overview]
Message-ID: <20170421192853.GD3956@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170418111102.490432548@linutronix.de>

On Tue, Apr 18, 2017 at 01:11:02PM +0200, Thomas Gleixner wrote:
> Placing timers at enqueue time on a target CPU based on dubious heuristics
> does not make any sense:
> 
>  1) Most timer wheel timers are canceled or rearmed before they expire.
> 
>  2) The heuristics to predict which CPU will be busy when the timer expires
>     are wrong by definition.
> 
> So we waste precious cycles to place timers at enqueue time.
> 
> The proper solution to this problem is to always queue the timers on the
> local CPU and allow the non pinned timers to be pulled onto a busy CPU at
> expiry time.
> 
> To achieve this the timer storage has been split into local pinned and
> global timers. Local pinned timers are always expired on the CPU on which
> they have been queued. Global timers can be expired on any CPU.
> 
> As long as a CPU is busy it expires both local and global timers. When a
> CPU goes idle it arms for the first expiring local timer. If the first
> expiring pinned (local) timer is before the first expiring movable timer,
> then no action is required because the CPU will wake up before the first
> movable timer expires. If the first expiring movable timer is before the
> first expiring pinned (local) timer, then this timer is queued into a idle
> timerqueue and eventually expired by some other active CPU.
> 
> To avoid global locking the timerqueues are implemented as a hierarchy. The
> lowest level of the hierarchy holds the CPUs. The CPUs are associated to
> groups of 8, which are seperated per node. If more than one CPU group
> exist, then a second level in the hierarchy collects the groups. Depending
> on the size of the system more than 2 levels are required. Each group has a
> "migrator" which checks the timerqueue during the tick for remote expirable
> timers.
> 
> If the last CPU in a group goes idle it reports the first expiring event in
> the group up to the next group(s) in the hierarchy. If the last CPU goes
> idle it arms its timer for the first system wide expiring timer to ensure
> that no timer event is missed.

OK, after several attempts this week, I formally state that I officially
suck at reviewing, though I must confess that I am quite partial to use
of hierarchies to avoid scalability bottlenecks.  I therefore applied
these patches and ran rcutorture on them.  Two of 19 scenarios resulted
in failures, namely TREE04 and TREE07.  I have posted their .config and
console.log files here:

http://www2.rdrop.com/users/paulmck/submission/TREE04.2017.04.21a.config
http://www2.rdrop.com/users/paulmck/submission/TREE04.2017.04.21a.console.log
http://www2.rdrop.com/users/paulmck/submission/TREE07.2017.04.21a.config
http://www2.rdrop.com/users/paulmck/submission/TREE07.2017.04.21a.console.log

The first splat from TREE04 is:

[    3.513964] WARNING: CPU: 1 PID: 755 at /home/paulmck/public_git/linux-rcu/kernel/time/timer_migration.c:387 tmigr_set_cpu_active+0xd7/0xf0
[    3.517320] Modules linked in:
[    3.518173] CPU: 1 PID: 755 Comm: rcu_torture_rea Not tainted 4.11.0-rc2+ #1
[    3.520134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[    3.522692] Call Trace:
[    3.523354]  <IRQ>
[    3.523953]  dump_stack+0x4d/0x65
[    3.524846]  __warn+0xc6/0xe0
[    3.525680]  warn_slowpath_null+0x18/0x20
[    3.526787]  tmigr_set_cpu_active+0xd7/0xf0
[    3.527866]  tmigr_cpu_activate+0x36/0x40
[    3.528872]  tick_nohz_stop_sched_tick+0x248/0x330
[    3.530103]  tick_nohz_irq_exit+0x109/0x140
[    3.531214]  irq_exit+0x74/0xf0
[    3.532068]  smp_apic_timer_interrupt+0x38/0x50
[    3.533239]  apic_timer_interrupt+0x90/0xa0
[    3.534363] RIP: 0010:__local_bh_enable_ip+0x3e/0x80
[    3.535722] RSP: 0000:ffffc900014ffe60 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff10
[    3.537640] RAX: 0000000000000000 RBX: ffffffff8319c0e0 RCX: 000000000000000a
[    3.539461] RDX: ffffffff81e46b40 RSI: 0000000000000200 RDI: ffffffff810b0630
[    3.541307] RBP: ffffc900014ffe60 R08: 0000000000000001 R09: 0000000000000101
[    3.543497] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffee5
[    3.545556] R13: 0000000000000000 R14: 00000000dfa1ef74 R15: fffffffffffffee6
[    3.547186]  </IRQ>
[    3.547689]  ? rcu_bh_torture_deferred_free+0x20/0x20
[    3.548837]  rcu_bh_torture_read_unlock+0x15/0x20
[    3.549861]  rcu_torture_reader+0xe4/0x2b0
[    3.550801]  ? rcu_torture_reader+0x2b0/0x2b0
[    3.551768]  kthread+0xfc/0x130
[    3.552465]  ? rcu_torture_fqs+0xe0/0xe0
[    3.553387]  ? kthread_create_on_node+0x40/0x40
[    3.554460]  ret_from_fork+0x29/0x40

Line 387 is this line in tmigr_set_cpu_active():

	if (WARN_ON(group->active == group->num_childs)) {

The splat from TREE07 looks quite similar, although it has three splats
rather than only two.

							Thanx, Paul

> The series is also available from git:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.timers
> 
> Changes vs. V1:
>   - Add missing inline stubs
>   - Bail out when running on UP
>   - Don't compile migration code for SMP=n
>   - Reorder trace point storage
> 
> Thanks,
> 
> 	tglx
> ---
>  /timer_migration.h              |  173 ++++++++++
>  b/kernel/time/timer_migration.c |  666 ++++++++++++++++++++++++++++++++++++++++
>  b/kernel/time/timer_migration.h |   83 ++++
>  include/linux/cpuhotplug.h      |    1 
>  kernel/time/Makefile            |    3 
>  kernel/time/tick-internal.h     |    4 
>  kernel/time/tick-sched.c        |  121 ++++++-
>  kernel/time/tick-sched.h        |    3 
>  kernel/time/timer.c             |  239 +++++++++-----
>  lib/timerqueue.c                |    8 
>  10 files changed, 1205 insertions(+), 96 deletions(-)
> 
> 
> 
> 

      parent reply	other threads:[~2017-04-21 19:29 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-18 11:11 [patch V2 00/10] timer: Move from a push remote at enqueue to a pull at expiry model Thomas Gleixner
2017-04-18 11:11 ` [patch V2 01/10] timer: Invoke timer_start_debug() where it makes sense Thomas Gleixner
2017-04-18 11:11 ` [patch V2 02/10] timerqueue: Document return values of timerqueue_add/del() Thomas Gleixner
2017-04-18 11:11 ` [patch V2 03/10] timers: Rework idle logic Thomas Gleixner
2017-04-19  6:50   ` Peter Zijlstra
2017-04-21 14:43     ` Frederic Weisbecker
2017-04-18 11:11 ` [patch V2 04/10] timer: Keep the pinned timers separate from the others Thomas Gleixner
2017-04-18 11:11 ` [patch V2 05/10] timer: Retrieve next expiry of pinned/non-pinned timers seperately Thomas Gleixner
2017-04-19  7:05   ` Peter Zijlstra
2017-04-19  9:56     ` Thomas Gleixner
2017-04-18 11:11 ` [patch V2 06/10] timer: Restructure internal locking Thomas Gleixner
2017-04-19  7:07   ` Peter Zijlstra
2017-04-18 11:11 ` [patch V2 07/10] tick/sched: Split out jiffies update helper function Thomas Gleixner
2017-04-18 11:11 ` [patch V2 08/10] timer: Implement the hierarchical pull model Thomas Gleixner
2017-04-19  7:20   ` Peter Zijlstra
2017-04-19  7:24   ` Peter Zijlstra
2017-04-19  7:34   ` Peter Zijlstra
2017-04-19  7:38   ` Peter Zijlstra
2017-04-19  8:11   ` Peter Zijlstra
2017-04-19  8:31     ` Thomas Gleixner
2017-04-19  8:36       ` Peter Zijlstra
2017-04-19  9:03         ` Thomas Gleixner
2017-04-19  8:52   ` Peter Zijlstra
2017-04-19  9:09   ` Peter Zijlstra
2017-04-19  9:43     ` Thomas Gleixner
2017-04-19  9:52       ` Peter Zijlstra
2017-04-19  9:44     ` Peter Zijlstra
2017-04-19  9:53       ` Peter Zijlstra
2017-04-19  9:20   ` Peter Zijlstra
2017-04-19 10:22   ` Peter Zijlstra
2017-04-18 11:11 ` [patch V2 09/10] timer_migration: Add tracepoints Thomas Gleixner
2017-04-18 11:11 ` [patch V2 10/10] timer: Always queue timers on the local CPU Thomas Gleixner
2017-04-21 19:28 ` Paul E. McKenney [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170421192853.GD3956@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=anna-maria@linutronix.de \
    --cc=arjan@infradead.org \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).