From: Christoph Lameter <cl@linux.com>
To: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Gilad Ben Yossef <giladb@mellanox.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Tejun Heo <tj@kernel.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	linux-doc@vger.kernel.org, linux-api@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v13 00/12] support "task_isolation" mode
Date: Wed, 20 Jul 2016 21:04:03 -0500 (CDT)
Message-ID: <alpine.DEB.2.20.1607202059180.25838@east.gentwo.org>
In-Reply-To: <1468529299-27929-1-git-send-email-cmetcalf@mellanox.com>

We are trying to test the patchset on x86 and are getting strange
backtraces and aborts. It seems that the cpu numbered just before the one
we are running on (cpu 12 vs. cpu 13 in the traces below) queues an
irq_work event that causes a latency event on our cpu.

This is weird. Is there a new round robin IPI feature in the kernel that I
am not aware of?

Backtraces from dmesg:

[  956.603223] latencytest/7928: task_isolation mode lost due to irq_work
[  956.610817] cpu 12: irq_work violating task isolation for latencytest/7928 on cpu 13
[  956.619985] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 4.7.0-rc7-stream1 #1
[  956.628765] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.0.2 03/15/2016
[  956.637642]  0000000000000086 ce6735c7b39e7b81 ffff88103e783d00 ffffffff8134f6ff
[  956.646739]  ffff88102c50d700 000000000000000d ffff88103e783d28 ffffffff811986f4
[  956.655828]  ffff88102c50d700 ffff88203cf97f80 000000000000000d ffff88103e783d68
[  956.664924] Call Trace:
[  956.667945]  <IRQ>  [<ffffffff8134f6ff>] dump_stack+0x63/0x84
[  956.674740]  [<ffffffff811986f4>] task_isolation_debug_task+0xb4/0xd0
[  956.682229]  [<ffffffff810b4a13>] _task_isolation_debug+0x83/0xc0
[  956.689331]  [<ffffffff81179c0c>] irq_work_queue_on+0x9c/0x120
[  956.696142]  [<ffffffff811075e4>] tick_nohz_full_kick_cpu+0x44/0x50
[  956.703438]  [<ffffffff810b48d9>] wake_up_nohz_cpu+0x99/0x110
[  956.710150]  [<ffffffff810f57e1>] internal_add_timer+0x71/0xb0
[  956.716959]  [<ffffffff810f696b>] add_timer_on+0xbb/0x140
[  956.723283]  [<ffffffff81100ca0>] clocksource_watchdog+0x230/0x300
[  956.730480]  [<ffffffff81100a70>] ? __clocksource_unstable.isra.2+0x40/0x40
[  956.738555]  [<ffffffff810f5615>] call_timer_fn+0x35/0x120
[  956.744973]  [<ffffffff81100a70>] ? __clocksource_unstable.isra.2+0x40/0x40
[  956.753046]  [<ffffffff810f64cc>] run_timer_softirq+0x23c/0x2f0
[  956.759952]  [<ffffffff816d4397>] __do_softirq+0xd7/0x2c5
[  956.766272]  [<ffffffff81091245>] irq_exit+0xf5/0x100
[  956.772209]  [<ffffffff816d41d2>] smp_apic_timer_interrupt+0x42/0x50
[  956.779600]  [<ffffffff816d231c>] apic_timer_interrupt+0x8c/0xa0
[  956.786602]  <EOI>  [<ffffffff81569eb0>] ? poll_idle+0x40/0x80
[  956.793490]  [<ffffffff815697dc>] cpuidle_enter_state+0x9c/0x260
[  956.800498]  [<ffffffff815699d7>] cpuidle_enter+0x17/0x20
[  956.806810]  [<ffffffff810cf497>] cpu_startup_entry+0x2b7/0x3a0
[  956.813717]  [<ffffffff81050e6c>] start_secondary+0x15c/0x1a0
[ 1036.601758] cpu 12: irq_work violating task isolation for latencytest/8447 on cpu 13
[ 1036.610922] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 4.7.0-rc7-stream1 #1
[ 1036.619692] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.0.2 03/15/2016
[ 1036.628551]  0000000000000086 ce6735c7b39e7b81 ffff88103e783d00 ffffffff8134f6ff
[ 1036.637648]  ffff88102dca0000 000000000000000d ffff88103e783d28 ffffffff811986f4
[ 1036.646741]  ffff88102dca0000 ffff88203cf97f80 000000000000000d ffff88103e783d68
[ 1036.655833] Call Trace:
[ 1036.658852]  <IRQ>  [<ffffffff8134f6ff>] dump_stack+0x63/0x84
[ 1036.665649]  [<ffffffff811986f4>] task_isolation_debug_task+0xb4/0xd0
[ 1036.673136]  [<ffffffff810b4a13>] _task_isolation_debug+0x83/0xc0
[ 1036.680237]  [<ffffffff81179c0c>] irq_work_queue_on+0x9c/0x120
[ 1036.687091]  [<ffffffff811075e4>] tick_nohz_full_kick_cpu+0x44/0x50
[ 1036.694388]  [<ffffffff810b48d9>] wake_up_nohz_cpu+0x99/0x110
[ 1036.701089]  [<ffffffff810f57e1>] internal_add_timer+0x71/0xb0
[ 1036.707896]  [<ffffffff810f696b>] add_timer_on+0xbb/0x140
[ 1036.714210]  [<ffffffff81100ca0>] clocksource_watchdog+0x230/0x300
[ 1036.721411]  [<ffffffff81100a70>] ? __clocksource_unstable.isra.2+0x40/0x40
[ 1036.729478]  [<ffffffff810f5615>] call_timer_fn+0x35/0x120
[ 1036.735899]  [<ffffffff81100a70>] ? __clocksource_unstable.isra.2+0x40/0x40
[ 1036.743970]  [<ffffffff810f64cc>] run_timer_softirq+0x23c/0x2f0
[ 1036.750878]  [<ffffffff816d4397>] __do_softirq+0xd7/0x2c5
[ 1036.757199]  [<ffffffff81091245>] irq_exit+0xf5/0x100
[ 1036.763132]  [<ffffffff816d41d2>] smp_apic_timer_interrupt+0x42/0x50
[ 1036.770520]  [<ffffffff816d231c>] apic_timer_interrupt+0x8c/0xa0
[ 1036.777520]  <EOI>  [<ffffffff81569eb0>] ? poll_idle+0x40/0x80
[ 1036.784410]  [<ffffffff815697dc>] cpuidle_enter_state+0x9c/0x260
[ 1036.791413]  [<ffffffff815699d7>] cpuidle_enter+0x17/0x20
[ 1036.797734]  [<ffffffff810cf497>] cpu_startup_entry+0x2b7/0x3a0
[ 1036.804641]  [<ffffffff81050e6c>] start_secondary+0x15c/0x1a0
