* WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-28 7:21 UTC (permalink / raw)
To: Matt Fleming; +Cc: lkml, Peter Zijlstra
Running Steven's hotplug stress script in tip.today. Config is
NOPREEMPT, tune for maximum build time (enterprise default-ish).
[ 75.268049] x86: Booting SMP configuration:
[ 75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 75.359056] smpboot: CPU 3 is now offline
[ 75.415505] smpboot: CPU 4 is now offline
[ 75.479985] smpboot: CPU 5 is now offline
[ 75.550674] ------------[ cut here ]------------
[ 75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
[ 75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
[ 75.550679] Modules linked in: ebtable_filter(E) ebtables(E) fuse(E) nf_log_ipv6(E) xt_pkttype(E) xt_physdev(E) br_netfilter(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) snd_hda_codec_hdmi(E) ip6_tables(E) x_tables(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) nls_iso8859_1(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E)
[ 75.550703] snd_pcm(E) nls_cp437(E) kvm_intel(E) snd_timer(E) kvm(E) irqbypass(E) nfsd(E) snd(E) crct10dif_pclmul(E) crc32_pclmul(E) auth_rpcgss(E) ghash_clmulni_intel(E) joydev(E) nfs_acl(E) lockd(E) soundcore(E) i2c_i801(E) shpchp(E) pcbc(E) aesni_intel(E) mei_me(E) aes_x86_64(E) crypto_simd(E) iTCO_wdt(E) iTCO_vendor_support(E) lpc_ich(E) mfd_core(E) glue_helper(E) pcspkr(E) mei(E) grace(E) cryptd(E) intel_smartconnect(E) battery(E) fan(E) thermal(E) tpm_infineon(E) sunrpc(E) efivarfs(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) ahci(E) wmi(E) libahci(E) i2c_algo_bit(E) drm_kms_helper(E) xhci_pci(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ehci_pci(E) ehci_hcd(E) ttm(E) xhci_hcd(E) crc32c_intel(E) r8169(E)
[ 75.550721] mii(E) libata(E) drm(E) usbcore(E) fjes(E) video(E) button(E) sd_mod(E) vfat(E) fat(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mod(E) loop(E) sg(E) scsi_mod(E) autofs4(E)
[ 75.550728] CPU: 1 PID: 15 Comm: migration/1 Tainted: G E 4.10.0-tip-default #47
[ 75.550728] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 75.550728] Call Trace:
[ 75.550732] dump_stack+0x63/0x87
[ 75.550734] __warn+0xd1/0xf0
[ 75.550737] ? load_balance+0xa00/0xa00
[ 75.550738] warn_slowpath_fmt+0x4f/0x60
[ 75.550739] ? cpumask_next_and+0x35/0x50
[ 75.550740] assert_clock_updated.isra.62.part.63+0x25/0x27
[ 75.550741] update_load_avg+0x855/0x950
[ 75.550742] ? load_balance+0xa00/0xa00
[ 75.550743] set_next_entity+0x9e/0x1b0
[ 75.550744] pick_next_task_fair+0x78/0x540
[ 75.550746] ? sched_clock+0x9/0x10
[ 75.550747] ? sched_clock_cpu+0x11/0xb0
[ 75.550748] ? load_balance+0xa00/0xa00
[ 75.550749] sched_cpu_dying+0x23c/0x280
[ 75.550751] ? fini_debug_store_on_cpu+0x34/0x40
[ 75.550752] ? sched_cpu_starting+0x60/0x60
[ 75.550753] cpuhp_invoke_callback+0x90/0x400
[ 75.550754] take_cpu_down+0x5e/0xa0
[ 75.550757] multi_cpu_stop+0xc4/0xf0
[ 75.550757] ? cpu_stop_queue_work+0xb0/0xb0
[ 75.550758] cpu_stopper_thread+0x8c/0x120
[ 75.550760] smpboot_thread_fn+0x110/0x160
[ 75.550762] kthread+0x101/0x140
[ 75.550762] ? sort_range+0x30/0x30
[ 75.550763] ? kthread_park+0x90/0x90
[ 75.550766] ret_from_fork+0x2c/0x40
[ 75.550766] ---[ end trace 9dd372e3b19c77a0 ]---
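The second line of the warning is the condition checked by `assert_clock_updated()`. As I understand the 4.10-era `kernel/sched/sched.h`, `rq->clock_update_flags` carries three bits: `RQCF_REQ_SKIP` (0x01), `RQCF_ACT_SKIP` (0x02) and `RQCF_UPDATED` (0x04). So `flags < RQCF_ACT_SKIP` holds only when the clock has not been updated since `rq->lock` was pinned and no clock-update skip is in effect. A minimal userspace sketch of that check follows; `clock_update_warns()` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* rq::clock_update_flags bits, values as in the 4.10-era kernel/sched/sched.h. */
#define RQCF_REQ_SKIP 0x01 /* skip requested for the next __schedule() */
#define RQCF_ACT_SKIP 0x02 /* skip in effect: update_rq_clock() calls ignored */
#define RQCF_UPDATED  0x04 /* debug: clock updated since rq->lock was pinned */

/* Hypothetical helper: true when "rq->clock_update_flags < RQCF_ACT_SKIP"
 * would fire, i.e. when at most RQCF_REQ_SKIP is set. */
static bool clock_update_warns(unsigned int flags)
{
	return flags < RQCF_ACT_SKIP;
}
```

In the trace above the check fired from `update_load_avg()` while `sched_cpu_dying()` was picking the next task, which is consistent with the runqueue being re-pinned without its `RQCF_UPDATED` bit.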
* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Matt Fleming @ 2017-01-30 11:59 UTC (permalink / raw)
To: Mike Galbraith; +Cc: lkml, Peter Zijlstra
On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> Running Steven's hotplug stress script in tip.today. Config is
> NOPREEMPT, tune for maximum build time (enterprise default-ish).
>
> [ 75.268049] x86: Booting SMP configuration:
> [ 75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [ 75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> [ 75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> [ 75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> [ 75.359056] smpboot: CPU 3 is now offline
> [ 75.415505] smpboot: CPU 4 is now offline
> [ 75.479985] smpboot: CPU 5 is now offline
> [ 75.550674] ------------[ cut here ]------------
> [ 75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> [ 75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
The following patch queued in tip/sched/core should fix this issue:
---->8----
From 4d25b35ea3729affd37d69c78191ce6f92766e1a Mon Sep 17 00:00:00 2001
From: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed, 26 Oct 2016 16:15:44 +0100
Subject: [PATCH] sched/fair: Restore previous rq_flags when migrating tasks in hotplug
__migrate_task() can return with a different runqueue locked than the
one we passed as an argument. So that we can repin the lock in
migrate_tasks() (and keep the update_rq_clock() bit) we need to
restore the old rq_flags before repinning.
Note that it wouldn't be correct to change move_queued_task() to repin
because of the change of runqueue and the fact that having an
up-to-date clock on the initial rq doesn't mean the new rq has one
too.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/core.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f983e83a353..3b248b03ad8f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5608,7 +5608,7 @@ static void migrate_tasks(struct rq *dead_rq)
 {
 	struct rq *rq = dead_rq;
 	struct task_struct *next, *stop = rq->stop;
-	struct rq_flags rf;
+	struct rq_flags rf, old_rf;
 	int dest_cpu;
 
 	/*
@@ -5669,6 +5669,13 @@ static void migrate_tasks(struct rq *dead_rq)
 			continue;
 		}
 
+		/*
+		 * __migrate_task() may return with a different
+		 * rq->lock held and a new cookie in 'rf', but we need
+		 * to preserve rf::clock_update_flags for 'dead_rq'.
+		 */
+		old_rf = rf;
+
 		/* Find suitable destination for @next, with force if needed. */
 		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
 
@@ -5677,6 +5684,7 @@ static void migrate_tasks(struct rq *dead_rq)
 			raw_spin_unlock(&rq->lock);
 			rq = dead_rq;
 			raw_spin_lock(&rq->lock);
+			rf = old_rf;
 		}
 		raw_spin_unlock(&next->pi_lock);
 	}
--
2.10.0
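The patch boils down to snapshotting per-pin-context state before a call that may hand back a different runqueue, then restoring it once `dead_rq` is locked again. The sketch below is a userspace toy, not kernel code: `toy_rq`, `toy_pin()`, `toy_unpin()`, `toy_repin()` and `toy_migrate()` are hypothetical stand-ins loosely modelled on `rq_pin_lock()`, `rq_unpin_lock()`, `rq_repin_lock()` and `__migrate_task()`. As a stated simplification, unpinning here also drops the live `RQCF_UPDATED` bit, so `rf` is the only record of the clock update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit values as in the 4.10-era kernel/sched/sched.h. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Hypothetical toy stand-ins for struct rq and struct rq_flags. */
struct toy_rq { unsigned int clock_update_flags; };
struct toy_rq_flags { unsigned int clock_update_flags; };

/* Start a fresh pin context: forget earlier updates, clear the stash. */
static void toy_pin(struct toy_rq *rq, struct toy_rq_flags *rf)
{
	rq->clock_update_flags &= (RQCF_REQ_SKIP | RQCF_ACT_SKIP);
	rf->clock_update_flags = 0;
}

/* Stash "clock was updated" in rf. Simplification: also drop the live
 * bit, so rf is the only record of the update. */
static void toy_unpin(struct toy_rq *rq, struct toy_rq_flags *rf)
{
	if (rq->clock_update_flags > RQCF_ACT_SKIP)
		rf->clock_update_flags = RQCF_UPDATED;
	rq->clock_update_flags &= (RQCF_REQ_SKIP | RQCF_ACT_SKIP);
}

/* Re-enter the pin context described by rf. */
static void toy_repin(struct toy_rq *rq, const struct toy_rq_flags *rf)
{
	rq->clock_update_flags |= rf->clock_update_flags;
}

static void toy_update_clock(struct toy_rq *rq)
{
	rq->clock_update_flags |= RQCF_UPDATED;
}

/* Hypothetical __migrate_task() stand-in: returns with dest "locked" and a
 * new pin context in the caller's rf, clobbering the old stash. */
static struct toy_rq *toy_migrate(struct toy_rq_flags *rf, struct toy_rq *dest)
{
	assert(dest != NULL);
	toy_pin(dest, rf);
	return dest;
}

/* One pass of a migrate_tasks()-like loop. Returns true if the check done
 * by assert_clock_updated() would pass once dead_rq is re-pinned. */
static bool toy_iteration(bool restore_old_rf)
{
	struct toy_rq dead = { 0 }, other = { 0 };
	struct toy_rq_flags rf, old_rf;
	struct toy_rq *rq = &dead;

	toy_pin(rq, &rf);
	toy_update_clock(rq);
	toy_unpin(rq, &rf);          /* rf now records RQCF_UPDATED */

	old_rf = rf;                 /* the fix: snapshot before migrating */
	rq = toy_migrate(&rf, &other);
	if (rq != &dead) {
		rq = &dead;          /* kernel: unlock rq, lock dead_rq */
		if (restore_old_rf)
			rf = old_rf; /* the fix: restore dead_rq's stash */
	}

	toy_repin(rq, &rf);          /* next pass: pick_next_task() */
	return !(rq->clock_update_flags < RQCF_ACT_SKIP);
}
```

In this model `toy_iteration(false)` reports the warning condition and `toy_iteration(true)` does not. The snapshot has to be taken before the call that may switch runqueues; restoring it after switching back keeps `rf` describing `dead_rq` rather than the destination runqueue.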
* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-31 6:19 UTC (permalink / raw)
To: Matt Fleming; +Cc: lkml, Peter Zijlstra, Ingo Molnar
On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > Running Steven's hotplug stress script in tip.today. Config is
> > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> >
> > [ 75.268049] x86: Booting SMP configuration:
> > [ 75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [ 75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > [ 75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > [ 75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > [ 75.359056] smpboot: CPU 3 is now offline
> > [ 75.415505] smpboot: CPU 4 is now offline
> > [ 75.479985] smpboot: CPU 5 is now offline
> > [ 75.550674] ------------[ cut here ]------------
> > [ 75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > [ 75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
>
> The following patch queued in tip/sched/core should fix this issue:
Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
an early boot brick problem.
> ---->8----
>
> From 4d25b35ea3729affd37d69c78191ce6f92766e1a Mon Sep 17 00:00:00 2001
> From: Matt Fleming <matt@codeblueprint.co.uk>
> Date: Wed, 26 Oct 2016 16:15:44 +0100
> Subject: [PATCH] sched/fair: Restore previous rq_flags when migrating tasks in hotplug
>
> __migrate_task() can return with a different runqueue locked than the
> one we passed as an argument. So that we can repin the lock in
> migrate_tasks() (and keep the update_rq_clock() bit) we need to
> restore the old rq_flags before repinning.
>
> Note that it wouldn't be correct to change move_queued_task() to repin
> because of the change of runqueue and the fact that having an
> up-to-date clock on the initial rq doesn't mean the new rq has one
> too.
>
> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
> kernel/sched/core.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f983e83a353..3b248b03ad8f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5608,7 +5608,7 @@ static void migrate_tasks(struct rq *dead_rq)
> {
> struct rq *rq = dead_rq;
> struct task_struct *next, *stop = rq->stop;
> - struct rq_flags rf;
> + struct rq_flags rf, old_rf;
> int dest_cpu;
>
> /*
> @@ -5669,6 +5669,13 @@ static void migrate_tasks(struct rq *dead_rq)
> continue;
> }
>
> + /*
> + * __migrate_task() may return with a different
> + * rq->lock held and a new cookie in 'rf', but we need
> + * to preserve rf::clock_update_flags for 'dead_rq'.
> + */
> + old_rf = rf;
> +
> /* Find suitable destination for @next, with force if needed. */
> dest_cpu = select_fallback_rq(dead_rq->cpu, next);
>
> @@ -5677,6 +5684,7 @@ static void migrate_tasks(struct rq *dead_rq)
> raw_spin_unlock(&rq->lock);
> rq = dead_rq;
> raw_spin_lock(&rq->lock);
> + rf = old_rf;
> }
> raw_spin_unlock(&next->pi_lock);
> }
* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Ingo Molnar @ 2017-01-31 7:28 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar
* Mike Galbraith <efault@gmx.de> wrote:
> On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> > On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > > Running Steven's hotplug stress script in tip.today. Config is
> > > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> > >
> > > [ 75.268049] x86: Booting SMP configuration:
> > > [ 75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > [ 75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > > [ 75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > > [ 75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > > [ 75.359056] smpboot: CPU 3 is now offline
> > > [ 75.415505] smpboot: CPU 4 is now offline
> > > [ 75.479985] smpboot: CPU 5 is now offline
> > > [ 75.550674] ------------[ cut here ]------------
> > > [ 75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > > [ 75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
> >
> > The following patch queued in tip/sched/core should fix this issue:
>
> Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> an early boot brick problem.
That's bad - could you perhaps try to bisect it? All recently queued up patches
that could cause such problems should be readily bisectable.
The bisection might be faster if you first checked whether 5bf728f02218 works - if
it does then the bug is in the patches in WIP.x86/boot or WIP.x86/fpu.
Thanks,
Ingo
* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-31 7:35 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar
On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > an early boot brick problem.
>
> That's bad - could you perhaps try to bisect it? All recently queued up patches
> that could cause such problems should be readily bisectable.
Yeah, I'll give it a go as soon as I get some other stuff done.
-Mike
* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Ingo Molnar @ 2017-01-31 7:45 UTC (permalink / raw)
To: Mike Galbraith
Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Borislav Petkov
* Mike Galbraith <efault@gmx.de> wrote:
> On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> > * Mike Galbraith <efault@gmx.de> wrote:
>
> > > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > > an early boot brick problem.
> >
> > That's bad - could you perhaps try to bisect it? All recently queued up patches
> > that could cause such problems should be readily bisectable.
>
> Yeah, I'll give it a go as soon as I get some other stuff done.
Please double check whether -tip f18a8a0143b1 works for you (the latest -tip,
freshly pushed out). It might be that my bogus resolution of an x86/microcode
conflict is what caused your boot problems.
Thanks,
Ingo
* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-31 8:07 UTC (permalink / raw)
To: Ingo Molnar
Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Borislav Petkov
On Tue, 2017-01-31 at 08:45 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
>
> > On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> > > * Mike Galbraith <efault@gmx.de> wrote:
> >
> > > > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > > > an early boot brick problem.
> > >
> > > That's bad - could you perhaps try to bisect it? All recently queued up patches
> > > that could cause such problems should be readily bisectable.
> >
> > Yeah, I'll give it a go as soon as I get some other stuff done.
>
> Please double check whether -tip f18a8a0143b1 works for you (latest -tip
> freshly pushed out), it might be that my bogus conflict resolution of an
> x86/microcode conflict is what caused your boot problems?
Oh darn, it's a no-go. Back to plan A: bisecting.
-Mike
* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-31 8:51 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar
On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
>
> > On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> > > On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > > > Running Steven's hotplug stress script in tip.today. Config is
> > > > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> > > >
> > > > [ 75.268049] x86: Booting SMP configuration:
> > > > [ 75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > > [ 75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > > > [ 75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > > > [ 75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > > > [ 75.359056] smpboot: CPU 3 is now offline
> > > > [ 75.415505] smpboot: CPU 4 is now offline
> > > > [ 75.479985] smpboot: CPU 5 is now offline
> > > > [ 75.550674] ------------[ cut here ]------------
> > > > [ 75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > > > [ 75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
> > >
> > > The following patch queued in tip/sched/core should fix this issue:
> >
> > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > an early boot brick problem.
>
> That's bad - could you perhaps try to bisect it? All recently queued up patches
> that could cause such problems should be readily bisectable.
>
> The bisection might be faster if you first checked whether 5bf728f02218 works - if
> it does then the bug is in the patches in WIP.x86/boot or WIP.x86/fpu.
Fast ain't gonna happen, 5bf728f02218 bricked.
-Mike
* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Ingo Molnar @ 2017-01-31 8:54 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar
* Mike Galbraith <efault@gmx.de> wrote:
> On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> > * Mike Galbraith <efault@gmx.de> wrote:
> >
> > > On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> > > > On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > > > > Running Steven's hotplug stress script in tip.today. Config is
> > > > > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> > > > >
> > > > > [ 75.268049] x86: Booting SMP configuration:
> > > > > [ 75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > > > [ 75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > > > > [ 75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > > > > [ 75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > > > > [ 75.359056] smpboot: CPU 3 is now offline
> > > > > [ 75.415505] smpboot: CPU 4 is now offline
> > > > > [ 75.479985] smpboot: CPU 5 is now offline
> > > > > [ 75.550674] ------------[ cut here ]------------
> > > > > [ 75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > > > > [ 75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
> > > >
> > > > The following patch queued in tip/sched/core should fix this issue:
> > >
> > > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > > an early boot brick problem.
> >
> > That's bad - could you perhaps try to bisect it? All recently queued up patches
> > that could cause such problems should be readily bisectable.
> >
> > The bisection might be faster if you first checked whether 5bf728f02218 works - if
> > it does then the bug is in the patches in WIP.x86/boot or WIP.x86/fpu.
>
> Fast ain't gonna happen, 5bf728f02218 bricked.
:-/
The next point to test would be f9a42e0d58cf, I suspect, to establish that Linus's
latest kernel is fine. If it is, the bug is in one of the ~200 -tip commits, which
should be bisectable in 8-10 steps from that point on.
Thanks,
Ingo
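The "8-10 steps" estimate follows from bisection halving the suspect range at every step: about ⌈log2 200⌉ = 8 test builds, with a few more if some commits have to be skipped. Below is a minimal sketch of that binary search over an ordered list of commits. It assumes a known-good base just before index 0, a known-bad commit at index n-1, and that once a commit is bad every later one is too; `first_bad_commit()` and `boots_badly()` are hypothetical names, and the predicate merely stands in for building and booting a kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test callback: "does the kernel built at commit i fail?" */
typedef bool (*is_bad_fn)(size_t i, void *ctx);

/* Binary search for the first bad commit among indices 0..n-1, where
 * index n-1 is known bad and the base before index 0 is known good.
 * *tests receives the number of test builds performed. */
static size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx,
			       unsigned int *tests)
{
	assert(n > 0);

	size_t lo = 0, hi = n - 1; /* first bad commit lies in [lo, hi] */

	*tests = 0;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		(*tests)++;
		if (is_bad(mid, ctx))
			hi = mid;     /* mid is bad: culprit is mid or earlier */
		else
			lo = mid + 1; /* mid is good: culprit comes later */
	}
	return lo;
}

/* Example predicate: every commit from *(size_t *)ctx onward fails. */
static bool boots_badly(size_t i, void *ctx)
{
	return i >= *(const size_t *)ctx;
}
```

With `n = 200` this performs at most 8 tests, the low end of the estimate; a real `git bisect` run over merge-heavy history may need a few more.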
* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-31 11:17 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Marc Zyngier
On Tue, 2017-01-31 at 09:54 +0100, Ingo Molnar wrote:
> > Fast ain't gonna happen, 5bf728f02218 bricked.
>
> :-/
>
> Next point would be f9a42e0d58cf I suspect, to establish that Linus's latest
> kernel is fine. That means it's in one of the ~200 -tip commits - should be
> bisectable in 8-10 steps from that point on.
It bisected cleanly to the commit below, confirmed by reverting it via quilt push/pop.
Judging by the symptoms my box exhibits, the patchlet needs to be
twiddled to ensure that interrupts are enabled at _least_ once ;-)
08d85f3ea99f1eeafc4e8507936190e86a16ee8c is the first bad commit
commit 08d85f3ea99f1eeafc4e8507936190e86a16ee8c
Author: Marc Zyngier <marc.zyngier@arm.com>
Date: Tue Jan 17 16:00:48 2017 +0000
irqdomain: Avoid activating interrupts more than once
Since commit f3b0946d629c ("genirq/msi: Make sure PCI MSIs are
activated early"), we can end-up activating a PCI/MSI twice (once
at allocation time, and once at startup time).
This is normally of no consequences, except that there is some
HW out there that may misbehave if activate is used more than once
(the GICv3 ITS, for example, uses the activate callback
to issue the MAPVI command, and the architecture spec says that
"If there is an existing mapping for the EventID-DeviceID
combination, behavior is UNPREDICTABLE").
While this could be worked around in each individual driver, it may
make more sense to tackle the issue at the core level. In order to
avoid getting in that situation, let's have a per-interrupt flag
to remember if we have already activated that interrupt or not.
Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-and-tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1484668848-24361-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
:040000 040000 eed859b1f22b822f4400e7c050929d8b4c4a146d 39097c0315a12c0a3809bb82687fa56b1c9e5633 M include
:040000 040000 7dfe2ca8e1de55e890d0e6a761bab9c07c6f5f8a e28a3a54a68866273b474e2053b16155987e06f2 M kernel
* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Ingo Molnar @ 2017-01-31 16:25 UTC (permalink / raw)
To: Mike Galbraith, Thomas Gleixner
Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Marc Zyngier
* Mike Galbraith <efault@gmx.de> wrote:
> On Tue, 2017-01-31 at 09:54 +0100, Ingo Molnar wrote:
>
> > > Fast ain't gonna happen, 5bf728f02218 bricked.
> >
> > :-/
> >
> > Next point would be f9a42e0d58cf I suspect, to establish that Linus's latest
> > kernel is fine. That means it's in one of the ~200 -tip commits - should be
> > bisectable in 8-10 steps from that point on.
>
> It bisected cleanly to the below, confirmed via quilt push/pop revert.
> According to the symptoms my box exhibits, patchlet needs to be
> twiddled to ensure that interrupts are enabled at _least_ once ;-)
>
> 08d85f3ea99f1eeafc4e8507936190e86a16ee8c is the first bad commit
> commit 08d85f3ea99f1eeafc4e8507936190e86a16ee8c
> Author: Marc Zyngier <marc.zyngier@arm.com>
> Date: Tue Jan 17 16:00:48 2017 +0000
>
> irqdomain: Avoid activating interrupts more than once
Fantastic, thanks Mike!
Ingo