* WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-28  7:21 UTC
To: Matt Fleming; +Cc: lkml, Peter Zijlstra

Running Steven's hotplug stress script in tip.today.  Config is
NOPREEMPT, tune for maximum build time (enterprise default-ish).

[   75.268049] x86: Booting SMP configuration:
[   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
[   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
[   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
[   75.359056] smpboot: CPU 3 is now offline
[   75.415505] smpboot: CPU 4 is now offline
[   75.479985] smpboot: CPU 5 is now offline
[   75.550674] ------------[ cut here ]------------
[   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
[   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
[   75.550679] Modules linked in: ebtable_filter(E) ebtables(E) fuse(E) nf_log_ipv6(E) xt_pkttype(E) xt_physdev(E) br_netfilter(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) snd_hda_codec_hdmi(E) ip6_tables(E) x_tables(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) nls_iso8859_1(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E)
[   75.550703]  snd_pcm(E) nls_cp437(E) kvm_intel(E) snd_timer(E) kvm(E) irqbypass(E) nfsd(E) snd(E) crct10dif_pclmul(E) crc32_pclmul(E) auth_rpcgss(E) ghash_clmulni_intel(E) joydev(E) nfs_acl(E) lockd(E) soundcore(E) i2c_i801(E) shpchp(E) pcbc(E) aesni_intel(E) mei_me(E) aes_x86_64(E) crypto_simd(E) iTCO_wdt(E) iTCO_vendor_support(E) lpc_ich(E) mfd_core(E) glue_helper(E) pcspkr(E) mei(E) grace(E) cryptd(E) intel_smartconnect(E) battery(E) fan(E) thermal(E) tpm_infineon(E) sunrpc(E) efivarfs(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) ahci(E) wmi(E) libahci(E) i2c_algo_bit(E) drm_kms_helper(E) xhci_pci(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ehci_pci(E) ehci_hcd(E) ttm(E) xhci_hcd(E) crc32c_intel(E) r8169(E)
[   75.550721]  mii(E) libata(E) drm(E) usbcore(E) fjes(E) video(E) button(E) sd_mod(E) vfat(E) fat(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mod(E) loop(E) sg(E) scsi_mod(E) autofs4(E)
[   75.550728] CPU: 1 PID: 15 Comm: migration/1 Tainted: G E 4.10.0-tip-default #47
[   75.550728] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[   75.550728] Call Trace:
[   75.550732]  dump_stack+0x63/0x87
[   75.550734]  __warn+0xd1/0xf0
[   75.550737]  ? load_balance+0xa00/0xa00
[   75.550738]  warn_slowpath_fmt+0x4f/0x60
[   75.550739]  ? cpumask_next_and+0x35/0x50
[   75.550740]  assert_clock_updated.isra.62.part.63+0x25/0x27
[   75.550741]  update_load_avg+0x855/0x950
[   75.550742]  ? load_balance+0xa00/0xa00
[   75.550743]  set_next_entity+0x9e/0x1b0
[   75.550744]  pick_next_task_fair+0x78/0x540
[   75.550746]  ? sched_clock+0x9/0x10
[   75.550747]  ? sched_clock_cpu+0x11/0xb0
[   75.550748]  ? load_balance+0xa00/0xa00
[   75.550749]  sched_cpu_dying+0x23c/0x280
[   75.550751]  ? fini_debug_store_on_cpu+0x34/0x40
[   75.550752]  ? sched_cpu_starting+0x60/0x60
[   75.550753]  cpuhp_invoke_callback+0x90/0x400
[   75.550754]  take_cpu_down+0x5e/0xa0
[   75.550757]  multi_cpu_stop+0xc4/0xf0
[   75.550757]  ? cpu_stop_queue_work+0xb0/0xb0
[   75.550758]  cpu_stopper_thread+0x8c/0x120
[   75.550760]  smpboot_thread_fn+0x110/0x160
[   75.550762]  kthread+0x101/0x140
[   75.550762]  ? sort_range+0x30/0x30
[   75.550763]  ? kthread_park+0x90/0x90
[   75.550766]  ret_from_fork+0x2c/0x40
[   75.550766] ---[ end trace 9dd372e3b19c77a0 ]---

* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Matt Fleming @ 2017-01-30 11:59 UTC
To: Mike Galbraith; +Cc: lkml, Peter Zijlstra

On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> Running Steven's hotplug stress script in tip.today.  Config is
> NOPREEMPT, tune for maximum build time (enterprise default-ish).
>
> [   75.268049] x86: Booting SMP configuration:
> [   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> [   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> [   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> [   75.359056] smpboot: CPU 3 is now offline
> [   75.415505] smpboot: CPU 4 is now offline
> [   75.479985] smpboot: CPU 5 is now offline
> [   75.550674] ------------[ cut here ]------------
> [   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> [   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP

The following patch queued in tip/sched/core should fix this issue:

---->8----

From 4d25b35ea3729affd37d69c78191ce6f92766e1a Mon Sep 17 00:00:00 2001
From: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed, 26 Oct 2016 16:15:44 +0100
Subject: [PATCH] sched/fair: Restore previous rq_flags when migrating tasks in
 hotplug

__migrate_task() can return with a different runqueue locked than the
one we passed as an argument. So that we can repin the lock in
migrate_tasks() (and keep the update_rq_clock() bit) we need to
restore the old rq_flags before repinning.

Note that it wouldn't be correct to change move_queued_task() to repin
because of the change of runqueue and the fact that having an
up-to-date clock on the initial rq doesn't mean the new rq has one
too.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f983e83a353..3b248b03ad8f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5608,7 +5608,7 @@ static void migrate_tasks(struct rq *dead_rq)
 {
 	struct rq *rq = dead_rq;
 	struct task_struct *next, *stop = rq->stop;
-	struct rq_flags rf;
+	struct rq_flags rf, old_rf;
 	int dest_cpu;
 
 	/*
@@ -5669,6 +5669,13 @@ static void migrate_tasks(struct rq *dead_rq)
 			continue;
 		}
 
+		/*
+		 * __migrate_task() may return with a different
+		 * rq->lock held and a new cookie in 'rf', but we need
+		 * to preserve rf::clock_update_flags for 'dead_rq'.
+		 */
+		old_rf = rf;
+
 		/* Find suitable destination for @next, with force if needed. */
 		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
 
@@ -5677,6 +5684,7 @@ static void migrate_tasks(struct rq *dead_rq)
 			raw_spin_unlock(&rq->lock);
 			rq = dead_rq;
 			raw_spin_lock(&rq->lock);
+			rf = old_rf;
 		}
 		raw_spin_unlock(&next->pi_lock);
 	}
-- 
2.10.0
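
[Editorial sketch, not part of the thread.] The save/restore idea in the patch above can be modelled in plain userspace C. This is a toy under stated assumptions, not kernel code: the toy_* types and helpers are hypothetical, the numeric RQCF_* values are assumed, and the lock/unlock semantics are deliberately simplified (the toy assumes that dropping the lock leaves the "clock updated" record only in 'rf'). It only shows why reusing 'rf' for a second runqueue loses the bit that the check quoted in the warning, rq->clock_update_flags < RQCF_ACT_SKIP, later looks for.

```c
#include <stdbool.h>

/* Names follow the scheduler's RQCF_* flags; the values are assumptions. */
enum {
	RQCF_REQ_SKIP = 0x01,
	RQCF_ACT_SKIP = 0x02,
	RQCF_UPDATED  = 0x04,
};

/* Hypothetical stand-ins for struct rq and struct rq_flags. */
struct toy_rq       { unsigned int clock_update_flags; };
struct toy_rq_flags { unsigned int clock_update_flags; };

/* Passes when the condition from the warning does NOT hold. */
static bool toy_clock_assert_ok(const struct toy_rq *rq)
{
	return rq->clock_update_flags >= RQCF_ACT_SKIP;
}

static void toy_update_rq_clock(struct toy_rq *rq)
{
	rq->clock_update_flags |= RQCF_UPDATED;
}

/* Toy unlock: record the "updated" state in rf, then forget it on the rq. */
static void toy_unpin_and_unlock(struct toy_rq *rq, struct toy_rq_flags *rf)
{
	rf->clock_update_flags =
		(rq->clock_update_flags > RQCF_ACT_SKIP) ? RQCF_UPDATED : 0;
	rq->clock_update_flags &= (RQCF_REQ_SKIP | RQCF_ACT_SKIP);
}

/* Toy relock: hand the recorded state back to the rq. */
static void toy_lock_and_repin(struct toy_rq *rq, const struct toy_rq_flags *rf)
{
	rq->clock_update_flags |= rf->clock_update_flags;
}

/* Returns true if dead_rq still counts as "clock updated" afterwards. */
bool toy_migrate_one(bool restore_old_rf)
{
	struct toy_rq dead_rq = { 0 }, dest_rq = { 0 };
	struct toy_rq_flags rf = { 0 }, old_rf;

	toy_update_rq_clock(&dead_rq);
	toy_unpin_and_unlock(&dead_rq, &rf);	/* rf records RQCF_UPDATED */

	old_rf = rf;				/* what the patch adds */

	/*
	 * Stand-in for the migration step: rf is reused for dest_rq,
	 * whose clock was never updated, so rf is overwritten with 0.
	 */
	toy_unpin_and_unlock(&dest_rq, &rf);

	if (restore_old_rf)
		rf = old_rf;			/* what the patch adds */

	toy_lock_and_repin(&dead_rq, &rf);
	return toy_clock_assert_ok(&dead_rq);
}
```

In this toy, toy_migrate_one(true) keeps the bit for dead_rq, while toy_migrate_one(false) loses it, which corresponds to the state the warning complains about.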

* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-31  6:19 UTC
To: Matt Fleming; +Cc: lkml, Peter Zijlstra, Ingo Molnar

On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > Running Steven's hotplug stress script in tip.today.  Config is
> > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> >
> > [   75.268049] x86: Booting SMP configuration:
> > [   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > [   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > [   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > [   75.359056] smpboot: CPU 3 is now offline
> > [   75.415505] smpboot: CPU 4 is now offline
> > [   75.479985] smpboot: CPU 5 is now offline
> > [   75.550674] ------------[ cut here ]------------
> > [   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > [   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
>
> The following patch queued in tip/sched/core should fix this issue:

Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
an early boot brick problem.

> ---->8----
>
> From 4d25b35ea3729affd37d69c78191ce6f92766e1a Mon Sep 17 00:00:00 2001
> From: Matt Fleming <matt@codeblueprint.co.uk>
> Date: Wed, 26 Oct 2016 16:15:44 +0100
> Subject: [PATCH] sched/fair: Restore previous rq_flags when migrating tasks in
>  hotplug
>
> __migrate_task() can return with a different runqueue locked than the
> one we passed as an argument. So that we can repin the lock in
> migrate_tasks() (and keep the update_rq_clock() bit) we need to
> restore the old rq_flags before repinning.
>
> Note that it wouldn't be correct to change move_queued_task() to repin
> because of the change of runqueue and the fact that having an
> up-to-date clock on the initial rq doesn't mean the new rq has one
> too.
>
> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/core.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f983e83a353..3b248b03ad8f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5608,7 +5608,7 @@ static void migrate_tasks(struct rq *dead_rq)
>  {
>  	struct rq *rq = dead_rq;
>  	struct task_struct *next, *stop = rq->stop;
> -	struct rq_flags rf;
> +	struct rq_flags rf, old_rf;
>  	int dest_cpu;
>
>  	/*
> @@ -5669,6 +5669,13 @@ static void migrate_tasks(struct rq *dead_rq)
>  			continue;
>  		}
>
> +		/*
> +		 * __migrate_task() may return with a different
> +		 * rq->lock held and a new cookie in 'rf', but we need
> +		 * to preserve rf::clock_update_flags for 'dead_rq'.
> +		 */
> +		old_rf = rf;
> +
>  		/* Find suitable destination for @next, with force if needed. */
>  		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
>
> @@ -5677,6 +5684,7 @@ static void migrate_tasks(struct rq *dead_rq)
>  			raw_spin_unlock(&rq->lock);
>  			rq = dead_rq;
>  			raw_spin_lock(&rq->lock);
> +			rf = old_rf;
>  		}
>  		raw_spin_unlock(&next->pi_lock);
>  	}

* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Ingo Molnar @ 2017-01-31  7:28 UTC
To: Mike Galbraith; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar

* Mike Galbraith <efault@gmx.de> wrote:

> On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> > On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > > Running Steven's hotplug stress script in tip.today.  Config is
> > > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> > >
> > > [   75.268049] x86: Booting SMP configuration:
> > > [   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > [   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > > [   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > > [   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > > [   75.359056] smpboot: CPU 3 is now offline
> > > [   75.415505] smpboot: CPU 4 is now offline
> > > [   75.479985] smpboot: CPU 5 is now offline
> > > [   75.550674] ------------[ cut here ]------------
> > > [   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > > [   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
> >
> > The following patch queued in tip/sched/core should fix this issue:
>
> Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> an early boot brick problem.

That's bad - could you perhaps try to bisect it? All recently queued up patches
that could cause such problems should be readily bisectable.

The bisection might be faster if you first checked whether 5bf728f02218 works - if
it does then the bug is in the patches in WIP.x86/boot or WIP.x86/fpu.

Thanks,

	Ingo

* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-31  7:35 UTC
To: Ingo Molnar; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar

On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:

> > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > an early boot brick problem.
>
> That's bad - could you perhaps try to bisect it? All recently queued up patches
> that could cause such problems should be readily bisectable.

Yeah, I'll give it a go as soon as I get some other stuff done.

	-Mike

* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Ingo Molnar @ 2017-01-31  7:45 UTC
To: Mike Galbraith
Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Borislav Petkov

* Mike Galbraith <efault@gmx.de> wrote:

> On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> > * Mike Galbraith <efault@gmx.de> wrote:
>
> > > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > > an early boot brick problem.
> >
> > That's bad - could you perhaps try to bisect it? All recently queued up patches
> > that could cause such problems should be readily bisectable.
>
> Yeah, I'll give it a go as soon as I get some other stuff done.

Please double check whether -tip f18a8a0143b1 works for you (latestest -tip
freshly pushed out), it might be that my bogus conflict resolution of a
x86/microcode conflict is what caused your boot problems?

Thanks,

	Ingo

* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-31  8:07 UTC
To: Ingo Molnar
Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Borislav Petkov

On Tue, 2017-01-31 at 08:45 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
>
> > On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> > > * Mike Galbraith <efault@gmx.de> wrote:
> >
> > > > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > > > an early boot brick problem.
> > >
> > > That's bad - could you perhaps try to bisect it? All recently queued up patches
> > > that could cause such problems should be readily bisectable.
> >
> > Yeah, I'll give it a go as soon as I get some other stuff done.
>
> Please double check whether -tip f18a8a0143b1 works for you (latestest -tip
> freshly pushed out), it might be that my bogus conflict resolution of a
> x86/microcode conflict is what caused your boot problems?

Oh darn, it's a nogo.  Back to plan A.

	-Mike

* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-31  8:51 UTC
To: Ingo Molnar; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar

On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
>
> > On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> > > On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > > > Running Steven's hotplug stress script in tip.today.  Config is
> > > > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> > > >
> > > > [   75.268049] x86: Booting SMP configuration:
> > > > [   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > > [   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > > > [   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > > > [   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > > > [   75.359056] smpboot: CPU 3 is now offline
> > > > [   75.415505] smpboot: CPU 4 is now offline
> > > > [   75.479985] smpboot: CPU 5 is now offline
> > > > [   75.550674] ------------[ cut here ]------------
> > > > [   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > > > [   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
> > >
> > > The following patch queued in tip/sched/core should fix this issue:
> >
> > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > an early boot brick problem.
>
> That's bad - could you perhaps try to bisect it? All recently queued up patches
> that could cause such problems should be readily bisectable.
>
> The bisection might be faster if you first checked whether 5bf728f02218 works - if
> it does then the bug is in the patches in WIP.x86/boot or WIP.x86/fpu.

Fast ain't gonna happen, 5bf728f02218 bricked.

	-Mike

* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Ingo Molnar @ 2017-01-31  8:54 UTC
To: Mike Galbraith; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar

* Mike Galbraith <efault@gmx.de> wrote:

> On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> > * Mike Galbraith <efault@gmx.de> wrote:
> >
> > > On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> > > > On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > > > > Running Steven's hotplug stress script in tip.today.  Config is
> > > > > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> > > > >
> > > > > [   75.268049] x86: Booting SMP configuration:
> > > > > [   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > > > [   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > > > > [   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > > > > [   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > > > > [   75.359056] smpboot: CPU 3 is now offline
> > > > > [   75.415505] smpboot: CPU 4 is now offline
> > > > > [   75.479985] smpboot: CPU 5 is now offline
> > > > > [   75.550674] ------------[ cut here ]------------
> > > > > [   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > > > > [   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
> > > >
> > > > The following patch queued in tip/sched/core should fix this issue:
> > >
> > > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > > an early boot brick problem.
> >
> > That's bad - could you perhaps try to bisect it? All recently queued up patches
> > that could cause such problems should be readily bisectable.
> >
> > The bisection might be faster if you first checked whether 5bf728f02218 works - if
> > it does then the bug is in the patches in WIP.x86/boot or WIP.x86/fpu.
>
> Fast ain't gonna happen, 5bf728f02218 bricked.

:-/

Next point would be f9a42e0d58cf I suspect, to establish that Linus's latest
kernel is fine. That means it's in one of the ~200 -tip commits - should be
bisectable in 8-10 steps from that point on.

Thanks,

	Ingo
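
[Editorial sketch, not part of the thread.] The "8-10 steps" estimate for roughly 200 commits follows from halving the candidate range on every test. A minimal C sketch of that arithmetic is below; bisect_steps() is a hypothetical helper, and it assumes an idealised linear history in which every step discards half of the remaining candidates. Real bisections over merged branches, or with skipped commits, can need a few more steps, which is consistent with the quoted range.

```c
/*
 * Worst-case number of halving steps needed to narrow 'n' candidate
 * commits down to a single one, i.e. ceil(log2(n)).
 */
unsigned int bisect_steps(unsigned long n)
{
	unsigned int steps = 0;

	while (n > 1) {
		n -= n / 2;	/* keep the larger half: ceil(n / 2) */
		steps++;
	}
	return steps;
}
```

For the ~200 commits mentioned above, bisect_steps(200) gives 8.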

* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Mike Galbraith @ 2017-01-31 11:17 UTC
To: Ingo Molnar; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Marc Zyngier

On Tue, 2017-01-31 at 09:54 +0100, Ingo Molnar wrote:

> > Fast ain't gonna happen, 5bf728f02218 bricked.
>
> :-/
>
> Next point would be f9a42e0d58cf I suspect, to establish that Linus's latest
> kernel is fine. That means it's in one of the ~200 -tip commits - should be
> bisectable in 8-10 steps from that point on.

It bisected cleanly to the below, confirmed via quilt push/pop revert.
According to the symptoms my box exhibits, patchlet needs to be
twiddled to ensure that interrupts are enabled at _least_ once ;-)

08d85f3ea99f1eeafc4e8507936190e86a16ee8c is the first bad commit
commit 08d85f3ea99f1eeafc4e8507936190e86a16ee8c
Author: Marc Zyngier <marc.zyngier@arm.com>
Date:   Tue Jan 17 16:00:48 2017 +0000

    irqdomain: Avoid activating interrupts more than once

    Since commit f3b0946d629c ("genirq/msi: Make sure PCI MSIs are
    activated early"), we can end-up activating a PCI/MSI twice (once
    at allocation time, and once at startup time).

    This is normally of no consequences, except that there is some
    HW out there that may misbehave if activate is used more than once
    (the GICv3 ITS, for example, uses the activate callback
    to issue the MAPVI command, and the architecture spec says that
    "If there is an existing mapping for the EventID-DeviceID
    combination, behavior is UNPREDICTABLE").

    While this could be worked around in each individual driver, it may
    make more sense to tackle the issue at the core level. In order to
    avoid getting in that situation, let's have a per-interrupt flag
    to remember if we have already activated that interrupt or not.

    Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
    Reported-and-tested-by: Andre Przywara <andre.przywara@arm.com>
    Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1484668848-24361-1-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

:040000 040000 eed859b1f22b822f4400e7c050929d8b4c4a146d 39097c0315a12c0a3809bb82687fa56b1c9e5633 M	include
:040000 040000 7dfe2ca8e1de55e890d0e6a761bab9c07c6f5f8a e28a3a54a68866273b474e2053b16155987e06f2 M	kernel
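
[Editorial sketch, not part of the thread.] The "per-interrupt flag to remember if we have already activated that interrupt" described in the commit message can be illustrated with a small userspace model. Everything here is hypothetical (struct toy_irq, toy_hw_activate() and friends are not kernel names) and it makes no claim about how the real irqdomain code is structured. It only shows the intended invariant, which is also the one Mike's remark points at: the hardware callback should run exactly once per activation, never zero times and never twice.

```c
#include <stdbool.h>

/* Hypothetical per-interrupt state. */
struct toy_irq {
	bool activated;			/* the "already activated" flag */
	unsigned int hw_activations;	/* times the hardware callback ran */
};

/* Stand-in for a driver's activate callback that must not run twice. */
static void toy_hw_activate(struct toy_irq *irq)
{
	irq->hw_activations++;
}

/* Idempotent activation: only the first call reaches the hardware. */
void toy_irq_activate(struct toy_irq *irq)
{
	if (irq->activated)
		return;
	toy_hw_activate(irq);
	irq->activated = true;
}

/* Clearing the flag allows a later re-activation to reach the hardware. */
void toy_irq_deactivate(struct toy_irq *irq)
{
	irq->activated = false;
}
```

Calling toy_irq_activate() twice on a zero-initialised struct toy_irq leaves hw_activations at 1; a deactivate followed by another activate raises it to 2.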

* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
From: Ingo Molnar @ 2017-01-31 16:25 UTC
To: Mike Galbraith, Thomas Gleixner
Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Marc Zyngier

* Mike Galbraith <efault@gmx.de> wrote:

> On Tue, 2017-01-31 at 09:54 +0100, Ingo Molnar wrote:
>
> > > Fast ain't gonna happen, 5bf728f02218 bricked.
> >
> > :-/
> >
> > Next point would be f9a42e0d58cf I suspect, to establish that Linus's latest
> > kernel is fine. That means it's in one of the ~200 -tip commits - should be
> > bisectable in 8-10 steps from that point on.
>
> It bisected cleanly to the below, confirmed via quilt push/pop revert.
> According to the symptoms my box exhibits, patchlet needs to be
> twiddled to ensure that interrupts are enabled at _least_ once ;-)
>
> 08d85f3ea99f1eeafc4e8507936190e86a16ee8c is the first bad commit
> commit 08d85f3ea99f1eeafc4e8507936190e86a16ee8c
> Author: Marc Zyngier <marc.zyngier@arm.com>
> Date:   Tue Jan 17 16:00:48 2017 +0000
>
>     irqdomain: Avoid activating interrupts more than once

Fantastic, thanks Mike!

	Ingo

end of thread (~2017-01-31 16:25 UTC)

Thread overview: 11+ messages
2017-01-28  7:21 WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27 Mike Galbraith
2017-01-30 11:59 ` Matt Fleming
2017-01-31  6:19   ` Mike Galbraith
2017-01-31  7:28     ` Ingo Molnar
2017-01-31  7:35       ` Mike Galbraith
2017-01-31  7:45         ` Ingo Molnar
2017-01-31  8:07           ` Mike Galbraith
2017-01-31  8:51       ` Mike Galbraith
2017-01-31  8:54         ` Ingo Molnar
2017-01-31 11:17           ` Mike Galbraith
2017-01-31 16:25             ` Ingo Molnar