linux-kernel.vger.kernel.org archive mirror
* WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
@ 2017-01-28  7:21 Mike Galbraith
  2017-01-30 11:59 ` Matt Fleming
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2017-01-28  7:21 UTC (permalink / raw)
  To: Matt Fleming; +Cc: lkml, Peter Zijlstra

Running Steven's hotplug stress script in tip.today.  Config is
NOPREEMPT, tune for maximum build time (enterprise default-ish).

[   75.268049] x86: Booting SMP configuration:
[   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
[   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
[   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
[   75.359056] smpboot: CPU 3 is now offline
[   75.415505] smpboot: CPU 4 is now offline
[   75.479985] smpboot: CPU 5 is now offline
[   75.550674] ------------[ cut here ]------------
[   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
[   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
[   75.550679] Modules linked in: ebtable_filter(E) ebtables(E) fuse(E) nf_log_ipv6(E) xt_pkttype(E) xt_physdev(E) br_netfilter(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) snd_hda_codec_hdmi(E) ip6_tables(E) x_tables(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) nls_iso8859_1(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E)
[   75.550703]  snd_pcm(E) nls_cp437(E) kvm_intel(E) snd_timer(E) kvm(E) irqbypass(E) nfsd(E) snd(E) crct10dif_pclmul(E) crc32_pclmul(E) auth_rpcgss(E) ghash_clmulni_intel(E) joydev(E) nfs_acl(E) lockd(E) soundcore(E) i2c_i801(E) shpchp(E) pcbc(E) aesni_intel(E) mei_me(E) aes_x86_64(E) crypto_simd(E) iTCO_wdt(E) iTCO_vendor_support(E) lpc_ich(E) mfd_core(E) glue_helper(E) pcspkr(E) mei(E) grace(E) cryptd(E) intel_smartconnect(E) battery(E) fan(E) thermal(E) tpm_infineon(E) sunrpc(E) efivarfs(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) ahci(E) wmi(E) libahci(E) i2c_algo_bit(E) drm_kms_helper(E) xhci_pci(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ehci_pci(E) ehci_hcd(E) ttm(E) xhci_hcd(E) crc32c_intel(E) r8169(E)
[   75.550721]  mii(E) libata(E) drm(E) usbcore(E) fjes(E) video(E) button(E) sd_mod(E) vfat(E) fat(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mod(E) loop(E) sg(E) scsi_mod(E) autofs4(E)
[   75.550728] CPU: 1 PID: 15 Comm: migration/1 Tainted: G            E   4.10.0-tip-default #47
[   75.550728] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[   75.550728] Call Trace:
[   75.550732]  dump_stack+0x63/0x87
[   75.550734]  __warn+0xd1/0xf0
[   75.550737]  ? load_balance+0xa00/0xa00
[   75.550738]  warn_slowpath_fmt+0x4f/0x60
[   75.550739]  ? cpumask_next_and+0x35/0x50
[   75.550740]  assert_clock_updated.isra.62.part.63+0x25/0x27
[   75.550741]  update_load_avg+0x855/0x950
[   75.550742]  ? load_balance+0xa00/0xa00
[   75.550743]  set_next_entity+0x9e/0x1b0
[   75.550744]  pick_next_task_fair+0x78/0x540
[   75.550746]  ? sched_clock+0x9/0x10
[   75.550747]  ? sched_clock_cpu+0x11/0xb0
[   75.550748]  ? load_balance+0xa00/0xa00
[   75.550749]  sched_cpu_dying+0x23c/0x280
[   75.550751]  ? fini_debug_store_on_cpu+0x34/0x40
[   75.550752]  ? sched_cpu_starting+0x60/0x60
[   75.550753]  cpuhp_invoke_callback+0x90/0x400
[   75.550754]  take_cpu_down+0x5e/0xa0
[   75.550757]  multi_cpu_stop+0xc4/0xf0
[   75.550757]  ? cpu_stop_queue_work+0xb0/0xb0
[   75.550758]  cpu_stopper_thread+0x8c/0x120
[   75.550760]  smpboot_thread_fn+0x110/0x160
[   75.550762]  kthread+0x101/0x140
[   75.550762]  ? sort_range+0x30/0x30
[   75.550763]  ? kthread_park+0x90/0x90
[   75.550766]  ret_from_fork+0x2c/0x40
[   75.550766] ---[ end trace 9dd372e3b19c77a0 ]---


* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
  2017-01-28  7:21 WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27 Mike Galbraith
@ 2017-01-30 11:59 ` Matt Fleming
  2017-01-31  6:19   ` Mike Galbraith
  0 siblings, 1 reply; 11+ messages in thread
From: Matt Fleming @ 2017-01-30 11:59 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: lkml, Peter Zijlstra

On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> Running Steven's hotplug stress script in tip.today.  Config is
> NOPREEMPT, tune for maximum build time (enterprise default-ish).
> 
> [   75.268049] x86: Booting SMP configuration:
> [   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> [   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> [   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> [   75.359056] smpboot: CPU 3 is now offline
> [   75.415505] smpboot: CPU 4 is now offline
> [   75.479985] smpboot: CPU 5 is now offline
> [   75.550674] ------------[ cut here ]------------
> [   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> [   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP

The following patch queued in tip/sched/core should fix this issue:

---->8----

From 4d25b35ea3729affd37d69c78191ce6f92766e1a Mon Sep 17 00:00:00 2001
From: Matt Fleming <matt@codeblueprint.co.uk>
Date: Wed, 26 Oct 2016 16:15:44 +0100
Subject: [PATCH] sched/fair: Restore previous rq_flags when migrating tasks in
 hotplug

__migrate_task() can return with a different runqueue locked than the
one we passed as an argument. So that we can repin the lock in
migrate_tasks() (and keep the update_rq_clock() bit) we need to
restore the old rq_flags before repinning.

Note that it wouldn't be correct to change move_queued_task() to repin
because of the change of runqueue and the fact that having an
up-to-date clock on the initial rq doesn't mean the new rq has one
too.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f983e83a353..3b248b03ad8f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5608,7 +5608,7 @@ static void migrate_tasks(struct rq *dead_rq)
 {
 	struct rq *rq = dead_rq;
 	struct task_struct *next, *stop = rq->stop;
-	struct rq_flags rf;
+	struct rq_flags rf, old_rf;
 	int dest_cpu;
 
 	/*
@@ -5669,6 +5669,13 @@ static void migrate_tasks(struct rq *dead_rq)
 			continue;
 		}
 
+		/*
+		 * __migrate_task() may return with a different
+		 * rq->lock held and a new cookie in 'rf', but we need
+		 * to preserve rf::clock_update_flags for 'dead_rq'.
+		 */
+		old_rf = rf;
+
 		/* Find suitable destination for @next, with force if needed. */
 		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
 
@@ -5677,6 +5684,7 @@ static void migrate_tasks(struct rq *dead_rq)
 			raw_spin_unlock(&rq->lock);
 			rq = dead_rq;
 			raw_spin_lock(&rq->lock);
+			rf = old_rf;
 		}
 		raw_spin_unlock(&next->pi_lock);
 	}
-- 
2.10.0


* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
  2017-01-30 11:59 ` Matt Fleming
@ 2017-01-31  6:19   ` Mike Galbraith
  2017-01-31  7:28     ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2017-01-31  6:19 UTC (permalink / raw)
  To: Matt Fleming; +Cc: lkml, Peter Zijlstra, Ingo Molnar

On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > Running Steven's hotplug stress script in tip.today.  Config is
> > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> > 
> > [   75.268049] x86: Booting SMP configuration:
> > [   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > [   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > [   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > [   75.359056] smpboot: CPU 3 is now offline
> > [   75.415505] smpboot: CPU 4 is now offline
> > [   75.479985] smpboot: CPU 5 is now offline
> > [   75.550674] ------------[ cut here ]------------
> > [   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > [   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
> 
> The following patch queued in tip/sched/core should fix this issue:

Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
an early boot brick problem.

> ---->8----
> 
> From 4d25b35ea3729affd37d69c78191ce6f92766e1a Mon Sep 17 00:00:00 2001
> From: Matt Fleming <matt@codeblueprint.co.uk>
> Date: Wed, 26 Oct 2016 16:15:44 +0100
> Subject: [PATCH] sched/fair: Restore previous rq_flags when migrating tasks in
>  hotplug
> 
> __migrate_task() can return with a different runqueue locked than the
> one we passed as an argument. So that we can repin the lock in
> migrate_tasks() (and keep the update_rq_clock() bit) we need to
> restore the old rq_flags before repinning.
> 
> Note that it wouldn't be correct to change move_queued_task() to repin
> because of the change of runqueue and the fact that having an
> up-to-date clock on the initial rq doesn't mean the new rq has one
> too.
> 
> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/core.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f983e83a353..3b248b03ad8f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5608,7 +5608,7 @@ static void migrate_tasks(struct rq *dead_rq)
>  {
>  	struct rq *rq = dead_rq;
>  	struct task_struct *next, *stop = rq->stop;
> -	struct rq_flags rf;
> +	struct rq_flags rf, old_rf;
>  	int dest_cpu;
>  
>  	/*
> @@ -5669,6 +5669,13 @@ static void migrate_tasks(struct rq *dead_rq)
>  			continue;
>  		}
>  
> +		/*
> +		 * __migrate_task() may return with a different
> +		 * rq->lock held and a new cookie in 'rf', but we need
> +		 * to preserve rf::clock_update_flags for 'dead_rq'.
> +		 */
> +		old_rf = rf;
> +
>  		/* Find suitable destination for @next, with force if needed. */
>  		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
>  
> @@ -5677,6 +5684,7 @@ static void migrate_tasks(struct rq *dead_rq)
>  			raw_spin_unlock(&rq->lock);
>  			rq = dead_rq;
>  			raw_spin_lock(&rq->lock);
> +			rf = old_rf;
>  		}
>  		raw_spin_unlock(&next->pi_lock);
>  	}


* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
  2017-01-31  6:19   ` Mike Galbraith
@ 2017-01-31  7:28     ` Ingo Molnar
  2017-01-31  7:35       ` Mike Galbraith
  2017-01-31  8:51       ` Mike Galbraith
  0 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2017-01-31  7:28 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar


* Mike Galbraith <efault@gmx.de> wrote:

> On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> > On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > > Running Steven's hotplug stress script in tip.today.  Config is
> > > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> > > 
> > > [   75.268049] x86: Booting SMP configuration:
> > > [   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > [   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > > [   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > > [   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > > [   75.359056] smpboot: CPU 3 is now offline
> > > [   75.415505] smpboot: CPU 4 is now offline
> > > [   75.479985] smpboot: CPU 5 is now offline
> > > [   75.550674] ------------[ cut here ]------------
> > > [   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > > [   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
> > 
> > The following patch queued in tip/sched/core should fix this issue:
> 
> Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> an early boot brick problem.

That's bad - could you perhaps try to bisect it? All recently queued up patches 
that could cause such problems should be readily bisectable.

The bisection might be faster if you first checked whether 5bf728f02218 works - if 
it does then the bug is in the patches in WIP.x86/boot or WIP.x86/fpu.

Thanks,

	Ingo


* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
  2017-01-31  7:28     ` Ingo Molnar
@ 2017-01-31  7:35       ` Mike Galbraith
  2017-01-31  7:45         ` Ingo Molnar
  2017-01-31  8:51       ` Mike Galbraith
  1 sibling, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2017-01-31  7:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar

On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:

> > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > an early boot brick problem.
> 
> That's bad - could you perhaps try to bisect it? All recently queued up patches 
> that could cause such problems should be readily bisectable.

Yeah, I'll give it a go as soon as I get some other stuff done.

	-Mike


* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
  2017-01-31  7:35       ` Mike Galbraith
@ 2017-01-31  7:45         ` Ingo Molnar
  2017-01-31  8:07           ` Mike Galbraith
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2017-01-31  7:45 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Borislav Petkov


* Mike Galbraith <efault@gmx.de> wrote:

> On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> > * Mike Galbraith <efault@gmx.de> wrote:
> 
> > > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > > an early boot brick problem.
> > 
> > That's bad - could you perhaps try to bisect it? All recently queued up patches 
> > that could cause such problems should be readily bisectable.
> 
> Yeah, I'll give it a go as soon as I get some other stuff done.

Please double check whether -tip f18a8a0143b1 works for you (latestest -tip 
freshly pushed out), it might be that my bogus conflict resolution of a 
x86/microcode conflict is what caused your boot problems?

Thanks,

	Ingo


* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
  2017-01-31  7:45         ` Ingo Molnar
@ 2017-01-31  8:07           ` Mike Galbraith
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Galbraith @ 2017-01-31  8:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Borislav Petkov

On Tue, 2017-01-31 at 08:45 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> > > * Mike Galbraith <efault@gmx.de> wrote:
> > 
> > > > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > > > an early boot brick problem.
> > > 
> > > That's bad - could you perhaps try to bisect it? All recently queued up patches 
> > > that could cause such problems should be readily bisectable.
> > 
> > Yeah, I'll give it a go as soon as I get some other stuff done.
> 
> Please double check whether -tip f18a8a0143b1 works for you (latestest -tip 
> freshly pushed out), it might be that my bogus conflict resolution of a 
> x86/microcode conflict is what caused your boot problems?

Oh darn, it's a nogo.  Back to plan A.

	-Mike


* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
  2017-01-31  7:28     ` Ingo Molnar
  2017-01-31  7:35       ` Mike Galbraith
@ 2017-01-31  8:51       ` Mike Galbraith
  2017-01-31  8:54         ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2017-01-31  8:51 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar

On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
> 
> > On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> > > On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > > > Running Steven's hotplug stress script in tip.today.  Config is
> > > > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> > > > 
> > > > [   75.268049] x86: Booting SMP configuration:
> > > > [   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > > [   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > > > [   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > > > [   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > > > [   75.359056] smpboot: CPU 3 is now offline
> > > > [   75.415505] smpboot: CPU 4 is now offline
> > > > [   75.479985] smpboot: CPU 5 is now offline
> > > > [   75.550674] ------------[ cut here ]------------
> > > > [   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > > > [   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
> > > 
> > > The following patch queued in tip/sched/core should fix this issue:
> > 
> > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > an early boot brick problem.
> 
> That's bad - could you perhaps try to bisect it? All recently queued up patches 
> that could cause such problems should be readily bisectable.
> 
> The bisection might be faster if you first checked whether 5bf728f02218 works - if 
> it does then the bug is in the patches in WIP.x86/boot or WIP.x86/fpu.

Fast ain't gonna happen, 5bf728f02218 bricked.

	-Mike


* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
  2017-01-31  8:51       ` Mike Galbraith
@ 2017-01-31  8:54         ` Ingo Molnar
  2017-01-31 11:17           ` Mike Galbraith
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2017-01-31  8:54 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar


* Mike Galbraith <efault@gmx.de> wrote:

> On Tue, 2017-01-31 at 08:28 +0100, Ingo Molnar wrote:
> > * Mike Galbraith <efault@gmx.de> wrote:
> > 
> > > On Mon, 2017-01-30 at 11:59 +0000, Matt Fleming wrote:
> > > > On Sat, 28 Jan, at 08:21:05AM, Mike Galbraith wrote:
> > > > > Running Steven's hotplug stress script in tip.today.  Config is
> > > > > NOPREEMPT, tune for maximum build time (enterprise default-ish).
> > > > > 
> > > > > [   75.268049] x86: Booting SMP configuration:
> > > > > [   75.268052] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > > > [   75.279994] smpboot: Booting Node 0 Processor 2 APIC 0x4
> > > > > [   75.294617] smpboot: Booting Node 0 Processor 4 APIC 0x1
> > > > > [   75.310698] smpboot: Booting Node 0 Processor 5 APIC 0x3
> > > > > [   75.359056] smpboot: CPU 3 is now offline
> > > > > [   75.415505] smpboot: CPU 4 is now offline
> > > > > [   75.479985] smpboot: CPU 5 is now offline
> > > > > [   75.550674] ------------[ cut here ]------------
> > > > > [   75.550678] WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
> > > > > [   75.550679] rq->clock_update_flags < RQCF_ACT_SKIP
> > > > 
> > > > The following patch queued in tip/sched/core should fix this issue:
> > > 
> > > Weeell, I'll have to take your word for it, as tip g35669bb7fd46 grew
> > > an early boot brick problem.
> > 
> > That's bad - could you perhaps try to bisect it? All recently queued up patches 
> > that could cause such problems should be readily bisectable.
> > 
> > The bisection might be faster if you first checked whether 5bf728f02218 works - if 
> > it does then the bug is in the patches in WIP.x86/boot or WIP.x86/fpu.
> 
> Fast ain't gonna happen, 5bf728f02218 bricked.

:-/

Next point would be f9a42e0d58cf I suspect, to establish that Linus's latest 
kernel is fine. That means it's in one of the ~200 -tip commits - should be 
bisectable in 8-10 steps from that point on.

Thanks,

	Ingo


* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
  2017-01-31  8:54         ` Ingo Molnar
@ 2017-01-31 11:17           ` Mike Galbraith
  2017-01-31 16:25             ` Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Galbraith @ 2017-01-31 11:17 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Marc Zyngier

On Tue, 2017-01-31 at 09:54 +0100, Ingo Molnar wrote:

> > Fast ain't gonna happen, 5bf728f02218 bricked.
> 
> :-/
> 
> Next point would be f9a42e0d58cf I suspect, to establish that Linus's latest 
> kernel is fine. That means it's in one of the ~200 -tip commits - should be 
> bisectable in 8-10 steps from that point on.

It bisected cleanly to the below, confirmed via quilt push/pop revert.
According to the symptoms my box exhibits, patchlet needs to be
twiddled to ensure that interrupts are enabled at _least_ once ;-)

08d85f3ea99f1eeafc4e8507936190e86a16ee8c is the first bad commit
commit 08d85f3ea99f1eeafc4e8507936190e86a16ee8c
Author: Marc Zyngier <marc.zyngier@arm.com>
Date:   Tue Jan 17 16:00:48 2017 +0000

    irqdomain: Avoid activating interrupts more than once
    
    Since commit f3b0946d629c ("genirq/msi: Make sure PCI MSIs are
    activated early"), we can end-up activating a PCI/MSI twice (once
    at allocation time, and once at startup time).
    
    This is normally of no consequences, except that there is some
    HW out there that may misbehave if activate is used more than once
    (the GICv3 ITS, for example, uses the activate callback
    to issue the MAPVI command, and the architecture spec says that
    "If there is an existing mapping for the EventID-DeviceID
    combination, behavior is UNPREDICTABLE").
    
    While this could be worked around in each individual driver, it may
    make more sense to tackle the issue at the core level. In order to
    avoid getting in that situation, let's have a per-interrupt flag
    to remember if we have already activated that interrupt or not.
    
    Fixes: f3b0946d629c ("genirq/msi: Make sure PCI MSIs are activated early")
    Reported-and-tested-by: Andre Przywara <andre.przywara@arm.com>
    Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/1484668848-24361-1-git-send-email-marc.zyngier@arm.com
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

:040000 040000 eed859b1f22b822f4400e7c050929d8b4c4a146d 39097c0315a12c0a3809bb82687fa56b1c9e5633 M	include
:040000 040000 7dfe2ca8e1de55e890d0e6a761bab9c07c6f5f8a e28a3a54a68866273b474e2053b16155987e06f2 M	kernel


* Re: WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27
  2017-01-31 11:17           ` Mike Galbraith
@ 2017-01-31 16:25             ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2017-01-31 16:25 UTC (permalink / raw)
  To: Mike Galbraith, Thomas Gleixner
  Cc: Matt Fleming, lkml, Peter Zijlstra, Ingo Molnar, Marc Zyngier


* Mike Galbraith <efault@gmx.de> wrote:

> On Tue, 2017-01-31 at 09:54 +0100, Ingo Molnar wrote:
> 
> > > Fast ain't gonna happen, 5bf728f02218 bricked.
> > 
> > :-/
> > 
> > Next point would be f9a42e0d58cf I suspect, to establish that Linus's latest 
> > kernel is fine. That means it's in one of the ~200 -tip commits - should be 
> > bisectable in 8-10 steps from that point on.
> 
> It bisected cleanly to the below, confirmed via quilt push/pop revert.
> According to the symptoms my box exhibits, patchlet needs to be
> twiddled to ensure that interrupts are enabled at _least_ once ;-)
> 
> 08d85f3ea99f1eeafc4e8507936190e86a16ee8c is the first bad commit
> commit 08d85f3ea99f1eeafc4e8507936190e86a16ee8c
> Author: Marc Zyngier <marc.zyngier@arm.com>
> Date:   Tue Jan 17 16:00:48 2017 +0000
> 
>     irqdomain: Avoid activating interrupts more than once

Fantastic, thanks Mike!

	Ingo


Thread overview: 11+ messages
2017-01-28  7:21 WARNING: CPU: 1 PID: 15 at kernel/sched/sched.h:804 assert_clock_updated.isra.62.part.63+0x25/0x27 Mike Galbraith
2017-01-30 11:59 ` Matt Fleming
2017-01-31  6:19   ` Mike Galbraith
2017-01-31  7:28     ` Ingo Molnar
2017-01-31  7:35       ` Mike Galbraith
2017-01-31  7:45         ` Ingo Molnar
2017-01-31  8:07           ` Mike Galbraith
2017-01-31  8:51       ` Mike Galbraith
2017-01-31  8:54         ` Ingo Molnar
2017-01-31 11:17           ` Mike Galbraith
2017-01-31 16:25             ` Ingo Molnar
