* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
       [not found] ` <tip-cb42c9a3ebbbb23448c3f9a25417fae6309b1a92@git.kernel.org>
@ 2017-01-30 21:24   ` Michael Ellerman
  2017-01-30 21:34     ` Matt Fleming
  0 siblings, 1 reply; 22+ messages in thread
From: Michael Ellerman @ 2017-01-30 21:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: pmladek, riel, linuxppc-dev, luca.abeni, sergey.senozhatsky.work,
	matt, fweisbec, efault, yuyang.du, linux-kernel, byungchul.park,
	peterz, torvalds, umgwanakikbuti, linux-next, hpa, jack, tglx,
	mgorman, mingo, wanpeng.li

tip-bot for Matt Fleming <tipbot@zytor.com> writes:

> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 98e7eee..6eeae7e 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -768,48 +768,110 @@ static inline u64 __rq_clock_broken(struct rq *rq)
>  	return READ_ONCE(rq->clock);
>  }
>  
> +/*
> + * rq::clock_update_flags bits
> + *
> + * %RQCF_REQ_SKIP - will request skipping of clock update on the next
> + *  call to __schedule(). This is an optimisation to avoid
> + *  neighbouring rq clock updates.
> + *
> + * %RQCF_ACT_SKIP - is set from inside of __schedule() when skipping is
> + *  in effect and calls to update_rq_clock() are being ignored.
> + *
> + * %RQCF_UPDATED - is a debug flag that indicates whether a call has been
> + *  made to update_rq_clock() since the last time rq::lock was pinned.
> + *
> + * If inside of __schedule(), clock_update_flags will have been
> + * shifted left (a left shift is a cheap operation for the fast path
> + * to promote %RQCF_REQ_SKIP to %RQCF_ACT_SKIP), so you must use,
> + *
> + *	if (rq->clock_update_flags >= RQCF_UPDATED)
> + *
> + * to check if %RQCF_UPDATED is set. It'll never be shifted more than
> + * one position though, because the next rq_unpin_lock() will shift it
> + * back.
> + */
> +#define RQCF_REQ_SKIP	0x01
> +#define RQCF_ACT_SKIP	0x02
> +#define RQCF_UPDATED	0x04
> +
> +static inline void assert_clock_updated(struct rq *rq)
> +{
> +	/*
> +	 * The only reason for not seeing a clock update since the
> +	 * last rq_pin_lock() is if we're currently skipping updates.
> +	 */
> +	SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);
> +}
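
As an aside, the bit trick above is easy to check in isolation. A minimal
standalone sketch (plain userspace C, not the kernel code itself) of the
arithmetic the quoted comment describes, where the shift models the
REQ -> ACT promotion done inside __schedule():

  #include <assert.h>

  #define RQCF_REQ_SKIP  0x01
  #define RQCF_ACT_SKIP  0x02
  #define RQCF_UPDATED   0x04

  int main(void)
  {
          /* The clock was updated and a skip was requested. */
          unsigned int flags = RQCF_REQ_SKIP | RQCF_UPDATED;  /* 0x05 */

          flags <<= 1;                      /* promote REQ_SKIP to ACT_SKIP */
          assert(flags & RQCF_ACT_SKIP);    /* 0x0a: the request took effect */
          assert(!(flags & RQCF_UPDATED));  /* UPDATED itself moved to 0x08... */
          assert(flags >= RQCF_UPDATED);    /* ...hence the ">=" test above */
          return 0;
  }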

I'm hitting this on multiple powerpc systems:

[   38.339126] rq->clock_update_flags < RQCF_ACT_SKIP
[   38.339134] ------------[ cut here ]------------
[   38.339142] WARNING: CPU: 2 PID: 1 at kernel/sched/sched.h:804 detach_task_cfs_rq+0xa0c/0xd10
[   38.339142] Modules linked in: fuse pseries_rng rng_core vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi autofs4 btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c multipath raid10
[   38.339162] CPU: 2 PID: 1 Comm: systemd Not tainted 4.10.0-rc6-gccN-next-20170130-g14dcd7e #1
[   38.339163] task: c000000ffc900000 task.stack: c0000007fda00000
[   38.339164] NIP: c00000000012d31c LR: c00000000012d318 CTR: c0000000005aed80
[   38.339165] REGS: c0000007fda03630 TRAP: 0700   Not tainted  (4.10.0-rc6-gccN-next-20170130-g14dcd7e)
[   38.339166] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
[   38.339170]   CR: 28044242  XER: 20000000
[   38.339171] CFAR: c000000000a732c8 SOFTE: 0 
               GPR00: c00000000012d318 c0000007fda038b0 c000000000f32c00 0000000000000026 
               GPR04: 0000000000000000 00000000000001f7 6574616470755f6b 0000000000000000 
               GPR08: 00000004baa70000 c000000000e1edb0 0000000000000000 0000000000000e8a 
               GPR12: 0000000000004400 c00000000eaa0800 00003fffc25fdd50 000000003a30b518 
               GPR16: 000001002495fe70 0000000000000000 000000003a301668 000000000000001b 
               GPR20: 0000000000000001 c000000ff1d0a000 c0000004a8191300 c00000000b12fe90 
               GPR24: c000000000e80500 c000000001026dfc c0000007fda03aa0 c0000007fda03b30 
               GPR28: c0000007fda03b10 c0000004bb864880 c00000000b12fe80 c0000004acb8a900 
[   38.339186] NIP [c00000000012d31c] detach_task_cfs_rq+0xa0c/0xd10
[   38.339188] LR [c00000000012d318] detach_task_cfs_rq+0xa08/0xd10
[   38.339188] Call Trace:
[   38.339189] [c0000007fda038b0] [c00000000012d318] detach_task_cfs_rq+0xa08/0xd10 (unreliable)
[   38.339192] [c0000007fda03970] [c000000000133cb8] task_change_group_fair+0x48/0x130
[   38.339193] [c0000007fda039b0] [c0000000001139e4] sched_change_group+0x64/0xd0
[   38.339195] [c0000007fda039f0] [c000000000121790] sched_move_task+0x60/0x210
[   38.339197] [c0000007fda03a40] [c000000000121984] cpu_cgroup_attach+0x44/0x70
[   38.339200] [c0000007fda03a80] [c0000000001b84d0] cgroup_migrate_execute+0x410/0x460
[   38.339202] [c0000007fda03af0] [c0000000001b8790] cgroup_attach_task+0x180/0x350
[   38.339203] [c0000007fda03bb0] [c0000000001bb76c] __cgroup_procs_write+0x30c/0x450
[   38.339205] [c0000007fda03c40] [c0000000001b5644] cgroup_file_write+0x74/0x160
[   38.339208] [c0000007fda03ca0] [c0000000003fec00] kernfs_fop_write+0x190/0x260
[   38.339210] [c0000007fda03cf0] [c000000000344a0c] __vfs_write+0x6c/0x1c0
[   38.339211] [c0000007fda03d90] [c000000000346988] vfs_write+0xd8/0x240
[   38.339213] [c0000007fda03de0] [c00000000034882c] SyS_write+0x6c/0x110
[   38.339215] [c0000007fda03e30] [c00000000000b760] system_call+0x38/0xfc
[   38.339215] Instruction dump:
[   38.339217] 60420000 3d22fff4 8929130e 2f890000 409ef6c4 39200001 3d42fff4 3c62ffd2 
[   38.339220] 3863b610 992a130e 48945f65 60000000 <0fe00000> 4bfff6a0 60000000 60000000 
[   38.339223] ---[ end trace d76dabbf37156e2a ]---


I assume I should be worried?

cheers


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-30 21:24   ` [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls Michael Ellerman
@ 2017-01-30 21:34     ` Matt Fleming
  2017-01-31  8:35       ` Michael Ellerman
  2017-01-31 11:00       ` Sachin Sant
  0 siblings, 2 replies; 22+ messages in thread
From: Matt Fleming @ 2017-01-30 21:34 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: fweisbec, tglx, pmladek, hpa, efault, sergey.senozhatsky.work,
	peterz, mgorman, wanpeng.li, umgwanakikbuti, byungchul.park,
	jack, mingo, riel, luca.abeni, yuyang.du, torvalds, linux-kernel,
	linux-tip-commits, linuxppc-dev, linux-next

On Tue, 31 Jan, at 08:24:53AM, Michael Ellerman wrote:
> 
> I'm hitting this on multiple powerpc systems:
> 
> [   38.339126] rq->clock_update_flags < RQCF_ACT_SKIP
> [   38.339134] ------------[ cut here ]------------
> [   38.339142] WARNING: CPU: 2 PID: 1 at kernel/sched/sched.h:804 detach_task_cfs_rq+0xa0c/0xd10

[...]
 
> I assume I should be worried?

Thanks for the report. No need to worry, the bug has existed for a
while, this patch just turns on the warning ;-)

The following commit queued up in tip/sched/core should fix your
issues (assuming you see the same callstack on all your powerpc
machines):

  https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=1b1d62254df0fe42a711eb71948f915918987790


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-30 21:34     ` Matt Fleming
@ 2017-01-31  8:35       ` Michael Ellerman
  2017-01-31 11:00       ` Sachin Sant
  1 sibling, 0 replies; 22+ messages in thread
From: Michael Ellerman @ 2017-01-31  8:35 UTC (permalink / raw)
  To: Matt Fleming
  Cc: fweisbec, tglx, pmladek, hpa, efault, sergey.senozhatsky.work,
	peterz, mgorman, wanpeng.li, umgwanakikbuti, byungchul.park,
	jack, mingo, riel, luca.abeni, yuyang.du, torvalds, linux-kernel,
	linux-tip-commits, linuxppc-dev, linux-next

Matt Fleming <matt@codeblueprint.co.uk> writes:

> On Tue, 31 Jan, at 08:24:53AM, Michael Ellerman wrote:
>> 
>> I'm hitting this on multiple powerpc systems:
>> 
>> [   38.339126] rq->clock_update_flags < RQCF_ACT_SKIP
>> [   38.339134] ------------[ cut here ]------------
>> [   38.339142] WARNING: CPU: 2 PID: 1 at kernel/sched/sched.h:804 detach_task_cfs_rq+0xa0c/0xd10
>
> [...]
>  
>> I assume I should be worried?
>
> Thanks for the report. No need to worry, the bug has existed for a
> while, this patch just turns on the warning ;-)
>
> The following commit queued up in tip/sched/core should fix your
> issues (assuming you see the same callstack on all your powerpc
> machines):
>
>   https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=1b1d62254df0fe42a711eb71948f915918987790

Great, thanks.

Looks like that commit is in today's linux-next, so hopefully I won't
see any oopses in my boot tests overnight. If I do I'll let you know.

cheers


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-30 21:34     ` Matt Fleming
  2017-01-31  8:35       ` Michael Ellerman
@ 2017-01-31 11:00       ` Sachin Sant
  2017-01-31 11:48         ` Mike Galbraith
  1 sibling, 1 reply; 22+ messages in thread
From: Sachin Sant @ 2017-01-31 11:00 UTC (permalink / raw)
  To: Matt Fleming, Michael Ellerman
  Cc: linuxppc-dev, peterz, linux-next, linux-kernel

Trimming the cc list.

>> I assume I should be worried?
> 
> Thanks for the report. No need to worry, the bug has existed for a
> while, this patch just turns on the warning ;-)
> 
> The following commit queued up in tip/sched/core should fix your
> issues (assuming you see the same callstack on all your powerpc
> machines):
> 
>  https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=1b1d62254df0fe42a711eb71948f915918987790

I still see this warning with today’s next running inside PowerVM LPAR
on a POWER8 box. The stack trace is different from what Michael had
reported.

The easiest way to recreate this is to online/offline CPUs.

[  114.795609] rq->clock_update_flags < RQCF_ACT_SKIP
[  114.795621] ------------[ cut here ]------------
[  114.795632] WARNING: CPU: 2 PID: 27 at kernel/sched/sched.h:804 set_next_entity+0xbc8/0xcc0
[  114.795634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc rpadlpar_io rpaphp kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm iw_cxgb3 ib_core ghash_generic xts gf128mul tpm_ibmvtpm tpm sg vmx_crypto pseries_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sr_mod sd_mod cdrom cxgb3 ibmvscsi ibmveth scsi_transport_srp mdio
[  114.795751]  dm_mirror dm_region_hash dm_log dm_mod
[  114.795762] CPU: 2 PID: 27 Comm: migration/2 Not tainted 4.10.0-rc6-next-20170131 #1
[  114.795765] task: c0000004fa2f8600 task.stack: c0000004fa49c000
[  114.795768] NIP: c000000000114ed8 LR: c000000000114ed4 CTR: c0000000004a8cf0
[  114.795771] REGS: c0000004fa49f6a0 TRAP: 0700   Not tainted  (4.10.0-rc6-next-20170131)
[  114.795773] MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>
[  114.795787]   CR: 28004022  XER: 00000000
[  114.795789] CFAR: c0000000008ec5c4 SOFTE: 0 
GPR00: c000000000114ed4 c0000004fa49f920 c00000000100dd00 0000000000000026 
GPR04: 0000000000000000 0000000000000006 6574616470755f6b c0000000011cdd00 
GPR08: 0000000000000000 c000000000c6edb0 000000015ef20000 d000000006488538 
GPR12: 0000000000004400 c00000000e801200 c0000000000ecc38 c0000004fe064300 
GPR16: 0000000000000000 0000000000000001 0000000000000000 c000000000f27e08 
GPR20: c000000000f277c5 0000000000000000 0000000000000004 0000000000000000 
GPR24: c00000015fba49f0 c000000000f27e08 c000000000ef9e80 c0000004fa49fb00 
GPR28: c00000015fba4980 c00000015fba49f0 c0000004f34c1000 c00000015fba49f0 
[  114.795850] NIP [c000000000114ed8] set_next_entity+0xbc8/0xcc0
[  114.795855] LR [c000000000114ed4] set_next_entity+0xbc4/0xcc0
[  114.795857] Call Trace:
[  114.795862] [c0000004fa49f920] [c000000000114ed4] set_next_entity+0xbc4/0xcc0 (unreliable)
[  114.795869] [c0000004fa49f9d0] [c000000000119f4c] pick_next_task_fair+0xfc/0x6f0
[  114.795874] [c0000004fa49fae0] [c000000000104820] sched_cpu_dying+0x3c0/0x450
[  114.795880] [c0000004fa49fb80] [c0000000000c1958] cpuhp_invoke_callback+0x148/0x5b0
[  114.795886] [c0000004fa49fbf0] [c0000000000c3340] take_cpu_down+0xb0/0x110
[  114.795893] [c0000004fa49fc50] [c0000000001a1e58] multi_cpu_stop+0x1a8/0x1e0
[  114.795899] [c0000004fa49fca0] [c0000000001a20c4] cpu_stopper_thread+0x104/0x1e0
[  114.795905] [c0000004fa49fd60] [c0000000000f2b90] smpboot_thread_fn+0x290/0x2a0
[  114.795911] [c0000004fa49fdc0] [c0000000000ecd7c] kthread+0x14c/0x190
[  114.795919] [c0000004fa49fe30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[  114.795921] Instruction dump:
[  114.795924] 0fe00000 4bfff884 3d02fff2 89289ac5 2f890000 40fef4ec 39200001 3c62ffac 
[  114.795936] 38633698 99289ac5 487d76b5 60000000 <0fe00000> 4bfff4cc eb9f0118 e93f0120 
[  114.795948] ---[ end trace 5c822f32f967fbc5 ]---
[  123.059141] nr_pdflush_threads exported in /proc is scheduled for removal

Thanks
-Sachin


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-31 11:00       ` Sachin Sant
@ 2017-01-31 11:48         ` Mike Galbraith
  2017-01-31 17:22           ` Ross Zwisler
  0 siblings, 1 reply; 22+ messages in thread
From: Mike Galbraith @ 2017-01-31 11:48 UTC (permalink / raw)
  To: Sachin Sant, Matt Fleming, Michael Ellerman
  Cc: linuxppc-dev, peterz, linux-next, linux-kernel

On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> Trimming the cc list.
> 
> > > I assume I should be worried?
> > 
> > Thanks for the report. No need to worry, the bug has existed for a
> > while, this patch just turns on the warning ;-)
> > 
> > The following commit queued up in tip/sched/core should fix your
> > issues (assuming you see the same callstack on all your powerpc
> > machines):
> > 
> >  https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=1b1d62254df0fe42a711eb71948f915918987790
> 
> I still see this warning with today’s next running inside PowerVM LPAR
> on a POWER8 box. The stack trace is different from what Michael had
> reported.
> 
> The easiest way to recreate this is to online/offline CPUs.

(Ditto tip.today, x86_64 + hotplug stress)

[   94.804196] ------------[ cut here ]------------
[   94.804201] WARNING: CPU: 3 PID: 27 at kernel/sched/sched.h:804 set_next_entity+0x81c/0x910
[   94.804201] rq->clock_update_flags < RQCF_ACT_SKIP
[   94.804202] Modules linked in: ebtable_filter(E) ebtables(E) fuse(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) nls_iso8859_1(E) crc32c_intel(E) nls_cp437(E) snd_hda_codec_realtek(E) snd_hda_codec_hdmi(E) snd_hda_codec_generic(E) nfsd(E) aesni_intel(E) snd_hda_intel(E) snd_hda_codec(E) snd_hwdep(E) aes_x86_64(E) snd_hda_core(E) crypto_simd(E)
[   94.804220]  snd_pcm(E) auth_rpcgss(E) snd_timer(E) snd(E) iTCO_wdt(E) iTCO_vendor_support(E) joydev(E) nfs_acl(E) lpc_ich(E) cryptd(E) lockd(E) intel_smartconnect(E) mfd_core(E) i2c_i801(E) battery(E) glue_helper(E) mei_me(E) shpchp(E) mei(E) soundcore(E) grace(E) fan(E) thermal(E) tpm_infineon(E) pcspkr(E) sunrpc(E) efivarfs(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ahci(E) xhci_pci(E) ehci_pci(E) ttm(E) libahci(E) xhci_hcd(E) ehci_hcd(E) r8169(E) mii(E) libata(E) drm(E) usbcore(E) fjes(E) video(E) button(E) af_packet(E) sd_mod(E) vfat(E) fat(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mod(E) loop(E) sg(E) scsi_mod(E) autofs4(E)
[   94.804246] CPU: 3 PID: 27 Comm: migration/3 Tainted: G            E   4.10.0-tip #15
[   94.804247] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[   94.804247] Call Trace:
[   94.804251]  ? dump_stack+0x5c/0x7c
[   94.804253]  ? __warn+0xc4/0xe0
[   94.804255]  ? warn_slowpath_fmt+0x4f/0x60
[   94.804256]  ? set_next_entity+0x81c/0x910
[   94.804258]  ? pick_next_task_fair+0x20a/0xa20
[   94.804259]  ? sched_cpu_starting+0x50/0x50
[   94.804260]  ? sched_cpu_dying+0x237/0x280
[   94.804261]  ? sched_cpu_starting+0x50/0x50
[   94.804262]  ? cpuhp_invoke_callback+0x83/0x3e0
[   94.804263]  ? take_cpu_down+0x56/0x90
[   94.804266]  ? multi_cpu_stop+0xa9/0xd0
[   94.804267]  ? cpu_stop_queue_work+0xb0/0xb0
[   94.804268]  ? cpu_stopper_thread+0x81/0x110
[   94.804270]  ? smpboot_thread_fn+0xfe/0x150
[   94.804272]  ? kthread+0xf4/0x130
[   94.804273]  ? sort_range+0x20/0x20
[   94.804274]  ? kthread_park+0x80/0x80
[   94.804276]  ? ret_from_fork+0x26/0x40
[   94.804277] ---[ end trace b0a9e4aa1fb229bb ]---


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-31 11:48         ` Mike Galbraith
@ 2017-01-31 17:22           ` Ross Zwisler
  2017-02-02 15:55             ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Ross Zwisler @ 2017-01-31 17:22 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sachin Sant, Matt Fleming, Michael Ellerman, linuxppc-dev,
	peterz, linux-next, LKML

On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
>> Trimming the cc list.
>>
>> > > I assume I should be worried?
>> >
>> > Thanks for the report. No need to worry, the bug has existed for a
>> > while, this patch just turns on the warning ;-)
>> >
>> > The following commit queued up in tip/sched/core should fix your
>> > issues (assuming you see the same callstack on all your powerpc
>> > machines):
>> >
>> >  https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=1b1d62254df0fe42a711eb71948f915918987790
>>
>> I still see this warning with today’s next running inside PowerVM LPAR
>> on a POWER8 box. The stack trace is different from what Michael had
>> reported.
>>
>> The easiest way to recreate this is to online/offline CPUs.
>
> (Ditto tip.today, x86_64 + hotplug stress)
[...]

I'm also seeing a splat in the mmots tree with
v4.10-rc5-mmots-2017-01-26-15-49, which pulled in this commit by
merging with next.  Just booting on an x86_64 VM gives me this:

[   13.090436] ------------[ cut here ]------------
[   13.090577] WARNING: CPU: 8 PID: 1 at kernel/sched/sched.h:804 update_load_avg+0x85b/0xb80
[   13.090577] rq->clock_update_flags < RQCF_ACT_SKIP
[   13.090578] Modules linked in: dax_pmem dax nd_pmem nd_btt nd_e820 libnvdimm
[   13.090582] CPU: 8 PID: 1 Comm: systemd Not tainted 4.10.0-rc5-mm1-00313-g5c0c3d7-dirty #10
[   13.090583] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[   13.090583] Call Trace:
[   13.090585]  dump_stack+0x86/0xc3
[   13.090586]  __warn+0xcb/0xf0
[   13.090588]  warn_slowpath_fmt+0x5f/0x80
[   13.090590]  update_load_avg+0x85b/0xb80
[   13.090591]  ? debug_smp_processor_id+0x17/0x20
[   13.090593]  detach_task_cfs_rq+0x3f/0x210
[   13.090594]  task_change_group_fair+0x24/0x100
[   13.090596]  sched_change_group+0x5f/0x110
[   13.090597]  sched_move_task+0x53/0x160
[   13.090598]  cpu_cgroup_attach+0x36/0x70
[   13.090600]  cgroup_migrate_execute+0x230/0x3f0
[   13.090602]  cgroup_migrate+0xce/0x140
[   13.090603]  ? cgroup_migrate+0x5/0x140
[   13.090604]  cgroup_attach_task+0x27f/0x3e0
[   13.090606]  ? cgroup_attach_task+0x9b/0x3e0
[   13.090608]  __cgroup_procs_write+0x30e/0x510
[   13.090608]  ? __cgroup_procs_write+0x70/0x510
[   13.090610]  cgroup_procs_write+0x14/0x20
[   13.090611]  cgroup_file_write+0x44/0x1e0
[   13.090613]  kernfs_fop_write+0x13c/0x1c0
[   13.090614]  __vfs_write+0x37/0x160
[   13.090615]  ? rcu_read_lock_sched_held+0x4a/0x80
[   13.090616]  ? rcu_sync_lockdep_assert+0x2f/0x60
[   13.090617]  ? __sb_start_write+0x10d/0x220
[   13.090618]  ? vfs_write+0x19b/0x1f0
[   13.090619]  ? security_file_permission+0x3b/0xc0
[   13.090620]  vfs_write+0xcb/0x1f0
[   13.090621]  SyS_write+0x58/0xc0
[   13.090623]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[   13.090623] RIP: 0033:0x7f8b7c1be210
[   13.090624] RSP: 002b:00007ffe73febfd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   13.090625] RAX: ffffffffffffffda RBX: 000055a84870a7e0 RCX: 00007f8b7c1be210
[   13.090625] RDX: 0000000000000004 RSI: 000055a84870aa10 RDI: 0000000000000033
[   13.090626] RBP: 0000000000000000 R08: 000055a84870a8c0 R09: 00007f8b7dbda900
[   13.090627] R10: 000055a84870aa10 R11: 0000000000000246 R12: 0000000000000000
[   13.090627] R13: 000055a848775360 R14: 000055a84870a7e0 R15: 0000000000000033
[   13.090629] ---[ end trace ba535936c2409043 ]---


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-31 17:22           ` Ross Zwisler
@ 2017-02-02 15:55             ` Peter Zijlstra
  2017-02-02 22:01               ` Matt Fleming
                                 ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Peter Zijlstra @ 2017-02-02 15:55 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Mike Galbraith, Sachin Sant, Matt Fleming, Michael Ellerman,
	linuxppc-dev, linux-next, LKML

On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
> On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
> > On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:


Could some of you test this? It seems to cure things in my (very)
limited testing.

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 96e4ccc..b773821 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5609,7 +5609,7 @@ static void migrate_tasks(struct rq *dead_rq)
 {
 	struct rq *rq = dead_rq;
 	struct task_struct *next, *stop = rq->stop;
-	struct rq_flags rf, old_rf;
+	struct rq_flags rf;
 	int dest_cpu;
 
 	/*
@@ -5628,7 +5628,9 @@ static void migrate_tasks(struct rq *dead_rq)
 	 * class method both need to have an up-to-date
 	 * value of rq->clock[_task]
 	 */
+	rq_pin_lock(rq, &rf);
 	update_rq_clock(rq);
+	rq_unpin_lock(rq, &rf);
 
 	for (;;) {
 		/*
@@ -5641,7 +5643,7 @@ static void migrate_tasks(struct rq *dead_rq)
 		/*
 		 * pick_next_task assumes pinned rq->lock.
 		 */
-		rq_pin_lock(rq, &rf);
+		rq_repin_lock(rq, &rf);
 		next = pick_next_task(rq, &fake_task, &rf);
 		BUG_ON(!next);
 		next->sched_class->put_prev_task(rq, next);
@@ -5670,13 +5672,6 @@ static void migrate_tasks(struct rq *dead_rq)
 			continue;
 		}
 
-		/*
-		 * __migrate_task() may return with a different
-		 * rq->lock held and a new cookie in 'rf', but we need
-		 * to preserve rf::clock_update_flags for 'dead_rq'.
-		 */
-		old_rf = rf;
-
 		/* Find suitable destination for @next, with force if needed. */
 		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
 
@@ -5685,7 +5680,6 @@ static void migrate_tasks(struct rq *dead_rq)
 			raw_spin_unlock(&rq->lock);
 			rq = dead_rq;
 			raw_spin_lock(&rq->lock);
-			rf = old_rf;
 		}
 		raw_spin_unlock(&next->pi_lock);
 	}


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-02 15:55             ` Peter Zijlstra
@ 2017-02-02 22:01               ` Matt Fleming
  2017-02-03  3:05               ` Mike Galbraith
                                 ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Matt Fleming @ 2017-02-02 22:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ross Zwisler, Mike Galbraith, Sachin Sant, Michael Ellerman,
	linuxppc-dev, linux-next, LKML

On Thu, 02 Feb, at 04:55:06PM, Peter Zijlstra wrote:
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
> > On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
> > > On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> 
> 
> Could some of you test this? It seems to cure things in my (very)
> limited testing.

I haven't tested it but this looks like the correct fix to me.

Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-02 15:55             ` Peter Zijlstra
  2017-02-02 22:01               ` Matt Fleming
@ 2017-02-03  3:05               ` Mike Galbraith
  2017-02-03  4:33               ` Sachin Sant
                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Mike Galbraith @ 2017-02-03  3:05 UTC (permalink / raw)
  To: Peter Zijlstra, Ross Zwisler
  Cc: Sachin Sant, Matt Fleming, Michael Ellerman, linuxppc-dev,
	linux-next, LKML

On Thu, 2017-02-02 at 16:55 +0100, Peter Zijlstra wrote:
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
> > On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de>
> > wrote:
> > > On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> 
> 
> Could some of you test this? It seems to cure things in my (very)
> limited testing.

Hotplug stress gripe is gone here.

	-Mike


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-02 15:55             ` Peter Zijlstra
  2017-02-02 22:01               ` Matt Fleming
  2017-02-03  3:05               ` Mike Galbraith
@ 2017-02-03  4:33               ` Sachin Sant
  2017-02-03  8:53                 ` Peter Zijlstra
  2017-02-03 13:04               ` Borislav Petkov
  2017-02-22  9:03               ` Wanpeng Li
  4 siblings, 1 reply; 22+ messages in thread
From: Sachin Sant @ 2017-02-03  4:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ross Zwisler, Mike Galbraith, Matt Fleming, Michael Ellerman,
	linuxppc-dev, linux-next, LKML


> On 02-Feb-2017, at 9:25 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
>> On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
>>> On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> 
> 
> Could some of you test this? It seems to cure things in my (very)
> limited testing.
> 

I ran a few cycles of CPU hot(un)plug tests. In most cases it works, except one
where I ran into an RCU stall:

[  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
[  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
[  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
[  173.493482] Task dump for CPU 8:
[  173.493484] cpuhp/8         R  running task        0  3416      2 0x00000884
[  173.493489] Call Trace:
[  173.493492] [c0000004f7b834a0] [c0000004f7b83560] 0xc0000004f7b83560 (unreliable)
[  173.493498] [c0000004f7b83670] [c000000000008d28] alignment_common+0x128/0x130
[  173.493503] --- interrupt: 600 at _raw_spin_lock+0x2c/0xc0
[  173.493503]     LR = try_to_wake_up+0x204/0x5c0
[  173.493507] [c0000004f7b83960] [c0000004f4d8084c] 0xc0000004f4d8084c (unreliable)
[  173.493511] [c0000004f7b83990] [c0000000000fef54] try_to_wake_up+0x204/0x5c0
[  173.493515] [c0000004f7b83a10] [c0000000000e2b88] create_worker+0x148/0x250
[  173.493519] [c0000004f7b83ab0] [c0000000000e6e1c] alloc_unbound_pwq+0x3bc/0x4c0
[  173.493522] [c0000004f7b83b10] [c0000000000e7084] wq_update_unbound_numa+0x164/0x270
[  173.493526] [c0000004f7b83bb0] [c0000000000e8990] workqueue_online_cpu+0x250/0x3b0
[  173.493529] [c0000004f7b83c70] [c0000000000c2758] cpuhp_invoke_callback+0x148/0x5b0
[  173.493533] [c0000004f7b83ce0] [c0000000000c2df8] cpuhp_up_callbacks+0x48/0x140
[  173.493536] [c0000004f7b83d30] [c0000000000c3e98] cpuhp_thread_fun+0x148/0x180
[  173.493540] [c0000004f7b83d60] [c0000000000f3930] smpboot_thread_fn+0x290/0x2a0
[  173.493544] [c0000004f7b83dc0] [c0000000000edb3c] kthread+0x14c/0x190
[  173.493547] [c0000004f7b83e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[  243.913715] INFO: task kworker/0:2:380 blocked for more than 120 seconds.
[  243.913732]       Not tainted 4.10.0-rc6-next-20170202 #6
[  243.913735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.913738] kworker/0:2     D    0   380      2 0x00000800
[  243.913746] Workqueue: events vmstat_shepherd
[  243.913748] Call Trace:
[  243.913752] [c0000000ff07f820] [c00000000011135c] enqueue_entity+0x81c/0x1200 (unreliable)
[  243.913757] [c0000000ff07f9f0] [c00000000001a660] __switch_to+0x300/0x400
[  243.913762] [c0000000ff07fa50] [c0000000008df4f4] __schedule+0x314/0xb10
[  243.913766] [c0000000ff07fb20] [c0000000008dfd30] schedule+0x40/0xb0
[  243.913769] [c0000000ff07fb50] [c0000000008e02b8] schedule_preempt_disabled+0x18/0x30
[  243.913773] [c0000000ff07fb70] [c0000000008e1654] __mutex_lock.isra.6+0x1a4/0x660
[  243.913777] [c0000000ff07fc00] [c0000000000c3828] get_online_cpus+0x48/0x90
[  243.913780] [c0000000ff07fc30] [c00000000025fd78] vmstat_shepherd+0x38/0x150
[  243.913784] [c0000000ff07fc80] [c0000000000e5794] process_one_work+0x1a4/0x4d0
[  243.913788] [c0000000ff07fd20] [c0000000000e5b58] worker_thread+0x98/0x5a0
[  243.913791] [c0000000ff07fdc0] [c0000000000edb3c] kthread+0x14c/0x190
[  243.913795] [c0000000ff07fe30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[  243.913824] INFO: task drmgr:3413 blocked for more than 120 seconds.
[  243.913826]       Not tainted 4.10.0-rc6-next-20170202 #6
[  243.913829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.913831] drmgr           D    0  3413   3114 0x00040080
[  243.913834] Call Trace:
[  243.913836] [c000000257ff3380] [c000000257ff3440] 0xc000000257ff3440 (unreliable)
[  243.913840] [c000000257ff3550] [c00000000001a660] __switch_to+0x300/0x400
[  243.913844] [c000000257ff35b0] [c0000000008df4f4] __schedule+0x314/0xb10
[  243.913847] [c000000257ff3680] [c0000000008dfd30] schedule+0x40/0xb0
[  243.913851] [c000000257ff36b0] [c0000000008e4594] schedule_timeout+0x274/0x470
[  243.913855] [c000000257ff37b0] [c0000000008e0efc] wait_for_common+0x1ac/0x2c0
[  243.913858] [c000000257ff3830] [c0000000000c50e4] bringup_cpu+0x84/0xe0
[  243.913862] [c000000257ff3860] [c0000000000c2758] cpuhp_invoke_callback+0x148/0x5b0
[  243.913865] [c000000257ff38d0] [c0000000000c2df8] cpuhp_up_callbacks+0x48/0x140
[  243.913868] [c000000257ff3920] [c0000000000c5438] _cpu_up+0xe8/0x1c0
[  243.913872] [c000000257ff3980] [c0000000000c5630] do_cpu_up+0x120/0x150
[  243.913876] [c000000257ff3a00] [c0000000005c005c] cpu_subsys_online+0x5c/0xe0
[  243.913879] [c000000257ff3a50] [c0000000005b7d84] device_online+0xb4/0x120
[  243.913883] [c000000257ff3a90] [c000000000093424] dlpar_online_cpu+0x144/0x1e0
[  243.913887] [c000000257ff3b50] [c000000000093c08] dlpar_cpu_add+0x108/0x2f0
[  243.913891] [c000000257ff3be0] [c0000000000948dc] dlpar_cpu_probe+0x3c/0x80
[  243.913894] [c000000257ff3c20] [c0000000000207a8] arch_cpu_probe+0x38/0x60
[  243.913898] [c000000257ff3c40] [c0000000005c0880] cpu_probe_store+0x40/0x70
[  243.913902] [c000000257ff3c70] [c0000000005b2e94] dev_attr_store+0x34/0x60
[  243.913906] [c000000257ff3c90] [c0000000003b0fc4] sysfs_kf_write+0x64/0xa0
[  243.913910] [c000000257ff3cb0] [c0000000003afd10] kernfs_fop_write+0x170/0x250
[  243.913914] [c000000257ff3d00] [c0000000002fb0f0] __vfs_write+0x40/0x1c0
[  243.913917] [c000000257ff3d90] [c0000000002fcba8] vfs_write+0xc8/0x240
[  243.913921] [c000000257ff3de0] [c0000000002fe790] SyS_write+0x60/0x110
[  243.913924] [c000000257ff3e30] [c00000000000b184] system_call+0x38/0xe0
[  243.913929] INFO: task ppc64_cpu:3423 blocked for more than 120 seconds.
[  243.913931]       Not tainted 4.10.0-rc6-next-20170202 #6
[  243.913933] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Thanks
-Sachin


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03  4:33               ` Sachin Sant
@ 2017-02-03  8:53                 ` Peter Zijlstra
  2017-02-03 11:04                   ` Sachin Sant
  2017-02-03 12:59                   ` Mike Galbraith
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2017-02-03  8:53 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Matt Fleming, Mike Galbraith, linuxppc-dev, LKML, linux-next,
	Ross Zwisler

On Fri, Feb 03, 2017 at 10:03:14AM +0530, Sachin Sant wrote:
> 
> > On 02-Feb-2017, at 9:25 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
> >> On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
> >>> On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> > 
> > 
> > Could some of you test this? It seems to cure things in my (very)
> > limited testing.
> > 
> 
> I ran a few cycles of CPU hot(un)plug tests. In most cases it works, except one
> where I ran into an RCU stall:
> 
> [  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
> [  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)

Right, I actually saw that too, but I don't think that would be related
to my patch. I'll see if I can dig into this though, ought to get fixed
regardless.

Thanks for testing!


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03  8:53                 ` Peter Zijlstra
@ 2017-02-03 11:04                   ` Sachin Sant
  2017-02-03 12:59                   ` Mike Galbraith
  1 sibling, 0 replies; 22+ messages in thread
From: Sachin Sant @ 2017-02-03 11:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matt Fleming, Ross Zwisler, Mike Galbraith, LKML, linux-next,
	linuxppc-dev

[-- Attachment #1: Type: text/html, Size: 948 bytes --]


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03  8:53                 ` Peter Zijlstra
  2017-02-03 11:04                   ` Sachin Sant
@ 2017-02-03 12:59                   ` Mike Galbraith
  2017-02-03 13:37                     ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Mike Galbraith @ 2017-02-03 12:59 UTC (permalink / raw)
  To: Peter Zijlstra, Sachin Sant
  Cc: Ross Zwisler, Matt Fleming, Michael Ellerman, linuxppc-dev,
	linux-next, LKML

On Fri, 2017-02-03 at 09:53 +0100, Peter Zijlstra wrote:
> On Fri, Feb 03, 2017 at 10:03:14AM +0530, Sachin Sant wrote:

> > I ran a few cycles of CPU hot(un)plug tests. In most cases it works, except one
> > where I ran into an RCU stall:
> > 
> > [  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
> > [  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
> > [  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
> 
> Right, I actually saw that too, but I don't think that would be related
> to my patch. I'll see if I can dig into this though, ought to get fixed
> regardless.

FWIW, I'm not seeing stalls/hangs while beating hotplug up in tip. (so
next grew a wart?)

	-Mike


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-02 15:55             ` Peter Zijlstra
                                 ` (2 preceding siblings ...)
  2017-02-03  4:33               ` Sachin Sant
@ 2017-02-03 13:04               ` Borislav Petkov
  2017-02-22  9:03               ` Wanpeng Li
  4 siblings, 0 replies; 22+ messages in thread
From: Borislav Petkov @ 2017-02-03 13:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ross Zwisler, Mike Galbraith, Sachin Sant, Matt Fleming,
	Michael Ellerman, linuxppc-dev, linux-next, LKML

On Thu, Feb 02, 2017 at 04:55:06PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
> > On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
> > > On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> 
> 
> Could some of you test this? It seems to cure things in my (very)
> limited testing.

Tested-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03 12:59                   ` Mike Galbraith
@ 2017-02-03 13:37                     ` Peter Zijlstra
  2017-02-03 13:52                       ` Mike Galbraith
  2017-02-03 15:44                       ` Paul E. McKenney
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2017-02-03 13:37 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sachin Sant, Ross Zwisler, Matt Fleming, Michael Ellerman,
	linuxppc-dev, linux-next, LKML, Paul McKenney

On Fri, Feb 03, 2017 at 01:59:34PM +0100, Mike Galbraith wrote:
> On Fri, 2017-02-03 at 09:53 +0100, Peter Zijlstra wrote:
> > On Fri, Feb 03, 2017 at 10:03:14AM +0530, Sachin Sant wrote:
> 
> > > I ran a few cycles of CPU hot(un)plug tests. In most cases it works, except one
> > > where I ran into an RCU stall:
> > > 
> > > [  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
> > > [  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
> > > [  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
> > 
> > Right, I actually saw that too, but I don't think that would be related
> > to my patch. I'll see if I can dig into this though, ought to get fixed
> > regardless.
> 
> FWIW, I'm not seeing stalls/hangs while beating hotplug up in tip. (so
> next grew a wart?)

I've seen it on tip. It looks like hot unplug goes really slow when
there's running tasks on the CPU being taken down.

What I did was something like:

  taskset -p $((1<<1)) $$
  for ((i=0; i<20; i++)) do while :; do :; done & done

  taskset -p $((1<<0)) $$
  echo 0 > /sys/devices/system/cpu/cpu1/online

And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
_really_ slow and the RCU stall triggers. What I suspect happens is that
hotplug stops participating in the RCU state machine early, but only
tells RCU about it really late, and in between it gets suspicious it
takes too long.

I've yet to dig through the RCU code to figure out the exact sequence of
events, but found the above to be fairly reliable in triggering the
issue.


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03 13:37                     ` Peter Zijlstra
@ 2017-02-03 13:52                       ` Mike Galbraith
  2017-02-03 15:44                       ` Paul E. McKenney
  1 sibling, 0 replies; 22+ messages in thread
From: Mike Galbraith @ 2017-02-03 13:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sachin Sant, Ross Zwisler, Matt Fleming, Michael Ellerman,
	linuxppc-dev, linux-next, LKML, Paul McKenney

On Fri, 2017-02-03 at 14:37 +0100, Peter Zijlstra wrote:
> On Fri, Feb 03, 2017 at 01:59:34PM +0100, Mike Galbraith wrote:

> > FWIW, I'm not seeing stalls/hangs while beating hotplug up in tip. (so
> > next grew a wart?)
> 
> I've seen it on tip. It looks like hot unplug goes really slow when
> there's running tasks on the CPU being taken down.
> 
> What I did was something like:
> 
>   taskset -p $((1<<1)) $$
>   for ((i=0; i<20; i++)) do while :; do :; done & done
> 
>   taskset -p $((1<<0)) $$
>   echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> _really_ slow and the RCU stall triggers. What I suspect happens is that
> hotplug stops participating in the RCU state machine early, but only
> tells RCU about it really late, and in between it gets suspicious it
> takes too long.

Ah.  I wasn't doing a really hard pounding, just running a couple
instances of Steven's script.  To beat hell out of it, I add futextest,
stockfish and a small kbuild on a big box.

	-Mike


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03 13:37                     ` Peter Zijlstra
  2017-02-03 13:52                       ` Mike Galbraith
@ 2017-02-03 15:44                       ` Paul E. McKenney
  2017-02-03 15:54                         ` Paul E. McKenney
  1 sibling, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2017-02-03 15:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Sachin Sant, Ross Zwisler, Matt Fleming,
	Michael Ellerman, linuxppc-dev, linux-next, LKML

On Fri, Feb 03, 2017 at 02:37:48PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 03, 2017 at 01:59:34PM +0100, Mike Galbraith wrote:
> > On Fri, 2017-02-03 at 09:53 +0100, Peter Zijlstra wrote:
> > > On Fri, Feb 03, 2017 at 10:03:14AM +0530, Sachin Sant wrote:
> > 
> > > > I ran a few cycles of CPU hot(un)plug tests. In most cases it works, except one
> > > > where I ran into an RCU stall:
> > > > 
> > > > [  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > [  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
> > > > [  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
> > > 
> > > Right, I actually saw that too, but I don't think that would be related
> > > to my patch. I'll see if I can dig into this though, ought to get fixed
> > > regardless.
> > 
> > FWIW, I'm not seeing stalls/hangs while beating hotplug up in tip. (so
> > next grew a wart?)
> 
> I've seen it on tip. It looks like hot unplug goes really slow when
> there's running tasks on the CPU being taken down.
> 
> What I did was something like:
> 
>   taskset -p $((1<<1)) $$
>   for ((i=0; i<20; i++)) do while :; do :; done & done
> 
>   taskset -p $((1<<0)) $$
>   echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> _really_ slow and the RCU stall triggers. What I suspect happens is that
> hotplug stops participating in the RCU state machine early, but only
> tells RCU about it really late, and in between it gets suspicious it
> takes too long.
> 
> I've yet to dig through the RCU code to figure out the exact sequence of
> events, but found the above to be fairly reliable in triggering the
> issue.

If you send me the full splat from the dmesg and the RCU portions of
.config, I will take a look.  Is this new behavior, or a new test?

							Thanx, Paul


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03 15:44                       ` Paul E. McKenney
@ 2017-02-03 15:54                         ` Paul E. McKenney
  2017-02-06  6:23                           ` Sachin Sant
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2017-02-03 15:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Sachin Sant, Ross Zwisler, Matt Fleming,
	Michael Ellerman, linuxppc-dev, linux-next, LKML

On Fri, Feb 03, 2017 at 07:44:57AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 03, 2017 at 02:37:48PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 03, 2017 at 01:59:34PM +0100, Mike Galbraith wrote:
> > > On Fri, 2017-02-03 at 09:53 +0100, Peter Zijlstra wrote:
> > > > On Fri, Feb 03, 2017 at 10:03:14AM +0530, Sachin Sant wrote:
> > > 
> > > > > I ran a few cycles of CPU hot(un)plug tests. In most cases it works, except one
> > > > > where I ran into an RCU stall:
> > > > > 
> > > > > [  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > > [  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
> > > > > [  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
> > > > 
> > > > Right, I actually saw that too, but I don't think that would be related
> > > > to my patch. I'll see if I can dig into this though, ought to get fixed
> > > > regardless.
> > > 
> > > FWIW, I'm not seeing stalls/hangs while beating hotplug up in tip. (so
> > > next grew a wart?)
> > 
> > I've seen it on tip. It looks like hot unplug goes really slow when
> > there's running tasks on the CPU being taken down.
> > 
> > What I did was something like:
> > 
> >   taskset -p $((1<<1)) $$
> >   for ((i=0; i<20; i++)) do while :; do :; done & done
> > 
> >   taskset -p $((1<<0)) $$
> >   echo 0 > /sys/devices/system/cpu/cpu1/online
> > 
> > And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> > _really_ slow and the RCU stall triggers. What I suspect happens is that
> > hotplug stops participating in the RCU state machine early, but only
> > tells RCU about it really late, and in between it gets suspicious it
> > takes too long.
> > 
> > I've yet to dig through the RCU code to figure out the exact sequence of
> > events, but found the above to be fairly reliable in triggering the
> > issue.

> If you send me the full splat from the dmesg and the RCU portions of
> .config, I will take a look.  Is this new behavior, or a new test?

If new behavior, I would be most suspicious of these commits in -rcu which
recently entered -tip:

19e4d983cda1 rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
913324b1364f rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
fcdcfefafa45 rcu: Pull rcu_qs_ctr into rcu_dynticks structure
0919a0b7e7a5 rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
caa7c8e34293 rcu: Make rcu_note_context_switch() do deferred NOCB wakeups
41e4b159d516 rcu: Make rcu_all_qs() do deferred NOCB wakeups
b457a3356a68 rcu: Make call_rcu() do deferred NOCB wakeups

Does reverting any of these help?

							Thanx, Paul


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03 15:54                         ` Paul E. McKenney
@ 2017-02-06  6:23                           ` Sachin Sant
  2017-02-06 15:10                             ` Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Sachin Sant @ 2017-02-06  6:23 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, Matt Fleming, linuxppc-dev, Mike Galbraith, LKML,
	linux-next, Ross Zwisler


>>> I've seen it on tip. It looks like hot unplug goes really slow when
>>> there's running tasks on the CPU being taken down.
>>> 
>>> What I did was something like:
>>> 
>>>  taskset -p $((1<<1)) $$
>>>  for ((i=0; i<20; i++)) do while :; do :; done & done
>>> 
>>>  taskset -p $((1<<0)) $$
>>>  echo 0 > /sys/devices/system/cpu/cpu1/online
>>> 
>>> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
>>> _really_ slow and the RCU stall triggers. What I suspect happens is that
>>> hotplug stops participating in the RCU state machine early, but only
>>> tells RCU about it really late, and in between it gets suspicious it
>>> takes too long.
>>> 
>>> I've yet to dig through the RCU code to figure out the exact sequence of
>>> events, but found the above to be fairly reliable in triggering the
>>> issue.
> 
>> If you send me the full splat from the dmesg and the RCU portions of
>> .config, I will take a look.  Is this new behavior, or a new test?
> 

I have sent the required files to you via separate email.

> If new behavior, I would be most suspicious of these commits in -rcu which
> recently entered -tip:
> 
> 19e4d983cda1 rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
> 913324b1364f rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
> fcdcfefafa45 rcu: Pull rcu_qs_ctr into rcu_dynticks structure
> 0919a0b7e7a5 rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
> caa7c8e34293 rcu: Make rcu_note_context_switch() do deferred NOCB wakeups
> 41e4b159d516 rcu: Make rcu_all_qs() do deferred NOCB wakeups
> b457a3356a68 rcu: Make call_rcu() do deferred NOCB wakeups
> 
> Does reverting any of these help?

I tried reverting the above commits. That does not help. I can still recreate the issue.

Thanks
-Sachin


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-06  6:23                           ` Sachin Sant
@ 2017-02-06 15:10                             ` Paul E. McKenney
  2017-02-06 15:14                               ` Paul E. McKenney
  0 siblings, 1 reply; 22+ messages in thread
From: Paul E. McKenney @ 2017-02-06 15:10 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Peter Zijlstra, Matt Fleming, linuxppc-dev, Mike Galbraith, LKML,
	linux-next, Ross Zwisler

On Mon, Feb 06, 2017 at 11:53:10AM +0530, Sachin Sant wrote:
> 
> >>> I've seen it on tip. It looks like hot unplug goes really slow when
> >>> there's running tasks on the CPU being taken down.
> >>> 
> >>> What I did was something like:
> >>> 
> >>>  taskset -p $((1<<1)) $$
> >>>  for ((i=0; i<20; i++)) do while :; do :; done & done
> >>> 
> >>>  taskset -p $((1<<0)) $$
> >>>  echo 0 > /sys/devices/system/cpu/cpu1/online
> >>> 
> >>> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> >>> _really_ slow and the RCU stall triggers. What I suspect happens is that
> >>> hotplug stops participating in the RCU state machine early, but only
> >>> tells RCU about it really late, and in between it gets suspicious it
> >>> takes too long.
> >>> 
> >>> I've yet to dig through the RCU code to figure out the exact sequence of
> >>> events, but found the above to be fairly reliable in triggering the
> >>> issue.
> > 
> >> If you send me the full splat from the dmesg and the RCU portions of
> >> .config, I will take a look.  Is this new behavior, or a new test?
> > 
> 
> I have sent the required files to you via separate email.
> 
> > If new behavior, I would be most suspicious of these commits in -rcu which
> > recently entered -tip:
> > 
> > 19e4d983cda1 rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
> > 913324b1364f rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
> > fcdcfefafa45 rcu: Pull rcu_qs_ctr into rcu_dynticks structure
> > 0919a0b7e7a5 rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
> > caa7c8e34293 rcu: Make rcu_note_context_switch() do deferred NOCB wakeups
> > 41e4b159d516 rcu: Make rcu_all_qs() do deferred NOCB wakeups
> > b457a3356a68 rcu: Make call_rcu() do deferred NOCB wakeups
> > 
> > Does reverting any of these help?
> 
> I tried reverting the above commits. That does not help. I can still recreate the issue.

Thank you for testing, Sachin!

Could you please try building and testing with CONFIG_RCU_BOOST=y?
You will need to enable CONFIG_RCU_EXPERT=y to see this Kconfig option.
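
For reference, the resulting .config fragment would be:

  CONFIG_RCU_EXPERT=y
  CONFIG_RCU_BOOST=y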

							Thanx, Paul


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-06 15:10                             ` Paul E. McKenney
@ 2017-02-06 15:14                               ` Paul E. McKenney
  0 siblings, 0 replies; 22+ messages in thread
From: Paul E. McKenney @ 2017-02-06 15:14 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Peter Zijlstra, Matt Fleming, linuxppc-dev, Mike Galbraith, LKML,
	linux-next, Ross Zwisler

On Mon, Feb 06, 2017 at 07:10:48AM -0800, Paul E. McKenney wrote:
> On Mon, Feb 06, 2017 at 11:53:10AM +0530, Sachin Sant wrote:
> > 
> > >>> I've seen it on tip. It looks like hot unplug goes really slow when
> > >>> there's running tasks on the CPU being taken down.
> > >>> 
> > >>> What I did was something like:
> > >>> 
> > >>>  taskset -p $((1<<1)) $$
> > >>>  for ((i=0; i<20; i++)) do while :; do :; done & done
> > >>> 
> > >>>  taskset -p $((1<<0)) $$
> > >>>  echo 0 > /sys/devices/system/cpu/cpu1/online
> > >>> 
> > >>> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> > >>> _really_ slow and the RCU stall triggers. What I suspect happens is that
> > >>> hotplug stops participating in the RCU state machine early, but only
> > >>> tells RCU about it really late, and in between it gets suspicious it
> > >>> takes too long.
> > >>> 
> > >>> I've yet to dig through the RCU code to figure out the exact sequence of
> > >>> events, but found the above to be fairly reliable in triggering the
> > >>> issue.
> > > 
> > >> If you send me the full splat from the dmesg and the RCU portions of
> > >> .config, I will take a look.  Is this new behavior, or a new test?
> > > 
> > 
> > I have sent the required files to you via separate email.
> > 
> > > If new behavior, I would be most suspicious of these commits in -rcu which
> > > recently entered -tip:
> > > 
> > > 19e4d983cda1 rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
> > > 913324b1364f rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
> > > fcdcfefafa45 rcu: Pull rcu_qs_ctr into rcu_dynticks structure
> > > 0919a0b7e7a5 rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
> > > caa7c8e34293 rcu: Make rcu_note_context_switch() do deferred NOCB wakeups
> > > 41e4b159d516 rcu: Make rcu_all_qs() do deferred NOCB wakeups
> > > b457a3356a68 rcu: Make call_rcu() do deferred NOCB wakeups
> > > 
> > > Does reverting any of these help?
> > 
> > I tried reverting the above commits. That does not help. I can still recreate the issue.
> 
> Thank you for testing, Sachin!
> 
> Could you please try building and testing with CONFIG_RCU_BOOST=y?
> You will need to enable CONFIG_RCU_EXPERT=y to see this Kconfig option.

Ah, but looking ahead to your .config file, you have CONFIG_PREEMPT=n,
which means boosting would not help and is not available in any case.

So it looks like there is a very long loop within an RCU read-side
critical section, and that this critical section needs to be broken
up a bit -- 21 seconds in pretty much any kind of critical section is
a bit excessive, after all.
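
The usual remedy for that is to chunk the loop so a quiescent state can be
reported between chunks. A generic sketch, with hypothetical names
(items[], process() and CHUNK are made up purely for illustration):

  for (i = 0; i < nr_items; i += CHUNK) {
          rcu_read_lock();
          for (j = i; j < min(i + CHUNK, nr_items); j++)
                  process(rcu_dereference(items[j]));
          rcu_read_unlock();
          cond_resched();  /* quiescent state; lets the grace period end */
  }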

							Thanx, Paul


* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-02 15:55             ` Peter Zijlstra
                                 ` (3 preceding siblings ...)
  2017-02-03 13:04               ` Borislav Petkov
@ 2017-02-22  9:03               ` Wanpeng Li
  4 siblings, 0 replies; 22+ messages in thread
From: Wanpeng Li @ 2017-02-22  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ross Zwisler, Mike Galbraith, Sachin Sant, Matt Fleming,
	Michael Ellerman, linuxppc-dev, linux-next, LKML

2017-02-02 23:55 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
>> On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
>> > On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
>
>
> Could some of you test this? It seems to cure things in my (very)
> limited testing.
>

Tested-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 96e4ccc..b773821 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5609,7 +5609,7 @@ static void migrate_tasks(struct rq *dead_rq)
>  {
>         struct rq *rq = dead_rq;
>         struct task_struct *next, *stop = rq->stop;
> -       struct rq_flags rf, old_rf;
> +       struct rq_flags rf;
>         int dest_cpu;
>
>         /*
> @@ -5628,7 +5628,9 @@ static void migrate_tasks(struct rq *dead_rq)
>          * class method both need to have an up-to-date
>          * value of rq->clock[_task]
>          */
> +       rq_pin_lock(rq, &rf);
>         update_rq_clock(rq);
> +       rq_unpin_lock(rq, &rf);
>
>         for (;;) {
>                 /*
> @@ -5641,7 +5643,7 @@ static void migrate_tasks(struct rq *dead_rq)
>                 /*
>                  * pick_next_task assumes pinned rq->lock.
>                  */
> -               rq_pin_lock(rq, &rf);
> +               rq_repin_lock(rq, &rf);
>                 next = pick_next_task(rq, &fake_task, &rf);
>                 BUG_ON(!next);
>                 next->sched_class->put_prev_task(rq, next);
> @@ -5670,13 +5672,6 @@ static void migrate_tasks(struct rq *dead_rq)
>                         continue;
>                 }
>
> -               /*
> -                * __migrate_task() may return with a different
> -                * rq->lock held and a new cookie in 'rf', but we need
> -                * to preserve rf::clock_update_flags for 'dead_rq'.
> -                */
> -               old_rf = rf;
> -
>                 /* Find suitable destination for @next, with force if needed. */
>                 dest_cpu = select_fallback_rq(dead_rq->cpu, next);
>
> @@ -5685,7 +5680,6 @@ static void migrate_tasks(struct rq *dead_rq)
>                         raw_spin_unlock(&rq->lock);
>                         rq = dead_rq;
>                         raw_spin_lock(&rq->lock);
> -                       rf = old_rf;
>                 }
>                 raw_spin_unlock(&next->pi_lock);
>         }

