* tip.today - scheduler  bam boom crash (cpu hotplug)
@ 2017-01-19  7:31 Mike Galbraith
  2017-01-19 10:19 ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Mike Galbraith @ 2017-01-19  7:31 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar

Mindless testing only, too sick to work, not sick enough to be immune
to boredom.  Was verifying the first warning wasn't somehow rt-inspired,
but while doing so, a plain nopreempt kernel (no rt patch set) went boom.

[  203.088255] smpboot: CPU 1 is now offline
[  203.168181] smpboot: CPU 2 is now offline
[  203.221461] x86: Booting SMP configuration:
[  203.221464] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  203.221728] ------------[ cut here ]------------
[  203.221733] WARNING: CPU: 1 PID: 0 at kernel/sched/clock.c:149 set_sched_clock_stable+0x43/0x50
[  203.221733] Modules linked in: nls_utf8(E) isofs(E) ebtable_filter(E) ebtables(E) fuse(E) nf_log_ipv6(E) xt_pkttype(E) xt_physdev(E) br_netfilter(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) nls_iso8859_1(E) snd_hda_codec_hdmi(E) nls_cp437(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) kvm(E) snd_hda_intel(E)
[  203.221748]  snd_hda_codec(E) irqbypass(E) crct10dif_pclmul(E) snd_hda_core(E) snd_hwdep(E) nfsd(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) pcbc(E) snd_pcm(E) auth_rpcgss(E) aesni_intel(E) aes_x86_64(E) snd_timer(E) nfs_acl(E) joydev(E) crypto_simd(E) snd(E) lockd(E) grace(E) iTCO_wdt(E) iTCO_vendor_support(E) lpc_ich(E) mei_me(E) i2c_i801(E) mei(E) pcspkr(E) glue_helper(E) mfd_core(E) shpchp(E) intel_smartconnect(E) sunrpc(E) soundcore(E) tpm_infineon(E) fan(E) thermal(E) battery(E) cryptd(E) efivarfs(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) libahci(E) xhci_pci(E) ehci_pci(E) xhci_hcd(E) ehci_hcd(E)
[  203.221765]  ttm(E) libata(E) r8169(E) mii(E) drm(E) usbcore(E) fjes(E) video(E) button(E) sd_mod(E) vfat(E) fat(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mod(E) loop(E) sg(E) scsi_mod(E) autofs4(E)
[  203.221773] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E   4.10.0-tip-default #29
[  203.221774] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  203.221774] Call Trace:
[  203.221778]  dump_stack+0x63/0x90
[  203.221780]  __warn+0xd1/0xf0
[  203.221782]  warn_slowpath_null+0x1d/0x20
[  203.221782]  set_sched_clock_stable+0x43/0x50
[  203.221784]  early_init_intel+0x225/0x360
[  203.221785]  init_intel+0x18/0x2d0
[  203.221786]  identify_cpu+0x2d1/0x4d0
[  203.221786]  identify_secondary_cpu+0x18/0x80
[  203.221789]  smp_store_cpu_info+0x3e/0x40
[  203.221790]  start_secondary+0x53/0x180
[  203.221791]  start_cpu+0x14/0x14
[  203.221792] ---[ end trace 262c7e4b746d5a76 ]---
....
[  207.525918] smpboot: CPU 2 is now offline
[  207.586516] smpboot: CPU 4 is now offline
[  207.642988] smpboot: CPU 6 is now offline
[  207.682207] x86: Booting SMP configuration:
[  207.682210] smpboot: Booting Node 0 Processor 1 APIC 0x2
[  207.682505] sched_clock: Marking stable (207412639708, 0)->(207410993286, 1646422)
[  207.706220] smpboot: Booting Node 0 Processor 2 APIC 0x4
[  207.706502] sched_clock: Marking stable (207524564558, 0)->(207522917721, 1646837)
[  207.730376] smpboot: Booting Node 0 Processor 4 APIC 0x1
[  207.730644] sched_clock: Marking stable (207585448402, 0)->(207583801309, 1647093)
[  207.754593] smpboot: Booting Node 0 Processor 6 APIC 0x5
[  207.754881] sched_clock: Marking stable (207641939733, 0)->(207640292394, 1647339)
[  207.802195] smpboot: CPU 3 is now offline
[  207.862396] smpboot: CPU 4 is now offline
[  207.927159] ------------[ cut here ]------------
[  207.927163] WARNING: CPU: 6 PID: 45 at kernel/sched/sched.h:807 assert_clock_updated.isra.62.part.63+0x25/0x27
[  207.927164] rq->clock_update_flags < RQCF_ACT_SKIP
[  207.927164] Modules linked in: nls_utf8(E) isofs(E) ebtable_filter(E) ebtables(E) fuse(E) nf_log_ipv6(E) xt_pkttype(E) xt_physdev(E) br_netfilter(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) nls_iso8859_1(E) snd_hda_codec_hdmi(E) nls_cp437(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) kvm(E) snd_hda_intel(E)
[  207.927186]  snd_hda_codec(E) irqbypass(E) crct10dif_pclmul(E) snd_hda_core(E) snd_hwdep(E) nfsd(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) pcbc(E) snd_pcm(E) auth_rpcgss(E) aesni_intel(E) aes_x86_64(E) snd_timer(E) nfs_acl(E) joydev(E) crypto_simd(E) snd(E) lockd(E) grace(E) iTCO_wdt(E) iTCO_vendor_support(E) lpc_ich(E) mei_me(E) i2c_i801(E) mei(E) pcspkr(E) glue_helper(E) mfd_core(E) shpchp(E) intel_smartconnect(E) sunrpc(E) soundcore(E) tpm_infineon(E) fan(E) thermal(E) battery(E) cryptd(E) efivarfs(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) libahci(E) xhci_pci(E) ehci_pci(E) xhci_hcd(E) ehci_hcd(E)
[  207.927212]  ttm(E) libata(E) r8169(E) mii(E) drm(E) usbcore(E) fjes(E) video(E) button(E) sd_mod(E) vfat(E) fat(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mod(E) loop(E) sg(E) scsi_mod(E) autofs4(E)
[  207.927230] CPU: 6 PID: 45 Comm: migration/6 Tainted: G        W   E   4.10.0-tip-default #29
[  207.927231] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  207.927231] Call Trace:
[  207.927235]  dump_stack+0x63/0x90
[  207.927238]  __warn+0xd1/0xf0
[  207.927240]  ? load_balance+0xa10/0xa10
[  207.927242]  warn_slowpath_fmt+0x4f/0x60
[  207.927244]  ? find_next_bit+0x19/0x20
[  207.927245]  ? cpumask_next_and+0x35/0x50
[  207.927246]  assert_clock_updated.isra.62.part.63+0x25/0x27
[  207.927247]  update_load_avg+0x855/0x950
[  207.927249]  ? load_balance+0xa10/0xa10
[  207.927250]  set_next_entity+0xa6/0x210
[  207.927252]  ? load_balance+0xa10/0xa10
[  207.927252]  pick_next_task_fair+0x78/0x550
[  207.927255]  ? sched_clock+0x9/0x10
[  207.927256]  ? sched_clock_cpu+0x11/0xc0
[  207.927257]  ? load_balance+0xa10/0xa10
[  207.927258]  sched_cpu_dying+0x251/0x2a0
[  207.927260]  ? fini_debug_store_on_cpu+0x34/0x40
[  207.927261]  ? sched_cpu_starting+0x60/0x60
[  207.927263]  cpuhp_invoke_callback+0x90/0x440
[  207.927265]  take_cpu_down+0x5e/0xa0
[  207.927267]  multi_cpu_stop+0xc4/0xf0
[  207.927268]  ? cpu_stop_queue_work+0xb0/0xb0
[  207.927269]  cpu_stopper_thread+0x96/0x120
[  207.927270]  smpboot_thread_fn+0x11a/0x1e0
[  207.927272]  kthread+0x10c/0x140
[  207.927273]  ? sort_range+0x30/0x30
[  207.927274]  ? kthread_parkme+0x40/0x40
[  207.927276]  ret_from_fork+0x2c/0x40
[  207.927277] ---[ end trace 262c7e4b746d5a77 ]---
[  207.927287] BUG: unable to handle kernel NULL pointer dereference at 0000000000000150
[  207.927289] IP: pick_next_task_fair+0x397/0x550
[  207.927289] PGD 0 
[  207.927290] 
[  207.927291] Oops: 0000 [#1] SMP
[  207.927294] Dumping ftrace buffer:
[  207.927296]    (ftrace buffer empty)
[  207.927296] Modules linked in: nls_utf8(E) isofs(E) ebtable_filter(E) ebtables(E) fuse(E) nf_log_ipv6(E) xt_pkttype(E) xt_physdev(E) br_netfilter(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) nls_iso8859_1(E) snd_hda_codec_hdmi(E) nls_cp437(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) kvm(E) snd_hda_intel(E)
[  207.927317]  snd_hda_codec(E) irqbypass(E) crct10dif_pclmul(E) snd_hda_core(E) snd_hwdep(E) nfsd(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) pcbc(E) snd_pcm(E) auth_rpcgss(E) aesni_intel(E) aes_x86_64(E) snd_timer(E) nfs_acl(E) joydev(E) crypto_simd(E) snd(E) lockd(E) grace(E) iTCO_wdt(E) iTCO_vendor_support(E) lpc_ich(E) mei_me(E) i2c_i801(E) mei(E) pcspkr(E) glue_helper(E) mfd_core(E) shpchp(E) intel_smartconnect(E) sunrpc(E) soundcore(E) tpm_infineon(E) fan(E) thermal(E) battery(E) cryptd(E) efivarfs(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) libahci(E) xhci_pci(E) ehci_pci(E) xhci_hcd(E) ehci_hcd(E)
[  207.927332]  ttm(E) libata(E) r8169(E) mii(E) drm(E) usbcore(E) fjes(E) video(E) button(E) sd_mod(E) vfat(E) fat(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mod(E) loop(E) sg(E) scsi_mod(E) autofs4(E)
[  207.927347] CPU: 6 PID: 45 Comm: migration/6 Tainted: G        W   E   4.10.0-tip-default #29
[  207.927347] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  207.927348] task: ffff88017df703c0 task.stack: ffffc90001a4c000
[  207.927349] RIP: 0010:pick_next_task_fair+0x397/0x550
[  207.927350] RSP: 0018:ffffc90001a4fdb8 EFLAGS: 00010006
[  207.927350] RAX: 000000000b7eefb7 RBX: ffff88041ed98580 RCX: ffffffff8203e080
[  207.927351] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000800
[  207.927351] RBP: ffffc90001a4fe18 R08: 0000000000000001 R09: 0000000000000001
[  207.927352] R10: 00000030696e879b R11: 00000000001b6d69 R12: 0000000000000000
[  207.927352] R13: ffff88041ed985f0 R14: 0000000000000000 R15: 0000000000000000
[  207.927353] FS:  0000000000000000(0000) GS:ffff88041ed80000(0000) knlGS:0000000000000000
[  207.927354] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  207.927354] CR2: 0000000000000150 CR3: 0000000001c09000 CR4: 00000000001406e0
[  207.927355] Call Trace:
[  207.927358]  __schedule+0x745/0x940
[  207.927359]  schedule+0x3d/0x90
[  207.927360]  __kthread_parkme+0x4e/0xa0
[  207.927362]  kthread_parkme+0x25/0x40
[  207.927363]  smpboot_thread_fn+0x96/0x1e0
[  207.927364]  kthread+0x10c/0x140
[  207.927365]  ? sort_range+0x30/0x30
[  207.927366]  ? kthread_parkme+0x40/0x40
[  207.927366]  ret_from_fork+0x2c/0x40
[  207.927367] Code: 38 85 c9 74 c4 4c 89 e7 e8 d7 8e ff ff eb bd 4c 8b 75 b8 48 8d 48 80 48 89 4d c0 49 39 ce 0f 84 8b 00 00 00 49 83 ee 80 49 89 c4 <4d> 8b ac 24 50 01 00 00 49 8b be 50 01 00 00 49 39 fd 74 4d 41 
[  207.927377] RIP: pick_next_task_fair+0x397/0x550 RSP: ffffc90001a4fdb8
[  207.927377] CR2: 0000000000000150


* Re: tip.today - scheduler  bam boom crash (cpu hotplug)
  2017-01-19  7:31 tip.today - scheduler bam boom crash (cpu hotplug) Mike Galbraith
@ 2017-01-19 10:19 ` Peter Zijlstra
  2017-01-19 11:39   ` Mike Galbraith
  2017-01-19 13:36   ` Peter Zijlstra
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2017-01-19 10:19 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Ingo Molnar, Thomas Gleixner

On Thu, Jan 19, 2017 at 08:31:09AM +0100, Mike Galbraith wrote:
> Mindless testing only, too sick to work, not sick enough to be immune
> to boredom.  Was verifying the first warning wasn't somehow rt-inspired,
> but while doing so, a plain nopreempt kernel (no rt patch set) went boom.
> 
> [  203.088255] smpboot: CPU 1 is now offline
> [  203.168181] smpboot: CPU 2 is now offline
> [  203.221461] x86: Booting SMP configuration:
> [  203.221464] smpboot: Booting Node 0 Processor 1 APIC 0x2
> [  203.221728] ------------[ cut here ]------------
> [  203.221733] WARNING: CPU: 1 PID: 0 at kernel/sched/clock.c:149 set_sched_clock_stable+0x43/0x50
> [  203.221733] Modules linked in: nls_utf8(E) isofs(E) ebtable_filter(E) ebtables(E) fuse(E) nf_log_ipv6(E) xt_pkttype(E) xt_physdev(E) br_netfilter(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) xt_CT(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) nls_iso8859_1(E) snd_hda_codec_hdmi(E) nls_cp437(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) kvm(E) snd_hda_intel(E)
> [  203.221748]  snd_hda_codec(E) irqbypass(E) crct10dif_pclmul(E) snd_hda_core(E) snd_hwdep(E) nfsd(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) pcbc(E) snd_pcm(E) auth_rpcgss(E) aesni_intel(E) aes_x86_64(E) snd_timer(E) nfs_acl(E) joydev(E) crypto_simd(E) snd(E) lockd(E) grace(E) iTCO_wdt(E) iTCO_vendor_support(E) lpc_ich(E) mei_me(E) i2c_i801(E) mei(E) pcspkr(E) glue_helper(E) mfd_core(E) shpchp(E) intel_smartconnect(E) sunrpc(E) soundcore(E) tpm_infineon(E) fan(E) thermal(E) battery(E) cryptd(E) efivarfs(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) libahci(E) xhci_pci(E) ehci_pci(E) xhci_hcd(E) ehci_hcd(E)
> [  203.221765]  ttm(E) libata(E) r8169(E) mii(E) drm(E) usbcore(E) fjes(E) video(E) button(E) sd_mod(E) vfat(E) fat(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mod(E) loop(E) sg(E) scsi_mod(E) autofs4(E)
> [  203.221773] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E   4.10.0-tip-default #29
> [  203.221774] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [  203.221774] Call Trace:
> [  203.221778]  dump_stack+0x63/0x90
> [  203.221780]  __warn+0xd1/0xf0
> [  203.221782]  warn_slowpath_null+0x1d/0x20
> [  203.221782]  set_sched_clock_stable+0x43/0x50
> [  203.221784]  early_init_intel+0x225/0x360
> [  203.221785]  init_intel+0x18/0x2d0
> [  203.221786]  identify_cpu+0x2d1/0x4d0
> [  203.221786]  identify_secondary_cpu+0x18/0x80
> [  203.221789]  smp_store_cpu_info+0x3e/0x40
> [  203.221790]  start_secondary+0x53/0x180
> [  203.221791]  start_cpu+0x14/0x14
> [  203.221792] ---[ end trace 262c7e4b746d5a76 ]---


OK, you also forgot to tell us what you did to trigger this, but a little
playing around seems enough to reproduce. All that was required was
offline + online and *boom*.

I'll go have a prod. Thanks!


* Re: tip.today - scheduler  bam boom crash (cpu hotplug)
  2017-01-19 10:19 ` Peter Zijlstra
@ 2017-01-19 11:39   ` Mike Galbraith
  2017-01-19 13:36   ` Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: Mike Galbraith @ 2017-01-19 11:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, Thomas Gleixner

On Thu, 2017-01-19 at 11:19 +0100, Peter Zijlstra wrote:

> OK, you also forgot to tell us what you did to trigger this, but a little
> playing around seems enough to reproduce. All that was required was
> offline + online and *boom*.

Oh, sorry, it was Steven's hotplug stress script.  I thought the first
couple lines of kernel output (+$subject) told the tale.
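
(For reference, a minimal sketch of the same idea, repeatedly offlining and
onlining the secondary CPUs through sysfs. This is not Steven's actual
script; the CPU range, iteration count and the need to run as root are
assumptions.)

#include <stdio.h>

/* Toggle /sys/devices/system/cpu/cpuN/online; cpu0 typically has no
 * 'online' file, so start at CPU 1. */
static void set_online(int cpu, int on)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%d/online", cpu);
	f = fopen(path, "w");
	if (!f)
		return;
	fprintf(f, "%d\n", on);
	fclose(f);
}

int main(void)
{
	int iter, cpu;

	for (iter = 0; iter < 100; iter++) {
		for (cpu = 1; cpu < 8; cpu++)	/* assumes an 8-CPU box */
			set_online(cpu, 0);
		for (cpu = 1; cpu < 8; cpu++)
			set_online(cpu, 1);
	}
	return 0;
}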


* Re: tip.today - scheduler  bam boom crash (cpu hotplug)
  2017-01-19 10:19 ` Peter Zijlstra
  2017-01-19 11:39   ` Mike Galbraith
@ 2017-01-19 13:36   ` Peter Zijlstra
  2017-01-19 16:54     ` Ingo Molnar
                       ` (4 more replies)
  1 sibling, 5 replies; 29+ messages in thread
From: Peter Zijlstra @ 2017-01-19 13:36 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML, Ingo Molnar, Thomas Gleixner, Borislav Petkov

On Thu, Jan 19, 2017 at 11:19:24AM +0100, Peter Zijlstra wrote:

> I'll go have a prod. Thanks!

This seems to cure it for me.

---
Subject: sched/clock: Fix hotplug issue

Mike reported that he could trigger the WARN_ON_ONCE() in
set_sched_clock_stable() using hotplug.

This exposed a fundamental problem with the interface: we should never
mark the TSC stable if we ever find it to be unstable. Therefore
set_sched_clock_stable() is a broken interface.

The reason it existed is that not having it is a pain: it means all
relevant architecture code needs to call clear_sched_clock_stable()
where appropriate.

Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK, ia64
and parisc are trivial in that they never called
set_sched_clock_stable(), so add an unconditional call to
clear_sched_clock_stable() to them.

For x86 the story is a lot more involved, and what this patch tries to
do is ensure we preserve the status quo. So even if Cyrix or Transmeta
have a usable TSC, they never called set_sched_clock_stable(), so they
now get explicitly marked unstable.

XXX: what about Xen ?

Fixes: 9881b024b7d7 ("sched/clock: Delay switching sched_clock to stable")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/ia64/kernel/setup.c        |  2 ++
 arch/parisc/kernel/setup.c      |  2 ++
 arch/x86/kernel/cpu/amd.c       |  6 ++++--
 arch/x86/kernel/cpu/centaur.c   |  2 ++
 arch/x86/kernel/cpu/common.c    |  3 +++
 arch/x86/kernel/cpu/cyrix.c     |  2 ++
 arch/x86/kernel/cpu/intel.c     |  6 ++++--
 arch/x86/kernel/cpu/transmeta.c |  3 +++
 arch/x86/kernel/kvmclock.c      |  2 +-
 include/linux/sched.h           |  1 -
 kernel/sched/clock.c            | 29 ++++++++---------------------
 11 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index 7ec7acc..c483ece 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -619,6 +619,8 @@ setup_arch (char **cmdline_p)
 	check_sal_cache_flush();
 #endif
 	paging_init();
+
+	clear_sched_clock_stable();
 }
 
 /*
diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
index 2e66a88..068ed36 100644
--- a/arch/parisc/kernel/setup.c
+++ b/arch/parisc/kernel/setup.c
@@ -36,6 +36,7 @@
 #undef PCI_DEBUG
 #include <linux/proc_fs.h>
 #include <linux/export.h>
+#include <linux/sched.h>
 
 #include <asm/processor.h>
 #include <asm/sections.h>
@@ -176,6 +177,7 @@ void __init setup_arch(char **cmdline_p)
 	conswitchp = &dummy_con;	/* we use do_take_over_console() later ! */
 #endif
 
+	clear_sched_clock_stable();
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 1d31672..80e657e 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -541,8 +541,10 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
-		if (!check_tsc_unstable())
-			set_sched_clock_stable();
+		if (check_tsc_unstable())
+			clear_sched_clock_stable();
+	} else {
+		clear_sched_clock_stable();
 	}
 
 	/* Bit 12 of 8000_0007 edx is accumulated power mechanism. */
diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
index 1661d8e..c4bab8c 100644
--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -104,6 +104,8 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_64
 	set_cpu_cap(c, X86_FEATURE_SYSENTER32);
 #endif
+
+	clear_sched_clock_stable();
 }
 
 static void init_centaur(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9bab7a8..0bdb1ab 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -83,6 +83,7 @@ static void default_init(struct cpuinfo_x86 *c)
 			strcpy(c->x86_model_id, "386");
 	}
 #endif
+	clear_sched_clock_stable();
 }
 
 static const struct cpu_dev default_cpu = {
@@ -1055,6 +1056,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	 */
 	if (this_cpu->c_init)
 		this_cpu->c_init(c);
+	else
+		clear_sched_clock_stable();
 
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
index bd9dcd6..47416f9 100644
--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -9,6 +9,7 @@
 #include <asm/pci-direct.h>
 #include <asm/tsc.h>
 #include <asm/cpufeature.h>
+#include <linux/sched.h>
 
 #include "cpu.h"
 
@@ -183,6 +184,7 @@ static void early_init_cyrix(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
 		break;
 	}
+	clear_sched_clock_stable();
 }
 
 static void init_cyrix(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 203f860..026c728 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -119,8 +119,10 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
-		if (!check_tsc_unstable())
-			set_sched_clock_stable();
+		if (check_tsc_unstable())
+			clear_sched_clock_stable();
+	} else {
+		clear_sched_clock_stable();
 	}
 
 	/* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
diff --git a/arch/x86/kernel/cpu/transmeta.c b/arch/x86/kernel/cpu/transmeta.c
index 3417856..c1ea5b9 100644
--- a/arch/x86/kernel/cpu/transmeta.c
+++ b/arch/x86/kernel/cpu/transmeta.c
@@ -1,4 +1,5 @@
 #include <linux/kernel.h>
+#include <linux/sched.h>
 #include <linux/mm.h>
 #include <asm/cpufeature.h>
 #include <asm/msr.h>
@@ -14,6 +15,8 @@ static void early_init_transmeta(struct cpuinfo_x86 *c)
 		if (xlvl >= 0x80860001)
 			c->x86_capability[CPUID_8086_0001_EDX] = cpuid_edx(0x80860001);
 	}
+
+	clear_sched_clock_stable();
 }
 
 static void init_transmeta(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 2a5cafd..542710b 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -107,12 +107,12 @@ static inline void kvm_sched_clock_init(bool stable)
 {
 	if (!stable) {
 		pv_time_ops.sched_clock = kvm_clock_read;
+		clear_sched_clock_stable();
 		return;
 	}
 
 	kvm_sched_clock_offset = kvm_clock_read();
 	pv_time_ops.sched_clock = kvm_sched_clock_read;
-	set_sched_clock_stable();
 
 	printk(KERN_INFO "kvm-clock: using sched offset of %llu cycles\n",
 			kvm_sched_clock_offset);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 68b35dd..8a9bbb6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2522,7 +2522,6 @@ extern void sched_clock_init_late(void);
  * is reliable after all:
  */
 extern int sched_clock_stable(void);
-extern void set_sched_clock_stable(void);
 extern void clear_sched_clock_stable(void);
 
 extern void sched_clock_tick(void);
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 7713b2b..ad64efe 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -83,8 +83,15 @@ void sched_clock_init(void)
 }
 
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
+/*
+ * We must start with !__sched_clock_stable because the unstable -> stable
+ * transition is accurate, while the stable -> unstable transition is not.
+ *
+ * Similarly we start with __sched_clock_stable_early, thereby assuming we
+ * will become stable, such that there's only a single 1 -> 0 transition.
+ */
 static DEFINE_STATIC_KEY_FALSE(__sched_clock_stable);
-static int __sched_clock_stable_early;
+static int __sched_clock_stable_early = 1;
 
 /*
  * We want: ktime_get_ns() + gtod_offset == sched_clock() + raw_offset
@@ -132,24 +139,6 @@ static void __set_sched_clock_stable(void)
 	tick_dep_clear(TICK_DEP_BIT_CLOCK_UNSTABLE);
 }
 
-void set_sched_clock_stable(void)
-{
-	__sched_clock_stable_early = 1;
-
-	smp_mb(); /* matches sched_clock_init_late() */
-
-	/*
-	 * This really should only be called early (before
-	 * sched_clock_init_late()) when guestimating our sched_clock() is
-	 * solid.
-	 *
-	 * After that we test stability and we can negate our guess using
-	 * clear_sched_clock_stable, possibly from a watchdog.
-	 */
-	if (WARN_ON_ONCE(sched_clock_running == 2))
-		__set_sched_clock_stable();
-}
-
 static void __clear_sched_clock_stable(struct work_struct *work)
 {
 	struct sched_clock_data *scd = this_scd();
@@ -199,8 +188,6 @@ void sched_clock_init_late(void)
 
 	if (__sched_clock_stable_early)
 		__set_sched_clock_stable();
-	else
-		__clear_sched_clock_stable(NULL);
 }
 
 /*


* Re: tip.today - scheduler  bam boom crash (cpu hotplug)
  2017-01-19 13:36   ` Peter Zijlstra
@ 2017-01-19 16:54     ` Ingo Molnar
  2017-01-19 17:20       ` Peter Zijlstra
  2017-01-19 20:35     ` sched/clock: Fix hotplug issue kbuild test robot
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Ingo Molnar @ 2017-01-19 16:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner, Borislav Petkov


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Jan 19, 2017 at 11:19:24AM +0100, Peter Zijlstra wrote:
> 
> > I'll go have a prod. Thanks!
> 
> This seems to cure it for me.
> 
> ---
> Subject: sched/clock: Fix hotplug issue

JFYI:

 arch/x86/kernel/cpu/centaur.c: In function ‘early_init_centaur’:
 arch/x86/kernel/cpu/centaur.c:108:2: error: implicit declaration of function ‘clear_sched_clock_stable’ [-Werror=implicit-function-declaration]

x86-64 defconfig-ish.

Thanks,

	Ingo


* Re: tip.today - scheduler  bam boom crash (cpu hotplug)
  2017-01-19 16:54     ` Ingo Molnar
@ 2017-01-19 17:20       ` Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2017-01-19 17:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner, Borislav Petkov

On Thu, Jan 19, 2017 at 05:54:55PM +0100, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, Jan 19, 2017 at 11:19:24AM +0100, Peter Zijlstra wrote:
> > 
> > > I'll go have a prod. Thanks!
> > 
> > This seems to cure it for me.
> > 
> > ---
> > Subject: sched/clock: Fix hotplug issue
> 
> JFYI:
> 
>  arch/x86/kernel/cpu/centaur.c: In function ‘early_init_centaur’:
>  arch/x86/kernel/cpu/centaur.c:108:2: error: implicit declaration of function ‘clear_sched_clock_stable’ [-Werror=implicit-function-declaration]
> 
> x86-64 defconfig-ish.

Duh.. that'd be a missing: #include <linux/sched.h>

Clearly that's not selected with the .config for the test box ;-)
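
(A minimal delta on top of the posted patch would be to add the missing
include to centaur.c; the version that eventually hit tip, further down
this thread, instead swaps the <linux/bitops.h>/<linux/kernel.h>
includes for <linux/sched.h>. Sketch only:)

--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -1,5 +1,6 @@
 #include <linux/bitops.h>
 #include <linux/kernel.h>
+#include <linux/sched.h>
 
 #include <asm/cpufeature.h>
 #include <asm/e820.h>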


* Re: sched/clock: Fix hotplug issue
  2017-01-19 13:36   ` Peter Zijlstra
  2017-01-19 16:54     ` Ingo Molnar
@ 2017-01-19 20:35     ` kbuild test robot
  2017-01-19 20:37     ` kbuild test robot
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: kbuild test robot @ 2017-01-19 20:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kbuild-all, Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]

Hi Peter,

[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20170119]
[cannot apply to tip/x86/core v4.10-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Peter-Zijlstra/sched-clock-Fix-hotplug-issue/20170120-034727
config: parisc-c3000_defconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=parisc 

All errors (new ones prefixed by >>):

   arch/parisc/kernel/setup.c: In function 'setup_arch':
>> arch/parisc/kernel/setup.c:180:2: error: implicit declaration of function 'clear_sched_clock_stable' [-Werror=implicit-function-declaration]
     clear_sched_clock_stable();
     ^~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/clear_sched_clock_stable +180 arch/parisc/kernel/setup.c

   174	#endif
   175	
   176	#if defined(CONFIG_VT) && defined(CONFIG_DUMMY_CONSOLE)
   177		conswitchp = &dummy_con;	/* we use do_take_over_console() later ! */
   178	#endif
   179	
 > 180		clear_sched_clock_stable();
   181	}
   182	
   183	/*

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 14576 bytes --]


* Re: sched/clock: Fix hotplug issue
  2017-01-19 13:36   ` Peter Zijlstra
  2017-01-19 16:54     ` Ingo Molnar
  2017-01-19 20:35     ` sched/clock: Fix hotplug issue kbuild test robot
@ 2017-01-19 20:37     ` kbuild test robot
  2017-01-20  6:36     ` [tip:sched/core] sched/clock: Fix hotplug crash tip-bot for Peter Zijlstra
  2017-02-27 12:30     ` tip.today - scheduler bam boom crash (cpu hotplug) Wanpeng Li
  4 siblings, 0 replies; 29+ messages in thread
From: kbuild test robot @ 2017-01-19 20:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kbuild-all, Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov

[-- Attachment #1: Type: text/plain, Size: 1367 bytes --]

Hi Peter,

[auto build test ERROR on tip/sched/core]
[also build test ERROR on next-20170119]
[cannot apply to tip/x86/core v4.10-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Peter-Zijlstra/sched-clock-Fix-hotplug-issue/20170120-034727
config: i386-randconfig-x003-201703 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   arch/x86/kernel/cpu/centaur.c: In function 'early_init_centaur':
>> arch/x86/kernel/cpu/centaur.c:108:2: error: implicit declaration of function 'clear_sched_clock_stable' [-Werror=implicit-function-declaration]
     clear_sched_clock_stable();
     ^~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/clear_sched_clock_stable +108 arch/x86/kernel/cpu/centaur.c

   102			break;
   103		}
   104	#ifdef CONFIG_X86_64
   105		set_cpu_cap(c, X86_FEATURE_SYSENTER32);
   106	#endif
   107	
 > 108		clear_sched_clock_stable();
   109	}
   110	
   111	static void init_centaur(struct cpuinfo_x86 *c)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22309 bytes --]


* [tip:sched/core] sched/clock: Fix hotplug crash
  2017-01-19 13:36   ` Peter Zijlstra
                       ` (2 preceding siblings ...)
  2017-01-19 20:37     ` kbuild test robot
@ 2017-01-20  6:36     ` tip-bot for Peter Zijlstra
  2017-02-27 12:30     ` tip.today - scheduler bam boom crash (cpu hotplug) Wanpeng Li
  4 siblings, 0 replies; 29+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-01-20  6:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, hpa, torvalds, bp, mingo, linux-kernel, peterz, efault

Commit-ID:  acb04058de49458010c44bb35b849d45113fd668
Gitweb:     http://git.kernel.org/tip/acb04058de49458010c44bb35b849d45113fd668
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 19 Jan 2017 14:36:33 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 20 Jan 2017 02:38:46 +0100

sched/clock: Fix hotplug crash

Mike reported that he could trigger the WARN_ON_ONCE() in
set_sched_clock_stable() using hotplug.

This exposed a fundamental problem with the interface: we should never
mark the TSC stable if we ever find it to be unstable. Therefore
set_sched_clock_stable() is a broken interface.

The reason it existed is that not having it is a pain: it means all
relevant architecture code needs to call clear_sched_clock_stable()
where appropriate.

Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK, ia64
and parisc are trivial in that they never called
set_sched_clock_stable(), so add an unconditional call to
clear_sched_clock_stable() to them.

For x86 the story is a lot more involved, and what this patch tries to
do is ensure we preserve the status quo. So even if Cyrix or Transmeta
have a usable TSC, they never called set_sched_clock_stable(), so they
now get explicitly marked unstable.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9881b024b7d7 ("sched/clock: Delay switching sched_clock to stable")
Link: http://lkml.kernel.org/r/20170119133633.GB6536@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/ia64/kernel/setup.c        |  2 ++
 arch/parisc/kernel/setup.c      |  2 ++
 arch/x86/kernel/cpu/amd.c       |  6 ++++--
 arch/x86/kernel/cpu/centaur.c   |  6 ++++--
 arch/x86/kernel/cpu/common.c    |  3 +++
 arch/x86/kernel/cpu/cyrix.c     |  2 ++
 arch/x86/kernel/cpu/intel.c     |  6 ++++--
 arch/x86/kernel/cpu/transmeta.c |  3 +++
 arch/x86/kernel/kvmclock.c      |  2 +-
 include/linux/sched.h           |  1 -
 kernel/sched/clock.c            | 29 ++++++++---------------------
 11 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index 7ec7acc..c483ece 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -619,6 +619,8 @@ setup_arch (char **cmdline_p)
 	check_sal_cache_flush();
 #endif
 	paging_init();
+
+	clear_sched_clock_stable();
 }
 
 /*
diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
index 2e66a88..068ed36 100644
--- a/arch/parisc/kernel/setup.c
+++ b/arch/parisc/kernel/setup.c
@@ -36,6 +36,7 @@
 #undef PCI_DEBUG
 #include <linux/proc_fs.h>
 #include <linux/export.h>
+#include <linux/sched.h>
 
 #include <asm/processor.h>
 #include <asm/sections.h>
@@ -176,6 +177,7 @@ void __init setup_arch(char **cmdline_p)
 	conswitchp = &dummy_con;	/* we use do_take_over_console() later ! */
 #endif
 
+	clear_sched_clock_stable();
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 71cae73..1bb253a 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -548,8 +548,10 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
-		if (!check_tsc_unstable())
-			set_sched_clock_stable();
+		if (check_tsc_unstable())
+			clear_sched_clock_stable();
+	} else {
+		clear_sched_clock_stable();
 	}
 
 	/* Bit 12 of 8000_0007 edx is accumulated power mechanism. */
diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
index 1661d8e..2c234a6 100644
--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -1,5 +1,5 @@
-#include <linux/bitops.h>
-#include <linux/kernel.h>
+
+#include <linux/sched.h>
 
 #include <asm/cpufeature.h>
 #include <asm/e820.h>
@@ -104,6 +104,8 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_64
 	set_cpu_cap(c, X86_FEATURE_SYSENTER32);
 #endif
+
+	clear_sched_clock_stable();
 }
 
 static void init_centaur(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index dc1697c..3457186 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -83,6 +83,7 @@ static void default_init(struct cpuinfo_x86 *c)
 			strcpy(c->x86_model_id, "386");
 	}
 #endif
+	clear_sched_clock_stable();
 }
 
 static const struct cpu_dev default_cpu = {
@@ -1055,6 +1056,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	 */
 	if (this_cpu->c_init)
 		this_cpu->c_init(c);
+	else
+		clear_sched_clock_stable();
 
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
index bd9dcd6..47416f9 100644
--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -9,6 +9,7 @@
 #include <asm/pci-direct.h>
 #include <asm/tsc.h>
 #include <asm/cpufeature.h>
+#include <linux/sched.h>
 
 #include "cpu.h"
 
@@ -183,6 +184,7 @@ static void early_init_cyrix(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
 		break;
 	}
+	clear_sched_clock_stable();
 }
 
 static void init_cyrix(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index fcd484d..26eaff4 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -124,8 +124,10 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
-		if (!check_tsc_unstable())
-			set_sched_clock_stable();
+		if (check_tsc_unstable())
+			clear_sched_clock_stable();
+	} else {
+		clear_sched_clock_stable();
 	}
 
 	/* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
diff --git a/arch/x86/kernel/cpu/transmeta.c b/arch/x86/kernel/cpu/transmeta.c
index 3417856..c1ea5b9 100644
--- a/arch/x86/kernel/cpu/transmeta.c
+++ b/arch/x86/kernel/cpu/transmeta.c
@@ -1,4 +1,5 @@
 #include <linux/kernel.h>
+#include <linux/sched.h>
 #include <linux/mm.h>
 #include <asm/cpufeature.h>
 #include <asm/msr.h>
@@ -14,6 +15,8 @@ static void early_init_transmeta(struct cpuinfo_x86 *c)
 		if (xlvl >= 0x80860001)
 			c->x86_capability[CPUID_8086_0001_EDX] = cpuid_edx(0x80860001);
 	}
+
+	clear_sched_clock_stable();
 }
 
 static void init_transmeta(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 2a5cafd..542710b 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -107,12 +107,12 @@ static inline void kvm_sched_clock_init(bool stable)
 {
 	if (!stable) {
 		pv_time_ops.sched_clock = kvm_clock_read;
+		clear_sched_clock_stable();
 		return;
 	}
 
 	kvm_sched_clock_offset = kvm_clock_read();
 	pv_time_ops.sched_clock = kvm_sched_clock_read;
-	set_sched_clock_stable();
 
 	printk(KERN_INFO "kvm-clock: using sched offset of %llu cycles\n",
 			kvm_sched_clock_offset);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a8daed9..69e6852 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2547,7 +2547,6 @@ extern void sched_clock_init_late(void);
  * is reliable after all:
  */
 extern int sched_clock_stable(void);
-extern void set_sched_clock_stable(void);
 extern void clear_sched_clock_stable(void);
 
 extern void sched_clock_tick(void);
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 7713b2b..ad64efe 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -83,8 +83,15 @@ void sched_clock_init(void)
 }
 
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
+/*
+ * We must start with !__sched_clock_stable because the unstable -> stable
+ * transition is accurate, while the stable -> unstable transition is not.
+ *
+ * Similarly we start with __sched_clock_stable_early, thereby assuming we
+ * will become stable, such that there's only a single 1 -> 0 transition.
+ */
 static DEFINE_STATIC_KEY_FALSE(__sched_clock_stable);
-static int __sched_clock_stable_early;
+static int __sched_clock_stable_early = 1;
 
 /*
  * We want: ktime_get_ns() + gtod_offset == sched_clock() + raw_offset
@@ -132,24 +139,6 @@ static void __set_sched_clock_stable(void)
 	tick_dep_clear(TICK_DEP_BIT_CLOCK_UNSTABLE);
 }
 
-void set_sched_clock_stable(void)
-{
-	__sched_clock_stable_early = 1;
-
-	smp_mb(); /* matches sched_clock_init_late() */
-
-	/*
-	 * This really should only be called early (before
-	 * sched_clock_init_late()) when guestimating our sched_clock() is
-	 * solid.
-	 *
-	 * After that we test stability and we can negate our guess using
-	 * clear_sched_clock_stable, possibly from a watchdog.
-	 */
-	if (WARN_ON_ONCE(sched_clock_running == 2))
-		__set_sched_clock_stable();
-}
-
 static void __clear_sched_clock_stable(struct work_struct *work)
 {
 	struct sched_clock_data *scd = this_scd();
@@ -199,8 +188,6 @@ void sched_clock_init_late(void)
 
 	if (__sched_clock_stable_early)
 		__set_sched_clock_stable();
-	else
-		__clear_sched_clock_stable(NULL);
 }
 
 /*


* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-01-19 13:36   ` Peter Zijlstra
                       ` (3 preceding siblings ...)
  2017-01-20  6:36     ` [tip:sched/core] sched/clock: Fix hotplug crash tip-bot for Peter Zijlstra
@ 2017-02-27 12:30     ` Wanpeng Li
  2017-02-27 12:35       ` Paolo Bonzini
  2017-02-27 12:43       ` Peter Zijlstra
  4 siblings, 2 replies; 29+ messages in thread
From: Wanpeng Li @ 2017-02-27 12:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov, Paolo Bonzini

Cc Paolo,
2017-01-19 21:36 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Thu, Jan 19, 2017 at 11:19:24AM +0100, Peter Zijlstra wrote:
>
>> I'll go have a prod. Thanks!
>
> This seems to cure it for me.
>
> ---
> Subject: sched/clock: Fix hotplug issue
>
> Mike reported that he could trigger the WARN_ON_ONCE() in
> set_sched_clock_stable() using hotplug.
>
> This exposed a fundamental problem with the interface: we should never
> mark the TSC stable if we ever find it to be unstable. Therefore
> set_sched_clock_stable() is a broken interface.
>
> The reason it existed is that not having it is a pain: it means all
> relevant architecture code needs to call clear_sched_clock_stable()
> where appropriate.
>
> Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK, ia64
> and parisc are trivial in that they never called
> set_sched_clock_stable(), so add an unconditional call to
> clear_sched_clock_stable() to them.
>
> For x86 the story is a lot more involved, and what this patch tries to
> do is ensure we preserve the status quo. So even if Cyrix or Transmeta
> have a usable TSC, they never called set_sched_clock_stable(), so they
> now get explicitly marked unstable.
>
> XXX: what about Xen ?
>
> Fixes: 9881b024b7d7 ("sched/clock: Delay switching sched_clock to stable")
> Reported-by: Mike Galbraith <efault@gmx.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/ia64/kernel/setup.c        |  2 ++
>  arch/parisc/kernel/setup.c      |  2 ++
>  arch/x86/kernel/cpu/amd.c       |  6 ++++--
>  arch/x86/kernel/cpu/centaur.c   |  2 ++
>  arch/x86/kernel/cpu/common.c    |  3 +++
>  arch/x86/kernel/cpu/cyrix.c     |  2 ++
>  arch/x86/kernel/cpu/intel.c     |  6 ++++--
>  arch/x86/kernel/cpu/transmeta.c |  3 +++
>  arch/x86/kernel/kvmclock.c      |  2 +-
>  include/linux/sched.h           |  1 -
>  kernel/sched/clock.c            | 29 ++++++++---------------------
>  11 files changed, 31 insertions(+), 27 deletions(-)
>
> diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
> index 7ec7acc..c483ece 100644
> --- a/arch/ia64/kernel/setup.c
> +++ b/arch/ia64/kernel/setup.c
> @@ -619,6 +619,8 @@ setup_arch (char **cmdline_p)
>         check_sal_cache_flush();
>  #endif
>         paging_init();
> +
> +       clear_sched_clock_stable();
>  }
>
>  /*
> diff --git a/arch/parisc/kernel/setup.c b/arch/parisc/kernel/setup.c
> index 2e66a88..068ed36 100644
> --- a/arch/parisc/kernel/setup.c
> +++ b/arch/parisc/kernel/setup.c
> @@ -36,6 +36,7 @@
>  #undef PCI_DEBUG
>  #include <linux/proc_fs.h>
>  #include <linux/export.h>
> +#include <linux/sched.h>
>
>  #include <asm/processor.h>
>  #include <asm/sections.h>
> @@ -176,6 +177,7 @@ void __init setup_arch(char **cmdline_p)
>         conswitchp = &dummy_con;        /* we use do_take_over_console() later ! */
>  #endif
>
> +       clear_sched_clock_stable();
>  }
>
>  /*
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 1d31672..80e657e 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -541,8 +541,10 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>         if (c->x86_power & (1 << 8)) {
>                 set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
>                 set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
> -               if (!check_tsc_unstable())
> -                       set_sched_clock_stable();
> +               if (check_tsc_unstable())
> +                       clear_sched_clock_stable();
> +       } else {
> +               clear_sched_clock_stable();
>         }
>
>         /* Bit 12 of 8000_0007 edx is accumulated power mechanism. */
> diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
> index 1661d8e..c4bab8c 100644
> --- a/arch/x86/kernel/cpu/centaur.c
> +++ b/arch/x86/kernel/cpu/centaur.c
> @@ -104,6 +104,8 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
>  #ifdef CONFIG_X86_64
>         set_cpu_cap(c, X86_FEATURE_SYSENTER32);
>  #endif
> +
> +       clear_sched_clock_stable();
>  }
>
>  static void init_centaur(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 9bab7a8..0bdb1ab 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -83,6 +83,7 @@ static void default_init(struct cpuinfo_x86 *c)
>                         strcpy(c->x86_model_id, "386");
>         }
>  #endif
> +       clear_sched_clock_stable();
>  }
>
>  static const struct cpu_dev default_cpu = {
> @@ -1055,6 +1056,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>          */
>         if (this_cpu->c_init)
>                 this_cpu->c_init(c);
> +       else
> +               clear_sched_clock_stable();
>
>         /* Disable the PN if appropriate */
>         squash_the_stupid_serial_number(c);
> diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
> index bd9dcd6..47416f9 100644
> --- a/arch/x86/kernel/cpu/cyrix.c
> +++ b/arch/x86/kernel/cpu/cyrix.c
> @@ -9,6 +9,7 @@
>  #include <asm/pci-direct.h>
>  #include <asm/tsc.h>
>  #include <asm/cpufeature.h>
> +#include <linux/sched.h>
>
>  #include "cpu.h"
>
> @@ -183,6 +184,7 @@ static void early_init_cyrix(struct cpuinfo_x86 *c)
>                 set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
>                 break;
>         }
> +       clear_sched_clock_stable();
>  }
>
>  static void init_cyrix(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 203f860..026c728 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -119,8 +119,10 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>         if (c->x86_power & (1 << 8)) {
>                 set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
>                 set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
> -               if (!check_tsc_unstable())
> -                       set_sched_clock_stable();
> +               if (check_tsc_unstable())
> +                       clear_sched_clock_stable();
> +       } else {
> +               clear_sched_clock_stable();
>         }
>
>         /* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
> diff --git a/arch/x86/kernel/cpu/transmeta.c b/arch/x86/kernel/cpu/transmeta.c
> index 3417856..c1ea5b9 100644
> --- a/arch/x86/kernel/cpu/transmeta.c
> +++ b/arch/x86/kernel/cpu/transmeta.c
> @@ -1,4 +1,5 @@
>  #include <linux/kernel.h>
> +#include <linux/sched.h>
>  #include <linux/mm.h>
>  #include <asm/cpufeature.h>
>  #include <asm/msr.h>
> @@ -14,6 +15,8 @@ static void early_init_transmeta(struct cpuinfo_x86 *c)
>                 if (xlvl >= 0x80860001)
>                         c->x86_capability[CPUID_8086_0001_EDX] = cpuid_edx(0x80860001);
>         }
> +
> +       clear_sched_clock_stable();
>  }
>
>  static void init_transmeta(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 2a5cafd..542710b 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -107,12 +107,12 @@ static inline void kvm_sched_clock_init(bool stable)
>  {
>         if (!stable) {
>                 pv_time_ops.sched_clock = kvm_clock_read;
> +               clear_sched_clock_stable();
>                 return;
>         }
>
>         kvm_sched_clock_offset = kvm_clock_read();
>         pv_time_ops.sched_clock = kvm_sched_clock_read;
> -       set_sched_clock_stable();

This results in the sched clock always being unstable for KVM guests
since there is currently no invariant TSC CPUID bit exposed to them.
The blockage happens for several reasons:

1) Migration to a host with a different TSC frequency.
2) Savevm: it is not safe to use the TSC for wall clock timer services.
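
(Aside: the "invariant TSC" CPUID bit in question is leaf 0x80000007,
EDX bit 8, the same bit the early_init_intel()/early_init_amd() code
above tests via c->x86_power & (1 << 8). A minimal guest-side check,
as a sketch using GCC's <cpuid.h>:)

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 0x80000007 (advanced power management); EDX bit 8 is the
	 * invariant/constant-TSC bit the kernel stores in c->x86_power. */
	if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx)) {
		printf("CPUID leaf 0x80000007 not available\n");
		return 1;
	}
	printf("invariant TSC: %s\n", (edx & (1 << 8)) ? "yes" : "no");
	return 0;
}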

How about something like below (untested):

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index bae6ea6..a61c477 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -115,6 +115,7 @@ static inline void kvm_sched_clock_init(bool stable)

     kvm_sched_clock_offset = kvm_clock_read();
     pv_time_ops.sched_clock = kvm_sched_clock_read;
+    hypervisor_sched_clock_stable();

     printk(KERN_INFO "kvm-clock: using sched offset of %llu cycles\n",
             kvm_sched_clock_offset);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 451e241..38c6edb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2499,6 +2499,10 @@ static inline void clear_sched_clock_stable(void)
 {
 }

+static inline void hypervisor_sched_clock_stable(void)
+{
+}
+
 static inline void sched_clock_idle_sleep_event(void)
 {
 }
@@ -2526,6 +2530,7 @@ extern void sched_clock_init_late(void);
  */
 extern int sched_clock_stable(void);
 extern void clear_sched_clock_stable(void);
+extern void hypervisor_sched_clock_stable(void);

 extern void sched_clock_tick(void);
 extern void sched_clock_idle_sleep_event(void);
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index ad64efe..a46639e 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -77,9 +77,15 @@ EXPORT_SYMBOL_GPL(sched_clock);

 __read_mostly int sched_clock_running;

+enum {
+    SCHED_CLOCK_INIT = 0,
+    HYPERVISOR_SCHED_CLOCK_STABLE,
+    SCHED_CLOCK_INIT_LATE
+};
+
 void sched_clock_init(void)
 {
-    sched_clock_running = 1;
+    sched_clock_running = SCHED_CLOCK_INIT;
 }

 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
@@ -170,13 +176,14 @@ void clear_sched_clock_stable(void)

     smp_mb(); /* matches sched_clock_init_late() */

-    if (sched_clock_running == 2)
+    if (sched_clock_running == SCHED_CLOCK_INIT_LATE)
         schedule_work(&sched_clock_work);
 }

 void sched_clock_init_late(void)
 {
-    sched_clock_running = 2;
+    if (sched_clock_running == SCHED_CLOCK_INIT)
+        sched_clock_running = SCHED_CLOCK_INIT_LATE;
     /*
      * Ensure that it is impossible to not do a static_key update.
      *
@@ -186,8 +193,15 @@ void sched_clock_init_late(void)
      */
     smp_mb(); /* matches {set,clear}_sched_clock_stable() */

-    if (__sched_clock_stable_early)
+    if (__sched_clock_stable_early ||
+        sched_clock_running == HYPERVISOR_SCHED_CLOCK_STABLE) {
         __set_sched_clock_stable();
+    }
+}
+
+void hypervisor_sched_clock_stable()
+{
+    sched_clock_running = HYPERVISOR_SCHED_CLOCK_STABLE;
 }

 /*

Regards,
Wanpeng Li


>
>         printk(KERN_INFO "kvm-clock: using sched offset of %llu cycles\n",
>                         kvm_sched_clock_offset);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 68b35dd..8a9bbb6 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2522,7 +2522,6 @@ extern void sched_clock_init_late(void);
>   * is reliable after all:
>   */
>  extern int sched_clock_stable(void);
> -extern void set_sched_clock_stable(void);
>  extern void clear_sched_clock_stable(void);
>
>  extern void sched_clock_tick(void);
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 7713b2b..ad64efe 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -83,8 +83,15 @@ void sched_clock_init(void)
>  }
>
>  #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> +/*
> + * We must start with !__sched_clock_stable because the unstable -> stable
> + * transition is accurate, while the stable -> unstable transition is not.
> + *
> + * Similarly we start with __sched_clock_stable_early, thereby assuming we
> + * will become stable, such that there's only a single 1 -> 0 transition.
> + */
>  static DEFINE_STATIC_KEY_FALSE(__sched_clock_stable);
> -static int __sched_clock_stable_early;
> +static int __sched_clock_stable_early = 1;
>
>  /*
>   * We want: ktime_get_ns() + gtod_offset == sched_clock() + raw_offset
> @@ -132,24 +139,6 @@ static void __set_sched_clock_stable(void)
>         tick_dep_clear(TICK_DEP_BIT_CLOCK_UNSTABLE);
>  }
>
> -void set_sched_clock_stable(void)
> -{
> -       __sched_clock_stable_early = 1;
> -
> -       smp_mb(); /* matches sched_clock_init_late() */
> -
> -       /*
> -        * This really should only be called early (before
> -        * sched_clock_init_late()) when guestimating our sched_clock() is
> -        * solid.
> -        *
> -        * After that we test stability and we can negate our guess using
> -        * clear_sched_clock_stable, possibly from a watchdog.
> -        */
> -       if (WARN_ON_ONCE(sched_clock_running == 2))
> -               __set_sched_clock_stable();
> -}
> -
>  static void __clear_sched_clock_stable(struct work_struct *work)
>  {
>         struct sched_clock_data *scd = this_scd();
> @@ -199,8 +188,6 @@ void sched_clock_init_late(void)
>
>         if (__sched_clock_stable_early)
>                 __set_sched_clock_stable();
> -       else
> -               __clear_sched_clock_stable(NULL);
>  }
>
>  /*


* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 12:30     ` tip.today - scheduler bam boom crash (cpu hotplug) Wanpeng Li
@ 2017-02-27 12:35       ` Paolo Bonzini
  2017-02-27 12:43       ` Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2017-02-27 12:35 UTC (permalink / raw)
  To: Wanpeng Li, Peter Zijlstra
  Cc: Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner, Borislav Petkov



On 27/02/2017 13:30, Wanpeng Li wrote:
> This results in the sched clock always being unstable for KVM guests
> since there is currently no invariant TSC CPUID bit exposed to them.
> The blockage happens for several reasons:
> 
> 1) Migration to a host with a different TSC frequency.
> 2) Savevm: it is not safe to use the TSC for wall clock timer services.

Right, the purpose of kvmclock is basically to ensure a stable clock
without using TSC.  However, I know zilch about kernel/sched/clock.c so
I cannot really understand your patch.

Paolo

> How about something like below (untested):
> 
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index bae6ea6..a61c477 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -115,6 +115,7 @@ static inline void kvm_sched_clock_init(bool stable)
> 
>      kvm_sched_clock_offset = kvm_clock_read();
>      pv_time_ops.sched_clock = kvm_sched_clock_read;
> +    hypervisor_sched_clock_stable();
> 
>      printk(KERN_INFO "kvm-clock: using sched offset of %llu cycles\n",
>              kvm_sched_clock_offset);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 451e241..38c6edb 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2499,6 +2499,10 @@ static inline void clear_sched_clock_stable(void)
>  {
>  }
> 
> +static inline void hypervisor_sched_clock_stable(void)
> +{
> +}
> +
>  static inline void sched_clock_idle_sleep_event(void)
>  {
>  }
> @@ -2526,6 +2530,7 @@ extern void sched_clock_init_late(void);
>   */
>  extern int sched_clock_stable(void);
>  extern void clear_sched_clock_stable(void);
> +extern void hypervisor_sched_clock_stable(void);
> 
>  extern void sched_clock_tick(void);
>  extern void sched_clock_idle_sleep_event(void);
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index ad64efe..a46639e 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -77,9 +77,15 @@ EXPORT_SYMBOL_GPL(sched_clock);
> 
>  __read_mostly int sched_clock_running;
> 
> +enum {
> +    SCHED_CLOCK_INIT = 0,
> +    HYPERVISOR_SCHED_CLOCK_STABLE,
> +    SCHED_CLOCK_INIT_LATE
> +};
> +
>  void sched_clock_init(void)
>  {
> -    sched_clock_running = 1;
> +    sched_clock_running = SCHED_CLOCK_INIT;
>  }
> 
>  #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
> @@ -170,13 +176,14 @@ void clear_sched_clock_stable(void)
> 
>      smp_mb(); /* matches sched_clock_init_late() */
> 
> -    if (sched_clock_running == 2)
> +    if (sched_clock_running == SCHED_CLOCK_INIT_LATE)
>          schedule_work(&sched_clock_work);
>  }
> 
>  void sched_clock_init_late(void)
>  {
> -    sched_clock_running = 2;
> +    if (sched_clock_running == SCHED_CLOCK_INIT)
> +        sched_clock_running = SCHED_CLOCK_INIT_LATE;
>      /*
>       * Ensure that it is impossible to not do a static_key update.
>       *
> @@ -186,8 +193,15 @@ void sched_clock_init_late(void)
>       */
>      smp_mb(); /* matches {set,clear}_sched_clock_stable() */
> 
> -    if (__sched_clock_stable_early)
> +    if (__sched_clock_stable_early ||
> +        sched_clock_running == HYPERVISOR_SCHED_CLOCK_STABLE) {
>          __set_sched_clock_stable();
> +    }
> +}
> +
> +void hypervisor_sched_clock_stable()
> +{
> +    sched_clock_running = HYPERVISOR_SCHED_CLOCK_STABLE;
>  }
> 
>  /*

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 12:30     ` tip.today - scheduler bam boom crash (cpu hotplug) Wanpeng Li
  2017-02-27 12:35       ` Paolo Bonzini
@ 2017-02-27 12:43       ` Peter Zijlstra
  2017-02-27 12:50         ` Paolo Bonzini
  2017-02-27 13:48         ` Wanpeng Li
  1 sibling, 2 replies; 29+ messages in thread
From: Peter Zijlstra @ 2017-02-27 12:43 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov, Paolo Bonzini

On Mon, Feb 27, 2017 at 08:30:11PM +0800, Wanpeng Li wrote:

> > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> > index 2a5cafd..542710b 100644
> > --- a/arch/x86/kernel/kvmclock.c
> > +++ b/arch/x86/kernel/kvmclock.c
> > @@ -107,12 +107,12 @@ static inline void kvm_sched_clock_init(bool stable)
> >  {
> >         if (!stable) {
> >                 pv_time_ops.sched_clock = kvm_clock_read;
> > +               clear_sched_clock_stable();
> >                 return;
> >         }
> >
> >         kvm_sched_clock_offset = kvm_clock_read();
> >         pv_time_ops.sched_clock = kvm_sched_clock_read;
> > -       set_sched_clock_stable();
> 
> This results in sched clock always unstable for kvm guest since there
> is no invariant tsc cpuid bit exposed for kvm guest currently. 

What the heck is KVM_FEATURE_CLOCKSOURCE_STABLE_BIT /
PVCLOCK_TSC_STABLE_BIT about then?

> How about something like below(untested):

I have no clue what that tries to do.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 12:43       ` Peter Zijlstra
@ 2017-02-27 12:50         ` Paolo Bonzini
  2017-02-27 13:04           ` Peter Zijlstra
  2017-02-27 13:48         ` Wanpeng Li
  1 sibling, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2017-02-27 12:50 UTC (permalink / raw)
  To: Peter Zijlstra, Wanpeng Li
  Cc: Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner, Borislav Petkov



On 27/02/2017 13:43, Peter Zijlstra wrote:
> On Mon, Feb 27, 2017 at 08:30:11PM +0800, Wanpeng Li wrote:
> 
>>> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
>>> index 2a5cafd..542710b 100644
>>> --- a/arch/x86/kernel/kvmclock.c
>>> +++ b/arch/x86/kernel/kvmclock.c
>>> @@ -107,12 +107,12 @@ static inline void kvm_sched_clock_init(bool stable)
>>>  {
>>>         if (!stable) {
>>>                 pv_time_ops.sched_clock = kvm_clock_read;
>>> +               clear_sched_clock_stable();
>>>                 return;
>>>         }
>>>
>>>         kvm_sched_clock_offset = kvm_clock_read();
>>>         pv_time_ops.sched_clock = kvm_sched_clock_read;
>>> -       set_sched_clock_stable();
>>
>> This results in sched clock always unstable for kvm guest since there
>> is no invariant tsc cpuid bit exposed for kvm guest currently. 
> 
> What the heck is KVM_FEATURE_CLOCKSOURCE_STABLE_BIT /
> PVCLOCK_TSC_STABLE_BIT about then?

It checks that all the bugs in the host have been ironed out, and that
the host itself supports invtsc.

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 12:50         ` Paolo Bonzini
@ 2017-02-27 13:04           ` Peter Zijlstra
  2017-02-27 15:27             ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2017-02-27 13:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov

On Mon, Feb 27, 2017 at 01:50:29PM +0100, Paolo Bonzini wrote:
> 
> 
> On 27/02/2017 13:43, Peter Zijlstra wrote:
> > On Mon, Feb 27, 2017 at 08:30:11PM +0800, Wanpeng Li wrote:
> > 
> >>> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> >>> index 2a5cafd..542710b 100644
> >>> --- a/arch/x86/kernel/kvmclock.c
> >>> +++ b/arch/x86/kernel/kvmclock.c
> >>> @@ -107,12 +107,12 @@ static inline void kvm_sched_clock_init(bool stable)
> >>>  {
> >>>         if (!stable) {
> >>>                 pv_time_ops.sched_clock = kvm_clock_read;
> >>> +               clear_sched_clock_stable();
> >>>                 return;
> >>>         }
> >>>
> >>>         kvm_sched_clock_offset = kvm_clock_read();
> >>>         pv_time_ops.sched_clock = kvm_sched_clock_read;
> >>> -       set_sched_clock_stable();
> >>
> >> This results in sched clock always unstable for kvm guest since there
> >> is no invariant tsc cpuid bit exposed for kvm guest currently. 
> > 
> > What the heck is KVM_FEATURE_CLOCKSOURCE_STABLE_BIT /
> > PVCLOCK_TSC_STABLE_BIT about then?
> 
> It checks that all the bugs in the host have been ironed out, and that
> the host itself supports invtsc.

But what does it mean if that is not so? That is, will kvm_clock_read()
still be stable even if !stable?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 12:43       ` Peter Zijlstra
  2017-02-27 12:50         ` Paolo Bonzini
@ 2017-02-27 13:48         ` Wanpeng Li
  2017-02-27 14:05           ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Wanpeng Li @ 2017-02-27 13:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov, Paolo Bonzini

2017-02-27 20:43 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Mon, Feb 27, 2017 at 08:30:11PM +0800, Wanpeng Li wrote:
>
>> > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
>> > index 2a5cafd..542710b 100644
>> > --- a/arch/x86/kernel/kvmclock.c
>> > +++ b/arch/x86/kernel/kvmclock.c
>> > @@ -107,12 +107,12 @@ static inline void kvm_sched_clock_init(bool stable)
>> >  {
>> >         if (!stable) {
>> >                 pv_time_ops.sched_clock = kvm_clock_read;
>> > +               clear_sched_clock_stable();
>> >                 return;
>> >         }
>> >
>> >         kvm_sched_clock_offset = kvm_clock_read();
>> >         pv_time_ops.sched_clock = kvm_sched_clock_read;
>> > -       set_sched_clock_stable();
>>
>> This results in sched clock always unstable for kvm guest since there
>> is no invariant tsc cpuid bit exposed for kvm guest currently.
>
> What the heck is KVM_FEATURE_CLOCKSOURCE_STABLE_BIT /
> PVCLOCK_TSC_STABLE_BIT about then?
>
>> How about something like below(untested):
>
> I have no clue what that tries to do.

The patch tries to mark the sched clock stable if
KVM_FEATURE_CLOCKSOURCE_STABLE_BIT / PVCLOCK_TSC_STABLE_BIT is set in
the kvm guest, even if there is no invariant tsc cpuid bit.

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 13:48         ` Wanpeng Li
@ 2017-02-27 14:05           ` Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2017-02-27 14:05 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov, Paolo Bonzini

On Mon, Feb 27, 2017 at 09:48:35PM +0800, Wanpeng Li wrote:
> 2017-02-27 20:43 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> > I have no clue what that tries to do.
> 
> The patch tries to mark the sched clock stable if
> KVM_FEATURE_CLOCKSOURCE_STABLE_BIT / PVCLOCK_TSC_STABLE_BIT is set in
> the kvm guest, even if there is no invariant tsc cpuid bit.

I just ripped out the interface to 'set stable' for good reasons. Also,
the branch you add it to doesn't in fact do clear_stable -- so who does?

Also, the sched/clock.c code shouldn't know about hypervisor anything.

So whatever it is you're doing, you're doing it wrong.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 13:04           ` Peter Zijlstra
@ 2017-02-27 15:27             ` Paolo Bonzini
  2017-02-27 15:59               ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2017-02-27 15:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Wanpeng Li, Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov



On 27/02/2017 14:04, Peter Zijlstra wrote:
>>>> This results in sched clock always unstable for kvm guest since there
>>>> is no invariant tsc cpuid bit exposed for kvm guest currently. 
>>> What the heck is KVM_FEATURE_CLOCKSOURCE_STABLE_BIT /
>>> PVCLOCK_TSC_STABLE_BIT about then?
>> It checks that all the bugs in the host have been ironed out, and that
>> the host itself supports invtsc.
> But what does it mean if that is not so? That is, will kvm_clock_read()
> still be stable even if !stable?

If kvmclock is !stable, nobody should have set the sched clock to
stable, to begin with.

However, if kvmclock is stable, we know that the sched clock is stable.

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 15:27             ` Paolo Bonzini
@ 2017-02-27 15:59               ` Peter Zijlstra
  2017-02-27 16:11                 ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2017-02-27 15:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov

On Mon, Feb 27, 2017 at 04:27:32PM +0100, Paolo Bonzini wrote:
> 
> 
> On 27/02/2017 14:04, Peter Zijlstra wrote:
> >>>> This results in sched clock always unstable for kvm guest since there
> >>>> is no invariant tsc cpuid bit exposed for kvm guest currently. 
> >>> What the heck is KVM_FEATURE_CLOCKSOURCE_STABLE_BIT /
> >>> PVCLOCK_TSC_STABLE_BIT about then?
> >> It checks that all the bugs in the host have been ironed out, and that
> >> the host itself supports invtsc.
> > But what does it mean if that is not so? That is, will kvm_clock_read()
> > still be stable even if !stable?
> 
> If kvmclock is !stable, nobody should have set the sched clock to
> stable, to begin with.

OK, so if !KVM_FEATURE_CLOCKSOURCE_STABLE_BIT nothing is stable, but if
it is set, TSC might still not be stable, but kvm_clock_read() is.

> However, if kvmclock is stable, we know that the sched clock is stable.

Right, so the problem is that we only ever want to allow marking
unstable -- once it's found unstable, for whatever reason, we should
never allow going stable. The corollary of this proposition is that we
must start out assuming it will become stable. And to avoid actually
using unstable TSC we do a 3 state bringup:

 1) sched_clock_running = 0, __stable_early = 1, __stable = 0
 2) sched_clock_running = 1 (__stable is effective, iow, we run unstable)
 3) sched_clock_running = 2 (__stable <- __stable_early)

2) happens 'early' but is 'safe'.
3) happens 'late', after we've brought up SMP and probed TSC

Between there, we should have detected the most common TSC wreckage and
made sure to not then switch to 'stable' at 3.
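
To make the three states concrete, here is a minimal user-space model of
the transitions described above. It is only an illustration: the variable
names mirror kernel/sched/clock.c and the shorthand in the list, but none
of this is the actual kernel code.

/* cc -o clock-model clock-model.c && ./clock-model */
#include <stdio.h>

static int sched_clock_running;		/* 0 -> 1 -> 2, never backwards */
static int __stable_early = 1;		/* guess: assume we end up stable */
static int __stable;			/* what sched_clock() users honour */

static void clear_sched_clock_stable(void)	/* may fire at any time */
{
	__stable_early = 0;
	if (sched_clock_running == 2)		/* after late init: takes effect now */
		__stable = 0;
}

static void sched_clock_init(void)		/* state 2: 'early' but 'safe' */
{
	sched_clock_running = 1;		/* __stable (still 0) is effective */
}

static void sched_clock_init_late(void)		/* state 3: after SMP + TSC probing */
{
	sched_clock_running = 2;
	__stable = __stable_early;		/* only place __stable can be raised */
}

int main(void)
{
	sched_clock_init();
	/*
	 * SMP bringup and TSC probing happen here; any detected wreckage
	 * calls clear_sched_clock_stable() and withdraws the guess.
	 */
	sched_clock_init_late();
	printf("sched_clock stable: %d\n", __stable);
	return 0;
}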

Now the problem appears to be that we assume sched_clock will use RDTSC
(native_sched_clock) while sched_clock is a paravirt op.

Now, I've not yet figured out the ordering between when we set
pv_time_ops.sched_clock and when we do the 'normal' TSC init stuff.

But it appears to me, we should not be calling
clear_sched_clock_stable() on TSC bits when we don't end up using
native_sched_clock().

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 15:59               ` Peter Zijlstra
@ 2017-02-27 16:11                 ` Paolo Bonzini
  2017-02-27 16:36                   ` Peter Zijlstra
  2017-02-28  1:51                   ` Wanpeng Li
  0 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2017-02-27 16:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Wanpeng Li, Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov



On 27/02/2017 16:59, Peter Zijlstra wrote:
> OK, so if !KVM_FEATURE_CLOCKSOURCE_STABLE_BIT nothing is stable, but if
> it is set, TSC might still not be stable, but kvm_clock_read() is.
> 
>> However, if kvmclock is stable, we know that the sched clock is stable.
> Right, so the problem is that we only ever want to allow marking
> unstable -- once it's found unstable, for whatever reason, we should
> never allow going stable. The corollary of this proposition is that we
> must start out assuming it will become stable. And to avoid actually
> using unstable TSC we do a 3 state bringup:
> 
>  1) sched_clock_running = 0, __stable_early = 1, __stable = 0
>  2) sched_clock_running = 1 (__stable is effective, iow, we run unstable)
>  3) sched_clock_running = 2 (__stable <- __stable_early)
> 
> 2) happens 'early' but is 'safe'.
> 3) happens 'late', after we've brought up SMP and probed TSC
> 
> Between there, we should have detected the most common TSC wreckage and
> made sure to not then switch to 'stable' at 3.
> 
> Now the problem appears to be that we assume sched_clock will use RDTSC
> (native_sched_clock) while sched_clock is a paravirt op.
> 
> Now, I've not yet figured out the ordering between when we set
> pv_time_ops.sched_clock and when we do the 'normal' TSC init stuff.

I think the ordering is fine:

- pv_time_ops.sched_clock is set here:

	start_kernel (init/main.c line 509)
	  setup_arch
	    kvmclock_init
	      kvm_sched_clock_init

- TSC can be declared unstable only after this:

	start_kernel (init/main.c line 628)
	  late_time_init
	    tsc_init

So by the time the tsc_cs_mark_unstable or mark_tsc_unstable can call
clear_sched_clock_stable, pv_time_ops.sched_clock has been set.

> But it appears to me, we should not be calling
> clear_sched_clock_stable() on TSC bits when we don't end up using
> native_sched_clock().

Yes, this makes sense.

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 16:11                 ` Paolo Bonzini
@ 2017-02-27 16:36                   ` Peter Zijlstra
  2017-02-27 17:27                     ` Paolo Bonzini
  2017-02-28  1:51                   ` Wanpeng Li
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2017-02-27 16:36 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov

On Mon, Feb 27, 2017 at 05:11:34PM +0100, Paolo Bonzini wrote:
> On 27/02/2017 16:59, Peter Zijlstra wrote:

> > Now, I've not yet figured out the ordering between when we set
> > pv_time_ops.sched_clock and when we do the 'normal' TSC init stuff.
> 
> I think the ordering is fine:
> 
> - pv_time_ops.sched_clock is set here:
> 
> 	start_kernel (init/main.c line 509)
> 	  setup_arch
> 	    kvmclock_init
> 	      kvm_sched_clock_init
> 
> - TSC can be declared unstable only after this:
> 
> 	start_kernel (init/main.c line 628)
> 	  late_time_init
> 	    tsc_init
> 
> So by the time the tsc_cs_mark_unstable or mark_tsc_unstable can call
> clear_sched_clock_stable, pv_time_ops.sched_clock has been set.
> 
> > But it appears to me, we should not be calling
> > clear_sched_clock_stable() on TSC bits when we don't end up using
> > native_sched_clock().
> 
> Yes, this makes sense.

Something like the below then (completely untested for now) has a chance
of working. It does, however, change behaviour a little.

I'm trying to debug something else; after that I'll give this a little
more consideration.

---
 arch/x86/kernel/cpu/amd.c       |  4 +---
 arch/x86/kernel/cpu/centaur.c   |  2 +-
 arch/x86/kernel/cpu/common.c    |  4 ++--
 arch/x86/kernel/cpu/cyrix.c     |  2 +-
 arch/x86/kernel/cpu/intel.c     |  4 +---
 arch/x86/kernel/cpu/transmeta.c |  2 +-
 arch/x86/kernel/tsc.c           | 33 +++++++++++++++++++++------------
 7 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 4e95b2e0d95f..bc3bbb6a8ab0 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -555,10 +555,8 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
-		if (check_tsc_unstable())
-			clear_sched_clock_stable();
 	} else {
-		clear_sched_clock_stable();
+		mark_tsc_unstable("not invariant");
 	}
 
 	/* Bit 12 of 8000_0007 edx is accumulated power mechanism. */
diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
index 2c234a6d94c4..0fdff183aa30 100644
--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -105,7 +105,7 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
 	set_cpu_cap(c, X86_FEATURE_SYSENTER32);
 #endif
 
-	clear_sched_clock_stable();
+	mark_tsc_unstable("not invariant");
 }
 
 static void init_centaur(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f07005e6f461..2b7ff648ea25 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -86,7 +86,7 @@ static void default_init(struct cpuinfo_x86 *c)
 			strcpy(c->x86_model_id, "386");
 	}
 #endif
-	clear_sched_clock_stable();
+	mark_tsc_unstable("not invariant");
 }
 
 static const struct cpu_dev default_cpu = {
@@ -1076,7 +1076,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	if (this_cpu->c_init)
 		this_cpu->c_init(c);
 	else
-		clear_sched_clock_stable();
+		mark_tsc_unstable("not invariant");
 
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
index 47416f959a48..35057d67e864 100644
--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -184,7 +184,7 @@ static void early_init_cyrix(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
 		break;
 	}
-	clear_sched_clock_stable();
+	mark_tsc_unstable("not invariant");
 }
 
 static void init_cyrix(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 017ecd3bb553..e0e192e43a4c 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -161,10 +161,8 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
-		if (check_tsc_unstable())
-			clear_sched_clock_stable();
 	} else {
-		clear_sched_clock_stable();
+		mark_tsc_unstable("not invariant");
 	}
 
 	/* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
diff --git a/arch/x86/kernel/cpu/transmeta.c b/arch/x86/kernel/cpu/transmeta.c
index c1ea5b999839..fa243ffd1c84 100644
--- a/arch/x86/kernel/cpu/transmeta.c
+++ b/arch/x86/kernel/cpu/transmeta.c
@@ -16,7 +16,7 @@ static void early_init_transmeta(struct cpuinfo_x86 *c)
 			c->x86_capability[CPUID_8086_0001_EDX] = cpuid_edx(0x80860001);
 	}
 
-	clear_sched_clock_stable();
+	mark_tsc_unstable("not invariant");
 }
 
 static void init_transmeta(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2724dc82f992..bf6627aff54d 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -326,9 +326,16 @@ unsigned long long sched_clock(void)
 {
 	return paravirt_sched_clock();
 }
+
+static inline bool using_native_sched_clock(void)
+{
+	return pv_time_ops.sched_clock == native_sched_clock;
+}
 #else
 unsigned long long
 sched_clock(void) __attribute__((alias("native_sched_clock")));
+
+static inline bool using_native_sched_clock(void) { return true; }
 #endif
 
 int check_tsc_unstable(void)
@@ -1112,7 +1119,8 @@ static void tsc_cs_mark_unstable(struct clocksource *cs)
 	if (tsc_unstable)
 		return;
 	tsc_unstable = 1;
-	clear_sched_clock_stable();
+	if (using_native_sched_clock())
+		clear_sched_clock_stable();
 	disable_sched_clock_irqtime();
 	pr_info("Marking TSC unstable due to clocksource watchdog\n");
 }
@@ -1134,18 +1142,19 @@ static struct clocksource clocksource_tsc = {
 
 void mark_tsc_unstable(char *reason)
 {
-	if (!tsc_unstable) {
-		tsc_unstable = 1;
+	if (tsc_unstable)
+		return;
+	tsc_unstable = 1;
+	if (using_native_sched_clock())
 		clear_sched_clock_stable();
-		disable_sched_clock_irqtime();
-		pr_info("Marking TSC unstable due to %s\n", reason);
-		/* Change only the rating, when not registered */
-		if (clocksource_tsc.mult)
-			clocksource_mark_unstable(&clocksource_tsc);
-		else {
-			clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
-			clocksource_tsc.rating = 0;
-		}
+	disable_sched_clock_irqtime();
+	pr_info("Marking TSC unstable due to %s\n", reason);
+	/* Change only the rating, when not registered */
+	if (clocksource_tsc.mult)
+		clocksource_mark_unstable(&clocksource_tsc);
+	else {
+		clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
+		clocksource_tsc.rating = 0;
 	}
 }
 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 16:36                   ` Peter Zijlstra
@ 2017-02-27 17:27                     ` Paolo Bonzini
  2017-02-27 17:40                       ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2017-02-27 17:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Wanpeng Li, Mike Galbraith, LKML, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov



On 27/02/2017 17:36, Peter Zijlstra wrote:
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 4e95b2e0d95f..bc3bbb6a8ab0 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -555,10 +555,8 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>  	if (c->x86_power & (1 << 8)) {
>  		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
>  		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
> -		if (check_tsc_unstable())
> -			clear_sched_clock_stable();
>  	} else {
> -		clear_sched_clock_stable();
> +		mark_tsc_unstable("not invariant");
>  	}
>  
>  	/* Bit 12 of 8000_0007 edx is accumulated power mechanism. */
> diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
> index 2c234a6d94c4..0fdff183aa30 100644
> --- a/arch/x86/kernel/cpu/centaur.c
> +++ b/arch/x86/kernel/cpu/centaur.c
> @@ -105,7 +105,7 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
>  	set_cpu_cap(c, X86_FEATURE_SYSENTER32);
>  #endif
>  
> -	clear_sched_clock_stable();
> +	mark_tsc_unstable("not invariant");
>  }
>  
>  static void init_centaur(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index f07005e6f461..2b7ff648ea25 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -86,7 +86,7 @@ static void default_init(struct cpuinfo_x86 *c)
>  			strcpy(c->x86_model_id, "386");
>  	}
>  #endif
> -	clear_sched_clock_stable();
> +	mark_tsc_unstable("not invariant");
>  }
>  
>  static const struct cpu_dev default_cpu = {
> @@ -1076,7 +1076,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>  	if (this_cpu->c_init)
>  		this_cpu->c_init(c);
>  	else
> -		clear_sched_clock_stable();
> +		mark_tsc_unstable("not invariant");
>  
>  	/* Disable the PN if appropriate */
>  	squash_the_stupid_serial_number(c);
> diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
> index 47416f959a48..35057d67e864 100644
> --- a/arch/x86/kernel/cpu/cyrix.c
> +++ b/arch/x86/kernel/cpu/cyrix.c
> @@ -184,7 +184,7 @@ static void early_init_cyrix(struct cpuinfo_x86 *c)
>  		set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
>  		break;
>  	}
> -	clear_sched_clock_stable();
> +	mark_tsc_unstable("not invariant");
>  }
>  
>  static void init_cyrix(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 017ecd3bb553..e0e192e43a4c 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -161,10 +161,8 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>  	if (c->x86_power & (1 << 8)) {
>  		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
>  		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
> -		if (check_tsc_unstable())
> -			clear_sched_clock_stable();
>  	} else {
> -		clear_sched_clock_stable();
> +		mark_tsc_unstable("not invariant");
>  	}
>  
>  	/* Penwell and Cloverview have the TSC which doesn't sleep on S3 */

Doh, these are called _before_ kvmclock_init.  But perhaps they can all
be replaced by something like this:

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2724dc82f992..3080b6877190 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1398,6 +1398,9 @@ void __init tsc_init(void)

 	use_tsc_delay();

+	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+		mark_tsc_unstable("not invariant");
+
 	if (unsynchronized_tsc())
 		mark_tsc_unstable("TSCs unsynchronized");

The rest seems nice.

Paolo

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 17:27                     ` Paolo Bonzini
@ 2017-02-27 17:40                       ` Thomas Gleixner
  2017-02-27 19:06                         ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2017-02-27 17:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, Wanpeng Li, Mike Galbraith, LKML, Ingo Molnar,
	Borislav Petkov

On Mon, 27 Feb 2017, Paolo Bonzini wrote:
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 2724dc82f992..3080b6877190 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1398,6 +1398,9 @@ void __init tsc_init(void)
> 
>  	use_tsc_delay();
> 
> +	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
> +		mark_tsc_unstable("not invariant");

Errm, no. 

That makes TSC unusable for systems which do not go into C/P states in
which the TSC stops. There is a world outside KVM ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 17:40                       ` Thomas Gleixner
@ 2017-02-27 19:06                         ` Paolo Bonzini
  2017-02-27 20:35                           ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2017-02-27 19:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Wanpeng Li, Mike Galbraith, LKML, Ingo Molnar,
	Borislav Petkov



----- Original Message -----
> From: "Thomas Gleixner" <tglx@linutronix.de>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: "Peter Zijlstra" <peterz@infradead.org>, "Wanpeng Li" <kernellwp@gmail.com>, "Mike Galbraith" <efault@gmx.de>,
> "LKML" <linux-kernel@vger.kernel.org>, "Ingo Molnar" <mingo@elte.hu>, "Borislav Petkov" <bp@alien8.de>
> Sent: Monday, February 27, 2017 6:40:46 PM
> Subject: Re: tip.today - scheduler bam boom crash (cpu hotplug)
> 
> On Mon, 27 Feb 2017, Paolo Bonzini wrote:
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 2724dc82f992..3080b6877190 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1398,6 +1398,9 @@ void __init tsc_init(void)
> > 
> >  	use_tsc_delay();
> > 
> > +	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
> > +		mark_tsc_unstable("not invariant");
> 
> Errm, no.
> 
> That makes TSC unusable for systems which do not go into C/P states in
> which the TSC stops. There is a world outside KVM ....

Actually I was surprised too by Peter's patch, as it was adding
mark_tsc_unstable pretty much everywhere that didn't have nonstop TSC.
But hopefully it would still be okay to call clear_sched_clock_stable
in tsc_init, in the same way.
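
A minimal sketch of that alternative (untested; it sits in the same place
in tsc_init() as the earlier hunk in this thread, and reuses boot_cpu_has()
and clear_sched_clock_stable() exactly as they already appear above):

	use_tsc_delay();

	/*
	 * Sketch: leave the TSC clocksource rating alone and only drop
	 * the sched_clock stability guess when invariant TSC is missing.
	 */
	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
		clear_sched_clock_stable();

	if (unsynchronized_tsc())
		mark_tsc_unstable("TSCs unsynchronized");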

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 19:06                         ` Paolo Bonzini
@ 2017-02-27 20:35                           ` Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2017-02-27 20:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Thomas Gleixner, Wanpeng Li, Mike Galbraith, LKML, Ingo Molnar,
	Borislav Petkov

On Mon, Feb 27, 2017 at 02:06:27PM -0500, Paolo Bonzini wrote:
> > > +	if (!boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
> > > +		mark_tsc_unstable("not invariant");
> > 
> > Errm, no.
> > 
> > That makes TSC unusable for systems which do not go into C/P states in
> > which the TSC stops. There is a world outside KVM ....
> 
> Actually I was surprised too by Peter's patch, as it was adding
> mark_tsc_unstable pretty much everywhere that didn't have nonstop TSC.
> But hopefully it would still be okay to call clear_sched_clock_stable
> in tsc_init, in the same way.

I was preserving the sched_clock behaviour. That said, maybe we can
simply remove all those extra checks and only rely on the current
mark_tsc_unstable() calls.

I'll just have to sit down and consider the various cases..

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-27 16:11                 ` Paolo Bonzini
  2017-02-27 16:36                   ` Peter Zijlstra
@ 2017-02-28  1:51                   ` Wanpeng Li
  2017-02-28  8:08                     ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Wanpeng Li @ 2017-02-28  1:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, Mike Galbraith, LKML, Ingo Molnar,
	Thomas Gleixner, Borislav Petkov

2017-02-28 0:11 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 27/02/2017 16:59, Peter Zijlstra wrote:
>> OK, so if !KVM_FEATURE_CLOCKSOURCE_STABLE_BIT nothing is stable, but if
>> it is set, TSC might still not be stable, but kvm_clock_read() is.
>>
>>> However, if kvmclock is stable, we know that the sched clock is stable.
>> Right, so the problem is that we only ever want to allow marking
>> unstable -- once it's found unstable, for whatever reason, we should
>> never allow going stable. The corollary of this proposition is that we
>> must start out assuming it will become stable. And to avoid actually
>> using unstable TSC we do a 3 state bringup:
>>
>>  1) sched_clock_running = 0, __stable_early = 1, __stable = 0
>>  2) sched_clock_running = 1 (__stable is effective, iow, we run unstable)
>>  3) sched_clock_running = 2 (__stable <- __stable_early)
>>
>> 2) happens 'early' but is 'safe'.
>> 3) happens 'late', after we've brought up SMP and probed TSC
>>
>> Between there, we should have detected the most common TSC wreckage and
>> made sure to not then switch to 'stable' at 3.
>>
>> Now the problem appears to be that we assume sched_clock will use RDTSC
>> (native_sched_clock) while sched_clock is a paravirt op.
>>
>> Now, I've not yet figured out the ordering between when we set
>> pv_time_ops.sched_clock and when we do the 'normal' TSC init stuff.
>
> I think the ordering is fine:
>
> - pv_time_ops.sched_clock is set here:
>
>         start_kernel (init/main.c line 509)
>           setup_arch
>             kvmclock_init
>               kvm_sched_clock_init
>
> - TSC can be declared unstable only after this:
>
>         start_kernel (init/main.c line 628)
>           late_time_init
>             tsc_init
>
> So by the time the tsc_cs_mark_unstable or mark_tsc_unstable can call
> clear_sched_clock_stable, pv_time_ops.sched_clock has been set.
>
>> But it appears to me, we should not be calling
>> clear_sched_clock_stable() on TSC bits when we don't end up using
>> native_sched_clock().
>
> Yes, this makes sense.

How about something like the below? We delay the sched clock stable
check to tsc_init() if we run in a VM: it calls
clear_sched_clock_stable() if there is no invariant tsc bit and we end
up using native_sched_clock(); otherwise kvmclock will determine
whether it is stable based on KVM_FEATURE_CLOCKSOURCE_STABLE_BIT /
PVCLOCK_TSC_STABLE_BIT:

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 4e95b2e..ed8eda4 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -557,7 +557,7 @@ static void early_init_amd(struct cpuinfo_x86 *c)
         set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
         if (check_tsc_unstable())
             clear_sched_clock_stable();
-    } else {
+    } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
         clear_sched_clock_stable();
     }

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 017ecd3..1927f5f 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -163,7 +163,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
         set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
         if (check_tsc_unstable())
             clear_sched_clock_stable();
-    } else {
+    } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
         clear_sched_clock_stable();
     }

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2724dc8..68149f5 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -326,9 +326,16 @@ unsigned long long sched_clock(void)
 {
     return paravirt_sched_clock();
 }
+
+static inline bool using_native_sched_clock(void)
+{
+    return pv_time_ops.sched_clock == native_sched_clock;
+}
 #else
 unsigned long long
 sched_clock(void) __attribute__((alias("native_sched_clock")));
+
+static inline bool using_native_sched_clock(void) { return true; }
 #endif

 int check_tsc_unstable(void)
@@ -1398,6 +1405,11 @@ void __init tsc_init(void)

     use_tsc_delay();

+    if (using_native_sched_clock() &&
+        boot_cpu_has(X86_FEATURE_HYPERVISOR) &&
+        !boot_cpu_has(X86_FEATURE_NONSTOP_TSC))
+        clear_sched_clock_stable();
+
     if (unsynchronized_tsc())
         mark_tsc_unstable("TSCs unsynchronized");

Regards,
Wanpeng Li

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-28  1:51                   ` Wanpeng Li
@ 2017-02-28  8:08                     ` Peter Zijlstra
  2017-02-28  8:11                       ` Wanpeng Li
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2017-02-28  8:08 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Mike Galbraith, LKML, Ingo Molnar,
	Thomas Gleixner, Borislav Petkov

On Tue, Feb 28, 2017 at 09:51:07AM +0800, Wanpeng Li wrote:
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 4e95b2e..ed8eda4 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -557,7 +557,7 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>          set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
>          if (check_tsc_unstable())
>              clear_sched_clock_stable();
> -    } else {
> +    } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
>          clear_sched_clock_stable();
>      }

That's wrong, you can have HYPERVISOR and still use
native_sched_clock() (lguest does that for one).


I suspect we can do something like the below, since we fixed the
clocksource watchdog to mark the TSC unstable and we're already fairly
careful with using the TSC for timekeeping anyway.

---
 arch/x86/kernel/cpu/amd.c       |  4 ----
 arch/x86/kernel/cpu/centaur.c   |  2 --
 arch/x86/kernel/cpu/common.c    |  3 ---
 arch/x86/kernel/cpu/cyrix.c     |  1 -
 arch/x86/kernel/cpu/intel.c     |  4 ----
 arch/x86/kernel/cpu/transmeta.c |  2 --
 arch/x86/kernel/tsc.c           | 35 +++++++++++++++++++++++------------
 7 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 35a5d5dca2fa..c36140d788fe 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -556,10 +556,6 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
-		if (check_tsc_unstable())
-			clear_sched_clock_stable();
-	} else {
-		clear_sched_clock_stable();
 	}
 
 	/* Bit 12 of 8000_0007 edx is accumulated power mechanism. */
diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
index 106bd3318121..44207b71fee1 100644
--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -105,8 +105,6 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_64
 	set_cpu_cap(c, X86_FEATURE_SYSENTER32);
 #endif
-
-	clear_sched_clock_stable();
 }
 
 static void init_centaur(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c188ae5a5d9f..0209907a63b1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -88,7 +88,6 @@ static void default_init(struct cpuinfo_x86 *c)
 			strcpy(c->x86_model_id, "386");
 	}
 #endif
-	clear_sched_clock_stable();
 }
 
 static const struct cpu_dev default_cpu = {
@@ -1077,8 +1076,6 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	 */
 	if (this_cpu->c_init)
 		this_cpu->c_init(c);
-	else
-		clear_sched_clock_stable();
 
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
index 0a3bc19de017..a70fd61095f8 100644
--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -185,7 +185,6 @@ static void early_init_cyrix(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
 		break;
 	}
-	clear_sched_clock_stable();
 }
 
 static void init_cyrix(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index fe0a615a051b..063197771b8d 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -162,10 +162,6 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
-		if (check_tsc_unstable())
-			clear_sched_clock_stable();
-	} else {
-		clear_sched_clock_stable();
 	}
 
 	/* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
diff --git a/arch/x86/kernel/cpu/transmeta.c b/arch/x86/kernel/cpu/transmeta.c
index 8457b4978668..d77d07ab310b 100644
--- a/arch/x86/kernel/cpu/transmeta.c
+++ b/arch/x86/kernel/cpu/transmeta.c
@@ -16,8 +16,6 @@ static void early_init_transmeta(struct cpuinfo_x86 *c)
 		if (xlvl >= 0x80860001)
 			c->x86_capability[CPUID_8086_0001_EDX] = cpuid_edx(0x80860001);
 	}
-
-	clear_sched_clock_stable();
 }
 
 static void init_transmeta(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 46bcda4cb1c2..5ca0f52e1ba1 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -327,9 +327,16 @@ unsigned long long sched_clock(void)
 {
 	return paravirt_sched_clock();
 }
+
+static inline bool using_native_sched_clock(void)
+{
+	return pv_time_ops.sched_clock == native_sched_clock;
+}
 #else
 unsigned long long
 sched_clock(void) __attribute__((alias("native_sched_clock")));
+
+static inline bool using_native_sched_clock(void) { return true; }
 #endif
 
 int check_tsc_unstable(void)
@@ -1112,8 +1119,10 @@ static void tsc_cs_mark_unstable(struct clocksource *cs)
 {
 	if (tsc_unstable)
 		return;
+
 	tsc_unstable = 1;
-	clear_sched_clock_stable();
+	if (using_native_sched_clock())
+		clear_sched_clock_stable();
 	disable_sched_clock_irqtime();
 	pr_info("Marking TSC unstable due to clocksource watchdog\n");
 }
@@ -1135,18 +1144,20 @@ static struct clocksource clocksource_tsc = {
 
 void mark_tsc_unstable(char *reason)
 {
-	if (!tsc_unstable) {
-		tsc_unstable = 1;
+	if (tsc_unstable)
+		return;
+
+	tsc_unstable = 1;
+	if (using_native_sched_clock())
 		clear_sched_clock_stable();
-		disable_sched_clock_irqtime();
-		pr_info("Marking TSC unstable due to %s\n", reason);
-		/* Change only the rating, when not registered */
-		if (clocksource_tsc.mult)
-			clocksource_mark_unstable(&clocksource_tsc);
-		else {
-			clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
-			clocksource_tsc.rating = 0;
-		}
+	disable_sched_clock_irqtime();
+	pr_info("Marking TSC unstable due to %s\n", reason);
+	/* Change only the rating, when not registered */
+	if (clocksource_tsc.mult)
+		clocksource_mark_unstable(&clocksource_tsc);
+	else {
+		clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
+		clocksource_tsc.rating = 0;
 	}
 }
 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-28  8:08                     ` Peter Zijlstra
@ 2017-02-28  8:11                       ` Wanpeng Li
  2017-03-01 13:39                         ` Wanpeng Li
  0 siblings, 1 reply; 29+ messages in thread
From: Wanpeng Li @ 2017-02-28  8:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, Mike Galbraith, LKML, Ingo Molnar,
	Thomas Gleixner, Borislav Petkov

2017-02-28 16:08 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Tue, Feb 28, 2017 at 09:51:07AM +0800, Wanpeng Li wrote:
>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>> index 4e95b2e..ed8eda4 100644
>> --- a/arch/x86/kernel/cpu/amd.c
>> +++ b/arch/x86/kernel/cpu/amd.c
>> @@ -557,7 +557,7 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>>          set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
>>          if (check_tsc_unstable())
>>              clear_sched_clock_stable();
>> -    } else {
>> +    } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
>>          clear_sched_clock_stable();
>>      }
>
> That's wrong, you can have HYPERVISOR and still use
> native_sched_clock() (lguest does that for one).

My posting delays this check to tsc_init().

Regards,
Wanpeng Li

>
>
> I suspect we can do something like the below, since we fixed the
> clocksource watchdog to mark the TSC unstable and we're already fairly
> careful with using the TSC for timekeeping anyway.
>
> ---
>  arch/x86/kernel/cpu/amd.c       |  4 ----
>  arch/x86/kernel/cpu/centaur.c   |  2 --
>  arch/x86/kernel/cpu/common.c    |  3 ---
>  arch/x86/kernel/cpu/cyrix.c     |  1 -
>  arch/x86/kernel/cpu/intel.c     |  4 ----
>  arch/x86/kernel/cpu/transmeta.c |  2 --
>  arch/x86/kernel/tsc.c           | 35 +++++++++++++++++++++++------------
>  7 files changed, 23 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 35a5d5dca2fa..c36140d788fe 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -556,10 +556,6 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>         if (c->x86_power & (1 << 8)) {
>                 set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
>                 set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
> -               if (check_tsc_unstable())
> -                       clear_sched_clock_stable();
> -       } else {
> -               clear_sched_clock_stable();
>         }
>
>         /* Bit 12 of 8000_0007 edx is accumulated power mechanism. */
> diff --git a/arch/x86/kernel/cpu/centaur.c b/arch/x86/kernel/cpu/centaur.c
> index 106bd3318121..44207b71fee1 100644
> --- a/arch/x86/kernel/cpu/centaur.c
> +++ b/arch/x86/kernel/cpu/centaur.c
> @@ -105,8 +105,6 @@ static void early_init_centaur(struct cpuinfo_x86 *c)
>  #ifdef CONFIG_X86_64
>         set_cpu_cap(c, X86_FEATURE_SYSENTER32);
>  #endif
> -
> -       clear_sched_clock_stable();
>  }
>
>  static void init_centaur(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index c188ae5a5d9f..0209907a63b1 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -88,7 +88,6 @@ static void default_init(struct cpuinfo_x86 *c)
>                         strcpy(c->x86_model_id, "386");
>         }
>  #endif
> -       clear_sched_clock_stable();
>  }
>
>  static const struct cpu_dev default_cpu = {
> @@ -1077,8 +1076,6 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>          */
>         if (this_cpu->c_init)
>                 this_cpu->c_init(c);
> -       else
> -               clear_sched_clock_stable();
>
>         /* Disable the PN if appropriate */
>         squash_the_stupid_serial_number(c);
> diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
> index 0a3bc19de017..a70fd61095f8 100644
> --- a/arch/x86/kernel/cpu/cyrix.c
> +++ b/arch/x86/kernel/cpu/cyrix.c
> @@ -185,7 +185,6 @@ static void early_init_cyrix(struct cpuinfo_x86 *c)
>                 set_cpu_cap(c, X86_FEATURE_CYRIX_ARR);
>                 break;
>         }
> -       clear_sched_clock_stable();
>  }
>
>  static void init_cyrix(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index fe0a615a051b..063197771b8d 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -162,10 +162,6 @@ static void early_init_intel(struct cpuinfo_x86 *c)
>         if (c->x86_power & (1 << 8)) {
>                 set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
>                 set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
> -               if (check_tsc_unstable())
> -                       clear_sched_clock_stable();
> -       } else {
> -               clear_sched_clock_stable();
>         }
>
>         /* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
> diff --git a/arch/x86/kernel/cpu/transmeta.c b/arch/x86/kernel/cpu/transmeta.c
> index 8457b4978668..d77d07ab310b 100644
> --- a/arch/x86/kernel/cpu/transmeta.c
> +++ b/arch/x86/kernel/cpu/transmeta.c
> @@ -16,8 +16,6 @@ static void early_init_transmeta(struct cpuinfo_x86 *c)
>                 if (xlvl >= 0x80860001)
>                         c->x86_capability[CPUID_8086_0001_EDX] = cpuid_edx(0x80860001);
>         }
> -
> -       clear_sched_clock_stable();
>  }
>
>  static void init_transmeta(struct cpuinfo_x86 *c)
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 46bcda4cb1c2..5ca0f52e1ba1 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -327,9 +327,16 @@ unsigned long long sched_clock(void)
>  {
>         return paravirt_sched_clock();
>  }
> +
> +static inline bool using_native_sched_clock(void)
> +{
> +       return pv_time_ops.sched_clock == native_sched_clock;
> +}
>  #else
>  unsigned long long
>  sched_clock(void) __attribute__((alias("native_sched_clock")));
> +
> +static inline bool using_native_sched_clock(void) { return true; }
>  #endif
>
>  int check_tsc_unstable(void)
> @@ -1112,8 +1119,10 @@ static void tsc_cs_mark_unstable(struct clocksource *cs)
>  {
>         if (tsc_unstable)
>                 return;
> +
>         tsc_unstable = 1;
> -       clear_sched_clock_stable();
> +       if (using_native_sched_clock())
> +               clear_sched_clock_stable();
>         disable_sched_clock_irqtime();
>         pr_info("Marking TSC unstable due to clocksource watchdog\n");
>  }
> @@ -1135,18 +1144,20 @@ static struct clocksource clocksource_tsc = {
>
>  void mark_tsc_unstable(char *reason)
>  {
> -       if (!tsc_unstable) {
> -               tsc_unstable = 1;
> +       if (tsc_unstable)
> +               return;
> +
> +       tsc_unstable = 1;
> +       if (using_native_sched_clock())
>                 clear_sched_clock_stable();
> -               disable_sched_clock_irqtime();
> -               pr_info("Marking TSC unstable due to %s\n", reason);
> -               /* Change only the rating, when not registered */
> -               if (clocksource_tsc.mult)
> -                       clocksource_mark_unstable(&clocksource_tsc);
> -               else {
> -                       clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
> -                       clocksource_tsc.rating = 0;
> -               }
> +       disable_sched_clock_irqtime();
> +       pr_info("Marking TSC unstable due to %s\n", reason);
> +       /* Change only the rating, when not registered */
> +       if (clocksource_tsc.mult)
> +               clocksource_mark_unstable(&clocksource_tsc);
> +       else {
> +               clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
> +               clocksource_tsc.rating = 0;
>         }
>  }
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-02-28  8:11                       ` Wanpeng Li
@ 2017-03-01 13:39                         ` Wanpeng Li
  2017-03-01 14:17                           ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Wanpeng Li @ 2017-03-01 13:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, Mike Galbraith, LKML, Ingo Molnar,
	Thomas Gleixner, Borislav Petkov

2017-02-28 16:11 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2017-02-28 16:08 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
>> On Tue, Feb 28, 2017 at 09:51:07AM +0800, Wanpeng Li wrote:
>>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>>> index 4e95b2e..ed8eda4 100644
>>> --- a/arch/x86/kernel/cpu/amd.c
>>> +++ b/arch/x86/kernel/cpu/amd.c
>>> @@ -557,7 +557,7 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>>>          set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
>>>          if (check_tsc_unstable())
>>>              clear_sched_clock_stable();
>>> -    } else {
>>> +    } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
>>>          clear_sched_clock_stable();
>>>      }
>>
>> That's wrong, you can have HYPERVISOR and still use
>> native_sched_clock() (lguest does that for one).
>
> My posting delays this check to tsc_init().

Does it make sense?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: tip.today - scheduler bam boom crash (cpu hotplug)
  2017-03-01 13:39                         ` Wanpeng Li
@ 2017-03-01 14:17                           ` Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2017-03-01 14:17 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Mike Galbraith, LKML, Ingo Molnar,
	Thomas Gleixner, Borislav Petkov

On Wed, Mar 01, 2017 at 09:39:18PM +0800, Wanpeng Li wrote:
> 2017-02-28 16:11 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> > 2017-02-28 16:08 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> >> On Tue, Feb 28, 2017 at 09:51:07AM +0800, Wanpeng Li wrote:
> >>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> >>> index 4e95b2e..ed8eda4 100644
> >>> --- a/arch/x86/kernel/cpu/amd.c
> >>> +++ b/arch/x86/kernel/cpu/amd.c
> >>> @@ -557,7 +557,7 @@ static void early_init_amd(struct cpuinfo_x86 *c)
> >>>          set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
> >>>          if (check_tsc_unstable())
> >>>              clear_sched_clock_stable();
> >>> -    } else {
> >>> +    } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> >>>          clear_sched_clock_stable();
> >>>      }
> >>
> >> That's wrong, you can have HYPERVISOR and still use
> >> native_sched_clock() (lguest does that for one).
> >
> > My posting delays this check to tsc_init().
> 
> Does it make sense?

It makes for confusion from where I'm sitting. Please look at the latest
patch I sent. I think it should work, but in case you have doubts, try
and find the holes.

  http://lkml.kernel.org/r/20170228080855.GV6515@twins.programming.kicks-ass.net

That marries the sched_clock stable logic to the existing TSC stability
machinery and removes a bunch of quirks.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-03-01 14:18 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-19  7:31 tip.today - scheduler bam boom crash (cpu hotplug) Mike Galbraith
2017-01-19 10:19 ` Peter Zijlstra
2017-01-19 11:39   ` Mike Galbraith
2017-01-19 13:36   ` Peter Zijlstra
2017-01-19 16:54     ` Ingo Molnar
2017-01-19 17:20       ` Peter Zijlstra
2017-01-19 20:35     ` sched/clock: Fix hotplug issue kbuild test robot
2017-01-19 20:37     ` kbuild test robot
2017-01-20  6:36     ` [tip:sched/core] sched/clock: Fix hotplug crash tip-bot for Peter Zijlstra
2017-02-27 12:30     ` tip.today - scheduler bam boom crash (cpu hotplug) Wanpeng Li
2017-02-27 12:35       ` Paolo Bonzini
2017-02-27 12:43       ` Peter Zijlstra
2017-02-27 12:50         ` Paolo Bonzini
2017-02-27 13:04           ` Peter Zijlstra
2017-02-27 15:27             ` Paolo Bonzini
2017-02-27 15:59               ` Peter Zijlstra
2017-02-27 16:11                 ` Paolo Bonzini
2017-02-27 16:36                   ` Peter Zijlstra
2017-02-27 17:27                     ` Paolo Bonzini
2017-02-27 17:40                       ` Thomas Gleixner
2017-02-27 19:06                         ` Paolo Bonzini
2017-02-27 20:35                           ` Peter Zijlstra
2017-02-28  1:51                   ` Wanpeng Li
2017-02-28  8:08                     ` Peter Zijlstra
2017-02-28  8:11                       ` Wanpeng Li
2017-03-01 13:39                         ` Wanpeng Li
2017-03-01 14:17                           ` Peter Zijlstra
2017-02-27 13:48         ` Wanpeng Li
2017-02-27 14:05           ` Peter Zijlstra
