Date: Tue, 26 Jun 2018 11:00:03 +0200
From: Peter Zijlstra
To: Pavel Tatashin
Cc: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
	linux@armlinux.org.uk, schwidefsky@de.ibm.com, heiko.carstens@de.ibm.com,
	john.stultz@linaro.org, sboyd@codeaurora.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, mingo@redhat.com, tglx@linutronix.de,
	hpa@zytor.com, douly.fnst@cn.fujitsu.com, prarit@redhat.com,
	feng.tang@intel.com, pmladek@suse.com, gnomes@lxorguk.ukuu.org.uk,
	linux-s390@vger.kernel.org
Subject: Re: [PATCH v12 10/11] sched: early boot clock
Message-ID: <20180626090003.GA2458@hirez.programming.kicks-ass.net>
References: <20180621212518.19914-1-pasha.tatashin@oracle.com>
 <20180621212518.19914-11-pasha.tatashin@oracle.com>
 <20180625085543.GT2494@hirez.programming.kicks-ass.net>
 <20180625192320.kzmqkvmfh5aeuhhx@xakep.localdomain>
In-Reply-To: <20180625192320.kzmqkvmfh5aeuhhx@xakep.localdomain>

On Mon, Jun 25, 2018 at 03:23:20PM -0400, Pavel Tatashin wrote:
> Unfortunately the above suggestion won't work. And here is why.
>
> We have a call sequence like this:
>
> start_kernel
>   sched_init()
>     sched_clock_init()
>       In this call sched_clock_running is set to 1. Which means
>       that sched_clock_cpu() starts doing the following sequence:
>         scd = cpu_sdc(cpu);
>         clock = sched_clock_local(scd);
>       Where we try to filter the output of sched_clock() based
>       on the value of scd. But, that won't work, because to get
>       this functionality, we need to have: timer initialized
>       that wakes up and updates scd, and we need timekeeping
>       initialized, so we can call ktime_get_ns(). Both of which
>       are called later.
>   ...
>   timekeeping_init()    After this we can call ktime_get_ns()
>   time_init()           Here we configure x86_late_time_init pointer.
>   ...
>   late_time_init()
>     x86_late_time_init()
>       x86_init.timers.timer_init()
>         hpet_time_init()    Only after this call we finally start
>                             getting clock interrupts, and can get
>                             precise output from sched_clock_local().
>
> The way I solved the above, is I changed sched_clock() to keep outputting
> time based on early boot sched_clock() until sched_clock_init_late(), at
> which point everything is configured and we can switch to the permanent
> clock, even though this happens after smp init.
>
> If you have a better solution, please let me know.

How's something like this? That moves sched_clock_init() to right before
we enable IRQs for the first time (which is after we've started the whole
timekeeping business).

The thing is, sched_clock_init_late() really is far too late, we need to
switch to unstable before we bring up SMP.
---
 include/linux/sched_clock.h |  5 -----
 init/main.c                 |  4 ++--
 kernel/sched/clock.c        | 49 +++++++++++++++++++++++++++++++++++----------
 kernel/sched/core.c         |  1 -
 kernel/time/sched_clock.c   |  2 +-
 5 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched_clock.h b/include/linux/sched_clock.h
index 411b52e424e1..2d223677740f 100644
--- a/include/linux/sched_clock.h
+++ b/include/linux/sched_clock.h
@@ -9,17 +9,12 @@
 #define LINUX_SCHED_CLOCK

 #ifdef CONFIG_GENERIC_SCHED_CLOCK
-extern void sched_clock_postinit(void);
-
 extern void sched_clock_register(u64 (*read)(void), int bits,
				 unsigned long rate);
 #else
-static inline void sched_clock_postinit(void) { }
-
 static inline void sched_clock_register(u64 (*read)(void), int bits,
					unsigned long rate)
 {
-	;
 }
 #endif
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..162d931c9511 100644
--- a/init/main.c
+++ b/init/main.c
@@ -79,7 +79,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
@@ -642,7 +642,7 @@ asmlinkage __visible void __init start_kernel(void)
	softirq_init();
	timekeeping_init();
	time_init();
-	sched_clock_postinit();
+	sched_clock_init();
	printk_safe_init();
	perf_event_init();
	profile_init();
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 10c83e73837a..c8286b9fc593 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -68,11 +68,6 @@ EXPORT_SYMBOL_GPL(sched_clock);

 __read_mostly int sched_clock_running;

-void sched_clock_init(void)
-{
-	sched_clock_running = 1;
-}
-
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
 /*
  * We must start with !__sched_clock_stable because the unstable -> stable
@@ -199,6 +194,23 @@ void clear_sched_clock_stable(void)
		__clear_sched_clock_stable();
 }

+static void __sched_clock_gtod_offset(void)
+{
+	__gtod_offset = (sched_clock() + __sched_clock_offset) - ktime_get_ns();
+}
+
+void __init sched_clock_init(void)
+{
+	/*
+	 * Set __gtod_offset such that once we mark sched_clock_running,
+	 * sched_clock_tick() continues where sched_clock() left off.
+	 *
+	 * Even if TSC is buggered, we're still UP at this point so it
+	 * can't really be out of sync.
+	 */
+	__sched_clock_gtod_offset();
+	sched_clock_running = 1;
+}
 /*
  * We run this as late_initcall() such that it runs after all built-in drivers,
  * notably: acpi_processor and intel_idle, which can mark the TSC as unstable.
@@ -351,7 +363,7 @@ u64 sched_clock_cpu(int cpu)
		return sched_clock() + __sched_clock_offset;

	if (unlikely(!sched_clock_running))
-		return 0ull;
+		return sched_clock(); /* __sched_clock_offset == 0 */

	preempt_disable_notrace();
	scd = cpu_sdc(cpu);
@@ -385,8 +397,6 @@ void sched_clock_tick(void)

 void sched_clock_tick_stable(void)
 {
-	u64 gtod, clock;
-
	if (!sched_clock_stable())
		return;
@@ -398,9 +408,7 @@ void sched_clock_tick_stable(void)
	 * TSC to be unstable, any computation will be computing crap.
	 */
	local_irq_disable();
-	gtod = ktime_get_ns();
-	clock = sched_clock();
-	__gtod_offset = (clock + __sched_clock_offset) - gtod;
+	__sched_clock_gtod_offset();
	local_irq_enable();
 }
@@ -434,6 +442,24 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);

 #else /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */

+#ifdef CONFIG_GENERIC_SCHED_CLOCK
+
+/*
+ * kernel/time/sched_clock.c:sched_clock_init()
+ */
+
+u64 sched_clock_cpu(int cpu)
+{
+	return sched_clock();
+}
+
+#else /* CONFIG_GENERIC_SCHED_CLOCK */
+
+void __init sched_clock_init(void)
+{
+	sched_clock_running = 1;
+}
+
 u64 sched_clock_cpu(int cpu)
 {
	if (unlikely(!sched_clock_running))
@@ -442,6 +468,7 @@ u64 sched_clock_cpu(int cpu)
	return sched_clock();
 }

+#endif /* CONFIG_GENERIC_SCHED_CLOCK */
 #endif /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */

 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a98d54cd5535..b27d034ef4a7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5953,7 +5953,6 @@ void __init sched_init(void)
	int i, j;
	unsigned long alloc_size = 0, ptr;

-	sched_clock_init();
	wait_bit_init();

 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 2d8f05aad442..b4fedf312979 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -237,7 +237,7 @@ sched_clock_register(u64 (*read)(void), int bits, unsigned long rate)
	pr_debug("Registered %pF as sched_clock source\n", read);
 }

-void __init sched_clock_postinit(void)
+void __init sched_clock_init(void)
 {
	/*
	 * If no sched_clock() function has been provided at that point,