From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751699AbdCYShO (ORCPT ); Sat, 25 Mar 2017 14:37:14 -0400 Received: from Galois.linutronix.de ([146.0.238.70]:41791 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751446AbdCYShN (ORCPT ); Sat, 25 Mar 2017 14:37:13 -0400 Date: Sat, 25 Mar 2017 19:36:31 +0100 (CET) From: Thomas Gleixner To: Pasha Tatashin cc: x86@kernel.org, LKML , mingo@redhat.com, Peter Zijlstra , "H. Peter Anvin" Subject: Re: [v2 0/9] Early boot time stamps for x86 In-Reply-To: Message-ID: References: <1490368899-877997-1-git-send-email-pasha.tatashin@oracle.com> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, 25 Mar 2017, Pasha Tatashin wrote: > The second versions was actually meant as a reply to your e-mail: the code > differences were minimal: the main differences were in the cover letter. You > mentioned that it is not necessary to have early boot time stamps, and I > wanted to show examples how this data is useful to track scalability bugs and > avoid future regressions. I did not say that early timestamps are not useful. I said they fall not into the category of things which are absolutely necessary in the earliest boot process. Can you spot the difference? If you want to show me something to prove a point, then why don't you reply to my mail as anybody else does who wants a to start a conversation or discussion? Ignoring my remarks and then sending another hastily cobbled together patch set is not in any way a form of conversation/discussion or follow up to my mail. > Anyway, you asked for some time to think about this problem, I certainly did not ask for some time. ... I told you that I need to find a free time slot to look into that in detail. I told you as well, that some of your decisions (earliest boot code, code duplication etc.) are not acceptable. The fact that I told you that I need to find a free time slot also means that I'm going to tell you in more detail why and what needs to be done. > ... I won't send any replies to this thread for the next two weeks. Not replying to mail for a fixed amount of time is just silly. What are you going to do after that? Resend another half baken patch series which ignores every review comment? The review/feedback mechanism does not work that way. If you get feedback, then you either accept it and act accordingly or you discuss it as a response to that feedback. When that discussion is settled, then it's time to rework stuff and send out a new version after making sure that everything is addressed. > So, please consider this solution. The feature is well abstracted, does > not harm the performance of the fast path, and if necessary it can also > be made optional with something like: CONFIG_HAVE_UNSTABLE_EARLY_CLOCK Well abstracted? I have no idea what your understanding of abstractions is, but it's definitely different from mine. This is not abstracted, it's glued into the code in the 'works for me' mode. It's a proof of concept if at all. And no, this won't have a CONFIG switch to clutter the code even more. First of all, this "solution" is only valid for a very restricted set of systems and breaks others in very subtle ways. FYI, the architecture name is x86, which handles 32bit/64bit, various vendors and hypervisors. It's not 'arch/mymachine'. So this needs to be integrated into the existing TSC/CPU calibration mechanism which works across all supported systems. And that means, that early TSC calibration can't be done before x86_init.oem.arch_setup() and hypervisor_init_platform() have been invoked. By that time, which is still pretty early in the boot process: - boot_cpu_data has been preinitialized - calibration functions setup has been done So this won't ignore decisions made by the various platform quirks and does not require pointless copies of existing code and absurd hacks to make it work. Also at this point the most fragile parts of the boot process have been handled. Putting the early TSC initialization after hypervisor_init_platform() makes this available for _ALL_ systems/platforms as infrastructure and allows the platforms to add the required bits of support via a new set of function pointers in struct x86_platform_ops, which btw. is a proper form of abstraction. For some of them it's just switching the calibration function to the early one, for others this won't be possible ever. The time between x86_64_start_kernel() and that point is in the low single digit millisecond range or less than a millisecond and therefor completely irrelevant. The first timestamps before that point will be 0 as they used to be. You can argue in circles about that, it's simply not debatable. Doing that allows to simply split the existing TSC init into an early and late part and that late part is going to switch over from early sched clock to the regular one w/o any _fini() calls or other absurdities. The same applies to the sched core clock code. This can be integrated w/o all that extra early/fini() hackery cleanly. All it takes is a weak sched_clock_early() function which returns 0 to start with. You probably can figure out how to overload it. But that's only the sched clock part of the problem. If this early sched_clock() is available, then this needs to be fed into the setup of timekeeping as well, otherwise dmesg tells 50 seconds into boot and clock monotonic/boottime say 5, which would be confusing at best. That has not be solved in the first step, but definitely before something like this is going to be merged. Thanks, tglx