From: Pavel Tatashin <pasha.tatashin@oracle.com>
To: tglx@linutronix.de
Cc: Steven Sistare <steven.sistare@oracle.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	linux@armlinux.org.uk, schwidefsky@de.ibm.com,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	John Stultz <john.stultz@linaro.org>,
	sboyd@codeaurora.org, x86@kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	mingo@redhat.com, hpa@zytor.com, douly.fnst@cn.fujitsu.com,
	peterz@infradead.org, prarit@redhat.com, feng.tang@intel.com,
	Petr Mladek <pmladek@suse.com>,
	gnomes@lxorguk.ukuu.org.uk, linux-s390@vger.kernel.org,
	andriy.shevchenko@linux.intel.com, boris.ostrovsky@oracle.com
Subject: Re: [PATCH v12 09/11] x86/tsc: prepare for early sched_clock
Date: Thu, 28 Jun 2018 15:42:54 -0400
Message-ID: <CAGM2read-NcUjssuUPHaf=qhdidT8f7cRNOs_GjdPnuOVvi-Lw@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.21.1806281159590.1778@nanos.tec.linutronix.de>

On Thu, Jun 28, 2018 at 11:23 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Thu, 28 Jun 2018, Thomas Gleixner wrote:
> > I still want to document the unholy mess of what is initialized and
> > available when. We have 5 hypervisors and 3 different points in early boot
> > where the calibrate_* callbacks are overwritten. The XEN PV one is actually
> > post tsc_init_early() for whatever reason.
> >
> > That's all completely obscure and any attempt of moving tsc_early_init()
> > earlier than where it is now is just lottery.
> >
> > The other issue is that double calibration, e.g. doing the PIT thing twice
> > is just consuming boot time for no value.
> >
> > All of that has been duct taped over time and we really don't want yet
> > another thing glued to it just because we can.
>
> So here is the full picture of the TSC/CPU calibration maze:
>
> Compile time setup:
>         native_calibrate_tsc
>                 CPUID based frequency read out with magic fixups
>                 for broken CPUID implementations
>
>         native_calibrate_cpu
>                 Try the following:
>
>                 1) CPUID based (different leaf than the TSC one)
>                 2) MSR based
>                 3) Quick PIT calibration
>                 4) PIT/HPET/PMTIMER calibration (slow) and only
>                    available in tsc_init(). Could be made working
>                    post x86_dtb_init().
>
>
> Boot sequence:
>
>   start_kernel()
>
>         INTEL_MID:
>                 x86_intel_mid_early_setup()
>                 calibrate_tsc = intel_mid_calibrate_tsc
>
>                 intel_mid_calibrate_tsc() { return 0; }
>
>   setup_arch()
>
>         x86_init.oem.arch_setup();
>           INTEL_MID:
>                 intel_mid_arch_setup()
>
>                 PENWELL:
>                    x86_platform.calibrate_tsc = mfld_calibrate_tsc;
>
>                    MSR based magic. Value would be available right away.
>
>                 TANGIER:
>                    x86_platform.calibrate_tsc = tangier_calibrate_tsc;
>
>                    Different MSR based magic. Value would be available
>                    right away.
>
>         ....
>
>         init_hypervisor_platform()
>            vmware:
>                    Retrieves the frequency and stores it for the
>                    calibration function
>
>                    khz = vmware_get_khz_magic()
>                    vmware_tsc_khz = khz
>                    calibrate_cpu = vmware_get_tsc_khz
>                    calibrate_tsc = vmware_get_tsc_khz
>                    preset_lpj(khz)
>
>            hyperv:
>                    if special hyperv MSRs are available:
>
>                       calibrate_cpu = hv_get_tsc_khz
>                       calibrate_tsc = hv_get_tsc_khz
>
>                    MSR is readable already in this function
>
>            jailhouse:
>
>                    Frequency is available in this function and stored
>                    in a variable for the calibration function
>
>                    calibrate_cpu        = jailhouse_get_tsc
>                    calibrate_tsc        = jailhouse_get_tsc
>
>         ...
>
>         kvmclock_init()
>
>                 if (magic_conditions)
>                         calibrate_tsc = kvm_get_tsc_khz
>                         calibrate_cpu = kvm_get_tsc_khz
>
>                         kvm_get_preset_lpj()
>                            khz = kvm_get_tsc_khz()
>                            preset_lpj(khz);
>
>         tsc_early_delay_calibrate()
>             tsc_khz = calibrate_tsc()
>             cpu_khz = calibrate_cpu()
>
>             ....
>             set_lpj(tsc_khz);
>
>
>         x86_init.paging.pagetable_init()
>            xen_pagetable_init()
>               xen_setup_shared_info()
>                  xen_hvm_init_time_ops()
>                     if (XENFEAT_hvm_safe_pvclock)
>                         calibrate_tsc = xen_tsc_khz
>
>                         PV clock based access
>
>         tsc_init()
>             tsc_khz = calibrate_tsc()
>             cpu_khz = calibrate_cpu()
>
>
> Putting this into a table:
>
> Platform        tsc_early_delay_calibrate()     tsc_init()
> -----------------------------------------------------------------------
>
> Generic         native_calibrate_tsc()          native_calibrate_tsc()
>                 native_calibrate_cpu()          native_calibrate_cpu()
>                 (Cannot do HPET/PMTIMER)
>
> -----------------------------------------------------------------------
>
> INTEL_MID       intel_mid_calibrate_tsc()       intel_mid_calibrate_tsc()
> Generic         native_calibrate_cpu()          native_calibrate_cpu()
>
> INTEL_MID       mfld_calibrate_tsc()            mfld_calibrate_tsc()
> PENWELL         native_calibrate_cpu()          native_calibrate_cpu()
>
> INTEL_MID       tangier_calibrate_tsc()         tangier_calibrate_tsc()
> TANGIER         native_calibrate_cpu()          native_calibrate_cpu()
>
> -----------------------------------------------------------------------
>
> VMWARE          vmware_get_tsc_khz()            vmware_get_tsc_khz()
>                 vmware_get_tsc_khz()            vmware_get_tsc_khz()
>
> HYPERV          hv_get_tsc_khz()                hv_get_tsc_khz()
>                 hv_get_tsc_khz()                hv_get_tsc_khz()
>
>
> JAILHOUSE       jailhouse_get_tsc()             jailhouse_get_tsc()
>                 jailhouse_get_tsc()             jailhouse_get_tsc()
>
>
> KVM             kvm_get_tsc_khz()               kvm_get_tsc_khz()
>                 kvm_get_tsc_khz()               kvm_get_tsc_khz()
>
> ------------------------------------------------------------------------
>
> XEN             native_calibrate_tsc()          xen_tsc_khz()
>                 native_calibrate_cpu()          native_calibrate_cpu()
>
> ------------------------------------------------------------------------
>
> The only platform which cannot use the special TSC calibration routine
> in the early calibration is XEN because it's initialized just _after_ the
> early calibration runs.
>
> For enhanced fun the early calibration stuff was moved from right after
> init_hypervisor_platform() to the place where it is now in commit
> ccb64941f375a6 ("x86/timers: Move simple_udelay_calibration() past
> kvmclock_init()") to speed up KVM boot time by avoiding the PIT
> calibration. I have no idea why it wasn't just moved past the XEN
> initialization a few lines further down, especially as the change was done
> by a XEN maintainer :) Boris?
>
> The other HV guests all do more or less the same thing and return the same
> value for cpu_khz and tsc_khz via the calibration indirection despite the
> value being known in the init_platform() function already.
>
> The generic initialization does everything twice, which makes no sense,
> except for the unlikely case where no fast functions are available and the
> quick PIT calibration fails (PMTIMER/HPET are not available in early
> calibration).
>
> The INTEL MID stuff is weird and not really obvious. AFAIR those systems
> don't have PIT or such, so they need to rely on the MSR/CPUID mechanisms to
> work, but that just happens to work, and not for obvious reasons. Andy,
> can you shed some light on that stuff?
>
> So some of this just works by chance, and things are done twice and
> pointlessly (XEN). This really wants to be cleaned up, with the
> requirements of each platform well documented; the Intel-MID stuff
> especially needs that.

Hi Thomas,

In addition to the above, we have Xen HVM:

setup_arch()
    ...
    init_hypervisor_platform();
        x86_init.hyper.init_platform();
            xen_hvm_guest_init()
                xen_hvm_init_time_ops();
    ...
    tsc_early_delay_calibrate();
        tsc_khz = x86_platform.calibrate_tsc(); == xen_tsc_khz()
    ...

Which works early.
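
As a rough illustration of why the ordering matters, here is a minimal
userspace sketch (not kernel code). The struct, the *_model names, the enum
and both frequencies are made up for the example; only the sequence of steps
follows the call chains in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the x86_platform.calibrate_tsc indirection. */
typedef unsigned long (*calibrate_fn)(void);

struct platform_ops_model {
	calibrate_fn calibrate_tsc;
};

struct boot_result {
	unsigned long early_khz;	/* value seen by the early calibration */
	unsigned long late_khz;		/* value seen by tsc_init() */
};

/* When the Xen hook gets installed relative to the early calibration. */
enum xen_hook_point { XEN_HOOK_IN_HYPERVISOR_INIT, XEN_HOOK_AFTER_EARLY };

/* Made-up frequencies: two methods rarely agree to the last kHz. */
static unsigned long native_calibrate_tsc_model(void) { return 2400000UL; }
static unsigned long xen_tsc_khz_model(void) { return 2399950UL; }

static struct boot_result boot_model(enum xen_hook_point point)
{
	struct platform_ops_model ops = {
		.calibrate_tsc = native_calibrate_tsc_model,
	};
	struct boot_result res;

	/* init_hypervisor_platform(): the HVM path installs its hook here */
	if (point == XEN_HOOK_IN_HYPERVISOR_INIT)
		ops.calibrate_tsc = xen_tsc_khz_model;

	/* tsc_early_delay_calibrate(): reads through the current pointer */
	assert(ops.calibrate_tsc != NULL);
	res.early_khz = ops.calibrate_tsc();

	/* pagetable_init() path: hook arrives only after the early read */
	if (point == XEN_HOOK_AFTER_EARLY)
		ops.calibrate_tsc = xen_tsc_khz_model;

	/* tsc_init(): reads through the pointer again */
	res.late_khz = ops.calibrate_tsc();
	return res;
}
```

With the hook installed during init_hypervisor_platform() both reads go
through the same callback. With the hook installed only after the early
calibration, the early read falls back to the native method and the two
values can differ.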

So, what should we do with Xen, which seems to be the only platform
that would provide a different TSC frequency early and late, because the
calibration method differs between the two?

Thank you,
Pavel
