From: Pavel Tatashin
Date: Tue, 26 Jun 2018 14:42:09 -0400
Subject: Re: [PATCH v12 09/11] x86/tsc: prepare for early sched_clock
To: tglx@linutronix.de
Cc: Steven Sistare, Daniel Jordan, linux@armlinux.org.uk,
    schwidefsky@de.ibm.com, Heiko Carstens, John Stultz,
    sboyd@codeaurora.org, x86@kernel.org, LKML, mingo@redhat.com,
    hpa@zytor.com, douly.fnst@cn.fujitsu.com, peterz@infradead.org,
    prarit@redhat.com, feng.tang@intel.com, Petr Mladek,
    gnomes@lxorguk.ukuu.org.uk, linux-s390@vger.kernel.org

Hi Thomas,

On Tue, Jun 26, 2018 at 11:44 AM Thomas Gleixner wrote:
>
> Pavel,
>
> first of all, sorry for my last outburst. I just was in a lousy mood
> after staring into too much half baked stuff and failed to make myself
> stay away from the computer.

Thank you.

> On Sun, 24 Jun 2018, Thomas Gleixner wrote:
> > On Sat, 23 Jun 2018, Pavel Tatashin wrote:
> > And this early init sequence also needs to pull over the tsc adjust
> > magic. So tsc_early_delay_calibrate(), which should btw. be renamed
> > to tsc_early_init(), should have:
> >
> >	{
> >		cpu_khz = x86_platform.calibrate_cpu();
> >		tsc_khz = x86_platform.calibrate_tsc();
> >
> >		tsc_khz = tsc_khz ? : cpu_khz;
> >		if (!tsc_khz)
> >			return;
> >
> >		/* Sanitize TSC ADJUST before cyc2ns gets initialized */
> >		tsc_store_and_check_tsc_adjust(true);
> >
> >		calc_lpj(tsc_khz);
> >
> >		tsc_sched_clock_init();
> >	}
>
> Peter made me look deeper into this and there are a few issues, which I
> missed, depending on when some of the resources become available. So we
> probably cannot hook all of this into tsc_early_delay_calibrate().
>
> I have an idea how to disentangle it and we'll end up with a staged
> approach, which looks like this:
>
>  1) Earliest one (not sure how early yet)
>
>     Attempt to use MSR/CPUID. If not running on a hypervisor this can
>     try the quick PIT calibration, but nothing else.
>
>  2) Post init_hypervisor_platform()
>
>     An attempt to use the hypervisor data can be made.
>
>  3) Post early_acpi_boot_init()
>
>     This can do PIT/HPET based calibration.
>
>  4) Post x86_dtb_init()
>
>     PIT/PMTIMER based calibration.
>
> Once tsc_khz is known, no further attempts of calibration are made.
> I'll look into that later tonight.

I think there is no reason to try staged attempts: that usually gets
harder to maintain over time. In my opinion it is best if we do it in
two tries, as we do right now, just cleaner. In the first try we get a
crude result, using the lowest common denominator that the current
logic might fall back to when nothing better is available that early in
boot (i.e. the CPU calibration loop in native_calibrate_cpu()), and
later we get something better. Also, even if the early clock does not
work because we could not determine the TSC frequency early, it is not
a problem: we will most likely still determine it later, during the
tsc_init() call.
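To be concrete about what "two tries" means in terms of call sites
(just a sketch for reference, not the actual diff from the series): the
first try, tsc_early_init(), takes over the spot in setup_arch() where
tsc_early_delay_calibrate() is called today, and the second try stays
where tsc_init() already is, in x86_late_time_init():

/* arch/x86/kernel/time.c (roughly today's code, for reference only) */
static __init void x86_late_time_init(void)
{
	x86_init.timers.timer_init();
	tsc_init();		/* second, full calibration attempt */
}

So no new hooks are added to the boot sequence; the two functions just
slot into the call sites we already have.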
I have rewritten tsc_early_init()/tsc_init(); they are much simpler now:

void __init tsc_early_init(void)
{
	if (!boot_cpu_has(X86_FEATURE_TSC))
		return;

	if (!determine_cpu_tsc_frequncies())
		return;

	cyc2ns_init_boot_cpu();
	static_branch_enable(&__use_tsc);
	loops_per_jiffy = get_loops_per_jiffy(tsc_khz);
}

void __init tsc_init(void)
{
	if (!boot_cpu_has(X86_FEATURE_TSC))
		return;

	if (!determine_cpu_tsc_frequncies()) {
		setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
		return;
	}

	/* Sanitize TSC ADJUST before cyc2ns gets initialized */
	tsc_store_and_check_tsc_adjust(true);

	cyc2ns_reinit_boot_cpu();
	cyc2ns_init_secondary_cpus();
	static_branch_enable(&__use_tsc);

	if (!no_sched_irq_time)
		enable_sched_clock_irqtime();

	lpj_fine = get_loops_per_jiffy(tsc_khz);
	use_tsc_delay();
	check_system_tsc_reliable();

	if (unsynchronized_tsc()) {
		mark_tsc_unstable("TSCs unsynchronized");
		return;
	}

	clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
	detect_art();
}

All of the new functions are self-explanatory. I added three
cyc2ns-related functions, based on your suggestions on how to clean up
that code:

static void __init cyc2ns_init_boot_cpu(void)
{
	struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);

	seqcount_init(&c2n->seq);
	__set_cyc2ns_scale(tsc_khz, smp_processor_id(), rdtsc());
}

/* Copy the boot cpu cyc2ns data to every possible secondary cpu */
static void __init cyc2ns_init_secondary_cpus(void)
{
	unsigned int cpu, this_cpu = smp_processor_id();
	struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
	struct cyc2ns_data *data = c2n->data;

	for_each_possible_cpu(cpu) {
		if (cpu != this_cpu) {
			seqcount_init(&c2n->seq);
			c2n = per_cpu_ptr(&cyc2ns, cpu);
			c2n->data[0] = data[0];
			c2n->data[1] = data[1];
		}
	}
}

/*
 * Reinitialize the boot cpu cyc2ns data with the final tsc_khz, offset
 * by the current sched_clock() value.
 */
static void __init cyc2ns_reinit_boot_cpu(void)
{
	struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
	unsigned long sched_now = sched_clock();

	__set_cyc2ns_scale(tsc_khz, smp_processor_id(), rdtsc());
	c2n->data[0].cyc2ns_offset += sched_now;
	c2n->data[1].cyc2ns_offset += sched_now;
}

I know it is conceptually similar to what I had before, but I think it
is simple enough, easy to maintain, and, more importantly, safe.

What do you think?

Thank you,
Pavel
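P.S. In case it is not clear from the names above: get_loops_per_jiffy()
is just the shared cycles-per-jiffy computation, used both for
loops_per_jiffy in tsc_early_init() and for lpj_fine in tsc_init().
Roughly (a sketch, the exact types in the patch may differ):

/* TSC cycles per jiffy: (khz * 1000) cycles per second divided by HZ */
static unsigned long __init get_loops_per_jiffy(unsigned long khz)
{
	u64 lpj = (u64)khz * 1000;

	do_div(lpj, HZ);
	return (unsigned long)lpj;
}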