From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from list by lists.gnu.org with archive (Exim 4.71) id 1eoDla-000501-S2 for mharc-grub-devel@gnu.org; Tue, 20 Feb 2018 14:39:54 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50612) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eoDlV-0004z6-Uu for grub-devel@gnu.org; Tue, 20 Feb 2018 14:39:51 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eoDlR-0004Hd-0T for grub-devel@gnu.org; Tue, 20 Feb 2018 14:39:49 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:34468) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eoDlQ-0004G1-O3 for grub-devel@gnu.org; Tue, 20 Feb 2018 14:39:44 -0500 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id w1KJawiJ143636; Tue, 20 Feb 2018 19:39:42 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=date : from : to : cc : subject : message-id : references : mime-version : content-type : in-reply-to; s=corp-2017-10-26; bh=o6h+vL4ixIBQEqEWnza3O6LxeTr3+serC/xoMGbi45Q=; b=OA2QaqP3PyQJj1DXMFhIsnckheoQAMOYEkrQb1d6HngMe+wJAj6COocAAPZ16wvXfA6z zwgjDchycnkEXjOMmYFixXUx0wUIgjN2hIIkpIdlzgglZ3BTHfjJL+VqIRpKIFhe4Wnx ZeIXQLgKIjT1NEWhFM4z/uJ4PmwOAlXiZ523AagwS4rkQzrLTbzj1bflDVgWmzNCZ7jM XUBxSN0cZqojw2YUKg9JvmJIOiVwUumqZ7Z//p7tcH4LJ/L3B6F2/x5xMlt/Lv1jALJI JpBvbM6UFOEYxn9p6mCrOYaG+n5ddVPQKTfOcVyIatwC4w3TvdR/FrMiAZr8TuzE1esE hg== Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by aserp2120.oracle.com with ESMTP id 2g8pqxsart-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 20 Feb 2018 19:39:42 +0000 Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserv0021.oracle.com (8.14.4/8.14.4) with ESMTP id w1KJdfhN022164 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 20 Feb 2018 19:39:42 GMT Received: from abhmp0016.oracle.com (abhmp0016.oracle.com [141.146.116.22]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id w1KJddus030739; Tue, 20 Feb 2018 19:39:39 GMT Received: from olila.local.net-space.pl (/10.175.194.79) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 20 Feb 2018 11:39:38 -0800 Date: Tue, 20 Feb 2018 20:39:34 +0100 From: Daniel Kiper To: Peter Jones Cc: grub-devel@gnu.org Subject: Re: [PATCH] Make pmtimer tsc calibration not take 51 seconds to fail. Message-ID: <20180220193934.GQ20614@olila.local.net-space.pl> References: <20180129164226.GH16863@olila.local.net-space.pl> <20180219223756.23113-1-pjones@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180219223756.23113-1-pjones@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8810 signatures=668675 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000 definitions=main-1802200236 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [generic] [fuzzy] X-Received-From: 141.146.126.78 X-BeenThere: grub-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: The development of GNU GRUB List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Feb 2018 19:39:51 -0000 On Mon, Feb 19, 2018 at 05:37:56PM -0500, Peter Jones wrote: > On my laptop running at 2.4GHz, if I run a VM where tsc calibration > using pmtimer will fail presuming a broken pmtimer, it takes ~51 seconds > to do so (as measured with the stopwatch on my phone), with a tsc delta > of 0x1cd1c85300, or around 125 billion cycles. > > If instead of trying to wait for 5-200ms to show up on the pmtimer, we try > to wait for 5-200us, it decides it's broken in ~0x2626aa0 TSCs, aka ~2.4 > million cycles, or more or less instantly. > > Additionally, this reading the pmtimer was returning 0xffffffff anyway, > and that's obviously an invalid return. I've added a check for that and > 0 so we don't bother waiting for the test if what we're seeing is dead > pins with no response at all. > > If "debug" is includes "pmtimer", you will see one of the following > three outcomes. If pmtimer gives all 0 or all 1 bits, you will see: > > kern/i386/tsc_pmtimer.c:77: pmtimer: 0xffffff bad_reads: 1 > kern/i386/tsc_pmtimer.c:77: pmtimer: 0xffffff bad_reads: 2 > kern/i386/tsc_pmtimer.c:77: pmtimer: 0xffffff bad_reads: 3 > kern/i386/tsc_pmtimer.c:77: pmtimer: 0xffffff bad_reads: 4 > kern/i386/tsc_pmtimer.c:77: pmtimer: 0xffffff bad_reads: 5 > kern/i386/tsc_pmtimer.c:77: pmtimer: 0xffffff bad_reads: 6 > kern/i386/tsc_pmtimer.c:77: pmtimer: 0xffffff bad_reads: 7 > kern/i386/tsc_pmtimer.c:77: pmtimer: 0xffffff bad_reads: 8 > kern/i386/tsc_pmtimer.c:77: pmtimer: 0xffffff bad_reads: 9 > kern/i386/tsc_pmtimer.c:77: pmtimer: 0xffffff bad_reads: 10 > kern/i386/tsc_pmtimer.c:78: timer is broken; giving up. OK. > This outcome was tested using qemu+kvm with UEFI (OVMF) firmware and > these options: -machine pc-q35-2.10 -cpu Broadwell-noTSX > > If pmtimer gives any other bit patterns but is not actually marching > forward fast enough to use for clock calibration, you will see: > > kern/i386/tsc_pmtimer.c:121: pmtimer delta is 0x0 (1904 iterations) > kern/i386/tsc_pmtimer.c:124: tsc delta is implausible: 0x2626aa0 OK. > This outcome was tested using grub compiled with GRUB_PMTIMER_IGNORE_BAD_READS > defined (so as not to trip the bad read test) using qemu+kvm with UEFI > (OVMF) firmware, and these options: -machine pc-q35-2.10 -cpu Broadwell-noTSX > > If pmtimer actually works, you'll see something like: > > kern/i386/tsc_pmtimer.c:121: pmtimer delta is 0x0 (1904 iterations) > kern/i386/tsc_pmtimer.c:124: tsc delta is implausible: 0x2626aa0 Hmmm... Same as above? > This outcome was tested using qemu+kvm with UEFI (OVMF) firmware, and > these options: -machine pc-i440fx-2.4 -cpu Broadwell-noTSX > > I've also tested this outcome on a real Intel Xeon E3-1275v3 on an Intel > Server Board S1200V3RPS using the SDV.RP.B8 "Release" build here: > https://firmware.intel.com/sites/default/files/UEFIDevKit_S1200RP_vB8.zip > > Signed-off-by: Peter Jones > --- > grub-core/kern/i386/tsc_pmtimer.c | 109 +++++++++++++++++++++++++++++++------- > 1 file changed, 89 insertions(+), 20 deletions(-) > > diff --git a/grub-core/kern/i386/tsc_pmtimer.c b/grub-core/kern/i386/tsc_pmtimer.c > index c9c36169978..ca15c3aacd7 100644 > --- a/grub-core/kern/i386/tsc_pmtimer.c > +++ b/grub-core/kern/i386/tsc_pmtimer.c > @@ -28,40 +28,101 @@ > #include > #include > > +/* > + * Define GRUB_PMTIMER_IGNORE_BAD_READS if you're trying to test a timer that's > + * present but doesn't keep time well. > + */ > +// #define GRUB_PMTIMER_IGNORE_BAD_READS > + > grub_uint64_t > grub_pmtimer_wait_count_tsc (grub_port_t pmtimer, > grub_uint16_t num_pm_ticks) > { > grub_uint32_t start; > - grub_uint32_t last; > - grub_uint32_t cur, end; > + grub_uint64_t cur, end; > grub_uint64_t start_tsc; > grub_uint64_t end_tsc; > - int num_iter = 0; > + unsigned int num_iter = 0; > +#ifndef GRUB_PMTIMER_IGNORE_BAD_READS > + int bad_reads = 0; > +#endif > > - start = grub_inl (pmtimer) & 0xffffff; > - last = start; > + /* > + * Some timers are 24-bit and some are 32-bit, but it doesn't make much > + * difference to us. Caring which one we have isn't really worth it since > + * the low-order digits will give us enough data to calibrate TSC. So just > + * mask the top-order byte off. > + */ > + cur = start = grub_inl (pmtimer) & 0xffffffUL; Just for the sake of readability I would do s/0xffffffUL/0x00ffffffUL/ here and below. > end = start + num_pm_ticks; > start_tsc = grub_get_tsc (); > while (1) > { > - cur = grub_inl (pmtimer) & 0xffffff; > - if (cur < last) > - cur |= 0x1000000; > - num_iter++; > + cur &= 0xffffffffff000000ULL; Could you put a comment before this line? It took me some time to get it and required to take a look below. This is not obvious at first sight. > + cur |= grub_inl (pmtimer) & 0xffffffUL; > + > + end_tsc = grub_get_tsc(); > + > +#ifndef GRUB_PMTIMER_IGNORE_BAD_READS > + /* > + * If we get 10 reads in a row that are obviously dead pins, there's no > + * reason to do this thousands of times. > + */ > + if (cur == 0xffffffUL || cur == 0) > + { > + bad_reads++; > + grub_dprintf ("pmtimer", > + "pmtimer: 0x%"PRIxGRUB_UINT64_T" bad_reads: %d\n", > + cur, bad_reads); > + grub_dprintf ("pmtimer", "timer is broken; giving up.\n"); > + > + if (bad_reads == 10) > + return 0; > + } > +#endif > + > + if (cur < start) > + cur += 0x1000000; > + > if (cur >= end) > { > - end_tsc = grub_get_tsc (); > + grub_dprintf ("pmtimer", "pmtimer delta is 0x%"PRIxGRUB_UINT64_T"\n", > + cur - start); > + grub_dprintf ("pmtimer", "tsc delta is 0x%"PRIxGRUB_UINT64_T"\n", > + end_tsc - start_tsc); > return end_tsc - start_tsc; > } > - /* Check for broken PM timer. > - 50000000 TSCs is between 5 ms (10GHz) and 200 ms (250 MHz) > - if after this time we still don't have 1 ms on pmtimer, then > - pmtimer is broken. > + > + /* > + * Check for broken PM timer. 1ms at 10GHz should be 1E+7 TSCs; at > + * 250MHz it should be 2.5E6. So if after 4E+7 TSCs on a 10GHz machine, s/2.5E6/2.5E5/ > + * we should have seen pmtimer show 4ms of change (i.e. cur =~ > + * start+14320); on a 250MHz machine that should be 16ms (start+57280). s/16ms/160ms/ s/start+57280/start+572800/ > + * If after this a time we still don't have 1ms on pmtimer, then pmtimer > + * is broken. > + * > + * Likewise, if our code is perfectly efficient and introduces no delays Just to be sure. Do you mean 1 CPU clock tick per loop? > + * whatsoever, on a 10GHz system we should see a TSC delta of 3580 in s/TSC delta/pm timer delta/? > + * ~3580 iterations. On a 250MHz machine that should be ~900 iterations. Hmmm... How come? AIUI the number of iterations is the same but the time needed to reach that value changes. > + * With those factors in mind, there are two limits here. There's a hard > + * limit here at 8x our desired pm timer delta, picked as an arbitrarily I am afraid that without proper comment above I am not able to understand why "8x" here. > + * large value that's still not a lot of time to humans, because if we > + * get that far this is either an implausibly fast machine or the pmtimer > + * is not running. And there's another limit on 4x our 10GHz tsc delta > + * without seeing cur converge on our target value. I buy this but I would tell "4 ms TSC delta on CPU with 10 GHz clock" or something like that. Daniel