Date: Tue, 3 Mar 2020 00:01:44 +0100 (CET)
From: BALATON Zoltan
To: Alex Bennée
Cc: Richard Henderson, QEMU Developers, Programmingkid, "qemu-ppc@nongnu.org", Howard Spoelstra, luigi burdo, Dino Papararo, Aleksandar Markovic, David Gibson
Subject: Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC

On Mon, 2 Mar 2020, Alex Bennée wrote:
> BALATON Zoltan writes:
>> On Sun, 1 Mar 2020, Richard Henderson wrote:
>>> On 3/1/20 4:13 PM, Programmingkid wrote:
>>>> Ok, I was just looking at Intel's x87 chip documentation.
>>>> It supports IEEE 754 floating point operations and exception flags.
>>>> This leads me to this question. Would simply taking the host
>>>> exception flags and using them to set the PowerPC's FPU's flag be
>>>> an acceptable solution to this problem?
>>
>> In my understanding that's what is currently done, the problem with
>> PPC as Richard said is the non-sticky versions of some of these bits
>> which need clearing FP exception status before every FPU op which
>> seems to be expensive and slower than using softfloat. So to use
>> hardfloat we either accept that we can't emulate these bits with
>> hardfloat or we need to do something else than clearing flags and
>> checking after every FPU op.
>>
>> While not emulating these bits doesn't seem to matter for most clients
>> and other PPC emulations got away with it, QEMU prefers accuracy over
>> speed even for rarely used features.
>>
>>> No.
>>>
>>> The primary issue is the FPSCR.FI flag. This is not an accumulative bit, per
>>> ieee754, but per operation.
>>>
>>> The "hardfloat" option works (with other targets) only with ieee754
>>> accumulative exceptions, when the most common of those exceptions, inexact, has
>>> already been raised. And thus need not be raised a second time.
>>
>> Why exactly is it done that way? What are the differences between IEEE
>> FP implementations that prevent using hardfloat most of the time
>> instead of only using it in some (although supposedly common) special
>> cases?
>
> There are a couple of wrinkles. As far as NaN and denormal behaviour
> goes we have enough slack in the spec that different guests have
> slightly different behaviour. See pickNaN and friends in the soft float
> specialisation code. As a result we never try and hand off to hardfloat
> for NaNs, Infs and Zeros. Luckily testing for those cases is a fairly
> small part of the cost of the calculation.
>
> Also things tend to get unstuck on changes to rounding modes.
> Fortunately it doesn't seem to be super common.

OK, but how do these relate to the inexact flag, and why is that the one
that's checked for using hardfloat? Also, the rounding mode is checked, but
why can't we set the same mode on the host, and why only use hardfloat in
one specific rounding mode? These two checks seem to further limit
hardfloat use beyond the above cases, or are these the same?

> You can read even more detail in the paper that originally prompted
> Emilio's work:
>
> "supporting the neon and VFP instruction sets in an LLVM-based
> binary translator"
> https://www.thinkmind.org/download.php?articleid=icas_2015_5_20_20033

I've only had a quick look at it but it seems not to discuss all details.
They say the ARM instructions they wanted to emulate have some non-standard
flush-to-zero behaviour where exceptions (including inexact) are handled
differently. Is this related to the check above, and if yes, shouldn't that
only apply to the ARM target? Other standard-compliant targets probably
should not be limited by this.

They've also found out that clearing and reading host FP flags is "slower
than QEMU", which is what we have for PPC currently. They say the solution
is to not use host exceptions at all but to calculate the exception flags
in software by looking at the inputs and the result instead, maybe trying
different FP ops that test for the exception cases. Unfortunately this
paper does not describe how exactly that's done; it just says it may be
described later. It seems like a kind of softfloat that uses the FPU for
the actual calculation and deduces exceptions without access to the
intermediate results that softfloat may be using. So they can use hardware
for the calculation, which should be the largest part, and calculate the
flags in software. This way they claim a 1.24 to 3.36 times speed-up
compared to QEMU at the time (using only softfloat I guess, which is still
what we have for PPC today).

>>> Per the PowerPC architecture, inexact must be recognized afresh for every
>>> operation. Which is cheap in hardware but expensive in software.
>>>
>>> And once you're done with FI, FR has been and continues to be emulated incorrectly.
>>
>> I think CPUs can also raise exceptions when they detect the condition
>> in hardware so maybe we should install our FPU exception handler and
>> set guest flags from that then we don't need to check and won't have
>> problem with these bits either. Why is that not possible or isn't
>> done?
>
> One of my original patches did just this:
>
> Subject: [PATCH] fpu/softfloat: use hardware sqrt if we can (EXPERIMENT!)
> Date: Tue, 20 Feb 2018 21:01:37 +0000
> Message-Id: <20180220210137.18018-1-alex.bennee@linaro.org>

It's this patch:

http://patchwork.ozlabs.org/patch/875764/

This at least shows where to hook in FP exception handling. Based on the
above paper, maybe that's not the best solution after all, but it may be
worth a try anyway in case it's simpler than what they did.

> The two problems you run into are:
>
> - relying on a trap for inexact will be slow if you keep hitting it

Which is slower? Clearing exception flags before every op and reading them
again, or trapping for exceptions? I'd expect that even if exceptions are
common they should be less frequent than every op (otherwise they would not
be "exceptional").

> - reading host FPU flag registers turns out to be pretty expensive

That's what using exceptions should avoid. If we only need to read and
clear flags when an exception happens, that should be less frequent than
doing it for every FP op. Hopefully that holds even with the additional
overhead of calling the handler, if all the handler has to do is set a
corresponding flag in a global.

Regards,
BALATON Zoltan