From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57529) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gQhmX-0003Gb-FT for qemu-devel@nongnu.org; Sat, 24 Nov 2018 18:56:14 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gQhmV-0005eN-SE for qemu-devel@nongnu.org; Sat, 24 Nov 2018 18:56:13 -0500 Received: from wout2-smtp.messagingengine.com ([64.147.123.25]:38477) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gQhmU-0005Gj-5i for qemu-devel@nongnu.org; Sat, 24 Nov 2018 18:56:10 -0500 From: "Emilio G. Cota" Date: Sat, 24 Nov 2018 18:55:52 -0500 Message-Id: <20181124235553.17371-13-cota@braap.org> In-Reply-To: <20181124235553.17371-1-cota@braap.org> References: <20181124235553.17371-1-cota@braap.org> Subject: [Qemu-devel] [PATCH v6 12/13] hardfloat: implement float32/64 square root List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: qemu-devel@nongnu.org Cc: =?UTF-8?q?Alex=20Benn=C3=A9e?= , Richard Henderson Performance results for fp-bench: Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz - before: sqrt-single: 42.30 MFlops sqrt-double: 22.97 MFlops - after: sqrt-single: 311.42 MFlops sqrt-double: 311.08 MFlops Here USE_FP makes a huge difference for f64's, with throughput going from ~200 MFlops to ~300 MFlops. Signed-off-by: Emilio G. Cota --- fpu/softfloat.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 58 insertions(+), 2 deletions(-) diff --git a/fpu/softfloat.c b/fpu/softfloat.c index e03feafb6f..4c6ecd1883 100644 --- a/fpu/softfloat.c +++ b/fpu/softfloat.c @@ -3040,20 +3040,76 @@ float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status) return float16_round_pack_canonical(pr, status); } -float32 QEMU_FLATTEN float32_sqrt(float32 a, float_status *status) +static float32 QEMU_SOFTFLOAT_ATTR +soft_f32_sqrt(float32 a, float_status *status) { FloatParts pa = float32_unpack_canonical(a, status); FloatParts pr = sqrt_float(pa, status, &float32_params); return float32_round_pack_canonical(pr, status); } -float64 QEMU_FLATTEN float64_sqrt(float64 a, float_status *status) +static float64 QEMU_SOFTFLOAT_ATTR +soft_f64_sqrt(float64 a, float_status *status) { FloatParts pa = float64_unpack_canonical(a, status); FloatParts pr = sqrt_float(pa, status, &float64_params); return float64_round_pack_canonical(pr, status); } +float32 QEMU_FLATTEN float32_sqrt(float32 xa, float_status *s) +{ + union_float32 ua, ur; + + ua.s = xa; + if (unlikely(!can_use_fpu(s))) { + goto soft; + } + + float32_input_flush1(&ua.s, s); + if (QEMU_HARDFLOAT_1F32_USE_FP) { + if (unlikely(!(fpclassify(ua.h) == FP_NORMAL || + fpclassify(ua.h) == FP_ZERO) || + signbit(ua.h))) { + goto soft; + } + } else if (unlikely(!float32_is_zero_or_normal(ua.s) || + float32_is_neg(ua.s))) { + goto soft; + } + ur.h = sqrtf(ua.h); + return ur.s; + + soft: + return soft_f32_sqrt(ua.s, s); +} + +float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s) +{ + union_float64 ua, ur; + + ua.s = xa; + if (unlikely(!can_use_fpu(s))) { + goto soft; + } + + float64_input_flush1(&ua.s, s); + if (QEMU_HARDFLOAT_1F64_USE_FP) { + if (unlikely(!(fpclassify(ua.h) == FP_NORMAL || + fpclassify(ua.h) == FP_ZERO) || + signbit(ua.h))) { + goto soft; + } + } else if (unlikely(!float64_is_zero_or_normal(ua.s) || + float64_is_neg(ua.s))) { + goto soft; + } + ur.h = sqrt(ua.h); + return ur.s; + + soft: + return soft_f64_sqrt(ua.s, s); +} + /*---------------------------------------------------------------------------- | The pattern for a default generated NaN. *----------------------------------------------------------------------------*/ -- 2.17.1