From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751717AbdLUKIC (ORCPT ); Thu, 21 Dec 2017 05:08:02 -0500 Received: from smtp-out4.electric.net ([192.162.216.187]:52484 "EHLO smtp-out4.electric.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750884AbdLUKH7 (ORCPT ); Thu, 21 Dec 2017 05:07:59 -0500 From: David Laight To: "'Crt Mori'" CC: Peter Zijlstra , Jonathan Cameron , Ingo Molnar , Andrew Morton , Kees Cook , Rusty Russell , Ian Abbott , Larry Finger , Niklas Soderlund , Thomas Gleixner , Krzysztof Kozlowski , Masahiro Yamada , "linux-kernel@vger.kernel.org" , "linux-iio@vger.kernel.org" , Joe Perches Subject: RE: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt Thread-Topic: [PATCH v10 1/3] lib: Add strongly typed 64bit int_sqrt Thread-Index: AQHTeZ24vj4uJGdJVEmg6EfyY0tMJ6NMSiFAgAAfCS+AAAbEMIAADUYAgAEVOQA= Date: Thu, 21 Dec 2017 10:08:09 +0000 Message-ID: References: <20171220142001.18161-1-cmo@melexis.com> <1c1d0ffa8ee140bf9adbc78f1559b1e8@AcuMS.aculab.com> <20171220160001.manjff26gfbjccsw@hirez.programming.kicks-ass.net> In-Reply-To: Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.202.205.33] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 X-Outbound-IP: 156.67.243.126 X-Env-From: David.Laight@ACULAB.COM X-Proto: esmtps X-Revdns: X-HELO: AcuMS.aculab.com X-TLS: TLSv1.2:ECDHE-RSA-AES256-SHA384:256 X-Authenticated_ID: X-PolicySMART: 3396946, 3397078 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id vBLA8ISj007769 From: Crt Mori > Sent: 20 December 2017 17:30 > To: David Laight ... > > I think this version works. > > It doesn't have the optimisation for small values. > > > > unsigned int sqrt64(unsigned long long x) > > { > > unsigned int x_hi = x >> 32; > > > > unsigned int b = 0; > > unsigned int y = 0; > > unsigned int i; > > > > for (i = 0; i < 32; i++) { > > b <<= 2; > > b |= x_hi >> 30; > > x_hi <<= 2; > > if (i == 15) > > x_hi = x; > > y <<= 1; > > if (b > y) > > b -= ++y; > > } > > return y; > > } > > Wow, thanks. This seems like a major change. Not really, it is just doing it slightly differently. The algorithm is the 'schoolboy' one that used to be taught at school. (Although I found it in a boot from the 1930s.) I compared the object code for amd64 (doesn't need to reload 'x_hi' half way through) against the original loop. They are much the same size, execution speed will depend the lengths of the dependency chains. > I did a quick run through unit tests for the sensor and the results > are way off. On the sensor I had to convert double calculations to > integer calculations and target was to get end result within 0.02 degC > (with previous approximate sqrt implementation) in sensor working > range. This now gets into 3 degC delta at least and some are way off. > It might be off because of some scaling on the other hand during the > equation (not exactly comparing sqrt implementations here). > > I will need a bit more testing of this to see if it is replaceable. You probably need to put values through both (all 3) functions to see where you are getting errors. It might be rounding, or there might be a fubar somewhere. David