From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966050AbdGTWec (ORCPT ); Thu, 20 Jul 2017 18:34:32 -0400 Received: from bombadil.infradead.org ([65.50.211.133]:57359 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965438AbdGTWe1 (ORCPT ); Thu, 20 Jul 2017 18:34:27 -0400 Date: Fri, 21 Jul 2017 00:34:16 +0200 From: Peter Zijlstra To: Linus Torvalds Cc: Anshul Garg , Davidlohr Bueso , Linux Kernel Mailing List , "anshul.g@samsung.com" , Thomas Gleixner , Joe Perches Subject: Re: [PATCH] lib/int_sqrt.c: Optimize square root function Message-ID: <20170720223416.fxkgdtvuqwxxmf3y@hirez.programming.kicks-ass.net> References: <1422897162-111998-1-git-send-email-aksgarg1989@gmail.com> <20170720112449.6xvc2ghaj3jh6w7l@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jul 20, 2017 at 11:31:36AM -0700, Linus Torvalds wrote: > How did this two-year old thread get resurrected? I was looking for the original thread doing that 'optimization' Davidlohr did but found this first. > And the *most* important question is that first one: > > "Why does this matter, and what is the range it matters for?" I was looking to do some work on the idle estimator. Parts of that keep online avg and variance for normal distributions. I wanted to bias the avg downwards, the way to do that is to subtract a scaled stdev from it. Computing the stdev from a variance requires the sqrt. Thomas rightly asked how expensive our sqrt is, I found Davidlohr's commit message and got confused by the numbers, so I reran them and found the optimization did the reverse, it made things worse. By then I was prodding at it... 'fun' problem :-) In any case, I suppose the range of values would be in the order of TICK_NSEC, so the variance would be a number of orders below that. So we're looking at fairly small numbers <1e5. > Also, since this is a generic library routine, no way can we depend on > fls being fast. Which is why I also tested the software fls, but you're right in that the microbench primes the branch predictor. Still, the software fls is 6 branches, whereas the 'missing' loop: while (m > x) m >>= 2; would need up to 30 or so cycles worst case. So even in that respect it makes sense its a 'win', esp. so for small numbers.