From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754973AbbCMMtv (ORCPT ); Fri, 13 Mar 2015 08:49:51 -0400 Received: from mail-la0-f50.google.com ([209.85.215.50]:33680 "EHLO mail-la0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751391AbbCMMtr (ORCPT ); Fri, 13 Mar 2015 08:49:47 -0400 MIME-Version: 1.0 In-Reply-To: <5500b987.kerYYCYfIffruy3Z%akpm@linux-foundation.org> References: <5500b987.kerYYCYfIffruy3Z%akpm@linux-foundation.org> Date: Fri, 13 Mar 2015 15:49:45 +0300 Message-ID: Subject: Re: + lib-vsprintfc-even-faster-decimal-conversion.patch added to -mm tree From: Alexey Dobriyan To: Linux Kernel , Andrew Morton Cc: linux@rasmusvillemoes.dk, Peter Zijlstra , Tejun Heo , Denis Vlasenko , KAMEZAWA Hiroyuki Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 12, 2015 at 12:54 AM, wrote: > Subject: lib/vsprintf.c: even faster binary to decimal conversion I spent some time to microbenchmark changes in userspace (audience: fool!). Results are below. Legend is "number avg+-1sigma min-max". Every number is CPU cycles. Great care was taken to remove interrupt noise. Number of measurements is 100 millions per line. CPU is Intel Core 2 Duo E6550 in 64-bit mode. 3.19.1: 0 98.015369 +- 0.512937 91-616 42 116.000193 +- 3.523826 112-868 27182 137.009008 +- 3.515725 133-1043 65535 137.008262 +- 3.521761 133-840 4294967295 201.019966 +- 3.278608 196-1050 3141592653589793238 289.996882 +- 3.489376 287-1148 18446744073709551615 295.065274 +- 2.860187 287-1029 ----------------------------------------------------- 3.19.1+patch 0 94.444063 +- 3.518922 84-630 42 116.428533 +- 18.539093 105-1036 42 116.316904 +- 18.234484 105-833 27182 136.172398 +- 3.737113 133-980 65535 136.014742 +- 3.537882 133-714 4294967295 172.009618 +- 3.507473 168-826 3141592653589793238 207.001114 +- 3.492724 196-1120 18446744073709551615 208.018154 +- 3.220185 203-1246 ----------------------------------------------------- New code is somewhat faster for huge numbers. But top and ps don't show huge numbers normally -- it is either PIDs (2^16) or moderately high numbers in a range of millions (see /proc/stat) * variance for new code is bigger I even tried N=42 twice because I thought 18.5 variance is a glitch but it is not. New code uses lookup table which implies cache misses. Current code is purely code. So I don't think new printing code will change anything really. > On a larger scale, perf shows that top, one of the big consumers of /proc > data, uses 0.5-1.0% fewer cpu cycles. perf(1) also shows variance next to average, what was it? In my experience everything perf measures has single digit percent variance (and this is just 1sigma!) so you can't say new code is faster. Also average can vary between runs more than variance (yuck!) First number printing improvement patch was measuring ~30% speedups: commit 4277eedd7908a0ca8b66fad46ee76b0ad96e6ef2 vsprintf.c: optimizing, part 2: base 10 conversion speedup, v2 Now it is 1%. > Microbenchmarking shows improvements > ranging from -50% (for numbers uniformly distributed in [0, 2^64-1]) to > -25% (for numbers heavily biased toward the smaller end, a more realistic > distribution). -25%? Mmm, no, see table above. I think any further improvements to number printing code should be rejected on philosophical grounds: Kernel should ship numbers to ps(1) and top(1) in BINARY, so it would take exactly 1 MOV instruction which takes exactly 1 cycle to execute. Currently it is 1) kernel converts binary to text, 2) usespace converts text to binary, 3) userspace converts binary to text and shows the user. 4) people optimizing #1 But only final conversion is needed to happen because it is communication between human and program. Programs can very well talk in binary. So, if you really want to speed up top(1), design binary interface for shipping numbers and enjoy 1000% speedups, leave text files in /proc for those undemanding shell scripts. Alexey