From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53050) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b8tAm-0000A9-0C for qemu-devel@nongnu.org; Fri, 03 Jun 2016 13:46:17 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b8tAi-0003PA-RF for qemu-devel@nongnu.org; Fri, 03 Jun 2016 13:46:16 -0400 Received: from mail-lf0-x22e.google.com ([2a00:1450:4010:c07::22e]:34971) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b8tAi-0003Os-JW for qemu-devel@nongnu.org; Fri, 03 Jun 2016 13:46:12 -0400 Received: by mail-lf0-x22e.google.com with SMTP id w16so59098245lfd.2 for ; Fri, 03 Jun 2016 10:46:12 -0700 (PDT) References: <1464138802-23503-1-git-send-email-cota@braap.org> <1464138802-23503-9-git-send-email-cota@braap.org> <5749E02A.3080909@gmail.com> <20160603172245.GA8361@flamenco> <5751BE73.2090402@gmail.com> From: Sergey Fedorov Message-ID: <5751C25F.6080801@gmail.com> Date: Fri, 3 Jun 2016 20:46:07 +0300 MIME-Version: 1.0 In-Reply-To: <5751BE73.2090402@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Subject: Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Emilio G. Cota" Cc: QEMU Developers , MTTCG Devel , =?UTF-8?Q?Alex_Benn=c3=a9e?= , Paolo Bonzini , Richard Henderson On 03/06/16 20:29, Sergey Fedorov wrote: > On 03/06/16 20:22, Emilio G. Cota wrote: >> On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote: >>> On 25/05/16 04:13, Emilio G. Cota wrote: >>> (snip) >>>> +double qdist_avg(const struct qdist *dist) >>>> +{ >>>> + unsigned long count; >>>> + size_t i; >>>> + double ret = 0; >>>> + >>>> + count = qdist_sample_count(dist); >>>> + if (!count) { >>>> + return NAN; >>>> + } >>>> + for (i = 0; i < dist->n; i++) { >>>> + struct qdist_entry *e = &dist->entries[i]; >>>> + >>>> + ret += e->x * e->count / count; >>> Please use Welford’s method or something like that, see >>> http://stackoverflow.com/a/1346890. >> Yes, the way the mean is computed right now, we might suffer >> from underflow if count is huge. But I'd rather take that, than the >> perf penalty of an iterative method (such as the one used >> in Welford's). Note that we might have huge amounts of >> items, e.g. one item per head bucket in qht's occupancy qdist >> (and 0.5M head buckets is easy to achieve). >> >> If we were to use an iterative method, we'd need to do something >> like: >> >> double qdist_avg(const struct qdist *dist) >> { >> size_t i, j; >> double ret = 0; >> >> if (!qdist_sample_count(dist)) { >> return NAN; >> } >> /* compute moving average to prevent under/overflow */ >> for (i = 0; i < dist->n; i++) { >> struct qdist_entry *e = &dist->entries[i]; >> >> for (j = 0; j < e->count; j++) { >> >> ret += (e->x - ret) / (i + j + 1); >> } >> } >> return ret; >> } >> >> Note that skipping the inner loop would be incorrect. > Ah, it's a shame. I'm wondering if there is some other algorithm that > could work for us? Maybe something like https://en.wikipedia.org/wiki/Kahan_summation_algorithm could help? Regards, Sergey > >> I measured the time it takes to execute qdist_avg(&hst.occupancy) at the >> end of booting debian jessie for ARM. The difference is >> significant: >> >> Original: 0.000002 s >> Iterative: 0.002846 s > Have you compared the results of computing the average as well? > >> So really I think we should be OK with a potential underflow. If you want >> I can add a comment to remind our future selves of these findings. > Kind regards, > Sergey