From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752126AbdGDMr6 (ORCPT ); Tue, 4 Jul 2017 08:47:58 -0400 Received: from mail-qt0-f195.google.com ([209.85.216.195]:34806 "EHLO mail-qt0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751645AbdGDMr5 (ORCPT ); Tue, 4 Jul 2017 08:47:57 -0400 Date: Tue, 4 Jul 2017 08:47:55 -0400 From: Josef Bacik To: Peter Zijlstra Cc: Ingo Molnar , josef@toxicpanda.com, mingo@redhat.com, linux-kernel@vger.kernel.org, kernel-team@fb.com, Josef Bacik Subject: Re: [RFC][PATCH] sched: attach extra runtime to the right avg Message-ID: <20170704124754.GA6807@destiny> References: <1498787766-9593-1-git-send-email-jbacik@fb.com> <20170702093718.aq5p5xxfvrjdeful@gmail.com> <20170704094141.mebcs2pjv2s6vynt@hirez.programming.kicks-ass.net> <20170704101308.odsijqc6qa7p2pe3@gmail.com> <20170704122150.f2bqv55g7vvjztxa@hirez.programming.kicks-ass.net> <20170704124003.i7lg4c7gdnqqbjyo@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170704124003.i7lg4c7gdnqqbjyo@hirez.programming.kicks-ass.net> User-Agent: Mutt/1.8.0 (2017-02-23) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 04, 2017 at 02:40:03PM +0200, Peter Zijlstra wrote: > On Tue, Jul 04, 2017 at 02:21:50PM +0200, Peter Zijlstra wrote: > > On Tue, Jul 04, 2017 at 12:13:09PM +0200, Ingo Molnar wrote: > > > > > > This code on the other hand: > > > > > > sa->last_update_time += delta << 10; > > > > > > ... in essence creates a whole new absolute clock value that slowly but surely is > > > drifting away from the real rq->clock, because 'delta' is always rounded down to > > > the nearest 1024 ns boundary, so we accumulate the 'remainder' losses. > > > > > > That is because: > > > > > > delta >>= 10; > > > ... > > > sa->last_update_time += delta << 10; > > > > > > Given enough time, ->last_update_time can drift a long way, and this delta: > > > > > > delta = now - sa->last_update_time; > > > > > > ... becomes meaningless AFAICS, because it's essentially two different clocks that > > > get compared. > > > > Thing is, once you drift over 1023 (ns) your delta increases and you > > catch up again. > > > > > > > > A B C D E F > > | | | | | | > > +----+----+----+----+----+----+----+----+----+----+----+ > > > > > > A: now = 0 > > sa->last_update_time = 0 > > delta := (now - sa->last_update_time) >> 10 = 0 > > > > B: now = 614 (+614) > > delta = (614 - 0) >> 10 = 0 > > sa->last_update_time += 0 (0) > > sa->last_update_time = now & ~1023 (0) > > > > C: now = 1843 (+1229) > > delta = (1843 - 0) >> 10 = 1 > > sa->last_update_time += 1024 (1024) > > sa->last_update_time = now & ~1023 (1024) > > > > > > D: now = 3481 (+1638) > > delta = (3481 - 1024) >> 10 = 2 > > sa->last_update_time += 2048 (3072) > > sa->last_update_time = now & ~1023 (3072) > > > > E: now = 5734 (+2253) > > delta = (5734 - 3072) = 2 > > sa->last_update_time += 2048 (5120) > > sa->last_update_time = now & ~1023 (5120) > > > > F: now = 6348 (+614) > > delta = (6348 - 5120) >> 10 = 1 > > sa->last_update_time += 1024 (6144) > > sa->last_update_time = now & ~1023 (6144) > > > > > > > > And you'll see that both are identical, and that both D and F have > > gotten a spill from sub-chunk accounting. > > > Where the two approaches differ is when we have different modifications > to sa->last_update_time (and we do). > > The differential (+=) one does not mandate initial value of > ->last_update_time has the bottom 9 bits cleared. It will simply > continue from wherever. > > The absolute (&) one however mandates that ->last_update_time always has > the bottom few bits 0, otherwise we can 'gain' time. The first iteration > will clear those bits and we'll then double account them. > > It so happens that we have an explicit assign in migrate > (attach_entity_load_avg / set_task_rq_fair). And on negative delta. In > all those cases we use the immediate 'now' value, no clearing of bottom > bits. > > The differential should work fine with that, the absolute one has double > accounting issues in that case. > > So it would be very good to find what exactly causes Josef's workload to > get 'fixed'. Sorry let me experiment some more, like I said this is one of 4 patches I need to actually fix my workload, and I've tested so many iterations of this problem that I may be _thinking_ it affects things but it really doesn't. I'll re-test with the normal code and the other 3 patches in place and see if things are ok. Thanks, Josef