From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 26 Apr 2017 17:30:20 -0700
From: Tejun Heo
To: Vincent Guittot
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Linus Torvalds,
    Mike Galbraith, Paul Turner, Chris Mason, kernel-team@fb.com
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg
Message-ID: <20170427003020.GD11348@wtj.duckdns.org>

Hello, Vincent.

On Wed, Apr 26, 2017 at 12:21:52PM +0200, Vincent Guittot wrote:
> > This is from the follow-up patch.  I was confused.  Because we don't
> > propagate decays, we still should decay the runnable_load_avg;
> > otherwise, we end up accumulating errors in the counter.  I'll drop
> > the last patch.
>
> Ok, the runnable_load_avg goes back to 0 when I drop patch 3.  But I
> see runnable_load_avg sometimes significantly higher than load_avg,
> which is normally not possible, as load_avg = runnable_load_avg +
> sleeping tasks' load_avg.

So, while load_avg would eventually converge on runnable_load_avg +
blocked load_avg given a stable enough workload for long enough,
runnable_load_avg temporarily jumping above load_avg is expected,
AFAICS.
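To illustrate with a toy model (purely hypothetical, heavily simplified, not the kernel's actual PELT code): if a runnable contribution is propagated into the parent immediately while load_avg only converges geometrically (PELT uses a decay series with y^32 = 1/2), a transient inversion is exactly what you'd expect to see shortly after an attach:

```python
# Toy model of PELT-style geometric averaging -- NOT kernel code.
# PELT's decay factor y satisfies y^32 = 0.5 (half-life ~32 periods).
Y = 0.5 ** (1 / 32)

def converge(avg, target, periods):
    """Step a geometric average toward `target` for `periods` steps."""
    for _ in range(periods):
        avg = avg * Y + target * (1 - Y)
    return avg

# A child's runnable contribution of 1024 shows up at t=0.  The
# runnable side is adjusted in the parent immediately, while the
# parent's load_avg only catches up as the average converges.
runnable_load_avg = 1024
load_avg = converge(0, 1024, 4)   # ~4 periods after the attach

# Early on, runnable_load_avg is far ahead of load_avg.
print(runnable_load_avg, round(load_avg))
```

Given enough periods the two meet again, which matches the "eventually converge" part above.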
That's the whole point of it: a sum that closely tracks what's
currently on the CPU, so that we can pick the CPU which has the most on
it right now.  It doesn't make sense to try to pick threads off of a
CPU which is generally loaded but doesn't have much going on right now,
after all.

> Then, I just have the opposite behavior on my platform.  I see an
> increase of latency at p99 with your patches.
> My platform is a hikey: 2x4 cores ARM, and I have used schbench -m 2
> -t 4 -s 10000 -c 15000 -r 30, so I have 1 worker thread per CPU, which
> is similar to what you are doing on your platform.
>
> With v4.11-rc8, I have run the test 10 times and get consistent results:
> ...
> *99.0000th: 539
> ...
> With your patches I see an increase of the latency for p99.  I ran 10
> times as well:
> *99.0000th: 2034

I see.  This is surprising, given that the purpose of the patch is, at
least, to restore cgroup behavior to match the !cgroup one.  I could
have totally messed it up, tho.  Hmm... there are several ways forward,
I guess.

* Can you please double check that the higher latencies with the patch
  are reliably reproducible?  The test machines that I use have
  variable management load.  They never dominate the machine but are
  enough to disturb the results, so drawing out a reliable pattern
  takes a lot of repeated runs.  I'd really appreciate it if you could
  double check that the pattern is reliable with different run patterns
  (ie. instead of 10 consecutive runs one after another, interleaved).

* Is the board something easily obtainable?  It'd be the easiest for me
  to set up the same environment and reproduce the problem.  I looked
  up hikey boards on amazon but couldn't easily find 2x4 core ones.  If
  there's something I can easily buy, please point me to it.  If
  there's something I can loan, that'd be great too.

* If not, I'll try to clean up the debug patches I have and send them
  your way to get more visibility, but given that these things tend to
  be very iterative, it might take quite a few back-and-forths.

Thanks!

--
tejun
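P.S. On the repeatability point: a tiny helper (purely illustrative, not part of schbench) that pulls the "*99.0000th: N" line out of each run's captured output can make comparing many interleaved runs less tedious:

```python
import re
import statistics

def p99_usec(output: str) -> int:
    """Extract the '*99.0000th: N' latency (usec) from one schbench run."""
    m = re.search(r"\*99\.0000th:\s*(\d+)", output)
    if m is None:
        raise ValueError("no p99 line in schbench output")
    return int(m.group(1))

# e.g. feed it the captured stdout of each run:
runs = ["*99.0000th: 539", "*99.0000th: 561", "*99.0000th: 545"]
p99s = [p99_usec(r) for r in runs]
print(min(p99s), statistics.median(p99s), max(p99s))
```

Looking at the min/median/max spread across runs should show quickly whether the pattern is reliable or just management-load noise.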