From: Rik van Riel
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, jhladky@redhat.com, mingo@kernel.org, mgorman@suse.de
Subject: Re: [PATCH 4/4] sched,fair: remove effective_load
Date: Tue, 27 Jun 2017 10:55:58 -0400
Message-ID: <1498575358.20270.114.camel@redhat.com>
In-Reply-To: <20170627053906.GA7287@worktop>

On Tue, 2017-06-27 at 07:39 +0200, Peter Zijlstra wrote:
> On Mon, Jun 26, 2017 at 03:34:49PM -0400, Rik van Riel wrote:
> > On Mon, 2017-06-26 at 18:12 +0200, Peter Zijlstra wrote:
> > > On Mon, Jun 26, 2017 at 11:20:54AM -0400, Rik van Riel wrote:
> > >
> > > > Oh, indeed.  I guess in wake_affine() we should test
> > > > whether the CPUs are in the same NUMA node, rather than
> > > > doing cpus_share_cache() ?
> > >
> > > Well, since select_idle_sibling() is on LLC; the early test on
> > > cpus_share_cache(prev,this) seems to actually make sense.
> > >
> > > But then cutting out all the other bits seems wrong. Not in the least
> > > because !NUMA_BALANCING should also still keep working.
> >
> > Even when !NUMA_BALANCING, I suspect it makes little sense
> > to compare the loads just on the cores in question, since
> > select_idle_sibling() will likely move the task somewhere
> > else.
> >
> > I suspect we want to compare the load on the whole LLC
> > for that reason, even with NUMA_BALANCING disabled.
>
> But we don't have that data around :/ One thing we could do is try and
> keep a copy of the last s*_lb_stats around in the sched_domain_shared
> stuff or something and try and use that.
>
> That way we can strictly keep things at the LLC level and not confuse
> things with NUMA.
>
> Similarly, we could use that same data to then avoid re-computing things
> for the NUMA domain as well and do away with numa_stats.

That does seem like a useful optimization, though
I guess we would have to invalidate the cached data
every time we actually move a task?

The current code simply walks all the CPUs in the
cpumask_t, and adds up capacity and load. Doing that
appears to be better than poor task placement (Jirka's
numbers speak for themselves), but optimizing this
code path does seem like a worthwhile goal.

I'll look into it.

> > > > Or, alternatively, have an update_numa_stats() variant
> > > > for numa_wake_affine() that works on the LLC level?
> > >
> > > I think we want to retain the existing behaviour for everything
> > > larger than LLC, and when NUMA_BALANCING, smaller than NUMA.
> >
> > What do you mean by this, exactly?
>
> As you noted, when prev and this are in the same LLC, it doesn't matter
> and select_idle_sibling() will do its thing. So anything smaller than
> the LLC need not do anything.
>
> When NUMA_BALANCING we have the numa_stats thing and we can, as you
> propose use that.
>
> If LLC < NUMA or !NUMA_BALANCING we have a region that needs to do
> _something_.

Agreed. I will fix this. Given that this is a bit of
a corner case, I guess I can fix this with follow-up
patches, to be merged into -tip before the whole series
is sent on to Linus?

> > > Also note that your use of task_h_load() in the new numa thing suffers
> > > from exactly the problem effective_load() is trying to solve.
> >
> > Are you saying task_h_load is wrong in task_numa_compare()
> > too, then?  Should both use effective_load()?
>
> I need more than the few minutes I currently have, but probably. The
> question is of course, how much does it matter and how painful will it
> be to do it better.

I suspect it does not matter at all currently, since the
load balancing code does not use effective_load, and
having the wake_affine logic calculate things differently
from the load balancer is likely to result in both pieces
of code fighting against each other.

I suspect we should either use task_h_load everywhere,
or effective_load everywhere, but not have a mix and
match situation where one is used in some places, and
the other in others.
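
To make the caching question from earlier in this mail concrete,
here is a minimal userspace sketch in plain C. It is not kernel
code. The names struct cpu_info, struct llc_stats, llc_stats_get()
and llc_stats_invalidate() are hypothetical and exist only for this
illustration, and a plain unsigned long bitmask stands in for
cpumask_t. It only shows the shape of the idea: walk the CPUs in a
mask, add up load and capacity, keep the totals around, and drop
them whenever a task is moved.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU figures; the kernel tracks these per runqueue. */
struct cpu_info {
	unsigned long load;
	unsigned long capacity;
};

/* Hypothetical cached totals for the CPUs sharing one LLC. */
struct llc_stats {
	unsigned long mask;      /* bit N set: CPU N belongs to this LLC */
	unsigned long load;      /* sum of load over the CPUs in mask */
	unsigned long capacity;  /* sum of capacity over the CPUs in mask */
	bool valid;              /* false: totals must be recomputed */
};

/* Walk every CPU in the mask and add up load and capacity. */
static void llc_stats_compute(struct llc_stats *s,
			      const struct cpu_info *cpus, size_t ncpus)
{
	/* The toy bitmask cannot describe more CPUs than it has bits. */
	assert(ncpus <= sizeof(s->mask) * CHAR_BIT);

	s->load = 0;
	s->capacity = 0;
	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (!(s->mask & (1UL << cpu)))
			continue;
		s->load += cpus[cpu].load;
		s->capacity += cpus[cpu].capacity;
	}
	s->valid = true;
}

/* Return the cached totals, recomputing only after an invalidation. */
static const struct llc_stats *llc_stats_get(struct llc_stats *s,
					     const struct cpu_info *cpus,
					     size_t ncpus)
{
	if (!s->valid)
		llc_stats_compute(s, cpus, ncpus);
	return s;
}

/* Assumed to be called whenever a task is moved to or from the LLC. */
static void llc_stats_invalidate(struct llc_stats *s)
{
	s->valid = false;
}
```

In this sketch a caller that forgets llc_stats_invalidate() after
moving a task keeps seeing the old totals, which is exactly the
staleness concern raised above. Whether the real data would live in
sched_domain_shared, and how it would be kept coherent across CPUs,
is left open here.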

-- 
All rights reversed