From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756283AbcDDTgQ (ORCPT ); Mon, 4 Apr 2016 15:36:16 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42962 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756063AbcDDTgP (ORCPT ); Mon, 4 Apr 2016 15:36:15 -0400 Message-ID: <1459798571.6219.23.camel@redhat.com> Subject: Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative From: Rik van Riel To: Chris Metcalf , Frederic Weisbecker , Christoph Lameter , Ingo Molnar , Luiz Capitulino , Peter Zijlstra , Thomas Gleixner , Viresh Kumar , linux-kernel@vger.kernel.org Date: Mon, 04 Apr 2016 15:36:11 -0400 In-Reply-To: <5702C126.1030904@mellanox.com> References: <1459539771-4251-1-git-send-email-cmetcalf@mellanox.com> <1459797143.6219.22.camel@redhat.com> <5702C126.1030904@mellanox.com> Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-YXfWIcz+q2GW3qCTpnpK" Mime-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-YXfWIcz+q2GW3qCTpnpK Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, 2016-04-04 at 15:31 -0400, Chris Metcalf wrote: > On 4/4/2016 3:12 PM, Rik van Riel wrote: > >=20 > > On Fri, 2016-04-01 at 15:42 -0400, Chris Metcalf wrote: > > >=20 > > > On arm64, when calling enqueue_task_fair() from > > > migration_cpu_stop(), > > > we find the nr_running value updated by add_nr_running(), but the > > > cfs.nr_running value has not always yet been > > > updated.=C2=A0=C2=A0Accordingly, > > > the sched_can_stop_tick() false returns true when we are > > > migrating a > > > second task onto a core. > > I don't get it. > >=20 > > Looking at the enqueue_task_fair(), I see this: > >=20 > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0for_each_sched_en= tity(se) { > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0cfs_rq =3D cfs_rq_of(se); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0cfs_rq->h_nr_running++; > > ... > > } > >=20 > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (!se) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0add_nr_running(rq, 1); > >=20 > > What is the difference between cfs_rq->h_nr_running, > > and rq->cfs.nr_running? > >=20 > > Why do we have two? > > Are we simply testing against the wrong one in > > sched_can_stop_tick? > It seems that using the non-CFS one is what we want.=C2=A0=C2=A0I don't k= now > whether > using a different CFS count instead might be more correct. >=20 > Since I'm not sure what causes the difference I see between tile > (correct) > and arm64 (incorrect) it's hard for me to speculate. >=20 > >=20 > > >=20 > > > Correct this by using rq->nr_running instead of rq- > > > >cfs.nr_running. > > > This should always be more conservative, and reverts the test to > > > the > > > form it had before commit 76d92ac305f2 ("sched: Migrate sched to > > > use > > > new tick dependency mask model"). > > That would cause us to run the timer tick while running > > a single SCHED_RR real time task, with a single > > SCHED_OTHER task sitting in the background (which will > > not get run until the SCHED_RR task is done). > No, because in sched_can_stop_tick(), we first handle the special > cases of RR or FIFO tasks present.=C2=A0=C2=A0For example, RR: >=20 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (rq->rt.rr_nr_ru= nning) { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (rq->rt.rr_nr_running =3D=3D 1) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0return true; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0else > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0return false; > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} >=20 > Once we see there's any RR tasks running, the return value > ignores any possible SCHED_OTHER tasks.=C2=A0=C2=A0Only after the code > concludes there are no RR/FIFO tasks do we even look at > the over nr_running value. OK, fair enough. I guess both of the RT cases are covered already. Patch gets my: Acked-by: Rik van Riel --=20 All Rights Reversed. --=-YXfWIcz+q2GW3qCTpnpK Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJXAsIrAAoJEM553pKExN6DE50H/j37hgY+ScrushxWlR1AENa6 UqBqnllE164V6uZMObh2Ja2eJCoA4vfx4j3I5Vd3ALInAaB/UrLe9cwUnDqDi16I R2IwB1EOi2n5ZTEAwlbBab4U6ee4UuYQiOGe9VrbPpc2vxkosR8+olK7tWV9ExEV RkC1KTutcS2EVOR6FdeIsvkO12wNdc8mZNS7DPP6/UR8pCGHfD2GIB0LzcwQbQL/ UNNqw4SHnkB+pO0fsIbT2412ITuWUA0Fh6ln8q6pk9VfdMDmVaNwrmqpxKApPCMY 30BHMlnCrSX1aDJpCNljoqr2GanLqpw9Fidrws5dfnI7IchOwe0RmTS9EBbxyuA= =wkeY -----END PGP SIGNATURE----- --=-YXfWIcz+q2GW3qCTpnpK--