From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752461AbcHJV0Q (ORCPT ); Wed, 10 Aug 2016 17:26:16 -0400 Received: from mx1.redhat.com ([209.132.183.28]:28845 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932486AbcHJSDc (ORCPT ); Wed, 10 Aug 2016 14:03:32 -0400 Message-ID: <1470805661.13905.95.camel@redhat.com> Subject: Re: [PATCH 1/5] sched,time: Count actually elapsed irq & softirq time From: Rik van Riel To: Wanpeng Li Cc: Frederic Weisbecker , Ingo Molnar , LKML , Paolo Bonzini , Peter Zijlstra , Wanpeng Li , Thomas Gleixner , Radim Krcmar , Mike Galbraith Date: Wed, 10 Aug 2016 01:07:41 -0400 In-Reply-To: References: <1468421405-20056-1-git-send-email-fweisbec@gmail.com> <1468421405-20056-2-git-send-email-fweisbec@gmail.com> <1470751579.13905.77.camel@redhat.com> Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-YEkeJS99YQwEDixj33Wv" Mime-Version: 1.0 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 10 Aug 2016 05:07:44 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-YEkeJS99YQwEDixj33Wv Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, 2016-08-10 at 07:39 +0800, Wanpeng Li wrote: > 2016-08-10 7:25 GMT+08:00 Wanpeng Li : > > 2016-08-09 22:06 GMT+08:00 Rik van Riel : > > > On Tue, 2016-08-09 at 11:59 +0800, Wanpeng Li wrote: > > > > Hi Rik, > > > > 2016-07-13 22:50 GMT+08:00 Frederic Weisbecker > > > com>: > > > > > From: Rik van Riel > > > > >=20 > > > > > Currently, if there was any irq or softirq time during > > > > > 'ticks' > > > > > jiffies, the entire period will be accounted as irq or > > > > > softirq > > > > > time. > > > > >=20 > > > > > This is inaccurate if only a subset of the time was actually > > > > > spent > > > > > handling irqs, and could conceivably mis-count all of the > > > > > ticks > > > > > during > > > > > a period as irq time, when there was some irq and some > > > > > softirq > > > > > time. > > > > >=20 > > > > > This can actually happen when irqtime_account_process_tick is > > > > > called > > > > > from account_idle_ticks, which can pass a larger number of > > > > > ticks > > > > > down > > > > > all at once. > > > > >=20 > > > > > Fix this by changing irqtime_account_hi_update, > > > > > irqtime_account_si_update, > > > > > and steal_account_process_ticks to work with cputime_t time > > > > > units, > > > > > and > > > > > return the amount of time spent in each mode. > > > >=20 > > > > Do we need to minus st cputime from idle cputime in > > > > account_idle_ticks() when noirqtime is true? I try to add this > > > > logic > > > > w/ noirqtime and idle=3Dpoll boot parameter for a full dynticks > > > > guest, > > > > however, there is no difference, where I miss? > > >=20 > > > Yes, you are right. The code in account_idle_ticks() > > > could use the same treatment. > > >=20 > > > I am not sure why it would not work, though... > >=20 > > Actually I observed a regression caused by this patch. I use a i5 >=20 > The regression is caused by your commit "sched,time: Count actually > elapsed irq & softirq time". Wanpeng and I discussed this issue, and discovered that this bug is triggered by my patch, specifically this bit: -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (steal_account_process_tick(U= LONG_MAX)) +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0other =3D account_other_time(cpu= time); +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (other >=3D cputime) =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0return; Replacing "cputime" with "ULONG_MAX" as the argument to account_other_time makes the bug disappear. However, this is not the cause of the bug. The cause of the bug appears to be that the values used to figure out how much steal time has passed are never initialized. =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 steal =3D paravirt_= steal_clock(smp_processor_id()); =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0steal -=3D this_rq()->prev_steal_time; The first of the two may be initialized by the host (I did not verify that), but the second one does not have any explicit initializers anywhere in the kernel tree. This can lead to an arbitrarily large difference between paravirt_steal_clock(smp_processor_id()) and this_rq()->prev_steal_time, which results in nothing but steal time getting accounted for a potentially a very long amount of time. Previously we carried this patch to initialize the various rq->prev_* values at CPU hotplug time: https://patchwork.codeaurora.org/patch/27699/ Which got reverted by Paolo here: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=3Dsche d/core&id=3D3d89e5478bf550a50c99e93adf659369798263b0 Which leads me to this question: Paulo, how would you like us to fix this bug? It seems like the host and guest steal time CAN get out of sync, sometimes by a ridiculous amount, and we need some way to get the excessive difference out of the way, without it getting accounted as steal time (not immediately, and not over the next 17 hours, or months). --=20 All Rights Reversed. --=-YEkeJS99YQwEDixj33Wv Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJXqradAAoJEM553pKExN6D7/IH/0CNCsZZj5gpiBibUQxPTj2I wCQykGvSDXkr3kKnoHLyxMsGO0DPtklQxWVaV9KfPNAvCSCxQAPdQhoqU4pGHm3q lz2nhXJRtKn+VMt/879zIsSBsfelVk+nBnEj8qQDBXZ1oT3/QMF2+67L4ojqYBNs +Laf5mOaNdVGxS97th8zOD49ftPUyVLh8d9xBk0OCHdK4ZzVW6VllTlDNRBY7LZu b2+T0/lRcKFNgq+QnHARlzlZL1G4cLaLPN/O1lROJpYc9K6sc3ktGQxCsiYPv+/k POsBTaPE5/3rtfFhcEBq39lOdxodRzauRFVn85WoKkyrkBnCqNH5AeUv6TpcxSk= =nsE9 -----END PGP SIGNATURE----- --=-YEkeJS99YQwEDixj33Wv--