Subject: Re: [PATCH] time,virt: resync steal time when guest & host lose sync
From: Rik van Riel
To: Wanpeng Li
Cc: Frederic Weisbecker, Ingo Molnar, LKML, Paolo Bonzini, Peter Zijlstra,
    Wanpeng Li, Thomas Gleixner, Radim Krcmar, Mike Galbraith
Date: Thu, 11 Aug 2016 22:44:52 -0400
Message-ID: <1470969892.13905.120.camel@redhat.com>

On Thu, 2016-08-11 at 18:11 +0800, Wanpeng Li wrote:
> 2016-08-11 0:52 GMT+08:00 Rik van Riel :
> > On Wed, 10 Aug 2016 07:39:08 +0800
> > Wanpeng Li wrote:
> > 
> > > The regression is caused by your commit "sched,time: Count
> > > actually elapsed irq & softirq time".
> > 
> > Wanpeng, does this patch fix your issue?
> 
> I test this against kvm guest (nohz_full, four vCPUs running on one
> pCPU, four cpuhog processes running on four vCPUs).
> before this fix patch:
> vCPU0's st is 100%, other vCPUs' st are ~75%.
> after this fix patch:
> all vCPUs' st are ~85%.
> However, w/o commit "sched,time: Count actually elapsed irq & softirq
> time", all vCPUs' st are ~75%.

If you pass ULONG_MAX as the maxtime argument to
steal_account_process_time(), does the steal time get
accounted properly at 75%?

If that is the case, I have a hypothesis:

1) The guest runs so much slower when sharing a CPU four
   ways that it accounts only ~90% of wall-clock time as
   CPU time, because it misses the other ~10% of clock ticks.

2) account_process_tick() only ever processes one tick at
   a time. If it gets called only 90 times a second in a
   100Hz guest, but all the steal time recorded by the host
   is fully accounted (ULONG_MAX limit), then that could
   make up for the lost/skipped timer ticks.

3) Not accounting the "extra" steal time (beyond the amount
   of time accounted by account_process_tick()) would reduce
   the total amount of time that gets accounted when ticks
   are missed, taking time away from user/system/etc.

Does the above make sense?

Am I overlooking some mechanism through which lost/skipped
ticks are made up for in the kernel? I looked briefly through
the code in kernel/time/, but did not spot it...

-- 
All Rights Reversed.