From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161108AbcFGLuN (ORCPT ); Tue, 7 Jun 2016 07:50:13 -0400 Received: from mail-oi0-f42.google.com ([209.85.218.42]:33956 "EHLO mail-oi0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161079AbcFGLuJ (ORCPT ); Tue, 7 Jun 2016 07:50:09 -0400 MIME-Version: 1.0 In-Reply-To: References: <1465661590-4732-1-git-send-email-wanpeng.li@hotmail.com> <15351b9b-563a-8945-5fb6-2df15107f0d6@redhat.com> From: Wanpeng Li Date: Tue, 7 Jun 2016 19:50:08 +0800 Message-ID: Subject: Re: Fw: [PATCH v4 2/3] sched/cputime: Fix prev steal time accouting during cpu hotplug To: "linux-kernel@vger.kernel.org" , kvm Cc: Paolo Bonzini , Radim Krcmar , Wanpeng Li , Ingo Molnar , "Peter Zijlstra (Intel)" , Rik van Riel , Thomas Gleixner , Frederic Weisbecker Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 2016-06-07 19:41 GMT+08:00 Wanpeng Li : > On 07/06/2016 10:00, Wanpeng Li wrote: > >> From: Wanpeng Li >> >> Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU >> hotplug") >> set rq->prev_* to 0 after a cpu hotplug comes back in order to fix the >> scenario: >> >> | steal is smaller than rq->prev_steal_time we end up with an insane large >> | value which then gets added to rq->prev_steal_time, resulting in a >> permanent >> | wreckage of the accounting. >> >> However, it is still buggy. >> >> rq->prev_steal_time = 0: >> >> As Rik pointed out: >> >> | setting rq->prev_irq_time to 0 in the guest, and then getting a giant >> value from >> | the host, could result in a very large of steal_jiffies. >> >> rq->prev_steal_time_rq = 0: >> >> | steal = paravirt_steal_clock(cpu_of(rq)); >> | steal -= rq->prev_steal_time_rq; >> | >> | if (unlikely(steal > delta)) >> | steal = delta; >> | >> | rq->prev_steal_time_rq += steal; >> | delta -= steal; >> | >> | rq->clock_task += delta; >> >> steal is a giant value and rq->prev_steal_time_rq is 0, >> rq->prev_steal_time_rq >> grows in delta granularity, rq->clock_task can't ramp up until >> rq->prev_steal_time_rq >> catches up steal clock since delta value will be 0 after reducing steal >> time from >> normal execution time. That's why I obersved that cpuhg/1-12 continue >> running >> until rq->prev_steal_time_rq catches up steal clock timestamp. >> >> I believe rq->prev_irq_time has similar issue. So this patch fix it by >> setting >> rq->prev_* to current irq time and steal clock timestamp after a cpu >> hotplug >> comes back. > > I'm not sure this patch is necessary. Instead you could just revert > commit e9532e69b8d1. The previous patch obviously makes it unnecessary > to reset rq->prev_steal_time and rq->prev_steal_time_rq, and the reset > of rq->prev_irq_time looks like a no-op to me. The reason why I'm not just simple revert it is that commit mentioned "steal is smaller than rq->prev_steal_time we end up with an insane large value which then gets added to rq->prev_steal_time, resulting in a permanent wreckage of the accounting." Though I didn't meet such scenario. So I just do what that commit really want to do. Regards, Wanpeng Li