Date: Mon, 14 Dec 2015 06:42:24 +0800
From: Yuyang Du
To: Morten Rasmussen
Cc: Andrey Ryabinin, Peter Zijlstra, mingo@redhat.com,
	linux-kernel@vger.kernel.org, Paul Turner, Ben Segall
Subject: Re: [PATCH] sched/fair: fix mul overflow on 32-bit systems
Message-ID: <20151213224224.GC28098@intel.com>
References: <1449838518-26543-1-git-send-email-aryabinin@virtuozzo.com>
 <20151211132551.GO6356@twins.programming.kicks-ass.net>
 <20151211133612.GG6373@twins.programming.kicks-ass.net>
 <566AD6E1.2070005@virtuozzo.com>
 <20151211175751.GA27552@e105550-lin.cambridge.arm.com>
In-Reply-To: <20151211175751.GA27552@e105550-lin.cambridge.arm.com>

On Fri, Dec 11, 2015 at 05:57:51PM +0000, Morten Rasmussen wrote:
> > >>> 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> > >>> -		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> > >>> +		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> > >>> 		sa->load_avg = max_t(long, sa->load_avg - r, 0);
> > >>> 		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> > >>
> > >> This makes sense, because sched_avg::load_sum is u64.
> 
> A single removed nice=-20 task should be sufficient to cause the
> overflow.

Oh yes, it was a wreck, sorry.

> > >>> 	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> > >>> -		long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
> > >>> +		s64 r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
> > >>> 		sa->util_avg = max_t(long, sa->util_avg - r, 0);
> > >>> 		sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
> > >>> 	}
> > >>
> > >> However sched_avg::util_sum is u32, so this is still wrecked.
> > >
> > > I seem to have wrecked that in:
> > >
> > >   006cdf025a33 ("sched/fair: Optimize per entity utilization tracking")
> > >
> > > maybe just make util_load u64 too?
> 
> It isn't as bad, but the optimization does increase the normal range
> (not guaranteed) for util_sum from 47742 to
> scale_down(SCHED_LOAD_SCALE)*47742 (= 1024*47742, unless you mess with
> the scaling).
> 
> > Is there any guarantee that the final result of the expression
> > 'util_sum - r * LOAD_AVG_MAX' can always be represented by s32?
> >
> > If yes, then we could just do this:
> > 	max_t(s32, (u64)sa->util_sum - r * LOAD_AVG_MAX, 0)
> 
> In most cases 'r' shouldn't exceed 1024 and util_sum does not
> significantly exceed 1024*47742, but in extreme cases, like spawning
> lots of new tasks, it may potentially overflow 32 bits. Newly created
> tasks contribute 1024*47742 each to the rq util_sum, which means that
> more than ~87 new tasks on a single rq will get us in trouble, I think.
> 
> Without Peter's optimization referenced above, that number should
> increase to ~87k tasks, as each task only contributed 47742 before, but
> 'r' could still cause 32-bit overflow if we remove more than ~87 newly
> created tasks in one go. But I'm not sure if that is a situation we
> should worry about?
> 
> I think we have to either make util_sum u64 too or look at the
> optimizations again.
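FWIW, the ~87 figure checks out. A standalone back-of-the-envelope sketch
(plain userspace C, not kernel code), assuming each newly created task
contributes 1024 * LOAD_AVG_MAX = 1024 * 47742 to the rq's util_sum and
that util_sum is u32:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		const uint64_t load_avg_max = 47742;		/* LOAD_AVG_MAX */
		const uint64_t per_task = 1024 * load_avg_max;	/* one new task */
		uint64_t n;

		/*
		 * Smallest number of removed new tasks whose summed
		 * contribution no longer fits in a u32.
		 */
		for (n = 1; n * per_task <= UINT32_MAX; n++)
			;

		printf("u32 overflows after removing %llu new tasks\n",
		       (unsigned long long)n);	/* prints 88 */
		return 0;
	}
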
Both can work around the issue, at the cost of additional overhead, but I
suspect they will end up going in the wrong direction for util_avg. The
question is whether a big util_sum (much bigger than 1024) is in the right
range to be used in load balancing.

The problem is that it is not so good to initialize a new task's util_avg
to 1024. At the least, it makes much less sense than initializing a new
task's load_avg to its full weight, because the top-level util_avg should
be well bounded by 1024 - the CPU's full utilization.

So maybe give the initial util_sum an average of its cfs_rq, like:

	cfs_rq->avg.util_sum / cfs_rq->load.weight * task->load.weight

and make sure that initial value is bounded under various conditions.

Thanks,
Yuyang
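PS: in code, roughly what I have in mind is the sketch below. It is
untested and only illustrative: the clamp value and where exactly this
would run are open questions, and I multiply before dividing so that a
small util_sum does not truncate to zero. The struct and field names are
the existing ones in kernel/sched/, but treat this as a sketch, not a
patch:

	/*
	 * Sketch: seed a new entity's util_sum with its cfs_rq's per-weight
	 * average scaled by the entity's own weight, then bound it so that
	 * util_avg never starts above SCHED_LOAD_SCALE (a CPU's full
	 * utilization).
	 */
	static void init_util_avg_sketch(struct cfs_rq *cfs_rq,
					 struct sched_entity *se)
	{
		struct sched_avg *sa = &se->avg;
		u64 cap = (u64)SCHED_LOAD_SCALE * LOAD_AVG_MAX;
		u64 seed = 0;

		if (cfs_rq->load.weight)
			seed = (u64)cfs_rq->avg.util_sum * se->load.weight /
			       cfs_rq->load.weight;

		/* bound the initial value */
		if (seed > cap)
			seed = cap;

		sa->util_sum = seed;
		sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
	}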