From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Morten Rasmussen <morten.rasmussen@arm.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	mingo@redhat.com, linux-kernel@vger.kernel.org,
	yuyang.du@intel.com, Paul Turner <pjt@google.com>,
	Ben Segall <bsegall@google.com>
Subject: Re: [PATCH] sched/fair: fix mul overflow on 32-bit systems
Date: Fri, 11 Dec 2015 18:32:56 +0000	[thread overview]
Message-ID: <566B16D8.2060109@arm.com> (raw)
In-Reply-To: <20151211175751.GA27552@e105550-lin.cambridge.arm.com>

On 11/12/15 17:57, Morten Rasmussen wrote:
> On Fri, Dec 11, 2015 at 05:00:01PM +0300, Andrey Ryabinin wrote:
>>
>>
>> On 12/11/2015 04:36 PM, Peter Zijlstra wrote:
>>> On Fri, Dec 11, 2015 at 02:25:51PM +0100, Peter Zijlstra wrote:
>>>> On Fri, Dec 11, 2015 at 03:55:18PM +0300, Andrey Ryabinin wrote:
>>>>> Make 'r' 64-bit type to avoid overflow in 'r * LOAD_AVG_MAX'
>>>>> on 32-bit systems:
>>>>> 	UBSAN: Undefined behaviour in kernel/sched/fair.c:2785:18
>>>>> 	signed integer overflow:
>>>>> 	87950 * 47742 cannot be represented in type 'int'
>>>>>
>>>>> Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
>>>>> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
>>>>> ---
>>>>>  kernel/sched/fair.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index e3266eb..733f0b8 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -2780,14 +2780,14 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>>>>>  	int decayed, removed = 0;
>>>>>  
>>>>>  	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
>>>>> -		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
>>>>> +		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
>>>>>  		sa->load_avg = max_t(long, sa->load_avg - r, 0);
>>>>>  		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
>>>>
>>>> This makes sense, because sched_avg::load_sum is u64.
> 
> A single removed nice=-20 task should be sufficient to cause the
> overflow.

Yeah, this 87950 could be related to a single nice=-20 task
(prio_to_weight[0]), or it could be a value aggregated from more than
one task. In any case, the error is related to load, not util.

> 
>>>>
>>>>>  		removed = 1;
>>>>>  	}
>>>>>  
>>>>>  	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
>>>>> -		long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
>>>>> +		s64 r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
>>>>>  		sa->util_avg = max_t(long, sa->util_avg - r, 0);
>>>>>  		sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
>>>>>  	}
>>>>
>>>> However sched_avg::util_sum is u32, so this is still wrecked.
>>>
>>> I seem to have wrecked that in:
>>>
>>>   006cdf025a33 ("sched/fair: Optimize per entity utilization tracking")
>>>
>>> maybe just make util_sum u64 too?
> 
> It isn't as bad, but the optimization does increase the normal range
> (not guaranteed) for util_sum from 47742 to
> scale_down(SCHED_LOAD_SCALE)*47742 (= 1024*47742, unless you mess with
> the scaling).
> 
>> Is there any guarantee that the final result of the expression 'util_sum - r * LOAD_AVG_MAX' can always be represented by s32?
>>
>> If yes, then we could just do this:
>> 	max_t(s32, (u64)sa->util_sum - r * LOAD_AVG_MAX, 0)
> 
> In most cases 'r' shouldn't exceed 1024 and util_sum not significantly
> exceed 1024*47742, but in extreme cases like spawning lots of new tasks
> it may potentially overflow 32 bit. Newly created tasks contribute
> 1024*47742 each to the rq util_sum, which means that more than ~87 new
> tasks on a single rq will get us in trouble I think.
> 
> Without Peter's optimization referenced above, that number should
> increase to ~87k tasks as each task only contributed 47742 before, but
> 'r' could still cause 32-bit overflow if we remove more than ~87 newly
> created tasks in one go. But I'm not sure if that is a situation we
> should worry about?
> 
> I think we have to either make util_sum u64 too or look at the
> optimizations again.

But for me the question here is whether 'r' for util has to be changed
from long to s64 as well.

IMHO, on a 32-bit machine we can deal with 43 tasks
(2147483648/47742/1024 = 43.9) before overflowing.

Can we have a scenario where more than 43 tasks with
se->avg.util_avg = 1024 get migrated (migrate_task_rq_fair()) or die
(task_dead_fair()), or where a task group dies
(free_fair_sched_group()) with se->avg.util_avg > 44981 for a specific
cpu, before the atomic_long_xchg() happens in
update_cfs_rq_load_avg()? I never saw this in my tests so far on ARM
machines.



Thread overview: 20+ messages
2015-12-11 12:55 [PATCH] sched/fair: fix mul overflow on 32-bit systems Andrey Ryabinin
2015-12-11 13:25 ` Peter Zijlstra
2015-12-11 13:36   ` Peter Zijlstra
2015-12-11 14:00     ` Andrey Ryabinin
2015-12-11 17:57       ` Morten Rasmussen
2015-12-11 18:32         ` Dietmar Eggemann [this message]
2015-12-11 19:18           ` bsegall
2015-12-13 21:02             ` Yuyang Du
2015-12-14 12:32             ` Morten Rasmussen
2015-12-14 17:51               ` bsegall
2015-12-13 22:42         ` Yuyang Du
2015-12-14 11:54           ` Peter Zijlstra
2015-12-14 13:07             ` Morten Rasmussen
2015-12-14 14:20               ` Peter Zijlstra
2015-12-14 14:46                 ` Morten Rasmussen
2015-12-15  2:22             ` Yuyang Du
2015-12-15 21:56               ` Steve Muckle
2015-12-18  2:33                 ` Yuyang Du
2016-01-03 23:14                   ` Yuyang Du
2015-12-11 17:58       ` bsegall
