From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932126AbcFVPmt (ORCPT <rfc822;w@1wt.eu>);
	Wed, 22 Jun 2016 11:42:49 -0400
Received: from dbmail.hebserv.net ([78.40.121.80]:33422 "EHLO
	dbmail.hebserv.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932066AbcFVPmr convert rfc822-to-8bit (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Wed, 22 Jun 2016 11:42:47 -0400
Mime-Version: 1.0
Date: Wed, 22 Jun 2016 15:42:44 +0000
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8BIT
Message-ID: <59700e91b7c0884329e904fc175031bc@rcube.hebserv.net>
X-Mailer: RainLoop/1.9.4.398
From: "Yannis Aribaud" <bugs@d6bell.net>
Subject: Re: divide error: 0000 [#1] SMP in task_numa_migrate -
 handle_mm_fault vanilla 4.4.6
To: linux-kernel@vger.kernel.org
In-Reply-To: <b04bc59cce59e2c027aefad2f1807d92@rcube.hebserv.net>
References: <b04bc59cce59e2c027aefad2f1807d92@rcube.hebserv.net>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 21 June 2016 at 14:13, "Yannis Aribaud" <bugs@d6bell.net> wrote:
> Hi everyone,
> 
> I recently hit this bug in the kernel using a vanilla 4.6.2 release.
> It seems that somewhere in the load average calculation a division by 0 occurs (see the stack trace
> at the end).
>
> [snipped]
> 
> I'm not an expert at all, but I suspect that this is the origin of the issue. Shouldn't the function
> cfs_rq_load_avg use an atomic_long_read() to avoid this?

After digging a bit more, I don't think this can be the problem, as this function obviously can't return a negative value.

I found that it may instead come from the update_cfs_rq_load_avg function, in the following block:

	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
		sa->load_avg = max_t(long, sa->load_avg - r, 0);
		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
		removed_load = 1;
	}

The max_t(long, sa->load_avg - r, 0) may yield a negative value that max_t keeps, since the long would wrap around. That value would then cause a division by zero in the task_h_load function.
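To make the suspicion easier to check, here is a small user-space sketch of the arithmetic. It is only a model under assumptions, not the kernel code: max_s64, clamp_removed_load and task_h_load_divisor are hypothetical helper names, load_avg is taken to be a 64-bit unsigned long as on x86_64, and the divisor "cfs_rq_load_avg(cfs_rq) + 1" is how I read task_h_load in this tree. It says nothing about concurrent updates.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's max_t(long, a, b) on a 64-bit build. */
static int64_t max_s64(int64_t a, int64_t b)
{
	return a > b ? a : b;
}

/*
 * Models: sa->load_avg = max_t(long, sa->load_avg - r, 0);
 * load_avg is unsigned, so the subtraction is done modulo 2^64.
 * The cast back to a signed type relies on two's complement
 * conversion, as gcc and clang implement it.
 */
static uint64_t clamp_removed_load(uint64_t load_avg, int64_t r)
{
	uint64_t diff = load_avg - (uint64_t)r;

	return (uint64_t)max_s64((int64_t)diff, 0);
}

/*
 * Assumed divisor in task_h_load(): cfs_rq_load_avg(cfs_rq) + 1.
 * It is 0 only when load_avg holds the all-ones value, i.e. -1
 * reinterpreted as unsigned.
 */
static uint64_t task_h_load_divisor(uint64_t load_avg)
{
	return load_avg + 1;
}
```

In this model, task_h_load_divisor(UINT64_MAX) is 0, so a load_avg of -1 would be enough to trigger the divide error. On the other hand, clamp_removed_load(5, 10) gives 0 rather than a wrapped value, so on a 64-bit build the -1 might have to come from somewhere other than this clamp. I may well be missing something here.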

Best regards,
--
Yannis Aribaud