From: "Yannis Aribaud"
Date: Wed, 22 Jun 2016 15:42:44 +0000
To: linux-kernel@vger.kernel.org
Subject: Re: divide error: 0000 [#1] SMP in task_numa_migrate - handle_mm_fault vanilla 4.4.6

On 21 June 2016 14:13, "Yannis Aribaud" wrote:
> Hi everyone,
>
> I recently hit this bug in the kernel using a vanilla 4.6.2 release.
> It seems that a division by zero occurs somewhere in the load average
> calculation (see the stack trace at the end).
>
> [snipped]
>
> I'm not an expert at all, but I suspect that is the origin of the issue.
> Shouldn't the function cfs_rq_load_avg() use atomic_long_read() to avoid this?

After digging a bit more, this can't be the problem, as this function
obviously can't return a negative value.

I found that the problem may instead come from the following block in the
update_cfs_rq_load_avg() function:

	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
		sa->load_avg = max_t(long, sa->load_avg - r, 0);
		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
		removed_load = 1;
	}

The expression max_t(long, sa->load_avg - r, 0) can result in a negative
value being kept by max_t, since the long would wrap around. That value
would then cause a division by zero in the task_h_load() function.

Best regards,
-- 
Yannis Aribaud
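The wrap-around reasoning about max_t(long, sa->load_avg - r, 0) in the message can be checked in user space. What follows is only a sketch under stated assumptions, not kernel code: sa->load_avg is assumed to be an unsigned long on a 64-bit (LP64) target, r is modelled as an int64_t standing in for s64, max_t is replaced by a simplified hypothetical macro rather than the kernel's own, and the "load + 1" divisor attributed to task_h_load() is an assumption, since that function is not quoted in this message. The helpers raw_diff(), clamped_update() and h_load_divisor() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* This sketch assumes an LP64 target (64-bit unsigned long), as on x86_64. */
static_assert(sizeof(unsigned long) == 8, "sketch assumes 64-bit unsigned long");

/*
 * Hypothetical, simplified stand-in for the kernel's max_t(type, x, y):
 * convert both operands to `type`, then return the larger one.
 * Note: x and y are evaluated more than once.
 */
#define max_t(type, x, y) ((type)(x) > (type)(y) ? (type)(x) : (type)(y))

/* Unclamped difference: subtracting a larger r wraps modulo 2^64. */
static unsigned long raw_diff(unsigned long load_avg, int64_t r)
{
	return load_avg - (unsigned long)r;
}

/*
 * The clamped update from the quoted block, reproduced in user space.
 * The wrapped unsigned difference is converted to long; GCC and Clang
 * define that out-of-range conversion as modulo 2^64, so it becomes
 * negative and max_t() then selects 0.
 */
static unsigned long clamped_update(unsigned long load_avg, int64_t r)
{
	return max_t(long, load_avg - r, 0);
}

/*
 * ASSUMPTION: a divisor of the form load + 1, as this sketch assumes
 * task_h_load() uses. It is 0 exactly when load == ULONG_MAX.
 */
static unsigned long h_load_divisor(unsigned long load)
{
	return load + 1;
}
```

With load_avg = 10 and r = 11, raw_diff() yields ULONG_MAX and h_load_divisor(ULONG_MAX) is 0, while clamped_update() yields 0 because the wrapped value is negative once viewed as a long. In this sketch, a zero divisor therefore only appears if a wrapped, unclamped value is what ends up being read as the load average.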