From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751927AbbCJDHP (ORCPT ); Mon, 9 Mar 2015 23:07:15 -0400
Received: from userp1040.oracle.com ([156.151.31.81]:28386 "EHLO
	userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750976AbbCJDHN (ORCPT );
	Mon, 9 Mar 2015 23:07:13 -0400
Message-ID: <54FE5FBD.3060801@oracle.com>
Date: Mon, 09 Mar 2015 21:06:37 -0600
From: David Ahern
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0)
	Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: Mike Galbraith
CC: Peter Zijlstra, Ingo Molnar, LKML
Subject: Re: NMI watchdog triggering during load_balance
References: <54F92788.6010007@oracle.com>
	<1425617559.16821.36.camel@gmx.de>
	<54F9C155.3050309@oracle.com>
	<1425665511.7562.36.camel@gmx.de>
	<54F9F3D7.1030905@oracle.com>
	<1425670155.3775.30.camel@gmx.de>
In-Reply-To: <1425670155.3775.30.camel@gmx.de>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/6/15 12:29 PM, Mike Galbraith wrote:
> On Fri, 2015-03-06 at 11:37 -0700, David Ahern wrote:
>
>> But, I do not understand how the wrong topology is causing the NMI
>> watchdog to trigger. In the end there are still N domains, M groups per
>> domain and P cpus per group. Doesn't the balancing walk over all of them
>> irrespective of physical topology?
>
> You have this size extra large CPU domain that you shouldn't have,
> massive collisions therein ensue.
>

I was able to get the socket/cores/threads issue resolved, so the
topology is correct. But still need to check out a few things.

Thanks Mike and Peter for the suggestions.

David