From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753851Ab0HYUZF (ORCPT ); Wed, 25 Aug 2010 16:25:05 -0400
Received: from mail-ey0-f174.google.com ([209.85.215.174]:39427 "EHLO
        mail-ey0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751859Ab0HYUZD (ORCPT ); Wed, 25 Aug 2010 16:25:03 -0400
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
        h=date:from:to:cc:subject:message-id:references:mime-version
        :content-type:content-disposition:in-reply-to:user-agent;
        b=hcTKLqE1QZp1jNmj5ngbQU3iYz+8La/o0peFUvYyTqnMnoV4x4IdlNAkACSziTtbRF
        mYB92HQoKuzAGKwB21SYqvK9NgVq92SgLF4YpWl9BXuzn3Uhcx5szKV3oVWOEmHHir4O
        nBax4nkCGDFaUtK1ooW284gGGP1roGZhI5nE8=
Date: Thu, 26 Aug 2010 00:24:58 +0400
From: Cyrill Gorcunov
To: Don Zickus
Cc: Ingo Molnar, Robert Richter, Peter Zijlstra, Lin Ming,
        "fweisbec@gmail.com", "linux-kernel@vger.kernel.org",
        "Huang, Ying", Yinghai Lu, Andi Kleen
Subject: Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
Message-ID: <20100825202458.GE14874@lenovo>
References: <9g472epksbkxhgmw6a3qh8r5.1282316687153@email.android.com>
        <20100820152510.GA4167@elte.hu>
        <20100825094819.GB3198@erda.amd.com>
        <20100825104130.GA27891@elte.hu>
        <20100825110006.GB27891@elte.hu>
        <20100825201106.GH4879@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100825201106.GH4879@redhat.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Aug 25, 2010 at 04:11:06PM -0400, Don Zickus wrote:
...
> > Uhhuh. NMI received for unknown reason 00 on CPU 15.
> > Do you have a strange power saving mode enabled?
> > Dazed and confused, but trying to continue
>
> So I found a Nehalem box that can reliably reproduce Ingo's problem using
> something as simple as 'perf top'.
> But like above, I am noticing the
> same thing, an extra NMI (PMI??) that comes out of nowhere.
>
> Looking at the data above, the delta between nmis is very small compared to
> the other nmis. It almost suggests that this is an extra PMI.
> Considering there are already two cpu errata discussing extra PMIs under
> certain configurations, I wouldn't be surprised if this was a third.
>
> Cheers,
> Don
>

Oh. I'm not sure it would be a good idea at all, but maybe we could use
a variant of Robert's "pmu nmi relaxing time" idea, i.e. some time slice
in which we treat unknown NMIs as coming from the PMU. The difference is
that we would not accept an arbitrary number of them, only a number tied
to how many PMIs we have just turned off.

Say we handle an NMI and find that 4 events have overflowed. We clear
them, arm a timer and wait for 3 unknown NMIs to happen. If they do not
happen within some time period, we clear this waitqueue. If they happen,
fully or partially, we destroy the timer.

I.e. almost the same as Robert's idea, but without the tsc?

Just a thought.

-- Cyrill
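
To make the idea above a bit more concrete, here is a rough userspace
sketch of the bookkeeping. It is only an illustration under assumptions:
the struct and function names are made up and do not exist in the
kernel, a plain flag stands in for a real kernel timer, and "destroy the
timer" is read here as "destroy it once all expected NMIs have arrived",
which is just one possible reading.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; none of these names exist in the kernel. */
struct pmu_nmi_relax {
	int  pending;     /* unknown NMIs we still agree to swallow */
	bool timer_armed; /* stands in for a real kernel timer */
};

/* Called after the PMU handler has cleared nr_overflowed counters. */
static void pmu_nmi_handled(struct pmu_nmi_relax *r, int nr_overflowed)
{
	/* 4 overflowed events -> expect up to 3 follow-up unknown NMIs. */
	r->pending = nr_overflowed > 1 ? nr_overflowed - 1 : 0;
	r->timer_armed = r->pending > 0;
}

/* Called for an NMI that nobody claimed; true means "treat it as PMU". */
static bool unknown_nmi(struct pmu_nmi_relax *r)
{
	if (!r->timer_armed || r->pending <= 0)
		return false;

	r->pending--;
	if (r->pending == 0)
		r->timer_armed = false; /* assumption: destroy timer when done */
	return true;
}

/* Timer callback: the expected NMIs did not show up in time. */
static void relax_timer_expired(struct pmu_nmi_relax *r)
{
	r->pending = 0;
	r->timer_armed = false;
}
```

With this, a PMI that cleared 4 counters lets the next 3 unknown NMIs
pass silently, a 4th one is reported as unknown again, and once the
timer has fired nothing is swallowed anymore. No tsc read is needed, the
only time source is the timer itself.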