From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753825Ab0H0PFy (ORCPT );
	Fri, 27 Aug 2010 11:05:54 -0400
Received: from mx1.redhat.com ([209.132.183.28]:26523 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752170Ab0H0PFw (ORCPT );
	Fri, 27 Aug 2010 11:05:52 -0400
Date: Fri, 27 Aug 2010 11:05:23 -0400
From: Don Zickus
To: Robert Richter
Cc: Ingo Molnar, Peter Zijlstra, Cyrill Gorcunov, Lin Ming,
	"fweisbec@gmail.com", "linux-kernel@vger.kernel.org",
	"Huang, Ying", Yinghai Lu, Andi Kleen
Subject: Re: [PATCH -v3] perf, x86: try to handle unknown nmis with running perfctrs
Message-ID: <20100827150523.GT4879@redhat.com>
References: <9g472epksbkxhgmw6a3qh8r5.1282316687153@email.android.com>
	<20100820152510.GA4167@elte.hu>
	<20100823085339.GA26713@elte.hu>
	<20100826211424.GQ4879@redhat.com>
	<20100827081038.GF22783@erda.amd.com>
	<20100827134429.GS4879@redhat.com>
	<20100827140523.GM22783@erda.amd.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100827140523.GM22783@erda.amd.com>
User-Agent: Mutt/1.5.20 (2009-08-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 27, 2010 at 04:05:23PM +0200, Robert Richter wrote:
> > What is funny is that this problem was masked by the
> > perf_event_nmi_handler swallowing all the nmis.  I wonder if we were
> > losing events as a result of this bug too because if you think about it,
> > we processed the first event, a second event came in and we accidentally
> > ack'd it, thus dropping it on the floor.
>
> Yes, this could be the case, but only for handled counters. So it
> would be interesting to see for this case the status mask of the
> current and previous get_status call.

The status masks seem to be identical, 0x1 (and when I forced pmc0
unusable, everything was 0x2).
>
> > Now I wonder how the event was
> > ever reloaded, unless it was by accident because of how the scheduler
> > deals with perf counters (perf_start/stop all the time).
>
> The nmi might be queued by the cpu regardless of the overflow
> state.
>
> I am wondering why this happens at all, because events are disabled by
> wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0). Hmm, maybe this is exactly the

Heh.  Not sure why it isn't working then.  Then again, you shouldn't need
the loop if it was working, I would think.

> reason because the nmi could fire again after reenabling the counters.
>
> Is there a reason for disabling all counters?

It would be a nice-to-have; that way we wouldn't have to 'eat' all these
extra nmis.  But I guess it isn't working correctly.

Cheers,
Don
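
The scenario discussed above (a second overflow arriving while the first NMI is still being handled, getting ack'd by accident, and leaving behind an NMI with nothing to claim) can be sketched as a toy model. This is not kernel code: `ToyPMU`, `handle_nmi`, `deliver` and `back_to_back_overflow` are hypothetical names made up for illustration, and the model assumes the cpu latches at most one pending NMI independently of the overflow status bits.

```python
class ToyPMU:
    """Hypothetical model: an overflow status mask plus one latched NMI."""

    def __init__(self):
        self.status = 0           # one bit per counter that has overflowed
        self.nmi_latched = False  # assumption: the cpu queues at most one NMI

    def overflow(self, idx):
        # A counter overflow sets its status bit and latches an NMI.
        self.status |= 1 << idx
        self.nmi_latched = True

    def ack(self, mask):
        # Clearing status bits does not un-latch an already queued NMI.
        self.status &= ~mask


def handle_nmi(pmu, overflow_during_handling=None):
    """Return how many counters this NMI handled (0 means 'unknown NMI')."""
    status = pmu.status
    if status == 0:
        return 0
    if overflow_during_handling is not None:
        # Second overflow lands after the status read but before the ack.
        pmu.overflow(overflow_during_handling)
    handled = bin(status).count("1")
    pmu.ack(status)  # also wipes the bit set by the second overflow
    return handled


def deliver(pmu, **kwargs):
    """Deliver the latched NMI, if any; None means no NMI was pending."""
    if not pmu.nmi_latched:
        return None
    pmu.nmi_latched = False
    return handle_nmi(pmu, **kwargs)


def back_to_back_overflow():
    pmu = ToyPMU()
    pmu.overflow(0)                                   # first event on pmc0
    first = deliver(pmu, overflow_during_handling=0)  # second event races in
    second = deliver(pmu)                             # the queued NMI fires
    return first, second, pmu.status
```

In this model `back_to_back_overflow()` reports one handled counter for the first NMI and zero for the second: the second event was ack'd together with the first, so the queued NMI arrives with an empty status mask and looks like an unknown NMI.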