From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S941056AbcKQRNn (ORCPT); Thu, 17 Nov 2016 12:13:43 -0500
Received: from bombadil.infradead.org ([198.137.202.9]:49050 "EHLO
        bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S934363AbcKQRGC (ORCPT);
        Thu, 17 Nov 2016 12:06:02 -0500
Date: Thu, 17 Nov 2016 17:07:00 +0100
From: Peter Zijlstra
To: Josh Poimboeuf
Cc: Vince Weaver, "linux-kernel@vger.kernel.org", Ingo Molnar,
        Arnaldo Carvalho de Melo, "davej@codemonkey.org.uk",
        "dvyukov@google.com", Stephane Eranian
Subject: Re: perf: fuzzer KASAN unwind_get_return_address
Message-ID: <20161117160700.GF3117@twins.programming.kicks-ass.net>
References: <20161115185756.GL3142@twins.programming.kicks-ass.net>
        <20161115205748.xtroftp55igs55bz@treble>
        <20161116130337.GT3142@twins.programming.kicks-ass.net>
        <20161116143746.zoxdxrfqvmx35wln@treble>
        <20161116144943.GB3117@twins.programming.kicks-ass.net>
        <20161116145849.GR3157@twins.programming.kicks-ass.net>
        <20161117044828.vedc3whqkuki624r@treble>
        <20161117090446.GC3142@twins.programming.kicks-ass.net>
        <20161117151848.7sdss3g4waanxfsk@treble>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161117151848.7sdss3g4waanxfsk@treble>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 17, 2016 at 09:18:48AM -0600, Josh Poimboeuf wrote:
> On Thu, Nov 17, 2016 at 10:04:46AM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 16, 2016 at 10:48:28PM -0600, Josh Poimboeuf wrote:
> > > Peter or Vince, can you try to recreate with this patch?  It dumps the
> > > raw stack contents during a stack dump.  Hopefully that would give a
> > > clue about what's going wrong.
> >
> > Here goes... I'll do another run and get you the results of that as
> > well.
>
> Thanks, I just waded through this and it turned up some good clues.  And
> according to 'git blame', you might be able to help :-)
>
> It's not stack corruption.  Instead it looks like
> __intel_pmu_pebs_event() is creating a bad or stale pt_regs which gets
> passed to the unwinder.  Specifically, regs->bp points to a seemingly
> random address on the NMI stack.  Which seems odd, considering the code
> itself is running on the same NMI stack.
>
> I don't know much about the PEBS code but it seems like it's passing
> some stale data.  Either that or there's some NMI nesting going on.

Ooh, indeed. The PEBS record can be quite stale by the time we get to
the interrupt. Using those registers for an unwind is 'interesting' at
best.

Esp. with the multi-pebs stuff that's landed this can be very very
stale, but even single pebs can have a radically different stack at
interrupt time than we had at record time -- imagine a (i)ret happening
in between.

Let me consider that code, and what to do about this; it's been a while
since I went over all that.
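
To make the "(i)ret happening in between" case concrete, here is a small
userspace toy model, not kernel code. Every name in it (toy_cpu, toy_call,
toy_unwind, demo_stale_record, ...) is made up for illustration and the
stack is just an array of words. It snapshots a frame pointer at "record
time", lets the function return and its frame get reused, and then walks
the frame-pointer chain from both the stale snapshot and the live bp.

```c
#include <stddef.h>

#define STACK_WORDS 16
#define NO_FRAME    ((size_t)-1)   /* hypothetical marker: outermost frame */

/* Toy machine: downward-growing stack of words, frames linked through bp.
 * Layout per frame: stack[bp] = caller's bp, stack[bp + 1] = return address. */
struct toy_cpu {
    size_t stack[STACK_WORDS];
    size_t sp;   /* index of the lowest word in use */
    size_t bp;   /* index of the current frame, or NO_FRAME */
};

static void toy_init(struct toy_cpu *cpu)
{
    for (size_t i = 0; i < STACK_WORDS; i++)
        cpu->stack[i] = 0;
    cpu->sp = STACK_WORDS;
    cpu->bp = NO_FRAME;
}

static void toy_push(struct toy_cpu *cpu, size_t value)
{
    cpu->stack[--cpu->sp] = value;
}

/* "call": push return address, push caller's bp, open a new frame. */
static void toy_call(struct toy_cpu *cpu, size_t ret_addr)
{
    toy_push(cpu, ret_addr);
    toy_push(cpu, cpu->bp);
    cpu->bp = cpu->sp;
}

/* "ret": drop locals, restore caller's bp, pop the return address. */
static void toy_ret(struct toy_cpu *cpu)
{
    cpu->sp = cpu->bp;
    cpu->bp = cpu->stack[cpu->sp++];
    cpu->sp++;
}

/* Frame-pointer walk starting at bp; stops when the chain leaves the stack. */
static size_t toy_unwind(const struct toy_cpu *cpu, size_t bp,
                         size_t *out, size_t max)
{
    size_t n = 0;

    while (bp != NO_FRAME && bp + 1 < STACK_WORDS && n < max) {
        out[n++] = cpu->stack[bp + 1];   /* return address of this frame */
        bp = cpu->stack[bp];             /* follow the saved bp */
    }
    return n;
}

struct demo_result {
    size_t stale[4], stale_n;   /* walk from the record-time bp */
    size_t live[4], live_n;     /* walk from the interrupt-time bp */
};

static struct demo_result demo_stale_record(void)
{
    struct toy_cpu cpu;
    struct demo_result r;
    size_t recorded_bp;

    toy_init(&cpu);
    toy_call(&cpu, 0x100);     /* outer -> a */
    toy_call(&cpu, 0x200);     /* a -> b */
    recorded_bp = cpu.bp;      /* "record time": snapshot taken inside b */

    toy_ret(&cpu);             /* b returns before the "interrupt" */
    toy_push(&cpu, 0xdead);    /* a's locals now overwrite b's old frame */
    toy_push(&cpu, 0xbeef);
    toy_call(&cpu, 0x400);     /* a -> d; the "interrupt" arrives here */

    r.stale_n = toy_unwind(&cpu, recorded_bp, r.stale, 4);
    r.live_n  = toy_unwind(&cpu, cpu.bp, r.live, 4);
    return r;
}
```

In this model the recorded bp still points into the same stack, so a plain
bounds check would accept it, yet the words it points at have been reused
and the walk from it reads a's locals as a return address. The walk from the
live bp yields the d and a return addresses. This only illustrates the
hazard; it says nothing about how the PEBS code should be changed.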