From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755509AbcKVMbR (ORCPT ); Tue, 22 Nov 2016 07:31:17 -0500 Received: from terminus.zytor.com ([198.137.202.10]:41818 "EHLO terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755385AbcKVMbO (ORCPT ); Tue, 22 Nov 2016 07:31:14 -0500 Date: Tue, 22 Nov 2016 04:30:24 -0800 From: tip-bot for Peter Zijlstra Message-ID: Cc: dvyukov@google.com, jpoimboe@redhat.com, torvalds@linux-foundation.org, mingo@kernel.org, eranian@google.com, alexander.shishkin@linux.intel.com, acme@kernel.org, jolsa@redhat.com, linux-kernel@vger.kernel.org, acme@redhat.com, tglx@linutronix.de, davej@codemonkey.org.uk, vincent.weaver@maine.edu, eranian@gmail.com, peterz@infradead.org, hpa@zytor.com Reply-To: eranian@google.com, alexander.shishkin@linux.intel.com, acme@kernel.org, jolsa@redhat.com, dvyukov@google.com, jpoimboe@redhat.com, torvalds@linux-foundation.org, mingo@kernel.org, eranian@gmail.com, hpa@zytor.com, peterz@infradead.org, linux-kernel@vger.kernel.org, acme@redhat.com, tglx@linutronix.de, davej@codemonkey.org.uk, vincent.weaver@maine.edu In-Reply-To: <20161117171731.GV3157@twins.programming.kicks-ass.net> References: <20161117171731.GV3157@twins.programming.kicks-ass.net> To: linux-tip-commits@vger.kernel.org Subject: [tip:perf/urgent] perf/x86/intel: Cure bogus unwind from PEBS entries Git-Commit-ID: b8000586c90b4804902058a38d3a59ce5708e695 X-Mailer: tip-git-log-daemon Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Commit-ID: b8000586c90b4804902058a38d3a59ce5708e695 Gitweb: http://git.kernel.org/tip/b8000586c90b4804902058a38d3a59ce5708e695 Author: Peter Zijlstra AuthorDate: Thu, 17 Nov 2016 18:17:31 +0100 Committer: Ingo Molnar CommitDate: Tue, 22 Nov 2016 12:36:58 +0100 perf/x86/intel: Cure bogus unwind from PEBS entries Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event unwinds sometimes do 'weird' things. In particular, we seemed to be ending up unwinding from random places on the NMI stack. While it was somewhat expected that the event record BP,SP would not match the interrupt BP,SP in that the interrupt is strictly later than the record event, it was overlooked that it could be on an already overwritten stack. Therefore, don't copy the recorded BP,SP over the interrupted BP,SP when we need stack unwinds. Note that its still possible the unwind doesn't full match the actual event, as its entirely possible to have done an (I)RET between record and interrupt, but on average it should still point in the general direction of where the event came from. Also, it's the best we can do, considering. The particular scenario that triggered the bogus NMI stack unwind was a PEBS event with very short period, upon enabling the event at the tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly triggers a record (while still on the NMI stack) which in turn triggers the next PMI. This then causes back-to-back NMIs and we'll try and unwind the stack-frame from the last NMI, which obviously is now overwritten by our own. Analyzed-by: Josh Poimboeuf Reported-by: Vince Weaver Signed-off-by: Peter Zijlstra (Intel) Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Stephane Eranian Cc: Thomas Gleixner Cc: davej@codemonkey.org.uk Cc: dvyukov@google.com Cc: stable@vger.kernel.org Fixes: ca037701a025 ("perf, x86: Add PEBS infrastructure") Link: http://lkml.kernel.org/r/20161117171731.GV3157@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar --- arch/x86/events/intel/ds.c | 35 +++++++++++++++++++++++------------ arch/x86/events/perf_event.h | 2 +- 2 files changed, 24 insertions(+), 13 deletions(-) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 0319311..be20239 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1108,20 +1108,20 @@ static void setup_pebs_sample_data(struct perf_event *event, } /* - * We use the interrupt regs as a base because the PEBS record - * does not contain a full regs set, specifically it seems to - * lack segment descriptors, which get used by things like - * user_mode(). + * We use the interrupt regs as a base because the PEBS record does not + * contain a full regs set, specifically it seems to lack segment + * descriptors, which get used by things like user_mode(). * - * In the simple case fix up only the IP and BP,SP regs, for - * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly. - * A possible PERF_SAMPLE_REGS will have to transfer all regs. + * In the simple case fix up only the IP for PERF_SAMPLE_IP. + * + * We must however always use BP,SP from iregs for the unwinder to stay + * sane; the record BP,SP can point into thin air when the record is + * from a previous PMI context or an (I)RET happend between the record + * and PMI. */ *regs = *iregs; regs->flags = pebs->flags; set_linear_ip(regs, pebs->ip); - regs->bp = pebs->bp; - regs->sp = pebs->sp; if (sample_type & PERF_SAMPLE_REGS_INTR) { regs->ax = pebs->ax; @@ -1130,10 +1130,21 @@ static void setup_pebs_sample_data(struct perf_event *event, regs->dx = pebs->dx; regs->si = pebs->si; regs->di = pebs->di; - regs->bp = pebs->bp; - regs->sp = pebs->sp; - regs->flags = pebs->flags; + /* + * Per the above; only set BP,SP if we don't need callchains. + * + * XXX: does this make sense? + */ + if (!(sample_type & PERF_SAMPLE_CALLCHAIN)) { + regs->bp = pebs->bp; + regs->sp = pebs->sp; + } + + /* + * Preserve PERF_EFLAGS_VM from set_linear_ip(). + */ + regs->flags = pebs->flags | (regs->flags & PERF_EFLAGS_VM); #ifndef CONFIG_X86_32 regs->r8 = pebs->r8; regs->r9 = pebs->r9; diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 5874d8d..a77ee02 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -113,7 +113,7 @@ struct debug_store { * Per register state. */ struct er_account { - raw_spinlock_t lock; /* per-core: protect structure */ + raw_spinlock_t lock; /* per-core: protect structure */ u64 config; /* extra MSR config */ u64 reg; /* extra MSR number */ atomic_t ref; /* reference count */