From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756084AbaD1TfL (ORCPT ); Mon, 28 Apr 2014 15:35:11 -0400
Received: from mail-qg0-f53.google.com ([209.85.192.53]:50625 "EHLO
	mail-qg0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754774AbaD1TfH (ORCPT ); Mon, 28 Apr 2014 15:35:07 -0400
X-Google-Original-From: Vince Weaver
Date: Mon, 28 Apr 2014 15:38:38 -0400 (EDT)
From: Vince Weaver
To: Vince Weaver
cc: Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org,
	Thomas Gleixner, Steven Rostedt
Subject: Re: [perf] more perf_fuzzer memory corruption
In-Reply-To:
Message-ID:
References: <20140417094815.GA9348@gmail.com>
	<20140417114533.GJ11096@twins.programming.kicks-ass.net>
	<20140417142213.GA29338@gmail.com>
	<20140417145418.GM11096@twins.programming.kicks-ass.net>
	<20140418152314.GY11182@twins.programming.kicks-ass.net>
	<20140418165958.GQ13658@twins.programming.kicks-ass.net>
	<20140418171516.GR13658@twins.programming.kicks-ass.net>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

OK, this is my current theory as to what's going on.  I'd appreciate any
comments.

We have an event, let's call it #16.

Event #16 is a SW event created and running in the parent on CPU0.

CPU0 (parent): calls fork()

CPU6 (child):  SW Event #16 is still running on CPU0 but is visible
               on CPU6 because the fd passed through with fork

CPU0 (parent): close #16.  Event not deallocated because still
               visible in child

CPU0 (parent): kill child

CPU6 (child):  shutting down.
               last user of event #16
               perf_release() called on event
               which eventually calls event_sched_out()
               which calls pmu->del which removes event from
               swevent_htable *but only on CPU6*

**** some sort of race happens with CPU0 (possibly with
     event_sched_in() and event->state==PERF_EVENT_STATE_INACTIVE)
     that leaves event #16 in the CPU0 swevent_htable, and it is not
     removed the next time ctx_sched_out() happens ****

CPU6 (idle):   grace period expires, kfree happens

CPU0:          the hlist still contains the now freed (and poisoned)
               event, which causes problems, especially as new events
               added to the list overwrite bytes starting at 0x48
               with pprev values.

Vince
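
For anyone following along, the last step of the theory above can be
sketched in userspace.  This is a toy model under stated assumptions, not
kernel code: struct toy_event, its 0x40 bytes of padding (chosen so that
hlist_entry.pprev sits at 0x48 on a 64-bit build, matching the offset in
the report), cpu0_bucket and demo_stale_overwrite() are all hypothetical
names.  Only the hlist pointer logic mirrors the kernel's hlist_add_head(),
and 0x6b is the slab POISON_FREE byte.  The "freed" event is poisoned with
memset() rather than actually freed, so the demo stays well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace copies of the kernel's hlist types (same pointer layout). */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Same pointer updates as the kernel's hlist_add_head(). */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next; /* writes into the old first node */
	h->first = n;
	n->pprev = &h->first;
}

/* Hypothetical stand-in for struct perf_event; the padding is an assumption. */
struct toy_event {
	char pad[0x40];
	struct hlist_node hlist_entry; /* pprev at 0x48 on LP64 */
	char tail[0x20];
};

#define POISON_FREE 0x6b /* slab poison byte for freed objects */

static struct hlist_head cpu0_bucket; /* one bucket of CPU0's hash table */
static struct toy_event event16, new_event;

/*
 * Returns 1 if adding a new event rewrote the pprev slot of the stale,
 * poisoned event that was never unlinked from CPU0's bucket.
 */
static int demo_stale_overwrite(void)
{
	unsigned char poison[sizeof(void *)];

	cpu0_bucket.first = NULL;
	hlist_add_head(&event16.hlist_entry, &cpu0_bucket);

	/* "kfree" without unlinking: the bucket still points at event16. */
	memset(&event16, POISON_FREE, sizeof(event16));
	memset(poison, POISON_FREE, sizeof(poison));
	assert(memcmp(&event16.hlist_entry.pprev, poison, sizeof(poison)) == 0);

	/* A new event hashes to the same bucket on CPU0. */
	hlist_add_head(&new_event.hlist_entry, &cpu0_bucket);

	/* The poison in the pprev slot has been replaced by a live pointer. */
	return event16.hlist_entry.pprev == &new_event.hlist_entry.next;
}
```

Calling demo_stale_overwrite() from a small main() shows the pprev slot of
the stale entry changing from poison to a pointer into the new event, which
is the pattern of corruption described above.  The next field of the stale
entry is left holding poison bytes, so any later walk of the list would
follow a garbage pointer.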