From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS, T_DKIMWL_WL_MED,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C7B59C46470 for ; Wed, 8 Aug 2018 21:48:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 64BD221A8E for ; Wed, 8 Aug 2018 21:48:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="juo1Bnqi" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 64BD221A8E Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730616AbeHIAJc (ORCPT ); Wed, 8 Aug 2018 20:09:32 -0400 Received: from mail-wr1-f68.google.com ([209.85.221.68]:40805 "EHLO mail-wr1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727530AbeHIAJc (ORCPT ); Wed, 8 Aug 2018 20:09:32 -0400 Received: by mail-wr1-f68.google.com with SMTP id h15-v6so3290909wrs.7 for ; Wed, 08 Aug 2018 14:47:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=EYOhkqlMSqAMmDp3jViTrJi8ZBdYbjlxkmDTdzIuNbc=; b=juo1Bnqi6hF8QaHB4B150SMqRpXeJcJpx+Ty3ITNza/+6b9+E/tQ2KZSWUYvOMznrr ETXcF0Z3oorP1fqitumKeCoDKxFlbDzldClO/FxN0knFYEH8Sch3TzVGVDgX+3cePXJi uXOdNoHf+RqUnL9y3XLTRfp+FjBO7tJvrWdgVusWgZeIiRYNcJEqLvDSDJ1VKjGLWKRo MBg1ZmtlgEwxNkRLkEBWC/LHd4VQjZEheR2E4A/NqeMuEHxf89KyghCUp74dLEbhJGEb kz8yQ18gH5zF8HiqGL92CPkke9T/YohRNQDtO7vn3LibOoApj2cErsxzdG/Q8iGK5EkU /r8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=EYOhkqlMSqAMmDp3jViTrJi8ZBdYbjlxkmDTdzIuNbc=; b=azNbuhFQbSRYLWMQc5dYceYKb48gorQbr5O9hqlou+o1BjUWkleTpDs4ZJtz/zyPP1 RUbIqY6zvClX7J2cVaN8HyifKmQy+3sCoGXtV5P3eMjvGrKMWfb2ABV5bFU5FFkQ5vdM VTd4jr2Pw3Ic4o1IhV7YtztjU8gtrYX8Ha5NsQ+eIM4dm5148Rm3cTCwseETqrWH4pfb dQN/BfU56b5fFiPQPDk4TY3TDd+86XZUv/+Ypk0uBdMTcq9LZDqacACBf4oj66vu6eBD TxOQLYphXbkQSlpl11RkxVJQvpBmDhFX/PdWmmtJwv2LbldjLsuh6ik1F7EHBzclPQsY kJsw== X-Gm-Message-State: AOUpUlFNIO+j+fc1PNDcRbMLG2MfPghG5f+7Sm6IihedO1V5+DV9bkT6 4ZLJImYwxdmy48l5xoEAN+5JRY1xKU1nu9L80S6RxBZqwB4= X-Google-Smtp-Source: AA+uWPz45IsrMJFXt7GQW8Y9Cpxdc36n6xoDxrZo7awgMsdeRD0Ivz27P4hhFpxS+LqW4WtxVccLgoI6RruaakJwX60= X-Received: by 2002:adf:fdcd:: with SMTP id i13-v6mr2864838wrs.276.1533764875421; Wed, 08 Aug 2018 14:47:55 -0700 (PDT) MIME-Version: 1.0 References: <1533605015-19514-1-git-send-email-eranian@google.com> <20180807072029.GA7716@krava> <20180807085010.GC7716@krava> <20180808082347.GB20320@krava> In-Reply-To: <20180808082347.GB20320@krava> From: Stephane Eranian Date: Wed, 8 Aug 2018 14:47:42 -0700 Message-ID: Subject: Re: [PATCH] perf ordered_events: fix crash in free_dup_event() To: Jiri Olsa Cc: LKML , Arnaldo Carvalho de Melo , Peter Zijlstra , mingo@elte.hu Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, Ok, I found the problem. It still exists upstream , just very tricky to trigger. Took me lots of time with gdb + watchpoints to track this down, where in fact it was just in front of me. >From the crashdump: Program received signal SIGSEGV, Segmentation fault. free_dup_event (oe=0x26a39a0, event=0xffffffff00000000) I was puzzled by the 0xffffffff00000000. I tracked down where this value was coming from using watchpoints. In my case the memory was used before by elfutils to back the struct Elf. The -1 in the upper bits came from: file_read_elf () at third_party/elfutils/libelf/elf_begin.c:451 elf->state.elf64.scns.data[cnt].shndx_index = -1; And yet the next access to that memory location was in the crash. That meant the memory was released by elfutils and reused by ordered_events, yet without any initialization. But looking at alloc_event(), it was not obvious to figure out how a new_event->event could be uninitialized. Well, it turns out there is this little hack where the code commandeers the first element in the oe->buffer to use as a list_head for the oe->to_free freelist. The problem is that this entry also gets freed, but its event->event field is NEVER initialized. So depending on how the memory was previously used, you could get a on NULL value and crash in free_dup_event(). This is what happened to me. I am glad I pursued this further because the bug is still in the upstream version. The patch is a one-liner fixing the initialization of the event->event = NULL. For the other elements in the list, the initialization is already done at the end of alloc_event(). I will send the patch separately. On Wed, Aug 8, 2018 at 1:23 AM Jiri Olsa wrote: > > On Tue, Aug 07, 2018 at 12:11:05PM -0700, Stephane Eranian wrote: > > Jiri, > > On Tue, Aug 7, 2018 at 1:50 AM Jiri Olsa wrote: > > > > > > On Tue, Aug 07, 2018 at 01:16:22AM -0700, Stephane Eranian wrote: > > > > On Tue, Aug 7, 2018 at 12:20 AM Jiri Olsa wrote: > > > > > > > > > > On Mon, Aug 06, 2018 at 06:23:35PM -0700, Stephane Eranian wrote: > > > > > > Depending on memory allocations, it was possible to get a SEGFAULT in > > > > > > free_dup_event() because the event pointer was bogus: > > > > > > > > > > > > perf[1354]: segfault at ffffffff00000006 ip 00000000004b7fc7 > > > > > > > > > > is there any reproducer? > > > > > > > > > The cmdline is simple: > > > > $ perf record -e cycles:pp -o - -a sleep 1 | perf inject -b -i - >/dev/null > > > > I was using v4.13 for my tests and it may be sensitive to compiler. > > > > Was using LLVM. > > > > > > I can't make it fail even when I compile with clang 'make CC=clang' > > > > > I checked, my actual reproducer is: > > $ perf record -o - -e cycles date | perf inject -b -i - >/dev/null > > Tue Aug 7 12:03:48 PDT 2018 > > [ perf record: Woken up 1 times to write data ] > > [ perf record: Captured and wrote 0.000 MB - ] > > Segmentation fault (core dumped) > > > > the crash is in perf inject. > > So if I do: > > $ perf record -o - -e cycles date >tt > > $ gdb perf > > (gdb) r inject -b -i - < tt >/dev/null > > Program received signal SIGSEGV, Segmentation fault. > > free_dup_event (oe=0x26a39a0, event=0xffffffff00000000) at > > util/ordered-events.c:85 > > 85 in util/ordered-events.c > > (gdb) bt > > #0 free_dup_event (oe=0x26a39a0, event=0xffffffff00000000) at > > util/ordered-events.c:85 > > #1 ordered_events__free (oe=0x26a39a0) at util/ordered-events.c:310 > > #2 0x00000000004b5a56 in __perf_session__process_pipe_events > > (session=) at util/session.c:1753 > > #3 perf_session__process_events (session=) at > > util/session.c:1932 > > #4 0x000000000043a2eb in __cmd_inject (inject=) at > > builtin-inject.c:750 > > #5 cmd_inject (argc=, argv=) at > > builtin-inject.c:924 > > #6 0x000000000046b175 in run_builtin (p=0xabc640 , > > argc=4, argv=0x7fffffffe560) at perf.c:297 > > #7 0x000000000046b062 in handle_internal_command (argc=4, > > argv=0x7fffffffe560) at perf.c:349 > > #8 0x000000000046a5e8 in run_argv (argcp=, > > argv=) at perf.c:393 > > #9 main (argc=4, argv=0x7fffffffe560) at perf.c:531 > > nice > > > > > Again, this is with an older version of perf compiled with LLVM. > > can you also reproduce it with latest code? > > > Notice the value of event passed to free_dup_event(): 0xffffffff00000000 > > And yes, I checked sizeof(union_perf_event) = 4168 which is the size of > > the mmap2_event which is the largest. > > > > I also checked that you are freeing what you have actually allocated. > > No double free. > > If I add the padding or modifies the call to memdup() as in the patch, > > then the problem > > goes away. > > > > If you don't want to copy 4Kb each time, then you could also make the > > event field > > by a void *event and case whenever needed. > > not sure I understand this one.. could you please elaborate? > > > > > I suspect the problem may come from a compiler optimization or assumption which > > clashes with what you are optimizing here. > > I'll go throught the code and try to find somethign.. thanks a lot for the backtrace > > jirka >