Re: [PATCH] perf ordered_events: fix crash in free_dup_event()

From: Stephane Eranian <eranian@google.com>
To: Jiri Olsa <jolsa@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	mingo@elte.hu
Subject: Re: [PATCH] perf ordered_events: fix crash in free_dup_event()
Date: Wed, 8 Aug 2018 14:47:42 -0700	[thread overview]
Message-ID: <CABPqkBRwG_MzfJp1d7+4JKNemoD9cOWqJnXaBWQNR7buESKm2w@mail.gmail.com> (raw)
In-Reply-To: <20180808082347.GB20320@krava>

Hi,

Ok, I found the problem. It still exists upstream , just very tricky to trigger.
Took me lots of time with gdb + watchpoints to track this down, where
in fact it was just in front of me.

From the crashdump:
  Program received signal SIGSEGV, Segmentation fault.
  free_dup_event (oe=0x26a39a0, event=0xffffffff00000000)

I was puzzled by the 0xffffffff00000000. I tracked down where this
value was coming from using watchpoints.
In my case the memory was used before by elfutils to back the struct
Elf. The -1 in the upper bits came from:

file_read_elf () at third_party/elfutils/libelf/elf_begin.c:451
elf->state.elf64.scns.data[cnt].shndx_index = -1;

And yet the next access to that memory location was in the crash. That
meant the memory was released by
elfutils and reused by ordered_events, yet without any initialization.
But looking at alloc_event(), it was
not obvious to figure out how a new_event->event could be uninitialized.

Well, it turns out there is this little hack where the code
commandeers the first element in the oe->buffer to
use as a list_head for the oe->to_free freelist. The problem is that
this entry also gets freed, but its
event->event field is NEVER initialized. So depending on how the
memory was previously used, you
could get a on NULL value and crash in free_dup_event(). This is what
happened to me. I am glad
I pursued this further because the bug is still in the upstream
version. The patch is a one-liner fixing
the initialization of the event->event = NULL. For the other elements
in the list, the initialization is
already done at the end of alloc_event().

I will send the patch separately.

On Wed, Aug 8, 2018 at 1:23 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Aug 07, 2018 at 12:11:05PM -0700, Stephane Eranian wrote:
> > Jiri,
> > On Tue, Aug 7, 2018 at 1:50 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Tue, Aug 07, 2018 at 01:16:22AM -0700, Stephane Eranian wrote:
> > > > On Tue, Aug 7, 2018 at 12:20 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > > > >
> > > > > On Mon, Aug 06, 2018 at 06:23:35PM -0700, Stephane Eranian wrote:
> > > > > > Depending on memory allocations, it was possible to get a SEGFAULT in
> > > > > > free_dup_event() because the event pointer was bogus:
> > > > > >
> > > > > > perf[1354]: segfault at ffffffff00000006 ip 00000000004b7fc7
> > > > >
> > > > > is there any reproducer?
> > > > >
> > > > The cmdline is simple:
> > > > $ perf record -e cycles:pp -o - -a sleep 1 | perf inject -b -i - >/dev/null
> > > > I was using v4.13 for my tests and it may be sensitive to compiler.
> > > > Was using LLVM.
> > >
> > > I can't make it fail even when I compile with clang 'make CC=clang'
> > >
> > I checked, my actual reproducer is:
> > $ perf record -o - -e cycles date |  perf inject -b -i - >/dev/null
> > Tue Aug  7 12:03:48 PDT 2018
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.000 MB - ]
> > Segmentation fault (core dumped)
> >
> > the crash is in perf inject.
> > So if I do:
> > $ perf record -o - -e cycles date >tt
> > $ gdb perf
> > (gdb) r inject -b -i - < tt  >/dev/null
> > Program received signal SIGSEGV, Segmentation fault.
> > free_dup_event (oe=0x26a39a0, event=0xffffffff00000000) at
> > util/ordered-events.c:85
> > 85 in util/ordered-events.c
> > (gdb) bt
> > #0  free_dup_event (oe=0x26a39a0, event=0xffffffff00000000) at
> > util/ordered-events.c:85
> > #1  ordered_events__free (oe=0x26a39a0) at util/ordered-events.c:310
> > #2  0x00000000004b5a56 in __perf_session__process_pipe_events
> > (session=<optimized out>) at util/session.c:1753
> > #3  perf_session__process_events (session=<optimized out>) at
> > util/session.c:1932
> > #4  0x000000000043a2eb in __cmd_inject (inject=<optimized out>) at
> > builtin-inject.c:750
> > #5  cmd_inject (argc=<optimized out>, argv=<optimized out>) at
> > builtin-inject.c:924
> > #6  0x000000000046b175 in run_builtin (p=0xabc640 <commands+576>,
> > argc=4, argv=0x7fffffffe560) at perf.c:297
> > #7  0x000000000046b062 in handle_internal_command (argc=4,
> > argv=0x7fffffffe560) at perf.c:349
> > #8  0x000000000046a5e8 in run_argv (argcp=<optimized out>,
> > argv=<optimized out>) at perf.c:393
> > #9  main (argc=4, argv=0x7fffffffe560) at perf.c:531
>
> nice
>
> >
> > Again, this is with an older version of perf compiled with LLVM.
>
> can you also reproduce it with latest code?
>
> > Notice the value of event passed to free_dup_event(): 0xffffffff00000000
> > And yes, I checked sizeof(union_perf_event) = 4168 which is the size of
> > the mmap2_event which is the largest.
> >
> > I also checked that you are freeing what you have actually allocated.
> > No double free.
> > If I add the padding or modifies the call to memdup() as in the patch,
> > then the problem
> > goes away.
> >
> > If you don't want to copy 4Kb each time, then you could also make the
> > event field
> > by a void *event and case whenever needed.
>
> not sure I understand this one.. could you please elaborate?
>
> >
> > I suspect the problem may come from a compiler optimization or assumption which
> > clashes with what you are optimizing here.
>
> I'll go throught the code and try to find somethign.. thanks a lot for the backtrace
>
> jirka
>