Re: [PATCH] perf ordered_events: Optimise event object reuse

From: Jiri Olsa <jolsa@redhat.com>
To: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Jiri Olsa <jolsa@kernel.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] perf ordered_events: Optimise event object reuse
Date: Wed, 20 May 2020 23:52:34 +0200
Message-ID: <20200520215234.GO157452@krava>
In-Reply-To: <20200520130049.GC19431@codeblueprint.co.uk>

On Wed, May 20, 2020 at 02:00:49PM +0100, Matt Fleming wrote:
> On Mon, 18 May, at 02:04:08PM, Jiri Olsa wrote:
> > On Fri, May 15, 2020 at 10:01:51PM +0100, Matt Fleming wrote:
> > > ordered_event objects can be placed on the free object cache list in any
> > > order which means future allocations may not return objects at
> > > sequential locations in memory. Getting non-contiguous objects from the
> > > free cache has bad consequences when later iterating over those objects
> > > in ordered_events__queue().
> > > 
> > > For example, large perf.data files can contain trillions of events and
> > > since objects that are next to each other in the free cache linked list
> > > can point to pretty much anywhere in the object address space, lots of
> > > cycles in ordered_events__queue() are spent servicing DTLB misses.
> > > 
> > > Implement the free object cache using the in-kernel implementation of
> > > interval trees so that objects can always be allocated from the free
> > > object cache in sequential order, improving spatial locality and
> > > reducing DTLB misses.
> > > 
> > > Here are some numbers showing the speed up (reduction in execution time)
> > > when running perf sched latency on sched events data and perf report on
> > > HW_CPU_CYCLES.
> > 
> > really nice, few questions below
> > 
> > > 
> > >  $ perf stat --null -r 10 -- bash -c \
> > > 	"export PAGER=cat ; perf sched latency -i $file --stdio &>/dev/null"
> > > 
> > >   Nr events     File Size   Before    After    Speed up
> > > --------------  ---------  --------  -------  ----------
> > >   123318457470     29MB     0.2149    0.2440    -13.5%
> > 
> > should we be concerned about small data and the extra processing?
>  
> I didn't look into this slowdown originally because it's ~29 ms, but
> FYI it looks like this is caused by:
> 
>  - Longer code paths (more instructions)
>  - More branches
>  - More branch mispredicts
> 
> > maybe we could add some option that disables this, at least to be
> > able to compare times in the future
>  
> Sure. Do you mean a command-line option or build-time config?

command line option would be great

SNIP

> > > diff --git a/tools/perf/tests/free-object-cache.c b/tools/perf/tests/free-object-cache.c
> > > new file mode 100644
> > > index 000000000000..e4395ece7d2b
> > > --- /dev/null
> > > +++ b/tools/perf/tests/free-object-cache.c
> > > @@ -0,0 +1,200 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +#include "tests.h"
> > > +#include <linux/kernel.h>
> > > +
> > > +#define ordered_events__flush_time __test_ordered_events__flush_time
> > > +#define ordered_events__first_time __test_ordered_events__first_time
> > > +#define ordered_events__delete __test_ordered_events__delete
> > > +#define ordered_events__init __test_ordered_events__init
> > > +#define ordered_events__free __test_ordered_events__free
> > > +#define ordered_events__queue __test_ordered_events__queue
> > > +#define ordered_events__reinit __test_ordered_events__reinit
> > > +#define ordered_events__flush __test_ordered_events__flush
> > 
> > I'm excited to see these tests, but why is above needed?
> > 
> > can't you use ordered-events interface as it is? you used only
> > exported functions right?
>  
> Nope, the tests in this file are unit tests so I'm testing
> free_cache_{get,put} which are file-local functions by #include'ing
> ordered-events.c.
> 
> The above defines are required to avoid duplicate symbol errors at
> link-time, e.g.
> 
>   util/perf-in.o: In function `ordered_events__flush_time':
>   /home/matt/src/kernels/linux/tools/perf/util/ordered-events.c:461: multiple definition of `ordered_events__flush_time'
>   tests/perf-in.o:/home/matt/src/kernels/linux/tools/perf/tests/../util/ordered-events.c:461: first defined here
> 
> There are other ways to resolve this (linker flags to change the
> symbols) but I couldn't find any precedent with that, so this seemed
> like the easiest and most obvious solution. I'm happy to fix this up any
> other way if you have suggestions though.

hum, could we just make free_cache_{get,put} public?
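
as a rough, untested sketch of what I mean (not the actual perf code): the
struct layouts below are simplified stand-ins, the address-sorted list is
swapped in for the interval tree from the patch, and putting the
declarations in ordered-events.h is an assumption. With the helpers
non-static, the test can link against them directly and the #define renames
go away:

```c
/* Sketch only: simplified stand-ins, not the code from the patch. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal object; the real struct ordered_event has more fields. */
struct ordered_event {
	struct ordered_event *next;	/* link inside the free cache */
};

/* Hypothetical cache: a singly linked list kept sorted by object address. */
struct free_cache {
	struct ordered_event *head;
};

/*
 * Assumed to live in ordered-events.h, without 'static', so tests/ can
 * call them without #include'ing ordered-events.c.
 */
void free_cache_put(struct free_cache *cache, struct ordered_event *event);
struct ordered_event *free_cache_get(struct free_cache *cache);

/* Insert 'event' so the list stays in ascending address order. */
void free_cache_put(struct free_cache *cache, struct ordered_event *event)
{
	struct ordered_event **pos = &cache->head;

	assert(event);

	/* Walk to the first entry located at a higher address. */
	while (*pos && (uintptr_t)*pos < (uintptr_t)event)
		pos = &(*pos)->next;

	event->next = *pos;
	*pos = event;
}

/* Return the lowest-address cached object, or NULL if the cache is empty. */
struct ordered_event *free_cache_get(struct free_cache *cache)
{
	struct ordered_event *event = cache->head;

	if (event)
		cache->head = event->next;

	return event;
}
```

a test under tests/ could then put objects in any order and check that
free_cache_get() hands them back in ascending address order, using only
the header.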

thanks,
jirka