Date: Thu, 5 May 2016 08:20:57 -0700
From: Alexei Starovoitov
To: Steven Rostedt
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Andrew Morton,
    Alexei Starovoitov
Subject: Re: [for-next][PATCH 2/2] tracing: Use temp buffer when filtering events
Message-ID: <20160505152055.GA72567@ast-mbp.thefacebook.com>
References: <20160504135202.422290539@goodmis.org>
 <20160504135241.308454993@goodmis.org>
In-Reply-To: <20160504135241.308454993@goodmis.org>

On Wed, May 04, 2016 at 09:52:04AM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (Red Hat)"
> 
> Filtering of events requires the data to be written to the ring buffer
> before it can be decided whether to filter it or not. This is because the
> filter operates on the record as it is written to the ring buffer, not on
> the parameters that are passed into the trace functions.
> 
> The ftrace ring buffer is optimized for writing into the ring buffer and
> committing. The discard procedure, used when the filter decides that an
> event should be dropped, is much more heavyweight. Thus, using a temporary
> buffer when filtering events can speed things up drastically.
> 
> Without a temp buffer we have:
> 
>  # trace-cmd start -p nop
>  # perf stat -r 10 hackbench 50
>        0.790706626 seconds time elapsed ( +- 0.71% )
> 
>  # trace-cmd start -e all
>  # perf stat -r 10 hackbench 50
>        1.566904059 seconds time elapsed ( +- 0.27% )
> 
>  # trace-cmd start -e all -f 'common_preempt_count==20'
>  # perf stat -r 10 hackbench 50
>        1.690598511 seconds time elapsed ( +- 0.19% )
> 
>  # trace-cmd start -e all -f 'common_preempt_count!=20'
>  # perf stat -r 10 hackbench 50
>        1.707486364 seconds time elapsed ( +- 0.30% )
> 
> The first run above is without any tracing, just to get a base figure.
> hackbench takes ~0.79 seconds to run on the system.
> 
> The second run enables tracing of all events with no filtering. This
> increases the time by 100%: hackbench takes 1.57 seconds to run.
> 
> The third run filters all events on the preempt count equalling "20"
> (which should never happen), so all events are discarded. This takes 1.69
> seconds to run, 10% slower than just committing the events!
> 
> The last run enables all events with a filter that commits every event,
> and this takes 1.70 seconds to run. The filtering overhead is
> approximately 10%. Thus, discarding and committing an event from the ring
> buffer may take about the same time.
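
Just to check that I'm reading the above correctly: before this patch the
record is always reserved and written into the ring buffer first, and only
afterwards matched against the filter, roughly like the sketch below. This
is paraphrased, not the actual code; filter_match_preds() is the real
predicate helper, while the rb_*() and fill_record() names are stand-ins
for the real ring buffer calls:

	/* sketch only: rb_*() and fill_record() are stand-ins, not the real API */
	event = rb_reserve(buffer, len);	/* reserve space in the ring buffer */
	rec = rb_event_data(event);		/* pointer to the record payload */
	fill_record(rec);			/* the tracepoint writes its fields */

	if (!filter_match_preds(file->filter, rec))
		rb_discard(buffer, event);	/* heavyweight discard path */
	else
		rb_commit(buffer, event);	/* the optimized commit path */
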
> 
> With this patch, the numbers change:
> 
>  # trace-cmd start -p nop
>  # perf stat -r 10 hackbench 50
>        0.778233033 seconds time elapsed ( +- 0.38% )
> 
>  # trace-cmd start -e all
>  # perf stat -r 10 hackbench 50
>        1.582102692 seconds time elapsed ( +- 0.28% )
> 
>  # trace-cmd start -e all -f 'common_preempt_count==20'
>  # perf stat -r 10 hackbench 50
>        1.309230710 seconds time elapsed ( +- 0.22% )
> 
>  # trace-cmd start -e all -f 'common_preempt_count!=20'
>  # perf stat -r 10 hackbench 50
>        1.786001924 seconds time elapsed ( +- 0.20% )
> 
> The first run is again the base with no tracing.
> 
> The second run is all tracing with no filtering. It is a little slower,
> but that may be well within the noise.
> 
> The third run shows that discarding all events only took 1.3 seconds.
> This is a speedup of 23%! The discard is much faster than even the
> commit.
> 
> The one downside is shown in the last run. Events that are not discarded
> by the filter take longer to add, due to the extra copy of the event.
> 
> Cc: Alexei Starovoitov
> Signed-off-by: Steven Rostedt

...

> 	*current_rb = trace_file->tr->trace_buffer.buffer;
> +
> +	if ((trace_file->flags &
> +	     (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
> +	    (entry = this_cpu_read(trace_buffered_event))) {
> +		/* Try to use the per cpu buffer first */
> +		val = this_cpu_inc_return(trace_buffered_event_cnt);
> +		if (val == 1) {
> +			trace_event_setup(entry, type, flags, pc);
> +			entry->array[0] = len;
> +			return entry;
> +		}
> +		this_cpu_dec(trace_buffered_event_cnt);

tricky :)

So the buffer is used only for non-recursive events. If a 2nd event on the
same cpu also needs filtering, it will go through the normal
trace_buffer_lock_reserve() path, but then such events will be out of
order if both are accepted, right? Is that a concern or not?
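
To spell out the premise of the question, my mental model of the per-cpu
fast path above is roughly the sketch below (the flags check is dropped
for brevity; apart from the two per-cpu variables and
trace_buffer_lock_reserve(), the shape of the code is paraphrased rather
than exact):

	/* sketch of the per-cpu guard as I read it, not the exact code */
	entry = this_cpu_read(trace_buffered_event);	/* preallocated per-cpu event */
	if (entry) {
		/* claim the single per-cpu slot */
		if (this_cpu_inc_return(trace_buffered_event_cnt) == 1)
			return entry;	/* first, non-nested event on this cpu */

		/*
		 * Nested event (irq/NMI hit while the slot is in use):
		 * drop the claim and fall back to reserving directly
		 * in the ring buffer.
		 */
		this_cpu_dec(trace_buffered_event_cnt);
	}
	return trace_buffer_lock_reserve(...);	/* the normal, slower path */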