From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754314AbbCDFBp (ORCPT <rfc822;w@1wt.eu>);
	Wed, 4 Mar 2015 00:01:45 -0500
Received: from mga02.intel.com ([134.134.136.20]:44454 "EHLO mga02.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754149AbbCDFBm (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Wed, 4 Mar 2015 00:01:42 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.09,685,1418112000"; 
   d="scan'208";a="686562225"
Message-ID: <1425445298.20819.222.camel@picadillo>
Subject: Re: [PATCH v2 00/15] tracing: 'hist' triggers
From: Tom Zanussi <tom.zanussi@linux.intel.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
        Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
        Namhyung Kim <namhyung@kernel.org>, Andi Kleen <andi@firstfloor.org>,
        LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@kernel.org>,
        Arnaldo Carvalho de Melo <acme@infradead.org>
Date: Tue, 03 Mar 2015 23:01:38 -0600
In-Reply-To: <54F66C7C.90301@plumgrid.com>
References: <CAMEtUuxn6mje47qaV_VmRqLFMh_zMrHVPt3QhqOgXqYm8BOwew@mail.gmail.com>
	 <1425397627.20819.161.camel@picadillo> <54F66C7C.90301@plumgrid.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.10.4 (3.10.4-4.fc20) 
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2015-03-03 at 18:22 -0800, Alexei Starovoitov wrote:
> On 3/3/15 7:47 AM, Tom Zanussi wrote:
> > On Mon, 2015-03-02 at 18:31 -0800, Alexei Starovoitov wrote:
> >> On Mon, Mar 2, 2015 at 5:18 PM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:
> 
> >> I'm not proposing to use asm or C for this 'hist->bpf' tool.
> >> Keep proposed 'hist:keys=...:vals=...' syntax and generate
> >> bpf program on the fly based on this string.
> >> So this user tool will take string, generate program, load
> >> and attach it. Everything based on this single string input.
> >> With the examples you mentioned in docs, it's not hard.
> >> It will be more involved than patch 12, but way more generic
> >> and easily extensible when 'hist:keys=...' string would need
> >> to be extended.
> >>
> >
> > Hmm, this seems silly to me - doing all that work just to get back to
> > where we already are.
> 
> your 'hist->bpf' tool could have been first to solve 'bpf hard to use'
> problem and overtime it could have evolved into full dtrace alternative.
> Whereas by adding 'hist:keys=..' parsing to kernel you'll be stuck
> with it and somebody else's dtrace-like tool will supersede it.
> 

I think there's some misunderstanding there - it was never my intent to
create a full dtrace alternative, really the idea was (and has been,
even before there was any such thing as ebpf in the kernel) only to
provide access to some low-hanging fruit via the standard read/write
interfaces people are used to with ftrace.  The dtrace-like tools were
what I thought ebpf was all about, where users could go after exhausting
the possibilities of a simple table-driven first approach to a problem. 

> > I don't think you can start requiring all new tracing functionality to
> > embed a code generator/compiler, or as a result force any particular
> > interface on users.
> 
> force interface on users?!
> I think I've explained enough that bpf would not be a user visible
> interface. It's kernel to user land interface. In my proposal of
> 'hist->bpf' tool users won't even see that bpf is being generated
> and run underneath. Same with dtrace-like scripting. Users will
> see scripting language and won't care how it's compiled and
> executed inside kernel. Multiple different languages and interfaces
> are possible. Including hist-like text that's not a language.
> 

What about the kernel to user land interface that users have already
been asking for, pseudo-files in debugfs/tracefs?  If you can't do that,
then you're forcing interface on users.  

>  > If it's possible to factor out a common
> > infrastructure that can accommodate different user interfaces and
> > preferences, don't you think it makes sense to do that?
> >
> > In any case, outside of the boilerplate tracing infrastructure code, it
> > seems to me that a lot of the code in the main hist trigger patches is
> > code that you'd also need for a tracepoint/bpf interface.  I think we've
> > already agreed that it would be good to work towards being able to share
> > that, so for the next version, I'll see what I can come up with.
> 
> yes. let's see what pieces can be shared. Though I don't want to
> sacrifice long term goals for short term solution.
> In case of bpf_maps, the objective is to share data between
> kernel and user space by single abstraction with different
> implementations underneath. In some use cases kernel programs only
> do lookups into maps, while user space concurrently adding/deleting
> entries (so no allocations from programs)
> In your patch 4 you're adding kernel only access to maps but planning
> to use hash type only and also not exposing these maps to user space
> at all... that defeats bpf_map purpose and you only reusing ~200
> lines of hashtab.c code, but also need to hack it for pre-allocate
> and make config_bpf_syscall a strong dependency for 'hist'. why?
> 'hist' patch set would have been much shorter if you just used
> hashtable.h or rhashtable.h or even rbtree.h (so you don't need
> to do sorting in patch 13). The actual hash insert/walk code
> is tiny and trivial.

Well, I kind of hinted in the cover letter that ultimately I thought the
tracing_map common code would move out of bpf/syscall.c, and thus break
the dependency on bpf_syscall.  I didn't want to spend time actually
doing that before knowing how the general idea would fly, and am glad
now that I didn't.

But the reason I thought of using it in the first place was that I saw a
map implementation specially designed for tracing and thought it would
make sense to reuse it if possible for what I thought was a
complementary tracing tool.  I did find it useful, and using it did
simplify my overall code, but it seems it's more trouble than it's worth
and not particularly welcome, so yeah, I probably will just look at
something else.

Thanks for your input,

Tom

> I've looked through the patches again and they seem to have serious
> bugs when it comes to object life time:
> - patch 5 that adds clear_map is completely broken, calling
> delete_all_elements() by wrapping it in rcu_read_lock() ?!
> - patch 6 adds 'free_elem' callback that is called right from
> htab_map_delete_elem() while we're still under rcu.. ?!
> - patch 12 does
> struct hist_trigger_entry *entry;
> tracing_map_lookup_elem(hist_data->map, next_key, &entry);
> hist_trigger_entry_print(m, hist_data, next_key, entry);
> so when you're storing a pointer inside the map the rcu protection
> of map_lookup protects only the value of 'entry' pointer.
> The actual contents of *entry are not protected by anything,
> so even if you fix 5 and 6, you'll still have severe memory corruptions,
> since nothing guarantees that contents of *entry are valid when
> hist_trigger_entry_print() is trying to access it.
> 
> I think you'd be better off using vanilla hashtable.h
> 
> Thanks
>