From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754314AbbCDFBp (ORCPT ); Wed, 4 Mar 2015 00:01:45 -0500 Received: from mga02.intel.com ([134.134.136.20]:44454 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754149AbbCDFBm (ORCPT ); Wed, 4 Mar 2015 00:01:42 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.09,685,1418112000"; d="scan'208";a="686562225" Message-ID: <1425445298.20819.222.camel@picadillo> Subject: Re: [PATCH v2 00/15] tracing: 'hist' triggers From: Tom Zanussi To: Alexei Starovoitov Cc: Steven Rostedt , Masami Hiramatsu , Namhyung Kim , Andi Kleen , LKML , Ingo Molnar , Arnaldo Carvalho de Melo Date: Tue, 03 Mar 2015 23:01:38 -0600 In-Reply-To: <54F66C7C.90301@plumgrid.com> References: <1425397627.20819.161.camel@picadillo> <54F66C7C.90301@plumgrid.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.10.4 (3.10.4-4.fc20) Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 2015-03-03 at 18:22 -0800, Alexei Starovoitov wrote: > On 3/3/15 7:47 AM, Tom Zanussi wrote: > > On Mon, 2015-03-02 at 18:31 -0800, Alexei Starovoitov wrote: > >> On Mon, Mar 2, 2015 at 5:18 PM, Tom Zanussi wrote: > > >> I'm not proposing to use asm or C for this 'hist->bpf' tool. > >> Keep proposed 'hist:keys=...:vals=...' syntax and generate > >> bpf program on the fly based on this string. > >> So this user tool will take string, generate program, load > >> and attach it. Everything based on this single string input. > >> With the examples you mentioned in docs, it's not hard. > >> It will be more involved than patch 12, but way more generic > >> and easily extensible when 'hist:keys=...' string would need > >> to be extended. > >> > > > > Hmm, this seems silly to me - doing all that work just to get back to > > where we already are. > > your 'hist->bpf' tool could have been first to solve 'bpf hard to use' > problem and overtime it could have evolved into full dtrace alternative. > Whereas by adding 'hist:keys=..' parsing to kernel you'll be stuck > with it and somebody else's dtrace-like tool will supersede it. > I think there's some misunderstanding there - it was never my intent to create a full dtrace alternative, really the idea was (and has been, even before there was any such thing as ebpf in the kernel) only to provide access to some low-hanging fruit via the standard read/write interfaces people are used to with ftrace. The dtrace-like tools were what I thought ebpf was all about, where users could go after exhausting the possibilities of a simple table-driven first approach to a problem. > > I don't think you can start requiring all new tracing functionality to > > embed a code generator/compiler, or as a result force any particular > > interface on users. > > force interface on users?! > I think I've explained enough that bpf would not be a user visible > interface. It's kernel to user land interface. In my proposal of > 'hist->bpf' tool users won't even see that bpf is being generated > and run underneath. Same with dtrace-like scripting. Users will > see scripting language and won't care how it's compiled and > executed inside kernel. Multiple different languages and interfaces > are possible. Including hist-like text that's not a language. > What about the kernel to user land interface that users have already been asking for, pseudo-files in debugfs/tracefs? If you can't do that, then you're forcing interface on users. > > If it's possible to factor out a common > > infrastructure that can accommodate different user interfaces and > > preferences, don't you think it makes sense to do that? > > > > In any case, outside of the boilerplate tracing infrastructure code, it > > seems to me that a lot of the code in the main hist trigger patches is > > code that you'd also need for a tracepoint/bpf interface. I think we've > > already agreed that it would be good to work towards being able to share > > that, so for the next version, I'll see what I can come up with. > > yes. let's see what pieces can be shared. Though I don't want to > sacrifice long term goals for short term solution. > In case of bpf_maps, the objective is to share data between > kernel and user space by single abstraction with different > implementations underneath. In some use cases kernel programs only > do lookups into maps, while user space concurrently adding/deleting > entries (so no allocations from programs) > In your patch 4 you're adding kernel only access to maps but planning > to use hash type only and also not exposing these maps to user space > at all... that defeats bpf_map purpose and you only reusing ~200 > lines of hashtab.c code, but also need to hack it for pre-allocate > and make config_bpf_syscall a strong dependency for 'hist'. why? > 'hist' patch set would have been much shorter if you just used > hashtable.h or rhashtable.h or even rbtree.h (so you don't need > to do sorting in patch 13). The actual hash insert/walk code > is tiny and trivial. Well, I kind of hinted in the cover letter that ultimately I thought the tracing_map common code would move out of bpf/syscall.c, and thus break the dependency on bpf_syscall. I didn't want to spend time actually doing that before knowing how the general idea would fly, and am glad now that I didn't. But the reason I thought of using it in the first place was that I saw a map implementation specially designed for tracing and thought it would make sense to reuse it if possible for what I thought was a complementary tracing tool. I did find it useful, and using it did simplify my overall code, but it seems it's more trouble than it's worth and not particularly welcome, so yeah, I probably will just look at something else. Thanks for your input, Tom > I've looked through the patches again and they seem to have serious > bugs when it comes to object life time: > - patch 5 that adds clear_map is completely broken, calling > delete_all_elements() by wrapping it in rcu_read_lock() ?! > - patch 6 adds 'free_elem' callback that is called right from > htab_map_delete_elem() while we're still under rcu.. ?! > - patch 12 does > struct hist_trigger_entry *entry; > tracing_map_lookup_elem(hist_data->map, next_key, &entry); > hist_trigger_entry_print(m, hist_data, next_key, entry); > so when you're storing a pointer inside the map the rcu protection > of map_lookup protects only the value of 'entry' pointer. > The actual contents of *entry are not protected by anything, > so even if you fix 5 and 6, you'll still have severe memory corruptions, > since nothing guarantees that contents of *entry are valid when > hist_trigger_entry_print() is trying to access it. > > I think you'd be better off using vanilla hashtable.h > > Thanks >