From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757483AbbCDCW4 (ORCPT ); Tue, 3 Mar 2015 21:22:56 -0500 Received: from mail-pd0-f179.google.com ([209.85.192.179]:42530 "EHLO mail-pd0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756819AbbCDCWy (ORCPT ); Tue, 3 Mar 2015 21:22:54 -0500 Message-ID: <54F66C7C.90301@plumgrid.com> Date: Tue, 03 Mar 2015 18:22:52 -0800 From: Alexei Starovoitov User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: Tom Zanussi CC: Steven Rostedt , Masami Hiramatsu , Namhyung Kim , Andi Kleen , LKML , Ingo Molnar , Arnaldo Carvalho de Melo Subject: Re: [PATCH v2 00/15] tracing: 'hist' triggers References: <1425397627.20819.161.camel@picadillo> In-Reply-To: <1425397627.20819.161.camel@picadillo> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 3/3/15 7:47 AM, Tom Zanussi wrote: > On Mon, 2015-03-02 at 18:31 -0800, Alexei Starovoitov wrote: >> On Mon, Mar 2, 2015 at 5:18 PM, Tom Zanussi wrote: >> I'm not proposing to use asm or C for this 'hist->bpf' tool. >> Keep proposed 'hist:keys=...:vals=...' syntax and generate >> bpf program on the fly based on this string. >> So this user tool will take string, generate program, load >> and attach it. Everything based on this single string input. >> With the examples you mentioned in docs, it's not hard. >> It will be more involved than patch 12, but way more generic >> and easily extensible when 'hist:keys=...' string would need >> to be extended. >> > > Hmm, this seems silly to me - doing all that work just to get back to > where we already are. your 'hist->bpf' tool could have been first to solve 'bpf hard to use' problem and overtime it could have evolved into full dtrace alternative. Whereas by adding 'hist:keys=..' parsing to kernel you'll be stuck with it and somebody else's dtrace-like tool will supersede it. > I don't think you can start requiring all new tracing functionality to > embed a code generator/compiler, or as a result force any particular > interface on users. force interface on users?! I think I've explained enough that bpf would not be a user visible interface. It's kernel to user land interface. In my proposal of 'hist->bpf' tool users won't even see that bpf is being generated and run underneath. Same with dtrace-like scripting. Users will see scripting language and won't care how it's compiled and executed inside kernel. Multiple different languages and interfaces are possible. Including hist-like text that's not a language. > If it's possible to factor out a common > infrastructure that can accommodate different user interfaces and > preferences, don't you think it makes sense to do that? > > In any case, outside of the boilerplate tracing infrastructure code, it > seems to me that a lot of the code in the main hist trigger patches is > code that you'd also need for a tracepoint/bpf interface. I think we've > already agreed that it would be good to work towards being able to share > that, so for the next version, I'll see what I can come up with. yes. let's see what pieces can be shared. Though I don't want to sacrifice long term goals for short term solution. In case of bpf_maps, the objective is to share data between kernel and user space by single abstraction with different implementations underneath. In some use cases kernel programs only do lookups into maps, while user space concurrently adding/deleting entries (so no allocations from programs) In your patch 4 you're adding kernel only access to maps but planning to use hash type only and also not exposing these maps to user space at all... that defeats bpf_map purpose and you only reusing ~200 lines of hashtab.c code, but also need to hack it for pre-allocate and make config_bpf_syscall a strong dependency for 'hist'. why? 'hist' patch set would have been much shorter if you just used hashtable.h or rhashtable.h or even rbtree.h (so you don't need to do sorting in patch 13). The actual hash insert/walk code is tiny and trivial. I've looked through the patches again and they seem to have serious bugs when it comes to object life time: - patch 5 that adds clear_map is completely broken, calling delete_all_elements() by wrapping it in rcu_read_lock() ?! - patch 6 adds 'free_elem' callback that is called right from htab_map_delete_elem() while we're still under rcu.. ?! - patch 12 does struct hist_trigger_entry *entry; tracing_map_lookup_elem(hist_data->map, next_key, &entry); hist_trigger_entry_print(m, hist_data, next_key, entry); so when you're storing a pointer inside the map the rcu protection of map_lookup protects only the value of 'entry' pointer. The actual contents of *entry are not protected by anything, so even if you fix 5 and 6, you'll still have severe memory corruptions, since nothing guarantees that contents of *entry are valid when hist_trigger_entry_print() is trying to access it. I think you'd be better off using vanilla hashtable.h Thanks