From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757483AbbCDCW4 (ORCPT <rfc822;w@1wt.eu>);
	Tue, 3 Mar 2015 21:22:56 -0500
Received: from mail-pd0-f179.google.com ([209.85.192.179]:42530 "EHLO
	mail-pd0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756819AbbCDCWy (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 3 Mar 2015 21:22:54 -0500
Message-ID: <54F66C7C.90301@plumgrid.com>
Date: Tue, 03 Mar 2015 18:22:52 -0800
From: Alexei Starovoitov <ast@plumgrid.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: Tom Zanussi <tom.zanussi@linux.intel.com>
CC: Steven Rostedt <rostedt@goodmis.org>,
        Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
        Namhyung Kim <namhyung@kernel.org>, Andi Kleen <andi@firstfloor.org>,
        LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@kernel.org>,
        Arnaldo Carvalho de Melo <acme@infradead.org>
Subject: Re: [PATCH v2 00/15] tracing: 'hist' triggers
References: <CAMEtUuxn6mje47qaV_VmRqLFMh_zMrHVPt3QhqOgXqYm8BOwew@mail.gmail.com> <1425397627.20819.161.camel@picadillo>
In-Reply-To: <1425397627.20819.161.camel@picadillo>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/3/15 7:47 AM, Tom Zanussi wrote:
> On Mon, 2015-03-02 at 18:31 -0800, Alexei Starovoitov wrote:
>> On Mon, Mar 2, 2015 at 5:18 PM, Tom Zanussi <tom.zanussi@linux.intel.com> wrote:

>> I'm not proposing to use asm or C for this 'hist->bpf' tool.
>> Keep proposed 'hist:keys=...:vals=...' syntax and generate
>> bpf program on the fly based on this string.
>> So this user tool will take string, generate program, load
>> and attach it. Everything based on this single string input.
>> With the examples you mentioned in docs, it's not hard.
>> It will be more involved than patch 12, but way more generic
>> and easily extensible when 'hist:keys=...' string would need
>> to be extended.
>>
>
> Hmm, this seems silly to me - doing all that work just to get back to
> where we already are.

your 'hist->bpf' tool could have been first to solve 'bpf hard to use'
problem and overtime it could have evolved into full dtrace alternative.
Whereas by adding 'hist:keys=..' parsing to kernel you'll be stuck
with it and somebody else's dtrace-like tool will supersede it.

> I don't think you can start requiring all new tracing functionality to
> embed a code generator/compiler, or as a result force any particular
> interface on users.

force interface on users?!
I think I've explained enough that bpf would not be a user visible
interface. It's kernel to user land interface. In my proposal of
'hist->bpf' tool users won't even see that bpf is being generated
and run underneath. Same with dtrace-like scripting. Users will
see scripting language and won't care how it's compiled and
executed inside kernel. Multiple different languages and interfaces
are possible. Including hist-like text that's not a language.

 > If it's possible to factor out a common
> infrastructure that can accommodate different user interfaces and
> preferences, don't you think it makes sense to do that?
>
> In any case, outside of the boilerplate tracing infrastructure code, it
> seems to me that a lot of the code in the main hist trigger patches is
> code that you'd also need for a tracepoint/bpf interface.  I think we've
> already agreed that it would be good to work towards being able to share
> that, so for the next version, I'll see what I can come up with.

yes. let's see what pieces can be shared. Though I don't want to
sacrifice long term goals for short term solution.
In case of bpf_maps, the objective is to share data between
kernel and user space by single abstraction with different
implementations underneath. In some use cases kernel programs only
do lookups into maps, while user space concurrently adding/deleting
entries (so no allocations from programs)
In your patch 4 you're adding kernel only access to maps but planning
to use hash type only and also not exposing these maps to user space
at all... that defeats bpf_map purpose and you only reusing ~200
lines of hashtab.c code, but also need to hack it for pre-allocate
and make config_bpf_syscall a strong dependency for 'hist'. why?
'hist' patch set would have been much shorter if you just used
hashtable.h or rhashtable.h or even rbtree.h (so you don't need
to do sorting in patch 13). The actual hash insert/walk code
is tiny and trivial.

I've looked through the patches again and they seem to have serious
bugs when it comes to object life time:
- patch 5 that adds clear_map is completely broken, calling
delete_all_elements() by wrapping it in rcu_read_lock() ?!
- patch 6 adds 'free_elem' callback that is called right from
htab_map_delete_elem() while we're still under rcu.. ?!
- patch 12 does
struct hist_trigger_entry *entry;
tracing_map_lookup_elem(hist_data->map, next_key, &entry);
hist_trigger_entry_print(m, hist_data, next_key, entry);
so when you're storing a pointer inside the map the rcu protection
of map_lookup protects only the value of 'entry' pointer.
The actual contents of *entry are not protected by anything,
so even if you fix 5 and 6, you'll still have severe memory corruptions,
since nothing guarantees that contents of *entry are valid when
hist_trigger_entry_print() is trying to access it.

I think you'd be better off using vanilla hashtable.h

Thanks