From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755353AbbCCBTC (ORCPT <rfc822;w@1wt.eu>);
	Mon, 2 Mar 2015 20:19:02 -0500
Received: from mga01.intel.com ([192.55.52.88]:11044 "EHLO mga01.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753730AbbCCBTB (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Mon, 2 Mar 2015 20:19:01 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.09,679,1418112000"; 
   d="scan'208";a="674204432"
Message-ID: <1425345539.20819.108.camel@picadillo>
Subject: Re: [PATCH v2 00/15] tracing: 'hist' triggers
From: Tom Zanussi <tom.zanussi@linux.intel.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
        Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
        Namhyung Kim <namhyung@kernel.org>, Andi Kleen <andi@firstfloor.org>,
        LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@kernel.org>,
        Arnaldo Carvalho de Melo <acme@infradead.org>
Date: Mon, 02 Mar 2015 19:18:59 -0600
In-Reply-To: <CAMEtUuy6WM7-b1m32x+T=T-d8TNLhMpmrKSzb-SE7O_K9rh-8w@mail.gmail.com>
References: <CAMEtUuy6WM7-b1m32x+T=T-d8TNLhMpmrKSzb-SE7O_K9rh-8w@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.10.4 (3.10.4-4.fc20) 
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2015-03-02 at 16:01 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 11:55 AM, Tom Zanussi
> <tom.zanussi@linux.intel.com> wrote:
> >
> > I disagree that it would be rarely used.  In fact, it would probably
> > cover about 80% of the use cases that people initially use things like
> > systemtap or dtrace for, which I guess is what ebpf is shooting for.
> 
> 'hist' style won't solve any of the use cases I'm targeting with bpf.
> So, imo, 'hist' being 80% of dtrace is far from reality...
> but let's agree to disagree. it's not that important.
> I'm not saying don't do 'hist' at all.
> I'm only suggesting to do it differently.
> 
> > I'm also looking at systems that have very little memory and 8Mb of
> > storage to work with, so streaming it all to userspace and
> > post-processing won't really work on those systems.
> 
> I'm not suggesting to post-process. Quite the opposite.
> Let programs do ++ in the kernel, since that's what
> your patch 12 is doing, but in a hard coded way.
> 
> > With some thought, though, I think the ebpf system/interpreter could be
> > made smart enough to recognize the simple patterns represented by the
> > hist triggers, and reuse them internally.  So ftrace users get their
> > command-line version and it's also something ebpf can reuse.
> 
> I'm saying keep the command line version of hist, but let
> user space process it.
> I don't buy the argument that you must run it in busybox
> without any extra tools.
> If you're in busybox, the system is likely idle, so nothing
> to trace/analyze. If you have some user space apps,
> then it's equally easy to add a 'hist->bpf' tool.
> 

How about systems that run a single statically linked process with no
shell (but a service that can read and write files like event/trigger
and event/hist)?  We'd still like to be able to trace those systems.
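
As a rough sketch of what such a service would have to do (Python here
only for brevity; a real single-binary system would do the same two file
operations in whatever it is written in).  The function names and the
event_dir argument are made up for illustration, and the trigger string
is the one quoted elsewhere in this thread:

```python
from pathlib import Path

# Trigger string taken from the example quoted in this thread.
HIST_TRIGGER = "hist:keys=common_pid.execname,id.syscall:vals=hitcount"


def arm_hist(event_dir, trigger=HIST_TRIGGER):
    """Write a hist trigger string into <event_dir>/trigger.

    event_dir is a hypothetical path to one event's directory; where
    that lives on a given system is an assumption left to the caller.
    """
    with open(Path(event_dir) / "trigger", "w") as f:
        f.write(trigger + "\n")


def read_hist(event_dir):
    """Return the current contents of <event_dir>/hist as text."""
    with open(Path(event_dir) / "hist") as f:
        return f.read()
```

Nothing beyond open/read/write is needed on the target, which is the
point: no shell, no compiler, no extra tools.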

> >> to embedded argument. Why add this to kernel
> >> when bpf programs can do the same on demand?
> >
> > Because this demonstrates that all those things can be done without
> > introducing an interpreter into the mix, so why bother with the
> > interpreter?
> 
> because interpreter is done once for all use cases,
> whereas custom 'hist' code is doing only one thing for one use case.
> 

I agree that the hist functionality is a subset of what can be done with
a full-blown interpreter, but it's not doing just one thing for one use
case - it covers a whole set of use cases.
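
To illustrate, here is a toy user-space model of the aggregation, not
the kernel code.  It assumes events arrive as plain dicts, handles only
hitcount, and assumes a '.modifier' suffix on a key (e.g. '.execname')
only changes how the key is displayed; all names are hypothetical:

```python
from collections import Counter


def parse_hist_spec(spec):
    """Split 'hist:keys=a.mod,b:vals=c' into key and value field lists."""
    kind, *params = spec.split(":")
    if kind != "hist":
        raise ValueError("not a hist trigger: %r" % spec)
    opts = dict(p.split("=", 1) for p in params)
    # Assumption: the '.modifier' suffix affects display only, so drop it.
    keys = [k.split(".", 1)[0] for k in opts["keys"].split(",")]
    vals = opts.get("vals", "hitcount").split(",")
    return keys, vals


def aggregate(spec, events):
    """Count events per key tuple, i.e. one '++' per event hit."""
    keys, _vals = parse_hist_spec(spec)  # only hitcount is modeled here
    table = Counter()
    for ev in events:
        table[tuple(ev[k] for k in keys)] += 1
    return table
```

Changing the keys= list changes the question being asked (per pid, per
syscall, per pid and syscall, ...) without changing any code, which is
what I mean by a set of use cases rather than one.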

> >> the kernel ABI exposure is much smaller.
> >> So would you consider working together on adding
> >> clean bpf+tracepoints infra and corresponding
> >> user space bits?
> >> We can have small user space parser/compiler for
> >> 'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
> >> strings that will convert it into bpf program and you'll
> >> be able to use it in embedded setups ?
> >
> > Yeah, wouldn't be averse to working together to create a clean bpf
> > +tracepoints infrastructure - I think creating a reusable component like
> > this would be a good first step.
> 
> great.
> From the program you can emit the same text format
> as in your 'cat hist' example.
> But it will not be a part of stable kernel ABI, which I think is
> one of the main advantages to do such printing from programs
> instead of kernel C code.
> If you decide to extend what is being printed, you can
> tweak 'hist->bpf' tool and print something else.
> No one will complain, whereas when you would want
> to extend the format of 'hist' file printed by kernel, you'd
> need to consider all user tools that are parsing it.
> Like we saw in systrace example...
> 
> > BTW, I've actually tried to play around with the BPF samples/, but it
> > seems they're not actually hooked into the system i.e. the samples
> > Makefile doesn't build them, and it even looks for tools/llvm that's not
> > there.  I got as far as getting the latest llvm from the location
> > mentioned in one of the bpf commit messages, but gave up after it told
> > me 'invalid target bpf'.  And I couldn't find any documentation on how
> > to set it all up - did I just miss that?
> 
> the comment next to 'tool/llvm' says 'point to your llvm' :)
> so yes, to build C examples one need to install latest llvm trunk.
> If you're saying that existing bpf stuff is hard to use, then yes.

Well, I'd say writing BPF 'assembly' to do anything isn't something more
than a few users in the world would even consider, so that's completely
out. Which means the only practical way to use it is via the C
interface.  But getting that set up properly doesn't seem
straightforward either - it isn't something the Makefile will help with,
and there's no documentation on how one might do it.

So I tweaked the Makefile to get samples/bpf into the build (the
directory is there under samples/, so why do I need to add it to the
Makefile myself?) and tried building.  That failed until I tweaked
something else to get it to find the right headers, etc.  Finally I got
it building the userspace stuff, but then found out I needed my own llvm
to get the kernel modules built.  So I searched and found your llvm
tree, which I thought would configure the bpf backend automatically.
Apparently not, since it then failed with llc: invalid target 'bpf',
which is where I gave up.  Do I need to configure with --target=bpf or
something like that?  I don't know, and know nothing about llvm, so am
kind of stuck.

I really do want to try doing something with it, and I understand that
you're working on improving the user experience, but at this point it
seems users have to jump through a lot of hoops just to get a minimally
working setup.  Even a small paragraph with some basic instructions
would help.  Or maybe it's just me, and it works for everyone else out
of the box.

Tom

> I completely agree. It is hard to use. We're working on it.
> The user bits can be improved gradually unlike kernel/user
> boundary. Once you set it to be 'hist' file format it will stay
> forever.