Message-ID: <1425345539.20819.108.camel@picadillo>
Subject: Re: [PATCH v2 00/15] tracing: 'hist' triggers
From: Tom Zanussi
To: Alexei Starovoitov
Cc: Steven Rostedt, Masami Hiramatsu, Namhyung Kim, Andi Kleen, LKML,
 Ingo Molnar, Arnaldo Carvalho de Melo
Date: Mon, 02 Mar 2015 19:18:59 -0600
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2015-03-02 at 16:01 -0800, Alexei Starovoitov wrote:
> On Mon, Mar 2, 2015 at 11:55 AM, Tom Zanussi wrote:
> >
> > I disagree that it would be rarely used. In fact, it would probably
> > cover about 80% of the use cases that people initially use things like
> > systemtap or dtrace for, which I guess is what ebpf is shooting for.
>
> 'hist' style won't solve any of the use cases I'm targeting with bpf.
> So, imo, 'hist' being 80% of dtrace is far from reality...
> but let's agree to disagree. it's not that important.
> I'm not saying don't do 'hist' at all.
> I'm only suggesting to do it differently.
>
> > I'm also looking at systems that have very little memory and 8Mb of
> > storage to work with, so streaming it all to userspace and
> > post-processing won't really work on those systems.
>
> I'm not suggesting to post-process. Quite the opposite.
> Let programs do ++ in the kernel, since that's what
> your patch 12 is doing, but in a hard coded way.
>
> > With some thought, though, I think the ebpf system/interpreter could be
> > made smart enough to recognize the simple patterns represented by the
> > hist triggers, and reuse them internally. So ftrace users get their
> > command-line version and it's also something ebpf can reuse.
>
> I'm saying keep the command line version of hist, but let
> user space process it.
> I don't buy the argument that you must run it in busybox
> without any extra tools.
> If you're in busybox, the system is likely idle, so nothing
> to trace/analyze. If you have some user space apps,
> then it equally easy to add 'hist->bpf' tool.
>

How about systems that run a single statically linked process with no
shell (but a service that can read and write files like/event/trigger
and event/hist)? We'd still like to be able to trace those systems.

> >> to embedded argument. Why add this to kernel
> >> when bpf programs can do the same on demand?
> >
> > Because this demonstrates that all those things can be done without
> > introducing an interpreter into the mix, so why bother with the
> > interpreter?
>
> because interpreter is done once for all use cases,
> whereas custom 'hist' code is doing only one thing for one use case.
>

I agree that the hist functionality is a subset of what can be done with
a full-blown interpreter, but it's not doing just one thing for one use
case - it covers a whole set of use cases.

> >> the kernel ABI exposure is much smaller.
> >> So would you consider working together on adding
> >> clean bpf+tracepoints infra and corresponding
> >> user space bits?
> >> We can have small user space parser/compiler for
> >> 'hist:keys=common_pid.execname,id.syscall:vals=hitcount'
> >> strings that will convert it into bpf program and you'll
> >> be able to use it in embedded setups ?
> >
> > Yeah, wouldn't be averse to working together to create a clean bpf
> > +tracepoints infrastructure - I think creating a reusable component like
> > this would be a good first step.
>
> great.
> From the program you can emit the same text format
> as in your 'cat hist' example.
> But it will not be a part of stable kernel ABI, which I think is
> one of the main advantages to do such printing from programs
> instead of kernel C code.
> If you decide to extend what is being printed, you can
> tweak 'hist->bpf' tool and print something else.
> No one will complain, whereas when you would want
> to extend the format of 'hist' file printed by kernel, you'd
> need to consider all user tools that are parsing it.
> Like we saw in systrace example...
>
> > BTW, I've actually tried to play around with the BPF samples/, but it
> > seems they're not actually hooked into the system i.e. the samples
> > Makefile doesn't build them, and it even looks for tools/llvm that's not
> > there. I got as far as getting the latest llvm from the location
> > mentioned in one of the bpf commit messages, but gave up after it told
> > me 'invalid target bpf'. And I couldn't find any documentation on how
> > to set it all up - did I just miss that?
>
> the comment next to 'tool/llvm' says 'point to your llvm' :)
> so yes, to build C examples one need to install latest llvm trunk.
> If you're saying that existing bpf stuff is hard to use, then yes.

Well, I'd say writing BPF 'assembly' to do anything isn't something more
than a few users in the world would even consider, so that's completely
out. Which means the only practical way to use it is via the C
interface. But getting that set up properly doesn't seem
straightforward either - it isn't something the Makefile will help with,
and there's no documentation on how one might do it.

So I tweaked the Makefile to get samples/bpf in the build (I mean the
directory is there under samples/, so why do I need to add it to the
Makefile myself?) and tried building which failed until I tweaked
something else to get it to find the right headers, etc.
Finally I got it building the userspace stuff but then found out I
needed my own llvm to get the kernel modules built, so searched and
found your llvm tree which I thought would configure the bpf backend
automatically, but apparently not, since it then failed with llc:
invalid target 'bpf' which is where I gave up. Do I need to configure
with --target=bpf or something like that? I don't know, and know
nothing about llvm, so am kind of stuck.

I really do want to try doing something with it, and I understand that
you're working on improving the user experience, but at this point it
seems users have to jump through a lot of hoops just to get a minimally
working setup. Even a small paragraph with some basic instructions
would help. Or maybe it's just me, and it works for everyone else out
of the box.

Tom

> I completely agree. It is hard to use. We're working on it.
> The user bits can be improved gradually unlike kernel/user
> boundary. Once you set it to be 'hist' file format it will stay
> forever.
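
For illustration only: the "small user space parser/compiler" proposed
earlier in the thread for strings such as
'hist:keys=common_pid.execname,id.syscall:vals=hitcount' might begin with
a front end like the sketch below. The function name parse_hist_trigger
and its return shape are hypothetical, and the sketch only splits the
string into its key and value fields. It does not generate a bpf program
and does not reflect how the patch set parses triggers in the kernel.

```python
# Hypothetical front end for a 'hist->bpf' tool (illustrative sketch only).
# The "keys" and "vals" field names come from the trigger string quoted in
# the thread; everything else here is an assumption.

def parse_hist_trigger(spec):
    """Split 'hist:keys=a,b:vals=c' into (keys, vals) lists of field names."""
    parts = spec.split(":")
    if parts[0] != "hist":
        raise ValueError(f"not a hist trigger: {spec!r}")
    fields = {}
    for part in parts[1:]:
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {part!r}")
        fields[name] = value.split(",")
    return fields.get("keys", []), fields.get("vals", [])


keys, vals = parse_hist_trigger(
    "hist:keys=common_pid.execname,id.syscall:vals=hitcount")
print("keys:", keys)
print("vals:", vals)
```

A real tool would still need to map each field name to a tracepoint
field and emit the corresponding map update, which is the part under
discussion above.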