From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933661AbdK2QSZ (ORCPT ); Wed, 29 Nov 2017 11:18:25 -0500 Received: from mail.kernel.org ([198.145.29.99]:40808 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933432AbdK2QSW (ORCPT ); Wed, 29 Nov 2017 11:18:22 -0500 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9C006218A4 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=goodmis.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=rostedt@goodmis.org Date: Wed, 29 Nov 2017 11:18:15 -0500 From: Steven Rostedt To: Vladislav Valtchev Cc: linux-kernel@vger.kernel.org Subject: Re: [PATCH 3/3] trace-cmd: Making stat to report when the stack tracer is ON Message-ID: <20171129111815.1171c7f3@gandalf.local.home> In-Reply-To: <1511964054.4897.98.camel@gmail.com> References: <20171122180202.9519-1-vladislav.valtchev@gmail.com> <20171122180202.9519-4-vladislav.valtchev@gmail.com> <20171122145002.17c39637@gandalf.local.home> <1511440352.2719.34.camel@gmail.com> <20171129075738.0ccd7d84@gandalf.local.home> <1511964054.4897.98.camel@gmail.com> X-Mailer: Claws Mail 3.14.0 (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 29 Nov 2017 16:00:54 +0200 Vladislav Valtchev wrote: > On Wed, 2017-11-29 at 07:57 -0500, Steven Rostedt wrote: > > > > Let's think about what the user wants. > > > > If you do a "trace-cmd stat" what are you looking for? You want to see > > what ftrace operations are available. Now let's say we do something > > weird, or someone has some weird modified kernel, and the stack tracer > > shows something that trace-cmd doesn't expect. With a die, it kills the > > tool. > > > > Would you like it if you ran "trace-cmd stat" and got it crashed with > > an error message saying the kernel is doing something it doesn't > > understand? To me, I'd be pissed. I would be cursing at trace-cmd > > saying "I don't give a frick about that, show me what you do know!" > > > > I'm surprised how different kind of users are we :-) > > To me as user, in case there is a kernel bug, the best thing I'd expect > to see is the tool refusing to work and reporting that it does not really > know the state of the tracer because of invalid data in tracefs. > > In other words, I expect a tool to behave like: > "I don't know what is that, so I cannot take any decisions. > Here's the detailed problem (err msg, data). Now only a human may help now". > > The other approach is instead: > "I don't know what is that, but I'll guess by best trying to not piss off the user". No, I want "I don't know what this is (tell user about it) and carry on." The point being, trace-cmd stat does a lot more than check if the stack tracer is on. If it can't figure that out, it should warn that it got confused about it, but it should still report about all the other tracing that it does know about. And who said there was a bug? It could be a modified kernel that was done on purpose. Why should that kill trace-cmd? > > Both approaches have PROs and CONs. It is evident that, in the first case the tool is > pedantic and won't give even a try to do something. In the 2nd case instead, the tool > might guess even correctly at first but, by not exposing the underlying issue, it might > fail (if there's a mem corruption at kernel-level, likely it will) at any moment in a > strange way. In any case, the user will be pissed. Just, in the first case he/she will > benefit from an "early-stage" error, that might make the problem easier to find. > Also, with the 2nd approach, the user won't figure out immediately that the tool is not > guilty, while the kernel is: nobody should blame the poor tool when it had no chance > to get its job done in the first place. It should warn, and continue. It shouldn't die. Warning lets the user know that there was some kind of anomaly that trace-cmd doesn't know of, and the user can investigate further if they want to. Or the user could say "oh yeah, I modified this kernel, or I have an out of date trace-cmd, no problem, it still gives me the information I'm looking for." I see no CON with my approach, but I see many with yours. > > > Now, do you think having a "die" is good there? > > I prefer the "fail-early" approach in general. For a tool like trace-cmd, > I'd implement a layer validating all the input with an option for controlling > validation in hot paths. BUT, since that is not the philosophy of the tool, > adding a check like that only there, does not make much sense. > > It makes sense to take an approach and consistently follow it. > So, I'll fix my patch as you suggested. > > > I hope I'm not "pissing off" you with my long comments :-) Nope not at all :-) I'm just trying to educate you. Please note, the kernel itself does the same thing. And Linus has yelled at people for using BUG_ON() instead of WARN_ON(). He says, don't crash my kernel just because your code screwed up! ftrace itself has lots of self checks. It will shut itself down if it finds an anomaly, but it doesn't crash the kernel. There's one exception, and that's when it gets into a code path during function graph tracing where there's no place to return to. That happened once, and was due to a bug in gcc that caused function graph tracing to make all calls it called not return properly. There was no recovery. But that's the exception and not the rule. -- Steve