Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Sachin Sant <sachinp@linux.ibm.com>
Subject: Re: [GIT PULL] tracing: Prevent trace_marker being bigger than unsigned short
Date: Sun, 3 Mar 2024 12:09:37 -0800	[thread overview]
Message-ID: <CAHk-=wiTGmAXfHiRB8ku4diLxRpN=Hac_q86=j65oiP3J5uXKg@mail.gmail.com> (raw)
In-Reply-To: <20240303140705.0f655e36@rorschach.local.home>

On Sun, 3 Mar 2024 at 11:07, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> The string in question isn't some random string. It's a print event on
> the ring buffer where the size is strlen(p) rounded up to word size.
> That means, max will be no bigger than word-size - 1 greater than
> strlen(p). That means the chunks of 1024 will never land in the middle
> of garbage.

What a piece of unbelievable crap.

So you didn't actually want the precision in the first place, then you
started limiting it to an insane value because the printk code
complains about insane precision, and then you want to "fix" that by
printing it out in chunks where you know the chunk size won't hit
garbage, but that ends up being a random implementation detail that
you don't even document in that chunking code.

> > What was wrong with saying "don't do that"? You seem to be bending
> > over backwards to doing stupid things, and then insisting on doing
> > them entirely wrong.
>
> Don't do what?

You have this pattern of not actually thinking through the code AT
ALL, andc just fixing symptoms, and then making the code worse.

The whole "let's avoid the symptom of the kernel printk code telling
us that 32kB string precision is crazy by putting a 32kB-1 limit on
it" was clearly just papering over a symptom, not fixing the problem.

Doing insane chunking in 1kB pieces was another "let's paper over the
symptom, not fix the problem".

And now you finally admit that the actual problem was that YOUR
PRECISION WAS STUPID TO BEGIN WITH.

Do you really not see what the truly _fundamental_ problem here is?

Kernel code doesn't "paper over" stuff. We do things *right*. No more
of this crap.

You really need to take a deep look at what you are doing. I spend
more time on your pull requests than I want to, exactly because you
have had this pattern of doing something wrong in the first place, and
then adding MORE CODE to paper over all the problems that initial
wrong decision causes.

This was *exactly* the same same thing that happened on the tracefs
side. You did things wrong, and then you spent a lot of effort in
trying to patch up the resulting problems, instead of going back and
doing it *right*.

And honestly, I still think that the fundamental mistake you have done
is to let people say "I want to have these big strings" and you just
roll over and say "sure, we'll create shit kernel code for you".

WTF do you think it's fine to say "let people do insane things"
instead of just telling people that no, we have sane and small limits.

As a maintainer, one of your jobs is to say "No, we're not doing crazy
stuff". I still think that having so big strings that this came up in
the first place is a sign of the deeper problem, and then the fact
that you had an insane and pointless precision field was just a small
implementation issue.

Doing tracing in the kernel is not some kind of general-purpose thing.
It's ok - in fact, it's a really damn good idea - to just tell people
"yes, you can add strings, but dammit, there needs to be sanity to
it".

So I now tell you that you should

 (a) get rid of the stupid and nonsensical precision

 (b) tell people that their string are limited (and that 4kB is an
_upper_ value to sane string lengths in the kernel)

 (c) really fundamentally stop with the "paper over" things approach
to kernel programming

Large strings are not a "feature". They are a bug.

It's also sad that apparently your strings are counted, but you don't
count them very well, so instead of just using the count (which is
*much* cheaper) you end up using '%s' and do things until you hit a
NUL byte.  Guess what? All our printk infrastructure is designed for
small strings, so '%s' isn't exactly optimized, because we expected
sanity. It ends up in a loop that literally does things one byte at a
time.

And no. The solution is *not* to paper that over by making '%s'
printing more efficient. It's not supposed to need that kind of
efficiency.

Christ. *IF* large strings were a good idea, and you actually almost
have the length encoded, this whole tracing code could have used the
fact that you have that approximate count to do something like

    len = fieldsize-1;
    len &= ~(sizeof(unsigned long)-1);
    len += strlen(fieldbase + len);
    seq_write(buf, fieldbase, len);

and at least now you'd have something *efficient*. Which is at least a
source of pride in itself.

But since I really think the core of the problem is "we shouldn't have
allowed this kind of crap", I think efficiency - while a source of
pride - is polishing a turd and more of the "paper over the
fundamental issue".

And no, the above code sequence isn't wonderfully pretty either. Using
'strlen()' to find the last NUL character in a word is disgusting, and
our kernel 'strlen()' isn't some optimized thing either.

We do have fancier cases for fancier code (the word-at-a-time stuff),
but at least the above only walks things one byte at a time for a tiny
sequence.

                   Linus