All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joe Perches <joe@perches.com>
To: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>,
	linux-kernel@vger.kernel.org,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	John Ogness <john.ogness@linutronix.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	kernel-team@fb.com, Steven Rostedt <rostedt@goodmis.org>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jason Baron <jbaron@akamai.com>,
	Kees Cook <keescook@chromium.org>,
	linux-api@vger.kernel.org
Subject: Re: [PATCH] printk: Userspace format enumeration support
Date: Sun, 07 Feb 2021 06:58:28 -0800	[thread overview]
Message-ID: <bd53d894b7bb0fcaa520282a04a6487828282695.camel@perches.com> (raw)
In-Reply-To: <YB/1iHwwTi9dOv38@chrisdown.name>

On Sun, 2021-02-07 at 14:13 +0000, Chris Down wrote:
> Joe Perches writes:
> > > There are certainly printks which can't be trivially monitored using the printk
> > > format alone, but the vast majority of the ones that are monitored _do_ have
> > > meaningful formats and can be monitored over time. No solution to this is going
> > > to catch every single case, especially when so much of the information can be
> > > generated dyamically, but this patchset still goes a long way to making printk
> > > monitoring more tractable for use cases like the one described in the
> > > changelog.
> > 
> > For the _vast_ majority of printk strings, this can easily be found
> > and compared using a trivial modification to strings.
> 
> There are several issues with your proposed approach that make it unsuitable 
> for use as part of a reliable production environment:
> 
> 1. It misses printk() formats without KERN_SOH
> 
> printk() formats without KERN_SOH are legal and use MESSAGE_LOGLEVEL_DEFAULT.  
> On my test kernel, your proposed patch loses >5% of printk formats -- over 200 
> messages -- due to this, including critical ones like those about hardware or 
> other errors.

There are _very_ few of those printks without KERN_<level> and those
very few are not generally being changed.

> 2. Users don't always have the kernel image available
> 
> Many of our machines and many of the machines of others like us do not boot 
> using local storage, but instead use PXE or other technologies where the kernel 
> may not be stored during runtime.
> 
> As is described in the changelog, it is necessary to be able to vary 
> remediations not only based on what is already in /dev/kmsg, but also to be 
> able to make decisions about our methodology based on what's _supported_ in the 
> running kernel at runtime, and your proposed approach makes this not viable.

Indirection would alway work.

You could load a separate file with output strings along with your
kernel image.

> 3. `KERN_SOH + level' can appear in other places than just printk strings
> 
> KERN_SOH is just ASCII '\001' -- it's not distinctive or unique, even when 
> paired with a check for something that looks like a level after it. For this 
> reason, your proposed patch results in a non-trivial amount of non-printk 
> related garbage in its output. For example:
> 
>      % binutils/strings -k /tmp/vmlinux | head -5
>      3L)s
>      3L)s
>      c,[]A\
>      c(L)c
>      d$pL)d$`u4
> 
> Fundamentally, one cannot use a tool which just determines whether something is 
> printable to determine semantic intent.

$ kernel_strings --kernel --section ".rodata" vmlinux

I got exactly 0.

> 4. strings(1) output cannot differentiate embedded newlines and new formats
> 
> The following has exactly the same output from strings(1), but will manifest 
> completely differently at printk() time:
> 
>      printk(KERN_ERR "line one\nline two\nline three\n");
>      printk("line four\n");

This is not the preferred output style and is only done in old and
unchanging code.

Your use case in your commit log is looking for _changed_ formats.

On Thu, 2021-02-04 at 15:37 +0000, Chris Down wrote:
> This patch provides a solution to the issue of silently changed or
> deleted printks:

Exactly _how_ many of these use cases do you think exist?

The generally preferred style for the example above would be:

	pr_err("line one\n");
	pr_err("line two\n");
	pr_err("line three\n");
	pr_err("line four\n");

> The originally posted patch _does_ differentiate between these cases, using \0 
> as a reliable separator. Its outputs are, respectively:
> 
>      \0013line one\nline two\nline three\0\nline four\n\0
>      \0013line one\nline two\n\0line three\nline four\n\0
> 
> This isn't just a theoretical concern -- there are plenty of places which use 
> multiline printks, and we must be able to distinguish between that and 
> _multiple_ printks.

Just like there are many places that use buffered printks as the
example I gave earlier.  None of which your proposed solution would find.

> 5. strings(1) is not contextually aware, and cannot be made to act as if it is
> 
> strings has no idea about what it is reading, which is why it is more than 
> happy to output the kind of meaningless output shown in #3. There are plenty of 
> places across the kernel where there might be a sequence of bytes which the 
> strings utility happens to interpret as being semantically meaningful, but in 
> reality just happens to be an unrelated sequence of coincidentally printable 
> bytes that just happens to contain a \001.
> 
> I appreciate your willingness to propose other solutions, but for these 
> reasons, the proposed strings(1) patch would not suffice as an interface for 
> printk enumeration.

I think you are on a path to try to make printk output immutable.
I think that's a _very_ bad path.

I also think this is adding needless complexity.

A possible complexity I would like to support would be optionally
compressing printk format strings at compile time and uncompressing
them at use time.



  reply	other threads:[~2021-02-07 14:59 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-04 15:37 [PATCH] printk: Userspace format enumeration support Chris Down
2021-02-05  0:47 ` Chris Down
2021-02-05 14:26 ` Chris Down
2021-02-05 16:42 ` Petr Mladek
2021-02-05 17:47   ` Steven Rostedt
2021-02-05 22:45     ` Chris Down
2021-02-05 22:49       ` Steven Rostedt
2021-02-06  7:13       ` Greg Kroah-Hartman
2021-02-06 12:44         ` Chris Down
2021-02-05 22:25   ` Chris Down
2021-02-06 17:57     ` Joe Perches
2021-02-06 21:21       ` Chris Down
2021-02-07  4:41         ` Joe Perches
2021-02-07 14:13           ` Chris Down
2021-02-07 14:58             ` Joe Perches [this message]
2021-02-07 16:13               ` Chris Down
2021-02-07 16:53                 ` Chris Down
2021-02-07 22:21 ` kernel test robot
2021-02-07 22:21   ` kernel test robot
2021-02-08  1:13   ` Chris Down

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bd53d894b7bb0fcaa520282a04a6487828282695.camel@perches.com \
    --to=joe@perches.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chris@chrisdown.name \
    --cc=gregkh@linuxfoundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jbaron@akamai.com \
    --cc=john.ogness@linutronix.de \
    --cc=keescook@chromium.org \
    --cc=kernel-team@fb.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.