[v4] printk: Userspace format enumeration support
diff mbox series

Message ID YCafCKg2bAlOw08H@chrisdown.name
State New, archived
Headers show
Series
  • [v4] printk: Userspace format enumeration support
Related show

Commit Message

Chris Down Feb. 12, 2021, 3:30 p.m. UTC
We have a number of systems industry-wide that have a subset of their
functionality that works as follows:

1. Receive a message from local kmsg, serial console, or netconsole;
2. Apply a set of rules to classify the message;
3. Do something based on this classification (like scheduling a
   remediation for the machine), rinse, and repeat.

As a couple of examples of places we have this implemented just inside
Facebook, although this isn't a Facebook-specific problem, we have this
inside our netconsole processing (for alarm classification), and as part
of our machine health checking. We use these messages to determine
fairly important metrics around production health, and it's important
that we get them right.

While for some kinds of issues we have counters, tracepoints, or metrics
with a stable interface which can reliably indicate the issue, in order
to react to production issues quickly we need to work with the interface
which most kernel developers naturally use when developing: printk.

Most production issues come from unexpected phenomena, and as such
usually the code in question doesn't have easily usable tracepoints or
other counters available for the specific problem being mitigated. We
have a number of lines of monitoring defence against problems in
production (host metrics, process metrics, service metrics, etc), and
where it's not feasible to reliably monitor at another level, this kind
of pragmatic netconsole monitoring is essential.

As you'd expect, monitoring using printk is rather brittle for a number
of reasons -- most notably that the message might disappear entirely in
a new version of the kernel, or that the message may change in some way
that the regex or other classification methods start to silently fail.

One factor that makes this even harder is that, under normal operation,
many of these messages are never expected to be hit. For example, there
may be some rare hardware bug which you want to detect if it was to ever
happen again, but its recurrence is not likely or anticipated. This
precludes using something like checking whether the printk in question
was printed somewhere fleetwide recently to determine whether the
message in question is still present or not, since we don't anticipate
that it should be printed anywhere, but still need to monitor for its
future presence in the long-term.

This class of issue has happened on a number of occasions, causing
unhealthy machines with hardware issues to remain in production for
longer than ideal. As a recent example, some monitoring around
blk_update_request fell out of date and caused semi-broken machines to
remain in production for longer than would be desirable.

Searching through the codebase to find the message is also extremely
fragile, because many of the messages are further constructed beyond
their callsite (eg. btrfs_printk and other module-specific wrappers,
each with their own functionality). Even if they aren't, guessing the
format and formulation of the underlying message based on the aesthetics
of the message emitted is not a recipe for success at scale, and our
previous issues with fleetwide machine health checking demonstrate as
much.

This patch provides a solution to the issue of silently changed or
deleted printks: we record pointers to all printk format strings known
at compile time into a new .printk_fmts section, both in vmlinux and
modules. At runtime, this can then be iterated by looking at
<debugfs>/printk/formats/<module>, which emits the same format as
`printk` itself, which we already export elsewhere (for example, in
netconsole).

As an example of how simple a parser for this format can be:

    $ cat pf.py
    #!/usr/bin/env python
    with open("/sys/kernel/debug/printk/formats/vmlinux") as f:
        raw_fmts = f.read().split("\x00")[:-1]
        for raw_fmt in raw_fmts:
            level, fmt = raw_fmt[1], raw_fmt[2:]
            print(f"Level {level}: {fmt!r}")

    $ ./pf.py | shuf -n 5
    Level 4: 'Build ID is too large to include in vmcoreinfo: %u > %u\n'
    Level 3: 'BIOS bug, no explicit IRQ entries, using default mptable. (tell your hw vendor)\n'
    Level 3: 'Failed to execute %s (error %d)\n'
    Level 3: 'CHRDEV "%s" minor range requested (%u-%u) is out of range of maximum range (%u-%u) for a single major\n'
    Level 3: "slub_debug option '%c' unknown. skipped\n"

This mitigates the majority of cases where we have a highly-specific
printk which we want to match on, as we can now enumerate and check
whether the format changed or the printk callsite disappeared entirely
in userspace. This allows us to catch changes to printks we monitor
earlier and decide what to do about it before it becomes problematic.

There is no additional runtime cost for printk callers or printk itself,
and the assembly generated is exactly the same.

Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>

---

v2:

- Use seq_printf instead of step by step accumulation
- Scope fptr closer to its use
- Prevent seq_file from needing to alloc a new buffer
- Always provide KERN_SOH + level, even if caller omitted it
- Provide one file per module
- Update changelog to show ease of parsing
- Provide printk -> _printk for ARCH=um (thanks, LKP robot)
- Move to debugfs (thanks, Steven)

---

v3:

- Reduce locking during open by moving size lifetime to the inode
- Don't explicitly check for debugfs errors (thanks, gregkh)

---

v4:

- Fix a missed `extern int printk` fixup in ia64 (thanks, lkp)
---
 arch/arm/kernel/entry-v7m.S          |   2 +-
 arch/arm/lib/backtrace-clang.S       |   2 +-
 arch/arm/lib/backtrace.S             |   2 +-
 arch/arm/mach-rpc/io-acorn.S         |   2 +-
 arch/arm/vfp/vfphw.S                 |   6 +-
 arch/ia64/include/uapi/asm/cmpxchg.h |   4 +-
 arch/openrisc/kernel/entry.S         |   6 +-
 arch/powerpc/kernel/head_fsl_booke.S |   2 +-
 arch/um/include/shared/user.h        |   3 +-
 arch/x86/kernel/head_32.S            |   2 +-
 include/asm-generic/vmlinux.lds.h    |  13 ++
 include/linux/module.h               |   5 +
 include/linux/printk.h               |  43 ++++-
 init/Kconfig                         |  14 ++
 kernel/module.c                      |   5 +
 kernel/printk/printk.c               | 231 ++++++++++++++++++++++++++-
 16 files changed, 318 insertions(+), 24 deletions(-)

Comments

kernel test robot Feb. 12, 2021, 6:01 p.m. UTC | #1
Hi Chris,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on jeyu/modules-next]
[also build test ERROR on linux/master soc/for-next openrisc/for-next powerpc/next uml/linux-next asm-generic/master linus/master v5.11-rc7 next-20210211]
[cannot apply to tip/x86/core pmladek/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Chris-Down/printk-Userspace-format-enumeration-support/20210212-233240
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux.git modules-next
config: ia64-randconfig-r016-20210209 (attached as .config)
compiler: ia64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/bff4c8b4b0b9bcc31917d3b0f1a01a7573a1f473
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Chris-Down/printk-Userspace-format-enumeration-support/20210212-233240
        git checkout bff4c8b4b0b9bcc31917d3b0f1a01a7573a1f473
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=ia64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/kernel.h:16,
                    from include/asm-generic/bug.h:20,
                    from arch/ia64/include/asm/bug.h:17,
                    from include/linux/bug.h:5,
                    from include/linux/page-flags.h:10,
                    from kernel/bounds.c:10:
>> include/linux/printk.h:219:5: error: static declaration of '_printk' follows non-static declaration
     219 | int _printk(const char *s, ...)
         |     ^~~~~~~
   In file included from arch/ia64/include/uapi/asm/intrinsics.h:22,
                    from arch/ia64/include/asm/intrinsics.h:11,
                    from arch/ia64/include/asm/bitops.h:19,
                    from include/linux/bitops.h:32,
                    from include/linux/kernel.h:11,
                    from include/asm-generic/bug.h:20,
                    from arch/ia64/include/asm/bug.h:17,
                    from include/linux/bug.h:5,
                    from include/linux/page-flags.h:10,
                    from kernel/bounds.c:10:
   arch/ia64/include/uapi/asm/cmpxchg.h:142:14: note: previous declaration of '_printk' was here
     142 |   extern int _printk(const char *fmt, ...);  \
         |              ^~~~~~~
   arch/ia64/include/asm/bitops.h:309:3: note: in expansion of macro 'CMPXCHG_BUGCHECK'
     309 |   CMPXCHG_BUGCHECK(m);
         |   ^~~~~~~~~~~~~~~~
--
   In file included from include/linux/kernel.h:16,
                    from include/asm-generic/bug.h:20,
                    from arch/ia64/include/asm/bug.h:17,
                    from include/linux/bug.h:5,
                    from include/linux/page-flags.h:10,
                    from kernel/bounds.c:10:
>> include/linux/printk.h:219:5: error: static declaration of '_printk' follows non-static declaration
     219 | int _printk(const char *s, ...)
         |     ^~~~~~~
   In file included from arch/ia64/include/uapi/asm/intrinsics.h:22,
                    from arch/ia64/include/asm/intrinsics.h:11,
                    from arch/ia64/include/asm/bitops.h:19,
                    from include/linux/bitops.h:32,
                    from include/linux/kernel.h:11,
                    from include/asm-generic/bug.h:20,
                    from arch/ia64/include/asm/bug.h:17,
                    from include/linux/bug.h:5,
                    from include/linux/page-flags.h:10,
                    from kernel/bounds.c:10:
   arch/ia64/include/uapi/asm/cmpxchg.h:142:14: note: previous declaration of '_printk' was here
     142 |   extern int _printk(const char *fmt, ...);  \
         |              ^~~~~~~
   arch/ia64/include/asm/bitops.h:309:3: note: in expansion of macro 'CMPXCHG_BUGCHECK'
     309 |   CMPXCHG_BUGCHECK(m);
         |   ^~~~~~~~~~~~~~~~
   make[2]: *** [scripts/Makefile.build:117: kernel/bounds.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [Makefile:1206: prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:185: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.


vim +/_printk +219 include/linux/printk.h

   201	
   202	char *log_buf_addr_get(void);
   203	u32 log_buf_len_get(void);
   204	void log_buf_vmcoreinfo_setup(void);
   205	void __init setup_log_buf(int early);
   206	__printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
   207	void dump_stack_print_info(const char *log_lvl);
   208	void show_regs_print_info(const char *log_lvl);
   209	extern asmlinkage void dump_stack(void) __cold;
   210	extern void printk_safe_flush(void);
   211	extern void printk_safe_flush_on_panic(void);
   212	#else
   213	static inline __printf(1, 0)
   214	int vprintk(const char *s, va_list args)
   215	{
   216		return 0;
   217	}
   218	static inline __printf(1, 2) __cold
 > 219	int _printk(const char *s, ...)
   220	{
   221		return 0;
   222	}
   223	static inline __printf(1, 2) __cold
   224	int _printk_deferred(const char *s, ...)
   225	{
   226		return 0;
   227	}
   228	static inline int printk_ratelimit(void)
   229	{
   230		return 0;
   231	}
   232	static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies,
   233						  unsigned int interval_msec)
   234	{
   235		return false;
   236	}
   237	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Chris Down Feb. 13, 2021, 2:29 p.m. UTC | #2
kernel test robot writes:
>Hi Chris,
>
>Thank you for the patch! Yet something to improve:

I'm pretty sure !CONFIG_PRINTK && CONFIG_IA64_DEBUG_CMPXCHG has been broken 
like this long before this change.

With !CONFIG_PRINTK, printk() is static in the header, but ia64's cmpxchg.h 
with CONFIG_IA64_DEBUG_CMPXCHG doesn't take this into account before trying to 
use it as extern.

CONFIG_IA64_DEBUG_CMPXCHG should be dependent on CONFIG_PRINTK. I'll send a 
separate patch for that, no reason to put it in this series. Not sure why LKP 
only caught it now though.

Here's how it is even on the parent commit LKP applied this patch to:

     % git checkout bff4c8b4b0b9bcc31917d3b0f1a01a7573a1f473^
     Previous HEAD position was bff4c8b4b0b9 printk: Userspace format enumeration support
     HEAD is now at 1fa67f8391ac module: mark TRIM_UNUSED_KSYMS as BROKEN on powerpc
     % COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 /tmp/make.cross ARCH=ia64 -s
     Compiler will be installed in /home/cdown/0day
     make W=1 CROSS_COMPILE=/home/cdown/0day/gcc-9.3.0-nolibc/ia64-linux/bin/ia64-linux- --jobs=32 ARCH=ia64 -s
     In file included from ./include/linux/kernel.h:16,
                      from ./include/asm-generic/bug.h:20,
                      from ./arch/ia64/include/asm/bug.h:17,
                      from ./include/linux/bug.h:5,
                      from ./include/linux/page-flags.h:10,
                      from kernel/bounds.c:10:
     ./include/linux/printk.h:219:5: error: static declaration of 'printk' follows non-static declaration
       219 | int printk(const char *s, ...)
           |     ^~~~~~
     In file included from ./arch/ia64/include/uapi/asm/intrinsics.h:22,
                      from ./arch/ia64/include/asm/intrinsics.h:11,
                      from ./arch/ia64/include/asm/bitops.h:19,
                      from ./include/linux/bitops.h:32,
                      from ./include/linux/kernel.h:11,
                      from ./include/asm-generic/bug.h:20,
                      from ./arch/ia64/include/asm/bug.h:17,
                      from ./include/linux/bug.h:5,
                      from ./include/linux/page-flags.h:10,
                      from kernel/bounds.c:10:
     ./arch/ia64/include/uapi/asm/cmpxchg.h:142:14: note: previous declaration of 'printk' was here
       142 |   extern int printk(const char *fmt, ...);  \
           |              ^~~~~~
     ./arch/ia64/include/asm/bitops.h:309:3: note: in expansion of macro 'CMPXCHG_BUGCHECK'
       309 |   CMPXCHG_BUGCHECK(m);
           |   ^~~~~~~~~~~~~~~~
     make[1]: *** [scripts/Makefile.build:117: kernel/bounds.s] Error 1
     make[1]: *** Waiting for unfinished jobs....
     make: *** [Makefile:1206: prepare0] Error 2
Chris Down Feb. 13, 2021, 3:15 p.m. UTC | #3
Chris Down writes:
>kernel test robot writes:
>>Hi Chris,
>>
>>Thank you for the patch! Yet something to improve:
>
>I'm pretty sure !CONFIG_PRINTK && CONFIG_IA64_DEBUG_CMPXCHG has been 
>broken like this long before this change.

I sent "ia64: Depend on non-static printk for cmpxchg debug"[0] to ia64 folks.  
It shouldn't block this patch, this is already broken.

0: https://lore.kernel.org/patchwork/patch/1380380/
Petr Mladek Feb. 16, 2021, 4 p.m. UTC | #4
On Fri 2021-02-12 15:30:16, Chris Down wrote:
> We have a number of systems industry-wide that have a subset of their
> functionality that works as follows:
> 
> 1. Receive a message from local kmsg, serial console, or netconsole;
> 2. Apply a set of rules to classify the message;
> 3. Do something based on this classification (like scheduling a
>    remediation for the machine), rinse, and repeat.
> 
> As a couple of examples of places we have this implemented just inside
> Facebook, although this isn't a Facebook-specific problem, we have this
> inside our netconsole processing (for alarm classification), and as part
> of our machine health checking. We use these messages to determine
> fairly important metrics around production health, and it's important
> that we get them right.
> 
> 
> This patch provides a solution to the issue of silently changed or
> deleted printks: we record pointers to all printk format strings known
> at compile time into a new .printk_fmts section, both in vmlinux and
> modules. At runtime, this can then be iterated by looking at
> <debugfs>/printk/formats/<module>, which emits the same format as
> `printk` itself, which we already export elsewhere (for example, in
> netconsole).

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 5a95c688621f..adf545ba9eb9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> +
> +static const struct file_operations dfs_formats_fops = {
> +	.open    = debugfs_pf_open,
> +	.read    = seq_read,
> +	.llseek  = seq_lseek,
> +	.release = single_release,
> +};
> +
> +static size_t printk_fmt_size(const char *fmt)
> +{
> +	size_t sz = strlen(fmt) + 1;
> +
> +	/*
> +	 * Some printk formats don't start with KERN_SOH + level. We will add
> +	 * it later when rendering the output.
> +	 */
> +	if (unlikely(fmt[0] != KERN_SOH_ASCII))
> +		sz += 2;

This approach is hard to maintain. It might be pretty hard and error
prone to count the size if we want to provide more information.

There are many files in debugfs with not-well defined size.
They are opened by seq_open_private(). It allows to add
a line by line by an iterator.

For example:

	+ /sys/kernel/debug/dynamic_debug/control is opened by
	  ddebug_proc_open() in lib/dynamic_debug.c

	+ /sys/kernel/debug/tracing/available_filter_functions
	  is opened by ftrace_avail_open() in kernel/trace/ftrace.c



> +
> +	return sz;
> +}
> +
> +static struct printk_fmt_sec *find_printk_fmt_sec(struct module *mod)
> +{
> +	struct printk_fmt_sec *ps = NULL;
> +
> +	hash_for_each_possible(printk_fmts_mod_sections, ps, hnode,
> +			       (unsigned long)mod)
> +		if (ps->module == mod)
> +			return ps;
> +
> +	return NULL;
> +}
> +
> +static void store_printk_fmt_sec(struct module *mod, const char **start,
> +				 const char **end)
> +{
> +	struct printk_fmt_sec *ps = NULL;
> +	const char **fptr = NULL;
> +	size_t size = 0;
> +
> +	ps = kmalloc(sizeof(struct printk_fmt_sec), GFP_KERNEL);
> +	if (!ps)
> +		return;
> +
> +	ps->module = mod;
> +	ps->start = start;
> +	ps->end = end;
> +
> +	for (fptr = ps->start; fptr < ps->end; fptr++)
> +		size += printk_fmt_size(*fptr);
> +
> +	mutex_lock(&printk_fmts_mutex);
> +	hash_add(printk_fmts_mod_sections, &ps->hnode, (unsigned long)mod);
> +	mutex_unlock(&printk_fmts_mutex);
> +
> +	ps->file = debugfs_create_file(ps_get_module_name(ps), 0444,
> +				       dfs_formats, mod, &dfs_formats_fops);
> +
> +	if (!IS_ERR(ps->file))
> +		d_inode(ps->file)->i_size = size;

We should revert the changes when the file could not get crated.
It does not make sense to keep the structure when the file is not
there.

I guess that remove_printk_fmt_sec() would even crash when
ps->file was set to an error code.

> +}
> +
> +#ifdef CONFIG_MODULES
> +static void remove_printk_fmt_sec(struct module *mod)
> +{
> +	struct printk_fmt_sec *ps = NULL;
> +
> +	if (WARN_ON_ONCE(!mod))
> +		return;
> +
> +	mutex_lock(&printk_fmts_mutex);
> +
> +	ps = find_printk_fmt_sec(mod);
> +	if (!ps) {
> +		mutex_unlock(&printk_fmts_mutex);
> +		return;
> +	}
> +
> +	hash_del(&ps->hnode);
> +
> +	mutex_unlock(&printk_fmts_mutex);
> +
> +	debugfs_remove(ps->file);

IMHO, we should remove the file before we remove the way how
to read it. This should be done in the opposite order
than in store_printk_fmt_sec().

> +	kfree(ps);
> +}
> +

Best Regards,
Petr
Petr Mladek Feb. 16, 2021, 5:14 p.m. UTC | #5
Hi,

this is from Nitpicker's department.

On Fri 2021-02-12 15:30:16, Chris Down wrote:
> We have a number of systems industry-wide that have a subset of their
> functionality that works as follows:
> 
> 1. Receive a message from local kmsg, serial console, or netconsole;
> 2. Apply a set of rules to classify the message;
> 3. Do something based on this classification (like scheduling a
>    remediation for the machine), rinse, and repeat.
> 
> As a couple of examples of places we have this implemented just inside
> Facebook, although this isn't a Facebook-specific problem, we have this
> inside our netconsole processing (for alarm classification), and as part
> of our machine health checking. We use these messages to determine
> fairly important metrics around production health, and it's important
> that we get them right.
> 
> This patch provides a solution to the issue of silently changed or
> deleted printks: we record pointers to all printk format strings known
> at compile time into a new .printk_fmts section, both in vmlinux and
> modules. At runtime, this can then be iterated by looking at
> <debugfs>/printk/formats/<module>, which emits the same format as
> `printk` itself, which we already export elsewhere (for example, in
> netconsole).

Please show how the format really look like. It is not
exactly as in netconsole.


> As an example of how simple a parser for this format can be:
> 
>     $ cat pf.py
>     #!/usr/bin/env python
>     with open("/sys/kernel/debug/printk/formats/vmlinux") as f:
>         raw_fmts = f.read().split("\x00")[:-1]
>         for raw_fmt in raw_fmts:
>             level, fmt = raw_fmt[1], raw_fmt[2:]
>             print(f"Level {level}: {fmt!r}")
> 
>     $ ./pf.py | shuf -n 5
>     Level 4: 'Build ID is too large to include in vmcoreinfo: %u > %u\n'
>     Level 3: 'BIOS bug, no explicit IRQ entries, using default mptable. (tell your hw vendor)\n'
>     Level 3: 'Failed to execute %s (error %d)\n'
>     Level 3: 'CHRDEV "%s" minor range requested (%u-%u) is out of range of maximum range (%u-%u) for a single major\n'
>     Level 3: "slub_debug option '%c' unknown. skipped\n"

This looks much better than the original content. The kernel should
produce something human readable out of box.

Ah, this should have been mentioned in the reply prefixed by "output".

> diff --git a/init/Kconfig b/init/Kconfig
> index ab2e916f7e8b..f1f37a060235 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -764,6 +764,20 @@ config PRINTK_SAFE_LOG_BUF_SHIFT
>  		     13 =>   8 KB for each CPU
>  		     12 =>   4 KB for each CPU
>  
> +config PRINTK_ENUMERATION
> +	bool "Printk enumeration debugfs interface"
> +
> +	depends on PRINTK && DEBUG_FS
> +	help
> +	  Add support for enumeration of all printk formats known at compile
> +	  time at <debugfs>/printk/formats/<module>.
> +
> +	  This can be used as part of maintaining daemons which monitor
> +	  /dev/kmsg, as it permits auditing the printk formats present in a
> +	  kernel, allowing monitoring of cases where monitored printks are
> +	  changed or no longer present.
> +
> +	  There is no additional runtime cost to printk with this enabled.
> +
>  #
>  # Architectures with an unreliable sched_clock() should select this:
>  #
> diff --git a/kernel/module.c b/kernel/module.c
> index 1e5aad812310..7b0f2a2f428e 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3429,6 +3429,11 @@ static int find_module_sections(struct module *mod, struct load_info *info)
>  						sizeof(unsigned long),
>  						&mod->num_kprobe_blacklist);
>  #endif
> +#ifdef CONFIG_PRINTK_ENUMERATION
> +	mod->printk_fmts_start = section_objs(info, ".printk_fmts",
> +					      sizeof(*mod->printk_fmts_start),
> +					      &mod->printk_fmts_sec_size);

I wonder if we could find a better name for the configure switch.
I have troubles to imagine what printk enumeration might mean.
Well, it might be because I am not a native speaker.

Anyway, the word "enumeration" is used only in the configure option.
Everything else is "printk_fmt"

What about DEBUG_PRINTK_FORMATS?

> +#endif
>  #ifdef CONFIG_HAVE_STATIC_CALL_INLINE
>  	mod->static_call_sites = section_objs(info, ".static_call_sites",
>  					      sizeof(*mod->static_call_sites),
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 5a95c688621f..adf545ba9eb9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -47,6 +47,8 @@
>  #include <linux/sched/clock.h>
>  #include <linux/sched/debug.h>
>  #include <linux/sched/task_stack.h>
> +#include <linux/debugfs.h>
> +#include <linux/hashtable.h>
>  
>  #include <linux/uaccess.h>
>  #include <asm/sections.h>
> @@ -617,6 +619,222 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
>  	return len;
>  }

printk.c is already too big. Please, implement this feature in a
separate source file, e.g. kernel/printk/debug_formats.c.

Please, use kernel/printk/internal.h if you need some function
from printk.c that have not been public before.

> +#ifdef CONFIG_PRINTK_ENUMERATION
> +
> +/*
> + * debugfs/printk/formats/ - userspace enumeration of printk formats
> + *
> + * The format is the same as typically used by printk, <KERN_SOH><level>fmt,
> + * with each distinct format separated by \0.
> + */


> struct printk_fmt_sec {
> +	struct hlist_node hnode;
> +	struct module *module;

Please, use "struct module *mod". It is a more common.

> +	struct dentry *file;
> +	const char **start;
> +	const char **end;
> +};

Please, document the meaning of the fields, ideally using the doc
style or how is the style called:

/**
 * struct printk_fmt_sec - 
 * @hnode:	node for the hash table
 * @new_func:	pointer to the patched function code


> +
> +/* The base dir for module formats, typically debugfs/printk/formats/ */
> +struct dentry *dfs_formats;
> +
> +/*
> + * Stores .printk_fmt section boundaries for vmlinux and all loaded modules.
> + * Add entries with store_printk_fmt_sec, remove entries with
> + * remove_printk_fmt_sec.
> + */
> +static DEFINE_HASHTABLE(printk_fmts_mod_sections, 8);

The hash table looks like an overkill. This is slow path. There are
typically only tens of loaded modules. Even the module loader
uses plain list for iterating the list of modules.

> +
> +/* Protects printk_fmts_mod_sections */
> +static DEFINE_MUTEX(printk_fmts_mutex);

What is the rule for using "printk_fmts" and "printk_fmt", please?
I can't find the system here ;-)

Anyway, I would prefer to use "printk_fmt" everywhere.
Or maybe even "pf_".

> +
> +static const char *ps_get_module_name(const struct printk_fmt_sec *ps);
> +static int debugfs_pf_open(struct inode *inode, struct file *file);

There are used many different:

   + shortcuts: fmt, fmts, ps, fmt_sec, dfs

   + styles/ordering: ps_get_module_name() vs.
		      find_printk_fmt_sec() vs.
		      printk_fmt_size() vs.
		      debugfs_pf_open()

It might be bearable because there is not much code. But it would
still help a lot when we make it more consistent. Many subsystems
prefer using a feature-specific prefix.

It might be "printk_fmt_" or "pf_" [*] in this case. And we could use
names like:

	+ struct pf_object [**]
	+ pf_get_object_name()
	+ pf_find_object()
	+ pf_fops,
	+ pf_open()
	+ pf_release()
	+ pf_mutex,
	+ pf_add_object()
	+ pf_remove_object()
	+ pf_module_notify

[*] "pf_" is inspired by kernel/printk/printk_ringbuffer.c that
     uses "prb_" prefix.

[**] The "object" is either vmlinux or module. I took this naming from
     include/linux/livepatch.h and kernel/livepatch/*.
     The livepatch code needs to handle vmlinux name a special way
     as well. Also notice that the livepatch code uses klp_ prefix ;-)

Best Regards,
Petr
Chris Down Feb. 16, 2021, 5:18 p.m. UTC | #6
Petr Mladek writes:
>> +static size_t printk_fmt_size(const char *fmt)
>> +{
>> +	size_t sz = strlen(fmt) + 1;
>> +
>> +	/*
>> +	 * Some printk formats don't start with KERN_SOH + level. We will add
>> +	 * it later when rendering the output.
>> +	 */
>> +	if (unlikely(fmt[0] != KERN_SOH_ASCII))
>> +		sz += 2;
>
>This approach is hard to maintain. It might be pretty hard and error
>prone to count the size if we want to provide more information.
>
>There are many files in debugfs with not-well defined size.
>They are opened by seq_open_private(). It allows to add
>a line by line by an iterator.

Hmm, this is optional -- it was just to avoid seq_file having to realloc the 
buffer. I originally used an iterator and I'm happy to go back to it if it 
proves more convenient.

>We should revert the changes when the file could not get crated.
>It does not make sense to keep the structure when the file is not
>there.

See the reply from gregkh on v2, who was quite insistent that we should not 
check debugfs error codes. I'm happy to do either, but I can't please you both 
:-)

>I guess that remove_printk_fmt_sec() would even crash when
>ps->file was set to an error code.

debugfs checks if its input is an error, so it shouldn't, unless that's not 
what you're referring to?

>> +}
>> +
>> +#ifdef CONFIG_MODULES
>> +static void remove_printk_fmt_sec(struct module *mod)
>> +{
>> +	struct printk_fmt_sec *ps = NULL;
>> +
>> +	if (WARN_ON_ONCE(!mod))
>> +		return;
>> +
>> +	mutex_lock(&printk_fmts_mutex);
>> +
>> +	ps = find_printk_fmt_sec(mod);
>> +	if (!ps) {
>> +		mutex_unlock(&printk_fmts_mutex);
>> +		return;
>> +	}
>> +
>> +	hash_del(&ps->hnode);
>> +
>> +	mutex_unlock(&printk_fmts_mutex);
>> +
>> +	debugfs_remove(ps->file);
>
>IMHO, we should remove the file before we remove the way how
>to read it. This should be done in the opposite order
>than in store_printk_fmt_sec().

There is a subtle issue with doing this as-is: debugfs_remove(ps->file) cannot 
be called under printk_fmts_mutex, because we may deadlock due to a pinned 
debugfs refcnt if debugfs_remove() and _show happen at the same time.

Imagine we go into remove_printk_fmt_sec and grab printk_fmts_lock. On another 
CPU, we call _show for the same file, which takes a reference in debugfs, but 
it will stall waiting for printk_fmts_lock. Now we go back into 
remove_printk_fmt_sec and can't make any forward progress, because 
debugfs_remove will stall until all reference holders have finished, and there 
is a deadlock.

That's the reason that debugfs_remove() must be called after we have already 
finished with the mutex and have the printk_fmt_sec, since we need to know that 
it's still valid, and we also need to not be under it at the time of removal.

One way to do what you're asking might be to have a flag in the printk_fmt_sec 
which indicates that we are freeing something, and then take and release the 
lock twice in remove_printk_fmt_sec. Personally, I feel indifferent to either 
the current solution or something like that, but if you have a preference for 
adding a flag or another similar solution, that's fine with me. Just let me 
know. :-)
Chris Down Feb. 16, 2021, 5:27 p.m. UTC | #7
Petr Mladek writes:
>I wonder if we could find a better name for the configure switch.
>I have troubles to imagine what printk enumeration might mean.
>Well, it might be because I am not a native speaker.
>
>Anyway, the word "enumeration" is used only in the configure option.
>Everything else is "printk_fmt"
>
>What about DEBUG_PRINTK_FORMATS?

Hmm, I don't like DEBUG_PRINTK_FMTS because it's not about debugging, it's 
about enumeration, I guess :-)

The name should reflect that this catalogues the available printks in the 
kernel -- "debugging" seems to imply something different.

I'm ok with a different name like "printk catalogue" or something like that if 
you prefer. Personally I think "printk enumeration" is fairly clear -- it's 
about enumerating the available printks -- but anything that captures that 
spirit is fine.

>printk.c is already too big. Please, implement this feature in a
>separate source file, e.g. kernel/printk/debug_formats.c.

Sure, that's fine.

>> struct printk_fmt_sec {
>> +	struct hlist_node hnode;
>> +	struct module *module;
>
>Please, use "struct module *mod". It is a more common.
>
>> +	struct dentry *file;
>> +	const char **start;
>> +	const char **end;
>> +};
>
>Please, document the meaning of the fields, ideally using the doc
>style or how is the style called:
>
>/**
> * struct printk_fmt_sec -
> * @hnode:	node for the hash table
> * @new_func:	pointer to the patched function code

Roger to both. :-)

>> +
>> +/* The base dir for module formats, typically debugfs/printk/formats/ */
>> +struct dentry *dfs_formats;
>> +
>> +/*
>> + * Stores .printk_fmt section boundaries for vmlinux and all loaded modules.
>> + * Add entries with store_printk_fmt_sec, remove entries with
>> + * remove_printk_fmt_sec.
>> + */
>> +static DEFINE_HASHTABLE(printk_fmts_mod_sections, 8);

>The hash table looks like an overkill. This is slow path. There are
>typically only tens of loaded modules. Even the module loader
>uses plain list for iterating the list of modules.

I don't think it's overkill -- we have prod systems with hundreds. Hell, even 
my laptop has over 150 loaded. If someone needs to walk all of the files in 
debugfs, it seems unreasonable to do an O(n^2) walk when an O(n) one would 
suffice.

Unless you have a practical concern, I think this is a distinct case from the 
module loader with its own unique requirements, so I'd prefer to use the hash 
table.

>> +
>> +/* Protects printk_fmts_mod_sections */
>> +static DEFINE_MUTEX(printk_fmts_mutex);
>
>What is the rule for using "printk_fmts" and "printk_fmt", please?
>I can't find the system here ;-)
>
>Anyway, I would prefer to use "printk_fmt" everywhere.
>Or maybe even "pf_".

pf_ sounds fine. :-)

>> +
>> +static const char *ps_get_module_name(const struct printk_fmt_sec *ps);
>> +static int debugfs_pf_open(struct inode *inode, struct file *file);
>
>There are used many different:
>
>   + shortcuts: fmt, fmts, ps, fmt_sec, dfs
>
>   + styles/ordering: ps_get_module_name() vs.
>		      find_printk_fmt_sec() vs.
>		      printk_fmt_size() vs.
>		      debugfs_pf_open()
>
>It might be bearable because there is not much code. But it would
>still help a lot when we make it more consistent. Many subsystems
>prefer using a feature-specific prefix.
>
>It might be "printk_fmt_" or "pf_" [*] in this case. And we could use
>names like:
>
>	+ struct pf_object [**]
>	+ pf_get_object_name()
>	+ pf_find_object()
>	+ pf_fops,
>	+ pf_open()
>	+ pf_release()
>	+ pf_mutex,
>	+ pf_add_object()
>	+ pf_remove_object()
>	+ pf_module_notify

Oh, I meant to change the name for v4 but neglected to do so. Sounds good, will 
do.
Johannes Weiner Feb. 16, 2021, 9 p.m. UTC | #8
On Tue, Feb 16, 2021 at 05:27:08PM +0000, Chris Down wrote:
> Petr Mladek writes:
> > I wonder if we could find a better name for the configure switch.
> > I have troubles to imagine what printk enumeration might mean.
> > Well, it might be because I am not a native speaker.
> > 
> > Anyway, the word "enumeration" is used only in the configure option.
> > Everything else is "printk_fmt"
> > 
> > What about DEBUG_PRINTK_FORMATS?
> 
> Hmm, I don't like DEBUG_PRINTK_FMTS because it's not about debugging, it's
> about enumeration, I guess :-)
> 
> The name should reflect that this catalogues the available printks in the
> kernel -- "debugging" seems to imply something different.
> 
> I'm ok with a different name like "printk catalogue" or something like that
> if you prefer. Personally I think "printk enumeration" is fairly clear --
> it's about enumerating the available printks -- but anything that captures
> that spirit is fine.

How about config PRINTK_INDEX?
Chris Down Feb. 16, 2021, 9:05 p.m. UTC | #9
Johannes Weiner writes:
>On Tue, Feb 16, 2021 at 05:27:08PM +0000, Chris Down wrote:
>> Petr Mladek writes:
>> > I wonder if we could find a better name for the configure switch.
>> > I have troubles to imagine what printk enumeration might mean.
>> > Well, it might be because I am not a native speaker.
>> >
>> > Anyway, the word "enumeration" is used only in the configure option.
>> > Everything else is "printk_fmt"
>> >
>> > What about DEBUG_PRINTK_FORMATS?
>>
>> Hmm, I don't like DEBUG_PRINTK_FMTS because it's not about debugging, it's
>> about enumeration, I guess :-)
>>
>> The name should reflect that this catalogues the available printks in the
>> kernel -- "debugging" seems to imply something different.
>>
>> I'm ok with a different name like "printk catalogue" or something like that
>> if you prefer. Personally I think "printk enumeration" is fairly clear --
>> it's about enumerating the available printks -- but anything that captures
>> that spirit is fine.
>
>How about config PRINTK_INDEX?

Ah yes, I also like that. PRINTK_INDEX is fine from my perspective and is more 
straightforward than "enumeration", thanks.
Petr Mladek Feb. 17, 2021, 3:35 p.m. UTC | #10
On Tue 2021-02-16 17:18:58, Chris Down wrote:
> Petr Mladek writes:
> > > +static size_t printk_fmt_size(const char *fmt)
> > > +{
> > > +	size_t sz = strlen(fmt) + 1;
> > > +
> > > +	/*
> > > +	 * Some printk formats don't start with KERN_SOH + level. We will add
> > > +	 * it later when rendering the output.
> > > +	 */
> > > +	if (unlikely(fmt[0] != KERN_SOH_ASCII))
> > > +		sz += 2;
> > 
> > This approach is hard to maintain. It might be pretty hard and error
> > prone to count the size if we want to provide more information.
> > 
> > There are many files in debugfs with not-well defined size.
> > They are opened by seq_open_private(). It allows to add
> > a line by line by an iterator.
> 
> Hmm, this is optional -- it was just to avoid seq_file having to realloc the
> buffer. I originally used an iterator and I'm happy to go back to it if it
> proves more convenient.

Please, go back to iterators ;-)

> > We should revert the changes when the file could not get crated.
> > It does not make sense to keep the structure when the file is not
> > there.
> 
> See the reply from gregkh on v2, who was quite insistent that we should not
> check debugfs error codes. I'm happy to do either, but I can't please you
> both :-)

I see. OK, it typically does not make any sense to handle the error.
The API is safe enough to handle ERR_PTR values. The files allow
to show something. The user will notice when they are not created.

But I believe that our case is different. We need to allocate
a structure, put it into hash table, to be able to show the content
by the debugfs interface. The only purpose of the structure is
to provide information for the debugfs file.

It does not make sense to keep the structure when the file was not
created.

Also it might make sense to print an error to inform the user
the some printk formats might be missing. Well, the error might
make sense only when format for others are shown.

That said, I do not resist on the error handling. The code will
be safe even without it.


> > I guess that remove_printk_fmt_sec() would even crash when
> > ps->file was set to an error code.
> 
> debugfs checks if its input is an error, so it shouldn't, unless that's not
> what you're referring to?

Good to know.

> > > +}
> > > +
> > > +#ifdef CONFIG_MODULES
> > > +static void remove_printk_fmt_sec(struct module *mod)
> > > +{
> > > +	struct printk_fmt_sec *ps = NULL;
> > > +
> > > +	if (WARN_ON_ONCE(!mod))
> > > +		return;
> > > +
> > > +	mutex_lock(&printk_fmts_mutex);
> > > +
> > > +	ps = find_printk_fmt_sec(mod);
> > > +	if (!ps) {
> > > +		mutex_unlock(&printk_fmts_mutex);
> > > +		return;
> > > +	}
> > > +
> > > +	hash_del(&ps->hnode);
> > > +
> > > +	mutex_unlock(&printk_fmts_mutex);
> > > +
> > > +	debugfs_remove(ps->file);
> > 
> > IMHO, we should remove the file before we remove the way how
> > to read it. This should be done in the opposite order
> > than in store_printk_fmt_sec().
> 
> There is a subtle issue with doing this as-is: debugfs_remove(ps->file)
> cannot be called under printk_fmts_mutex, because we may deadlock due to a
> pinned debugfs refcnt if debugfs_remove() and _show happen at the same time.

Do we need to call debugfs_remove(ps->file) under printk_fmts_mutex?

> Imagine we go into remove_printk_fmt_sec and grab printk_fmts_lock. On
> another CPU, we call _show for the same file, which takes a reference in
> debugfs, but it will stall waiting for printk_fmts_lock. Now we go back into
> remove_printk_fmt_sec and can't make any forward progress, because
> debugfs_remove will stall until all reference holders have finished, and
> there is a deadlock.

Sure. But this will not happen when debugfs_remove(ps->file) was
called without the mutex. It is safe. The code will wait here
until all references are released and the file is gone.

It will prevent an non-necessary error when reading file:

CPU0:					CPU1

monitor_check()
  cat debug/printk/formats/*
    open(debug/printk/formats/moduleA)
      # successfully opened the file
      # took reference count

					rmmod moduleA
					  remove_printk_fmt_sec()
					    mutex_lock(&printk_fmts_mutex);
					    hash_del(&ps->hnode);
					    mutex_unlock(&printk_fmts_mutex);

					    debugfs_remove(ps->file);

	debugfs_pf_show()
	  mutex_lock(&printk_fmts_mutex);
	  ps = find_printk_fmt_sec(mod);
	  if (unlikely(!ps)) {
		ret = -ENOENT;

RESTULT: There was no deadlock. The file was opened but the show
	 callback failed.

Best Regards,
Petr
Petr Mladek Feb. 17, 2021, 3:45 p.m. UTC | #11
On Tue 2021-02-16 21:05:48, Chris Down wrote:
> Johannes Weiner writes:
> > On Tue, Feb 16, 2021 at 05:27:08PM +0000, Chris Down wrote:
> > > Petr Mladek writes:
> > > > I wonder if we could find a better name for the configure switch.
> > > > I have troubles to imagine what printk enumeration might mean.
> > > > Well, it might be because I am not a native speaker.
> > > >
> > > > Anyway, the word "enumeration" is used only in the configure option.
> > > > Everything else is "printk_fmt"
> > > >
> > > > What about DEBUG_PRINTK_FORMATS?
> > > 
> > > Hmm, I don't like DEBUG_PRINTK_FMTS because it's not about debugging, it's
> > > about enumeration, I guess :-)
> > > 
> > > The name should reflect that this catalogues the available printks in the
> > > kernel -- "debugging" seems to imply something different.
> > > 
> > > I'm ok with a different name like "printk catalogue" or something like that
> > > if you prefer. Personally I think "printk enumeration" is fairly clear --
> > > it's about enumerating the available printks -- but anything that captures
> > > that spirit is fine.
> > 
> > How about config PRINTK_INDEX?
> 
> Ah yes, I also like that. PRINTK_INDEX is fine from my perspective and is
> more straightforward than "enumeration", thanks.

It is better than enumeration. But there is still the same
problem. The word "index" is used neither in the code
nor in the debugfs interface. It is like enabling cars and
seeing apples.

What about CONFIG_PRINTK_DEBUGFS?

It seems that various subsystems use CONFIG_<SUBSYSTEM>_DEBUGFS
pattern when they expose some internals in debugfs.

Best Regards,
Petr
Chris Down Feb. 17, 2021, 3:49 p.m. UTC | #12
Petr Mladek writes:
>> > > +	debugfs_remove(ps->file);
>> >
>> > IMHO, we should remove the file before we remove the way how
>> > to read it. This should be done in the opposite order
>> > than in store_printk_fmt_sec().
>>
>> There is a subtle issue with doing this as-is: debugfs_remove(ps->file)
>> cannot be called under printk_fmts_mutex, because we may deadlock due to a
>> pinned debugfs refcnt if debugfs_remove() and _show happen at the same time.
>
>Do we need to call debugfs_remove(ps->file) under printk_fmts_mutex?

Ah, my concern was simultaneous entries into remove_printk_fmt_sec (which would 
require setting a separate flag under the mutex), but now I think about it, the 
module notifier synchronously waits, so that can't happen anyway.

As such it should be safe to just do:

remove()
{
	mutex_lock(&printk_fmts_mutex);
	ps = find_printk_fmt_sec();
	mutex_unlock(&printk_fmts_mutex);

	if (!ps)
		return;

	/* waits for _show */
	debugfs_remove(ps->file);

	mutex_lock(&printk_fmts_mutex);
	/* Do the data structure teardown */
	mutex_unlock(&printk_fmts_mutex);
}

Sounds good to me, I'll do that for v5. Thanks! :-)
Chris Down Feb. 17, 2021, 3:56 p.m. UTC | #13
Petr Mladek writes:
>> > How about config PRINTK_INDEX?
>>
>> Ah yes, I also like that. PRINTK_INDEX is fine from my perspective and is
>> more straightforward than "enumeration", thanks.
>
>It is better than enumeration. But there is still the same
>problem. The word "index" is used neither in the code
>nor in the debugfs interface. It is like enabling cars and
>seeing apples.
>
>What about CONFIG_PRINTK_DEBUGFS?
>
>It seems that various subsystems use CONFIG_<SUBSYSTEM>_DEBUGFS
>pattern when they expose some internals in debugfs.

The thing I don't like about that is that it describes a largely
inconsequential implementation detail rather than the semantic intent of the
config change, which is what the person deciding what to include in their
config is likely to care about.  Often when I see "XXX debug interface" when
doing `make oldconfig` I think to myself "yes, but what does the debugfs
interface _do_?".

If someone else was writing this patch, and I saw "CONFIG_PRINTK_DEBUGFS"
appear in my prod kernel, I'd probably say N, because I don't need printk
debugging information. On the other hand, if I saw "CONFIG_PRINTK_INDEX", I'd
immediately understand that it's probably applicable to me.

I'm happy to rename the debugfs structure as <debugfs>/printk/fmt_index if it
helps, but personally I really feel CONFIG_PRINTK_{INDEX,ENUMERATION,CATALOGUE}
is a lot more descriptive than just saying "it has a debugfs interface" in the
config name for that reason.
Johannes Weiner Feb. 17, 2021, 4 p.m. UTC | #14
On Wed, Feb 17, 2021 at 04:45:51PM +0100, Petr Mladek wrote:
> On Tue 2021-02-16 21:05:48, Chris Down wrote:
> > Johannes Weiner writes:
> > > On Tue, Feb 16, 2021 at 05:27:08PM +0000, Chris Down wrote:
> > > > Petr Mladek writes:
> > > > > I wonder if we could find a better name for the configure switch.
> > > > > I have troubles to imagine what printk enumeration might mean.
> > > > > Well, it might be because I am not a native speaker.
> > > > >
> > > > > Anyway, the word "enumeration" is used only in the configure option.
> > > > > Everything else is "printk_fmt"
> > > > >
> > > > > What about DEBUG_PRINTK_FORMATS?
> > > > 
> > > > Hmm, I don't like DEBUG_PRINTK_FMTS because it's not about debugging, it's
> > > > about enumeration, I guess :-)
> > > > 
> > > > The name should reflect that this catalogues the available printks in the
> > > > kernel -- "debugging" seems to imply something different.
> > > > 
> > > > I'm ok with a different name like "printk catalogue" or something like that
> > > > if you prefer. Personally I think "printk enumeration" is fairly clear --
> > > > it's about enumerating the available printks -- but anything that captures
> > > > that spirit is fine.
> > > 
> > > How about config PRINTK_INDEX?
> > 
> > Ah yes, I also like that. PRINTK_INDEX is fine from my perspective and is
> > more straightforward than "enumeration", thanks.
> 
> It is better than enumeration. But there is still the same
> problem. The word "index" is used neither in the code
> nor in the debugfs interface. It is like enabling cars and
> seeing apples.

I assumed code and interface would then also be changed to
printk_index, struct printk_index_section, pi_foo, ...
Petr Mladek Feb. 17, 2021, 4:09 p.m. UTC | #15
On Tue 2021-02-16 17:27:08, Chris Down wrote:
> Petr Mladek writes:
> > > +/*
> > > + * Stores .printk_fmt section boundaries for vmlinux and all loaded modules.
> > > + * Add entries with store_printk_fmt_sec, remove entries with
> > > + * remove_printk_fmt_sec.
> > > + */
> > > +static DEFINE_HASHTABLE(printk_fmts_mod_sections, 8);
> 
> > The hash table looks like an overkill. This is slow path. There are
> > typically only tens of loaded modules. Even the module loader
> > uses plain list for iterating the list of modules.
> 
> I don't think it's overkill -- we have prod systems with hundreds. Hell,
> even my laptop has over 150 loaded. If someone needs to walk all of the
> files in debugfs, it seems unreasonable to do an O(n^2) walk when an O(n)
> one would suffice.
> 
> Unless you have a practical concern, I think this is a distinct case from
> the module loader with its own unique requirements, so I'd prefer to use the
> hash table.

OK, it is true that the module API is either called with a particular
struct module pointer. Or it has to iterate over all modules anyway,
for example, when looking for a symbol.

Well, do we need access to struct module at all?

What about storing the pointer to struct pf_object into
struct printk_fmt_sec *ps into the s->file->f_inode->i_private?
Then we would not need any global list/table at all.

> > > +
> > > +/* Protects printk_fmts_mod_sections */
> > > +static DEFINE_MUTEX(printk_fmts_mutex);
> > 
> > What is the rule for using "printk_fmts" and "printk_fmt", please?
> > I can't find the system here ;-)
> > 
> > Anyway, I would prefer to use "printk_fmt" everywhere.
> > Or maybe even "pf_".
> 
> pf_ sounds fine. :-)
> 
> > > +
> > > +static const char *ps_get_module_name(const struct printk_fmt_sec *ps);
> > > +static int debugfs_pf_open(struct inode *inode, struct file *file);
> > 
> > There are used many different:
> > 
> >   + shortcuts: fmt, fmts, ps, fmt_sec, dfs
> > 
> >   + styles/ordering: ps_get_module_name() vs.
> > 		      find_printk_fmt_sec() vs.
> > 		      printk_fmt_size() vs.
> > 		      debugfs_pf_open()
> > 
> > It might be bearable because there is not much code. But it would
> > still help a lot when we make it more consistent. Many subsystems
> > prefer using a feature-specific prefix.
> > 
> > It might be "printk_fmt_" or "pf_" [*] in this case. And we could use
> > names like:
> > 
> > 	+ struct pf_object [**]
> > 	+ pf_get_object_name()
> > 	+ pf_find_object()
> > 	+ pf_fops,
> > 	+ pf_open()
> > 	+ pf_release()
> > 	+ pf_mutex,
> > 	+ pf_add_object()
> > 	+ pf_remove_object()
> > 	+ pf_module_notify
> 
> Oh, I meant to change the name for v4 but neglected to do so. Sounds good,
> will do.

Thanks a lot. I am sorry that I ask you to do so many changes.
I talked about the style early enough to make the review easy.
Also I think that it is not ideal and annoing to do these
mass changes and refactoring when the code is already reviewed,
tested, and functional.

Best Regards,
Petr
Chris Down Feb. 17, 2021, 4:25 p.m. UTC | #16
Petr Mladek writes:
>What about storing the pointer to struct pf_object into
>struct printk_fmt_sec *ps into the s->file->f_inode->i_private?
>Then we would not need any global list/table at all.

Unless I'm misreading the debugfs code, I think the following is possible:

open(f);
   debugfs_file_get(f);
   fops->open();
     inode->private = ps;
   debugfs_file_put(f);

remove_printk_fmt_sec(); /* kfree ps */

read(f);
   debugfs_file_get(f);
   fops->read();
     ps = inode->private;  /* invalid */
   debugfs_file_put(f);

That's the reason why the code looks up from the module address again during 
_read. Maybe I'm missing something? :-)

>> Oh, I meant to change the name for v4 but neglected to do so. Sounds good,
>> will do.
>
>Thanks a lot. I am sorry that I ask you to do so many changes.
>I talked about the style early enough to make the review easy.
>Also I think that it is not ideal and annoing to do these
>mass changes and refactoring when the code is already reviewed,
>tested, and functional.

Quite the opposite: thanks a lot for taking so much time to provide valuable 
feedback :-) As someone who mostly works on mm code, having you to provide 
feedback as printk maintainer is really helpful. Even if we disagree on some 
stuff, it's really important that we have a good shared understanding of what 
we eventually agree upon.
Chris Down Feb. 17, 2021, 4:32 p.m. UTC | #17
Chris Down writes:
>open(f);
>  debugfs_file_get(f);
>  fops->open();
>    inode->private = ps;
>  debugfs_file_put(f);
>
>remove_printk_fmt_sec(); /* kfree ps */
>
>read(f);
>  debugfs_file_get(f);
>  fops->read();
>    ps = inode->private;  /* invalid */
>  debugfs_file_put(f);

Er, sorry, inode->private is populated at creation time, not at open(). The 
same general concern applies though -- as far as I can tell there's some period 
where we may be able to _read() and `ps` has already been freed.
Petr Mladek Feb. 18, 2021, 10:45 a.m. UTC | #18
On Wed 2021-02-17 16:32:21, Chris Down wrote:
> Chris Down writes:
> > open(f);
> >  debugfs_file_get(f);
> >  fops->open();
> >    inode->private = ps;
> >  debugfs_file_put(f);
> > 
> > remove_printk_fmt_sec(); /* kfree ps */
> > 
> > read(f);
> >  debugfs_file_get(f);
> >  fops->read();
> >    ps = inode->private;  /* invalid */
> >  debugfs_file_put(f);
> 
> Er, sorry, inode->private is populated at creation time, not at open(). The
> same general concern applies though -- as far as I can tell there's some
> period where we may be able to _read() and `ps` has already been freed.

Honestly, I am a bit lost here. Also I have realized that you needed to
release the struct from the module going notifier. And there you have
only pointer to struct module.

The thing is that I do not see similar tricks anywhere else. Well, other
users are easier because they create the debugfs file for it own purpose.
In our case, we actually create the file for another module.

Anyway, we are going to use the seq_buf iterator API. IMHO, the
seq_buf iterator functions should be able to get the structure
directly via the data pointer.

I wonder if it is similar to proc_seq_open() and proc_seq_release().
It is using the seq_buf iterator as well. It is created used
by proc_create_seq_private(). This API is used, for example,
in decnet_init(). It is a module, so there must be the similar
problems. All data are gone when the module is removed.

The only remaining problem is the module going notifier. For this
purpose, we could store the pointer to struct module. There
are many other fields for similar purposes. I am pretty sure that
the module loader maintainer will agree.

The result will be using some existing patterns. It should reduce
problems with possible races. It should make the life easier for
all involved APIs.

Best Regards,
Petr
Petr Mladek Feb. 18, 2021, 10:58 a.m. UTC | #19
On Wed 2021-02-17 15:56:38, Chris Down wrote:
> Petr Mladek writes:
> > > > How about config PRINTK_INDEX?
> > > 
> > > Ah yes, I also like that. PRINTK_INDEX is fine from my perspective and is
> > > more straightforward than "enumeration", thanks.
> > 
> > It is better than enumeration. But there is still the same
> > problem. The word "index" is used neither in the code
> > nor in the debugfs interface. It is like enabling cars and
> > seeing apples.
> > 
> > What about CONFIG_PRINTK_DEBUGFS?
> > 
> > It seems that various subsystems use CONFIG_<SUBSYSTEM>_DEBUGFS
> > pattern when they expose some internals in debugfs.
> 
> The thing I don't like about that is that it describes a largely
> inconsequential implementation detail rather than the semantic intent of the
> config change, which is what the person deciding what to include in their
> config is likely to care about.  Often when I see "XXX debug interface" when
> doing `make oldconfig` I think to myself "yes, but what does the debugfs
> interface _do_?".

I see.

> If someone else was writing this patch, and I saw "CONFIG_PRINTK_DEBUGFS"
> appear in my prod kernel, I'd probably say N, because I don't need printk
> debugging information. On the other hand, if I saw "CONFIG_PRINTK_INDEX", I'd
> immediately understand that it's probably applicable to me.
> 
> I'm happy to rename the debugfs structure as <debugfs>/printk/fmt_index if it
> helps, but personally I really feel CONFIG_PRINTK_{INDEX,ENUMERATION,CATALOGUE}
> is a lot more descriptive than just saying "it has a debugfs interface" in the
> config name for that reason.

PRINTK_INDEX sounds the best to me. Keep in mind that I am not a
native speaker.

And my concern will be gone when we use it also in the API and debugfs
hierarchy as suggested by Johannes.

Another compromise might be to have CONFIG_PRINTK_FORMATS_INDEX.
Then the prefix printk_format_, pf_ would still match the option.
Or we could use printk_format_index_m, pfi_ indexes.

Best Regards,
Petr

PS: I feel that I have enough bike-shading. I think that I will be
    fine with anything that you choose ;-)
Chris Down Feb. 18, 2021, 12:21 p.m. UTC | #20
Thanks for all your feedback, Petr and Steven. :-)

Petr, I believe this is a comprehensive checklist of everything we discussed 
for v5 -- any chance you could double check I'm not missing anything you folks 
wanted? Thanks!

- Use seq_file iterator again instead of simple_open + size
- Remove debugfs file first to avoid ENOENT
- Tear down datastructures if debugfs fails
- Human readable output format
- Display file/line
- Rename to CONFIG_PRINTK_INDEX, or... something
- Move to another file, kernel/printk/debug_formats.c or similar
- Use `struct module *mod` instead of calling it module
- Add documentation for printk_fmt_sec (or whatever it will be called)
- Rename things to pf_, pi_, or something
- See if it's safe to pass a printk_fmt_sec to seq_file instead of a module
- Handle cont + level
- Don't expose level/KERN_SOH directly
Petr Mladek Feb. 18, 2021, 12:37 p.m. UTC | #21
On Thu 2021-02-18 12:21:55, Chris Down wrote:
> Thanks for all your feedback, Petr and Steven. :-)
> 
> Petr, I believe this is a comprehensive checklist of everything we discussed
> for v5 -- any chance you could double check I'm not missing anything you
> folks wanted? Thanks!
> 
> - Use seq_file iterator again instead of simple_open + size
> - Remove debugfs file first to avoid ENOENT
> - Tear down datastructures if debugfs fails
> - Human readable output format
> - Display file/line
> - Rename to CONFIG_PRINTK_INDEX, or... something
> - Move to another file, kernel/printk/debug_formats.c or similar

Just to be sure. The filename should be ideally based on the configure
option and API names, e.g. formats_index.c or so.

The printk_ prefix is not strictly necessary. The file is in printk/
directory. IMHO, we should have used ringbuffer.c but ...


> - Use `struct module *mod` instead of calling it module
> - Add documentation for printk_fmt_sec (or whatever it will be called)
> - Rename things to pf_, pi_, or something
> - See if it's safe to pass a printk_fmt_sec to seq_file instead of a module

Also it might be needed to store the pointer to struct module.

Both things together might allow to remove the global hash table and likely
even the mutex.

> - Handle cont + level
> - Don't expose level/KERN_SOH directly

I can't remember anything else. I am curious how v5 would look like.

Best Regards,
Petr
Chris Down Feb. 18, 2021, 12:41 p.m. UTC | #22
Petr Mladek writes:
>> - Move to another file, kernel/printk/debug_formats.c or similar
>
>Just to be sure. The filename should be ideally based on the configure
>option and API names, e.g. formats_index.c or so.
>
>The printk_ prefix is not strictly necessary. The file is in printk/
>directory. IMHO, we should have used ringbuffer.c but ...

Sure thing.

>> - Use `struct module *mod` instead of calling it module
>> - Add documentation for printk_fmt_sec (or whatever it will be called)
>> - Rename things to pf_, pi_, or something
>> - See if it's safe to pass a printk_fmt_sec to seq_file instead of a module
>
>Also it might be needed to store the pointer to struct module.

You mean, have a `struct module` entry for this? I somewhat suspect that 
module.c maintainers are not likely to be happy about injecting non-generic 
code into there if it's possible to be avoided, but maybe I'm misunderstanding? 

>Both things together might allow to remove the global hash table and likely
>even the mutex.
>
>> - Handle cont + level
>> - Don't expose level/KERN_SOH directly
>
>I can't remember anything else. I am curious how v5 would look like.

You can join the club on that one... ;-)

Let me know if I understood you correctly on the `struct module` point, and 
after that I'll start work on v5. Thanks!
Petr Mladek Feb. 18, 2021, 2:25 p.m. UTC | #23
On Thu 2021-02-18 12:41:39, Chris Down wrote:
> Petr Mladek writes:
> > > - See if it's safe to pass a printk_fmt_sec to seq_file instead of a module
> > 
> > Also it might be needed to store the pointer to struct module.
> 
> You mean, have a `struct module` entry for this? I somewhat suspect that
> module.c maintainers are not likely to be happy about injecting non-generic
> code into there if it's possible to be avoided, but maybe I'm
> misunderstanding?

Yes, I suggest to store the pointer into struct module. It includes
many external entries. It is similar to struct task_struct.

I am active also in the kernel livepatch subsystem. We have added
there three values:

#ifdef CONFIG_LIVEPATCH
	bool klp; /* Is this a livepatch module? */
	bool klp_alive;

	/* Elf information */
	struct klp_modinfo *klp_info;
#endif

Many other subsystems have their stuff there, for example:

#ifdef CONFIG_TRACING
	unsigned int num_trace_bprintk_fmt;
	const char **trace_bprintk_fmt_start;
#endif
#ifdef CONFIG_EVENT_TRACING
	struct trace_event_call **trace_events;
	unsigned int num_trace_events;
	struct trace_eval_map **trace_evals;
	unsigned int num_trace_evals;
#endif
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
	unsigned int num_ftrace_callsites;
	unsigned long *ftrace_callsites;
#endif
#ifdef CONFIG_KPROBES
	void *kprobes_text_start;
	unsigned int kprobes_text_size;
	unsigned long *kprobe_blacklist;
	unsigned int num_kprobe_blacklist;
#endif

BTW: Jessica originally worked on the kernel livepatching.
     She became module loader code maintainer because we needed
     even more hacks there and the original maintainer got
     busy with other stuff at that time ;-)

     I am pretty sure that she would accept it. We need a per-module
     value. It is not necessary to maintain separate global list/hash
     table just to store these values.

Best Regards,
Petr
Chris Down Feb. 18, 2021, 3:53 p.m. UTC | #24
Petr Mladek writes:
>On Thu 2021-02-18 12:41:39, Chris Down wrote:
>> Petr Mladek writes:
>> > > - See if it's safe to pass a printk_fmt_sec to seq_file instead of a module
>> >
>> > Also it might be needed to store the pointer to struct module.
>>
>> You mean, have a `struct module` entry for this? I somewhat suspect that
>> module.c maintainers are not likely to be happy about injecting non-generic
>> code into there if it's possible to be avoided, but maybe I'm
>> misunderstanding?
>
>Yes, I suggest to store the pointer into struct module. It includes
>many external entries. It is similar to struct task_struct.

Ah yes, now you mention it that does look fine. Thanks!

Expect v5 in the coming days, then. :-)

Patch
diff mbox series

diff --git a/arch/arm/kernel/entry-v7m.S b/arch/arm/kernel/entry-v7m.S
index d0e898608d30..7bde93c10962 100644
--- a/arch/arm/kernel/entry-v7m.S
+++ b/arch/arm/kernel/entry-v7m.S
@@ -23,7 +23,7 @@  __invalid_entry:
 	adr	r0, strerr
 	mrs	r1, ipsr
 	mov	r2, lr
-	bl	printk
+	bl	_printk
 #endif
 	mov	r0, sp
 	bl	show_regs
diff --git a/arch/arm/lib/backtrace-clang.S b/arch/arm/lib/backtrace-clang.S
index 6174c45f53a5..5b2cdb1003e3 100644
--- a/arch/arm/lib/backtrace-clang.S
+++ b/arch/arm/lib/backtrace-clang.S
@@ -202,7 +202,7 @@  finished_setup:
 1006:		adr	r0, .Lbad
 		mov	r1, loglvl
 		mov	r2, frame
-		bl	printk
+		bl	_printk
 no_frame:	ldmfd	sp!, {r4 - r9, fp, pc}
 ENDPROC(c_backtrace)
 		.pushsection __ex_table,"a"
diff --git a/arch/arm/lib/backtrace.S b/arch/arm/lib/backtrace.S
index 872f658638d9..e8408f22d4dc 100644
--- a/arch/arm/lib/backtrace.S
+++ b/arch/arm/lib/backtrace.S
@@ -103,7 +103,7 @@  for_each_frame:	tst	frame, mask		@ Check for address exceptions
 1006:		adr	r0, .Lbad
 		mov	r1, loglvl
 		mov	r2, frame
-		bl	printk
+		bl	_printk
 no_frame:	ldmfd	sp!, {r4 - r9, pc}
 ENDPROC(c_backtrace)
 		
diff --git a/arch/arm/mach-rpc/io-acorn.S b/arch/arm/mach-rpc/io-acorn.S
index b9082a2a2a01..aa9bf0d771c0 100644
--- a/arch/arm/mach-rpc/io-acorn.S
+++ b/arch/arm/mach-rpc/io-acorn.S
@@ -25,4 +25,4 @@  ENTRY(insl)
 ENTRY(outsl)
 		adr	r0, .Liosl_warning
 		mov	r1, lr
-		b	printk
+		b	_printk
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index d5837bf05a9a..6f7926c9c179 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -23,7 +23,7 @@ 
 #ifdef DEBUG
 	stmfd	sp!, {r0-r3, ip, lr}
 	ldr	r0, =1f
-	bl	printk
+	bl	_printk
 	ldmfd	sp!, {r0-r3, ip, lr}
 
 	.pushsection .rodata, "a"
@@ -38,7 +38,7 @@ 
 	stmfd	sp!, {r0-r3, ip, lr}
 	mov	r1, \arg
 	ldr	r0, =1f
-	bl	printk
+	bl	_printk
 	ldmfd	sp!, {r0-r3, ip, lr}
 
 	.pushsection .rodata, "a"
@@ -55,7 +55,7 @@ 
 	mov	r2, \arg2
 	mov	r1, \arg1
 	ldr	r0, =1f
-	bl	printk
+	bl	_printk
 	ldmfd	sp!, {r0-r3, ip, lr}
 
 	.pushsection .rodata, "a"
diff --git a/arch/ia64/include/uapi/asm/cmpxchg.h b/arch/ia64/include/uapi/asm/cmpxchg.h
index 5d90307fd6e0..d96438322d13 100644
--- a/arch/ia64/include/uapi/asm/cmpxchg.h
+++ b/arch/ia64/include/uapi/asm/cmpxchg.h
@@ -139,9 +139,9 @@  extern long ia64_cmpxchg_called_with_bad_pointer(void);
 do {									\
 	if (_cmpxchg_bugcheck_count-- <= 0) {				\
 		void *ip;						\
-		extern int printk(const char *fmt, ...);		\
+		extern int _printk(const char *fmt, ...);		\
 		ip = (void *) ia64_getreg(_IA64_REG_IP);		\
-		printk("CMPXCHG_BUGCHECK: stuck at %p on word %p\n", ip, (v));\
+		_printk("CMPXCHG_BUGCHECK: stuck at %p on word %p\n", ip, (v));\
 		break;							\
 	}								\
 } while (0)
diff --git a/arch/openrisc/kernel/entry.S b/arch/openrisc/kernel/entry.S
index bc657e55c15f..947613f61d4a 100644
--- a/arch/openrisc/kernel/entry.S
+++ b/arch/openrisc/kernel/entry.S
@@ -551,7 +551,7 @@  EXCEPTION_ENTRY(_external_irq_handler)
 	l.movhi r3,hi(42f)
 	l.ori	r3,r3,lo(42f)
 	l.sw    0x0(r1),r3
-	l.jal   printk
+	l.jal   _printk
 	l.sw    0x4(r1),r4
 	l.addi  r1,r1,0x8
 
@@ -681,8 +681,8 @@  _syscall_debug:
 	l.sw    -4(r1),r27
 	l.sw    -8(r1),r11
 	l.addi  r1,r1,-8
-	l.movhi r27,hi(printk)
-	l.ori   r27,r27,lo(printk)
+	l.movhi r27,hi(_printk)
+	l.ori   r27,r27,lo(_printk)
 	l.jalr  r27
 	 l.nop
 	l.addi  r1,r1,8
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index fdd4d274c245..f2c7ecef9b14 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -852,7 +852,7 @@  KernelSPE:
 	ori	r3,r3,87f@l
 	mr	r4,r2		/* current */
 	lwz	r5,_NIP(r1)
-	bl	printk
+	bl	_printk
 #endif
 	b	ret_from_except
 #ifdef CONFIG_PRINTK
diff --git a/arch/um/include/shared/user.h b/arch/um/include/shared/user.h
index e793e4212f0a..dd4badffdeb3 100644
--- a/arch/um/include/shared/user.h
+++ b/arch/um/include/shared/user.h
@@ -38,7 +38,8 @@  extern void panic(const char *fmt, ...)
 #define UM_KERN_CONT	KERN_CONT
 
 #ifdef UML_CONFIG_PRINTK
-extern int printk(const char *fmt, ...)
+#define printk(...) _printk(__VA_ARGS__)
+extern int _printk(const char *fmt, ...)
 	__attribute__ ((format (printf, 1, 2)));
 #else
 static inline int printk(const char *fmt, ...)
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 7ed84c282233..c207b789dd22 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -446,7 +446,7 @@  SYM_FUNC_START(early_ignore_irq)
 	pushl 32(%esp)
 	pushl 40(%esp)
 	pushl $int_msg
-	call printk
+	call _printk
 
 	call dump_stack
 
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 34b7e0d2346c..0ca6e28e05d6 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -309,6 +309,17 @@ 
 #define ACPI_PROBE_TABLE(name)
 #endif
 
+#ifdef CONFIG_PRINTK_ENUMERATION
+#define PRINTK_FMTS							\
+	.printk_fmts : AT(ADDR(.printk_fmts) - LOAD_OFFSET) {		\
+		__start_printk_fmts = .;				\
+		*(.printk_fmts)						\
+		__stop_printk_fmts = .;					\
+	}
+#else
+#define PRINTK_FMTS
+#endif
+
 #ifdef CONFIG_THERMAL
 #define THERMAL_TABLE(name)						\
 	. = ALIGN(8);							\
@@ -480,6 +491,8 @@ 
 									\
 	TRACEDATA							\
 									\
+	PRINTK_FMTS							\
+									\
 	/* Kernel symbol table: Normal symbols */			\
 	__ksymtab         : AT(ADDR(__ksymtab) - LOAD_OFFSET) {		\
 		__start___ksymtab = .;					\
diff --git a/include/linux/module.h b/include/linux/module.h
index 7a0bcb5b1ffc..4235b14a22ef 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -516,6 +516,11 @@  struct module {
 	struct klp_modinfo *klp_info;
 #endif
 
+#ifdef CONFIG_PRINTK_ENUMERATION
+	unsigned int printk_fmts_sec_size;
+	const char **printk_fmts_start;
+#endif
+
 #ifdef CONFIG_MODULE_UNLOAD
 	/* What modules depend on me? */
 	struct list_head source_list;
diff --git a/include/linux/printk.h b/include/linux/printk.h
index d3c08095a9a3..4aebb96c3694 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -173,12 +173,12 @@  asmlinkage __printf(1, 0)
 int vprintk(const char *fmt, va_list args);
 
 asmlinkage __printf(1, 2) __cold
-int printk(const char *fmt, ...);
+int _printk(const char *fmt, ...);
 
 /*
  * Special printk facility for scheduler/timekeeping use only, _DO_NOT_USE_ !
  */
-__printf(1, 2) __cold int printk_deferred(const char *fmt, ...);
+__printf(1, 2) __cold int _printk_deferred(const char *fmt, ...);
 
 /*
  * Please don't use printk_ratelimit(), because it shares ratelimiting state
@@ -216,12 +216,12 @@  int vprintk(const char *s, va_list args)
 	return 0;
 }
 static inline __printf(1, 2) __cold
-int printk(const char *s, ...)
+int _printk(const char *s, ...)
 {
 	return 0;
 }
 static inline __printf(1, 2) __cold
-int printk_deferred(const char *s, ...)
+int _printk_deferred(const char *s, ...)
 {
 	return 0;
 }
@@ -284,6 +284,11 @@  static inline void printk_safe_flush_on_panic(void)
 
 extern int kptr_restrict;
 
+#ifdef CONFIG_PRINTK_ENUMERATION
+extern const char *__start_printk_fmts[];
+extern const char *__stop_printk_fmts[];
+#endif
+
 /**
  * pr_fmt - used by the pr_*() macros to generate the printk format string
  * @fmt: format string passed from a pr_*() macro
@@ -301,6 +306,36 @@  extern int kptr_restrict;
 #define pr_fmt(fmt) fmt
 #endif
 
+#ifdef CONFIG_PRINTK_ENUMERATION
+#define printk_store_fmt(func, fmt, ...)				       \
+	({								       \
+		int _printk_ret;					       \
+									       \
+		if (__builtin_constant_p(fmt)) {			       \
+			/*
+			 * The compiler may not be able to eliminate this, so
+			 * we need to make sure that it doesn't see any
+			 * hypothetical assignment for non-constants even
+			 * though this is already inside the
+			 * __builtin_constant_p guard.
+			 */						       \
+			static const char *_fmt __section(".printk_fmts") =    \
+				__builtin_constant_p(fmt) ? fmt : NULL;	       \
+			_printk_ret = func(_fmt, ##__VA_ARGS__);	       \
+		} else							       \
+			_printk_ret = func(fmt, ##__VA_ARGS__);		       \
+									       \
+		_printk_ret;						       \
+	})
+
+#define printk(fmt, ...) printk_store_fmt(_printk, fmt, ##__VA_ARGS__)
+#define printk_deferred(fmt, ...)					       \
+	printk_store_fmt(_printk_deferred, fmt, ##__VA_ARGS__)
+#else /* !CONFIG_PRINTK_ENUMERATION */
+#define printk(...) _printk(__VA_ARGS__)
+#define printk_deferred(...) _printk_deferred(__VA_ARGS__)
+#endif /* CONFIG_PRINTK_ENUMERATION */
+
 /**
  * pr_emerg - Print an emergency-level message
  * @fmt: format string
diff --git a/init/Kconfig b/init/Kconfig
index ab2e916f7e8b..f1f37a060235 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -764,6 +764,20 @@  config PRINTK_SAFE_LOG_BUF_SHIFT
 		     13 =>   8 KB for each CPU
 		     12 =>   4 KB for each CPU
 
+config PRINTK_ENUMERATION
+	bool "Printk enumeration debugfs interface"
+	depends on PRINTK && DEBUG_FS
+	help
+	  Add support for enumeration of all printk formats known at compile
+	  time at <debugfs>/printk/formats/<module>.
+
+	  This can be used as part of maintaining daemons which monitor
+	  /dev/kmsg, as it permits auditing the printk formats present in a
+	  kernel, allowing monitoring of cases where monitored printks are
+	  changed or no longer present.
+
+	  There is no additional runtime cost to printk with this enabled.
+
 #
 # Architectures with an unreliable sched_clock() should select this:
 #
diff --git a/kernel/module.c b/kernel/module.c
index 1e5aad812310..7b0f2a2f428e 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3429,6 +3429,11 @@  static int find_module_sections(struct module *mod, struct load_info *info)
 						sizeof(unsigned long),
 						&mod->num_kprobe_blacklist);
 #endif
+#ifdef CONFIG_PRINTK_ENUMERATION
+	mod->printk_fmts_start = section_objs(info, ".printk_fmts",
+					      sizeof(*mod->printk_fmts_start),
+					      &mod->printk_fmts_sec_size);
+#endif
 #ifdef CONFIG_HAVE_STATIC_CALL_INLINE
 	mod->static_call_sites = section_objs(info, ".static_call_sites",
 					      sizeof(*mod->static_call_sites),
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5a95c688621f..adf545ba9eb9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -47,6 +47,8 @@ 
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
+#include <linux/debugfs.h>
+#include <linux/hashtable.h>
 
 #include <linux/uaccess.h>
 #include <asm/sections.h>
@@ -617,6 +619,222 @@  static ssize_t msg_print_ext_body(char *buf, size_t size,
 	return len;
 }
 
+#ifdef CONFIG_PRINTK_ENUMERATION
+
+/*
+ * debugfs/printk/formats/ - userspace enumeration of printk formats
+ *
+ * The format is the same as typically used by printk, <KERN_SOH><level>fmt,
+ * with each distinct format separated by \0.
+ */
+
+struct printk_fmt_sec {
+	struct hlist_node hnode;
+	struct module *module;
+	struct dentry *file;
+	const char **start;
+	const char **end;
+};
+
+/* The base dir for module formats, typically debugfs/printk/formats/ */
+struct dentry *dfs_formats;
+
+/*
+ * Stores .printk_fmt section boundaries for vmlinux and all loaded modules.
+ * Add entries with store_printk_fmt_sec, remove entries with
+ * remove_printk_fmt_sec.
+ */
+static DEFINE_HASHTABLE(printk_fmts_mod_sections, 8);
+
+/* Protects printk_fmts_mod_sections */
+static DEFINE_MUTEX(printk_fmts_mutex);
+
+static const char *ps_get_module_name(const struct printk_fmt_sec *ps);
+static int debugfs_pf_open(struct inode *inode, struct file *file);
+
+static const struct file_operations dfs_formats_fops = {
+	.open    = debugfs_pf_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = single_release,
+};
+
+static size_t printk_fmt_size(const char *fmt)
+{
+	size_t sz = strlen(fmt) + 1;
+
+	/*
+	 * Some printk formats don't start with KERN_SOH + level. We will add
+	 * it later when rendering the output.
+	 */
+	if (unlikely(fmt[0] != KERN_SOH_ASCII))
+		sz += 2;
+
+	return sz;
+}
+
+static struct printk_fmt_sec *find_printk_fmt_sec(struct module *mod)
+{
+	struct printk_fmt_sec *ps = NULL;
+
+	hash_for_each_possible(printk_fmts_mod_sections, ps, hnode,
+			       (unsigned long)mod)
+		if (ps->module == mod)
+			return ps;
+
+	return NULL;
+}
+
+static void store_printk_fmt_sec(struct module *mod, const char **start,
+				 const char **end)
+{
+	struct printk_fmt_sec *ps = NULL;
+	const char **fptr = NULL;
+	size_t size = 0;
+
+	ps = kmalloc(sizeof(struct printk_fmt_sec), GFP_KERNEL);
+	if (!ps)
+		return;
+
+	ps->module = mod;
+	ps->start = start;
+	ps->end = end;
+
+	for (fptr = ps->start; fptr < ps->end; fptr++)
+		size += printk_fmt_size(*fptr);
+
+	mutex_lock(&printk_fmts_mutex);
+	hash_add(printk_fmts_mod_sections, &ps->hnode, (unsigned long)mod);
+	mutex_unlock(&printk_fmts_mutex);
+
+	ps->file = debugfs_create_file(ps_get_module_name(ps), 0444,
+				       dfs_formats, mod, &dfs_formats_fops);
+
+	if (!IS_ERR(ps->file))
+		d_inode(ps->file)->i_size = size;
+}
+
+#ifdef CONFIG_MODULES
+static void remove_printk_fmt_sec(struct module *mod)
+{
+	struct printk_fmt_sec *ps = NULL;
+
+	if (WARN_ON_ONCE(!mod))
+		return;
+
+	mutex_lock(&printk_fmts_mutex);
+
+	ps = find_printk_fmt_sec(mod);
+	if (!ps) {
+		mutex_unlock(&printk_fmts_mutex);
+		return;
+	}
+
+	hash_del(&ps->hnode);
+
+	mutex_unlock(&printk_fmts_mutex);
+
+	debugfs_remove(ps->file);
+	kfree(ps);
+}
+
+static int module_printk_fmts_notify(struct notifier_block *self,
+				     unsigned long val, void *data)
+{
+	struct module *mod = data;
+
+	if (mod->printk_fmts_sec_size) {
+		switch (val) {
+		case MODULE_STATE_COMING:
+			store_printk_fmt_sec(mod, mod->printk_fmts_start,
+					     mod->printk_fmts_start +
+						     mod->printk_fmts_sec_size);
+			break;
+
+		case MODULE_STATE_GOING:
+			remove_printk_fmt_sec(mod);
+			break;
+		}
+	}
+
+	return NOTIFY_OK;
+}
+
+static const char *ps_get_module_name(const struct printk_fmt_sec *ps)
+{
+	return ps->module ? ps->module->name : "vmlinux";
+}
+
+static struct notifier_block module_printk_fmts_nb = {
+	.notifier_call = module_printk_fmts_notify,
+};
+
+static int __init module_printk_fmts_init(void)
+{
+	return register_module_notifier(&module_printk_fmts_nb);
+}
+
+core_initcall(module_printk_fmts_init);
+
+#else /* !CONFIG_MODULES */
+static const char *ps_get_module_name(const struct printk_fmt_sec *ps)
+{
+	return "vmlinux";
+}
+#endif /* CONFIG_MODULES */
+
+static int debugfs_pf_show(struct seq_file *s, void *v)
+{
+	struct module *mod = s->file->f_inode->i_private;
+	struct printk_fmt_sec *ps = NULL;
+	const char **fptr = NULL;
+	int ret = 0;
+
+	mutex_lock(&printk_fmts_mutex);
+
+	/*
+	 * The entry might have been invalidated in the hlist between _open and
+	 * _show, so we need to eyeball the entries under printk_fmts_mutex
+	 * again.
+	 */
+	ps = find_printk_fmt_sec(mod);
+	if (unlikely(!ps)) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	for (fptr = ps->start; fptr < ps->end; fptr++) {
+		/* For callsites without KERN_SOH + level preamble. */
+		if (unlikely(*fptr[0] != KERN_SOH_ASCII))
+			seq_printf(s, "%c%d", KERN_SOH_ASCII,
+				   MESSAGE_LOGLEVEL_DEFAULT);
+		seq_printf(s, "%s%c", *fptr, '\0');
+	}
+
+out_unlock:
+	mutex_unlock(&printk_fmts_mutex);
+	return ret;
+}
+
+static int debugfs_pf_open(struct inode *inode, struct file *file)
+{
+	return single_open_size(file, debugfs_pf_show, NULL, inode->i_size);
+}
+
+static int __init init_printk_fmts(void)
+{
+	struct dentry *dfs_root = debugfs_create_dir("printk", NULL);
+
+	dfs_formats = debugfs_create_dir("formats", dfs_root);
+	store_printk_fmt_sec(NULL, __start_printk_fmts, __stop_printk_fmts);
+
+	return 0;
+}
+
+core_initcall(init_printk_fmts);
+
+#endif /* CONFIG_PRINTK_ENUMERATION */
+
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
 	u64 seq;
@@ -2111,10 +2329,13 @@  int vprintk_default(const char *fmt, va_list args)
 EXPORT_SYMBOL_GPL(vprintk_default);
 
 /**
- * printk - print a kernel message
+ * _printk - print a kernel message
  * @fmt: format string
  *
- * This is printk(). It can be called from any context. We want it to work.
+ * This is _printk(). It can be called from any context. We want it to work.
+ *
+ * If printk enumeration is enabled, _printk() is called from printk_store_fmt.
+ * Otherwise, printk is simply #defined to _printk.
  *
  * We try to grab the console_lock. If we succeed, it's easy - we log the
  * output and call the console drivers.  If we fail to get the semaphore, we
@@ -2131,7 +2352,7 @@  EXPORT_SYMBOL_GPL(vprintk_default);
  *
  * See the vsnprintf() documentation for format string extensions over C99.
  */
-asmlinkage __visible int printk(const char *fmt, ...)
+asmlinkage __visible int _printk(const char *fmt, ...)
 {
 	va_list args;
 	int r;
@@ -2142,7 +2363,7 @@  asmlinkage __visible int printk(const char *fmt, ...)
 
 	return r;
 }
-EXPORT_SYMBOL(printk);
+EXPORT_SYMBOL(_printk);
 
 #else /* CONFIG_PRINTK */
 
@@ -3133,7 +3354,7 @@  int vprintk_deferred(const char *fmt, va_list args)
 	return r;
 }
 
-int printk_deferred(const char *fmt, ...)
+int _printk_deferred(const char *fmt, ...)
 {
 	va_list args;
 	int r;