linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] printk: increase devkmsg write() ratelimit
@ 2018-12-18  9:18 Sergey Senozhatsky
  2018-12-18 10:48 ` Peter Zijlstra
  2018-12-18 14:02 ` Sergey Senozhatsky
  0 siblings, 2 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-18  9:18 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Borislav Petkov, Peter Zijlstra, Andrew Morton, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

	Hello,
	
	RFC

A painful subject:

I just noticed that stock systemd (no systemd debugging enabled) on my
x86 box write()-s during shutdown to devkmsg more than before, so old
devkmsg ratelimits do not apply:

$ sudo journalctl -n 40000 -f | grep "kernel: printk: systemd-shutdow"
 kernel: printk: systemd-shutdow: 35 output lines suppressed due to ratelimiting
 kernel: printk: systemd-shutdow: 31 output lines suppressed due to ratelimiting
 kernel: printk: systemd-shutdow: 35 output lines suppressed due to ratelimiting
 kernel: printk: systemd-shutdow: 36 output lines suppressed due to ratelimiting
 kernel: printk: systemd-shutdow: 36 output lines suppressed due to ratelimiting
 kernel: printk: systemd-shutdow: 36 output lines suppressed due to ratelimiting
 kernel: printk: systemd-shutdow: 36 output lines suppressed due to ratelimiting
 kernel: printk: systemd-shutdow: 35 output lines suppressed due to ratelimiting

I know that there is a "kernel.printk_devkmsg" interface; do we
expect every systemd-enabled distro to find that out and to tweak
kernel.printk_devkmsg or shall we change the default devkmsg
ratelimit instead?


=====================================================================

Every time I reboot my systemd x86_64 box:

kernel: printk: systemd-shutdow: 36 output lines suppressed due to ratelimiting

Apparently, the default ratelimit - 10 messages max, 5 * HZ
interval - is a bit too restrictive these days, can't help
it; so increase it to "50 messages, same interval" (which is
not awfully a lot).

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 84e54250840a..0ed54f90f208 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -998,7 +998,8 @@ static int devkmsg_open(struct inode *inode, struct file *file)
 	if (!user)
 		return -ENOMEM;
 
-	ratelimit_default_init(&user->rs);
+	ratelimit_state_init(&user->rs, DEFAULT_RATELIMIT_INTERVAL,
+					5 * DEFAULT_RATELIMIT_BURST);
 	ratelimit_set_flags(&user->rs, RATELIMIT_MSG_ON_RELEASE);
 
 	mutex_init(&user->lock);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18  9:18 [RFC][PATCH] printk: increase devkmsg write() ratelimit Sergey Senozhatsky
@ 2018-12-18 10:48 ` Peter Zijlstra
  2018-12-18 11:17   ` Sergey Senozhatsky
  2018-12-18 14:02 ` Sergey Senozhatsky
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2018-12-18 10:48 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Steven Rostedt, Borislav Petkov, Andrew Morton,
	linux-kernel, Sergey Senozhatsky

On Tue, Dec 18, 2018 at 06:18:42PM +0900, Sergey Senozhatsky wrote:
> 	Hello,
> 	
> 	RFC
> 
> A painful subject:
> 
> I just noticed that stock systemd (no systemd debugging enabled) on my
> x86 box write()-s during shutdown to devkmsg more than before, so old
> devkmsg ratelimits do not apply:
> 
> $ sudo journalctl -n 40000 -f | grep "kernel: printk: systemd-shutdow"
>  kernel: printk: systemd-shutdow: 35 output lines suppressed due to ratelimiting
>  kernel: printk: systemd-shutdow: 31 output lines suppressed due to ratelimiting
>  kernel: printk: systemd-shutdow: 35 output lines suppressed due to ratelimiting
>  kernel: printk: systemd-shutdow: 36 output lines suppressed due to ratelimiting
>  kernel: printk: systemd-shutdow: 36 output lines suppressed due to ratelimiting
>  kernel: printk: systemd-shutdow: 36 output lines suppressed due to ratelimiting
>  kernel: printk: systemd-shutdow: 36 output lines suppressed due to ratelimiting
>  kernel: printk: systemd-shutdow: 35 output lines suppressed due to ratelimiting
> 
> I know that there is a "kernel.printk_devkmsg" interface; do we
> expect every systemd-enabled distro to find that out and to tweak
> kernel.printk_devkmsg or shall we change the default devkmsg
> ratelimit instead?

How about we complain to systemd instead?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 10:48 ` Peter Zijlstra
@ 2018-12-18 11:17   ` Sergey Senozhatsky
  2018-12-18 11:29     ` Peter Zijlstra
  2018-12-18 11:47     ` Borislav Petkov
  0 siblings, 2 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-18 11:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Borislav Petkov,
	Andrew Morton, linux-kernel, Sergey Senozhatsky

On (12/18/18 11:48), Peter Zijlstra wrote:
> > I know that there is a "kernel.printk_devkmsg" interface; do we
> > expect every systemd-enabled distro to find that out and to tweak
> > kernel.printk_devkmsg or shall we change the default devkmsg
> > ratelimit instead?
> 
> How about we complain to systemd instead?

We certainly can. As far as I understand, they log shutdown events
(including errors and warnings): what they kill, what they stop,
what they umount, etc. The more partitions, services are running
(I guess), the more things they need to umount, kill, stop; hence,
the more messages. I kinda can imagine what they will answer ;)

The below (and a bunch of other) messages are getting ratelimited.
I'm not sure what will happen should any of those steps fail and
print warning-s. My guess would be that we probably can ratelimit
those warnings:

...
 systemd[1]: Unmounting /home...
 systemd[1]: Unmounting Temporary Directory (/tmp)...
 systemd[1]: Unmounted Temporary Directory (/tmp).
 systemd[1]: Stopped target Swap.
 systemd[1]: Unmounted /boot.
 systemd[1]: Stopped File System Check on /dev/disk/by-uuid/a0737dff-e797-44f0-aea7-d0df1107ff63.
 systemd[1]: Stopped File System Check on /dev/disk/by-uuid/5d773b72-e200-4d11-a219-176d62a16d8d.
 systemd[1]: Unmounted /home.
 systemd[1]: Stopped File System Check on /dev/disk/by-uuid/35319ddc-9b92-4ab0-aaa4-9922db636a5e.
 systemd[1]: Unmounted /media/edev.
 systemd[1]: Stopped File System Check on /dev/disk/by-uuid/da00daaf-5601-4531-912e-bd69103b379d.
 systemd[1]: Unmounted /media/dump.
 systemd[1]: Reached target Unmount All Filesystems.
 systemd[1]: Stopped File System Check on /dev/disk/by-uuid/b52da2df-161b-4c33-b700-277d95b9672f.
 systemd[1]: Removed slice system-systemd\x2dfsck.slice.
 systemd[1]: Stopped target Local File Systems (Pre).
 systemd[1]: Stopped Create Static Device Nodes in /dev.
 systemd[1]: Stopped Create System Users.
 systemd[1]: Stopped Remount Root and Kernel File Systems.
 systemd[1]: Reached target Shutdown.
 systemd[1]: Reached target Final Step.
 systemd[1]: Starting Reboot...

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 11:17   ` Sergey Senozhatsky
@ 2018-12-18 11:29     ` Peter Zijlstra
  2018-12-18 13:09       ` Sergey Senozhatsky
  2018-12-18 11:47     ` Borislav Petkov
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2018-12-18 11:29 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Steven Rostedt, Borislav Petkov, Andrew Morton,
	linux-kernel, Sergey Senozhatsky

On Tue, Dec 18, 2018 at 08:17:01PM +0900, Sergey Senozhatsky wrote:
> On (12/18/18 11:48), Peter Zijlstra wrote:
> > > I know that there is a "kernel.printk_devkmsg" interface; do we
> > > expect every systemd-enabled distro to find that out and to tweak
> > > kernel.printk_devkmsg or shall we change the default devkmsg
> > > ratelimit instead?
> > 
> > How about we complain to systemd instead?
> 
> We certainly can. As far as I understand, they log shutdown events
> (including errors and warnings): what they kill, what they stop,
> what they umount, etc. The more partitions, services are running
> (I guess), the more things they need to umount, kill, stop; hence,
> the more messages. I kinda can imagine what they will answer ;)
> 
> The below (and a bunch of other) messages are getting ratelimited.
> I'm not sure what will happen should any of those steps fail and
> print warning-s. My guess would be that we probably can ratelimit
> those warnings:

Why are they writing it to /dev/kmsg though? Shouldn't they be writing
this to syslog directly?



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 11:17   ` Sergey Senozhatsky
  2018-12-18 11:29     ` Peter Zijlstra
@ 2018-12-18 11:47     ` Borislav Petkov
  2018-12-18 13:07       ` Sergey Senozhatsky
  1 sibling, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2018-12-18 11:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, Petr Mladek, Steven Rostedt, Andrew Morton,
	linux-kernel, Sergey Senozhatsky

On Tue, Dec 18, 2018 at 08:17:01PM +0900, Sergey Senozhatsky wrote:
> On (12/18/18 11:48), Peter Zijlstra wrote:
> > > I know that there is a "kernel.printk_devkmsg" interface; do we
> > > expect every systemd-enabled distro to find that out and to tweak
> > > kernel.printk_devkmsg or shall we change the default devkmsg
> > > ratelimit instead?
> > 
> > How about we complain to systemd instead?
> 
> We certainly can. As far as I understand, they log shutdown events
> (including errors and warnings): what they kill, what they stop,
> what they umount, etc. The more partitions, services are running
> (I guess), the more things they need to umount, kill, stop; hence,
> the more messages. I kinda can imagine what they will answer ;)
> 
> The below (and a bunch of other) messages are getting ratelimited.
> I'm not sure what will happen should any of those steps fail and
> print warning-s. My guess would be that we probably can ratelimit
> those warnings:
> 
> ...
>  systemd[1]: Unmounting /home...
>  systemd[1]: Unmounting Temporary Directory (/tmp)...
>  systemd[1]: Unmounted Temporary Directory (/tmp).
>  systemd[1]: Stopped target Swap.
>  systemd[1]: Unmounted /boot.
>  systemd[1]: Stopped File System Check on /dev/disk/by-uuid/a0737dff-e797-44f0-aea7-d0df1107ff63.
>  systemd[1]: Stopped File System Check on /dev/disk/by-uuid/5d773b72-e200-4d11-a219-176d62a16d8d.
>  systemd[1]: Unmounted /home.
>  systemd[1]: Stopped File System Check on /dev/disk/by-uuid/35319ddc-9b92-4ab0-aaa4-9922db636a5e.
>  systemd[1]: Unmounted /media/edev.
>  systemd[1]: Stopped File System Check on /dev/disk/by-uuid/da00daaf-5601-4531-912e-bd69103b379d.
>  systemd[1]: Unmounted /media/dump.
>  systemd[1]: Reached target Unmount All Filesystems.
>  systemd[1]: Stopped File System Check on /dev/disk/by-uuid/b52da2df-161b-4c33-b700-277d95b9672f.
>  systemd[1]: Removed slice system-systemd\x2dfsck.slice.
>  systemd[1]: Stopped target Local File Systems (Pre).
>  systemd[1]: Stopped Create Static Device Nodes in /dev.
>  systemd[1]: Stopped Create System Users.
>  systemd[1]: Stopped Remount Root and Kernel File Systems.
>  systemd[1]: Reached target Shutdown.
>  systemd[1]: Reached target Final Step.
>  systemd[1]: Starting Reboot...

And we wanna see all those "very useful" messages in dmesg because...?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 11:47     ` Borislav Petkov
@ 2018-12-18 13:07       ` Sergey Senozhatsky
  2018-12-18 14:26         ` Borislav Petkov
  0 siblings, 1 reply; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-18 13:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, Steven Rostedt,
	Andrew Morton, linux-kernel, Sergey Senozhatsky

On (12/18/18 12:47), Borislav Petkov wrote:
[..]
> >  systemd[1]: Unmounting /home...
> >  systemd[1]: Unmounting Temporary Directory (/tmp)...
> >  systemd[1]: Unmounted Temporary Directory (/tmp).
> >  systemd[1]: Stopped target Swap.
> >  systemd[1]: Unmounted /boot.
> >  systemd[1]: Stopped File System Check on /dev/disk/by-uuid/a0737dff-e797-44f0-aea7-d0df1107ff63.
> >  systemd[1]: Stopped File System Check on /dev/disk/by-uuid/5d773b72-e200-4d11-a219-176d62a16d8d.
> >  systemd[1]: Unmounted /home.
> >  systemd[1]: Stopped File System Check on /dev/disk/by-uuid/35319ddc-9b92-4ab0-aaa4-9922db636a5e.
> >  systemd[1]: Unmounted /media/edev.
> >  systemd[1]: Stopped File System Check on /dev/disk/by-uuid/da00daaf-5601-4531-912e-bd69103b379d.
> >  systemd[1]: Unmounted /media/dump.
> >  systemd[1]: Reached target Unmount All Filesystems.
> >  systemd[1]: Stopped File System Check on /dev/disk/by-uuid/b52da2df-161b-4c33-b700-277d95b9672f.
> >  systemd[1]: Removed slice system-systemd\x2dfsck.slice.
> >  systemd[1]: Stopped target Local File Systems (Pre).
> >  systemd[1]: Stopped Create Static Device Nodes in /dev.
> >  systemd[1]: Stopped Create System Users.
> >  systemd[1]: Stopped Remount Root and Kernel File Systems.
> >  systemd[1]: Reached target Shutdown.
> >  systemd[1]: Reached target Final Step.
> >  systemd[1]: Starting Reboot...
> 
> And we wanna see all those "very useful" messages in dmesg because...?

This is right before shutdown. dmesg is not really available anymore.
I can only guess why do they write it to devkmsg - to make it appear
on the serial/net console, perhaps. Dunno.

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 11:29     ` Peter Zijlstra
@ 2018-12-18 13:09       ` Sergey Senozhatsky
  0 siblings, 0 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-18 13:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Borislav Petkov,
	Andrew Morton, linux-kernel, Sergey Senozhatsky

On (12/18/18 12:29), Peter Zijlstra wrote:
> > We certainly can. As far as I understand, they log shutdown events
> > (including errors and warnings): what they kill, what they stop,
> > what they umount, etc. The more partitions, services are running
> > (I guess), the more things they need to umount, kill, stop; hence,
> > the more messages. I kinda can imagine what they will answer ;)
> > 
> > The below (and a bunch of other) messages are getting ratelimited.
> > I'm not sure what will happen should any of those steps fail and
> > print warning-s. My guess would be that we probably can ratelimit
> > those warnings:
> 
> Why are they writing it to /dev/kmsg though? Shouldn't they be writing
> this to syslog directly?

Can't really answer for systemd developers. To make those messages
appear on the serial/net console?

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18  9:18 [RFC][PATCH] printk: increase devkmsg write() ratelimit Sergey Senozhatsky
  2018-12-18 10:48 ` Peter Zijlstra
@ 2018-12-18 14:02 ` Sergey Senozhatsky
  1 sibling, 0 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-18 14:02 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Steven Rostedt, Borislav Petkov, Peter Zijlstra,
	Andrew Morton, linux-kernel, Sergey Senozhatsky

On (12/18/18 18:18), Sergey Senozhatsky wrote:
> 
> I know that there is a "kernel.printk_devkmsg" interface; do we
> expect every systemd-enabled distro to find that out and to tweak
> kernel.printk_devkmsg

My bad, this should be corrected. I thought that user space can
adjust ratelimit, via file->private_data->rs. This does not
appear to be the case.

So the question is - is it time to change the default devkmsg
ratelimit?

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 13:07       ` Sergey Senozhatsky
@ 2018-12-18 14:26         ` Borislav Petkov
  2018-12-18 14:55           ` Sergey Senozhatsky
  0 siblings, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2018-12-18 14:26 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, Steven Rostedt,
	Andrew Morton, linux-kernel

On Tue, Dec 18, 2018 at 10:07:50PM +0900, Sergey Senozhatsky wrote:
> This is right before shutdown. dmesg is not really available anymore.
> I can only guess why do they write it to devkmsg - to make it appear
> on the serial/net console, perhaps. Dunno.

But still, why do we want to see those messages and enlarge the
ratelimit for that? Do they contain any particularly important
information?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 14:26         ` Borislav Petkov
@ 2018-12-18 14:55           ` Sergey Senozhatsky
  2018-12-18 15:03             ` Borislav Petkov
  0 siblings, 1 reply; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-18 14:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Andrew Morton, linux-kernel

On (12/18/18 15:26), Borislav Petkov wrote:
> On Tue, Dec 18, 2018 at 10:07:50PM +0900, Sergey Senozhatsky wrote:
> > This is right before shutdown. dmesg is not really available anymore.
> > I can only guess why do they write it to devkmsg - to make it appear
> > on the serial/net console, perhaps. Dunno.
> 
> But still, why do we want to see those messages and enlarge the
> ratelimit for that? Do they contain any particularly important
> information?

Good question. Theoretically, there can be some interesting stuff.
Examples: https://bugzilla.redhat.com/show_bug.cgi?id=1515373#c6
or        https://github.com/systemd/systemd/issues/5433

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 14:55           ` Sergey Senozhatsky
@ 2018-12-18 15:03             ` Borislav Petkov
  2018-12-18 15:14               ` Sergey Senozhatsky
  0 siblings, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2018-12-18 15:03 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, Steven Rostedt,
	Andrew Morton, linux-kernel

On Tue, Dec 18, 2018 at 11:55:58PM +0900, Sergey Senozhatsky wrote:
> Good question. Theoretically, there can be some interesting stuff.
> Examples: https://bugzilla.redhat.com/show_bug.cgi?id=1515373#c6
> or        https://github.com/systemd/systemd/issues/5433

Just like that last page says, people should use the printk.devkmsg=
option when they wanna see all messages. Normally, we do add cmdline
options to the kernel cmdline when debugging, like

debug ignore_loglevel log_buf_len=16M ...

etc and this is no different AFAICT.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 15:03             ` Borislav Petkov
@ 2018-12-18 15:14               ` Sergey Senozhatsky
  2018-12-18 15:24                 ` Borislav Petkov
  0 siblings, 1 reply; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-18 15:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Andrew Morton, linux-kernel

On (12/18/18 16:03), Borislav Petkov wrote:
> On Tue, Dec 18, 2018 at 11:55:58PM +0900, Sergey Senozhatsky wrote:
> > Good question. Theoretically, there can be some interesting stuff.
> > Examples: https://bugzilla.redhat.com/show_bug.cgi?id=1515373#c6
> > or        https://github.com/systemd/systemd/issues/5433
> 
> Just like that last page says, people should use the printk.devkmsg=
> option when they wanna see all messages. Normally, we do add cmdline
> options to the kernel cmdline when debugging, like
> 
> debug ignore_loglevel log_buf_len=16M ...

Right, but unlike log_buf_len, devkmsg is a bit close to a "binary" knob:
either all messages or none; it's not like we can say "printk.devkmsg=100
messages per 5 * HZ max".

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 15:14               ` Sergey Senozhatsky
@ 2018-12-18 15:24                 ` Borislav Petkov
  2018-12-18 16:52                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2018-12-18 15:24 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, Steven Rostedt,
	Andrew Morton, linux-kernel

On Wed, Dec 19, 2018 at 12:14:55AM +0900, Sergey Senozhatsky wrote:
> Right, but unlike log_buf_len, devkmsg is a bit close to a "binary" knob:
> either all messages or none;

... which is perfectly fine for a debugging session.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 15:24                 ` Borislav Petkov
@ 2018-12-18 16:52                   ` Sergey Senozhatsky
  2018-12-18 17:21                     ` Peter Zijlstra
  2018-12-18 17:47                     ` Borislav Petkov
  0 siblings, 2 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-18 16:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Andrew Morton, linux-kernel

On (12/18/18 16:24), Borislav Petkov wrote:
> On Wed, Dec 19, 2018 at 12:14:55AM +0900, Sergey Senozhatsky wrote:
> > Right, but unlike log_buf_len, devkmsg is a bit close to a "binary" knob:
> > either all messages or none;
> 
> ... which is perfectly fine for a debugging session.

But devkmsg ratelimits systemd errors, so one does not even know that
"some debugging is required". For instance from my x86 box:

 Unmounting /home...
 [..]
 home.mount: Mount process exited, code=exited status=32
 Failed unmounting /home.

I don't want to debug systemd, I want to know that something didn't
work out. 10 messages max and 5 seconds interval looks a bit too strict.

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 16:52                   ` Sergey Senozhatsky
@ 2018-12-18 17:21                     ` Peter Zijlstra
  2018-12-18 17:37                       ` Steven Rostedt
  2018-12-18 17:47                     ` Borislav Petkov
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2018-12-18 17:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Borislav Petkov, Sergey Senozhatsky, Petr Mladek, Steven Rostedt,
	Andrew Morton, linux-kernel

On Wed, Dec 19, 2018 at 01:52:17AM +0900, Sergey Senozhatsky wrote:
> On (12/18/18 16:24), Borislav Petkov wrote:
> > On Wed, Dec 19, 2018 at 12:14:55AM +0900, Sergey Senozhatsky wrote:
> > > Right, but unlike log_buf_len, devkmsg is a bit close to a "binary" knob:
> > > either all messages or none;
> > 
> > ... which is perfectly fine for a debugging session.
> 
> But devkmsg ratelimits systemd errors, so one does not even know that
> "some debugging is required". For instance from my x86 box:
> 
>  Unmounting /home...
>  [..]
>  home.mount: Mount process exited, code=exited status=32
>  Failed unmounting /home.
> 
> I don't want to debug systemd, I want to know that something didn't
> work out. 10 messages max and 5 seconds interval looks a bit too strict.

Again, complain to system-doofus for printing so much crap to somewhere
it should not print to begin with.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 17:21                     ` Peter Zijlstra
@ 2018-12-18 17:37                       ` Steven Rostedt
  2018-12-19  8:50                         ` Petr Mladek
  2018-12-20 11:35                         ` Sergey Senozhatsky
  0 siblings, 2 replies; 22+ messages in thread
From: Steven Rostedt @ 2018-12-18 17:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Borislav Petkov, Sergey Senozhatsky,
	Petr Mladek, Andrew Morton, linux-kernel

On Tue, 18 Dec 2018 18:21:09 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Dec 19, 2018 at 01:52:17AM +0900, Sergey Senozhatsky wrote:
> > On (12/18/18 16:24), Borislav Petkov wrote:  
> > > On Wed, Dec 19, 2018 at 12:14:55AM +0900, Sergey Senozhatsky wrote:  
> > > > Right, but unlike log_buf_len, devkmsg is a bit close to a "binary" knob:
> > > > either all messages or none;  
> > > 
> > > ... which is perfectly fine for a debugging session.  
> > 
> > But devkmsg ratelimits systemd errors, so one does not even know that
> > "some debugging is required". For instance from my x86 box:
> > 
> >  Unmounting /home...
> >  [..]
> >  home.mount: Mount process exited, code=exited status=32
> >  Failed unmounting /home.
> > 
> > I don't want to debug systemd, I want to know that something didn't
> > work out. 10 messages max and 5 seconds interval looks a bit too strict.  
> 
> Again, complain to system-doofus for printing so much crap to somewhere
> it should not print to begin with.

I've been saying that it would be good to make the kmsg be a separate
buffer that just gets interleaved with the kernel buffer. Userspace
processes should never be able to overwrite messages from the kernel.

-- Steve

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 16:52                   ` Sergey Senozhatsky
  2018-12-18 17:21                     ` Peter Zijlstra
@ 2018-12-18 17:47                     ` Borislav Petkov
  2018-12-19  1:46                       ` Sergey Senozhatsky
  1 sibling, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2018-12-18 17:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, Steven Rostedt,
	Andrew Morton, linux-kernel

On Wed, Dec 19, 2018 at 01:52:17AM +0900, Sergey Senozhatsky wrote:
> But devkmsg ratelimits systemd errors, so one does not even know that
> "some debugging is required". For instance from my x86 box:
> 
>  Unmounting /home...
>  [..]
>  home.mount: Mount process exited, code=exited status=32
>  Failed unmounting /home.
> 
> I don't want to debug systemd, I want to know that something didn't
> work out. 10 messages max and 5 seconds interval looks a bit too strict.

So how much is not strict?

And what happens if you raise that ratelimiting level and the *one* line
which is most important for debugging an issue, still doesn't get logged
because all of a sudden, doofus is more talkative, overflows the new
limit and we drop the important line?

All I'm saying is, gradually raising the limit is the wrong approach
- there will always be a case where something important doesn't get
logged.

What we need is a different solution, maybe what Rostedt proposes or
so...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 17:47                     ` Borislav Petkov
@ 2018-12-19  1:46                       ` Sergey Senozhatsky
  0 siblings, 0 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-19  1:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Peter Zijlstra,
	Petr Mladek, Steven Rostedt, Andrew Morton, linux-kernel

On (12/18/18 18:47), Borislav Petkov wrote:
> So how much is not strict?
>
> And what happens if you raise that ratelimiting level and the *one* line
> which is most important for debugging an issue

Like you said, for debugging devkmsg=off works just fine. I don't mind
the ratelimiting and want to keep it; I just don't want errors to be
ratelimited.

> All I'm saying is, gradually raising the limit is the wrong approach
> - there will always be a case where something important doesn't get
> logged.

Well, OK.

> What we need is a different solution, maybe what Rostedt proposes or
> so...

Sure, a different approach and idea are welcome. This is RFC thread.

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 17:37                       ` Steven Rostedt
@ 2018-12-19  8:50                         ` Petr Mladek
  2018-12-20 11:35                         ` Sergey Senozhatsky
  1 sibling, 0 replies; 22+ messages in thread
From: Petr Mladek @ 2018-12-19  8:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Sergey Senozhatsky, Borislav Petkov,
	Sergey Senozhatsky, Andrew Morton, linux-kernel

On Tue 2018-12-18 12:37:48, Steven Rostedt wrote:
> On Tue, 18 Dec 2018 18:21:09 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Wed, Dec 19, 2018 at 01:52:17AM +0900, Sergey Senozhatsky wrote:
> > > On (12/18/18 16:24), Borislav Petkov wrote:  
> > > > On Wed, Dec 19, 2018 at 12:14:55AM +0900, Sergey Senozhatsky wrote:  
> > > > > Right, but unlike log_buf_len, devkmsg is a bit close to a "binary" knob:
> > > > > either all messages or none;  
> > > > 
> > > > ... which is perfectly fine for a debugging session.  
> > > 
> > > But devkmsg ratelimits systemd errors, so one does not even know that
> > > "some debugging is required". For instance from my x86 box:
> > > 
> > >  Unmounting /home...
> > >  [..]
> > >  home.mount: Mount process exited, code=exited status=32
> > >  Failed unmounting /home.
> > > 
> > > I don't want to debug systemd, I want to know that something didn't
> > > work out. 10 messages max and 5 seconds interval looks a bit too strict.  
> > 
> > Again, complain to system-doofus for printing so much crap to somewhere
> > it should not print to begin with.
> 
> I've been saying that it would be good to make the kmsg be a separate
> buffer that just gets interleaved with the kernel buffer. Userspace
> processes should never be able to overwrite messages from the kernel.

There was a proposal for this few years ago, see
https://lkml.kernel.org/r/1435920595-30879-1-git-send-email-m.niesluchow@samsung.com

Honestly, I got scared. It proposed an interface for dynamically adding
separate /dev/kmsg devices and related log buffers. This smells with
huge amount of messages that might fight for console throughput and
complicate printk code.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-18 17:37                       ` Steven Rostedt
  2018-12-19  8:50                         ` Petr Mladek
@ 2018-12-20 11:35                         ` Sergey Senozhatsky
  2018-12-20 13:58                           ` Steven Rostedt
  1 sibling, 1 reply; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-20 11:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Sergey Senozhatsky, Borislav Petkov,
	Sergey Senozhatsky, Petr Mladek, Andrew Morton, linux-kernel

On (12/18/18 12:37), Steven Rostedt wrote:
> > 
> > Again, complain to system-doofus for printing so much crap to somewhere
> > it should not print to begin with.
> 
> I've been saying that it would be good to make the kmsg be a separate
> buffer that just gets interleaved with the kernel buffer.

Hmm, this is interesting.

> Userspace processes should never be able to overwrite messages
> from the kernel.

I would agree.

It's a bit funny that kernel people first take the patch in and then,
when user-space begins to use the feature, start to object and ratelimit.

By the way, systemd seems to be OK with alternative logging targets

/etc/systemd/system.conf

---
	[Manager]
	#LogLevel=info
	LogTarget=syslog console journal
---

Everything looks fine with read-only devkmsg (running the patched
kernel). "journalctl -f -b" gives a nice system log:

...
 kernel: r8169 0000:04:00.0 enp4s0: renamed from eth0
 kernel: snd_hda_codec_realtek hdaudioC0D0:      Line=0x1a
 systemd-udevd[180]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
 systemd[1]: Started Flush Journal to Persistent Storage.
 kernel: input: HDA Intel PCH Line as /devices/pci0000:00/0000:00:1f.3/sound/card0/input7
...


Given how often systemd hits ratelimits (I googled some bug reports),
I wonder if systemd people are still interested in /dev/kmsg logging
at all.


---

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 7b4e4de778e4..6d35115c5629 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -875,7 +875,7 @@ static const struct memdev {
 	 [8] = { "random", 0666, &random_fops, 0 },
 	 [9] = { "urandom", 0666, &urandom_fops, 0 },
 #ifdef CONFIG_PRINTK
-	[11] = { "kmsg", 0644, &kmsg_fops, 0 },
+	[11] = { "kmsg", 0444, &kmsg_fops, 0 },
 #endif
 };
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 829fe6fb098a..48c4ccac9fce 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -770,7 +770,6 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 struct devkmsg_user {
 	u64 seq;
 	u32 idx;
-	struct ratelimit_state rs;
 	struct mutex lock;
 	char buf[CONSOLE_EXT_LOG_MAX];
 };
@@ -788,69 +787,6 @@ int devkmsg_emit(int facility, int level, const char *fmt, ...)
 	return r;
 }
 
-static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
-{
-	char *buf, *line;
-	int level = default_message_loglevel;
-	int facility = 1;	/* LOG_USER */
-	struct file *file = iocb->ki_filp;
-	struct devkmsg_user *user = file->private_data;
-	size_t len = iov_iter_count(from);
-	ssize_t ret = len;
-
-	if (!user || len > LOG_LINE_MAX)
-		return -EINVAL;
-
-	/* Ignore when user logging is disabled. */
-	if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
-		return len;
-
-	/* Ratelimit when not explicitly enabled. */
-	if (!(devkmsg_log & DEVKMSG_LOG_MASK_ON)) {
-		if (!___ratelimit(&user->rs, current->comm))
-			return ret;
-	}
-
-	buf = kmalloc(len+1, GFP_KERNEL);
-	if (buf == NULL)
-		return -ENOMEM;
-
-	buf[len] = '\0';
-	if (!copy_from_iter_full(buf, len, from)) {
-		kfree(buf);
-		return -EFAULT;
-	}
-
-	/*
-	 * Extract and skip the syslog prefix <[0-9]*>. Coming from userspace
-	 * the decimal value represents 32bit, the lower 3 bit are the log
-	 * level, the rest are the log facility.
-	 *
-	 * If no prefix or no userspace facility is specified, we
-	 * enforce LOG_USER, to be able to reliably distinguish
-	 * kernel-generated messages from userspace-injected ones.
-	 */
-	line = buf;
-	if (line[0] == '<') {
-		char *endp = NULL;
-		unsigned int u;
-
-		u = simple_strtoul(line + 1, &endp, 10);
-		if (endp && endp[0] == '>') {
-			level = LOG_LEVEL(u);
-			if (LOG_FACILITY(u) != 0)
-				facility = LOG_FACILITY(u);
-			endp++;
-			len -= endp - line;
-			line = endp;
-		}
-	}
-
-	devkmsg_emit(facility, level, "%s", line);
-	kfree(buf);
-	return ret;
-}
-
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
 {
@@ -998,9 +934,6 @@ static int devkmsg_open(struct inode *inode, struct file *file)
 	if (!user)
 		return -ENOMEM;
 
-	ratelimit_default_init(&user->rs);
-	ratelimit_set_flags(&user->rs, RATELIMIT_MSG_ON_RELEASE);
-
 	mutex_init(&user->lock);
 
 	logbuf_lock_irq();
@@ -1019,8 +952,6 @@ static int devkmsg_release(struct inode *inode, struct file *file)
 	if (!user)
 		return 0;
 
-	ratelimit_state_exit(&user->rs);
-
 	mutex_destroy(&user->lock);
 	kfree(user);
 	return 0;
@@ -1029,7 +960,6 @@ static int devkmsg_release(struct inode *inode, struct file *file)
 const struct file_operations kmsg_fops = {
 	.open = devkmsg_open,
 	.read = devkmsg_read,
-	.write_iter = devkmsg_write,
 	.llseek = devkmsg_llseek,
 	.poll = devkmsg_poll,
 	.release = devkmsg_release,

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-20 11:35                         ` Sergey Senozhatsky
@ 2018-12-20 13:58                           ` Steven Rostedt
  2018-12-21  7:32                             ` Sergey Senozhatsky
  0 siblings, 1 reply; 22+ messages in thread
From: Steven Rostedt @ 2018-12-20 13:58 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, Sergey Senozhatsky, Borislav Petkov, Petr Mladek,
	Andrew Morton, linux-kernel

On Thu, 20 Dec 2018 20:35:37 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> On (12/18/18 12:37), Steven Rostedt wrote:
> > > 
> > > Again, complain to system-doofus for printing so much crap to somewhere
> > > it should not print to begin with.  
> > 
> > I've been saying that it would be good to make the kmsg be a separate
> > buffer that just gets interleaved with the kernel buffer.  
> 
> Hmm, this is interesting.
> 
> > Userspace processes should never be able to overwrite messages
> > from the kernel.  
> 
> I would agree.
> 
> It's a bit funny that kernel people first take the patch in and then,
> when user-space begins to use the feature, start to object and ratelimit.

Please note the "kernel people" to first take it in, were also systemd
developers. And there was a bit of objections to what they wanted to
have to what they got. It wasn't until later when systemd started
abusing the buffer (taking it as its own, making the kernel use of it a
second class citizen) that other kernel developers (that now need to
deal with the fallout) started to object and ratelimit it.

Honestly, it should have been a separate buffer to begin with, and I
wish I pushed for that when it was first added. It's not too late. We
can still make it a separate buffer.

> 
> By the way, systemd seems to be OK with alternative logging targets
> 
> /etc/systemd/system.conf
> 
> ---
> 	[Manager]
> 	#LogLevel=info
> 	LogTarget=syslog console journal

When the system is up and running, it works. But I believe systemd can
still only use ksmsg for initial boot up messages before it does the
chroot.

> ---
> 
> Everything looks fine with read-only devkmsg (running the patched
> kernel). "journalctl -f -b" gives a nice system log:
> 
> ...
>  kernel: r8169 0000:04:00.0 enp4s0: renamed from eth0
>  kernel: snd_hda_codec_realtek hdaudioC0D0:      Line=0x1a
>  systemd-udevd[180]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
>  systemd[1]: Started Flush Journal to Persistent Storage.
>  kernel: input: HDA Intel PCH Line as /devices/pci0000:00/0000:00:1f.3/sound/card0/input7
> ...
> 

Do you get systemd messages before chroot with this patch compared to
what you get without it?

-- Steve

> 
> Given how often systemd hits ratelimits (I googled some bug reports),
> I wonder if systemd people are still interested in /dev/kmsg logging
> at all.
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] printk: increase devkmsg write() ratelimit
  2018-12-20 13:58                           ` Steven Rostedt
@ 2018-12-21  7:32                             ` Sergey Senozhatsky
  0 siblings, 0 replies; 22+ messages in thread
From: Sergey Senozhatsky @ 2018-12-21  7:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Peter Zijlstra, Sergey Senozhatsky,
	Borislav Petkov, Petr Mladek, Andrew Morton, linux-kernel

On (12/20/18 08:58), Steven Rostedt wrote:
> > > Userspace processes should never be able to overwrite messages
> > > from the kernel.
> > 
> > I would agree.
> > 
> > It's a bit funny that kernel people first take the patch in and then,
> > when user-space begins to use the feature, start to object and ratelimit.
> 
> Please note the "kernel people" to first take it in, were also systemd
> developers.

Good point.

> Honestly, it should have been a separate buffer to begin with, and I
> wish I pushed for that when it was first added. It's not too late. We
> can still make it a separate buffer.

Ideally, yes, separating them will make sense.

[..]
> When the system is up and running, it works. But I believe systemd can
> still only use ksmsg for initial boot up messages before it does the
> chroot.
[..]
> Do you get systemd messages before chroot with this patch compared to
> what you get without it?

You are right.

systemd cannot log anything really early - there should be /dev/kmsg char
device so they can open() it and write() to it. IOW only after chr_dev_init()
they can append messages to devkmsg. Which is still early enough, perhaps
earlier than UART init. systemd can log to tty, but unlike logbuf messages,
messages written to tty cannot be re-played:

+->  chr_dev_init() create /dev/kmsg
|      persistent storage is not mounted yet
|      writes to /dev/kmsg work
|      ...
|    chroot
|    mount /dev/sdX
+->  journald starts
     pulls logbuf (devkmsg)
     syslog now works
     "systemd[1]: Starting Flush Journal to Persistent Storage..."

So without devkmsg some early systemd messages can be lost (between
chr_dev_init() and exec(journald)), just like you said; or (best case)
will go to serial log. Messages before chr_dev_init() will be lost
because there is no /dev/kmsg; messages after exec(journald) do not
necessarily depend on /dev/kmsg and can be logged via syslog/journal
interface.

	-ss

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-12-21  7:32 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-18  9:18 [RFC][PATCH] printk: increase devkmsg write() ratelimit Sergey Senozhatsky
2018-12-18 10:48 ` Peter Zijlstra
2018-12-18 11:17   ` Sergey Senozhatsky
2018-12-18 11:29     ` Peter Zijlstra
2018-12-18 13:09       ` Sergey Senozhatsky
2018-12-18 11:47     ` Borislav Petkov
2018-12-18 13:07       ` Sergey Senozhatsky
2018-12-18 14:26         ` Borislav Petkov
2018-12-18 14:55           ` Sergey Senozhatsky
2018-12-18 15:03             ` Borislav Petkov
2018-12-18 15:14               ` Sergey Senozhatsky
2018-12-18 15:24                 ` Borislav Petkov
2018-12-18 16:52                   ` Sergey Senozhatsky
2018-12-18 17:21                     ` Peter Zijlstra
2018-12-18 17:37                       ` Steven Rostedt
2018-12-19  8:50                         ` Petr Mladek
2018-12-20 11:35                         ` Sergey Senozhatsky
2018-12-20 13:58                           ` Steven Rostedt
2018-12-21  7:32                             ` Sergey Senozhatsky
2018-12-18 17:47                     ` Borislav Petkov
2018-12-19  1:46                       ` Sergey Senozhatsky
2018-12-18 14:02 ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).