[Ksummit-discuss] [TECH TOPIC] printk redesign

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Ksummit-discuss] [TECH TOPIC] printk redesign
@ 2017-06-19  5:21 Sergey Senozhatsky
  2017-06-19  6:22 ` Hannes Reinecke
                   ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-19  5:21 UTC (permalink / raw)
  To: ksummit-discuss, Petr Mladek, Steven Rostedt

Hello,

	I, Petr Mladek and Steven Rostedt would like to propose a printk
tech topic (as suggested by Steven). We are currently exploring the idea
of complete redesign and rework of printk and it would be extremely helpful
to hear from the community. printk serves different purposes, and some of
requirements of printk tend to contradict each other; printk is monolithic
and quite heavy, no wonder, it causes problems sometimes.

	So the questions are (a short list) - what the new printk should be?
should it remain monolithic, or can we split it? (e.g. core kernel messages
don't share the log buffer with debug/info messages, etc.) what are the
printk requirements? I've started playing with the idea of moving printk to
per-CPU model: log buffers, per-CPU printk flusher threads. does is it make
sense (wrt to printk requirements) to have direct and in-direct flushers of
printk messages (e.g. core kernel messages are printed directly; debug/info
messages are printed by printing kthreads, etc. well, unless in panic)? ...

	There are many other questions, so it'd be great to have a
brainstorming session.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19  5:21 [Ksummit-discuss] [TECH TOPIC] printk redesign Sergey Senozhatsky
@ 2017-06-19  6:22 ` Hannes Reinecke
  2017-06-19 14:39   ` Steven Rostedt
  2017-06-19 23:46 ` Josh Triplett
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 60+ messages in thread
From: Hannes Reinecke @ 2017-06-19  6:22 UTC (permalink / raw)
  To: ksummit-discuss

On 06/19/2017 07:21 AM, Sergey Senozhatsky wrote:
> Hello,
> 
> 	I, Petr Mladek and Steven Rostedt would like to propose a printk
> tech topic (as suggested by Steven). We are currently exploring the idea
> of complete redesign and rework of printk and it would be extremely helpful
> to hear from the community. printk serves different purposes, and some of
> requirements of printk tend to contradict each other; printk is monolithic
> and quite heavy, no wonder, it causes problems sometimes.
> 
> 	So the questions are (a short list) - what the new printk should be?
> should it remain monolithic, or can we split it? (e.g. core kernel messages
> don't share the log buffer with debug/info messages, etc.) what are the
> printk requirements? I've started playing with the idea of moving printk to
> per-CPU model: log buffers, per-CPU printk flusher threads. does is it make
> sense (wrt to printk requirements) to have direct and in-direct flushers of
> printk messages (e.g. core kernel messages are printed directly; debug/info
> messages are printed by printing kthreads, etc. well, unless in panic)? ...
> 
> 	There are many other questions, so it'd be great to have a
> brainstorming session.
> 
I'm all for it.
Personally, I'd love to see the printk mechanism split into something
which can be used primarily for logging/debugging (ie slow,
non-critical, large messages) and emergency messaging (ie fast, direct
messages like kernel oops and KERN_EMERG thingies).
Plus I'd love to decouple the message generation (ie writing into the
message log) from message output (ie printing out the message log).
That currently is a major performance drag when using slow output
devices like serial console.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19  6:22 ` Hannes Reinecke
@ 2017-06-19 14:39   ` Steven Rostedt
  2017-06-19 15:20     ` Andrew Lunn
                       ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Steven Rostedt @ 2017-06-19 14:39 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: ksummit-discuss

On Mon, 19 Jun 2017 08:22:13 +0200
Hannes Reinecke <hare@suse.com> wrote:

> > 	There are many other questions, so it'd be great to have a
> > brainstorming session.
> >   
> I'm all for it.
> Personally, I'd love to see the printk mechanism split into something
> which can be used primarily for logging/debugging (ie slow,
> non-critical, large messages) and emergency messaging (ie fast, direct
> messages like kernel oops and KERN_EMERG thingies).
> Plus I'd love to decouple the message generation (ie writing into the
> message log) from message output (ie printing out the message log).
> That currently is a major performance drag when using slow output
> devices like serial console.

I'd like to find out all the requirements for printk(). And some of
these requirements contradict each other. If we can sort out exactly
what people want from a printk() mechanism, perhaps we can group
together like requirements and then create multiple facilities that can
handle each group of requirements.

Here's a couple of requirements that I expect from printk:

1) First and for most, is the critical output. Those of warnings, and
above. Basically all critical messages that can be used to debug a
system crash. This requires the ability to be executed from any
context, including NMI.

 This group includes WARN() and BUG() output, and anything in an oops.

2) Activity information. This too can be used to debug a system crash,
and requires serializations. When a device comes on line. A spurious
interrupt. A system state change (CPU going on or off line).

3) Status information. Now, I'm sure people will argue about what goes
in this or the above #2. Here, this would be all pr_info. Useful
information that should be logged, but perhaps not something that is
critical knowledge if a crash happens. In other words, something that
isn't critical to get out immediately.

4) All other kernel information that's not critical at all, and perhaps
doesn't even need to be serialized. At least, not against the above.
This could be cached, and outputted at a later time than when the
printk() was called.

5) Finally, the data from userspace (/dev/kmsg). I believe that this
really should be in an buffer by itself, and at most interleaved via
timestamps with the above in dmesg.

That's my idea. If others have more to add, please do so.

Thanks,

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 14:39   ` Steven Rostedt
@ 2017-06-19 15:20     ` Andrew Lunn
  2017-06-19 15:54       ` Hannes Reinecke
  2017-06-19 16:26       ` Steven Rostedt
  2017-06-20 15:56     ` Sergey Senozhatsky
  2017-06-20 18:45     ` Daniel Vetter
  2 siblings, 2 replies; 60+ messages in thread
From: Andrew Lunn @ 2017-06-19 15:20 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

> Here's a couple of requirements that I expect from printk:
> 
> 1) First and for most, is the critical output. Those of warnings, and
> above. Basically all critical messages that can be used to debug a
> system crash. This requires the ability to be executed from any
> context, including NMI.
> 
>  This group includes WARN() and BUG() output, and anything in an oops.
> 
> 2) Activity information. This too can be used to debug a system crash,
> and requires serializations. When a device comes on line. A spurious
> interrupt. A system state change (CPU going on or off line).
> 
> 3) Status information. Now, I'm sure people will argue about what goes
> in this or the above #2. Here, this would be all pr_info. Useful
> information that should be logged, but perhaps not something that is
> critical knowledge if a crash happens. In other words, something that
> isn't critical to get out immediately.
> 
> 4) All other kernel information that's not critical at all, and perhaps
> doesn't even need to be serialized. At least, not against the above.
> This could be cached, and outputted at a later time than when the
> printk() was called.

Developers machine probably have different requirements to production
machines. When debugging during code development, i want the debug
output to be in the correct order, independent of the level. If you
are going to cause reordering, you might want to add a sequence number
to each output, so it is possible to put it back into the correct
order. And it needs to be clear when output is out of order.

       Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 15:20     ` Andrew Lunn
@ 2017-06-19 15:54       ` Hannes Reinecke
  2017-06-19 16:17         ` Andrew Lunn
                           ` (2 more replies)
  2017-06-19 16:26       ` Steven Rostedt
  1 sibling, 3 replies; 60+ messages in thread
From: Hannes Reinecke @ 2017-06-19 15:54 UTC (permalink / raw)
  To: Andrew Lunn, Steven Rostedt; +Cc: ksummit-discuss

On 06/19/2017 05:20 PM, Andrew Lunn wrote:
>> Here's a couple of requirements that I expect from printk:
>>
>> 1) First and for most, is the critical output. Those of warnings, and
>> above. Basically all critical messages that can be used to debug a
>> system crash. This requires the ability to be executed from any
>> context, including NMI.
>>
>>  This group includes WARN() and BUG() output, and anything in an oops.
>>
>> 2) Activity information. This too can be used to debug a system crash,
>> and requires serializations. When a device comes on line. A spurious
>> interrupt. A system state change (CPU going on or off line).
>>
>> 3) Status information. Now, I'm sure people will argue about what goes
>> in this or the above #2. Here, this would be all pr_info. Useful
>> information that should be logged, but perhaps not something that is
>> critical knowledge if a crash happens. In other words, something that
>> isn't critical to get out immediately.
>>
>> 4) All other kernel information that's not critical at all, and perhaps
>> doesn't even need to be serialized. At least, not against the above.
>> This could be cached, and outputted at a later time than when the
>> printk() was called.
> 
> Developers machine probably have different requirements to production
> machines. When debugging during code development, i want the debug
> output to be in the correct order, independent of the level. If you
> are going to cause reordering, you might want to add a sequence number
> to each output, so it is possible to put it back into the correct
> order. And it needs to be clear when output is out of order.
> 
I was under the impression that we do this anyway; even ATM every line
in the printk buffer is prefixed with a timestamp.
And we should continue with that.

Obviously, this requires that any printk buffer entry which is _not_
time-critical (ie everything not in 1) or 2) _cannot_ be broken up, but
will always end up as one record in the printk buffer.
If we could achieve that all the logging infrastructure in scsi could go
away and I could use 'normal' printk.
Now _that_ would be really cool.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 15:54       ` Hannes Reinecke
@ 2017-06-19 16:17         ` Andrew Lunn
  2017-06-19 16:23         ` Mark Brown
  2017-06-20 16:09         ` Sergey Senozhatsky
  2 siblings, 0 replies; 60+ messages in thread
From: Andrew Lunn @ 2017-06-19 16:17 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: ksummit-discuss

> I was under the impression that we do this anyway; even ATM every line
> in the printk buffer is prefixed with a timestamp.

What is the granularity of the timestamp? I'm talking about where
every second or third line is a printk of some sort. It could be some
of them are existing _err(), some are added _dbg() etc. Is the
timestamp sufficient to put such output back into order?

	  Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 15:54       ` Hannes Reinecke
  2017-06-19 16:17         ` Andrew Lunn
@ 2017-06-19 16:23         ` Mark Brown
  2017-06-20 15:58           ` Sergey Senozhatsky
  2017-06-20 16:09         ` Sergey Senozhatsky
  2 siblings, 1 reply; 60+ messages in thread
From: Mark Brown @ 2017-06-19 16:23 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 411 bytes --]

On Mon, Jun 19, 2017 at 05:54:34PM +0200, Hannes Reinecke wrote:

> I was under the impression that we do this anyway; even ATM every line
> in the printk buffer is prefixed with a timestamp.
> And we should continue with that.

This is a configuration option (PRINTK_TIME) and isn't 100% reliable,
I've got examples in the syslog of my current system with multiple
prints being issued with the same timestamp.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 15:20     ` Andrew Lunn
  2017-06-19 15:54       ` Hannes Reinecke
@ 2017-06-19 16:26       ` Steven Rostedt
  2017-06-19 16:35         ` Andrew Lunn
  2017-06-24 11:14         ` Mauro Carvalho Chehab
  1 sibling, 2 replies; 60+ messages in thread
From: Steven Rostedt @ 2017-06-19 16:26 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: ksummit-discuss

On Mon, 19 Jun 2017 17:20:55 +0200
Andrew Lunn <andrew@lunn.ch> wrote:


> > 4) All other kernel information that's not critical at all, and perhaps
> > doesn't even need to be serialized. At least, not against the above.
> > This could be cached, and outputted at a later time than when the
> > printk() was called.  
> 
> Developers machine probably have different requirements to production
> machines. When debugging during code development, i want the debug
> output to be in the correct order, independent of the level. If you
> are going to cause reordering, you might want to add a sequence number
> to each output, so it is possible to put it back into the correct
> order. And it needs to be clear when output is out of order.

Perhaps a kernel command line option could force this.

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 16:26       ` Steven Rostedt
@ 2017-06-19 16:35         ` Andrew Lunn
  2017-06-24 11:14         ` Mauro Carvalho Chehab
  1 sibling, 0 replies; 60+ messages in thread
From: Andrew Lunn @ 2017-06-19 16:35 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

> Perhaps a kernel command line option could force this.

Yep. And maybe having DEBUG defined in a file, so _dbg() is not a NOP,
is probably a good indication order output is required. But scope
becomes an issue.

	 Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19  5:21 [Ksummit-discuss] [TECH TOPIC] printk redesign Sergey Senozhatsky
  2017-06-19  6:22 ` Hannes Reinecke
@ 2017-06-19 23:46 ` Josh Triplett
  2017-06-20  8:24   ` Arnd Bergmann
  2017-06-22 16:35 ` David Howells
  2017-07-19  6:24 ` Sergey Senozhatsky
  3 siblings, 1 reply; 60+ messages in thread
From: Josh Triplett @ 2017-06-19 23:46 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: ksummit-discuss

On Mon, Jun 19, 2017 at 02:21:46PM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> 	I, Petr Mladek and Steven Rostedt would like to propose a printk
> tech topic (as suggested by Steven). We are currently exploring the idea
> of complete redesign and rework of printk and it would be extremely helpful
> to hear from the community. printk serves different purposes, and some of
> requirements of printk tend to contradict each other; printk is monolithic
> and quite heavy, no wonder, it causes problems sometimes.
> 
> 	So the questions are (a short list) - what the new printk should be?
> should it remain monolithic, or can we split it? (e.g. core kernel messages
> don't share the log buffer with debug/info messages, etc.) what are the
> printk requirements? I've started playing with the idea of moving printk to
> per-CPU model: log buffers, per-CPU printk flusher threads. does is it make
> sense (wrt to printk requirements) to have direct and in-direct flushers of
> printk messages (e.g. core kernel messages are printed directly; debug/info
> messages are printed by printing kthreads, etc. well, unless in panic)? ...
> 
> 	There are many other questions, so it'd be great to have a
> brainstorming session.

It'd be nice if, when building a kernel with CONFIG_PRINTK disabled, the
full format-string handling wasn't compiled in to handle snprintf and
family. In particular, some of the kernel-specific %p format-string
modifiers only get used when printing to a user, never when formatting a
string for machine consumption. I can think of a few different ways to
address that, but I'd love to see that taken into account in the
requirements for a printk redesign.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 23:46 ` Josh Triplett
@ 2017-06-20  8:24   ` Arnd Bergmann
  2017-06-20 14:36     ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Arnd Bergmann @ 2017-06-20  8:24 UTC (permalink / raw)
  To: Josh Triplett; +Cc: ksummit

On Tue, Jun 20, 2017 at 1:46 AM, Josh Triplett <josh@joshtriplett.org> wrote:
> On Mon, Jun 19, 2017 at 02:21:46PM +0900, Sergey Senozhatsky wrote:
>> Hello,
>>
>>       I, Petr Mladek and Steven Rostedt would like to propose a printk
>> tech topic (as suggested by Steven). We are currently exploring the idea
>> of complete redesign and rework of printk and it would be extremely helpful
>> to hear from the community. printk serves different purposes, and some of
>> requirements of printk tend to contradict each other; printk is monolithic
>> and quite heavy, no wonder, it causes problems sometimes.
>>
>>       So the questions are (a short list) - what the new printk should be?
>> should it remain monolithic, or can we split it? (e.g. core kernel messages
>> don't share the log buffer with debug/info messages, etc.) what are the
>> printk requirements? I've started playing with the idea of moving printk to
>> per-CPU model: log buffers, per-CPU printk flusher threads. does is it make
>> sense (wrt to printk requirements) to have direct and in-direct flushers of
>> printk messages (e.g. core kernel messages are printed directly; debug/info
>> messages are printed by printing kthreads, etc. well, unless in panic)? ...
>>
>>       There are many other questions, so it'd be great to have a
>> brainstorming session.
>
> It'd be nice if, when building a kernel with CONFIG_PRINTK disabled, the
> full format-string handling wasn't compiled in to handle snprintf and
> family. In particular, some of the kernel-specific %p format-string
> modifiers only get used when printing to a user, never when formatting a
> string for machine consumption. I can think of a few different ways to
> address that, but I'd love to see that taken into account in the
> requirements for a printk redesign.

I don't think that's realistic, given how many special formats we have
nowadays, see the discussion about the 'struct rtc_time' format recently,
which was definitely meant for external ABIs.

        Arnd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20  8:24   ` Arnd Bergmann
@ 2017-06-20 14:36     ` Steven Rostedt
  2017-06-20 15:26       ` Sergey Senozhatsky
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2017-06-20 14:36 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: ksummit

On Tue, 20 Jun 2017 10:24:43 +0200
Arnd Bergmann <arnd@arndb.de> wrote:


> > It'd be nice if, when building a kernel with CONFIG_PRINTK disabled, the
> > full format-string handling wasn't compiled in to handle snprintf and
> > family. In particular, some of the kernel-specific %p format-string
> > modifiers only get used when printing to a user, never when formatting a
> > string for machine consumption. I can think of a few different ways to
> > address that, but I'd love to see that taken into account in the
> > requirements for a printk redesign.  
> 
> I don't think that's realistic, given how many special formats we have
> nowadays, see the discussion about the 'struct rtc_time' format recently,
> which was definitely meant for external ABIs.

I think it may be possible, but it is a separate project from the
printk one. One thing, tracing requires it, and does not require
printk. What you could do is have CONFIG_VSPRINTF and have anything that
uses it select it as well. Which would be printk, tracing, etc.

But the printk mess in general is a big enough issue to try to solve
than to bring this up at this time.

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 14:36     ` Steven Rostedt
@ 2017-06-20 15:26       ` Sergey Senozhatsky
  0 siblings, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-20 15:26 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit

On (06/20/17 10:36), Steven Rostedt wrote:
> On Tue, 20 Jun 2017 10:24:43 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> 
> > > It'd be nice if, when building a kernel with CONFIG_PRINTK disabled, the
> > > full format-string handling wasn't compiled in to handle snprintf and
> > > family. In particular, some of the kernel-specific %p format-string
> > > modifiers only get used when printing to a user, never when formatting a
> > > string for machine consumption. I can think of a few different ways to
> > > address that, but I'd love to see that taken into account in the
> > > requirements for a printk redesign.  
> > 
> > I don't think that's realistic, given how many special formats we have
> > nowadays, see the discussion about the 'struct rtc_time' format recently,
> > which was definitely meant for external ABIs.
> 
> I think it may be possible, but it is a separate project from the
> printk one. One thing, tracing requires it, and does not require
> printk. What you could do is have CONFIG_VSPRINTF and have anything that
> uses it select it as well. Which would be printk, tracing, etc.
> 
> But the printk mess in general is a big enough issue to try to solve
> than to bring this up at this time.

agree. that's a bit minor at the moment.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 14:39   ` Steven Rostedt
  2017-06-19 15:20     ` Andrew Lunn
@ 2017-06-20 15:56     ` Sergey Senozhatsky
  2017-06-20 18:45     ` Daniel Vetter
  2 siblings, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-20 15:56 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

Hello,

On (06/19/17 10:39), Steven Rostedt wrote:
> > > 	There are many other questions, so it'd be great to have a
> > > brainstorming session.
> > >   
> > I'm all for it.
> > Personally, I'd love to see the printk mechanism split into something
> > which can be used primarily for logging/debugging (ie slow,
> > non-critical, large messages) and emergency messaging (ie fast, direct
> > messages like kernel oops and KERN_EMERG thingies).
> > Plus I'd love to decouple the message generation (ie writing into the
> > message log) from message output (ie printing out the message log).
> > That currently is a major performance drag when using slow output
> > devices like serial console.
> 
> I'd like to find out all the requirements for printk(). And some of
> these requirements contradict each other. If we can sort out exactly
> what people want from a printk() mechanism, perhaps we can group
> together like requirements and then create multiple facilities that can
> handle each group of requirements.
> 
> Here's a couple of requirements that I expect from printk:
> 
> 1) First and for most, is the critical output. Those of warnings, and
> above. Basically all critical messages that can be used to debug a
> system crash. This requires the ability to be executed from any
> context, including NMI.
> 
>  This group includes WARN() and BUG() output, and anything in an oops.
> 
> 2) Activity information. This too can be used to debug a system crash,
> and requires serializations. When a device comes on line. A spurious
> interrupt. A system state change (CPU going on or off line).
> 
> 3) Status information. Now, I'm sure people will argue about what goes
> in this or the above #2. Here, this would be all pr_info. Useful
> information that should be logged, but perhaps not something that is
> critical knowledge if a crash happens. In other words, something that
> isn't critical to get out immediately.

yeah, the 'get out immediately' part is pretty complex

a) some people like it and want to keep it (it has never killed
   their systems)

b) some people don't like it and want to change it (it has
   killed their systems)

the existing 'the winner prints it all' design is problematic sometimes.


> 4) All other kernel information that's not critical at all, and perhaps
> doesn't even need to be serialized. At least, not against the above.
> This could be cached, and outputted at a later time than when the
> printk() was called.

messages are not created equal :)

> 5) Finally, the data from userspace (/dev/kmsg). I believe that this
> really should be in an buffer by itself, and at most interleaved via
> timestamps with the above in dmesg.

yes, could do.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 16:23         ` Mark Brown
@ 2017-06-20 15:58           ` Sergey Senozhatsky
  2017-06-20 16:44             ` Luck, Tony
  0 siblings, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-20 15:58 UTC (permalink / raw)
  To: Mark Brown; +Cc: ksummit-discuss

On (06/19/17 17:23), Mark Brown wrote:
> On Mon, Jun 19, 2017 at 05:54:34PM +0200, Hannes Reinecke wrote:
> 
> > I was under the impression that we do this anyway; even ATM every line
> > in the printk buffer is prefixed with a timestamp.
> > And we should continue with that.
> 
> This is a configuration option (PRINTK_TIME) and isn't 100% reliable,
> I've got examples in the syslog of my current system with multiple
> prints being issued with the same timestamp.

yes, as far as I know, timestamps are not in complete sync between CPUs.

 	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 15:54       ` Hannes Reinecke
  2017-06-19 16:17         ` Andrew Lunn
  2017-06-19 16:23         ` Mark Brown
@ 2017-06-20 16:09         ` Sergey Senozhatsky
  2 siblings, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-20 16:09 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: ksummit-discuss

Hello,

On (06/19/17 17:54), Hannes Reinecke wrote:
[..]
> > Developers machine probably have different requirements to production
> > machines. When debugging during code development, i want the debug
> > output to be in the correct order, independent of the level. If you
> > are going to cause reordering, you might want to add a sequence number
> > to each output, so it is possible to put it back into the correct
> > order. And it needs to be clear when output is out of order.
> > 
> I was under the impression that we do this anyway; even ATM every line
> in the printk buffer is prefixed with a timestamp.
> And we should continue with that.
> 
> Obviously, this requires that any printk buffer entry which is _not_
> time-critical (ie everything not in 1) or 2) _cannot_ be broken up, but
> will always end up as one record in the printk buffer.
> If we could achieve that all the logging infrastructure in scsi could go
> away and I could use 'normal' printk.
> Now _that_ would be really cool.

are we talking about dev_printk() and friends here? if so, then yes - would
be lovely to remove those.

there is some demand for a 'buffered printk' - multiple lines that would
appear in logbuf in sequential records. people (Tetsuo) want OOM backtraces
from different CPUs to be less broken up, etc.

we already have per-CPU buffers for printk_safe and we can reuse them.
there are some limitations, tho. if "buffered printk must be done with
local IRQs disabled" trade-off is acceptable (and I kinda think it is,
buffered printk from a scheduleable context is somewhat strange), then
we can add it _more or less_ easily.

very schematically

	buffered_printk_begin();  //local IRQ disable
		printk();
		printk();
		...
		printk();

	buffered_printk_end()   // local IRQ enable
		{
			spin_lock_irqsave(logbuf_lock)

				copy messages from this-CPU buffer to logbuf

			spin_unlock_irqrestore(logbuf_lock)
		}

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 15:58           ` Sergey Senozhatsky
@ 2017-06-20 16:44             ` Luck, Tony
  2017-06-20 17:11               ` Sergey Senozhatsky
  0 siblings, 1 reply; 60+ messages in thread
From: Luck, Tony @ 2017-06-20 16:44 UTC (permalink / raw)
  To: Sergey Senozhatsky, Mark Brown; +Cc: ksummit-discuss

>> This is a configuration option (PRINTK_TIME) and isn't 100% reliable,
>> I've got examples in the syslog of my current system with multiple
>> prints being issued with the same timestamp.
>
> yes, as far as I know, timestamps are not in complete sync between CPUs.

We are pretty deep into the printk() code before we call local_clock() to
find out what timestamp to print.  Would it be helpful to grab that much
earlier ... so it is closer to the time at which the code that called printk()?

This would make it more likely that messages would appear out of order,
but I think it would be better to know when things actually happened, not
when we got around to printing them.

-Tony

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 16:44             ` Luck, Tony
@ 2017-06-20 17:11               ` Sergey Senozhatsky
  2017-06-20 17:27                 ` Mark Brown
  0 siblings, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-20 17:11 UTC (permalink / raw)
  To: Luck, Tony; +Cc: ksummit-discuss

On (06/20/17 16:44), Luck, Tony wrote:
> >> This is a configuration option (PRINTK_TIME) and isn't 100% reliable,
> >> I've got examples in the syslog of my current system with multiple
> >> prints being issued with the same timestamp.
> >
> > yes, as far as I know, timestamps are not in complete sync between CPUs.
> 
> We are pretty deep into the printk() code before we call local_clock() to
> find out what timestamp to print.

correct, we basically keep the time when the message was finally
stored (there can be a 'lengthy' spinning on logbuf_lock before
that), not when the caller issued printk().

> Would it be helpful to grab that much earlier ... so it is closer to
> the time at which the code that called printk()?
> 
> This would make it more likely that messages would appear out of order,
> but I think it would be better to know when things actually happened, not
> when we got around to printing them.

so, yes, we probably can do it. and _perhaps_ it makes sense to do it even
in the current printk, I think. this won't make messages ordered, tho.
printk() does not sort logbuf records. and the way things work now, even
if we would sort logbuf entries, we still would have cases of out of
order messages - because printk_safe and printk_nmi messages in general
(in general; there is a shortcut for printk_nmi, but not for printk_safe)
cannot be stored directly to the logbuf because we can deadlock. we flush
those messages later from irq work instead. so those messages are expected
to be out of order.

another thing that I found useful is a CPU number of the processor
that stored a particular line to the logbuf.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 17:11               ` Sergey Senozhatsky
@ 2017-06-20 17:27                 ` Mark Brown
  2017-06-20 23:28                   ` Steven Rostedt
  2017-07-19  7:35                   ` David Woodhouse
  0 siblings, 2 replies; 60+ messages in thread
From: Mark Brown @ 2017-06-20 17:27 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 322 bytes --]

On Wed, Jun 21, 2017 at 02:11:34AM +0900, Sergey Senozhatsky wrote:

> another thing that I found useful is a CPU number of the processor
> that stored a particular line to the logbuf.

At some point we start reinventing ftrace...  there's issues with
joining the two up but there should at least be lessons we can learn.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 14:39   ` Steven Rostedt
  2017-06-19 15:20     ` Andrew Lunn
  2017-06-20 15:56     ` Sergey Senozhatsky
@ 2017-06-20 18:45     ` Daniel Vetter
  2017-06-21  9:29       ` Petr Mladek
  2017-06-21 10:15       ` Sergey Senozhatsky
  2 siblings, 2 replies; 60+ messages in thread
From: Daniel Vetter @ 2017-06-20 18:45 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

On Mon, Jun 19, 2017 at 4:39 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 19 Jun 2017 08:22:13 +0200
> Hannes Reinecke <hare@suse.com> wrote:
>
>> >     There are many other questions, so it'd be great to have a
>> > brainstorming session.
>> >
>> I'm all for it.
>> Personally, I'd love to see the printk mechanism split into something
>> which can be used primarily for logging/debugging (ie slow,
>> non-critical, large messages) and emergency messaging (ie fast, direct
>> messages like kernel oops and KERN_EMERG thingies).
>> Plus I'd love to decouple the message generation (ie writing into the
>> message log) from message output (ie printing out the message log).
>> That currently is a major performance drag when using slow output
>> devices like serial console.
>
> I'd like to find out all the requirements for printk(). And some of
> these requirements contradict each other. If we can sort out exactly
> what people want from a printk() mechanism, perhaps we can group
> together like requirements and then create multiple facilities that can
> handle each group of requirements.
>
> Here's a couple of requirements that I expect from printk:
>
> 1) First and for most, is the critical output. Those of warnings, and
> above. Basically all critical messages that can be used to debug a
> system crash. This requires the ability to be executed from any
> context, including NMI.
>
>  This group includes WARN() and BUG() output, and anything in an oops.
>
> 2) Activity information. This too can be used to debug a system crash,
> and requires serializations. When a device comes on line. A spurious
> interrupt. A system state change (CPU going on or off line).
>
> 3) Status information. Now, I'm sure people will argue about what goes
> in this or the above #2. Here, this would be all pr_info. Useful
> information that should be logged, but perhaps not something that is
> critical knowledge if a crash happens. In other words, something that
> isn't critical to get out immediately.
>
> 4) All other kernel information that's not critical at all, and perhaps
> doesn't even need to be serialized. At least, not against the above.
> This could be cached, and outputted at a later time than when the
> printk() was called.
>
> 5) Finally, the data from userspace (/dev/kmsg). I believe that this
> really should be in an buffer by itself, and at most interleaved via
> timestamps with the above in dmesg.
>
> That's my idea. If others have more to add, please do so.

My own pet peeve with printk from the drm side:

Untangling printk form the console_lock horror show, if at all
possible. One problem is that heavy printk usage makes the
console_lock massively contended (we delay restoring the the kernel
console on resume in kms drivers to a worker because of that). The
worse problem is that console_lock locking is horribly monolithic and
defacto requires that large chunks of your gfx driver init code needs
to run while holding it. Which means no printk output, neither for
your gfx driver nor anything else while your cpu goes through
something like a few 100k lines of code (for big drivers).

We have a few pages of kerneldoc explaining how to debug this and what
happens, and gross debug hacks to just not take console_lock on driver
(and pray it won't race), but it's a constant trap for new gfx
hackers.

Fixing console_lock is much less likely to happen, I (and better
hackers like Alan Cox) tried, and ran away in tears.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 17:27                 ` Mark Brown
@ 2017-06-20 23:28                   ` Steven Rostedt
  2017-06-21  7:17                     ` Hannes Reinecke
                                       ` (2 more replies)
  2017-07-19  7:35                   ` David Woodhouse
  1 sibling, 3 replies; 60+ messages in thread
From: Steven Rostedt @ 2017-06-20 23:28 UTC (permalink / raw)
  To: Mark Brown; +Cc: Peter Zijlstra, ksummit-discuss

On Tue, 20 Jun 2017 18:27:38 +0100
Mark Brown <broonie@kernel.org> wrote:

> On Wed, Jun 21, 2017 at 02:11:34AM +0900, Sergey Senozhatsky wrote:
> 
> > another thing that I found useful is a CPU number of the processor
> > that stored a particular line to the logbuf.  
> 
> At some point we start reinventing ftrace...  there's issues with
> joining the two up but there should at least be lessons we can learn.

I've thought about this a little too.

I would like printk to have per-cpu buffers. Then we don't even need to
store the CPU number, that would be explicit by which buffer the data
is stored in.

The one thing that is needed, is the consumer. In ftrace, it's whatever
reads the buffer, which is usually user space, but can be the kernel
(see sysctl-z). But there's only one consumer at a time.

I was thinking about a new design for printk. Similar to ftrace, but
different.

1) have per cpu buffers, that are lockless. Writes happen immediately,
but the output happens later.

2) have two types of console interfaces. A normal and a critical.

3) have a thread that is woken whenever there is data in any of the
buffers, and reads the buffers, again lockless. But to do this in a
reasonable manner, unless you break the printks up in sub buffers like
ftrace, if the consumer isn't fast enough, newer messages are dropped.

4) If a critical print is needed (and here's why we have two console
interfaces), the normal console interface gets turned off, and the
buffers stop being output through them. What ever called the critical
print, will take over, and flush out all the contents of the current
buffers. Then anything printed during the critical section will go out
immediately (no buffering). The printk thread, will stop having access
to the buffers, and shutdown till the critical section is complete.

This is just a rough idea. I think it is possible. The tricky part is
going to be the switch over to the critical section. Also, have a
command line parameter that has all printks be critical. Peter Zilstra
has some patches that already does that with making printk turn into
early printk.

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 23:28                   ` Steven Rostedt
@ 2017-06-21  7:17                     ` Hannes Reinecke
  2017-06-21 11:12                     ` Sergey Senozhatsky
  2017-06-21 12:23                     ` Petr Mladek
  2 siblings, 0 replies; 60+ messages in thread
From: Hannes Reinecke @ 2017-06-21  7:17 UTC (permalink / raw)
  To: Steven Rostedt, Mark Brown; +Cc: Peter Zijlstra, ksummit-discuss

On 06/21/2017 01:28 AM, Steven Rostedt wrote:
> On Tue, 20 Jun 2017 18:27:38 +0100
> Mark Brown <broonie@kernel.org> wrote:
> 
>> On Wed, Jun 21, 2017 at 02:11:34AM +0900, Sergey Senozhatsky wrote:
>>
>>> another thing that I found useful is a CPU number of the processor
>>> that stored a particular line to the logbuf.  
>>
>> At some point we start reinventing ftrace...  there's issues with
>> joining the two up but there should at least be lessons we can learn.
> 
> I've thought about this a little too.
> 
Of course you would :-)

> I would like printk to have per-cpu buffers. Then we don't even need to
> store the CPU number, that would be explicit by which buffer the data
> is stored in.
> 
> The one thing that is needed, is the consumer. In ftrace, it's whatever
> reads the buffer, which is usually user space, but can be the kernel
> (see sysctl-z). But there's only one consumer at a time.
> 
> I was thinking about a new design for printk. Similar to ftrace, but
> different.
> 
> 1) have per cpu buffers, that are lockless. Writes happen immediately,
> but the output happens later.
>> 2) have two types of console interfaces. A normal and a critical.
> 
> 3) have a thread that is woken whenever there is data in any of the
> buffers, and reads the buffers, again lockless. But to do this in a
> reasonable manner, unless you break the printks up in sub buffers like
> ftrace, if the consumer isn't fast enough, newer messages are dropped.
> 
I'd rather have the _older_ messages dropped. Or make this a policy.
But in general one should be able to adjust the buffer size so that this
doesn't occur.

> 4) If a critical print is needed (and here's why we have two console
> interfaces), the normal console interface gets turned off, and the
> buffers stop being output through them. What ever called the critical
> print, will take over, and flush out all the contents of the current
> buffers. Then anything printed during the critical section will go out
> immediately (no buffering). The printk thread, will stop having access
> to the buffers, and shutdown till the critical section is complete.
> 
Hmm. Ideally the critical print will be shipped out _without_ having to
wait for the normal buffer contents.
(Adding the message _at the tail_ and rewinding the buffer to that?)

Remember, there might be _lots_ of 'normal' buffer entries (like system
screaming just before OOM kicks in), so we might not have enough time
shipping out all entries, and the really critical one might never be seen.

_If_ all messages are proper prefixed with a timestamp (and possibly CPU
number) it doesn't really matter if messages are printed out of order;
one can always sort them later on.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.com			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 18:45     ` Daniel Vetter
@ 2017-06-21  9:29       ` Petr Mladek
  2017-06-21 10:15       ` Sergey Senozhatsky
  1 sibling, 0 replies; 60+ messages in thread
From: Petr Mladek @ 2017-06-21  9:29 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit-discuss

On Tue 2017-06-20 20:45:27, Daniel Vetter wrote:
> My own pet peeve with printk from the drm side:
> 
> Untangling printk form the console_lock horror show, if at all
> possible. One problem is that heavy printk usage makes the
> console_lock massively contended (we delay restoring the the kernel
> console on resume in kms drivers to a worker because of that).

We really need an offload mechanism. The patches are circulating
for years. The last version uses a kthread. It allows to release
console_lock even when some messages are pending. It should be
perfectly fine when the system works normally and the kthread
will get scheduled eventually.

There is a problem to detect situations where the system is
in a bad state, the offload might not work and we should switch
to the sync mode. But we make some progress on this as well.

IMHO, the best solution would be to avoid sleeping with
console_lock. Then it will be guaranteed that either someone
is actively flushing the messages or that anyone else is
free to do so.

But this would require to untangle all the other uses of
console_lock where the sleeping is necessary these days.

Also others might think different.

Anyway, the discussion about the current patchset is at
https://lkml.kernel.org/r/20170509082859.854-3-sergey.senozhatsky@gmail.com

Sergey already have another RFC using another idea but I did
not have to review this yet. See
https://lkml.kernel.org/r/20170602090345.624-1-sergey.senozhatsky@gmail.com

> The worse problem is that console_lock locking is horribly monolithic and
> defacto requires that large chunks of your gfx driver init code needs
> to run while holding it. Which means no printk output, neither for
> your gfx driver nor anything else while your cpu goes through
> something like a few 100k lines of code (for big drivers).

To be honest, I still do not know all the dark sides of the console
handling and this seems to be one of them. I wonder why
the console_lock is needed in this code. It does not make
much sense to me. It must be a synchronization task of
console_lock that I not aware of.

> We have a few pages of kerneldoc explaining how to debug this and what
> happens, and gross debug hacks to just not take console_lock on driver
> (and pray it won't race), but it's a constant trap for new gfx
> hackers.

Good to know.

> Fixing console_lock is much less likely to happen, I (and better
> hackers like Alan Cox) tried, and ran away in tears.

This looks like a real challenge. I feel less shame that it take
us too long.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 18:45     ` Daniel Vetter
  2017-06-21  9:29       ` Petr Mladek
@ 2017-06-21 10:15       ` Sergey Senozhatsky
  2017-06-22 13:42         ` Daniel Vetter
  1 sibling, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-21 10:15 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit-discuss

Hello,

On (06/20/17 20:45), Daniel Vetter wrote:
[..]
> My own pet peeve with printk from the drm side:
> 
> Untangling printk form the console_lock horror show, if at all
> possible. One problem is that heavy printk usage makes the
> console_lock massively contended (we delay restoring the the kernel
> console on resume in kms drivers to a worker because of that). The
> worse problem is that console_lock locking is horribly monolithic and
> defacto requires that large chunks of your gfx driver init code needs
> to run while holding it. Which means no printk output, neither for
> your gfx driver nor anything else while your cpu goes through
> something like a few 100k lines of code (for big drivers).

console_sem is definitely on my radar. I don't like it.

first, it does way too many things. it protects the console drivers
list (both for read and write accesses); it ensures that there is only
one CPU actually doing printing; it also serves as a 'general' lock in
some other parts of the kernel; its waiters must sleep in TASK_UNINTERRUPTIBLE
state; its owner upon unlocking must print all the pending logbuf messages
to the serial console; it even be acquired by user space processes (e.g.
tty_open or cat /proc/consoles, etc) so systemd can print kernel messages
and so on and on and on.

there are many things that are not good about console_sem.

besides, I'm really curious if there are any call sites (apart from
vprintk_emit() and console_flush_on_panic()) that actually want to print
all the pending kernel messages from console_unlock()? my expectation
would be that console_lock() callers mostly just want to protect the console
drivers list; and what console_unlock() does comes as a surprise.


> We have a few pages of kerneldoc explaining how to debug this and what
> happens, and gross debug hacks to just not take console_lock on driver
> (and pray it won't race), but it's a constant trap for new gfx
> hackers.
> 
> Fixing console_lock is much less likely to happen, I (and better
> hackers like Alan Cox) tried, and ran away in tears.

well, I don't even pretend to be a kernel hacker, but what were
the major obstacles?



the thing is,
I want to convert console_sem into RW lock. I briefly mentioned it last
year in Santa Fe, NM... and in fact I wrote *some sort* of a patch back
(well, mostly to see if it can fly at all). the patch has never been
finalized, polished, carefully written or properly tested (!).

it compiles and the system boots just fine. so it's definitely bug free (tm).


the patch (sorry for calling it a patch) does several things.


a) it turns console_unlock() into what it probably must be - just unlock
   console_sem. so now pending logbuf messages are printed by real printk()
   callers.


b) it converts console_sem semaphore to rwsem

hence a new locking API

	extern void console_read_lock(void);
	extern void console_write_lock(void);
	extern int console_read_trylock(void);
	extern int console_write_trylock(void);

	extern void console_read_unlock(void);
	extern void console_write_unlock(void);



so now we can convert things like

	console_lock();
	for_each_console(c) {
		if (!c->device)
			continue;
		driver = c->device(c, index);
		if (driver)
			break;
	}
	console_unlock();


which previously would
	1) sleep in console_lock()
	2) print all the pending messages from console_unlock()



to a more logical

	console_read_lock();
	for_each_console(c) {
		if (!c->device)
			continue;
		driver = c->device(c, index);
		if (driver)
			break;
	}
	console_read_unlock();



and at the same time, we still can have exclusive ownership of console
drivers list when it's needed. for example

	console_write_lock();
	....
		newcon->next = console_drivers->next;
		console_drivers->next = newcon;
	....
	console_write_unlock();
	console_tryflush();



c) a bunch of new functions to flush pending messages.

we used to have console_flush_on_panic() before, now we have two more
functions. this is for the case if console_unlock() caller actually
expected that if

	extern void console_tryflush(void);
	extern void console_flush(void);
	extern void console_flush_on_panic(void);

console_tryflush() does not sleep. console_flush() might sleep.

why do we need them? who knows, may be some of console_unlock()
callers actually want to flush logbuf, so they would need to
explicitly console_tryflush()/console_flush() now. yeah, I know.


d) it obsoletes 3 symbols that the rest of the system used to
   lock/unlock console subsystem:
		extern void console_lock(void);
		extern int console_trylock(void);
		extern void console_unlock(void);


the reason I kept those 3 is because converting all of the console_lock/unlock
callers is waaaaaay too much job for a PoC ;) so I just converted those 3
symbols instead. effectively they do down_write/up_write/etc at the moment;
but ideally we better switch to console_read_lock/console_write_lock and
friends. but I'll need a lot of help here because I have no idea whatsoever
what's going on in vt.c and some any other places.


so, JFYI, the patch is below. once again, it's not even in beta,
boot tested only (!).



Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/console.h |  16 +++
 kernel/printk/printk.c  | 327 ++++++++++++++++++++++++++++++++++--------------
 2 files changed, 246 insertions(+), 97 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index b8920a031a3e..e7050c43645f 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -162,12 +162,28 @@ extern int add_preferred_console(char *name, int idx, char *options);
 extern void register_console(struct console *);
 extern int unregister_console(struct console *);
 extern struct console *console_drivers;
+
+/* Obsolete, should be removed */
 extern void console_lock(void);
 extern int console_trylock(void);
 extern void console_unlock(void);
+
+/* Use these ones instead */
+extern void console_read_lock(void);
+extern void console_write_lock(void);
+extern int console_read_trylock(void);
+extern int console_write_trylock(void);
+
+extern void console_read_unlock(void);
+extern void console_write_unlock(void);
+
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
+
+extern void console_tryflush(void);
+extern void console_flush(void);
 extern void console_flush_on_panic(void);
+
 extern struct tty_driver *console_device(int *);
 extern void console_stop(struct console *);
 extern void console_start(struct console *);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fc47863f629c..a73bafb0b9d1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -78,16 +78,10 @@ EXPORT_SYMBOL(oops_in_progress);
  * provides serialisation for access to the entire console
  * driver system.
  */
-static DEFINE_SEMAPHORE(console_sem);
+static DECLARE_RWSEM(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
 
-#ifdef CONFIG_LOCKDEP
-static struct lockdep_map console_lock_dep_map = {
-	.name = "console_lock"
-};
-#endif
-
 enum devkmsg_log_bits {
 	__DEVKMSG_LOG_BIT_ON = 0,
 	__DEVKMSG_LOG_BIT_OFF,
@@ -206,46 +200,67 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
 static int nr_ext_console_drivers;
 
 /*
- * Helper macros to handle lockdep when locking/unlocking console_sem. We use
- * macros instead of functions so that _RET_IP_ contains useful information.
+ * Helper macros to handle lockdep when locking/unlocking console_sem.
  */
-#define down_console_sem() do { \
-	down(&console_sem);\
-	mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);\
-} while (0)
+#define console_sem_down_write() do {					\
+		down_write(&console_sem);				\
+	} while (0)
+
+#define console_sem_down_read() do {					\
+		down_read(&console_sem);				\
+	} while (0)
+
+static void console_sem_up_write(void)
+{
+	unsigned long flags;
+
+	printk_safe_enter_irqsave(flags);
+	up_write(&console_sem);
+	printk_safe_exit_irqrestore(flags);
+}
 
-static int __down_trylock_console_sem(unsigned long ip)
+static void console_sem_up_read(void)
 {
-	int lock_failed;
+	unsigned long flags;
+
+	printk_safe_enter_irqsave(flags);
+	up_read(&console_sem);
+	printk_safe_exit_irqrestore(flags);
+}
+
+static int console_sem_read_trylock(void)
+{
+	int locked;
 	unsigned long flags;
 
 	/*
-	 * Here and in __up_console_sem() we need to be in safe mode,
+	 * Here, in __console_sem_write_trylock() ,and in
+	 * __console_sem_up_{write, read}() we need to be in safe mode,
 	 * because spindump/WARN/etc from under console ->lock will
 	 * deadlock in printk()->down_trylock_console_sem() otherwise.
 	 */
 	printk_safe_enter_irqsave(flags);
-	lock_failed = down_trylock(&console_sem);
+	locked = down_read_trylock(&console_sem);
 	printk_safe_exit_irqrestore(flags);
 
-	if (lock_failed)
+	if (!locked)
 		return 1;
-	mutex_acquire(&console_lock_dep_map, 0, 1, ip);
 	return 0;
 }
-#define down_trylock_console_sem() __down_trylock_console_sem(_RET_IP_)
 
-static void __up_console_sem(unsigned long ip)
+static int console_sem_write_trylock(void)
 {
+	int locked;
 	unsigned long flags;
 
-	mutex_release(&console_lock_dep_map, 1, ip);
-
 	printk_safe_enter_irqsave(flags);
-	up(&console_sem);
+	locked = down_write_trylock(&console_sem);
 	printk_safe_exit_irqrestore(flags);
+
+	if (!locked)
+		return 1;
+	return 0;
 }
-#define up_console_sem() __up_console_sem(_RET_IP_)
 
 /*
  * This is used for debugging the mess that is the VT code by
@@ -436,6 +451,26 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+#define CONSOLE_EMIT_UNLOCKED_VAL	-1
+#define CONSOLE_EMIT_LOCKED_VAL	1
+
+static int console_emit_lock = CONSOLE_EMIT_UNLOCKED_VAL;
+
+static int console_emit_trylock(void)
+{
+	if (cmpxchg(&console_emit_lock,
+			CONSOLE_EMIT_UNLOCKED_VAL,
+			CONSOLE_EMIT_LOCKED_VAL) == CONSOLE_EMIT_UNLOCKED_VAL)
+		return 1;
+
+	return 0;
+}
+
+static void console_emit_unlock(void)
+{
+	xchg(&console_emit_lock, CONSOLE_EMIT_UNLOCKED_VAL);
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1692,6 +1727,40 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c
 	return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len);
 }
 
+static void console_emit(bool may_schedule);
+
+/*
+ * Check if we can schedule in console_emit() from the current context.
+ */
+static int may_console_schedule(void)
+{
+	/*
+	 * When PREEMPT_COUNT disabled we can't reliably detect if it's
+	 * safe to schedule (e.g. calling printk while holding a spin_lock),
+	 * because preempt_disable()/preempt_enable() are just barriers there
+	 * and preempt_count() is always 0.
+	 *
+	 * RCU read sections have a separate preemption counter when
+	 * PREEMPT_RCU enabled thus we must take extra care and check
+	 * rcu_preempt_depth(), otherwise RCU read sections modify
+	 * preempt_count().
+	 */
+	return !oops_in_progress && preemptible() && !rcu_preempt_depth();
+}
+
+static int console_trylock_for_emit(void)
+{
+	/* If trylock fails, someone else is doing the printing */
+	if (!console_read_trylock())
+		return 0;
+
+	if (console_emit_trylock())
+		return 1;
+
+	console_read_unlock();
+	return 0;
+}
+
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
@@ -1765,8 +1834,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * semaphore.  The release will print out buffers and wake up
 		 * /dev/kmsg and syslog() users.
 		 */
-		if (console_trylock())
-			console_unlock();
+		if (console_trylock_for_emit()) {
+			console_emit(may_console_schedule());
+		}
 	}
 
 	return printed_len;
@@ -2011,18 +2081,18 @@ void suspend_console(void)
 	if (!console_suspend_enabled)
 		return;
 	printk("Suspending console(s) (use no_console_suspend to debug)\n");
-	console_lock();
+	console_write_lock();
 	console_suspended = 1;
-	up_console_sem();
+	console_sem_up_write();
 }
 
 void resume_console(void)
 {
 	if (!console_suspend_enabled)
 		return;
-	down_console_sem();
+	console_sem_down_write();
 	console_suspended = 0;
-	console_unlock();
+	console_write_unlock();
 }
 
 /**
@@ -2038,70 +2108,75 @@ static int console_cpu_notify(unsigned int cpu)
 {
 	if (!cpuhp_tasks_frozen) {
 		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
+		if (console_trylock_for_emit())
+			console_emit(true);
 	}
 	return 0;
 }
 
 /**
- * console_lock - lock the console system for exclusive use.
- *
- * Acquires a lock which guarantees that the caller has
- * exclusive access to the console system and the console_drivers list.
+ * console_write_lock/console_read_lock - lock the console system
+ * in exclusive or shared more correspondingly.
  *
  * Can sleep, returns nothing.
  */
-void console_lock(void)
+void console_write_lock(void)
 {
 	might_sleep();
 
-	down_console_sem();
+	console_sem_down_write();
 	if (console_suspended)
 		return;
-	console_locked = 1;
-	console_may_schedule = 1;
+	console_locked++;
 }
-EXPORT_SYMBOL(console_lock);
+EXPORT_SYMBOL(console_write_lock);
+
+void console_read_lock(void)
+{
+	might_sleep();
+
+	console_sem_down_read();
+	if (console_suspended)
+		return;
+	console_locked++;
+}
+EXPORT_SYMBOL(console_read_lock);
 
 /**
- * console_trylock - try to lock the console system for exclusive use.
- *
- * Try to acquire a lock which guarantees that the caller has exclusive
- * access to the console system and the console_drivers list.
+ * console_read_trylock/console_write_trylock - try to lock the console system
+ * for shared or exclusive use correspondingly.
  *
  * returns 1 on success, and 0 on failure to acquire the lock.
  */
-int console_trylock(void)
+int console_read_trylock(void)
 {
-	if (down_trylock_console_sem())
+	if (console_sem_read_trylock())
 		return 0;
 	if (console_suspended) {
-		up_console_sem();
+		console_sem_up_read();
 		return 0;
 	}
-	console_locked = 1;
-	/*
-	 * When PREEMPT_COUNT disabled we can't reliably detect if it's
-	 * safe to schedule (e.g. calling printk while holding a spin_lock),
-	 * because preempt_disable()/preempt_enable() are just barriers there
-	 * and preempt_count() is always 0.
-	 *
-	 * RCU read sections have a separate preemption counter when
-	 * PREEMPT_RCU enabled thus we must take extra care and check
-	 * rcu_preempt_depth(), otherwise RCU read sections modify
-	 * preempt_count().
-	 */
-	console_may_schedule = !oops_in_progress &&
-			preemptible() &&
-			!rcu_preempt_depth();
+	console_locked++;
 	return 1;
 }
-EXPORT_SYMBOL(console_trylock);
+EXPORT_SYMBOL(console_read_trylock);
+
+int console_write_trylock(void)
+{
+	if (console_sem_write_trylock())
+		return 0;
+	if (console_suspended) {
+		console_sem_up_write();
+		return 0;
+	}
+	console_locked++;
+	return 1;
+}
+EXPORT_SYMBOL(console_write_trylock);
 
 int is_console_locked(void)
 {
-	return console_locked;
+	return console_locked != 0;
 }
 
 /*
@@ -2133,7 +2208,7 @@ static inline int can_use_console(void)
 }
 
 /**
- * console_unlock - unlock the console system
+ * console_emit - print all pending messages and unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
@@ -2144,9 +2219,9 @@ static inline int can_use_console(void)
  *
  * If there is output waiting, we wake /dev/kmsg and syslog() users.
  *
- * console_unlock(); may be called from any context.
+ * console_emit(); may be called from any context.
  */
-void console_unlock(void)
+static void console_emit(bool may_schedule)
 {
 	static char ext_text[CONSOLE_EXT_LOG_MAX];
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
@@ -2156,7 +2231,8 @@ void console_unlock(void)
 	bool do_cond_resched, retry;
 
 	if (console_suspended) {
-		up_console_sem();
+		console_emit_unlock();
+		console_sem_up_read();
 		return;
 	}
 
@@ -2174,7 +2250,7 @@ void console_unlock(void)
 	 * context reliably. Therefore the value must be stored before
 	 * and cleared after the the "again" goto label.
 	 */
-	do_cond_resched = console_may_schedule;
+	do_cond_resched = may_schedule;
 again:
 	console_may_schedule = 0;
 
@@ -2184,8 +2260,9 @@ void console_unlock(void)
 	 * console.
 	 */
 	if (!can_use_console()) {
-		console_locked = 0;
-		up_console_sem();
+		console_locked--;
+		console_emit_unlock();
+		console_sem_up_read();
 		return;
 	}
 
@@ -2249,7 +2326,7 @@ void console_unlock(void)
 		if (do_cond_resched)
 			cond_resched();
 	}
-	console_locked = 0;
+	console_locked--;
 
 	/* Release the exclusive_console once it is used */
 	if (unlikely(exclusive_console))
@@ -2257,7 +2334,8 @@ void console_unlock(void)
 
 	raw_spin_unlock(&logbuf_lock);
 
-	up_console_sem();
+	console_emit_unlock();
+	console_sem_up_read();
 
 	/*
 	 * Someone could have filled up the buffer again, so re-check if there's
@@ -2270,12 +2348,66 @@ void console_unlock(void)
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-	if (retry && console_trylock())
+	if (retry && console_trylock_for_emit())
 		goto again;
 
 	if (wake_klogd)
 		wake_up_klogd();
 }
+
+void console_tryflush(void)
+{
+	if (console_trylock_for_emit())
+		console_emit(may_console_schedule());
+}
+EXPORT_SYMBOL(console_tryflush);
+
+void console_flush(void)
+{
+	might_sleep();
+
+	console_read_lock();
+
+	if (console_emit_trylock())
+		console_emit(true);
+	else
+		console_read_unlock();
+}
+EXPORT_SYMBOL(console_flush);
+
+void console_write_unlock(void)
+{
+	console_locked--;
+	console_sem_up_write();
+}
+EXPORT_SYMBOL(console_write_unlock);
+
+void console_read_unlock(void)
+{
+	console_locked--;
+	console_sem_up_read();
+}
+EXPORT_SYMBOL(console_read_unlock);
+
+/*
+ * For compatibility only!
+ */
+extern void console_lock(void)
+{
+	console_write_lock();
+}
+EXPORT_SYMBOL(console_lock);
+
+extern int console_trylock(void)
+{
+	return console_write_trylock();
+}
+EXPORT_SYMBOL(console_trylock);
+
+extern void console_unlock(void)
+{
+	console_write_unlock();
+}
 EXPORT_SYMBOL(console_unlock);
 
 /**
@@ -2285,7 +2417,7 @@ EXPORT_SYMBOL(console_unlock);
  * if this CPU should yield the CPU to another task, do
  * so here.
  *
- * Must be called within console_lock();.
+ * Must be called under console_sem
  */
 void __sched console_conditional_schedule(void)
 {
@@ -2303,17 +2435,16 @@ void console_unblank(void)
 	 * oops_in_progress is set to 1..
 	 */
 	if (oops_in_progress) {
-		if (down_trylock_console_sem() != 0)
+		if (console_write_trylock() != 0)
 			return;
 	} else
-		console_lock();
+		console_write_lock();
 
-	console_locked = 1;
-	console_may_schedule = 0;
+	console_locked++;
 	for_each_console(c)
 		if ((c->flags & CON_ENABLED) && c->unblank)
 			c->unblank();
-	console_unlock();
+	console_write_unlock();
 }
 
 /**
@@ -2330,9 +2461,10 @@ void console_flush_on_panic(void)
 	 * context and we don't want to get preempted while flushing,
 	 * ensure may_schedule is cleared.
 	 */
-	console_trylock();
-	console_may_schedule = 0;
-	console_unlock();
+	console_read_trylock();
+	xchg(&console_emit_lock, CONSOLE_EMIT_LOCKED_VAL);
+	/* Forbid resheduling */
+	console_emit(0);
 }
 
 /*
@@ -2343,7 +2475,7 @@ struct tty_driver *console_device(int *index)
 	struct console *c;
 	struct tty_driver *driver = NULL;
 
-	console_lock();
+	console_read_lock();
 	for_each_console(c) {
 		if (!c->device)
 			continue;
@@ -2351,7 +2483,7 @@ struct tty_driver *console_device(int *index)
 		if (driver)
 			break;
 	}
-	console_unlock();
+	console_read_unlock();
 	return driver;
 }
 
@@ -2362,17 +2494,17 @@ struct tty_driver *console_device(int *index)
  */
 void console_stop(struct console *console)
 {
-	console_lock();
+	console_write_lock();
 	console->flags &= ~CON_ENABLED;
-	console_unlock();
+	console_write_unlock();
 }
 EXPORT_SYMBOL(console_stop);
 
 void console_start(struct console *console)
 {
-	console_lock();
+	console_write_lock();
 	console->flags |= CON_ENABLED;
-	console_unlock();
+	console_write_unlock();
 }
 EXPORT_SYMBOL(console_start);
 
@@ -2512,7 +2644,7 @@ void register_console(struct console *newcon)
 	 *	Put this console in the list - keep the
 	 *	preferred driver at the head of the list.
 	 */
-	console_lock();
+	console_write_lock();
 	if ((newcon->flags & CON_CONSDEV) || console_drivers == NULL) {
 		newcon->next = console_drivers;
 		console_drivers = newcon;
@@ -2543,7 +2675,8 @@ void register_console(struct console *newcon)
 		 */
 		exclusive_console = newcon;
 	}
-	console_unlock();
+	console_write_unlock();
+	console_tryflush();
 	console_sysfs_notify();
 
 	/*
@@ -2583,7 +2716,7 @@ int unregister_console(struct console *console)
 		return res;
 
 	res = 1;
-	console_lock();
+	console_write_lock();
 	if (console_drivers == console) {
 		console_drivers=console->next;
 		res = 0;
@@ -2609,7 +2742,7 @@ int unregister_console(struct console *console)
 		console_drivers->flags |= CON_CONSDEV;
 
 	console->flags &= ~CON_ENABLED;
-	console_unlock();
+	console_write_unlock();
 	console_sysfs_notify();
 	return res;
 }
@@ -2697,8 +2830,8 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 
 	if (pending & PRINTK_PENDING_OUTPUT) {
 		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
+		if (console_trylock_for_emit())
+			console_emit(false);
 	}
 
 	if (pending & PRINTK_PENDING_WAKEUP)
-- 
2.13.1

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 23:28                   ` Steven Rostedt
  2017-06-21  7:17                     ` Hannes Reinecke
@ 2017-06-21 11:12                     ` Sergey Senozhatsky
  2017-06-22 14:06                       ` Steven Rostedt
  2017-06-21 12:23                     ` Petr Mladek
  2 siblings, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-21 11:12 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Peter Zijlstra, ksummit-discuss

On (06/20/17 19:28), Steven Rostedt wrote:
> > On Wed, Jun 21, 2017 at 02:11:34AM +0900, Sergey Senozhatsky wrote:
> > 
> > > another thing that I found useful is a CPU number of the processor
> > > that stored a particular line to the logbuf.  
> > 
> > At some point we start reinventing ftrace...  there's issues with
> > joining the two up but there should at least be lessons we can learn.
> 
> I've thought about this a little too.
> 
> I would like printk to have per-cpu buffers. Then we don't even need to
> store the CPU number, that would be explicit by which buffer the data
> is stored in.
> 
> The one thing that is needed, is the consumer. In ftrace, it's whatever
> reads the buffer, which is usually user space, but can be the kernel
> (see sysctl-z). But there's only one consumer at a time.
> 
> I was thinking about a new design for printk. Similar to ftrace, but
> different.
> 
> 1) have per cpu buffers, that are lockless. Writes happen immediately,
> but the output happens later.

I thought about it, and the question is:
would lockless per-CPU logbuffers buy us anything? we used to have
problems with the logbuf_lock, but not any more (to the best of my
knowledge). we deal with logbuf_lock deadlocks using printk_nmi and
printk_safe. so I'd say that logbuf_lock probably doesn't bother us
anymore, it's all those locks that printk can't control that bother
us (semaphore, scheduler, timekeeping, serial consoles, etc. etc.).

so would per-CPU logbufs be better? we would need to do N-way merge
(N per-CPU logbufs) when we print the kernel messages log, correct?

> 2) have two types of console interfaces. A normal and a critical.
> 
> 3) have a thread that is woken whenever there is data in any of the
> buffers, and reads the buffers, again lockless. But to do this in a
> reasonable manner, unless you break the printks up in sub buffers like
> ftrace, if the consumer isn't fast enough, newer messages are dropped.

yes, so I definitely want to have printing offloading. but, per my
experience, it's not all so simple when it comes to offloading. if
we would compare offloading with the direct printing then offloading
does change printk behaviour and I saw a number of dropped messages
bug reports because of offloading. the existing direct printing can
throttle the CPU that printks a lot.

direct printing

	CPU1

	printk
	call_console_drivers
	printk
	call_console_drivers
	...
	printk
	call_console_drivers

so new logbuf entries do not appear in the logbuf until the previous
ones are printed to the serial console. while with the offloading
it's different:

offloading

	CPU1				CPU2
	printk
	printk				call_console_drivers
	printk
	printk				call_console_drivers
	printk
					call_console_drivers

new logbuf entries now appear uncontrollably.

well, nothing new here. we already can have hit scenario, we just need
one CPU spinning in console_unlock() and one or several CPUs doing
printk. but with offloading we potentially break a trivial case - a
single CPU that calls printk.

so may be additionally to offloading we also would need some sort of
throttling mechanism in printk.

> 4) If a critical print is needed (and here's why we have two console
> interfaces), the normal console interface gets turned off, and the
> buffers stop being output through them. What ever called the critical
> print, will take over, and flush out all the contents of the current
> buffers. Then anything printed during the critical section will go out
> immediately (no buffering). The printk thread, will stop having access
> to the buffers, and shutdown till the critical section is complete.
> 
> This is just a rough idea. I think it is possible. The tricky part is
> going to be the switch over to the critical section. Also, have a
> command line parameter that has all printks be critical. Peter Zilstra
> has some patches that already does that with making printk turn into
> early printk.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 23:28                   ` Steven Rostedt
  2017-06-21  7:17                     ` Hannes Reinecke
  2017-06-21 11:12                     ` Sergey Senozhatsky
@ 2017-06-21 12:23                     ` Petr Mladek
  2017-06-21 14:18                       ` Andrew Lunn
  2017-06-21 16:09                       ` Andrew Lunn
  2 siblings, 2 replies; 60+ messages in thread
From: Petr Mladek @ 2017-06-21 12:23 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Peter Zijlstra, ksummit-discuss

On Tue 2017-06-20 19:28:58, Steven Rostedt wrote:
> I've thought about this a little too.
> 
> I would like printk to have per-cpu buffers. Then we don't even need to
> store the CPU number, that would be explicit by which buffer the data
> is stored in.
> 
> The one thing that is needed, is the consumer. In ftrace, it's whatever
> reads the buffer, which is usually user space, but can be the kernel
> (see sysctl-z). But there's only one consumer at a time.
> 
> I was thinking about a new design for printk. Similar to ftrace, but
> different.
> 
> 1) have per cpu buffers, that are lockless. Writes happen immediately,
> but the output happens later.

My problems with per-CPU buffers is that:

    + I am not sure how big per-CPU buffers we could afford.
      Any non-balanced usage increases the chance of loosing
      messages.

    + The information is scattered and extra tools are needed
      to locate the messages and sort them.

    + It suggests that the solution should be lockless. But
      lockless code is very complex in principle. The ring
      buffer used by ftrace is a good example and it is
      still limited to one reader.

> 2) have two types of console interfaces. A normal and a critical.
> 
> 3) have a thread that is woken whenever there is data in any of the
> buffers, and reads the buffers, again lockless. But to do this in a
> reasonable manner, unless you break the printks up in sub buffers like
> ftrace, if the consumer isn't fast enough, newer messages are dropped.
>
> 4) If a critical print is needed (and here's why we have two console
> interfaces), the normal console interface gets turned off, and the
> buffers stop being output through them. What ever called the critical
> print, will take over, and flush out all the contents of the current
> buffers. Then anything printed during the critical section will go out
> immediately (no buffering). The printk thread, will stop having access
> to the buffers, and shutdown till the critical section is complete.

IMHO, this is something that we are already trying to implement by
the printk_kthread.

To be honest, I am not sure if I have a good top view at the moment.
Especially I am not sure about all the existing problems and requirements.

I always hear the the printk code is too complex. Then people complain
about various limitations. Solution of the limitations usually make
the code even more complex.

IMHO, the two main fighting tasks are:

  1. store messages as fast as possible

  2. show the messages as reliably as possible

IMHO, we are relatively good in the storing part. The biggest
problems are on the showing side, especially when it comes to
slow and messy consoles.

I tried to look at it also in more details. The problems that come
to my mind are:

  1. hard lockups in NMI

  2. hard lockups caused by recursive calls, e.g. warnings
     triggered from printk() code

  3. soft lockups caused by console handling

  4. Lost messages when the is a flood of them

  5. Lost messages when the system hangs

  6. mixed part of continues lines or related lines, e.g. backtraces,
     WARN()

  7. Unreliable time stamps and sorting of messages.

  8. console code is a big mess and I am afraid that I am still not
     aware of many hidden traps there.

Let me to look closer at the problems:

Ad 1. hard lockups in NMI

   It is almost solved by the printk_safe buffer. One drawback
   is that the messages are temporary stored separately and
   the buffer is rather small.

   Lock-less ring buffer would help. The question is if is
   worth the cost. It still does not solve pushing to consoles
   that might have their own locks.

Ad 2. hard lockups by recursive calls

   The recursion printk() -> some_func() -> printk()
   is mostly solved by printk_safe. It has the same drawbacks
   as the NMI solution.

   The recursion some_func() -> printk() -> some_func() -> printk()
   is partly solved by printk_deferred(). It avoids the recursion
   from the console handling code. I actually do not know about
   better solution. Note that the deadlock usually happens in
   some_func() and _not_ in printk(). I do not see how printk()
   itself could detect and prevent this. We could try to detect
   this problems earlier using lockdep.

Ad 3. soft lockups caused by console handling

   We basically need some offloading for the console handling.
   The current problem is how to detect critical situation
   and switch to the sync mode.

Ad 4. Lost messages when the is a flood of them

   Separate buffers or reshuffling (dropping) less important
   messages would help.

Ad 5. Lost messages when the system hangs

   We already have troubles and the console offloading
   makes it worse.

   We should reduce the negative effects of offloading.
   We should make sure that someone is always handling
   console and reduce sleeps with console_lock. Also
   everyone should try to handle some messages when
   the console_lock is available to handle sudden death.

   It was never perfect. The patchset from Peter Zijlstra
   (early printk) looks like an interesting fallback to me.

   We should make more consoles lock less.

   We could also implement storing log into persistent memory.

Ad 6. Mixed parts of continues lines and related lines.

   We need to be careful here. The cont buffer handling
   made the printk code much more complex, introduced
   many regressions. We always have to consider
   the complexity and the gain.

   There are some proposals for an API that would allow
   to enter/exit a buffered mode. One question is if
   we could afford to disable preemption (use per-CPU
   buffers). Another question is the complexity
   and extra memory needs.

   IMPORTANT: Any buffering is dangerous for the reliability of
   the output. By other words, buffering delays output and
   we might never see such messages.

7. Unreliable time stamps and sorting messages

   The current extra buffers (cont, printk_safe, printk_safe_nmi)
   makes this worse. The timestamp is added later.

   We could surely improve this. But it is always with the cost
   of complexity. Also it might bring new problems when interacting
   with the timer code.

Did I miss some important problems?
Did I miss some possible solutions?

I have to admit that I did not have time to think about the last
proposals from Sergey about printk_kthread. So, some of the above
summary might be a bit out of date.

Anyway, I wanted to move the discussion from implementation
back to gathering requirements and the problems with the current
implementation. At least I am not able to judge other implementation
proposals without it. Also I wanted to summarize the current
know-how. I hope that it would help to move forward and avoid
discussion cycles.

I hope that I did not kill the brainstorming effect with this.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-21 12:23                     ` Petr Mladek
@ 2017-06-21 14:18                       ` Andrew Lunn
  2017-06-23  8:46                         ` Petr Mladek
  2017-06-21 16:09                       ` Andrew Lunn
  1 sibling, 1 reply; 60+ messages in thread
From: Andrew Lunn @ 2017-06-21 14:18 UTC (permalink / raw)
  To: Petr Mladek; +Cc: Peter Zijlstra, ksummit-discuss

On Wed, Jun 21, 2017 at 02:23:04PM +0200, Petr Mladek wrote:
> On Tue 2017-06-20 19:28:58, Steven Rostedt wrote:
> > I've thought about this a little too.
> > 
> > I would like printk to have per-cpu buffers. Then we don't even need to
> > store the CPU number, that would be explicit by which buffer the data
> > is stored in.
> > 
> > The one thing that is needed, is the consumer. In ftrace, it's whatever
> > reads the buffer, which is usually user space, but can be the kernel
> > (see sysctl-z). But there's only one consumer at a time.
> > 
> > I was thinking about a new design for printk. Similar to ftrace, but
> > different.
> > 
> > 1) have per cpu buffers, that are lockless. Writes happen immediately,
> > but the output happens later.
> 
> My problems with per-CPU buffers is that:
> 
>     + I am not sure how big per-CPU buffers we could afford.
>       Any non-balanced usage increases the chance of loosing
>       messages.
> 
>     + The information is scattered and extra tools are needed
>       to locate the messages and sort them.

Considering this from the Embedded world, during board bring up, you
might be poking around in memory using a JTAG interface because your
serial driver is not working yet, or the kernel is crashing before the
serial port is up and running. The in-memory format needs to be quite
simple to read. I don't know how early_printk() factors into
this. Maybe that solves this problem?

      Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-21 12:23                     ` Petr Mladek
  2017-06-21 14:18                       ` Andrew Lunn
@ 2017-06-21 16:09                       ` Andrew Lunn
  2017-06-23  8:49                         ` Petr Mladek
  1 sibling, 1 reply; 60+ messages in thread
From: Andrew Lunn @ 2017-06-21 16:09 UTC (permalink / raw)
  To: Petr Mladek; +Cc: Peter Zijlstra, ksummit-discuss

> 7. Unreliable time stamps and sorting messages
> 
>    The current extra buffers (cont, printk_safe, printk_safe_nmi)
>    makes this worse. The timestamp is added later.

All you need is a sequence number. Since you have the CPU number
already, it could be a per CPU atomic. And you probably don't even
need to sort the output, if the sequence number is part of the
output. How bad is the reordering? Would a sequence space of 256 be
enough?

	Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-21 10:15       ` Sergey Senozhatsky
@ 2017-06-22 13:42         ` Daniel Vetter
  2017-06-22 13:48             ` Daniel Vetter
  2017-06-23  5:20           ` Sergey Senozhatsky
  0 siblings, 2 replies; 60+ messages in thread
From: Daniel Vetter @ 2017-06-22 13:42 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: ksummit-discuss

On Wed, Jun 21, 2017 at 12:15 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> Hello,
>
> On (06/20/17 20:45), Daniel Vetter wrote:
> [..]
>> My own pet peeve with printk from the drm side:
>>
>> Untangling printk form the console_lock horror show, if at all
>> possible. One problem is that heavy printk usage makes the
>> console_lock massively contended (we delay restoring the the kernel
>> console on resume in kms drivers to a worker because of that). The
>> worse problem is that console_lock locking is horribly monolithic and
>> defacto requires that large chunks of your gfx driver init code needs
>> to run while holding it. Which means no printk output, neither for
>> your gfx driver nor anything else while your cpu goes through
>> something like a few 100k lines of code (for big drivers).
>
> console_sem is definitely on my radar. I don't like it.
>
> first, it does way too many things. it protects the console drivers
> list (both for read and write accesses); it ensures that there is only
> one CPU actually doing printing; it also serves as a 'general' lock in
> some other parts of the kernel; its waiters must sleep in TASK_UNINTERRUPTIBLE
> state; its owner upon unlocking must print all the pending logbuf messages
> to the serial console; it even be acquired by user space processes (e.g.
> tty_open or cat /proc/consoles, etc) so systemd can print kernel messages
> and so on and on and on.
>
> there are many things that are not good about console_sem.
>
> besides, I'm really curious if there are any call sites (apart from
> vprintk_emit() and console_flush_on_panic()) that actually want to print
> all the pending kernel messages from console_unlock()? my expectation
> would be that console_lock() callers mostly just want to protect the console
> drivers list; and what console_unlock() does comes as a surprise.
>
>
>> We have a few pages of kerneldoc explaining how to debug this and what
>> happens, and gross debug hacks to just not take console_lock on driver
>> (and pray it won't race), but it's a constant trap for new gfx
>> hackers.
>>
>> Fixing console_lock is much less likely to happen, I (and better
>> hackers like Alan Cox) tried, and ran away in tears.
>
> well, I don't even pretend to be a kernel hacker, but what were
> the major obstacles?

The framebuffer notifier gets called from multiple places, with
multiple different execution contexts and locking contexts, for at
least 3 different things:
- Allowing fbdev console support to be built modular, and loaded in
any order wrt fbdev drivers and still end up with an fbcon console.
This results in a fbdev driver 2 fbcon call in
(un)register_framebuffer.
- Backlight handling. This means fbcon needs to call into the fb
notifier, giving you a locking inversion.
- Bunch of other things just for fun.

This means all the locking for all of the above things are entangled
through the modifiers mutex, with the result that defacto console_lock
protects everything, and that anything in fbdev that might call into
one of these notifier use-cases must use console_lock as the
outermost.

To make matters worse, drm kernel modesetting drivers then get called
in an fbdev->fbcon->fbdev->drm fbdev emulation->kms driver chain from
within register_framebuffer to do the initial modeset, and ofc that's
all under the console_lock.

Tears pretty much guaranteed, and after a few hacks from Alan&me I
concluded that the only way to fix this is to at least partially
rewrite fbdev (a dead subsystem, so no one's volunteering), with the
risk that you get to revert it all because someone is indeed relying
on that super-flexible module loading order sequence. The simplest fix
would probably be to make the entire fbdev->fbcon setup depency a
hard-coded function call, or maybe at most a one-shot symbol_get
attempt.

> the thing is,
> I want to convert console_sem into RW lock. I briefly mentioned it last
> year in Santa Fe, NM... and in fact I wrote *some sort* of a patch back
> (well, mostly to see if it can fly at all). the patch has never been
> finalized, polished, carefully written or properly tested (!).

RW won't help, the problem is the locking inversion design disaster in
fbdev/fbcon resulting in console_lock being an outermost lock procting
almost the entire fbdev world.

> it compiles and the system boots just fine. so it's definitely bug free (tm).
>
>
> the patch (sorry for calling it a patch) does several things.
>
>
> a) it turns console_unlock() into what it probably must be - just unlock
>    console_sem. so now pending logbuf messages are printed by real printk()
>    callers.
>
>
> b) it converts console_sem semaphore to rwsem
>
> hence a new locking API
>
>         extern void console_read_lock(void);
>         extern void console_write_lock(void);
>         extern int console_read_trylock(void);
>         extern int console_write_trylock(void);
>
>         extern void console_read_unlock(void);
>         extern void console_write_unlock(void);
>
>
>
> so now we can convert things like
>
>         console_lock();
>         for_each_console(c) {
>                 if (!c->device)
>                         continue;
>                 driver = c->device(c, index);
>                 if (driver)
>                         break;
>         }
>         console_unlock();
>
>
> which previously would
>         1) sleep in console_lock()
>         2) print all the pending messages from console_unlock()
>
>
>
> to a more logical
>
>         console_read_lock();
>         for_each_console(c) {
>                 if (!c->device)
>                         continue;
>                 driver = c->device(c, index);
>                 if (driver)
>                         break;
>         }
>         console_read_unlock();
>
>
>
> and at the same time, we still can have exclusive ownership of console
> drivers list when it's needed. for example
>
>         console_write_lock();
>         ....
>                 newcon->next = console_drivers->next;
>                 console_drivers->next = newcon;
>         ....
>         console_write_unlock();
>         console_tryflush();
>
>
>
> c) a bunch of new functions to flush pending messages.
>
> we used to have console_flush_on_panic() before, now we have two more
> functions. this is for the case if console_unlock() caller actually
> expected that if
>
>         extern void console_tryflush(void);
>         extern void console_flush(void);
>         extern void console_flush_on_panic(void);
>
> console_tryflush() does not sleep. console_flush() might sleep.
>
> why do we need them? who knows, may be some of console_unlock()
> callers actually want to flush logbuf, so they would need to
> explicitly console_tryflush()/console_flush() now. yeah, I know.
>
>
> d) it obsoletes 3 symbols that the rest of the system used to
>    lock/unlock console subsystem:
>                 extern void console_lock(void);
>                 extern int console_trylock(void);
>                 extern void console_unlock(void);
>
>
> the reason I kept those 3 is because converting all of the console_lock/unlock
> callers is waaaaaay too much job for a PoC ;) so I just converted those 3
> symbols instead. effectively they do down_write/up_write/etc at the moment;
> but ideally we better switch to console_read_lock/console_write_lock and
> friends. but I'll need a lot of help here because I have no idea whatsoever
> what's going on in vt.c and some any other places.
>
>
> so, JFYI, the patch is below. once again, it's not even in beta,
> boot tested only (!).

Sounds all like a very solid plan, and probably even works well if you
have serial consoles and nothing else. I expect pain when you try to
make this work with fbdev/fbcon.

Imo the situation is so bad that the simplest fix would be to burn
down fbcon and write a new minimal emergency console on top of just
kms. But there's unfortunately still a pile of hw only support by
fbdev and not kms :-/ Also, kmscon didn't see enough interest to get
polished for merging.

Cheers, Daniel


>
>
>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  include/linux/console.h |  16 +++
>  kernel/printk/printk.c  | 327 ++++++++++++++++++++++++++++++++++--------------
>  2 files changed, 246 insertions(+), 97 deletions(-)
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index b8920a031a3e..e7050c43645f 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -162,12 +162,28 @@ extern int add_preferred_console(char *name, int idx, char *options);
>  extern void register_console(struct console *);
>  extern int unregister_console(struct console *);
>  extern struct console *console_drivers;
> +
> +/* Obsolete, should be removed */
>  extern void console_lock(void);
>  extern int console_trylock(void);
>  extern void console_unlock(void);
> +
> +/* Use these ones instead */
> +extern void console_read_lock(void);
> +extern void console_write_lock(void);
> +extern int console_read_trylock(void);
> +extern int console_write_trylock(void);
> +
> +extern void console_read_unlock(void);
> +extern void console_write_unlock(void);
> +
>  extern void console_conditional_schedule(void);
>  extern void console_unblank(void);
> +
> +extern void console_tryflush(void);
> +extern void console_flush(void);
>  extern void console_flush_on_panic(void);
> +
>  extern struct tty_driver *console_device(int *);
>  extern void console_stop(struct console *);
>  extern void console_start(struct console *);
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index fc47863f629c..a73bafb0b9d1 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -78,16 +78,10 @@ EXPORT_SYMBOL(oops_in_progress);
>   * provides serialisation for access to the entire console
>   * driver system.
>   */
> -static DEFINE_SEMAPHORE(console_sem);
> +static DECLARE_RWSEM(console_sem);
>  struct console *console_drivers;
>  EXPORT_SYMBOL_GPL(console_drivers);
>
> -#ifdef CONFIG_LOCKDEP
> -static struct lockdep_map console_lock_dep_map = {
> -       .name = "console_lock"
> -};
> -#endif
> -
>  enum devkmsg_log_bits {
>         __DEVKMSG_LOG_BIT_ON = 0,
>         __DEVKMSG_LOG_BIT_OFF,
> @@ -206,46 +200,67 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
>  static int nr_ext_console_drivers;
>
>  /*
> - * Helper macros to handle lockdep when locking/unlocking console_sem. We use
> - * macros instead of functions so that _RET_IP_ contains useful information.
> + * Helper macros to handle lockdep when locking/unlocking console_sem.
>   */
> -#define down_console_sem() do { \
> -       down(&console_sem);\
> -       mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);\
> -} while (0)
> +#define console_sem_down_write() do {                                  \
> +               down_write(&console_sem);                               \
> +       } while (0)
> +
> +#define console_sem_down_read() do {                                   \
> +               down_read(&console_sem);                                \
> +       } while (0)
> +
> +static void console_sem_up_write(void)
> +{
> +       unsigned long flags;
> +
> +       printk_safe_enter_irqsave(flags);
> +       up_write(&console_sem);
> +       printk_safe_exit_irqrestore(flags);
> +}
>
> -static int __down_trylock_console_sem(unsigned long ip)
> +static void console_sem_up_read(void)
>  {
> -       int lock_failed;
> +       unsigned long flags;
> +
> +       printk_safe_enter_irqsave(flags);
> +       up_read(&console_sem);
> +       printk_safe_exit_irqrestore(flags);
> +}
> +
> +static int console_sem_read_trylock(void)
> +{
> +       int locked;
>         unsigned long flags;
>
>         /*
> -        * Here and in __up_console_sem() we need to be in safe mode,
> +        * Here, in __console_sem_write_trylock() ,and in
> +        * __console_sem_up_{write, read}() we need to be in safe mode,
>          * because spindump/WARN/etc from under console ->lock will
>          * deadlock in printk()->down_trylock_console_sem() otherwise.
>          */
>         printk_safe_enter_irqsave(flags);
> -       lock_failed = down_trylock(&console_sem);
> +       locked = down_read_trylock(&console_sem);
>         printk_safe_exit_irqrestore(flags);
>
> -       if (lock_failed)
> +       if (!locked)
>                 return 1;
> -       mutex_acquire(&console_lock_dep_map, 0, 1, ip);
>         return 0;
>  }
> -#define down_trylock_console_sem() __down_trylock_console_sem(_RET_IP_)
>
> -static void __up_console_sem(unsigned long ip)
> +static int console_sem_write_trylock(void)
>  {
> +       int locked;
>         unsigned long flags;
>
> -       mutex_release(&console_lock_dep_map, 1, ip);
> -
>         printk_safe_enter_irqsave(flags);
> -       up(&console_sem);
> +       locked = down_write_trylock(&console_sem);
>         printk_safe_exit_irqrestore(flags);
> +
> +       if (!locked)
> +               return 1;
> +       return 0;
>  }
> -#define up_console_sem() __up_console_sem(_RET_IP_)
>
>  /*
>   * This is used for debugging the mess that is the VT code by
> @@ -436,6 +451,26 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>
> +#define CONSOLE_EMIT_UNLOCKED_VAL      -1
> +#define CONSOLE_EMIT_LOCKED_VAL        1
> +
> +static int console_emit_lock = CONSOLE_EMIT_UNLOCKED_VAL;
> +
> +static int console_emit_trylock(void)
> +{
> +       if (cmpxchg(&console_emit_lock,
> +                       CONSOLE_EMIT_UNLOCKED_VAL,
> +                       CONSOLE_EMIT_LOCKED_VAL) == CONSOLE_EMIT_UNLOCKED_VAL)
> +               return 1;
> +
> +       return 0;
> +}
> +
> +static void console_emit_unlock(void)
> +{
> +       xchg(&console_emit_lock, CONSOLE_EMIT_UNLOCKED_VAL);
> +}
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1692,6 +1727,40 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c
>         return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len);
>  }
>
> +static void console_emit(bool may_schedule);
> +
> +/*
> + * Check if we can schedule in console_emit() from the current context.
> + */
> +static int may_console_schedule(void)
> +{
> +       /*
> +        * When PREEMPT_COUNT disabled we can't reliably detect if it's
> +        * safe to schedule (e.g. calling printk while holding a spin_lock),
> +        * because preempt_disable()/preempt_enable() are just barriers there
> +        * and preempt_count() is always 0.
> +        *
> +        * RCU read sections have a separate preemption counter when
> +        * PREEMPT_RCU enabled thus we must take extra care and check
> +        * rcu_preempt_depth(), otherwise RCU read sections modify
> +        * preempt_count().
> +        */
> +       return !oops_in_progress && preemptible() && !rcu_preempt_depth();
> +}
> +
> +static int console_trylock_for_emit(void)
> +{
> +       /* If trylock fails, someone else is doing the printing */
> +       if (!console_read_trylock())
> +               return 0;
> +
> +       if (console_emit_trylock())
> +               return 1;
> +
> +       console_read_unlock();
> +       return 0;
> +}
> +
>  asmlinkage int vprintk_emit(int facility, int level,
>                             const char *dict, size_t dictlen,
>                             const char *fmt, va_list args)
> @@ -1765,8 +1834,9 @@ asmlinkage int vprintk_emit(int facility, int level,
>                  * semaphore.  The release will print out buffers and wake up
>                  * /dev/kmsg and syslog() users.
>                  */
> -               if (console_trylock())
> -                       console_unlock();
> +               if (console_trylock_for_emit()) {
> +                       console_emit(may_console_schedule());
> +               }
>         }
>
>         return printed_len;
> @@ -2011,18 +2081,18 @@ void suspend_console(void)
>         if (!console_suspend_enabled)
>                 return;
>         printk("Suspending console(s) (use no_console_suspend to debug)\n");
> -       console_lock();
> +       console_write_lock();
>         console_suspended = 1;
> -       up_console_sem();
> +       console_sem_up_write();
>  }
>
>  void resume_console(void)
>  {
>         if (!console_suspend_enabled)
>                 return;
> -       down_console_sem();
> +       console_sem_down_write();
>         console_suspended = 0;
> -       console_unlock();
> +       console_write_unlock();
>  }
>
>  /**
> @@ -2038,70 +2108,75 @@ static int console_cpu_notify(unsigned int cpu)
>  {
>         if (!cpuhp_tasks_frozen) {
>                 /* If trylock fails, someone else is doing the printing */
> -               if (console_trylock())
> -                       console_unlock();
> +               if (console_trylock_for_emit())
> +                       console_emit(true);
>         }
>         return 0;
>  }
>
>  /**
> - * console_lock - lock the console system for exclusive use.
> - *
> - * Acquires a lock which guarantees that the caller has
> - * exclusive access to the console system and the console_drivers list.
> + * console_write_lock/console_read_lock - lock the console system
> + * in exclusive or shared more correspondingly.
>   *
>   * Can sleep, returns nothing.
>   */
> -void console_lock(void)
> +void console_write_lock(void)
>  {
>         might_sleep();
>
> -       down_console_sem();
> +       console_sem_down_write();
>         if (console_suspended)
>                 return;
> -       console_locked = 1;
> -       console_may_schedule = 1;
> +       console_locked++;
>  }
> -EXPORT_SYMBOL(console_lock);
> +EXPORT_SYMBOL(console_write_lock);
> +
> +void console_read_lock(void)
> +{
> +       might_sleep();
> +
> +       console_sem_down_read();
> +       if (console_suspended)
> +               return;
> +       console_locked++;
> +}
> +EXPORT_SYMBOL(console_read_lock);
>
>  /**
> - * console_trylock - try to lock the console system for exclusive use.
> - *
> - * Try to acquire a lock which guarantees that the caller has exclusive
> - * access to the console system and the console_drivers list.
> + * console_read_trylock/console_write_trylock - try to lock the console system
> + * for shared or exclusive use correspondingly.
>   *
>   * returns 1 on success, and 0 on failure to acquire the lock.
>   */
> -int console_trylock(void)
> +int console_read_trylock(void)
>  {
> -       if (down_trylock_console_sem())
> +       if (console_sem_read_trylock())
>                 return 0;
>         if (console_suspended) {
> -               up_console_sem();
> +               console_sem_up_read();
>                 return 0;
>         }
> -       console_locked = 1;
> -       /*
> -        * When PREEMPT_COUNT disabled we can't reliably detect if it's
> -        * safe to schedule (e.g. calling printk while holding a spin_lock),
> -        * because preempt_disable()/preempt_enable() are just barriers there
> -        * and preempt_count() is always 0.
> -        *
> -        * RCU read sections have a separate preemption counter when
> -        * PREEMPT_RCU enabled thus we must take extra care and check
> -        * rcu_preempt_depth(), otherwise RCU read sections modify
> -        * preempt_count().
> -        */
> -       console_may_schedule = !oops_in_progress &&
> -                       preemptible() &&
> -                       !rcu_preempt_depth();
> +       console_locked++;
>         return 1;
>  }
> -EXPORT_SYMBOL(console_trylock);
> +EXPORT_SYMBOL(console_read_trylock);
> +
> +int console_write_trylock(void)
> +{
> +       if (console_sem_write_trylock())
> +               return 0;
> +       if (console_suspended) {
> +               console_sem_up_write();
> +               return 0;
> +       }
> +       console_locked++;
> +       return 1;
> +}
> +EXPORT_SYMBOL(console_write_trylock);
>
>  int is_console_locked(void)
>  {
> -       return console_locked;
> +       return console_locked != 0;
>  }
>
>  /*
> @@ -2133,7 +2208,7 @@ static inline int can_use_console(void)
>  }
>
>  /**
> - * console_unlock - unlock the console system
> + * console_emit - print all pending messages and unlock the console system
>   *
>   * Releases the console_lock which the caller holds on the console system
>   * and the console driver list.
> @@ -2144,9 +2219,9 @@ static inline int can_use_console(void)
>   *
>   * If there is output waiting, we wake /dev/kmsg and syslog() users.
>   *
> - * console_unlock(); may be called from any context.
> + * console_emit(); may be called from any context.
>   */
> -void console_unlock(void)
> +static void console_emit(bool may_schedule)
>  {
>         static char ext_text[CONSOLE_EXT_LOG_MAX];
>         static char text[LOG_LINE_MAX + PREFIX_MAX];
> @@ -2156,7 +2231,8 @@ void console_unlock(void)
>         bool do_cond_resched, retry;
>
>         if (console_suspended) {
> -               up_console_sem();
> +               console_emit_unlock();
> +               console_sem_up_read();
>                 return;
>         }
>
> @@ -2174,7 +2250,7 @@ void console_unlock(void)
>          * context reliably. Therefore the value must be stored before
>          * and cleared after the the "again" goto label.
>          */
> -       do_cond_resched = console_may_schedule;
> +       do_cond_resched = may_schedule;
>  again:
>         console_may_schedule = 0;
>
> @@ -2184,8 +2260,9 @@ void console_unlock(void)
>          * console.
>          */
>         if (!can_use_console()) {
> -               console_locked = 0;
> -               up_console_sem();
> +               console_locked--;
> +               console_emit_unlock();
> +               console_sem_up_read();
>                 return;
>         }
>
> @@ -2249,7 +2326,7 @@ void console_unlock(void)
>                 if (do_cond_resched)
>                         cond_resched();
>         }
> -       console_locked = 0;
> +       console_locked--;
>
>         /* Release the exclusive_console once it is used */
>         if (unlikely(exclusive_console))
> @@ -2257,7 +2334,8 @@ void console_unlock(void)
>
>         raw_spin_unlock(&logbuf_lock);
>
> -       up_console_sem();
> +       console_emit_unlock();
> +       console_sem_up_read();
>
>         /*
>          * Someone could have filled up the buffer again, so re-check if there's
> @@ -2270,12 +2348,66 @@ void console_unlock(void)
>         raw_spin_unlock(&logbuf_lock);
>         printk_safe_exit_irqrestore(flags);
>
> -       if (retry && console_trylock())
> +       if (retry && console_trylock_for_emit())
>                 goto again;
>
>         if (wake_klogd)
>                 wake_up_klogd();
>  }
> +
> +void console_tryflush(void)
> +{
> +       if (console_trylock_for_emit())
> +               console_emit(may_console_schedule());
> +}
> +EXPORT_SYMBOL(console_tryflush);
> +
> +void console_flush(void)
> +{
> +       might_sleep();
> +
> +       console_read_lock();
> +
> +       if (console_emit_trylock())
> +               console_emit(true);
> +       else
> +               console_read_unlock();
> +}
> +EXPORT_SYMBOL(console_flush);
> +
> +void console_write_unlock(void)
> +{
> +       console_locked--;
> +       console_sem_up_write();
> +}
> +EXPORT_SYMBOL(console_write_unlock);
> +
> +void console_read_unlock(void)
> +{
> +       console_locked--;
> +       console_sem_up_read();
> +}
> +EXPORT_SYMBOL(console_read_unlock);
> +
> +/*
> + * For compatibility only!
> + */
> +extern void console_lock(void)
> +{
> +       console_write_lock();
> +}
> +EXPORT_SYMBOL(console_lock);
> +
> +extern int console_trylock(void)
> +{
> +       return console_write_trylock();
> +}
> +EXPORT_SYMBOL(console_trylock);
> +
> +extern void console_unlock(void)
> +{
> +       console_write_unlock();
> +}
>  EXPORT_SYMBOL(console_unlock);
>
>  /**
> @@ -2285,7 +2417,7 @@ EXPORT_SYMBOL(console_unlock);
>   * if this CPU should yield the CPU to another task, do
>   * so here.
>   *
> - * Must be called within console_lock();.
> + * Must be called under console_sem
>   */
>  void __sched console_conditional_schedule(void)
>  {
> @@ -2303,17 +2435,16 @@ void console_unblank(void)
>          * oops_in_progress is set to 1..
>          */
>         if (oops_in_progress) {
> -               if (down_trylock_console_sem() != 0)
> +               if (console_write_trylock() != 0)
>                         return;
>         } else
> -               console_lock();
> +               console_write_lock();
>
> -       console_locked = 1;
> -       console_may_schedule = 0;
> +       console_locked++;
>         for_each_console(c)
>                 if ((c->flags & CON_ENABLED) && c->unblank)
>                         c->unblank();
> -       console_unlock();
> +       console_write_unlock();
>  }
>
>  /**
> @@ -2330,9 +2461,10 @@ void console_flush_on_panic(void)
>          * context and we don't want to get preempted while flushing,
>          * ensure may_schedule is cleared.
>          */
> -       console_trylock();
> -       console_may_schedule = 0;
> -       console_unlock();
> +       console_read_trylock();
> +       xchg(&console_emit_lock, CONSOLE_EMIT_LOCKED_VAL);
> +       /* Forbid resheduling */
> +       console_emit(0);
>  }
>
>  /*
> @@ -2343,7 +2475,7 @@ struct tty_driver *console_device(int *index)
>         struct console *c;
>         struct tty_driver *driver = NULL;
>
> -       console_lock();
> +       console_read_lock();
>         for_each_console(c) {
>                 if (!c->device)
>                         continue;
> @@ -2351,7 +2483,7 @@ struct tty_driver *console_device(int *index)
>                 if (driver)
>                         break;
>         }
> -       console_unlock();
> +       console_read_unlock();
>         return driver;
>  }
>
> @@ -2362,17 +2494,17 @@ struct tty_driver *console_device(int *index)
>   */
>  void console_stop(struct console *console)
>  {
> -       console_lock();
> +       console_write_lock();
>         console->flags &= ~CON_ENABLED;
> -       console_unlock();
> +       console_write_unlock();
>  }
>  EXPORT_SYMBOL(console_stop);
>
>  void console_start(struct console *console)
>  {
> -       console_lock();
> +       console_write_lock();
>         console->flags |= CON_ENABLED;
> -       console_unlock();
> +       console_write_unlock();
>  }
>  EXPORT_SYMBOL(console_start);
>
> @@ -2512,7 +2644,7 @@ void register_console(struct console *newcon)
>          *      Put this console in the list - keep the
>          *      preferred driver at the head of the list.
>          */
> -       console_lock();
> +       console_write_lock();
>         if ((newcon->flags & CON_CONSDEV) || console_drivers == NULL) {
>                 newcon->next = console_drivers;
>                 console_drivers = newcon;
> @@ -2543,7 +2675,8 @@ void register_console(struct console *newcon)
>                  */
>                 exclusive_console = newcon;
>         }
> -       console_unlock();
> +       console_write_unlock();
> +       console_tryflush();
>         console_sysfs_notify();
>
>         /*
> @@ -2583,7 +2716,7 @@ int unregister_console(struct console *console)
>                 return res;
>
>         res = 1;
> -       console_lock();
> +       console_write_lock();
>         if (console_drivers == console) {
>                 console_drivers=console->next;
>                 res = 0;
> @@ -2609,7 +2742,7 @@ int unregister_console(struct console *console)
>                 console_drivers->flags |= CON_CONSDEV;
>
>         console->flags &= ~CON_ENABLED;
> -       console_unlock();
> +       console_write_unlock();
>         console_sysfs_notify();
>         return res;
>  }
> @@ -2697,8 +2830,8 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
>
>         if (pending & PRINTK_PENDING_OUTPUT) {
>                 /* If trylock fails, someone else is doing the printing */
> -               if (console_trylock())
> -                       console_unlock();
> +               if (console_trylock_for_emit())
> +                       console_emit(false);
>         }
>
>         if (pending & PRINTK_PENDING_WAKEUP)
> --
> 2.13.1
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-22 13:42         ` Daniel Vetter
@ 2017-06-22 13:48             ` Daniel Vetter
  2017-06-23  5:20           ` Sergey Senozhatsky
  1 sibling, 0 replies; 60+ messages in thread
From: Daniel Vetter @ 2017-06-22 13:48 UTC (permalink / raw)
  To: Sergey Senozhatsky, Bartlomiej Zolnierkiewicz,
	Linux Fbdev development list
  Cc: ksummit-discuss

On Thu, Jun 22, 2017 at 3:42 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> Tears pretty much guaranteed, and after a few hacks from Alan&me I
> concluded that the only way to fix this is to at least partially
> rewrite fbdev (a dead subsystem, so no one's volunteering), with the
> risk that you get to revert it all because someone is indeed relying
> on that super-flexible module loading order sequence. The simplest fix
> would probably be to make the entire fbdev->fbcon setup depency a
> hard-coded function call, or maybe at most a one-shot symbol_get
> attempt.

I did once hash out a plan how to fix this with the least amount of pain:

1. Merge a patch to build the fbcon support into the overall fb.ko
module, so that the dynamic loading nonsense is essentially disabled,
and fbcon becomes a Kconfig/compile-time only option, no longer a
runtime-selectable thing.

2. Wait 1 year and pray that no one reports a regression. If you're
unlucky, try to fence them of with a runtime option to disable fbcon.

3. Rip out the notifier nonsense and replace it by direct function
calls. You can only do that once 1. won't be reverted anymore.

4. Push the console_lock donw the callchains until it's again at the
right spots, auditing all the other stuff meanwhile to make sure the
locking is still correct.

5. Apply your patch to make console_lock sane.

Adding fbdev maintainers and lists just.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
@ 2017-06-22 13:48             ` Daniel Vetter
  0 siblings, 0 replies; 60+ messages in thread
From: Daniel Vetter @ 2017-06-22 13:48 UTC (permalink / raw)
  To: Sergey Senozhatsky, Bartlomiej Zolnierkiewicz,
	Linux Fbdev development list
  Cc: ksummit-discuss

On Thu, Jun 22, 2017 at 3:42 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> Tears pretty much guaranteed, and after a few hacks from Alan&me I
> concluded that the only way to fix this is to at least partially
> rewrite fbdev (a dead subsystem, so no one's volunteering), with the
> risk that you get to revert it all because someone is indeed relying
> on that super-flexible module loading order sequence. The simplest fix
> would probably be to make the entire fbdev->fbcon setup depency a
> hard-coded function call, or maybe at most a one-shot symbol_get
> attempt.

I did once hash out a plan how to fix this with the least amount of pain:

1. Merge a patch to build the fbcon support into the overall fb.ko
module, so that the dynamic loading nonsense is essentially disabled,
and fbcon becomes a Kconfig/compile-time only option, no longer a
runtime-selectable thing.

2. Wait 1 year and pray that no one reports a regression. If you're
unlucky, try to fence them of with a runtime option to disable fbcon.

3. Rip out the notifier nonsense and replace it by direct function
calls. You can only do that once 1. won't be reverted anymore.

4. Push the console_lock donw the callchains until it's again at the
right spots, auditing all the other stuff meanwhile to make sure the
locking is still correct.

5. Apply your patch to make console_lock sane.

Adding fbdev maintainers and lists just.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-21 11:12                     ` Sergey Senozhatsky
@ 2017-06-22 14:06                       ` Steven Rostedt
  2017-06-23  5:43                         ` Sergey Senozhatsky
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2017-06-22 14:06 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Peter Zijlstra, ksummit-discuss

On Wed, 21 Jun 2017 20:12:10 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:


> I thought about it, and the question is:
> would lockless per-CPU logbuffers buy us anything? we used to have

Well, I'm not 100% happy with the current NMI approach. There is still
no "print everything" from NMI. That is, prints from NMI must be placed
in a buffer before going out, and that limits how much can be printed.
And an ftrace_dump_on_oops can be huge.

> problems with the logbuf_lock, but not any more (to the best of my
> knowledge). we deal with logbuf_lock deadlocks using printk_nmi and
> printk_safe. so I'd say that logbuf_lock probably doesn't bother us
> anymore, it's all those locks that printk can't control that bother
> us (semaphore, scheduler, timekeeping, serial consoles, etc. etc.).
> 
> so would per-CPU logbufs be better? we would need to do N-way merge
> (N per-CPU logbufs) when we print the kernel messages log, correct?

Yes.

> 
> > 2) have two types of console interfaces. A normal and a critical.
> > 
> > 3) have a thread that is woken whenever there is data in any of the
> > buffers, and reads the buffers, again lockless. But to do this in a
> > reasonable manner, unless you break the printks up in sub buffers like
> > ftrace, if the consumer isn't fast enough, newer messages are dropped.  
> 
> yes, so I definitely want to have printing offloading. but, per my
> experience, it's not all so simple when it comes to offloading. if
> we would compare offloading with the direct printing then offloading
> does change printk behaviour and I saw a number of dropped messages
> bug reports because of offloading. the existing direct printing can
> throttle the CPU that printks a lot.
> 
> direct printing
> 
> 	CPU1
> 
> 	printk
> 	call_console_drivers
> 	printk
> 	call_console_drivers
> 	...
> 	printk
> 	call_console_drivers
> 
> 
> so new logbuf entries do not appear in the logbuf until the previous
> ones are printed to the serial console. while with the offloading
> it's different:
> 
> offloading
> 
> 	CPU1				CPU2
> 	printk
> 	printk				call_console_drivers
> 	printk
> 	printk				call_console_drivers
> 	printk
> 					call_console_drivers
> 
> new logbuf entries now appear uncontrollably.
> 
> well, nothing new here. we already can have hit scenario, we just need
> one CPU spinning in console_unlock() and one or several CPUs doing
> printk. but with offloading we potentially break a trivial case - a
> single CPU that calls printk.

We could come up with another way to throttle the CPU that does all the
printks.

> 
> 
> so may be additionally to offloading we also would need some sort of
> throttling mechanism in printk.

Yes.


-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19  5:21 [Ksummit-discuss] [TECH TOPIC] printk redesign Sergey Senozhatsky
  2017-06-19  6:22 ` Hannes Reinecke
  2017-06-19 23:46 ` Josh Triplett
@ 2017-06-22 16:35 ` David Howells
  2017-07-19  6:24 ` Sergey Senozhatsky
  3 siblings, 0 replies; 60+ messages in thread
From: David Howells @ 2017-06-22 16:35 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: ksummit-discuss

Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> 	I, Petr Mladek and Steven Rostedt would like to propose a printk
> tech topic (as suggested by Steven). We are currently exploring the idea
> of complete redesign and rework of printk and it would be extremely helpful
> to hear from the community. printk serves different purposes, and some of
> requirements of printk tend to contradict each other; printk is monolithic
> and quite heavy, no wonder, it causes problems sometimes.

One thing I've been looking at is improving mount by creating a context in the
kernel that can be used to allow better parameter passing to the kernel - and
*also* better passage of error information back to userspace rather than
dumping it in dmesg.

I originally put a supplementary error message buffer in the aforementioned
context, but then moved it to task_struct so that I could pass error message
from nfs4 mount (which uses a pathwalk over submounts inside its mount) and
automount.  However, putting it in the task_struct makes it more generally
available.

Attached is the supplementary error message patch that I've implemented.  It
only allows for one message at a time at the moment - and must be enabled by a
process to work for that process.

See:

	Subject: [RFC][PATCH 00/27] VFS: Introduce filesystem context [ver #5]

posted to LKML, linux-nfs, linux-fsdevel and linux-security-module on Jun
14th to see it being used.

I should note that Al thinks that it's not a good idea because it's generic.

David
---
commit 44e890888ad8fb6f605abc66bcc0948580834eba
Author: David Howells <dhowells@redhat.com>
Date:   Thu Jun 8 16:07:08 2017 +0100

    Provide supplementary error message facility
    
    Provide a way for the kernel to pass supplementary error messages to
    userspace.  This will make it easier for userspace, particularly in
    containers to find out what went wrong during mounts and automounts, but is
    also made available to any other syscalls that want to use it.
    
    Two prctl() functions are added for this:
    
     (1) int old_setting = prctl(PR_ERRMSG_ENABLE, int setting);
    
         Enable (setting == 1) or disable (setting == 0) the facility.
         Disabling the facility clears the error buffer.
    
     (2) int size = prctl(PR_ERRMSG_READ, char *buffer, int buf_size);
    
         Reads the next error string into the buffer.  The string is truncated
         if it won't fit.  Strings are discarded as they're read.
    
         If there isn't a string, ENODATA is indicated.
    
    I've done it this way rather than a proc file because procfs might not be
    accessible.
    
    The interface inside the kernel is a pair of macros:
    
     (*) void errorf(const char *fmt, ...);
     (*) int invalf(const char *fmt, ...);
    
    Both of them snprintf() the string into the current process's error message
    buffer if the facility is enabled.  The string is truncated if it exceeds
    the limit.  invalf() returns -EINVAL whereas errof() has no return.
    
    Note that this is very crude and could be made to store multiple strings,
    allocate storage as required and not duplicate unformatted strings that are
    stored in the rodata section (like kvasprintf_const).  Unfortunately,
    specially handling rodata strings wouldn't gain a lot as most strings are
    likely to be in modules, where the string's life can be terminated by
    rmmod.
    
    Signed-off-by: David Howells <dhowells@redhat.com>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2b69fc650201..a6002b60b0b9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1051,6 +1051,8 @@ struct task_struct {
 	/* Used by LSM modules for access restriction: */
 	void				*security;
 #endif
+#define ERROR_MSG_SIZE 256
+	char				*error_msg;
 	/* CPU-specific state of this task: */
 	struct thread_struct		thread;
 
@@ -1573,4 +1575,31 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 #define TASK_SIZE_OF(tsk)	TASK_SIZE
 #endif
 
+/**
+ * errorf - Store supplementary error message
+ * @fmt: The format string
+ *
+ * Store the supplementary error message for the process if the process has
+ * enabled the facility.
+ */
+#define errorf(fmt, ...)			\
+	do {					\
+		if (current->error_msg)					\
+			snprintf(current->error_msg, ERROR_MSG_SIZE, fmt, ## __VA_ARGS__); \
+	} while(0)
+
+/**
+ * invalf - Store supplementary invalid argument error message
+ * @fmt: The format string
+ *
+ * Store the supplementary error message for the process if the process has
+ * enabled the facility and return -EINVAL.
+ */
+#define invalf(fmt, ...)			\
+	({					\
+		errorf(fmt, ## __VA_ARGS__);	\
+		-EINVAL;			\
+	})
+
+
 #endif
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759a9e40..b1203850dac8 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,10 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/*
+ * Control the supplementary error message gathering facility.
+ */
+#define PR_ERRMSG_ENABLE		48
+#define PR_ERRMSG_READ			49
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index 516acdb0e0ec..31b8617aee04 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -932,6 +932,7 @@ void __noreturn do_exit(long code)
 		__this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied);
 	exit_rcu();
 	TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));
+	kfree(tsk->error_msg);
 
 	do_task_dead();
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index e53770d2bf95..177b4c82fcb9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1912,6 +1912,7 @@ static __latent_entropy struct task_struct *copy_process(
 
 	trace_task_newtask(p, clone_flags);
 	uprobe_copy_process(p, clone_flags);
+	p->error_msg = NULL;
 
 	return p;
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 8a94b4eabcaa..b784905c4806 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2295,6 +2295,44 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_GET_FP_MODE:
 		error = GET_FP_MODE(me);
 		break;
+
+	case PR_ERRMSG_ENABLE:
+		switch (arg2) {
+		case 0:
+			if (!current->error_msg)
+				return 0;
+			kfree(current->error_msg);
+			current->error_msg = NULL;
+			return 1;
+		case 1:
+			if (current->error_msg)
+				return 1;
+			current->error_msg = kmalloc(ERROR_MSG_SIZE, GFP_KERNEL);
+			if (!current->error_msg)
+				return -ENOMEM;
+			current->error_msg[0] = 0;
+			return 0;
+		default:
+			error = -EINVAL;
+			break;
+		}
+		break;
+
+	case PR_ERRMSG_READ:
+		if (!arg2 || !arg3)
+			return -EINVAL;
+		if (!current->error_msg)
+			return -EINVAL;
+		if (!current->error_msg[0])
+			return -ENODATA;
+		error = strlen(current->error_msg);
+		if (arg3 < error)
+			error = arg3;
+		if (copy_to_user((char __user *)arg2, current->error_msg, error))
+			return -EFAULT;
+		current->error_msg[0] = 0;
+		return error;
+
 	default:
 		error = -EINVAL;
 		break;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-22 13:42         ` Daniel Vetter
  2017-06-22 13:48             ` Daniel Vetter
@ 2017-06-23  5:20           ` Sergey Senozhatsky
  1 sibling, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-23  5:20 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit-discuss

On (06/22/17 15:42), Daniel Vetter wrote:
[..]
> >> Fixing console_lock is much less likely to happen, I (and better
> >> hackers like Alan Cox) tried, and ran away in tears.
> >
> > well, I don't even pretend to be a kernel hacker, but what were
> > the major obstacles?
> 
> The framebuffer notifier gets called from multiple places, with
> multiple different execution contexts and locking contexts, for at
> least 3 different things:
> - Allowing fbdev console support to be built modular, and loaded in
> any order wrt fbdev drivers and still end up with an fbcon console.
> This results in a fbdev driver 2 fbcon call in
> (un)register_framebuffer.
> - Backlight handling. This means fbcon needs to call into the fb
> notifier, giving you a locking inversion.
> - Bunch of other things just for fun.
> 
> This means all the locking for all of the above things are entangled
> through the modifiers mutex, with the result that defacto console_lock
> protects everything, and that anything in fbdev that might call into
> one of these notifier use-cases must use console_lock as the
> outermost.
> 
> To make matters worse, drm kernel modesetting drivers then get called
> in an fbdev->fbcon->fbdev->drm fbdev emulation->kms driver chain from
> within register_framebuffer to do the initial modeset, and ofc that's
> all under the console_lock.
> 
> Tears pretty much guaranteed, and after a few hacks from Alan&me I
> concluded that the only way to fix this is to at least partially
> rewrite fbdev (a dead subsystem, so no one's volunteering), with the
> risk that you get to revert it all because someone is indeed relying
> on that super-flexible module loading order sequence. The simplest fix
> would probably be to make the entire fbdev->fbcon setup depency a
> hard-coded function call, or maybe at most a one-shot symbol_get
> attempt.

many thanks.

I played with the RW console_sem last night... and, yes, I was pretty sad.

just in case, if we ever will return back, I improved "the patch"
a bit, there are still several broken places (console_may_schedule()
on !PREEMPT systems for instance), and so on. uploaded here:
https://github.com/sergey-senozhatsky/linux-next-ss/commits/console_RW_sem
so at least we won't lose the patch, may be.

but, like you said, we've got a much a bigger problems.


> > the thing is,
> > I want to convert console_sem into RW lock. I briefly mentioned it last
> > year in Santa Fe, NM... and in fact I wrote *some sort* of a patch back
> > (well, mostly to see if it can fly at all). the patch has never been
> > finalized, polished, carefully written or properly tested (!).
> 
> RW won't help, the problem is the locking inversion design disaster in
> fbdev/fbcon resulting in console_lock being an outermost lock procting
> almost the entire fbdev world.

yes. console_sem is a *Big Konsole Lock* (German spelling, I believe) as
of now. it's way too important and in most of the cases it must be acquired
in exclusive (write) mode. So we are losing the whole point of convertion.
we can, of course, switch printk to rwsem and keep console_lock() as a
wrapper around console_write_lock(), but the benefit of such excerise is
questionable. I was hoping to see more places where we can be in read lock
mode.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-22 14:06                       ` Steven Rostedt
@ 2017-06-23  5:43                         ` Sergey Senozhatsky
  2017-06-23 13:09                           ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-23  5:43 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Peter Zijlstra, ksummit-discuss

On (06/22/17 10:06), Steven Rostedt wrote:
> On Wed, 21 Jun 2017 20:12:10 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>
> > I thought about it, and the question is:
> > would lockless per-CPU logbuffers buy us anything? we used to have
> 
> Well, I'm not 100% happy with the current NMI approach.

that's a good point.

> There is still no "print everything" from NMI. That is, prints from NMI
> must be placed in a buffer before going out, and that limits how much
> can be printed. And an ftrace_dump_on_oops can be huge.

hm, ok.

_just thinking out loud_

currently we've got a _very_ accidental fix/hack that lets us to bypass
printk_nmi per-CPU buffers in most of the cases and print NMI messages
directly to the logbuf, with the only exceptional case (when we store the
messages to the per-CPU printk_nmi buffer) being the case when NMI printk
happens on the CPU that owned the logbuf_lock at the time when NMI occurred
on that CPU. which is may be narrow enough. so we can keep printk_nmi and
printk_safe per-CPU buffers relatively small in size, and instead make
only logbuf really huge. with per-CPU logbufs design we would need to make
each CPU's buffer huge unconditionally.

but, at the same time, with the current implementation, there is a
possibility that we will have to make both logbuf and per-CPU buffers
really huge.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-21 14:18                       ` Andrew Lunn
@ 2017-06-23  8:46                         ` Petr Mladek
  0 siblings, 0 replies; 60+ messages in thread
From: Petr Mladek @ 2017-06-23  8:46 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Peter Zijlstra, ksummit-discuss

On Wed 2017-06-21 16:18:08, Andrew Lunn wrote:
> On Wed, Jun 21, 2017 at 02:23:04PM +0200, Petr Mladek wrote:
> > On Tue 2017-06-20 19:28:58, Steven Rostedt wrote:
> > > I've thought about this a little too.
> > > 
> > > I would like printk to have per-cpu buffers. Then we don't even need to
> > > store the CPU number, that would be explicit by which buffer the data
> > > is stored in.
> > > 
> > > The one thing that is needed, is the consumer. In ftrace, it's whatever
> > > reads the buffer, which is usually user space, but can be the kernel
> > > (see sysctl-z). But there's only one consumer at a time.
> > > 
> > > I was thinking about a new design for printk. Similar to ftrace, but
> > > different.
> > > 
> > > 1) have per cpu buffers, that are lockless. Writes happen immediately,
> > > but the output happens later.
> > 
> > My problems with per-CPU buffers is that:
> > 
> >     + I am not sure how big per-CPU buffers we could afford.
> >       Any non-balanced usage increases the chance of loosing
> >       messages.
> > 
> >     + The information is scattered and extra tools are needed
> >       to locate the messages and sort them.
> 
> Considering this from the Embedded world, during board bring up, you
> might be poking around in memory using a JTAG interface because your
> serial driver is not working yet, or the kernel is crashing before the
> serial port is up and running. The in-memory format needs to be quite
> simple to read. I don't know how early_printk() factors into
> this. Maybe that solves this problem?

Early printk just directly writes the messages on an early console.
They are not stored anywhere.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-21 16:09                       ` Andrew Lunn
@ 2017-06-23  8:49                         ` Petr Mladek
  0 siblings, 0 replies; 60+ messages in thread
From: Petr Mladek @ 2017-06-23  8:49 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Peter Zijlstra, ksummit-discuss

On Wed 2017-06-21 18:09:51, Andrew Lunn wrote:
> > 7. Unreliable time stamps and sorting messages
> > 
> >    The current extra buffers (cont, printk_safe, printk_safe_nmi)
> >    makes this worse. The timestamp is added later.
> 
> All you need is a sequence number. Since you have the CPU number
> already, it could be a per CPU atomic. And you probably don't even
> need to sort the output, if the sequence number is part of the
> output.

Yup.

> How bad is the reordering? Would a sequence space of 256 be
> enough?

That is very good question and we need a feedback here. I am not sure
what fixes are worth the complexity.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-22 13:48             ` Daniel Vetter
@ 2017-06-23  9:07               ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 60+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2017-06-23  9:07 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit-discuss, Linux Fbdev development list


Hi,

On Thursday, June 22, 2017 03:48:34 PM Daniel Vetter wrote:
> On Thu, Jun 22, 2017 at 3:42 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > Tears pretty much guaranteed, and after a few hacks from Alan&me I
> > concluded that the only way to fix this is to at least partially
> > rewrite fbdev (a dead subsystem, so no one's volunteering), with the
> > risk that you get to revert it all because someone is indeed relying
> > on that super-flexible module loading order sequence. The simplest fix
> > would probably be to make the entire fbdev->fbcon setup depency a
> > hard-coded function call, or maybe at most a one-shot symbol_get
> > attempt.
> 
> I did once hash out a plan how to fix this with the least amount of pain:
> 
> 1. Merge a patch to build the fbcon support into the overall fb.ko
> module, so that the dynamic loading nonsense is essentially disabled,
> and fbcon becomes a Kconfig/compile-time only option, no longer a
> runtime-selectable thing.
> 
> 2. Wait 1 year and pray that no one reports a regression. If you're
> unlucky, try to fence them of with a runtime option to disable fbcon.
> 
> 3. Rip out the notifier nonsense and replace it by direct function
> calls. You can only do that once 1. won't be reverted anymore.
> 
> 4. Push the console_lock donw the callchains until it's again at the
> right spots, auditing all the other stuff meanwhile to make sure the
> locking is still correct.
> 
> 5. Apply your patch to make console_lock sane.
> 
> Adding fbdev maintainers and lists just.

Your plan sounds good to me.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
@ 2017-06-23  9:07               ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 60+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2017-06-23  9:07 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit-discuss, Linux Fbdev development list


Hi,

On Thursday, June 22, 2017 03:48:34 PM Daniel Vetter wrote:
> On Thu, Jun 22, 2017 at 3:42 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > Tears pretty much guaranteed, and after a few hacks from Alan&me I
> > concluded that the only way to fix this is to at least partially
> > rewrite fbdev (a dead subsystem, so no one's volunteering), with the
> > risk that you get to revert it all because someone is indeed relying
> > on that super-flexible module loading order sequence. The simplest fix
> > would probably be to make the entire fbdev->fbcon setup depency a
> > hard-coded function call, or maybe at most a one-shot symbol_get
> > attempt.
> 
> I did once hash out a plan how to fix this with the least amount of pain:
> 
> 1. Merge a patch to build the fbcon support into the overall fb.ko
> module, so that the dynamic loading nonsense is essentially disabled,
> and fbcon becomes a Kconfig/compile-time only option, no longer a
> runtime-selectable thing.
> 
> 2. Wait 1 year and pray that no one reports a regression. If you're
> unlucky, try to fence them of with a runtime option to disable fbcon.
> 
> 3. Rip out the notifier nonsense and replace it by direct function
> calls. You can only do that once 1. won't be reverted anymore.
> 
> 4. Push the console_lock donw the callchains until it's again at the
> right spots, auditing all the other stuff meanwhile to make sure the
> locking is still correct.
> 
> 5. Apply your patch to make console_lock sane.
> 
> Adding fbdev maintainers and lists just.

Your plan sounds good to me.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-23  5:43                         ` Sergey Senozhatsky
@ 2017-06-23 13:09                           ` Steven Rostedt
  0 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2017-06-23 13:09 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Peter Zijlstra, ksummit-discuss

On Fri, 23 Jun 2017 14:43:22 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> currently we've got a _very_ accidental fix/hack that lets us to bypass
> printk_nmi per-CPU buffers in most of the cases and print NMI messages
> directly to the logbuf, with the only exceptional case (when we store the
> messages to the per-CPU printk_nmi buffer) being the case when NMI printk
> happens on the CPU that owned the logbuf_lock at the time when NMI occurred
> on that CPU. which is may be narrow enough. so we can keep printk_nmi and
> printk_safe per-CPU buffers relatively small in size, and instead make
> only logbuf really huge. with per-CPU logbufs design we would need to make
> each CPU's buffer huge unconditionally.
> 
> but, at the same time, with the current implementation, there is a
> possibility that we will have to make both logbuf and per-CPU buffers
> really huge.

Ideally, I would love to have a way that an NMI (on opps) can have a
way to just print directly to the consoles, without having to use the
log buffers at all.

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19 16:26       ` Steven Rostedt
  2017-06-19 16:35         ` Andrew Lunn
@ 2017-06-24 11:14         ` Mauro Carvalho Chehab
  2017-06-24 14:06           ` Andrew Lunn
  1 sibling, 1 reply; 60+ messages in thread
From: Mauro Carvalho Chehab @ 2017-06-24 11:14 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

Em Mon, 19 Jun 2017 12:26:51 -0400
Steven Rostedt <rostedt@goodmis.org> escreveu:

> On Mon, 19 Jun 2017 17:20:55 +0200
> Andrew Lunn <andrew@lunn.ch> wrote:
> 
> 
> > > 4) All other kernel information that's not critical at all, and perhaps
> > > doesn't even need to be serialized. At least, not against the above.
> > > This could be cached, and outputted at a later time than when the
> > > printk() was called.    
> > 
> > Developers machine probably have different requirements to production
> > machines. When debugging during code development, i want the debug
> > output to be in the correct order, independent of the level. If you
> > are going to cause reordering, you might want to add a sequence number
> > to each output, so it is possible to put it back into the correct
> > order. And it needs to be clear when output is out of order.  

Even for developers, ensuring the correct order is a must.

If someone reports me a bug that the Kernel crashed because of some
driver, I would ask him to enable debug messages and send the crash
logs with it enabled. 

> Perhaps a kernel command line option could force this.

I'd say that a Kconfig option can also be used for that. In
the case of media, we have a config option that enables extra
debugging features (CONFIG_VIDEO_ADV_DEBUG). It seems that other
subsystems have similar things:

	$ grep ADV .config|grep DEB
	CONFIG_PM_ADVANCED_DEBUG=y
	CONFIG_BATMAN_ADV_DEBUGFS=y
	CONFIG_BATMAN_ADV_DEBUG=y
	CONFIG_VIDEO_ADV_DEBUG=y

Well, if any of such options are selected, I'd say that ordering
the logs should be the default.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-24 11:14         ` Mauro Carvalho Chehab
@ 2017-06-24 14:06           ` Andrew Lunn
  2017-06-24 22:42             ` Steven Rostedt
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Lunn @ 2017-06-24 14:06 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: ksummit-discuss

On Sat, Jun 24, 2017 at 08:14:11AM -0300, Mauro Carvalho Chehab wrote:
> Em Mon, 19 Jun 2017 12:26:51 -0400
> Steven Rostedt <rostedt@goodmis.org> escreveu:
> 
> > On Mon, 19 Jun 2017 17:20:55 +0200
> > Andrew Lunn <andrew@lunn.ch> wrote:
> > 
> > 
> > > > 4) All other kernel information that's not critical at all, and perhaps
> > > > doesn't even need to be serialized. At least, not against the above.
> > > > This could be cached, and outputted at a later time than when the
> > > > printk() was called.    
> > > 
> > > Developers machine probably have different requirements to production
> > > machines. When debugging during code development, i want the debug
> > > output to be in the correct order, independent of the level. If you
> > > are going to cause reordering, you might want to add a sequence number
> > > to each output, so it is possible to put it back into the correct
> > > order. And it needs to be clear when output is out of order.  
> 
> Even for developers, ensuring the correct order is a must.
> 
> If someone reports me a bug that the Kernel crashed because of some
> driver, I would ask him to enable debug messages and send the crash
> logs with it enabled. 

We also need a clear indication that ordering is broken and should not
be trusted when we get a crash log from some random user.

   Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-24 14:06           ` Andrew Lunn
@ 2017-06-24 22:42             ` Steven Rostedt
  2017-06-24 23:21               ` Andrew Lunn
  0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2017-06-24 22:42 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Mauro Carvalho Chehab, ksummit-discuss

On Sat, 24 Jun 2017 16:06:59 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > If someone reports me a bug that the Kernel crashed because of some
> > driver, I would ask him to enable debug messages and send the crash
> > logs with it enabled.   
> 
> We also need a clear indication that ordering is broken and should not
> be trusted when we get a crash log from some random user.

When ordering is not ensured, then we can force timestamps. And I like
the idea of the time stamp happening when the printk is called, not
when it actually goes to the buffer. That way, we could see the
timestamps out of order, and sequence them during postmortem.

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-24 22:42             ` Steven Rostedt
@ 2017-06-24 23:21               ` Andrew Lunn
  2017-06-24 23:26                 ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Lunn @ 2017-06-24 23:21 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mauro Carvalho Chehab, ksummit-discuss

On Sat, Jun 24, 2017 at 06:42:16PM -0400, Steven Rostedt wrote:
> On Sat, 24 Jun 2017 16:06:59 +0200
> Andrew Lunn <andrew@lunn.ch> wrote:
> 
> > > If someone reports me a bug that the Kernel crashed because of some
> > > driver, I would ask him to enable debug messages and send the crash
> > > logs with it enabled.   
> > 
> > We also need a clear indication that ordering is broken and should not
> > be trusted when we get a crash log from some random user.
> 
> When ordering is not ensured, then we can force timestamps. And I like
> the idea of the time stamp happening when the printk is called, not
> when it actually goes to the buffer. That way, we could see the
> timestamps out of order, and sequence them during postmortem.

Hi Steven

This assumes timestamps are fine grained enough. I still prefer a
sequence number in addition.

	 Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-24 23:21               ` Andrew Lunn
@ 2017-06-24 23:26                 ` Linus Torvalds
  2017-06-24 23:40                   ` Steven Rostedt
  2017-06-24 23:48                   ` Al Viro
  0 siblings, 2 replies; 60+ messages in thread
From: Linus Torvalds @ 2017-06-24 23:26 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Mauro Carvalho Chehab, ksummit

On Sat, Jun 24, 2017 at 4:21 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> This assumes timestamps are fine grained enough. I still prefer a
> sequence number in addition.

Guyes, you're overdesigning.

The general old printk subsystem had "continuations". Nobody ever used
them. Nobody. I know, because they were broken, and nobody even
reported it.

Similarly, nobody is going to use sequence numbers and timestamps.
What *is* going to get used is digital cameras to take pictures of a
screen, and "dmesg" output (and no, dmesg will not use those sequence
numbers either, see above).

If those two things aren't the absolutely primary goals, the whole
thing is pointless to even discuss. No amount of cool features,
performance, or theoretical deadlock avoidance matters ONE WHIT
compared to the two things above.

                      Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-24 23:26                 ` Linus Torvalds
@ 2017-06-24 23:40                   ` Steven Rostedt
  2017-06-26 11:16                     ` Sergey Senozhatsky
  2017-06-24 23:48                   ` Al Viro
  1 sibling, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2017-06-24 23:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mauro Carvalho Chehab, ksummit

On Sat, 24 Jun 2017 16:26:42 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sat, Jun 24, 2017 at 4:21 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> > This assumes timestamps are fine grained enough. I still prefer a
> > sequence number in addition.  
> 
> Guyes, you're overdesigning.
> 
> The general old printk subsystem had "continuations". Nobody ever used
> them. Nobody. I know, because they were broken, and nobody even
> reported it.
> 
> Similarly, nobody is going to use sequence numbers and timestamps.
> What *is* going to get used is digital cameras to take pictures of a
> screen, and "dmesg" output (and no, dmesg will not use those sequence
> numbers either, see above).
> 
> If those two things aren't the absolutely primary goals, the whole
> thing is pointless to even discuss. No amount of cool features,
> performance, or theoretical deadlock avoidance matters ONE WHIT
> compared to the two things above.
>

That was in my original email about the #1 priority. Critical prints.
Which will always be done in sequence and at the time they are printed.
A digital camera will only be able to grab the backtrace. Most all the
other crud that people use printk for will be off screen.

In fact, it would be great to have it so that the first bug stops the
kernel. I hate it when I get pictures of backtraces only to find out
it's the fifth backtrace that happened, and whatever caused the bug
had its backtrace lost by the 4 other bugs that were simply side
effects of the first real bug.

IMHO, printk is used too freely. Perhaps what comes out of this is to
create an "info only" printk that is out of band with critical printks.
Or perhaps even simpler, start auditing printks and remove those that
are really not needed.

Now the reports I've heard about, these are not theoretical deadlocks,
they can really happen on large scale machines.

Then there's the "debug" printks. Perhaps they should be converted to
trace points. If a printk requires a config option, a lot of users may
not be able to enable it.

-- Steve

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-24 23:26                 ` Linus Torvalds
  2017-06-24 23:40                   ` Steven Rostedt
@ 2017-06-24 23:48                   ` Al Viro
  2017-06-25  1:29                     ` Andrew Lunn
  1 sibling, 1 reply; 60+ messages in thread
From: Al Viro @ 2017-06-24 23:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mauro Carvalho Chehab, ksummit

On Sat, Jun 24, 2017 at 04:26:42PM -0700, Linus Torvalds wrote:
> On Sat, Jun 24, 2017 at 4:21 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> > This assumes timestamps are fine grained enough. I still prefer a
> > sequence number in addition.
> 
> Guyes, you're overdesigning.
> 
> The general old printk subsystem had "continuations". Nobody ever used
> them. Nobody. I know, because they were broken, and nobody even
> reported it.
> 
> Similarly, nobody is going to use sequence numbers and timestamps.
> What *is* going to get used is digital cameras to take pictures of a
> screen, and "dmesg" output (and no, dmesg will not use those sequence
> numbers either, see above).

I'd really hate to have to use pictures of screen...  I really hope that
printk to serial console keeps working - I don't care about timestamps
granularity, etc., but losing this would hurt.  Is it really that
uncommon use case?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-24 23:48                   ` Al Viro
@ 2017-06-25  1:29                     ` Andrew Lunn
  2017-06-25  2:41                       ` Linus Torvalds
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Lunn @ 2017-06-25  1:29 UTC (permalink / raw)
  To: Al Viro; +Cc: Mauro Carvalho Chehab, ksummit

> I'd really hate to have to use pictures of screen...  I really hope that
> printk to serial console keeps working - I don't care about timestamps
> granularity, etc., but losing this would hurt.  Is it really that
> uncommon use case?

It is how the embedded world operates, RS232, or now more often, RS232
with a built in USB-RS232 converter, so you use USB on the host.

     Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-25  1:29                     ` Andrew Lunn
@ 2017-06-25  2:41                       ` Linus Torvalds
  2017-06-26  8:46                         ` Jiri Kosina
  0 siblings, 1 reply; 60+ messages in thread
From: Linus Torvalds @ 2017-06-25  2:41 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Mauro Carvalho Chehab, ksummit

On Sat, Jun 24, 2017 at 6:29 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>> I'd really hate to have to use pictures of screen...  I really hope that
>> printk to serial console keeps working - I don't care about timestamps
>> granularity, etc., but losing this would hurt.  Is it really that
>> uncommon use case?
>
> It is how the embedded world operates, RS232, or now more often, RS232
> with a built in USB-RS232 converter, so you use USB on the host.

I'm not saying that serial lines shouldn't be an option.

But for a *large* user base, they simply aren't.

On regular PC's, it's often not an option any more. Even in the data
center, it's often not an option any more.

Yes, yes, 99% of the time for the simpler bugs, the machine survives,
and you get a nice oops message.

But that still leaves a reasonably big chunk of cases where you end up
getting an oops in an interrupt (or just in the disk layer itself),
and the machine is just dead, and the oops never makes it to disk.

Maybe people have netconsole or something - with known problems, but
compared to not getting anything those problems are often better than
the alternative. It should never be the default due to the kinds of
issues it has, but it might be a "no other option - I have an ethernet
port on a maintenance network, that's it".

And yes, *maybe* people have a serial line, but those traditional
UARTs close to the CPU are getting pretty rare, even in the embedded
world I think.

USB is no good for "the machine is dead", unless you are one of the
_very_ few people who use USB with the debug port dongle (which
basically bypasses "real" USB and just uses the cable and connector as
a magic serial line).

And yes, things like netconsole absolutely *will* have lockdep issues
and will have situations where it fails.

And yes, even the regular console will have situations where it
deadlocks and fails. Tough. It's probably not even worth it to try to
fix them (ie "oops while taking an interrupt while the CPU was inside
the vgacon driver itself from a previous printk" is just not worth
worrying about). Aim to minimize them in practice to code that has
been made really robust thanks to testing and not being actively
developed.

Put another way: there will always be situations where the console
just does not work. But that is *not* an excuse for looking at
relatively irrelevant stuff (ie sequence numbers etc). We still want
to make sure that when dmesg won't be saved, the console works most of
the time (where "most of the time" is >> 99%).

Out of order messages are survivable. In fact, they are hardly even an
annoyance most of the time.

But not having any messages at all, because we were trying so hard to
abstract things out and put them in buffers so that we couldn't
deadlock with the IO routines, and the timer or workqueue that was
supposed to do it is never going to happen any more because of the bug
that is trying to be printed out?

THAT is bad.

                Linus

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-25  2:41                       ` Linus Torvalds
@ 2017-06-26  8:46                         ` Jiri Kosina
  2017-07-19  7:59                           ` David Woodhouse
  0 siblings, 1 reply; 60+ messages in thread
From: Jiri Kosina @ 2017-06-26  8:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mauro Carvalho Chehab, ksummit

On Sat, 24 Jun 2017, Linus Torvalds wrote:

> > It is how the embedded world operates, RS232, or now more often, RS232 
> > with a built in USB-RS232 converter, so you use USB on the host.
> 
> I'm not saying that serial lines shouldn't be an option.
> 
> But for a *large* user base, they simply aren't.
> 
> On regular PC's, it's often not an option any more. Even in the data
> center, it's often not an option any more.

I don't really agree here. Yes, the mid-to-hig-end servers don't probably 
contain the actual UART chip any more, but the vast majority of those have 
somehting that's emulated in firmware, and actually do have a serial 
console line connector (not the 9-pin one, but rather RJ-45 with either 
Cisco or Yost pinout), which is then connected into serial-over-TCP 
concentrator box, exposing the serial console over telnet (or some 
proprietary client application). This is seen in DCs quite frequently.

Even machines that have very good IPMI support still ship with this.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-24 23:40                   ` Steven Rostedt
@ 2017-06-26 11:16                     ` Sergey Senozhatsky
  0 siblings, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-26 11:16 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Mauro Carvalho Chehab, ksummit

Hello,

On (06/24/17 19:40), Steven Rostedt wrote:
[..]
> > On Sat, Jun 24, 2017 at 4:21 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> > > This assumes timestamps are fine grained enough. I still prefer a
> > > sequence number in addition.  
> > 
> > Guyes, you're overdesigning.
> > 
> > The general old printk subsystem had "continuations". Nobody ever used
> > them. Nobody. I know, because they were broken, and nobody even
> > reported it.
> > 
> > Similarly, nobody is going to use sequence numbers and timestamps.
> > What *is* going to get used is digital cameras to take pictures of a
> > screen, and "dmesg" output (and no, dmesg will not use those sequence
> > numbers either, see above).
> > 
> > If those two things aren't the absolutely primary goals, the whole
> > thing is pointless to even discuss. No amount of cool features,
> > performance, or theoretical deadlock avoidance matters ONE WHIT
> > compared to the two things above.
> >
> 
> That was in my original email about the #1 priority. Critical prints.
> Which will always be done in sequence and at the time they are printed.
> A digital camera will only be able to grab the backtrace. Most all the
> other crud that people use printk for will be off screen.
> 
> In fact, it would be great to have it so that the first bug stops the
> kernel. I hate it when I get pictures of backtraces only to find out
> it's the fifth backtrace that happened, and whatever caused the bug
> had its backtrace lost by the 4 other bugs that were simply side
> effects of the first real bug.
>
> IMHO, printk is used too freely. Perhaps what comes out of this is to
> create an "info only" printk that is out of band with critical printks.
> Or perhaps even simpler, start auditing printks and remove those that
> are really not needed.

here is my crazy idea (we are still brainstorming, right?)

we can "mark" the first panic logbuf entry in the logbuf and never drop it.
if we are out of space in logbuf then we drop new messages/logbuf entries
rather than the old ones. and we can replay the logbuf starting from that
"first panic logbuf entry" every once in a while, so your camera will have
better chances to capture the backtrace. we've got `panic_timeout' in panic(),
can introduce `panic_reflush_console_timeout' branch, which can either simply
flush logbuf starting from "panic logbuf entry" to the serial console, or to
the early console, or a brand new `struct console' callback
->emergency_write(...) which will basically do uart_console_write(), or
whatever it usually does, but without locking the port->lock (well if
anyone suffers from it). the last part is optional, but replaying logbuf
is *probably* not entirely mad; we flush kernel logbuf entries anyway,
replaying it shouldn't harm (tm). especially when the only available tool
is a digital camera.

> Now the reports I've heard about, these are not theoretical deadlocks,
> they can really happen on large scale machines.
> 
> Then there's the "debug" printks. Perhaps they should be converted to
> trace points. If a printk requires a config option, a lot of users may
> not be able to enable it.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-23  9:07               ` Bartlomiej Zolnierkiewicz
@ 2017-06-27 13:06                 ` Sergey Senozhatsky
  -1 siblings, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-27 13:06 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: ksummit-discuss, Linux Fbdev development list

On (06/23/17 11:07), Bartlomiej Zolnierkiewicz wrote:
> On Thursday, June 22, 2017 03:48:34 PM Daniel Vetter wrote:
> > On Thu, Jun 22, 2017 at 3:42 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > Tears pretty much guaranteed, and after a few hacks from Alan&me I
> > > concluded that the only way to fix this is to at least partially
> > > rewrite fbdev (a dead subsystem, so no one's volunteering), with the
> > > risk that you get to revert it all because someone is indeed relying
> > > on that super-flexible module loading order sequence. The simplest fix
> > > would probably be to make the entire fbdev->fbcon setup depency a
> > > hard-coded function call, or maybe at most a one-shot symbol_get
> > > attempt.
> > 
> > I did once hash out a plan how to fix this with the least amount of pain:
> > 
> > 1. Merge a patch to build the fbcon support into the overall fb.ko
> > module, so that the dynamic loading nonsense is essentially disabled,
> > and fbcon becomes a Kconfig/compile-time only option, no longer a
> > runtime-selectable thing.
> > 
> > 2. Wait 1 year and pray that no one reports a regression. If you're
> > unlucky, try to fence them of with a runtime option to disable fbcon.
> > 
> > 3. Rip out the notifier nonsense and replace it by direct function
> > calls. You can only do that once 1. won't be reverted anymore.
> > 
> > 4. Push the console_lock donw the callchains until it's again at the
> > right spots, auditing all the other stuff meanwhile to make sure the
> > locking is still correct.
> > 
> > 5. Apply your patch to make console_lock sane.
> > 
> > Adding fbdev maintainers and lists just.
> 
> Your plan sounds good to me.

cool.

I'm willing to help. let's coordinate.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
@ 2017-06-27 13:06                 ` Sergey Senozhatsky
  0 siblings, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-06-27 13:06 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: ksummit-discuss, Linux Fbdev development list

On (06/23/17 11:07), Bartlomiej Zolnierkiewicz wrote:
> On Thursday, June 22, 2017 03:48:34 PM Daniel Vetter wrote:
> > On Thu, Jun 22, 2017 at 3:42 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > Tears pretty much guaranteed, and after a few hacks from Alan&me I
> > > concluded that the only way to fix this is to at least partially
> > > rewrite fbdev (a dead subsystem, so no one's volunteering), with the
> > > risk that you get to revert it all because someone is indeed relying
> > > on that super-flexible module loading order sequence. The simplest fix
> > > would probably be to make the entire fbdev->fbcon setup depency a
> > > hard-coded function call, or maybe at most a one-shot symbol_get
> > > attempt.
> > 
> > I did once hash out a plan how to fix this with the least amount of pain:
> > 
> > 1. Merge a patch to build the fbcon support into the overall fb.ko
> > module, so that the dynamic loading nonsense is essentially disabled,
> > and fbcon becomes a Kconfig/compile-time only option, no longer a
> > runtime-selectable thing.
> > 
> > 2. Wait 1 year and pray that no one reports a regression. If you're
> > unlucky, try to fence them of with a runtime option to disable fbcon.
> > 
> > 3. Rip out the notifier nonsense and replace it by direct function
> > calls. You can only do that once 1. won't be reverted anymore.
> > 
> > 4. Push the console_lock donw the callchains until it's again at the
> > right spots, auditing all the other stuff meanwhile to make sure the
> > locking is still correct.
> > 
> > 5. Apply your patch to make console_lock sane.
> > 
> > Adding fbdev maintainers and lists just.
> 
> Your plan sounds good to me.

cool.

I'm willing to help. let's coordinate.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-19  5:21 [Ksummit-discuss] [TECH TOPIC] printk redesign Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2017-06-22 16:35 ` David Howells
@ 2017-07-19  6:24 ` Sergey Senozhatsky
  2017-07-19  6:25   ` Sergey Senozhatsky
  3 siblings, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-07-19  6:24 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt; +Cc: ksummit-discuss

On (06/19/17 14:21), Sergey Senozhatsky wrote:
> Hello,
> 
> 	I, Petr Mladek and Steven Rostedt would like to propose a printk
> tech topic (as suggested by Steven). We are currently exploring the idea
> of complete redesign and rework of printk and it would be extremely helpful
> to hear from the community. printk serves different purposes, and some of
> requirements of printk tend to contradict each other; printk is monolithic
> and quite heavy, no wonder, it causes problems sometimes.

I made a trivial printk TODO list. The list is incomplete and mostly
was created for personal use: thus it's probably a bit hard to read,
but at the same time it contains some quotes/opinions/ideas copy-pastes
and web-links. May be can be of some use. This also looks like our
possible (some approximation) agenda [if the topic will be accepted].

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-07-19  6:24 ` Sergey Senozhatsky
@ 2017-07-19  6:25   ` Sergey Senozhatsky
  2017-07-19  7:26     ` Daniel Vetter
  0 siblings, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-07-19  6:25 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt; +Cc: ksummit-discuss

On (07/19/17 15:24), Sergey Senozhatsky wrote:
> On (06/19/17 14:21), Sergey Senozhatsky wrote:
> > Hello,
> > 
> > 	I, Petr Mladek and Steven Rostedt would like to propose a printk
> > tech topic (as suggested by Steven). We are currently exploring the idea
> > of complete redesign and rework of printk and it would be extremely helpful
> > to hear from the community. printk serves different purposes, and some of
> > requirements of printk tend to contradict each other; printk is monolithic
> > and quite heavy, no wonder, it causes problems sometimes.
> 
> I made a trivial printk TODO list. The list is incomplete and mostly
> was created for personal use: thus it's probably a bit hard to read,
> but at the same time it contains some quotes/opinions/ideas copy-pastes
> and web-links. May be can be of some use. This also looks like our
> possible (some approximation) agenda [if the topic will be accepted].
> 

d'oh...  the link...

https://github.com/sergey-senozhatsky/printk-todo

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-07-19  6:25   ` Sergey Senozhatsky
@ 2017-07-19  7:26     ` Daniel Vetter
  2017-07-20  5:19       ` Sergey Senozhatsky
  0 siblings, 1 reply; 60+ messages in thread
From: Daniel Vetter @ 2017-07-19  7:26 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: ksummit-discuss

On Wed, Jul 19, 2017 at 8:25 AM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> On (07/19/17 15:24), Sergey Senozhatsky wrote:
>> On (06/19/17 14:21), Sergey Senozhatsky wrote:
>> > Hello,
>> >
>> >     I, Petr Mladek and Steven Rostedt would like to propose a printk
>> > tech topic (as suggested by Steven). We are currently exploring the idea
>> > of complete redesign and rework of printk and it would be extremely helpful
>> > to hear from the community. printk serves different purposes, and some of
>> > requirements of printk tend to contradict each other; printk is monolithic
>> > and quite heavy, no wonder, it causes problems sometimes.
>>
>> I made a trivial printk TODO list. The list is incomplete and mostly
>> was created for personal use: thus it's probably a bit hard to read,
>> but at the same time it contains some quotes/opinions/ideas copy-pastes
>> and web-links. May be can be of some use. This also looks like our
>> possible (some approximation) agenda [if the topic will be accepted].
>>
>
> d'oh...  the link...
>
> https://github.com/sergey-senozhatsky/printk-todo

lgtm, two quick notes:
- my mail with the fbdev discussion seems to be in the wrong chapter.
Move it from "console_sem" to "fbdev, tty, drm, etc .."?

- feature request for per-console output: Per-console flag to always
use a kthread/offloading, even when oops/panic is happening. kms
definitely wants that. Please note that in that section. I can help
with implementing, once we get there.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-20 17:27                 ` Mark Brown
  2017-06-20 23:28                   ` Steven Rostedt
@ 2017-07-19  7:35                   ` David Woodhouse
  2017-07-20  7:53                     ` Sergey Senozhatsky
  1 sibling, 1 reply; 60+ messages in thread
From: David Woodhouse @ 2017-07-19  7:35 UTC (permalink / raw)
  To: Mark Brown, Sergey Senozhatsky; +Cc: ksummit-discuss

[-- Attachment #1: Type: text/plain, Size: 1463 bytes --]

On Tue, 2017-06-20 at 18:27 +0100, Mark Brown wrote:
> On Wed, Jun 21, 2017 at 02:11:34AM +0900, Sergey Senozhatsky wrote:
> 
> > 
> > another thing that I found useful is a CPU number of the processor
> > that stored a particular line to the logbuf.
>
> At some point we start reinventing ftrace...  there's issues with
> joining the two up but there should at least be lessons we can learn.
> 

The other way of looking at this is "why are you abusing printk for
stuff that should have been done via ftrace or other means instead".

I confess I haven't got my curmudgeonly brain out of that mode at all,
ever since realising that printk had been made asynchronous and
unreliable (how long ago was that?) and that you could no longer see
the dying gasp of a crashing kernel on its serial console.

Rather than morphing printk into something more capable of bulk
transport, I'd rather see it go back to its roots of
debugging/diagnostics.

The original complaint of "all this printk output makes things too
slow" was better addressed by printing less or at lower severity (or
adjusting the console loglevel), IMO.

As things stand, the requirements for the various printk (ab)use cases
seem to be contradictory — if we're going to have a redesign then I
think it would be good to take a holistic view and decide what it's
actually *supposed* to be used for. And, perhaps more to the point,
what it isn't supposed to be used for.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-06-26  8:46                         ` Jiri Kosina
@ 2017-07-19  7:59                           ` David Woodhouse
  0 siblings, 0 replies; 60+ messages in thread
From: David Woodhouse @ 2017-07-19  7:59 UTC (permalink / raw)
  To: Jiri Kosina, Linus Torvalds; +Cc: Mauro Carvalho Chehab, ksummit

[-- Attachment #1: Type: text/plain, Size: 1796 bytes --]

On Mon, 2017-06-26 at 10:46 +0200, Jiri Kosina wrote:
> On Sat, 24 Jun 2017, Linus Torvalds wrote:
> 
> > 
> > > 
> > > It is how the embedded world operates, RS232, or now more often, RS232 
> > > with a built in USB-RS232 converter, so you use USB on the host.
> > I'm not saying that serial lines shouldn't be an option.
> > 
> > But for a *large* user base, they simply aren't.
> > 
> > On regular PC's, it's often not an option any more. Even in the data
> > center, it's often not an option any more.
> I don't really agree here. Yes, the mid-to-hig-end servers don't probably 
> contain the actual UART chip any more, but the vast majority of those have 
> somehting that's emulated in firmware, and actually do have a serial 
> console line connector (not the 9-pin one, but rather RJ-45 with either 
> Cisco or Yost pinout), which is then connected into serial-over-TCP 
> concentrator box, exposing the serial console over telnet (or some 
> proprietary client application). This is seen in DCs quite frequently.
> 
> Even machines that have very good IPMI support still ship with this.

Yeah, we definitely still have a "serial console" in the data centre,
even if it's not actually RS232 any more. Or indeed "serial".

You want to catch those failures where even kdump doesn't manage to
give you a viable report of the original crash? You'd better be
watching...

Even on regular PCs we have the USB debug ports which can serve the
same purpose.

But still, we're talking about printk being used for its original
$DEITY-intended purpose for debugging and diagnostic data. Not for the
random "hey, here's a channel I can abuse to send data up to userspace"
stuff. I heartily agree with Steven when he says that "printk is used
too freely".

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-07-19  7:26     ` Daniel Vetter
@ 2017-07-20  5:19       ` Sergey Senozhatsky
  0 siblings, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-07-20  5:19 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: ksummit-discuss

On (07/19/17 09:26), Daniel Vetter wrote:
[..]
> > d'oh...  the link...
> >
> > https://github.com/sergey-senozhatsky/printk-todo
> 
> lgtm, two quick notes:
> - my mail with the fbdev discussion seems to be in the wrong chapter.
> Move it from "console_sem" to "fbdev, tty, drm, etc .."?

thanks for taking a look!

and sorry for not being very responsive these weeks, still struggling
to recover from my sickness.

the list is incomplete and very spontaneous, I'll improve it.

> - feature request for per-console output: Per-console flag to always
> use a kthread/offloading, even when oops/panic is happening. kms
> definitely wants that. Please note that in that section. I can help
> with implementing, once we get there.

thanks. will add.

> Per-console flag to always use a kthread/offloading, even when oops/panic
> is happening. kms definitely wants that.
>

hmm... kthread offloading during panic() is really risky. nothing
guarantees that we will be able to call into the scheduler and wake
up that console printing-kthread, or that we will be able to schedule
at all. we may be in panic() from NMI handler, with the rest of CPUs
stopped. it's quite a risky thing to do. that's why we disable printk
offloading when in panic() - we don't want to make the things any
worse.

before doing this I think I want to make call_console_drivers() to be
more reliable. right now we pick the first unseen messages from the
logbuf and iterate over registered consoles calling ->write() on every
driver from the console drivers list. if one of consoles is misbehaving,
then the entire console output mechanism stops: we don't print anything
on other consoles until current con->write() returns. so probably I want
to make it more independent.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Ksummit-discuss] [TECH TOPIC] printk redesign
  2017-07-19  7:35                   ` David Woodhouse
@ 2017-07-20  7:53                     ` Sergey Senozhatsky
  0 siblings, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2017-07-20  7:53 UTC (permalink / raw)
  To: David Woodhouse; +Cc: ksummit-discuss

Hello,

On (07/19/17 09:35), David Woodhouse wrote:
[..]
> > At some point we start reinventing ftrace...  there's issues with
> > joining the two up but there should at least be lessons we can learn.
> > 
> 
> The other way of looking at this is "why are you abusing printk for
> stuff that should have been done via ftrace or other means instead".
>
> I confess I haven't got my curmudgeonly brain out of that mode at all,
> ever since realising that printk had been made asynchronous and
> unreliable (how long ago was that?) and that you could no longer see
> the dying gasp of a crashing kernel on its serial console.
> 
> Rather than morphing printk into something more capable of bulk
> transport, I'd rather see it go back to its roots of
> debugging/diagnostics.
> 
> The original complaint of "all this printk output makes things too
> slow" was better addressed by printing less or at lower severity (or
> adjusting the console loglevel), IMO.
> 
> As things stand, the requirements for the various printk (ab)use cases
> seem to be contradictory — if we're going to have a redesign then I
> think it would be good to take a holistic view and decide what it's
> actually *supposed* to be used for. And, perhaps more to the point,
> what it isn't supposed to be used for.

just some thoughts,

at glance printk has 3 major issues

- it has to do offloading, no doubt.

- printk() can deadlock, easily. (that's the whole reason there is
  printk_deferred())

- printk from NMI is not completely reliable. this area has been
  improved recently; but there are still cases when we can lose
  NMI-printk messages

... but there are more problems. and those issues are not completely
printk fault.

what I mean (and I'm not criticizing anyone),

so we can split printk: defer printing of debug messages and have
direct printing of important messages. and that's where the redesign
hits the first obstacle: direct printing is unreliable. when we do
call_console_drivers() we pass control to the outside world, and we
never know where we will end up at. consoles can invoke timekeeping,
networking, MM, and so on. so I think printk redesign better start
from this part - make call to console drivers more reliable.
if possible.

what I'm talking about, by just one example:

bug report
https://marc.info/?l=dri-devel&m=149938825811219

root cause
https://marc.info/?l=linux-mm&m=149939515214223&w=2

so printk live-locked, and there was no way to see any kernel logs
until Tetsuo sysrq-c'ed the system. and the root cause was all those
complex and difficult dependencies between completely different
subsystems that printk depend on and that, in turn, depend on printk.

> hm, this allocation, per se, looks ok to me. can't really blame it.
> what you had is a combination of factors
>
>        CPU0                    CPU1                            CPU2
>                                                                console_callback()
>                                                                 console_lock()
>                                                                 ^^^^^^^^^^^^^
>        vprintk_emit()          mutex_lock(&par->bo_mutex)
>                                 kzalloc(GFP_KERNEL)
>         console_trylock()        kmem_cache_alloc()              mutex_lock(&par->bo_mutex)
>         ^^^^^^^^^^^^^^^^          io_schedule_timeout

there are more examples.

more closer to the point,
to the best of my knowledge, we don't have that much problems with the
printk logbuf now. we made some progress there over the last year. yes,
NMI printk is not completely awesome.
where we do have problems, I think:

a) we probably need to make more progress towards "and now we print it to the console"
b) print out offloading
c) printk deadlock and the need of printk_deferred()

and it's not always crazy printk abuse to justify the existence of
printk offloading.

example: https://marc.info/?l=linux-mm&m=149977866327662

> you will find that calling cond_resched() (from console_unlock() from printk())
> can cause a delay of nearly one minute, and it can cause a delay of nearly 5 minutes
> to complete one out_of_memory() call. 

example: https://marc.info/?l=linux-kernel&m=149509270422321

printk, to me, is a debugging/diagnostics tool. and we can't fully rely on
it, even we do reasonable things, like OOM print out. moreover, I think, to
some extent, due to printk imperfections, the more debugging options we
enable (CONFIG_DEBUG_PREEMPT, CONFIG_DEBUG_SPINLOCK, etc.) the less stable
the kernel, potentially, gets. because those options use printk() to report
the problems. so might_sleep() or spin_dump() called from "a wrong place"
can eventually deadlock printk() and the system.

example: https://marc.info/?l=linux-kernel&m=149007148320611

well, just my thoughts.

	-ss

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2017-07-20  7:53 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-19  5:21 [Ksummit-discuss] [TECH TOPIC] printk redesign Sergey Senozhatsky
2017-06-19  6:22 ` Hannes Reinecke
2017-06-19 14:39   ` Steven Rostedt
2017-06-19 15:20     ` Andrew Lunn
2017-06-19 15:54       ` Hannes Reinecke
2017-06-19 16:17         ` Andrew Lunn
2017-06-19 16:23         ` Mark Brown
2017-06-20 15:58           ` Sergey Senozhatsky
2017-06-20 16:44             ` Luck, Tony
2017-06-20 17:11               ` Sergey Senozhatsky
2017-06-20 17:27                 ` Mark Brown
2017-06-20 23:28                   ` Steven Rostedt
2017-06-21  7:17                     ` Hannes Reinecke
2017-06-21 11:12                     ` Sergey Senozhatsky
2017-06-22 14:06                       ` Steven Rostedt
2017-06-23  5:43                         ` Sergey Senozhatsky
2017-06-23 13:09                           ` Steven Rostedt
2017-06-21 12:23                     ` Petr Mladek
2017-06-21 14:18                       ` Andrew Lunn
2017-06-23  8:46                         ` Petr Mladek
2017-06-21 16:09                       ` Andrew Lunn
2017-06-23  8:49                         ` Petr Mladek
2017-07-19  7:35                   ` David Woodhouse
2017-07-20  7:53                     ` Sergey Senozhatsky
2017-06-20 16:09         ` Sergey Senozhatsky
2017-06-19 16:26       ` Steven Rostedt
2017-06-19 16:35         ` Andrew Lunn
2017-06-24 11:14         ` Mauro Carvalho Chehab
2017-06-24 14:06           ` Andrew Lunn
2017-06-24 22:42             ` Steven Rostedt
2017-06-24 23:21               ` Andrew Lunn
2017-06-24 23:26                 ` Linus Torvalds
2017-06-24 23:40                   ` Steven Rostedt
2017-06-26 11:16                     ` Sergey Senozhatsky
2017-06-24 23:48                   ` Al Viro
2017-06-25  1:29                     ` Andrew Lunn
2017-06-25  2:41                       ` Linus Torvalds
2017-06-26  8:46                         ` Jiri Kosina
2017-07-19  7:59                           ` David Woodhouse
2017-06-20 15:56     ` Sergey Senozhatsky
2017-06-20 18:45     ` Daniel Vetter
2017-06-21  9:29       ` Petr Mladek
2017-06-21 10:15       ` Sergey Senozhatsky
2017-06-22 13:42         ` Daniel Vetter
2017-06-22 13:48           ` Daniel Vetter
2017-06-22 13:48             ` Daniel Vetter
2017-06-23  9:07             ` Bartlomiej Zolnierkiewicz
2017-06-23  9:07               ` Bartlomiej Zolnierkiewicz
2017-06-27 13:06               ` Sergey Senozhatsky
2017-06-27 13:06                 ` Sergey Senozhatsky
2017-06-23  5:20           ` Sergey Senozhatsky
2017-06-19 23:46 ` Josh Triplett
2017-06-20  8:24   ` Arnd Bergmann
2017-06-20 14:36     ` Steven Rostedt
2017-06-20 15:26       ` Sergey Senozhatsky
2017-06-22 16:35 ` David Howells
2017-07-19  6:24 ` Sergey Senozhatsky
2017-07-19  6:25   ` Sergey Senozhatsky
2017-07-19  7:26     ` Daniel Vetter
2017-07-20  5:19       ` Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.