* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 6:17 ` Hannes Reinecke
@ 2016-07-19 6:49 ` Josh Triplett
2016-07-19 7:02 ` Hannes Reinecke
2016-07-19 7:33 ` Sergey Senozhatsky
` (2 subsequent siblings)
3 siblings, 1 reply; 60+ messages in thread
From: Josh Triplett @ 2016-07-19 6:49 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: ksummit-discuss
On Tue, Jul 19, 2016 at 08:17:19AM +0200, Hannes Reinecke wrote:
> Yes. The main problem stems from the fact that printk has two different
> and conflicting use-cases:
> - Really urgent, 'I am about to die' messages. Which obviously need to
> be printed out as fast as possible.
> - Rather largish, information/logging 'what I always wanted to tell you'
> type of messages. These messages tend to be very large, but at the end
> it doesn't really matter _when_ they'll be printed as they are
> time-stamped anyway.
>
> For the first use-case you absolutely need a synchronous printk, but
> this is a complete killer for the second case.
> And OTOH having a separate thread is really the way to go for the second
> case, but an absolute no-go for the first.
>
> So I really wonder if it does make sense to lump both use-cases into one
> call, or whether it wouldn't be better to have two distinct calls
> for that (or, for the sake of argument, use KERN_EMERG to trigger
> synchronous printks).
For the sake of argument: what about using loglevel to distinguish the
two cases by default?
If the printk would show up on the console, handle it inline immediately
before returning, so that the user sees it on the console immediately in
case the very next line hangs the system. That also helps with the
debugging approach of copy/pasting many instances of pr_alert("%s:%d:
here\n", __func__, __LINE__) and looking for the last one that shows up.
If the printk would *not* show up on the console, and would only show up
asynchronously in dmesg or a log somewhere, then go ahead and throw it
to the asynchronous printk_kthread context to handle and return, because
if the next line crashes, userspace wouldn't get the opportunity to read
and log it anyway.
Combined with a mechanism like "if the kernel panics, try as hard as
possible to dump out all the pending printks before dying", that seems
like a reasonable default behavior that shouldn't result in surprises.
If the kernel is alive enough that userspace can still log things (such
as if the display hangs but the kernel and userspace are still running),
then the kernel should also still be alive enough to process the pending
printks.
- Josh Triplett
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 6:49 ` Josh Triplett
@ 2016-07-19 7:02 ` Hannes Reinecke
2016-07-19 7:11 ` Geert Uytterhoeven
2016-07-20 22:54 ` Josh Triplett
0 siblings, 2 replies; 60+ messages in thread
From: Hannes Reinecke @ 2016-07-19 7:02 UTC (permalink / raw)
To: Josh Triplett; +Cc: ksummit-discuss
On 07/19/2016 08:49 AM, Josh Triplett wrote:
> On Tue, Jul 19, 2016 at 08:17:19AM +0200, Hannes Reinecke wrote:
>> Yes. The main problem stems from the fact that printk has two different
>> and conflicting use-cases:
>> - Really urgent, 'I am about to die' messages. Which obviously need to
>> be printed out as fast as possible.
>> - Rather largish, information/logging 'what I always wanted to tell you'
>> type of messages. These messages tend to be very large, but at the end
>> it doesn't really matter _when_ they'll be printed as they are
>> time-stamped anyway.
>>
>> For the first use-case you absolutely need a synchronous printk, but
>> this is a complete killer for the second case.
>> And OTOH having a separate thread is really the way to go for the second
>> case, but an absolute no-go for the first.
>>
>> So I really wonder if it does make sense to lump both use-cases into one
>> call, or whether it wouldn't be better to have two distinct calls
>> for that (or, for the sake of argument, use KERN_EMERG to trigger
>> synchronous printks).
>
> For the sake of argument: what about using loglevel to distinguish the
> two cases by default?
>
That's what I've tried to infer by the above statement; KERN_EMERG could
easily used for that sort of thing.
Also I'd like to have everything on board with this idea, namely that
using printk for instantaneous but really long messages it _NOT_ going
to work.
(And I'd love to kill printk line continuation, too, but that's another
story)
> If the printk would show up on the console, handle it inline immediately
> before returning, so that the user sees it on the console immediately in
> case the very next line hangs the system. That also helps with the
> debugging approach of copy/pasting many instances of pr_alert("%s:%d:
> here\n", __func__, __LINE__) and looking for the last one that shows up.
>
> If the printk would *not* show up on the console, and would only show up
> asynchronously in dmesg or a log somewhere, then go ahead and throw it
> to the asynchronous printk_kthread context to handle and return, because
> if the next line crashes, userspace wouldn't get the opportunity to read
> and log it anyway.
>
> Combined with a mechanism like "if the kernel panics, try as hard as
> possible to dump out all the pending printks before dying", that seems
> like a reasonable default behavior that shouldn't result in surprises.
> If the kernel is alive enough that userspace can still log things (such
> as if the display hangs but the kernel and userspace are still running),
> then the kernel should also still be alive enough to process the pending
> printks.
>
But it still leaves us with a possible priority inversion.
How should we deal with situations where the async thread is running and
someone is issuing a synchronous printk?
Should we skip the asynchonous ones and print the synchronous one first,
risking out-of-order messages but a higher probability that the urgent
message is actually printed ?
Or should we elevate everything to synchronous, preserving ordering
but increasing the risk that the synchronous message never ever makes it
to the console?
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.com +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 7:02 ` Hannes Reinecke
@ 2016-07-19 7:11 ` Geert Uytterhoeven
2016-07-20 6:02 ` Jan Kara
2016-07-20 22:54 ` Josh Triplett
1 sibling, 1 reply; 60+ messages in thread
From: Geert Uytterhoeven @ 2016-07-19 7:11 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: ksummit-discuss
On Tue, Jul 19, 2016 at 9:02 AM, Hannes Reinecke <hare@suse.com> wrote:
>> If the printk would show up on the console, handle it inline immediately
>> before returning, so that the user sees it on the console immediately in
>> case the very next line hangs the system. That also helps with the
>> debugging approach of copy/pasting many instances of pr_alert("%s:%d:
>> here\n", __func__, __LINE__) and looking for the last one that shows up.
>>
>> If the printk would *not* show up on the console, and would only show up
>> asynchronously in dmesg or a log somewhere, then go ahead and throw it
>> to the asynchronous printk_kthread context to handle and return, because
>> if the next line crashes, userspace wouldn't get the opportunity to read
>> and log it anyway.
>>
>> Combined with a mechanism like "if the kernel panics, try as hard as
>> possible to dump out all the pending printks before dying", that seems
>> like a reasonable default behavior that shouldn't result in surprises.
>> If the kernel is alive enough that userspace can still log things (such
>> as if the display hangs but the kernel and userspace are still running),
>> then the kernel should also still be alive enough to process the pending
>> printks.
>>
> But it still leaves us with a possible priority inversion.
> How should we deal with situations where the async thread is running and
> someone is issuing a synchronous printk?
> Should we skip the asynchonous ones and print the synchronous one first,
> risking out-of-order messages but a higher probability that the urgent
> message is actually printed ?
There are already out-of-order messages for SMP and continuation,
some of these we want to fix, but always for the same loglevel.
If two messages have a different log level, I see no reason to (try to) keep
ordering. If keeping them together is required, the user should use pr_cont.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 7:11 ` Geert Uytterhoeven
@ 2016-07-20 6:02 ` Jan Kara
0 siblings, 0 replies; 60+ messages in thread
From: Jan Kara @ 2016-07-20 6:02 UTC (permalink / raw)
To: Geert Uytterhoeven; +Cc: ksummit-discuss
On Tue 19-07-16 09:11:40, Geert Uytterhoeven wrote:
> On Tue, Jul 19, 2016 at 9:02 AM, Hannes Reinecke <hare@suse.com> wrote:
> >> If the printk would show up on the console, handle it inline immediately
> >> before returning, so that the user sees it on the console immediately in
> >> case the very next line hangs the system. That also helps with the
> >> debugging approach of copy/pasting many instances of pr_alert("%s:%d:
> >> here\n", __func__, __LINE__) and looking for the last one that shows up.
> >>
> >> If the printk would *not* show up on the console, and would only show up
> >> asynchronously in dmesg or a log somewhere, then go ahead and throw it
> >> to the asynchronous printk_kthread context to handle and return, because
> >> if the next line crashes, userspace wouldn't get the opportunity to read
> >> and log it anyway.
> >>
> >> Combined with a mechanism like "if the kernel panics, try as hard as
> >> possible to dump out all the pending printks before dying", that seems
> >> like a reasonable default behavior that shouldn't result in surprises.
> >> If the kernel is alive enough that userspace can still log things (such
> >> as if the display hangs but the kernel and userspace are still running),
> >> then the kernel should also still be alive enough to process the pending
> >> printks.
> >>
> > But it still leaves us with a possible priority inversion.
> > How should we deal with situations where the async thread is running and
> > someone is issuing a synchronous printk?
> > Should we skip the asynchonous ones and print the synchronous one first,
> > risking out-of-order messages but a higher probability that the urgent
> > message is actually printed ?
>
> There are already out-of-order messages for SMP and continuation,
> some of these we want to fix, but always for the same loglevel.
> If two messages have a different log level, I see no reason to (try to) keep
> ordering. If keeping them together is required, the user should use pr_cont.
This does not really match reality. Just look at how Oops is printed. It is
a mix of different loglevels and you certainly want to print them together
and in the right order. Generally I don't think reordering messages is
acceptable because less severe messages before a critical message can be
very important in analyzing what has happened.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 7:02 ` Hannes Reinecke
2016-07-19 7:11 ` Geert Uytterhoeven
@ 2016-07-20 22:54 ` Josh Triplett
2016-07-21 0:46 ` Sergey Senozhatsky
1 sibling, 1 reply; 60+ messages in thread
From: Josh Triplett @ 2016-07-20 22:54 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: ksummit-discuss
On Tue, Jul 19, 2016 at 09:02:17AM +0200, Hannes Reinecke wrote:
> On 07/19/2016 08:49 AM, Josh Triplett wrote:
> > On Tue, Jul 19, 2016 at 08:17:19AM +0200, Hannes Reinecke wrote:
> >> Yes. The main problem stems from the fact that printk has two different
> >> and conflicting use-cases:
> >> - Really urgent, 'I am about to die' messages. Which obviously need to
> >> be printed out as fast as possible.
> >> - Rather largish, information/logging 'what I always wanted to tell you'
> >> type of messages. These messages tend to be very large, but at the end
> >> it doesn't really matter _when_ they'll be printed as they are
> >> time-stamped anyway.
> >>
> >> For the first use-case you absolutely need a synchronous printk, but
> >> this is a complete killer for the second case.
> >> And OTOH having a separate thread is really the way to go for the second
> >> case, but an absolute no-go for the first.
> >>
> >> So I really wonder if it does make sense to lump both use-cases into one
> >> call, or whether it wouldn't be better to have two distinct calls
> >> for that (or, for the sake of argument, use KERN_EMERG to trigger
> >> synchronous printks).
> >
> > For the sake of argument: what about using loglevel to distinguish the
> > two cases by default?
> >
> That's what I've tried to infer by the above statement; KERN_EMERG could
> easily used for that sort of thing.
I don't mean using just the priority level of the printk call. I mean
using the current kernel loglevel, as in what level it displays to the
console, as set on the kernel command line with the loglevel= parameter.
printk could quickly check the priority level of the call versus the
current kernel loglevel to determine if the message would go to the
console or not, and use that to decide whether to handle it
synchronously or asynchronously.
- Josh Triplett
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-20 22:54 ` Josh Triplett
@ 2016-07-21 0:46 ` Sergey Senozhatsky
2016-07-21 1:12 ` Josh Triplett
0 siblings, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2016-07-21 0:46 UTC (permalink / raw)
To: Josh Triplett; +Cc: ksummit-discuss
Hello,
On (07/20/16 15:54), Josh Triplett wrote:
[..]
> > That's what I've tried to infer by the above statement; KERN_EMERG could
> > easily used for that sort of thing.
>
> I don't mean using just the priority level of the printk call. I mean
> using the current kernel loglevel, as in what level it displays to the
> console, as set on the kernel command line with the loglevel= parameter.
> printk could quickly check the priority level of the call versus the
> current kernel loglevel to determine if the message would go to the
> console or not, and use that to decide whether to handle it
> synchronously or asynchronously.
between loglevel check in printk() and actual printing console loglevel
may change. thus printk() does not make this (severity level filtering)
decision. console_unlock() does, on per-log record basis:
console_unlock()
{
for (;;) {
spin_lock(logbuf_lock)
skip:
msg = log_from_idx(console_idx);
if (msg->level >= console_loglevel)
goto skip;
spin_unlock(logbuf_lock)
call_console_drivers(text)
}
}
and by the time we call console_unlock() we better already be
either in async mode, or sync mode. unless we want to rewrite
console_unlock().
-ss
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 0:46 ` Sergey Senozhatsky
@ 2016-07-21 1:12 ` Josh Triplett
0 siblings, 0 replies; 60+ messages in thread
From: Josh Triplett @ 2016-07-21 1:12 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: ksummit-discuss
On Thu, Jul 21, 2016 at 09:46:16AM +0900, Sergey Senozhatsky wrote:
> Hello,
>
> On (07/20/16 15:54), Josh Triplett wrote:
> [..]
> > > That's what I've tried to infer by the above statement; KERN_EMERG could
> > > easily used for that sort of thing.
> >
> > I don't mean using just the priority level of the printk call. I mean
> > using the current kernel loglevel, as in what level it displays to the
> > console, as set on the kernel command line with the loglevel= parameter.
> > printk could quickly check the priority level of the call versus the
> > current kernel loglevel to determine if the message would go to the
> > console or not, and use that to decide whether to handle it
> > synchronously or asynchronously.
>
> between loglevel check in printk() and actual printing console loglevel
> may change. thus printk() does not make this (severity level filtering)
> decision. console_unlock() does, on per-log record basis:
[...]
> and by the time we call console_unlock() we better already be
> either in async mode, or sync mode. unless we want to rewrite
> console_unlock().
That's exactly why I was suggesting this filter. Yes, loglevel can
change at any time, but I don't think it's worth worrying about whether
messages that happen to race with the change to loglevel get handled
synchronously or asynchronously.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 6:17 ` Hannes Reinecke
2016-07-19 6:49 ` Josh Triplett
@ 2016-07-19 7:33 ` Sergey Senozhatsky
2016-07-19 7:38 ` Hannes Reinecke
2016-07-19 7:46 ` Christian Borntraeger
2016-07-20 3:35 ` Wangnan (F)
3 siblings, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2016-07-19 7:33 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: ksummit-discuss
On (07/19/16 08:17), Hannes Reinecke wrote:
[..]
> Yes. The main problem stems from the fact that printk has two different
> and conflicting use-cases:
> - Really urgent, 'I am about to die' messages. Which obviously need to
> be printed out as fast as possible.
> - Rather largish, information/logging 'what I always wanted to tell you'
> type of messages. These messages tend to be very large, but at the end
> it doesn't really matter _when_ they'll be printed as they are
> time-stamped anyway.
>
> For the first use-case you absolutely need a synchronous printk, but
> this is a complete killer for the second case.
> And OTOH having a separate thread is really the way to go for the second
> case, but an absolute no-go for the first.
>
> So I really wonder if it does make sense to lump both use-cases into one
> call, or whether it wouldn't be better to have two distinct calls
> for that (or, for the sake of argument, use KERN_EMERG to trigger
> synchronous printks).
yes, panic() must be in sync printk mode. but we do it a
bit differently: console_verbose() forces printk to switch
to sync mode.
so panic() goes like this:
panic()
{
console_verbose(); # switch to sync printk. forever.
bust_spinlocks();
pr_emerg("Kernel panic - not syncing...");
....
debug_locks_off();
console_flush_on_panic(); # flushes kernel log_bug in
# sync mode
// the rest of panic()
}
-ss
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 7:33 ` Sergey Senozhatsky
@ 2016-07-19 7:38 ` Hannes Reinecke
2016-07-19 7:46 ` Sergey Senozhatsky
2016-07-20 6:09 ` Jan Kara
0 siblings, 2 replies; 60+ messages in thread
From: Hannes Reinecke @ 2016-07-19 7:38 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: ksummit-discuss
On 07/19/2016 09:33 AM, Sergey Senozhatsky wrote:
> On (07/19/16 08:17), Hannes Reinecke wrote:
> [..]
>> Yes. The main problem stems from the fact that printk has two different
>> and conflicting use-cases:
>> - Really urgent, 'I am about to die' messages. Which obviously need to
>> be printed out as fast as possible.
>> - Rather largish, information/logging 'what I always wanted to tell you'
>> type of messages. These messages tend to be very large, but at the end
>> it doesn't really matter _when_ they'll be printed as they are
>> time-stamped anyway.
>>
>> For the first use-case you absolutely need a synchronous printk, but
>> this is a complete killer for the second case.
>> And OTOH having a separate thread is really the way to go for the second
>> case, but an absolute no-go for the first.
>>
>> So I really wonder if it does make sense to lump both use-cases into one
>> call, or whether it wouldn't be better to have two distinct calls
>> for that (or, for the sake of argument, use KERN_EMERG to trigger
>> synchronous printks).
>
> yes, panic() must be in sync printk mode. but we do it a
> bit differently: console_verbose() forces printk to switch
> to sync mode.
>
> so panic() goes like this:
>
> panic()
> {
> console_verbose(); # switch to sync printk. forever.
> bust_spinlocks();
> pr_emerg("Kernel panic - not syncing...");
>
> ....
> debug_locks_off();
> console_flush_on_panic(); # flushes kernel log_bug in
> # sync mode
>
> // the rest of panic()
>
> }
>
> -ss
>
but this is precisely what I meant by priority inversion:
If there are lots of messages in the printk buffer we might not be
_able_ to print out everything as the machine died before the entire
printk buffer could be printed.
Which means we will never see the panic message, at which point we could
as well drop the whole exercise.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.com +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 7:38 ` Hannes Reinecke
@ 2016-07-19 7:46 ` Sergey Senozhatsky
2016-07-19 8:02 ` Hannes Reinecke
2016-07-20 6:09 ` Jan Kara
1 sibling, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2016-07-19 7:46 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: ksummit-discuss
On (07/19/16 09:38), Hannes Reinecke wrote:
[..]
> > yes, panic() must be in sync printk mode. but we do it a
> > bit differently: console_verbose() forces printk to switch
> > to sync mode.
> >
> > so panic() goes like this:
> >
> > panic()
> > {
> > console_verbose(); # switch to sync printk. forever.
> > bust_spinlocks();
> > pr_emerg("Kernel panic - not syncing...");
> >
> > ....
> > debug_locks_off();
> > console_flush_on_panic(); # flushes kernel log_bug in
> > # sync mode
> >
> > // the rest of panic()
> >
> > }
> >
> > -ss
> >
> but this is precisely what I meant by priority inversion:
> If there are lots of messages in the printk buffer we might not be
> _able_ to print out everything as the machine died before the entire
> printk buffer could be printed.
why would it die? the same CPU that panics the system flushes the
log_buf in sync mode. console_flush_on_panic() is invoked after
panic_smp_send_stop(), which is supposed to IPI_STOP other CPUs on the
system, but before panic() reboots the system (if it must reboot it).
and it does not really care whether console_sem is available, it goes
to console_unlock() in any case
void console_flush_on_panic(void)
{
/*
* If someone else is holding the console lock, trylock will fail
* and may_schedule may be set. Ignore and proceed to unlock so
* that messages are flushed out. As this can be called from any
* context and we don't want to get preempted while flushing,
* ensure may_schedule is cleared.
*/
console_trylock();
console_may_schedule = 0;
console_unlock();
}
so why would the system die before we console_unlock()?
-ss
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 7:46 ` Sergey Senozhatsky
@ 2016-07-19 8:02 ` Hannes Reinecke
2016-07-19 8:23 ` Sergey Senozhatsky
2016-07-21 10:36 ` David Woodhouse
0 siblings, 2 replies; 60+ messages in thread
From: Hannes Reinecke @ 2016-07-19 8:02 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: ksummit-discuss
On 07/19/2016 09:46 AM, Sergey Senozhatsky wrote:
> On (07/19/16 09:38), Hannes Reinecke wrote:
> [..]
>>> yes, panic() must be in sync printk mode. but we do it a
>>> bit differently: console_verbose() forces printk to switch
>>> to sync mode.
>>>
>>> so panic() goes like this:
>>>
>>> panic()
>>> {
>>> console_verbose(); # switch to sync printk. forever.
>>> bust_spinlocks();
>>> pr_emerg("Kernel panic - not syncing...");
>>>
>>> ....
>>> debug_locks_off();
>>> console_flush_on_panic(); # flushes kernel log_bug in
>>> # sync mode
>>>
>>> // the rest of panic()
>>>
>>> }
>>>
>>> -ss
>>>
>> but this is precisely what I meant by priority inversion:
>> If there are lots of messages in the printk buffer we might not be
>> _able_ to print out everything as the machine died before the entire
>> printk buffer could be printed.
>
> why would it die? the same CPU that panics the system flushes the
> log_buf in sync mode. console_flush_on_panic() is invoked after
> panic_smp_send_stop(), which is supposed to IPI_STOP other CPUs on the
> system, but before panic() reboots the system (if it must reboot it).
> and it does not really care whether console_sem is available, it goes
> to console_unlock() in any case
>
> void console_flush_on_panic(void)
> {
> /*
> * If someone else is holding the console lock, trylock will fail
> * and may_schedule may be set. Ignore and proceed to unlock so
> * that messages are flushed out. As this can be called from any
> * context and we don't want to get preempted while flushing,
> * ensure may_schedule is cleared.
> */
> console_trylock();
> console_may_schedule = 0;
> console_unlock();
> }
>
> so why would the system die before we console_unlock()?
>
Errm.
Because it doesn't have any other chance?
Like, hard lockup?
Power down?
Hardware dead?
Slightly puzzled,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.com +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 8:02 ` Hannes Reinecke
@ 2016-07-19 8:23 ` Sergey Senozhatsky
2016-07-21 10:36 ` David Woodhouse
1 sibling, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2016-07-19 8:23 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: ksummit-discuss
On (07/19/16 10:02), Hannes Reinecke wrote:
[..]
> > so why would the system die before we console_unlock()?
> >
> Errm.
> Because it doesn't have any other chance?
> Like, hard lockup?
> Power down?
> Hardware dead?
ah, ok. so mostly for the reasons outside of printk/panic code.
hard lockup, case. hm, I think nmi_panic() is still going to
call panic() at some point. so we have chances to flush log_buf
messages.
power down or h/w failure, however, can't be fixed up neither
by sync nor by async printk. well, we can't help here.
the thing that really can stop us from flushing the messages
is the fact that console driver may be in some unknown state.
for example, have some internal locks owned by non-panic CPU,
which we IPI_STOP-ed before console_flush_on_panic(). console
drivers are normally unaware of the fact that the system is in
panic now. so some sort of console->write_on_panic(), or
console->reset() would probably be helpful, but that's a bit
different topic.
-ss
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 8:02 ` Hannes Reinecke
2016-07-19 8:23 ` Sergey Senozhatsky
@ 2016-07-21 10:36 ` David Woodhouse
2016-07-21 12:31 ` Jan Kara
2016-07-28 2:55 ` Steven Rostedt
1 sibling, 2 replies; 60+ messages in thread
From: David Woodhouse @ 2016-07-21 10:36 UTC (permalink / raw)
To: Hannes Reinecke, Sergey Senozhatsky; +Cc: ksummit-discuss
[-- Attachment #1: Type: text/plain, Size: 1547 bytes --]
On Tue, 2016-07-19 at 10:02 +0200, Hannes Reinecke wrote:
> > so why would the system die before we console_unlock()?
> >
> Errm.
> Because it doesn't have any other chance?
> Like, hard lockup?
> Power down?
> Hardware dead?
>
> Slightly puzzled,
Right. This was exactly the kind of hang I was chasing shortly before
last year's KS — an interrupt storm killing the box (because of a
tendency in network drivers to return IRQ_HANDLED and not really care
if we *had* done so, which ISTR arguing with DaveM about separately).
Sometimes you don't get a nice clean panic. Sometimes you just get a
lockup or a hard reset.
Which is also why it doesn't help much to try to use the level of an
individual printk to determine whether it should be synchronous or not.
In this case it was all KERN_DEBUG messages from the network driver,
which I was logging to the serial port so I could see what was
happening... but which weren't making it out the port before the
lockup.
A viable solution to fix this might be a 'synchronous' flag on the
console itself — so I could boot with 'console=ttyS0,synchronous' and
get a debuggable system again, Or maybe it would be simpler to have a
system-wide control which makes all consoles synchronous, if that's
easier. Either way, we do need the option, and we need it to apply to
*all* output, not just KERN_EMERG messages.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5760 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 10:36 ` David Woodhouse
@ 2016-07-21 12:31 ` Jan Kara
2016-07-28 2:55 ` Steven Rostedt
1 sibling, 0 replies; 60+ messages in thread
From: Jan Kara @ 2016-07-21 12:31 UTC (permalink / raw)
To: David Woodhouse; +Cc: ksummit-discuss
On Thu 21-07-16 11:36:42, David Woodhouse wrote:
> On Tue, 2016-07-19 at 10:02 +0200, Hannes Reinecke wrote:
> > > so why would the system die before we console_unlock()?
> > >
> > Errm.
> > Because it doesn't have any other chance?
> > Like, hard lockup?
> > Power down?
> > Hardware dead?
> >
> > Slightly puzzled,
>
> Right. This was exactly the kind of hang I was chasing shortly before
> last year's KS — an interrupt storm killing the box (because of a
> tendency in network drivers to return IRQ_HANDLED and not really care
> if we *had* done so, which ISTR arguing with DaveM about separately).
>
> Sometimes you don't get a nice clean panic. Sometimes you just get a
> lockup or a hard reset.
>
> Which is also why it doesn't help much to try to use the level of an
> individual printk to determine whether it should be synchronous or not.
> In this case it was all KERN_DEBUG messages from the network driver,
> which I was logging to the serial port so I could see what was
> happening... but which weren't making it out the port before the
> lockup.
>
> A viable solution to fix this might be a 'synchronous' flag on the
> console itself — so I could boot with 'console=ttyS0,synchronous' and
> get a debuggable system again, Or maybe it would be simpler to have a
> system-wide control which makes all consoles synchronous, if that's
> easier. Either way, we do need the option, and we need it to apply to
> *all* output, not just KERN_EMERG messages.
Yes, and something like this is already implemented in the patchset - you
have /sys/kernel/printk/synchronous tunable and you can switch its value
anytime between 0 and 1 (or specify its value as a kernel parameter) and
the printk behavior changes. So for debugging the patchset already supports
all the necessary tuning.
Of course there are cases where you run a fleet of production machines and
you don't know in advance which and when fails. Then async printk may
somewhat reduce debuggability of this. But the above tunable still gives
reasonable handle to userspace to cater for cases like that - e.g. you can
switch to synchronous printk after the booting is finished and the printk
load is lower or based on whatever other heuristic that you are able to
invent in userspace...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 10:36 ` David Woodhouse
2016-07-21 12:31 ` Jan Kara
@ 2016-07-28 2:55 ` Steven Rostedt
1 sibling, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2016-07-28 2:55 UTC (permalink / raw)
To: David Woodhouse; +Cc: ksummit-discuss
On Thu, 21 Jul 2016 11:36:42 +0100
David Woodhouse <dwmw2@infradead.org> wrote:
> A viable solution to fix this might be a 'synchronous' flag on the
> console itself — so I could boot with 'console=ttyS0,synchronous' and
> get a debuggable system again, Or maybe it would be simpler to have a
> system-wide control which makes all consoles synchronous, if that's
> easier. Either way, we do need the option, and we need it to apply to
> *all* output, not just KERN_EMERG messages.
>
For these cases I use, earlyprintk=ttyS0,keep
;-)
Actually, the RT patch has a option to force all printk()s to be
earlyprintk directly. I have a separate patch that has a special
spinlock to keep SMP from stomping all over the output. It's special
because it locks per CPU, if an NMI comes in while the current context
has the lock and is printing, it will corrupt the buffer being printed,
but I don't care at that moment. If NMI is doing printks, its message
will be more important anyway.
-- Steve
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 7:38 ` Hannes Reinecke
2016-07-19 7:46 ` Sergey Senozhatsky
@ 2016-07-20 6:09 ` Jan Kara
1 sibling, 0 replies; 60+ messages in thread
From: Jan Kara @ 2016-07-20 6:09 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: ksummit-discuss
On Tue 19-07-16 09:38:10, Hannes Reinecke wrote:
> On 07/19/2016 09:33 AM, Sergey Senozhatsky wrote:
> > On (07/19/16 08:17), Hannes Reinecke wrote:
> > [..]
> >> Yes. The main problem stems from the fact that printk has two different
> >> and conflicting use-cases:
> >> - Really urgent, 'I am about to die' messages. Which obviously need to
> >> be printed out as fast as possible.
> >> - Rather largish, information/logging 'what I always wanted to tell you'
> >> type of messages. These messages tend to be very large, but at the end
> >> it doesn't really matter _when_ they'll be printed as they are
> >> time-stamped anyway.
> >>
> >> For the first use-case you absolutely need a synchronous printk, but
> >> this is a complete killer for the second case.
> >> And OTOH having a separate thread is really the way to go for the second
> >> case, but an absolute no-go for the first.
> >>
> >> So I really wonder if it does make sense to lump both use-cases into one
> >> call, or whether it wouldn't be better to have two distinct calls
> >> for that (or, for the sake of argument, use KERN_EMERG to trigger
> >> synchronous printks).
> >
> > yes, panic() must be in sync printk mode. but we do it a
> > bit differently: console_verbose() forces printk to switch
> > to sync mode.
> >
> > so panic() goes like this:
> >
> > panic()
> > {
> > console_verbose(); # switch to sync printk. forever.
> > bust_spinlocks();
> > pr_emerg("Kernel panic - not syncing...");
> >
> > ....
> > debug_locks_off();
> > console_flush_on_panic(); # flushes kernel log_bug in
> > # sync mode
> >
> > // the rest of panic()
> >
> > }
> >
> > -ss
> >
> but this is precisely what I meant by priority inversion:
> If there are lots of messages in the printk buffer we might not be
> _able_ to print out everything as the machine died before the entire
> printk buffer could be printed.
> Which means we will never see the panic message, at which point we could
> as well drop the whole exercise.
I like Linus's suggestion how to fix this problem: skip part of buffered
messages and print just last 100 of them before the critical event. I have
a patch to do this but for now Sergey removed it from the patch set because
it is mostly independent and there were some technical issues with it.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 6:17 ` Hannes Reinecke
2016-07-19 6:49 ` Josh Triplett
2016-07-19 7:33 ` Sergey Senozhatsky
@ 2016-07-19 7:46 ` Christian Borntraeger
2016-07-19 7:53 ` Christian Borntraeger
2016-07-20 3:35 ` Wangnan (F)
3 siblings, 1 reply; 60+ messages in thread
From: Christian Borntraeger @ 2016-07-19 7:46 UTC (permalink / raw)
To: Hannes Reinecke, ksummit-discuss
On 07/19/2016 08:17 AM, Hannes Reinecke wrote:
> On 07/19/2016 05:47 AM, Sergey Senozhatsky wrote:
>> Hello,
>>
>> Wondering if anyone will be interested in printk-related topics
>> (or we can handle it in the mailing list).
>>
>> What I have on my list is:
>>
>>
>> - synchronous printk()
>>
>> printk() prints messages from kernel printk buffer until the buffer
>> is empty. When serial console is attached, printing is slow and thus
>> other CPUs in the system have plenty of time to append new messages to
>> the buffer while one CPU is printing. Thus the CPU can spend unbounded
>> amount of time doing printing in console_unlock(). This is especially
>> serious problem if the printk() calling console_unlock() was called with
>> interrupts disabled, or from IRQ, or from spin_lock protected section
>> (if the spinlock is contended), etc. etc. IOW, printk() is quite dangerous
>> function to call in some cases, it can cause different types of lockups
>> (soft, hard, spinlock), stalls and so on.
>>
>> we have some progress on this side. printk() can offload printing from
>> sensitive and unsafe contexts to a schedulable printk_kthread context (a
>> special purpose printing kthread).
>> but "The whole idea remains worrisome", per Andrew :)
>>
> Yes. The main problem stems from the fact that printk has two different
> and conflicting use-cases:
> - Really urgent, 'I am about to die' messages. Which obviously need to
> be printed out as fast as possible.
> - Rather largish, information/logging 'what I always wanted to tell you'
> type of messages. These messages tend to be very large, but at the end
> it doesn't really matter _when_ they'll be printed as they are
> time-stamped anyway.
I think you brought up this topic last year in a lightning talk, correct?
Didn't Linus say "fix the console then"?
>
> For the first use-case you absolutely need a synchronous printk, but
> this is a complete killer for the second case.
> And OTOH having a separate thread is really the way to go for the second
> case, but an absolute no-go for the first.
>
> So I really wonder if it does make sense to lump both use-cases into one
> call, or whether it wouldn't be better to have two distinct calls
> for that (or, for the sake of argument, use KERN_EMERG to trigger
> synchronous printks).
>
> Cheers,
>
> Hannes
>
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 7:46 ` Christian Borntraeger
@ 2016-07-19 7:53 ` Christian Borntraeger
2016-07-19 13:55 ` Jan Kara
0 siblings, 1 reply; 60+ messages in thread
From: Christian Borntraeger @ 2016-07-19 7:53 UTC (permalink / raw)
To: Hannes Reinecke, ksummit-discuss
On 07/19/2016 09:46 AM, Christian Borntraeger wrote:
> On 07/19/2016 08:17 AM, Hannes Reinecke wrote:
>> On 07/19/2016 05:47 AM, Sergey Senozhatsky wrote:
>>> Hello,
>>>
>>> Wondering if anyone will be interested in printk-related topics
>>> (or we can handle it in the mailing list).
>>>
>>> What I have on my list is:
>>>
>>>
>>> - synchronous printk()
>>>
>>> printk() prints messages from kernel printk buffer until the buffer
>>> is empty. When serial console is attached, printing is slow and thus
>>> other CPUs in the system have plenty of time to append new messages to
>>> the buffer while one CPU is printing. Thus the CPU can spend unbounded
>>> amount of time doing printing in console_unlock(). This is especially
>>> serious problem if the printk() calling console_unlock() was called with
>>> interrupts disabled, or from IRQ, or from spin_lock protected section
>>> (if the spinlock is contended), etc. etc. IOW, printk() is quite dangerous
>>> function to call in some cases, it can cause different types of lockups
>>> (soft, hard, spinlock), stalls and so on.
>>>
>>> we have some progress on this side. printk() can offload printing from
>>> sensitive and unsafe contexts to a schedulable printk_kthread context (a
>>> special purpose printing kthread).
>>> but "The whole idea remains worrisome", per Andrew :)
>>>
>> Yes. The main problem stems from the fact that printk has two different
>> and conflicting use-cases:
>> - Really urgent, 'I am about to die' messages. Which obviously need to
>> be printed out as fast as possible.
>> - Rather largish, information/logging 'what I always wanted to tell you'
>> type of messages. These messages tend to be very large, but at the end
>> it doesn't really matter _when_ they'll be printed as they are
>> time-stamped anyway.
>
> I think you brought up this topic last year in a lightning talk, correct?
> Didn't Linus say "fix the console then"?
Which does not make the problem go away, though, so yes - a session to
discuss this is probably a good idea - it really depends on how strong
Linus opinion on this topic is ;-)
On s390 we have for LPAR the relatively slow operating system messages console,
which goes out to the service element.
In
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=25b41a7b67ee4f4d12cee8a4b8b5929e36c27e29
we provided a console specific fix^Whack to allow to tune the buffer size
and to allow to drop messages. Better solutions are certainly welcome.
Christian
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 7:53 ` Christian Borntraeger
@ 2016-07-19 13:55 ` Jan Kara
2016-07-28 2:59 ` Steven Rostedt
0 siblings, 1 reply; 60+ messages in thread
From: Jan Kara @ 2016-07-19 13:55 UTC (permalink / raw)
To: Christian Borntraeger; +Cc: ksummit-discuss
On Tue 19-07-16 09:53:01, Christian Borntraeger wrote:
> On 07/19/2016 09:46 AM, Christian Borntraeger wrote:
> > On 07/19/2016 08:17 AM, Hannes Reinecke wrote:
> >> On 07/19/2016 05:47 AM, Sergey Senozhatsky wrote:
> >>> Hello,
> >>>
> >>> Wondering if anyone will be interested in printk-related topics
> >>> (or we can handle it in the mailing list).
> >>>
> >>> What I have on my list is:
> >>>
> >>>
> >>> - synchronous printk()
> >>>
> >>> printk() prints messages from kernel printk buffer until the buffer
> >>> is empty. When serial console is attached, printing is slow and thus
> >>> other CPUs in the system have plenty of time to append new messages to
> >>> the buffer while one CPU is printing. Thus the CPU can spend unbounded
> >>> amount of time doing printing in console_unlock(). This is especially
> >>> serious problem if the printk() calling console_unlock() was called with
> >>> interrupts disabled, or from IRQ, or from spin_lock protected section
> >>> (if the spinlock is contended), etc. etc. IOW, printk() is quite dangerous
> >>> function to call in some cases, it can cause different types of lockups
> >>> (soft, hard, spinlock), stalls and so on.
> >>>
> >>> we have some progress on this side. printk() can offload printing from
> >>> sensitive and unsafe contexts to a schedulable printk_kthread context (a
> >>> special purpose printing kthread).
> >>> but "The whole idea remains worrisome", per Andrew :)
> >>>
> >> Yes. The main problem stems from the fact that printk has two different
> >> and conflicting use-cases:
> >> - Really urgent, 'I am about to die' messages. Which obviously need to
> >> be printed out as fast as possible.
> >> - Rather largish, information/logging 'what I always wanted to tell you'
> >> type of messages. These messages tend to be very large, but at the end
> >> it doesn't really matter _when_ they'll be printed as they are
> >> time-stamped anyway.
> >
> > I think you brought up this topic last year in a lightning talk, correct?
> > Didn't Linus say "fix the console then"?
>
> Which does not make the problem go away, though, so yes - a session to
> discuss this is probably a good idea - it really depends on how strong
> Linus opinion on this topic is ;-)
Well, he backed up from that statement later a bit. So when I was
discussing this problem with him at last KS he agreed that making printk
async is the way to go and he specifically dismissed attempts to print
synchronously for a while and fall back to async only later. He just wanted
some way to get things to console synchronously once things go bad (oops,
panic,...) which is discussed elsewhere in this thread.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 13:55 ` Jan Kara
@ 2016-07-28 2:59 ` Steven Rostedt
2016-07-28 4:12 ` Sergey Senozhatsky
0 siblings, 1 reply; 60+ messages in thread
From: Steven Rostedt @ 2016-07-28 2:59 UTC (permalink / raw)
To: Jan Kara; +Cc: ksummit-discuss
On Tue, 19 Jul 2016 15:55:29 +0200
Jan Kara <jack@suse.cz> wrote:
> Well, he backed up from that statement later a bit. So when I was
> discussing this problem with him at last KS he agreed that making printk
> async is the way to go and he specifically dismissed attempts to print
> synchronously for a while and fall back to async only later. He just wanted
> some way to get things to console synchronously once things go bad (oops,
> panic,...) which is discussed elsewhere in this thread.
Note, my selftests require synchronous printing, so I hope this feature
still allows locations to force that without being in oops mode. The
tracing selftests can cause triple faults, and seeing where it happened
is a must. I already had issues once when printk changed (temporarily)
to only dump out lines when a \n was seen. That caused me to think the
wrong test was rebooting the box.
-- Steve
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-28 2:59 ` Steven Rostedt
@ 2016-07-28 4:12 ` Sergey Senozhatsky
2016-07-28 13:02 ` Steven Rostedt
0 siblings, 1 reply; 60+ messages in thread
From: Sergey Senozhatsky @ 2016-07-28 4:12 UTC (permalink / raw)
To: Steven Rostedt; +Cc: ksummit-discuss
Hello Steven,
On (07/27/16 22:59), Steven Rostedt wrote:
> > Well, he backed up from that statement later a bit. So when I was
> > discussing this problem with him at last KS he agreed that making printk
> > async is the way to go and he specifically dismissed attempts to print
> > synchronously for a while and fall back to async only later. He just wanted
> > some way to get things to console synchronously once things go bad (oops,
> > panic,...) which is discussed elsewhere in this thread.
>
> Note, my selftests require synchronous printing, so I hope this feature
> still allows locations to force that without being in oops mode. The
> tracing selftests can cause triple faults, and seeing where it happened
> is a must. I already had issues once when printk changed (temporarily)
> to only dump out lines when a \n was seen. That caused me to think the
> wrong test was rebooting the box.
if you run the tests from user space, then there is a way to change
printk behavior at will. if from kernel, then... there is no really
nice way, as of yet.
you can do something like (ugly)
console_lock();
test();
console_unlock();
or change console_loglevel to CONSOLE_LOGLEVEL_MOTORMOUTH temporarily
console_verbose();
test();
or boot with async printk (if needed) then change to sync mode from
user space and then start the test...
by default printk is in sync (old) mode; we don't want to enforce printk
changes on the systems that have never seen any issues with printk. so unless
you pass aync printk boot param or change printk to async from user space,
it must work just as before.
-ss
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-28 4:12 ` Sergey Senozhatsky
@ 2016-07-28 13:02 ` Steven Rostedt
0 siblings, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2016-07-28 13:02 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: ksummit-discuss
On Thu, 28 Jul 2016 13:12:51 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
> Hello Steven,
>
> On (07/27/16 22:59), Steven Rostedt wrote:
> > > Well, he backed up from that statement later a bit. So when I was
> > > discussing this problem with him at last KS he agreed that making printk
> > > async is the way to go and he specifically dismissed attempts to print
> > > synchronously for a while and fall back to async only later. He just wanted
> > > some way to get things to console synchronously once things go bad (oops,
> > > panic,...) which is discussed elsewhere in this thread.
> >
> > Note, my selftests require synchronous printing, so I hope this feature
> > still allows locations to force that without being in oops mode. The
> > tracing selftests can cause triple faults, and seeing where it happened
> > is a must. I already had issues once when printk changed (temporarily)
> > to only dump out lines when a \n was seen. That caused me to think the
> > wrong test was rebooting the box.
>
> if you run the tests from user space, then there is a way to change
> printk behavior at will. if from kernel, then... there is no really
> nice way, as of yet.
The tests are run by the kernel at boot up.
>
> you can do something like (ugly)
>
> console_lock();
> test();
> console_unlock();
>
> or change console_loglevel to CONSOLE_LOGLEVEL_MOTORMOUTH temporarily
A temporary force also works.
-- Steve
>
> console_verbose();
> test();
>
> or boot with async printk (if needed) then change to sync mode from
> user space and then start the test...
>
> by default printk is in sync (old) mode; we don't want to enforce printk
> changes on the systems that have never seen any issues with printk. so unless
> you pass aync printk boot param or change printk to async from user space,
> it must work just as before.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-19 6:17 ` Hannes Reinecke
` (2 preceding siblings ...)
2016-07-19 7:46 ` Christian Borntraeger
@ 2016-07-20 3:35 ` Wangnan (F)
2016-07-21 1:16 ` Andy Lutomirski
3 siblings, 1 reply; 60+ messages in thread
From: Wangnan (F) @ 2016-07-20 3:35 UTC (permalink / raw)
To: Hannes Reinecke, ksummit-discuss
On 2016/7/19 14:17, Hannes Reinecke wrote:
> On 07/19/2016 05:47 AM, Sergey Senozhatsky wrote:
>> Hello,
>>
>> Wondering if anyone will be interested in printk-related topics
>> (or we can handle it in the mailing list).
>>
>> What I have on my list is:
>>
>>
>> - synchronous printk()
>>
>> printk() prints messages from kernel printk buffer until the buffer
>> is empty. When serial console is attached, printing is slow and thus
>> other CPUs in the system have plenty of time to append new messages to
>> the buffer while one CPU is printing. Thus the CPU can spend unbounded
>> amount of time doing printing in console_unlock(). This is especially
>> serious problem if the printk() calling console_unlock() was called with
>> interrupts disabled, or from IRQ, or from spin_lock protected section
>> (if the spinlock is contended), etc. etc. IOW, printk() is quite dangerous
>> function to call in some cases, it can cause different types of lockups
>> (soft, hard, spinlock), stalls and so on.
>>
>> we have some progress on this side. printk() can offload printing from
>> sensitive and unsafe contexts to a schedulable printk_kthread context (a
>> special purpose printing kthread).
>> but "The whole idea remains worrisome", per Andrew :)
>>
> Yes. The main problem stems from the fact that printk has two different
> and conflicting use-cases:
> - Really urgent, 'I am about to die' messages. Which obviously need to
> be printed out as fast as possible.
> - Rather largish, information/logging 'what I always wanted to tell you'
> type of messages. These messages tend to be very large, but at the end
> it doesn't really matter _when_ they'll be printed as they are
> time-stamped anyway.
>
Actually, there are 3 types of messages:
1. Urgent: I'm going to die.
2. information/logging.
3. Trace.
If you look at some customized kernel you can find lots of trace from
kernel. Most of the time they are really meaningless, but if kernel is
buggy or crash they become key information. When debugging, it is really
frustrating to see something is missed in kmsg, so developer tend to output
maything from printk. However, in production system they have to turns those
trace off, so debugging production systems is really painful.
I have an idea on those trace level printk. Since they are meaningless most
of the time, let it output to another ring buffer, unconditionally overwrite
old message without outputing to console or log file. When kernel detect
something
happen or before kernel die, dump the ring buffer to the console. Then we
can put as many as tracing information through printk without console
problem.
Since we already have tracepoints and overwritable perf ring buffer, we can
encapsulate trace level printk using tracepoints, and let it output to perf
ring buffer.
Make sence?
Thanks.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-20 3:35 ` Wangnan (F)
@ 2016-07-21 1:16 ` Andy Lutomirski
2016-07-21 1:52 ` Wangnan (F)
` (2 more replies)
0 siblings, 3 replies; 60+ messages in thread
From: Andy Lutomirski @ 2016-07-21 1:16 UTC (permalink / raw)
To: Wangnan (F); +Cc: ksummit-discuss
On Tue, Jul 19, 2016 at 8:35 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
>
>
> On 2016/7/19 14:17, Hannes Reinecke wrote:
>>
>> On 07/19/2016 05:47 AM, Sergey Senozhatsky wrote:
>>>
>>> Hello,
>>>
>>> Wondering if anyone will be interested in printk-related topics
>>> (or we can handle it in the mailing list).
>>>
>>> What I have on my list is:
>>>
>>>
>>> - synchronous printk()
>>>
>>> printk() prints messages from kernel printk buffer until the buffer
>>> is empty. When serial console is attached, printing is slow and thus
>>> other CPUs in the system have plenty of time to append new messages to
>>> the buffer while one CPU is printing. Thus the CPU can spend unbounded
>>> amount of time doing printing in console_unlock(). This is especially
>>> serious problem if the printk() calling console_unlock() was called with
>>> interrupts disabled, or from IRQ, or from spin_lock protected section
>>> (if the spinlock is contended), etc. etc. IOW, printk() is quite
>>> dangerous
>>> function to call in some cases, it can cause different types of lockups
>>> (soft, hard, spinlock), stalls and so on.
>>>
>>> we have some progress on this side. printk() can offload printing from
>>> sensitive and unsafe contexts to a schedulable printk_kthread context (a
>>> special purpose printing kthread).
>>> but "The whole idea remains worrisome", per Andrew :)
>>>
>> Yes. The main problem stems from the fact that printk has two different
>> and conflicting use-cases:
>> - Really urgent, 'I am about to die' messages. Which obviously need to
>> be printed out as fast as possible.
>> - Rather largish, information/logging 'what I always wanted to tell you'
>> type of messages. These messages tend to be very large, but at the end
>> it doesn't really matter _when_ they'll be printed as they are
>> time-stamped anyway.
>>
>
> Actually, there are 3 types of messages:
>
> 1. Urgent: I'm going to die.
> 2. information/logging.
> 3. Trace.
>
If we do all this stuff, can we also try to clean up earlyprintk a
bit? The whole earlyconsole mechanism is a mess, and switching over
to the non-early console is only somewhat functional. I'd love to see
this all simplified: before there's any console at all available, just
buffer messages. Then, when a console shows up, write the buffer out.
Then earlyprintk can work just like regular printk.
--Andy
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 1:16 ` Andy Lutomirski
@ 2016-07-21 1:52 ` Wangnan (F)
2016-07-21 5:59 ` Hannes Reinecke
2016-07-21 10:28 ` David Woodhouse
2 siblings, 0 replies; 60+ messages in thread
From: Wangnan (F) @ 2016-07-21 1:52 UTC (permalink / raw)
To: Andy Lutomirski; +Cc: ksummit-discuss
On 2016/7/21 9:16, Andy Lutomirski wrote:
> On Tue, Jul 19, 2016 at 8:35 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
>>
>> On 2016/7/19 14:17, Hannes Reinecke wrote:
>>> On 07/19/2016 05:47 AM, Sergey Senozhatsky wrote:
>>>> Hello,
>>>>
>>>> Wondering if anyone will be interested in printk-related topics
>>>> (or we can handle it in the mailing list).
>>>>
>>>> What I have on my list is:
>>>>
>>>>
>>>> - synchronous printk()
>>>>
>>>> printk() prints messages from kernel printk buffer until the buffer
>>>> is empty. When serial console is attached, printing is slow and thus
>>>> other CPUs in the system have plenty of time to append new messages to
>>>> the buffer while one CPU is printing. Thus the CPU can spend unbounded
>>>> amount of time doing printing in console_unlock(). This is especially
>>>> serious problem if the printk() calling console_unlock() was called with
>>>> interrupts disabled, or from IRQ, or from spin_lock protected section
>>>> (if the spinlock is contended), etc. etc. IOW, printk() is quite
>>>> dangerous
>>>> function to call in some cases, it can cause different types of lockups
>>>> (soft, hard, spinlock), stalls and so on.
>>>>
>>>> we have some progress on this side. printk() can offload printing from
>>>> sensitive and unsafe contexts to a schedulable printk_kthread context (a
>>>> special purpose printing kthread).
>>>> but "The whole idea remains worrisome", per Andrew :)
>>>>
>>> Yes. The main problem stems from the fact that printk has two different
>>> and conflicting use-cases:
>>> - Really urgent, 'I am about to die' messages. Which obviously need to
>>> be printed out as fast as possible.
>>> - Rather largish, information/logging 'what I always wanted to tell you'
>>> type of messages. These messages tend to be very large, but at the end
>>> it doesn't really matter _when_ they'll be printed as they are
>>> time-stamped anyway.
>>>
>> Actually, there are 3 types of messages:
>>
>> 1. Urgent: I'm going to die.
>> 2. information/logging.
>> 3. Trace.
>>
> If we do all this stuff, can we also try to clean up earlyprintk a
> bit? The whole earlyconsole mechanism is a mess, and switching over
> to the non-early console is only somewhat functional. I'd love to see
> this all simplified: before there's any console at all available, just
> buffer messages. Then, when a console shows up, write the buffer out.
> Then earlyprintk can work just like regular printk.
I think kernel need earlyprintk because at very early stage kernel
can't use memory so can't buffer messages?
Thank you.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 1:16 ` Andy Lutomirski
2016-07-21 1:52 ` Wangnan (F)
@ 2016-07-21 5:59 ` Hannes Reinecke
2016-07-21 10:31 ` David Woodhouse
2016-07-21 10:28 ` David Woodhouse
2 siblings, 1 reply; 60+ messages in thread
From: Hannes Reinecke @ 2016-07-21 5:59 UTC (permalink / raw)
To: Andy Lutomirski, Wangnan (F); +Cc: ksummit-discuss
On 07/21/2016 03:16 AM, Andy Lutomirski wrote:
> On Tue, Jul 19, 2016 at 8:35 PM, Wangnan (F) <wangnan0@huawei.com> wrote:
>>
>>
>> On 2016/7/19 14:17, Hannes Reinecke wrote:
>>>
>>> On 07/19/2016 05:47 AM, Sergey Senozhatsky wrote:
>>>>
>>>> Hello,
>>>>
>>>> Wondering if anyone will be interested in printk-related topics
>>>> (or we can handle it in the mailing list).
>>>>
>>>> What I have on my list is:
>>>>
>>>>
>>>> - synchronous printk()
>>>>
>>>> printk() prints messages from kernel printk buffer until the buffer
>>>> is empty. When serial console is attached, printing is slow and thus
>>>> other CPUs in the system have plenty of time to append new messages to
>>>> the buffer while one CPU is printing. Thus the CPU can spend unbounded
>>>> amount of time doing printing in console_unlock(). This is especially
>>>> serious problem if the printk() calling console_unlock() was called with
>>>> interrupts disabled, or from IRQ, or from spin_lock protected section
>>>> (if the spinlock is contended), etc. etc. IOW, printk() is quite
>>>> dangerous
>>>> function to call in some cases, it can cause different types of lockups
>>>> (soft, hard, spinlock), stalls and so on.
>>>>
>>>> we have some progress on this side. printk() can offload printing from
>>>> sensitive and unsafe contexts to a schedulable printk_kthread context (a
>>>> special purpose printing kthread).
>>>> but "The whole idea remains worrisome", per Andrew :)
>>>>
>>> Yes. The main problem stems from the fact that printk has two different
>>> and conflicting use-cases:
>>> - Really urgent, 'I am about to die' messages. Which obviously need to
>>> be printed out as fast as possible.
>>> - Rather largish, information/logging 'what I always wanted to tell you'
>>> type of messages. These messages tend to be very large, but at the end
>>> it doesn't really matter _when_ they'll be printed as they are
>>> time-stamped anyway.
>>>
>>
>> Actually, there are 3 types of messages:
>>
>> 1. Urgent: I'm going to die.
>> 2. information/logging.
>> 3. Trace.
>>
>
> If we do all this stuff, can we also try to clean up earlyprintk a
> bit? The whole earlyconsole mechanism is a mess, and switching over
> to the non-early console is only somewhat functional. I'd love to see
> this all simplified: before there's any console at all available, just
> buffer messages. Then, when a console shows up, write the buffer out.
> Then earlyprintk can work just like regular printk.
>
Begging the question how one would debug failures during that time.
The current earlyprintk stuff is at least able to print out _something_,
so that you have some idea what went wrong.
Without that things will become _really_ hard during board bringup.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.com +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 5:59 ` Hannes Reinecke
@ 2016-07-21 10:31 ` David Woodhouse
2016-07-21 11:19 ` Josh Triplett
2016-07-21 15:05 ` Andy Lutomirski
0 siblings, 2 replies; 60+ messages in thread
From: David Woodhouse @ 2016-07-21 10:31 UTC (permalink / raw)
To: Hannes Reinecke, Andy Lutomirski, Wangnan (F); +Cc: ksummit-discuss
[-- Attachment #1: Type: text/plain, Size: 1216 bytes --]
On Thu, 2016-07-21 at 07:59 +0200, Hannes Reinecke wrote:
> > If we do all this stuff, can we also try to clean up earlyprintk a
> > bit? The whole earlyconsole mechanism is a mess, and switching over
> > to the non-early console is only somewhat functional. I'd love to see
> > this all simplified: before there's any console at all available, just
> > buffer messages. Then, when a console shows up, write the buffer out.
> > Then earlyprintk can work just like regular printk.
> >
> Begging the question how one would debug failures during that time.
> The current earlyprintk stuff is at least able to print out _something_,
> so that you have some idea what went wrong.
> Without that things will become _really_ hard during board bringup.
No, you don't actually *delay* the console output. You still register
the console as soon as you physically can, but you just don't have all
that nasty special-casing of the "earlyprintk" nonsense; you just make
sure the standard console output is capable of working as early as you
need it to.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5760 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 10:31 ` David Woodhouse
@ 2016-07-21 11:19 ` Josh Triplett
2016-07-21 11:59 ` David Woodhouse
2016-07-21 15:05 ` Andy Lutomirski
1 sibling, 1 reply; 60+ messages in thread
From: Josh Triplett @ 2016-07-21 11:19 UTC (permalink / raw)
To: David Woodhouse; +Cc: ksummit-discuss
On Thu, Jul 21, 2016 at 11:31:05AM +0100, David Woodhouse wrote:
> On Thu, 2016-07-21 at 07:59 +0200, Hannes Reinecke wrote:
> > > If we do all this stuff, can we also try to clean up earlyprintk a
> > > bit? The whole earlyconsole mechanism is a mess, and switching over
> > > to the non-early console is only somewhat functional. I'd love to see
> > > this all simplified: before there's any console at all available, just
> > > buffer messages. Then, when a console shows up, write the buffer out.
> > > Then earlyprintk can work just like regular printk.
> > >
> > Begging the question how one would debug failures during that time.
> > The current earlyprintk stuff is at least able to print out _something_,
> > so that you have some idea what went wrong.
> > Without that things will become _really_ hard during board bringup.
>
> No, you don't actually *delay* the console output. You still register
> the console as soon as you physically can, but you just don't have all
> that nasty special-casing of the "earlyprintk" nonsense; you just make
> sure the standard console output is capable of working as early as you
> need it to.
I've run into various cases where "console=ttyS0,115200" didn't produce
any serial output at all, but "earlyprintk=ttyS0,115200,keep" did.
Perhaps "earlyprintk" expects less from the serial port than "console"
does, but that suggests that the differences involve more than just how
early the driver can work.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 11:19 ` Josh Triplett
@ 2016-07-21 11:59 ` David Woodhouse
2016-07-21 14:21 ` Josh Triplett
2016-08-02 11:59 ` Petr Mladek
0 siblings, 2 replies; 60+ messages in thread
From: David Woodhouse @ 2016-07-21 11:59 UTC (permalink / raw)
To: Josh Triplett; +Cc: ksummit-discuss
[-- Attachment #1: Type: text/plain, Size: 1309 bytes --]
On Thu, 2016-07-21 at 04:19 -0700, Josh Triplett wrote:
>
> I've run into various cases where "console=ttyS0,115200" didn't produce
> any serial output at all, but "earlyprintk=ttyS0,115200,keep" did.
Note that those are very different things. Because 'ttyS0' for
earlyprintk really means '16x50 UART at 0x3f8'. While the later one
actually depends on it being discovered by the various options we have
for finding them, some of which are only available quite late.
> Perhaps "earlyprintk" expects less from the serial port than "console"
> does, but that suggests that the differences involve more than just how
> early the driver can work.
There *are* differences, yes. But they are almost entirely gratuitous.
Fundamentally, the only interesting thing is how early the driver can
shovel bits out the port, and thus how early it can be registered.
In the case of serial ports, we can register the console early and just
need to sure it keeps working when the driver is *later* fully
initialised.
It's slightly more fun in cases like the bootx console which actually
does stop working when the graphics hardware is properly set up, but
that's not really hard to cope with in the generic code either; it
doesn't need to be *such* a horrid mess of special handling.
--
dwmw2
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5760 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 11:59 ` David Woodhouse
@ 2016-07-21 14:21 ` Josh Triplett
2016-07-21 14:40 ` David Woodhouse
2016-07-28 3:05 ` Steven Rostedt
2016-08-02 11:59 ` Petr Mladek
1 sibling, 2 replies; 60+ messages in thread
From: Josh Triplett @ 2016-07-21 14:21 UTC (permalink / raw)
To: David Woodhouse; +Cc: ksummit-discuss
On Thu, Jul 21, 2016 at 12:59:18PM +0100, David Woodhouse wrote:
> On Thu, 2016-07-21 at 04:19 -0700, Josh Triplett wrote:
> > I've run into various cases where "console=ttyS0,115200" didn't produce
> > any serial output at all, but "earlyprintk=ttyS0,115200,keep" did.
>
> Note that those are very different things. Because 'ttyS0' for
> earlyprintk really means '16x50 UART at 0x3f8'. While the later one
> actually depends on it being discovered by the various options we have
> for finding them, some of which are only available quite late.
Given the discussion in this thread about wanting to eliminate
earlyprintk in favor of a unified mechanism that works as early as
possible, how would this case work on the system I described above where
earlyprintk works and console doesn't?
- Josh Triplett
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 14:21 ` Josh Triplett
@ 2016-07-21 14:40 ` David Woodhouse
2016-07-28 3:05 ` Steven Rostedt
1 sibling, 0 replies; 60+ messages in thread
From: David Woodhouse @ 2016-07-21 14:40 UTC (permalink / raw)
To: Josh Triplett; +Cc: ksummit-discuss
[-- Attachment #1: Type: text/plain, Size: 1019 bytes --]
On Thu, 2016-07-21 at 07:21 -0700, Josh Triplett wrote:
> On Thu, Jul 21, 2016 at 12:59:18PM +0100, David Woodhouse wrote:
> > On Thu, 2016-07-21 at 04:19 -0700, Josh Triplett wrote:
> > > I've run into various cases where "console=ttyS0,115200" didn't produce
> > > any serial output at all, but "earlyprintk=ttyS0,115200,keep" did.
> >
> > Note that those are very different things. Because 'ttyS0' for
> > earlyprintk really means '16x50 UART at 0x3f8'. While the later one
> > actually depends on it being discovered by the various options we have
> > for finding them, some of which are only available quite late.
>
> Given the discussion in this thread about wanting to eliminate
> earlyprintk in favor of a unified mechanism that works as early as
> possible, how would this case work on the system I described above where
> earlyprintk works and console doesn't?
We'd need to look at why, and fix the driver so that it *does* work.
But that should be a driver-specific issue.
--
dwmw2
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5760 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 14:21 ` Josh Triplett
2016-07-21 14:40 ` David Woodhouse
@ 2016-07-28 3:05 ` Steven Rostedt
1 sibling, 0 replies; 60+ messages in thread
From: Steven Rostedt @ 2016-07-28 3:05 UTC (permalink / raw)
To: Josh Triplett; +Cc: ksummit-discuss
On Thu, 21 Jul 2016 07:21:28 -0700
Josh Triplett <josh@joshtriplett.org> wrote:
>
> Given the discussion in this thread about wanting to eliminate
> earlyprintk in favor of a unified mechanism that works as early as
> possible, how would this case work on the system I described above where
> earlyprintk works and console doesn't?
No no no!!! Please do not eliminate earlyprintk(). It is critical to
the PREEMPT_RT project. It requires no sleeping locks, and can be used
in pretty much any context (with slight modifications). printk() is a
beast, and in -rt it calls sleeping locks. That is, if interrupts are
disabled, we prevent printk from print anything. (Note PREEMPT_RT
keeps interrupts enabled much more than the normal kernel does). If a
bug happens where interrupts are disabled, we fall back to earlyprintk
(and tracing), to debug it.
-- Steve
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 11:59 ` David Woodhouse
2016-07-21 14:21 ` Josh Triplett
@ 2016-08-02 11:59 ` Petr Mladek
1 sibling, 0 replies; 60+ messages in thread
From: Petr Mladek @ 2016-08-02 11:59 UTC (permalink / raw)
To: David Woodhouse; +Cc: ksummit-discuss
On Thu 2016-07-21 12:59:18, David Woodhouse wrote:
> On Thu, 2016-07-21 at 04:19 -0700, Josh Triplett wrote:
> >
> > I've run into various cases where "console=ttyS0,115200" didn't produce
> > any serial output at all, but "earlyprintk=ttyS0,115200,keep" did.
>
> Note that those are very different things. Because 'ttyS0' for
> earlyprintk really means '16x50 UART at 0x3f8'. While the later one
> actually depends on it being discovered by the various options we have
> for finding them, some of which are only available quite late.
>
> > Perhaps "earlyprintk" expects less from the serial port than "console"
> > does, but that suggests that the differences involve more than just how
> > early the driver can work.
>
> There *are* differences, yes. But they are almost entirely gratuitous.
> Fundamentally, the only interesting thing is how early the driver can
> shovel bits out the port, and thus how early it can be registered.
Also earlyprintk allows to flush the characters directly to the serial
port without any locking. It is racy and hacky but it makes it work
even on very borked system when the normal printk would get blocked.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 10:31 ` David Woodhouse
2016-07-21 11:19 ` Josh Triplett
@ 2016-07-21 15:05 ` Andy Lutomirski
2016-07-26 14:40 ` David Woodhouse
1 sibling, 1 reply; 60+ messages in thread
From: Andy Lutomirski @ 2016-07-21 15:05 UTC (permalink / raw)
To: David Woodhouse; +Cc: ksummit-discuss
On Thu, Jul 21, 2016 at 3:31 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Thu, 2016-07-21 at 07:59 +0200, Hannes Reinecke wrote:
>> > If we do all this stuff, can we also try to clean up earlyprintk a
>> > bit? The whole earlyconsole mechanism is a mess, and switching over
>> > to the non-early console is only somewhat functional. I'd love to see
>> > this all simplified: before there's any console at all available, just
>> > buffer messages. Then, when a console shows up, write the buffer out.
>> > Then earlyprintk can work just like regular printk.
>> >
>> Begging the question how one would debug failures during that time.
>> The current earlyprintk stuff is at least able to print out _something_,
>> so that you have some idea what went wrong.
>> Without that things will become _really_ hard during board bringup.
>
> No, you don't actually *delay* the console output. You still register
> the console as soon as you physically can, but you just don't have all
> that nasty special-casing of the "earlyprintk" nonsense; you just make
> sure the standard console output is capable of working as early as you
> need it to.
Agreed.
If we could convince fbcon to work early (which it really ought to),
then we could get rid of the extra graphical console implementations
floating around, e.g. the mediocre one sitting in the earlyefi console
code. We could also get rid of the handover problem when we
initialize the "real" efifb -- all we'd have to do is initialize efifb
very, very early, and I see no reason at all that we can't do that.
(Well, I see one silly reason: if we initialize it before PAT, then we
could end up with a UC mapping instead of WC, and I'm not sure that
the x86 PAT code is smart enough to change the mapping type after the
fact. This could relatively easily be special-cased for efifb by
teaching it to unmap and remap itself after PAT init, or we could try
to initialize PAT earlier.)
But doing this really nicely does tie in to the async printk
discussion for a different reason. I think that we should always
initialize the obvious "early" consoles that can be cleanly enumerated
early (e.g. EFI). What we shouldn't do by default is *display*
anything to them unless we crash, because the distro people want a
clean boot to look pretty. ISTM the condition for flushing the async
printk buffers could be the same as the condition for actually writing
to pre-VT consoles.
--Andy
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 15:05 ` Andy Lutomirski
@ 2016-07-26 14:40 ` David Woodhouse
2016-07-26 15:44 ` Benjamin Herrenschmidt
2016-07-26 21:00 ` Andy Lutomirski
0 siblings, 2 replies; 60+ messages in thread
From: David Woodhouse @ 2016-07-26 14:40 UTC (permalink / raw)
To: Andy Lutomirski; +Cc: ksummit-discuss
[-- Attachment #1: Type: text/plain, Size: 2523 bytes --]
On Thu, 2016-07-21 at 08:05 -0700, Andy Lutomirski wrote:
> On Thu, Jul 21, 2016 at 3:31 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> > On Thu, 2016-07-21 at 07:59 +0200, Hannes Reinecke wrote:
> > > > If we do all this stuff, can we also try to clean up earlyprintk a
> > > > bit? The whole earlyconsole mechanism is a mess, and switching over
> > > > to the non-early console is only somewhat functional. I'd love to see
> > > > this all simplified: before there's any console at all available, just
> > > > buffer messages. Then, when a console shows up, write the buffer out.
> > > > Then earlyprintk can work just like regular printk.
> > > >
> > > Begging the question how one would debug failures during that time.
> > > The current earlyprintk stuff is at least able to print out _something_,
> > > so that you have some idea what went wrong.
> > > Without that things will become _really_ hard during board bringup.
> >
> > No, you don't actually *delay* the console output. You still register
> > the console as soon as you physically can, but you just don't have all
> > that nasty special-casing of the "earlyprintk" nonsense; you just make
> > sure the standard console output is capable of working as early as you
> > need it to.
>
> Agreed.
>
> If we could convince fbcon to work early (which it really ought to),
We had bootx on PowerPC which worked from extremely early.
> But doing this really nicely does tie in to the async printk
> discussion for a different reason. I think that we should always
> initialize the obvious "early" consoles that can be cleanly enumerated
> early (e.g. EFI). What we shouldn't do by default is *display*
> anything to them unless we crash, because the distro people want a
> clean boot to look pretty. ISTM the condition for flushing the async
> printk buffers could be the same as the condition for actually writing
> to pre-VT consoles.
Me no like.
That makes even *less* sense for early boot than it does during later
runtime. Because even *more* of the 'we crash' scenarios during early
boot are going to be uncontrolled "oh shit" happenings, not "oh dear,
it looks like we're going to crash so let's helpfully dump out all the
console messages before we do so".
We already have *really* early command line parsing; we can use that to
determine the console loglevel if we need to. No need to be printing
debug messages to the console unconditionally and upsetting the
graphical boot...
--
dwmw2
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5760 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-26 14:40 ` David Woodhouse
@ 2016-07-26 15:44 ` Benjamin Herrenschmidt
2016-07-26 21:00 ` Andy Lutomirski
1 sibling, 0 replies; 60+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-26 15:44 UTC (permalink / raw)
To: David Woodhouse, Andy Lutomirski; +Cc: ksummit-discuss
On Tue, 2016-07-26 at 15:40 +0100, David Woodhouse wrote:
> > If we could convince fbcon to work early (which it really ought to),
>
> We had bootx on PowerPC which worked from extremely early.
We still do ;-)
Cheers,
Ben.
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-26 14:40 ` David Woodhouse
2016-07-26 15:44 ` Benjamin Herrenschmidt
@ 2016-07-26 21:00 ` Andy Lutomirski
2016-07-27 0:03 ` David Woodhouse
1 sibling, 1 reply; 60+ messages in thread
From: Andy Lutomirski @ 2016-07-26 21:00 UTC (permalink / raw)
To: David Woodhouse; +Cc: ksummit-discuss
On Tue, Jul 26, 2016 at 7:40 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Thu, 2016-07-21 at 08:05 -0700, Andy Lutomirski wrote:
>> On Thu, Jul 21, 2016 at 3:31 AM, David Woodhouse <dwmw2@infradead.org> wrote:
>> > On Thu, 2016-07-21 at 07:59 +0200, Hannes Reinecke wrote:
>> > > > If we do all this stuff, can we also try to clean up earlyprintk a
>> > > > bit? The whole earlyconsole mechanism is a mess, and switching over
>> > > > to the non-early console is only somewhat functional. I'd love to see
>> > > > this all simplified: before there's any console at all available, just
>> > > > buffer messages. Then, when a console shows up, write the buffer out.
>> > > > Then earlyprintk can work just like regular printk.
>> > > >
>> > > Begging the question how one would debug failures during that time.
>> > > The current earlyprintk stuff is at least able to print out _something_,
>> > > so that you have some idea what went wrong.
>> > > Without that things will become _really_ hard during board bringup.
>> >
>> > No, you don't actually *delay* the console output. You still register
>> > the console as soon as you physically can, but you just don't have all
>> > that nasty special-casing of the "earlyprintk" nonsense; you just make
>> > sure the standard console output is capable of working as early as you
>> > need it to.
>>
>> Agreed.
>>
>> If we could convince fbcon to work early (which it really ought to),
>
> We had bootx on PowerPC which worked from extremely early.
>
>> But doing this really nicely does tie in to the async printk
>> discussion for a different reason. I think that we should always
>> initialize the obvious "early" consoles that can be cleanly enumerated
>> early (e.g. EFI). What we shouldn't do by default is *display*
>> anything to them unless we crash, because the distro people want a
>> clean boot to look pretty. ISTM the condition for flushing the async
>> printk buffers could be the same as the condition for actually writing
>> to pre-VT consoles.
>
> Me no like.
>
> That makes even *less* sense for early boot than it does during later
> runtime. Because even *more* of the 'we crash' scenarios during early
> boot are going to be uncontrolled "oh shit" happenings, not "oh dear,
> it looks like we're going to crash so let's helpfully dump out all the
> console messages before we do so".
I only meant in the cases where we're currently silent. If we boot
with "quiet" and we panic, couldn't we at least try to flush out the
printk buffer onto the early console? Currently we just sit there
with a blank screen. (I debugged a case exactly like that last week.)
--Andy
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-26 21:00 ` Andy Lutomirski
@ 2016-07-27 0:03 ` David Woodhouse
2016-07-27 1:16 ` Sergey Senozhatsky
0 siblings, 1 reply; 60+ messages in thread
From: David Woodhouse @ 2016-07-27 0:03 UTC (permalink / raw)
To: Andy Lutomirski; +Cc: ksummit-discuss
[-- Attachment #1: Type: text/plain, Size: 2331 bytes --]
On Tue, 2016-07-26 at 14:00 -0700, Andy Lutomirski wrote:
>
> >> But doing this really nicely does tie in to the async printk
> >> discussion for a different reason. I think that we should always
> >> initialize the obvious "early" consoles that can be cleanly enumerated
> >> early (e.g. EFI). What we shouldn't do by default is *display*
> >> anything to them unless we crash, because the distro people want a
> >> clean boot to look pretty. ISTM the condition for flushing the async
> >> printk buffers could be the same as the condition for actually writing
> >> to pre-VT consoles.
> >
> > Me no like.
> >
> > That makes even *less* sense for early boot than it does during later
> > runtime. Because even *more* of the 'we crash' scenarios during early
> > boot are going to be uncontrolled "oh shit" happenings, not "oh dear,
> > it looks like we're going to crash so let's helpfully dump out all the
> > console messages before we do so".
>
> I only meant in the cases where we're currently silent. If we boot
> with "quiet" and we panic, couldn't we at least try to flush out the
> printk buffer onto the early console? Currently we just sit there
> with a blank screen. (I debugged a case exactly like that last week.)
Sure, *that* part makes some sense if we can work out the details.
One of the issues is that you can end up with out-of-order messages —
for example if you've already printed everything above KERN_CRIT as it
happened, and you then go back and re-print the *whole* logbuffer when
you realise you're going to panic. If you *haven't* actually printed
any lines yet then that's kind of OK, but are you really intending to
keep track of that and behave differently if you have?
And if you spew *everything* on panic to a limited output like the
bootx or VGA text-mode console, you might find that the KERN_DEBUG
messages obscure what actually happened, which would be visible if you
only printed KERN_NOTICE and above... so maybe the best option is just
to honour the loglevel on the command line, but to do it *early*.
Heuristics for 'dump the logbuffer when you panic' still aren't a
panacea.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5760 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-27 0:03 ` David Woodhouse
@ 2016-07-27 1:16 ` Sergey Senozhatsky
0 siblings, 0 replies; 60+ messages in thread
From: Sergey Senozhatsky @ 2016-07-27 1:16 UTC (permalink / raw)
To: David Woodhouse; +Cc: ksummit-discuss
On (07/27/16 01:03), David Woodhouse wrote:
[..]
> And if you spew *everything* on panic to a limited output like the
> bootx or VGA text-mode console, you might find that the KERN_DEBUG
> messages obscure what actually happened, which would be visible if you
> only printed KERN_NOTICE and above...
yes, limited output is a problem on its own. sometimes backtrace is just too
big and even critical messages don't fit. I ended up once forcing panic() to
re-print that "bug at __FILE__, __LINE__, __func__" at the bottom of panic
output. schematically, something like this:
#define BUG() do { \
- printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __func__); \
- panic("BUG!"); \
+ panic("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __func__); \
} while (0)
and pr_emerg() that string in panic() instead of atomic_notifier_call_chain()
output.
-ss
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] asynchronous printk
2016-07-21 1:16 ` Andy Lutomirski
2016-07-21 1:52 ` Wangnan (F)
2016-07-21 5:59 ` Hannes Reinecke
@ 2016-07-21 10:28 ` David Woodhouse
2 siblings, 0 replies; 60+ messages in thread
From: David Woodhouse @ 2016-07-21 10:28 UTC (permalink / raw)
To: Andy Lutomirski, Wangnan (F); +Cc: ksummit-discuss
[-- Attachment #1: Type: text/plain, Size: 968 bytes --]
On Wed, 2016-07-20 at 18:16 -0700, Andy Lutomirski wrote:
> If we do all this stuff, can we also try to clean up earlyprintk a
> bit? The whole earlyconsole mechanism is a mess, and switching over
> to the non-early console is only somewhat functional. I'd love to see
> this all simplified: before there's any console at all available, just
> buffer messages. Then, when a console shows up, write the buffer out.
> Then earlyprintk can work just like regular printk.
That's exactly what we already do. Look at the CON_PRINTBUFFER flag.
I never did understand why "earlyprintk" was introduced as such a nasty
special-case hack in the first place. There were various minor excuses
for doing so, but nothing insurmountable IIRC.
I would *love* to kill it completely and simplify the whole thing.
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@intel.com Intel Corporation
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5760 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread