All of lore.kernel.org
 help / color / mirror / Atom feed
* linux.git: printk() problem
@ 2016-10-12 13:30 Tetsuo Handa
  2016-10-12 14:35 ` Michal Hocko
  2016-10-12 15:47 ` Linus Torvalds
  0 siblings, 2 replies; 36+ messages in thread
From: Tetsuo Handa @ 2016-10-12 13:30 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

Hello.

I noticed that current linux.git generates hardly readable console output
due to KERN_CONT changes. Are you suggesting developers that output like
this be fixed?

[   93.723582] Mem-Info:
[   93.725919] active_anon:380970 inactive_anon:2098 isolated_anon:0
[   93.725919]  active_file:242 inactive_file:275 isolated_file:0
[   93.725919]  unevictable:0 dirty:0 writeback:65 unstable:0
[   93.725919]  slab_reclaimable:3825 slab_unreclaimable:12527
[   93.725919]  mapped:760 shmem:2164 pagetables:9297 bounce:0
[   93.725919]  free:14046 free_pcp:74 free_cma:0
[   93.735626] Node 0 active_anon:1523880kB inactive_anon:8392kB active_file:968kB inactive_file:1100kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3040kB dirty:0kB writeback:244kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 1056768kB anon_thp: 8656kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
[   93.742894] Node 0 [   93.743319] DMA free:7156kB min:408kB low:508kB high:608kB active_anon:8672kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:40kB kernel_stack:0kB pagetables:36kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   93.750951] lowmem_reserve[]:[   93.751597]  0
 1699[   93.752678]  1699
 1699[   93.753910] 
[   93.754822] Node 0 [   93.755216] DMA32 free:49028kB min:44644kB low:55804kB high:66964kB active_anon:1515208kB inactive_anon:8392kB active_file:968kB inactive_file:960kB unevictable:0kB writepending:232kB present:2080640kB managed:1760736kB mlocked:0kB slab_reclaimable:15300kB slab_unreclaimable:50068kB kernel_stack:20320kB pagetables:37152kB bounce:0kB free_pcp:436kB local_pcp:240kB free_cma:0kB
[   93.763709] lowmem_reserve[]:[   93.764235]  0
 0[   93.765630]  0
 0[   93.767544] 
[   93.768676] Node 0 [   93.769135] DMA: 
1*4kB [   93.770932] (U) 
0*8kB [   93.772192] 1*16kB 
(U) [   93.773438] 1*32kB 
(M) [   93.774574] 1*64kB 
(U) [   93.775997] 1*128kB 
(U) [   93.777665] 1*256kB 
(U) [   93.778861] 1*512kB 
(M) [   93.780004] 2*1024kB 
(UM) [   93.781225] 0*2048kB 
1*4096kB [   93.782422] (M) 
[   93.783423] = 7156kB
[   93.784445] Node 0 [   93.784844] DMA32: 
1265*4kB [   93.786204] (UMEH) 
790*8kB [   93.787412] (UMEH) 
335*16kB [   93.788634] (UEH) 
265*32kB [   93.790037] (UE) 
89*64kB [   93.791254] (UEH) 
23*128kB [   93.792541] (UME) 
9*256kB [   93.793787] (ME) 
13*512kB [   93.794956] (UME) 
6*1024kB [   93.796094] (UME) 
0*2048kB [   93.797435] 0*4096kB 
[   93.798413] = 48964kB
[   93.799357] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   93.802221] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   93.804476] 2700 total pagecache pages
[   93.805949] 0 pages in swap cache
[   93.807557] Swap cache stats: add 0, delete 0, find 0/0
[   93.809417] Free swap  = 0kB
[   93.810573] Total swap = 0kB
[   93.811788] 524157 pages RAM
[   93.812903] 0 pages HighMem/MovableOnly
[   93.814216] 79997 pages reserved
[   93.815292] 0 pages hwpoisoned

[  185.497015] Showing all locks held in the system:
[  185.500009] 1 lock held by systemd/1:
[  185.501377]  #0: [  185.501717]  (
&(&ip->i_mmaplock)->mr_lock[  185.503347] ){++++++}
, at: [  185.504566] [<ffffffff812a1600>] xfs_ilock+0xd0/0xe0
[  185.506157] 2 locks held by khungtaskd/46:
[  185.507553]  #0: [  185.507896]  (
rcu_read_lock[  185.509091] ){......}
, at: [  185.510277] [<ffffffff8112054e>] watchdog+0x9e/0x480
[  185.512072]  #1: [  185.512413]  (
tasklist_lock[  185.513602] ){.+.+..}
, at: [  185.514802] [<ffffffff810b5adf>] debug_show_all_locks+0x3f/0x1b0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-12 13:30 linux.git: printk() problem Tetsuo Handa
@ 2016-10-12 14:35 ` Michal Hocko
  2016-10-12 16:08   ` Joe Perches
  2016-10-12 15:47 ` Linus Torvalds
  1 sibling, 1 reply; 36+ messages in thread
From: Michal Hocko @ 2016-10-12 14:35 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: torvalds, linux-kernel

On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> Hello.
> 
> I noticed that current linux.git generates hardly readable console output
> due to KERN_CONT changes. Are you suggesting developers that output like
> this be fixed?

Joe has already posted a patch
http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com

And in general I think that adding those KERN_CONT is a good thing to
do.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-12 13:30 linux.git: printk() problem Tetsuo Handa
  2016-10-12 14:35 ` Michal Hocko
@ 2016-10-12 15:47 ` Linus Torvalds
  2016-10-12 16:16   ` Joe Perches
  2016-10-23  9:22   ` linux.git: printk() problem Geert Uytterhoeven
  1 sibling, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2016-10-12 15:47 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Linux Kernel Mailing List

On Wed, Oct 12, 2016 at 6:30 AM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> I noticed that current linux.git generates hardly readable console output
> due to KERN_CONT changes. Are you suggesting developers that output like
> this be fixed?

Yes. Needing to add a few KERN_CONT markers was pretty much expected,
since it's about 5 years since we enfroced it and new code won't
necessarily have it (and even back then I don't think we _always_ got
it right).

That said, looking at the printk's in the lowmem code, I think it
could be useful if there was some effort to see if the code could
somehow avoid the multi-printk thing. This is actually one area where

 (a) the problem actually happens while the system is running, rather
than booting

 (b) I've seen line mixing in the past

but the short term fix is to just add KERN_CONT markers to the lines
that are continuations.

NOTE! The reason I mention that (a) thing that it has traditionally
made it much messier to do logging of continuation lines in the first
place (because more things are going on and often one problem leads to
another and then the mixing is much more likely), but I actually
intentionally made it more likely to trigger the flushing issue in
commit bfd8d3f23b51 ("printk: make reading the kernel log flush
pending lines").

So if there is an active system logger that is reading messages *when*
one of those "one line in multiple printk's" things happen, that log
reader will now potentially cause the logging to be broken up because
the act of reading will flush the pending lines.

Now, honestly, that is something that we may end up reverting, but I'd
_like_ to try not to. Because without that flushing, there might be
one last partial line that the logger never sees. So it was me trying
to be aggressive about those partial lines, and the *hope* is that we
can just keep it, and that we can look at areas that have problems
with it.

We'll see. But the other issues are easily fixed by just adding
KERN_CONT where appropriate. It was actually very much what you were
supposed to do before too, if only as a marker to others that "yes,
I'm actually doing this, and no, I'm not supposed to have a log level"
(but with the new world order you actually *can* combine KERN_CONT
with a loglevel, so that if the beginning od the line got flushed, the
continuation can still be printed with the right log level).

                Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-12 14:35 ` Michal Hocko
@ 2016-10-12 16:08   ` Joe Perches
  2016-10-13  6:26     ` Michal Hocko
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Perches @ 2016-10-12 16:08 UTC (permalink / raw)
  To: Michal Hocko, Tetsuo Handa; +Cc: torvalds, linux-kernel

On Wed, 2016-10-12 at 16:35 +0200, Michal Hocko wrote:
> On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> > Hello.
> > 
> > I noticed that current linux.git generates hardly readable console output
> > due to KERN_CONT changes. Are you suggesting developers that output like
> > this be fixed?
> 
> 
> Joe has already posted a patch
> http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com
> 
> And in general I think that adding those KERN_CONT is a good thing to
> do.

As do I, but there are about a quarter _million_ uses of
printk/logging messages in the kernel tree with newlines.

Maybe 100 or so of the existing messages lack terminating
newlines.

There are about 1800 uses of KERN_CONT/pr_cont today.

There are multiple thousands of uses of bare printks used as
printk line continuations that would need updating.

Most of those multiple thousands are in drivers of ancient and
out of manufacture devices or are subsystems that are old, crufty
and effectively unmaintained.  These might not need to be updated.

Still, there are probably hundreds to low thousands of actively
used device drivers that will need KERN?CONT or pr_cont update to
avoid logging output defects.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-12 15:47 ` Linus Torvalds
@ 2016-10-12 16:16   ` Joe Perches
  2016-10-12 16:54     ` Linus Torvalds
  2016-10-23  9:22   ` linux.git: printk() problem Geert Uytterhoeven
  1 sibling, 1 reply; 36+ messages in thread
From: Joe Perches @ 2016-10-12 16:16 UTC (permalink / raw)
  To: Linus Torvalds, Tetsuo Handa; +Cc: Linux Kernel Mailing List

On Wed, 2016-10-12 at 08:47 -0700, Linus Torvalds wrote:
< We'll see. But the other issues are easily fixed by just adding
> KERN_CONT where appropriate. It was actually very much what you were
> supposed to do before too, if only as a marker to others that "yes,
> I'm actually doing this, and no, I'm not supposed to have a log level"
> (but with the new world order you actually *can* combine KERN_CONT
> with a loglevel, so that if the beginning od the line got flushed, the
> continuation can still be printed with the right log level).

I think that might not be a good idea.

Anything that uses a KERN_CONT with a new log level
might as well be converted into multiple printks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-12 16:16   ` Joe Perches
@ 2016-10-12 16:54     ` Linus Torvalds
  2016-10-12 18:50       ` [PATCH] acpi_os_vprintf: Use printk_get_level() to avoid unnecessary KERN_CONT Joe Perches
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2016-10-12 16:54 UTC (permalink / raw)
  To: Joe Perches; +Cc: Tetsuo Handa, Linux Kernel Mailing List

On Wed, Oct 12, 2016 at 9:16 AM, Joe Perches <joe@perches.com> wrote:
>> (but with the new world order you actually *can* combine KERN_CONT
>> with a loglevel, so that if the beginning od the line got flushed, the
>> continuation can still be printed with the right log level).
>
> I think that might not be a good idea.
>
> Anything that uses a KERN_CONT with a new log level
> might as well be converted into multiple printks.

Generally yes.

The immediate reason for it was some really really nasty and horrible
indirection in the ACPI layer, which goes through something like
fifteen million different abstraction layers before it actually hits
"printk()", and some of them do vsprintf() in the middle.

It was something like ACPI_INFO() -> acpi_os_printf() ->
acpi_os_vprintf() which is completely broken and *always* has that
KERN_CONT marker at the beginning, but earlier phases can have the
level marker in it already..

In other words, it was a terminally broken piece of nasty code, but
there was no way I was going to touch the ACPI "OS independent"
layers, so I just said "screw this, if you want to mix KERN_CONT with
a level marker, you can".

So I agree, you normally shouldn't need to. In fact, the fewer
KERN_CONT's I see in the kernel, the better off we are. But
_sometimes_ KERN_CONT makes sense (and the OOM code is likely one of
the few places where it really might be the best of a number of bad
options: lots of information that needs to be pretty dense, and we
obviously can't afford to allocate a buffer for it dynamically, and
doing so statically isn't a great option either..)

                 Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH] acpi_os_vprintf: Use printk_get_level() to avoid unnecessary KERN_CONT
  2016-10-12 16:54     ` Linus Torvalds
@ 2016-10-12 18:50       ` Joe Perches
  2016-10-13 21:59         ` Rafael J. Wysocki
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Perches @ 2016-10-12 18:50 UTC (permalink / raw)
  To: Linus Torvalds, Rafael J. Wysocki, Len Brown; +Cc: linux-acpi, linux-kernel

acpi_os_vprintf currently always uses a KERN_CONT prefix which may be
followed immediately by a proper KERN_<LEVEL>.  Check if the buffer
already has a KERN_<LEVEL> at the start of the buffer and avoid the
unnecessary KERN_CONT.

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/acpi/osl.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 4305ee9db4b2..416953a42510 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -162,11 +162,18 @@ void acpi_os_vprintf(const char *fmt, va_list args)
 	if (acpi_in_debugger) {
 		kdb_printf("%s", buffer);
 	} else {
-		printk(KERN_CONT "%s", buffer);
+		if (printk_get_level(buffer))
+			printk("%s", buffer);
+		else
+			printk(KERN_CONT "%s", buffer);
 	}
 #else
-	if (acpi_debugger_write_log(buffer) < 0)
-		printk(KERN_CONT "%s", buffer);
+	if (acpi_debugger_write_log(buffer) < 0) {
+		if (printk_get_level(buffer))
+			printk("%s", buffer);
+		else
+			printk(KERN_CONT "%s", buffer);
+	}
 #endif
 }
 
-- 
2.10.0.rc2.1.g053435c


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-12 16:08   ` Joe Perches
@ 2016-10-13  6:26     ` Michal Hocko
  2016-10-13  9:29       ` Joe Perches
  0 siblings, 1 reply; 36+ messages in thread
From: Michal Hocko @ 2016-10-13  6:26 UTC (permalink / raw)
  To: Joe Perches; +Cc: Tetsuo Handa, torvalds, linux-kernel

On Wed 12-10-16 09:08:58, Joe Perches wrote:
> On Wed, 2016-10-12 at 16:35 +0200, Michal Hocko wrote:
> > On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> > > Hello.
> > > 
> > > I noticed that current linux.git generates hardly readable console output
> > > due to KERN_CONT changes. Are you suggesting developers that output like
> > > this be fixed?
> > 
> > 
> > Joe has already posted a patch
> > http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com
> > 
> > And in general I think that adding those KERN_CONT is a good thing to
> > do.
> 
> As do I, but there are about a quarter _million_ uses of
> printk/logging messages in the kernel tree with newlines.
> 
> Maybe 100 or so of the existing messages lack terminating
> newlines.

I think they are not critical and can be fix once somebody notices.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-13  6:26     ` Michal Hocko
@ 2016-10-13  9:29       ` Joe Perches
  2016-10-13 10:04         ` Michal Hocko
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Perches @ 2016-10-13  9:29 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tetsuo Handa, torvalds, linux-kernel, Colin Ian King

On Thu, 2016-10-13 at 08:26 +0200, Michal Hocko wrote:
> On Wed 12-10-16 09:08:58, Joe Perches wrote:
> > On Wed, 2016-10-12 at 16:35 +0200, Michal Hocko wrote:
> > > On Wed 12-10-16 22:30:03, Tetsuo Handa wrote:
> > > > Hello.
> > > > 
> > > > I noticed that current linux.git generates hardly readable console output
> > > > due to KERN_CONT changes. Are you suggesting developers that output like
> > > > this be fixed?
> > > 
> > > 
> > > 
> > > Joe has already posted a patch
> > > http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com
> > > 
> > > And in general I think that adding those KERN_CONT is a good thing to
> > > do.
> > 
> > 
> > As do I, but there are about a quarter _million_ uses of
> > printk/logging messages in the kernel tree with newlines.
> > 
> > Maybe 100 or so of the existing messages lack terminating
> > newlines.
> 
> 
> I think they are not critical and can be fix once somebody notices.
 
As do I, but Linus objected to applying a patch when Colin Ian King
noticed one.

I think the 250,000 or so uses with newlines are enough of a
precedence to keep using newlines everywhere.

Now we'll have to have patches adding hundreds to thousands of the
missing KERN_CONTs for continuation lines that weren't previously a
problem in logging output but are now.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-13  9:29       ` Joe Perches
@ 2016-10-13 10:04         ` Michal Hocko
  2016-10-13 10:20           ` Joe Perches
  0 siblings, 1 reply; 36+ messages in thread
From: Michal Hocko @ 2016-10-13 10:04 UTC (permalink / raw)
  To: Joe Perches; +Cc: Tetsuo Handa, torvalds, linux-kernel, Colin Ian King

On Thu 13-10-16 02:29:46, Joe Perches wrote:
> On Thu, 2016-10-13 at 08:26 +0200, Michal Hocko wrote:
[...]
> > I think they are not critical and can be fix once somebody notices.
>  
> As do I, but Linus objected to applying a patch when Colin Ian King
> noticed one.
> 
> I think the 250,000 or so uses with newlines are enough of a
> precedence to keep using newlines everywhere.

or simply fix missing KERN_CONTs and simply do not add any new missing \n

> Now we'll have to have patches adding hundreds to thousands of the
> missing KERN_CONTs for continuation lines that weren't previously a
> problem in logging output but are now.

I would be really surprised if we really had that many continuation
lines. They should be avoided as much as possible. Hundreds of thousands
just sounds more than over exaggerated...

I think you are just making much bigger deal from this than necessary.
Not requiring \n at the end of strings just makes a lot of sense if we
have a KERN_CONT with a well defined semantic. Which was the whole point
of the patch from Linus AFAIU. If there are some left overs, so what, we
can fix them as soon as somebody notices. The worst thing we will get is
a messy output but no information should be lost. We used to have a
messy output in the past regardless and we could live with it...

That being said I would be happier to know about this change before I
had to scratch my head when seeing this for the first time so a heads up
would be more than appreciated but fixing these issues is trivial not
not worth making a lot of noise about.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-13 10:04         ` Michal Hocko
@ 2016-10-13 10:20           ` Joe Perches
  2016-10-13 11:06             ` Michal Hocko
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Perches @ 2016-10-13 10:20 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tetsuo Handa, torvalds, linux-kernel, Colin Ian King

On Thu, 2016-10-13 at 12:04 +0200, Michal Hocko wrote:
> On Thu 13-10-16 02:29:46, Joe Perches wrote:
> > On Thu, 2016-10-13 at 08:26 +0200, Michal Hocko wrote:
> 
> [...]
> > > I think they are not critical and can be fix once somebody notices.
> > 
> >  
> > As do I, but Linus objected to applying a patch when Colin Ian King
> > noticed one.
> > 
> > I think the 250,000 or so uses with newlines are enough of a
> > precedence to keep using newlines everywhere.
> 
> 
> or simply fix missing KERN_CONTs and simply do not add any new missing \n
> 
> > Now we'll have to have patches adding hundreds to thousands of the
> > missing KERN_CONTs for continuation lines that weren't previously a
> > problem in logging output but are now.
> 
> 
> I would be really surprised if we really had that many continuation
> lines. They should be avoided as much as possible. Hundreds of thousands
> just sounds more than over exaggerated...

Hey Michal.

"Hundreds _to_ thousands" of instances.  Not "hundreds _of_ thousands".

> Not requiring \n at the end of strings just makes a lot of sense if we
> have a KERN_CONT with a well defined semantic.

True enough.  And I am not at all arguing against having a
well defined KERN_CONT semantic.

But using KERN_CONT alone is not enough information to be able to
perfectly reassemble message fragments post hoc given multiple
threads possibly interleaving KERN_CONT.

I do think the inconsistency of mixing styles with and without
newlines not particularly good.

cheers, Joe

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-13 10:20           ` Joe Perches
@ 2016-10-13 11:06             ` Michal Hocko
  0 siblings, 0 replies; 36+ messages in thread
From: Michal Hocko @ 2016-10-13 11:06 UTC (permalink / raw)
  To: Joe Perches; +Cc: Tetsuo Handa, torvalds, linux-kernel, Colin Ian King

On Thu 13-10-16 03:20:50, Joe Perches wrote:
> On Thu, 2016-10-13 at 12:04 +0200, Michal Hocko wrote:
[...]
> > I would be really surprised if we really had that many continuation
> > lines. They should be avoided as much as possible. Hundreds of thousands
> > just sounds more than over exaggerated...
> 
> Hey Michal.
> 
> "Hundreds _to_ thousands" of instances.  Not "hundreds _of_ thousands".

my bad I misread your words. Sorry about that

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH] acpi_os_vprintf: Use printk_get_level() to avoid unnecessary KERN_CONT
  2016-10-12 18:50       ` [PATCH] acpi_os_vprintf: Use printk_get_level() to avoid unnecessary KERN_CONT Joe Perches
@ 2016-10-13 21:59         ` Rafael J. Wysocki
  0 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2016-10-13 21:59 UTC (permalink / raw)
  To: Joe Perches; +Cc: Linus Torvalds, Len Brown, linux-acpi, linux-kernel

On Wednesday, October 12, 2016 11:50:34 AM Joe Perches wrote:
> acpi_os_vprintf currently always uses a KERN_CONT prefix which may be
> followed immediately by a proper KERN_<LEVEL>.  Check if the buffer
> already has a KERN_<LEVEL> at the start of the buffer and avoid the
> unnecessary KERN_CONT.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>  drivers/acpi/osl.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 4305ee9db4b2..416953a42510 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -162,11 +162,18 @@ void acpi_os_vprintf(const char *fmt, va_list args)
>  	if (acpi_in_debugger) {
>  		kdb_printf("%s", buffer);
>  	} else {
> -		printk(KERN_CONT "%s", buffer);
> +		if (printk_get_level(buffer))
> +			printk("%s", buffer);
> +		else
> +			printk(KERN_CONT "%s", buffer);
>  	}
>  #else
> -	if (acpi_debugger_write_log(buffer) < 0)
> -		printk(KERN_CONT "%s", buffer);
> +	if (acpi_debugger_write_log(buffer) < 0) {
> +		if (printk_get_level(buffer))
> +			printk("%s", buffer);
> +		else
> +			printk(KERN_CONT "%s", buffer);
> +	}
>  #endif
>  }

Applied.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-12 15:47 ` Linus Torvalds
  2016-10-12 16:16   ` Joe Perches
@ 2016-10-23  9:22   ` Geert Uytterhoeven
  2016-10-23 18:11     ` Linus Torvalds
  1 sibling, 1 reply; 36+ messages in thread
From: Geert Uytterhoeven @ 2016-10-23  9:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tetsuo Handa, Linux Kernel Mailing List

Hi Linus,

On Wed, Oct 12, 2016 at 5:47 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Oct 12, 2016 at 6:30 AM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> I noticed that current linux.git generates hardly readable console output
>> due to KERN_CONT changes. Are you suggesting developers that output like
>> this be fixed?
>
> Yes. Needing to add a few KERN_CONT markers was pretty much expected,
> since it's about 5 years since we enfroced it and new code won't
> necessarily have it (and even back then I don't think we _always_ got
> it right).
>
> That said, looking at the printk's in the lowmem code, I think it
> could be useful if there was some effort to see if the code could
> somehow avoid the multi-printk thing. This is actually one area where
>
>  (a) the problem actually happens while the system is running, rather
> than booting
>
>  (b) I've seen line mixing in the past
>
> but the short term fix is to just add KERN_CONT markers to the lines
> that are continuations.
>
> NOTE! The reason I mention that (a) thing that it has traditionally
> made it much messier to do logging of continuation lines in the first
> place (because more things are going on and often one problem leads to
> another and then the mixing is much more likely), but I actually
> intentionally made it more likely to trigger the flushing issue in
> commit bfd8d3f23b51 ("printk: make reading the kernel log flush
> pending lines").
>
> So if there is an active system logger that is reading messages *when*
> one of those "one line in multiple printk's" things happen, that log
> reader will now potentially cause the logging to be broken up because
> the act of reading will flush the pending lines.

These changes have an interesting side-effect on sequences of printk()s that
lack proper continuation: they introduced a discrepancy between dmesg output
and the actual kernel output.

Before:

    Atari hardware found: VIDEL STDMA-SCSI ST_MFP YM2149 PCM CODEC
DSP56K SCC ANALOG_JOY BLITTER IDE TT_CLK FDC_SPEED

Output of "dmesg" after:

    Atari hardware found:
    VIDEL
    STDMA-SCSI
    ST_MFP
    YM2149
    PCM
    CODEC
    DSP56K
    SCC
    ANALOG_JOY
    BLITTER
    IDE
    TT_CLK
    FDC_SPEED

Actual kernel output after:

    Atari hardware found: VIDEL
    STDMA-SCSI ST_MFP
    YM2149 PCM
    CODEC DSP56K
    SCC ANALOG_JOY
    BLITTER IDE
    TT_CLK FDC_SPEED

Note that the above is really early in the boot process, right after the debug
console is enabled, and before any system log consumer is running,

Of course I'm converting this code to use pr_cont() anyway...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-23  9:22   ` linux.git: printk() problem Geert Uytterhoeven
@ 2016-10-23 18:11     ` Linus Torvalds
  2016-10-23 19:06       ` Joe Perches
  2016-10-25 14:42       ` Steven Rostedt
  0 siblings, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2016-10-23 18:11 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Tetsuo Handa, Linux Kernel Mailing List

On Sun, Oct 23, 2016 at 2:22 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> These changes have an interesting side-effect on sequences of printk()s that
> lack proper continuation: they introduced a discrepancy between dmesg output
> and the actual kernel output.

Yes.

So the "print vs log" handling is really really horrible. I cleaned up
some of it, but left the really fundamental problems. I wanted to just
rewrite it all, but didn't quite have the heart for it.

The best solution by far would be to just not support KERN_CONT at
all, but there's too many "silly details" things that keep it from
being possible.

The basic issue is that we have the line buffer that is used for
continuations, and then the record buffer that is used for logging.

And those two per se sound fairly easy to handle ("KERN_CONT means
append to the line buffer, otherwise flush the line buffer and move to
the record buffer").

But what complicates things more is then the "console output", which
has two issues:

 - it is done outside the locking regime for the line buffer and the
record buffer.

 - it is done on _partial_ line buffers.

Again, this would be absolutely trivial if we just said "we only print
the record buffer". Easy, and solves all the problems. Except for
_one_ problem:

 - if a hang occurs in the middle of a continuation, we historically
*really* want that continuation to have been printed out.

For example, one of the really historical uses for partial lines is this:

       pr_info("Checking 'hlt' instruction... ");

       if (!boot_cpu_data.hlt_works_ok) {
               pr_cont("disabled\n");
               return;
       }
       halt();
       halt();
       halt();
       halt();
       pr_cont("OK\n");

and the point was that there used to be some really old i386 machines
that hung on the "hlt" instruction (probably not because of a CPU bug,
but because of either power supply issues or some DMA issues).

To support that, we really *had* to print out the continuation lines
even when they were partial. And that complicates the printk logic a
lot.

Now, that "hlt" case is long long gone, and maybe we should just say
"screw that". It would be really quite easy to say "we don't print out
continuation lines immediately, we just buffer them for 0.1s instead,
and KERN_CONT only works for things that really happen more or less
immediately".

Maybe that really is the right answer. Because the original cause of
us having to bend over backwards in this case is really no longer
there. And it would simplify printk a *lot*.

Let me whip up a minimal patch for you to try.

                Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-23 18:11     ` Linus Torvalds
@ 2016-10-23 19:06       ` Joe Perches
  2016-10-23 19:32         ` Linus Torvalds
  2016-10-25 14:42       ` Steven Rostedt
  1 sibling, 1 reply; 36+ messages in thread
From: Joe Perches @ 2016-10-23 19:06 UTC (permalink / raw)
  To: Linus Torvalds, Geert Uytterhoeven
  Cc: Tetsuo Handa, Linux Kernel Mailing List, Sergey Senozhatsky, Petr Mladek

On Sun, 2016-10-23 at 11:11 -0700, Linus Torvalds wrote:
> On Sun, Oct 23, 2016 at 2:22 AM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > 
> > These changes have an interesting side-effect on sequences of printk()s that
> > lack proper continuation: they introduced a discrepancy between dmesg output
> > and the actual kernel output.
> 
> Yes.
> 
> So the "print vs log" handling is really really horrible. I cleaned up
> some of it, but left the really fundamental problems. I wanted to just
> rewrite it all, but didn't quite have the heart for it.
> 
> The best solution by far would be to just not support KERN_CONT at
> all,  but there's too many "silly details" things that keep it from
> being possible.
> 
> The basic issue is that we have the line buffer that is used for
> continuations, and then the record buffer that is used for logging.
> 
> And those two per se sound fairly easy to handle ("KERN_CONT means
> append to the line buffer, otherwise flush the line buffer and move to
> the record buffer").
> 
> But what complicates things more is then the "console output", which
> has two issues:
> 
>  - it is done outside the locking regime for the line buffer and the
> record buffer.
> 
>  - it is done on _partial_ line buffers.

EOL KERN_<LEVEL> and thread interleaving still exists.

> It would be really quite easy to say "we don't print out
> continuation lines immediately, we just buffer them for 0.1s instead,
> and KERN_CONT only works for things that really happen more or less
> immediately".

Or use to a start/stop buffer (maybe via KERN_<LEVEL> and \n) with
PID/TIDs added to /dev/kmsg and that short-term timer to reassemble.

> Maybe that really is the right answer. Because the original cause of
> us having to bend over backwards in this case is really no longer
> there. And it would simplify printk a *lot*.

A timer might be a good idea, but perhaps Sergey and Petr might
have some interest in that too. (added to cc's)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-23 19:06       ` Joe Perches
@ 2016-10-23 19:32         ` Linus Torvalds
  2016-10-23 19:46           ` Linus Torvalds
  2016-10-23 20:33           ` Joe Perches
  0 siblings, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2016-10-23 19:32 UTC (permalink / raw)
  To: Joe Perches
  Cc: Geert Uytterhoeven, Tetsuo Handa, Linux Kernel Mailing List,
	Sergey Senozhatsky, Petr Mladek

On Sun, Oct 23, 2016 at 12:06 PM, Joe Perches <joe@perches.com> wrote:
> On Sun, 2016-10-23 at 11:11 -0700, Linus Torvalds wrote:
>>
>> And those two per se sound fairly easy to handle ("KERN_CONT means
>> append to the line buffer, otherwise flush the line buffer and move to
>> the record buffer").
>>
>> But what complicates things more is then the "console output", which
>> has two issues:
>>
>>  - it is done outside the locking regime for the line buffer and the
>> record buffer.
>>
>>  - it is done on _partial_ line buffers.
>
> EOL KERN_<LEVEL> and thread interleaving still exists.

Note that the thread interleaving is still trivial: it's easily done
at the point where we decide "can we append to the line buffer or
not". That's pretty simple. Just flush the record when the thread
changes.

So the interleaving will never go away, it's very fundamental - unless
we make the line buffer just be a per-thread thing. And yes, that
would be the cleanest solution, but it's also an extra buffer for each
thread, so realistically it's just not going to happen.

End result: I'm not worried about the interleaving. It will cause ugly
output, but we've always had that, and the solution to it is "if you
absolutely don't want interleaving, then don't try to print partial
lines!".

The classic "don't do that then" response, in other world.

No, the real complexity comes from that interaction with the console
output, which is done outside the core log locks, and which currently
has the added thing where we have a "has this line fragment been
flushed or not".

That "has this line fragment been flushed or not" is particularly
painful, because we may have flushed it *partially*. That "cont.cons"
thing is a counter of how many bytes have been flushed, and we can be
in the situation where we have had multiple continuations added to the
line buffer, and only *some* of them have been flushed to the console.

(Reasons for not flushing: we couldn't get the console lock because
another process held it due to logging or whatever).

And then the interface to the actual record logging only has a "all or
nothing was flushed" flag (LOG_NOCONS) to avoid flushing things twice.
So when we actually log the record, we lose the "this line was only
partially printed".

That whole "we've flushed part of the line to the console" thing is
why it would make things so much easier to just log full records to
the console. Getting rid of that gets rid of a lot of ugly and
hard-to-read crap. Yes, the line buffer would still remain, and yes,
you'd still see the interleaving with threads, but that's not
"complexity", that's just "visually ugly output".

                 Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-23 19:32         ` Linus Torvalds
@ 2016-10-23 19:46           ` Linus Torvalds
  2016-10-24 11:15             ` Geert Uytterhoeven
  2016-10-24 14:08             ` Sergey Senozhatsky
  2016-10-23 20:33           ` Joe Perches
  1 sibling, 2 replies; 36+ messages in thread
From: Linus Torvalds @ 2016-10-23 19:46 UTC (permalink / raw)
  To: Joe Perches
  Cc: Geert Uytterhoeven, Tetsuo Handa, Linux Kernel Mailing List,
	Sergey Senozhatsky, Petr Mladek

[-- Attachment #1: Type: text/plain, Size: 1737 bytes --]

On Sun, Oct 23, 2016 at 12:32 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> No, the real complexity comes from that interaction with the console
> output, which is done outside the core log locks, and which currently
> has the added thing where we have a "has this line fragment been
> flushed or not".

Ok, so here's the stupid patch that removes all the partial line flushing.

NOTE! It still leaves all the games with LOG_NEWLINE and LOG_NOCONS
that are pretty much pointless with it. So there's room for more
simplification here.

In particular, the games with LOG_NEWLINE is what Geert's "console and
dmesg output looks different" at least partially comes from. What
happens is that "dmesg" always shows the records as one line (so it
effectively ignores LOG_NEWLINE), but the console output (in
msg_print_text() still has that LOG_NEWLINE logic.

In particular, msg_print_text() looks at the *previous* logged line to
decide whether it should do newlines etc, which is why Geert gets that
odd "two continuations per line" pattern on the console, but "one
continuation per line" in dmesg. That comes from the interaction with
flushing to the console and LOG_NEWLINE and just general complexity.

All of that LOG_NEWLINE code could be removed. But again, this patch
doesn't do that removal. It just removes the partial console flushing
and simplifies that part of the code.

(This patch removes way more lines than it adds, but the *real*
advantage is that it removes complexity. The rules for
console_cont_flush() really were _very_ hard to grok, it has subtle
interactions with cont_add() and cont_flush() through that "cont.cons"
and "cont.flushed" logic that is all removed by this patch).

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 7743 bytes --]

 kernel/printk/printk.c | 132 +++++++++++--------------------------------------
 1 file changed, 28 insertions(+), 104 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de08fc90baaf..ed1bfa774427 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -794,7 +794,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
 	return ret;
 }
 
-static void cont_flush(void);
+static void deferred_cont_flush(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
@@ -811,7 +811,6 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	if (ret)
 		return ret;
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	while (user->seq == log_next_seq) {
 		if (file->f_flags & O_NONBLOCK) {
 			ret = -EAGAIN;
@@ -860,6 +859,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	ret = len;
 out:
 	mutex_unlock(&user->lock);
+	deferred_cont_flush();
 	return ret;
 }
 
@@ -874,7 +874,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 		return -ESPIPE;
 
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	switch (whence) {
 	case SEEK_SET:
 		/* the first record */
@@ -913,7 +912,6 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
 	poll_wait(file, &log_wait, wait);
 
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	if (user->seq < log_next_seq) {
 		/* return error when data has vanished underneath us */
 		if (user->seq < log_first_seq)
@@ -922,6 +920,7 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
 			ret = POLLIN|POLLRDNORM;
 	}
 	raw_spin_unlock_irq(&logbuf_lock);
+	deferred_cont_flush();
 
 	return ret;
 }
@@ -1300,7 +1299,6 @@ static int syslog_print(char __user *buf, int size)
 		size_t skip;
 
 		raw_spin_lock_irq(&logbuf_lock);
-		cont_flush();
 		if (syslog_seq < log_first_seq) {
 			/* messages are gone, move to first one */
 			syslog_seq = log_first_seq;
@@ -1360,7 +1358,6 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		return -ENOMEM;
 
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	if (buf) {
 		u64 next_seq;
 		u64 seq;
@@ -1522,7 +1519,6 @@ int do_syslog(int type, char __user *buf, int len, int source)
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
 		raw_spin_lock_irq(&logbuf_lock);
-		cont_flush();
 		if (syslog_seq < log_first_seq) {
 			/* messages are gone, move to first one */
 			syslog_seq = log_first_seq;
@@ -1563,6 +1559,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		error = -EINVAL;
 		break;
 	}
+	deferred_cont_flush();
 out:
 	return error;
 }
@@ -1650,46 +1647,41 @@ static inline void printk_delay(void)
 static struct cont {
 	char buf[LOG_LINE_MAX];
 	size_t len;			/* length == 0 means unused buffer */
-	size_t cons;			/* bytes written to console */
 	struct task_struct *owner;	/* task of first print*/
 	u64 ts_nsec;			/* time of first print */
 	u8 level;			/* log level of first message */
 	u8 facility;			/* log facility of first message */
 	enum log_flags flags;		/* prefix, newline flags */
-	bool flushed:1;			/* buffer sealed and committed */
 } cont;
 
-static void cont_flush(void)
+static bool cont_flush(void)
 {
-	if (cont.flushed)
-		return;
-	if (cont.len == 0)
+	if (!cont.len)
+		return false;
+
+	log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
+		  NULL, 0, cont.buf, cont.len);
+	cont.len = 0;
+	return true;
+}
+
+static void flush_timer(unsigned long data)
+{
+	if (cont_flush())
+		wake_up_klogd();
+}
+
+static void deferred_cont_flush(void)
+{
+	static DEFINE_TIMER(timer, flush_timer, 0, 0);
+
+	if (!cont.len)
 		return;
-	if (cont.cons) {
-		/*
-		 * If a fragment of this line was directly flushed to the
-		 * console; wait for the console to pick up the rest of the
-		 * line. LOG_NOCONS suppresses a duplicated output.
-		 */
-		log_store(cont.facility, cont.level, cont.flags | LOG_NOCONS,
-			  cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-		cont.flushed = true;
-	} else {
-		/*
-		 * If no fragment of this line ever reached the console,
-		 * just submit it to the store and free the buffer.
-		 */
-		log_store(cont.facility, cont.level, cont.flags, 0,
-			  NULL, 0, cont.buf, cont.len);
-		cont.len = 0;
-	}
+	mod_timer(&timer, jiffies + HZ/10);
 }
 
 static bool cont_add(int facility, int level, enum log_flags flags, const char *text, size_t len)
 {
-	if (cont.len && cont.flushed)
-		return false;
-
 	/*
 	 * If ext consoles are present, flush and skip in-kernel
 	 * continuation.  See nr_ext_console_drivers definition.  Also, if
@@ -1706,8 +1698,6 @@ static bool cont_add(int facility, int level, enum log_flags flags, const char *
 		cont.owner = current;
 		cont.ts_nsec = local_clock();
 		cont.flags = flags;
-		cont.cons = 0;
-		cont.flushed = false;
 	}
 
 	memcpy(cont.buf + cont.len, text, len);
@@ -1726,34 +1716,6 @@ static bool cont_add(int facility, int level, enum log_flags flags, const char *
 	return true;
 }
 
-static size_t cont_print_text(char *text, size_t size)
-{
-	size_t textlen = 0;
-	size_t len;
-
-	if (cont.cons == 0 && (console_prev & LOG_NEWLINE)) {
-		textlen += print_time(cont.ts_nsec, text);
-		size -= textlen;
-	}
-
-	len = cont.len - cont.cons;
-	if (len > 0) {
-		if (len+1 > size)
-			len = size-1;
-		memcpy(text + textlen, cont.buf + cont.cons, len);
-		textlen += len;
-		cont.cons = cont.len;
-	}
-
-	if (cont.flushed) {
-		if (cont.flags & LOG_NEWLINE)
-			text[textlen++] = '\n';
-		/* got everything, release buffer */
-		cont.len = 0;
-	}
-	return textlen;
-}
-
 static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len)
 {
 	/*
@@ -2025,7 +1987,6 @@ static void call_console_drivers(int level,
 				 const char *text, size_t len) {}
 static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
 			     bool syslog, char *buf, size_t size) { return 0; }
-static size_t cont_print_text(char *text, size_t size) { return 0; }
 static bool suppress_message_printing(int level) { return false; }
 
 /* Still needs to be defined for users */
@@ -2296,42 +2257,6 @@ static inline int can_use_console(void)
 	return cpu_online(raw_smp_processor_id()) || have_callable_console();
 }
 
-static void console_cont_flush(char *text, size_t size)
-{
-	unsigned long flags;
-	size_t len;
-
-	raw_spin_lock_irqsave(&logbuf_lock, flags);
-
-	if (!cont.len)
-		goto out;
-
-	if (suppress_message_printing(cont.level)) {
-		cont.cons = cont.len;
-		if (cont.flushed)
-			cont.len = 0;
-		goto out;
-	}
-
-	/*
-	 * We still queue earlier records, likely because the console was
-	 * busy. The earlier ones need to be printed before this one, we
-	 * did not flush any fragment so far, so just let it queue up.
-	 */
-	if (console_seq < log_next_seq && !cont.cons)
-		goto out;
-
-	len = cont_print_text(text, size);
-	raw_spin_unlock(&logbuf_lock);
-	stop_critical_timings();
-	call_console_drivers(cont.level, NULL, 0, text, len);
-	start_critical_timings();
-	local_irq_restore(flags);
-	return;
-out:
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
-}
-
 /**
  * console_unlock - unlock the console system
  *
@@ -2360,6 +2285,8 @@ void console_unlock(void)
 		return;
 	}
 
+	deferred_cont_flush();
+
 	/*
 	 * Console drivers are called under logbuf_lock, so
 	 * @console_may_schedule should be cleared before; however, we may
@@ -2385,9 +2312,6 @@ void console_unlock(void)
 		return;
 	}
 
-	/* flush buffered message fragment immediately to console */
-	console_cont_flush(text, sizeof(text));
-
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-23 19:32         ` Linus Torvalds
  2016-10-23 19:46           ` Linus Torvalds
@ 2016-10-23 20:33           ` Joe Perches
  2016-10-23 21:13             ` Linus Torvalds
  1 sibling, 1 reply; 36+ messages in thread
From: Joe Perches @ 2016-10-23 20:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Geert Uytterhoeven, Tetsuo Handa, Linux Kernel Mailing List,
	Sergey Senozhatsky, Petr Mladek

On Sun, 2016-10-23 at 12:32 -0700, Linus Torvalds wrote:
> On Sun, Oct 23, 2016 at 12:06 PM, Joe Perches <joe@perches.com> wrote:
> > On Sun, 2016-10-23 at 11:11 -0700, Linus Torvalds wrote:
> > > 
> > > And those two per se sound fairly easy to handle ("KERN_CONT means
> > > append to the line buffer, otherwise flush the line buffer and move to
> > > the record buffer").
> > > 
> > > But what complicates things more is then the "console output", which
> > > has two issues:
> > > 
> > >  - it is done outside the locking regime for the line buffer and the
> > > record buffer.
> > > 
> > >  - it is done on _partial_ line buffers.
> > 
> > 
> > EOL KERN_<LEVEL> and thread interleaving still exists.
> 
> 
> Note that the thread interleaving is still trivial: it's easily done
> at the point where we decide "can we append to the line buffer or
> not". That's pretty simple. Just flush the record when the thread
> changes.
> 
> So the interleaving will never go away, it's very fundamental - unless
> we make the line buffer just be a per-thread thing. And yes, that
> would be the cleanest solution, but it's also an extra buffer for each
> thread, so realistically it's just not going to happen.

I doubt there are cases where more than a few of
these interleaving threads are simultaneous.

Perhaps it could be a pool of active thread
continuation buffers.  

> End result: I'm not worried about the interleaving. It will cause ugly
> output, but we've always had that, and the solution to it is "if you
> absolutely don't want interleaving, then don't try to print partial
> lines!".
> The classic "don't do that then" response, in other world.

Yup, best solution.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-23 20:33           ` Joe Perches
@ 2016-10-23 21:13             ` Linus Torvalds
  0 siblings, 0 replies; 36+ messages in thread
From: Linus Torvalds @ 2016-10-23 21:13 UTC (permalink / raw)
  To: Joe Perches
  Cc: Geert Uytterhoeven, Tetsuo Handa, Linux Kernel Mailing List,
	Sergey Senozhatsky, Petr Mladek

On Sun, Oct 23, 2016 at 1:33 PM, Joe Perches <joe@perches.com> wrote:
>
> Perhaps it could be a pool of active thread
> continuation buffers.

Yes, we could probably do with just a couple of entries. Even just two
would probably catch almost all cases.

That said, having dealt with that code, I'd hate to make it more
complex again for something where the preferred solution is "don't do
it".

             Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-23 19:46           ` Linus Torvalds
@ 2016-10-24 11:15             ` Geert Uytterhoeven
  2016-10-24 14:08             ` Sergey Senozhatsky
  1 sibling, 0 replies; 36+ messages in thread
From: Geert Uytterhoeven @ 2016-10-24 11:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joe Perches, Tetsuo Handa, Linux Kernel Mailing List,
	Sergey Senozhatsky, Petr Mladek

Hi Linus,

On Sun, Oct 23, 2016 at 9:46 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Oct 23, 2016 at 12:32 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> No, the real complexity comes from that interaction with the console
>> output, which is done outside the core log locks, and which currently
>> has the added thing where we have a "has this line fragment been
>> flushed or not".
>
> Ok, so here's the stupid patch that removes all the partial line flushing.
>
> NOTE! It still leaves all the games with LOG_NEWLINE and LOG_NOCONS
> that are pretty much pointless with it. So there's room for more
> simplification here.
>
> In particular, the games with LOG_NEWLINE is what Geert's "console and
> dmesg output looks different" at least partially comes from. What
> happens is that "dmesg" always shows the records as one line (so it
> effectively ignores LOG_NEWLINE), but the console output (in
> msg_print_text() still has that LOG_NEWLINE logic.
>
> In particular, msg_print_text() looks at the *previous* logged line to
> decide whether it should do newlines etc, which is why Geert gets that
> odd "two continuations per line" pattern on the console, but "one
> continuation per line" in dmesg. That comes from the interaction with
> flushing to the console and LOG_NEWLINE and just general complexity.

Thanks, Linux kernel output is again in sync with dmesg output.

Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-23 19:46           ` Linus Torvalds
  2016-10-24 11:15             ` Geert Uytterhoeven
@ 2016-10-24 14:08             ` Sergey Senozhatsky
  2016-10-24 14:23               ` Sergey Senozhatsky
  2016-10-24 17:54               ` Linus Torvalds
  1 sibling, 2 replies; 36+ messages in thread
From: Sergey Senozhatsky @ 2016-10-24 14:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joe Perches, Geert Uytterhoeven, Tetsuo Handa,
	Linux Kernel Mailing List, Sergey Senozhatsky, Petr Mladek,
	Sergey Senozhatsky

Hello,

thanks for Cc-ing.

On (10/23/16 12:46), Linus Torvalds wrote:
> +static void deferred_cont_flush(void)
> +{
> +	static DEFINE_TIMER(timer, flush_timer, 0, 0);
> +
> +	if (!cont.len)
>  		return;
> +	mod_timer(&timer, jiffies + HZ/10);
>  }

[..]

> @@ -2360,6 +2285,8 @@ void console_unlock(void)
>  		return;
>  	}
>  
> +	deferred_cont_flush();
> +

is mod_timer() safe enough to rely on/call from
	panic()->console_flush_on_panic()->console_unlock() ?

shouldn't deferred_cont_flush() be called every time we jump
to `again' label in console_unlock()?

timer has debug object support, which probably can printk(), but
that shouldn't cause any troubles, I suppose.

	-ss

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-24 14:08             ` Sergey Senozhatsky
@ 2016-10-24 14:23               ` Sergey Senozhatsky
  2016-10-24 17:54               ` Linus Torvalds
  1 sibling, 0 replies; 36+ messages in thread
From: Sergey Senozhatsky @ 2016-10-24 14:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joe Perches, Geert Uytterhoeven, Tetsuo Handa,
	Linux Kernel Mailing List, Petr Mladek, Sergey Senozhatsky,
	Sergey Senozhatsky

> On (10/23/16 12:46), Linus Torvalds wrote:
> > +static void deferred_cont_flush(void)
> > +{
> > +	static DEFINE_TIMER(timer, flush_timer, 0, 0);
> > +
> > +	if (!cont.len)
> >  		return;
> > +	mod_timer(&timer, jiffies + HZ/10);
> >  }
> 
> [..]
> 
> > @@ -2360,6 +2285,8 @@ void console_unlock(void)
> >  		return;
> >  	}
> >  
> > +	deferred_cont_flush();
> > +
> 

[    0.000000] BUG: spinlock bad magic on CPU#0, swapper/0
[    0.000000]  lock: __init_begin+0x1cebc0/0x1da000, .magic: 00000000, .owner: swapper/0, .owner_cpu: 0
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc1-next-20161024-dbg-dirty #150
[    0.000000]  ffffffff81803c18 ffffffff81221216 ffffffff8180a540 ffffffff81ac7bc0
[    0.000000]  ffffffff81803c38 ffffffff81081414 ffffffff81ac7bc0 0000000000000046
[    0.000000]  ffffffff81803c58 ffffffff81081444 ffffffff81ac7bc0 ffffffff81737e85
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81221216>] dump_stack+0x4f/0x65
[    0.000000]  [<ffffffff81081414>] spin_dump+0x8a/0x8f
[    0.000000]  [<ffffffff81081444>] spin_bug+0x2b/0x2d
[    0.000000]  [<ffffffff81081604>] do_raw_spin_unlock+0x22/0x81
[    0.000000]  [<ffffffff814b145a>] _raw_spin_unlock_irqrestore+0x2c/0x4e
[    0.000000]  [<ffffffff81097974>] mod_timer+0x266/0x2a0
[    0.000000]  [<ffffffff81084f14>] deferred_cont_flush+0x2a/0x2c
[    0.000000]  [<ffffffff81086092>] console_unlock+0x28/0x45c
[    0.000000]  [<ffffffff8108686c>] ? vprintk_emit+0x3a6/0x3c5
[    0.000000]  [<ffffffff81086875>] vprintk_emit+0x3af/0x3c5
[    0.000000]  [<ffffffff810869f7>] vprintk_default+0x1d/0x1f
[    0.000000]  [<ffffffff810ecb95>] printk+0x48/0x50
[    0.000000]  [<ffffffff810869f7>] ? vprintk_default+0x1d/0x1f
[    0.000000]  [<ffffffff81adb082>] e820_print_map+0x4a/0x70
[    0.000000]  [<ffffffff81000000>] ? 0xffffffff81000000
[    0.000000]  [<ffffffff81adbbc4>] setup_memory_map+0x37/0x3a
[    0.000000]  [<ffffffff81ad9264>] setup_arch+0x16d/0x890
[    0.000000]  [<ffffffff81ad3b9c>] start_kernel+0x59/0x40c
[    0.000000]  [<ffffffff81ad328e>] x86_64_start_reservations+0x2a/0x2c
[    0.000000]  [<ffffffff81ad340a>] x86_64_start_kernel+0x17a/0x18d
[    0.000000]  [<ffffffff810001a5>] start_cpu+0x5/0x14
[    0.000000]  [<ffffffff810001a5>] ? start_cpu+0x5/0x14
[    0.000000] usable

	-ss

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-24 14:08             ` Sergey Senozhatsky
  2016-10-24 14:23               ` Sergey Senozhatsky
@ 2016-10-24 17:54               ` Linus Torvalds
  2016-10-24 17:55                 ` Linus Torvalds
  1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2016-10-24 17:54 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Joe Perches, Geert Uytterhoeven, Tetsuo Handa,
	Linux Kernel Mailing List, Petr Mladek, Sergey Senozhatsky

On Mon, Oct 24, 2016 at 7:08 AM, Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>
> is mod_timer() safe enough to rely on/call from
>         panic()->console_flush_on_panic()->console_unlock() ?

I don't think that's a big issue: the whole "panic()" siotuation is
very much about best effort.

> shouldn't deferred_cont_flush() be called every time we jump
> to `again' label in console_unlock()?

actually, I think we should just do it at the end.

And your "spinlock bad magic" reports says to me that we should
actually delay it even more, and just do it in klogd, to avoid the
whole issue with "printk() is used very _very_ early during boot".

Because if klogd isn't even running yet, then we clearly don't care
about timely flushing to the logs.

So here's a new version - TOTALLY UNTESTED! - that removes even more
redundant code (all the "test previous record flags" crud - it is not
sensible to think that the previous record was what a continuation was
all about, since the whole reason the continuation didn't get merged
was that something else happened in between).

Note the "totally untested" part. It compiles for me. But it may do
unspeakably stupid things. Caveat applior.

             Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-24 17:54               ` Linus Torvalds
@ 2016-10-24 17:55                 ` Linus Torvalds
  2016-10-25  1:55                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2016-10-24 17:55 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Joe Perches, Geert Uytterhoeven, Tetsuo Handa,
	Linux Kernel Mailing List, Petr Mladek, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 296 bytes --]

On Mon, Oct 24, 2016 at 10:54 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Note the "totally untested" part. It compiles for me. But it may do
> unspeakably stupid things. Caveat applior.

Well, it is hard to apply a patch that I didn't even attach. Blush.

               Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 19679 bytes --]

 kernel/printk/printk.c | 249 +++++++++++--------------------------------------
 1 file changed, 52 insertions(+), 197 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de08fc90baaf..6df4d3f6c276 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -367,7 +367,6 @@ DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
-static enum log_flags syslog_prev;
 static size_t syslog_partial;
 
 /* index and sequence number of the first record stored in the buffer */
@@ -381,7 +380,6 @@ static u32 log_next_idx;
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u32 console_idx;
-static enum log_flags console_prev;
 
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
@@ -650,27 +648,15 @@ static void append_char(char **pp, char *e, char c)
 }
 
 static ssize_t msg_print_ext_header(char *buf, size_t size,
-				    struct printk_log *msg, u64 seq,
-				    enum log_flags prev_flags)
+				    struct printk_log *msg, u64 seq)
 {
 	u64 ts_usec = msg->ts_nsec;
-	char cont = '-';
 
 	do_div(ts_usec, 1000);
 
-	/*
-	 * If we couldn't merge continuation line fragments during the print,
-	 * export the stored flags to allow an optional external merge of the
-	 * records. Merging the records isn't always neccessarily correct, like
-	 * when we hit a race during printing. In most cases though, it produces
-	 * better readable output. 'c' in the record flags mark the first
-	 * fragment of a line, '+' the following.
-	 */
-	if (msg->flags & LOG_CONT)
-		cont = (prev_flags & LOG_CONT) ? '+' : 'c';
-
 	return scnprintf(buf, size, "%u,%llu,%llu,%c;",
-		       (msg->facility << 3) | msg->level, seq, ts_usec, cont);
+		       (msg->facility << 3) | msg->level, seq, ts_usec,
+		       msg->flags & LOG_CONT ? 'c' : '-');
 }
 
 static ssize_t msg_print_ext_body(char *buf, size_t size,
@@ -725,7 +711,6 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 struct devkmsg_user {
 	u64 seq;
 	u32 idx;
-	enum log_flags prev;
 	struct ratelimit_state rs;
 	struct mutex lock;
 	char buf[CONSOLE_EXT_LOG_MAX];
@@ -794,7 +779,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
 	return ret;
 }
 
-static void cont_flush(void);
+static void deferred_cont_flush(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
@@ -811,7 +796,6 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	if (ret)
 		return ret;
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	while (user->seq == log_next_seq) {
 		if (file->f_flags & O_NONBLOCK) {
 			ret = -EAGAIN;
@@ -838,12 +822,11 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 
 	msg = log_from_idx(user->idx);
 	len = msg_print_ext_header(user->buf, sizeof(user->buf),
-				   msg, user->seq, user->prev);
+				   msg, user->seq);
 	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
 				  log_dict(msg), msg->dict_len,
 				  log_text(msg), msg->text_len);
 
-	user->prev = msg->flags;
 	user->idx = log_next(user->idx);
 	user->seq++;
 	raw_spin_unlock_irq(&logbuf_lock);
@@ -860,6 +843,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	ret = len;
 out:
 	mutex_unlock(&user->lock);
+	deferred_cont_flush();
 	return ret;
 }
 
@@ -874,7 +858,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 		return -ESPIPE;
 
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	switch (whence) {
 	case SEEK_SET:
 		/* the first record */
@@ -913,7 +896,6 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
 	poll_wait(file, &log_wait, wait);
 
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	if (user->seq < log_next_seq) {
 		/* return error when data has vanished underneath us */
 		if (user->seq < log_first_seq)
@@ -922,6 +904,7 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
 			ret = POLLIN|POLLRDNORM;
 	}
 	raw_spin_unlock_irq(&logbuf_lock);
+	deferred_cont_flush();
 
 	return ret;
 }
@@ -1226,26 +1209,12 @@ static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
 	return len;
 }
 
-static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
-			     bool syslog, char *buf, size_t size)
+static size_t msg_print_text(const struct printk_log *msg, bool syslog, char *buf, size_t size)
 {
 	const char *text = log_text(msg);
 	size_t text_size = msg->text_len;
-	bool prefix = true;
-	bool newline = true;
 	size_t len = 0;
 
-	if ((prev & LOG_CONT) && !(msg->flags & LOG_PREFIX))
-		prefix = false;
-
-	if (msg->flags & LOG_CONT) {
-		if ((prev & LOG_CONT) && !(prev & LOG_NEWLINE))
-			prefix = false;
-
-		if (!(msg->flags & LOG_NEWLINE))
-			newline = false;
-	}
-
 	do {
 		const char *next = memchr(text, '\n', text_size);
 		size_t text_len;
@@ -1263,22 +1232,17 @@ static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
 			    text_len + 1 >= size - len)
 				break;
 
-			if (prefix)
-				len += print_prefix(msg, syslog, buf + len);
+			len += print_prefix(msg, syslog, buf + len);
 			memcpy(buf + len, text, text_len);
 			len += text_len;
-			if (next || newline)
-				buf[len++] = '\n';
+			buf[len++] = '\n';
 		} else {
 			/* SYSLOG_ACTION_* buffer size only calculation */
-			if (prefix)
-				len += print_prefix(msg, syslog, NULL);
+			len += print_prefix(msg, syslog, NULL);
 			len += text_len;
-			if (next || newline)
-				len++;
+			len++;
 		}
 
-		prefix = true;
 		text = next;
 	} while (text);
 
@@ -1300,12 +1264,10 @@ static int syslog_print(char __user *buf, int size)
 		size_t skip;
 
 		raw_spin_lock_irq(&logbuf_lock);
-		cont_flush();
 		if (syslog_seq < log_first_seq) {
 			/* messages are gone, move to first one */
 			syslog_seq = log_first_seq;
 			syslog_idx = log_first_idx;
-			syslog_prev = 0;
 			syslog_partial = 0;
 		}
 		if (syslog_seq == log_next_seq) {
@@ -1315,13 +1277,11 @@ static int syslog_print(char __user *buf, int size)
 
 		skip = syslog_partial;
 		msg = log_from_idx(syslog_idx);
-		n = msg_print_text(msg, syslog_prev, true, text,
-				   LOG_LINE_MAX + PREFIX_MAX);
+		n = msg_print_text(msg, true, text, LOG_LINE_MAX + PREFIX_MAX);
 		if (n - syslog_partial <= size) {
 			/* message fits into buffer, move forward */
 			syslog_idx = log_next(syslog_idx);
 			syslog_seq++;
-			syslog_prev = msg->flags;
 			n -= syslog_partial;
 			syslog_partial = 0;
 		} else if (!len){
@@ -1360,12 +1320,10 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		return -ENOMEM;
 
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	if (buf) {
 		u64 next_seq;
 		u64 seq;
 		u32 idx;
-		enum log_flags prev;
 
 		/*
 		 * Find first record that fits, including all following records,
@@ -1373,12 +1331,10 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		 */
 		seq = clear_seq;
 		idx = clear_idx;
-		prev = 0;
 		while (seq < log_next_seq) {
 			struct printk_log *msg = log_from_idx(idx);
 
-			len += msg_print_text(msg, prev, true, NULL, 0);
-			prev = msg->flags;
+			len += msg_print_text(msg, true, NULL, 0);
 			idx = log_next(idx);
 			seq++;
 		}
@@ -1386,12 +1342,10 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		/* move first record forward until length fits into the buffer */
 		seq = clear_seq;
 		idx = clear_idx;
-		prev = 0;
 		while (len > size && seq < log_next_seq) {
 			struct printk_log *msg = log_from_idx(idx);
 
-			len -= msg_print_text(msg, prev, true, NULL, 0);
-			prev = msg->flags;
+			len -= msg_print_text(msg, true, NULL, 0);
 			idx = log_next(idx);
 			seq++;
 		}
@@ -1404,7 +1358,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 			struct printk_log *msg = log_from_idx(idx);
 			int textlen;
 
-			textlen = msg_print_text(msg, prev, true, text,
+			textlen = msg_print_text(msg, true, text,
 						 LOG_LINE_MAX + PREFIX_MAX);
 			if (textlen < 0) {
 				len = textlen;
@@ -1412,7 +1366,6 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 			}
 			idx = log_next(idx);
 			seq++;
-			prev = msg->flags;
 
 			raw_spin_unlock_irq(&logbuf_lock);
 			if (copy_to_user(buf + len, text, textlen))
@@ -1425,7 +1378,6 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 				/* messages are gone, move to next one */
 				seq = log_first_seq;
 				idx = log_first_idx;
-				prev = 0;
 			}
 		}
 	}
@@ -1522,12 +1474,10 @@ int do_syslog(int type, char __user *buf, int len, int source)
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
 		raw_spin_lock_irq(&logbuf_lock);
-		cont_flush();
 		if (syslog_seq < log_first_seq) {
 			/* messages are gone, move to first one */
 			syslog_seq = log_first_seq;
 			syslog_idx = log_first_idx;
-			syslog_prev = 0;
 			syslog_partial = 0;
 		}
 		if (source == SYSLOG_FROM_PROC) {
@@ -1540,16 +1490,14 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		} else {
 			u64 seq = syslog_seq;
 			u32 idx = syslog_idx;
-			enum log_flags prev = syslog_prev;
 
 			error = 0;
 			while (seq < log_next_seq) {
 				struct printk_log *msg = log_from_idx(idx);
 
-				error += msg_print_text(msg, prev, true, NULL, 0);
+				error += msg_print_text(msg, true, NULL, 0);
 				idx = log_next(idx);
 				seq++;
-				prev = msg->flags;
 			}
 			error -= syslog_partial;
 		}
@@ -1563,6 +1511,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		error = -EINVAL;
 		break;
 	}
+	deferred_cont_flush();
 out:
 	return error;
 }
@@ -1650,46 +1599,41 @@ static inline void printk_delay(void)
 static struct cont {
 	char buf[LOG_LINE_MAX];
 	size_t len;			/* length == 0 means unused buffer */
-	size_t cons;			/* bytes written to console */
 	struct task_struct *owner;	/* task of first print*/
 	u64 ts_nsec;			/* time of first print */
 	u8 level;			/* log level of first message */
 	u8 facility;			/* log facility of first message */
 	enum log_flags flags;		/* prefix, newline flags */
-	bool flushed:1;			/* buffer sealed and committed */
 } cont;
 
-static void cont_flush(void)
+static bool cont_flush(void)
 {
-	if (cont.flushed)
-		return;
-	if (cont.len == 0)
+	if (!cont.len)
+		return false;
+
+	log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
+		  NULL, 0, cont.buf, cont.len);
+	cont.len = 0;
+	return true;
+}
+
+static void flush_timer(unsigned long data)
+{
+	if (cont_flush())
+		wake_up_klogd();
+}
+
+static void deferred_cont_flush(void)
+{
+	static DEFINE_TIMER(timer, flush_timer, 0, 0);
+
+	if (!cont.len)
 		return;
-	if (cont.cons) {
-		/*
-		 * If a fragment of this line was directly flushed to the
-		 * console; wait for the console to pick up the rest of the
-		 * line. LOG_NOCONS suppresses a duplicated output.
-		 */
-		log_store(cont.facility, cont.level, cont.flags | LOG_NOCONS,
-			  cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-		cont.flushed = true;
-	} else {
-		/*
-		 * If no fragment of this line ever reached the console,
-		 * just submit it to the store and free the buffer.
-		 */
-		log_store(cont.facility, cont.level, cont.flags, 0,
-			  NULL, 0, cont.buf, cont.len);
-		cont.len = 0;
-	}
+	mod_timer(&timer, jiffies + HZ/10);
 }
 
 static bool cont_add(int facility, int level, enum log_flags flags, const char *text, size_t len)
 {
-	if (cont.len && cont.flushed)
-		return false;
-
 	/*
 	 * If ext consoles are present, flush and skip in-kernel
 	 * continuation.  See nr_ext_console_drivers definition.  Also, if
@@ -1706,8 +1650,6 @@ static bool cont_add(int facility, int level, enum log_flags flags, const char *
 		cont.owner = current;
 		cont.ts_nsec = local_clock();
 		cont.flags = flags;
-		cont.cons = 0;
-		cont.flushed = false;
 	}
 
 	memcpy(cont.buf + cont.len, text, len);
@@ -1726,34 +1668,6 @@ static bool cont_add(int facility, int level, enum log_flags flags, const char *
 	return true;
 }
 
-static size_t cont_print_text(char *text, size_t size)
-{
-	size_t textlen = 0;
-	size_t len;
-
-	if (cont.cons == 0 && (console_prev & LOG_NEWLINE)) {
-		textlen += print_time(cont.ts_nsec, text);
-		size -= textlen;
-	}
-
-	len = cont.len - cont.cons;
-	if (len > 0) {
-		if (len+1 > size)
-			len = size-1;
-		memcpy(text + textlen, cont.buf + cont.cons, len);
-		textlen += len;
-		cont.cons = cont.len;
-	}
-
-	if (cont.flushed) {
-		if (cont.flags & LOG_NEWLINE)
-			text[textlen++] = '\n';
-		/* got everything, release buffer */
-		cont.len = 0;
-	}
-	return textlen;
-}
-
 static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len)
 {
 	/*
@@ -1999,11 +1913,9 @@ static u64 syslog_seq;
 static u32 syslog_idx;
 static u64 console_seq;
 static u32 console_idx;
-static enum log_flags syslog_prev;
 static u64 log_first_seq;
 static u32 log_first_idx;
 static u64 log_next_seq;
-static enum log_flags console_prev;
 static struct cont {
 	size_t len;
 	size_t cons;
@@ -2015,17 +1927,16 @@ static char *log_dict(const struct printk_log *msg) { return NULL; }
 static struct printk_log *log_from_idx(u32 idx) { return NULL; }
 static u32 log_next(u32 idx) { return 0; }
 static ssize_t msg_print_ext_header(char *buf, size_t size,
-				    struct printk_log *msg, u64 seq,
-				    enum log_flags prev_flags) { return 0; }
+				    struct printk_log *msg,
+				    u64 seq) { return 0; }
 static ssize_t msg_print_ext_body(char *buf, size_t size,
 				  char *dict, size_t dict_len,
 				  char *text, size_t text_len) { return 0; }
 static void call_console_drivers(int level,
 				 const char *ext_text, size_t ext_len,
 				 const char *text, size_t len) {}
-static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
+static size_t msg_print_text(const struct printk_log *msg,
 			     bool syslog, char *buf, size_t size) { return 0; }
-static size_t cont_print_text(char *text, size_t size) { return 0; }
 static bool suppress_message_printing(int level) { return false; }
 
 /* Still needs to be defined for users */
@@ -2296,42 +2207,6 @@ static inline int can_use_console(void)
 	return cpu_online(raw_smp_processor_id()) || have_callable_console();
 }
 
-static void console_cont_flush(char *text, size_t size)
-{
-	unsigned long flags;
-	size_t len;
-
-	raw_spin_lock_irqsave(&logbuf_lock, flags);
-
-	if (!cont.len)
-		goto out;
-
-	if (suppress_message_printing(cont.level)) {
-		cont.cons = cont.len;
-		if (cont.flushed)
-			cont.len = 0;
-		goto out;
-	}
-
-	/*
-	 * We still queue earlier records, likely because the console was
-	 * busy. The earlier ones need to be printed before this one, we
-	 * did not flush any fragment so far, so just let it queue up.
-	 */
-	if (console_seq < log_next_seq && !cont.cons)
-		goto out;
-
-	len = cont_print_text(text, size);
-	raw_spin_unlock(&logbuf_lock);
-	stop_critical_timings();
-	call_console_drivers(cont.level, NULL, 0, text, len);
-	start_critical_timings();
-	local_irq_restore(flags);
-	return;
-out:
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
-}
-
 /**
  * console_unlock - unlock the console system
  *
@@ -2385,9 +2260,6 @@ void console_unlock(void)
 		return;
 	}
 
-	/* flush buffered message fragment immediately to console */
-	console_cont_flush(text, sizeof(text));
-
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;
@@ -2407,7 +2279,6 @@ void console_unlock(void)
 			/* messages are gone, move to first one */
 			console_seq = log_first_seq;
 			console_idx = log_first_idx;
-			console_prev = 0;
 		} else {
 			len = 0;
 		}
@@ -2417,8 +2288,7 @@ void console_unlock(void)
 
 		msg = log_from_idx(console_idx);
 		level = msg->level;
-		if ((msg->flags & LOG_NOCONS) ||
-				suppress_message_printing(level)) {
+		if (suppress_message_printing(level)) {
 			/*
 			 * Skip record we have buffered and already printed
 			 * directly to the console when we received it, and
@@ -2426,22 +2296,14 @@ void console_unlock(void)
 			 */
 			console_idx = log_next(console_idx);
 			console_seq++;
-			/*
-			 * We will get here again when we register a new
-			 * CON_PRINTBUFFER console. Clear the flag so we
-			 * will properly dump everything later.
-			 */
-			msg->flags &= ~LOG_NOCONS;
-			console_prev = msg->flags;
 			goto skip;
 		}
 
-		len += msg_print_text(msg, console_prev, false,
-				      text + len, sizeof(text) - len);
+		len += msg_print_text(msg, false, text + len, sizeof(text) - len);
 		if (nr_ext_console_drivers) {
 			ext_len = msg_print_ext_header(ext_text,
 						sizeof(ext_text),
-						msg, console_seq, console_prev);
+						msg, console_seq);
 			ext_len += msg_print_ext_body(ext_text + ext_len,
 						sizeof(ext_text) - ext_len,
 						log_dict(msg), msg->dict_len,
@@ -2449,7 +2311,6 @@ void console_unlock(void)
 		}
 		console_idx = log_next(console_idx);
 		console_seq++;
-		console_prev = msg->flags;
 		raw_spin_unlock(&logbuf_lock);
 
 		stop_critical_timings();	/* don't trace print latency */
@@ -2483,7 +2344,7 @@ void console_unlock(void)
 	if (retry && console_trylock())
 		goto again;
 
-	if (wake_klogd)
+	if (wake_klogd || cont.len)
 		wake_up_klogd();
 }
 EXPORT_SYMBOL(console_unlock);
@@ -2744,7 +2605,6 @@ void register_console(struct console *newcon)
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 		console_seq = syslog_seq;
 		console_idx = syslog_idx;
-		console_prev = syslog_prev;
 		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 		/*
 		 * We're about to replay the log buffer.  Only do this to the
@@ -2883,6 +2743,7 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 
 	if (pending & PRINTK_PENDING_WAKEUP)
 		wake_up_interruptible(&log_wait);
+	deferred_cont_flush();
 }
 
 static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
@@ -3095,7 +2956,7 @@ bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog,
 		goto out;
 
 	msg = log_from_idx(dumper->cur_idx);
-	l = msg_print_text(msg, 0, syslog, line, size);
+	l = msg_print_text(msg, syslog, line, size);
 
 	dumper->cur_idx = log_next(dumper->cur_idx);
 	dumper->cur_seq++;
@@ -3165,7 +3026,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
 	u32 idx;
 	u64 next_seq;
 	u32 next_idx;
-	enum log_flags prev;
 	size_t l = 0;
 	bool ret = false;
 
@@ -3189,27 +3049,23 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
 	/* calculate length of entire buffer */
 	seq = dumper->cur_seq;
 	idx = dumper->cur_idx;
-	prev = 0;
 	while (seq < dumper->next_seq) {
 		struct printk_log *msg = log_from_idx(idx);
 
-		l += msg_print_text(msg, prev, true, NULL, 0);
+		l += msg_print_text(msg, true, NULL, 0);
 		idx = log_next(idx);
 		seq++;
-		prev = msg->flags;
 	}
 
 	/* move first record forward until length fits into the buffer */
 	seq = dumper->cur_seq;
 	idx = dumper->cur_idx;
-	prev = 0;
 	while (l > size && seq < dumper->next_seq) {
 		struct printk_log *msg = log_from_idx(idx);
 
-		l -= msg_print_text(msg, prev, true, NULL, 0);
+		l -= msg_print_text(msg, true, NULL, 0);
 		idx = log_next(idx);
 		seq++;
-		prev = msg->flags;
 	}
 
 	/* last message in next interation */
@@ -3220,10 +3076,9 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
 	while (seq < dumper->next_seq) {
 		struct printk_log *msg = log_from_idx(idx);
 
-		l += msg_print_text(msg, prev, syslog, buf + l, size - l);
+		l += msg_print_text(msg, syslog, buf + l, size - l);
 		idx = log_next(idx);
 		seq++;
-		prev = msg->flags;
 	}
 
 	dumper->next_seq = next_seq;

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-24 17:55                 ` Linus Torvalds
@ 2016-10-25  1:55                   ` Sergey Senozhatsky
  2016-10-25  2:06                     ` Linus Torvalds
  2016-10-25  2:24                     ` Sergey Senozhatsky
  0 siblings, 2 replies; 36+ messages in thread
From: Sergey Senozhatsky @ 2016-10-25  1:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Joe Perches, Geert Uytterhoeven,
	Tetsuo Handa, Linux Kernel Mailing List, Petr Mladek,
	Sergey Senozhatsky, Tejun Heo, Calvin Owens, Steven Rostedt,
	Andrew Morton

Hello,

Cc more people

report: https://marc.info/?l=linux-kernel&m=147721454506634&w=2

patch: https://marc.info/?l=linux-kernel&m=147733173800859


FB is using ext header a lot (afaik), so may be Tejun or Calvin
will also be interested.

On (10/24/16 10:55), Linus Torvalds wrote:
> > Note the "totally untested" part. It compiles for me. But it may do
> > unspeakably stupid things. Caveat applior.
> 
> Well, it is hard to apply a patch that I didn't even attach. Blush.

[..]

> -static void cont_flush(void)
> +static bool cont_flush(void)
>  {
> -	if (cont.flushed)
> -		return;
> -	if (cont.len == 0)
> +	if (!cont.len)
> +		return false;
> +
> +	log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
> +		  NULL, 0, cont.buf, cont.len);
> +	cont.len = 0;
> +	return true;
> +}


I think cont_flush() should grab the logbuf_lock lock, because
it does log_store() and touches the cont.len. so something like
this perhaps

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7f490f..47f887c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1608,13 +1608,20 @@ static struct cont {
 
 static bool cont_flush(void)
 {
+       unsigned long flags;
+       bool flushed = false;
+
+       raw_spin_lock_irqsave(&logbuf_lock, flags);
        if (!cont.len)
-               return false;
+               goto out;
 
        log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
                  NULL, 0, cont.buf, cont.len);
        cont.len = 0;
-       return true;
+       flushed = true;
+out:
+       raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+       return flushed;
 }


[..]
> @@ -2449,7 +2311,6 @@ void console_unlock(void)
>  		}
>  		console_idx = log_next(console_idx);
>  		console_seq++;
> -		console_prev = msg->flags;
>  		raw_spin_unlock(&logbuf_lock);
>  
>  		stop_critical_timings();	/* don't trace print latency */
> @@ -2483,7 +2344,7 @@ void console_unlock(void)
>  	if (retry && console_trylock())
>  		goto again;
>  
> -	if (wake_klogd)
> +	if (wake_klogd || cont.len)
			^^^^^^^^^^
this _technically_ can result in additional spurious wakeups - cont.len
check is done outside of console_sem && logbuf_lock - but I don't think
this is a huge problem.

	-ss

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-25  1:55                   ` Sergey Senozhatsky
@ 2016-10-25  2:06                     ` Linus Torvalds
  2016-10-25  2:22                       ` Linus Torvalds
  2016-10-25  2:24                     ` Sergey Senozhatsky
  1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2016-10-25  2:06 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Joe Perches, Geert Uytterhoeven,
	Tetsuo Handa, Linux Kernel Mailing List, Petr Mladek, Tejun Heo,
	Calvin Owens, Steven Rostedt, Andrew Morton

On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
>
> I think cont_flush() should grab the logbuf_lock lock, because
> it does log_store() and touches the cont.len. so something like
> this perhaps

Absolutely. Good catch.

>> -     if (wake_klogd)
>> +     if (wake_klogd || cont.len)
>                         ^^^^^^^^^^
> this _technically_ can result in additional spurious wakeups - cont.len
> check is done outside of console_sem && logbuf_lock - but I don't think
> this is a huge problem.

Yes. We could easily just do it inside the spinlock and make it part
of the wake_klogd logic, but as you say, it doesn't actually matter.

And yes, it would be lovely if people who use the network console
(which I think is the only user of the whole extended console thing)
would test this too.

It really makes things potentially much simpler. It's not just that it
removes 150 lines, it's 150 lines of  really nasty complex special
case crud.

I'm obviously not going to apply it for 4.9, but it might be "do it
for 4.10 and mark it for stable if nobody notices any issues", because
apart from the code simplification it also gets more cases "right".
Right now the console flushing often means that we end up not merging
continuation lines as much as we could.

           Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-25  2:06                     ` Linus Torvalds
@ 2016-10-25  2:22                       ` Linus Torvalds
  2016-10-25  4:06                         ` Sergey Senozhatsky
                                           ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Linus Torvalds @ 2016-10-25  2:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Joe Perches, Geert Uytterhoeven,
	Tetsuo Handa, Linux Kernel Mailing List, Petr Mladek, Tejun Heo,
	Calvin Owens, Steven Rostedt, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1104 bytes --]

On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
> <sergey.senozhatsky.work@gmail.com> wrote:
>>
>> I think cont_flush() should grab the logbuf_lock lock, because
>> it does log_store() and touches the cont.len. so something like
>> this perhaps
>
> Absolutely. Good catch.

Actually, you can't do it the way you did (inside cont_flush), because
"cont_flush()" is already called with logbuf_lock held in most cases
(see "cont_add()").

So it's really just the timer function that needs to take the
logbuf_lock before it calls cont_flush().

So here's a new version. How does this look to you?

Again, this still tests "cont.len" outside the lock (not just in
console_unlock(), but also in deferred_cont_flush()). And it's fine:
even if it sees the "wrong" value due to some race, it does so either
because cont.len was just set to non-zero (and whoever set it will
force the re-check anyway), or it got cleared just as it was tested
(and at worst you end up with an extra timer invocation).

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 19844 bytes --]

 kernel/printk/printk.c | 255 +++++++++++--------------------------------------
 1 file changed, 58 insertions(+), 197 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de08fc90baaf..e63aa679614e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -367,7 +367,6 @@ DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
-static enum log_flags syslog_prev;
 static size_t syslog_partial;
 
 /* index and sequence number of the first record stored in the buffer */
@@ -381,7 +380,6 @@ static u32 log_next_idx;
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u32 console_idx;
-static enum log_flags console_prev;
 
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
@@ -650,27 +648,15 @@ static void append_char(char **pp, char *e, char c)
 }
 
 static ssize_t msg_print_ext_header(char *buf, size_t size,
-				    struct printk_log *msg, u64 seq,
-				    enum log_flags prev_flags)
+				    struct printk_log *msg, u64 seq)
 {
 	u64 ts_usec = msg->ts_nsec;
-	char cont = '-';
 
 	do_div(ts_usec, 1000);
 
-	/*
-	 * If we couldn't merge continuation line fragments during the print,
-	 * export the stored flags to allow an optional external merge of the
-	 * records. Merging the records isn't always neccessarily correct, like
-	 * when we hit a race during printing. In most cases though, it produces
-	 * better readable output. 'c' in the record flags mark the first
-	 * fragment of a line, '+' the following.
-	 */
-	if (msg->flags & LOG_CONT)
-		cont = (prev_flags & LOG_CONT) ? '+' : 'c';
-
 	return scnprintf(buf, size, "%u,%llu,%llu,%c;",
-		       (msg->facility << 3) | msg->level, seq, ts_usec, cont);
+		       (msg->facility << 3) | msg->level, seq, ts_usec,
+		       msg->flags & LOG_CONT ? 'c' : '-');
 }
 
 static ssize_t msg_print_ext_body(char *buf, size_t size,
@@ -725,7 +711,6 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
 struct devkmsg_user {
 	u64 seq;
 	u32 idx;
-	enum log_flags prev;
 	struct ratelimit_state rs;
 	struct mutex lock;
 	char buf[CONSOLE_EXT_LOG_MAX];
@@ -794,7 +779,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
 	return ret;
 }
 
-static void cont_flush(void);
+static void deferred_cont_flush(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
@@ -811,7 +796,6 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	if (ret)
 		return ret;
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	while (user->seq == log_next_seq) {
 		if (file->f_flags & O_NONBLOCK) {
 			ret = -EAGAIN;
@@ -838,12 +822,11 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 
 	msg = log_from_idx(user->idx);
 	len = msg_print_ext_header(user->buf, sizeof(user->buf),
-				   msg, user->seq, user->prev);
+				   msg, user->seq);
 	len += msg_print_ext_body(user->buf + len, sizeof(user->buf) - len,
 				  log_dict(msg), msg->dict_len,
 				  log_text(msg), msg->text_len);
 
-	user->prev = msg->flags;
 	user->idx = log_next(user->idx);
 	user->seq++;
 	raw_spin_unlock_irq(&logbuf_lock);
@@ -860,6 +843,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 	ret = len;
 out:
 	mutex_unlock(&user->lock);
+	deferred_cont_flush();
 	return ret;
 }
 
@@ -874,7 +858,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 		return -ESPIPE;
 
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	switch (whence) {
 	case SEEK_SET:
 		/* the first record */
@@ -913,7 +896,6 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
 	poll_wait(file, &log_wait, wait);
 
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	if (user->seq < log_next_seq) {
 		/* return error when data has vanished underneath us */
 		if (user->seq < log_first_seq)
@@ -922,6 +904,7 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
 			ret = POLLIN|POLLRDNORM;
 	}
 	raw_spin_unlock_irq(&logbuf_lock);
+	deferred_cont_flush();
 
 	return ret;
 }
@@ -1226,26 +1209,12 @@ static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
 	return len;
 }
 
-static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
-			     bool syslog, char *buf, size_t size)
+static size_t msg_print_text(const struct printk_log *msg, bool syslog, char *buf, size_t size)
 {
 	const char *text = log_text(msg);
 	size_t text_size = msg->text_len;
-	bool prefix = true;
-	bool newline = true;
 	size_t len = 0;
 
-	if ((prev & LOG_CONT) && !(msg->flags & LOG_PREFIX))
-		prefix = false;
-
-	if (msg->flags & LOG_CONT) {
-		if ((prev & LOG_CONT) && !(prev & LOG_NEWLINE))
-			prefix = false;
-
-		if (!(msg->flags & LOG_NEWLINE))
-			newline = false;
-	}
-
 	do {
 		const char *next = memchr(text, '\n', text_size);
 		size_t text_len;
@@ -1263,22 +1232,17 @@ static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
 			    text_len + 1 >= size - len)
 				break;
 
-			if (prefix)
-				len += print_prefix(msg, syslog, buf + len);
+			len += print_prefix(msg, syslog, buf + len);
 			memcpy(buf + len, text, text_len);
 			len += text_len;
-			if (next || newline)
-				buf[len++] = '\n';
+			buf[len++] = '\n';
 		} else {
 			/* SYSLOG_ACTION_* buffer size only calculation */
-			if (prefix)
-				len += print_prefix(msg, syslog, NULL);
+			len += print_prefix(msg, syslog, NULL);
 			len += text_len;
-			if (next || newline)
-				len++;
+			len++;
 		}
 
-		prefix = true;
 		text = next;
 	} while (text);
 
@@ -1300,12 +1264,10 @@ static int syslog_print(char __user *buf, int size)
 		size_t skip;
 
 		raw_spin_lock_irq(&logbuf_lock);
-		cont_flush();
 		if (syslog_seq < log_first_seq) {
 			/* messages are gone, move to first one */
 			syslog_seq = log_first_seq;
 			syslog_idx = log_first_idx;
-			syslog_prev = 0;
 			syslog_partial = 0;
 		}
 		if (syslog_seq == log_next_seq) {
@@ -1315,13 +1277,11 @@ static int syslog_print(char __user *buf, int size)
 
 		skip = syslog_partial;
 		msg = log_from_idx(syslog_idx);
-		n = msg_print_text(msg, syslog_prev, true, text,
-				   LOG_LINE_MAX + PREFIX_MAX);
+		n = msg_print_text(msg, true, text, LOG_LINE_MAX + PREFIX_MAX);
 		if (n - syslog_partial <= size) {
 			/* message fits into buffer, move forward */
 			syslog_idx = log_next(syslog_idx);
 			syslog_seq++;
-			syslog_prev = msg->flags;
 			n -= syslog_partial;
 			syslog_partial = 0;
 		} else if (!len){
@@ -1360,12 +1320,10 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		return -ENOMEM;
 
 	raw_spin_lock_irq(&logbuf_lock);
-	cont_flush();
 	if (buf) {
 		u64 next_seq;
 		u64 seq;
 		u32 idx;
-		enum log_flags prev;
 
 		/*
 		 * Find first record that fits, including all following records,
@@ -1373,12 +1331,10 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		 */
 		seq = clear_seq;
 		idx = clear_idx;
-		prev = 0;
 		while (seq < log_next_seq) {
 			struct printk_log *msg = log_from_idx(idx);
 
-			len += msg_print_text(msg, prev, true, NULL, 0);
-			prev = msg->flags;
+			len += msg_print_text(msg, true, NULL, 0);
 			idx = log_next(idx);
 			seq++;
 		}
@@ -1386,12 +1342,10 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 		/* move first record forward until length fits into the buffer */
 		seq = clear_seq;
 		idx = clear_idx;
-		prev = 0;
 		while (len > size && seq < log_next_seq) {
 			struct printk_log *msg = log_from_idx(idx);
 
-			len -= msg_print_text(msg, prev, true, NULL, 0);
-			prev = msg->flags;
+			len -= msg_print_text(msg, true, NULL, 0);
 			idx = log_next(idx);
 			seq++;
 		}
@@ -1404,7 +1358,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 			struct printk_log *msg = log_from_idx(idx);
 			int textlen;
 
-			textlen = msg_print_text(msg, prev, true, text,
+			textlen = msg_print_text(msg, true, text,
 						 LOG_LINE_MAX + PREFIX_MAX);
 			if (textlen < 0) {
 				len = textlen;
@@ -1412,7 +1366,6 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 			}
 			idx = log_next(idx);
 			seq++;
-			prev = msg->flags;
 
 			raw_spin_unlock_irq(&logbuf_lock);
 			if (copy_to_user(buf + len, text, textlen))
@@ -1425,7 +1378,6 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
 				/* messages are gone, move to next one */
 				seq = log_first_seq;
 				idx = log_first_idx;
-				prev = 0;
 			}
 		}
 	}
@@ -1522,12 +1474,10 @@ int do_syslog(int type, char __user *buf, int len, int source)
 	/* Number of chars in the log buffer */
 	case SYSLOG_ACTION_SIZE_UNREAD:
 		raw_spin_lock_irq(&logbuf_lock);
-		cont_flush();
 		if (syslog_seq < log_first_seq) {
 			/* messages are gone, move to first one */
 			syslog_seq = log_first_seq;
 			syslog_idx = log_first_idx;
-			syslog_prev = 0;
 			syslog_partial = 0;
 		}
 		if (source == SYSLOG_FROM_PROC) {
@@ -1540,16 +1490,14 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		} else {
 			u64 seq = syslog_seq;
 			u32 idx = syslog_idx;
-			enum log_flags prev = syslog_prev;
 
 			error = 0;
 			while (seq < log_next_seq) {
 				struct printk_log *msg = log_from_idx(idx);
 
-				error += msg_print_text(msg, prev, true, NULL, 0);
+				error += msg_print_text(msg, true, NULL, 0);
 				idx = log_next(idx);
 				seq++;
-				prev = msg->flags;
 			}
 			error -= syslog_partial;
 		}
@@ -1563,6 +1511,7 @@ int do_syslog(int type, char __user *buf, int len, int source)
 		error = -EINVAL;
 		break;
 	}
+	deferred_cont_flush();
 out:
 	return error;
 }
@@ -1650,46 +1599,47 @@ static inline void printk_delay(void)
 static struct cont {
 	char buf[LOG_LINE_MAX];
 	size_t len;			/* length == 0 means unused buffer */
-	size_t cons;			/* bytes written to console */
 	struct task_struct *owner;	/* task of first print*/
 	u64 ts_nsec;			/* time of first print */
 	u8 level;			/* log level of first message */
 	u8 facility;			/* log facility of first message */
 	enum log_flags flags;		/* prefix, newline flags */
-	bool flushed:1;			/* buffer sealed and committed */
 } cont;
 
-static void cont_flush(void)
+static bool cont_flush(void)
 {
-	if (cont.flushed)
-		return;
-	if (cont.len == 0)
+	if (!cont.len)
+		return false;
+
+	log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
+		  NULL, 0, cont.buf, cont.len);
+	cont.len = 0;
+	return true;
+}
+
+static void flush_timer(unsigned long data)
+{
+	unsigned long flags;
+	bool did_flush;
+
+	raw_spin_lock_irqsave(&logbuf_lock, flags);
+	did_flush = cont_flush();
+	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+	if (did_flush)
+		wake_up_klogd();
+}
+
+static void deferred_cont_flush(void)
+{
+	static DEFINE_TIMER(timer, flush_timer, 0, 0);
+
+	if (!cont.len)
 		return;
-	if (cont.cons) {
-		/*
-		 * If a fragment of this line was directly flushed to the
-		 * console; wait for the console to pick up the rest of the
-		 * line. LOG_NOCONS suppresses a duplicated output.
-		 */
-		log_store(cont.facility, cont.level, cont.flags | LOG_NOCONS,
-			  cont.ts_nsec, NULL, 0, cont.buf, cont.len);
-		cont.flushed = true;
-	} else {
-		/*
-		 * If no fragment of this line ever reached the console,
-		 * just submit it to the store and free the buffer.
-		 */
-		log_store(cont.facility, cont.level, cont.flags, 0,
-			  NULL, 0, cont.buf, cont.len);
-		cont.len = 0;
-	}
+	mod_timer(&timer, jiffies + HZ/10);
 }
 
 static bool cont_add(int facility, int level, enum log_flags flags, const char *text, size_t len)
 {
-	if (cont.len && cont.flushed)
-		return false;
-
 	/*
 	 * If ext consoles are present, flush and skip in-kernel
 	 * continuation.  See nr_ext_console_drivers definition.  Also, if
@@ -1706,8 +1656,6 @@ static bool cont_add(int facility, int level, enum log_flags flags, const char *
 		cont.owner = current;
 		cont.ts_nsec = local_clock();
 		cont.flags = flags;
-		cont.cons = 0;
-		cont.flushed = false;
 	}
 
 	memcpy(cont.buf + cont.len, text, len);
@@ -1726,34 +1674,6 @@ static bool cont_add(int facility, int level, enum log_flags flags, const char *
 	return true;
 }
 
-static size_t cont_print_text(char *text, size_t size)
-{
-	size_t textlen = 0;
-	size_t len;
-
-	if (cont.cons == 0 && (console_prev & LOG_NEWLINE)) {
-		textlen += print_time(cont.ts_nsec, text);
-		size -= textlen;
-	}
-
-	len = cont.len - cont.cons;
-	if (len > 0) {
-		if (len+1 > size)
-			len = size-1;
-		memcpy(text + textlen, cont.buf + cont.cons, len);
-		textlen += len;
-		cont.cons = cont.len;
-	}
-
-	if (cont.flushed) {
-		if (cont.flags & LOG_NEWLINE)
-			text[textlen++] = '\n';
-		/* got everything, release buffer */
-		cont.len = 0;
-	}
-	return textlen;
-}
-
 static size_t log_output(int facility, int level, enum log_flags lflags, const char *dict, size_t dictlen, char *text, size_t text_len)
 {
 	/*
@@ -1999,11 +1919,9 @@ static u64 syslog_seq;
 static u32 syslog_idx;
 static u64 console_seq;
 static u32 console_idx;
-static enum log_flags syslog_prev;
 static u64 log_first_seq;
 static u32 log_first_idx;
 static u64 log_next_seq;
-static enum log_flags console_prev;
 static struct cont {
 	size_t len;
 	size_t cons;
@@ -2015,17 +1933,16 @@ static char *log_dict(const struct printk_log *msg) { return NULL; }
 static struct printk_log *log_from_idx(u32 idx) { return NULL; }
 static u32 log_next(u32 idx) { return 0; }
 static ssize_t msg_print_ext_header(char *buf, size_t size,
-				    struct printk_log *msg, u64 seq,
-				    enum log_flags prev_flags) { return 0; }
+				    struct printk_log *msg,
+				    u64 seq) { return 0; }
 static ssize_t msg_print_ext_body(char *buf, size_t size,
 				  char *dict, size_t dict_len,
 				  char *text, size_t text_len) { return 0; }
 static void call_console_drivers(int level,
 				 const char *ext_text, size_t ext_len,
 				 const char *text, size_t len) {}
-static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
+static size_t msg_print_text(const struct printk_log *msg,
 			     bool syslog, char *buf, size_t size) { return 0; }
-static size_t cont_print_text(char *text, size_t size) { return 0; }
 static bool suppress_message_printing(int level) { return false; }
 
 /* Still needs to be defined for users */
@@ -2296,42 +2213,6 @@ static inline int can_use_console(void)
 	return cpu_online(raw_smp_processor_id()) || have_callable_console();
 }
 
-static void console_cont_flush(char *text, size_t size)
-{
-	unsigned long flags;
-	size_t len;
-
-	raw_spin_lock_irqsave(&logbuf_lock, flags);
-
-	if (!cont.len)
-		goto out;
-
-	if (suppress_message_printing(cont.level)) {
-		cont.cons = cont.len;
-		if (cont.flushed)
-			cont.len = 0;
-		goto out;
-	}
-
-	/*
-	 * We still queue earlier records, likely because the console was
-	 * busy. The earlier ones need to be printed before this one, we
-	 * did not flush any fragment so far, so just let it queue up.
-	 */
-	if (console_seq < log_next_seq && !cont.cons)
-		goto out;
-
-	len = cont_print_text(text, size);
-	raw_spin_unlock(&logbuf_lock);
-	stop_critical_timings();
-	call_console_drivers(cont.level, NULL, 0, text, len);
-	start_critical_timings();
-	local_irq_restore(flags);
-	return;
-out:
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
-}
-
 /**
  * console_unlock - unlock the console system
  *
@@ -2385,9 +2266,6 @@ void console_unlock(void)
 		return;
 	}
 
-	/* flush buffered message fragment immediately to console */
-	console_cont_flush(text, sizeof(text));
-
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;
@@ -2407,7 +2285,6 @@ void console_unlock(void)
 			/* messages are gone, move to first one */
 			console_seq = log_first_seq;
 			console_idx = log_first_idx;
-			console_prev = 0;
 		} else {
 			len = 0;
 		}
@@ -2417,8 +2294,7 @@ void console_unlock(void)
 
 		msg = log_from_idx(console_idx);
 		level = msg->level;
-		if ((msg->flags & LOG_NOCONS) ||
-				suppress_message_printing(level)) {
+		if (suppress_message_printing(level)) {
 			/*
 			 * Skip record we have buffered and already printed
 			 * directly to the console when we received it, and
@@ -2426,22 +2302,14 @@ void console_unlock(void)
 			 */
 			console_idx = log_next(console_idx);
 			console_seq++;
-			/*
-			 * We will get here again when we register a new
-			 * CON_PRINTBUFFER console. Clear the flag so we
-			 * will properly dump everything later.
-			 */
-			msg->flags &= ~LOG_NOCONS;
-			console_prev = msg->flags;
 			goto skip;
 		}
 
-		len += msg_print_text(msg, console_prev, false,
-				      text + len, sizeof(text) - len);
+		len += msg_print_text(msg, false, text + len, sizeof(text) - len);
 		if (nr_ext_console_drivers) {
 			ext_len = msg_print_ext_header(ext_text,
 						sizeof(ext_text),
-						msg, console_seq, console_prev);
+						msg, console_seq);
 			ext_len += msg_print_ext_body(ext_text + ext_len,
 						sizeof(ext_text) - ext_len,
 						log_dict(msg), msg->dict_len,
@@ -2449,7 +2317,6 @@ void console_unlock(void)
 		}
 		console_idx = log_next(console_idx);
 		console_seq++;
-		console_prev = msg->flags;
 		raw_spin_unlock(&logbuf_lock);
 
 		stop_critical_timings();	/* don't trace print latency */
@@ -2483,7 +2350,7 @@ void console_unlock(void)
 	if (retry && console_trylock())
 		goto again;
 
-	if (wake_klogd)
+	if (wake_klogd || cont.len)
 		wake_up_klogd();
 }
 EXPORT_SYMBOL(console_unlock);
@@ -2744,7 +2611,6 @@ void register_console(struct console *newcon)
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 		console_seq = syslog_seq;
 		console_idx = syslog_idx;
-		console_prev = syslog_prev;
 		raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 		/*
 		 * We're about to replay the log buffer.  Only do this to the
@@ -2883,6 +2749,7 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work)
 
 	if (pending & PRINTK_PENDING_WAKEUP)
 		wake_up_interruptible(&log_wait);
+	deferred_cont_flush();
 }
 
 static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
@@ -3095,7 +2962,7 @@ bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog,
 		goto out;
 
 	msg = log_from_idx(dumper->cur_idx);
-	l = msg_print_text(msg, 0, syslog, line, size);
+	l = msg_print_text(msg, syslog, line, size);
 
 	dumper->cur_idx = log_next(dumper->cur_idx);
 	dumper->cur_seq++;
@@ -3165,7 +3032,6 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
 	u32 idx;
 	u64 next_seq;
 	u32 next_idx;
-	enum log_flags prev;
 	size_t l = 0;
 	bool ret = false;
 
@@ -3189,27 +3055,23 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
 	/* calculate length of entire buffer */
 	seq = dumper->cur_seq;
 	idx = dumper->cur_idx;
-	prev = 0;
 	while (seq < dumper->next_seq) {
 		struct printk_log *msg = log_from_idx(idx);
 
-		l += msg_print_text(msg, prev, true, NULL, 0);
+		l += msg_print_text(msg, true, NULL, 0);
 		idx = log_next(idx);
 		seq++;
-		prev = msg->flags;
 	}
 
 	/* move first record forward until length fits into the buffer */
 	seq = dumper->cur_seq;
 	idx = dumper->cur_idx;
-	prev = 0;
 	while (l > size && seq < dumper->next_seq) {
 		struct printk_log *msg = log_from_idx(idx);
 
-		l -= msg_print_text(msg, prev, true, NULL, 0);
+		l -= msg_print_text(msg, true, NULL, 0);
 		idx = log_next(idx);
 		seq++;
-		prev = msg->flags;
 	}
 
 	/* last message in next interation */
@@ -3220,10 +3082,9 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
 	while (seq < dumper->next_seq) {
 		struct printk_log *msg = log_from_idx(idx);
 
-		l += msg_print_text(msg, prev, syslog, buf + l, size - l);
+		l += msg_print_text(msg, syslog, buf + l, size - l);
 		idx = log_next(idx);
 		seq++;
-		prev = msg->flags;
 	}
 
 	dumper->next_seq = next_seq;

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-25  1:55                   ` Sergey Senozhatsky
  2016-10-25  2:06                     ` Linus Torvalds
@ 2016-10-25  2:24                     ` Sergey Senozhatsky
  1 sibling, 0 replies; 36+ messages in thread
From: Sergey Senozhatsky @ 2016-10-25  2:24 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Linus Torvalds, Sergey Senozhatsky, Joe Perches,
	Geert Uytterhoeven, Tetsuo Handa, Linux Kernel Mailing List,
	Petr Mladek, Tejun Heo, Calvin Owens, Steven Rostedt,
	Andrew Morton

On (10/25/16 10:55), Sergey Senozhatsky wrote:
> I think cont_flush() should grab the logbuf_lock lock, because
> it does log_store() and touches the cont.len. so something like
> this perhaps
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c7f490f..47f887c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1608,13 +1608,20 @@ static struct cont {
>  
>  static bool cont_flush(void)
>  {
> +       unsigned long flags;
> +       bool flushed = false;
> +
> +       raw_spin_lock_irqsave(&logbuf_lock, flags);
>         if (!cont.len)
> -               return false;
> +               goto out;
>  
>         log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
>                   NULL, 0, cont.buf, cont.len);
>         cont.len = 0;
> -       return true;
> +       flushed = true;
> +out:
> +       raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> +       return flushed;
>  }


... clearly, wasn't tested at all! sorry about that.
what I meant was cont_flush() from deferred_cont_flush()->flush_timer().


diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7f490f..bd7841c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1619,7 +1619,14 @@ static bool cont_flush(void)
 
 static void flush_timer(unsigned long data)
 {
-	if (cont_flush())
+	unsigned long flags;
+	bool flushed;
+
+	raw_spin_lock_irqsave(&logbuf_lock, flags);
+	flushed = cont_flush();
+	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+
+	if (flushed)
 		wake_up_klogd();
 }

 	-ss

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-25  2:22                       ` Linus Torvalds
@ 2016-10-25  4:06                         ` Sergey Senozhatsky
  2016-10-25  4:13                           ` Joe Perches
  2016-10-25  4:15                           ` Linus Torvalds
  2016-10-25 14:44                         ` Petr Mladek
  2016-11-09 15:47                         ` Petr Mladek
  2 siblings, 2 replies; 36+ messages in thread
From: Sergey Senozhatsky @ 2016-10-25  4:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Joe Perches,
	Geert Uytterhoeven, Tetsuo Handa, Linux Kernel Mailing List,
	Petr Mladek, Tejun Heo, Calvin Owens, Steven Rostedt,
	Andrew Morton

On (10/24/16 19:22), Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
> > <sergey.senozhatsky.work@gmail.com> wrote:
> >>
> >> I think cont_flush() should grab the logbuf_lock lock, because
> >> it does log_store() and touches the cont.len. so something like
> >> this perhaps
> >
> > Absolutely. Good catch.
> 
> Actually, you can't do it the way you did (inside cont_flush), because
> "cont_flush()" is already called with logbuf_lock held in most cases
> (see "cont_add()").

right. my bad, realized too late.

> So it's really just the timer function that needs to take the
> logbuf_lock before it calls cont_flush().

yes.

> So here's a new version. How does this look to you?

ok, looks much better.

there are several things that I want to mention here, just to make
sure we don't miss anything (just my 5 cents).

1) the way we dumpstack on x86 (at least on x86) is a spaghetti of
printk() and pr_cont() calls. for instance, arch/x86/kernel/dumpstack_64.c
show_regs() does pr_cont() to print out the registers, while the stack and
backtrace are printed with printk(). so, I assume, the backtrace now will
look a bit upside-down, because cont lines are printed with the delay.
correct?

2) flush on oops. not that panic printk is deadlock proof (not at all)
   but:
     a) rather unlikely, but what if BUG_ON() or panic() happens
      under lock_timer_base()?
     b) what if timer event never happens? (we are in panic, who knows)

  so how about skipping mod_timer in deferred_cont_flush() and just
  cont_flush() when we are in oops? here is probably one more thing we
  need to "fix" first. oops_in_progress is unreliable. x86 oops_end()
  does bust_spinlocks(0) before it calls panic(). panic() increments
  oops_in_progress but decrements it back to 0 (bust_spinlocks(0)) before
  it does console_flush_on_panic(). so there is (almost) no way
  console_flush_on_panic() can see oops_in_progress != 0.


diff --git a/kernel/panic.c b/kernel/panic.c
index e6480e2..8e17540 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -228,7 +228,6 @@ void panic(const char *fmt, ...)
        if (_crash_kexec_post_notifiers)
                __crash_kexec(NULL);
 
-       bust_spinlocks(0);
 
        /*
         * We may have ended up stopping the CPU holding the lock (in
@@ -240,6 +239,7 @@ void panic(const char *fmt, ...)
         */
        debug_locks_off();
        console_flush_on_panic();
+       bust_spinlocks(0);
 
        if (!panic_blink)
                panic_blink = no_blink;
---

	-ss

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-25  4:06                         ` Sergey Senozhatsky
@ 2016-10-25  4:13                           ` Joe Perches
  2016-10-25  4:15                           ` Linus Torvalds
  1 sibling, 0 replies; 36+ messages in thread
From: Joe Perches @ 2016-10-25  4:13 UTC (permalink / raw)
  To: Sergey Senozhatsky, Linus Torvalds
  Cc: Sergey Senozhatsky, Geert Uytterhoeven, Tetsuo Handa,
	Linux Kernel Mailing List, Petr Mladek, Tejun Heo, Calvin Owens,
	Steven Rostedt, Andrew Morton

On Tue, 2016-10-25 at 13:06 +0900, Sergey Senozhatsky wrote:
>   so how about skipping mod_timer in deferred_cont_flush() and just
>   cont_flush() when we are in oops? here is probably one more thing we
>   need to "fix" first. oops_in_progress is unreliable. x86 oops_end()
>   does bust_spinlocks(0) before it calls panic(). panic() increments
>   oops_in_progress but decrements it back to 0 (bust_spinlocks(0)) before
>   it does console_flush_on_panic(). so there is (almost) no way
>   console_flush_on_panic() can see oops_in_progress != 0.

I think Linus' insistence on not requiring EOL/newline is misguided.
It'd be simpler to avoid the flushes and have a require EOL/newline.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-25  4:06                         ` Sergey Senozhatsky
  2016-10-25  4:13                           ` Joe Perches
@ 2016-10-25  4:15                           ` Linus Torvalds
  2016-10-25  4:44                             ` Sergey Senozhatsky
  1 sibling, 1 reply; 36+ messages in thread
From: Linus Torvalds @ 2016-10-25  4:15 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Joe Perches, Geert Uytterhoeven,
	Tetsuo Handa, Linux Kernel Mailing List, Petr Mladek, Tejun Heo,
	Calvin Owens, Steven Rostedt, Andrew Morton

On Mon, Oct 24, 2016 at 9:06 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
>
> 1) the way we dumpstack on x86 (at least on x86) is a spaghetti of
> printk() and pr_cont() calls. for instance, arch/x86/kernel/dumpstack_64.c
> show_regs() does pr_cont() to print out the registers, while the stack and
> backtrace are printed with printk(). so, I assume, the backtrace now will
> look a bit upside-down, because cont lines are printed with the delay.
> correct?

No. Most cont lines never hit the delay, because when the line is
completed, it is flushed (and then printed synchronously, assuming it
can get the console lock).

So the timeout only ever comes into effect if the line isn't completed
in time at all. Which is actually very rare, and never happens for the
"let's print things out in multiple chinks because we're using a
loop".

Similarly, if a new printk() happens due to interleaving, the previous
buffered line is always flushed first, so buffering never causes
out-of-order behavior.

Basically, the only time the timer actually does anything is if
something just does a printk() without a newline, and no other
printouts happen for the next 0.1s.

> 2) flush on oops.

Again, really not an issue for the exact reason above: nothing is ever
buffered when something new is printed.

And all you need to guarantee that last line of the oops itself is
printed is that it has a newline.

So again, the timer only matters for the exceptional case, not for the
normal situation. It's literally there to guarantee basic timeliness.

         Linus

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-25  4:15                           ` Linus Torvalds
@ 2016-10-25  4:44                             ` Sergey Senozhatsky
  0 siblings, 0 replies; 36+ messages in thread
From: Sergey Senozhatsky @ 2016-10-25  4:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Joe Perches,
	Geert Uytterhoeven, Tetsuo Handa, Linux Kernel Mailing List,
	Petr Mladek, Tejun Heo, Calvin Owens, Steven Rostedt,
	Andrew Morton

On (10/24/16 21:15), Linus Torvalds wrote:
[..]
> No. Most cont lines never hit the delay, because when the line is
> completed, it is flushed (and then printed synchronously, assuming it
> can get the console lock).
> 
> So the timeout only ever comes into effect if the line isn't completed
> in time at all. Which is actually very rare, and never happens for the
> "let's print things out in multiple chinks because we're using a
> loop".
> 
> Similarly, if a new printk() happens due to interleaving, the previous
> buffered line is always flushed first, so buffering never causes
> out-of-order behavior.

thanks. the patch works fine on my x86 box.

> Basically, the only time the timer actually does anything is if
> something just does a printk() without a newline, and no other
> printouts happen for the next 0.1s.

ok. perhaps, like slow serial console. will test on arm board later.

	-ss

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-23 18:11     ` Linus Torvalds
  2016-10-23 19:06       ` Joe Perches
@ 2016-10-25 14:42       ` Steven Rostedt
  1 sibling, 0 replies; 36+ messages in thread
From: Steven Rostedt @ 2016-10-25 14:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Geert Uytterhoeven, Tetsuo Handa, Linux Kernel Mailing List

On Sun, Oct 23, 2016 at 11:11:18AM -0700, Linus Torvalds wrote:
> 
> For example, one of the really historical uses for partial lines is this:
> 
>        pr_info("Checking 'hlt' instruction... ");
> 
>        if (!boot_cpu_data.hlt_works_ok) {
>                pr_cont("disabled\n");
>                return;
>        }
>        halt();
>        halt();
>        halt();
>        halt();
>        pr_cont("OK\n");
> 
> and the point was that there used to be some really old i386 machines
> that hung on the "hlt" instruction (probably not because of a CPU bug,
> but because of either power supply issues or some DMA issues).
> 
> To support that, we really *had* to print out the continuation lines
> even when they were partial. And that complicates the printk logic a
> lot.

Note, my ftrace start up tests still does exactly this (e.g.):

	pr_info("Testing dynamic ftrace ops: #%d: ", cnt);

	[ do lots of testing ]

	printk(KERN_CONT "PASSED\n");

Previously a change was made to buffer lines without \n, and I wasted an
entire day debugging why a crash happened because I was looking at the wrong
test.

-- Steve

> 
> Now, that "hlt" case is long long gone, and maybe we should just say
> "screw that". It would be really quite easy to say "we don't print out
> continuation lines immediately, we just buffer them for 0.1s instead,
> and KERN_CONT only works for things that really happen more or less
> immediately".
> 
> Maybe that really is the right answer. Because the original cause of
> us having to bend over backwards in this case is really no longer
> there. And it would simplify printk a *lot*.
> 
> Let me whip up a minimal patch for you to try.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-25  2:22                       ` Linus Torvalds
  2016-10-25  4:06                         ` Sergey Senozhatsky
@ 2016-10-25 14:44                         ` Petr Mladek
  2016-11-09 15:47                         ` Petr Mladek
  2 siblings, 0 replies; 36+ messages in thread
From: Petr Mladek @ 2016-10-25 14:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Joe Perches,
	Geert Uytterhoeven, Tetsuo Handa, Linux Kernel Mailing List,
	Tejun Heo, Calvin Owens, Steven Rostedt, Andrew Morton

On Mon 2016-10-24 19:22:59, Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
> > <sergey.senozhatsky.work@gmail.com> wrote:
> >>
> >> I think cont_flush() should grab the logbuf_lock lock, because
> >> it does log_store() and touches the cont.len. so something like
> >> this perhaps
> >
> > Absolutely. Good catch.
> 
> Actually, you can't do it the way you did (inside cont_flush), because
> "cont_flush()" is already called with logbuf_lock held in most cases
> (see "cont_add()").
> 
> So it's really just the timer function that needs to take the
> logbuf_lock before it calls cont_flush().
> 
> So here's a new version. How does this look to you?
> 
> Again, this still tests "cont.len" outside the lock (not just in
> console_unlock(), but also in deferred_cont_flush()). And it's fine:
> even if it sees the "wrong" value due to some race, it does so either
> because cont.len was just set to non-zero (and whoever set it will
> force the re-check anyway), or it got cleared just as it was tested
> (and at worst you end up with an extra timer invocation).
> 
> +static void flush_timer(unsigned long data)
> +{
> +	unsigned long flags;
> +	bool did_flush;
> +
> +	raw_spin_lock_irqsave(&logbuf_lock, flags);
> +	did_flush = cont_flush();
> +	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> +	if (did_flush)
> +		wake_up_klogd();

We wake only klogd and syslog but not console here. Same problem is
also with some other recently added cont_flush() calls, e.g. in
kmsg_dump(), kmsg_dump_get_line().

BTW: It should be safe to call wake_up_klogd() in the locked area.
It just modifies a per-CPU variable and queues IRQ work.

I have played with it a bit and the result is below. Feel free
to just use the pieces or the idea if you like a part of it.


>From 226c4bfbe9f81bdd3b91b6b0ff3e268b1c09085f Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Tue, 25 Oct 2016 15:23:13 +0200
Subject: [PATCH] printk: Poke console when cont buffer is flushed

The commit bfd8d3f23b51018388be041 ("printk: make reading the kernel
log flush pending lines") allows to add new message into the log
buffer without flushing it to the console.

This patch reworks wake_up_klogd() to a generic printk_poke().
It pokes the loggers or console or both where necessary.

I have renamed some some variables and functions. I am not
if it is better readable now. Anyway, the old names came
from times before printk_deferred() when the irq work
handled only klogd.

I am not super happy the new state because the irq work is
called from many locations and there are cycles.
The cycle breaks when cont.len is zero but it might
be worth some clean up. For example, deferred_cont_flush()
call in printk_poke_func() looks superfluous. But
I wanted to be rather conservative for the moment.

Also I wonder if we really need per-CPU variable for
the printk_pending/printk_need_poke flag. A global
variable might help to reduce some calls and it would
not complicate the code.

This patch is based on top of the fix proposed at
https://lkml.kernel.org/r/CA+55aFwKYnrMJr_vSE+GfDGszeUGyd=CPUD15-zZ8yWQW61GBA@mail.gmail.com

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/printk.h |  7 ++++--
 kernel/printk/printk.c | 67 ++++++++++++++++++++++++++++----------------------
 lib/bust_spinlocks.c   |  2 +-
 3 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index eac1af8502bb..6fdced13d2c9 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -31,6 +31,9 @@ static inline const char *printk_skip_level(const char *buffer)
 	return buffer;
 }
 
+#define PRINTK_POKE_CONSOLE	0x01
+#define PRINTK_POKE_LOGGERS	0x02
+
 #define CONSOLE_EXT_LOG_MAX	8192
 
 /* printk's without a loglevel use this.. */
@@ -185,7 +188,7 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void __user *buf,
 			  size_t *lenp, loff_t *ppos);
 
-extern void wake_up_klogd(void);
+extern void printk_poke(int poke);
 
 char *log_buf_addr_get(void);
 u32 log_buf_len_get(void);
@@ -220,7 +223,7 @@ static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 	return false;
 }
 
-static inline void wake_up_klogd(void)
+static inline void printk_poke(int poke)
 {
 }
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e63aa679614e..2c231bb73744 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1620,13 +1620,11 @@ static bool cont_flush(void)
 static void flush_timer(unsigned long data)
 {
 	unsigned long flags;
-	bool did_flush;
 
 	raw_spin_lock_irqsave(&logbuf_lock, flags);
-	did_flush = cont_flush();
+	if (cont_flush())
+		printk_poke(PRINTK_POKE_CONSOLE | PRINTK_POKE_LOGGERS);
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
-	if (did_flush)
-		wake_up_klogd();
 }
 
 static void deferred_cont_flush(void)
@@ -2233,7 +2231,7 @@ void console_unlock(void)
 	static char text[LOG_LINE_MAX + PREFIX_MAX];
 	static u64 seen_seq;
 	unsigned long flags;
-	bool wake_klogd = false;
+	bool poke_loggers = false;
 	bool do_cond_resched, retry;
 
 	if (console_suspended) {
@@ -2274,7 +2272,7 @@ void console_unlock(void)
 
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 		if (seen_seq != log_next_seq) {
-			wake_klogd = true;
+			poke_loggers = true;
 			seen_seq = log_next_seq;
 		}
 
@@ -2350,8 +2348,8 @@ void console_unlock(void)
 	if (retry && console_trylock())
 		goto again;
 
-	if (wake_klogd || cont.len)
-		wake_up_klogd();
+	if (poke_loggers || cont.len)
+		printk_poke(PRINTK_POKE_LOGGERS);
 }
 EXPORT_SYMBOL(console_unlock);
 
@@ -2730,40 +2728,45 @@ static int __init printk_late_init(void)
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
+ * Poke console and loggers via IRQ work so that it can be done
+ * from any context.
  */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
+static DEFINE_PER_CPU(int, printk_need_poke);
 
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
+static void printk_poke_func(struct irq_work *irq_work)
 {
-	int pending = __this_cpu_xchg(printk_pending, 0);
+	int poke = __this_cpu_xchg(printk_need_poke, 0);
 
-	if (pending & PRINTK_PENDING_OUTPUT) {
+	if (poke & PRINTK_POKE_CONSOLE) {
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
 	}
 
-	if (pending & PRINTK_PENDING_WAKEUP)
+	if (poke & PRINTK_POKE_LOGGERS)
 		wake_up_interruptible(&log_wait);
+
 	deferred_cont_flush();
 }
 
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
+static DEFINE_PER_CPU(struct irq_work, printk_poke_work) = {
+	.func = printk_poke_func,
 	.flags = IRQ_WORK_LAZY,
 };
 
-void wake_up_klogd(void)
+void printk_poke(int poke)
 {
 	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	}
+
+	if (poke & PRINTK_POKE_CONSOLE)
+		this_cpu_or(printk_need_poke, PRINTK_POKE_CONSOLE);
+
+	if ((poke & PRINTK_POKE_LOGGERS) && waitqueue_active(&log_wait))
+		this_cpu_or(printk_need_poke, PRINTK_POKE_LOGGERS);
+
+	if (this_cpu_read(printk_need_poke))
+		irq_work_queue(this_cpu_ptr(&printk_poke_work));
+
 	preempt_enable();
 }
 
@@ -2777,8 +2780,7 @@ int printk_deferred(const char *fmt, ...)
 	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
 	va_end(args);
 
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	printk_poke(PRINTK_POKE_CONSOLE);
 	preempt_enable();
 
 	return r;
@@ -2893,6 +2895,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 {
 	struct kmsg_dumper *dumper;
 	unsigned long flags;
+	int did_flush = 0;
 
 	if ((reason > KMSG_DUMP_OOPS) && !always_kmsg_dump)
 		return;
@@ -2906,7 +2909,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 		dumper->active = true;
 
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
-		cont_flush();
+		did_flush |= cont_flush();
 		dumper->cur_seq = clear_seq;
 		dumper->cur_idx = clear_idx;
 		dumper->next_seq = log_next_seq;
@@ -2920,6 +2923,9 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 		dumper->active = false;
 	}
 	rcu_read_unlock();
+
+	if (did_flush)
+		printk_poke(PRINTK_POKE_CONSOLE);
 }
 
 /**
@@ -2997,7 +3003,8 @@ bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog,
 	bool ret;
 
 	raw_spin_lock_irqsave(&logbuf_lock, flags);
-	cont_flush();
+	if (cont_flush())
+		printk_poke(PRINTK_POKE_CONSOLE);
 	ret = kmsg_dump_get_line_nolock(dumper, syslog, line, size, len);
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
@@ -3039,7 +3046,9 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
 		goto out;
 
 	raw_spin_lock_irqsave(&logbuf_lock, flags);
-	cont_flush();
+	if (cont_flush())
+		printk_poke(PRINTK_POKE_CONSOLE);
+
 	if (dumper->cur_seq < log_first_seq) {
 		/* messages are gone, move to first available one */
 		dumper->cur_seq = log_first_seq;
diff --git a/lib/bust_spinlocks.c b/lib/bust_spinlocks.c
index f8e0e5367398..2bd31977058f 100644
--- a/lib/bust_spinlocks.c
+++ b/lib/bust_spinlocks.c
@@ -26,6 +26,6 @@ void __attribute__((weak)) bust_spinlocks(int yes)
 #endif
 		console_unblank();
 		if (--oops_in_progress == 0)
-			wake_up_klogd();
+			printk_poke(PRINTK_POKE_LOGGERS);
 	}
 }
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: linux.git: printk() problem
  2016-10-25  2:22                       ` Linus Torvalds
  2016-10-25  4:06                         ` Sergey Senozhatsky
  2016-10-25 14:44                         ` Petr Mladek
@ 2016-11-09 15:47                         ` Petr Mladek
  2 siblings, 0 replies; 36+ messages in thread
From: Petr Mladek @ 2016-11-09 15:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Joe Perches,
	Geert Uytterhoeven, Tetsuo Handa, Linux Kernel Mailing List,
	Tejun Heo, Calvin Owens, Steven Rostedt, Andrew Morton

On Mon 2016-10-24 19:22:59, Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:06 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Mon, Oct 24, 2016 at 6:55 PM, Sergey Senozhatsky
> > <sergey.senozhatsky.work@gmail.com> wrote:
> >>
> >> I think cont_flush() should grab the logbuf_lock lock, because
> >> it does log_store() and touches the cont.len. so something like
> >> this perhaps
> >
> > Absolutely. Good catch.
> 
> Actually, you can't do it the way you did (inside cont_flush), because
> "cont_flush()" is already called with logbuf_lock held in most cases
> (see "cont_add()").
> 
> So it's really just the timer function that needs to take the
> logbuf_lock before it calls cont_flush().
> 
> So here's a new version. How does this look to you?
> 
> Again, this still tests "cont.len" outside the lock (not just in
> console_unlock(), but also in deferred_cont_flush()). And it's fine:
> even if it sees the "wrong" value due to some race, it does so either
> because cont.len was just set to non-zero (and whoever set it will
> force the re-check anyway), or it got cleared just as it was tested
> (and at worst you end up with an extra timer invocation).

This patch really seems to reduce the number of too-early flushed
continuous lines. It reduces the scattered output. But I am not sure
if we want to add a timer code into the printk calls at this stage
(for 4.9-rc5).

Well, the patch looks fine, except that we call cont_flush() without
poking console. It is not a regression because only newlines triggered
console in the past and they still do but...

I would suggest to revert the commit bfd8d3f23b51018388be
("printk: make reading the kernel log flush pending lines") for
4.9. Then we could  test/fix it properly for 4.10. Me and Sergey would
happily help with it.

Just in case, you still want to commit this patch. I would suggest
to apply the one below on top.


>From 7a0ad7ce2347346fc81872fe42a95ad5dfba4098 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Tue, 25 Oct 2016 15:23:13 +0200
Subject: [PATCH] printk: Poke console and other loggers when cont buffer is
 flushed

The commit bfd8d3f23b51018388be041 ("printk: make reading the kernel
log flush pending lines") allows to add new message into the log
buffer without flushing it to the console and waking other loggers.

This patch adds wake_up_console() and calls it when the cont
buffer is flushed. The function name will make more sense once
we switch to the async printk.

Note that it is enough to poke console. The other loggers
are waken from console_unlock() when there is a new message.

The patch also renames PRINTK_PENDING flags and wake_up_klogd_work*
stuff to reduce confusion. First, there are more possible log
daemons, e.g. klogd, syslogd. Second, the word "console" is more
descriptive than "output".

This patch is based on top of the fix proposed at
https://lkml.kernel.org/r/CA+55aFwKYnrMJr_vSE+GfDGszeUGyd=CPUD15-zZ8yWQW61GBA@mail.gmail.com

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 60 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e63aa679614e..f0e72de6ddbc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -780,6 +780,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
 }
 
 static void deferred_cont_flush(void);
+static void __wake_up_console(void);
 
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
 			    size_t count, loff_t *ppos)
@@ -1620,13 +1621,11 @@ static bool cont_flush(void)
 static void flush_timer(unsigned long data)
 {
 	unsigned long flags;
-	bool did_flush;
 
 	raw_spin_lock_irqsave(&logbuf_lock, flags);
-	did_flush = cont_flush();
+	if (cont_flush())
+		__wake_up_console();
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
-	if (did_flush)
-		wake_up_klogd();
 }
 
 static void deferred_cont_flush(void)
@@ -2730,40 +2729,57 @@ static int __init printk_late_init(void)
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
+ * Handle console and wakeup loggers via IRQ work so that it can be done
+ * from any context.
  */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
+#define PRINTK_PENDING_LOGGERS	0x01
+#define PRINTK_PENDING_CONSOLE	0x02
 
 static DEFINE_PER_CPU(int, printk_pending);
 
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
+static void printk_pending_func(struct irq_work *irq_work)
 {
 	int pending = __this_cpu_xchg(printk_pending, 0);
 
-	if (pending & PRINTK_PENDING_OUTPUT) {
+	if (pending & PRINTK_PENDING_CONSOLE) {
 		/* If trylock fails, someone else is doing the printing */
 		if (console_trylock())
 			console_unlock();
 	}
 
-	if (pending & PRINTK_PENDING_WAKEUP)
+	if (pending & PRINTK_PENDING_LOGGERS)
 		wake_up_interruptible(&log_wait);
+
 	deferred_cont_flush();
 }
 
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
+static DEFINE_PER_CPU(struct irq_work, printk_pending_work) = {
+	.func = printk_pending_func,
 	.flags = IRQ_WORK_LAZY,
 };
 
 void wake_up_klogd(void)
 {
 	preempt_disable();
+
 	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		this_cpu_or(printk_pending, PRINTK_PENDING_LOGGERS);
+		irq_work_queue(this_cpu_ptr(&printk_pending_work));
 	}
+
+	preempt_enable();
+}
+
+static void __wake_up_console(void)
+{
+	this_cpu_or(printk_pending, PRINTK_PENDING_CONSOLE);
+	irq_work_queue(this_cpu_ptr(&printk_pending_work));
+}
+
+static void wake_up_console(void)
+{
+	preempt_disable();
+	__wake_up_console();
 	preempt_enable();
 }
 
@@ -2777,8 +2793,7 @@ int printk_deferred(const char *fmt, ...)
 	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
 	va_end(args);
 
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	__wake_up_console();
 	preempt_enable();
 
 	return r;
@@ -2893,6 +2908,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 {
 	struct kmsg_dumper *dumper;
 	unsigned long flags;
+	int did_flush = 0;
 
 	if ((reason > KMSG_DUMP_OOPS) && !always_kmsg_dump)
 		return;
@@ -2906,7 +2922,7 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 		dumper->active = true;
 
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
-		cont_flush();
+		did_flush |= cont_flush();
 		dumper->cur_seq = clear_seq;
 		dumper->cur_idx = clear_idx;
 		dumper->next_seq = log_next_seq;
@@ -2920,6 +2936,9 @@ void kmsg_dump(enum kmsg_dump_reason reason)
 		dumper->active = false;
 	}
 	rcu_read_unlock();
+
+	if (did_flush)
+		wake_up_console();
 }
 
 /**
@@ -2997,7 +3016,8 @@ bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog,
 	bool ret;
 
 	raw_spin_lock_irqsave(&logbuf_lock, flags);
-	cont_flush();
+	if (cont_flush())
+		__wake_up_console();
 	ret = kmsg_dump_get_line_nolock(dumper, syslog, line, size, len);
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 
@@ -3039,7 +3059,9 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
 		goto out;
 
 	raw_spin_lock_irqsave(&logbuf_lock, flags);
-	cont_flush();
+	if (cont_flush())
+		__wake_up_console();
+
 	if (dumper->cur_seq < log_first_seq) {
 		/* messages are gone, move to first available one */
 		dumper->cur_seq = log_first_seq;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2016-11-09 15:47 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-12 13:30 linux.git: printk() problem Tetsuo Handa
2016-10-12 14:35 ` Michal Hocko
2016-10-12 16:08   ` Joe Perches
2016-10-13  6:26     ` Michal Hocko
2016-10-13  9:29       ` Joe Perches
2016-10-13 10:04         ` Michal Hocko
2016-10-13 10:20           ` Joe Perches
2016-10-13 11:06             ` Michal Hocko
2016-10-12 15:47 ` Linus Torvalds
2016-10-12 16:16   ` Joe Perches
2016-10-12 16:54     ` Linus Torvalds
2016-10-12 18:50       ` [PATCH] acpi_os_vprintf: Use printk_get_level() to avoid unnecessary KERN_CONT Joe Perches
2016-10-13 21:59         ` Rafael J. Wysocki
2016-10-23  9:22   ` linux.git: printk() problem Geert Uytterhoeven
2016-10-23 18:11     ` Linus Torvalds
2016-10-23 19:06       ` Joe Perches
2016-10-23 19:32         ` Linus Torvalds
2016-10-23 19:46           ` Linus Torvalds
2016-10-24 11:15             ` Geert Uytterhoeven
2016-10-24 14:08             ` Sergey Senozhatsky
2016-10-24 14:23               ` Sergey Senozhatsky
2016-10-24 17:54               ` Linus Torvalds
2016-10-24 17:55                 ` Linus Torvalds
2016-10-25  1:55                   ` Sergey Senozhatsky
2016-10-25  2:06                     ` Linus Torvalds
2016-10-25  2:22                       ` Linus Torvalds
2016-10-25  4:06                         ` Sergey Senozhatsky
2016-10-25  4:13                           ` Joe Perches
2016-10-25  4:15                           ` Linus Torvalds
2016-10-25  4:44                             ` Sergey Senozhatsky
2016-10-25 14:44                         ` Petr Mladek
2016-11-09 15:47                         ` Petr Mladek
2016-10-25  2:24                     ` Sergey Senozhatsky
2016-10-23 20:33           ` Joe Perches
2016-10-23 21:13             ` Linus Torvalds
2016-10-25 14:42       ` Steven Rostedt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.