All of lore.kernel.org
 help / color / mirror / Atom feed
* + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
@ 2014-05-02 21:22 akpm
  2014-05-02 22:46 ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: akpm @ 2014-05-02 21:22 UTC (permalink / raw)
  To: mm-commits, peterz, kay, jack, will.deacon

Subject: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
To: will.deacon@arm.com,jack@suse.cz,kay@vrfy.org,peterz@infradead.org
From: akpm@linux-foundation.org
Date: Fri, 02 May 2014 14:22:20 -0700


The patch titled
     Subject: printk: print initial logbuf contents before re-enabling interrupts
has been added to the -mm tree.  Its filename is
     printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Will Deacon <will.deacon@arm.com>
Subject: printk: print initial logbuf contents before re-enabling interrupts

When running on a hideously slow system (~10Mhz FPGA) with a bunch of
debug printk invocations on the timer interrupt path, we end up filling
the log buffer faster than we can drain it.

The reason is that console_unlock (which is responsible for moving
messages out of logbuf to hand over to the console driver) removes one
message at a time, briefly re-enabling interrupts between each of them. 
If the interrupt path prints more than a single message, then we can
easily generate more messages than we can print for a regular, recurring
interrupt (e.g.  a 1khz timer).  This results in messages getting silently
dropped, leading to counter-intuitive, incomplete printk traces on the
console.

Rather than run the console_unlock loop with interrupts disabled (which
has obvious latency problems), this patch records the sequence number of
the last message in the log buffer after taking the logbuf_lock.  We can
then print this fixed amount of work before re-enabling interrupts again,
making sure we keep up with ourself.  Other CPUs could still potentially
flood the buffer, but there's little that we can do to protect against
that.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/printk/printk.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff -puN kernel/printk/printk.c~printk-print-initial-logbuf-contents-before-re-enabling-interrupts kernel/printk/printk.c
--- a/kernel/printk/printk.c~printk-print-initial-logbuf-contents-before-re-enabling-interrupts
+++ a/kernel/printk/printk.c
@@ -2147,10 +2147,13 @@ void console_unlock(void)
 again:
 	for (;;) {
 		struct printk_log *msg;
+		u64 console_end_seq;
 		size_t len;
 		int level;
 
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
+		console_end_seq = log_next_seq;
+again_noirq:
 		if (seen_seq != log_next_seq) {
 			wake_klogd = true;
 			seen_seq = log_next_seq;
@@ -2195,6 +2198,12 @@ skip:
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(level, text, len);
 		start_critical_timings();
+
+		if (console_seq < console_end_seq) {
+			raw_spin_lock(&logbuf_lock);
+			goto again_noirq;
+		}
+
 		local_irq_restore(flags);
 	}
 	console_locked = 0;
_

Patches currently in -mm which might be from will.deacon@arm.com are

origin.patch
printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch
printk-report-dropping-of-messages-from-logbuf.patch
documentation-devicetree-bindings-add-documentation-for-the-apm-x-gene-soc-rtc-dts-binding.patch
drivers-rtc-add-apm-x-gene-soc-rtc-driver.patch
arm64-add-apm-x-gene-soc-rtc-dts-entry.patch
linux-next.patch


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-02 21:22 + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree akpm
@ 2014-05-02 22:46 ` Jan Kara
  2014-05-06 12:06   ` Will Deacon
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2014-05-02 22:46 UTC (permalink / raw)
  To: will.deacon; +Cc: mm-commits, peterz, kay, jack, LKML, Andrew Morton

On Fri 02-05-14 14:22:20, Andrew Morton wrote:
> From: Will Deacon <will.deacon@arm.com>
> Subject: printk: print initial logbuf contents before re-enabling interrupts
> 
> When running on a hideously slow system (~10Mhz FPGA) with a bunch of
> debug printk invocations on the timer interrupt path, we end up filling
> the log buffer faster than we can drain it.
> 
> The reason is that console_unlock (which is responsible for moving
> messages out of logbuf to hand over to the console driver) removes one
> message at a time, briefly re-enabling interrupts between each of them. 
> If the interrupt path prints more than a single message, then we can
> easily generate more messages than we can print for a regular, recurring
> interrupt (e.g.  a 1khz timer).  This results in messages getting silently
> dropped, leading to counter-intuitive, incomplete printk traces on the
> console.
> 
> Rather than run the console_unlock loop with interrupts disabled (which
> has obvious latency problems), this patch records the sequence number of
> the last message in the log buffer after taking the logbuf_lock.  We can
> then print this fixed amount of work before re-enabling interrupts again,
> making sure we keep up with ourself.  Other CPUs could still potentially
> flood the buffer, but there's little that we can do to protect against
> that.
  I really dislike this patch. It goes completely against my efforts of
lowering irq latency caused by printing to console (which are the
problems I have observed ;). My opinion is that when you are printing from
each and every interrupt which happens so often, then you have a problem and
disabling IRQs in printk so that your interrupt doesn't happen that often
seems like a poor solution to me. You could as well just ratelimit your
debug messages, couldn't you?

								Honza

> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Kay Sievers <kay@vrfy.org>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  kernel/printk/printk.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff -puN kernel/printk/printk.c~printk-print-initial-logbuf-contents-before-re-enabling-interrupts kernel/printk/printk.c
> --- a/kernel/printk/printk.c~printk-print-initial-logbuf-contents-before-re-enabling-interrupts
> +++ a/kernel/printk/printk.c
> @@ -2147,10 +2147,13 @@ void console_unlock(void)
>  again:
>  	for (;;) {
>  		struct printk_log *msg;
> +		u64 console_end_seq;
>  		size_t len;
>  		int level;
>  
>  		raw_spin_lock_irqsave(&logbuf_lock, flags);
> +		console_end_seq = log_next_seq;
> +again_noirq:
>  		if (seen_seq != log_next_seq) {
>  			wake_klogd = true;
>  			seen_seq = log_next_seq;
> @@ -2195,6 +2198,12 @@ skip:
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(level, text, len);
>  		start_critical_timings();
> +
> +		if (console_seq < console_end_seq) {
> +			raw_spin_lock(&logbuf_lock);
> +			goto again_noirq;
> +		}
> +
>  		local_irq_restore(flags);
>  	}
>  	console_locked = 0;
> _
> 
> Patches currently in -mm which might be from will.deacon@arm.com are
> 
> origin.patch
> printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch
> printk-report-dropping-of-messages-from-logbuf.patch
> documentation-devicetree-bindings-add-documentation-for-the-apm-x-gene-soc-rtc-dts-binding.patch
> drivers-rtc-add-apm-x-gene-soc-rtc-driver.patch
> arm64-add-apm-x-gene-soc-rtc-dts-entry.patch
> linux-next.patch
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-02 22:46 ` Jan Kara
@ 2014-05-06 12:06   ` Will Deacon
  2014-05-06 12:29     ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2014-05-06 12:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: mm-commits, peterz, kay, LKML, Andrew Morton

Hello,

On Fri, May 02, 2014 at 11:46:51PM +0100, Jan Kara wrote:
> On Fri 02-05-14 14:22:20, Andrew Morton wrote:
> > From: Will Deacon <will.deacon@arm.com>
> > Subject: printk: print initial logbuf contents before re-enabling interrupts
> > 
> > When running on a hideously slow system (~10Mhz FPGA) with a bunch of
> > debug printk invocations on the timer interrupt path, we end up filling
> > the log buffer faster than we can drain it.
> > 
> > The reason is that console_unlock (which is responsible for moving
> > messages out of logbuf to hand over to the console driver) removes one
> > message at a time, briefly re-enabling interrupts between each of them. 
> > If the interrupt path prints more than a single message, then we can
> > easily generate more messages than we can print for a regular, recurring
> > interrupt (e.g.  a 1khz timer).  This results in messages getting silently
> > dropped, leading to counter-intuitive, incomplete printk traces on the
> > console.
> > 
> > Rather than run the console_unlock loop with interrupts disabled (which
> > has obvious latency problems), this patch records the sequence number of
> > the last message in the log buffer after taking the logbuf_lock.  We can
> > then print this fixed amount of work before re-enabling interrupts again,
> > making sure we keep up with ourself.  Other CPUs could still potentially
> > flood the buffer, but there's little that we can do to protect against
> > that.
>   I really dislike this patch. It goes completely against my efforts of
> lowering irq latency caused by printing to console (which are the
> problems I have observed ;).

Hmmm, what makes you think that? Interrupts only remain disabled whilst we
process the backlog, which in the usual case should be pretty small. We also
hold the logbuf_lock during this time, so we can't get stuck in an unbounded
loop.

Can you elaborate a bit more on the problems you've observed, please?

> My opinion is that when you are printing from each and every interrupt
> which happens so often, then you have a problem and disabling IRQs in
> printk so that your interrupt doesn't happen that often seems like a poor
> solution to me. You could as well just ratelimit your debug messages,
> couldn't you?

My use-case was basically using printk as a debug trace during early boot
when bringing up Linux on a new CPU core. I don't think ratelimiting would
be the right thing there, since I really did want as many messages to
reach the console as possible (which is also why I wrote this patch, not
just the other one in the series).

Cheers,

Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-06 12:06   ` Will Deacon
@ 2014-05-06 12:29     ` Jan Kara
  2014-05-06 13:12       ` Will Deacon
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2014-05-06 12:29 UTC (permalink / raw)
  To: Will Deacon; +Cc: Jan Kara, mm-commits, peterz, kay, LKML, Andrew Morton

On Tue 06-05-14 13:06:48, Will Deacon wrote:
> Hello,
> 
> On Fri, May 02, 2014 at 11:46:51PM +0100, Jan Kara wrote:
> > On Fri 02-05-14 14:22:20, Andrew Morton wrote:
> > > From: Will Deacon <will.deacon@arm.com>
> > > Subject: printk: print initial logbuf contents before re-enabling interrupts
> > > 
> > > When running on a hideously slow system (~10Mhz FPGA) with a bunch of
> > > debug printk invocations on the timer interrupt path, we end up filling
> > > the log buffer faster than we can drain it.
> > > 
> > > The reason is that console_unlock (which is responsible for moving
> > > messages out of logbuf to hand over to the console driver) removes one
> > > message at a time, briefly re-enabling interrupts between each of them. 
> > > If the interrupt path prints more than a single message, then we can
> > > easily generate more messages than we can print for a regular, recurring
> > > interrupt (e.g.  a 1khz timer).  This results in messages getting silently
> > > dropped, leading to counter-intuitive, incomplete printk traces on the
> > > console.
> > > 
> > > Rather than run the console_unlock loop with interrupts disabled (which
> > > has obvious latency problems), this patch records the sequence number of
> > > the last message in the log buffer after taking the logbuf_lock.  We can
> > > then print this fixed amount of work before re-enabling interrupts again,
> > > making sure we keep up with ourself.  Other CPUs could still potentially
> > > flood the buffer, but there's little that we can do to protect against
> > > that.
> >   I really dislike this patch. It goes completely against my efforts of
> > lowering irq latency caused by printing to console (which are the
> > problems I have observed ;).
> 
> Hmmm, what makes you think that? Interrupts only remain disabled whilst we
> process the backlog, which in the usual case should be pretty small. We also
> hold the logbuf_lock during this time, so we can't get stuck in an unbounded
> loop.
>
> Can you elaborate a bit more on the problems you've observed, please?
  Well, with serial console the backlog can get actually pretty big. During
boot on large machines I've seen CPUs stuck in that very loop in
console_unlock() for tens of seconds. Obviously that causes problems - e.g.
watchdog fires, RCU lockup detector fires, when interrupts are disabled,
some hardware gives up because its interrupts weren't served for too long.
All in all the machine just dies.

And the backlog builds up because while one cpu is doing the printing in
console_unlock() all the other cpus are busily adding new messages to the
buffer faster than they can be printed...

> > My opinion is that when you are printing from each and every interrupt
> > which happens so often, then you have a problem and disabling IRQs in
> > printk so that your interrupt doesn't happen that often seems like a poor
> > solution to me. You could as well just ratelimit your debug messages,
> > couldn't you?
> 
> My use-case was basically using printk as a debug trace during early boot
> when bringing up Linux on a new CPU core. I don't think ratelimiting would
> be the right thing there, since I really did want as many messages to
> reach the console as possible (which is also why I wrote this patch, not
> just the other one in the series).
  OK, I understand. It just seems wrong to me to throttle all interrupts on
the cpu while doing printing just because someone might be doing debug
printing from the interrupt. Sure it's fine as a debug hack but on a
production machine that seems rather counterproductive.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-06 12:29     ` Jan Kara
@ 2014-05-06 13:12       ` Will Deacon
  2014-05-06 13:50         ` Peter Zijlstra
                           ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Will Deacon @ 2014-05-06 13:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: mm-commits, peterz, kay, LKML, Andrew Morton

On Tue, May 06, 2014 at 01:29:58PM +0100, Jan Kara wrote:
> On Tue 06-05-14 13:06:48, Will Deacon wrote:
> > On Fri, May 02, 2014 at 11:46:51PM +0100, Jan Kara wrote:
> > >   I really dislike this patch. It goes completely against my efforts of
> > > lowering irq latency caused by printing to console (which are the
> > > problems I have observed ;).
> > 
> > Hmmm, what makes you think that? Interrupts only remain disabled whilst we
> > process the backlog, which in the usual case should be pretty small. We also
> > hold the logbuf_lock during this time, so we can't get stuck in an unbounded
> > loop.
> >
> > Can you elaborate a bit more on the problems you've observed, please?
>   Well, with serial console the backlog can get actually pretty big. During
> boot on large machines I've seen CPUs stuck in that very loop in
> console_unlock() for tens of seconds. Obviously that causes problems - e.g.
> watchdog fires, RCU lockup detector fires, when interrupts are disabled,
> some hardware gives up because its interrupts weren't served for too long.
> All in all the machine just dies.

Right, so there's the usual compromise here between throughput and latency.
I hoped that pegging the amount of irq-disabled work using console_end_seq
would give us the best of both worlds, but you're saying that it can still
lead to issues on large machines (in the worst case, the whole buffer fills
up while we're printing the last lot, so the irq-disabled work amounts to
draining logbuf, right?).

That said, printing one message each time seems to go too far in the
opposite direction for my liking, so the best bet is likely to limit the
work to some fixed number of messages. Do you have any feeling for such a
limit?

> And the backlog builds up because while one cpu is doing the printing in
> console_unlock() all the other cpus are busily adding new messages to the
> buffer faster than they can be printed...

Understood, but that's also the situation without this patch (and not one
that I think you can fix without hurting latency).

> > > My opinion is that when you are printing from each and every interrupt
> > > which happens so often, then you have a problem and disabling IRQs in
> > > printk so that your interrupt doesn't happen that often seems like a poor
> > > solution to me. You could as well just ratelimit your debug messages,
> > > couldn't you?
> > 
> > My use-case was basically using printk as a debug trace during early boot
> > when bringing up Linux on a new CPU core. I don't think ratelimiting would
> > be the right thing there, since I really did want as many messages to
> > reach the console as possible (which is also why I wrote this patch, not
> > just the other one in the series).
>   OK, I understand. It just seems wrong to me to throttle all interrupts on
> the cpu while doing printing just because someone might be doing debug
> printing from the interrupt. Sure it's fine as a debug hack but on a
> production machine that seems rather counterproductive.

Perhaps, but the one time I *really* want printk to be reliable is when I'm
using it to debug a problem.

Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-06 13:12       ` Will Deacon
@ 2014-05-06 13:50         ` Peter Zijlstra
  2014-05-06 14:00         ` Jan Kara
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2014-05-06 13:50 UTC (permalink / raw)
  To: Will Deacon; +Cc: Jan Kara, mm-commits, kay, LKML, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 297 bytes --]

On Tue, May 06, 2014 at 02:12:34PM +0100, Will Deacon wrote:
> That said, printing one message each time seems to go too far in the
> opposite direction for my liking, so the best bet is likely to limit the
> work to some fixed number of messages. Do you have any feeling for such a
> limit?

42!

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-06 13:12       ` Will Deacon
  2014-05-06 13:50         ` Peter Zijlstra
@ 2014-05-06 14:00         ` Jan Kara
  2014-05-06 15:00           ` Will Deacon
  2014-05-06 22:05         ` Andrew Morton
  2014-05-06 22:05         ` Andrew Morton
  3 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2014-05-06 14:00 UTC (permalink / raw)
  To: Will Deacon; +Cc: Jan Kara, mm-commits, peterz, kay, LKML, Andrew Morton

On Tue 06-05-14 14:12:34, Will Deacon wrote:
> On Tue, May 06, 2014 at 01:29:58PM +0100, Jan Kara wrote:
> > On Tue 06-05-14 13:06:48, Will Deacon wrote:
> > > On Fri, May 02, 2014 at 11:46:51PM +0100, Jan Kara wrote:
> > > >   I really dislike this patch. It goes completely against my efforts of
> > > > lowering irq latency caused by printing to console (which are the
> > > > problems I have observed ;).
> > > 
> > > Hmmm, what makes you think that? Interrupts only remain disabled whilst we
> > > process the backlog, which in the usual case should be pretty small. We also
> > > hold the logbuf_lock during this time, so we can't get stuck in an unbounded
> > > loop.
> > >
> > > Can you elaborate a bit more on the problems you've observed, please?
> >   Well, with serial console the backlog can get actually pretty big. During
> > boot on large machines I've seen CPUs stuck in that very loop in
> > console_unlock() for tens of seconds. Obviously that causes problems - e.g.
> > watchdog fires, RCU lockup detector fires, when interrupts are disabled,
> > some hardware gives up because its interrupts weren't served for too long.
> > All in all the machine just dies.
> 
> Right, so there's the usual compromise here between throughput and latency.
  I'd see that compromise if enabling & disabling interrupts would be
taking considerable amount of time. I don't think that was your concern,
was it? Maybe I just misunderstood you...

> I hoped that pegging the amount of irq-disabled work using console_end_seq
> would give us the best of both worlds, but you're saying that it can still
> lead to issues on large machines (in the worst case, the whole buffer fills
> up while we're printing the last lot, so the irq-disabled work amounts to
> draining logbuf, right?).
  Yup.

> That said, printing one message each time seems to go too far in the
> opposite direction for my liking, so the best bet is likely to limit the
> work to some fixed number of messages. Do you have any feeling for such a
> limit?
  If you really are concerned about enabling and disabling of interrupts
taking significant time (and it may be, I just don't know), then printing
couple of messages without enabling them makes sense. How many is a tricky
question since it depends on the console speed. I had a similar problem
when I was deciding in my patch when we should ask another CPU to take over
printing from the current CPU (to avoid the issues I've described in the
previous email). I was experimenting with various stuff but in the end I
restorted to a stupid "after X characters are printed".

> > And the backlog builds up because while one cpu is doing the printing in
> > console_unlock() all the other cpus are busily adding new messages to the
> > buffer faster than they can be printed...
> 
> Understood, but that's also the situation without this patch (and not one
> that I think you can fix without hurting latency).
  Sure. I have a patch which transitions printing to another CPU once in a
while so single CPU isn't hogged for too long and that solves the issues I
have observed. But Alan didn't like this solution so the issue is unfixed
for now.

> > > > My opinion is that when you are printing from each and every interrupt
> > > > which happens so often, then you have a problem and disabling IRQs in
> > > > printk so that your interrupt doesn't happen that often seems like a poor
> > > > solution to me. You could as well just ratelimit your debug messages,
> > > > couldn't you?
> > > 
> > > My use-case was basically using printk as a debug trace during early boot
> > > when bringing up Linux on a new CPU core. I don't think ratelimiting would
> > > be the right thing there, since I really did want as many messages to
> > > reach the console as possible (which is also why I wrote this patch, not
> > > just the other one in the series).
> >   OK, I understand. It just seems wrong to me to throttle all interrupts on
> > the cpu while doing printing just because someone might be doing debug
> > printing from the interrupt. Sure it's fine as a debug hack but on a
> > production machine that seems rather counterproductive.
> 
> Perhaps, but the one time I *really* want printk to be reliable is when I'm
> using it to debug a problem.
  Yes, I understand that. 

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-06 14:00         ` Jan Kara
@ 2014-05-06 15:00           ` Will Deacon
  2014-05-06 20:00             ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2014-05-06 15:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: mm-commits, peterz, kay, LKML, Andrew Morton

On Tue, May 06, 2014 at 03:00:32PM +0100, Jan Kara wrote:
> On Tue 06-05-14 14:12:34, Will Deacon wrote:
> > On Tue, May 06, 2014 at 01:29:58PM +0100, Jan Kara wrote:
> > >   Well, with serial console the backlog can get actually pretty big. During
> > > boot on large machines I've seen CPUs stuck in that very loop in
> > > console_unlock() for tens of seconds. Obviously that causes problems - e.g.
> > > watchdog fires, RCU lockup detector fires, when interrupts are disabled,
> > > some hardware gives up because its interrupts weren't served for too long.
> > > All in all the machine just dies.
> > 
> > Right, so there's the usual compromise here between throughput and latency.
>   I'd see that compromise if enabling & disabling interrupts would be
> taking considerable amount of time. I don't think that was your concern,
> was it? Maybe I just misunderstood you...

Well, that isn't the quickest operation on ARM (since it's
self-synchronising), but I was actually referring to the ability to drain
the log buffer (with interrupts disabled) vs the ability to service
interrupts quickly. The moment we re-enable interrupts, we can start adding
more messages to the buffer from the IRQ path (I didn't attempt to solve the
multi-CPU case, as I mentioned before).

> > That said, printing one message each time seems to go too far in the
> > opposite direction for my liking, so the best bet is likely to limit the
> > work to some fixed number of messages. Do you have any feeling for such a
> > limit?
>   If you really are concerned about enabling and disabling of interrupts
> taking significant time (and it may be, I just don't know), then printing
> couple of messages without enabling them makes sense. How many is a tricky
> question since it depends on the console speed. I had a similar problem
> when I was deciding in my patch when we should ask another CPU to take over
> printing from the current CPU (to avoid the issues I've described in the
> previous email). I was experimenting with various stuff but in the end I
> restorted to a stupid "after X characters are printed".

Yeah, so you also end up with the same problem of tuning your heuristics.
Peter's suggestion of X == 42 is as good as any arbitrary constant I can
suggest, hence my snapshotting of log_next_seq originally.

> > > And the backlog builds up because while one cpu is doing the printing in
> > > console_unlock() all the other cpus are busily adding new messages to the
> > > buffer faster than they can be printed...
> > 
> > Understood, but that's also the situation without this patch (and not one
> > that I think you can fix without hurting latency).
>   Sure. I have a patch which transitions printing to another CPU once in a
> while so single CPU isn't hogged for too long and that solves the issues I
> have observed. But Alan didn't like this solution so the issue is unfixed
> for now.

Interesting. Do you have a pointer to the thread?

Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-06 15:00           ` Will Deacon
@ 2014-05-06 20:00             ` Jan Kara
  2014-05-07  9:46               ` Will Deacon
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2014-05-06 20:00 UTC (permalink / raw)
  To: Will Deacon; +Cc: Jan Kara, mm-commits, peterz, kay, LKML, Andrew Morton

On Tue 06-05-14 16:00:37, Will Deacon wrote:
> On Tue, May 06, 2014 at 03:00:32PM +0100, Jan Kara wrote:
> > On Tue 06-05-14 14:12:34, Will Deacon wrote:
> > > On Tue, May 06, 2014 at 01:29:58PM +0100, Jan Kara wrote:
> > > >   Well, with serial console the backlog can get actually pretty big. During
> > > > boot on large machines I've seen CPUs stuck in that very loop in
> > > > console_unlock() for tens of seconds. Obviously that causes problems - e.g.
> > > > watchdog fires, RCU lockup detector fires, when interrupts are disabled,
> > > > some hardware gives up because its interrupts weren't served for too long.
> > > > All in all the machine just dies.
> > > 
> > > Right, so there's the usual compromise here between throughput and latency.
> >   I'd see that compromise if enabling & disabling interrupts would be
> > taking considerable amount of time. I don't think that was your concern,
> > was it? Maybe I just misunderstood you...
> 
> Well, that isn't the quickest operation on ARM (since it's
> self-synchronising), but I was actually referring to the ability to drain
> the log buffer (with interrupts disabled) vs the ability to service
> interrupts quickly. The moment we re-enable interrupts, we can start adding
> more messages to the buffer from the IRQ path (I didn't attempt to solve the
> multi-CPU case, as I mentioned before).
  I see. But practically the multi-CPU case is much more common than the
IRQ case, isn't it?

> > > That said, printing one message each time seems to go too far in the
> > > opposite direction for my liking, so the best bet is likely to limit the
> > > work to some fixed number of messages. Do you have any feeling for such a
> > > limit?
> >   If you really are concerned about enabling and disabling of interrupts
> > taking significant time (and it may be, I just don't know), then printing
> > couple of messages without enabling them makes sense. How many is a tricky
> > question since it depends on the console speed. I had a similar problem
> > when I was deciding in my patch when we should ask another CPU to take over
> > printing from the current CPU (to avoid the issues I've described in the
> > previous email). I was experimenting with various stuff but in the end I
> > restorted to a stupid "after X characters are printed".
> 
> Yeah, so you also end up with the same problem of tuning your heuristics.
> Peter's suggestion of X == 42 is as good as any arbitrary constant I can
> suggest, hence my snapshotting of log_next_seq originally.
  Yes I can fully understand where you came from :). I just wanted to point
out that your choice isn't a particularly good one either,
 
> > > > And the backlog builds up because while one cpu is doing the printing in
> > > > console_unlock() all the other cpus are busily adding new messages to the
> > > > buffer faster than they can be printed...
> > > 
> > > Understood, but that's also the situation without this patch (and not one
> > > that I think you can fix without hurting latency).
> >   Sure. I have a patch which transitions printing to another CPU once in a
> > while so single CPU isn't hogged for too long and that solves the issues I
> > have observed. But Alan didn't like this solution so the issue is unfixed
> > for now.
> 
> Interesting. Do you have a pointer to the thread?
  The patchset posting starts here:
https://lkml.org/lkml/2014/3/25/343

  Patch 5/8 is probably the most interesting for you (patches 1-4 are
already in the mm tree).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-06 13:12       ` Will Deacon
  2014-05-06 13:50         ` Peter Zijlstra
  2014-05-06 14:00         ` Jan Kara
@ 2014-05-06 22:05         ` Andrew Morton
  2014-05-07  9:58           ` Will Deacon
  2014-05-06 22:05         ` Andrew Morton
  3 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2014-05-06 22:05 UTC (permalink / raw)
  To: Will Deacon; +Cc: Jan Kara, mm-commits, peterz, kay, LKML

On Tue, 6 May 2014 14:12:34 +0100 Will Deacon <will.deacon@arm.com> wrote:

> > > > My opinion is that when you are printing from each and every interrupt
> > > > which happens so often, then you have a problem and disabling IRQs in
> > > > printk so that your interrupt doesn't happen that often seems like a poor
> > > > solution to me. You could as well just ratelimit your debug messages,
> > > > couldn't you?
> > > 
> > > My use-case was basically using printk as a debug trace during early boot
> > > when bringing up Linux on a new CPU core. I don't think ratelimiting would
> > > be the right thing there, since I really did want as many messages to
> > > reach the console as possible (which is also why I wrote this patch, not
> > > just the other one in the series).
> >   OK, I understand. It just seems wrong to me to throttle all interrupts on
> > the cpu while doing printing just because someone might be doing debug
> > printing from the interrupt. Sure it's fine as a debug hack but on a
> > production machine that seems rather counterproductive.
> 
> Perhaps, but the one time I *really* want printk to be reliable is when I'm
> using it to debug a problem.

If you're debugging a problem, you're able to alter printk!  So perhaps
one way out of this is some developer-only ifdef to robustify printk
for particular usage patterns.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-06 13:12       ` Will Deacon
                           ` (2 preceding siblings ...)
  2014-05-06 22:05         ` Andrew Morton
@ 2014-05-06 22:05         ` Andrew Morton
  3 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2014-05-06 22:05 UTC (permalink / raw)
  To: Will Deacon; +Cc: Jan Kara, mm-commits, peterz, kay, LKML

On Tue, 6 May 2014 14:12:34 +0100 Will Deacon <will.deacon@arm.com> wrote:

> > > > My opinion is that when you are printing from each and every interrupt
> > > > which happens so often, then you have a problem and disabling IRQs in
> > > > printk so that your interrupt doesn't happen that often seems like a poor
> > > > solution to me. You could as well just ratelimit your debug messages,
> > > > couldn't you?
> > > 
> > > My use-case was basically using printk as a debug trace during early boot
> > > when bringing up Linux on a new CPU core. I don't think ratelimiting would
> > > be the right thing there, since I really did want as many messages to
> > > reach the console as possible (which is also why I wrote this patch, not
> > > just the other one in the series).
> >   OK, I understand. It just seems wrong to me to throttle all interrupts on
> > the cpu while doing printing just because someone might be doing debug
> > printing from the interrupt. Sure it's fine as a debug hack but on a
> > production machine that seems rather counterproductive.
> 
> Perhaps, but the one time I *really* want printk to be reliable is when I'm
> using it to debug a problem.

If you're debugging a problem, you're able to alter printk!  So perhaps
one way out of this is some developer-only ifdef to robustify printk
for particular usage patterns.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-06 20:00             ` Jan Kara
@ 2014-05-07  9:46               ` Will Deacon
  0 siblings, 0 replies; 16+ messages in thread
From: Will Deacon @ 2014-05-07  9:46 UTC (permalink / raw)
  To: Jan Kara; +Cc: mm-commits, peterz, kay, LKML, Andrew Morton

On Tue, May 06, 2014 at 09:00:22PM +0100, Jan Kara wrote:
> On Tue 06-05-14 16:00:37, Will Deacon wrote:
> > On Tue, May 06, 2014 at 03:00:32PM +0100, Jan Kara wrote:
> > > On Tue 06-05-14 14:12:34, Will Deacon wrote:
> > > > Right, so there's the usual compromise here between throughput and latency.
> > >   I'd see that compromise if enabling & disabling interrupts would be
> > > taking considerable amount of time. I don't think that was your concern,
> > > was it? Maybe I just misunderstood you...
> > 
> > Well, that isn't the quickest operation on ARM (since it's
> > self-synchronising), but I was actually referring to the ability to drain
> > the log buffer (with interrupts disabled) vs the ability to service
> > interrupts quickly. The moment we re-enable interrupts, we can start adding
> > more messages to the buffer from the IRQ path (I didn't attempt to solve the
> > multi-CPU case, as I mentioned before).
>   I see. But practically the multi-CPU case is much more common than the
> IRQ case, isn't it?

I think they're both pretty niche, but still valid scenarios.

> > >   Sure. I have a patch which transitions printing to another CPU once in a
> > > while so single CPU isn't hogged for too long and that solves the issues I
> > > have observed. But Alan didn't like this solution so the issue is unfixed
> > > for now.
> > 
> > Interesting. Do you have a pointer to the thread?
>   The patchset posting starts here:
> https://lkml.org/lkml/2014/3/25/343
> 
>   Patch 5/8 is probably the most interesting for you (patches 1-4 are
> already in the mm tree).

Yikes, that's certainly more invasive than anything I had in mind!

Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-06 22:05         ` Andrew Morton
@ 2014-05-07  9:58           ` Will Deacon
  2014-05-07 16:41             ` One Thousand Gnomes
  0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2014-05-07  9:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jan Kara, mm-commits, peterz, kay, LKML

On Tue, May 06, 2014 at 11:05:53PM +0100, Andrew Morton wrote:
> On Tue, 6 May 2014 14:12:34 +0100 Will Deacon <will.deacon@arm.com> wrote:
> > > > > My opinion is that when you are printing from each and every interrupt
> > > > > which happens so often, then you have a problem and disabling IRQs in
> > > > > printk so that your interrupt doesn't happen that often seems like a poor
> > > > > solution to me. You could as well just ratelimit your debug messages,
> > > > > couldn't you?
> > > > 
> > > > My use-case was basically using printk as a debug trace during early boot
> > > > when bringing up Linux on a new CPU core. I don't think ratelimiting would
> > > > be the right thing there, since I really did want as many messages to
> > > > reach the console as possible (which is also why I wrote this patch, not
> > > > just the other one in the series).
> > >   OK, I understand. It just seems wrong to me to throttle all interrupts on
> > > the cpu while doing printing just because someone might be doing debug
> > > printing from the interrupt. Sure it's fine as a debug hack but on a
> > > production machine that seems rather counterproductive.
> > 
> > Perhaps, but the one time I *really* want printk to be reliable is when I'm
> > using it to debug a problem.
> 
> If you're debugging a problem, you're able to alter printk!  So perhaps
> one way out of this is some developer-only ifdef to robustify printk
> for particular usage patterns.

Possibly, but I fear we'd incur the wrath of Alan after reading that other
thread. Having a CONFIG_ option or similar to control the amount of printing
we do is very similar to the command-line option Jan proposed in his series.

So, sadly, I think we may as well drop this patch for the time being. The
second patch in the series still applies fine without it and means we can
at least detect when we're dropping messages.

Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-07  9:58           ` Will Deacon
@ 2014-05-07 16:41             ` One Thousand Gnomes
  2014-05-08 14:34               ` Will Deacon
  0 siblings, 1 reply; 16+ messages in thread
From: One Thousand Gnomes @ 2014-05-07 16:41 UTC (permalink / raw)
  To: Will Deacon; +Cc: Andrew Morton, Jan Kara, mm-commits, peterz, kay, LKML

> Possibly, but I fear we'd incur the wrath of Alan after reading that other
> thread. Having a CONFIG_ option or similar to control the amount of printing
> we do is very similar to the command-line option Jan proposed in his series.

I've nothing against a configuration option, and having a printk that
queued most stuff to the serial IRQ handler on overflow and a boot option
of printk=synchronous for debug work would be awesome in my book. Many end
production boxes don't give a toss about losing the odd bit of data they
just need to shift the logs somewhere for filing.

All these "clever" approaches just seem to me to be ever more convoluted
attempts to fail to deal with the simple reality that if you put more
down the sewage pipe than fits it has to overflow somewhere.

Our tty drivers have a fifo, our tty drivers have an IRQ driven write
operation. It seems silly to be adding magic to solve the problem rather
than just using it (and in turn getting a ton of other consoles for free,
and being able to take GregKH's current usb console hack out)

Alan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-07 16:41             ` One Thousand Gnomes
@ 2014-05-08 14:34               ` Will Deacon
  2014-05-12 17:15                 ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Will Deacon @ 2014-05-08 14:34 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Andrew Morton, Jan Kara, mm-commits, peterz, kay, LKML

Hi Alan,

On Wed, May 07, 2014 at 05:41:51PM +0100, One Thousand Gnomes wrote:
> > Possibly, but I fear we'd incur the wrath of Alan after reading that other
> > thread. Having a CONFIG_ option or similar to control the amount of printing
> > we do is very similar to the command-line option Jan proposed in his series.
> 
> I've nothing against a configuration option, and having a printk that
> queued most stuff to the serial IRQ handler on overflow and a boot option
> of printk=synchronous for debug work would be awesome in my book. Many end
> production boxes don't give a toss about losing the odd bit of data they
> just need to shift the logs somewhere for filing.

Ok, I'll revisit this with a command-line option for the debug case. Thanks.

> All these "clever" approaches just seem to me to be ever more convoluted
> attempts to fail to deal with the simple reality that if you put more
> down the sewage pipe than fits it has to overflow somewhere.
>
> Our tty drivers have a fifo, our tty drivers have an IRQ driven write
> operation. It seems silly to be adding magic to solve the problem rather
> than just using it (and in turn getting a ton of other consoles for free,
> and being able to take GregKH's current usb console hack out)

Hmm, I'm not at all familiar with the tty layer, so I'd have to go and take
a look. Jan, did you look at this at all?

Will

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree
  2014-05-08 14:34               ` Will Deacon
@ 2014-05-12 17:15                 ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2014-05-12 17:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: One Thousand Gnomes, Andrew Morton, Jan Kara, mm-commits, peterz,
	kay, LKML

On Thu 08-05-14 15:34:07, Will Deacon wrote:
> Hi Alan,
> 
> On Wed, May 07, 2014 at 05:41:51PM +0100, One Thousand Gnomes wrote:
> > > Possibly, but I fear we'd incur the wrath of Alan after reading that other
> > > thread. Having a CONFIG_ option or similar to control the amount of printing
> > > we do is very similar to the command-line option Jan proposed in his series.
> > 
> > I've nothing against a configuration option, and having a printk that
> > queued most stuff to the serial IRQ handler on overflow and a boot option
> > of printk=synchronous for debug work would be awesome in my book. Many end
> > production boxes don't give a toss about losing the odd bit of data they
> > just need to shift the logs somewhere for filing.
> 
> Ok, I'll revisit this with a command-line option for the debug case. Thanks.
> 
> > All these "clever" approaches just seem to me to be ever more convoluted
> > attempts to fail to deal with the simple reality that if you put more
> > down the sewage pipe than fits it has to overflow somewhere.
> >
> > Our tty drivers have a fifo, our tty drivers have an IRQ driven write
> > operation. It seems silly to be adding magic to solve the problem rather
> > than just using it (and in turn getting a ton of other consoles for free,
> > and being able to take GregKH's current usb console hack out)
> 
> Hmm, I'm not at all familiar with the tty layer, so I'd have to go and take
> a look. Jan, did you look at this at all?
  No. I know nothing about tty layer as well so this research task is
currently on a backburner...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-05-12 17:15 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-02 21:22 + printk-print-initial-logbuf-contents-before-re-enabling-interrupts.patch added to -mm tree akpm
2014-05-02 22:46 ` Jan Kara
2014-05-06 12:06   ` Will Deacon
2014-05-06 12:29     ` Jan Kara
2014-05-06 13:12       ` Will Deacon
2014-05-06 13:50         ` Peter Zijlstra
2014-05-06 14:00         ` Jan Kara
2014-05-06 15:00           ` Will Deacon
2014-05-06 20:00             ` Jan Kara
2014-05-07  9:46               ` Will Deacon
2014-05-06 22:05         ` Andrew Morton
2014-05-07  9:58           ` Will Deacon
2014-05-07 16:41             ` One Thousand Gnomes
2014-05-08 14:34               ` Will Deacon
2014-05-12 17:15                 ` Jan Kara
2014-05-06 22:05         ` Andrew Morton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.