All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: John Ogness <john.ogness@linutronix.de>
Cc: linux-kernel@vger.kernel.org, frederic@kernel.org,
	pmladek@suse.com, Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [BUG] 8e274732115f ("printk: extend console_lock for per-console locking")
Date: Sun, 12 Jun 2022 21:29:37 -0700	[thread overview]
Message-ID: <20220613042937.GZ1790663@paulmck-ThinkPad-P17-Gen-1> (raw)
In-Reply-To: <87v8t5l39z.fsf@jogness.linutronix.de>

On Mon, Jun 13, 2022 at 04:18:56AM +0206, John Ogness wrote:
> On 2022-06-10, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > I have started getting rcutorture shutdown-time hangs when running
> > against recent mainline, and bisected back to the 8e274732115f ("printk:
> > extend console_lock for per-console locking") commit.  These hangs go
> > away (or at least their probability drops dramatically) if I build with
> > CONFIG_PREEMPTION=n -and- CONFIG_NO_HZ=y (not n!), at least assuming
> > that I also boot with "nohz_full=0-N".
> >
> > Attempts to debug using rcutorture's "--gdb" argument result in
> > "[Inferior 1 (process 1) exited normally]", but with the same truncated
> > console-log output as when running without "--gdb".  This suggests
> > that the kernel really did run to completion and halt as expected,
> > but that the shutdown-time printk() output was lost.  Furthermore, if I
> > use the gdb "hbreak" command to set a breakpoint at kernel_power_off(),
> > it really does hit that breakpoint.  This would not happen in the case
> > of a kernel hang.
> >
> > So, given that I can hit that breakpoint, what should I ask gdb to
> > show me?
> 
> If you also compile with CONFIG_GDB_SCRIPTS=y then you can use the
> defined "lx-dmesg" gdb command to see if the messages are in the
> ringbuffer.
> 
> (You may need to add:
> 
> add-auto-load-safe-path /path/to/linux/scripts/gdb/vmlinux-gdb.py
> 
> to your $HOME/.gdbinit)
> 
> But since you are hitting the breakpoint, the messages will be there.
> 
> > Alternatively, this reproduces on a variety of x86 platforms, so you
> > should be able reproduce it as follows [1]:
> >
> > 	git checkout v5.19-rc1
> > 	tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE01" --gdb --kconfig "CONFIG_DEBUG_INFO_NONE=n CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y" --trust-make
> >
> > This builds a kernel, boots it, and then tells you how to launch gdb
> > (presumably in some other window).  Once you give launch gdb and give
> > it the suggested commands, the kernel runs for two minutes under qemu,
> > then shuts down.  I used the following gdb commands to set the breakpoint
> > and run the kernel:
> >
> > 	target remote :1234  # suggested by the rcutorture script
> > 	hbreak kernel_power_off  # added by me
> > 	continue  # suggested by the rcutorture script
> 
> Thanks. Nice helper scripts. With this I could easily reproduce the
> issue.

Glad that they worked nicely for you!

> As I suspected, the final printk's cannot direct print because the
> kthread was printing. Using the below patch did seem to address your
> problem. But this is probably not the way forward.

When I apply it, I still lose output, perhaps due to different timing?
Doing the pr_flush(1000, true) just before the call to kernel_power_off()
has been working quite well thus far, though.

							Thanx, Paul

> What I have not figured out is why this problem does not exist when only
> the kthread patch (but not the "extend console_lock" patch) is
> applied. Somehow the console_lock is magically providing some sort of
> synchronization in the end. Or maybe it is just the increased lock
> contention that helps out.
> 
> It seems we need some sort of console_flush_on_panic() for non-panic
> situations that is not as violent as console_flush_on_panic().
> 
> @Petr, does it make sense to add the below patch to mainline? It is only
> marginally helpful because it performs the direct printing in the wrong
> context. Really we need that final pr_emerg("Power down\n") to be a
> successful direct print so that the buffer is fully flushed before
> hitting machine_power_off(). In fact, the below patch could make things
> worse because the printing kthread could take over printing from that
> final pr_emerg().
> 
> John Ogness
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ea3dd55709e7..5950586008fa 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3841,6 +3841,18 @@ static int printk_kthread_func(void *data)
>  		console_kthread_printing_exit();
>  
>  		mutex_unlock(&con->lock);
> +
> +		/*
> +		 * The kernel may have transitioned to a direct printing
> +		 * state, but any printk calls may not have direct printed
> +		 * because this thread was printing its message. Grab and
> +		 * release the console lock to flush out any pending
> +		 * messages on all consoles.
> +		 */
> +		if (allow_direct_printing()) {
> +			console_lock();
> +			console_unlock();
> +		}
>  	}
>  
>  	con_printk(KERN_INFO, con, "printing thread stopped\n");

  reply	other threads:[~2022-06-13  4:29 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-10 20:50 [BUG] 8e274732115f ("printk: extend console_lock for per-console locking") Paul E. McKenney
2022-06-12 13:23 ` Paul E. McKenney
2022-06-12 16:03   ` John Ogness
2022-06-12 16:30     ` Paul E. McKenney
2022-06-13 11:01     ` Petr Mladek
2022-06-13  2:12 ` John Ogness
2022-06-13  4:29   ` Paul E. McKenney [this message]
2022-06-13  9:04     ` John Ogness
2022-06-13 10:32       ` Petr Mladek
2022-06-13 15:07         ` Petr Mladek
2022-06-13 15:51           ` Paul E. McKenney
2022-06-13 15:44       ` Paul E. McKenney
2022-06-13 10:50 ` Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220613042937.GZ1790663@paulmck-ThinkPad-P17-Gen-1 \
    --to=paulmck@kernel.org \
    --cc=frederic@kernel.org \
    --cc=john.ogness@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.