From: Daniel Vetter <daniel@ffwll.ch>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>,
	Anton Vorontsov <anton@enomsg.org>,
	Ben Segall <bsegall@google.com>, Colin Cross <ccross@android.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Daniel Vetter <daniel@ffwll.ch>, David Airlie <airlied@linux.ie>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	dri-devel@lists.freedesktop.org, Ingo Molnar <mingo@redhat.com>,
	John Ogness <john.ogness@linutronix.de>,
	Juri Lelli <juri.lelli@redhat.com>,
	Kees Cook <keescook@chromium.org>,
	linux-kernel@vger.kernel.org,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>, Mel Gorman <mgorman@suse.de>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	Tony Luck <tony.luck@intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	mkoutny@suse.com
Subject: Re: printk deadlock due to double lock attempt on current CPU's runqueue
Date: Wed, 10 Nov 2021 10:37:26 +0100
Message-ID: <YYuS1uNhxWOEX1Ci@phenom.ffwll.local>
In-Reply-To: <20211109213847.GY174703@worktop.programming.kicks-ass.net>

On Tue, Nov 09, 2021 at 10:38:47PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 09, 2021 at 12:06:48PM -0800, Sultan Alsawaf wrote:
> > Hi,
> > 
> > I encountered a printk deadlock on 5.13 which appears to still affect the latest
> > kernel. The deadlock occurs due to printk being used while having the current
> > CPU's runqueue locked, and the underlying framebuffer console attempting to lock
> > the same runqueue when printk tries to flush the log buffer.
> 
> Yes, that's a known 'feature' of some consoles. printk() is in the
> process of being reworked to not call con->write() from the printk()
> calling context, which would go a long way towards fixing this.

I'm a bit out of the loop, but from LWN articles my understanding is that
as part of upstreaming from -rt we no longer have the explicit "I'm a safe
console for direct printing" opt-in. Which I get from a backwards compat
pov, but I still think that for fbcon at least we really should never
attempt a direct printk con->write; it's just all-around terrible.

And it's getting worse by the year:
- direct scanout displays (i.e. just a few mmio writes and it will show
  up) are on the way out, at least in laptops. Everyone gets self-refresh
  (DP PSR) under software control, so unless we can kick off a kthread,
  nothing shows up except more oopses

- because of the impedance mismatch between fbdev and drm-kms we are going
  ever more in this direction even for dumb framebuffers, including the
  firmware boot-up framebuffer simpledrm. This could perhaps be fixed with
  a new dedicated console driver directly on top of drm-kms, but that has
  been on the wishlist for years and I don't see anyone typing it.

So yeah for fbcon at least I think we really should throw out direct
con->write from printk completely.
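
To make the difference concrete, here is a minimal user-space sketch of the
two paths. It is not kernel code: rq_trylock(), con_write(), printk_direct(),
printk_store() and console_flush() are hypothetical stand-ins, the "runqueue
lock" is a plain flag, and where a real spinlock would spin forever the sketch
just counts the attempt. It only illustrates why storing the message and
writing it out later, from a context that does not hold the lock, avoids the
double lock attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a non-recursive runqueue lock (not the kernel API). */
static bool rq_locked;

static bool rq_trylock(void)
{
	if (rq_locked)
		return false;
	rq_locked = true;
	return true;
}

static void rq_unlock(void)
{
	assert(rq_locked);
	rq_locked = false;
}

static char screen[256];	/* what made it to the "display" */
static int would_deadlock;	/* double lock attempts that would have hung */

/*
 * Toy fbcon-like write: it has to wake a worker, and waking a task needs
 * the runqueue lock. A real spinlock would spin forever if the caller
 * already holds it; here we only record the attempt.
 */
static void con_write(const char *msg)
{
	if (!rq_trylock()) {
		would_deadlock++;
		return;
	}
	strncat(screen, msg, sizeof(screen) - strlen(screen) - 1);
	rq_unlock();
}

/* Direct path: the console write runs in the caller's context. */
static void printk_direct(const char *msg)
{
	con_write(msg);
}

/* Deferred path: only store the message, never touch the console. */
#define LOG_SLOTS 8
static const char *log_buf[LOG_SLOTS];	/* pointers to string literals only */
static size_t log_head, log_tail;

static bool printk_store(const char *msg)
{
	if (log_head - log_tail == LOG_SLOTS)
		return false;	/* buffer full, message dropped */
	log_buf[log_head++ % LOG_SLOTS] = msg;
	return true;
}

/* Runs later, from a context that does not hold the runqueue lock. */
static void console_flush(void)
{
	while (log_tail != log_head)
		con_write(log_buf[log_tail++ % LOG_SLOTS]);
}

/* Scheduler-like path that warns twice while holding the runqueue lock. */
int demo(void)
{
	rq_trylock();
	printk_direct("warn A\n");	/* con_write() wants the lock we hold */
	printk_store("warn B\n");	/* safe: nothing is locked here */
	rq_unlock();

	console_flush();		/* "warn B" reaches the screen now */
	return would_deadlock;
}
```

In this toy model the direct call is the one that hits the already held lock,
while the stored message is written out once the lock has been dropped.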

Also adding John Ogness.
-Daniel

> 
> >   #27 [ffffc900005b8e28] enqueue_task_fair at ffffffff8129774a  <-- SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
> >   #28 [ffffc900005b8ec0] activate_task at ffffffff8125625d
> >   #29 [ffffc900005b8ef0] ttwu_do_activate at ffffffff81257943
> >   #30 [ffffc900005b8f28] sched_ttwu_pending at ffffffff8125c71f <-- locks this CPU's runqueue
> >   #31 [ffffc900005b8fa0] flush_smp_call_function_queue at ffffffff813c6833
> >   #32 [ffffc900005b8fd8] generic_smp_call_function_single_interrupt at ffffffff813c7f58
> >   #33 [ffffc900005b8fe0] __sysvec_call_function_single at ffffffff810f1456
> >   #34 [ffffc900005b8ff0] sysvec_call_function_single at ffffffff831ec1bc
> >   --- <IRQ stack> ---
> >   #35 [ffffc9000019fda8] sysvec_call_function_single at ffffffff831ec1bc
> >       RIP: ffffffff831ed06e  RSP: ffffed10438a6a49  RFLAGS: 00000001
> >       RAX: ffff888100d832c0  RBX: 0000000000000000  RCX: 1ffff92000033fd7
> >       RDX: 0000000000000000  RSI: ffff888100d832c0  RDI: ffffed10438a6a49
> >       RBP: ffffffff831ec166   R8: dffffc0000000000   R9: 0000000000000000
> >       R10: ffffffff83400e22  R11: 0000000000000000  R12: ffffffff831ed83e
> >       R13: 0000000000000000  R14: ffffc9000019fde8  R15: ffffffff814d4d9d
> >       ORIG_RAX: ffff88821c53524b  CS: 0001  SS: ef073a2
> >   WARNING: possibly bogus exception frame
> > ----------------------->8-----------------------
> > 
> > The catalyst is that CONFIG_SCHED_DEBUG is enabled and the tmp_alone_branch
> > assertion fails (Peter, is this bad?).
> 
> Yes, that's not good. IIRC Vincent and Michal were looking at that code
> recently.
> 
> > I'm not sure what the *correct* solution is here (don't use printk while having
> > a runqueue locked? don't use schedule_work() from the fbcon path? tell printk
> > to use one of its lock-less backends?), so I've cc'd all the relevant folks.
> 
> I'm a firm believer in early_printk serial consoles.

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
