From: Conor Dooley <conor.dooley@microchip.com>
To: Petr Mladek <pmladek@suse.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>,
	Conor Dooley <conor@kernel.org>, <senozhatsky@chromium.org>,
	<rostedt@goodmis.org>, <john.ogness@linutronix.de>,
	<linux-kernel@vger.kernel.org>, <regressions@lists.linux.dev>
Subject: Re: [resend][bug] low-probability console lockups since 5.19
Date: Thu, 29 Sep 2022 11:52:00 +0100
Message-ID: <YzV40LbMHcW1S/9O@wendy>
In-Reply-To: <YzVvl+rv3iZS9vxk@alley>

On Thu, Sep 29, 2022 at 12:12:39PM +0200, Petr Mladek wrote:
> On Thu 2022-09-29 10:29:05, Conor Dooley wrote:
> > On Thu, Sep 29, 2022 at 11:06:01AM +0200, Thorsten Leemhuis wrote:
> > > Hi Conor
> > > 
> > > On 28.09.22 18:55, Conor Dooley wrote:
> > > > On Fri, Sep 23, 2022 at 05:24:17PM +0100, Conor Dooley wrote:
> > > >>
> > > >> Been bisecting a bug that is causing a boot failure in my CI & have
> > > >> ended up here. The bug in question is a low(ish) probability lock up
> > > >> of the serial console, I would estimate about 1-in-5 chance on the
> > > >> boards I could actually trigger it on, which is why it has taken me so
> > > >> long to realise that this was an actual problem. Thinking back on it, there
> > > >> were other failures that I would retroactively attribute to this
> > > >> problem too, but I had earlycon disabled
> > > 
> > > There is one thing I wonder when skimming this thread: was there maybe
> > > some other change somewhere in the kernel between the introduction and
> > > the revert of the printk console kthreads patches that is the real
> > > culprit here that makes existing, older races easier to hit? But I guess
> > > in the end that would be very hard to find and it's easier to fix the
> > > problem in the console driver... :-/
> > 
> > Entirely possible that something arrived in the middle, yeah. I've done
> > 100s of reboots on that interim section, albeit with the threaded
> > printers enabled, as I restarted the bisection several times & never hit
> > this failure then.
> 
> Interesting. I wonder if the console in use was fixed during the window
> when the kthreads were enabled.

I will, possibly tonight but probably not, run the bisection again with
the threaded printer merge reverted. Hopefully it is not filled with
conflicts if I go that way...

> 
> > I don't know anything about console/printk/serial drivers unfortunately
> > so I will almost certainly not be able to find the problem by
> > inspection. I'd rather submit patches than send reports, but I really
> > really need some help here. I looked at the two patterns Petr suggested,
> > but the former I am not sure applies since the issue is present even
> > when earlycon is disabled & the latter appears (to my untrained eye) to
> > be accounted for in the 8250 driver.
> 
> The problem with the missing port->lock is visible only when the
> early console is enabled. But it is really hard to hit without
> the kthreads.

Right, so it sounds like that can be excluded since my CI was hitting it
with earlycon disabled. I'll triple check this, possibly later today.

> 
> The problem with enabled IRQs was visible only with kthreads. The
> original code called console->write() callback already with IRQs
> disabled.
> 
> The kthreads called console->write() callback with IRQs enabled.
> It made sense. They need to be disabled only when really needed
> and the tested drivers did this correctly.

And that sounds like it can also be excluded, since my issue started
post-revert. Unless there's still some kthreads code in there that was
not reverted?

Thanks,
Conor.

 
