linux-kernel.vger.kernel.org archive mirror
From: Leon Romanovsky <leon@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	x86 <x86@kernel.org>, Suresh Siddha <suresh.b.siddha@intel.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/apic: Fix circular locking dependency between console and hrtimer locks
Date: Mon, 27 Apr 2020 14:32:18 +0300
Message-ID: <20200427113218.GB134660@unreal>
In-Reply-To: <87tv15qj5u.fsf@nanos.tec.linutronix.de>

On Mon, Apr 27, 2020 at 01:09:49PM +0200, Thomas Gleixner wrote:
> Ingo Molnar <mingo@kernel.org> writes:
> > * Leon Romanovsky <leon@kernel.org> wrote:
> > The fix definitely looks legit, lockdep is right that we shouldn't take
> > the console_sem.lock even under trylock.
> >
> > It's only a printk_once(), yet I'm wondering why in the last ~8 years
> > this never triggered. Nobody ever ran lockdep and debug console level
> > enabled on such hardware, or did something else change?
> >
> > One possibility would be that apic_check_deadline_errata() marked almost
> > all Intel systems as broken and the TSC-deadline hardware never actually
> > got activated. In that case you have triggered rarely tested code and
> > might see other weirdnesses. Just saying. :-)
> >
> > Or a bootup with "debug" specified is much more rare in production
> > systems, hence the 8 years old bug.
>
> None of this makes any sense at all.
>
> The local APIC timer (in this case the TSC deadline timer) is set up
> during early boot on the boot CPU (before SMP setup) with this call
> chain:
>
> smp_prepare_cpus()
>  native_smp_prepare_cpus()
>    x86_init.timers.setup_percpu_clockev()
>      setup_boot_APIC_clock()
>        setup_APIC_timer()
>          clockevents_config_and_register()
>            tick_check_new_device()
>              tick_setup_device()
>                tick_setup_oneshot()
>                  clockevents_switch_state()
>                    lapic_timer_set_oneshot()
>                      __setup_APIC_LVTT()
>                        printk_once(...)
>
> Nothing holds hrtimer.base_lock in this call chain.

Can't printk hold that lock through console/netconsole?
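Lockdep complains about lock ordering, not about any single acquisition: once some path has taken lock A and then lock B, any later path taking B and then A closes a cycle. Below is a toy userspace model of that dependency-graph check. It is not lockdep's implementation: it ignores lock classes, trylock semantics and IRQ state. The lock names are only labels borrowed from the splat, and the two call paths in the hypothetical demo_inversion() are illustrations, not the actual chains from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Two lock classes, named after the ones in the splat (labels only). */
enum { CONSOLE_SEM_LOCK, HRTIMER_BASE_LOCK, NUM_LOCKS };
#define MAX_HELD 4

static const char *names[NUM_LOCKS] = { "console_sem.lock", "hrtimer_base.lock" };

/* dep[a][b] becomes true once lock b is acquired while lock a is held. */
static bool dep[NUM_LOCKS][NUM_LOCKS];
static int held[MAX_HELD];
static int nheld;

/* Depth-first search: is there a recorded chain from -> ... -> to? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NUM_LOCKS; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Model one acquisition. If a held lock is already reachable from the
 * new lock, taking it now would close a cycle: report and refuse.
 * Otherwise record held -> new edges and push the lock.
 */
static bool acquire(int lock)
{
	for (int i = 0; i < nheld; i++) {
		bool seen[NUM_LOCKS] = { false };

		if (reachable(lock, held[i], seen)) {
			printf("possible circular locking: %s -> %s\n",
			       names[held[i]], names[lock]);
			return false;
		}
	}
	for (int i = 0; i < nheld; i++)
		dep[held[i]][lock] = true;
	assert(nheld < MAX_HELD);
	held[nheld++] = lock;
	return true;
}

/* Drop every held lock (the model does not track release order). */
static void release_all(void)
{
	nheld = 0;
}

/* Two hypothetical call paths taking the same locks in opposite order. */
static bool demo_inversion(void)
{
	/* Path 1 (hypothetical): code holding the hrtimer base lock calls printk(). */
	acquire(HRTIMER_BASE_LOCK);
	acquire(CONSOLE_SEM_LOCK);
	release_all();

	/* Path 2 (hypothetical): console code holding console_sem.lock touches an hrtimer. */
	acquire(CONSOLE_SEM_LOCK);
	bool ok = acquire(HRTIMER_BASE_LOCK);
	release_all();
	return ok; /* false: path 2 closes the cycle recorded by path 1 */
}
```

In this model neither ordering is reported on its own; demo_inversion() prints a single "possible circular locking" line and returns false only when the second, conflicting ordering is first observed.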

>
> But the lockdep splat clearly says:
>
>  [  735.324357] stack backtrace:
>  [  735.324360] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.0-for-upstream-dbg-2020-04-03_10-44-43-70 #1
>
> ...
>
> So how can that be the first invocation of that printk_once()?
>
> While the patch looks innocent, it papers over the underlying problem
> and wild theories are not really helping here.
>
> Here is a boot log excerpt with lockdep enabled and 'debug' on the
> command line:
>
> [    0.000000] Linux version 5.7.0-rc3 ...
> ...
> [    3.992125] TSC deadline timer enabled
> [    3.995820] smpboot: CPU0: Intel(R) ....
> ...
> [    4.050766] smp: Bringing up secondary CPUs ...
>
> No splat, nothing. The real question is WHY this triggers on Leon's
> machine 735 seconds after boot and on CPU3.

I want to believe that the timestamps are not correct, but I have no
clue whether that is even possible.
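printk_once() prints only on the first call through a given call site; later calls, from any CPU, are dropped. That is why Thomas expects the "TSC deadline timer enabled" message, and the console locking behind it, to be exercised once, on the boot CPU during early boot. A rough userspace approximation, assuming a per-call-site static flag (the kernel macro differs in detail; print_once(), setup_lvtt() and simulate_calls() are hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int lines_emitted; /* messages actually printed, for checking */

/*
 * Hypothetical print_once(): every expansion site owns one static flag,
 * so the message is printed on the first call only, whoever makes it.
 * The check-and-set is not atomic; this sketch is single-threaded.
 */
#define print_once(...)                                 \
	do {                                            \
		static bool printed_;                   \
		if (!printed_) {                        \
			printed_ = true;                \
			printf(__VA_ARGS__);            \
			lines_emitted++;                \
		}                                       \
	} while (0)

/* Hypothetical stand-in for the printk_once() site in __setup_APIC_LVTT(). */
static void setup_lvtt(int cpu)
{
	print_once("TSC deadline timer enabled (first caller: CPU%d)\n", cpu);
}

/* Replay calls from the listed CPUs; returns how many lines were printed. */
static int simulate_calls(const int *cpus, int n)
{
	assert(n == 0 || cpus != NULL);
	for (int i = 0; i < n; i++)
		setup_lvtt(cpus[i]);
	return lines_emitted;
}
```

Replaying calls from CPU0 and then twice from CPU3 prints only the CPU0 line. Under this model, a first invocation on CPU3 at 735 seconds would mean the call site was never reached on the boot CPU, which is the puzzle Thomas raises.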

But let's talk about facts:
1. It started after -rc1 (we don't test linux-next).
2. This workaround helped to eliminate the splat.
3. My machine experiences the extra splat all the time
https://lore.kernel.org/lkml/20200414070502.GR334007@unreal/

Unfortunately, I can't bisect the failure mentioned in the commit
message because it doesn't happen on any single machine all the time;
but in our nightly regression runs, at least one of the runners hits
such lockdep splats.

I can add any debug patch to our regression runs and get results the
next day, if that helps.

Thanks

>
> Thanks,
>
>         tglx

Thread overview: 14+ messages
2020-04-07 17:09 [PATCH] x86/apic: Fix circular locking dependency between console and hrtimer locks Leon Romanovsky
2020-04-14  5:48 ` Leon Romanovsky
2020-04-14  6:24   ` Ingo Molnar
2020-04-14  7:05     ` Leon Romanovsky
2020-04-23  7:13       ` Leon Romanovsky
2020-04-27 15:35       ` Thomas Gleixner
2020-04-27 15:49         ` Leon Romanovsky
2020-04-27 11:09     ` Thomas Gleixner
2020-04-27 11:32       ` Leon Romanovsky [this message]
2020-04-27 12:59         ` Thomas Gleixner
2020-04-27 13:41           ` Leon Romanovsky
2020-04-27 15:31             ` Thomas Gleixner
2020-04-27 15:54               ` Leon Romanovsky
2020-05-01 18:22               ` [tip: x86/urgent] x86/apic: Move TSC deadline timer debug printk tip-bot2 for Thomas Gleixner
