LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Leon Romanovsky <leon@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	x86 <x86@kernel.org>, Suresh Siddha <suresh.b.siddha@intel.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/apic: Fix circular locking dependency between console and hrtimer locks
Date: Mon, 27 Apr 2020 14:59:00 +0200
Message-ID: <87h7x5qe3v.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <20200427113218.GB134660@unreal>

Leon Romanovsky <leon@kernel.org> writes:
> On Mon, Apr 27, 2020 at 01:09:49PM +0200, Thomas Gleixner wrote:
>> The local APIC timer (in this case the TSC deadline timer) is set up
>> during early boot on the boot CPU (before SMP setup) with this call
>> chain:
>>
>> smp_prepare_cpus()
>>  native_smp_prepare_cpus()
>>    x86_init.timers.setup_percpu_clockev()
>>      setup_boot_APIC_clock()
>>        setup_APIC_timer()
>>          clockevents_config_and_register()
>>            tick_check_new_device()
>>              tick_setup_device()
>>                tick_setup_oneshot()
>>                  clockevents_switch_state()
>>                    lapic_timer_set_oneshot()
>>                      __setup_APIC_LVTT()
>>                        printk_once(...)
>>
>> Nothing holds hrtimer.base_lock in this call chain.
>
> Can't printk hold that lock through console/netconsole?

How so?

The lockdep splat is about this:

 [  735.324154] swapper/3/0 is trying to acquire lock:
 [  735.324155] ffffffff8442c858 ((console_sem).lock){-...}-{2:2}, at: down_trylock+0x13/0x70
 [  735.324162]
 [  735.324164] but task is already holding lock:
 [  735.324165] ffff88842dfb9958 (hrtimer_bases.lock){-.-.}-{2:2}, at: lock_hrtimer_base+0x71/0x120

and the call chain is:

 [  735.324283]        __lock_acquire+0x374a/0x5210
 [  735.324284]        lock_acquire+0x1b9/0x920
 [  735.324286]        _raw_spin_lock_irqsave+0x3c/0x4b
 [  735.324288]        down_trylock+0x13/0x70
 [  735.324289]        __down_trylock_console_sem+0x33/0xa0
 [  735.324291]        console_trylock+0x13/0x60
 [  735.324292]        vprintk_emit+0xec/0x370
 [  735.324294]        printk+0x9c/0xc3
 [  735.324296]        lapic_timer_set_oneshot+0x4e/0x60
 [  735.324297]        clockevents_switch_state+0x1e1/0x360
 [  735.324299]        tick_program_event+0xae/0xc0
 [  735.324301]        hrtimer_start_range_ns+0x4b6/0xaa0
 [  735.324302]        tick_nohz_idle_stop_tick+0x67c/0xa90
 [  735.324304]        do_idle+0x326/0x530
 [  735.324305]        cpu_startup_entry+0x19/0x20
 [  735.324307]        start_secondary+0x2ee/0x3e0
 [  735.324309]        secondary_startup_64+0xa4/0xb0

hrtimer_start_range_ns() clearly holds htimer_base::lock

>> But the lockdep splat clearly says:
>>
>>  [  735.324357] stack backtrace:
>>  [  735.324360] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.0-for-upstream-dbg-2020-04-03_10-44-43-70 #1
>>
>> ...
>>
>> So how can that be the first invocation of that printk_once()?
>>
>> While the patch looks innocent, it papers over the underlying problem
>> and wild theories are not really helping here.
>>
>> Here is a boot log excerpt with lockdep enabled and 'debug' on the
>> command line:
>>
>> [    0.000000] Linux version 5.7.0-rc3 ...
>> ...
>> [    3.992125] TSC deadline timer enabled
>> [    3.995820] smpboot: CPU0: Intel(R) ....
>> ...
>> [    4.050766] smp: Bringing up secondary CPUs ...
>>
>> No splat nothing. The real question is WHY this triggers on Leons
>> machine 735 seconds after boot and on CPU3.
>
> I want to believe that the timestamp are not correct, have no clue if it
> is even possible.

It does not matter whether the timestamps are correct or not. Even if
they are not, then this does not change the fact that this happens on
CPU3 way after the first invocation of __setup_APIC_LVTT() on CPU0 which
simply cannot trigger this splat.

Can you please provide the full dmesg with the splat?

> But let's talk about facts:
> 1. It is started after -rc1 (we don't test linux-next).

Is that a plain kernel from Linus tree or do you have other patches
applied?

A .config file would be appreciated as well along with information about
the hardware or whatever this runs on.

> 2. This workaround helped to eliminate the splat.

It's eliminating the symptom, but this does not make the root cause go
away nor does it explain anything.

> 3. My machine experiences the extra splat all the time
> https://lore.kernel.org/lkml/20200414070502.GR334007@unreal/

which is completely unrelated.

Thanks,

        tglx

  reply index

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-07 17:09 Leon Romanovsky
2020-04-14  5:48 ` Leon Romanovsky
2020-04-14  6:24   ` Ingo Molnar
2020-04-14  7:05     ` Leon Romanovsky
2020-04-23  7:13       ` Leon Romanovsky
2020-04-27 15:35       ` Thomas Gleixner
2020-04-27 15:49         ` Leon Romanovsky
2020-04-27 11:09     ` Thomas Gleixner
2020-04-27 11:32       ` Leon Romanovsky
2020-04-27 12:59         ` Thomas Gleixner [this message]
2020-04-27 13:41           ` Leon Romanovsky
2020-04-27 15:31             ` Thomas Gleixner
2020-04-27 15:54               ` Leon Romanovsky
2020-05-01 18:22               ` [tip: x86/urgent] x86/apic: Move TSC deadline timer debug printk tip-bot2 for Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h7x5qe3v.fsf@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=suresh.b.siddha@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git