From: Alexandru Chirvasitu <achirvasub@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>,
Pavel Machek <pavel@ucw.cz>,
kernel list <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>,
"Maciej W. Rozycki" <macro@linux-mips.org>,
Mikael Pettersson <mikpelinux@gmail.com>,
Josh Poulson <jopoulso@microsoft.com>,
Mihai Costache <v-micos@microsoft.com>,
Stephen Hemminger <sthemmin@microsoft.com>,
Marc Zyngier <marc.zyngier@arm.com>,
linux-pci@vger.kernel.org, Haiyang Zhang <haiyangz@microsoft.com>,
Dexuan Cui <decui@microsoft.com>,
Simon Xiao <sixiao@microsoft.com>,
Saeed Mahameed <saeedm@mellanox.com>,
Jork Loeser <Jork.Loeser@microsoft.com>,
Bjorn Helgaas <bhelgaas@google.com>,
devel@linuxdriverproject.org, KY Srinivasan <kys@microsoft.com>
Subject: Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Date: Thu, 28 Dec 2017 10:48:35 -0500
Message-ID: <20171228154835.GB10658@chirva-slack.chirva-slack>
In-Reply-To: <alpine.DEB.2.20.1712281531120.1899@nanos>

On Thu, Dec 28, 2017 at 03:48:15PM +0100, Thomas Gleixner wrote:
> On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote:
> > On Thu, Dec 28, 2017 at 12:00:47PM +0100, Thomas Gleixner wrote:
> > > Ok, lets take a step back. The bisect/kexec attempts led us away from the
> > > initial problem which is the machine locking up after login, right?
> > >
> >
> > Yes; sorry about that..
>
> Nothing to be sorry about.
>
> > x86/vector: Replace the raw_spin_lock() with raw_spin_lock_irqsave()
> >
> > diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
> > index 7504491..e5bab02 100644
> > --- a/arch/x86/kernel/apic/vector.c
> > +++ b/arch/x86/kernel/apic/vector.c
> > @@ -726,6 +726,7 @@ static int apic_set_affinity(struct irq_data *irqd,
> >  			     const struct cpumask *dest, bool force)
> >  {
> >  	struct apic_chip_data *apicd = apic_chip_data(irqd);
> > +	unsigned long flags;
> >  	int err;
> >
> >  	/*
> > @@ -740,13 +741,13 @@ static int apic_set_affinity(struct irq_data *irqd,
> >  	    (apicd->is_managed || apicd->can_reserve))
> >  		return IRQ_SET_MASK_OK;
> >
> > -	raw_spin_lock(&vector_lock);
> > +	raw_spin_lock_irqsave(&vector_lock, flags);
> >  	cpumask_and(vector_searchmask, dest, cpu_online_mask);
> >  	if (irqd_affinity_is_managed(irqd))
> >  		err = assign_managed_vector(irqd, vector_searchmask);
> >  	else
> >  		err = assign_vector_locked(irqd, vector_searchmask);
> > -	raw_spin_unlock(&vector_lock);
> > +	raw_spin_unlock_irqrestore(&vector_lock, flags);
> >  	return err ? err : IRQ_SET_MASK_OK;
> >  }
> >
> > With this, I still get the lockup messages after login, but not the
> > freezes!
>
> That's really interesting. There should be no code path which calls into
> that with interrupts enabled. I assume you never ran that kernel with
> CONFIG_PROVE_LOCKING=y.
>

Correct. That option is not set in .config.

> Find below a debug patch which should show us the call chain for that
> case. Please apply that on top of Dou's patch so the machine stays
> accessible. Plain output from dmesg is sufficient.
>
> > The lockups register in the log, which I am attaching (see below for
> > attachment naming conventions).
>
> Hmm. That's RCU lockups and that backtrace on the CPU which gets the stall
> looks very familiar. I'd like to see the above result first and then I'll
> send you another pile of patches which might cure that RCU issue.
>
> Thanks,
>
> tglx
>
> 8<-------------------
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -729,6 +729,8 @@ static int apic_set_affinity(struct irq_
>  	unsigned long flags;
>  	int err;
>
> +	WARN_ON_ONCE(!irqs_disabled());
> +
>  	/*
>  	 * Core code can call here for inactive interrupts. For inactive
>  	 * interrupts which use managed or reservation mode there is no
>
>
>

A bit of a step back here: the kernel with Dou's patch applied no
longer lets me log in reliably as it did before, with or without this
newest patch on top.

So now I sometimes get immediate lockups and freezes upon trying to
log in, and other times I get logged in but get a freeze seconds
later.

In no case can I roam around long enough to get a dmesg, and I no
longer get the non-freezing lockups from before. I can't imagine what
I could possibly have changed.

Here's the output of `git log --pretty=oneline -5` on the branch I'm
working in.

--------------------
f2c02af5cc1d620c039b21fab0ca5948a06daf90 2nd tglx patch
7715575170bacf3566d400b9f2210a10ce152880 x86/vector: Replace the raw_spin_lock() with raw_spin_lock_irqsave()
8d9d56caf33d78bfe6b6087767b1b84acee58458 x86-32: fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)
a197e9dea4ccb72e1a6457fac15329bd5319e719 irq/matrix: Remove the overused BUGON() in irq_matrix_assign_system()
464e1d5f23cca236b930ef068c328a64cab78fb1 Linux 4.15-rc5
--------------------

7715575170bacf3566d400b9f2210a10ce152880, which is the kernel with
Dou's patch, previously logged me in and allowed me to produce the
dmesg I sent earlier. I did this a couple of times back then. I no
longer can, for some reason: it has gone back to the no-go lockups
from before.

And the next one, f2c02af5cc1d620c039b21fab0ca5948a06daf90, where I
applied the patch you just sent, behaves identically.
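
For anyone following the locking side of this, here is a minimal
userspace sketch of what the two changes quoted above are about. It
is a toy model and not kernel code: every model_* and exposed_* name
is hypothetical, a single boolean stands in for the CPU's
interrupt-enable state, and nothing here reproduces the actual
lockup. It only illustrates that a plain lock leaves the caller's
interrupt state untouched, that the irqsave/irqrestore pair disables
interrupts for the critical section and then restores whatever state
the caller had, and that a check like WARN_ON_ONCE(!irqs_disabled())
fires only when the function is entered with interrupts enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for CPU and lock state; none of this is kernel API. */
static bool model_irqs_enabled = true;
static bool model_lock_held = false;

/* Models raw_spin_lock(): takes the lock, leaves interrupt state alone. */
static void model_lock(void)
{
	assert(!model_lock_held);
	model_lock_held = true;
}

/* Models raw_spin_unlock(). */
static void model_unlock(void)
{
	assert(model_lock_held);
	model_lock_held = false;
}

/* Models raw_spin_lock_irqsave(): save state, disable interrupts, lock. */
static unsigned long model_lock_irqsave(void)
{
	unsigned long flags = model_irqs_enabled ? 1UL : 0UL;

	model_irqs_enabled = false;
	model_lock();
	return flags;
}

/* Models raw_spin_unlock_irqrestore(): unlock, then restore saved state. */
static void model_unlock_irqrestore(unsigned long flags)
{
	model_unlock();
	model_irqs_enabled = (flags != 0);
}

/* True if an interrupt could arrive while the plain lock is held. */
static bool exposed_with_plain_lock(bool enabled_on_entry)
{
	bool exposed;

	model_irqs_enabled = enabled_on_entry;
	model_lock();
	exposed = model_irqs_enabled;
	model_unlock();
	return exposed;
}

/* Same question for the irqsave variant; also checks state is restored. */
static bool exposed_with_irqsave(bool enabled_on_entry)
{
	unsigned long flags;
	bool exposed;

	model_irqs_enabled = enabled_on_entry;
	flags = model_lock_irqsave();
	exposed = model_irqs_enabled;
	model_unlock_irqrestore(flags);
	assert(model_irqs_enabled == enabled_on_entry);
	return exposed;
}

/* Models the condition behind WARN_ON_ONCE(!irqs_disabled()). */
static bool debug_check_would_warn(bool enabled_on_entry)
{
	return enabled_on_entry;
}
```

In this model the plain lock is only "exposed" when the caller
arrives with interrupts enabled, which is exactly the case the debug
patch is meant to catch and report with a call chain.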