From: Alexandru Chirvasitu <achirvasub@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>,
	Pavel Machek <pavel@ucw.cz>,
	kernel list <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	"Maciej W. Rozycki" <macro@linux-mips.org>,
	Mikael Pettersson <mikpelinux@gmail.com>,
	Josh Poulson <jopoulso@microsoft.com>,
	Mihai Costache <v-micos@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Marc Zyngier <marc.zyngier@arm.com>,
	linux-pci@vger.kernel.org, Haiyang Zhang <haiyangz@microsoft.com>,
	Dexuan Cui <decui@microsoft.com>,
	Simon Xiao <sixiao@microsoft.com>,
	Saeed Mahameed <saeedm@mellanox.com>,
	Jork Loeser <Jork.Loeser@microsoft.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	devel@linuxdriverproject.org, KY Srinivasan <kys@microsoft.com>
Subject: Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Date: Thu, 28 Dec 2017 10:48:35 -0500
Message-ID: <20171228154835.GB10658@chirva-slack.chirva-slack>
In-Reply-To: <alpine.DEB.2.20.1712281531120.1899@nanos>

On Thu, Dec 28, 2017 at 03:48:15PM +0100, Thomas Gleixner wrote:
> On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote:
> > On Thu, Dec 28, 2017 at 12:00:47PM +0100, Thomas Gleixner wrote:
> > > Ok, lets take a step back. The bisect/kexec attempts led us away from the
> > > initial problem which is the machine locking up after login, right?
> > >
> > 
> > Yes; sorry about that..
> 
> Nothing to be sorry about.
> 
> >     x86/vector: Replace the raw_spin_lock() with
> > 
> > diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
> > index 7504491..e5bab02 100644
> > --- a/arch/x86/kernel/apic/vector.c
> > +++ b/arch/x86/kernel/apic/vector.c
> > @@ -726,6 +726,7 @@ static int apic_set_affinity(struct irq_data *irqd,
> >                              const struct cpumask *dest, bool force)
> >  {
> >         struct apic_chip_data *apicd = apic_chip_data(irqd);
> > +       unsigned long flags;
> >         int err;
> >  
> >         /*
> > @@ -740,13 +741,13 @@ static int apic_set_affinity(struct irq_data *irqd,
> >             (apicd->is_managed || apicd->can_reserve))
> >                 return IRQ_SET_MASK_OK;
> >  
> > -       raw_spin_lock(&vector_lock);
> > +       raw_spin_lock_irqsave(&vector_lock, flags);
> >         cpumask_and(vector_searchmask, dest, cpu_online_mask);
> >         if (irqd_affinity_is_managed(irqd))
> >                 err = assign_managed_vector(irqd, vector_searchmask);
> >         else
> >                 err = assign_vector_locked(irqd, vector_searchmask);
> > -       raw_spin_unlock(&vector_lock);
> > +       raw_spin_unlock_irqrestore(&vector_lock, flags);
> >         return err ? err : IRQ_SET_MASK_OK;
> >  }
> > 
> > With this, I still get the lockup messages after login, but not the
> > freezes!
> 
> That's really interesting. There should be no code path which calls into
> that with interrupts enabled. I assume you never ran that kernel with
> CONFIG_PROVE_LOCKING=y.
>

Correct. That option is not set in .config.

> Find below a debug patch which should show us the call chain for that
> case. Please apply that on top of Dou's patch so the machine stays
> accessible. Plain output from dmesg is sufficient.
> 
> > The lockups register in the log, which I am attaching (see below for
> > attachment naming conventions).
> 
> Hmm. That's RCU lockups and that backtrace on the CPU which gets the stall
> looks very familiar. I'd like to see the above result first and then I'll
> send you another pile of patches which might cure that RCU issue.
> 
> Thanks,
> 
> 	tglx
> 
> 8<-------------------
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -729,6 +729,8 @@ static int apic_set_affinity(struct irq_
>  	unsigned long flags;
>  	int err;
>  
> +	WARN_ON_ONCE(!irqs_disabled());
> +
>  	/*
>  	 * Core code can call here for inactive interrupts. For inactive
>  	 * interrupts which use managed or reservation mode there is no
> 
> 
> 

Bit of a step back here: the kernel treated with Dou's patch no longer
logs me in reliably as before, with or without this newest patch on
top..

So now I sometimes get immediate lockups and freezes upon trying to
log in, and other times I get logged in but get a freeze seconds
later.

In no case can I roam around long enough to get a dmesg, and I no
longer get the non-freezing lockups from before. I can't imagine what
I could possibly have changed..

Here's the output of `git log --pretty=oneline -5` on the branch I'm
working in.

--------------------

f2c02af5cc1d620c039b21fab0ca5948a06daf90 2nd tglx patch
7715575170bacf3566d400b9f2210a10ce152880 x86/vector: Replace the raw_spin_lock() with raw_spin_lock_irqsave()
8d9d56caf33d78bfe6b6087767b1b84acee58458 x86-32: fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)
a197e9dea4ccb72e1a6457fac15329bd5319e719 irq/matrix: Remove the overused BUGON() in irq_matrix_assign_system()
464e1d5f23cca236b930ef068c328a64cab78fb1 Linux 4.15-rc5

--------------------

7715575170bacf3566d400b9f2210a10ce152880, which is the kernel with
Dou's patch, logged me in and allowed me to produce the dmesg from
before. I did this a couple of times back then. I no longer can, for
some reason, as it's reverted back to the no-go lockups from before.

And the next one, f2c02af5cc1d620c039b21fab0ca5948a06daf90, where I
applied the patch you just sent, behaves identically.
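
For readers following the locking discussion above, here is a minimal
userspace sketch of why the plain and irqsave lock variants can behave
differently, and of what a warn-once check such as
WARN_ON_ONCE(!irqs_disabled()) reports. It is only a model under stated
assumptions: every model_* name is hypothetical, nothing here is a kernel
API, and it does not establish what actually happens on the affected
machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical single-CPU model; none of these names exist in the kernel. */
static bool model_irqs_on = true;  /* stands in for the CPU interrupt flag */
static bool model_lock_held;       /* stands in for vector_lock */
static int model_warnings;         /* number of times the check fired */

/* Models raw_spin_lock(): takes the lock, leaves the interrupt flag alone. */
static void model_lock(void)
{
	assert(!model_lock_held);  /* a real spinlock would spin here */
	model_lock_held = true;
}

static void model_unlock(void)
{
	model_lock_held = false;
}

/* Models raw_spin_lock_irqsave(): save the flag, mask interrupts, lock. */
static bool model_lock_irqsave(void)
{
	bool flags = model_irqs_on;

	model_irqs_on = false;
	assert(!model_lock_held);
	model_lock_held = true;
	return flags;
}

/* Models raw_spin_unlock_irqrestore(): unlock, then restore the saved flag. */
static void model_unlock_irqrestore(bool flags)
{
	model_lock_held = false;
	model_irqs_on = flags;
}

/*
 * Models an interrupt whose handler needs the same lock on the same CPU.
 * Returns true if the handler would spin forever on a lock that its own
 * interrupted context holds.
 */
static bool model_irq_would_deadlock(void)
{
	if (!model_irqs_on)
		return false;      /* masked: the handler cannot run now */
	return model_lock_held;
}

/* Models a warn-once check: reports only the first offending call. */
static void model_warn_once_if_irqs_on(void)
{
	static bool warned;

	if (model_irqs_on && !warned) {
		warned = true;
		model_warnings++;
		fprintf(stderr, "WARNING: called with interrupts enabled\n");
	}
}
```

In this model, holding the lock via model_lock() with interrupts enabled
makes model_irq_would_deadlock() return true, while taking it via
model_lock_irqsave() does not. The warn-once helper fires a single time no
matter how often it is called with interrupts on, which is why one dmesg
capture would be enough to show the first offending call chain.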
