From: Evan Green <evgreen@chromium.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Rajat Jain <rajatja@google.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci <linux-pci@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	x86@kernel.org, Marc Zyngier <maz@kernel.org>
Subject: Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
Date: Tue, 28 Jan 2020 14:22:28 -0800
Message-ID: <CAE=gft7Gu0ah4qcbsEB1X+kUMagCzPR+cdCfn2caofcGV+tBjA@mail.gmail.com>
In-Reply-To: <87imkv63yf.fsf@nanos.tec.linutronix.de>

On Tue, Jan 28, 2020 at 6:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Evan,
>
> Thomas Gleixner <tglx@linutronix.de> writes:
> > It's worthwhile, but that needs some deep thoughts about locking and
> > ordering plus the inevitable race conditions this creates. If it would
> > be trivial, I surely wouldn't have hacked up the retrigger mess.
>
> So after staring at it for a while, I came up with the patch below.
>
> Your idea of going through some well defined transition vector is just
> not feasible due to locking and life-time issues.
>
> I'm taking a similar but easier to handle approach.
>
>     1) Move the interrupt to the new vector on the old (local) CPU
>
>     2) Move it to the new CPU
>
>     3) Check if the new vector is pending on the local CPU. If yes
>        retrigger it on the new CPU.
>
> That might give a spurious interrupt if the new vector on the local CPU
> is in use. But as I said before this is nothing to worry about. If the
> affected device driver fails to handle that spurious interrupt then it
> is broken anyway.
>
> In theory we could teach the vector allocation logic to search for an
> unused pair of vectors on both CPUs, but the required code for that is
> hardly worth the trouble. In the end the situation that no pair is found
> has to be handled anyway. So rather than making this the corner case
> which is never tested and then leads to hard to debug issues, I prefer
> to make it more likely to happen.
>
> The patch is only lightly tested, but so far it survived.
>
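
The three steps quoted above can be sketched as a small userspace simulation. This is only an illustration under a toy model: each CPU has a set of pending vectors, and the device's MSI target is a (cpu, vector) pair. The names `move_msi`, `device`, `pending`, and `fire_between` are hypothetical and do not come from the patch or from the kernel.

```python
def move_msi(device, pending, old_cpu, new_cpu, new_vector, fire_between=False):
    """Toy model of the quoted three-step move. Returns True if a retrigger happened."""
    # 1) Move the interrupt to the new vector on the old (local) CPU.
    device["cpu"], device["vector"] = old_cpu, new_vector

    # Assumption for illustration: the device may raise an interrupt in this
    # window, which lands on the old CPU with the new vector.
    if fire_between:
        pending[device["cpu"]].add(device["vector"])

    # 2) Move it to the new CPU.
    device["cpu"] = new_cpu

    # 3) If the new vector is pending on the local CPU, retrigger it on the
    #    new CPU. If that vector is also used by something else on the local
    #    CPU, the retrigger may be spurious, as the quoted text notes.
    if new_vector in pending[old_cpu]:
        pending[new_cpu].add(new_vector)
        return True
    return False


if __name__ == "__main__":
    pending = {0: set(), 1: set()}
    device = {"cpu": 0, "vector": 0x30}
    retriggered = move_msi(device, pending, old_cpu=0, new_cpu=1,
                           new_vector=0x40, fire_between=True)
    print(retriggered, device, pending)
```

The model ignores locking and vector lifetime entirely, which are the hard parts the quoted text refers to.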

Hi Thomas,
Thanks for the patch. I gave it a try and got the following lockdep splat, followed by a hang:

[   62.173778] ============================================
[   62.179723] WARNING: possible recursive locking detected
[   62.185657] 4.19.96 #2 Not tainted
[   62.189453] --------------------------------------------
[   62.195388] migration/1/17 is trying to acquire lock:
[   62.201031] 000000006885da2d (vector_lock){-.-.}, at: apic_retrigger_irq+0x31/0x63
[   62.209508]
[   62.209508] but task is already holding lock:
[   62.216026] 000000006885da2d (vector_lock){-.-.}, at: msi_set_affinity+0x13c/0x27b
[   62.224498]
[   62.224498] other info that might help us debug this:
[   62.231791]  Possible unsafe locking scenario:
[   62.231791]
[   62.238406]        CPU0
[   62.241135]        ----
[   62.243863]   lock(vector_lock);
[   62.247467]   lock(vector_lock);
[   62.251071]
[   62.251071]  *** DEADLOCK ***
[   62.251071]
[   62.257687]  May be due to missing lock nesting notation
[   62.257687]
[   62.265274] 2 locks held by migration/1/17:
[   62.269946]  #0: 00000000cfa9d8c3 (&irq_desc_lock_class){-.-.}, at: irq_migrate_all_off_this_cpu+0x44/0x28f
[   62.280846]  #1: 000000006885da2d (vector_lock){-.-.}, at: msi_set_affinity+0x13c/0x27b
[   62.289801]
[   62.289801] stack backtrace:
[   62.294669] CPU: 1 PID: 17 Comm: migration/1 Not tainted 4.19.96 #2
[   62.310713] Call Trace:
[   62.313446]  dump_stack+0xac/0x11e
[   62.317255]  __lock_acquire+0x64f/0x19bc
[   62.321646]  ? find_held_lock+0x3d/0xb8
[   62.325936]  ? pci_conf1_write+0x4f/0xdf
[   62.330320]  lock_acquire+0x1b2/0x1fa
[   62.334413]  ? apic_retrigger_irq+0x31/0x63
[   62.339097]  _raw_spin_lock_irqsave+0x51/0x7d
[   62.343972]  ? apic_retrigger_irq+0x31/0x63
[   62.348646]  apic_retrigger_irq+0x31/0x63
[   62.353124]  msi_set_affinity+0x25a/0x27b
[   62.357606]  irq_do_set_affinity+0x37/0xaa
[   62.362191]  irq_migrate_all_off_this_cpu+0x1c1/0x28f
[   62.367841]  fixup_irqs+0x15/0xd2
[   62.371544]  cpu_disable_common+0x20a/0x217
[   62.376217]  native_cpu_disable+0x1f/0x24
[   62.380696]  take_cpu_down+0x41/0x95
[   62.384691]  multi_cpu_stop+0xbd/0x14b
[   62.388878]  ? _raw_spin_unlock_irq+0x2c/0x40
[   62.393746]  ? stop_two_cpus+0x2c5/0x2c5
[   62.398127]  cpu_stopper_thread+0x84/0x100
[   62.402705]  smpboot_thread_fn+0x1a9/0x25f
[   62.407281]  ? cpu_report_death+0x81/0x81
[   62.411760]  kthread+0x146/0x14e
[   62.415364]  ? cpu_report_death+0x81/0x81
[   62.419846]  ? kthread_blkcg+0x31/0x31
[   62.424042]  ret_from_fork+0x24/0x50
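
The trace shows msi_set_affinity() holding vector_lock while calling apic_retrigger_irq(), which tries to take vector_lock again. The sketch below reproduces that shape in userspace, assuming a plain non-recursive `threading.Lock` as a stand-in for the spinlock. All Python function names are hypothetical analogues, and the `_locked` variant is just one common way to handle this class of report, not necessarily the fix appropriate for this patch.

```python
import threading

# Stand-in for vector_lock: a plain, non-recursive lock.
vector_lock = threading.Lock()


def retrigger_irq():
    """Analogue of apic_retrigger_irq(): takes vector_lock itself."""
    # Non-blocking attempt so the sketch reports the problem instead of hanging.
    if not vector_lock.acquire(blocking=False):
        return "would deadlock"
    try:
        return "retriggered"
    finally:
        vector_lock.release()


def retrigger_irq_locked():
    """Hypothetical variant for callers that already hold vector_lock."""
    assert vector_lock.locked()
    return "retriggered"


def set_affinity(use_locked_variant):
    """Analogue of msi_set_affinity(): holds vector_lock across the update."""
    with vector_lock:
        if use_locked_variant:
            return retrigger_irq_locked()
        return retrigger_irq()


if __name__ == "__main__":
    print(set_affinity(use_locked_variant=False))
    print(set_affinity(use_locked_variant=True))
```

With a real blocking acquire, the first call would never return, which matches the hang seen after the splat.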

-Evan
