From: Thomas Gleixner <tglx@linutronix.de>
To: Keith Busch <keith.busch@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: Re: [BUG 4.15-rc7] IRQ matrix management errors
Date: Wed, 17 Jan 2018 10:32:12 +0100 (CET)
Message-ID: <alpine.DEB.2.20.1801171030150.1777@nanos>
In-Reply-To: <alpine.DEB.2.20.1801171020440.1777@nanos>

On Wed, 17 Jan 2018, Thomas Gleixner wrote:

> On Wed, 17 Jan 2018, Keith Busch wrote:
> > On Wed, Jan 17, 2018 at 08:34:22AM +0100, Thomas Gleixner wrote:
> > > Can you trace the matrix allocations from the very beginning or tell me how
> > > to reproduce. I'd like to figure out why this is happening.
> > 
> > Sure, I'll get the irq_matrix events.
> > 
> > I reproduce this on a machine with 112 CPUs and 3 NVMe controllers. The
> > first two NVMe want 112 MSI-x vectors, and the last only 31 vectors. The
> > test runs 'modprobe nvme' and 'modprobe -r nvme' in a loop with 10
> > second delay between each step. Repro occurs within a few iterations,
> > sometimes already broken after the initial boot.
> 
> That doesn't sound right. The vectors should be spread evenly across the
> CPUs. So ENOSPC should never happen.
> 
> Can you please take snapshots of /sys/kernel/debug/irq/ between the
> modprobe and modprobe -r steps?

The allocation fails because CPU1 has exhausted its vector space here:

[002] d...   333.028216: irq_matrix_alloc_managed: bit=34 cpu=1 online=1 avl=0 alloc=202 managed=2 online_maps=112 global_avl=22085, global_rsvd=158, total_alloc=460

Now the interesting question is how that happens.
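
To make the imbalance in the trace line above easier to see, here is a
minimal Python sketch that splits the event into its key=value fields and
compares CPU1's allocation count with the average over all online maps.
The helper names (parse_matrix_event, summarize) are made up for
illustration, and the reading of the fields (avl and alloc as per-CPU
counters, total_alloc and online_maps as matrix-wide counters) is an
assumption based on the field names, not something stated in this thread.

```python
import re

# The trace line quoted above, copied verbatim.
TRACE = ("[002] d...   333.028216: irq_matrix_alloc_managed: bit=34 cpu=1 "
         "online=1 avl=0 alloc=202 managed=2 online_maps=112 "
         "global_avl=22085, global_rsvd=158, total_alloc=460")


def parse_matrix_event(line):
    """Return (event_name, fields) for one irq_matrix trace line."""
    match = re.search(r"(irq_matrix_\w+):", line)
    event = match.group(1) if match else None
    # Every field in the event is printed as name=<unsigned integer>.
    fields = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}
    return event, fields


def summarize(fields):
    """Compare one CPU's counters with the matrix-wide average.

    Assumption: 'alloc' counts vectors allocated on this CPU, and
    'total_alloc' / 'online_maps' is the even-spread average.
    """
    return {
        "cpu": fields["cpu"],
        "exhausted": fields["avl"] == 0,
        "alloc": fields["alloc"],
        "mean_alloc_per_cpu": fields["total_alloc"] / fields["online_maps"],
    }


event, fields = parse_matrix_event(TRACE)
summary = summarize(fields)
print(event, summary)
```

Under those assumptions, CPU1 holds 202 of the 460 allocated vectors while
an even spread over 112 online maps would be about 4 per CPU, which is the
discrepancy that needs explaining.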

Thanks,

	tglx
