From: Thomas Gleixner <tglx@linutronix.de>
To: Keith Busch <keith.busch@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: Re: [BUG 4.15-rc7] IRQ matrix management errors
Date: Wed, 17 Jan 2018 16:01:47 +0100 (CET)
Message-ID: <alpine.DEB.2.20.1801171557330.1777@nanos>
In-Reply-To: <20180117142440.GC7562@localhost.localdomain>

On Wed, 17 Jan 2018, Keith Busch wrote:

> On Wed, Jan 17, 2018 at 10:32:12AM +0100, Thomas Gleixner wrote:
> > On Wed, 17 Jan 2018, Thomas Gleixner wrote:
> > > That doesn't sound right. The vectors should be spread evenly across the
> > > CPUs. So ENOSPC should never happen.
> > > 
> > > Can you please take snapshots of /sys/kernel/debug/irq/ between the
> > > modprobe and modprobe -r steps?
> > 
> > The allocation fails because CPU1 has exhausted its vector space here:
> > 
> > [002] d...   333.028216: irq_matrix_alloc_managed: bit=34 cpu=1 online=1 avl=0 alloc=202 managed=2 online_maps=112 global_avl=22085, global_rsvd=158, total_alloc=460
> > 
> > Now the interesting question is how that happens.
> 
> The trace with "trace_events=irq_matrix" kernel parameter is attached,
> ended shortly after an allocation failure.

Which device is allocating gazillions of non-managed interrupts?

  NetworkManager-2208  [044] d...     8.648608: irq_matrix_alloc: bit=68 cpu=0 online=1 avl=168 alloc=35 managed=3 online_maps=112 global_avl=22359, global_rsvd=532, total_alloc=215

....

  NetworkManager-2208  [044] d...     8.665114: irq_matrix_alloc: bit=237 cpu=0 online=1 avl=0 alloc=203 managed=3 online_maps=112 global_avl=22191, global_rsvd=364, total_alloc=383

That's 168 interrupts total. Enterprise grade insanity.

The patch below should cure that by spreading them out on allocation.
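
To illustrate the spreading idea outside the kernel, here is a minimal
userspace sketch. It is a model under stated assumptions, not the kernel code:
struct cpu_slot, pick_best_cpu() and alloc_one() are hypothetical names, the
bitmap search done by matrix_alloc_area() is left out, and -1 stands in for
-ENOSPC. It only shows the selection rule: scan all CPUs in the mask, skip
offline ones, and allocate from the one with the most available vectors.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-CPU struct cpumap. */
struct cpu_slot {
	bool online;
	unsigned int available;	/* free vectors on this CPU */
	unsigned int allocated;	/* vectors handed out so far */
};

/*
 * Return the index of the online CPU with the most free vectors, or
 * UINT_MAX if no online CPU has any left. The strict comparison means
 * ties go to the lowest-numbered CPU.
 */
static unsigned int pick_best_cpu(const struct cpu_slot *cpus,
				  unsigned int ncpus)
{
	unsigned int cpu, best_cpu = UINT_MAX, maxavl = 0;

	assert(cpus != NULL);
	for (cpu = 0; cpu < ncpus; cpu++) {
		if (!cpus[cpu].online || cpus[cpu].available <= maxavl)
			continue;
		best_cpu = cpu;
		maxavl = cpus[cpu].available;
	}
	return best_cpu;
}

/* Take one vector from the best CPU; -1 models -ENOSPC. */
static int alloc_one(struct cpu_slot *cpus, unsigned int ncpus)
{
	unsigned int best = pick_best_cpu(cpus, ncpus);

	if (best == UINT_MAX)
		return -1;
	cpus[best].available--;
	cpus[best].allocated++;
	return (int)best;
}
```

With two online CPUs that start with the same number of free vectors,
repeated calls to alloc_one() alternate between them instead of draining the
first CPU before touching the second.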

Thanks,

	tglx

8<------------------

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 0ba0dd8863a7..5831cc7db27d 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -321,29 +321,38 @@ void irq_matrix_remove_reserved(struct irq_matrix *m)
 int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
 		     bool reserved, unsigned int *mapped_cpu)
 {
-	unsigned int cpu;
+	unsigned int cpu, best_cpu, maxavl = 0;
+	struct cpumap *cm;
+	unsigned int bit;
 
+	best_cpu = UINT_MAX;
 	for_each_cpu(cpu, msk) {
-		struct cpumap *cm = per_cpu_ptr(m->maps, cpu);
-		unsigned int bit;
+		cm = per_cpu_ptr(m->maps, cpu);
 
-		if (!cm->online)
+		if (!cm->online || cm->available <= maxavl)
 			continue;
 
-		bit = matrix_alloc_area(m, cm, 1, false);
-		if (bit < m->alloc_end) {
-			cm->allocated++;
-			cm->available--;
-			m->total_allocated++;
-			m->global_available--;
-			if (reserved)
-				m->global_reserved--;
-			*mapped_cpu = cpu;
-			trace_irq_matrix_alloc(bit, cpu, m, cm);
-			return bit;
-		}
+		best_cpu = cpu;
+		maxavl = cm->available;
 	}
-	return -ENOSPC;
+
+	if (!maxavl)
+		return -ENOSPC;
+
+	cm = per_cpu_ptr(m->maps, best_cpu);
+	bit = matrix_alloc_area(m, cm, 1, false);
+	if (bit >= m->alloc_end)
+		return -ENOSPC;
+
+	cm->allocated++;
+	cm->available--;
+	m->total_allocated++;
+	m->global_available--;
+	if (reserved)
+		m->global_reserved--;
+	*mapped_cpu = best_cpu;
+	trace_irq_matrix_alloc(bit, best_cpu, m, cm);
+	return bit;
 }
 
 /**
