From: Thomas Gleixner <tglx@linutronix.de>
To: Keith Busch <keith.busch@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: Re: [BUG 4.15-rc7] IRQ matrix management errors
Date: Wed, 17 Jan 2018 16:01:47 +0100 (CET)
Message-ID: <alpine.DEB.2.20.1801171557330.1777@nanos>
In-Reply-To: <20180117142440.GC7562@localhost.localdomain>

On Wed, 17 Jan 2018, Keith Busch wrote:

> On Wed, Jan 17, 2018 at 10:32:12AM +0100, Thomas Gleixner wrote:
> > On Wed, 17 Jan 2018, Thomas Gleixner wrote:
> > > That doesn't sound right. The vectors should be spread evenly across the
> > > CPUs. So ENOSPC should never happen.
> > > 
> > > Can you please take snapshots of /sys/kernel/debug/irq/ between the
> > > modprobe and modprobe -r steps?
> > 
> > The allocation fails because CPU1 has exhausted its vector space here:
> > 
> > [002] d...   333.028216: irq_matrix_alloc_managed: bit=34 cpu=1 online=1 avl=0 alloc=202 managed=2 online_maps=112 global_avl=22085, global_rsvd=158, total_alloc=460
> > 
> > Now the interesting question is how that happens.
> 
> The trace with "trace_events=irq_matrix" kernel parameter is attached,
> ended shortly after an allocation failure.

Which device is allocating gazillions of non-managed interrupts?

  NetworkManager-2208  [044] d...     8.648608: irq_matrix_alloc: bit=68 cpu=0 online=1 avl=168 alloc=35 managed=3 online_maps=112 global_avl=22359, global_rsvd=532, total_alloc=215

....

  NetworkManager-2208  [044] d...     8.665114: irq_matrix_alloc: bit=237 cpu=0 online=1 avl=0 alloc=203 managed=3 online_maps=112 global_avl=22191, global_rsvd=364, total_alloc=383

That's 168 interrupts total. Enterprise grade insanity.

The patch below should cure that by spreading them out on allocation.

Thanks,

	tglx

8<------------------

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 0ba0dd8863a7..5831cc7db27d 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -321,29 +321,38 @@ void irq_matrix_remove_reserved(struct irq_matrix *m)
 int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
 		     bool reserved, unsigned int *mapped_cpu)
 {
-	unsigned int cpu;
+	unsigned int cpu, best_cpu, maxavl = 0;
+	struct cpumap *cm;
+	unsigned int bit;
 
+	best_cpu = UINT_MAX;
 	for_each_cpu(cpu, msk) {
-		struct cpumap *cm = per_cpu_ptr(m->maps, cpu);
-		unsigned int bit;
+		cm = per_cpu_ptr(m->maps, cpu);
 
-		if (!cm->online)
+		if (!cm->online || cm->available <= maxavl)
 			continue;
 
-		bit = matrix_alloc_area(m, cm, 1, false);
-		if (bit < m->alloc_end) {
-			cm->allocated++;
-			cm->available--;
-			m->total_allocated++;
-			m->global_available--;
-			if (reserved)
-				m->global_reserved--;
-			*mapped_cpu = cpu;
-			trace_irq_matrix_alloc(bit, cpu, m, cm);
-			return bit;
-		}
+		best_cpu = cpu;
+		maxavl = cm->available;
 	}
-	return -ENOSPC;
+
+	if (!maxavl)
+		return -ENOSPC;
+
+	cm = per_cpu_ptr(m->maps, best_cpu);
+	bit = matrix_alloc_area(m, cm, 1, false);
+	if (bit >= m->alloc_end)
+		return -ENOSPC;
+
+	cm->allocated++;
+	cm->available--;
+	m->total_allocated++;
+	m->global_available--;
+	if (reserved)
+		m->global_reserved--;
+	*mapped_cpu = best_cpu;
+	trace_irq_matrix_alloc(bit, best_cpu, m, cm);
+	return bit;
 }
 
 /**
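
For readers who want to try the selection logic outside the kernel, here is a
minimal user-space sketch of the idea in the patch above: pick the online CPU
with the most available vectors, then allocate from it. It is an illustration
only, not the kernel code. The names struct model_cpumap, model_pick_best_cpu()
and model_alloc() are hypothetical, and the per-CPU bitmap handled by
matrix_alloc_area() is reduced to a plain counter.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's per-CPU struct cpumap. */
struct model_cpumap {
	bool online;
	unsigned int available;	/* free vectors on this CPU */
	unsigned int allocated;	/* vectors handed out from this CPU */
};

/*
 * Return the index of the online CPU with the most free vectors, or
 * UINT_MAX if no online CPU has any left. Ties go to the lowest index,
 * because a CPU is skipped when available <= maxavl.
 */
static unsigned int model_pick_best_cpu(const struct model_cpumap *maps,
					size_t ncpus)
{
	unsigned int best_cpu = UINT_MAX, maxavl = 0;
	size_t cpu;

	assert(maps != NULL);
	for (cpu = 0; cpu < ncpus; cpu++) {
		if (!maps[cpu].online || maps[cpu].available <= maxavl)
			continue;
		best_cpu = (unsigned int)cpu;
		maxavl = maps[cpu].available;
	}
	return best_cpu;
}

/* Allocate one vector. Returns the chosen CPU index, or -1 when exhausted. */
static int model_alloc(struct model_cpumap *maps, size_t ncpus)
{
	unsigned int cpu = model_pick_best_cpu(maps, ncpus);

	if (cpu == UINT_MAX)
		return -1;
	maps[cpu].available--;
	maps[cpu].allocated++;
	return (int)cpu;
}
```

With two online CPUs that each have two free vectors, four calls to
model_alloc() alternate between them, whereas a first-fit loop like the one
the patch removes would drain the first CPU before touching the second.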
