From: Thomas Gleixner <tglx@linutronix.de>
To: Yu Chen <yu.c.chen@intel.com>
Cc: x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Rui Zhang <rui.zhang@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Christoph Hellwig <hch@lst.de>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 4/4][RFC v2] x86/apic: Spread the vectors by choosing the idlest CPU
Date: Wed, 6 Sep 2017 10:03:58 +0200 (CEST)
Message-ID: <alpine.DEB.2.20.1709060801480.2144@nanos>
In-Reply-To: <20170906043454.GD23250@localhost.localdomain>

On Wed, 6 Sep 2017, Yu Chen wrote:
> On Wed, Sep 06, 2017 at 12:57:41AM +0200, Thomas Gleixner wrote:
> > I have a hard time to figure out how the 133 vectors on CPU31 are now
> > magically fitting in the empty space on CPU0, which is 204 - 133 = 71. In
> > my limited understanding of math 133 is greater than 71, but your patch
> > might make that magically be wrong.
> >
> The problem is reproduced when the network cable is not plugged in,
> because this driver looks like this:
> 
> step 1. Reserve enough irq vectors and corresponding IRQs.
> step 2. If the network is activated, invoke request_irq() to
>         register the handler.
> step 3. Invoke set_affinity() to spread the IRQs onto different
>         CPUs, thus to spread the vectors too.
> 
> Here's my understanding for why spreading vectors might help for this
> special case: 
> As step 2 is never invoked, the IRQs of this driver
> have not been enabled, so migrate_one_irq() does not
> consider them because of the irqd_is_started(d) check.
> Thus there should only be 8 vectors allocated by this
> driver on CPU0 and 8 vectors left on CPU31, and the 8
> vectors on CPU31 will not be migrated to CPU0 either,
> so there is room for other 'valid' vectors to be
> migrated to CPU0.

Can you please spare me repeating your theories, as long as you don't have
hard facts to back them up? The network cable is changing the symptoms,
but the underlying root cause is definitely something different.

> # cat /sys/kernel/debug/irq/domains/*
> name:   VECTOR
>  size:   0
>  mapped: 388
>  flags:  0x00000041

So we have 388 vectors mapped in total. And those are just device vectors
because system vectors are not accounted there.

> name:   IO-APIC-0
>  size:   24
>  mapped: 16

That's the legacy space

> name:   IO-APIC-1
>  size:   8
>  mapped: 2

> name:   IO-APIC-2
>  size:   8
>  mapped: 0

> name:   IO-APIC-3
>  size:   8
>  mapped: 0

> name:   IO-APIC-4
>  size:   8
>  mapped: 5

And a few GSIs: Total GSIs = 16 + 2 + 5 = 23

> name:   PCI-MSI-2
>  size:   0
>  mapped: 365

Plus 365 PCI-MSI vectors allocated.

>  flags:  0x00000051
>  parent: VECTOR
>     name:   VECTOR
>      size:   0
>      mapped: 388

Which nicely sums up to 388
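
The accounting above can be rechecked mechanically. The following is a minimal Python sketch, not part of the original mail: it parses the quoted debugfs text (values copied from the output above) and compares the per-domain "mapped" counts against the VECTOR domain. The function name mapped_per_domain() is made up for illustration.

```python
import re

# Values copied from the quoted /sys/kernel/debug/irq/domains/* output.
DOMAINS = """\
name:   VECTOR
 size:   0
 mapped: 388
name:   IO-APIC-0
 size:   24
 mapped: 16
name:   IO-APIC-1
 size:   8
 mapped: 2
name:   IO-APIC-2
 size:   8
 mapped: 0
name:   IO-APIC-3
 size:   8
 mapped: 0
name:   IO-APIC-4
 size:   8
 mapped: 5
name:   PCI-MSI-2
 size:   0
 mapped: 365
 parent: VECTOR
    name:   VECTOR
     size:   0
     mapped: 388
"""


def mapped_per_domain(text):
    """Return {domain name: mapped count}.

    A domain that shows up again as a nested parent block (VECTOR here)
    is only recorded once.
    """
    counts = {}
    name = None
    for line in text.splitlines():
        m = re.match(r"\s*name:\s+(\S+)", line)
        if m:
            name = m.group(1)
            continue
        m = re.match(r"\s*mapped:\s+(\d+)", line)
        if m and name is not None:
            counts.setdefault(name, int(m.group(1)))
    return counts


counts = mapped_per_domain(DOMAINS)
gsi_total = sum(v for k, v in counts.items() if k.startswith("IO-APIC-"))
msi_total = sum(v for k, v in counts.items() if k.startswith("PCI-MSI-"))

# GSIs plus PCI-MSI interrupts should account for every mapped vector.
print(gsi_total, msi_total, gsi_total + msi_total == counts["VECTOR"])
```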

> # ls /sys/kernel/debug/irq/irqs
> 0  10   11  13  142  184  217  259  292  31  33   337  339
> 340  342  344  346  348  350  352  354  356  358  360  362
> 364  366  368  370  372  374  376  378  380  382  384  386
> 388  390  392  394  4  6   7  9  1  109  12  14  15   2
> 24   26   3    32  335  338  34   341  343  345  347  349
> 351  353  355  357  359  361  363  365  367  369  371  373
> 375  377  379  381  383  385  387  389  391  393  395  5
> 67  8

Those are all the interrupts which are active. That's a total of 89. Can you
explain where the delta of 299 vectors comes from?

299 allocated, vector mapped, but unused interrupts?

That's where your problem is, not in the vector spreading. You have a
massive leak.
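
The delta can be reproduced from the listing itself. This is a small Python sketch, not part of the original mail, which counts the entries of the quoted irqs directory listing (copied verbatim from above) and subtracts them from the 388 mapped vectors reported by the VECTOR domain.

```python
# Directory entries copied from the quoted "ls /sys/kernel/debug/irq/irqs".
LISTING = """\
0  10   11  13  142  184  217  259  292  31  33   337  339
340  342  344  346  348  350  352  354  356  358  360  362
364  366  368  370  372  374  376  378  380  382  384  386
388  390  392  394  4  6   7  9  1  109  12  14  15   2
24   26   3    32  335  338  34   341  343  345  347  349
351  353  355  357  359  361  363  365  367  369  371  373
375  377  379  381  383  385  387  389  391  393  395  5
67  8
"""

MAPPED_VECTORS = 388  # "mapped" count of the VECTOR domain, from the mail

# One directory entry per interrupt; a set guards against double counting.
active_irqs = set(LISTING.split())
unused = MAPPED_VECTORS - len(active_irqs)

print(len(active_irqs), unused)
```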

> BTW, is there a sysfs interface to display how many vectors are used on each CPU?

Not yet.

Can you please apply the debug patch below, boot the machine and right
after login provide the output of

# cat /sys/kernel/debug/tracing/trace

Thanks,

	tglx

8<-------------------
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -372,6 +372,9 @@ int msi_domain_alloc_irqs(struct irq_dom
 			return ret;
 		}
 
+		trace_printk("dev: %s nvec %d virq %d\n",
+			     dev_name(dev), desc->nvec_used, virq);
+
 		for (i = 0; i < desc->nvec_used; i++)
 			irq_set_msi_desc_off(virq, i, desc);
 	}
@@ -419,6 +422,8 @@ void msi_domain_free_irqs(struct irq_dom
 		 * entry. If that's the case, don't do anything.
 		 */
 		if (desc->irq) {
+			trace_printk("dev: %s nvec %d virq %d\n",
+				     dev_name(dev), desc->nvec_used, desc->irq);
 			irq_domain_free_irqs(desc->irq, desc->nvec_used);
 			desc->irq = 0;
 		}
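
Once the trace has been collected, the allocations and frees can be balanced per device. The following Python sketch is not part of the original mail. It relies on the message format of the two trace_printk() calls in the patch ("dev: %s nvec %d virq %d") and on the assumption that each trace line names the calling function right before the message, which is the only way to tell the two identical messages apart. The sample lines and device names are hypothetical and were not taken from the reported machine, and outstanding_vectors() is a made-up helper name.

```python
import re
from collections import defaultdict

# Message format taken from the trace_printk() calls in the debug patch.
# Assumption: the caller's function name precedes the message in the trace.
PATTERN = re.compile(
    r"(?P<func>msi_domain_alloc_irqs|msi_domain_free_irqs): "
    r"dev: (?P<dev>\S+) nvec (?P<nvec>\d+) virq (?P<virq>\d+)"
)


def outstanding_vectors(lines):
    """Return {device: vectors allocated minus vectors freed}."""
    balance = defaultdict(int)
    for line in lines:
        m = PATTERN.search(line)
        if m is None:
            continue  # not one of the two debug messages
        nvec = int(m.group("nvec"))
        if m.group("func") == "msi_domain_alloc_irqs":
            balance[m.group("dev")] += nvec
        else:
            balance[m.group("dev")] -= nvec
    return dict(balance)


# Hypothetical sample lines for illustration only.
SAMPLE = [
    "msi_domain_alloc_irqs: dev: 0000:00:01.0 nvec 1 virq 24",
    "msi_domain_alloc_irqs: dev: 0000:02:00.0 nvec 8 virq 25",
    "msi_domain_free_irqs: dev: 0000:00:01.0 nvec 1 virq 24",
]

balance = outstanding_vectors(SAMPLE)
# Devices that still hold vectors after all frees have been accounted.
holding = {dev: n for dev, n in balance.items() if n != 0}
print(holding)
```

On a real system the lines would be read from /sys/kernel/debug/tracing/trace instead of SAMPLE. A device that holds many more vectors than it has active interrupts is a candidate for the leak.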

