All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexander Gordeev <agordeev@redhat.com>
To: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	Suresh Siddha <suresh.b.siddha@intel.com>,
	Yinghai Lu <yinghai@kernel.org>
Subject: Re: [PATCH 2/3] x86: x2apic/cluster: Make use of lowest priority delivery mode
Date: Fri, 18 May 2012 17:42:37 +0200	[thread overview]
Message-ID: <20120518154236.GA23406@dhcp-26-207.brq.redhat.com> (raw)
In-Reply-To: <20120518144153.GD8455@moon>

[-- Attachment #1: Type: text/plain, Size: 1920 bytes --]

On Fri, May 18, 2012 at 06:41:53PM +0400, Cyrill Gorcunov wrote:
> On Fri, May 18, 2012 at 12:26:41PM +0200, Alexander Gordeev wrote:
> > Currently x2APIC in logical destination mode delivers interrupts to a
> > single CPU, no matter how many CPUs were specified in the destination
> > cpumask.
> > 
> > This fix enables delivery of interrupts to multiple CPUs by bit-ORing
> > Logical IDs of destination CPUs that have matching Cluster ID.
> > 
> > Because only one cluster could be specified in a message destination
> > address, the destination cpumask is tried for a cluster that contains
> > maximum number of CPUs matching this cpumask. The CPUs in this cluster
> > are selected to receive the interrupts while all other CPUs (in the
> > cpumask) are ignored.
> 
> Hi Alexander,
> 
> I'm a bit confused, we do compose destination id from target cpumask
> and send one ipi per _cluster_ with all cpus belonging to this cluster
> OR'ed. So if my memory doesn't betray me all specified cpus in cluster
> should obtain this message. Thus I'm wondering where do you find that
> only one apic obtains ipi? (note i don't have the real physical
> machine with x2apic enabled handy to test, so please share
> the experience).

Cyrill,

This patchset is not about IPIs at all. It is about interrupts coming from
IO-APICs.

I.e. if you check /proc/interrups for some IR-IO-APIC you will notice that just
a single (first found) CPU receives the interrupts, even though the
corresponding /proc/irq/<irq#>/smp_affinity would specify multiple CPUs.

Given the current design it is unavoidable in physical destination mode (well,
as long as we do not broadcast, which I did not try yet). But it is well
avoidable in clustered logical addressing mode + lowest priority delivery mode.

I am attaching my debugging patchses so you could check actual values of ITREs.

> 
> 	Cyrill

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

[-- Attachment #2: 0001-x86-io-apic-Dump-IO-APICs-to-sys-kernel-ioapic.patch --]
[-- Type: text/plain, Size: 6692 bytes --]

>From a9feb660a6a5e860052f9f2a86bfeced3faaa283 Mon Sep 17 00:00:00 2001
From: Alexander Gordeev <agordeev@redhat.com>
Date: Mon, 7 May 2012 12:55:12 -0400
Subject: [PATCH 1/3] x86: io-apic: Dump IO-APICs to /sys/kernel/ioapic

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 arch/x86/kernel/apic/io_apic.c |  160 ++++++++++++++++++++++++++++++++++++++++
 kernel/ksysfs.c                |   31 ++++++++
 2 files changed, 191 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index e88300d..3b78a50 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1618,6 +1618,166 @@ static void __init setup_timer_IRQ0_pin(unsigned int ioapic_idx,
 	ioapic_write_entry(ioapic_idx, pin, entry);
 }
 
+int sprintf_IO_APIC(char *p, int ioapic_idx)
+{
+	int i;
+	union IO_APIC_reg_00 reg_00;
+	union IO_APIC_reg_01 reg_01;
+	union IO_APIC_reg_02 reg_02;
+	union IO_APIC_reg_03 reg_03;
+	unsigned long flags;
+	char *buf = p;
+
+	raw_spin_lock_irqsave(&ioapic_lock, flags);
+	reg_00.raw = io_apic_read(ioapic_idx, 0);
+	reg_01.raw = io_apic_read(ioapic_idx, 1);
+	if (reg_01.bits.version >= 0x10)
+		reg_02.raw = io_apic_read(ioapic_idx, 2);
+	if (reg_01.bits.version >= 0x20)
+		reg_03.raw = io_apic_read(ioapic_idx, 3);
+	raw_spin_unlock_irqrestore(&ioapic_lock, flags);
+
+	p += sprintf(p, "IO APIC #%d......\n", mpc_ioapic_id(ioapic_idx));
+	p += sprintf(p, ".... register #00: %08X\n", reg_00.raw);
+	p += sprintf(p, ".......    : physical APIC id: %02X\n", reg_00.bits.ID);
+	p += sprintf(p, ".......    : Delivery Type: %X\n", reg_00.bits.delivery_type);
+	p += sprintf(p, ".......    : LTS          : %X\n", reg_00.bits.LTS);
+
+	p += sprintf(p, ".... register #01: %08X\n", *(int *)&reg_01);
+	p += sprintf(p, ".......     : max redirection entries: %02X\n",
+		reg_01.bits.entries);
+
+	p += sprintf(p, ".......     : PRQ implemented: %X\n", reg_01.bits.PRQ);
+	p += sprintf(p, ".......     : IO APIC version: %02X\n",
+		reg_01.bits.version);
+
+	/*
+	 * Some Intel chipsets with IO APIC VERSION of 0x1? don't have reg_02,
+	 * but the value of reg_02 is read as the previous read register
+	 * value, so ignore it if reg_02 == reg_01.
+	 */
+	if (reg_01.bits.version >= 0x10 && reg_02.raw != reg_01.raw) {
+		p += sprintf(p, ".... register #02: %08X\n", reg_02.raw);
+		p += sprintf(p, ".......     : arbitration: %02X\n", reg_02.bits.arbitration);
+	}
+
+	/*
+	 * Some Intel chipsets with IO APIC VERSION of 0x2? don't have reg_02
+	 * or reg_03, but the value of reg_0[23] is read as the previous read
+	 * register value, so ignore it if reg_03 == reg_0[12].
+	 */
+	if (reg_01.bits.version >= 0x20 && reg_03.raw != reg_02.raw &&
+	    reg_03.raw != reg_01.raw) {
+		p += sprintf(p, ".... register #03: %08X\n", reg_03.raw);
+		p += sprintf(p, ".......     : Boot DT    : %X\n", reg_03.bits.boot_DT);
+	}
+
+	p += sprintf(p, ".... IRQ redirection table:\n");
+
+	if (intr_remapping_enabled) {
+		p += sprintf(p, " NR Indx Fmt Mask Trig IRR"
+			" Pol Stat Indx2 Zero Vect:\n");
+	} else {
+		p += sprintf(p, " NR Dst Mask Trig IRR Pol"
+			" Stat Dmod Deli Vect:\n");
+	}
+
+	for (i = 0; i <= reg_01.bits.entries; i++) {
+		if (intr_remapping_enabled) {
+			struct IO_APIC_route_entry entry;
+			struct IR_IO_APIC_route_entry *ir_entry;
+
+			entry = ioapic_read_entry(ioapic_idx, i);
+			ir_entry = (struct IR_IO_APIC_route_entry *) &entry;
+			p += sprintf(p, " %02x %04X ",
+				i,
+				ir_entry->index
+			);
+			p += sprintf(p, "%1d   %1d    %1d    %1d   %1d   "
+				"%1d    %1d     %X    %02X\n",
+				ir_entry->format,
+				ir_entry->mask,
+				ir_entry->trigger,
+				ir_entry->irr,
+				ir_entry->polarity,
+				ir_entry->delivery_status,
+				ir_entry->index2,
+				ir_entry->zero,
+				ir_entry->vector
+			);
+		} else {
+			struct IO_APIC_route_entry entry;
+
+			entry = ioapic_read_entry(ioapic_idx, i);
+			p += sprintf(p, " %02x %02X  ",
+				i,
+				entry.dest
+			);
+			p += sprintf(p, "%1d    %1d    %1d   %1d   %1d    "
+				"%1d    %1d    %02X\n",
+				entry.mask,
+				entry.trigger,
+				entry.irr,
+				entry.polarity,
+				entry.delivery_status,
+				entry.dest_mode,
+				entry.delivery_mode,
+				entry.vector
+			);
+		}
+	}
+
+	return p - buf;
+}
+
+int sprintf_IO_APICs(char *p)
+{
+	int ioapic_idx;
+	struct irq_cfg *cfg;
+	unsigned int irq;
+	struct irq_chip *chip;
+	char *buf = p;
+
+	p += sprintf(p, "number of MP IRQ sources: %d.\n", mp_irq_entries);
+	for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++)
+		p += sprintf(p, "number of IO-APIC #%d registers: %d.\n",
+		       mpc_ioapic_id(ioapic_idx),
+		       ioapics[ioapic_idx].nr_registers);
+
+	/*
+	 * We are a bit conservative about what we expect.  We have to
+	 * know about every hardware change ASAP.
+	 */
+	p += sprintf(p, "testing the IO APIC.......................\n");
+
+	for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++)
+		p += sprintf_IO_APIC(p, ioapic_idx);
+
+	p += sprintf(p, "IRQ to pin mappings:\n");
+	for_each_active_irq(irq) {
+		struct irq_pin_list *entry;
+
+		chip = irq_get_chip(irq);
+		if (chip != &ioapic_chip)
+			continue;
+
+		cfg = irq_get_chip_data(irq);
+		if (!cfg)
+			continue;
+		entry = cfg->irq_2_pin;
+		if (!entry)
+			continue;
+		p += sprintf(p, "IRQ%d ", irq);
+		for_each_irq_pin(entry, cfg->irq_2_pin)
+			p += sprintf(p, "-> %d:%d", entry->apic, entry->pin);
+		p += sprintf(p, "\n");
+	}
+
+	p += sprintf(p, ".................................... done.\n");
+
+	return p - buf;
+}
+
 __apicdebuginit(void) print_IO_APIC(int ioapic_idx)
 {
 	int i;
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 4e316e1..85df7bb 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -141,6 +141,34 @@ static ssize_t fscaps_show(struct kobject *kobj,
 }
 KERNEL_ATTR_RO(fscaps);
 
+#ifdef CONFIG_X86_IO_APIC
+
+extern int sprintf_IO_APICs(char *p);
+
+static ssize_t ioapic_show(struct kobject *kobj,
+				       struct kobj_attribute *attr, char *buf)
+{
+	return sprintf_IO_APICs(buf);
+}
+static ssize_t ioapic_store(struct kobject *kobj,
+				   struct kobj_attribute *attr,
+				   const char *buf, size_t count)
+{
+	return -EINVAL;
+/*
+	unsigned long cnt;
+	int ret;
+
+	if (strict_strtoul(buf, 0, &cnt))
+		return -EINVAL;
+
+	ret = crash_shrink_memory(cnt);
+	return ret < 0 ? ret : count;
+*/
+}
+KERNEL_ATTR_RW(ioapic);
+#endif
+
 /*
  * Make /sys/kernel/notes give the raw contents of our kernel .notes section.
  */
@@ -182,6 +210,9 @@ static struct attribute * kernel_attrs[] = {
 	&kexec_crash_size_attr.attr,
 	&vmcoreinfo_attr.attr,
 #endif
+#ifdef CONFIG_X86_IO_APIC
+	&ioapic_attr.attr,
+#endif
 	NULL
 };
 
-- 
1.7.6.5


[-- Attachment #3: 0002-x86-intremap-Fix-get_irte-NULL-pointer-assignment.patch --]
[-- Type: text/plain, Size: 816 bytes --]

>From ac8d07e4bb93ac25582cf117735b0896347b7df7 Mon Sep 17 00:00:00 2001
From: Alexander Gordeev <agordeev@redhat.com>
Date: Wed, 9 May 2012 10:45:52 -0400
Subject: [PATCH 2/3] x86: intremap: Fix get_irte() NULL-pointer assignment

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 drivers/iommu/intr_remapping.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/intr_remapping.c b/drivers/iommu/intr_remapping.c
index 6777ca0..adb0818 100644
--- a/drivers/iommu/intr_remapping.c
+++ b/drivers/iommu/intr_remapping.c
@@ -68,7 +68,7 @@ int get_irte(int irq, struct irte *entry)
 	unsigned long flags;
 	int index;
 
-	if (!entry || !irq_iommu)
+	if (!entry || !irq_iommu || !irq_iommu->iommu)
 		return -1;
 
 	raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
-- 
1.7.6.5


[-- Attachment #4: 0003-x86-intremap-Dump-IRTEs-to-sys-kernel-intremap.patch --]
[-- Type: text/plain, Size: 2692 bytes --]

>From 62d026842357ffdee3397fd5d9f38b68bebc3f0b Mon Sep 17 00:00:00 2001
From: Alexander Gordeev <agordeev@redhat.com>
Date: Wed, 9 May 2012 07:48:53 -0400
Subject: [PATCH 3/3] x86: intremap: Dump IRTEs to /sys/kernel/intremap

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 drivers/iommu/intr_remapping.c |   26 ++++++++++++++++++++++++++
 kernel/ksysfs.c                |   29 +++++++++++++++++++----------
 2 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intr_remapping.c b/drivers/iommu/intr_remapping.c
index adb0818..541dc33 100644
--- a/drivers/iommu/intr_remapping.c
+++ b/drivers/iommu/intr_remapping.c
@@ -832,3 +832,29 @@ error:
 	return -1;
 }
 
+int sprintf_IRTE(char *p, int irq, const struct irte *irte)
+{
+	char *buf = p;
+
+	p += sprintf(p, "%4d: %1d   %1d  %1d  %1d  %1d %03X %03X %04X.%04X %04X  %1d   %1d\n", irq, irte->present, irte->fpd, irte->dst_mode, irte->redir_hint, irte->trigger_mode, irte->dlvry_mode, irte->vector, (irte->dest_id >> 16) & 0xFFFF, irte->dest_id & 0xFFFF, irte->sid, irte->sq, irte->svt);
+
+	return p - buf;
+}
+
+int sprintf_IRTEs(char *p)
+{
+	int irq;
+	struct irte irte;
+	char *buf = p;
+
+	p += sprintf(p, "IRQ#: P FPD DM RH TM DLM VEC --Dest-ID -SID SQ SVT\n");
+
+	for_each_active_irq(irq) {
+		if (get_irte(irq, &irte))
+			continue;
+		p += sprintf_IRTE(p, irq, &irte);
+	}
+
+	return p - buf;
+}
+
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 85df7bb..be4e035 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -142,7 +142,6 @@ static ssize_t fscaps_show(struct kobject *kobj,
 KERNEL_ATTR_RO(fscaps);
 
 #ifdef CONFIG_X86_IO_APIC
-
 extern int sprintf_IO_APICs(char *p);
 
 static ssize_t ioapic_show(struct kobject *kobj,
@@ -155,18 +154,25 @@ static ssize_t ioapic_store(struct kobject *kobj,
 				   const char *buf, size_t count)
 {
 	return -EINVAL;
-/*
-	unsigned long cnt;
-	int ret;
+}
+KERNEL_ATTR_RW(ioapic);
+#endif
 
-	if (strict_strtoul(buf, 0, &cnt))
-		return -EINVAL;
+#ifdef CONFIG_IRQ_REMAP
+extern int sprintf_IRTEs(char *p);
 
-	ret = crash_shrink_memory(cnt);
-	return ret < 0 ? ret : count;
-*/
+static ssize_t intremap_show(struct kobject *kobj,
+				       struct kobj_attribute *attr, char *buf)
+{
+	return sprintf_IRTEs(buf);
 }
-KERNEL_ATTR_RW(ioapic);
+static ssize_t intremap_store(struct kobject *kobj,
+				   struct kobj_attribute *attr,
+				   const char *buf, size_t count)
+{
+	return -EINVAL;
+}
+KERNEL_ATTR_RW(intremap);
 #endif
 
 /*
@@ -213,6 +219,9 @@ static struct attribute * kernel_attrs[] = {
 #ifdef CONFIG_X86_IO_APIC
 	&ioapic_attr.attr,
 #endif
+#ifdef CONFIG_IRQ_REMAP
+	&intremap_attr.attr,
+#endif
 	NULL
 };
 
-- 
1.7.6.5


  reply	other threads:[~2012-05-18 15:42 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-18 10:26 [PATCH 2/3] x86: x2apic/cluster: Make use of lowest priority delivery mode Alexander Gordeev
2012-05-18 14:41 ` Cyrill Gorcunov
2012-05-18 15:42   ` Alexander Gordeev [this message]
2012-05-18 15:51     ` Cyrill Gorcunov
2012-05-19 10:47       ` Cyrill Gorcunov
2012-05-21  7:11         ` Alexander Gordeev
2012-05-21  9:46           ` Cyrill Gorcunov
2012-05-19 20:53 ` Yinghai Lu
2012-05-21  8:13   ` Alexander Gordeev
2012-05-21 23:02     ` Yinghai Lu
2012-05-21 23:33       ` Yinghai Lu
2012-05-22  9:36         ` Alexander Gordeev
2012-05-21 23:44     ` Suresh Siddha
2012-05-21 23:58       ` [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain Suresh Siddha
2012-05-21 23:58         ` [PATCH 2/2] x2apic, cluster: use all the members of one cluster specified in the smp_affinity mask for the interrupt desintation Suresh Siddha
2012-05-22  7:04           ` Ingo Molnar
2012-05-22  7:34             ` Cyrill Gorcunov
2012-05-22 17:21             ` Suresh Siddha
2012-05-22 17:39               ` Cyrill Gorcunov
2012-05-22 17:42                 ` Suresh Siddha
2012-05-22 17:45                   ` Cyrill Gorcunov
2012-05-22 20:03           ` Yinghai Lu
2012-06-06 15:04           ` [tip:x86/apic] x86/x2apic/cluster: Use all the members of one cluster specified in the smp_affinity mask for the interrupt destination tip-bot for Suresh Siddha
2012-06-06 22:21             ` Yinghai Lu
2012-06-06 23:14               ` Suresh Siddha
2012-06-06 15:03         ` [tip:x86/apic] x86/irq: Update irq_cfg domain unless the new affinity is a subset of the current domain tip-bot for Suresh Siddha
2012-08-07 15:31           ` Robert Richter
2012-08-07 15:41             ` do_IRQ: 1.55 No irq handler for vector (irq -1) Borislav Petkov
2012-08-07 16:24               ` Suresh Siddha
2012-08-07 17:28                 ` Robert Richter
2012-08-07 17:47                   ` Suresh Siddha
2012-08-07 17:45                 ` Eric W. Biederman
2012-08-07 20:57                   ` Borislav Petkov
2012-08-07 22:39                     ` Suresh Siddha
2012-08-08  8:58                       ` Robert Richter
2012-08-08 11:04                         ` Borislav Petkov
2012-08-08 19:16                           ` Suresh Siddha
2012-08-14 17:02                             ` [tip:x86/urgent] x86, apic: fix broken legacy interrupts in the logical apic mode tip-bot for Suresh Siddha
2012-06-06 17:20         ` [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain Alexander Gordeev
2012-06-06 23:02           ` Suresh Siddha
2012-06-16  0:25           ` Suresh Siddha
2012-06-18  9:17             ` Alexander Gordeev
2012-06-19  0:51               ` Suresh Siddha
2012-06-19 23:43                 ` [PATCH 1/2] x86, apic: optimize cpu traversal in __assign_irq_vector() using domain membership Suresh Siddha
2012-06-19 23:43                   ` [PATCH 2/2] x86, x2apic: limit the vector reservation to the user specified mask Suresh Siddha
2012-06-20  5:56                     ` Yinghai Lu
2012-06-21  9:04                     ` Alexander Gordeev
2012-06-21 21:51                       ` Suresh Siddha
2012-06-20  5:53                   ` [PATCH 1/2] x86, apic: optimize cpu traversal in __assign_irq_vector() using domain membership Yinghai Lu
2012-06-21  8:31                   ` Alexander Gordeev
2012-06-21 21:53                     ` Suresh Siddha
2012-06-20  0:18               ` [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain Suresh Siddha
2012-06-21 11:00                 ` Alexander Gordeev
2012-06-21 21:58                   ` Suresh Siddha
2012-05-22 10:12       ` [PATCH 2/3] x86: x2apic/cluster: Make use of lowest priority delivery mode Alexander Gordeev
2012-05-21  8:22 ` Ingo Molnar
2012-05-21  9:36   ` Alexander Gordeev
2012-05-21 12:40     ` Ingo Molnar
2012-05-21 14:48       ` Alexander Gordeev
2012-05-21 14:59         ` Ingo Molnar
2012-05-21 15:22           ` Alexander Gordeev
2012-05-21 15:34           ` Cyrill Gorcunov
2012-05-21 15:36           ` Linus Torvalds
2012-05-21 18:07             ` Suresh Siddha
2012-05-21 18:18               ` Linus Torvalds
2012-05-21 18:37                 ` Suresh Siddha
2012-05-21 19:30                   ` Ingo Molnar
2012-05-21 19:15             ` Ingo Molnar
2012-05-21 19:56               ` Suresh Siddha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120518154236.GA23406@dhcp-26-207.brq.redhat.com \
    --to=agordeev@redhat.com \
    --cc=gorcunov@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=suresh.b.siddha@intel.com \
    --cc=x86@kernel.org \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.