linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs
@ 2012-09-03  9:16 Alexander Gordeev
  2012-09-03  9:17 ` [PATCH v2 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping Alexander Gordeev
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Alexander Gordeev @ 2012-09-03  9:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Thomas Gleixner, Bjorn Helgaas, Suresh Siddha,
	Yinghai Lu, Jeff Garzik, Matthew Wilcox, x86, linux-pci,
	linux-ide

[-- Attachment #1: Type: text/plain, Size: 3684 bytes --]

This series is against the latest tip tree.

Currently multiple MSI mode is limited to a single vector per device (at
least on x86 and PPC). This series breathes life into pci_enable_msi_block()
and makes it possible to set interrupt affinity for multiple IRQs, similarly
to MSI-X. Yet, only for x86 and only when IOMMUs are present.

Although IRQ and PCI subsystems are modified, the current behaviour left
intact. The drivers could just start using multiple MSIs just by following
the existing documentation.

The AHCI device driver makes use of the new mode and a new function -
pci_enable_msi_block_auto() - patches 4,5.

Changes since v1:
  4/5:	based on Bjorn's review pci_enable_msi_block_auto() returns
	the number of interrupts allocated or error code

  5/5:	based on Jeff's comments:

	- AHCI specific code moved from libata-core.c to libahci.c;

	- the host-wide lock concern addressed - ahci_interrupt() handler
	  split in two parts: ahci_hw_interrupt() hardware context handler
	  and ahci_thread_fn() threaded handler;

	- host->lock statistics attached;

Host-wide lock statistics summary:

  Taken using command 'if=/dev/sd{a,b,c} of=/dev/null' running in parallel
  on three SATA HDDs:
	ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
	ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
	ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

  ahci_interrupt()
  Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-

Test	holdtime-total	waittime-total	acquis.	holdtime-avg	waittime-avg
#
01.	22565801.27	93056.41	6377094	3.538571216	0.014592291
02.	20358324.12	60825.47	6411012	3.175524257	0.009487655
03.	21190730.38	63610.38	6384508	3.319085884	0.009963239
04.	22491064.54	80624.96	6398390	3.515113105	0.012600820
05.	21986193.10	78034.38	6379092	3.446602291	0.012232835
06.	20556437.70	62314.64	6287673	3.269323596	0.009910604
07.	20477137.55	66190.92	6388507	3.205308776	0.010360937
08.	21442240.03	69306.80	6402109	3.349246323	0.010825620
						-----------	-----------
avg						3.352346931	0.011246750

  ahci_hw_interrupt()
  Capabilities: [80] MSI: Enable+ Count=16/16 Maskable- 64bit-

Test	holdtime-total	waittime-total	acquis.	holdtime-avg	waittime-avg
#
09.	8936643.32	54559.79	6505606	1.373683454	0.008386581
10.	8579972.36	51496.56	6496233	1.320761180	0.007927142
11.	8138708.47	54061.46	6494647	1.253141005	0.008324003
12.	8322068.81	60427.51	6509975	1.278356493	0.009282295
13.	8527745.18	65978.83	6515589	1.308821839	0.010126303
14.	8662461.99	57291.39	6510702	1.330495850	0.008799572
15.	8054911.26	69186.41	6514262	1.236504037	0.010620759
16.	9618631.17	39726.37	6517385	1.475842101	0.006095446
						-----------	-----------
avg						1.322200745	0.008695263
Attached patches:
  1/5: x86, MSI: Support multiple MSIs in presense of IRQ remapping
  2/5: x86, MSI: Allocate as many multiple IRQs as requested
  3/5: x86, MSI: Minor readability fixes
  4/5: PCI, MSI: Enable multiple MSIs with pci_enable_msi_block_auto()
  5/5: AHCI: Support multiple MSIs

 Documentation/PCI/MSI-HOWTO.txt |   37 ++++++-
 arch/x86/kernel/apic/io_apic.c  |  170 +++++++++++++++++++++++++++++--
 drivers/ata/ahci.c              |   14 ++-
 drivers/ata/ahci.h              |    5 +
 drivers/ata/libahci.c           |  211 +++++++++++++++++++++++++++++++++++++-
 drivers/pci/msi.c               |   36 ++++++-
 include/linux/irq.h             |    6 +
 include/linux/msi.h             |    1 +
 include/linux/pci.h             |    7 ++
 kernel/irq/chip.c               |   30 ++++--
 kernel/irq/irqdesc.c            |   31 ++++++
 11 files changed, 509 insertions(+), 39 deletions(-)

-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

[-- Attachment #2: ahci_stats.txt --]
[-- Type: text/plain, Size: 13293 bytes --]

	ahci_interrupt()
	Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-

Test	holdtime-total	waittime-total	acquis.	holdtime-avg	waittime-avg
#
01.	22565801.27	93056.41	6377094	3.538571216	0.014592291
02.	20358324.12	60825.47	6411012	3.175524257	0.009487655
03.	21190730.38	63610.38	6384508	3.319085884	0.009963239
04.	22491064.54	80624.96	6398390	3.515113105	0.012600820
05.	21986193.10	78034.38	6379092	3.446602291	0.012232835
06.	20556437.70	62314.64	6287673	3.269323596	0.009910604
07.	20477137.55	66190.92	6388507	3.205308776	0.010360937
08.	21442240.03	69306.80	6402109	3.349246323	0.010825620
						-----------	-----------
avg						3.352346931	0.011246750

	ahci_hw_interrupt()
	Capabilities: [80] MSI: Enable+ Count=16/16 Maskable- 64bit-

Test	holdtime-total	waittime-total	acquis.	holdtime-avg	waittime-avg
#
09.	8936643.32	54559.79	6505606	1.373683454	0.008386581
10.	8579972.36	51496.56	6496233	1.320761180	0.007927142
11.	8138708.47	54061.46	6494647	1.253141005	0.008324003
12.	8322068.81	60427.51	6509975	1.278356493	0.009282295
13.	8527745.18	65978.83	6515589	1.308821839	0.010126303
14.	8662461.99	57291.39	6510702	1.330495850	0.008799572
15.	8054911.26	69186.41	6514262	1.236504037	0.010620759
16.	9618631.17	39726.37	6517385	1.475842101	0.006095446
						-----------	-----------
avg						1.322200745	0.008695263

                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total

Test #1
                   &(&host->lock)->rlock:         41558          41558           0.13         126.84       93056.41        5141195        6377094           0.42         197.27    22565801.27
                   ---------------------
                   &(&host->lock)->rlock          36643          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock           4915          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
                   ---------------------
                   &(&host->lock)->rlock          25017          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock          16541          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
Test #2
                   &(&host->lock)->rlock:         30714          30714           0.14         104.50       60825.47        4398165        6411012           0.23         176.84    20358324.12
                   ---------------------
                   &(&host->lock)->rlock          25858          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock           4856          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
                   ---------------------
                   &(&host->lock)->rlock          18199          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock          12515          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120

Test #3
                   &(&host->lock)->rlock:         32869          32870           0.14         106.30       63610.38        4655399        6384508           0.43         214.18    21190730.38
                   ---------------------
                   &(&host->lock)->rlock          27494          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock           5376          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
                   ---------------------
                   &(&host->lock)->rlock          19632          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock          13238          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120

Test #4
                   &(&host->lock)->rlock:         37426          37426           0.13         104.95       80624.96        5009173        6398390           0.38         189.14    22491064.54
                   ---------------------
                   &(&host->lock)->rlock           6162          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
                   &(&host->lock)->rlock          31264          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   ---------------------
                   &(&host->lock)->rlock          22271          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock          15155          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
Test #5
                   &(&host->lock)->rlock:         38154          38154           0.14         119.50       78034.38        4923881        6379092           0.24         183.30    21986193.10
                   ---------------------
                   &(&host->lock)->rlock          32396          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock           5758          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
                   ---------------------
                   &(&host->lock)->rlock          23014          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock          15140          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120

Test #6
                   &(&host->lock)->rlock:         33812          33812           0.14          98.56       62314.64        4479447        6287673           0.33         183.96    20556437.70
                   ---------------------
                   &(&host->lock)->rlock          28301          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock           5511          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
                   ---------------------
                   &(&host->lock)->rlock          21378          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock          12434          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120

Test #7
                   &(&host->lock)->rlock:         35336          35336           0.14         105.05       66190.92        4642703        6388507           0.40         179.67    20477137.55
                   ---------------------
                   &(&host->lock)->rlock          30348          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock           4988          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
                   ---------------------
                   &(&host->lock)->rlock          21983          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock          13353          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120

Test #8
                   &(&host->lock)->rlock:         37715          37715           0.14         122.00       69306.80        4827790        6402109           0.42         192.10    21442240.03
                   ---------------------
                   &(&host->lock)->rlock          31913          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock           5802          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
                   ---------------------
                   &(&host->lock)->rlock          23613          [<ffffffff81416b03>] ata_scsi_queuecmd+0x33/0x290
                   &(&host->lock)->rlock          14102          [<ffffffff814283e4>] ahci_interrupt+0x64/0x120
Test #9
                   &(&host->lock)->rlock:         24517          24519           0.11         114.67       54559.79        5550859        6505606           0.06         153.24     8936643.32
                   ---------------------
                   &(&host->lock)->rlock          10797          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          13722          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130
                   ---------------------
                   &(&host->lock)->rlock          18255          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130
                   &(&host->lock)->rlock           6264          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
Test #10
                   &(&host->lock)->rlock:         27503          27505           0.13         143.29       51496.56        5659327        6496233           0.06         196.87     8579972.36
                   ---------------------
                   &(&host->lock)->rlock          10952          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          16553          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130
                   ---------------------
                   &(&host->lock)->rlock           6553          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          20952          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130

Test #11
                   &(&host->lock)->rlock:         25655          25658           0.12         118.67       54061.46        5690296        6494647           0.07         210.10     8138708.47
                   ---------------------
                   &(&host->lock)->rlock          16681          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock           8977          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130
                   ---------------------
                   &(&host->lock)->rlock          11899          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          13759          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130

Test #12
                   &(&host->lock)->rlock:         36741          36742           0.12         113.75       60427.51        5624197        6509975           0.06         226.06     8322068.81
                   ---------------------
                   &(&host->lock)->rlock          24149          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          12593          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130
                   ---------------------
                   &(&host->lock)->rlock          18710          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          18032          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130

Test #13
                   &(&host->lock)->rlock:         25348          25350           0.16         129.52       65978.83        5632012        6515589           0.06         154.50     8527745.18
                   ---------------------
                   &(&host->lock)->rlock          10507          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          14843          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130
                   ---------------------
                   &(&host->lock)->rlock          20301          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130
                   &(&host->lock)->rlock           5049          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0

Test #14
                   &(&host->lock)->rlock:         33124          33125           0.11         117.73       57291.39        5621960        6510702           0.06         209.74     8662461.99
                   ---------------------
                   &(&host->lock)->rlock          14406          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          18719          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130
                   ---------------------
                   &(&host->lock)->rlock           9999          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          23126          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130

Test #15
                   &(&host->lock)->rlock:         24801          24801           0.14         130.18       69186.41        5468958        6514262           0.07         160.17     8054911.26
                   ---------------------
                   &(&host->lock)->rlock          16013          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock           8788          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130
                   ---------------------
                   &(&host->lock)->rlock          10132          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          14669          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130

Test #16
                   &(&host->lock)->rlock:         29420          29422           0.15         112.77       39726.37        5609359        6517385           0.06         160.30     9618631.17
                   ---------------------
                   &(&host->lock)->rlock          14040          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          15382          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130
                   ---------------------
                   &(&host->lock)->rlock          10237          [<ffffffff814280d8>] ahci_thread_fn+0x48/0xa0
                   &(&host->lock)->rlock          19185          [<ffffffff81428164>] ahci_hw_interrupt+0x34/0x130


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping
  2012-09-03  9:16 [PATCH v2 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs Alexander Gordeev
@ 2012-09-03  9:17 ` Alexander Gordeev
  2012-09-03 18:53   ` Yinghai Lu
  2012-09-03  9:18 ` [PATCH v2 -tip 2/5] x86, MSI: Allocate as many multiple IRQs as requested Alexander Gordeev
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Alexander Gordeev @ 2012-09-03  9:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Thomas Gleixner, Bjorn Helgaas, Suresh Siddha,
	Yinghai Lu, Jeff Garzik, Matthew Wilcox, x86, linux-pci,
	linux-ide

The MSI specification has several constraints in comparison with MSI-X,
most notable of them is the inability to configure MSIs independently.
As a result, it is impossible to dispatch interrupts from different
queues to different CPUs. This is largely devalues the support of
multiple MSIs in SMP systems.

Also, a necessity to allocate a contiguous block of vector numbers for
devices capable of multiple MSIs might cause a considerable pressure on
x86 interrupt vector allocator and could lead to fragmentation of the
interrupt vectors space.

This patch overcomes both drawbacks in presense of IRQ remapping and
lets devices take advantage of multiple queues and per-IRQ affinity
assignments.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 arch/x86/kernel/apic/io_apic.c |  166 +++++++++++++++++++++++++++++++++++++--
 include/linux/irq.h            |    6 ++
 kernel/irq/chip.c              |   30 +++++--
 kernel/irq/irqdesc.c           |   31 ++++++++
 4 files changed, 216 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index c265593..5fd2577 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -305,6 +305,11 @@ static int alloc_irq_from(unsigned int from, int node)
 	return irq_alloc_desc_from(from, node);
 }
 
+static int alloc_irqs_from(unsigned int from, unsigned int count, int node)
+{
+	return irq_alloc_descs_from(from, count, node);
+}
+
 static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
 {
 	free_irq_cfg(at, cfg);
@@ -3039,6 +3044,55 @@ int create_irq(void)
 	return irq;
 }
 
+unsigned int create_irqs(unsigned int from, unsigned int count, int node)
+{
+	struct irq_cfg **cfg;
+	unsigned long flags;
+	int irq, i;
+
+	if (from < nr_irqs_gsi)
+		from = nr_irqs_gsi;
+
+	cfg = kzalloc_node(count * sizeof(cfg[0]), GFP_KERNEL, node);
+	if (!cfg)
+		return 0;
+
+	irq = alloc_irqs_from(from, count, node);
+	if (irq < 0)
+		goto out_cfgs;
+
+	for (i = 0; i < count; i++) {
+		cfg[i] = alloc_irq_cfg(irq + i, node);
+		if (!cfg[i])
+			goto out_irqs;
+	}
+
+	raw_spin_lock_irqsave(&vector_lock, flags);
+	for (i = 0; i < count; i++)
+		if (__assign_irq_vector(irq + i, cfg[i], apic->target_cpus()))
+			goto out_vecs;
+	raw_spin_unlock_irqrestore(&vector_lock, flags);
+
+	for (i = 0; i < count; i++) {
+		irq_set_chip_data(irq + i, cfg[i]);
+		irq_clear_status_flags(irq + i, IRQ_NOREQUEST);
+	}
+
+	kfree(cfg);
+	return irq;
+
+out_vecs:
+	for (; i; i--)
+		__clear_irq_vector(irq + i - 1, cfg[i - 1]);
+	raw_spin_unlock_irqrestore(&vector_lock, flags);
+out_irqs:
+	for (i = 0; i < count; i++)
+		free_irq_at(irq + i, cfg[i]);
+out_cfgs:
+	kfree(cfg);
+	return 0;
+}
+
 void destroy_irq(unsigned int irq)
 {
 	struct irq_cfg *cfg = irq_get_chip_data(irq);
@@ -3054,6 +3108,27 @@ void destroy_irq(unsigned int irq)
 	free_irq_at(irq, cfg);
 }
 
+static inline void destroy_irqs(unsigned int irq, unsigned int count)
+{
+	unsigned int i;
+	for (i = 0; i < count; i++)
+		destroy_irq(irq + i);
+}
+
+static inline int
+can_create_pow_of_two_irqs(unsigned int from, unsigned int count)
+{
+	if ((count > 1) && (count % 2))
+		return -EINVAL;
+
+	for (; count; count = count / 2) {
+		if (!irq_can_alloc_irqs(from, count))
+			return count;
+	}
+
+	return -ENOSPC;
+}
+
 /*
  * MSI message composition
  */
@@ -3145,18 +3220,25 @@ static struct irq_chip msi_chip = {
 	.irq_retrigger		= ioapic_retrigger_irq,
 };
 
-static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
+static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
+			 unsigned int irq_base, unsigned int irq_offset)
 {
 	struct irq_chip *chip = &msi_chip;
 	struct msi_msg msg;
+	unsigned int irq = irq_base + irq_offset;
 	int ret;
 
 	ret = msi_compose_msg(dev, irq, &msg, -1);
 	if (ret < 0)
 		return ret;
 
-	irq_set_msi_desc(irq, msidesc);
-	write_msi_msg(irq, &msg);
+	irq_set_msi_desc_off(irq_base, irq_offset, msidesc);
+
+	/* MSI-X message is written per-IRQ, the offset is always 0.
+	 * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
+	 */
+	if (!irq_offset)
+		write_msi_msg(irq, &msg);
 
 	if (irq_remapped(irq_get_chip_data(irq))) {
 		irq_set_status_flags(irq, IRQ_MOVE_PCNTXT);
@@ -3170,16 +3252,12 @@ static int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
 	return 0;
 }
 
-int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+int setup_msix_irqs(struct pci_dev *dev, int nvec)
 {
 	int node, ret, sub_handle, index = 0;
 	unsigned int irq, irq_want;
 	struct msi_desc *msidesc;
 
-	/* x86 doesn't support multiple MSI yet */
-	if (type == PCI_CAP_ID_MSI && nvec > 1)
-		return 1;
-
 	node = dev_to_node(&dev->dev);
 	irq_want = nr_irqs_gsi;
 	sub_handle = 0;
@@ -3208,7 +3286,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 				goto error;
 		}
 no_ir:
-		ret = setup_msi_irq(dev, msidesc, irq);
+		ret = setup_msi_irq(dev, msidesc, irq, 0);
 		if (ret < 0)
 			goto error;
 		sub_handle++;
@@ -3220,6 +3298,76 @@ error:
 	return ret;
 }
 
+int setup_msi_irqs(struct pci_dev *dev, int nvec)
+{
+	int node, ret, sub_handle, index = 0;
+	unsigned int irq;
+	struct msi_desc *msidesc;
+
+	if (nvec > 1 && !irq_remapping_enabled)
+		return 1;
+
+	nvec = __roundup_pow_of_two(nvec);
+	ret = can_create_pow_of_two_irqs(nr_irqs_gsi, nvec);
+	if (ret != nvec)
+		return ret;
+
+	WARN_ON(!list_is_singular(&dev->msi_list));
+	msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
+	WARN_ON(msidesc->irq);
+	WARN_ON(msidesc->msi_attrib.multiple);
+
+	node = dev_to_node(&dev->dev);
+	irq = create_irqs(nr_irqs_gsi, nvec, node);
+	if (irq == 0)
+		return -ENOSPC;
+
+	if (!irq_remapping_enabled) {
+		ret = setup_msi_irq(dev, msidesc, irq, 0);
+		if (ret < 0)
+			goto error;
+		return 0;
+	}
+
+	msidesc->msi_attrib.multiple = ilog2(nvec);
+	for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
+		if (!sub_handle) {
+			index = msi_alloc_remapped_irq(dev, irq, nvec);
+			if (index < 0) {
+				ret = index;
+				goto error;
+			}
+		} else {
+			ret = msi_setup_remapped_irq(dev, irq + sub_handle,
+						     index, sub_handle);
+			if (ret < 0)
+				goto error;
+		}
+		ret = setup_msi_irq(dev, msidesc, irq, sub_handle);
+		if (ret < 0)
+			goto error;
+	}
+	return 0;
+
+error:
+	destroy_irqs(irq, nvec);
+
+	/* Restore altered MSI descriptor fields and prevent just destroyed
+	 * IRQs from tearing down again in default_teardown_msi_irqs()
+	 */
+	msidesc->irq = 0;
+	msidesc->msi_attrib.multiple = 0;
+
+	return ret;
+}
+
+int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+{
+	if (type == PCI_CAP_ID_MSI)
+		return setup_msi_irqs(dev, nvec);
+	return setup_msix_irqs(dev, nvec);
+}
+
 void native_teardown_msi_irq(unsigned int irq)
 {
 	destroy_irq(irq);
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 216b0ba..c3ba39f 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -522,6 +522,8 @@ extern int irq_set_handler_data(unsigned int irq, void *data);
 extern int irq_set_chip_data(unsigned int irq, void *data);
 extern int irq_set_irq_type(unsigned int irq, unsigned int type);
 extern int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry);
+extern int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset,
+				struct msi_desc *entry);
 extern struct irq_data *irq_get_irq_data(unsigned int irq);
 
 static inline struct irq_chip *irq_get_chip(unsigned int irq)
@@ -584,8 +586,12 @@ int __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
 #define irq_alloc_desc_from(from, node)		\
 	irq_alloc_descs(-1, from, 1, node)
 
+#define irq_alloc_descs_from(from, cnt, node)	\
+	irq_alloc_descs(-1, from, cnt, node)
+
 void irq_free_descs(unsigned int irq, unsigned int cnt);
 int irq_reserve_irqs(unsigned int from, unsigned int cnt);
+int irq_can_alloc_irqs(unsigned int from, unsigned int cnt);
 
 static inline void irq_free_desc(unsigned int irq)
 {
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 57d86d0..2230389 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -90,27 +90,41 @@ int irq_set_handler_data(unsigned int irq, void *data)
 EXPORT_SYMBOL(irq_set_handler_data);
 
 /**
- *	irq_set_msi_desc - set MSI descriptor data for an irq
- *	@irq:	Interrupt number
- *	@entry:	Pointer to MSI descriptor data
+ *	irq_set_msi_desc_off - set MSI descriptor data for an irq at offset
+ *	@irq_base:	Interrupt number base
+ *	@irq_offset:	Interrupt number offset
+ *	@entry:		Pointer to MSI descriptor data
  *
- *	Set the MSI descriptor entry for an irq
+ *	Set the MSI descriptor entry for an irq at offset
  */
-int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry)
+int irq_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset,
+			 struct msi_desc *entry)
 {
 	unsigned long flags;
-	struct irq_desc *desc = irq_get_desc_lock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
+	struct irq_desc *desc = irq_get_desc_lock(irq_base + irq_offset, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
 
 	if (!desc)
 		return -EINVAL;
 	desc->irq_data.msi_desc = entry;
-	if (entry)
-		entry->irq = irq;
+	if (entry && !irq_offset)
+		entry->irq = irq_base;
 	irq_put_desc_unlock(desc, flags);
 	return 0;
 }
 
 /**
+ *	irq_set_msi_desc - set MSI descriptor data for an irq
+ *	@irq:	Interrupt number
+ *	@entry:	Pointer to MSI descriptor data
+ *
+ *	Set the MSI descriptor entry for an irq
+ */
+int irq_set_msi_desc(unsigned int irq, struct msi_desc *entry)
+{
+	return irq_set_msi_desc_off(irq, 0, entry);
+}
+
+/**
  *	irq_set_chip_data - set irq chip data for an irq
  *	@irq:	Interrupt number
  *	@data:	Pointer to chip specific data
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 192a302..8287b78 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -210,6 +210,13 @@ static int irq_expand_nr_irqs(unsigned int nr)
 	return 0;
 }
 
+static int irq_can_expand_nr_irqs(unsigned int nr)
+{
+	if (nr > IRQ_BITMAP_BITS)
+		return -ENOMEM;
+	return 0;
+}
+
 int __init early_irq_init(void)
 {
 	int i, initcnt, node = first_online_node;
@@ -414,6 +421,30 @@ int irq_reserve_irqs(unsigned int from, unsigned int cnt)
 }
 
 /**
+ * irq_can_alloc_irqs - checks if a range of irqs could be allocated
+ * @from:	check from irq number
+ * @cnt:	number of irqs to check
+ *
+ * Returns 0 on success or an appropriate error code
+ */
+int irq_can_alloc_irqs(unsigned int from, unsigned int cnt)
+{
+	unsigned int start;
+	int ret = 0;
+
+	if (!cnt)
+		return -EINVAL;
+
+	mutex_lock(&sparse_irq_lock);
+	start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
+					   from, cnt, 0);
+	mutex_unlock(&sparse_irq_lock);
+	if (start + cnt > nr_irqs)
+		ret = irq_can_expand_nr_irqs(start + cnt);
+	return ret;
+}
+
+/**
  * irq_get_next_irq - get next allocated irq number
  * @offset:	where to start the search
  *
-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 -tip 2/5] x86, MSI: Allocate as many multiple IRQs as requested
  2012-09-03  9:16 [PATCH v2 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs Alexander Gordeev
  2012-09-03  9:17 ` [PATCH v2 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping Alexander Gordeev
@ 2012-09-03  9:18 ` Alexander Gordeev
  2012-09-03  9:19 ` [PATCH v2 -tip 3/5] x86, MSI: Minor readability fixes Alexander Gordeev
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Alexander Gordeev @ 2012-09-03  9:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Thomas Gleixner, Bjorn Helgaas, Suresh Siddha,
	Yinghai Lu, Jeff Garzik, Matthew Wilcox, x86, linux-pci,
	linux-ide

When multiple MSIs are enabled with pci_enable_msi_block() the number of
allocated IRQs 'nvec' is rounded up to the nearest value of power of two.
That could lead to a condition when number of requested and used IRQs is
less than number of actually allocated IRQs.

This fix introduces 'msi_desc::nvec' field to address the above issue -
when non-zero, it holds the number of allocated IRQs. Otherwise, the old
method is used.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 arch/x86/kernel/apic/io_apic.c |   16 +++++++---------
 drivers/pci/msi.c              |   10 ++++++++--
 include/linux/msi.h            |    1 +
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 5fd2577..fbf3916 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3116,16 +3116,12 @@ static inline void destroy_irqs(unsigned int irq, unsigned int count)
 }
 
 static inline int
-can_create_pow_of_two_irqs(unsigned int from, unsigned int count)
+can_create_irqs(unsigned int from, unsigned int count)
 {
-	if ((count > 1) && (count % 2))
-		return -EINVAL;
-
-	for (; count; count = count / 2) {
+	for (; count; count = count - 1) {
 		if (!irq_can_alloc_irqs(from, count))
 			return count;
 	}
-
 	return -ENOSPC;
 }
 
@@ -3307,8 +3303,7 @@ int setup_msi_irqs(struct pci_dev *dev, int nvec)
 	if (nvec > 1 && !irq_remapping_enabled)
 		return 1;
 
-	nvec = __roundup_pow_of_two(nvec);
-	ret = can_create_pow_of_two_irqs(nr_irqs_gsi, nvec);
+	ret = can_create_irqs(nr_irqs_gsi, nvec);
 	if (ret != nvec)
 		return ret;
 
@@ -3316,11 +3311,13 @@ int setup_msi_irqs(struct pci_dev *dev, int nvec)
 	msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
 	WARN_ON(msidesc->irq);
 	WARN_ON(msidesc->msi_attrib.multiple);
+	WARN_ON(msidesc->nvec);
 
 	node = dev_to_node(&dev->dev);
 	irq = create_irqs(nr_irqs_gsi, nvec, node);
 	if (irq == 0)
 		return -ENOSPC;
+	msidesc->nvec = nvec;
 
 	if (!irq_remapping_enabled) {
 		ret = setup_msi_irq(dev, msidesc, irq, 0);
@@ -3329,7 +3326,7 @@ int setup_msi_irqs(struct pci_dev *dev, int nvec)
 		return 0;
 	}
 
-	msidesc->msi_attrib.multiple = ilog2(nvec);
+	msidesc->msi_attrib.multiple = ilog2(__roundup_pow_of_two(nvec));
 	for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
 		if (!sub_handle) {
 			index = msi_alloc_remapped_irq(dev, irq, nvec);
@@ -3357,6 +3354,7 @@ error:
 	 */
 	msidesc->irq = 0;
 	msidesc->msi_attrib.multiple = 0;
+	msidesc->nvec = 0;
 
 	return ret;
 }
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a825d78..f0752d1 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -79,7 +79,10 @@ void default_teardown_msi_irqs(struct pci_dev *dev)
 		int i, nvec;
 		if (entry->irq == 0)
 			continue;
-		nvec = 1 << entry->msi_attrib.multiple;
+		if (entry->nvec)
+			nvec = entry->nvec;
+		else
+			nvec = 1 << entry->msi_attrib.multiple;
 		for (i = 0; i < nvec; i++)
 			arch_teardown_msi_irq(entry->irq + i);
 	}
@@ -336,7 +339,10 @@ static void free_msi_irqs(struct pci_dev *dev)
 		int i, nvec;
 		if (!entry->irq)
 			continue;
-		nvec = 1 << entry->msi_attrib.multiple;
+		if (entry->nvec)
+			nvec = entry->nvec;
+		else
+			nvec = 1 << entry->msi_attrib.multiple;
 		for (i = 0; i < nvec; i++)
 			BUG_ON(irq_has_action(entry->irq + i));
 	}
diff --git a/include/linux/msi.h b/include/linux/msi.h
index ce93a34..6f4dfba 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -35,6 +35,7 @@ struct msi_desc {
 
 	u32 masked;			/* mask bits */
 	unsigned int irq;
+	unsigned int nvec;
 	struct list_head list;
 
 	union {
-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 -tip 3/5] x86, MSI: Minor readability fixes
  2012-09-03  9:16 [PATCH v2 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs Alexander Gordeev
  2012-09-03  9:17 ` [PATCH v2 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping Alexander Gordeev
  2012-09-03  9:18 ` [PATCH v2 -tip 2/5] x86, MSI: Allocate as many multiple IRQs as requested Alexander Gordeev
@ 2012-09-03  9:19 ` Alexander Gordeev
  2012-09-03  9:19 ` [PATCH v2 -tip 4/5] PCI, MSI: Enable multiple MSIs with pci_enable_msi_block_auto() Alexander Gordeev
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Alexander Gordeev @ 2012-09-03  9:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Thomas Gleixner, Bjorn Helgaas, Suresh Siddha,
	Yinghai Lu, Jeff Garzik, Matthew Wilcox, x86, linux-pci,
	linux-ide

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 arch/x86/kernel/apic/io_apic.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index fbf3916..3c29b00 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3151,7 +3151,7 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq,
 
 	if (irq_remapped(cfg)) {
 		compose_remapped_msi_msg(pdev, irq, dest, msg, hpet_id);
-		return err;
+		return 0;
 	}
 
 	if (x2apic_enabled())
@@ -3178,7 +3178,7 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq,
 			MSI_DATA_DELIVERY_LOWPRI) |
 		MSI_DATA_VECTOR(cfg->vector);
 
-	return err;
+	return 0;
 }
 
 static int
@@ -3260,7 +3260,7 @@ int setup_msix_irqs(struct pci_dev *dev, int nvec)
 	list_for_each_entry(msidesc, &dev->msi_list, list) {
 		irq = create_irq_nr(irq_want, node);
 		if (irq == 0)
-			return -1;
+			return -ENOSPC;
 		irq_want = irq + 1;
 		if (!irq_remapping_enabled)
 			goto no_ir;
-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 -tip 4/5] PCI, MSI: Enable multiple MSIs with pci_enable_msi_block_auto()
  2012-09-03  9:16 [PATCH v2 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs Alexander Gordeev
                   ` (2 preceding siblings ...)
  2012-09-03  9:19 ` [PATCH v2 -tip 3/5] x86, MSI: Minor readability fixes Alexander Gordeev
@ 2012-09-03  9:19 ` Alexander Gordeev
  2012-09-07 20:53   ` Bjorn Helgaas
  2012-09-03  9:20 ` [PATCH v2 -tip 5/5] AHCI: Support multiple MSIs Alexander Gordeev
  2012-09-26 12:38 ` [PATCH v2 -tip 0/5] x86, MSI, " Alexander Gordeev
  5 siblings, 1 reply; 12+ messages in thread
From: Alexander Gordeev @ 2012-09-03  9:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Thomas Gleixner, Bjorn Helgaas, Suresh Siddha,
	Yinghai Lu, Jeff Garzik, Matthew Wilcox, x86, linux-pci,
	linux-ide

The new function pci_enable_msi_block_auto() tries to allocate maximum
possible number of MSIs up to the number the device supports. It
generalizes a pattern when pci_enable_msi_block() is contiguously called
until it succeeds or fails.

Opposite to pci_enable_msi_block() which takes the number of MSIs to
allocate as a input parameter, pci_enable_msi_block_auto() could be used
by device drivers to obtain the number of assigned MSIs and the number
of MSIs the device supports.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 Documentation/PCI/MSI-HOWTO.txt |   37 ++++++++++++++++++++++++++++++++-----
 drivers/pci/msi.c               |   26 ++++++++++++++++++++++++++
 include/linux/pci.h             |    7 +++++++
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index 53e6fca..a091780 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -127,15 +127,42 @@ on the number of vectors that can be allocated; pci_enable_msi_block()
 returns as soon as it finds any constraint that doesn't allow the
 call to succeed.
 
-4.2.3 pci_disable_msi
+4.2.3 pci_enable_msi_block_auto
+
+int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *count)
+
+This variation on pci_enable_msi() call allows a device driver to request
+the maximum possible number of MSIs.  The MSI specification only allows
+interrupts to be allocated in powers of two, up to a maximum of 2^5 (32).
+
+If this function returns a positive number, it indicates that it has
+succeeded and the returned value is the number of allocated interrupts. In
+this case, the function enables MSI on this device and updates dev->irq to
+be the lowest of the new interrupts assigned to it.  The other interrupts
+assigned to the device are in the range dev->irq to dev->irq + returned
+value - 1.
+
+If this function returns a negative number, it indicates an error and
+the driver should not attempt to request any more MSI interrupts for
+this device.
+
+If the device driver needs to know the number of interrupts the device
+supports it can pass the pointer count where that number is stored. The
+device driver must decide what action to take if pci_enable_msi_block_auto()
+succeeds, but returns a value less than the number of interrupts supported.
+If the device driver does not need to know the number of interrupts
+supported, it can set the pointer count to NULL.
+
+4.2.4 pci_disable_msi
 
 void pci_disable_msi(struct pci_dev *dev)
 
 This function should be used to undo the effect of pci_enable_msi() or
-pci_enable_msi_block().  Calling it restores dev->irq to the pin-based
-interrupt number and frees the previously allocated message signaled
-interrupt(s).  The interrupt may subsequently be assigned to another
-device, so drivers should not cache the value of dev->irq.
+pci_enable_msi_block() or pci_enable_msi_block_auto().  Calling it restores
+dev->irq to the pin-based interrupt number and frees the previously
+allocated message signaled interrupt(s).  The interrupt may subsequently be
+assigned to another device, so drivers should not cache the value of
+dev->irq.
 
 Before calling this function, a device driver must always call free_irq()
 on any interrupt for which it previously called request_irq().
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index f0752d1..690b268 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -845,6 +845,32 @@ int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
 }
 EXPORT_SYMBOL(pci_enable_msi_block);
 
+int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
+{
+	int ret, pos, nvec;
+	u16 msgctl;
+
+	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
+	if (!pos)
+		return -EINVAL;
+
+	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+	ret = 1 << ((msgctl & PCI_MSI_FLAGS_QMASK) >> 1);
+
+	if (maxvec)
+		*maxvec = ret;
+
+	do {
+		nvec = ret;
+		ret = pci_enable_msi_block(dev, nvec);
+	} while (ret > 0);
+
+	if (ret < 0)
+		return ret;
+	return nvec;
+}
+EXPORT_SYMBOL(pci_enable_msi_block_auto);
+
 void pci_msi_shutdown(struct pci_dev *dev)
 {
 	struct msi_desc *desc;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5faa831..b8a9454 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1070,6 +1070,12 @@ static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
 	return -1;
 }
 
+static inline int
+pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
+{
+	return -1;
+}
+
 static inline void pci_msi_shutdown(struct pci_dev *dev)
 { }
 static inline void pci_disable_msi(struct pci_dev *dev)
@@ -1101,6 +1107,7 @@ static inline int pci_msi_enabled(void)
 }
 #else
 extern int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
+extern int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec);
 extern void pci_msi_shutdown(struct pci_dev *dev);
 extern void pci_disable_msi(struct pci_dev *dev);
 extern int pci_msix_table_size(struct pci_dev *dev);
-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 -tip 5/5] AHCI: Support multiple MSIs
  2012-09-03  9:16 [PATCH v2 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs Alexander Gordeev
                   ` (3 preceding siblings ...)
  2012-09-03  9:19 ` [PATCH v2 -tip 4/5] PCI, MSI: Enable multiple MSIs with pci_enable_msi_block_auto() Alexander Gordeev
@ 2012-09-03  9:20 ` Alexander Gordeev
  2012-09-27  4:59   ` Jeff Garzik
  2012-09-26 12:38 ` [PATCH v2 -tip 0/5] x86, MSI, " Alexander Gordeev
  5 siblings, 1 reply; 12+ messages in thread
From: Alexander Gordeev @ 2012-09-03  9:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Thomas Gleixner, Bjorn Helgaas, Suresh Siddha,
	Yinghai Lu, Jeff Garzik, Matthew Wilcox, x86, linux-pci,
	linux-ide

Take advantage of multiple MSIs implementation on x86 - on systems with
IRQ remapping AHCI ports not only get assigned separate MSI vectors -
but also separate IRQs. As result, interrupts generated by different
ports could be serviced on different CPUs rather than on a single one.

In cases when number of allocated MSIs is less than requested the Sharing
Last MSI mode does not get used, no matter implemented in hardware or not.
Instead, the driver assumes the advantage of multiple MSIs is negated and
falls back to the single MSI mode as if MRSM bit was set (some Intel chips
implement this strategy anyway - MRSM bit gets set even if the number of
allocated MSIs exceeds the number of implemented ports).

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 drivers/ata/ahci.c    |   14 ++--
 drivers/ata/ahci.h    |    5 +
 drivers/ata/libahci.c |  211 +++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 218 insertions(+), 12 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 50d5dea..0e7525e 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1057,7 +1057,7 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	struct device *dev = &pdev->dev;
 	struct ahci_host_priv *hpriv;
 	struct ata_host *host;
-	int n_ports, i, rc;
+	int n_ports, n_msis, i, rc;
 	int ahci_pci_bar = AHCI_PCI_BAR_STANDARD;
 
 	VPRINTK("ENTER\n");
@@ -1142,11 +1142,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (ahci_sb600_enable_64bit(pdev))
 		hpriv->flags &= ~AHCI_HFLAG_32BIT_ONLY;
 
-	if ((hpriv->flags & AHCI_HFLAG_NO_MSI) || pci_enable_msi(pdev))
-		pci_intx(pdev, 1);
-
 	hpriv->mmio = pcim_iomap_table(pdev)[ahci_pci_bar];
 
+	n_msis = ahci_init_interrupts(pdev, hpriv);
+
 	/* save initial config */
 	ahci_pci_save_initial_config(pdev, hpriv);
 
@@ -1242,8 +1241,11 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	ahci_pci_print_info(host);
 
 	pci_set_master(pdev);
-	return ata_host_activate(host, pdev->irq, ahci_interrupt, IRQF_SHARED,
-				 &ahci_sht);
+
+	if (!pdev->irq || (n_msis < 2))
+		return ata_host_activate(host, pdev->irq, ahci_interrupt,
+					 IRQF_SHARED, &ahci_sht);
+	return ahci_host_activate(host, pdev->irq, n_msis, &ahci_sht);
 }
 
 module_pci_driver(ahci_pci_driver);
diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h
index 57eb1c2..aa32efb 100644
--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -282,6 +282,8 @@ struct ahci_port_priv {
 	unsigned int		ncq_saw_d2h:1;
 	unsigned int		ncq_saw_dmas:1;
 	unsigned int		ncq_saw_sdb:1;
+	u32			intr_status;	/* interrupts to handle */
+	spinlock_t		lock;
 	u32 			intr_mask;	/* interrupts to enable */
 	bool			fbs_supported;	/* set iff FBS is supported */
 	bool			fbs_enabled;	/* set iff FBS is enabled */
@@ -329,6 +331,7 @@ void ahci_save_initial_config(struct device *dev,
 			      unsigned int mask_port_map);
 void ahci_init_controller(struct ata_host *host);
 int ahci_reset_controller(struct ata_host *host);
+int ahci_init_interrupts(struct pci_dev *pdev, struct ahci_host_priv *hpriv);
 
 int ahci_do_softreset(struct ata_link *link, unsigned int *class,
 		      int pmp, unsigned long deadline,
@@ -344,6 +347,8 @@ void ahci_set_em_messages(struct ahci_host_priv *hpriv,
 int ahci_reset_em(struct ata_host *host);
 irqreturn_t ahci_interrupt(int irq, void *dev_instance);
 void ahci_print_info(struct ata_host *host, const char *scc_s);
+int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis,
+		       struct scsi_host_template *sht);
 
 static inline void __iomem *__ahci_port_base(struct ata_host *host,
 					     unsigned int port_no)
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 555c07a..925c826 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -39,6 +39,7 @@
 #include <linux/blkdev.h>
 #include <linux/delay.h>
 #include <linux/interrupt.h>
+#include <linux/pci.h>
 #include <linux/dma-mapping.h>
 #include <linux/device.h>
 #include <scsi/scsi_host.h>
@@ -1128,6 +1129,29 @@ void ahci_init_controller(struct ata_host *host)
 }
 EXPORT_SYMBOL_GPL(ahci_init_controller);
 
+int ahci_init_interrupts(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
+{
+	int rc;
+	unsigned int maxvec;
+
+	if (!(hpriv->flags & AHCI_HFLAG_NO_MSI)) {
+		rc = pci_enable_msi_block_auto(pdev, &maxvec);
+		if (rc > 0) {
+			if ((rc == maxvec) || (rc == 1))
+				return rc;
+			/* assume that advantage of multipe MSIs is negated,
+			 * so fallback to single MSI mode to save resources */
+			pci_disable_msi(pdev);
+			if (!pci_enable_msi(pdev))
+				return 1;
+		}
+	}
+
+	pci_intx(pdev, 1);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ahci_init_interrupts);
+
 static void ahci_dev_config(struct ata_device *dev)
 {
 	struct ahci_host_priv *hpriv = dev->link->ap->host->private_data;
@@ -1639,19 +1663,16 @@ static void ahci_error_intr(struct ata_port *ap, u32 irq_stat)
 		ata_port_abort(ap);
 }
 
-static void ahci_port_intr(struct ata_port *ap)
+static void ahci_handle_port_interrupt(struct ata_port *ap,
+				       void __iomem *port_mmio, u32 status)
 {
-	void __iomem *port_mmio = ahci_port_base(ap);
 	struct ata_eh_info *ehi = &ap->link.eh_info;
 	struct ahci_port_priv *pp = ap->private_data;
 	struct ahci_host_priv *hpriv = ap->host->private_data;
 	int resetting = !!(ap->pflags & ATA_PFLAG_RESETTING);
-	u32 status, qc_active = 0;
+	u32 qc_active = 0;
 	int rc;
 
-	status = readl(port_mmio + PORT_IRQ_STAT);
-	writel(status, port_mmio + PORT_IRQ_STAT);
-
 	/* ignore BAD_PMP while resetting */
 	if (unlikely(resetting))
 		status &= ~PORT_IRQ_BAD_PMP;
@@ -1727,6 +1748,105 @@ static void ahci_port_intr(struct ata_port *ap)
 	}
 }
 
+static inline void ahci_port_intr(struct ata_port *ap)
+{
+	void __iomem *port_mmio = ahci_port_base(ap);
+	u32 status;
+
+	status = readl(port_mmio + PORT_IRQ_STAT);
+	writel(status, port_mmio + PORT_IRQ_STAT);
+
+	ahci_handle_port_interrupt(ap, port_mmio, status);
+}
+
+irqreturn_t ahci_thread_fn(int irq, void *dev_instance)
+{
+	struct ata_port *ap = dev_instance;
+	struct ahci_port_priv *pp = ap->private_data;
+	void __iomem *port_mmio = ahci_port_base(ap);
+	unsigned long flags;
+	u32 status;
+
+	spin_lock_irqsave(&ap->host->lock, flags);
+	status = pp->intr_status;
+	if (status)
+		pp->intr_status = 0;
+	spin_unlock_irqrestore(&ap->host->lock, flags);
+
+	spin_lock_bh(ap->lock);
+	ahci_handle_port_interrupt(ap, port_mmio, status);
+	spin_unlock_bh(ap->lock);
+
+	return IRQ_HANDLED;
+}
+
+static inline void ahci_hw_port_interrupt(struct ata_port *ap)
+{
+	void __iomem *port_mmio = ahci_port_base(ap);
+	struct ahci_port_priv *pp = ap->private_data;
+	u32 status;
+
+	status = readl(port_mmio + PORT_IRQ_STAT);
+	writel(status, port_mmio + PORT_IRQ_STAT);
+
+	pp->intr_status |= status;
+}
+
+irqreturn_t ahci_hw_interrupt(int irq, void *dev_instance)
+{
+	struct ata_port *ap_this = dev_instance;
+	struct ahci_port_priv *pp = ap_this->private_data;
+	struct ata_host *host = ap_this->host;
+	struct ahci_host_priv *hpriv = host->private_data;
+	void __iomem *mmio = hpriv->mmio;
+	unsigned int i;
+	u32 irq_stat, irq_masked;
+
+	VPRINTK("ENTER\n");
+
+	spin_lock(&host->lock);
+
+	irq_stat = readl(mmio + HOST_IRQ_STAT);
+
+	if (!irq_stat) {
+		u32 status = pp->intr_status;
+
+		spin_unlock(&host->lock);
+
+		VPRINTK("EXIT\n");
+
+		return status ? IRQ_WAKE_THREAD : IRQ_NONE;
+	}
+
+	irq_masked = irq_stat & hpriv->port_map;
+
+	for (i = 0; i < host->n_ports; i++) {
+		struct ata_port *ap;
+
+		if (!(irq_masked & (1 << i)))
+			continue;
+
+		ap = host->ports[i];
+		if (ap) {
+			ahci_hw_port_interrupt(ap);
+			VPRINTK("port %u\n", i);
+		} else {
+			VPRINTK("port %u (no irq)\n", i);
+			if (ata_ratelimit())
+				dev_warn(host->dev,
+					 "interrupt on disabled port %u\n", i);
+		}
+	}
+
+	writel(irq_stat, mmio + HOST_IRQ_STAT);
+
+	spin_unlock(&host->lock);
+
+	VPRINTK("EXIT\n");
+
+	return IRQ_WAKE_THREAD;
+}
+
 irqreturn_t ahci_interrupt(int irq, void *dev_instance)
 {
 	struct ata_host *host = dev_instance;
@@ -2216,6 +2336,85 @@ void ahci_set_em_messages(struct ahci_host_priv *hpriv,
 }
 EXPORT_SYMBOL_GPL(ahci_set_em_messages);
 
+static inline void ahci_set_per_port_locking(struct ata_host *host)
+{
+	struct ata_port *ap;
+	struct ahci_port_priv *pp;
+	int i;
+
+	for (i = 0; i < host->n_ports; i++) {
+		ap = host->ports[i];
+		pp = ap->private_data;
+
+		spin_lock_init(&pp->lock);
+		ap->lock = &pp->lock;
+	}
+}
+
+/**
+ *	ahci_host_activate - start AHCI host, request IRQs and register it
+ *	@host: target ATA host
+ *	@irq: base IRQ number to request
+ *	@n_msis: number of MSIs allocated for this host
+ *	@irq_handler: irq_handler used when requesting IRQs
+ *	@irq_flags: irq_flags used when requesting IRQs
+ *	@sht: scsi_host_template to use when registering the host
+ *
+ *	Similar to ata_host_activate, but requests IRQs according to AHCI-1.1
+ *	when multiple MSIs were allocated. That is one MSI per port, starting
+ *	from @irq. Also, when number of MSIs is less than number of ports,
+ *	last MSI is shared among those ports which did not get their personal
+ *	MSI vector - see 10.6.2.3.1.
+ *
+ *	LOCKING:
+ *	Inherited from calling layer (may sleep).
+ *
+ *	RETURNS:
+ *	0 on success, -errno otherwise.
+ */
+int ahci_host_activate(struct ata_host *host, int irq, unsigned int n_msis,
+		       struct scsi_host_template *sht)
+{
+	int i, rc;
+	unsigned int n_irqs;
+
+	rc = ata_host_start(host);
+	if (rc)
+		return rc;
+
+	ahci_set_per_port_locking(host);
+
+	n_irqs = min(host->n_ports, n_msis);
+
+	for (i = 0; i < n_irqs; i++) {
+		rc = devm_request_threaded_irq(host->dev,
+			irq + i, ahci_hw_interrupt, ahci_thread_fn, IRQF_SHARED,
+			dev_driver_string(host->dev), host->ports[i]);
+		if (rc)
+			goto out_free_irqs;
+	}
+
+	for (i = 0; i < n_irqs; i++)
+		ata_port_desc(host->ports[i], "irq %d", irq + i);
+	for (; i < host->n_ports; i++)
+		ata_port_desc(host->ports[i], "irq %d", n_irqs - 1);
+
+	rc = ata_host_register(host, sht);
+	if (rc)
+		goto out_free_all_irqs;
+
+	return 0;
+
+out_free_all_irqs:
+	i = n_irqs;
+out_free_irqs:
+	for (; i; i--)
+		devm_free_irq(host->dev, irq + i - 1, host);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(ahci_host_activate);
+
 MODULE_AUTHOR("Jeff Garzik");
 MODULE_DESCRIPTION("Common AHCI SATA low-level routines");
 MODULE_LICENSE("GPL");
-- 
1.7.7.6


-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping
  2012-09-03  9:17 ` [PATCH v2 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping Alexander Gordeev
@ 2012-09-03 18:53   ` Yinghai Lu
  2012-09-04  9:32     ` Alexander Gordeev
  0 siblings, 1 reply; 12+ messages in thread
From: Yinghai Lu @ 2012-09-03 18:53 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, Bjorn Helgaas,
	Suresh Siddha, Jeff Garzik, Matthew Wilcox, x86, linux-pci,
	linux-ide

On Mon, Sep 3, 2012 at 2:17 AM, Alexander Gordeev <agordeev@redhat.com> wrote:
> The MSI specification has several constraints in comparison with MSI-X,
> most notable of them is the inability to configure MSIs independently.
> As a result, it is impossible to dispatch interrupts from different
> queues to different CPUs. This is largely devalues the support of
> multiple MSIs in SMP systems.
>
> Also, a necessity to allocate a contiguous block of vector numbers for
> devices capable of multiple MSIs might cause a considerable pressure on
> x86 interrupt vector allocator and could lead to fragmentation of the
> interrupt vectors space.
>
> This patch overcomes both drawbacks in presense of IRQ remapping and
> lets devices take advantage of multiple queues and per-IRQ affinity
> assignments.
>
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---
>  arch/x86/kernel/apic/io_apic.c |  166 +++++++++++++++++++++++++++++++++++++--
>  include/linux/irq.h            |    6 ++
>  kernel/irq/chip.c              |   30 +++++--
>  kernel/irq/irqdesc.c           |   31 ++++++++
>  4 files changed, 216 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index c265593..5fd2577 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -305,6 +305,11 @@ static int alloc_irq_from(unsigned int from, int node)
>         return irq_alloc_desc_from(from, node);
>  }
>
> +static int alloc_irqs_from(unsigned int from, unsigned int count, int node)
> +{
> +       return irq_alloc_descs_from(from, count, node);
> +}
> +
>  static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
>  {
>         free_irq_cfg(at, cfg);
> @@ -3039,6 +3044,55 @@ int create_irq(void)
>         return irq;
>  }
>
> +unsigned int create_irqs(unsigned int from, unsigned int count, int node)
> +{
> +       struct irq_cfg **cfg;
> +       unsigned long flags;
> +       int irq, i;
> +
> +       if (from < nr_irqs_gsi)
> +               from = nr_irqs_gsi;
> +
> +       cfg = kzalloc_node(count * sizeof(cfg[0]), GFP_KERNEL, node);
> +       if (!cfg)
> +               return 0;
> +
> +       irq = alloc_irqs_from(from, count, node);
> +       if (irq < 0)
> +               goto out_cfgs;
> +
> +       for (i = 0; i < count; i++) {
> +               cfg[i] = alloc_irq_cfg(irq + i, node);
> +               if (!cfg[i])
> +                       goto out_irqs;
> +       }
> +
> +       raw_spin_lock_irqsave(&vector_lock, flags);
> +       for (i = 0; i < count; i++)
> +               if (__assign_irq_vector(irq + i, cfg[i], apic->target_cpus()))
> +                       goto out_vecs;
> +       raw_spin_unlock_irqrestore(&vector_lock, flags);
> +
> +       for (i = 0; i < count; i++) {
> +               irq_set_chip_data(irq + i, cfg[i]);
> +               irq_clear_status_flags(irq + i, IRQ_NOREQUEST);
> +       }
> +
> +       kfree(cfg);
> +       return irq;
> +
> +out_vecs:
> +       for (; i; i--)
> +               __clear_irq_vector(irq + i - 1, cfg[i - 1]);
> +       raw_spin_unlock_irqrestore(&vector_lock, flags);
> +out_irqs:
> +       for (i = 0; i < count; i++)
> +               free_irq_at(irq + i, cfg[i]);
> +out_cfgs:
> +       kfree(cfg);
> +       return 0;
> +}
> +

You may update create_irq_nr to be __create_irq_nr, and it could take
extra count.

and later have create_irq_nr to be __create_irq_nr(,1,)
and create_irqs to be __create_irq_nr(,count,)
....

BTW, in short, how much performance benefits for adding 500 lines code?

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping
  2012-09-03 18:53   ` Yinghai Lu
@ 2012-09-04  9:32     ` Alexander Gordeev
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Gordeev @ 2012-09-04  9:32 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, Bjorn Helgaas,
	Suresh Siddha, Jeff Garzik, Matthew Wilcox, x86, linux-pci,
	linux-ide

On Mon, Sep 03, 2012 at 11:53:39AM -0700, Yinghai Lu wrote:
> On Mon, Sep 3, 2012 at 2:17 AM, Alexander Gordeev <agordeev@redhat.com> wrote:
> You may update create_irq_nr to be __create_irq_nr, and it could take
> extra count.
> 
> and later have create_irq_nr to be __create_irq_nr(,1,)
> and create_irqs to be __create_irq_nr(,count,)
> ....

Indeed. Will do.

> BTW, in short, how much performance benefits for adding 500 lines code?

Unfortunatelly, I do not have a short answer here. There are three types
of performance this series deals with - I'll try to summarize:

- devices - 3 SATA HDDs generate roughly one interrupt every 273 us while
  it get handled in less than 5 us. So there is/could be no increase here;

- the hardware context interrupt handler - its performance dropped 2.5 times
  (little bit more in fact) at the expense of increase of 1.3 times in
  overall interrupt handling time (hardware context + threaded context);

- overall system performance - I *assume* it should increase, because:
  (a) AHCI interrupt handlers keep local interrupts disabled 2.5 times less
  (b) separate AHCI IRQs become subjects of IRQ balancing
  (c) threaded handlers are per-device, per-CPU (well, up to irqbalanced)
      and executed with local interrupts enabled;

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -tip 4/5] PCI, MSI: Enable multiple MSIs with pci_enable_msi_block_auto()
  2012-09-03  9:19 ` [PATCH v2 -tip 4/5] PCI, MSI: Enable multiple MSIs with pci_enable_msi_block_auto() Alexander Gordeev
@ 2012-09-07 20:53   ` Bjorn Helgaas
  0 siblings, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2012-09-07 20:53 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, Suresh Siddha,
	Yinghai Lu, Jeff Garzik, Matthew Wilcox, x86, linux-pci,
	linux-ide

On Mon, Sep 3, 2012 at 2:19 AM, Alexander Gordeev <agordeev@redhat.com> wrote:
> The new function pci_enable_msi_block_auto() tries to allocate maximum
> possible number of MSIs up to the number the device supports. It
> generalizes a pattern when pci_enable_msi_block() is contiguously called
> until it succeeds or fails.
>
> Opposite to pci_enable_msi_block() which takes the number of MSIs to
> allocate as a input parameter, pci_enable_msi_block_auto() could be used
> by device drivers to obtain the number of assigned MSIs and the number
> of MSIs the device supports.
>
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

I haven't looked at the other patches in this series, but if the
x86/IRQ folks like them, I'm OK with this part.

Since the series is mostly non-PCI, I think it makes the most sense to
keep it all together and merge it through some x86 or IRQ tree,
including this PCI bit.

> ---
>  Documentation/PCI/MSI-HOWTO.txt |   37 ++++++++++++++++++++++++++++++++-----
>  drivers/pci/msi.c               |   26 ++++++++++++++++++++++++++
>  include/linux/pci.h             |    7 +++++++
>  3 files changed, 65 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
> index 53e6fca..a091780 100644
> --- a/Documentation/PCI/MSI-HOWTO.txt
> +++ b/Documentation/PCI/MSI-HOWTO.txt
> @@ -127,15 +127,42 @@ on the number of vectors that can be allocated; pci_enable_msi_block()
>  returns as soon as it finds any constraint that doesn't allow the
>  call to succeed.
>
> -4.2.3 pci_disable_msi
> +4.2.3 pci_enable_msi_block_auto
> +
> +int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *count)
> +
> +This variation on pci_enable_msi() call allows a device driver to request
> +the maximum possible number of MSIs.  The MSI specification only allows
> +interrupts to be allocated in powers of two, up to a maximum of 2^5 (32).
> +
> +If this function returns a positive number, it indicates that it has
> +succeeded and the returned value is the number of allocated interrupts. In
> +this case, the function enables MSI on this device and updates dev->irq to
> +be the lowest of the new interrupts assigned to it.  The other interrupts
> +assigned to the device are in the range dev->irq to dev->irq + returned
> +value - 1.
> +
> +If this function returns a negative number, it indicates an error and
> +the driver should not attempt to request any more MSI interrupts for
> +this device.
> +
> +If the device driver needs to know the number of interrupts the device
> +supports it can pass the pointer count where that number is stored. The
> +device driver must decide what action to take if pci_enable_msi_block_auto()
> +succeeds, but returns a value less than the number of interrupts supported.
> +If the device driver does not need to know the number of interrupts
> +supported, it can set the pointer count to NULL.
> +
> +4.2.4 pci_disable_msi
>
>  void pci_disable_msi(struct pci_dev *dev)
>
>  This function should be used to undo the effect of pci_enable_msi() or
> -pci_enable_msi_block().  Calling it restores dev->irq to the pin-based
> -interrupt number and frees the previously allocated message signaled
> -interrupt(s).  The interrupt may subsequently be assigned to another
> -device, so drivers should not cache the value of dev->irq.
> +pci_enable_msi_block() or pci_enable_msi_block_auto().  Calling it restores
> +dev->irq to the pin-based interrupt number and frees the previously
> +allocated message signaled interrupt(s).  The interrupt may subsequently be
> +assigned to another device, so drivers should not cache the value of
> +dev->irq.
>
>  Before calling this function, a device driver must always call free_irq()
>  on any interrupt for which it previously called request_irq().
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index f0752d1..690b268 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -845,6 +845,32 @@ int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
>  }
>  EXPORT_SYMBOL(pci_enable_msi_block);
>
> +int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
> +{
> +       int ret, pos, nvec;
> +       u16 msgctl;
> +
> +       pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
> +       if (!pos)
> +               return -EINVAL;
> +
> +       pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
> +       ret = 1 << ((msgctl & PCI_MSI_FLAGS_QMASK) >> 1);
> +
> +       if (maxvec)
> +               *maxvec = ret;
> +
> +       do {
> +               nvec = ret;
> +               ret = pci_enable_msi_block(dev, nvec);
> +       } while (ret > 0);
> +
> +       if (ret < 0)
> +               return ret;
> +       return nvec;
> +}
> +EXPORT_SYMBOL(pci_enable_msi_block_auto);
> +
>  void pci_msi_shutdown(struct pci_dev *dev)
>  {
>         struct msi_desc *desc;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 5faa831..b8a9454 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1070,6 +1070,12 @@ static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
>         return -1;
>  }
>
> +static inline int
> +pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec)
> +{
> +       return -1;
> +}
> +
>  static inline void pci_msi_shutdown(struct pci_dev *dev)
>  { }
>  static inline void pci_disable_msi(struct pci_dev *dev)
> @@ -1101,6 +1107,7 @@ static inline int pci_msi_enabled(void)
>  }
>  #else
>  extern int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
> +extern int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec);
>  extern void pci_msi_shutdown(struct pci_dev *dev);
>  extern void pci_disable_msi(struct pci_dev *dev);
>  extern int pci_msix_table_size(struct pci_dev *dev);
> --
> 1.7.7.6
>
>
> --
> Regards,
> Alexander Gordeev
> agordeev@redhat.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs
  2012-09-03  9:16 [PATCH v2 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs Alexander Gordeev
                   ` (4 preceding siblings ...)
  2012-09-03  9:20 ` [PATCH v2 -tip 5/5] AHCI: Support multiple MSIs Alexander Gordeev
@ 2012-09-26 12:38 ` Alexander Gordeev
  2012-09-27  4:03   ` Ingo Molnar
  5 siblings, 1 reply; 12+ messages in thread
From: Alexander Gordeev @ 2012-09-26 12:38 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Ingo Molnar, Thomas Gleixner, Bjorn Helgaas, Suresh Siddha,
	Yinghai Lu, Jeff Garzik, Matthew Wilcox, x86, linux-pci,
	linux-ide, linux-kernel

On Mon, Sep 03, 2012 at 11:16:55AM +0200, Alexander Gordeev wrote:
>   5/5:	based on Jeff's comments:
> 
> 	- AHCI specific code moved from libata-core.c to libahci.c;
> 
> 	- the host-wide lock concern addressed - ahci_interrupt() handler
> 	  split in two parts: ahci_hw_interrupt() hardware context handler
> 	  and ahci_thread_fn() threaded handler;
> 
> 	- host->lock statistics attached;

Hi Jeff,

Would you please comment on the changes and numbers?

Thanks!

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs
  2012-09-26 12:38 ` [PATCH v2 -tip 0/5] x86, MSI, " Alexander Gordeev
@ 2012-09-27  4:03   ` Ingo Molnar
  0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2012-09-27  4:03 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Jeff Garzik, Ingo Molnar, Thomas Gleixner, Bjorn Helgaas,
	Suresh Siddha, Yinghai Lu, Matthew Wilcox, x86, linux-pci,
	linux-ide, linux-kernel


* Alexander Gordeev <agordeev@redhat.com> wrote:

> On Mon, Sep 03, 2012 at 11:16:55AM +0200, Alexander Gordeev wrote:
> >   5/5:	based on Jeff's comments:
> > 
> > 	- AHCI specific code moved from libata-core.c to libahci.c;
> > 
> > 	- the host-wide lock concern addressed - ahci_interrupt() handler
> > 	  split in two parts: ahci_hw_interrupt() hardware context handler
> > 	  and ahci_thread_fn() threaded handler;
> > 
> > 	- host->lock statistics attached;
> 
> Hi Jeff,
> 
> Would you please comment on the changes and numbers?

Yep, given the diffstat:

 Documentation/PCI/MSI-HOWTO.txt |   37 ++++++-
 arch/x86/kernel/apic/io_apic.c  |  170 +++++++++++++++++++++++++++++--
 drivers/ata/ahci.c              |   14 ++-
 drivers/ata/ahci.h              |    5 +
 drivers/ata/libahci.c           |  211 +++++++++++++++++++++++++++++++++++++-
 drivers/pci/msi.c               |   36 ++++++-
 include/linux/irq.h             |    6 +
 include/linux/msi.h             |    1 +
 include/linux/pci.h             |    7 ++
 kernel/irq/chip.c               |   30 ++++--
 kernel/irq/irqdesc.c            |   31 ++++++
 11 files changed, 509 insertions(+), 39 deletions(-)

We absolutely need a very happy and enthusiastic Acked-by from 
Jeff before we can pursue it :-)

Thaks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 -tip 5/5] AHCI: Support multiple MSIs
  2012-09-03  9:20 ` [PATCH v2 -tip 5/5] AHCI: Support multiple MSIs Alexander Gordeev
@ 2012-09-27  4:59   ` Jeff Garzik
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Garzik @ 2012-09-27  4:59 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, Bjorn Helgaas,
	Suresh Siddha, Yinghai Lu, Matthew Wilcox, x86, linux-pci,
	linux-ide

On 09/03/2012 05:20 AM, Alexander Gordeev wrote:
> Take advantage of multiple MSIs implementation on x86 - on systems with
> IRQ remapping AHCI ports not only get assigned separate MSI vectors -
> but also separate IRQs. As result, interrupts generated by different
> ports could be serviced on different CPUs rather than on a single one.
>
> In cases when number of allocated MSIs is less than requested the Sharing
> Last MSI mode does not get used, no matter implemented in hardware or not.
> Instead, the driver assumes the advantage of multiple MSIs is negated and
> falls back to the single MSI mode as if MRSM bit was set (some Intel chips
> implement this strategy anyway - MRSM bit gets set even if the number of
> allocated MSIs exceeds the number of implemented ports).
>
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---
>   drivers/ata/ahci.c    |   14 ++--
>   drivers/ata/ahci.h    |    5 +
>   drivers/ata/libahci.c |  211 +++++++++++++++++++++++++++++++++++++++++++++++--
>   3 files changed, 218 insertions(+), 12 deletions(-)

OK, round 2 :)

Reviewed the locking twice, and cannot find any problems there.  That 
was my main concern.

Other problems noted:

1) Including linux/pci.h into libahci.c should have been a hint ;p

libahci is entirely PCI-free, AHCI-specific code.  All code that is 
specific to PCI hosts goes into ahci.c.  Anything in libahci needs to be 
purely iomem read/write, standard AHCI DMA data structure manipulation, 
etc.  This is shared with ahci_platform.

So, move the PCI specific code back into ahci.c.  Thus, 
ahci_hw_interrupt appears like libahci material, but 
ahci_init_interrupts is not.

2) don't manually add new "static inline", let the compiler figure it out

3) spinlock init, such as in e.g. ahci_set_per_port_lock() should occur 
with the rest of ahci_port_priv initialization, inside ahci_port_start()

Otherwise it looks OK and ready for merging, after these updates.





^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-09-27  4:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-03  9:16 [PATCH v2 -tip 0/5] x86, MSI, AHCI: Support multiple MSIs Alexander Gordeev
2012-09-03  9:17 ` [PATCH v2 -tip 1/5] x86, MSI: Support multiple MSIs in presense of IRQ remapping Alexander Gordeev
2012-09-03 18:53   ` Yinghai Lu
2012-09-04  9:32     ` Alexander Gordeev
2012-09-03  9:18 ` [PATCH v2 -tip 2/5] x86, MSI: Allocate as many multiple IRQs as requested Alexander Gordeev
2012-09-03  9:19 ` [PATCH v2 -tip 3/5] x86, MSI: Minor readability fixes Alexander Gordeev
2012-09-03  9:19 ` [PATCH v2 -tip 4/5] PCI, MSI: Enable multiple MSIs with pci_enable_msi_block_auto() Alexander Gordeev
2012-09-07 20:53   ` Bjorn Helgaas
2012-09-03  9:20 ` [PATCH v2 -tip 5/5] AHCI: Support multiple MSIs Alexander Gordeev
2012-09-27  4:59   ` Jeff Garzik
2012-09-26 12:38 ` [PATCH v2 -tip 0/5] x86, MSI, " Alexander Gordeev
2012-09-27  4:03   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).