linux-kernel.vger.kernel.org archive mirror
* [Patch Part2 v4 00/31] Enable hierarchy irqdomain on x86 platforms
@ 2014-11-04 12:01 Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 01/31] irqdomain: Introduce new interfaces to support hierarchy irqdomains Jiang Liu
                   ` (32 more replies)
  0 siblings, 33 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger
  Cc: Jiang Liu, Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Joerg Roedel, Greg Kroah-Hartman, x86, linux-kernel, linux-pci,
	linux-acpi, linux-arm-kernel

We plan to restructure the x86 interrupt code around hierarchy irqdomain,
that is, to build irqdomains for the CPU vector, interrupt remapping unit,
IOAPIC, MSI, HPET, etc. and organize those irqdomains into a hierarchy.
Each irqdomain manages its corresponding interrupt controller and talks to
its parent interrupt controller through public irqdomain interfaces. We
also support stacked irq_chip based on hierarchy irqdomain. With hierarchy
irqdomain and stacked irq_chip, the x86 interrupt architecture should
become much clearer and easier to maintain. It may also help the ARM
interrupt management architecture.
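
The layering described above can be pictured with a small userspace
model. This is only a sketch, not kernel code: struct model_domain,
model_depth() and model_path() are hypothetical names made up for
illustration. The only thing taken from the series is the idea that each
domain points at its parent, with the CPU vector domain as the root.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct irq_domain; not kernel code. */
struct model_domain {
	const char *name;
	struct model_domain *parent;	/* NULL for the root (CPU vector) domain */
};

/* Count the domains from @d up to the root, inclusive. */
static int model_depth(const struct model_domain *d)
{
	int depth = 0;

	for (; d; d = d->parent)
		depth++;
	return depth;
}

/* Write the chain from @d up to the root into @buf as "A -> B -> C". */
static void model_path(const struct model_domain *d, char *buf, size_t len)
{
	assert(buf && len > 0);

	buf[0] = '\0';
	for (; d; d = d->parent) {
		/* strncat() always NUL-terminates; leave room for that byte */
		strncat(buf, d->name, len - strlen(buf) - 1);
		if (d->parent)
			strncat(buf, " -> ", len - strlen(buf) - 1);
	}
}
```

For a chain built as IOAPIC below interrupt remapping below CPU vector,
model_path() called on the IOAPIC domain lists the three names in
child-to-root order, which is the direction in which a child domain
passes requests up to its parent.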

This is the second patch set to enable hierarchy irqdomain support on
x86 platforms. It depends on the first part at:
https://lkml.org/lkml/2014/10/27/122
This patch set is also available at:
https://github.com/jiangliu/linux.git irqdomain/p2v4

A third patch set will convert the IOAPIC driver to support hierarchy
irqdomain and clean up the code.

Patches 1-5 enhance the irqdomain and irq core to support hierarchy
irqdomain and stacked irqchip.
Patches 6-12 implement an irqdomain to manage CPU interrupt vectors; it
is the root irqdomain for x86 platforms.
Patches 13-16 convert the Intel and AMD interrupt remapping drivers to
support hierarchy irqdomain.
Patches 17-23 convert HPET and MSI to support hierarchy irqdomain.
Patches 24-27 clean up unused code in the x86 arch and interrupt
remapping drivers.
Patches 28-31 convert DMAR, HTIRQ and UV to support hierarchy irqdomain.

We have tested this patch set on Intel 32-bit and 64-bit systems, and it
also passes Fengguang's 0day tests. But help is needed with testing:
1) AMD interrupt remapping
2) AMD HT_IRQ
3) UV platform

V3->V4:
1) Simplify IRQ remapping interfaces
2) Hide all IRQ remapping logic from MSI/HPET drivers
3) Move most MSI irqdomain code to public drivers/pci/msi.c so it can
   be reused
4) Improve common PCI MSI code
5) Rebase to tip/x86/apic
V2->V3:
1) Fix bugs in handling OF irqdomain
2) Add documentation
3) Rebase to v3.18-rc2
V1->V2:
1) Add hierarchy irqdomain support for DMAR IRQ and UV IRQ.
2) Fix bugs reported by Joe C.
3) Address all review comments from Thomas
4) Fix a bug found during tests
5) Fix errors and warnings found by 0day tests

Jiang Liu (30):
  irqdomain: Introduce new interfaces to support hierarchy irqdomains
  genirq: Introduce helper functions to support stacked irq_chip
  genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked
    irqchip
  genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip
  x86, irq: Save destination CPU ID in irq_cfg
  x86, irq: Use hierarchy irqdomain to manage CPU interrupt vectors
  x86, hpet: Use new irqdomain interfaces to allocate/free IRQ
  x86, MSI: Use new irqdomain interfaces to allocate/free IRQ
  x86, uv: Use new irqdomain interfaces to allocate/free IRQ
  x86, htirq: Use new irqdomain interfaces to allocate/free IRQ
  x86, dmar: Use new irqdomain interfaces to allocate/free IRQ
  x86: irq_remapping: Introduce new interfaces to support hierarchy
    irqdomain
  iommu/vt-d: Change prototypes to prepare for enabling hierarchy
    irqdomain
  iommu/vt-d: Enhance Intel IR driver to support hierarchy irqdomain
  iommu/amd: Enhance AMD IR driver to support hierarchy irqdomain
  x86, hpet: Enhance HPET IRQ to support hierarchy irqdomain
  PCI/MSI, trivial: Fix minor syntax issues according to coding styles
  PCI/MSI: Simplify PCI MSI code by initializing msi_desc.nvec_used
    earlier
  PCI/MSI: Kill redundant calling for irq_set_msi_desc() for MSIx
    interrupts
  PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  x86, PCI, MSI: Use hierarchy irqdomain to manage MSI interrupts
  x86, irq: Directly call native_compose_msi_msg() for DMAR IRQ
  iommu/vt-d: Clean up unused MSI related code
  iommu/amd: Clean up unused MSI related code
  x86: irq_remapping: Clean up unused MSI related code
  x86, irq: Clean up unused MSI related code and interfaces
  iommu/vt-d: Refine the interfaces to create IRQ for DMAR unit
  x86, irq: Use hierarchy irqdomain to manage DMAR interrupts
  x86, htirq: Use hierarchy irqdomain to manage Hypertransport
    interrupts
  x86, uv: Use hierarchy irqdomain to manage UV interrupts

Yingjoe Chen (1):
  irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain
    in case OF

 Documentation/IRQ-domain.txt          |   71 +++++
 arch/ia64/include/asm/irq_remapping.h |    2 -
 arch/ia64/kernel/msi_ia64.c           |   30 ++-
 arch/x86/Kconfig                      |    4 +-
 arch/x86/include/asm/hpet.h           |   16 +-
 arch/x86/include/asm/hw_irq.h         |   87 ++++++
 arch/x86/include/asm/irq_remapping.h  |   50 ++--
 arch/x86/include/asm/pci.h            |    5 -
 arch/x86/include/asm/x86_init.h       |    4 -
 arch/x86/kernel/apic/htirq.c          |  176 +++++++++----
 arch/x86/kernel/apic/io_apic.c        |    3 -
 arch/x86/kernel/apic/msi.c            |  432 ++++++++++++++++++++----------
 arch/x86/kernel/apic/vector.c         |  165 +++++++++++-
 arch/x86/kernel/hpet.c                |   57 ++--
 arch/x86/kernel/x86_init.c            |    2 -
 arch/x86/platform/uv/uv_irq.c         |  299 ++++++++-------------
 drivers/iommu/amd_iommu.c             |  380 ++++++++++++++++++++------
 drivers/iommu/amd_iommu_init.c        |    4 +
 drivers/iommu/amd_iommu_proto.h       |    9 +
 drivers/iommu/amd_iommu_types.h       |    5 +
 drivers/iommu/dmar.c                  |   19 +-
 drivers/iommu/intel_irq_remapping.c   |  469 ++++++++++++++++++++++-----------
 drivers/iommu/irq_remapping.c         |  197 ++++----------
 drivers/iommu/irq_remapping.h         |   20 +-
 drivers/pci/Kconfig                   |    4 +
 drivers/pci/htirq.c                   |   48 +---
 drivers/pci/msi.c                     |  176 ++++++++++---
 include/linux/dmar.h                  |    3 +-
 include/linux/htirq.h                 |   22 +-
 include/linux/intel-iommu.h           |    4 +
 include/linux/irq.h                   |   19 ++
 include/linux/irqdomain.h             |   86 ++++++
 include/linux/msi.h                   |   11 +
 kernel/irq/Kconfig                    |    4 +
 kernel/irq/chip.c                     |   37 +++
 kernel/irq/irqdomain.c                |  416 +++++++++++++++++++++++++++--
 kernel/irq/manage.c                   |    2 +
 37 files changed, 2347 insertions(+), 991 deletions(-)

-- 
1.7.10.4



* [Patch Part2 v4 01/31] irqdomain: Introduce new interfaces to support hierarchy irqdomains
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomain on x86 platforms Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-05 23:48   ` Thomas Gleixner
  2014-11-04 12:01 ` [Patch Part2 v4 02/31] irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF Jiang Liu
                   ` (31 subsequent siblings)
  32 siblings, 1 reply; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Jonathan Corbet, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, linux-doc

We plan to use hierarchy irqdomain to support CPU vector assignment,
the interrupt remapping controller, the IO-APIC controller, MSI
interrupts, HyperTransport interrupts, etc. on x86 platforms. So extend
the irqdomain interfaces to support hierarchy irqdomain.

There are already many clients of the current irqdomain interfaces.
To minimize changes, we introduce new version 2 interfaces to support
hierarchy instead of extending the existing irqdomain interfaces.

Following Thomas's suggestion, the most important design decision is
to build a hierarchy of struct irq_data to match the hierarchy of
irqdomains, so that data related to each irqdomain can be saved in its
own struct irq_data. With hierarchical irq_data, we can also support
stacked irq_chips. This is most useful in the case of set_affinity().

The new hierarchy irqdomain introduces the following interfaces:
1) irq_domain_alloc_irqs()/irq_domain_free_irqs(): allocate/release IRQs
   and related resources.
2) __irq_domain_alloc_irqs(): a special version to support legacy IRQs.
3) irq_domain_activate_irq()/irq_domain_deactivate_irq(): program
   interrupt controllers to activate/deactivate an interrupt.

There are also several helper functions to ease irqdomain implementations:
1) irq_domain_get_irq_data(): get the irq_data associated with a specific
   irqdomain.
2) irq_domain_set_hwirq_and_chip(): save irqdomain specific data into
   irq_data.
3) irq_domain_alloc_irqs_parent()/irq_domain_free_irqs_parent(): invoke
   the parent irqdomain's alloc/free callbacks.
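
To illustrate how the first two helpers relate to the irq_data chain,
here is a minimal userspace sketch. It is not the kernel code: the
model_* types and functions are hypothetical stand-ins, and only the
parent_data walk and the "look up, then store hwirq" pattern follow
this patch. The chip and chip_data handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; field names follow the patch. */
struct model_domain {
	const char *name;
	struct model_domain *parent;
};

struct model_irq_data {
	unsigned long hwirq;
	struct model_domain *domain;
	struct model_irq_data *parent_data;
};

/*
 * Same walk as irq_domain_get_irq_data(): follow parent_data from the
 * outermost entry until the entry owned by @domain is found.
 */
static struct model_irq_data *
model_get_irq_data(struct model_irq_data *outermost,
		   struct model_domain *domain)
{
	struct model_irq_data *d;

	assert(domain != NULL);

	for (d = outermost; d; d = d->parent_data)
		if (d->domain == domain)
			return d;
	return NULL;
}

/* Models only the hwirq part of irq_domain_set_hwirq_and_chip(). */
static int model_set_hwirq(struct model_irq_data *outermost,
			   struct model_domain *domain, unsigned long hwirq)
{
	struct model_irq_data *d = model_get_irq_data(outermost, domain);

	if (!d)
		return -1;	/* the kernel helper returns -ENOENT here */
	d->hwirq = hwirq;
	return 0;
}
```

Because every level has its own entry, storing a hwirq for the root
domain leaves the entries of the child domains untouched, and a lookup
for a domain that is not part of the chain simply fails.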

We also changed irq_startup()/irq_shutdown() to invoke
irq_domain_activate_irq()/irq_domain_deactivate_irq() to program the
interrupt controllers when starting/stopping interrupts.
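
The activation order can be sketched in userspace as follows. This is
an illustrative model under stated assumptions, not the kernel
implementation: the model_* names and the logging callback are
hypothetical. The recursion follows irq_domain_activate_irq() in this
patch, which activates the parent irq_data first; the callback
signature follows irq_domain_ops.activate, with the ops indirection
dropped for brevity.

```c
#include <assert.h>
#include <string.h>

struct model_irq_data;

/* Hypothetical stand-in for a domain with an optional activate callback. */
struct model_domain {
	const char *name;
	int (*activate)(struct model_domain *d, struct model_irq_data *data);
};

struct model_irq_data {
	struct model_domain *domain;
	struct model_irq_data *parent_data;
};

/* Records the order in which domains get programmed. */
static char model_log[128];

static int model_log_activate(struct model_domain *d,
			      struct model_irq_data *data)
{
	(void)data;
	assert(d && d->name);

	if (model_log[0])
		strncat(model_log, ",", sizeof(model_log) - strlen(model_log) - 1);
	strncat(model_log, d->name, sizeof(model_log) - strlen(model_log) - 1);
	return 0;
}

/* Activate the parent level first, then this level if it has a callback. */
static int model_activate_irq(struct model_irq_data *data)
{
	int ret = 0;

	if (data && data->domain) {
		struct model_domain *domain = data->domain;

		if (data->parent_data)
			ret = model_activate_irq(data->parent_data);
		if (ret == 0 && domain->activate)
			ret = domain->activate(domain, data);
	}
	return ret;
}
```

In this model the root domain is programmed before its children, and a
domain without an activate callback is skipped, which matches
activate/deactivate being optional for interrupt controller drivers.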

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 Documentation/IRQ-domain.txt |   71 ++++++++
 include/linux/irq.h          |    5 +
 include/linux/irqdomain.h    |   86 +++++++++
 kernel/irq/Kconfig           |    4 +
 kernel/irq/chip.c            |    3 +
 kernel/irq/irqdomain.c       |  399 ++++++++++++++++++++++++++++++++++++++++--
 6 files changed, 552 insertions(+), 16 deletions(-)

diff --git a/Documentation/IRQ-domain.txt b/Documentation/IRQ-domain.txt
index 8a8b82c9ca53..39cfa72732ff 100644
--- a/Documentation/IRQ-domain.txt
+++ b/Documentation/IRQ-domain.txt
@@ -151,3 +151,74 @@ used and no descriptor gets allocated it is very important to make sure
 that the driver using the simple domain call irq_create_mapping()
 before any irq_find_mapping() since the latter will actually work
 for the static IRQ assignment case.
+
+==== Hierarchy IRQ domain ====
+On some architectures, there may be multiple interrupt controllers
+involved in delivering an interrupt from the device to the target CPU.
+Let's look at a typical interrupt delivering path on x86 platforms:
+
+Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU
+
+There are three interrupt controllers involved:
+1) IOAPIC controller
+2) Interrupt remapping controller
+3) Local APIC controller
+
+To support such a hardware topology and make software architecture match
+hardware architecture, an irq_domain data structure is built for each
+interrupt controller and those irq_domains are organized into hierarchy.
+When building irq_domain hierarchy, the irq_domain near to the device is
+child and the irq_domain near to CPU is parent. So a hierarchy structure
+as below will be built for the example above.
+	CPU Vector irq_domain (root irq_domain to manage CPU vectors)
+		^
+		|
+	Interrupt Remapping irq_domain (manage irq_remapping entries)
+		^
+		|
+	IOAPIC irq_domain (manage IOAPIC delivery entries/pins)
+
+There are four major interfaces to use hierarchy irq_domain:
+1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
+   controller related resources to deliver these interrupts.
+2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
+   related resources associated with these interrupts.
+3) irq_domain_activate_irq(): activate interrupt controller hardware to
+   deliver the interrupt.
+4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
+   to stop delivering the interrupt.
+
+Following changes are needed to support hierarchy irq_domain.
+1) a new field 'parent' is added to struct irq_domain; it's used to
+   maintain irq_domain hierarchy information.
+2) a new field 'parent_data' is added to struct irq_data; it's used to
+   build hierarchy irq_data to match hierarchy irq_domains. The irq_data
+   is used to store irq_domain pointer and hardware irq number.
+3) new callbacks are added to struct irq_domain_ops to support hierarchy
+   irq_domain operations.
+
+With support of hierarchy irq_domain and hierarchy irq_data ready, an
+irq_domain structure is built for each interrupt controller, and an
+irq_data structure is allocated for each irq_domain associated with an
+IRQ. Now we could go one step further to support stacked(hierarchy)
+irq_chip. That is, an irq_chip is associated with each irq_data along
+the hierarchy. A child irq_chip may implement a required action by
+itself or by cooperating with its parent irq_chip.
+
+With stacked irq_chip, interrupt controller driver only needs to deal
+with the hardware managed by itself and may ask for services from its
+parent irq_chip when needed. So we could achieve a much cleaner
+software architecture.
+
+For an interrupt controller driver to support hierarchy irq_domain, it
+needs to:
+1) Implement irq_domain_ops.alloc and irq_domain_ops.free
+2) Optionally implement irq_domain_ops.activate and
+   irq_domain_ops.deactivate.
+3) Optionally implement an irq_chip to manage the interrupt controller
+   hardware.
+4) No need to implement irq_domain_ops.map and irq_domain_ops.unmap,
+   they are unused with hierarchy irq_domain.
+
+Hierarchy irq_domain may also be used to support other architectures,
+such as ARM, ARM64 etc.
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 03f48d936f66..13ba412ce3a0 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -133,6 +133,8 @@ struct irq_domain;
  * @chip:		low level interrupt hardware access
  * @domain:		Interrupt translation domain; responsible for mapping
  *			between hwirq number and linux irq number.
+ * @parent_data:	pointer to parent struct irq_data to support hierarchy
+ *			irq_domain
  * @handler_data:	per-IRQ data for the irq_chip methods
  * @chip_data:		platform-specific per-chip private data for the chip
  *			methods, to allow shared chip implementations
@@ -151,6 +153,9 @@ struct irq_data {
 	unsigned int		state_use_accessors;
 	struct irq_chip		*chip;
 	struct irq_domain	*domain;
+#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
+	struct irq_data		*parent_data;
+#endif
 	void			*handler_data;
 	void			*chip_data;
 	struct msi_desc		*msi_desc;
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index b0f9d16e48f6..009b4f573f17 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -38,6 +38,8 @@
 struct device_node;
 struct irq_domain;
 struct of_device_id;
+struct irq_chip;
+struct irq_data;
 
 /* Number of irqs reserved for a legacy isa controller */
 #define NUM_ISA_INTERRUPTS	16
@@ -64,6 +66,16 @@ struct irq_domain_ops {
 	int (*xlate)(struct irq_domain *d, struct device_node *node,
 		     const u32 *intspec, unsigned int intsize,
 		     unsigned long *out_hwirq, unsigned int *out_type);
+
+#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
+	/* extended V2 interfaces to support hierarchy irq_domains */
+	int (*alloc)(struct irq_domain *d, unsigned int virq,
+		     unsigned int nr_irqs, void *arg);
+	void (*free)(struct irq_domain *d, unsigned int virq,
+		     unsigned int nr_irqs);
+	int (*activate)(struct irq_domain *d, struct irq_data *irq_data);
+	int (*deactivate)(struct irq_domain *d, struct irq_data *irq_data);
+#endif
 };
 
 extern struct irq_domain_ops irq_generic_chip_ops;
@@ -77,6 +89,7 @@ struct irq_domain_chip_generic;
  * @ops: pointer to irq_domain methods
  * @host_data: private data pointer for use by owner.  Not touched by irq_domain
  *             core code.
+ * @flags: host per irq_domain flags
  *
  * Optional elements
  * @of_node: Pointer to device tree nodes associated with the irq_domain. Used
@@ -84,6 +97,7 @@ struct irq_domain_chip_generic;
  * @gc: Pointer to a list of generic chips. There is a helper function for
  *      setting up one or more generic chips for interrupt controllers
  *      drivers using the generic chip library which uses this pointer.
+ * @parent: Pointer to parent irq_domain to support hierarchy irq_domains
  *
  * Revmap data, used internally by irq_domain
  * @revmap_direct_max_irq: The largest hwirq that can be set for controllers that
@@ -97,10 +111,14 @@ struct irq_domain {
 	const char *name;
 	const struct irq_domain_ops *ops;
 	void *host_data;
+	unsigned int flags;
 
 	/* Optional data */
 	struct device_node *of_node;
 	struct irq_domain_chip_generic *gc;
+#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
+	struct irq_domain *parent;
+#endif
 
 	/* reverse map data. The linear map gets appended to the irq_domain */
 	irq_hw_number_t hwirq_max;
@@ -110,6 +128,9 @@ struct irq_domain {
 	unsigned int linear_revmap[];
 };
 
+#define	IRQ_DOMAIN_FLAG_HIERARCHY	0x1
+#define	IRQ_DOMAIN_FLAG_ARCH1		0x10000
+
 #ifdef CONFIG_IRQ_DOMAIN
 struct irq_domain *__irq_domain_add(struct device_node *of_node, int size,
 				    irq_hw_number_t hwirq_max, int direct_max,
@@ -220,8 +241,73 @@ int irq_domain_xlate_onetwocell(struct irq_domain *d, struct device_node *ctrlr,
 			const u32 *intspec, unsigned int intsize,
 			irq_hw_number_t *out_hwirq, unsigned int *out_type);
 
+/* V2 interfaces to support hierarchy IRQ domains. */
+extern struct irq_data *irq_domain_get_irq_data(struct irq_domain *domain,
+						unsigned int virq);
+#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
+extern int __irq_domain_alloc_irqs(struct irq_domain *domain, int irq_base,
+				   unsigned int nr_irqs, int node, void *arg,
+				   bool realloc);
+extern void irq_domain_free_irqs(unsigned int virq, unsigned int nr_irqs);
+extern int irq_domain_activate_irq(struct irq_data *irq_data);
+extern int irq_domain_deactivate_irq(struct irq_data *irq_data);
+
+static inline int irq_domain_alloc_irqs(struct irq_domain *domain,
+			unsigned int nr_irqs, int node, void *arg)
+{
+	return __irq_domain_alloc_irqs(domain, -1, nr_irqs, node, arg, false);
+}
+
+extern int irq_domain_set_hwirq_and_chip(struct irq_domain *domain,
+					 unsigned int virq,
+					 irq_hw_number_t hwirq,
+					 struct irq_chip *chip,
+					 void *chip_data);
+extern void irq_domain_reset_irq_data(struct irq_data *irq_data);
+extern void irq_domain_free_irqs_common(struct irq_domain *domain,
+					int virq, int nr_irqs);
+extern void irq_domain_free_irqs_top(struct irq_domain *domain,
+				     int virq, int nr_irqs);
+
+static inline int irq_domain_alloc_irqs_parent(struct irq_domain *domain,
+				int irq_base, unsigned int nr_irqs, void *arg)
+{
+	if (domain->parent && domain->parent->ops->alloc)
+		return domain->parent->ops->alloc(domain->parent, irq_base,
+						  nr_irqs, arg);
+	return -ENOSYS;
+}
+
+static inline void irq_domain_free_irqs_parent(struct irq_domain *domain,
+					int irq_base, unsigned int nr_irqs)
+{
+	if (domain->parent && domain->parent->ops->free)
+		domain->parent->ops->free(domain->parent, irq_base, nr_irqs);
+}
+
+static inline bool irq_domain_is_hierarchy(struct irq_domain *domain)
+{
+	return domain->flags & IRQ_DOMAIN_FLAG_HIERARCHY;
+}
+#else	/* CONFIG_IRQ_DOMAIN_HIERARCHY */
+static inline int irq_domain_activate_irq(struct irq_data *data) { return 0; }
+static inline int irq_domain_deactivate_irq(struct irq_data *data) { return 0; }
+static inline int irq_domain_alloc_irqs(struct irq_domain *domain,
+			unsigned int nr_irqs, int node, void *arg)
+{
+	return -1;
+}
+
+static inline bool irq_domain_is_hierarchy(struct irq_domain *domain)
+{
+	return false;
+}
+#endif	/* CONFIG_IRQ_DOMAIN_HIERARCHY */
+
 #else /* CONFIG_IRQ_DOMAIN */
 static inline void irq_dispose_mapping(unsigned int virq) { }
+static inline int irq_domain_activate_irq(struct irq_data *data) { return 0; }
+static inline int irq_domain_deactivate_irq(struct irq_data *data) { return 0; }
 #endif /* !CONFIG_IRQ_DOMAIN */
 
 #endif /* _LINUX_IRQDOMAIN_H */
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 225086b2652e..e9b580eccc01 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -55,6 +55,10 @@ config GENERIC_IRQ_CHIP
 config IRQ_DOMAIN
 	bool
 
+config IRQ_DOMAIN_HIERARCHY
+	bool
+	depends on IRQ_DOMAIN
+
 config HANDLE_DOMAIN_IRQ
 	bool
 
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index e5202f00cabc..72a93086216b 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
+#include <linux/irqdomain.h>
 
 #include <trace/events/irq.h>
 
@@ -178,6 +179,7 @@ int irq_startup(struct irq_desc *desc, bool resend)
 	irq_state_clr_disabled(desc);
 	desc->depth = 0;
 
+	irq_domain_activate_irq(&desc->irq_data);
 	if (desc->irq_data.chip->irq_startup) {
 		ret = desc->irq_data.chip->irq_startup(&desc->irq_data);
 		irq_state_clr_masked(desc);
@@ -199,6 +201,7 @@ void irq_shutdown(struct irq_desc *desc)
 		desc->irq_data.chip->irq_disable(&desc->irq_data);
 	else
 		desc->irq_data.chip->irq_mask(&desc->irq_data);
+	irq_domain_deactivate_irq(&desc->irq_data);
 	irq_state_set_masked(desc);
 }
 
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 6534ff6ce02e..899150452ae8 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -23,6 +23,10 @@ static DEFINE_MUTEX(irq_domain_mutex);
 static DEFINE_MUTEX(revmap_trees_mutex);
 static struct irq_domain *irq_default_domain;
 
+static int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
+				  irq_hw_number_t hwirq, int node);
+static void irq_domain_check_hierarchy(struct irq_domain *domain);
+
 /**
  * __irq_domain_add() - Allocate a new irq_domain data structure
  * @of_node: optional device-tree node of the interrupt controller
@@ -30,7 +34,7 @@ static struct irq_domain *irq_default_domain;
  * @hwirq_max: Maximum number of interrupts supported by controller
  * @direct_max: Maximum value of direct maps; Use ~0 for no limit; 0 for no
  *              direct mapping
- * @ops: map/unmap domain callbacks
+ * @ops: domain callbacks
  * @host_data: Controller private data pointer
  *
  * Allocates and initialize and irq_domain structure.
@@ -56,6 +60,7 @@ struct irq_domain *__irq_domain_add(struct device_node *of_node, int size,
 	domain->hwirq_max = hwirq_max;
 	domain->revmap_size = size;
 	domain->revmap_direct_max_irq = direct_max;
+	irq_domain_check_hierarchy(domain);
 
 	mutex_lock(&irq_domain_mutex);
 	list_add(&domain->link, &irq_domain_list);
@@ -109,7 +114,7 @@ EXPORT_SYMBOL_GPL(irq_domain_remove);
  * @first_irq: first number of irq block assigned to the domain,
  *	pass zero to assign irqs on-the-fly. If first_irq is non-zero, then
  *	pre-map all of the irqs in the domain to virqs starting at first_irq.
- * @ops: map/unmap domain callbacks
+ * @ops: domain callbacks
  * @host_data: Controller private data pointer
  *
  * Allocates an irq_domain, and optionally if first_irq is positive then also
@@ -174,10 +179,8 @@ struct irq_domain *irq_domain_add_legacy(struct device_node *of_node,
 
 	domain = __irq_domain_add(of_node, first_hwirq + size,
 				  first_hwirq + size, 0, ops, host_data);
-	if (!domain)
-		return NULL;
-
-	irq_domain_associate_many(domain, first_irq, first_hwirq, size);
+	if (domain)
+		irq_domain_associate_many(domain, first_irq, first_hwirq, size);
 
 	return domain;
 }
@@ -388,7 +391,6 @@ EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
 unsigned int irq_create_mapping(struct irq_domain *domain,
 				irq_hw_number_t hwirq)
 {
-	unsigned int hint;
 	int virq;
 
 	pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq);
@@ -410,12 +412,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 	}
 
 	/* Allocate a virtual interrupt number */
-	hint = hwirq % nr_irqs;
-	if (hint == 0)
-		hint++;
-	virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node));
-	if (virq <= 0)
-		virq = irq_alloc_desc_from(1, of_node_to_nid(domain->of_node));
+	virq = irq_domain_alloc_descs(-1, 1, hwirq,
+				      of_node_to_nid(domain->of_node));
 	if (virq <= 0) {
 		pr_debug("-> virq allocation failed\n");
 		return 0;
@@ -471,7 +469,7 @@ unsigned int irq_create_of_mapping(struct of_phandle_args *irq_data)
 	struct irq_domain *domain;
 	irq_hw_number_t hwirq;
 	unsigned int type = IRQ_TYPE_NONE;
-	unsigned int virq;
+	int virq;
 
 	domain = irq_data->np ? irq_find_host(irq_data->np) : irq_default_domain;
 	if (!domain) {
@@ -480,6 +478,11 @@ unsigned int irq_create_of_mapping(struct of_phandle_args *irq_data)
 		return 0;
 	}
 
+	if (irq_domain_is_hierarchy(domain)) {
+		virq = irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, irq_data);
+		return virq <= 0 ? 0 : virq;
+	}
+
 	/* If domain has no translation, then we assume interrupt line */
 	if (domain->ops->xlate == NULL)
 		hwirq = irq_data->args[0];
@@ -540,8 +543,8 @@ unsigned int irq_find_mapping(struct irq_domain *domain,
 		return 0;
 
 	if (hwirq < domain->revmap_direct_max_irq) {
-		data = irq_get_irq_data(hwirq);
-		if (data && (data->domain == domain) && (data->hwirq == hwirq))
+		data = irq_domain_get_irq_data(domain, hwirq);
+		if (data && data->hwirq == hwirq)
 			return hwirq;
 	}
 
@@ -709,3 +712,367 @@ const struct irq_domain_ops irq_domain_simple_ops = {
 	.xlate = irq_domain_xlate_onetwocell,
 };
 EXPORT_SYMBOL_GPL(irq_domain_simple_ops);
+
+static int irq_domain_alloc_descs(int virq, unsigned int cnt,
+				  irq_hw_number_t hwirq, int node)
+{
+	unsigned int hint;
+
+	if (virq >= 0) {
+		virq = irq_alloc_descs(virq, virq, cnt, node);
+	} else {
+		hint = hwirq % nr_irqs;
+		if (hint == 0)
+			hint++;
+		virq = irq_alloc_descs_from(hint, cnt, node);
+		if (virq <= 0 && hint > 1)
+			virq = irq_alloc_descs_from(1, cnt, node);
+	}
+
+	return virq;
+}
+
+#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
+static void irq_domain_insert_irq(int virq)
+{
+	struct irq_data *data;
+
+	for (data = irq_get_irq_data(virq); data; data = data->parent_data) {
+		struct irq_domain *domain = data->domain;
+		irq_hw_number_t hwirq = data->hwirq;
+
+		if (hwirq < domain->revmap_size) {
+			domain->linear_revmap[hwirq] = virq;
+		} else {
+			mutex_lock(&revmap_trees_mutex);
+			radix_tree_insert(&domain->revmap_tree, hwirq, data);
+			mutex_unlock(&revmap_trees_mutex);
+		}
+
+		/* If not already assigned, give the domain the chip's name */
+		if (!domain->name && data->chip)
+			domain->name = data->chip->name;
+	}
+
+	irq_clear_status_flags(virq, IRQ_NOREQUEST);
+}
+
+static void irq_domain_remove_irq(int virq)
+{
+	struct irq_data *data;
+
+	irq_set_status_flags(virq, IRQ_NOREQUEST);
+	irq_set_chip_and_handler(virq, NULL, NULL);
+	synchronize_irq(virq);
+	smp_mb();
+
+	for (data = irq_get_irq_data(virq); data; data = data->parent_data) {
+		struct irq_domain *domain = data->domain;
+		irq_hw_number_t hwirq = data->hwirq;
+
+		if (hwirq < domain->revmap_size) {
+			domain->linear_revmap[hwirq] = 0;
+		} else {
+			mutex_lock(&revmap_trees_mutex);
+			radix_tree_delete(&domain->revmap_tree, hwirq);
+			mutex_unlock(&revmap_trees_mutex);
+		}
+	}
+}
+
+static struct irq_data *irq_domain_insert_irq_data(struct irq_domain *domain,
+						   struct irq_data *child)
+{
+	struct irq_data *irq_data;
+
+	irq_data = kzalloc_node(sizeof(*irq_data), GFP_KERNEL, child->node);
+	if (irq_data) {
+		child->parent_data = irq_data;
+		irq_data->irq = child->irq;
+		irq_data->node = child->node;
+		irq_data->domain = domain;
+	}
+
+	return irq_data;
+}
+
+static void irq_domain_free_irq_data(unsigned int virq, unsigned int nr_irqs)
+{
+	int i;
+	struct irq_data *irq_data, *tmp;
+
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_get_irq_data(virq + i);
+		tmp = irq_data->parent_data;
+		irq_data->parent_data = NULL;
+		irq_data->domain = NULL;
+
+		while (tmp) {
+			irq_data = tmp;
+			tmp = tmp->parent_data;
+			kfree(irq_data);
+		}
+	}
+}
+
+static int irq_domain_alloc_irq_data(struct irq_domain *domain,
+				     unsigned int virq, unsigned int nr_irqs)
+{
+	int i;
+	struct irq_data *irq_data;
+	struct irq_domain *parent;
+
+	/* The outermost irq_data is embedded in struct irq_desc */
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_get_irq_data(virq + i);
+		irq_data->domain = domain;
+
+		for (parent = domain->parent; parent; parent = parent->parent) {
+			irq_data = irq_domain_insert_irq_data(parent, irq_data);
+			if (!irq_data) {
+				irq_domain_free_irq_data(virq, i + 1);
+				return -ENOMEM;
+			}
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * irq_domain_get_irq_data - Get irq_data associated with @virq and @domain
+ * @domain: domain to match
+ * @virq: IRQ number to get irq_data
+ */
+struct irq_data *irq_domain_get_irq_data(struct irq_domain *domain,
+					 unsigned int virq)
+{
+	struct irq_data *irq_data;
+
+	for (irq_data = irq_get_irq_data(virq); irq_data;
+	     irq_data = irq_data->parent_data)
+		if (irq_data->domain == domain)
+			return irq_data;
+
+	return NULL;
+}
+
+int irq_domain_set_hwirq_and_chip(struct irq_domain *domain, unsigned int virq,
+				  irq_hw_number_t hwirq, struct irq_chip *chip,
+				  void *chip_data)
+{
+	struct irq_data *irq_data = irq_domain_get_irq_data(domain, virq);
+
+	if (!irq_data)
+		return -ENOENT;
+
+	irq_data->hwirq = hwirq;
+	irq_data->chip = chip ? chip : &no_irq_chip;
+	irq_data->chip_data = chip_data;
+
+	return 0;
+}
+
+void irq_domain_reset_irq_data(struct irq_data *irq_data)
+{
+	irq_data->hwirq = 0;
+	irq_data->chip = &no_irq_chip;
+	irq_data->chip_data = NULL;
+}
+
+void irq_domain_free_irqs_common(struct irq_domain *domain, int virq,
+				 int nr_irqs)
+{
+	int i;
+	struct irq_data *irq_data;
+
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_domain_get_irq_data(domain, virq + i);
+		if (irq_data)
+			irq_domain_reset_irq_data(irq_data);
+	}
+	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
+}
+
+void irq_domain_free_irqs_top(struct irq_domain *domain, int virq,
+			      int nr_irqs)
+{
+	int i;
+
+	for (i = 0; i < nr_irqs; i++) {
+		irq_set_handler_data(virq + i, NULL);
+		irq_set_handler(virq + i, NULL);
+	}
+	irq_domain_free_irqs_common(domain, virq, nr_irqs);
+}
+
+/**
+ * __irq_domain_alloc_irqs - Allocate IRQs from domain
+ * @domain: domain to allocate from
+ * @irq_base: allocate specified IRQ number if irq_base >= 0
+ * @nr_irqs: number of IRQs to allocate
+ * @node: NUMA node id for memory allocation
+ * @arg: domain specific argument
+ * @realloc: IRQ descriptors have already been allocated if true
+ *
+ * Allocate IRQ numbers and initialize all data structures to support
+ * hierarchy IRQ domains.
+ * Parameter @realloc is mainly to support legacy IRQs.
+ * Returns error code or allocated IRQ number
+ *
+ * The whole process to setup an IRQ has been split into two steps.
+ * The first step, __irq_domain_alloc_irqs(), is to allocate IRQ
+ * descriptor and required hardware resources. The second step,
+ * irq_domain_activate_irq(), is to program hardware with preallocated
+ * resources. In this way, it's easier to rollback when failing to
+ * allocate resources.
+ */
+int __irq_domain_alloc_irqs(struct irq_domain *domain, int irq_base,
+			    unsigned int nr_irqs, int node, void *arg,
+			    bool realloc)
+{
+	int i, ret, virq;
+
+	if (domain == NULL) {
+		domain = irq_default_domain;
+		if (WARN(!domain, "domain is NULL; cannot allocate IRQ\n"))
+			return -EINVAL;
+	}
+
+	if (!domain->ops->alloc) {
+		pr_debug("domain->ops->alloc() is NULL\n");
+		return -ENOSYS;
+	}
+
+	if (realloc && irq_base >= 0) {
+		virq = irq_base;
+	} else {
+		virq = irq_domain_alloc_descs(irq_base, nr_irqs, 0, node);
+		if (virq < 0) {
+			pr_debug("cannot allocate IRQ(base %d, count %d)\n",
+				 irq_base, nr_irqs);
+			return virq;
+		}
+	}
+
+	if (irq_domain_alloc_irq_data(domain, virq, nr_irqs)) {
+		pr_debug("cannot allocate memory for IRQ%d\n", virq);
+		ret = -ENOMEM;
+		goto out_free_desc;
+	}
+
+	mutex_lock(&irq_domain_mutex);
+	ret = domain->ops->alloc(domain, virq, nr_irqs, arg);
+	if (ret < 0) {
+		mutex_unlock(&irq_domain_mutex);
+		goto out_free_irq_data;
+	}
+	for (i = 0; i < nr_irqs; i++)
+		irq_domain_insert_irq(virq + i);
+	mutex_unlock(&irq_domain_mutex);
+
+	return virq;
+
+out_free_irq_data:
+	irq_domain_free_irq_data(virq, nr_irqs);
+out_free_desc:
+	irq_free_descs(virq, nr_irqs);
+	return ret;
+}
+
+/**
+ * irq_domain_free_irqs - Free IRQ number and associated data structures
+ * @virq: base IRQ number
+ * @nr_irqs: number of IRQs to free
+ */
+void irq_domain_free_irqs(unsigned int virq, unsigned int nr_irqs)
+{
+	int i;
+	struct irq_data *data = irq_get_irq_data(virq);
+
+	if (WARN(!data || !data->domain || !data->domain->ops->free,
+		 "NULL pointer, cannot free irq\n"))
+		return;
+
+	mutex_lock(&irq_domain_mutex);
+	for (i = 0; i < nr_irqs; i++)
+		irq_domain_remove_irq(virq + i);
+	data->domain->ops->free(data->domain, virq, nr_irqs);
+	mutex_unlock(&irq_domain_mutex);
+
+	irq_domain_free_irq_data(virq, nr_irqs);
+	irq_free_descs(virq, nr_irqs);
+}
+
+/**
+ * irq_domain_activate_irq - Call domain_ops->activate recursively to activate
+ *			     interrupt
+ * @irq_data: outermost irq_data associated with interrupt
+ *
+ * This is the second step to call domain_ops->activate to program interrupt
+ * controllers, so the interrupt could actually get delivered.
+ */
+int irq_domain_activate_irq(struct irq_data *irq_data)
+{
+	int ret = 0;
+
+	if (irq_data && irq_data->domain) {
+		struct irq_domain *domain = irq_data->domain;
+
+		if (irq_data->parent_data)
+			ret = irq_domain_activate_irq(irq_data->parent_data);
+		if (ret == 0 && domain->ops->activate)
+			ret = domain->ops->activate(domain, irq_data);
+	}
+
+	return ret;
+}
+
+/**
+ * irq_domain_deactivate_irq - Call domain_ops->deactivate recursively to
+ *			       deactivate interrupt
+ * @irq_data: outermost irq_data associated with interrupt
+ *
+ * It calls domain_ops->deactivate to program interrupt controllers to disable
+ * interrupt delivery.
+ */
+int irq_domain_deactivate_irq(struct irq_data *irq_data)
+{
+	int ret = 0;
+
+	if (irq_data && irq_data->domain) {
+		struct irq_domain *domain = irq_data->domain;
+
+		if (domain->ops->deactivate)
+			ret = domain->ops->deactivate(domain, irq_data);
+		if (ret == 0 && irq_data->parent_data)
+			ret = irq_domain_deactivate_irq(irq_data->parent_data);
+	}
+
+	return ret;
+}
+
+static void irq_domain_check_hierarchy(struct irq_domain *domain)
+{
+	/* Hierarchy irq_domains must implement callback alloc() */
+	if (domain->ops->alloc)
+		domain->flags |= IRQ_DOMAIN_FLAG_HIERARCHY;
+}
+#else	/* CONFIG_IRQ_DOMAIN_HIERARCHY */
+/**
+ * irq_domain_get_irq_data - Get irq_data associated with @virq and @domain
+ * @domain: domain to match
+ * @virq: IRQ number to get irq_data
+ */
+struct irq_data *irq_domain_get_irq_data(struct irq_domain *domain,
+					 unsigned int virq)
+{
+	struct irq_data *irq_data = irq_get_irq_data(virq);
+
+	return (irq_data && irq_data->domain == domain) ? irq_data : NULL;
+}
+
+static void irq_domain_check_hierarchy(struct irq_domain *domain)
+{
+}
+#endif	/* CONFIG_IRQ_DOMAIN_HIERARCHY */
-- 
1.7.10.4


* [Patch Part2 v4 02/31] irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 01/31] irqdomain: Introduce new interfaces to support hierarchy irqdomains Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 03/31] genirq: Introduce helper functions to support stacked irq_chip Jiang Liu
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, Jiang Liu

From: Yingjoe Chen <yingjoe.chen@mediatek.com>

irq_create_of_mapping() may be called multiple times to create/translate
the same IRQ from DT. Perform the irq_find_mapping() check and set_type
for hierarchy irqdomains in irq_create_of_mapping(), to avoid duplicating
this functionality in every outermost irqdomain.
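
A minimal userspace sketch of the "look up first, allocate only on a
miss" pattern described above. All toy_* names are hypothetical stand-ins
for illustration and are not kernel interfaces; the table size is an
arbitrary assumption.

```c
#define TOY_MAX_IRQS 16

/* Hypothetical stand-in for an irqdomain; not a kernel structure. */
struct toy_domain {
	unsigned long hwirq[TOY_MAX_IRQS];	/* hwirq bound to each virq */
	int used[TOY_MAX_IRQS];			/* non-zero if virq is allocated */
	int nr_allocs;				/* number of real allocations */
};

/* Return the virq already bound to @hwirq, or 0 if there is none. */
static unsigned int toy_find_mapping(const struct toy_domain *d,
				     unsigned long hwirq)
{
	unsigned int virq;

	for (virq = 1; virq < TOY_MAX_IRQS; virq++)
		if (d->used[virq] && d->hwirq[virq] == hwirq)
			return virq;
	return 0;
}

/* Allocate a fresh virq for @hwirq; returns 0 when the table is full. */
static unsigned int toy_alloc_irq(struct toy_domain *d, unsigned long hwirq)
{
	unsigned int virq;

	for (virq = 1; virq < TOY_MAX_IRQS; virq++) {
		if (!d->used[virq]) {
			d->used[virq] = 1;
			d->hwirq[virq] = hwirq;
			d->nr_allocs++;
			return virq;
		}
	}
	return 0;
}

/* Reuse an existing mapping if present, so repeated calls are harmless. */
static unsigned int toy_create_mapping(struct toy_domain *d,
				       unsigned long hwirq)
{
	unsigned int virq = toy_find_mapping(d, hwirq);

	return virq ? virq : toy_alloc_irq(d, hwirq);
}
```

Calling toy_create_mapping() twice with the same hwirq returns the same
virq and performs only one allocation, which is the behaviour the check
in irq_create_of_mapping() is meant to guarantee.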

Signed-off-by: Yingjoe Chen <yingjoe.chen@mediatek.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 kernel/irq/irqdomain.c |   27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 899150452ae8..e1351b0bd7f8 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -478,11 +478,6 @@ unsigned int irq_create_of_mapping(struct of_phandle_args *irq_data)
 		return 0;
 	}
 
-	if (irq_domain_is_hierarchy(domain)) {
-		virq = irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, irq_data);
-		return virq <= 0 ? 0 : virq;
-	}
-
 	/* If domain has no translation, then we assume interrupt line */
 	if (domain->ops->xlate == NULL)
 		hwirq = irq_data->args[0];
@@ -492,10 +487,24 @@ unsigned int irq_create_of_mapping(struct of_phandle_args *irq_data)
 			return 0;
 	}
 
-	/* Create mapping */
-	virq = irq_create_mapping(domain, hwirq);
-	if (!virq)
-		return virq;
+	if (irq_domain_is_hierarchy(domain)) {
+		/*
+		 * If we've already configured this interrupt,
+		 * don't do it again, or hell will break loose.
+		 */
+		virq = irq_find_mapping(domain, hwirq);
+		if (virq)
+			return virq;
+
+		virq = irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, irq_data);
+		if (virq <= 0)
+			return 0;
+	} else {
+		/* Create mapping */
+		virq = irq_create_mapping(domain, hwirq);
+		if (!virq)
+			return virq;
+	}
 
 	/* Set type if specified and different than the current one */
 	if (type != IRQ_TYPE_NONE &&
-- 
1.7.10.4


* [Patch Part2 v4 03/31] genirq: Introduce helper functions to support stacked irq_chip
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 01/31] irqdomain: Introduce new interfaces to support hierarchy irqdomains Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 02/31] irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 04/31] genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip Jiang Liu
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Now that hierarchy irq_data is supported, introduce several helpers
to support stacked irq_chips.
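
A minimal userspace sketch of how a stacked chip can delegate a
retrigger to the nearest ancestor that implements it, mirroring the walk
over parent_data in irq_chip_retrigger_hierarchy(). The toy_* types and
the retriggered counter are hypothetical and exist only for illustration.

```c
#include <errno.h>
#include <stddef.h>

struct toy_irq_data;

/* Hypothetical chip with a single optional callback. */
struct toy_chip {
	int (*retrigger)(struct toy_irq_data *data);
};

/* Hypothetical per-level irq data; parent is NULL at the root. */
struct toy_irq_data {
	const struct toy_chip *chip;
	struct toy_irq_data *parent;
	int retriggered;	/* counts retrigger calls on this level */
};

/* Example callback: record the call and report success. */
static int toy_retrigger(struct toy_irq_data *data)
{
	data->retriggered++;
	return 1;
}

/*
 * Start at the parent of @data and stop at the first level whose chip
 * implements retrigger; return -ENOSYS if no ancestor does.
 */
static int toy_retrigger_hierarchy(struct toy_irq_data *data)
{
	for (data = data->parent; data; data = data->parent)
		if (data->chip && data->chip->retrigger)
			return data->chip->retrigger(data);

	return -ENOSYS;
}
```

Note that the walk skips the calling level itself, so a chip can install
such a helper as its own retrigger callback without recursing.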

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 include/linux/irq.h |    5 +++++
 kernel/irq/chip.c   |   17 +++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 13ba412ce3a0..0adcbbbf2e87 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -443,6 +443,11 @@ extern void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc);
 extern void handle_bad_irq(unsigned int irq, struct irq_desc *desc);
 extern void handle_nested_irq(unsigned int irq);
 
+#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
+extern void irq_chip_ack_parent(struct irq_data *data);
+extern int irq_chip_retrigger_hierarchy(struct irq_data *data);
+#endif
+
 /* Handling of unhandled and spurious interrupts: */
 extern void note_interrupt(unsigned int irq, struct irq_desc *desc,
 			   irqreturn_t action_ret);
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 72a93086216b..12f3e72449eb 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -850,3 +850,20 @@ void irq_cpu_offline(void)
 		raw_spin_unlock_irqrestore(&desc->lock, flags);
 	}
 }
+
+#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
+void irq_chip_ack_parent(struct irq_data *data)
+{
+	data = data->parent_data;
+	data->chip->irq_ack(data);
+}
+
+int irq_chip_retrigger_hierarchy(struct irq_data *data)
+{
+	for (data = data->parent_data; data; data = data->parent_data)
+		if (data->chip && data->chip->irq_retrigger)
+			return data->chip->irq_retrigger(data);
+
+	return -ENOSYS;
+}
+#endif
-- 
1.7.10.4


* [Patch Part2 v4 04/31] genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (2 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 03/31] genirq: Introduce helper functions to support stacked irq_chip Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 05/31] genirq: Add IRQ_SET_MASK_OK_DONE " Jiang Liu
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Add the irq_compose_msi_msg() callback to struct irq_chip; it will be
used to support stacked irqchips.
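
A minimal userspace sketch of the selection rule used by
irq_chip_compose_msi_msg() in this patch: walk the whole parent chain and
let the last level that implements the callback (the one closest to the
root) compose the message. The toy_* types and the single data field are
hypothetical simplifications, not the kernel's struct msi_msg.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified MSI message. */
struct toy_msi_msg {
	unsigned int data;
};

struct toy_irq_data;

struct toy_chip {
	void (*compose)(struct toy_irq_data *d, struct toy_msi_msg *msg);
};

struct toy_irq_data {
	const struct toy_chip *chip;
	struct toy_irq_data *parent;	/* NULL at the root */
	unsigned int vector;
};

/* Example callback: encode this level's vector into the message. */
static void toy_compose(struct toy_irq_data *d, struct toy_msi_msg *msg)
{
	msg->data = d->vector;
}

/* Remember the last level that can compose, then call it once. */
static int toy_compose_msi_msg(struct toy_irq_data *data,
			       struct toy_msi_msg *msg)
{
	struct toy_irq_data *pos = NULL;

	for (; data; data = data->parent)
		if (data->chip && data->chip->compose)
			pos = data;
	if (!pos)
		return -ENOSYS;

	pos->chip->compose(pos, msg);
	return 0;
}
```

With a chain of leaf, middle and root levels where both leaf and root
provide the callback, the root's callback is the one that runs.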

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 include/linux/irq.h |    5 +++++
 kernel/irq/chip.c   |   17 +++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 0adcbbbf2e87..536b7fc6c8f4 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -29,6 +29,7 @@ struct seq_file;
 struct module;
 struct irq_desc;
 struct irq_data;
+struct msi_msg;
 typedef	void (*irq_flow_handler_t)(unsigned int irq,
 					    struct irq_desc *desc);
 typedef	void (*irq_preflow_handler_t)(struct irq_data *data);
@@ -320,6 +321,7 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
  *				any other callback related to this irq
  * @irq_release_resources:	optional to release resources acquired with
  *				irq_request_resources
+ * @irq_compose_msi_msg:	optional to compose message content for MSI
  * @flags:		chip specific flags
  */
 struct irq_chip {
@@ -356,6 +358,8 @@ struct irq_chip {
 	int		(*irq_request_resources)(struct irq_data *data);
 	void		(*irq_release_resources)(struct irq_data *data);
 
+	void		(*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);
+
 	unsigned long	flags;
 };
 
@@ -443,6 +447,7 @@ extern void handle_percpu_devid_irq(unsigned int irq, struct irq_desc *desc);
 extern void handle_bad_irq(unsigned int irq, struct irq_desc *desc);
 extern void handle_nested_irq(unsigned int irq);
 
+extern int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg);
 #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
 extern void irq_chip_ack_parent(struct irq_data *data);
 extern int irq_chip_retrigger_hierarchy(struct irq_data *data);
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 12f3e72449eb..8f362db17a8a 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -867,3 +867,20 @@ int irq_chip_retrigger_hierarchy(struct irq_data *data)
 	return -ENOSYS;
 }
 #endif
+
+int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	struct irq_data *pos = NULL;
+
+#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
+	for (; data; data = data->parent_data)
+#endif
+		if (data->chip && data->chip->irq_compose_msi_msg)
+			pos = data;
+	if (!pos)
+		return -ENOSYS;
+
+	pos->chip->irq_compose_msi_msg(pos, msg);
+
+	return 0;
+}
-- 
1.7.10.4


* [Patch Part2 v4 05/31] genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (3 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 04/31] genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 06/31] x86, irq: Save destination CPU ID in irq_cfg Jiang Liu
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Add IRQ_SET_MASK_OK_DONE in addition to IRQ_SET_MASK_OK and
IRQ_SET_MASK_OK_NOCOPY to support stacked irqchips. The irq core treats
IRQ_SET_MASK_OK_DONE the same as IRQ_SET_MASK_OK. For stacked irqchips,
it means that ascendant irqchips have done all the work and no more
handling is needed in descendant irqchips.
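
A minimal userspace sketch of how a descendant chip might react to the
new return code, assuming it first forwards the request to its parent.
The toy_* names are hypothetical; the parent's result is simply preloaded
into the structure so that each case can be exercised.

```c
enum {
	TOY_SET_MASK_OK = 0,
	TOY_SET_MASK_OK_NOCOPY,
	TOY_SET_MASK_OK_DONE,
};

/* Hypothetical per-interrupt state for the sketch. */
struct toy_irq {
	int parent_ret;		/* what the parent chip will return */
	int child_programmed;	/* how often the child touched its hardware */
	unsigned long affinity;	/* affinity recorded by the "core" */
};

/* Stand-in for the ascendant chip's set_affinity callback. */
static int toy_parent_set_affinity(struct toy_irq *irq, unsigned long mask)
{
	(void)mask;
	return irq->parent_ret;
}

/* Descendant chip: skip its own work on error or when parent says DONE. */
static int toy_child_set_affinity(struct toy_irq *irq, unsigned long mask)
{
	int ret = toy_parent_set_affinity(irq, mask);

	if (ret < 0 || ret == TOY_SET_MASK_OK_DONE)
		return ret;

	irq->child_programmed++;
	return ret;
}

/* Core: DONE is handled exactly like OK, i.e. the mask is copied. */
static int toy_core_set_affinity(struct toy_irq *irq, unsigned long mask)
{
	int ret = toy_child_set_affinity(irq, mask);

	switch (ret) {
	case TOY_SET_MASK_OK:
	case TOY_SET_MASK_OK_DONE:
		irq->affinity = mask;
		/* fall through */
	case TOY_SET_MASK_OK_NOCOPY:
		return 0;
	}
	return ret;
}
```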

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 include/linux/irq.h |    4 ++++
 kernel/irq/manage.c |    2 ++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 536b7fc6c8f4..041edd6dc409 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -114,10 +114,14 @@ enum {
  *
  * IRQ_SET_MASK_OK	- OK, core updates irq_data.affinity
  * IRQ_SET_MASK_NOCPY	- OK, chip did update irq_data.affinity
+ * IRQ_SET_MASK_OK_DONE	- Same as IRQ_SET_MASK_OK for core. Special code to
+ *			  support stacked irqchips, which indicates skipping
+ *			  all descendent irqchips.
  */
 enum {
 	IRQ_SET_MASK_OK = 0,
 	IRQ_SET_MASK_OK_NOCOPY,
+	IRQ_SET_MASK_OK_DONE,
 };
 
 struct msi_desc;
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 0a9104b4608b..80692373abd6 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -183,6 +183,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	ret = chip->irq_set_affinity(data, mask, force);
 	switch (ret) {
 	case IRQ_SET_MASK_OK:
+	case IRQ_SET_MASK_OK_DONE:
 		cpumask_copy(data->affinity, mask);
 	case IRQ_SET_MASK_OK_NOCOPY:
 		irq_set_thread_affinity(desc);
@@ -600,6 +601,7 @@ int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
 
 	switch (ret) {
 	case IRQ_SET_MASK_OK:
+	case IRQ_SET_MASK_OK_DONE:
 		irqd_clear(&desc->irq_data, IRQD_TRIGGER_MASK);
 		irqd_set(&desc->irq_data, flags);
 
-- 
1.7.10.4


* [Patch Part2 v4 06/31] x86, irq: Save destination CPU ID in irq_cfg
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (4 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 05/31] genirq: Add IRQ_SET_MASK_OK_DONE " Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 07/31] x86, irq: Use hierarchy irqdomain to manage CPU interrupt vectors Jiang Liu
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Cache the destination CPU APIC ID in struct irq_cfg when assigning a
vector for an interrupt. Upper layers then just need to read the cached
APIC ID instead of calling apic->cpu_mask_to_apicid_and(), which helps to
hide APIC driver details from the IOAPIC/HPET/MSI drivers.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/include/asm/hw_irq.h |    1 +
 arch/x86/kernel/apic/vector.c |    6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 7624fffc2822..3d51d74d6c01 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -116,6 +116,7 @@ struct irq_data;
 struct irq_cfg {
 	cpumask_var_t		domain;
 	cpumask_var_t		old_domain;
+	unsigned int		dest_apicid;
 	u8			vector;
 	u8			move_in_progress : 1;
 #ifdef CONFIG_IRQ_REMAP
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 5ac840a4cc53..02cb5d386985 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -188,6 +188,12 @@ next:
 	}
 	free_cpumask_var(tmp_mask);
 
+	if (!err) {
+		/* cache destination APIC IDs into cfg->dest_apicid */
+		err = apic->cpu_mask_to_apicid_and(mask, cfg->domain,
+						   &cfg->dest_apicid);
+	}
+
 	return err;
 }
 
-- 
1.7.10.4


* [Patch Part2 v4 07/31] x86, irq: Use hierarchy irqdomain to manage CPU interrupt vectors
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (5 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 06/31] x86, irq: Save destination CPU ID in irq_cfg Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 08/31] x86, hpet: Use new irqdomain interfaces to allocate/free IRQ Jiang Liu
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu, Prarit Bhargava
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Abstract the CPU local APIC as an interrupt controller and create an
irqdomain for it to manage CPU interrupt vectors. It's the base for
enabling hierarchy irqdomains on x86 systems. Eventually we will build
an irqdomain hierarchy as below:
IOAPIC domain-------|
MSI/MSI-x domain------> [Interrupt Remapping domain] -> CPU vector domain
HPET_IRQ domain_____|                                         ^
DMAR domain---------------------------------------------------|
HT_IRQ domain-------------------------------------------------|
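
A minimal userspace sketch of what the hierarchy above means for lookups:
whatever level a caller starts from, following parent_data ends at the
CPU vector domain, whose chip_data holds the vector configuration. This
models the walk done by irqd_cfg() in this patch; the toy_* types and the
domain_name field are hypothetical.

```c
#include <stddef.h>

/* Hypothetical, reduced version of the per-vector configuration. */
struct toy_irq_cfg {
	unsigned char vector;
};

/* Hypothetical per-level irq data; parent_data is NULL at the root. */
struct toy_irq_data {
	const char *domain_name;
	void *chip_data;
	struct toy_irq_data *parent_data;
};

/* Walk to the root level and return the configuration stored there. */
static struct toy_irq_cfg *toy_irqd_cfg(struct toy_irq_data *irq_data)
{
	if (!irq_data)
		return NULL;

	while (irq_data->parent_data)
		irq_data = irq_data->parent_data;

	return irq_data->chip_data;
}
```

Starting from an IOAPIC-level entry or from a remapping-level entry gives
the same result, so outer domains do not need their own copy of the
vector configuration.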

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/Kconfig               |    3 +-
 arch/x86/include/asm/hw_irq.h  |   15 ++++
 arch/x86/kernel/apic/io_apic.c |    3 -
 arch/x86/kernel/apic/vector.c  |  156 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 160 insertions(+), 17 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8e3175ba7f4c..9df24a42f54d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -883,11 +883,12 @@ config X86_LOCAL_APIC
 	def_bool y
 	depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC || PCI_MSI
 	select GENERIC_IRQ_LEGACY_ALLOC_HWIRQ
+	select IRQ_DOMAIN
+	select IRQ_DOMAIN_HIERARCHY
 
 config X86_IO_APIC
 	def_bool X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_IOAPIC
 	depends on X86_LOCAL_APIC
-	select IRQ_DOMAIN
 
 config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
 	bool "Reroute for broken boot IRQs"
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 3d51d74d6c01..78130156601a 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -112,6 +112,15 @@ struct irq_2_irte {
 
 #ifdef	CONFIG_X86_LOCAL_APIC
 struct irq_data;
+struct irq_domain;
+
+struct irq_alloc_info {
+	u32			flags;
+	const struct cpumask	*mask;	/* CPU mask for vector allocation */
+};
+
+/* Request contiguous CPU vectors */
+#define	X86_IRQ_ALLOC_CONTIGOUS_VECTORS	0x1
 
 struct irq_cfg {
 	cpumask_var_t		domain;
@@ -135,6 +144,12 @@ struct irq_cfg {
 	};
 };
 
+extern struct irq_domain *x86_vector_domain;
+
+extern void init_irq_alloc_info(struct irq_alloc_info *info,
+				const struct cpumask *mask);
+extern void copy_irq_alloc_info(struct irq_alloc_info *dst,
+				struct irq_alloc_info *src);
 extern struct irq_cfg *irq_cfg(unsigned int irq);
 extern struct irq_cfg *irqd_cfg(struct irq_data *irq_data);
 extern struct irq_cfg *alloc_irq_and_cfg_at(unsigned int at, int node);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 9593c4cac1c0..b46192774d91 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2353,9 +2353,6 @@ static int mp_irqdomain_create(int ioapic)
 		ioapic_dynirq_base = max(ioapic_dynirq_base,
 					 gsi_cfg->gsi_end + 1);
 
-	if (gsi_cfg->gsi_base == 0)
-		irq_set_default_host(ip->irqdomain);
-
 	return 0;
 }
 
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 02cb5d386985..4b5a021f2094 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -3,6 +3,8 @@
  *
  * Copyright (C) 1997, 1998, 1999, 2000, 2009 Ingo Molnar, Hajnalka Szabo
  *	Moved from arch/x86/kernel/apic/io_apic.c.
+ * Jiang Liu <jiang.liu@linux.intel.com>
+ *	Add support of hierarchy irqdomain
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -19,7 +21,9 @@
 #include <asm/desc.h>
 #include <asm/irq_remapping.h>
 
+struct irq_domain *x86_vector_domain;
 static DEFINE_RAW_SPINLOCK(vector_lock);
+static struct irq_chip vector_chip;
 
 void lock_vector_lock(void)
 {
@@ -36,15 +40,21 @@ void unlock_vector_lock(void)
 
 struct irq_cfg *irq_cfg(unsigned int irq)
 {
-	return irq_get_chip_data(irq);
+	return irqd_cfg(irq_get_irq_data(irq));
 }
 
 struct irq_cfg *irqd_cfg(struct irq_data *irq_data)
 {
+	if (!irq_data)
+		return NULL;
+
+	while (irq_data->parent_data)
+		irq_data = irq_data->parent_data;
+
 	return irq_data->chip_data;
 }
 
-static struct irq_cfg *alloc_irq_cfg(unsigned int irq, int node)
+static struct irq_cfg *alloc_irq_cfg(int node)
 {
 	struct irq_cfg *cfg;
 
@@ -79,7 +89,7 @@ struct irq_cfg *alloc_irq_and_cfg_at(unsigned int at, int node)
 			return cfg;
 	}
 
-	cfg = alloc_irq_cfg(at, node);
+	cfg = alloc_irq_cfg(node);
 	if (cfg)
 		irq_set_chip_data(at, cfg);
 	else
@@ -87,14 +97,13 @@ struct irq_cfg *alloc_irq_and_cfg_at(unsigned int at, int node)
 	return cfg;
 }
 
-static void free_irq_cfg(unsigned int at, struct irq_cfg *cfg)
+static void free_irq_cfg(struct irq_cfg *cfg)
 {
-	if (!cfg)
-		return;
-	irq_set_chip_data(at, NULL);
-	free_cpumask_var(cfg->domain);
-	free_cpumask_var(cfg->old_domain);
-	kfree(cfg);
+	if (cfg) {
+		free_cpumask_var(cfg->domain);
+		free_cpumask_var(cfg->old_domain);
+		kfree(cfg);
+	}
 }
 
 static int
@@ -241,6 +250,90 @@ void clear_irq_vector(int irq, struct irq_cfg *cfg)
 	raw_spin_unlock_irqrestore(&vector_lock, flags);
 }
 
+void init_irq_alloc_info(struct irq_alloc_info *info,
+			 const struct cpumask *mask)
+{
+	memset(info, 0, sizeof(*info));
+	info->mask = mask;
+}
+
+void copy_irq_alloc_info(struct irq_alloc_info *dst, struct irq_alloc_info *src)
+{
+	if (src)
+		*dst = *src;
+	else
+		memset(dst, 0, sizeof(*dst));
+}
+
+static inline const struct cpumask *
+irq_alloc_info_get_mask(struct irq_alloc_info *info)
+{
+	return (!info || !info->mask) ? apic->target_cpus() : info->mask;
+}
+
+static void x86_vector_free_irqs(struct irq_domain *domain,
+				 unsigned int virq, unsigned int nr_irqs)
+{
+	int i;
+	struct irq_data *irq_data;
+
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
+		if (irq_data && irq_data->chip_data) {
+			free_remapped_irq(virq + i);
+			clear_irq_vector(virq + i, irq_data->chip_data);
+			free_irq_cfg(irq_data->chip_data);
+			irq_domain_reset_irq_data(irq_data);
+		}
+	}
+}
+
+static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
+				 unsigned int nr_irqs, void *arg)
+{
+	int i, err;
+	struct irq_cfg *cfg;
+	struct irq_data *irq_data;
+	const struct cpumask *mask;
+	struct irq_alloc_info *info = arg;
+
+	if (disable_apic)
+		return -ENXIO;
+
+	/* Currently vector allocator can't guarantee contiguous allocations */
+	if ((info->flags & X86_IRQ_ALLOC_CONTIGOUS_VECTORS) && nr_irqs > 1)
+		return -ENOSYS;
+
+	mask = irq_alloc_info_get_mask(info);
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_domain_get_irq_data(domain, virq + i);
+		BUG_ON(!irq_data);
+		cfg = alloc_irq_cfg(irq_data->node);
+		if (!cfg) {
+			err = -ENOMEM;
+			goto error;
+		}
+
+		irq_data->chip = &vector_chip;
+		irq_data->chip_data = cfg;
+		irq_data->hwirq = virq + i;
+		err = assign_irq_vector(virq + i, cfg, mask);
+		if (err)
+			goto error;
+	}
+
+	return 0;
+
+error:
+	x86_vector_free_irqs(domain, virq, i + 1);
+	return err;
+}
+
+static struct irq_domain_ops x86_vector_domain_ops = {
+	.alloc = x86_vector_alloc_irqs,
+	.free = x86_vector_free_irqs,
+};
+
 int __init arch_probe_nr_irqs(void)
 {
 	int nr;
@@ -266,6 +359,11 @@ int __init arch_probe_nr_irqs(void)
 
 int __init arch_early_irq_init(void)
 {
+	x86_vector_domain = irq_domain_add_tree(NULL, &x86_vector_domain_ops,
+						NULL);
+	BUG_ON(x86_vector_domain == NULL);
+	irq_set_default_host(x86_vector_domain);
+
 	return arch_early_ioapic_init();
 }
 
@@ -380,6 +478,37 @@ int apic_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	return 0;
 }
 
+static int vector_set_affinity(struct irq_data *irq_data,
+			       const struct cpumask *dest, bool force)
+{
+	int err;
+	int irq = irq_data->irq;
+	struct irq_cfg *cfg = irq_data->chip_data;
+
+	if (!config_enabled(CONFIG_SMP))
+		return -EPERM;
+
+	if (!cpumask_intersects(dest, cpu_online_mask))
+		return -EINVAL;
+
+	err = assign_irq_vector(irq, cfg, dest);
+	if (err) {
+		struct irq_data *top = irq_get_irq_data(irq);
+
+		if (assign_irq_vector(irq, cfg, top->affinity))
+			pr_err("Failed to recover vector for irq %d\n", irq);
+		return err;
+	}
+
+	return IRQ_SET_MASK_OK;
+}
+
+static struct irq_chip vector_chip = {
+	.irq_ack = apic_ack_edge,
+	.irq_set_affinity = vector_set_affinity,
+	.irq_retrigger = apic_retrigger_irq,
+};
+
 #ifdef CONFIG_SMP
 void send_cleanup_vector(struct irq_cfg *cfg)
 {
@@ -500,7 +629,7 @@ int arch_setup_hwirq(unsigned int irq, int node)
 	unsigned long flags;
 	int ret;
 
-	cfg = alloc_irq_cfg(irq, node);
+	cfg = alloc_irq_cfg(node);
 	if (!cfg)
 		return -ENOMEM;
 
@@ -511,7 +640,7 @@ int arch_setup_hwirq(unsigned int irq, int node)
 	if (!ret)
 		irq_set_chip_data(irq, cfg);
 	else
-		free_irq_cfg(irq, cfg);
+		free_irq_cfg(cfg);
 	return ret;
 }
 
@@ -521,7 +650,8 @@ void arch_teardown_hwirq(unsigned int irq)
 
 	free_remapped_irq(irq);
 	clear_irq_vector(irq, cfg);
-	free_irq_cfg(irq, cfg);
+	irq_set_chip_data(irq, NULL);
+	free_irq_cfg(cfg);
 }
 
 static void __init print_APIC_field(int base)
-- 
1.7.10.4


* [Patch Part2 v4 08/31] x86, hpet: Use new irqdomain interfaces to allocate/free IRQ
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (6 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 07/31] x86, irq: Use hierarchy irqdomain to manage CPU interrupt vectors Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 09/31] x86, MSI: " Jiang Liu
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu, Srivatsa S. Bhat,
	Andy Lutomirski
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Use the new irqdomain interfaces to allocate/free IRQs for HPET, so that
GENERIC_IRQ_LEGACY_ALLOC_HWIRQ can be removed later.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/kernel/hpet.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 319bcb9372fe..24db2d33fab7 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -11,6 +11,7 @@
 #include <linux/cpu.h>
 #include <linux/pm.h>
 #include <linux/io.h>
+#include <linux/irqdomain.h>
 
 #include <asm/fixmap.h>
 #include <asm/hpet.h>
@@ -476,7 +477,7 @@ static int hpet_msi_next_event(unsigned long delta,
 static int hpet_setup_msi_irq(unsigned int irq)
 {
 	if (x86_msi.setup_hpet_msi(irq, hpet_blockid)) {
-		irq_free_hwirq(irq);
+		irq_domain_free_irqs(irq, 1);
 		return -EINVAL;
 	}
 	return 0;
@@ -484,9 +485,10 @@ static int hpet_setup_msi_irq(unsigned int irq)
 
 static int hpet_assign_irq(struct hpet_dev *dev)
 {
-	unsigned int irq = irq_alloc_hwirq(-1);
+	int irq;
 
-	if (!irq)
+	irq = irq_domain_alloc_irqs(NULL, 1, NUMA_NO_NODE, NULL);
+	if (irq <= 0)
 		return -EINVAL;
 
 	irq_set_handler_data(irq, dev);
-- 
1.7.10.4


* [Patch Part2 v4 09/31] x86, MSI: Use new irqdomain interfaces to allocate/free IRQ
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (7 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 08/31] x86, hpet: Use new irqdomain interfaces to allocate/free IRQ Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 10/31] x86, uv: " Jiang Liu
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Use the new irqdomain interfaces to allocate/free IRQs for MSI, so that
GENERIC_IRQ_LEGACY_ALLOC_HWIRQ can be removed later.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/kernel/apic/msi.c |   14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6916246294eb..7e45991c0e79 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -14,6 +14,7 @@
 #include <linux/dmar.h>
 #include <linux/hpet.h>
 #include <linux/msi.h>
+#include <linux/irqdomain.h>
 #include <asm/msidef.h>
 #include <asm/hpet.h>
 #include <asm/hw_irq.h>
@@ -146,23 +147,20 @@ int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
 int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
 	struct msi_desc *msidesc;
-	unsigned int irq;
-	int node, ret;
+	int irq, ret;
 
 	/* Multiple MSI vectors only supported with interrupt remapping */
 	if (type == PCI_CAP_ID_MSI && nvec > 1)
 		return 1;
 
-	node = dev_to_node(&dev->dev);
-
 	list_for_each_entry(msidesc, &dev->msi_list, list) {
-		irq = irq_alloc_hwirq(node);
-		if (!irq)
+		irq = irq_domain_alloc_irqs(NULL, 1, NUMA_NO_NODE, NULL);
+		if (irq <= 0)
 			return -ENOSPC;
 
 		ret = setup_msi_irq(dev, msidesc, irq, 0);
 		if (ret < 0) {
-			irq_free_hwirq(irq);
+			irq_domain_free_irqs(irq, 1);
 			return ret;
 		}
 
@@ -172,7 +170,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 
 void native_teardown_msi_irq(unsigned int irq)
 {
-	irq_free_hwirq(irq);
+	irq_domain_free_irqs(irq, 1);
 }
 
 #ifdef CONFIG_DMAR_TABLE
-- 
1.7.10.4


* [Patch Part2 v4 10/31] x86, uv: Use new irqdomain interfaces to allocate/free IRQ
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (8 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 09/31] x86, MSI: " Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 11/31] x86, htirq: " Jiang Liu
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Use the new irqdomain interfaces to allocate/free IRQs, so that
GENERIC_IRQ_LEGACY_ALLOC_HWIRQ can be removed later.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/platform/uv/uv_irq.c |   27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index 0ce673645432..474912d03f40 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -12,6 +12,7 @@
 #include <linux/rbtree.h>
 #include <linux/slab.h>
 #include <linux/irq.h>
+#include <linux/irqdomain.h>
 
 #include <asm/apic.h>
 #include <asm/uv/uv_irq.h>
@@ -130,24 +131,14 @@ static int
 arch_enable_uv_irq(char *irq_name, unsigned int irq, int cpu, int mmr_blade,
 		       unsigned long mmr_offset, int limit)
 {
-	const struct cpumask *eligible_cpu = cpumask_of(cpu);
 	struct irq_cfg *cfg = irq_cfg(irq);
 	unsigned long mmr_value;
 	struct uv_IO_APIC_route_entry *entry;
-	int mmr_pnode, err;
-	unsigned int dest;
+	int mmr_pnode;
 
 	BUILD_BUG_ON(sizeof(struct uv_IO_APIC_route_entry) !=
 			sizeof(unsigned long));
 
-	err = assign_irq_vector(irq, cfg, eligible_cpu);
-	if (err != 0)
-		return err;
-
-	err = apic->cpu_mask_to_apicid_and(eligible_cpu, eligible_cpu, &dest);
-	if (err != 0)
-		return err;
-
 	if (limit == UV_AFFINITY_CPU)
 		irq_set_status_flags(irq, IRQ_NO_BALANCING);
 	else
@@ -164,7 +155,7 @@ arch_enable_uv_irq(char *irq_name, unsigned int irq, int cpu, int mmr_blade,
 	entry->polarity		= 0;
 	entry->trigger		= 0;
 	entry->mask		= 0;
-	entry->dest		= dest;
+	entry->dest		= cfg->dest_apicid;
 
 	mmr_pnode = uv_blade_to_pnode(mmr_blade);
 	uv_write_global_mmr64(mmr_pnode, mmr_offset, mmr_value);
@@ -238,9 +229,13 @@ uv_set_irq_affinity(struct irq_data *data, const struct cpumask *mask,
 int uv_setup_irq(char *irq_name, int cpu, int mmr_blade,
 		 unsigned long mmr_offset, int limit)
 {
-	int ret, irq = irq_alloc_hwirq(uv_blade_to_memory_nid(mmr_blade));
+	int ret, irq;
+	struct irq_alloc_info info;
 
-	if (!irq)
+	init_irq_alloc_info(&info, cpumask_of(cpu));
+	irq = irq_domain_alloc_irqs(NULL, 1, uv_blade_to_memory_nid(mmr_blade),
+				    &info);
+	if (irq <= 0)
 		return -EBUSY;
 
 	ret = arch_enable_uv_irq(irq_name, irq, cpu, mmr_blade, mmr_offset,
@@ -248,7 +243,7 @@ int uv_setup_irq(char *irq_name, int cpu, int mmr_blade,
 	if (ret == irq)
 		uv_set_irq_2_mmr_info(irq, mmr_offset, mmr_blade);
 	else
-		irq_free_hwirq(irq);
+		irq_domain_free_irqs(irq, 1);
 
 	return ret;
 }
@@ -283,6 +278,6 @@ void uv_teardown_irq(unsigned int irq)
 			n = n->rb_right;
 	}
 	spin_unlock_irqrestore(&uv_irq_lock, irqflags);
-	irq_free_hwirq(irq);
+	irq_domain_free_irqs(irq, 1);
 }
 EXPORT_SYMBOL_GPL(uv_teardown_irq);
-- 
1.7.10.4


* [Patch Part2 v4 11/31] x86, htirq: Use new irqdomain interfaces to allocate/free IRQ
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (9 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 10/31] x86, uv: " Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 12/31] x86, dmar: " Jiang Liu
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Use the new irqdomain interfaces to allocate/free IRQs for HTIRQ, so that
GENERIC_IRQ_LEGACY_ALLOC_HWIRQ can be removed later.

This patch changes the interface between the arch-independent PCI driver
and the arch-specific code. HT_IRQ is currently only enabled on x86, so
this shouldn't break other architectures.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/kernel/apic/htirq.c |   26 +++++++++++++-------------
 drivers/pci/htirq.c          |    7 +++----
 include/linux/htirq.h        |    2 ++
 3 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/apic/htirq.c b/arch/x86/kernel/apic/htirq.c
index 816f36e979ad..b307ee7a7148 100644
--- a/arch/x86/kernel/apic/htirq.c
+++ b/arch/x86/kernel/apic/htirq.c
@@ -14,6 +14,7 @@
 #include <linux/device.h>
 #include <linux/pci.h>
 #include <linux/htirq.h>
+#include <linux/irqdomain.h>
 #include <asm/hw_irq.h>
 #include <asm/apic.h>
 #include <asm/hypertransport.h>
@@ -61,31 +62,30 @@ static struct irq_chip ht_irq_chip = {
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
+int arch_alloc_ht_irq(struct pci_dev *dev)
+{
+	return irq_domain_alloc_irqs(NULL, 1, dev_to_node(&dev->dev), NULL);
+}
+
+void arch_free_ht_irq(int irq)
+{
+	irq_domain_free_irqs(irq, 1);
+}
+
 int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
 {
 	struct irq_cfg *cfg;
 	struct ht_irq_msg msg;
-	unsigned dest;
-	int err;
 
 	if (disable_apic)
 		return -ENXIO;
 
 	cfg = irq_cfg(irq);
-	err = assign_irq_vector(irq, cfg, apic->target_cpus());
-	if (err)
-		return err;
-
-	err = apic->cpu_mask_to_apicid_and(cfg->domain,
-					   apic->target_cpus(), &dest);
-	if (err)
-		return err;
-
-	msg.address_hi = HT_IRQ_HIGH_DEST_ID(dest);
+	msg.address_hi = HT_IRQ_HIGH_DEST_ID(cfg->dest_apicid);
 
 	msg.address_lo =
 		HT_IRQ_LOW_BASE |
-		HT_IRQ_LOW_DEST_ID(dest) |
+		HT_IRQ_LOW_DEST_ID(cfg->dest_apicid) |
 		HT_IRQ_LOW_VECTOR(cfg->vector) |
 		((apic->irq_dest_mode == 0) ?
 			HT_IRQ_LOW_DM_PHYSICAL :
diff --git a/drivers/pci/htirq.c b/drivers/pci/htirq.c
index a94dd2c4183a..ceb0ebeb7b5f 100644
--- a/drivers/pci/htirq.c
+++ b/drivers/pci/htirq.c
@@ -117,8 +117,8 @@ int __ht_create_irq(struct pci_dev *dev, int idx, ht_irq_update_t *update)
 	cfg->msg.address_lo = 0xffffffff;
 	cfg->msg.address_hi = 0xffffffff;
 
-	irq = irq_alloc_hwirq(dev_to_node(&dev->dev));
-	if (!irq) {
+	irq = arch_alloc_ht_irq(dev);
+	if (irq <= 0) {
 		kfree(cfg);
 		return -EBUSY;
 	}
@@ -163,8 +163,7 @@ void ht_destroy_irq(unsigned int irq)
 	cfg = irq_get_handler_data(irq);
 	irq_set_chip(irq, NULL);
 	irq_set_handler_data(irq, NULL);
-	irq_free_hwirq(irq);
-
+	arch_free_ht_irq(irq);
 	kfree(cfg);
 }
 EXPORT_SYMBOL(ht_destroy_irq);
diff --git a/include/linux/htirq.h b/include/linux/htirq.h
index 70a1dbbf2093..5caa51b7b95c 100644
--- a/include/linux/htirq.h
+++ b/include/linux/htirq.h
@@ -15,6 +15,8 @@ void unmask_ht_irq(struct irq_data *data);
 
 /* The arch hook for getting things started */
 int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev);
+int arch_alloc_ht_irq(struct pci_dev *dev);
+void arch_free_ht_irq(int irq);
 
 /* For drivers of buggy hardware */
 typedef void (ht_irq_update_t)(struct pci_dev *dev, int irq,
-- 
1.7.10.4


* [Patch Part2 v4 12/31] x86, dmar: Use new irqdomain interfaces to allocate/free IRQ
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (10 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 11/31] x86, htirq: " Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 13/31] x86: irq_remapping: Introduce new interfaces to support hierarchy irqdomain Jiang Liu
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Joerg Roedel, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, iommu

Use the new irqdomain interfaces to allocate/free IRQs for DMAR and
interrupt remapping, so that GENERIC_IRQ_LEGACY_ALLOC_HWIRQ can be removed
later.

The private definitions of irq_alloc_hwirqs()/irq_free_hwirqs() are a
temporary solution; they will be removed once the interrupt remapping
driver has been converted to use the irqdomain framework.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/include/asm/irq_remapping.h |    4 ++--
 arch/x86/kernel/apic/msi.c           |   10 ++++++++++
 drivers/iommu/irq_remapping.c        |   17 +++++++++++++++--
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index b7747c4c2cf2..230dde9b695e 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -103,7 +103,7 @@ static inline bool setup_remapped_irq(int irq,
 }
 #endif /* CONFIG_IRQ_REMAP */
 
-#define dmar_alloc_hwirq()	irq_alloc_hwirq(-1)
-#define dmar_free_hwirq		irq_free_hwirq
+extern int dmar_alloc_hwirq(void);
+extern void dmar_free_hwirq(int irq);
 
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 7e45991c0e79..4bb2b583be7f 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -223,6 +223,16 @@ int arch_setup_dmar_msi(unsigned int irq)
 				      "edge");
 	return 0;
 }
+
+int dmar_alloc_hwirq(void)
+{
+	return irq_domain_alloc_irqs(NULL, 1, NUMA_NO_NODE, NULL);
+}
+
+void dmar_free_hwirq(int irq)
+{
+	irq_domain_free_irqs(irq, 1);
+}
 #endif
 
 /*
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index e9fbd68db96e..63886bafed9f 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -6,6 +6,7 @@
 #include <linux/msi.h>
 #include <linux/irq.h>
 #include <linux/pci.h>
+#include <linux/irqdomain.h>
 
 #include <asm/hw_irq.h>
 #include <asm/irq_remapping.h>
@@ -50,6 +51,18 @@ static void irq_remapping_disable_io_apic(void)
 		disconnect_bsp_APIC(0);
 }
 
+#ifndef CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ
+static unsigned int irq_alloc_hwirqs(int cnt, int node)
+{
+	return irq_domain_alloc_irqs(NULL, cnt, node, NULL);
+}
+
+static void irq_free_hwirqs(unsigned int from, int cnt)
+{
+	irq_domain_free_irqs(from, cnt);
+}
+#endif
+
 static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
 {
 	int ret, sub_handle, nvec_pow2, index = 0;
@@ -113,7 +126,7 @@ static int do_setup_msix_irqs(struct pci_dev *dev, int nvec)
 
 	list_for_each_entry(msidesc, &dev->msi_list, list) {
 
-		irq = irq_alloc_hwirq(node);
+		irq = irq_alloc_hwirqs(1, node);
 		if (irq == 0)
 			return -1;
 
@@ -136,7 +149,7 @@ static int do_setup_msix_irqs(struct pci_dev *dev, int nvec)
 	return 0;
 
 error:
-	irq_free_hwirq(irq);
+	irq_free_hwirqs(irq, 1);
 	return ret;
 }
 
-- 
1.7.10.4


* [Patch Part2 v4 13/31] x86: irq_remapping: Introduce new interfaces to support hierarchy irqdomain
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (11 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 12/31] x86, dmar: " Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-06 11:43   ` Yijing Wang
  2014-11-04 12:01 ` [Patch Part2 v4 14/31] iommu/vt-d: Change prototypes to prepare for enabling " Jiang Liu
                   ` (19 subsequent siblings)
  32 siblings, 1 reply; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Joerg Roedel, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, iommu

Introduce new interfaces for interrupt remapping drivers to support
hierarchy irqdomain:
1) irq_remapping_get_ir_irq_domain(): get the irqdomain associated with
   an interrupt remapping unit. IOAPIC/HPET drivers use this interface
   to get the parent interrupt remapping irqdomain.
2) irq_remapping_get_irq_domain(): get the irqdomain for an IRQ
   allocation. This is mainly used to support MSI irqdomains. We must
   build one MSI irqdomain for each interrupt remapping unit. The MSI
   driver calls this interface to get the MSI irqdomain associated with
   the IR irqdomain which manages the PCI device.

Architecture-specific code needs to implement two hooks:
1) arch_get_ir_parent_domain(): get the parent irqdomain for the IR
   irqdomain, which is x86_vector_domain on x86 platforms.
2) arch_create_msi_irq_domain(): create an MSI irqdomain associated with
   the interrupt remapping unit.

We also add the following callbacks to struct irq_remap_ops:
	struct irq_domain *(*get_ir_irq_domain)(struct irq_alloc_info *);
	struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);

Once all clients of IR have been converted to the new hierarchy irqdomain
interfaces, we will:
1) Remove set_ioapic_entry, set_affinity, free_irq, compose_msi_msg,
   msi_alloc_irq, msi_setup_irq and setup_hpet_msi from struct
   irq_remap_ops.
2) Kill setup_ioapic_remapped_entry, free_remapped_irq,
   compose_remapped_msi_msg, setup_hpet_msi_remapped and
   setup_remapped_irq.
3) Simplify x86_io_apic_ops and x86_msi.

With all these changes applied, the architecture becomes much clearer.
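
The two lookup interfaces are thin wrappers that dispatch to whichever
remapping driver is active and return NULL when there is none. A minimal
userspace sketch of that dispatch is shown here. It is not kernel code:
every mock_* type, variable and function is a hypothetical stand-in, and
the single mock remapping unit is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mock types; the kernel structures carry far more state. */
struct mock_irq_domain { const char *name; };

enum mock_alloc_type {
	MOCK_ALLOC_TYPE_IOAPIC = 1,
	MOCK_ALLOC_TYPE_HPET,
	MOCK_ALLOC_TYPE_MSI,
	MOCK_ALLOC_TYPE_MSIX,
};

struct mock_alloc_info { enum mock_alloc_type type; };

struct mock_remap_ops {
	struct mock_irq_domain *(*get_ir_irq_domain)(struct mock_alloc_info *);
	struct mock_irq_domain *(*get_irq_domain)(struct mock_alloc_info *);
};

static struct mock_irq_domain mock_ir_domain = { "IR" };
static struct mock_irq_domain mock_ir_msi_domain = { "IR-MSI" };

/* Every source behind the single mock unit maps to its IR domain. */
static struct mock_irq_domain *mock_get_ir(struct mock_alloc_info *info)
{
	return info ? &mock_ir_domain : NULL;
}

/* Only MSI/MSI-X allocations have a dedicated MSI domain. */
static struct mock_irq_domain *mock_get_msi(struct mock_alloc_info *info)
{
	if (!info)
		return NULL;
	switch (info->type) {
	case MOCK_ALLOC_TYPE_MSI:
	case MOCK_ALLOC_TYPE_MSIX:
		return &mock_ir_msi_domain;
	default:
		return NULL;
	}
}

static struct mock_remap_ops mock_ops = {
	.get_ir_irq_domain	= mock_get_ir,
	.get_irq_domain		= mock_get_msi,
};

/* NULL until a remapping driver registers itself. */
static struct mock_remap_ops *mock_active_ops;

static void mock_register_ops(struct mock_remap_ops *ops)
{
	assert(ops != NULL);
	mock_active_ops = ops;
}

/* Wrapper used by IOAPIC/HPET style callers to find their parent domain. */
static struct mock_irq_domain *
mock_remapping_get_ir_irq_domain(struct mock_alloc_info *info)
{
	if (!mock_active_ops || !mock_active_ops->get_ir_irq_domain)
		return NULL;
	return mock_active_ops->get_ir_irq_domain(info);
}

/* Wrapper used by MSI style callers to find the domain to allocate from. */
static struct mock_irq_domain *
mock_remapping_get_irq_domain(struct mock_alloc_info *info)
{
	if (!mock_active_ops || !mock_active_ops->get_irq_domain)
		return NULL;
	return mock_active_ops->get_irq_domain(info);
}
```

A NULL return lets the caller fall back to a non-remapped parent domain
without having to know whether interrupt remapping is enabled.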

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/include/asm/hw_irq.h        |   35 +++++++++++++++++++++++++
 arch/x86/include/asm/irq_remapping.h |   39 +++++++++++++++++++++++++++
 drivers/iommu/irq_remapping.c        |   48 +++++++++++++++++++++++++++++++++-
 drivers/iommu/irq_remapping.h        |   10 +++++++
 4 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 78130156601a..9e91a5d048de 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -113,10 +113,45 @@ struct irq_2_irte {
 #ifdef	CONFIG_X86_LOCAL_APIC
 struct irq_data;
 struct irq_domain;
+struct pci_dev;
+struct msi_desc;
+
+enum irq_alloc_type {
+	X86_IRQ_ALLOC_TYPE_IOAPIC = 1,
+	X86_IRQ_ALLOC_TYPE_HPET,
+	X86_IRQ_ALLOC_TYPE_MSI,
+	X86_IRQ_ALLOC_TYPE_MSIX,
+};
 
 struct irq_alloc_info {
+	enum irq_alloc_type	type;
 	u32			flags;
 	const struct cpumask	*mask;	/* CPU mask for vector allocation */
+	union {
+		int		unused;
+#ifdef	CONFIG_HPET_TIMER
+		struct {
+			int		hpet_id;
+			int		hpet_index;
+			void		*hpet_data;
+		};
+#endif
+#ifdef	CONFIG_PCI_MSI
+		struct {
+			struct pci_dev	*msi_dev;
+			irq_hw_number_t	msi_hwirq;
+		};
+#endif
+#ifdef	CONFIG_X86_IO_APIC
+		struct {
+			int		ioapic_id;
+			int		ioapic_pin;
+			u32		ioapic_trigger : 1;
+			u32		ioapic_polarity : 1;
+			struct IO_APIC_route_entry *ioapic_entry;
+		};
+#endif
+	};
 };
 
 /* Request contigious CPU vectors */
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 230dde9b695e..d2410ac8cef9 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -22,6 +22,8 @@
 #ifndef __X86_IRQ_REMAPPING_H
 #define __X86_IRQ_REMAPPING_H
 
+#include <linux/irqdomain.h>
+#include <asm/hw_irq.h>
 #include <asm/io_apic.h>
 
 struct IO_APIC_route_entry;
@@ -30,6 +32,7 @@ struct irq_chip;
 struct msi_msg;
 struct pci_dev;
 struct irq_cfg;
+struct irq_alloc_info;
 
 #ifdef CONFIG_IRQ_REMAP
 
@@ -58,6 +61,28 @@ extern bool setup_remapped_irq(int irq,
 
 void irq_remap_modify_chip_defaults(struct irq_chip *chip);
 
+extern struct irq_domain *irq_remapping_get_ir_irq_domain(
+				struct irq_alloc_info *info);
+extern struct irq_domain *irq_remapping_get_irq_domain(
+				struct irq_alloc_info *info);
+extern void irq_remapping_print_chip(struct irq_data *data, struct seq_file *p);
+
+/*
+ * Create MSI/MSIx irqdomain for interrupt remapping device, use @parent as
+ * parent irqdomain.
+ */
+static inline struct irq_domain *
+arch_create_msi_irq_domain(struct irq_domain *parent)
+{
+	return NULL;
+}
+
+/* Get parent irqdomain for interrupt remapping irqdomain */
+static inline struct irq_domain *arch_get_ir_parent_domain(void)
+{
+	return x86_vector_domain;
+}
+
 #else  /* CONFIG_IRQ_REMAP */
 
 static inline void setup_irq_remapping_ops(void) { }
@@ -101,6 +126,20 @@ static inline bool setup_remapped_irq(int irq,
 {
 	return false;
 }
+
+static inline struct irq_domain *
+irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
+{
+	return NULL;
+}
+
+static inline struct irq_domain *
+irq_remapping_get_irq_domain(struct irq_alloc_info *info)
+{
+	return NULL;
+}
+
+#define	irq_remapping_print_chip	NULL
 #endif /* CONFIG_IRQ_REMAP */
 
 extern int dmar_alloc_hwirq(void);
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 63886bafed9f..176ff4372b7d 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -377,7 +377,7 @@ void panic_if_irq_remap(const char *msg)
 		panic(msg);
 }
 
-static void ir_ack_apic_edge(struct irq_data *data)
+void ir_ack_apic_edge(struct irq_data *data)
 {
 	ack_APIC_irq();
 }
@@ -388,6 +388,19 @@ static void ir_ack_apic_level(struct irq_data *data)
 	eoi_ioapic_irq(data->irq, irqd_cfg(data));
 }
 
+void irq_remapping_print_chip(struct irq_data *data, struct seq_file *p)
+{
+	/*
+	 * Assume interrupt is remapped if the parent irqdomain isn't the
+	 * vector domain, which is true for MSI, HPET and IOAPIC on x86
+	 * platforms.
+	 */
+	if (data->domain && data->domain->parent != arch_get_ir_parent_domain())
+		seq_printf(p, " IR-%s", data->chip->name);
+	else
+		seq_printf(p, " %s", data->chip->name);
+}
+
 static void ir_print_prefix(struct irq_data *data, struct seq_file *p)
 {
 	seq_printf(p, " IR-%s", data->chip->name);
@@ -409,3 +422,36 @@ bool setup_remapped_irq(int irq, struct irq_cfg *cfg, struct irq_chip *chip)
 	irq_remap_modify_chip_defaults(chip);
 	return true;
 }
+
+/**
+ * irq_remapping_get_ir_irq_domain - Get the irqdomain associated with the IOMMU
+ *				     device serving @info
+ * @info: interrupt allocation information, used to find the IOMMU device
+ *
+ * It's used to get parent irqdomain for HPET and IOAPIC domains.
+ * Returns pointer to IRQ domain, or NULL on failure.
+ */
+struct irq_domain *
+irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
+{
+	if (!remap_ops || !remap_ops->get_ir_irq_domain)
+		return NULL;
+
+	return remap_ops->get_ir_irq_domain(info);
+}
+
+/**
+ * irq_remapping_get_irq_domain - Get the irqdomain serving the MSI interrupt
+ * @info: interrupt allocation information, used to find the IOMMU device
+ *
+ * It's used to get irqdomain for MSI/MSIx interrupt allocation.
+ * Returns pointer to IRQ domain, or NULL on failure.
+ */
+struct irq_domain *
+irq_remapping_get_irq_domain(struct irq_alloc_info *info)
+{
+	if (!remap_ops || !remap_ops->get_irq_domain)
+		return NULL;
+
+	return remap_ops->get_irq_domain(info);
+}
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index fde250f86e60..8c159d6fac46 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -30,6 +30,8 @@ struct irq_data;
 struct cpumask;
 struct pci_dev;
 struct msi_msg;
+struct irq_domain;
+struct irq_alloc_info;
 
 extern int disable_irq_remap;
 extern int irq_remap_broken;
@@ -81,11 +83,19 @@ struct irq_remap_ops {
 
 	/* Setup interrupt remapping for an HPET MSI */
 	int (*alloc_hpet_msi)(unsigned int, unsigned int);
+
+	/* Get the irqdomain associated with the IOMMU device */
+	struct irq_domain *(*get_ir_irq_domain)(struct irq_alloc_info *);
+
+	/* Get the MSI irqdomain associated with the IOMMU device */
+	struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);
 };
 
 extern struct irq_remap_ops intel_irq_remap_ops;
 extern struct irq_remap_ops amd_iommu_irq_ops;
 
+extern void ir_ack_apic_edge(struct irq_data *data);
+
 #else  /* CONFIG_IRQ_REMAP */
 
 #define irq_remapping_enabled 0
-- 
1.7.10.4


* [Patch Part2 v4 14/31] iommu/vt-d: Change prototypes to prepare for enabling hierarchy irqdomain
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (12 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 13/31] x86: irq_remapping: Introduce new interfaces to support hierarchy irqdomain Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 15/31] iommu/vt-d: Enhance Intel IR driver to suppport " Jiang Liu
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Joerg Roedel, Matthias Brugger
  Cc: Jiang Liu, Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, iommu

Prepare for supporting hierarchy irqdomain by changing function
prototypes. There should be no functional changes.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/iommu/intel_irq_remapping.c |   22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index fd181cf8a589..5acad492701e 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -82,10 +82,10 @@ static int get_irte(int irq, struct irte *entry)
 	return 0;
 }
 
-static int alloc_irte(struct intel_iommu *iommu, int irq, u16 count)
+static int alloc_irte(struct intel_iommu *iommu, int irq,
+		      struct irq_2_iommu *irq_iommu, u16 count)
 {
 	struct ir_table *table = iommu->ir_table;
-	struct irq_2_iommu *irq_iommu = irq_2_iommu(irq);
 	struct irq_cfg *cfg = irq_cfg(irq);
 	unsigned int mask = 0;
 	unsigned long flags;
@@ -173,9 +173,9 @@ static int set_irte_irq(int irq, struct intel_iommu *iommu, u16 index, u16 subha
 	return 0;
 }
 
-static int modify_irte(int irq, struct irte *irte_modified)
+static int modify_irte(struct irq_2_iommu *irq_iommu,
+		       struct irte *irte_modified)
 {
-	struct irq_2_iommu *irq_iommu = irq_2_iommu(irq);
 	struct intel_iommu *iommu;
 	unsigned long flags;
 	struct irte *irte;
@@ -242,7 +242,7 @@ static int clear_entries(struct irq_2_iommu *irq_iommu)
 		return 0;
 
 	iommu = irq_iommu->iommu;
-	index = irq_iommu->irte_index + irq_iommu->sub_handle;
+	index = irq_iommu->irte_index;
 
 	start = iommu->ir_table->base + index;
 	end = start + (1 << irq_iommu->irte_mask);
@@ -937,7 +937,7 @@ static int intel_setup_ioapic_entry(int irq,
 		pr_warn("No mapping iommu for ioapic %d\n", ioapic_id);
 		index = -ENODEV;
 	} else {
-		index = alloc_irte(iommu, irq, 1);
+		index = alloc_irte(iommu, irq, irq_2_iommu(irq), 1);
 		if (index < 0) {
 			pr_warn("Failed to allocate IRTE for ioapic %d\n",
 				ioapic_id);
@@ -953,7 +953,7 @@ static int intel_setup_ioapic_entry(int irq,
 	/* Set source-id of interrupt request */
 	set_ioapic_sid(&irte, ioapic_id);
 
-	modify_irte(irq, &irte);
+	modify_irte(irq_2_iommu(irq), &irte);
 
 	apic_printk(APIC_VERBOSE, KERN_DEBUG "IOAPIC[%d]: "
 		"Set IRTE entry (P:%d FPD:%d Dst_Mode:%d "
@@ -1040,7 +1040,7 @@ intel_ioapic_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	 * Atomically updates the IRTE with the new destination, vector
 	 * and flushes the interrupt entry cache.
 	 */
-	modify_irte(irq, &irte);
+	modify_irte(irq_2_iommu(irq), &irte);
 
 	/*
 	 * After this point, all the interrupts will start arriving
@@ -1076,7 +1076,7 @@ static void intel_compose_msi_msg(struct pci_dev *pdev,
 	else
 		set_hpet_sid(&irte, hpet_id);
 
-	modify_irte(irq, &irte);
+	modify_irte(irq_2_iommu(irq), &irte);
 
 	msg->address_hi = MSI_ADDR_BASE_HI;
 	msg->data = sub_handle;
@@ -1103,7 +1103,7 @@ static int intel_msi_alloc_irq(struct pci_dev *dev, int irq, int nvec)
 		       "Unable to map PCI %s to iommu\n", pci_name(dev));
 		index = -ENOENT;
 	} else {
-		index = alloc_irte(iommu, irq, nvec);
+		index = alloc_irte(iommu, irq, irq_2_iommu(irq), nvec);
 		if (index < 0) {
 			printk(KERN_ERR
 			       "Unable to allocate %d IRTE for PCI %s\n",
@@ -1147,7 +1147,7 @@ static int intel_alloc_hpet_msi(unsigned int irq, unsigned int id)
 	down_read(&dmar_global_lock);
 	iommu = map_hpet_to_ir(id);
 	if (iommu) {
-		index = alloc_irte(iommu, irq, 1);
+		index = alloc_irte(iommu, irq, irq_2_iommu(irq), 1);
 		if (index >= 0)
 			ret = 0;
 	}
-- 
1.7.10.4


* [Patch Part2 v4 15/31] iommu/vt-d: Enhance Intel IR driver to suppport hierarchy irqdomain
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (13 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 14/31] iommu/vt-d: Change prototypes to prepare for enabling " Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 16/31] iommu/amd: Enhance AMD " Jiang Liu
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Joerg Roedel, David Woodhouse, Matthias Brugger
  Cc: Jiang Liu, Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, iommu

Enhance the Intel interrupt remapping driver to support hierarchy
irqdomain, which will eventually simplify the code. Also implement
intel_ir_chip to support stacked irq_chip.
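
With a stacked irq_chip, a chip callback first delegates to the parent
level and only then updates its own hardware state. A minimal userspace
sketch of that pattern for an affinity change is shown here. It is not
kernel code: the mock_* structures and functions are hypothetical, a
single CPU number stands in for the cpumask, and the copied dest_cpu field
stands in for the IRTE update.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_NR_CPUS	4

struct mock_irq_data;

struct mock_irq_chip {
	const char *name;
	int (*set_affinity)(struct mock_irq_data *data, unsigned int cpu);
};

struct mock_irq_data {
	struct mock_irq_chip *chip;
	struct mock_irq_data *parent_data;	/* NULL at the root level */
	unsigned int dest_cpu;			/* this level's view of the target */
};

/* Root level: models the CPU vector domain picking a destination. */
static int mock_vector_set_affinity(struct mock_irq_data *data,
				    unsigned int cpu)
{
	if (cpu >= MOCK_NR_CPUS)
		return -EINVAL;
	data->dest_cpu = cpu;
	return 0;
}

/* Remapping level: ask the parent first, then update our own entry. */
static int mock_ir_set_affinity(struct mock_irq_data *data, unsigned int cpu)
{
	struct mock_irq_data *parent = data->parent_data;
	int ret;

	assert(parent && parent->chip);	/* a stacked chip always has a parent */
	ret = parent->chip->set_affinity(parent, cpu);
	if (ret < 0)
		return ret;	/* parent refused: leave our entry untouched */

	data->dest_cpu = parent->dest_cpu;	/* stands in for the IRTE update */
	return 0;
}

static struct mock_irq_chip mock_vector_chip = {
	"VECTOR", mock_vector_set_affinity
};
static struct mock_irq_chip mock_ir_chip = {
	"IR", mock_ir_set_affinity
};
```

Because the parent is consulted first, a failure at the vector level leaves
the remapping level's state unchanged.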

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/iommu/intel_irq_remapping.c |  353 +++++++++++++++++++++++++++++++++--
 include/linux/intel-iommu.h         |    4 +
 2 files changed, 339 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 5acad492701e..88196ca55e29 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -8,6 +8,7 @@
 #include <linux/irq.h>
 #include <linux/intel-iommu.h>
 #include <linux/acpi.h>
+#include <linux/irqdomain.h>
 #include <asm/io_apic.h>
 #include <asm/smp.h>
 #include <asm/cpu.h>
@@ -31,6 +32,14 @@ struct hpet_scope {
 	unsigned int devfn;
 };
 
+struct intel_ir_data {
+	struct irq_2_iommu			irq_2_iommu;
+	struct irte				irte_entry;
+	union {
+		struct msi_msg			msi_entry;
+	};
+};
+
 #define IR_X2APIC_MODE(mode) (mode ? (1 << 11) : 0)
 #define IRTE_DEST(dest) ((x2apic_mode) ? dest : dest << 8)
 
@@ -50,6 +59,7 @@ static int ir_ioapic_num, ir_hpet_num;
  * the dmar_global_lock.
  */
 static DEFINE_RAW_SPINLOCK(irq_2_ir_lock);
+static struct irq_domain_ops intel_ir_domain_ops;
 
 static int __init parse_ioapics_under_ir(void);
 
@@ -263,7 +273,7 @@ static int free_irte(int irq)
 	unsigned long flags;
 	int rc;
 
-	if (!irq_iommu)
+	if (!irq_iommu || irq_iommu->iommu == NULL)
 		return -1;
 
 	raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
@@ -480,36 +490,47 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu, int mode)
 	struct page *pages;
 	unsigned long *bitmap;
 
-	ir_table = iommu->ir_table = kzalloc(sizeof(struct ir_table),
-					     GFP_ATOMIC);
-
-	if (!iommu->ir_table)
+	ir_table = kzalloc(sizeof(struct ir_table), GFP_ATOMIC);
+	if (!ir_table)
 		return -ENOMEM;
 
 	pages = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO,
 				 INTR_REMAP_PAGE_ORDER);
-
 	if (!pages) {
 		pr_err("IR%d: failed to allocate pages of order %d\n",
 		       iommu->seq_id, INTR_REMAP_PAGE_ORDER);
-		kfree(iommu->ir_table);
-		return -ENOMEM;
+		goto out_free_table;
 	}
 
 	bitmap = kcalloc(BITS_TO_LONGS(INTR_REMAP_TABLE_ENTRIES),
 			 sizeof(long), GFP_ATOMIC);
 	if (bitmap == NULL) {
 		pr_err("IR%d: failed to allocate bitmap\n", iommu->seq_id);
-		__free_pages(pages, INTR_REMAP_PAGE_ORDER);
-		kfree(ir_table);
-		return -ENOMEM;
+		goto out_free_pages;
+	}
+
+	iommu->ir_domain = irq_domain_add_linear(NULL, INTR_REMAP_TABLE_ENTRIES,
+						 &intel_ir_domain_ops, iommu);
+	if (!iommu->ir_domain) {
+		pr_err("IR%d: failed to allocate irqdomain\n", iommu->seq_id);
+		goto out_free_bitmap;
 	}
+	iommu->ir_domain->parent = arch_get_ir_parent_domain();
+	iommu->ir_msi_domain = arch_create_msi_irq_domain(iommu->ir_domain);
 
 	ir_table->base = page_address(pages);
 	ir_table->bitmap = bitmap;
-
+	iommu->ir_table = ir_table;
 	iommu_set_irq_remapping(iommu, mode);
 	return 0;
+
+out_free_bitmap:
+	kfree(bitmap);
+out_free_pages:
+	__free_pages(pages, INTR_REMAP_PAGE_ORDER);
+out_free_table:
+	kfree(ir_table);
+	return -ENOMEM;
 }
 
 /*
@@ -1013,12 +1034,6 @@ intel_ioapic_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	struct irte irte;
 	int err;
 
-	if (!config_enabled(CONFIG_SMP))
-		return -EINVAL;
-
-	if (!cpumask_intersects(mask, cpu_online_mask))
-		return -EINVAL;
-
 	if (get_irte(irq, &irte))
 		return -EBUSY;
 
@@ -1051,6 +1066,7 @@ intel_ioapic_set_affinity(struct irq_data *data, const struct cpumask *mask,
 		send_cleanup_vector(cfg);
 
 	cpumask_copy(data->affinity, mask);
+
 	return 0;
 }
 
@@ -1156,6 +1172,53 @@ static int intel_alloc_hpet_msi(unsigned int irq, unsigned int id)
 	return ret;
 }
 
+static struct irq_domain *intel_get_ir_irq_domain(struct irq_alloc_info *info)
+{
+	struct intel_iommu *iommu = NULL;
+
+	if (!info)
+		return NULL;
+
+	switch (info->type) {
+	case X86_IRQ_ALLOC_TYPE_IOAPIC:
+		iommu = map_ioapic_to_ir(info->ioapic_id);
+		break;
+	case X86_IRQ_ALLOC_TYPE_HPET:
+		iommu = map_hpet_to_ir(info->hpet_id);
+		break;
+	case X86_IRQ_ALLOC_TYPE_MSI:
+	case X86_IRQ_ALLOC_TYPE_MSIX:
+		iommu = map_dev_to_ir(info->msi_dev);
+		break;
+	default:
+		BUG_ON(1);
+		break;
+	}
+
+	return iommu ? iommu->ir_domain : NULL;
+}
+
+static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
+{
+	struct intel_iommu *iommu;
+
+	if (!info)
+		return NULL;
+
+	switch (info->type) {
+	case X86_IRQ_ALLOC_TYPE_MSI:
+	case X86_IRQ_ALLOC_TYPE_MSIX:
+		iommu = map_dev_to_ir(info->msi_dev);
+		if (iommu)
+			return iommu->ir_msi_domain;
+		break;
+	default:
+		break;
+	}
+
+	return NULL;
+}
+
 struct irq_remap_ops intel_irq_remap_ops = {
 	.supported		= intel_irq_remapping_supported,
 	.prepare		= dmar_table_init,
@@ -1170,4 +1233,258 @@ struct irq_remap_ops intel_irq_remap_ops = {
 	.msi_alloc_irq		= intel_msi_alloc_irq,
 	.msi_setup_irq		= intel_msi_setup_irq,
 	.alloc_hpet_msi		= intel_alloc_hpet_msi,
+	.get_ir_irq_domain	= intel_get_ir_irq_domain,
+	.get_irq_domain		= intel_get_irq_domain,
+};
+
+/*
+ * Migrate the IO-APIC irq in the presence of intr-remapping.
+ *
+ * For both level and edge triggered, irq migration is a simple atomic
+ * update(of vector and cpu destination) of IRTE and flush the hardware cache.
+ *
+ * For level triggered, we eliminate the io-apic RTE modification (with the
+ * updated vector information), by using a virtual vector (io-apic pin number).
+ * Real vector that is used for interrupting cpu will be coming from
+ * the interrupt-remapping table entry.
+ *
+ * As the migration is a simple atomic update of IRTE, the same mechanism
+ * is used to migrate MSI irq's in the presence of interrupt-remapping.
+ */
+static int
+intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
+		      bool force)
+{
+	struct intel_ir_data *ir_data = data->chip_data;
+	struct irte *irte = &ir_data->irte_entry;
+	struct irq_cfg *cfg = irqd_cfg(data);
+	struct irq_data *parent = data->parent_data;
+	int ret;
+
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
+
+	/*
+	 * Atomically updates the IRTE with the new destination, vector
+	 * and flushes the interrupt entry cache.
+	 */
+	irte->vector = cfg->vector;
+	irte->dest_id = IRTE_DEST(cfg->dest_apicid);
+	modify_irte(&ir_data->irq_2_iommu, irte);
+
+	/*
+	 * After this point, all the interrupts will start arriving
+	 * at the new destination. So, time to cleanup the previous
+	 * vector allocation.
+	 */
+	if (cfg->move_in_progress)
+		send_cleanup_vector(cfg);
+
+	return IRQ_SET_MASK_OK_DONE;
+}
+
+static void intel_ir_compose_msi_msg(struct irq_data *irq_data,
+				     struct msi_msg *msg)
+{
+	struct intel_ir_data *ir_data = irq_data->chip_data;
+
+	*msg = ir_data->msi_entry;
+}
+
+static struct irq_chip intel_ir_chip = {
+	.irq_ack = ir_ack_apic_edge,
+	.irq_set_affinity = intel_ir_set_affinity,
+	.irq_compose_msi_msg = intel_ir_compose_msi_msg,
+};
+
+static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
+					     struct irq_cfg *irq_cfg,
+					     struct irq_alloc_info *info,
+					     int index, int sub_handle)
+{
+	struct IR_IO_APIC_route_entry *entry;
+	struct irte *irte = &data->irte_entry;
+	struct msi_msg *msg = &data->msi_entry;
+
+	prepare_irte(irte, irq_cfg->vector, irq_cfg->dest_apicid);
+	switch (info->type) {
+	case X86_IRQ_ALLOC_TYPE_IOAPIC:
+		/* Set source-id of interrupt request */
+		set_ioapic_sid(irte, info->ioapic_id);
+		apic_printk(APIC_VERBOSE, KERN_DEBUG "IOAPIC[%d]: Set IRTE entry (P:%d FPD:%d Dst_Mode:%d Redir_hint:%d Trig_Mode:%d Dlvry_Mode:%X Avail:%X Vector:%02X Dest:%08X SID:%04X SQ:%X SVT:%X)\n",
+			info->ioapic_id, irte->present, irte->fpd,
+			irte->dst_mode, irte->redir_hint,
+			irte->trigger_mode, irte->dlvry_mode,
+			irte->avail, irte->vector, irte->dest_id,
+			irte->sid, irte->sq, irte->svt);
+
+		entry = (struct IR_IO_APIC_route_entry *)info->ioapic_entry;
+		info->ioapic_entry = NULL;
+		memset(entry, 0, sizeof(*entry));
+		entry->index2	= (index >> 15) & 0x1;
+		entry->zero	= 0;
+		entry->format	= 1;
+		entry->index	= (index & 0x7fff);
+		/*
+		 * IO-APIC RTE will be configured with virtual vector.
+		 * irq handler will do the explicit EOI to the io-apic.
+		 */
+		entry->vector	= info->ioapic_pin;
+		entry->mask	= 0;			/* enable IRQ */
+		entry->trigger	= info->ioapic_trigger;
+		entry->polarity	= info->ioapic_polarity;
+		if (info->ioapic_trigger)
+			entry->mask = 1; /* Mask level triggered irqs. */
+		break;
+
+	case X86_IRQ_ALLOC_TYPE_HPET:
+	case X86_IRQ_ALLOC_TYPE_MSI:
+	case X86_IRQ_ALLOC_TYPE_MSIX:
+		if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
+			set_hpet_sid(irte, info->hpet_id);
+		else
+			set_msi_sid(irte, info->msi_dev);
+
+		msg->address_hi = MSI_ADDR_BASE_HI;
+		msg->data = sub_handle;
+		msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
+				  MSI_ADDR_IR_SHV |
+				  MSI_ADDR_IR_INDEX1(index) |
+				  MSI_ADDR_IR_INDEX2(index);
+		break;
+
+	default:
+		BUG_ON(1);
+		break;
+	}
+}
+
+static void intel_free_irq_resources(struct irq_domain *domain,
+				     unsigned int virq, unsigned int nr_irqs)
+{
+	struct irq_data *irq_data;
+	struct intel_ir_data *data;
+	struct irq_2_iommu *irq_iommu;
+	unsigned long flags;
+	int i;
+
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_domain_get_irq_data(domain, virq  + i);
+		if (irq_data && irq_data->chip_data) {
+			data = irq_data->chip_data;
+			irq_iommu = &data->irq_2_iommu;
+			raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
+			clear_entries(irq_iommu);
+			raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
+			irq_domain_reset_irq_data(irq_data);
+			kfree(data);
+		}
+	}
+}
+
+static int intel_irq_remapping_alloc(struct irq_domain *domain,
+				     unsigned int virq, unsigned int nr_irqs,
+				     void *arg)
+{
+	struct intel_iommu *iommu = domain->host_data;
+	struct irq_alloc_info *info = arg;
+	struct intel_ir_data *data;
+	struct irq_data *irq_data;
+	struct irq_cfg *irq_cfg;
+	int i, ret, index;
+
+	if (!info || !iommu)
+		return -EINVAL;
+	if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_MSI &&
+	    info->type != X86_IRQ_ALLOC_TYPE_MSIX)
+		return -EINVAL;
+
+	/*
+	 * With IRQ remapping enabled, contiguous CPU vectors are not needed
+	 * to support multiple MSI interrupts.
+	 */
+	if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+		info->flags &= ~X86_IRQ_ALLOC_CONTIGOUS_VECTORS;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret < 0)
+		return ret;
+
+	ret = -ENOMEM;
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		goto out_free_parent;
+
+	down_read(&dmar_global_lock);
+	index = alloc_irte(iommu, virq, &data->irq_2_iommu, nr_irqs);
+	up_read(&dmar_global_lock);
+	if (index < 0) {
+		pr_warn("Failed to allocate IRTE\n");
+		kfree(data);
+		goto out_free_parent;
+	}
+
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_domain_get_irq_data(domain, virq + i);
+		irq_cfg = irqd_cfg(irq_data);
+		if (!irq_data || !irq_cfg) {
+			ret = -EINVAL;
+			goto out_free_data;
+		}
+
+		if (i > 0) {
+			data = kzalloc(sizeof(*data), GFP_KERNEL);
+			if (!data)
+				goto out_free_data;
+		}
+		irq_data->hwirq = (index << 16) + i;
+		irq_data->chip_data = data;
+		irq_data->chip = &intel_ir_chip;
+		intel_irq_remapping_prepare_irte(data, irq_cfg, info, index, i);
+		irq_set_status_flags(virq + i, IRQ_MOVE_PCNTXT);
+	}
+	return 0;
+
+out_free_data:
+	intel_free_irq_resources(domain, virq, i);
+out_free_parent:
+	irq_domain_free_irqs_common(domain, virq, nr_irqs);
+	return ret;
+}
+
+static void intel_irq_remapping_free(struct irq_domain *domain,
+				     unsigned int virq, unsigned int nr_irqs)
+{
+	intel_free_irq_resources(domain, virq, nr_irqs);
+	irq_domain_free_irqs_common(domain, virq, nr_irqs);
+}
+
+static int intel_irq_remapping_activate(struct irq_domain *domain,
+					struct irq_data *irq_data)
+{
+	struct intel_ir_data *data = irq_data->chip_data;
+
+	modify_irte(&data->irq_2_iommu, &data->irte_entry);
+
+	return 0;
+}
+
+static int intel_irq_remapping_deactivate(struct irq_domain *domain,
+					  struct irq_data *irq_data)
+{
+	struct intel_ir_data *data = irq_data->chip_data;
+	struct irte entry;
+
+	memset(&entry, 0, sizeof(entry));
+	modify_irte(&data->irq_2_iommu, &entry);
+
+	return 0;
+}
+
+static struct irq_domain_ops intel_ir_domain_ops = {
+	.alloc = intel_irq_remapping_alloc,
+	.free = intel_irq_remapping_free,
+	.activate = intel_irq_remapping_activate,
+	.deactivate = intel_irq_remapping_deactivate,
 };
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index a65208a8fe18..ecaf3a937845 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -286,6 +286,8 @@ struct q_inval {
 
 #define INTR_REMAP_TABLE_ENTRIES	65536
 
+struct irq_domain;
+
 struct ir_table {
 	struct irte *base;
 	unsigned long *bitmap;
@@ -335,6 +337,8 @@ struct intel_iommu {
 
 #ifdef CONFIG_IRQ_REMAP
 	struct ir_table *ir_table;	/* Interrupt remapping info */
+	struct irq_domain *ir_domain;
+	struct irq_domain *ir_msi_domain;
 #endif
 	struct device	*iommu_dev; /* IOMMU-sysfs device */
 	int		node;
-- 
1.7.10.4



* [Patch Part2 v4 16/31] iommu/amd: Enhance AMD IR driver to suppport hierarchy irqdomain
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (14 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 15/31] iommu/vt-d: Enhance Intel IR driver to suppport " Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 17/31] x86, hpet: Enhance HPET IRQ to support " Jiang Liu
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Joerg Roedel, Matthias Brugger
  Cc: Jiang Liu, Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, iommu

Enhance the AMD interrupt remapping driver to support hierarchy irqdomain.
This will eventually simplify the code.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
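Illustration only, not part of the patch: amd_ir_set_affinity() in this
patch delegates to the parent irq_chip first, returns early on error or
when the parent reports IRQ_SET_MASK_OK_DONE, and only then rewrites its
own IRTE. The user-space C sketch below models that control flow under
simplifying assumptions. struct model_irq, the SET_MASK_* values and the
vector selection rule are hypothetical and do not mirror the kernel
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative return codes; the numeric values are assumptions. */
enum { SET_MASK_OK = 0, SET_MASK_OK_DONE = 3 };

struct model_irq;
typedef int (*set_affinity_fn)(struct model_irq *irq, unsigned int cpu);

/* One level of a (hypothetical) two-level interrupt hierarchy. */
struct model_irq {
	struct model_irq *parent;	/* next level up, NULL for the root */
	set_affinity_fn set_affinity;
	unsigned int vector;		/* root: vector currently assigned */
	unsigned int dest;		/* root: destination CPU */
	unsigned int irte_vector;	/* remap level: cached table entry */
	unsigned int irte_dest;
	int irte_writes;		/* counts simulated table updates */
};

/* Root level: pretend to allocate a vector on the target CPU. */
static int vector_set_affinity(struct model_irq *irq, unsigned int cpu)
{
	if (cpu >= 64)
		return -1;		/* no such CPU in this model */
	irq->dest = cpu;
	irq->vector = 0x20 + cpu;	/* arbitrary rule for the sketch */
	return SET_MASK_OK;
}

/* Remapping level: ask the parent first, then update the own entry. */
static int remap_set_affinity(struct model_irq *irq, unsigned int cpu)
{
	struct model_irq *parent = irq->parent;
	int ret;

	assert(parent != NULL && parent->set_affinity != NULL);

	ret = parent->set_affinity(parent, cpu);
	if (ret < 0 || ret == SET_MASK_OK_DONE)
		return ret;

	/* Parent picked a new vector/destination: mirror it in the entry. */
	irq->irte_vector = parent->vector;
	irq->irte_dest = parent->dest;
	irq->irte_writes++;

	/* Tell the levels below that nothing else has to be rewritten. */
	return SET_MASK_OK_DONE;
}
```

In this model a device-level chip stacked on top of remap_set_affinity()
would see SET_MASK_OK_DONE and skip rewriting its own message, which is
the reason the remapping level returns that value instead of SET_MASK_OK.
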
 drivers/iommu/amd_iommu.c       |  333 ++++++++++++++++++++++++++++++++++++++-
 drivers/iommu/amd_iommu_init.c  |    4 +
 drivers/iommu/amd_iommu_proto.h |    9 ++
 drivers/iommu/amd_iommu_types.h |    5 +
 4 files changed, 345 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 6fda7cc789eb..2d03e294e40f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -33,6 +33,7 @@
 #include <linux/export.h>
 #include <linux/irq.h>
 #include <linux/msi.h>
+#include <linux/irqdomain.h>
 #include <asm/irq_remapping.h>
 #include <asm/io_apic.h>
 #include <asm/apic.h>
@@ -3854,6 +3855,16 @@ union irte {
 	} fields;
 };
 
+struct amd_ir_data {
+	struct irq_2_irte			irq_2_irte;
+	union irte				irte_entry;
+	union {
+		struct msi_msg			msi_entry;
+	};
+};
+
+static struct irq_chip amd_ir_chip;
+
 #define DTE_IRQ_PHYS_ADDR_MASK	(((1ULL << 45)-1) << 6)
 #define DTE_IRQ_REMAP_INTCTL    (2ULL << 60)
 #define DTE_IRQ_TABLE_LEN       (8ULL << 1)
@@ -3947,7 +3958,8 @@ out_unlock:
 	return table;
 }
 
-static int alloc_irq_index(struct irq_cfg *cfg, u16 devid, int count)
+static int alloc_irq_index(struct irq_cfg *cfg, struct irq_2_irte *irte_info,
+			   u16 devid, int count)
 {
 	struct irq_remap_table *table;
 	unsigned long flags;
@@ -3969,15 +3981,12 @@ static int alloc_irq_index(struct irq_cfg *cfg, u16 devid, int count)
 			c = 0;
 
 		if (c == count)	{
-			struct irq_2_irte *irte_info;
-
 			for (; c != 0; --c)
 				table->table[index - c + 1] = IRTE_ALLOCATED;
 
 			index -= count - 1;
 
 			cfg->remapped	      = 1;
-			irte_info             = &cfg->irq_2_irte;
 			irte_info->devid      = devid;
 			irte_info->index      = index;
 
@@ -4222,7 +4231,7 @@ static int msi_alloc_irq(struct pci_dev *pdev, int irq, int nvec)
 		return -EINVAL;
 
 	devid = get_device_id(&pdev->dev);
-	index = alloc_irq_index(cfg, devid, nvec);
+	index = alloc_irq_index(cfg, &cfg->irq_2_irte, devid, nvec);
 
 	return index < 0 ? MAX_IRQS_PER_TABLE : index;
 }
@@ -4269,7 +4278,7 @@ static int alloc_hpet_msi(unsigned int irq, unsigned int id)
 	if (devid < 0)
 		return devid;
 
-	index = alloc_irq_index(cfg, devid, 1);
+	index = alloc_irq_index(cfg, &cfg->irq_2_irte, devid, 1);
 	if (index < 0)
 		return index;
 
@@ -4280,6 +4289,72 @@ static int alloc_hpet_msi(unsigned int irq, unsigned int id)
 	return 0;
 }
 
+static int get_devid(struct irq_alloc_info *info)
+{
+	int devid = -1;
+
+	switch (info->type) {
+	case X86_IRQ_ALLOC_TYPE_IOAPIC:
+		devid     = get_ioapic_devid(info->ioapic_id);
+		break;
+	case X86_IRQ_ALLOC_TYPE_HPET:
+		devid     = get_hpet_devid(info->hpet_id);
+		break;
+	case X86_IRQ_ALLOC_TYPE_MSI:
+	case X86_IRQ_ALLOC_TYPE_MSIX:
+		devid = get_device_id(&info->msi_dev->dev);
+		break;
+	default:
+		BUG_ON(1);
+		break;
+	}
+
+	return devid;
+}
+
+static struct irq_domain *get_ir_irq_domain(struct irq_alloc_info *info)
+{
+	int devid;
+	struct amd_iommu *iommu;
+
+	if (!info)
+		return NULL;
+
+	devid = get_devid(info);
+	if (devid >= 0) {
+		iommu = amd_iommu_rlookup_table[devid];
+		if (iommu)
+			return iommu->ir_domain;
+	}
+
+	return NULL;
+}
+
+static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
+{
+	int devid;
+	struct amd_iommu *iommu;
+
+	if (!info)
+		return NULL;
+
+	switch (info->type) {
+	case X86_IRQ_ALLOC_TYPE_MSI:
+	case X86_IRQ_ALLOC_TYPE_MSIX:
+		devid = get_device_id(&info->msi_dev->dev);
+		if (devid >= 0) {
+			iommu = amd_iommu_rlookup_table[devid];
+			if (iommu)
+				return iommu->msi_domain;
+		}
+		break;
+	default:
+		break;
+	}
+
+	return NULL;
+}
+
 struct irq_remap_ops amd_iommu_irq_ops = {
 	.supported		= amd_iommu_supported,
 	.prepare		= amd_iommu_prepare,
@@ -4294,5 +4369,251 @@ struct irq_remap_ops amd_iommu_irq_ops = {
 	.msi_alloc_irq		= msi_alloc_irq,
 	.msi_setup_irq		= msi_setup_irq,
 	.alloc_hpet_msi		= alloc_hpet_msi,
+	.get_ir_irq_domain	= get_ir_irq_domain,
+	.get_irq_domain		= get_irq_domain,
+};
+
+static void irq_remapping_prepare_irte(struct amd_ir_data *data,
+				       struct irq_cfg *irq_cfg,
+				       struct irq_alloc_info *info,
+				       int devid, int index, int sub_handle)
+{
+	union irte *irte = &data->irte_entry;
+	struct irq_2_irte *irte_info = &data->irq_2_irte;
+	struct msi_msg *msg = &data->msi_entry;
+	struct IO_APIC_route_entry *entry;
+
+	irq_cfg->remapped = 1;
+	data->irq_2_irte.devid = devid;
+	data->irq_2_irte.index = index + sub_handle;
+
+	/* Setup IRTE for IOMMU */
+	irte->val = 0;
+	irte->fields.vector      = irq_cfg->vector;
+	irte->fields.int_type    = apic->irq_delivery_mode;
+	irte->fields.destination = irq_cfg->dest_apicid;
+	irte->fields.dm          = apic->irq_dest_mode;
+	irte->fields.valid       = 1;
+
+	switch (info->type) {
+	case X86_IRQ_ALLOC_TYPE_IOAPIC:
+		/* Setup IOAPIC entry */
+		entry = info->ioapic_entry;
+		info->ioapic_entry = NULL;
+		memset(entry, 0, sizeof(*entry));
+		entry->vector        = index;
+		entry->mask          = 0;
+		entry->trigger       = info->ioapic_trigger;
+		entry->polarity      = info->ioapic_polarity;
+		/* Mask level triggered irqs. */
+		if (info->ioapic_trigger)
+			entry->mask = 1;
+		break;
+
+	case X86_IRQ_ALLOC_TYPE_HPET:
+	case X86_IRQ_ALLOC_TYPE_MSI:
+	case X86_IRQ_ALLOC_TYPE_MSIX:
+		msg->address_hi = MSI_ADDR_BASE_HI;
+		msg->address_lo = MSI_ADDR_BASE_LO;
+		msg->data = irte_info->index;
+		break;
+
+	default:
+		BUG_ON(1);
+		break;
+	}
+}
+
+static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
+			       unsigned int nr_irqs, void *arg)
+{
+	struct irq_alloc_info *info = arg;
+	struct amd_ir_data *data;
+	struct irq_data *irq_data;
+	struct irq_cfg *cfg;
+	int i, ret, devid;
+	int index = -1;
+
+	if (!info)
+		return -EINVAL;
+	if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_MSI &&
+	    info->type != X86_IRQ_ALLOC_TYPE_MSIX)
+		return -EINVAL;
+
+	/*
+	 * With IRQ remapping enabled, contiguous CPU vectors are not needed
+	 * to support multiple MSI interrupts.
+	 */
+	if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+		info->flags &= ~X86_IRQ_ALLOC_CONTIGOUS_VECTORS;
+
+	devid = get_devid(info);
+	if (devid < 0)
+		return -EINVAL;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret < 0)
+		return ret;
+
+	ret = -ENOMEM;
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		goto out_free_parent;
+
+	if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
+		if (get_irq_table(devid, true))
+			index = info->ioapic_pin;
+		else
+			ret = -ENOMEM;
+	} else {
+		cfg = irq_cfg(virq);
+		index = alloc_irq_index(cfg, &data->irq_2_irte, devid, nr_irqs);
+	}
+	if (index < 0) {
+		pr_warn("Failed to allocate IRTE\n");
+		kfree(data);
+		goto out_free_parent;
+	}
+
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_domain_get_irq_data(domain, virq + i);
+		cfg = irqd_cfg(irq_data);
+		if (!irq_data || !cfg) {
+			ret = -EINVAL;
+			goto out_free_data;
+		}
+
+		if (i > 0) {
+			data = kzalloc(sizeof(*data), GFP_KERNEL);
+			if (!data)
+				goto out_free_data;
+		}
+		irq_data->hwirq = (devid << 16) + i;
+		irq_data->chip_data = data;
+		irq_data->chip = &amd_ir_chip;
+		irq_remapping_prepare_irte(data, cfg, info, devid, index, i);
+		irq_set_status_flags(virq + i, IRQ_MOVE_PCNTXT);
+	}
+	return 0;
+
+out_free_data:
+	for (i--; i >= 0; i--) {
+		irq_data = irq_domain_get_irq_data(domain, virq + i);
+		if (irq_data->chip_data)
+			kfree(irq_data->chip_data);
+	}
+	for (i = 0; i < nr_irqs; i++)
+		free_irte(devid, index + i);
+out_free_parent:
+	irq_domain_free_irqs_common(domain, virq, nr_irqs);
+	return ret;
+}
+
+static void irq_remapping_free(struct irq_domain *domain, unsigned int virq,
+			       unsigned int nr_irqs)
+{
+	struct irq_data *irq_data;
+	struct amd_ir_data *data;
+	struct irq_2_irte *irte_info;
+	int i;
+
+	for (i = 0; i < nr_irqs; i++) {
+		irq_data = irq_domain_get_irq_data(domain, virq  + i);
+		if (irq_data && irq_data->chip_data) {
+			data = irq_data->chip_data;
+			irte_info = &data->irq_2_irte;
+			free_irte(irte_info->devid, irte_info->index);
+			kfree(data);
+		}
+	}
+	irq_domain_free_irqs_common(domain, virq, nr_irqs);
+}
+
+static int irq_remapping_activate(struct irq_domain *domain,
+				  struct irq_data *irq_data)
+{
+	struct amd_ir_data *data = irq_data->chip_data;
+	struct irq_2_irte *irte_info = &data->irq_2_irte;
+
+	modify_irte(irte_info->devid, irte_info->index, data->irte_entry);
+
+	return 0;
+}
+
+static int irq_remapping_deactivate(struct irq_domain *domain,
+				    struct irq_data *irq_data)
+{
+	struct amd_ir_data *data = irq_data->chip_data;
+	struct irq_2_irte *irte_info = &data->irq_2_irte;
+	union irte entry;
+
+	entry.val = 0;
+	modify_irte(irte_info->devid, irte_info->index, entry);
+
+	return 0;
+}
+
+static struct irq_domain_ops amd_ir_domain_ops = {
+	.alloc = irq_remapping_alloc,
+	.free = irq_remapping_free,
+	.activate = irq_remapping_activate,
+	.deactivate = irq_remapping_deactivate,
+};
+
+static int amd_ir_set_affinity(struct irq_data *data,
+			       const struct cpumask *mask, bool force)
+{
+	struct amd_ir_data *ir_data = data->chip_data;
+	struct irq_2_irte *irte_info = &ir_data->irq_2_irte;
+	struct irq_cfg *cfg = irqd_cfg(data);
+	struct irq_data *parent = data->parent_data;
+	int ret;
+
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+		return ret;
+
+	/*
+	 * Atomically updates the IRTE with the new destination, vector
+	 * and flushes the interrupt entry cache.
+	 */
+	ir_data->irte_entry.fields.vector = cfg->vector;
+	ir_data->irte_entry.fields.destination = cfg->dest_apicid;
+	modify_irte(irte_info->devid, irte_info->index, ir_data->irte_entry);
+
+	/*
+	 * After this point, all the interrupts will start arriving
+	 * at the new destination. So, time to cleanup the previous
+	 * vector allocation.
+	 */
+	if (cfg->move_in_progress)
+		send_cleanup_vector(cfg);
+
+	return IRQ_SET_MASK_OK_DONE;
+}
+
+static void ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
+{
+	struct amd_ir_data *ir_data = irq_data->chip_data;
+
+	*msg = ir_data->msi_entry;
+}
+
+static struct irq_chip amd_ir_chip = {
+	.irq_ack = ir_ack_apic_edge,
+	.irq_set_affinity = amd_ir_set_affinity,
+	.irq_compose_msi_msg = ir_compose_msi_msg,
 };
+
+int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
+{
+	iommu->ir_domain = irq_domain_add_tree(NULL, &amd_ir_domain_ops, iommu);
+	if (!iommu->ir_domain)
+		return -ENOMEM;
+
+	iommu->ir_domain->parent = arch_get_ir_parent_domain();
+	iommu->msi_domain = arch_create_msi_irq_domain(iommu->ir_domain);
+
+	return 0;
+}
 #endif
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index b0522f15730f..de3390a7d345 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1124,6 +1124,10 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 	if (ret)
 		return ret;
 
+	ret = amd_iommu_create_irq_domain(iommu);
+	if (ret)
+		return ret;
+
 	/*
 	 * Make sure IOMMU is not considered to translate itself. The IVRS
 	 * table tells us so, but this is a lie!
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index 95ed6deae47f..612a22192fa0 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -63,6 +63,15 @@ extern u8 amd_iommu_pc_get_max_counters(u16 devid);
 extern int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
 				    u64 *value, bool is_write);
 
+#ifdef CONFIG_IRQ_REMAP
+extern int amd_iommu_create_irq_domain(struct amd_iommu *iommu);
+#else
+static inline int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
+{
+	return 0;
+}
+#endif
+
 #define PPR_SUCCESS			0x0
 #define PPR_INVALID			0x1
 #define PPR_FAILURE			0xf
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index cec51a8ba844..ef12d74a03fe 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -392,6 +392,7 @@ struct amd_iommu_fault {
 
 
 struct iommu_domain;
+struct irq_domain;
 
 /*
  * This structure contains generic data for  IOMMU protection domains
@@ -574,6 +575,10 @@ struct amd_iommu {
 	/* The maximum PC banks and counters/bank (PCSup=1) */
 	u8 max_banks;
 	u8 max_counters;
+#ifdef CONFIG_IRQ_REMAP
+	struct irq_domain *ir_domain;
+	struct irq_domain *msi_domain;
+#endif
 };
 
 struct devid_map {
-- 
1.7.10.4



* [Patch Part2 v4 17/31] x86, hpet: Enhance HPET IRQ to support hierarchy irqdomain
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (15 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 16/31] iommu/amd: Enhance AMD " Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 18/31] PCI/MSI, trivial: Fix minor syntax issues according to coding styles Jiang Liu
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu, Srivatsa S. Bhat,
	Andy Lutomirski
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Enhance the HPET code to support hierarchy irqdomain, which makes the
interrupt architecture clearer.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
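Illustration only, not part of the patch: msi_update_msg() in this patch
refreshes just the vector and destination ID fields of an MSI message
that was already programmed and leaves the remaining bits alone. The
user-space C sketch below shows the same mask-and-replace step. The
MODEL_* constants and struct model_msi_msg are assumptions made for the
sketch (an 8-bit vector in the data word, an 8-bit destination ID at
bits 12-19 of the low address word); they are not copied from the kernel
headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout for the sketch, see the note above. */
#define MODEL_DATA_VECTOR_MASK		0x000000ffu
#define MODEL_ADDR_DEST_ID_SHIFT	12
#define MODEL_ADDR_DEST_ID_MASK		0x000ff000u

/* Hypothetical stand-in for struct msi_msg. */
struct model_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Replace only the vector and destination ID, keep all other bits. */
static void model_msi_update_msg(struct model_msi_msg *msg,
				 uint32_t vector, uint32_t dest_apicid)
{
	msg->data &= ~MODEL_DATA_VECTOR_MASK;
	msg->data |= vector & MODEL_DATA_VECTOR_MASK;
	msg->address_lo &= ~MODEL_ADDR_DEST_ID_MASK;
	msg->address_lo |= (dest_apicid << MODEL_ADDR_DEST_ID_SHIFT) &
			   MODEL_ADDR_DEST_ID_MASK;
}
```

Because the helper only touches two fields, the read-modify-write
sequence in hpet_msi_set_affinity() (hpet_msi_read(), update,
hpet_msi_write()) preserves whatever delivery and trigger mode bits were
composed when the interrupt was activated.
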
 arch/x86/include/asm/hpet.h |    7 +-
 arch/x86/kernel/apic/msi.c  |  172 ++++++++++++++++++++++++++++++++++++++-----
 arch/x86/kernel/hpet.c      |   57 ++++----------
 3 files changed, 175 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 36f7125945e3..e87e9faf87a9 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -74,11 +74,16 @@ extern unsigned int hpet_readl(unsigned int a);
 extern void force_hpet_resume(void);
 
 struct irq_data;
+struct hpet_dev;
+struct irq_domain;
+
 extern void hpet_msi_unmask(struct irq_data *data);
 extern void hpet_msi_mask(struct irq_data *data);
-struct hpet_dev;
 extern void hpet_msi_write(struct hpet_dev *hdev, struct msi_msg *msg);
 extern void hpet_msi_read(struct hpet_dev *hdev, struct msi_msg *msg);
+extern struct irq_domain *hpet_create_irq_domain(int hpet_id);
+extern int hpet_assign_irq(struct irq_domain *domain,
+			   struct hpet_dev *dev, int dev_num);
 
 #ifdef CONFIG_PCI_MSI
 extern int default_setup_hpet_msi(unsigned int irq, unsigned int id);
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 4bb2b583be7f..f2f8c999bdcc 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -51,6 +51,44 @@ void native_compose_msi_msg(struct pci_dev *pdev,
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
+static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	struct irq_cfg *cfg = irqd_cfg(data);
+
+	msg->address_hi = MSI_ADDR_BASE_HI;
+
+	if (x2apic_enabled())
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
+
+	msg->address_lo =
+		MSI_ADDR_BASE_LO |
+		((apic->irq_dest_mode == 0) ?
+			MSI_ADDR_DEST_MODE_PHYSICAL :
+			MSI_ADDR_DEST_MODE_LOGICAL) |
+		((apic->irq_delivery_mode != dest_LowestPrio) ?
+			MSI_ADDR_REDIRECTION_CPU :
+			MSI_ADDR_REDIRECTION_LOWPRI) |
+		MSI_ADDR_DEST_ID(cfg->dest_apicid);
+
+	msg->data =
+		MSI_DATA_TRIGGER_EDGE |
+		MSI_DATA_LEVEL_ASSERT |
+		((apic->irq_delivery_mode != dest_LowestPrio) ?
+			MSI_DATA_DELIVERY_FIXED :
+			MSI_DATA_DELIVERY_LOWPRI) |
+		MSI_DATA_VECTOR(cfg->vector);
+}
+
+static void msi_update_msg(struct msi_msg *msg, struct irq_data *irq_data)
+{
+	struct irq_cfg *cfg = irqd_cfg(irq_data);
+
+	msg->data &= ~MSI_DATA_VECTOR_MASK;
+	msg->data |= MSI_DATA_VECTOR(cfg->vector);
+	msg->address_lo &= ~MSI_ADDR_DEST_ID_MASK;
+	msg->address_lo |= MSI_ADDR_DEST_ID(cfg->dest_apicid);
+}
+
 static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq,
 			   struct msi_msg *msg, u8 hpet_id)
 {
@@ -239,38 +277,37 @@ void dmar_free_hwirq(int irq)
  * MSI message composition
  */
 #ifdef CONFIG_HPET_TIMER
+static inline int hpet_dev_id(struct irq_domain *domain)
+{
+	return (int)(long)domain->host_data;
+}
 
 static int hpet_msi_set_affinity(struct irq_data *data,
 				 const struct cpumask *mask, bool force)
 {
-	struct irq_cfg *cfg = irqd_cfg(data);
+	struct irq_data *parent = data->parent_data;
 	struct msi_msg msg;
-	unsigned int dest;
 	int ret;
 
-	ret = apic_set_affinity(data, mask, &dest);
-	if (ret)
-		return ret;
-
-	hpet_msi_read(data->handler_data, &msg);
-
-	msg.data &= ~MSI_DATA_VECTOR_MASK;
-	msg.data |= MSI_DATA_VECTOR(cfg->vector);
-	msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
-	msg.address_lo |= MSI_ADDR_DEST_ID(dest);
-
-	hpet_msi_write(data->handler_data, &msg);
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
+		hpet_msi_read(data->handler_data, &msg);
+		msi_update_msg(&msg, data);
+		hpet_msi_write(data->handler_data, &msg);
+	}
 
-	return IRQ_SET_MASK_OK_NOCOPY;
+	return ret;
 }
 
 static struct irq_chip hpet_msi_type = {
 	.name = "HPET_MSI",
 	.irq_unmask = hpet_msi_unmask,
 	.irq_mask = hpet_msi_mask,
-	.irq_ack = apic_ack_edge,
+	.irq_ack = irq_chip_ack_parent,
 	.irq_set_affinity = hpet_msi_set_affinity,
-	.irq_retrigger = apic_retrigger_irq,
+	.irq_retrigger = irq_chip_retrigger_hierarchy,
+	.irq_print_chip = irq_remapping_print_chip,
+	.irq_compose_msi_msg = irq_msi_compose_msg,
  	.flags = IRQCHIP_SKIP_SET_WAKE,
 };
 
@@ -291,4 +328,105 @@ int default_setup_hpet_msi(unsigned int irq, unsigned int id)
 	irq_set_chip_and_handler_name(irq, chip, handle_edge_irq, "edge");
 	return 0;
 }
+
+static int hpet_domain_alloc(struct irq_domain *domain, unsigned int virq,
+			     unsigned int nr_irqs, void *arg)
+{
+	struct irq_alloc_info *info = arg;
+	int ret;
+
+	if (nr_irqs > 1 || !info || info->type != X86_IRQ_ALLOC_TYPE_HPET)
+		return -EINVAL;
+	if (irq_find_mapping(domain, info->hpet_index)) {
+		pr_warn("IRQ for HPET%d already exists.\n", info->hpet_index);
+		return -EEXIST;
+	}
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret >= 0) {
+		irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
+		irq_domain_set_hwirq_and_chip(domain, virq, info->hpet_index,
+					      &hpet_msi_type, NULL);
+		irq_set_handler_data(virq, info->hpet_data);
+		__irq_set_handler(virq, handle_edge_irq, 0, "edge");
+	}
+
+	return ret;
+}
+
+static void hpet_domain_free(struct irq_domain *domain, unsigned int virq,
+			     unsigned int nr_irqs)
+{
+	BUG_ON(nr_irqs > 1);
+	irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
+	irq_domain_free_irqs_top(domain, virq, nr_irqs);
+}
+
+static int hpet_domain_activate(struct irq_domain *domain,
+				struct irq_data *irq_data)
+{
+	int ret;
+	struct msi_msg msg;
+
+	ret = irq_chip_compose_msi_msg(irq_data, &msg);
+	if (ret == 0)
+		hpet_msi_write(irq_get_handler_data(irq_data->irq), &msg);
+
+	return ret;
+}
+
+static int hpet_domain_deactivate(struct irq_domain *domain,
+				  struct irq_data *irq_data)
+{
+	struct msi_msg msg;
+
+	memset(&msg, 0, sizeof(msg));
+	hpet_msi_write(irq_get_handler_data(irq_data->irq), &msg);
+
+	return 0;
+}
+
+static struct irq_domain_ops hpet_domain_ops = {
+	.alloc = hpet_domain_alloc,
+	.free = hpet_domain_free,
+	.activate = hpet_domain_activate,
+	.deactivate = hpet_domain_deactivate,
+};
+
+struct irq_domain *hpet_create_irq_domain(int hpet_id)
+{
+	struct irq_domain *domain;
+	struct irq_alloc_info info;
+
+	if (x86_vector_domain == NULL)
+		return NULL;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_HPET;
+	info.hpet_id = hpet_id;
+
+	domain = irq_domain_add_tree(NULL, &hpet_domain_ops,
+				     (void *)(long)hpet_id);
+	if (domain) {
+		domain->parent = irq_remapping_get_ir_irq_domain(&info);
+		if (!domain->parent)
+			domain->parent = x86_vector_domain;
+	}
+
+	return domain;
+}
+
+int hpet_assign_irq(struct irq_domain *domain, struct hpet_dev *dev,
+		    int dev_num)
+{
+	struct irq_alloc_info info;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_HPET;
+	info.hpet_data = dev;
+	info.hpet_id = hpet_dev_id(domain);
+	info.hpet_index = dev_num;
+
+	return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
+}
 #endif
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 24db2d33fab7..a22d7288202b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -306,8 +306,6 @@ static void hpet_legacy_clockevent_register(void)
 	printk(KERN_DEBUG "hpet clockevent registered\n");
 }
 
-static int hpet_setup_msi_irq(unsigned int irq);
-
 static void hpet_set_mode(enum clock_event_mode mode,
 			  struct clock_event_device *evt, int timer)
 {
@@ -358,7 +356,7 @@ static void hpet_set_mode(enum clock_event_mode mode,
 			hpet_enable_legacy_int();
 		} else {
 			struct hpet_dev *hdev = EVT_TO_HPET_DEV(evt);
-			hpet_setup_msi_irq(hdev->irq);
+			irq_domain_activate_irq(irq_get_irq_data(hdev->irq));
 			disable_irq(hdev->irq);
 			irq_set_affinity(hdev->irq, cpumask_of(hdev->cpu));
 			enable_irq(hdev->irq);
@@ -424,6 +422,7 @@ static int hpet_legacy_next_event(unsigned long delta,
 
 static DEFINE_PER_CPU(struct hpet_dev *, cpu_hpet_dev);
 static struct hpet_dev	*hpet_devs;
+static struct irq_domain *hpet_domain;
 
 void hpet_msi_unmask(struct irq_data *data)
 {
@@ -474,32 +473,6 @@ static int hpet_msi_next_event(unsigned long delta,
 	return hpet_next_event(delta, evt, hdev->num);
 }
 
-static int hpet_setup_msi_irq(unsigned int irq)
-{
-	if (x86_msi.setup_hpet_msi(irq, hpet_blockid)) {
-		irq_domain_free_irqs(irq, 1);
-		return -EINVAL;
-	}
-	return 0;
-}
-
-static int hpet_assign_irq(struct hpet_dev *dev)
-{
-	int irq;
-
-	irq = irq_domain_alloc_irqs(NULL, 1, NUMA_NO_NODE, NULL);
-	if (irq <= 0)
-		return -EINVAL;
-
-	irq_set_handler_data(irq, dev);
-
-	if (hpet_setup_msi_irq(irq))
-		return -EINVAL;
-
-	dev->irq = irq;
-	return 0;
-}
-
 static irqreturn_t hpet_interrupt_handler(int irq, void *data)
 {
 	struct hpet_dev *dev = (struct hpet_dev *)data;
@@ -542,9 +515,6 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
 	if (!(hdev->flags & HPET_DEV_VALID))
 		return;
 
-	if (hpet_setup_msi_irq(hdev->irq))
-		return;
-
 	hdev->cpu = cpu;
 	per_cpu(cpu_hpet_dev, cpu) = hdev;
 	evt->name = hdev->name;
@@ -576,7 +546,7 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
 	unsigned int id;
 	unsigned int num_timers;
 	unsigned int num_timers_used = 0;
-	int i;
+	int i, irq;
 
 	if (hpet_msi_disable)
 		return;
@@ -589,6 +559,10 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
 	num_timers++; /* Value read out starts from 0 */
 	hpet_print_config();
 
+	hpet_domain = hpet_create_irq_domain(hpet_blockid);
+	if (!hpet_domain)
+		return;
+
 	hpet_devs = kzalloc(sizeof(struct hpet_dev) * num_timers, GFP_KERNEL);
 	if (!hpet_devs)
 		return;
@@ -603,15 +577,16 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
 		if (!(cfg & HPET_TN_FSB_CAP))
 			continue;
 
+		irq = hpet_assign_irq(hpet_domain, hdev, i);
+		if (irq < 0)
+			continue;
+
+		sprintf(hdev->name, "hpet%d", i);
+		hdev->num = i;
+		hdev->irq = irq;
 		hdev->flags = 0;
 		if (cfg & HPET_TN_PERIODIC_CAP)
 			hdev->flags |= HPET_DEV_PERI_CAP;
-		hdev->num = i;
-
-		sprintf(hdev->name, "hpet%d", i);
-		if (hpet_assign_irq(hdev))
-			continue;
-
 		hdev->flags |= HPET_DEV_FSB_CAP;
 		hdev->flags |= HPET_DEV_VALID;
 		num_timers_used++;
@@ -711,10 +686,6 @@ static int hpet_cpuhp_notify(struct notifier_block *n,
 }
 #else
 
-static int hpet_setup_msi_irq(unsigned int irq)
-{
-	return 0;
-}
 static void hpet_msi_capability_lookup(unsigned int start_timer)
 {
 	return;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 18/31] PCI/MSI, trivial: Fix minor syntax issues according to coding styles
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (16 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 17/31] x86, hpet: Enhance HPET IRQ to support " Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-05 22:10   ` Bjorn Helgaas
  2014-11-05 22:10   ` Bjorn Helgaas
  2014-11-04 12:01 ` [Patch Part2 v4 19/31] PCI/MSI: Simplify PCI MSI code by initializing msi_desc.nvec_used earlier Jiang Liu
                   ` (14 subsequent siblings)
  32 siblings, 2 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Jiri Kosina
  Cc: Jiang Liu, Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Joerg Roedel, Greg Kroah-Hartman, x86, linux-kernel, linux-pci,
	linux-acpi, linux-arm-kernel

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/pci/msi.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 9fab30af0e75..fb2ccb536324 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -244,9 +244,8 @@ void default_restore_msi_irqs(struct pci_dev *dev)
 {
 	struct msi_desc *entry;
 
-	list_for_each_entry(entry, &dev->msi_list, list) {
+	list_for_each_entry(entry, &dev->msi_list, list)
 		default_restore_msi_irq(dev, entry->irq);
-	}
 }
 
 void __read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
@@ -451,9 +450,8 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
 				PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
 
 	arch_restore_msi_irqs(dev);
-	list_for_each_entry(entry, &dev->msi_list, list) {
+	list_for_each_entry(entry, &dev->msi_list, list)
 		msix_mask_irq(entry, entry->masked);
-	}
 
 	msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 }
@@ -497,9 +495,8 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
 	int count = 0;
 
 	/* Determine how many msi entries we have */
-	list_for_each_entry(entry, &pdev->msi_list, list) {
+	list_for_each_entry(entry, &pdev->msi_list, list)
 		++num_msi;
-	}
 	if (!num_msi)
 		return 0;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 19/31] PCI/MSI: Simplify PCI MSI code by initializing msi_desc.nvec_used earlier
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (17 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 18/31] PCI/MSI, trivial: Fix minor syntax issues according to coding styles Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-05 22:35   ` Bjorn Helgaas
  2014-11-04 12:01 ` [Patch Part2 v4 20/31] PCI/MSI: Kill redundant calling for irq_set_msi_desc() for MSIx interrupts Jiang Liu
                   ` (13 subsequent siblings)
  32 siblings, 1 reply; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Joerg Roedel, Matthias Brugger
  Cc: Jiang Liu, Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, iommu

Simplify PCI MSI code by initializing msi_desc.nvec_used and
msi_desc.msi_attrib.multiple when creating MSI descriptors.

Also remove redundant checks in the IRQ remapping drivers, since the
PCI MSI core already guarantees these conditions.
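
As a rough userspace illustration of the two values msi_setup_entry()
stores after this change (a sketch only: roundup_pow_of_two_u32() and
ilog2_u32() are hypothetical stand-ins for the kernel's
__roundup_pow_of_two() and ilog2(), and struct msi_fields is not a
kernel type):

```c
#include <stdint.h>

/* Hypothetical stand-in for __roundup_pow_of_two(); n must be >= 1. */
static uint32_t roundup_pow_of_two_u32(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Hypothetical stand-in for ilog2(); n must be >= 1. */
static unsigned int ilog2_u32(uint32_t n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Illustrative container only; mirrors the two fields set at creation. */
struct msi_fields {
	unsigned int multiple;	/* log2 of the power-of-two vector count */
	unsigned int nvec_used;	/* vectors the driver actually requested */
};

static struct msi_fields msi_fields_init(uint32_t nvec)
{
	struct msi_fields f;

	f.multiple = ilog2_u32(roundup_pow_of_two_u32(nvec));
	f.nvec_used = nvec;
	return f;
}
```

For nvec = 3 this yields multiple = 2 and nvec_used = 3, so the
teardown paths can iterate over nvec_used alone.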

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/iommu/irq_remapping.c |    8 --------
 drivers/pci/msi.c             |   40 +++++++++++++++-------------------------
 2 files changed, 15 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 176ff4372b7d..32fe5b1322d0 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -69,19 +69,13 @@ static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
 	unsigned int irq;
 	struct msi_desc *msidesc;
 
-	WARN_ON(!list_is_singular(&dev->msi_list));
 	msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
-	WARN_ON(msidesc->irq);
-	WARN_ON(msidesc->msi_attrib.multiple);
-	WARN_ON(msidesc->nvec_used);
 
 	irq = irq_alloc_hwirqs(nvec, dev_to_node(&dev->dev));
 	if (irq == 0)
 		return -ENOSPC;
 
 	nvec_pow2 = __roundup_pow_of_two(nvec);
-	msidesc->nvec_used = nvec;
-	msidesc->msi_attrib.multiple = ilog2(nvec_pow2);
 	for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
 		if (!sub_handle) {
 			index = msi_alloc_remapped_irq(dev, irq, nvec_pow2);
@@ -109,8 +103,6 @@ error:
 	 * IRQs from tearing down again in default_teardown_msi_irqs()
 	 */
 	msidesc->irq = 0;
-	msidesc->nvec_used = 0;
-	msidesc->msi_attrib.multiple = 0;
 
 	return ret;
 }
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index fb2ccb536324..afe974600c7d 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -85,19 +85,13 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
  */
 void default_teardown_msi_irqs(struct pci_dev *dev)
 {
+	int i;
 	struct msi_desc *entry;
 
-	list_for_each_entry(entry, &dev->msi_list, list) {
-		int i, nvec;
-		if (entry->irq == 0)
-			continue;
-		if (entry->nvec_used)
-			nvec = entry->nvec_used;
-		else
-			nvec = 1 << entry->msi_attrib.multiple;
-		for (i = 0; i < nvec; i++)
-			arch_teardown_msi_irq(entry->irq + i);
-	}
+	list_for_each_entry(entry, &dev->msi_list, list)
+		if (entry->irq)
+			for (i = 0; i < entry->nvec_used; i++)
+				arch_teardown_msi_irq(entry->irq + i);
 }
 
 void __weak arch_teardown_msi_irqs(struct pci_dev *dev)
@@ -353,19 +347,12 @@ static void free_msi_irqs(struct pci_dev *dev)
 	struct msi_desc *entry, *tmp;
 	struct attribute **msi_attrs;
 	struct device_attribute *dev_attr;
-	int count = 0;
+	int i, count = 0;
 
-	list_for_each_entry(entry, &dev->msi_list, list) {
-		int i, nvec;
-		if (!entry->irq)
-			continue;
-		if (entry->nvec_used)
-			nvec = entry->nvec_used;
-		else
-			nvec = 1 << entry->msi_attrib.multiple;
-		for (i = 0; i < nvec; i++)
-			BUG_ON(irq_has_action(entry->irq + i));
-	}
+	list_for_each_entry(entry, &dev->msi_list, list)
+		if (entry->irq)
+			for (i = 0; i < entry->nvec_used; i++)
+				BUG_ON(irq_has_action(entry->irq + i));
 
 	arch_teardown_msi_irqs(dev);
 
@@ -556,7 +543,7 @@ error_attrs:
 	return ret;
 }
 
-static struct msi_desc *msi_setup_entry(struct pci_dev *dev)
+static struct msi_desc *msi_setup_entry(struct pci_dev *dev, int nvec)
 {
 	u16 control;
 	struct msi_desc *entry;
@@ -574,6 +561,8 @@ static struct msi_desc *msi_setup_entry(struct pci_dev *dev)
 	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
 	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
 	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
+	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
+	entry->nvec_used		= nvec;
 
 	if (control & PCI_MSI_FLAGS_64BIT)
 		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_64;
@@ -606,7 +595,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
 
 	msi_set_enable(dev, 0);	/* Disable MSI during set up */
 
-	entry = msi_setup_entry(dev);
+	entry = msi_setup_entry(dev, nvec);
 	if (!entry)
 		return -ENOMEM;
 
@@ -677,6 +666,7 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
 		entry->msi_attrib.entry_nr	= entries[i].entry;
 		entry->msi_attrib.default_irq	= dev->irq;
 		entry->mask_base		= base;
+		entry->nvec_used		= 1;
 
 		list_add_tail(&entry->list, &dev->msi_list);
 	}
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 20/31] PCI/MSI: Kill redundant calling for irq_set_msi_desc() for MSIx interrupts
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (18 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 19/31] PCI/MSI: Simplify PCI MSI code by initializing msi_desc.nvec_used earlier Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-05 22:45   ` Bjorn Helgaas
  2014-11-04 12:01 ` [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain Jiang Liu
                   ` (12 subsequent siblings)
  32 siblings, 1 reply; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger
  Cc: Jiang Liu, Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Joerg Roedel, Greg Kroah-Hartman, x86, linux-kernel, linux-pci,
	linux-acpi, linux-arm-kernel

It is the responsibility of arch_setup_msi_irq()/arch_setup_msi_irqs()
to call irq_set_msi_desc() to associate IRQ descriptors with MSI
descriptors, so remove the redundant call to irq_set_msi_desc() for
MSI-X interrupts in the PCI MSI core.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/pci/msi.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index afe974600c7d..da181c59394b 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -685,7 +685,6 @@ static void msix_program_entries(struct pci_dev *dev,
 						PCI_MSIX_ENTRY_VECTOR_CTRL;
 
 		entries[i].vector = entry->irq;
-		irq_set_msi_desc(entry->irq, entry);
 		entry->masked = readl(entry->mask_base + offset);
 		msix_mask_irq(entry, 1);
 		i++;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (19 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 20/31] PCI/MSI: Kill redundant calling for irq_set_msi_desc() for MSIx interrupts Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-05 23:09   ` Bjorn Helgaas
  2014-11-06 10:01   ` Thomas Gleixner
  2014-11-04 12:01 ` [Patch Part2 v4 22/31] x86, PCI, MSI: Use hierarchy irqdomain to manage MSI interrupts Jiang Liu
                   ` (11 subsequent siblings)
  32 siblings, 2 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Yijing Wang, Jiang Liu,
	Alexander Gordeev
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Enhance the PCI MSI core to support hierarchy irqdomain so that the
common code can be shared among architectures.
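
The hwirq used as the key in the MSI irqdomain packs the MSI-X entry
number, the PCI device ID and the PCI domain number into one value. A
minimal userspace sketch of that layout follows; pci_devid() and
msi_hwirq_sketch() are illustrative names, and the use of 64-bit
arithmetic is an assumption made here to keep the shifts well defined:

```c
#include <stdint.h>

/* Same layout as PCI_DEVID(): bus in bits 15..8, devfn in bits 7..0. */
static uint16_t pci_devid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

/*
 * Pack the fields the way msi_get_hwirq() does: entry_nr in the low
 * 11 bits, the device ID shifted by 11, the PCI domain shifted by 27.
 */
static uint64_t msi_hwirq_sketch(uint32_t domain, uint8_t bus, uint8_t devfn,
				 uint16_t entry_nr)
{
	return (uint64_t)entry_nr |
	       ((uint64_t)pci_devid(bus, devfn) << 11) |
	       ((uint64_t)domain << 27);
}
```

Eleven bits are enough for entry_nr because an MSI-X table holds at
most 2048 entries.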

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/pci/Kconfig |    4 ++
 drivers/pci/msi.c   |  126 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/msi.h |   11 +++++
 3 files changed, 141 insertions(+)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index b9db0f2ce11f..022e89745f86 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -16,6 +16,10 @@ config PCI_MSI
 
 	   If you don't know what to do here, say Y.
 
+config PCI_MSI_IRQ_DOMAIN
+	bool
+	depends on PCI_MSI && IRQ_DOMAIN_HIERARCHY
+
 config PCI_DEBUG
 	bool "PCI Debugging"
 	depends on PCI && DEBUG_KERNEL
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index da181c59394b..7423ee16972f 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -19,6 +19,7 @@
 #include <linux/errno.h>
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <linux/irqdomain.h>
 
 #include "pci.h"
 
@@ -1098,3 +1099,128 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
 	return nvec;
 }
 EXPORT_SYMBOL(pci_enable_msix_range);
+
+#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
+static inline irq_hw_number_t
+msi_get_hwirq(struct pci_dev *pdev, struct msi_desc *msidesc)
+{
+	return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
+		PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
+		(pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
+}
+
+static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
+			    unsigned int nr_irqs, void *arg)
+{
+	int i, ret;
+	irq_hw_number_t hwirq = arch_msi_irq_domain_get_hwirq(arg);
+
+	if (irq_find_mapping(domain, hwirq) > 0)
+		return -EEXIST;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret >= 0)
+		for (i = 0; i < nr_irqs; i++) {
+			irq_domain_set_hwirq_and_chip(domain, virq + i,
+					hwirq + i, &msi_chip, (void *)(long)i);
+			__irq_set_handler(virq + i, handle_edge_irq, 0, "edge");
+		}
+
+	return ret;
+}
+
+static void msi_domain_free(struct irq_domain *domain, unsigned int virq,
+			    unsigned int nr_irqs)
+{
+	int i;
+
+	for (i = 0; i < nr_irqs; i++) {
+		struct msi_desc *msidesc = irq_get_msi_desc(virq + i);
+
+		if (msidesc)
+			msidesc->irq = 0;
+	}
+	irq_domain_free_irqs_top(domain, virq, nr_irqs);
+}
+
+static int msi_domain_activate(struct irq_domain *domain,
+			       struct irq_data *irq_data)
+{
+	int ret = 0;
+	struct msi_msg msg;
+
+	/*
+	 * irq_data->chip_data is MSI/MSIx offset.
+	 * MSI-X message is written per-IRQ, the offset is always 0.
+	 * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
+	 */
+	if (!irq_data->chip_data) {
+		ret = irq_chip_compose_msi_msg(irq_data, &msg);
+		if (ret == 0)
+			write_msi_msg(irq_data->irq, &msg);
+	}
+
+	return ret;
+}
+
+static int msi_domain_deactivate(struct irq_domain *domain,
+				 struct irq_data *irq_data)
+{
+	struct msi_msg msg;
+
+	if (!irq_data->chip_data) {
+		memset(&msg, 0, sizeof(msg));
+		write_msi_msg(irq_data->irq, &msg);
+	}
+
+	return 0;
+}
+
+static struct irq_domain_ops msi_domain_ops = {
+	.alloc = msi_domain_alloc,
+	.free = msi_domain_free,
+	.activate = msi_domain_activate,
+	.deactivate = msi_domain_deactivate,
+};
+
+struct irq_domain *msi_create_irq_domain(struct irq_domain *parent)
+{
+	struct irq_domain *domain;
+
+	domain = irq_domain_add_tree(NULL, &msi_domain_ops, NULL);
+	if (domain)
+		domain->parent = parent;
+
+	return domain;
+}
+
+int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
+			      struct pci_dev *dev, void *arg)
+{
+	int i, virq;
+	struct msi_desc *msidesc;
+	int node = dev_to_node(&dev->dev);
+
+	list_for_each_entry(msidesc, &dev->msi_list, list) {
+		arch_msi_irq_domain_set_hwirq(arg, msi_get_hwirq(dev, msidesc));
+		virq = irq_domain_alloc_irqs(domain, msidesc->nvec_used,
+					     node, arg);
+		if (virq < 0) {
+			/* Special handling for pci_enable_msi_range(). */
+			return (type == PCI_CAP_ID_MSI &&
+				msidesc->nvec_used > 1) ?  1 : -ENOSPC;
+		}
+		for (i = 0; i < msidesc->nvec_used; i++)
+			irq_set_msi_desc_off(virq + i, i, msidesc);
+	}
+
+	list_for_each_entry(msidesc, &dev->msi_list, list)
+		if (msidesc->nvec_used == 1)
+			dev_dbg(&dev->dev, "irq %d for MSI/MSI-X\n",
+				msidesc->irq);
+		else
+			dev_dbg(&dev->dev, "irq [%d-%d] for MSI/MSI-X\n",
+				msidesc->irq,
+				msidesc->irq + msidesc->nvec_used - 1);
+
+	return 0;
+}
+#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 44f4746d033b..05dcd425f82b 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -75,4 +75,15 @@ struct msi_chip {
 	void (*teardown_irq)(struct msi_chip *chip, unsigned int irq);
 };
 
+#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
+extern struct irq_chip msi_chip;
+
+extern struct irq_domain *msi_create_irq_domain(struct irq_domain *parent);
+extern int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
+				     struct pci_dev *dev, void *arg);
+
+extern irq_hw_number_t arch_msi_irq_domain_get_hwirq(void *arg);
+extern void arch_msi_irq_domain_set_hwirq(void *arg, irq_hw_number_t hwirq);
+#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
+
 #endif /* LINUX_MSI_H */
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 22/31] x86, PCI, MSI: Use hierarchy irqdomain to manage MSI interrupts
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (20 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 23/31] x86, irq: Directly call native_compose_msi_msg() for DMAR IRQ Jiang Liu
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Joerg Roedel, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, iommu

Enhance the x86 MSI code to support hierarchy irqdomain, which makes
the architecture clearer.
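
With stacked irq_chips, the MSI chip no longer programs vectors itself;
it forwards the request to its parent and only refreshes the MSI
message afterwards. The toy userspace model below sketches that
delegation. All toy_* names are hypothetical and are not kernel
interfaces; the real code also skips the message update when the parent
returns IRQ_SET_MASK_OK_DONE, which this model omits:

```c
#include <stddef.h>

struct toy_irq_data;

/* Toy counterpart of an irq_chip with a single callback. */
struct toy_irq_chip {
	int (*set_affinity)(struct toy_irq_data *d, unsigned int cpu);
};

/* Toy counterpart of irq_data, linked to its parent level. */
struct toy_irq_data {
	const struct toy_irq_chip *chip;
	struct toy_irq_data *parent_data;
	unsigned int cpu;		/* target CPU recorded by the root level */
	unsigned int msg_writes;	/* how often the MSI level rewrote its message */
};

/* Root level: stands in for the CPU vector domain's chip. */
static int toy_vector_set_affinity(struct toy_irq_data *d, unsigned int cpu)
{
	d->cpu = cpu;
	return 0;
}

/* MSI level: delegate to the parent, then update the message on success. */
static int toy_msi_set_affinity(struct toy_irq_data *d, unsigned int cpu)
{
	struct toy_irq_data *parent = d->parent_data;
	int ret = parent->chip->set_affinity(parent, cpu);

	if (ret >= 0)
		d->msg_writes++;
	return ret;
}

static const struct toy_irq_chip toy_vector_chip = {
	.set_affinity = toy_vector_set_affinity,
};

static const struct toy_irq_chip toy_msi_chip = {
	.set_affinity = toy_msi_set_affinity,
};
```

Inserting an interrupt remapping level between the two would only
change parent_data of the MSI level; the MSI chip code stays the same.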

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/Kconfig                     |    1 +
 arch/x86/include/asm/hw_irq.h        |    9 ++-
 arch/x86/include/asm/irq_remapping.h |    6 +-
 arch/x86/kernel/apic/msi.c           |  122 +++++++++++++++++-----------------
 arch/x86/kernel/apic/vector.c        |    2 +
 drivers/iommu/irq_remapping.c        |    1 -
 6 files changed, 73 insertions(+), 68 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9df24a42f54d..a3675e4f4342 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -885,6 +885,7 @@ config X86_LOCAL_APIC
 	select GENERIC_IRQ_LEGACY_ALLOC_HWIRQ
 	select IRQ_DOMAIN
 	select IRQ_DOMAIN_HIERARCHY
+	select PCI_MSI_IRQ_DOMAIN if PCI_MSI
 
 config X86_IO_APIC
 	def_bool X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_IOAPIC
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 9e91a5d048de..eb206e8b0bb7 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -110,9 +110,10 @@ struct irq_2_irte {
 };
 #endif	/* CONFIG_IRQ_REMAP */
 
+struct irq_domain;
+
 #ifdef	CONFIG_X86_LOCAL_APIC
 struct irq_data;
-struct irq_domain;
 struct pci_dev;
 struct msi_desc;
 
@@ -205,6 +206,12 @@ static inline void lock_vector_lock(void) {}
 static inline void unlock_vector_lock(void) {}
 #endif	/* CONFIG_X86_LOCAL_APIC */
 
+#ifdef	CONFIG_PCI_MSI
+extern void arch_init_msi_domain(struct irq_domain *domain);
+#else
+static inline void arch_init_msi_domain(struct irq_domain *domain) { }
+#endif
+
 /* Statistics */
 extern atomic_t irq_err_count;
 extern atomic_t irq_mis_count;
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index d2410ac8cef9..c4fa0d2291b8 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -71,11 +71,7 @@ extern void irq_remapping_print_chip(struct irq_data *data, struct seq_file *p);
  * Create MSI/MSIx irqdomain for interrupt remapping device, use @parent as
  * parent irqdomain.
  */
-static inline struct irq_domain *
-arch_create_msi_irq_domain(struct irq_domain *parent)
-{
-	return NULL;
-}
+extern struct irq_domain *arch_create_msi_irq_domain(struct irq_domain *parent);
 
 /* Get parent irqdomain for interrupt remapping irqdomain */
 static inline struct irq_domain *arch_get_ir_parent_domain(void)
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index f2f8c999bdcc..f6c06ceba9ee 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -3,6 +3,8 @@
  *
  * Copyright (C) 1997, 1998, 1999, 2000, 2009 Ingo Molnar, Hajnalka Szabo
  *	Moved from arch/x86/kernel/apic/io_apic.c.
+ * Jiang Liu <jiang.liu@linux.intel.com>
+ *	Add support of hierarchy irqdomain
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -21,6 +23,8 @@
 #include <asm/apic.h>
 #include <asm/irq_remapping.h>
 
+static struct irq_domain *msi_default_domain;
+
 void native_compose_msi_msg(struct pci_dev *pdev,
 			    unsigned int irq, unsigned int dest,
 			    struct msi_msg *msg, u8 hpet_id)
@@ -114,102 +118,98 @@ static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq,
 	return 0;
 }
 
-static int
-msi_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force)
+static int msi_set_affinity(struct irq_data *data, const struct cpumask *mask,
+			    bool force)
 {
-	struct irq_cfg *cfg = irqd_cfg(data);
-	struct msi_msg msg;
-	unsigned int dest;
+	struct irq_data *parent = data->parent_data;
 	int ret;
 
-	ret = apic_set_affinity(data, mask, &dest);
-	if (ret)
-		return ret;
-
-	__get_cached_msi_msg(data->msi_desc, &msg);
-
-	msg.data &= ~MSI_DATA_VECTOR_MASK;
-	msg.data |= MSI_DATA_VECTOR(cfg->vector);
-	msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
-	msg.address_lo |= MSI_ADDR_DEST_ID(dest);
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
+		struct msi_msg msg;
 
-	__write_msi_msg(data->msi_desc, &msg);
+		__get_cached_msi_msg(data->msi_desc, &msg);
+		msi_update_msg(&msg, data);
+		__write_msi_msg(data->msi_desc, &msg);
+	}
 
-	return IRQ_SET_MASK_OK_NOCOPY;
+	return ret;
 }
 
 /*
  * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
  * which implement the MSI or MSI-X Capability Structure.
  */
-static struct irq_chip msi_chip = {
+struct irq_chip msi_chip = {
 	.name			= "PCI-MSI",
 	.irq_unmask		= unmask_msi_irq,
 	.irq_mask		= mask_msi_irq,
-	.irq_ack		= apic_ack_edge,
+	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= msi_set_affinity,
-	.irq_retrigger		= apic_retrigger_irq,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_print_chip		= irq_remapping_print_chip,
+	.irq_compose_msi_msg	= irq_msi_compose_msg,
  	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
-int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
-		  unsigned int irq_base, unsigned int irq_offset)
+int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
-	struct irq_chip *chip = &msi_chip;
-	struct msi_msg msg;
-	unsigned int irq = irq_base + irq_offset;
-	int ret;
-
-	ret = msi_compose_msg(dev, irq, &msg, -1);
-	if (ret < 0)
-		return ret;
+	struct irq_domain *domain;
+	struct irq_alloc_info info;
 
-	irq_set_msi_desc_off(irq_base, irq_offset, msidesc);
+	init_irq_alloc_info(&info, NULL);
+	info.msi_dev = dev;
+	if (type == PCI_CAP_ID_MSI) {
+		info.type = X86_IRQ_ALLOC_TYPE_MSI;
+		info.flags |= X86_IRQ_ALLOC_CONTIGOUS_VECTORS;
+	} else {
+		info.type = X86_IRQ_ALLOC_TYPE_MSIX;
+	}
 
-	/*
-	 * MSI-X message is written per-IRQ, the offset is always 0.
-	 * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
-	 */
-	if (!irq_offset)
-		write_msi_msg(irq, &msg);
+	domain = irq_remapping_get_irq_domain(&info);
+	if (domain == NULL)
+		domain = msi_default_domain;
+	if (domain == NULL)
+		return -ENOSYS;
 
-	setup_remapped_irq(irq, irq_cfg(irq), chip);
+	return msi_irq_domain_alloc_irqs(domain, type, dev, &info);
+}
 
-	irq_set_chip_and_handler_name(irq, chip, handle_edge_irq, "edge");
+void native_teardown_msi_irq(unsigned int irq)
+{
+	irq_domain_free_irqs(irq, 1);
+}
 
-	dev_dbg(&dev->dev, "irq %d for MSI/MSI-X\n", irq);
+irq_hw_number_t arch_msi_irq_domain_get_hwirq(void *arg)
+{
+	struct irq_alloc_info *info = arg;
 
-	return 0;
+	return info->msi_hwirq;
 }
 
-int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+void arch_msi_irq_domain_set_hwirq(void *arg, irq_hw_number_t hwirq)
 {
-	struct msi_desc *msidesc;
-	int irq, ret;
-
-	/* Multiple MSI vectors only supported with interrupt remapping */
-	if (type == PCI_CAP_ID_MSI && nvec > 1)
-		return 1;
+	struct irq_alloc_info *info = arg;
 
-	list_for_each_entry(msidesc, &dev->msi_list, list) {
-		irq = irq_domain_alloc_irqs(NULL, 1, NUMA_NO_NODE, NULL);
-		if (irq <= 0)
-			return -ENOSPC;
+	info->msi_hwirq = hwirq;
+}
 
-		ret = setup_msi_irq(dev, msidesc, irq, 0);
-		if (ret < 0) {
-			irq_domain_free_irqs(irq, 1);
-			return ret;
-		}
+void arch_init_msi_domain(struct irq_domain *parent)
+{
+	if (disable_apic)
+		return;
 
-	}
-	return 0;
+	msi_default_domain = msi_create_irq_domain(parent);
+	if (!msi_default_domain)
+		pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
 }
 
-void native_teardown_msi_irq(unsigned int irq)
+#ifdef CONFIG_IRQ_REMAP
+struct irq_domain *arch_create_msi_irq_domain(struct irq_domain *parent)
 {
-	irq_domain_free_irqs(irq, 1);
+	return msi_create_irq_domain(parent);
 }
+#endif
 
 #ifdef CONFIG_DMAR_TABLE
 static int
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 4b5a021f2094..9ee62cf83edf 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -364,6 +364,8 @@ int __init arch_early_irq_init(void)
 	BUG_ON(x86_vector_domain == NULL);
 	irq_set_default_host(x86_vector_domain);
 
+	arch_init_msi_domain(x86_vector_domain);
+
 	return arch_early_ioapic_init();
 }
 
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 32fe5b1322d0..414ab0cddbbc 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -171,7 +171,6 @@ static void __init irq_remapping_modify_x86_ops(void)
 	x86_io_apic_ops.set_affinity	= set_remapped_irq_affinity;
 	x86_io_apic_ops.setup_entry	= setup_ioapic_remapped_entry;
 	x86_io_apic_ops.eoi_ioapic_pin	= eoi_ioapic_pin_remapped;
-	x86_msi.setup_msi_irqs		= irq_remapping_setup_msi_irqs;
 	x86_msi.setup_hpet_msi		= setup_hpet_msi_remapped;
 	x86_msi.compose_msi_msg		= compose_remapped_msi_msg;
 }
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 23/31] x86, irq: Directly call native_compose_msi_msg() for DMAR IRQ
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (21 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 22/31] x86, PCI, MSI: Use hierarchy irqdomain to manage MSI interrupts Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 24/31] iommu/vt-d: Clean up unused MSI related code Jiang Liu
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

The DMAR interrupt is not remapped by the interrupt remapping hardware,
so call native_compose_msi_msg() directly for the DMAR IRQ to compose
the MSI message. This will help to simplify the MSI code later.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/kernel/apic/msi.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index f6c06ceba9ee..265178b30816 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -250,12 +250,10 @@ static struct irq_chip dmar_msi_type = {
 
 int arch_setup_dmar_msi(unsigned int irq)
 {
-	int ret;
 	struct msi_msg msg;
+	struct irq_cfg *cfg = irq_cfg(irq);
 
-	ret = msi_compose_msg(NULL, irq, &msg, -1);
-	if (ret < 0)
-		return ret;
+	native_compose_msi_msg(NULL, irq, cfg->dest_apicid, &msg, -1);
 	dmar_msi_write(irq, &msg);
 	irq_set_chip_and_handler_name(irq, &dmar_msi_type, handle_edge_irq,
 				      "edge");
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 24/31] iommu/vt-d: Clean up unused MSI related code
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (22 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 23/31] x86, irq: Directly call native_compose_msi_msg() for DMAR IRQ Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:01 ` [Patch Part2 v4 25/31] iommu/amd: " Jiang Liu
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Joerg Roedel, Matthias Brugger
  Cc: Jiang Liu, Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, iommu

MSI interrupts have now been converted to the new hierarchy irqdomain
interfaces, so remove the legacy MSI-related code.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 drivers/iommu/intel_irq_remapping.c |  144 -----------------------------------
 1 file changed, 144 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 88196ca55e29..cbaad087a872 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -145,44 +145,6 @@ static int qi_flush_iec(struct intel_iommu *iommu, int index, int mask)
 	return qi_submit_sync(&desc, iommu);
 }
 
-static int map_irq_to_irte_handle(int irq, u16 *sub_handle)
-{
-	struct irq_2_iommu *irq_iommu = irq_2_iommu(irq);
-	unsigned long flags;
-	int index;
-
-	if (!irq_iommu)
-		return -1;
-
-	raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
-	*sub_handle = irq_iommu->sub_handle;
-	index = irq_iommu->irte_index;
-	raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
-	return index;
-}
-
-static int set_irte_irq(int irq, struct intel_iommu *iommu, u16 index, u16 subhandle)
-{
-	struct irq_2_iommu *irq_iommu = irq_2_iommu(irq);
-	struct irq_cfg *cfg = irq_cfg(irq);
-	unsigned long flags;
-
-	if (!irq_iommu)
-		return -1;
-
-	raw_spin_lock_irqsave(&irq_2_ir_lock, flags);
-
-	cfg->remapped = 1;
-	irq_iommu->iommu = iommu;
-	irq_iommu->irte_index = index;
-	irq_iommu->sub_handle = subhandle;
-	irq_iommu->irte_mask = 0;
-
-	raw_spin_unlock_irqrestore(&irq_2_ir_lock, flags);
-
-	return 0;
-}
-
 static int modify_irte(struct irq_2_iommu *irq_iommu,
 		       struct irte *irte_modified)
 {
@@ -1070,108 +1032,6 @@ intel_ioapic_set_affinity(struct irq_data *data, const struct cpumask *mask,
 	return 0;
 }
 
-static void intel_compose_msi_msg(struct pci_dev *pdev,
-				  unsigned int irq, unsigned int dest,
-				  struct msi_msg *msg, u8 hpet_id)
-{
-	struct irq_cfg *cfg;
-	struct irte irte;
-	u16 sub_handle = 0;
-	int ir_index;
-
-	cfg = irq_cfg(irq);
-
-	ir_index = map_irq_to_irte_handle(irq, &sub_handle);
-	BUG_ON(ir_index == -1);
-
-	prepare_irte(&irte, cfg->vector, dest);
-
-	/* Set source-id of interrupt request */
-	if (pdev)
-		set_msi_sid(&irte, pdev);
-	else
-		set_hpet_sid(&irte, hpet_id);
-
-	modify_irte(irq_2_iommu(irq), &irte);
-
-	msg->address_hi = MSI_ADDR_BASE_HI;
-	msg->data = sub_handle;
-	msg->address_lo = MSI_ADDR_BASE_LO | MSI_ADDR_IR_EXT_INT |
-			  MSI_ADDR_IR_SHV |
-			  MSI_ADDR_IR_INDEX1(ir_index) |
-			  MSI_ADDR_IR_INDEX2(ir_index);
-}
-
-/*
- * Map the PCI dev to the corresponding remapping hardware unit
- * and allocate 'nvec' consecutive interrupt-remapping table entries
- * in it.
- */
-static int intel_msi_alloc_irq(struct pci_dev *dev, int irq, int nvec)
-{
-	struct intel_iommu *iommu;
-	int index;
-
-	down_read(&dmar_global_lock);
-	iommu = map_dev_to_ir(dev);
-	if (!iommu) {
-		printk(KERN_ERR
-		       "Unable to map PCI %s to iommu\n", pci_name(dev));
-		index = -ENOENT;
-	} else {
-		index = alloc_irte(iommu, irq, irq_2_iommu(irq), nvec);
-		if (index < 0) {
-			printk(KERN_ERR
-			       "Unable to allocate %d IRTE for PCI %s\n",
-			       nvec, pci_name(dev));
-			index = -ENOSPC;
-		}
-	}
-	up_read(&dmar_global_lock);
-
-	return index;
-}
-
-static int intel_msi_setup_irq(struct pci_dev *pdev, unsigned int irq,
-			       int index, int sub_handle)
-{
-	struct intel_iommu *iommu;
-	int ret = -ENOENT;
-
-	down_read(&dmar_global_lock);
-	iommu = map_dev_to_ir(pdev);
-	if (iommu) {
-		/*
-		 * setup the mapping between the irq and the IRTE
-		 * base index, the sub_handle pointing to the
-		 * appropriate interrupt remap table entry.
-		 */
-		set_irte_irq(irq, iommu, index, sub_handle);
-		ret = 0;
-	}
-	up_read(&dmar_global_lock);
-
-	return ret;
-}
-
-static int intel_alloc_hpet_msi(unsigned int irq, unsigned int id)
-{
-	int ret = -1;
-	struct intel_iommu *iommu;
-	int index;
-
-	down_read(&dmar_global_lock);
-	iommu = map_hpet_to_ir(id);
-	if (iommu) {
-		index = alloc_irte(iommu, irq, irq_2_iommu(irq), 1);
-		if (index >= 0)
-			ret = 0;
-	}
-	up_read(&dmar_global_lock);
-
-	return ret;
-}
-
 static struct irq_domain *intel_get_ir_irq_domain(struct irq_alloc_info *info)
 {
 	struct intel_iommu *iommu = NULL;
@@ -1229,10 +1089,6 @@ struct irq_remap_ops intel_irq_remap_ops = {
 	.setup_ioapic_entry	= intel_setup_ioapic_entry,
 	.set_affinity		= intel_ioapic_set_affinity,
 	.free_irq		= free_irte,
-	.compose_msi_msg	= intel_compose_msi_msg,
-	.msi_alloc_irq		= intel_msi_alloc_irq,
-	.msi_setup_irq		= intel_msi_setup_irq,
-	.alloc_hpet_msi		= intel_alloc_hpet_msi,
 	.get_ir_irq_domain	= intel_get_ir_irq_domain,
 	.get_irq_domain		= intel_get_irq_domain,
 };
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 25/31] iommu/amd: Clean up unused MSI related code
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (23 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 24/31] iommu/vt-d: Clean up unused MSI related code Jiang Liu
@ 2014-11-04 12:01 ` Jiang Liu
  2014-11-04 12:02 ` [Patch Part2 v4 26/31] x86: irq_remapping: " Jiang Liu
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Joerg Roedel, Matthias Brugger
  Cc: Jiang Liu, Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel, iommu

Now that MSI interrupts have been converted to the new hierarchy
irqdomain interfaces, remove the legacy MSI-related code.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
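For readers following the alloc_irq_index() change: a minimal user-space
sketch of the contiguous-entry search that remains once the irq_cfg
bookkeeping is dropped. The sketch_* names, the table size and passing
the table directly (instead of looking it up by devid) are assumptions
made for illustration; locking and min_index handling are omitted.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE 16	/* hypothetical; not MAX_IRQS_PER_TABLE */
#define SKETCH_ALLOCATED  1u	/* hypothetical stand-in for IRTE_ALLOCATED */

/* Hypothetical stand-in for struct irq_remap_table. */
struct sketch_remap_table {
	uint32_t table[SKETCH_TABLE_SIZE];
};

/*
 * Find 'count' consecutive free entries, mark them allocated and return
 * the index of the first one, or a negative errno on failure.
 */
int sketch_alloc_irq_index(struct sketch_remap_table *t, int count)
{
	int index, run = 0;

	if (count <= 0 || count > SKETCH_TABLE_SIZE)
		return -EINVAL;

	for (index = 0; index < SKETCH_TABLE_SIZE; index++) {
		/* Extend the current run of free entries, or restart it. */
		run = (t->table[index] == 0) ? run + 1 : 0;

		if (run == count) {
			int first = index - count + 1;
			int i;

			for (i = first; i <= index; i++)
				t->table[i] = SKETCH_ALLOCATED;
			return first;
		}
	}

	return -ENOSPC;
}
```

With the bookkeeping gone, the caller is responsible for recording the
returned index in its own per-IRQ data, as irq_remapping_alloc() does.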
 drivers/iommu/amd_iommu.c |  115 +--------------------------------------------
 1 file changed, 2 insertions(+), 113 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 2d03e294e40f..e85cd4c8b380 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3958,8 +3958,7 @@ out_unlock:
 	return table;
 }
 
-static int alloc_irq_index(struct irq_cfg *cfg, struct irq_2_irte *irte_info,
-			   u16 devid, int count)
+static int alloc_irq_index(u16 devid, int count)
 {
 	struct irq_remap_table *table;
 	unsigned long flags;
@@ -3985,11 +3984,6 @@ static int alloc_irq_index(struct irq_cfg *cfg, struct irq_2_irte *irte_info,
 				table->table[index - c + 1] = IRTE_ALLOCATED;
 
 			index -= count - 1;
-
-			cfg->remapped	      = 1;
-			irte_info->devid      = devid;
-			irte_info->index      = index;
-
 			goto out;
 		}
 	}
@@ -4189,106 +4183,6 @@ static int free_irq(int irq)
 	return 0;
 }
 
-static void compose_msi_msg(struct pci_dev *pdev,
-			    unsigned int irq, unsigned int dest,
-			    struct msi_msg *msg, u8 hpet_id)
-{
-	struct irq_2_irte *irte_info;
-	struct irq_cfg *cfg;
-	union irte irte;
-
-	cfg = irq_cfg(irq);
-	if (!cfg)
-		return;
-
-	irte_info = &cfg->irq_2_irte;
-
-	irte.val		= 0;
-	irte.fields.vector	= cfg->vector;
-	irte.fields.int_type    = apic->irq_delivery_mode;
-	irte.fields.destination	= dest;
-	irte.fields.dm		= apic->irq_dest_mode;
-	irte.fields.valid	= 1;
-
-	modify_irte(irte_info->devid, irte_info->index, irte);
-
-	msg->address_hi = MSI_ADDR_BASE_HI;
-	msg->address_lo = MSI_ADDR_BASE_LO;
-	msg->data       = irte_info->index;
-}
-
-static int msi_alloc_irq(struct pci_dev *pdev, int irq, int nvec)
-{
-	struct irq_cfg *cfg;
-	int index;
-	u16 devid;
-
-	if (!pdev)
-		return -EINVAL;
-
-	cfg = irq_cfg(irq);
-	if (!cfg)
-		return -EINVAL;
-
-	devid = get_device_id(&pdev->dev);
-	index = alloc_irq_index(cfg, &cfg->irq_2_irte, devid, nvec);
-
-	return index < 0 ? MAX_IRQS_PER_TABLE : index;
-}
-
-static int msi_setup_irq(struct pci_dev *pdev, unsigned int irq,
-			 int index, int offset)
-{
-	struct irq_2_irte *irte_info;
-	struct irq_cfg *cfg;
-	u16 devid;
-
-	if (!pdev)
-		return -EINVAL;
-
-	cfg = irq_cfg(irq);
-	if (!cfg)
-		return -EINVAL;
-
-	if (index >= MAX_IRQS_PER_TABLE)
-		return 0;
-
-	devid		= get_device_id(&pdev->dev);
-	irte_info	= &cfg->irq_2_irte;
-
-	cfg->remapped	      = 1;
-	irte_info->devid      = devid;
-	irte_info->index      = index + offset;
-
-	return 0;
-}
-
-static int alloc_hpet_msi(unsigned int irq, unsigned int id)
-{
-	struct irq_2_irte *irte_info;
-	struct irq_cfg *cfg;
-	int index, devid;
-
-	cfg = irq_cfg(irq);
-	if (!cfg)
-		return -EINVAL;
-
-	irte_info = &cfg->irq_2_irte;
-	devid     = get_hpet_devid(id);
-	if (devid < 0)
-		return devid;
-
-	index = alloc_irq_index(cfg, &cfg->irq_2_irte, devid, 1);
-	if (index < 0)
-		return index;
-
-	cfg->remapped	      = 1;
-	irte_info->devid      = devid;
-	irte_info->index      = index;
-
-	return 0;
-}
-
 static int get_devid(struct irq_alloc_info *info)
 {
 	int devid = -1;
@@ -4365,10 +4259,6 @@ struct irq_remap_ops amd_iommu_irq_ops = {
 	.setup_ioapic_entry	= setup_ioapic_entry,
 	.set_affinity		= set_affinity,
 	.free_irq		= free_irq,
-	.compose_msi_msg	= compose_msi_msg,
-	.msi_alloc_irq		= msi_alloc_irq,
-	.msi_setup_irq		= msi_setup_irq,
-	.alloc_hpet_msi		= alloc_hpet_msi,
 	.get_ir_irq_domain	= get_ir_irq_domain,
 	.get_irq_domain		= get_irq_domain,
 };
@@ -4466,8 +4356,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
 		else
 			ret = -ENOMEM;
 	} else {
-		cfg = irq_cfg(virq);
-		index = alloc_irq_index(cfg, &data->irq_2_irte, devid, nr_irqs);
+		index = alloc_irq_index(devid, nr_irqs);
 	}
 	if (index < 0) {
 		pr_warn("Failed to allocate IRTE\n");
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 26/31] x86: irq_remapping: Clean up unused MSI related code
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (24 preceding siblings ...)
  2014-11-04 12:01 ` [Patch Part2 v4 25/31] iommu/amd: " Jiang Liu
@ 2014-11-04 12:02 ` Jiang Liu
  2014-11-04 12:02 ` [Patch Part2 v4 27/31] x86, irq: Clean up unused MSI related code and interfaces Jiang Liu
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Joerg Roedel, Matthias Brugger, Jiang Liu,
	Rafael J. Wysocki, Konrad Rzeszutek Wilk
  Cc: Andrew Morton, Tony Luck, Greg Kroah-Hartman, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel, iommu

Now that MSI interrupts have been converted to the new hierarchy
irqdomain interfaces, remove the legacy MSI-related code and interfaces.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/include/asm/irq_remapping.h |   13 ---
 arch/x86/include/asm/pci.h           |    5 --
 arch/x86/kernel/x86_init.c           |    2 -
 drivers/iommu/irq_remapping.c        |  151 ----------------------------------
 drivers/iommu/irq_remapping.h        |   14 ----
 5 files changed, 185 deletions(-)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index c4fa0d2291b8..61c50e6e28c8 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -50,10 +50,6 @@ extern int setup_ioapic_remapped_entry(int irq,
 				       int vector,
 				       struct io_apic_irq_attr *attr);
 extern void free_remapped_irq(int irq);
-extern void compose_remapped_msi_msg(struct pci_dev *pdev,
-				     unsigned int irq, unsigned int dest,
-				     struct msi_msg *msg, u8 hpet_id);
-extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
 extern void panic_if_irq_remap(const char *msg);
 extern bool setup_remapped_irq(int irq,
 			       struct irq_cfg *cfg,
@@ -98,15 +94,6 @@ static inline int setup_ioapic_remapped_entry(int irq,
 	return -ENODEV;
 }
 static inline void free_remapped_irq(int irq) { }
-static inline void compose_remapped_msi_msg(struct pci_dev *pdev,
-					    unsigned int irq, unsigned int dest,
-					    struct msi_msg *msg, u8 hpet_id)
-{
-}
-static inline int setup_hpet_msi_remapped(unsigned int irq, unsigned int id)
-{
-	return -ENODEV;
-}
 
 static inline void panic_if_irq_remap(const char *msg)
 {
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 4e370a5d8117..d8c80ff32e8c 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -96,15 +96,10 @@ extern void pci_iommu_alloc(void);
 #ifdef CONFIG_PCI_MSI
 /* implemented in arch/x86/kernel/apic/io_apic. */
 struct msi_desc;
-void native_compose_msi_msg(struct pci_dev *pdev, unsigned int irq,
-			    unsigned int dest, struct msi_msg *msg, u8 hpet_id);
 int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
 void native_teardown_msi_irq(unsigned int irq);
 void native_restore_msi_irqs(struct pci_dev *dev);
-int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,
-		  unsigned int irq_base, unsigned int irq_offset);
 #else
-#define native_compose_msi_msg		NULL
 #define native_setup_msi_irqs		NULL
 #define native_teardown_msi_irq		NULL
 #endif
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index e48b674639cc..814fcbadaad1 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -111,11 +111,9 @@ EXPORT_SYMBOL_GPL(x86_platform);
 #if defined(CONFIG_PCI_MSI)
 struct x86_msi_ops x86_msi = {
 	.setup_msi_irqs		= native_setup_msi_irqs,
-	.compose_msi_msg	= native_compose_msi_msg,
 	.teardown_msi_irq	= native_teardown_msi_irq,
 	.teardown_msi_irqs	= default_teardown_msi_irqs,
 	.restore_msi_irqs	= default_restore_msi_irqs,
-	.setup_hpet_msi		= default_setup_hpet_msi,
 	.msi_mask_irq		= default_msi_mask_irq,
 	.msix_mask_irq		= default_msix_mask_irq,
 };
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 414ab0cddbbc..782dea5e4233 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -26,9 +26,6 @@ int no_x2apic_optout;
 
 static struct irq_remap_ops *remap_ops;
 
-static int msi_alloc_remapped_irq(struct pci_dev *pdev, int irq, int nvec);
-static int msi_setup_remapped_irq(struct pci_dev *pdev, unsigned int irq,
-				  int index, int sub_handle);
 static int set_remapped_irq_affinity(struct irq_data *data,
 				     const struct cpumask *mask,
 				     bool force);
@@ -51,109 +48,6 @@ static void irq_remapping_disable_io_apic(void)
 		disconnect_bsp_APIC(0);
 }
 
-#ifndef CONFIG_GENERIC_IRQ_LEGACY_ALLOC_HWIRQ
-static unsigned int irq_alloc_hwirqs(int cnt, int node)
-{
-	return irq_domain_alloc_irqs(NULL, -1, cnt, node, NULL);
-}
-
-static void irq_free_hwirqs(unsigned int from, int cnt)
-{
-	irq_domain_free_irqs(from, cnt);
-}
-#endif
-
-static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
-{
-	int ret, sub_handle, nvec_pow2, index = 0;
-	unsigned int irq;
-	struct msi_desc *msidesc;
-
-	msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
-
-	irq = irq_alloc_hwirqs(nvec, dev_to_node(&dev->dev));
-	if (irq == 0)
-		return -ENOSPC;
-
-	nvec_pow2 = __roundup_pow_of_two(nvec);
-	for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
-		if (!sub_handle) {
-			index = msi_alloc_remapped_irq(dev, irq, nvec_pow2);
-			if (index < 0) {
-				ret = index;
-				goto error;
-			}
-		} else {
-			ret = msi_setup_remapped_irq(dev, irq + sub_handle,
-						     index, sub_handle);
-			if (ret < 0)
-				goto error;
-		}
-		ret = setup_msi_irq(dev, msidesc, irq, sub_handle);
-		if (ret < 0)
-			goto error;
-	}
-	return 0;
-
-error:
-	irq_free_hwirqs(irq, nvec);
-
-	/*
-	 * Restore altered MSI descriptor fields and prevent just destroyed
-	 * IRQs from tearing down again in default_teardown_msi_irqs()
-	 */
-	msidesc->irq = 0;
-
-	return ret;
-}
-
-static int do_setup_msix_irqs(struct pci_dev *dev, int nvec)
-{
-	int node, ret, sub_handle, index = 0;
-	struct msi_desc *msidesc;
-	unsigned int irq;
-
-	node		= dev_to_node(&dev->dev);
-	sub_handle	= 0;
-
-	list_for_each_entry(msidesc, &dev->msi_list, list) {
-
-		irq = irq_alloc_hwirqs(1, node);
-		if (irq == 0)
-			return -1;
-
-		if (sub_handle == 0)
-			ret = index = msi_alloc_remapped_irq(dev, irq, nvec);
-		else
-			ret = msi_setup_remapped_irq(dev, irq, index, sub_handle);
-
-		if (ret < 0)
-			goto error;
-
-		ret = setup_msi_irq(dev, msidesc, irq, 0);
-		if (ret < 0)
-			goto error;
-
-		sub_handle += 1;
-		irq        += 1;
-	}
-
-	return 0;
-
-error:
-	irq_free_hwirqs(irq, 1);
-	return ret;
-}
-
-static int irq_remapping_setup_msi_irqs(struct pci_dev *dev,
-					int nvec, int type)
-{
-	if (type == PCI_CAP_ID_MSI)
-		return do_setup_msi_irqs(dev, nvec);
-	else
-		return do_setup_msix_irqs(dev, nvec);
-}
-
 static void eoi_ioapic_pin_remapped(int apic, int pin, int vector)
 {
 	/*
@@ -171,8 +65,6 @@ static void __init irq_remapping_modify_x86_ops(void)
 	x86_io_apic_ops.set_affinity	= set_remapped_irq_affinity;
 	x86_io_apic_ops.setup_entry	= setup_ioapic_remapped_entry;
 	x86_io_apic_ops.eoi_ioapic_pin	= eoi_ioapic_pin_remapped;
-	x86_msi.setup_hpet_msi		= setup_hpet_msi_remapped;
-	x86_msi.compose_msi_msg		= compose_remapped_msi_msg;
 }
 
 static __init int setup_nointremap(char *str)
@@ -319,49 +211,6 @@ void free_remapped_irq(int irq)
 		remap_ops->free_irq(irq);
 }
 
-void compose_remapped_msi_msg(struct pci_dev *pdev,
-			      unsigned int irq, unsigned int dest,
-			      struct msi_msg *msg, u8 hpet_id)
-{
-	struct irq_cfg *cfg = irq_cfg(irq);
-
-	if (!irq_remapped(cfg))
-		native_compose_msi_msg(pdev, irq, dest, msg, hpet_id);
-	else if (remap_ops && remap_ops->compose_msi_msg)
-		remap_ops->compose_msi_msg(pdev, irq, dest, msg, hpet_id);
-}
-
-static int msi_alloc_remapped_irq(struct pci_dev *pdev, int irq, int nvec)
-{
-	if (!remap_ops || !remap_ops->msi_alloc_irq)
-		return -ENODEV;
-
-	return remap_ops->msi_alloc_irq(pdev, irq, nvec);
-}
-
-static int msi_setup_remapped_irq(struct pci_dev *pdev, unsigned int irq,
-				  int index, int sub_handle)
-{
-	if (!remap_ops || !remap_ops->msi_setup_irq)
-		return -ENODEV;
-
-	return remap_ops->msi_setup_irq(pdev, irq, index, sub_handle);
-}
-
-int setup_hpet_msi_remapped(unsigned int irq, unsigned int id)
-{
-	int ret;
-
-	if (!remap_ops || !remap_ops->alloc_hpet_msi)
-		return -ENODEV;
-
-	ret = remap_ops->alloc_hpet_msi(irq, id);
-	if (ret)
-		return -EINVAL;
-
-	return default_setup_hpet_msi(irq, id);
-}
-
 void panic_if_irq_remap(const char *msg)
 {
 	if (irq_remapping_enabled)
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 8c159d6fac46..95b19a6ef16a 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -70,20 +70,6 @@ struct irq_remap_ops {
 	/* Free an IRQ */
 	int (*free_irq)(int);
 
-	/* Create MSI msg to use for interrupt remapping */
-	void (*compose_msi_msg)(struct pci_dev *,
-				unsigned int, unsigned int,
-				struct msi_msg *, u8);
-
-	/* Allocate remapping resources for MSI */
-	int (*msi_alloc_irq)(struct pci_dev *, int, int);
-
-	/* Setup the remapped MSI irq */
-	int (*msi_setup_irq)(struct pci_dev *, unsigned int, int, int);
-
-	/* Setup interrupt remapping for an HPET MSI */
-	int (*alloc_hpet_msi)(unsigned int, unsigned int);
-
 	/* Get the irqdomain associated the IOMMU device */
 	struct irq_domain *(*get_ir_irq_domain)(struct irq_alloc_info *);
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 27/31] x86, irq: Clean up unused MSI related code and interfaces
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (25 preceding siblings ...)
  2014-11-04 12:02 ` [Patch Part2 v4 26/31] x86: irq_remapping: " Jiang Liu
@ 2014-11-04 12:02 ` Jiang Liu
  2014-11-04 12:02 ` [Patch Part2 v4 28/31] iommu/vt-d: Refine the interfaces to create IRQ for DMAR unit Jiang Liu
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu,
	Konrad Rzeszutek Wilk
  Cc: Andrew Morton, Tony Luck, Joerg Roedel, Greg Kroah-Hartman,
	linux-kernel, linux-pci, linux-acpi, linux-arm-kernel

Now that MSI interrupts have been converted to the new hierarchy
irqdomain interfaces, remove the legacy MSI-related code and interfaces.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
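To illustrate the narrowed native_compose_msi_msg() signature, here is a
simplified user-space sketch that builds the message purely from the
per-IRQ configuration. The sketch_* types and SKETCH_* constants are
assumptions made for illustration (an 8-bit destination ID at bit 12 of
the address, the vector in the low byte of the data); x2apic, delivery
mode, redirection hint and trigger bits are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for struct irq_cfg and struct msi_msg. */
struct sketch_irq_cfg {
	uint32_t dest_apicid;
	uint8_t vector;
};

struct sketch_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Simplified layout, assumed for this sketch only. */
#define SKETCH_MSI_ADDR_BASE_LO		0xfee00000u
#define SKETCH_MSI_ADDR_DEST_ID(d)	(((uint32_t)(d) << 12) & 0x000ff000u)
#define SKETCH_MSI_DATA_VECTOR(v)	((uint32_t)(v) & 0xffu)

/*
 * Everything needed comes from cfg, so the caller no longer passes the
 * irq number, the destination, a pci_dev or an HPET id.
 */
void sketch_compose_msi_msg(const struct sketch_irq_cfg *cfg,
			    struct sketch_msi_msg *msg)
{
	msg->address_hi = 0;
	msg->address_lo = SKETCH_MSI_ADDR_BASE_LO |
			  SKETCH_MSI_ADDR_DEST_ID(cfg->dest_apicid);
	msg->data = SKETCH_MSI_DATA_VECTOR(cfg->vector);
}
```

Because the destination now comes from cfg->dest_apicid, the caller does
not have to compute it separately before composing the message.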
 arch/x86/include/asm/hpet.h     |    9 ------
 arch/x86/include/asm/x86_init.h |    4 ---
 arch/x86/kernel/apic/msi.c      |   61 +++++----------------------------------
 3 files changed, 7 insertions(+), 67 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index e87e9faf87a9..5fa9fb0f8809 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -85,15 +85,6 @@ extern struct irq_domain *hpet_create_irq_domain(int hpet_id);
 extern int hpet_assign_irq(struct irq_domain *domain,
 			   struct hpet_dev *dev, int dev_num);
 
-#ifdef CONFIG_PCI_MSI
-extern int default_setup_hpet_msi(unsigned int irq, unsigned int id);
-#else
-static inline int default_setup_hpet_msi(unsigned int irq, unsigned int id)
-{
-	return -EINVAL;
-}
-#endif
-
 #ifdef CONFIG_HPET_EMULATE_RTC
 
 #include <linux/interrupt.h>
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index e45e4da96bf1..9b53cb2acfbb 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -176,13 +176,9 @@ struct msi_desc;
 
 struct x86_msi_ops {
 	int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
-	void (*compose_msi_msg)(struct pci_dev *dev, unsigned int irq,
-				unsigned int dest, struct msi_msg *msg,
-			       u8 hpet_id);
 	void (*teardown_msi_irq)(unsigned int irq);
 	void (*teardown_msi_irqs)(struct pci_dev *dev);
 	void (*restore_msi_irqs)(struct pci_dev *dev);
-	int  (*setup_hpet_msi)(unsigned int irq, unsigned int id);
 	u32 (*msi_mask_irq)(struct msi_desc *desc, u32 mask, u32 flag);
 	u32 (*msix_mask_irq)(struct msi_desc *desc, u32 flag);
 };
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 265178b30816..6bfe85e96c74 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -25,16 +25,12 @@
 
 static struct irq_domain *msi_default_domain;
 
-void native_compose_msi_msg(struct pci_dev *pdev,
-			    unsigned int irq, unsigned int dest,
-			    struct msi_msg *msg, u8 hpet_id)
+static void native_compose_msi_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
-	struct irq_cfg *cfg = irq_cfg(irq);
-
 	msg->address_hi = MSI_ADDR_BASE_HI;
 
 	if (x2apic_enabled())
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(dest);
+		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
 
 	msg->address_lo =
 		MSI_ADDR_BASE_LO |
@@ -44,7 +40,7 @@ void native_compose_msi_msg(struct pci_dev *pdev,
 		((apic->irq_delivery_mode != dest_LowestPrio) ?
 			MSI_ADDR_REDIRECTION_CPU :
 			MSI_ADDR_REDIRECTION_LOWPRI) |
-		MSI_ADDR_DEST_ID(dest);
+		MSI_ADDR_DEST_ID(cfg->dest_apicid);
 
 	msg->data =
 		MSI_DATA_TRIGGER_EDGE |
@@ -55,7 +51,7 @@ void native_compose_msi_msg(struct pci_dev *pdev,
 		MSI_DATA_VECTOR(cfg->vector);
 }
 
-static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+static void msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	struct irq_cfg *cfg = irqd_cfg(data);
 
@@ -93,31 +89,6 @@ static void msi_update_msg(struct msi_msg *msg, struct irq_data *irq_data)
 	msg->address_lo |= MSI_ADDR_DEST_ID(cfg->dest_apicid);
 }
 
-static int msi_compose_msg(struct pci_dev *pdev, unsigned int irq,
-			   struct msi_msg *msg, u8 hpet_id)
-{
-	struct irq_cfg *cfg;
-	int err;
-	unsigned dest;
-
-	if (disable_apic)
-		return -ENXIO;
-
-	cfg = irq_cfg(irq);
-	err = assign_irq_vector(irq, cfg, apic->target_cpus());
-	if (err)
-		return err;
-
-	err = apic->cpu_mask_to_apicid_and(cfg->domain,
-					   apic->target_cpus(), &dest);
-	if (err)
-		return err;
-
-	x86_msi.compose_msi_msg(pdev, irq, dest, msg, hpet_id);
-
-	return 0;
-}
-
 static int msi_set_affinity(struct irq_data *data, const struct cpumask *mask,
 			    bool force)
 {
@@ -148,7 +119,7 @@ struct irq_chip msi_chip = {
 	.irq_set_affinity	= msi_set_affinity,
 	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.irq_print_chip		= irq_remapping_print_chip,
-	.irq_compose_msi_msg	= irq_msi_compose_msg,
+	.irq_compose_msi_msg	= msi_compose_msg,
  	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
@@ -253,7 +224,7 @@ int arch_setup_dmar_msi(unsigned int irq)
 	struct msi_msg msg;
 	struct irq_cfg *cfg = irq_cfg(irq);
 
-	native_compose_msi_msg(NULL, irq, cfg->dest_apicid, &msg, -1);
+	native_compose_msi_msg(cfg, &msg);
 	dmar_msi_write(irq, &msg);
 	irq_set_chip_and_handler_name(irq, &dmar_msi_type, handle_edge_irq,
 				      "edge");
@@ -305,28 +276,10 @@ static struct irq_chip hpet_msi_type = {
 	.irq_set_affinity = hpet_msi_set_affinity,
 	.irq_retrigger = irq_chip_retrigger_hierarchy,
 	.irq_print_chip = irq_remapping_print_chip,
-	.irq_compose_msi_msg = irq_msi_compose_msg,
+	.irq_compose_msi_msg = msi_compose_msg,
  	.flags = IRQCHIP_SKIP_SET_WAKE,
 };
 
-int default_setup_hpet_msi(unsigned int irq, unsigned int id)
-{
-	struct irq_chip *chip = &hpet_msi_type;
-	struct msi_msg msg;
-	int ret;
-
-	ret = msi_compose_msg(NULL, irq, &msg, id);
-	if (ret < 0)
-		return ret;
-
-	hpet_msi_write(irq_get_handler_data(irq), &msg);
-	irq_set_status_flags(irq, IRQ_MOVE_PCNTXT);
-	setup_remapped_irq(irq, irq_cfg(irq), chip);
-
-	irq_set_chip_and_handler_name(irq, chip, handle_edge_irq, "edge");
-	return 0;
-}
-
 static int hpet_domain_alloc(struct irq_domain *domain, unsigned int virq,
 			     unsigned int nr_irqs, void *arg)
 {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [Patch Part2 v4 28/31] iommu/vt-d: Refine the interfaces to create IRQ for DMAR unit
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (26 preceding siblings ...)
  2014-11-04 12:02 ` [Patch Part2 v4 27/31] x86, irq: Clean up unused MSI related code and interfaces Jiang Liu
@ 2014-11-04 12:02 ` Jiang Liu
  2014-11-04 12:02 ` [Patch Part2 v4 29/31] x86, irq: Use hierarchy irqdomain to manage DMAR interrupts Jiang Liu
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Tony Luck, Fenghua Yu, x86, Joerg Roedel,
	Vinod Koul, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Greg Kroah-Hartman,
	linux-kernel, linux-pci, linux-acpi, linux-arm-kernel,
	linux-ia64, iommu, dmaengine

Refine the interfaces that create the IRQ for a DMAR unit, in
preparation for converting DMAR IRQs to irqdomain on x86.

Also move the declarations of dmar_alloc_hwirq()/dmar_free_hwirq() from
irq_remapping.h to dmar.h, because they are not specific to
irq_remapping.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
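A rough user-space sketch of the calling convention after this change,
where allocation, handler data and MSI message setup all happen inside
dmar_alloc_hwirq() and the caller only stores the result. All sketch_*
names and the fixed-size slot array are assumptions made for
illustration; they stand in for the irq core and are not kernel APIs.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_IRQS 8	/* hypothetical pool size */

/* Hypothetical per-IRQ state standing in for the irq descriptor. */
struct sketch_irq_slot {
	int in_use;
	void *handler_data;
	int msg_written;
};

static struct sketch_irq_slot sketch_irqs[SKETCH_MAX_IRQS];

/* Hypothetical stand-in for struct intel_iommu. */
struct sketch_iommu {
	int seq_id;
	int node;
	int irq;
};

/* Returns a positive IRQ number, or 0 if none is free (IRQ 0 is unused). */
int sketch_dmar_alloc_hwirq(int id, int node, void *arg)
{
	int irq;

	(void)id;	/* unused in this sketch */
	(void)node;	/* unused in this sketch */

	for (irq = 1; irq < SKETCH_MAX_IRQS; irq++) {
		if (sketch_irqs[irq].in_use)
			continue;
		sketch_irqs[irq].in_use = 1;
		sketch_irqs[irq].handler_data = arg;	/* models irq_set_handler_data() */
		sketch_irqs[irq].msg_written = 1;	/* models compose + dmar_msi_write() */
		return irq;
	}

	return 0;
}

/* Undo everything the allocator set up, so callers need no extra cleanup. */
void sketch_dmar_free_hwirq(int irq)
{
	if (irq <= 0 || irq >= SKETCH_MAX_IRQS)
		return;
	sketch_irqs[irq].handler_data = NULL;
	sketch_irqs[irq].msg_written = 0;
	sketch_irqs[irq].in_use = 0;
}

/* Caller side: one call, one check, then remember the IRQ. */
int sketch_dmar_set_interrupt(struct sketch_iommu *iommu)
{
	int irq;

	if (iommu->irq)
		return 0;

	irq = sketch_dmar_alloc_hwirq(iommu->seq_id, iommu->node, iommu);
	if (irq <= 0)
		return -EINVAL;

	iommu->irq = irq;
	return 0;
}
```

The point of the shape is that the error path in the caller collapses to
a single check, since there is no separate arch setup step left to fail.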
 arch/ia64/include/asm/irq_remapping.h |    2 --
 arch/ia64/kernel/msi_ia64.c           |   30 +++++++++++++++++++-----------
 arch/x86/include/asm/irq_remapping.h  |    4 ----
 arch/x86/kernel/apic/msi.c            |   24 +++++++++++++-----------
 drivers/iommu/dmar.c                  |   19 +++++--------------
 include/linux/dmar.h                  |    3 ++-
 6 files changed, 39 insertions(+), 43 deletions(-)

diff --git a/arch/ia64/include/asm/irq_remapping.h b/arch/ia64/include/asm/irq_remapping.h
index e3b3556e2e1b..a8687b1d8906 100644
--- a/arch/ia64/include/asm/irq_remapping.h
+++ b/arch/ia64/include/asm/irq_remapping.h
@@ -1,6 +1,4 @@
 #ifndef __IA64_INTR_REMAPPING_H
 #define __IA64_INTR_REMAPPING_H
 #define irq_remapping_enabled 0
-#define dmar_alloc_hwirq	create_irq
-#define dmar_free_hwirq		destroy_irq
 #endif
diff --git a/arch/ia64/kernel/msi_ia64.c b/arch/ia64/kernel/msi_ia64.c
index 8c3730c3c63d..15032330573b 100644
--- a/arch/ia64/kernel/msi_ia64.c
+++ b/arch/ia64/kernel/msi_ia64.c
@@ -166,7 +166,7 @@ static struct irq_chip dmar_msi_type = {
 	.irq_retrigger = ia64_msi_retrigger_irq,
 };
 
-static int
+static void
 msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_msg *msg)
 {
 	struct irq_cfg *cfg = irq_cfg + irq;
@@ -188,21 +188,29 @@ msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_msg *msg)
 		MSI_DATA_LEVEL_ASSERT |
 		MSI_DATA_DELIVERY_FIXED |
 		MSI_DATA_VECTOR(cfg->vector);
-	return 0;
 }
 
-int arch_setup_dmar_msi(unsigned int irq)
+int dmar_alloc_hwirq(int id, int node, void *arg)
 {
-	int ret;
+	int irq;
 	struct msi_msg msg;
 
-	ret = msi_compose_msg(NULL, irq, &msg);
-	if (ret < 0)
-		return ret;
-	dmar_msi_write(irq, &msg);
-	irq_set_chip_and_handler_name(irq, &dmar_msi_type, handle_edge_irq,
-				      "edge");
-	return 0;
+	irq = create_irq();
+	if (irq > 0) {
+		irq_set_handler_data(irq, arg);
+		irq_set_chip_and_handler_name(irq, &dmar_msi_type,
+					      handle_edge_irq, "edge");
+		msi_compose_msg(NULL, irq, &msg);
+		dmar_msi_write(irq, &msg);
+	}
+
+	return irq;
+}
+
+void dmar_free_hwirq(int irq)
+{
+	irq_set_handler_data(irq, NULL);
+	destroy_irq(irq);
 }
 #endif /* CONFIG_INTEL_IOMMU */
 
diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 61c50e6e28c8..61ec9234c88e 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -124,8 +124,4 @@ irq_remapping_get_irq_domain(struct irq_alloc_info *info)
 
 #define	irq_remapping_print_chip	NULL
 #endif /* CONFIG_IRQ_REMAP */
-
-extern int dmar_alloc_hwirq(void);
-extern void dmar_free_hwirq(int irq);
-
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 6bfe85e96c74..79fc6bb0d104 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -219,25 +219,27 @@ static struct irq_chip dmar_msi_type = {
  	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
-int arch_setup_dmar_msi(unsigned int irq)
+int dmar_alloc_hwirq(int id, int node, void *arg)
 {
+	int irq;
 	struct msi_msg msg;
-	struct irq_cfg *cfg = irq_cfg(irq);
 
-	native_compose_msi_msg(cfg, &msg);
-	dmar_msi_write(irq, &msg);
-	irq_set_chip_and_handler_name(irq, &dmar_msi_type, handle_edge_irq,
-				      "edge");
-	return 0;
-}
+	irq = irq_domain_alloc_irqs(NULL, 1, node, NULL);
+	if (irq > 0) {
+		irq_set_handler_data(irq, arg);
+		irq_set_chip_and_handler_name(irq, &dmar_msi_type,
+					      handle_edge_irq, "edge");
+		native_compose_msi_msg(irq_cfg(irq), &msg);
+		dmar_msi_write(irq, &msg);
+	}
 
-int dmar_alloc_hwirq(void)
-{
-	return irq_domain_alloc_irqs(NULL, 1, NUMA_NO_NODE, NULL);
+	return irq;
 }
 
 void dmar_free_hwirq(int irq)
 {
+	irq_set_handler_data(irq, NULL);
+	irq_set_handler(irq, NULL);
 	irq_domain_free_irqs(irq, 1);
 }
 #endif
diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index c5c61cabd6e3..25f47937f1d5 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1018,8 +1018,8 @@ static void free_iommu(struct intel_iommu *iommu)
 
 	if (iommu->irq) {
 		free_irq(iommu->irq, iommu);
-		irq_set_handler_data(iommu->irq, NULL);
 		dmar_free_hwirq(iommu->irq);
+		iommu->irq = 0;
 	}
 
 	if (iommu->qi) {
@@ -1572,23 +1572,14 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
 	if (iommu->irq)
 		return 0;
 
-	irq = dmar_alloc_hwirq();
-	if (irq <= 0) {
+	irq = dmar_alloc_hwirq(iommu->seq_id, iommu->node, iommu);
+	if (irq > 0) {
+		iommu->irq = irq;
+	} else {
 		pr_err("IOMMU: no free vectors\n");
 		return -EINVAL;
 	}
 
-	irq_set_handler_data(irq, iommu);
-	iommu->irq = irq;
-
-	ret = arch_setup_dmar_msi(irq);
-	if (ret) {
-		irq_set_handler_data(irq, NULL);
-		iommu->irq = 0;
-		dmar_free_hwirq(irq);
-		return ret;
-	}
-
 	ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
 	if (ret)
 		pr_err("IOMMU: can't request irq\n");
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 593fff99e6bf..df3918482073 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -189,6 +189,7 @@ extern void dmar_msi_read(int irq, struct msi_msg *msg);
 extern void dmar_msi_write(int irq, struct msi_msg *msg);
 extern int dmar_set_interrupt(struct intel_iommu *iommu);
 extern irqreturn_t dmar_fault(int irq, void *dev_id);
-extern int arch_setup_dmar_msi(unsigned int irq);
+extern int dmar_alloc_hwirq(int id, int node, void *arg);
+extern void dmar_free_hwirq(int irq);
 
 #endif /* __DMAR_H__ */
-- 
1.7.10.4



* [Patch Part2 v4 29/31] x86, irq: Use hierarchy irqdomain to manage DMAR interrupts
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (27 preceding siblings ...)
  2014-11-04 12:02 ` [Patch Part2 v4 28/31] iommu/vt-d: Refine the interfaces to create IRQ for DMAR unit Jiang Liu
@ 2014-11-04 12:02 ` Jiang Liu
  2014-11-04 12:02 ` [Patch Part2 v4 30/31] x86, htirq: Use hierarchy irqdomain to manage Hypertransport interrupts Jiang Liu
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Enhance the DMAR code to support hierarchy irqdomain, which makes the
interrupt architecture clearer.
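
To illustrate the stacking, the sketch below models in plain userspace C
what a child chip such as DMAR_MSI does once the parent vector domain has
chosen a vector and destination APIC ID: it patches only those two fields
into the message it owns. All demo_* names and DEMO_* constants are
stand-ins made up for this sketch. The masks follow the x86 MSI layout
(vector in data bits 7:0, destination ID in address bits 19:12) and are
not copied from kernel headers, and the x2apic address_hi update is left
out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the x86 MSI field layout. */
#define DEMO_DATA_VECTOR_MASK	0x000000ffu	/* vector: data[7:0] */
#define DEMO_ADDR_DEST_SHIFT	12		/* dest ID: address_lo[19:12] */
#define DEMO_ADDR_DEST_MASK	0x000ff000u

struct demo_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* What the parent (vector) level decided for this interrupt. */
struct demo_cfg {
	uint8_t  vector;
	uint32_t dest_apicid;
};

/*
 * Child-level update: keep every bit the child programmed earlier and
 * replace only the vector and destination chosen by the parent.
 */
static void demo_update_msg(struct demo_msg *msg, const struct demo_cfg *cfg)
{
	assert(msg != NULL && cfg != NULL);

	msg->data &= ~DEMO_DATA_VECTOR_MASK;
	msg->data |= cfg->vector;
	msg->address_lo &= ~DEMO_ADDR_DEST_MASK;
	msg->address_lo |= (cfg->dest_apicid << DEMO_ADDR_DEST_SHIFT) &
			   DEMO_ADDR_DEST_MASK;
}
```

In the patch itself the same read, update, write sequence is carried out
by dmar_msi_read(), msi_update_msg() and dmar_msi_write() after the
parent's irq_set_affinity() callback has succeeded.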

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/include/asm/hw_irq.h |    7 ++
 arch/x86/kernel/apic/msi.c    |  157 ++++++++++++++++++++++++++---------------
 2 files changed, 108 insertions(+), 56 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index eb206e8b0bb7..1b7501cfee8c 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -122,6 +122,7 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_HPET,
 	X86_IRQ_ALLOC_TYPE_MSI,
 	X86_IRQ_ALLOC_TYPE_MSIX,
+	X86_IRQ_ALLOC_TYPE_DMAR,
 };
 
 struct irq_alloc_info {
@@ -152,6 +153,12 @@ struct irq_alloc_info {
 			struct IO_APIC_route_entry *ioapic_entry;
 		};
 #endif
+#ifdef	CONFIG_DMAR_TABLE
+		struct {
+			int		dmar_id;
+			void		*dmar_data;
+		};
+#endif
 	};
 };
 
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 79fc6bb0d104..cc70fc659121 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -25,32 +25,6 @@
 
 static struct irq_domain *msi_default_domain;
 
-static void native_compose_msi_msg(struct irq_cfg *cfg, struct msi_msg *msg)
-{
-	msg->address_hi = MSI_ADDR_BASE_HI;
-
-	if (x2apic_enabled())
-		msg->address_hi |= MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
-
-	msg->address_lo =
-		MSI_ADDR_BASE_LO |
-		((apic->irq_dest_mode == 0) ?
-			MSI_ADDR_DEST_MODE_PHYSICAL :
-			MSI_ADDR_DEST_MODE_LOGICAL) |
-		((apic->irq_delivery_mode != dest_LowestPrio) ?
-			MSI_ADDR_REDIRECTION_CPU :
-			MSI_ADDR_REDIRECTION_LOWPRI) |
-		MSI_ADDR_DEST_ID(cfg->dest_apicid);
-
-	msg->data =
-		MSI_DATA_TRIGGER_EDGE |
-		MSI_DATA_LEVEL_ASSERT |
-		((apic->irq_delivery_mode != dest_LowestPrio) ?
-			MSI_DATA_DELIVERY_FIXED :
-			MSI_DATA_DELIVERY_LOWPRI) |
-		MSI_DATA_VECTOR(cfg->vector);
-}
-
 static void msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	struct irq_cfg *cfg = irqd_cfg(data);
@@ -87,6 +61,9 @@ static void msi_update_msg(struct msi_msg *msg, struct irq_data *irq_data)
 	msg->data |= MSI_DATA_VECTOR(cfg->vector);
 	msg->address_lo &= ~MSI_ADDR_DEST_ID_MASK;
 	msg->address_lo |= MSI_ADDR_DEST_ID(cfg->dest_apicid);
+	if (x2apic_enabled())
+		msg->address_hi = MSI_ADDR_BASE_HI |
+				  MSI_ADDR_EXT_DEST_ID(cfg->dest_apicid);
 }
 
 static int msi_set_affinity(struct irq_data *data, const struct cpumask *mask,
@@ -187,59 +164,127 @@ static int
 dmar_msi_set_affinity(struct irq_data *data, const struct cpumask *mask,
 		      bool force)
 {
-	struct irq_cfg *cfg = irqd_cfg(data);
-	unsigned int dest, irq = data->irq;
+	struct irq_data *parent = data->parent_data;
 	struct msi_msg msg;
 	int ret;
 
-	ret = apic_set_affinity(data, mask, &dest);
-	if (ret)
-		return ret;
-
-	dmar_msi_read(irq, &msg);
-
-	msg.data &= ~MSI_DATA_VECTOR_MASK;
-	msg.data |= MSI_DATA_VECTOR(cfg->vector);
-	msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
-	msg.address_lo |= MSI_ADDR_DEST_ID(dest);
-	msg.address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(dest);
-
-	dmar_msi_write(irq, &msg);
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret >= 0) {
+		dmar_msi_read(data->irq, &msg);
+		msi_update_msg(&msg, data);
+		dmar_msi_write(data->irq, &msg);
+	}
 
-	return IRQ_SET_MASK_OK_NOCOPY;
+	return ret;
 }
 
 static struct irq_chip dmar_msi_type = {
 	.name			= "DMAR_MSI",
 	.irq_unmask		= dmar_msi_unmask,
 	.irq_mask		= dmar_msi_mask,
-	.irq_ack		= apic_ack_edge,
+	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= dmar_msi_set_affinity,
-	.irq_retrigger		= apic_retrigger_irq,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.irq_compose_msi_msg	= msi_compose_msg,
  	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
-int dmar_alloc_hwirq(int id, int node, void *arg)
+static int dmar_domain_alloc(struct irq_domain *domain, unsigned int virq,
+			     unsigned int nr_irqs, void *arg)
+{
+	struct irq_alloc_info *info = arg;
+	int ret;
+
+	if (nr_irqs > 1 || !info || info->type != X86_IRQ_ALLOC_TYPE_DMAR)
+		return -EINVAL;
+	if (irq_find_mapping(domain, info->dmar_id)) {
+		pr_warn("IRQ for DMAR%d already exists.\n", info->dmar_id);
+		return -EEXIST;
+	}
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret >= 0) {
+		irq_domain_set_hwirq_and_chip(domain, virq, info->dmar_id,
+					      &dmar_msi_type, NULL);
+		irq_set_handler_data(virq, info->dmar_data);
+		__irq_set_handler(virq, handle_edge_irq, 0, "edge");
+	}
+
+	return ret;
+}
+
+static void dmar_domain_free(struct irq_domain *domain, unsigned int virq,
+			     unsigned int nr_irqs)
+{
+	BUG_ON(nr_irqs > 1);
+	irq_domain_free_irqs_top(domain, virq, nr_irqs);
+}
+
+static int dmar_domain_activate(struct irq_domain *domain,
+				struct irq_data *irq_data)
+{
+	int ret;
+	struct msi_msg msg;
+
+	ret = irq_chip_compose_msi_msg(irq_data, &msg);
+	if (ret == 0)
+		dmar_msi_write(irq_data->irq, &msg);
+
+	return ret;
+}
+
+static int dmar_domain_deactivate(struct irq_domain *domain,
+				  struct irq_data *irq_data)
 {
-	int irq;
 	struct msi_msg msg;
 
-	irq = irq_domain_alloc_irqs(NULL, 1, node, NULL);
-	if (irq > 0) {
-		irq_set_handler_data(irq, arg);
-		irq_set_chip_and_handler_name(irq, &dmar_msi_type,
-					      handle_edge_irq, "edge");
-		native_compose_msi_msg(irq_cfg(irq), &msg);
-		dmar_msi_write(irq, &msg);
+	memset(&msg, 0, sizeof(msg));
+	dmar_msi_write(irq_data->irq, &msg);
+
+	return 0;
+}
+
+static struct irq_domain_ops dmar_domain_ops = {
+	.alloc = dmar_domain_alloc,
+	.free = dmar_domain_free,
+	.activate = dmar_domain_activate,
+	.deactivate = dmar_domain_deactivate,
+};
+
+static struct irq_domain *dmar_get_irq_domain(void)
+{
+	static struct irq_domain *dmar_domain;
+	static DEFINE_MUTEX(dmar_lock);
+
+	mutex_lock(&dmar_lock);
+	if (dmar_domain == NULL) {
+		dmar_domain = irq_domain_add_tree(NULL, &dmar_domain_ops, NULL);
+		if (dmar_domain)
+			dmar_domain->parent = x86_vector_domain;
 	}
+	mutex_unlock(&dmar_lock);
+
+	return dmar_domain;
+}
+
+int dmar_alloc_hwirq(int id, int node, void *arg)
+{
+	struct irq_domain *domain = dmar_get_irq_domain();
+	struct irq_alloc_info info;
+
+	if (!domain)
+		return -1;
+
+	init_irq_alloc_info(&info, NULL);
+	info.type = X86_IRQ_ALLOC_TYPE_DMAR;
+	info.dmar_id = id;
+	info.dmar_data = arg;
 
-	return irq;
+	return irq_domain_alloc_irqs(domain, 1, node, &info);
 }
 
 void dmar_free_hwirq(int irq)
 {
-	irq_set_handler_data(irq, NULL);
-	irq_set_handler(irq, NULL);
 	irq_domain_free_irqs(irq, 1);
 }
 #endif
-- 
1.7.10.4



* [Patch Part2 v4 30/31] x86, htirq: Use hierarchy irqdomain to manage Hypertransport interrupts
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (28 preceding siblings ...)
  2014-11-04 12:02 ` [Patch Part2 v4 29/31] x86, irq: Use hierarchy irqdomain to manage DMAR interrupts Jiang Liu
@ 2014-11-04 12:02 ` Jiang Liu
  2014-11-04 12:02 ` [Patch Part2 v4 31/31] x86, uv: Use hierarchy irqdomain to manage UV interrupts Jiang Liu
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Use hierarchy irqdomain to manage Hypertransport interrupts.
We have slightly changed the architecture interfaces to support the htirq
PCI driver. This should be safe because Hypertransport interrupts are
currently only enabled on x86 platforms.
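
For reference, the hwirq number that htirq_domain_alloc() builds from the
PCI domain, bus, devfn and HT interrupt index can be modelled in userspace
C roughly as follows. The demo_* helpers are hypothetical names used only
for this sketch. Unlike the patch, the sketch widens every field to 64
bits before shifting, which is a choice made here so that the result is
the same on every host.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as PCI_DEVID(): bus in bits 15:8, devfn in bits 7:0. */
static uint16_t demo_pci_devid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/* hwirq = idx[7:0] | devid << 8 | pci_domain << 24 */
static uint64_t demo_ht_hwirq(uint32_t pci_domain, uint8_t bus,
			      uint8_t devfn, unsigned int idx)
{
	return (uint64_t)(idx & 0xFF) |
	       ((uint64_t)demo_pci_devid(bus, devfn) << 8) |
	       ((uint64_t)pci_domain << 24);
}

/* Register index stored in ht_cfg->idx: 0x10 + 2 * idx. */
static unsigned int demo_ht_reg_index(unsigned int idx)
{
	return 0x10 + idx * 2;
}

/* Device 02:03.0 (devfn 0x18) in PCI domain 0, HT interrupt index 1. */
static void demo_ht_example(void)
{
	assert(demo_ht_hwirq(0, 0x02, 0x18, 1) == 0x021801ULL);
	assert(demo_ht_reg_index(1) == 0x12);
}
```

Because the index occupies only the low 8 bits, two interrupts of the
same device differ in bits 7:0 while sharing the upper part of the
number, which is what lets irq_find_mapping() detect duplicates.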

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/include/asm/hw_irq.h |   13 ++++
 arch/x86/kernel/apic/htirq.c  |  164 +++++++++++++++++++++++++++++++----------
 arch/x86/kernel/apic/vector.c |    1 +
 drivers/pci/htirq.c           |   47 ++----------
 include/linux/htirq.h         |   24 ++++--
 5 files changed, 162 insertions(+), 87 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 1b7501cfee8c..f54e78023218 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -159,6 +159,14 @@ struct irq_alloc_info {
 			void		*dmar_data;
 		};
 #endif
+#ifdef	CONFIG_HT_IRQ
+		struct {
+			int		ht_pos;
+			int		ht_idx;
+			struct pci_dev	*ht_dev;
+			void		*ht_update;
+		};
+#endif
 	};
 };
 
@@ -218,6 +226,11 @@ extern void arch_init_msi_domain(struct irq_domain *domain);
 #else
 static inline void arch_init_msi_domain(struct irq_domain *domain) { }
 #endif
+#ifdef	CONFIG_HT_IRQ
+extern void arch_init_htirq_domain(struct irq_domain *domain);
+#else
+static inline void arch_init_htirq_domain(struct irq_domain *domain) { }
+#endif
 
 /* Statistics */
 extern atomic_t irq_err_count;
diff --git a/arch/x86/kernel/apic/htirq.c b/arch/x86/kernel/apic/htirq.c
index b307ee7a7148..86ecf81a455a 100644
--- a/arch/x86/kernel/apic/htirq.c
+++ b/arch/x86/kernel/apic/htirq.c
@@ -3,6 +3,8 @@
  *
  * Copyright (C) 1997, 1998, 1999, 2000, 2009 Ingo Molnar, Hajnalka Szabo
  *	Moved from arch/x86/kernel/apic/io_apic.c.
+ * Jiang Liu <jiang.liu@linux.intel.com>
+ *	Add support of hierarchy irqdomain
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -19,70 +21,105 @@
 #include <asm/apic.h>
 #include <asm/hypertransport.h>
 
+static struct irq_domain *htirq_domain;
+
 /*
  * Hypertransport interrupt support
  */
-static void target_ht_irq(unsigned int irq, unsigned int dest, u8 vector)
-{
-	struct ht_irq_msg msg;
-
-	fetch_ht_irq_msg(irq, &msg);
-
-	msg.address_lo &= ~(HT_IRQ_LOW_VECTOR_MASK | HT_IRQ_LOW_DEST_ID_MASK);
-	msg.address_hi &= ~(HT_IRQ_HIGH_DEST_ID_MASK);
-
-	msg.address_lo |= HT_IRQ_LOW_VECTOR(vector) | HT_IRQ_LOW_DEST_ID(dest);
-	msg.address_hi |= HT_IRQ_HIGH_DEST_ID(dest);
-
-	write_ht_irq_msg(irq, &msg);
-}
-
 static int
 ht_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force)
 {
-	struct irq_cfg *cfg = irqd_cfg(data);
-	unsigned int dest;
+	struct irq_data *parent = data->parent_data;
 	int ret;
 
-	ret = apic_set_affinity(data, mask, &dest);
-	if (ret)
-		return ret;
-
-	target_ht_irq(data->irq, dest, cfg->vector);
-	return IRQ_SET_MASK_OK_NOCOPY;
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret >= 0) {
+		struct ht_irq_msg msg;
+		struct irq_cfg *cfg = data->chip_data;
+
+		fetch_ht_irq_msg(data->irq, &msg);
+		msg.address_lo &= ~(HT_IRQ_LOW_VECTOR_MASK |
+				    HT_IRQ_LOW_DEST_ID_MASK);
+		msg.address_lo |= HT_IRQ_LOW_VECTOR(cfg->vector) |
+				  HT_IRQ_LOW_DEST_ID(cfg->dest_apicid);
+		msg.address_hi &= ~(HT_IRQ_HIGH_DEST_ID_MASK);
+		msg.address_hi |= HT_IRQ_HIGH_DEST_ID(cfg->dest_apicid);
+		write_ht_irq_msg(data->irq, &msg);
+	}
+
+	return ret;
 }
 
 static struct irq_chip ht_irq_chip = {
 	.name			= "PCI-HT",
 	.irq_mask		= mask_ht_irq,
 	.irq_unmask		= unmask_ht_irq,
-	.irq_ack		= apic_ack_edge,
+	.irq_ack		= irq_chip_ack_parent,
 	.irq_set_affinity	= ht_set_affinity,
-	.irq_retrigger		= apic_retrigger_irq,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
-int arch_alloc_ht_irq(struct pci_dev *dev)
+static int htirq_domain_alloc(struct irq_domain *domain, unsigned int virq,
+			      unsigned int nr_irqs, void *arg)
 {
-	return irq_domain_alloc_irqs(NULL, 1, dev_to_node(&dev->dev), NULL);
+	struct ht_irq_cfg *ht_cfg;
+	struct irq_alloc_info *info = arg;
+	struct pci_dev *dev;
+	irq_hw_number_t hwirq;
+	int ret;
+
+	if (nr_irqs > 1 || !info)
+		return -EINVAL;
+
+	dev = info->ht_dev;
+	hwirq = (info->ht_idx & 0xFF) |
+		PCI_DEVID(dev->bus->number, dev->devfn) << 8 |
+		(pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 24;
+	if (irq_find_mapping(domain, hwirq) > 0)
+		return -EEXIST;
+
+	ht_cfg = kmalloc(sizeof(*ht_cfg), GFP_KERNEL);
+	if (!ht_cfg)
+		return -ENOMEM;
+
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, info);
+	if (ret < 0) {
+		kfree(ht_cfg);
+		return ret;
+	}
+
+	/* Initialize msg to a value that will never match the first write. */
+	ht_cfg->msg.address_lo = 0xffffffff;
+	ht_cfg->msg.address_hi = 0xffffffff;
+	ht_cfg->dev = info->ht_dev;
+	ht_cfg->update = info->ht_update;
+	ht_cfg->pos = info->ht_pos;
+	ht_cfg->idx = 0x10 + (info->ht_idx * 2);
+	irq_domain_set_hwirq_and_chip(domain, virq, hwirq, &ht_irq_chip,
+				      ht_cfg);
+	__irq_set_handler(virq, handle_edge_irq, 0, "edge");
+
+	return 0;
 }
 
-void arch_free_ht_irq(int irq)
+static void htirq_domain_free(struct irq_domain *domain, unsigned int virq,
+			      unsigned int nr_irqs)
 {
-	irq_domain_free_irqs(irq, 1);
+	struct irq_data *irq_data = irq_domain_get_irq_data(domain, virq);
+
+	BUG_ON(nr_irqs != 1);
+	kfree(irq_data->chip_data);
+	irq_domain_free_irqs_top(domain, virq, nr_irqs);
 }
 
-int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
+static int htirq_domain_activate(struct irq_domain *domain,
+				 struct irq_data *irq_data)
 {
-	struct irq_cfg *cfg;
 	struct ht_irq_msg msg;
+	struct irq_cfg *cfg = irqd_cfg(irq_data);
 
-	if (disable_apic)
-		return -ENXIO;
-
-	cfg = irq_cfg(irq);
 	msg.address_hi = HT_IRQ_HIGH_DEST_ID(cfg->dest_apicid);
-
 	msg.address_lo =
 		HT_IRQ_LOW_BASE |
 		HT_IRQ_LOW_DEST_ID(cfg->dest_apicid) |
@@ -95,13 +132,60 @@ int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev)
 			HT_IRQ_LOW_MT_FIXED :
 			HT_IRQ_LOW_MT_ARBITRATED) |
 		HT_IRQ_LOW_IRQ_MASKED;
+	write_ht_irq_msg(irq_data->irq, &msg);
 
-	write_ht_irq_msg(irq, &msg);
+	return 0;
+}
 
-	irq_set_chip_and_handler_name(irq, &ht_irq_chip,
-				      handle_edge_irq, "edge");
+static int htirq_domain_deactivate(struct irq_domain *domain,
+				   struct irq_data *irq_data)
+{
+	struct ht_irq_msg msg;
 
-	dev_dbg(&dev->dev, "irq %d for HT\n", irq);
+	memset(&msg, 0, sizeof(msg));
+	write_ht_irq_msg(irq_data->irq, &msg);
 
 	return 0;
 }
+
+static struct irq_domain_ops htirq_domain_ops = {
+	.alloc = htirq_domain_alloc,
+	.free = htirq_domain_free,
+	.activate = htirq_domain_activate,
+	.deactivate = htirq_domain_deactivate,
+};
+
+void arch_init_htirq_domain(struct irq_domain *parent)
+{
+	if (disable_apic)
+		return;
+
+	htirq_domain = irq_domain_add_tree(NULL, &htirq_domain_ops, NULL);
+	if (!htirq_domain)
+		pr_warn("failed to initialize irqdomain for HTIRQ.\n");
+	else
+		htirq_domain->parent = parent;
+}
+
+int arch_setup_ht_irq(int idx, int pos, struct pci_dev *dev,
+		      ht_irq_update_t *update)
+{
+	struct irq_alloc_info info;
+
+	if (!htirq_domain)
+		return -ENOSYS;
+
+	init_irq_alloc_info(&info, NULL);
+	info.ht_idx = idx;
+	info.ht_pos = pos;
+	info.ht_dev = dev;
+	info.ht_update = update;
+
+	return irq_domain_alloc_irqs(htirq_domain, 1, dev_to_node(&dev->dev),
+				     &info);
+}
+
+void arch_teardown_ht_irq(unsigned int irq)
+{
+	irq_domain_free_irqs(irq, 1);
+}
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 9ee62cf83edf..678435af82bd 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -365,6 +365,7 @@ int __init arch_early_irq_init(void)
 	irq_set_default_host(x86_vector_domain);
 
 	arch_init_msi_domain(x86_vector_domain);
+	arch_init_htirq_domain(x86_vector_domain);
 
 	return arch_early_ioapic_init();
 }
diff --git a/drivers/pci/htirq.c b/drivers/pci/htirq.c
index ceb0ebeb7b5f..7eb4109a3df4 100644
--- a/drivers/pci/htirq.c
+++ b/drivers/pci/htirq.c
@@ -23,20 +23,11 @@
  */
 static DEFINE_SPINLOCK(ht_irq_lock);
 
-struct ht_irq_cfg {
-	struct pci_dev *dev;
-	 /* Update callback used to cope with buggy hardware */
-	ht_irq_update_t *update;
-	unsigned pos;
-	unsigned idx;
-	struct ht_irq_msg msg;
-};
-
-
 void write_ht_irq_msg(unsigned int irq, struct ht_irq_msg *msg)
 {
 	struct ht_irq_cfg *cfg = irq_get_handler_data(irq);
 	unsigned long flags;
+
 	spin_lock_irqsave(&ht_irq_lock, flags);
 	if (cfg->msg.address_lo != msg->address_lo) {
 		pci_write_config_byte(cfg->dev, cfg->pos + 2, cfg->idx);
@@ -55,6 +46,7 @@ void write_ht_irq_msg(unsigned int irq, struct ht_irq_msg *msg)
 void fetch_ht_irq_msg(unsigned int irq, struct ht_irq_msg *msg)
 {
 	struct ht_irq_cfg *cfg = irq_get_handler_data(irq);
+
 	*msg = cfg->msg;
 }
 
@@ -86,7 +78,6 @@ void unmask_ht_irq(struct irq_data *data)
  */
 int __ht_create_irq(struct pci_dev *dev, int idx, ht_irq_update_t *update)
 {
-	struct ht_irq_cfg *cfg;
 	int max_irq, pos, irq;
 	unsigned long flags;
 	u32 data;
@@ -105,29 +96,9 @@ int __ht_create_irq(struct pci_dev *dev, int idx, ht_irq_update_t *update)
 	if (idx > max_irq)
 		return -EINVAL;
 
-	cfg = kmalloc(sizeof(*cfg), GFP_KERNEL);
-	if (!cfg)
-		return -ENOMEM;
-
-	cfg->dev = dev;
-	cfg->update = update;
-	cfg->pos = pos;
-	cfg->idx = 0x10 + (idx * 2);
-	/* Initialize msg to a value that will never match the first write. */
-	cfg->msg.address_lo = 0xffffffff;
-	cfg->msg.address_hi = 0xffffffff;
-
-	irq = arch_alloc_ht_irq(dev);
-	if (irq <= 0) {
-		kfree(cfg);
-		return -EBUSY;
-	}
-	irq_set_handler_data(irq, cfg);
-
-	if (arch_setup_ht_irq(irq, dev) < 0) {
-		ht_destroy_irq(irq);
-		return -EBUSY;
-	}
+	irq = arch_setup_ht_irq(idx, pos, dev, update);
+	if (irq > 0)
+		dev_dbg(&dev->dev, "irq %d for HT\n", irq);
 
 	return irq;
 }
@@ -158,12 +129,6 @@ EXPORT_SYMBOL(ht_create_irq);
  */
 void ht_destroy_irq(unsigned int irq)
 {
-	struct ht_irq_cfg *cfg;
-
-	cfg = irq_get_handler_data(irq);
-	irq_set_chip(irq, NULL);
-	irq_set_handler_data(irq, NULL);
-	arch_free_ht_irq(irq);
-	kfree(cfg);
+	arch_teardown_ht_irq(irq);
 }
 EXPORT_SYMBOL(ht_destroy_irq);
diff --git a/include/linux/htirq.h b/include/linux/htirq.h
index 5caa51b7b95c..d4a527e58434 100644
--- a/include/linux/htirq.h
+++ b/include/linux/htirq.h
@@ -1,26 +1,38 @@
 #ifndef LINUX_HTIRQ_H
 #define LINUX_HTIRQ_H
 
+struct pci_dev;
+struct irq_data;
+
 struct ht_irq_msg {
 	u32	address_lo;	/* low 32 bits of the ht irq message */
 	u32	address_hi;	/* high 32 bits of the it irq message */
 };
 
+typedef void (ht_irq_update_t)(struct pci_dev *dev, int irq,
+			       struct ht_irq_msg *msg);
+
+struct ht_irq_cfg {
+	struct pci_dev *dev;
+	 /* Update callback used to cope with buggy hardware */
+	ht_irq_update_t *update;
+	unsigned pos;
+	unsigned idx;
+	struct ht_irq_msg msg;
+};
+
 /* Helper functions.. */
 void fetch_ht_irq_msg(unsigned int irq, struct ht_irq_msg *msg);
 void write_ht_irq_msg(unsigned int irq, struct ht_irq_msg *msg);
-struct irq_data;
 void mask_ht_irq(struct irq_data *data);
 void unmask_ht_irq(struct irq_data *data);
 
 /* The arch hook for getting things started */
-int arch_setup_ht_irq(unsigned int irq, struct pci_dev *dev);
-int arch_alloc_ht_irq(struct pci_dev *dev);
-void arch_free_ht_irq(int irq);
+int arch_setup_ht_irq(int idx, int pos, struct pci_dev *dev,
+		      ht_irq_update_t *update);
+void arch_teardown_ht_irq(unsigned int irq);
 
 /* For drivers of buggy hardware */
-typedef void (ht_irq_update_t)(struct pci_dev *dev, int irq,
-			       struct ht_irq_msg *msg);
 int __ht_create_irq(struct pci_dev *dev, int idx, ht_irq_update_t *update);
 
 #endif /* LINUX_HTIRQ_H */
-- 
1.7.10.4



* [Patch Part2 v4 31/31] x86, uv: Use hierarchy irqdomain to manage UV interrupts
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (29 preceding siblings ...)
  2014-11-04 12:02 ` [Patch Part2 v4 30/31] x86, htirq: Use hierarchy irqdomain to manage Hypertransport interrupts Jiang Liu
@ 2014-11-04 12:02 ` Jiang Liu
  2014-11-04 14:47 ` [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Joerg Roedel
  2014-11-06 13:07 ` Joerg Roedel
  32 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 12:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Matthias Brugger, Jiang Liu
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

Enhance the UV code to support hierarchy irqdomain, which makes the
architecture clearer.

Ideally hwirq would be constructed from mmr_blade and mmr_offset, but
mmr_offset is an unsigned long, so the combined value may exceed the
range of irq_hw_number_t. Suggestions on how to construct hwirq from
mmr_blade and mmr_offset are welcome!
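
One possible direction, shown only as a userspace C sketch and not as
what this patch does (the patch passes virq as hwirq), is to reserve a
fixed number of high bits for the blade and refuse any offset that does
not fit in the remaining bits. DEMO_BLADE_BITS and demo_uv_hwirq() are
hypothetical names, and the 16-bit blade width is an arbitrary
assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BLADE_BITS	16u	/* hypothetical width reserved for the blade */
#define DEMO_HWIRQ_BITS	((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Pack blade and MMR offset into one unsigned long (the width of
 * irq_hw_number_t). Returns false instead of silently truncating when
 * either field does not fit.
 */
static bool demo_uv_hwirq(unsigned int blade, unsigned long offset,
			  unsigned long *hwirq)
{
	const unsigned int offset_bits = DEMO_HWIRQ_BITS - DEMO_BLADE_BITS;

	assert(hwirq != NULL);

	if (blade >> DEMO_BLADE_BITS)
		return false;		/* blade id too large */
	if (offset >> offset_bits)
		return false;		/* offset would collide with blade */

	*hwirq = ((unsigned long)blade << offset_bits) | offset;
	return true;
}
```

Whether real MMR offsets always fit in the remaining bits is not
something this sketch can establish, so a caller would need a fallback
for the failure case.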

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/include/asm/hw_irq.h |    9 ++
 arch/x86/platform/uv/uv_irq.c |  288 ++++++++++++++++-------------------------
 2 files changed, 120 insertions(+), 177 deletions(-)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index f54e78023218..e0c3332832e7 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -123,6 +123,7 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_MSI,
 	X86_IRQ_ALLOC_TYPE_MSIX,
 	X86_IRQ_ALLOC_TYPE_DMAR,
+	X86_IRQ_ALLOC_TYPE_UV,
 };
 
 struct irq_alloc_info {
@@ -167,6 +168,14 @@ struct irq_alloc_info {
 			void		*ht_update;
 		};
 #endif
+#ifdef	CONFIG_X86_UV
+		struct {
+			int		uv_limit;
+			int		uv_blade;
+			unsigned long	uv_offset;
+			char		*uv_name;
+		};
+#endif
 	};
 };
 
diff --git a/arch/x86/platform/uv/uv_irq.c b/arch/x86/platform/uv/uv_irq.c
index 474912d03f40..50c18bcb7f40 100644
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -19,17 +19,31 @@
 #include <asm/uv/uv_hub.h>
 
 /* MMR offset and pnode of hub sourcing interrupts for a given irq */
-struct uv_irq_2_mmr_pnode{
-	struct rb_node		list;
+struct uv_irq_2_mmr_pnode {
 	unsigned long		offset;
 	int			pnode;
-	int			irq;
 };
 
-static DEFINE_SPINLOCK(uv_irq_lock);
-static struct rb_root		uv_irq_root;
+static void uv_program_mmr(struct irq_cfg *cfg, struct uv_irq_2_mmr_pnode *info)
+{
+	unsigned long mmr_value;
+	struct uv_IO_APIC_route_entry *entry;
+
+	BUILD_BUG_ON(sizeof(struct uv_IO_APIC_route_entry) !=
+		     sizeof(unsigned long));
+
+	mmr_value = 0;
+	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
+	entry->vector		= cfg->vector;
+	entry->delivery_mode	= apic->irq_delivery_mode;
+	entry->dest_mode	= apic->irq_dest_mode;
+	entry->polarity		= 0;
+	entry->trigger		= 0;
+	entry->mask		= 0;
+	entry->dest		= cfg->dest_apicid;
 
-static int uv_set_irq_affinity(struct irq_data *, const struct cpumask *, bool);
+	uv_write_global_mmr64(info->pnode, info->offset, mmr_value);
+}
 
 static void uv_noop(struct irq_data *data) { }
 
@@ -38,6 +52,24 @@ static void uv_ack_apic(struct irq_data *data)
 	ack_APIC_irq();
 }
 
+static int
+uv_set_irq_affinity(struct irq_data *data, const struct cpumask *mask,
+		    bool force)
+{
+	struct irq_data *parent = data->parent_data;
+	struct irq_cfg *cfg = irqd_cfg(data);
+	int ret;
+
+	ret = parent->chip->irq_set_affinity(parent, mask, force);
+	if (ret >= 0) {
+		uv_program_mmr(cfg, data->chip_data);
+		if (cfg->move_in_progress)
+			send_cleanup_vector(cfg);
+	}
+
+	return ret;
+}
+
 static struct irq_chip uv_irq_chip = {
 	.name			= "UV-CORE",
 	.irq_mask		= uv_noop,
@@ -46,179 +78,104 @@ static struct irq_chip uv_irq_chip = {
 	.irq_set_affinity	= uv_set_irq_affinity,
 };
 
-/*
- * Add offset and pnode information of the hub sourcing interrupts to the
- * rb tree for a specific irq.
- */
-static int uv_set_irq_2_mmr_info(int irq, unsigned long offset, unsigned blade)
+static int uv_domain_alloc(struct irq_domain *domain, unsigned int virq,
+			   unsigned int nr_irqs, void *arg)
 {
-	struct rb_node **link = &uv_irq_root.rb_node;
-	struct rb_node *parent = NULL;
-	struct uv_irq_2_mmr_pnode *n;
-	struct uv_irq_2_mmr_pnode *e;
-	unsigned long irqflags;
-
-	n = kmalloc_node(sizeof(struct uv_irq_2_mmr_pnode), GFP_KERNEL,
-				uv_blade_to_memory_nid(blade));
-	if (!n)
+	struct uv_irq_2_mmr_pnode *chip_data;
+	struct irq_alloc_info *info = arg;
+	struct irq_data *irq_data = irq_domain_get_irq_data(domain, virq);
+	int ret;
+
+	if (nr_irqs > 1 || !info || info->type != X86_IRQ_ALLOC_TYPE_UV)
+		return -EINVAL;
+
+	chip_data = kmalloc_node(sizeof(*chip_data), GFP_KERNEL,
+				 irq_data->node);
+	if (!chip_data)
 		return -ENOMEM;
 
-	n->irq = irq;
-	n->offset = offset;
-	n->pnode = uv_blade_to_pnode(blade);
-	spin_lock_irqsave(&uv_irq_lock, irqflags);
-	/* Find the right place in the rbtree: */
-	while (*link) {
-		parent = *link;
-		e = rb_entry(parent, struct uv_irq_2_mmr_pnode, list);
-
-		if (unlikely(irq == e->irq)) {
-			/* irq entry exists */
-			e->pnode = uv_blade_to_pnode(blade);
-			e->offset = offset;
-			spin_unlock_irqrestore(&uv_irq_lock, irqflags);
-			kfree(n);
-			return 0;
-		}
-
-		if (irq < e->irq)
-			link = &(*link)->rb_left;
+	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
+	if (ret >= 0) {
+		if (info->uv_limit == UV_AFFINITY_CPU)
+			irq_set_status_flags(virq, IRQ_NO_BALANCING);
 		else
-			link = &(*link)->rb_right;
+			irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
+
+		chip_data->pnode = uv_blade_to_pnode(info->uv_blade);
+		chip_data->offset = info->uv_offset;
+		irq_domain_set_hwirq_and_chip(domain, virq, virq,
+					      &uv_irq_chip, chip_data);
+		__irq_set_handler(virq, handle_percpu_irq, 0, info->uv_name);
+	} else {
+		kfree(chip_data);
 	}
 
-	/* Insert the node into the rbtree. */
-	rb_link_node(&n->list, parent, link);
-	rb_insert_color(&n->list, &uv_irq_root);
-
-	spin_unlock_irqrestore(&uv_irq_lock, irqflags);
-	return 0;
+	return ret;
 }
 
-/* Retrieve offset and pnode information from the rb tree for a specific irq */
-int uv_irq_2_mmr_info(int irq, unsigned long *offset, int *pnode)
+static void uv_domain_free(struct irq_domain *domain, unsigned int virq,
+			   unsigned int nr_irqs)
 {
-	struct uv_irq_2_mmr_pnode *e;
-	struct rb_node *n;
-	unsigned long irqflags;
-
-	spin_lock_irqsave(&uv_irq_lock, irqflags);
-	n = uv_irq_root.rb_node;
-	while (n) {
-		e = rb_entry(n, struct uv_irq_2_mmr_pnode, list);
-
-		if (e->irq == irq) {
-			*offset = e->offset;
-			*pnode = e->pnode;
-			spin_unlock_irqrestore(&uv_irq_lock, irqflags);
-			return 0;
-		}
-
-		if (irq < e->irq)
-			n = n->rb_left;
-		else
-			n = n->rb_right;
-	}
-	spin_unlock_irqrestore(&uv_irq_lock, irqflags);
-	return -1;
+	struct irq_data *irq_data = irq_domain_get_irq_data(domain, virq);
+
+	BUG_ON(nr_irqs != 1);
+	kfree(irq_data->chip_data);
+	irq_clear_status_flags(virq, IRQ_MOVE_PCNTXT);
+	irq_clear_status_flags(virq, IRQ_NO_BALANCING);
+	irq_domain_free_irqs_top(domain, virq, nr_irqs);
 }
 
 /*
  * Re-target the irq to the specified CPU and enable the specified MMR located
  * on the specified blade to allow the sending of MSIs to the specified CPU.
  */
-static int
-arch_enable_uv_irq(char *irq_name, unsigned int irq, int cpu, int mmr_blade,
-		       unsigned long mmr_offset, int limit)
+static int uv_domain_activate(struct irq_domain *domain,
+			      struct irq_data *irq_data)
 {
-	struct irq_cfg *cfg = irq_cfg(irq);
-	unsigned long mmr_value;
-	struct uv_IO_APIC_route_entry *entry;
-	int mmr_pnode;
-
-	BUILD_BUG_ON(sizeof(struct uv_IO_APIC_route_entry) !=
-			sizeof(unsigned long));
-
-	if (limit == UV_AFFINITY_CPU)
-		irq_set_status_flags(irq, IRQ_NO_BALANCING);
-	else
-		irq_set_status_flags(irq, IRQ_MOVE_PCNTXT);
-
-	irq_set_chip_and_handler_name(irq, &uv_irq_chip, handle_percpu_irq,
-				      irq_name);
+	uv_program_mmr(irqd_cfg(irq_data), irq_data->chip_data);
 
-	mmr_value = 0;
-	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
-	entry->vector		= cfg->vector;
-	entry->delivery_mode	= apic->irq_delivery_mode;
-	entry->dest_mode	= apic->irq_dest_mode;
-	entry->polarity		= 0;
-	entry->trigger		= 0;
-	entry->mask		= 0;
-	entry->dest		= cfg->dest_apicid;
-
-	mmr_pnode = uv_blade_to_pnode(mmr_blade);
-	uv_write_global_mmr64(mmr_pnode, mmr_offset, mmr_value);
-
-	if (cfg->move_in_progress)
-		send_cleanup_vector(cfg);
-
-	return irq;
+	return 0;
 }
 
 /*
  * Disable the specified MMR located on the specified blade so that MSIs are
  * longer allowed to be sent.
  */
-static void arch_disable_uv_irq(int mmr_pnode, unsigned long mmr_offset)
+static int uv_domain_deactivate(struct irq_domain *domain,
+				struct irq_data *irq_data)
 {
 	unsigned long mmr_value;
 	struct uv_IO_APIC_route_entry *entry;
 
-	BUILD_BUG_ON(sizeof(struct uv_IO_APIC_route_entry) !=
-			sizeof(unsigned long));
-
 	mmr_value = 0;
 	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
 	entry->mask = 1;
+	uv_program_mmr(irqd_cfg(irq_data), irq_data->chip_data);
 
-	uv_write_global_mmr64(mmr_pnode, mmr_offset, mmr_value);
+	return 0;
 }
 
-static int
-uv_set_irq_affinity(struct irq_data *data, const struct cpumask *mask,
-		    bool force)
-{
-	struct irq_cfg *cfg = irqd_cfg(data);
-	unsigned int dest;
-	unsigned long mmr_value, mmr_offset;
-	struct uv_IO_APIC_route_entry *entry;
-	int mmr_pnode;
-
-	if (apic_set_affinity(data, mask, &dest))
-		return -1;
-
-	mmr_value = 0;
-	entry = (struct uv_IO_APIC_route_entry *)&mmr_value;
-
-	entry->vector		= cfg->vector;
-	entry->delivery_mode	= apic->irq_delivery_mode;
-	entry->dest_mode	= apic->irq_dest_mode;
-	entry->polarity		= 0;
-	entry->trigger		= 0;
-	entry->mask		= 0;
-	entry->dest		= dest;
-
-	/* Get previously stored MMR and pnode of hub sourcing interrupts */
-	if (uv_irq_2_mmr_info(data->irq, &mmr_offset, &mmr_pnode))
-		return -1;
-
-	uv_write_global_mmr64(mmr_pnode, mmr_offset, mmr_value);
+static struct irq_domain_ops uv_domain_ops = {
+	.alloc = uv_domain_alloc,
+	.free = uv_domain_free,
+	.activate = uv_domain_activate,
+	.deactivate = uv_domain_deactivate,
+};
 
-	if (cfg->move_in_progress)
-		send_cleanup_vector(cfg);
+static struct irq_domain *uv_get_irq_domain(void)
+{
+	static struct irq_domain *uv_domain;
+	static DEFINE_MUTEX(uv_lock);
+
+	mutex_lock(&uv_lock);
+	if (uv_domain == NULL) {
+		uv_domain = irq_domain_add_tree(NULL, &uv_domain_ops, NULL);
+		if (uv_domain)
+			uv_domain->parent = x86_vector_domain;
+	}
+	mutex_unlock(&uv_lock);
 
-	return IRQ_SET_MASK_OK_NOCOPY;
+	return uv_domain;
 }
 
 /*
@@ -229,23 +186,20 @@ uv_set_irq_affinity(struct irq_data *data, const struct cpumask *mask,
 int uv_setup_irq(char *irq_name, int cpu, int mmr_blade,
 		 unsigned long mmr_offset, int limit)
 {
-	int ret, irq;
 	struct irq_alloc_info info;
+	struct irq_domain *domain = uv_get_irq_domain();
+
+	if (!domain)
+		return -ENOMEM;
 
 	init_irq_alloc_info(&info, cpumask_of(cpu));
-	irq = irq_domain_alloc_irqs(NULL, 1, uv_blade_to_memory_nid(mmr_blade),
-				    &info);
-	if (irq <= 0)
-		return -EBUSY;
-
-	ret = arch_enable_uv_irq(irq_name, irq, cpu, mmr_blade, mmr_offset,
-		limit);
-	if (ret == irq)
-		uv_set_irq_2_mmr_info(irq, mmr_offset, mmr_blade);
-	else
-		irq_domain_free_irqs(irq, 1);
+	info.uv_limit = limit;
+	info.uv_blade = mmr_blade;
+	info.uv_offset = mmr_offset;
+	info.uv_name = irq_name;
 
-	return ret;
+	return irq_domain_alloc_irqs(domain, 1,
+				     uv_blade_to_memory_nid(mmr_blade), &info);
 }
 EXPORT_SYMBOL_GPL(uv_setup_irq);
 
@@ -258,26 +212,6 @@ EXPORT_SYMBOL_GPL(uv_setup_irq);
  */
 void uv_teardown_irq(unsigned int irq)
 {
-	struct uv_irq_2_mmr_pnode *e;
-	struct rb_node *n;
-	unsigned long irqflags;
-
-	spin_lock_irqsave(&uv_irq_lock, irqflags);
-	n = uv_irq_root.rb_node;
-	while (n) {
-		e = rb_entry(n, struct uv_irq_2_mmr_pnode, list);
-		if (e->irq == irq) {
-			arch_disable_uv_irq(e->pnode, e->offset);
-			rb_erase(n, &uv_irq_root);
-			kfree(e);
-			break;
-		}
-		if (irq < e->irq)
-			n = n->rb_left;
-		else
-			n = n->rb_right;
-	}
-	spin_unlock_irqrestore(&uv_irq_lock, irqflags);
 	irq_domain_free_irqs(irq, 1);
 }
 EXPORT_SYMBOL_GPL(uv_teardown_irq);
-- 
1.7.10.4



* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (30 preceding siblings ...)
  2014-11-04 12:02 ` [Patch Part2 v4 31/31] x86, uv: Use hierarchy irqdomain to manage UV interrupts Jiang Liu
@ 2014-11-04 14:47 ` Joerg Roedel
  2014-11-04 15:12   ` Jiang Liu
  2014-11-06 13:07 ` Joerg Roedel
  32 siblings, 1 reply; 65+ messages in thread
From: Joerg Roedel @ 2014-11-04 14:47 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

Hi Jiang,

On Tue, Nov 04, 2014 at 08:01:34PM +0800, Jiang Liu wrote:
> This is the second patch set to enable support of hierarchy irqdomain
> on x86 platforms. It depends on the first part at:
> https://lkml.org/lkml/2014/10/27/122
> And you may access it at:
> https://github.com/jiangliu/linux.git irqdomain/p2v4

I gave this some testing on a couple of machines. Unfortunately it panics
on my AMD Kaveri system with IOMMU enabled in drivers/pci/msi.c:

static void msi_set_mask_bit(struct irq_data *data, u32 flag)
{
        struct msi_desc *desc = irq_data_get_msi(data);

        if (desc->msi_attrib.is_msix) {		<-- at this line something goes wrong
                msix_mask_irq(desc, flag);
                readl(desc->mask_base);         /* Flush write to device */
        } else {
                unsigned offset = data->irq - desc->irq;
                msi_mask_irq(desc, 1 << offset, flag << offset);
        }
}

I am further investigating to find out what went wrong, but maybe you
also have an idea?


	Joerg



* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-04 14:47 ` [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Joerg Roedel
@ 2014-11-04 15:12   ` Jiang Liu
  2014-11-04 15:32     ` Joerg Roedel
  2014-11-05  8:51     ` Joerg Roedel
  0 siblings, 2 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-04 15:12 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

On 2014/11/4 22:47, Joerg Roedel wrote:
> Hi Jiang,
> 
> On Tue, Nov 04, 2014 at 08:01:34PM +0800, Jiang Liu wrote:
>> This is the second patch set to enable support of hierarchy irqdomain
>> on x86 platforms. It depends on the first part at:
>> https://lkml.org/lkml/2014/10/27/122
>> And you may access it at:
>> https://github.com/jiangliu/linux.git irqdomain/p2v4
> 
> I gave this some testing on a couple of machines. Unfortunately it panics
> on my AMD Kaveri system with IOMMU enabled in drivers/pci/msi.c:
> 
> static void msi_set_mask_bit(struct irq_data *data, u32 flag)
> {
>         struct msi_desc *desc = irq_data_get_msi(data);
> 
>         if (desc->msi_attrib.is_msix) {		<-- at this line something goes wrong
>                 msix_mask_irq(desc, flag);
>                 readl(desc->mask_base);         /* Flush write to device */
>         } else {
>                 unsigned offset = data->irq - desc->irq;
>                 msi_mask_irq(desc, 1 << offset, flag << offset);
>         }
> }
> 
> I am further investigating to find out what went wrong, but maybe you
> also have an idea?
Hi Joerg,
	Thanks for testing :)
	Do you have the call stack? I have changed the way
irq_set_msi_desc_off() is called for MSI/MSI-X interrupts, which may
cause the panic. Patches 19-21 change the PCI MSI code.
Regards!
Gerry

> 
> 
> 	Joerg
> 


* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-04 15:12   ` Jiang Liu
@ 2014-11-04 15:32     ` Joerg Roedel
  2014-11-05  8:51     ` Joerg Roedel
  1 sibling, 0 replies; 65+ messages in thread
From: Joerg Roedel @ 2014-11-04 15:32 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

On Tue, Nov 04, 2014 at 11:12:51PM +0800, Jiang Liu wrote:
> 	Do you have the call stack? I have changed the way
> irq_set_msi_desc_off() is called for MSI/MSI-X interrupts, which may
> cause the panic. Patches 19-21 change the PCI MSI code.

Unfortunately I have no full call stack yet, as the interesting parts
scrolled off the screen. But the last interesting function seen there
was do_one_initcall(). I'll try to set up a serial console on the machine
and let you know when I have a full call stack.


	Joerg


* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-04 15:12   ` Jiang Liu
  2014-11-04 15:32     ` Joerg Roedel
@ 2014-11-05  8:51     ` Joerg Roedel
  2014-11-05  9:04       ` Jiang Liu
  2014-11-05  9:41       ` Jiang Liu
  1 sibling, 2 replies; 65+ messages in thread
From: Joerg Roedel @ 2014-11-05  8:51 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

On Tue, Nov 04, 2014 at 11:12:51PM +0800, Jiang Liu wrote:
> 	Do you have the call stack?

Okay, had some issues with serial setup, but now it's working. Here is
the complete panic msg from the AMD Kaveri box (the panic only occurs
with IOMMU enabled):

[    2.487552] ahci 0000:00:11.0: AHCI 0001.0300 32 slots 8 ports 6 Gbps 0xff impl SATA mode
[    2.495844] ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part 
[    2.504592] BUG: unable to handle kernel NULL pointer dereference at           (null)
[    2.512618] IP: [<ffffffff8136849d>] msi_set_mask_bit+0xd/0x50
[    2.518556] PGD 0 
[    2.520672] Oops: 0000 [#1] PREEMPT SMP 
[    2.524784] Modules linked in:
[    2.527946] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc3+ #4
[    2.534384] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./F2A88XM-HD3, BIOS F6 05/28/2014
[    2.544576] task: ffff88042b54c010 ti: ffff88042b550000 task.ti: ffff88042b550000
[    2.552170] RIP: 0010:[<ffffffff8136849d>]  [<ffffffff8136849d>] msi_set_mask_bit+0xd/0x50
[    2.560594] RSP: 0000:ffff88042b5539d8  EFLAGS: 00010096
[    2.565954] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88042b553968
[    2.573175] RDX: ffffffff81c25f40 RSI: 0000000000000000 RDI: ffff880424c65c00
[    2.580361] RBP: ffff88042b5539e8 R08: ffff88042b519800 R09: ffff88042b000b20
[    2.587582] R10: ffff880424c24410 R11: 0000000000000246 R12: 0000000000000001
[    2.594762] R13: ffff8804253fb2c0 R14: ffff880424c24410 R15: ffff880424c65c98
[    2.601983] FS:  0000000000000000(0000) GS:ffff88043ed80000(0000) knlGS:0000000000000000
[    2.610181] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.615975] CR2: 0000000000000000 CR3: 0000000001c16000 CR4: 00000000000407e0
[    2.623197] Stack:
[    2.625264]  ffff88042b5539f8 ffff880424c65c00 ffff88042b5539f8 ffffffff813688fb
[    2.632936]  ffff88042b553a18 ffffffff810b0603 ffff880424c65c00 ffff880424c65c00
[    2.640598]  ffff88042b553a48 ffffffff810b0685 0000000000000000 0000000000000000
[    2.648261] Call Trace:
[    2.650768]  [<ffffffff813688fb>] unmask_msi_irq+0xb/0x10
[    2.656222]  [<ffffffff810b0603>] irq_enable+0x33/0x50
[    2.661414]  [<ffffffff810b0685>] irq_startup+0x65/0x70
[    2.666696]  [<ffffffff810af161>] __setup_irq+0x511/0x5a0
[    2.672152]  [<ffffffff81196326>] ? __kmalloc_track_caller+0x256/0x4b0
[    2.678733]  [<ffffffff81460a50>] ? ahci_bad_pmp_check_ready+0x60/0x60
[    2.685347]  [<ffffffff810af34a>] request_threaded_irq+0xca/0x170
[    2.691529]  [<ffffffff81460a50>] ? ahci_bad_pmp_check_ready+0x60/0x60
[    2.698110]  [<ffffffff81461ee0>] ? ahci_single_irq_intr+0x110/0x110
[    2.704517]  [<ffffffff810b102a>] devm_request_threaded_irq+0x5a/0xc0
[    2.711002]  [<ffffffff81462b93>] ahci_host_activate+0x143/0x220
[    2.717098]  [<ffffffff814602a8>] ahci_init_one+0x7b8/0xb00
[    2.722728]  [<ffffffff8134e760>] local_pci_probe+0x40/0xa0
[    2.728355]  [<ffffffff8134f9b5>] ? pci_match_device+0xe5/0x110
[    2.734365]  [<ffffffff8134faf1>] pci_device_probe+0xd1/0x130
[    2.740164]  [<ffffffff81413d9b>] driver_probe_device+0x8b/0x3d0
[    2.746216]  [<ffffffff814141b3>] __driver_attach+0x93/0xa0
[    2.751846]  [<ffffffff81414120>] ? __device_attach+0x40/0x40
[    2.757681]  [<ffffffff81411e13>] bus_for_each_dev+0x63/0xa0
[    2.763385]  [<ffffffff81413819>] driver_attach+0x19/0x20
[    2.768842]  [<ffffffff81413430>] bus_add_driver+0x180/0x250
[    2.774591]  [<ffffffff81d2fd60>] ? ata_sff_init+0x33/0x33
[    2.780123]  [<ffffffff81414a0f>] driver_register+0x5f/0xf0
[    2.785751]  [<ffffffff8134e107>] __pci_register_driver+0x47/0x50
[    2.791892]  [<ffffffff81d2fd79>] ahci_pci_driver_init+0x19/0x1b
[    2.797954]  [<ffffffff810002f4>] do_one_initcall+0xb4/0x1f0
[    2.803667]  [<ffffffff81095e23>] ? __wake_up+0x43/0x60
[    2.808948]  [<ffffffff81ce7248>] kernel_init_freeable+0x197/0x21f
[    2.815181]  [<ffffffff81ce6983>] ? initcall_blacklist+0xc0/0xc0
[    2.821280]  [<ffffffff815fe680>] ? rest_init+0x90/0x90
[    2.826561]  [<ffffffff815fe689>] kernel_init+0x9/0xf0
[    2.831756]  [<ffffffff8161433c>] ret_from_fork+0x7c/0xb0
[    2.837211]  [<ffffffff815fe680>] ? rest_init+0x90/0x90
[    2.842489] Code: c1 83 c9 01 83 c2 0c 85 f6 0f 45 c1 48 63 d2 48 03 57 28 89 02 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 53 48 83 ec 08 48 8b 5f 40 <f6> 03 01 75 26 8b 4f 04 2b 4b 0c 89 f2 be 01 00 00 00 48 89 df 
[    2.864562] RIP  [<ffffffff8136849d>] msi_set_mask_bit+0xd/0x50
[    2.870588]  RSP <ffff88042b5539d8>
[    2.874127] CR2: 0000000000000000
[    2.877501] ---[ end trace dd9f8c29b83b2de1 ]---
[    2.882174] note: swapper/0[1] exited with preempt_count 1
[    2.887769] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    2.887769] 
[    2.897147] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[    2.907440] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009



* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-05  8:51     ` Joerg Roedel
@ 2014-11-05  9:04       ` Jiang Liu
  2014-11-05  9:41       ` Jiang Liu
  1 sibling, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-05  9:04 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

On 2014/11/5 16:51, Joerg Roedel wrote:
> On Tue, Nov 04, 2014 at 11:12:51PM +0800, Jiang Liu wrote:
>> 	Do you have the call stack?
> 
> Okay, had some issues with serial setup, but now it's working. Here is
> the complete panic msg from the AMD Kaveri box (the panic only occurs
> with IOMMU enabled):
> 
> [...]
Hi Joerg,
	Something is wrong with multiple MSI interrupt support, which
is only enabled when IRQ remapping is in use. I am still analyzing it
and will ask for your help when I have any ideas for debugging :)
Regards!
Gerry



* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-05  8:51     ` Joerg Roedel
  2014-11-05  9:04       ` Jiang Liu
@ 2014-11-05  9:41       ` Jiang Liu
  2014-11-05  9:58         ` Joerg Roedel
  1 sibling, 1 reply; 65+ messages in thread
From: Jiang Liu @ 2014-11-05  9:41 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 5552 bytes --]

Hi Joerg,
	Could you please apply the attached patch and send me the
console output?
Regards!
Gerry

On 2014/11/5 16:51, Joerg Roedel wrote:
> On Tue, Nov 04, 2014 at 11:12:51PM +0800, Jiang Liu wrote:
>> 	Do you have the call stack?
> 
> Okay, had some issues with serial setup, but now it's working. Here is
> the complete panic msg from the AMD Kaveri box (the panic only occurs
> with IOMMU enabled):
> 
> [...]

[-- Attachment #2: 0001-.patch --]
[-- Type: text/plain, Size: 3643 bytes --]

>From 705c73aee455cfe5abb27da0d62cb38e1a256bde Mon Sep 17 00:00:00 2001
From: Jiang Liu <jiang.liu@linux.intel.com>
Date: Wed, 5 Nov 2014 17:25:04 +0800
Subject: [PATCH]


Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
 arch/x86/kernel/apic/msi.c |    1 +
 drivers/ata/ahci.c         |    2 ++
 drivers/ata/libahci.c      |    1 +
 drivers/pci/msi.c          |    7 ++++++-
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index cc70fc659121..a2dffe3c30ce 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -110,6 +110,7 @@ int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 	if (type == PCI_CAP_ID_MSI) {
 		info.type = X86_IRQ_ALLOC_TYPE_MSI;
 		info.flags |= X86_IRQ_ALLOC_CONTIGOUS_VECTORS;
+		dev_warn(&dev->dev, "irqdomain: try allocate %d MSI IRQs\n", nvec);
 	} else {
 		info.type = X86_IRQ_ALLOC_TYPE_MSIX;
 	}
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 5f039f191067..13985ba61b18 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1200,6 +1200,7 @@ static int ahci_init_interrupts(struct pci_dev *pdev, unsigned int n_ports,
 	if (nvec < 0)
 		goto intx;
 
+	pr_warn("irqdomain: AHCI %d ports, %d MSI\n", n_ports, nvec);
 	/*
 	 * If number of MSIs is less than number of ports then Sharing Last
 	 * Message mode could be enforced. In this case assume that advantage
@@ -1214,6 +1215,7 @@ static int ahci_init_interrupts(struct pci_dev *pdev, unsigned int n_ports,
 	else if (rc < 0)
 		goto intx;
 
+	pr_warn("irqdomain: AHCI allocated IRQ%d to IRQ%d\n", pdev->irq, pdev->irq + nvec - 1);
 	/* fallback to single MSI mode if the controller enforced MRSM mode */
 	if (readl(hpriv->mmio + HOST_CTL) & HOST_MRSM) {
 		pci_disable_msi(pdev);
diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index 5eb61c9e63da..d90b623eab35 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -2456,6 +2456,7 @@ static int ahci_host_activate_multi_irqs(struct ata_host *host, int irq,
 			continue;
 		}
 
+		pr_warn("irqdomain: request IRQ%d\n", irq + i);
 		rc = devm_request_threaded_irq(host->dev, irq + i,
 					       ahci_multi_irqs_intr,
 					       ahci_port_thread_fn, IRQF_SHARED,
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7423ee16972f..4d8fef065e2c 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -216,6 +216,8 @@ static void msi_set_mask_bit(struct irq_data *data, u32 flag)
 {
 	struct msi_desc *desc = irq_data_get_msi(data);
 
+	if (desc == NULL)
+		pr_warn("no msi_desc for IRQ%d\n", data->irq);
 	if (desc->msi_attrib.is_msix) {
 		msix_mask_irq(desc, flag);
 		readl(desc->mask_base);		/* Flush write to device */
@@ -1202,6 +1204,8 @@ int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
 	int node = dev_to_node(&dev->dev);
 
 	list_for_each_entry(msidesc, &dev->msi_list, list) {
+		if (type == PCI_CAP_ID_MSI && msidesc->nvec_used > 1) 
+			dev_warn(&dev->dev, "try to alloc nvec %d\n", msidesc->nvec_used);
 		arch_msi_irq_domain_set_hwirq(arg, msi_get_hwirq(dev, msidesc));
 		virq = irq_domain_alloc_irqs(domain, msidesc->nvec_used,
 					     node, arg);
@@ -1210,8 +1214,9 @@ int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
 			return (type == PCI_CAP_ID_MSI &&
 				msidesc->nvec_used > 1) ?  1 : -ENOSPC;
 		}
+		dev_warn(&dev->dev, "allocated IRQ%d for MSI\n", virq);
 		for (i = 0; i < msidesc->nvec_used; i++)
-			irq_set_msi_desc_off(virq + i, i, msidesc);
+			BUG_ON(irq_set_msi_desc_off(virq + i, i, msidesc));
 	}
 
 	list_for_each_entry(msidesc, &dev->msi_list, list)
-- 
1.7.10.4



* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-05  9:41       ` Jiang Liu
@ 2014-11-05  9:58         ` Joerg Roedel
  2014-11-05 10:28           ` Jiang Liu
  0 siblings, 1 reply; 65+ messages in thread
From: Joerg Roedel @ 2014-11-05  9:58 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 163 bytes --]

On Wed, Nov 05, 2014 at 05:41:50PM +0800, Jiang Liu wrote:
> 	Could you please apply the attached patch and send me the
> console output?

Sure, here it is.


[-- Attachment #2: kv-boot.log --]
[-- Type: text/plain, Size: 29478 bytes --]

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.18.0-rc3+ (joro@kv) (gcc version 4.8.1 20130909 [gcc-4_8-branch revision 202388] (SUSE Linux) ) #5 SMP PREEMPT Wed Nov 5 10:52:24 CET 2014
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.18.0-rc3+ root=UUID=fb121574-ea39-49a2-a896-0750eff9d30d resume=/dev/disk/by-id/ata-KINGSTON_SV300S37A120G_50026B773C03A9A5-part1 showopts amd_iommu_dump console=ttyS0,115200 console=tty0
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009e7ff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009e800-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007cb72fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007cb73000-0x000000007cba2fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007cba3000-0x000000007ce65fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007ce66000-0x000000007cf33fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007cf34000-0x000000007e1c8fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007e1c9000-0x000000007e1c9fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007e1ca000-0x000000007e3cffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000007e3d0000-0x000000007e850fff] usable
[    0.000000] BIOS-e820: [mem 0x000000007e851000-0x000000007efe1fff] reserved
[    0.000000] BIOS-e820: [mem 0x000000007efe2000-0x000000007effffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec01fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000043effffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] SMBIOS 2.7 present.
[    0.000000] AGP: No AGP bridge found
[    0.000000] e820: last_pfn = 0x43f000 max_arch_pfn = 0x400000000
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[    0.000000] e820: last_pfn = 0x7f000 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [mem 0x000fd6c0-0x000fd6cf] mapped at [ffff8800000fd6c0]
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] Using GB pages for direct mapping
[    0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[    0.000000] init_memory_mapping: [mem 0x43ee00000-0x43effffff]
[    0.000000] init_memory_mapping: [mem 0x43c000000-0x43edfffff]
[    0.000000] init_memory_mapping: [mem 0x400000000-0x43bffffff]
[    0.000000] init_memory_mapping: [mem 0x00100000-0x7cb72fff]
[    0.000000] init_memory_mapping: [mem 0x7cba3000-0x7ce65fff]
[    0.000000] init_memory_mapping: [mem 0x7e1c9000-0x7e1c9fff]
[    0.000000] init_memory_mapping: [mem 0x7e3d0000-0x7e850fff]
[    0.000000] init_memory_mapping: [mem 0x7efe2000-0x7effffff]
[    0.000000] init_memory_mapping: [mem 0x100000000-0x3ffffffff]
[    0.000000] RAMDISK: [mem 0x3322a000-0x3590cfff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000F0490 000024 (v02 ALASKA)
[    0.000000] ACPI: XSDT 0x000000007CEEA080 000084 (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.000000] ACPI: FACP 0x000000007CEF0340 00010C (v05 ALASKA A M I    01072009 AMI  00010013)
[    0.000000] ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has zero address or length: 0x0000000000000000/0x1 (20140926/tbfadt-649)
[    0.000000] ACPI: DSDT 0x000000007CEEA1A0 0061A0 (v02 ALASKA A M I    00000088 INTL 20051117)
[    0.000000] ACPI: FACS 0x000000007CF29080 000040
[    0.000000] ACPI: APIC 0x000000007CEF0450 00007E (v03 ALASKA A M I    01072009 AMI  00010013)
[    0.000000] ACPI: FPDT 0x000000007CEF04D0 000044 (v01 ALASKA A M I    01072009 AMI  00010013)
[    0.000000] ACPI: MCFG 0x000000007CEF0518 00003C (v01 ALASKA A M I    01072009 MSFT 00010013)
[    0.000000] ACPI: HPET 0x000000007CEF0558 000038 (v01 ALASKA A M I    01072009 AMI  00000005)
[    0.000000] ACPI: WDRT 0x000000007CEF0590 000047 (v01 ALASKA A M I    01072009 AMI  00000005)
[    0.000000] ACPI: IVRS 0x000000007CEF05D8 000078 (v02 AMD    BANTRY   00000001 AMD  00000000)
[    0.000000] ACPI: SSDT 0x000000007CEF0650 000B9C (v01 AMD    BANTRY   00000001 AMD  00000001)
[    0.000000] ACPI: SSDT 0x000000007CEF11F0 00033B (v02 AMD    BANTRY   00000002 MSFT 04000000)
[    0.000000] ACPI: CRAT 0x000000007CEF1530 0005A0 (v01 AMD    BANTRY   00000001 AMD  00000001)
[    0.000000] ACPI: SSDT 0x000000007CEF1AD0 001457 (v01 AMD    CPMDFIGP 00000001 INTL 20051117)
[    0.000000] ACPI: SSDT 0x000000007CEF2F28 00122C (v01 AMD    CPMCMN   00000001 INTL 20051117)
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000043effffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x43efe7000-0x43effafff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x43effffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00001000-0x0009dfff]
[    0.000000]   node   0: [mem 0x00100000-0x7cb72fff]
[    0.000000]   node   0: [mem 0x7cba3000-0x7ce65fff]
[    0.000000]   node   0: [mem 0x7e1c9000-0x7e1c9fff]
[    0.000000]   node   0: [mem 0x7e3d0000-0x7e850fff]
[    0.000000]   node   0: [mem 0x7efe2000-0x7effffff]
[    0.000000]   node   0: [mem 0x100000000-0x43effffff]
[    0.000000] Initmem setup node 0 [mem 0x00001000-0x43effffff]
[    0.000000] ACPI: PM-Timer IO Port: 0x808
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x10] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x11] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x12] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x13] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 0, version 33, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: IOAPIC (id[0x01] address[0xfec01000] gsi_base[24])
[    0.000000] IOAPIC[1]: apic_id 1, version 33, address 0xfec01000, GSI 24-55
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x10228210 base: 0xfed00000
[    0.000000] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009e000-0x0009efff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
[    0.000000] PM: Registered nosave memory: [mem 0x7cb73000-0x7cba2fff]
[    0.000000] PM: Registered nosave memory: [mem 0x7ce66000-0x7cf33fff]
[    0.000000] PM: Registered nosave memory: [mem 0x7cf34000-0x7e1c8fff]
[    0.000000] PM: Registered nosave memory: [mem 0x7e1ca000-0x7e3cffff]
[    0.000000] PM: Registered nosave memory: [mem 0x7e851000-0x7efe1fff]
[    0.000000] PM: Registered nosave memory: [mem 0x7f000000-0xfebfffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfec00000-0xfec01fff]
[    0.000000] PM: Registered nosave memory: [mem 0xfec02000-0xfec0ffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfec10000-0xfec10fff]
[    0.000000] PM: Registered nosave memory: [mem 0xfec11000-0xfecfffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfed00000-0xfed00fff]
[    0.000000] PM: Registered nosave memory: [mem 0xfed01000-0xfed7ffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfed80000-0xfed8ffff]
[    0.000000] PM: Registered nosave memory: [mem 0xfed90000-0xfeffffff]
[    0.000000] PM: Registered nosave memory: [mem 0xff000000-0xffffffff]
[    0.000000] e820: [mem 0x7f000000-0xfebfffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on bare hardware
[    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:4 nr_node_ids:1
[    0.000000] PERCPU: Embedded 30 pages/cpu @ffff88043ec00000 s83648 r8192 d31040 u524288
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 3862836
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.18.0-rc3+ root=UUID=fb121574-ea39-49a2-a896-0750eff9d30d resume=/dev/disk/by-id/ata-KINGSTON_SV300S37A120G_50026B773C03A9A5-part1 showopts amd_iommu_dump console=ttyS0,115200 console=tty0
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340 using standard form
[    0.000000] AGP: Checking aperture...
[    0.000000] AGP: No AGP bridge found
[    0.000000] AGP: Node 0: aperture [bus addr 0x00000000-0x01ffffff] (32MB)
[    0.000000] AGP: Your BIOS doesn't leave a aperture memory hole
[    0.000000] AGP: Please enable the IOMMU option in the BIOS setup
[    0.000000] AGP: This costs you 64MB of RAM
[    0.000000] AGP: Mapping aperture over RAM [mem 0x74000000-0x77ffffff] (65536KB)
[    0.000000] PM: Registered nosave memory: [mem 0x74000000-0x77ffffff]
[    0.000000] Memory: 15262208K/15665612K available (6243K kernel code, 830K rwdata, 2848K rodata, 1328K init, 1528K bss, 403404K reserved)
[    0.000000] Preemptible hierarchical RCU implementation.
[    0.000000] 	RCU dyntick-idle grace-period acceleration is enabled.
[    0.000000] 	RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=4.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.000000] NR_IRQS:33024 nr_irqs:1000 16
[    0.000000] 	Offload RCU callbacks from all CPUs
[    0.000000] 	Offload RCU callbacks from CPUs: 0-3.
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [ttyS0] enabled
[    0.000000] allocated 62914560 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 3693.331 MHz processor
[    0.000022] Calibrating delay loop (skipped), value calculated using timer frequency.. 7386.66 BogoMIPS (lpj=3693331)
[    0.010762] pid_max: default: 32768 minimum: 301
[    0.015447] ACPI: Core revision 20140926
[    0.024757] ACPI: All ACPI Tables successfully acquired
[    0.031767] Security Framework initialized
[    0.035966] AppArmor: AppArmor initialized
[    0.041012] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
[    0.052731] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.062030] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.069018] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.076706] Initializing cgroup subsys memory
[    0.081226] Initializing cgroup subsys devices
[    0.085730] Initializing cgroup subsys freezer
[    0.090231] Initializing cgroup subsys net_cls
[    0.094734] Initializing cgroup subsys blkio
[    0.099063] Initializing cgroup subsys perf_event
[    0.103824] Initializing cgroup subsys hugetlb
[    0.108346] CPU: Physical Processor ID: 0
[    0.112413] CPU: Processor Core ID: 0
[    0.116136] mce: CPU supports 7 MCE banks
[    0.120210] LVT offset 1 assigned for vector 0xf9
[    0.124972] Last level iTLB entries: 4KB 512, 2MB 1024, 4MB 512
[    0.124972] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 512, 1GB 0
[    0.138536] Freeing SMP alternatives memory: 24K (ffffffff81e1d000 - ffffffff81e23000)
[    0.147712] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: b8 info 0000
[    0.154537] AMD-Vi:        mmio-addr: 00000000feb80000
[    0.159749] AMD-Vi:   DEV_SELECT_RANGE_START	 devid: 00:01.0 flags: 00
[    0.166331] AMD-Vi:   DEV_RANGE_END		 devid: ff:1f.6
[    0.171909] AMD-Vi:   DEV_ALIAS_RANGE		 devid: 02:00.0 flags: 00 devid_to: 00:14.4
[    0.179570] AMD-Vi:   DEV_RANGE_END		 devid: 02:1f.7
[    0.184595] AMD-Vi:   DEV_SPECIAL(HPET[0])		devid: 00:14.0
[    0.190133] AMD-Vi:   DEV_SPECIAL(IOAPIC[0])		devid: 00:14.0
[    0.195882] AMD-Vi:   DEV_SPECIAL(IOAPIC[1])		devid: 00:00.0
[    0.281157] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.297248] smpboot: CPU0: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G (fam: 15, model: 30, stepping: 01)
[    0.408654] Performance Events: Fam15h core perfctr, AMD PMU driver.
[    0.415209] ... version:                0
[    0.419277] ... bit width:              48
[    0.423424] ... generic registers:      6
[    0.427495] ... value mask:             0000ffffffffffff
[    0.432863] ... max period:             00007fffffffffff
[    0.438231] ... fixed-purpose events:   0
[    0.442300] ... event mask:             000000000000003f
[    0.455818] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[    0.465881] x86: Booting SMP configuration:
[    0.470161] .... node  #0, CPUs:      #1 #2 #3
[    0.517125] x86: Booted up 1 node, 4 CPUs
[    0.521511] smpboot: Total of 4 processors activated (29546.64 BogoMIPS)
[    0.529044] devtmpfs: initialized
[    0.535817] PM: Registering ACPI NVS region [mem 0x7ce66000-0x7cf33fff] (843776 bytes)
[    0.543897] PM: Registering ACPI NVS region [mem 0x7e1ca000-0x7e3cffff] (2121728 bytes)
[    0.552409] RTC time:  9:54:55, date: 11/05/14
[    0.557072] NET: Registered protocol family 16
[    0.564563] cpuidle: using governor ladder
[    0.571554] cpuidle: using governor menu
[    0.575733] ACPI: bus type PCI registered
[    0.579835] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.586410] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)
[    0.595818] PCI: not using MMCONFIG
[    0.599368] PCI: Using configuration type 1 for base access
[    0.604995] PCI: Using configuration type 1 for extended access
[    0.611441] mtrr: your CPUs had inconsistent variable MTRR settings
[    0.617791] mtrr: probably your BIOS does not setup all CPUs.
[    0.623592] mtrr: corrected configuration.
[    0.632001] ACPI: Added _OSI(Module Device)
[    0.636307] ACPI: Added _OSI(Processor Device)
[    0.640809] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.645569] ACPI: Added _OSI(Processor Aggregator Device)
[    0.652898] ACPI: Executed 1 blocks of module-level executable AML code
[    0.663767] ACPI: Interpreter enabled
[    0.667523] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20140926/hwxface-580)
[    0.676966] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20140926/hwxface-580)
[    0.686380] ACPI: (supports S0 S3 S4 S5)
[    0.690361] ACPI: Using IOAPIC for interrupt routing
[    0.695534] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)
[    0.704980] PCI: MMCONFIG at [mem 0xe0000000-0xefffffff] reserved in ACPI motherboard resources
[    0.714154] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.723910] [Firmware Bug]: ACPI: No _BQC method, cannot determine initial brightness
[    0.732044] [Firmware Bug]: ACPI: No _BQC method, cannot determine initial brightness
[    0.740140] [Firmware Bug]: ACPI: No _BQC method, cannot determine initial brightness
[    0.780225] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.786502] acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.795058] acpi PNP0A03:00: _OSC: OS now controls [PCIeHotplug PME AER PCIeCapability]
[    0.803458] acpi PNP0A03:00: host bridge window [0x0-0x0] (ignored, not CPU addressable)
[    0.811926] PCI host bridge to bus 0000:00
[    0.816084] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.821622] pci_bus 0000:00: root bus resource [io  0x0000-0x03af]
[    0.827893] pci_bus 0000:00: root bus resource [io  0x03e0-0x0cf7]
[    0.834126] pci_bus 0000:00: root bus resource [io  0x03b0-0x03df]
[    0.840359] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
[    0.846593] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    0.853522] pci_bus 0000:00: root bus resource [mem 0x000c0000-0x000dffff]
[    0.860482] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
[    0.868258] pci 0000:00:03.1: System wakeup disabled by ACPI
[    0.874286] pci 0000:00:10.0: System wakeup disabled by ACPI
[    0.880248] pci 0000:00:10.1: System wakeup disabled by ACPI
[    0.886258] pci 0000:00:12.0: System wakeup disabled by ACPI
[    0.892178] pci 0000:00:12.2: System wakeup disabled by ACPI
[    0.898058] pci 0000:00:13.0: System wakeup disabled by ACPI
[    0.903931] pci 0000:00:13.2: System wakeup disabled by ACPI
[    0.909983] pci 0000:00:14.2: System wakeup disabled by ACPI
[    0.915976] pci 0000:00:14.4: System wakeup disabled by ACPI
[    0.921832] pci 0000:00:14.5: System wakeup disabled by ACPI
[    0.930037] pci 0000:00:03.1: PCI bridge to [bus 01]
[    0.935168] pci 0000:00:14.4: PCI bridge to [bus 02] (subtractive decode)
[    0.942834] ACPI: PCI Interrupt Link [LNKA] (IRQs 4 5 7 10 11 14 15) *0
[    0.950040] ACPI: PCI Interrupt Link [LNKB] (IRQs 4 5 7 10 11 14 15) *0
[    0.957262] ACPI: PCI Interrupt Link [LNKC] (IRQs 4 5 7 10 11 14 15) *0
[    0.964463] ACPI: PCI Interrupt Link [LNKD] (IRQs 4 5 7 10 11 14 15) *0
[    0.971696] ACPI: PCI Interrupt Link [LNKE] (IRQs 4 5 7 10 11 14 15) *0
[    0.978888] ACPI: PCI Interrupt Link [LNKF] (IRQs 4 5 7 10 11 14 15) *0
[    0.986099] ACPI: PCI Interrupt Link [LNKG] (IRQs 4 5 7 10 11 14 15) *0
[    0.993303] ACPI: PCI Interrupt Link [LNKH] (IRQs 4 5 7 10 11 14 15) *0
[    1.000829] vgaarb: setting as boot device: PCI:0000:00:01.0
[    1.006603] vgaarb: device added: PCI:0000:00:01.0,decodes=io+mem,owns=io+mem,locks=none
[    1.014793] vgaarb: loaded
[    1.017556] vgaarb: bridge control possible 0000:00:01.0
[    1.023081] SCSI subsystem initialized
[    1.026983] ACPI: bus type USB registered
[    1.031067] usbcore: registered new interface driver usbfs
[    1.036614] usbcore: registered new interface driver hub
[    1.042039] usbcore: registered new device driver usb
[    1.047276] PCI: Using ACPI for IRQ routing
[    1.058253] NetLabel: Initializing
[    1.061715] NetLabel:  domain hash size = 128
[    1.066121] NetLabel:  protocols = UNLABELED CIPSOv4
[    1.071145] NetLabel:  unlabeled traffic allowed by default
[    1.076802] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    1.081924] hpet0: 3 comparators, 32-bit 14.318180 MHz counter
[    1.089926] Switched to clocksource hpet
[    1.096389] AppArmor: AppArmor Filesystem Enabled
[    1.101253] pnp: PnP ACPI init
[    1.104510] system 00:00: [mem 0xe0000000-0xefffffff] has been reserved
[    1.111335] system 00:01: [mem 0x80000000-0xbfffffff] has been reserved
[    1.118078] system 00:02: [mem 0xfeb80000-0xfebfffff] could not be reserved
[    1.125346] system 00:03: [io  0x0220-0x0227] has been reserved
[    1.131395] system 00:03: [io  0x0228-0x0237] has been reserved
[    1.137368] system 00:03: [io  0x0a20-0x0a2f] has been reserved
[    1.144382] system 00:08: [io  0x04d0-0x04d1] has been reserved
[    1.150687] system 00:09: [io  0x04d0-0x04d1] has been reserved
[    1.156728] system 00:09: [io  0x040b] has been reserved
[    1.162132] system 00:09: [io  0x04d6] has been reserved
[    1.167500] system 00:09: [io  0x0c00-0x0c01] has been reserved
[    1.173511] system 00:09: [io  0x0c14] has been reserved
[    1.178912] system 00:09: [io  0x0c50-0x0c51] has been reserved
[    1.184885] system 00:09: [io  0x0c52] has been reserved
[    1.190290] system 00:09: [io  0x0c6c] has been reserved
[    1.195691] system 00:09: [io  0x0c6f] has been reserved
[    1.201060] system 00:09: [io  0x0cd0-0x0cd1] has been reserved
[    1.207069] system 00:09: [io  0x0cd2-0x0cd3] has been reserved
[    1.213078] system 00:09: [io  0x0cd4-0x0cd5] has been reserved
[    1.219054] system 00:09: [io  0x0cd6-0x0cd7] has been reserved
[    1.225061] system 00:09: [io  0x0cd8-0x0cdf] has been reserved
[    1.231072] system 00:09: [io  0x0800-0x089f] could not be reserved
[    1.237390] system 00:09: [io  0x0b20-0x0b3f] has been reserved
[    1.243402] system 00:09: [io  0x0900-0x090f] has been reserved
[    1.249376] system 00:09: [io  0x0910-0x091f] has been reserved
[    1.255350] system 00:09: [io  0xfe00-0xfefe] has been reserved
[    1.261359] system 00:09: [mem 0xfec00000-0xfec00fff] could not be reserved
[    1.268372] system 00:09: [mem 0xfee00000-0xfee00fff] has been reserved
[    1.275074] system 00:09: [mem 0xfed80000-0xfed8ffff] has been reserved
[    1.281776] system 00:09: [mem 0xfed61000-0xfed70fff] has been reserved
[    1.288442] system 00:09: [mem 0xfec10000-0xfec10fff] has been reserved
[    1.295144] system 00:09: [mem 0xfed00000-0xfed00fff] could not be reserved
[    1.302158] system 00:09: [mem 0xff000000-0xffffffff] has been reserved
[    1.309038] pnp: PnP ACPI: found 10 devices
[    1.320395] pci 0000:00:03.1: PCI bridge to [bus 01]
[    1.325425] pci 0000:00:03.1:   bridge window [io  0xe000-0xefff]
[    1.331573] pci 0000:00:03.1:   bridge window [mem 0xfea00000-0xfeafffff]
[    1.338413] pci 0000:00:03.1:   bridge window [mem 0xd0800000-0xd08fffff 64bit pref]
[    1.346247] pci 0000:00:14.4: PCI bridge to [bus 02]
[    1.351420] NET: Registered protocol family 2
[    1.356165] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    1.363956] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    1.370906] TCP: Hash tables configured (established 131072 bind 65536)
[    1.377661] TCP: reno registered
[    1.380966] UDP hash table entries: 8192 (order: 6, 262144 bytes)
[    1.387189] UDP-Lite hash table entries: 8192 (order: 6, 262144 bytes)
[    1.394053] NET: Registered protocol family 1
[    1.789712] Unpacking initramfs...
[    2.214547] Freeing initrd memory: 39820K (ffff88003322a000 - ffff88003590d000)
[    2.223483] AMD-Vi: IOMMU performance counters supported
[    2.228978] pci 0000:00:00.2: can't derive routing for PCI INT A
[    2.235113] pci 0000:00:00.2: PCI INT A: no GSI
[    2.240159] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    2.245590] AMD-Vi:  Extended features:  PPR GT IA PC
[    2.250933] AMD-Vi: Interrupt remapping enabled
[    2.255526] pci 0000:00:00.2: irqdomain: try allocate 1 MSI IRQs
[    2.261598] pci 0000:00:00.2: allocated IRQ24 for MSI
[    2.267467] AMD-Vi: Using passthrough domain for device 0000:00:01.0
[    2.280548] AMD-Vi: Lazy IO/TLB flushing enabled
[    2.285741] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    2.292337] software IO TLB [mem 0x78b73000-0x7cb73000] (64MB) mapped at [ffff880078b73000-ffff88007cb72fff]
[    2.302677] perf: AMD NB counters detected
[    2.306959] perf: amd_iommu: Detected. (2 banks, 4 counters/bank)
[    2.313139] microcode: CPU0: patch_level=0x06003104
[    2.318112] microcode: CPU1: patch_level=0x06003104
[    2.323193] microcode: CPU2: patch_level=0x06003104
[    2.328136] microcode: CPU3: patch_level=0x06003104
[    2.333163] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[    2.342070] LVT offset 0 assigned for vector 0x400
[    2.347037] perf: AMD IBS detected (0x000001ff)
[    2.352010] Scanning for low memory corruption every 60 seconds
[    2.358368] futex hash table entries: 1024 (order: 4, 65536 bytes)
[    2.364766] audit: initializing netlink subsys (disabled)
[    2.370276] audit: type=2000 audit(1415181294.621:1): initialized
[    2.377009] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    2.383450] zpool: loaded
[    2.386139] zbud: loaded
[    2.389009] VFS: Disk quotas dquot_6.5.2
[    2.393136] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    2.399936] msgmni has been set to 29886
[    2.404477] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    2.412051] io scheduler noop registered
[    2.416044] io scheduler deadline registered
[    2.420483] io scheduler cfq registered (default)
[    2.425434] pcieport 0000:00:03.1: irqdomain: try allocate 1 MSI IRQs
[    2.431952] pcieport 0000:00:03.1: allocated IRQ25 for MSI
[    2.437603] pcieport 0000:00:03.1: Signaling PME through PCIe PME interrupt
[    2.444645] pci 0000:01:00.0: Signaling PME through PCIe PME interrupt
[    2.451273] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    2.456953] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    2.463694] GHES: HEST is not enabled!
[    2.467715] Serial: 8250/16550 driver, 32 ports, IRQ sharing disabled
[    2.494801] serial 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    2.504654] Non-volatile memory driver v1.3
[    2.508900] Linux agpgart interface v0.103
[    2.513260] irqdomain: AHCI 8 ports, 8 MSI
[    2.517456] ahci 0000:00:11.0: irqdomain: try allocate 8 MSI IRQs
[    2.523643] ahci 0000:00:11.0: try to alloc nvec 8
[    2.528510] ahci 0000:00:11.0: allocated IRQ26 for MSI
[    2.533713] ------------[ cut here ]------------
[    2.538420] kernel BUG at drivers/pci/msi.c:1219!
[    2.543173] invalid opcode: 0000 [#1] PREEMPT SMP 
[    2.548161] Modules linked in:
[    2.551321] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.0-rc3+ #5
[    2.557762] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./F2A88XM-HD3, BIOS F6 05/28/2014
[    2.567953] task: ffff88042b54c010 ti: ffff88042b550000 task.ti: ffff88042b550000
[    2.575548] RIP: 0010:[<ffffffff81369429>]  [<ffffffff81369429>] msi_irq_domain_alloc_irqs+0x1e9/0x220
[    2.585012] RSP: 0000:ffff88042b553ab8  EFLAGS: 00010282
[    2.590413] RAX: 00000000ffffffea RBX: ffff8804253e27c0 RCX: 0000000000000000
[    2.597591] RDX: 0000000000000000 RSI: 0000000000000022 RDI: 0000000000000026
[    2.604814] RBP: ffff88042b553b08 R08: 0000000000000001 R09: ffff88043dc00008
[    2.611999] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000001a
[    2.619222] R13: ffff88042ad54000 R14: ffff88042b553b20 R15: 0000000000000004
[    2.626442] FS:  0000000000000000(0000) GS:ffff88043ec00000(0000) knlGS:0000000000000000
[    2.634641] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.640477] CR2: 0000000000000000 CR3: 0000000001c16000 CR4: 00000000000407f0
[    2.647654] Stack:
[    2.649726]  ffff88042b5104c0 00000005ffffffff ffff88042ad54098 ffff88042ad54880
[    2.657378]  ffff88042b553ae8 ffff88042ad54000 0000000000000005 0000000000000008
[    2.665050]  00000000000000ff ffff88042ad54880 ffff88042b553b68 ffffffff81037aa2
[    2.672713] Call Trace:
[    2.675217]  [<ffffffff81037aa2>] native_setup_msi_irqs+0x52/0xa0
[    2.681365]  [<ffffffff8100770a>] arch_setup_msi_irqs+0xa/0x10
[    2.687288]  [<ffffffff81368315>] pci_enable_msi_range+0x105/0x220
[    2.693520]  [<ffffffff8146060a>] ahci_init_one+0xa6a/0xb30
[    2.699184]  [<ffffffff8134e760>] local_pci_probe+0x40/0xa0
[    2.704813]  [<ffffffff8134f9b5>] ? pci_match_device+0xe5/0x110
[    2.710784]  [<ffffffff8134faf1>] pci_device_probe+0xd1/0x130
[    2.716613]  [<ffffffff81413e4b>] driver_probe_device+0x8b/0x3d0
[    2.722707]  [<ffffffff81414263>] __driver_attach+0x93/0xa0
[    2.728335]  [<ffffffff814141d0>] ? __device_attach+0x40/0x40
[    2.734172]  [<ffffffff81411ec3>] bus_for_each_dev+0x63/0xa0
[    2.739920]  [<ffffffff814138c9>] driver_attach+0x19/0x20
[    2.745369]  [<ffffffff814134e0>] bus_add_driver+0x180/0x250
[    2.751082]  [<ffffffff81d2fd60>] ? ata_sff_init+0x33/0x33
[    2.756657]  [<ffffffff81414abf>] driver_register+0x5f/0xf0
[    2.762277]  [<ffffffff8134e107>] __pci_register_driver+0x47/0x50
[    2.768459]  [<ffffffff81d2fd79>] ahci_pci_driver_init+0x19/0x1b
[    2.774556]  [<ffffffff810002f4>] do_one_initcall+0xb4/0x1f0
[    2.780271]  [<ffffffff81095e23>] ? __wake_up+0x43/0x60
[    2.785587]  [<ffffffff81ce7248>] kernel_init_freeable+0x197/0x21f
[    2.791821]  [<ffffffff81ce6983>] ? initcall_blacklist+0xc0/0xc0
[    2.797884]  [<ffffffff815fe760>] ? rest_init+0x90/0x90
[    2.803199]  [<ffffffff815fe769>] kernel_init+0x9/0xf0
[    2.808393]  [<ffffffff816143fc>] ret_from_fork+0x7c/0xb0
[    2.813839]  [<ffffffff815fe760>] ? rest_init+0x90/0x90
[    2.819154] Code: 83 c4 28 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 83 7b 10 02 19 c0 48 83 c4 28 5b 41 5c 41 5d 41 5e 83 e0 e3 41 5f 83 c0 01 5d c3 <0f> 0b 48 8b 75 c0 46 8d 44 20 ff 44 89 e1 48 c7 c2 a1 9f a2 81 
[    2.841251] RIP  [<ffffffff81369429>] msi_irq_domain_alloc_irqs+0x1e9/0x220
[    2.848351]  RSP <ffff88042b553ab8>
[    2.851916] ---[ end trace a98410f04540cfbe ]---
[    2.856595] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    2.856595] 
[    2.865976] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[    2.876257] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    2.876257] 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-05  9:58         ` Joerg Roedel
@ 2014-11-05 10:28           ` Jiang Liu
  2014-11-05 11:10             ` Joerg Roedel
  0 siblings, 1 reply; 65+ messages in thread
From: Jiang Liu @ 2014-11-05 10:28 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

Hi Joerg,
	This looks like a silly bug; could you please try this
fix?
Regards!
Gerry
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7423ee16972f..62ba8a6f6e79 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1211,7 +1211,7 @@ int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
                                msidesc->nvec_used > 1) ?  1 : -ENOSPC;
                }
                for (i = 0; i < msidesc->nvec_used; i++)
-                       irq_set_msi_desc_off(virq + i, i, msidesc);
+                       irq_set_msi_desc_off(virq, i, msidesc);
        }

        list_for_each_entry(msidesc, &dev->msi_list, list)

On 2014/11/5 17:58, Joerg Roedel wrote:
> On Wed, Nov 05, 2014 at 05:41:50PM +0800, Jiang Liu wrote:
>> 	Could you please help to apply the attached patch and send me
>> console outputs?
> 
> Sure, here it is.
> 
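
The one-line change above can be read as a double-offset bug: as the fix
implies, irq_set_msi_desc_off() adds its offset argument to the base IRQ
itself, so passing virq + i as the base ends up targeting virq + 2*i. What
follows is only a minimal user-space sketch of that behaviour, not kernel
code. Every toy_* name, MAX_IRQS and the two arrays are hypothetical
stand-ins, and the numbers (8 vectors starting at IRQ 26) are taken from the
boot log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_IRQS 64	/* hypothetical size of the toy descriptor table */

/* Toy stand-ins for the IRQ descriptor table (not kernel structures). */
static bool irq_allocated[MAX_IRQS];
static int irq_msi_index[MAX_IRQS];

/* Mark IRQs [virq, virq + nvec) as allocated and clear everything else. */
static void toy_alloc_irqs(unsigned int virq, unsigned int nvec)
{
	assert(virq + nvec <= MAX_IRQS);
	for (unsigned int n = 0; n < MAX_IRQS; n++) {
		irq_allocated[n] = (n >= virq && n < virq + nvec);
		irq_msi_index[n] = -1;
	}
}

/*
 * Models the assumed contract: the callee applies the offset itself and
 * fails with -EINVAL when the resulting IRQ has no descriptor.
 */
static int toy_set_msi_desc_off(unsigned int irq_base, unsigned int irq_offset)
{
	unsigned int irq = irq_base + irq_offset;

	if (irq >= MAX_IRQS || !irq_allocated[irq])
		return -EINVAL;
	irq_msi_index[irq] = (int)irq_offset;
	return 0;
}

/* Pre-fix call pattern: the offset is applied twice (virq + i, then + i). */
static int toy_assign_buggy(unsigned int virq, unsigned int nvec)
{
	for (unsigned int i = 0; i < nvec; i++)
		if (toy_set_msi_desc_off(virq + i, i))
			return (int)i;	/* first index that failed */
	return -1;			/* no failure */
}

/* Post-fix call pattern: pass the base unchanged and let the callee add i. */
static int toy_assign_fixed(unsigned int virq, unsigned int nvec)
{
	for (unsigned int i = 0; i < nvec; i++)
		if (toy_set_msi_desc_off(virq, i))
			return (int)i;
	return -1;
}
```

Under these assumptions, after toy_alloc_irqs(26, 8) the buggy pattern fails
at i = 4, because it targets IRQ 34 while only 26..33 exist, whereas the
fixed pattern succeeds for all eight vectors. That would be consistent with
RAX holding 0xffffffea (-22, i.e. -EINVAL as a 32-bit value) in the oops,
although that reading of the register dump is an inference rather than
something stated in the thread.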

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-05 10:28           ` Jiang Liu
@ 2014-11-05 11:10             ` Joerg Roedel
  0 siblings, 0 replies; 65+ messages in thread
From: Joerg Roedel @ 2014-11-05 11:10 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

Hi Jiang,

On Wed, Nov 05, 2014 at 06:28:45PM +0800, Jiang Liu wrote:
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 7423ee16972f..62ba8a6f6e79 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -1211,7 +1211,7 @@ int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
>                                 msidesc->nvec_used > 1) ?  1 : -ENOSPC;
>                 }
>                 for (i = 0; i < msidesc->nvec_used; i++)
> -                       irq_set_msi_desc_off(virq + i, i, msidesc);
> +                       irq_set_msi_desc_off(virq, i, msidesc);
>         }
> 
>         list_for_each_entry(msidesc, &dev->msi_list, list)

Yes, this fixes the issue, thanks :)


	Joerg


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 18/31] PCI/MSI, trivial: Fix minor syntax issues according to coding styles
  2014-11-04 12:01 ` [Patch Part2 v4 18/31] PCI/MSI, trivial: Fix minor syntax issues according to coding styles Jiang Liu
@ 2014-11-05 22:10   ` Bjorn Helgaas
  2014-11-05 22:10   ` Bjorn Helgaas
  1 sibling, 0 replies; 65+ messages in thread
From: Bjorn Helgaas @ 2014-11-05 22:10 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Jiri Kosina, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86,
	linux-kernel, linux-pci, linux-acpi, linux-arm-kernel

On Tue, Nov 04, 2014 at 08:01:52PM +0800, Jiang Liu wrote:
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>

Needs a changelog, even if it's similar to the topic.

Speaking of the topic, please run "git log --oneline drivers/pci/msi.c" and
make yours look similar to the others.  This doesn't really fix a syntax
issue; the existing code is syntactically correct.  It's only a style
issue.

I'd say something like:

  PCI/MSI: Remove unnecessary braces around single statements

  Per Documentation/CodingStyle, don't use braces around single statements.

> ---
>  drivers/pci/msi.c |    9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 9fab30af0e75..fb2ccb536324 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -244,9 +244,8 @@ void default_restore_msi_irqs(struct pci_dev *dev)
>  {
>  	struct msi_desc *entry;
>  
> -	list_for_each_entry(entry, &dev->msi_list, list) {
> +	list_for_each_entry(entry, &dev->msi_list, list)
>  		default_restore_msi_irq(dev, entry->irq);
> -	}
>  }
>  
>  void __read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
> @@ -451,9 +450,8 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>  				PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
>  
>  	arch_restore_msi_irqs(dev);
> -	list_for_each_entry(entry, &dev->msi_list, list) {
> +	list_for_each_entry(entry, &dev->msi_list, list)
>  		msix_mask_irq(entry, entry->masked);
> -	}
>  
>  	msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>  }
> @@ -497,9 +495,8 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>  	int count = 0;
>  
>  	/* Determine how many msi entries we have */
> -	list_for_each_entry(entry, &pdev->msi_list, list) {
> +	list_for_each_entry(entry, &pdev->msi_list, list)
>  		++num_msi;
> -	}
>  	if (!num_msi)
>  		return 0;
>  
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 18/31] PCI/MSI, trivial: Fix minor syntax issues according to coding styles
  2014-11-04 12:01 ` [Patch Part2 v4 18/31] PCI/MSI, trivial: Fix minor syntax issues according to coding styles Jiang Liu
  2014-11-05 22:10   ` Bjorn Helgaas
@ 2014-11-05 22:10   ` Bjorn Helgaas
  1 sibling, 0 replies; 65+ messages in thread
From: Bjorn Helgaas @ 2014-11-05 22:10 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Jiri Kosina, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86,
	linux-kernel, linux-pci, linux-acpi, linux-arm-kernel

On Tue, Nov 04, 2014 at 08:01:52PM +0800, Jiang Liu wrote:
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>

Oh, I forgot:

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  drivers/pci/msi.c |    9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 9fab30af0e75..fb2ccb536324 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -244,9 +244,8 @@ void default_restore_msi_irqs(struct pci_dev *dev)
>  {
>  	struct msi_desc *entry;
>  
> -	list_for_each_entry(entry, &dev->msi_list, list) {
> +	list_for_each_entry(entry, &dev->msi_list, list)
>  		default_restore_msi_irq(dev, entry->irq);
> -	}
>  }
>  
>  void __read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
> @@ -451,9 +450,8 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>  				PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
>  
>  	arch_restore_msi_irqs(dev);
> -	list_for_each_entry(entry, &dev->msi_list, list) {
> +	list_for_each_entry(entry, &dev->msi_list, list)
>  		msix_mask_irq(entry, entry->masked);
> -	}
>  
>  	msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>  }
> @@ -497,9 +495,8 @@ static int populate_msi_sysfs(struct pci_dev *pdev)
>  	int count = 0;
>  
>  	/* Determine how many msi entries we have */
> -	list_for_each_entry(entry, &pdev->msi_list, list) {
> +	list_for_each_entry(entry, &pdev->msi_list, list)
>  		++num_msi;
> -	}
>  	if (!num_msi)
>  		return 0;
>  
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 19/31] PCI/MSI: Simplify PCI MSI code by initializing msi_desc.nvec_used earlier
  2014-11-04 12:01 ` [Patch Part2 v4 19/31] PCI/MSI: Simplify PCI MSI code by initializing msi_desc.nvec_used earlier Jiang Liu
@ 2014-11-05 22:35   ` Bjorn Helgaas
  0 siblings, 0 replies; 65+ messages in thread
From: Bjorn Helgaas @ 2014-11-05 22:35 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Joerg Roedel, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel, iommu

On Tue, Nov 04, 2014 at 08:01:53PM +0800, Jiang Liu wrote:

  PCI/MSI: Initialize msi_desc.nvec_used earlier to simplify code

> Simplify PCI MSI code by initializing msi_desc.nvec_used and
> msi_desc.msi_attrib.mutiple when create MSI descriptors.

multiple
when creating

> Also remove redundant checks in IRQ remapping drivers, PCI MSI core
> already guarattees these.

guarantees

> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  drivers/iommu/irq_remapping.c |    8 --------
>  drivers/pci/msi.c             |   40 +++++++++++++++-------------------------
>  2 files changed, 15 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index 176ff4372b7d..32fe5b1322d0 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -69,19 +69,13 @@ static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
>  	unsigned int irq;
>  	struct msi_desc *msidesc;
>  
> -	WARN_ON(!list_is_singular(&dev->msi_list));
>  	msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
> -	WARN_ON(msidesc->irq);
> -	WARN_ON(msidesc->msi_attrib.multiple);
> -	WARN_ON(msidesc->nvec_used);
>  
>  	irq = irq_alloc_hwirqs(nvec, dev_to_node(&dev->dev));
>  	if (irq == 0)
>  		return -ENOSPC;
>  
>  	nvec_pow2 = __roundup_pow_of_two(nvec);
> -	msidesc->nvec_used = nvec;
> -	msidesc->msi_attrib.multiple = ilog2(nvec_pow2);
>  	for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
>  		if (!sub_handle) {
>  			index = msi_alloc_remapped_irq(dev, irq, nvec_pow2);
> @@ -109,8 +103,6 @@ error:
>  	 * IRQs from tearing down again in default_teardown_msi_irqs()
>  	 */
>  	msidesc->irq = 0;
> -	msidesc->nvec_used = 0;
> -	msidesc->msi_attrib.multiple = 0;
>  
>  	return ret;
>  }
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index fb2ccb536324..afe974600c7d 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -85,19 +85,13 @@ int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>   */
>  void default_teardown_msi_irqs(struct pci_dev *dev)
>  {
> +	int i;
>  	struct msi_desc *entry;
>  
> -	list_for_each_entry(entry, &dev->msi_list, list) {
> -		int i, nvec;
> -		if (entry->irq == 0)
> -			continue;
> -		if (entry->nvec_used)
> -			nvec = entry->nvec_used;
> -		else
> -			nvec = 1 << entry->msi_attrib.multiple;
> -		for (i = 0; i < nvec; i++)
> -			arch_teardown_msi_irq(entry->irq + i);
> -	}
> +	list_for_each_entry(entry, &dev->msi_list, list)
> +		if (entry->irq)
> +			for (i = 0; i < entry->nvec_used; i++)
> +				arch_teardown_msi_irq(entry->irq + i);
>  }
>  
>  void __weak arch_teardown_msi_irqs(struct pci_dev *dev)
> @@ -353,19 +347,12 @@ static void free_msi_irqs(struct pci_dev *dev)
>  	struct msi_desc *entry, *tmp;
>  	struct attribute **msi_attrs;
>  	struct device_attribute *dev_attr;
> -	int count = 0;
> +	int i, count = 0;
>  
> -	list_for_each_entry(entry, &dev->msi_list, list) {
> -		int i, nvec;
> -		if (!entry->irq)
> -			continue;
> -		if (entry->nvec_used)
> -			nvec = entry->nvec_used;
> -		else
> -			nvec = 1 << entry->msi_attrib.multiple;
> -		for (i = 0; i < nvec; i++)
> -			BUG_ON(irq_has_action(entry->irq + i));
> -	}
> +	list_for_each_entry(entry, &dev->msi_list, list)
> +		if (entry->irq)
> +			for (i = 0; i < entry->nvec_used; i++)
> +				BUG_ON(irq_has_action(entry->irq + i));
>  
>  	arch_teardown_msi_irqs(dev);
>  
> @@ -556,7 +543,7 @@ error_attrs:
>  	return ret;
>  }
>  
> -static struct msi_desc *msi_setup_entry(struct pci_dev *dev)
> +static struct msi_desc *msi_setup_entry(struct pci_dev *dev, int nvec)
>  {
>  	u16 control;
>  	struct msi_desc *entry;
> @@ -574,6 +561,8 @@ static struct msi_desc *msi_setup_entry(struct pci_dev *dev)
>  	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
>  	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
>  	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
> +	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
> +	entry->nvec_used		= nvec;
>  
>  	if (control & PCI_MSI_FLAGS_64BIT)
>  		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_64;
> @@ -606,7 +595,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec)
>  
>  	msi_set_enable(dev, 0);	/* Disable MSI during set up */
>  
> -	entry = msi_setup_entry(dev);
> +	entry = msi_setup_entry(dev, nvec);
>  	if (!entry)
>  		return -ENOMEM;
>  
> @@ -677,6 +666,7 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
>  		entry->msi_attrib.entry_nr	= entries[i].entry;
>  		entry->msi_attrib.default_irq	= dev->irq;
>  		entry->mask_base		= base;
> +		entry->nvec_used		= 1;
>  
>  		list_add_tail(&entry->list, &dev->msi_list);
>  	}
> -- 
> 1.7.10.4
> 
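
A small userspace sketch of the two fields the hunk above now
initializes in msi_setup_entry().  roundup_pow2() and ilog2_sketch()
are simplified stand-ins for the kernel's __roundup_pow_of_two() and
ilog2(); the function names and the 1..32 range check are assumptions
made for illustration.

```c
#include <assert.h>

/* Smallest power of two >= n; stand-in for __roundup_pow_of_two(). */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1 && n <= 32);	/* MSI allows at most 32 vectors */
	while (p < n)
		p <<= 1;
	return p;
}

/* floor(log2(n)) for n >= 1; stand-in for ilog2(). */
static unsigned int ilog2_sketch(unsigned int n)
{
	unsigned int l = 0;

	assert(n >= 1);
	while (n >>= 1)
		l++;
	return l;
}

/* Value stored in msi_attrib.multiple for a request of nvec vectors. */
static unsigned int msi_multiple(unsigned int nvec)
{
	return ilog2_sketch(roundup_pow2(nvec));
}

/* Vector count implied by the encoded field alone (the old fallback). */
static unsigned int msi_vectors_from_multiple(unsigned int multiple)
{
	return 1u << multiple;
}
```

For nvec = 3 the encoded field decodes back to 4, so only nvec_used
records how many IRQs were actually requested.  That is presumably why
the teardown loops can use nvec_used alone once it is always set.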

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 20/31] PCI/MSI: Kill redundant calling for irq_set_msi_desc() for MSIx interrupts
  2014-11-04 12:01 ` [Patch Part2 v4 20/31] PCI/MSI: Kill redundant calling for irq_set_msi_desc() for MSIx interrupts Jiang Liu
@ 2014-11-05 22:45   ` Bjorn Helgaas
  2014-11-06  1:32     ` Yijing Wang
  0 siblings, 1 reply; 65+ messages in thread
From: Bjorn Helgaas @ 2014-11-05 22:45 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Konrad Rzeszutek Wilk, Andrew Morton,
	Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

On Tue, Nov 04, 2014 at 08:01:54PM +0800, Jiang Liu wrote:
> It's arch_setup_msi_irq()/arch_setup_msi_irqs()'s responsibility to call
> irq_set_msi_desc() to associate IRQ descriptors and MSI descriptors,
> so kill the redundant call of irq_set_msi_desc() for MSIx interrupts
> in PCI MSI core.

"MSI-X" in English text, "msix" in code.

The default arch_setup_msi_irq() in drivers/pci/msi.c doesn't call
irq_set_msi_desc().  Does it happen somewhere inside chip->setup_irq()?

I don't know how to verify that there are calls in all the places needed.
That makes me wonder if the factoring is wrong -- maybe irq_set_msi_desc()
could be done in some common place.

> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> ---
>  drivers/pci/msi.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index afe974600c7d..da181c59394b 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -685,7 +685,6 @@ static void msix_program_entries(struct pci_dev *dev,
>  						PCI_MSIX_ENTRY_VECTOR_CTRL;
>  
>  		entries[i].vector = entry->irq;
> -		irq_set_msi_desc(entry->irq, entry);
>  		entry->masked = readl(entry->mask_base + offset);
>  		msix_mask_irq(entry, 1);
>  		i++;
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-04 12:01 ` [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain Jiang Liu
@ 2014-11-05 23:09   ` Bjorn Helgaas
  2014-11-06  1:58     ` Yijing Wang
  2014-11-06  4:58     ` Jiang Liu
  2014-11-06 10:01   ` Thomas Gleixner
  1 sibling, 2 replies; 65+ messages in thread
From: Bjorn Helgaas @ 2014-11-05 23:09 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Yijing Wang, Alexander Gordeev,
	Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

On Tue, Nov 04, 2014 at 08:01:55PM +0800, Jiang Liu wrote:

In your topic:

  PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain

There's no need to repeat "PCI MSI".  Please run "git log --oneline
drivers/pci/msi.c" and make yours similar (capitalize the first word).

> Enhance PCI MSI core to support hierarchy irqdomain, so the common
> code could be shared among architectures.
> 
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> ---
>  drivers/pci/Kconfig |    4 ++
>  drivers/pci/msi.c   |  126 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/msi.h |   11 +++++
>  3 files changed, 141 insertions(+)
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index b9db0f2ce11f..022e89745f86 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -16,6 +16,10 @@ config PCI_MSI
>  
>  	   If you don't know what to do here, say Y.
>  
> +config PCI_MSI_IRQ_DOMAIN
> +	bool
> +	depends on PCI_MSI && IRQ_DOMAIN_HIERARCHY
> +
>  config PCI_DEBUG
>  	bool "PCI Debugging"
>  	depends on PCI && DEBUG_KERNEL
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index da181c59394b..7423ee16972f 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -19,6 +19,7 @@
>  #include <linux/errno.h>
>  #include <linux/io.h>
>  #include <linux/slab.h>
> +#include <linux/irqdomain.h>
>  
>  #include "pci.h"
>  
> @@ -1098,3 +1099,128 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
>  	return nvec;
>  }
>  EXPORT_SYMBOL(pci_enable_msix_range);
> +
> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN

Space, not tab.

> +static inline irq_hw_number_t
> +msi_get_hwirq(struct pci_dev *pdev, struct msi_desc *msidesc)

The convention in this file is "struct pci_dev *dev".  And "struct msi_desc
*desc" (or maybe "*entry").  Try to converge things, not diverge them.

> +{
> +	return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
> +		PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
> +		(pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;

Where does this bit layout come from?  Is this defined in the spec
somewhere?  A reference would help.
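
One way to read the layout in question, as a userspace sketch: the
entry number sits in bits 10:0 (11 bits, which matches the 2048-entry
limit of an MSI-X table), the 16-bit bus/devfn ID in bits 26:11, and
the PCI domain from bit 27 up.  That reading is an assumption on my
part, not something the patch states.  pci_devid() mirrors the
kernel's PCI_DEVID(); msi_pack_hwirq() is a hypothetical name, and the
sketch widens every field to 64 bits before shifting.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of PCI_DEVID(): bus in bits 15:8, devfn in bits 7:0. */
static inline uint16_t pci_devid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

/* Pack domain, bus/devfn and MSI entry number into one hwirq value. */
static inline uint64_t msi_pack_hwirq(uint32_t domain, uint8_t bus,
				      uint8_t devfn, uint16_t entry_nr)
{
	assert(entry_nr < (1u << 11));	/* must fit below the devid field */

	return (uint64_t)entry_nr |
	       ((uint64_t)pci_devid(bus, devfn) << 11) |
	       ((uint64_t)domain << 27);
}
```

Whatever the final layout is, a comment naming the three fields and
their widths next to the function would answer the question for
future readers.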

> +}
> +
> +static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> +			    unsigned int nr_irqs, void *arg)
> +{
> +	int i, ret;
> +	irq_hw_number_t hwirq = arch_msi_irq_domain_get_hwirq(arg);
> +
> +	if (irq_find_mapping(domain, hwirq) > 0)
> +		return -EEXIST;
> +
> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
> +	if (ret >= 0)

	if (ret < 0)
		return ret;

and un-indent the mainline code below.  Then it's obvious that this is the
normal case, not the error case.

> +		for (i = 0; i < nr_irqs; i++) {
> +			irq_domain_set_hwirq_and_chip(domain, virq + i,
> +					hwirq + i, &msi_chip, (void *)(long)i);
> +			__irq_set_handler(virq + i, handle_edge_irq, 0, "edge");
> +		}
> +
> +	return ret;
> +}
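
The early-return shape suggested for msi_domain_alloc() could look
roughly like this.  It is a userspace sketch only: parent_alloc() and
setup_one() are hypothetical stand-ins for
irq_domain_alloc_irqs_parent() and the per-IRQ chip/handler setup, and
the 8-vector limit is invented so the error path can be exercised.

```c
#include <errno.h>

static unsigned int configured;	/* IRQs set up so far */

/* Hypothetical parent allocator: refuses more than 8 IRQs at once. */
static int parent_alloc(unsigned int virq, unsigned int nr_irqs)
{
	(void)virq;
	return nr_irqs > 8 ? -ENOSPC : 0;
}

/* Hypothetical per-IRQ setup (hwirq, chip and handler in the patch). */
static void setup_one(unsigned int virq, unsigned long hwirq)
{
	(void)virq;
	(void)hwirq;
	configured++;
}

static int msi_domain_alloc_sketch(unsigned int virq, unsigned int nr_irqs,
				   unsigned long hwirq)
{
	unsigned int i;
	int ret;

	ret = parent_alloc(virq, nr_irqs);
	if (ret < 0)
		return ret;		/* error case leaves early */

	/* Normal case stays at the top indentation level. */
	for (i = 0; i < nr_irqs; i++)
		setup_one(virq + i, hwirq + i);

	return ret;
}
```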
> +
> +static void msi_domain_free(struct irq_domain *domain, unsigned int virq,
> +			    unsigned int nr_irqs)
> +{
> +	int i;
> +
> +	for (i = 0; i < nr_irqs; i++) {
> +		struct msi_desc *msidesc = irq_get_msi_desc(virq);
> +
> +		if (msidesc)
> +			msidesc->irq = 0;
> +	}
> +	irq_domain_free_irqs_top(domain, virq, nr_irqs);
> +}
> +
> +static int msi_domain_activate(struct irq_domain *domain,
> +			       struct irq_data *irq_data)
> +{
> +	int ret = 0;
> +	struct msi_msg msg;
> +
> +	/*
> +	 * irq_data->chip_data is MSI/MSIx offset.

"MSI-X", as you wrote on the next line.

> +	 * MSI-X message is written per-IRQ, the offset is always 0.
> +	 * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
> +	 */
> +	if (!irq_data->chip_data) {

	if (irq_data->chip_data)
		return 0;

and un-indent the mainline code below, and drop the "ret = 0" init above.

> +		ret = irq_chip_compose_msi_msg(irq_data, &msg);
> +		if (ret == 0)

	if (ret)
		return ret;

> +			write_msi_msg(irq_data->irq, &msg);
> +	}
> +
> +	return ret;
	return 0;
> +}
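
Putting the three comments on msi_domain_activate() together gives
roughly the following.  This is a userspace sketch: compose_msg() and
write_msg() are hypothetical stand-ins for irq_chip_compose_msi_msg()
and write_msi_msg(), the message contents are arbitrary, and the
"offset" parameter plays the role of irq_data->chip_data.

```c
#include <errno.h>

struct msi_msg_sketch {
	unsigned int address_lo;
	unsigned int address_hi;
	unsigned int data;
};

static unsigned int writes;	/* messages written so far */

/* Hypothetical composer: fails for IRQ 0, fills arbitrary values otherwise. */
static int compose_msg(unsigned int irq, struct msi_msg_sketch *msg)
{
	if (!irq)
		return -EINVAL;
	msg->address_lo = 0x1000;
	msg->address_hi = 0;
	msg->data = irq;
	return 0;
}

/* Hypothetical writer: only counts calls. */
static void write_msg(unsigned int irq, const struct msi_msg_sketch *msg)
{
	(void)irq;
	(void)msg;
	writes++;
}

static int msi_activate_sketch(unsigned int irq, unsigned long offset)
{
	struct msi_msg_sketch msg;
	int ret;

	/* Only the 0th IRQ of a multi-MSI block carries the message. */
	if (offset)
		return 0;

	ret = compose_msg(irq, &msg);
	if (ret)
		return ret;

	write_msg(irq, &msg);
	return 0;
}
```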
> +
> +static int msi_domain_deactivate(struct irq_domain *domain,
> +				 struct irq_data *irq_data)
> +{
> +	struct msi_msg msg;
> +
> +	if (irq_data->chip_data) {
> +		memset(&msg, 0, sizeof(msg));
> +		write_msi_msg(irq_data->irq, &msg);
> +	}
> +
> +	return 0;
> +}
> +
> +static struct irq_domain_ops msi_domain_ops = {
> +	.alloc = msi_domain_alloc,
> +	.free = msi_domain_free,
> +	.activate = msi_domain_activate,
> +	.deactivate = msi_domain_deactivate,
> +};
> +
> +struct irq_domain *msi_create_irq_domain(struct irq_domain *parent)
> +{
> +	struct irq_domain *domain;
> +
> +	domain = irq_domain_add_tree(NULL, &msi_domain_ops, NULL);
> +	if (domain)

	if (!domain)
		return NULL;

and un-indent this:

> +		domain->parent = parent;
> +
> +	return domain;
> +}
> +
> +int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
> +			      struct pci_dev *dev, void *arg)
> +{
> +	int i, virq;
> +	struct msi_desc *msidesc;
> +	int node = dev_to_node(&dev->dev);
> +
> +	list_for_each_entry(msidesc, &dev->msi_list, list) {
> +		arch_msi_irq_domain_set_hwirq(arg, msi_get_hwirq(dev, msidesc));
> +		virq = irq_domain_alloc_irqs(domain, msidesc->nvec_used,
> +					     node, arg);
> +		if (virq < 0) {
> +			/* Special handling for pci_enable_msi_range(). */
> +			return (type == PCI_CAP_ID_MSI &&
> +				msidesc->nvec_used > 1) ?  1 : -ENOSPC;	

I think "if" would be easier to read than this ternary expression.
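
A sketch of the same decision written with "if", pulled out into a
helper so it can be read on its own.  msi_alloc_failure_code() is a
hypothetical name; the capability IDs are the standard PCI values.  My
understanding is that pci_enable_msi_range() treats a positive return
as "retry with this many vectors", which is what the comment in the
patch refers to.

```c
#include <errno.h>

#define PCI_CAP_ID_MSI	0x05
#define PCI_CAP_ID_MSIX	0x11

/* Return value to use when allocating IRQs for one descriptor failed. */
static int msi_alloc_failure_code(int type, unsigned int nvec_used)
{
	/* Multi-MSI request failed: ask the caller to retry with 1 vector. */
	if (type == PCI_CAP_ID_MSI && nvec_used > 1)
		return 1;

	return -ENOSPC;
}
```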

> +		}
> +		for (i = 0; i < msidesc->nvec_used; i++)
> +			irq_set_msi_desc_off(virq + i, i, msidesc);
> +	}
> +
> +	list_for_each_entry(msidesc, &dev->msi_list, list)
> +		if (msidesc->nvec_used == 1)
> +			dev_dbg(&dev->dev, "irq %d for MSI/MSI-X\n", virq);
> +		else
> +			dev_dbg(&dev->dev, "irq [%d-%d] for MSI/MSI-X\n",
> +				virq, virq + msidesc->nvec_used - 1);
> +
> +	return 0;
> +}
> +#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 44f4746d033b..05dcd425f82b 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -75,4 +75,15 @@ struct msi_chip {
>  	void (*teardown_irq)(struct msi_chip *chip, unsigned int irq);
>  };
>  
> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN

Use a space here, not a tab.

> +extern struct irq_chip msi_chip;

I don't think "msi_chip" is a good name.  "Chip" only hints that it's a
semiconductor integrated circuit; it doesn't say anything about what it
does.  I've suggested "msi_controller" elsewhere.

Why does this need to be exported?  And why should there be only one in a
system?

> +extern struct irq_domain *msi_create_irq_domain(struct irq_domain *parent);
> +extern int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
> +				     struct pci_dev *dev, void *arg);
> +
> +extern irq_hw_number_t arch_msi_irq_domain_get_hwirq(void *arg);
> +extern void arch_msi_irq_domain_set_hwirq(void *arg, irq_hw_number_t hwirq);

Look at the rest of the file and notice that the existing code does not use
"extern" on function declarations.

> +#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */

Use a space here (not a tab), like the #endif just below.

>  #endif /* LINUX_MSI_H */
> -- 
> 1.7.10.4
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 01/31] irqdomain: Introduce new interfaces to support hierarchy irqdomains
  2014-11-04 12:01 ` [Patch Part2 v4 01/31] irqdomain: Introduce new interfaces to support hierarchy irqdomains Jiang Liu
@ 2014-11-05 23:48   ` Thomas Gleixner
  2014-11-06  6:09     ` Jiang Liu
  0 siblings, 1 reply; 65+ messages in thread
From: Thomas Gleixner @ 2014-11-05 23:48 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Ingo Molnar, H. Peter Anvin,
	Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Jonathan Corbet, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86,
	linux-kernel, linux-pci, linux-acpi, linux-arm-kernel, linux-doc

On Tue, 4 Nov 2014, Jiang Liu wrote:
>  /* Number of irqs reserved for a legacy isa controller */
>  #define NUM_ISA_INTERRUPTS	16
> @@ -64,6 +66,16 @@ struct irq_domain_ops {
>  	int (*xlate)(struct irq_domain *d, struct device_node *node,
>  		     const u32 *intspec, unsigned int intsize,
>  		     unsigned long *out_hwirq, unsigned int *out_type);
> +
> +#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
> +	/* extended V2 interfaces to support hierarchy irq_domains */
> +	int (*alloc)(struct irq_domain *d, unsigned int virq,
> +		     unsigned int nr_irqs, void *arg);
> +	void (*free)(struct irq_domain *d, unsigned int virq,
> +		     unsigned int nr_irqs);
> +	int (*activate)(struct irq_domain *d, struct irq_data *irq_data);
> +	int (*deactivate)(struct irq_domain *d, struct irq_data *irq_data);

Why do we have a return value here? Especially the deactivate one
makes no sense at all.

> +extern int irq_domain_activate_irq(struct irq_data *irq_data);
> +extern int irq_domain_deactivate_irq(struct irq_data *irq_data);

And here.

> @@ -178,6 +179,7 @@ int irq_startup(struct irq_desc *desc, bool resend)
>  	irq_state_clr_disabled(desc);
>  	desc->depth = 0;
>  
> +	irq_domain_activate_irq(&desc->irq_data);

We do not check it, and we cannot do so here AFAICT.

>  	if (desc->irq_data.chip->irq_startup) {
>  		ret = desc->irq_data.chip->irq_startup(&desc->irq_data);
>  		irq_state_clr_masked(desc);
> @@ -199,6 +201,7 @@ void irq_shutdown(struct irq_desc *desc)
>  		desc->irq_data.chip->irq_disable(&desc->irq_data);
>  	else
>  		desc->irq_data.chip->irq_mask(&desc->irq_data);
> +	irq_domain_deactivate_irq(&desc->irq_data);

Ditto.

So the return value for irq_domain_deactivate_irq() is silly to begin
with, but also the return value for irq_domain_activate_irq() does not
really make sense. We've allocated the resources for the interrupt
already down the hierarchy chain. So there is no reason why the actual
activation should fail.

Thanks,

	tglx
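
For illustration, one possible shape of the callbacks without return
values, as a userspace sketch.  Every name here is hypothetical, and
the walk order (parent first on activation, child first on
deactivation) is an assumption rather than something this thread
settles.

```c
#include <stddef.h>

struct irq_data_sketch;

struct domain_ops_sketch {
	void (*activate)(struct irq_data_sketch *d);
	void (*deactivate)(struct irq_data_sketch *d);
};

struct irq_data_sketch {
	const struct domain_ops_sketch *ops;
	struct irq_data_sketch *parent;	/* next level up the hierarchy */
	int active;
};

/* Activate the whole chain, outermost (root) domain first. */
static void activate_irq_sketch(struct irq_data_sketch *d)
{
	if (!d)
		return;
	activate_irq_sketch(d->parent);
	if (d->ops && d->ops->activate)
		d->ops->activate(d);
}

/* Deactivate the whole chain, innermost (leaf) domain first. */
static void deactivate_irq_sketch(struct irq_data_sketch *d)
{
	if (!d)
		return;
	if (d->ops && d->ops->deactivate)
		d->ops->deactivate(d);
	deactivate_irq_sketch(d->parent);
}

static void mark_active(struct irq_data_sketch *d)
{
	d->active = 1;
}

static void mark_inactive(struct irq_data_sketch *d)
{
	d->active = 0;
}

static const struct domain_ops_sketch sketch_ops = {
	.activate = mark_active,
	.deactivate = mark_inactive,
};
```

Since the resources were already reserved by the alloc step, nothing
in this path has a failure to report, so irq_startup() and
irq_shutdown() have no return value to ignore.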

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 20/31] PCI/MSI: Kill redundant calling for irq_set_msi_desc() for MSIx interrupts
  2014-11-05 22:45   ` Bjorn Helgaas
@ 2014-11-06  1:32     ` Yijing Wang
  2014-11-06  4:04       ` Bjorn Helgaas
  2014-11-06  4:31       ` Jiang Liu
  0 siblings, 2 replies; 65+ messages in thread
From: Yijing Wang @ 2014-11-06  1:32 UTC (permalink / raw)
  To: Bjorn Helgaas, Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Konrad Rzeszutek Wilk, Andrew Morton,
	Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

On 2014/11/6 6:45, Bjorn Helgaas wrote:
> On Tue, Nov 04, 2014 at 08:01:54PM +0800, Jiang Liu wrote:
>> It's arch_setup_msi_irq()/arch_setup_msi_irqs()'s responsibility to call
>> irq_set_msi_desc() to associate IRQ descriptors and MSI descriptors,
>> so kill the redundant call of irq_set_msi_desc() for MSIx interrupts
>> in PCI MSI core.
> 
> "MSI-X" in English text, "msix" in code.
> 
> The default arch_setup_msi_irq() in drivers/pci/msi.c doesn't call
> irq_set_msi_desc().  Does it happen somewhere inside chip->setup_irq()?

Yes.

I also found this.
http://www.spinics.net/lists/linux-pci/msg34256.html

> 
> I don't know how to verify that there are calls in all the places needed.
> That makes me wonder if the factoring is wrong -- maybe irq_set_msi_desc()
> could be done in some common place.

I think placing irq_set_msi_desc() in the common MSI core would be fine,
but currently almost all arch MSI code calls irq_set_msi_desc() itself,
so a lot of code would need to change.  Arch code sets up MSI and MSI-X
in the same way, so if MSI works without
irq_set_msi_desc(entry->irq, entry) in the common MSI code, MSI-X should
work the same way.

> 
>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>> ---
>>  drivers/pci/msi.c |    1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index afe974600c7d..da181c59394b 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -685,7 +685,6 @@ static void msix_program_entries(struct pci_dev *dev,
>>  						PCI_MSIX_ENTRY_VECTOR_CTRL;
>>  
>>  		entries[i].vector = entry->irq;
>> -		irq_set_msi_desc(entry->irq, entry);
>>  		entry->masked = readl(entry->mask_base + offset);
>>  		msix_mask_irq(entry, 1);
>>  		i++;
>> -- 
>> 1.7.10.4
>>


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-05 23:09   ` Bjorn Helgaas
@ 2014-11-06  1:58     ` Yijing Wang
  2014-11-06  4:10       ` Bjorn Helgaas
  2014-11-06  5:06       ` Jiang Liu
  2014-11-06  4:58     ` Jiang Liu
  1 sibling, 2 replies; 65+ messages in thread
From: Yijing Wang @ 2014-11-06  1:58 UTC (permalink / raw)
  To: Bjorn Helgaas, Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Alexander Gordeev, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86,
	linux-kernel, linux-pci, linux-acpi, linux-arm-kernel

>>  
>> @@ -1098,3 +1099,128 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
>>  	return nvec;
>>  }
>>  EXPORT_SYMBOL(pci_enable_msix_range);
>> +
>> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
> 
> Space, not tab.
> 
>> +static inline irq_hw_number_t
>> +msi_get_hwirq(struct pci_dev *pdev, struct msi_desc *msidesc)
> 
> The convention in this file is "struct pci_dev *dev".  And "struct msi_desc
> *desc" (or maybe "*entry").  Try to converge things, not diverge them.
> 
>> +{
>> +	return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
>> +		PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
>> +		(pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
> 
> Where does this bit layout come from?  Is this defined in the spec
> somewhere?  A reference would help.

More and more non-PCI devices use MSI (or a similar mechanism), for
example the DMAR fault IRQ and the HPET FSB IRQ, and each of them needs
additional code to support the MSI capability.  So I hope we can
decouple the MSI code from the PCI code and then unify all MSI
(message-based interrupt) handling in one framework.

Thanks!
Yijing.

> 
>> +}
>> +
>> +static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
>> +			    unsigned int nr_irqs, void *arg)
>> +{
>> +	int i, ret;
>> +	irq_hw_number_t hwirq = arch_msi_irq_domain_get_hwirq(arg);
>> +
>> +	if (irq_find_mapping(domain, hwirq) > 0)
>> +		return -EEXIST;
>> +
>> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
>> +	if (ret >= 0)
> 
> 	if (ret < 0)
> 		return ret;
> 
> and un-indent the mainline code below.  Then it's obvious that this is the
> normal case, not the error case.
> 
>> +		for (i = 0; i < nr_irqs; i++) {
>> +			irq_domain_set_hwirq_and_chip(domain, virq + i,
>> +					hwirq + i, &msi_chip, (void *)(long)i);
>> +			__irq_set_handler(virq + i, handle_edge_irq, 0, "edge");
>> +		}
>> +
>> +	return ret;
>> +}
>> +
>> +static void msi_domain_free(struct irq_domain *domain, unsigned int virq,
>> +			    unsigned int nr_irqs)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < nr_irqs; i++) {
>> +		struct msi_desc *msidesc = irq_get_msi_desc(virq);
>> +
>> +		if (msidesc)
>> +			msidesc->irq = 0;
>> +	}
>> +	irq_domain_free_irqs_top(domain, virq, nr_irqs);
>> +}
>> +
>> +static int msi_domain_activate(struct irq_domain *domain,
>> +			       struct irq_data *irq_data)
>> +{
>> +	int ret = 0;
>> +	struct msi_msg msg;
>> +
>> +	/*
>> +	 * irq_data->chip_data is MSI/MSIx offset.
> 
> "MSI-X", as you wrote on the next line.
> 
>> +	 * MSI-X message is written per-IRQ, the offset is always 0.
>> +	 * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
>> +	 */
>> +	if (!irq_data->chip_data) {
> 
> 	if (irq_data->chip_data)
> 		return 0;
> 
> and un-indent the mainline code below, and drop the "ret = 0" init above.
> 
>> +		ret = irq_chip_compose_msi_msg(irq_data, &msg);
>> +		if (ret == 0)
> 
> 	if (ret)
> 		return ret;
> 
>> +			write_msi_msg(irq_data->irq, &msg);
>> +	}
>> +
>> +	return ret;
> 	return 0;
>> +}
>> +
>> +static int msi_domain_deactivate(struct irq_domain *domain,
>> +				 struct irq_data *irq_data)
>> +{
>> +	struct msi_msg msg;
>> +
>> +	if (irq_data->chip_data) {
>> +		memset(&msg, 0, sizeof(msg));
>> +		write_msi_msg(irq_data->irq, &msg);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static struct irq_domain_ops msi_domain_ops = {
>> +	.alloc = msi_domain_alloc,
>> +	.free = msi_domain_free,
>> +	.activate = msi_domain_activate,
>> +	.deactivate = msi_domain_deactivate,
>> +};
>> +
>> +struct irq_domain *msi_create_irq_domain(struct irq_domain *parent)
>> +{
>> +	struct irq_domain *domain;
>> +
>> +	domain = irq_domain_add_tree(NULL, &msi_domain_ops, NULL);
>> +	if (domain)
> 
> 	if (!domain)
> 		return NULL;
> 
> and un-indent this:
> 
>> +		domain->parent = parent;
>> +
>> +	return domain;
>> +}
>> +
>> +int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
>> +			      struct pci_dev *dev, void *arg)
>> +{
>> +	int i, virq;
>> +	struct msi_desc *msidesc;
>> +	int node = dev_to_node(&dev->dev);
>> +
>> +	list_for_each_entry(msidesc, &dev->msi_list, list) {
>> +		arch_msi_irq_domain_set_hwirq(arg, msi_get_hwirq(dev, msidesc));
>> +		virq = irq_domain_alloc_irqs(domain, msidesc->nvec_used,
>> +					     node, arg);
>> +		if (virq < 0) {
>> +			/* Special handling for pci_enable_msi_range(). */
>> +			return (type == PCI_CAP_ID_MSI &&
>> +				msidesc->nvec_used > 1) ?  1 : -ENOSPC;	
> 
> I think "if" would be easier to read than this ternary expression.
> 
>> +		}
>> +		for (i = 0; i < msidesc->nvec_used; i++)
>> +			irq_set_msi_desc_off(virq + i, i, msidesc);
>> +	}
>> +
>> +	list_for_each_entry(msidesc, &dev->msi_list, list)
>> +		if (msidesc->nvec_used == 1)
>> +			dev_dbg(&dev->dev, "irq %d for MSI/MSI-X\n", virq);
>> +		else
>> +			dev_dbg(&dev->dev, "irq [%d-%d] for MSI/MSI-X\n",
>> +				virq, virq + msidesc->nvec_used - 1);
>> +
>> +	return 0;
>> +}
>> +#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
>> diff --git a/include/linux/msi.h b/include/linux/msi.h
>> index 44f4746d033b..05dcd425f82b 100644
>> --- a/include/linux/msi.h
>> +++ b/include/linux/msi.h
>> @@ -75,4 +75,15 @@ struct msi_chip {
>>  	void (*teardown_irq)(struct msi_chip *chip, unsigned int irq);
>>  };
>>  
>> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
> 
> Use a space here, not a tab.
> 
>> +extern struct irq_chip msi_chip;
> 
> I don't think "msi_chip" is a good name.  "Chip" only hints that it's a
> semiconductor integrated circuit; it doesn't say anything about what it
> does.  I've suggested "msi_controller" elsewhere.
> 
> Why does this need to be exported?  And why should there be only one in a
> system?
> 
>> +extern struct irq_domain *msi_create_irq_domain(struct irq_domain *parent);
>> +extern int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
>> +				     struct pci_dev *dev, void *arg);
>> +
>> +extern irq_hw_number_t arch_msi_irq_domain_get_hwirq(void *arg);
>> +extern void arch_msi_irq_domain_set_hwirq(void *arg, irq_hw_number_t hwirq);
> 
> Look at the rest of the file and notice that the existing code does not use
> "extern" on function declarations.
> 
>> +#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
> 
> Use a space here (not a tab), like the #endif just below.
> 
>>  #endif /* LINUX_MSI_H */
>> -- 
>> 1.7.10.4
>>
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 20/31] PCI/MSI: Kill redundant calling for irq_set_msi_desc() for MSIx interrupts
  2014-11-06  1:32     ` Yijing Wang
@ 2014-11-06  4:04       ` Bjorn Helgaas
  2014-11-06  4:31       ` Jiang Liu
  1 sibling, 0 replies; 65+ messages in thread
From: Bjorn Helgaas @ 2014-11-06  4:04 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Jiang Liu, Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Konrad Rzeszutek Wilk, Andrew Morton,
	Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm

On Wed, Nov 5, 2014 at 6:32 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> On 2014/11/6 6:45, Bjorn Helgaas wrote:
>> On Tue, Nov 04, 2014 at 08:01:54PM +0800, Jiang Liu wrote:
>>> It's arch_setup_msi_irq()/arch_setup_msi_irqs()'s responsibility to call
>>> irq_set_msi_desc() to associate IRQ descriptors and MSI descriptors,
>>> so kill the redundant call of irq_set_msi_desc() for MSIx interrupts
>>> in PCI MSI core.
>>
>> "MSI-X" in English text, "msix" in code.
>>
>> The default arch_setup_msi_irq() in drivers/pci/msi.c doesn't call
>> irq_set_msi_desc().  Does it happen somewhere inside chip->setup_irq()?
>
> Yes.
>
> I also found this.
> http://www.spinics.net/lists/linux-pci/msg34256.html

Yes, and I asked the same question then :)

It's just impractical to review things like this that make assumptions
about lots of code scattered all over the place with no direct linkage
to the change.

Bjorn

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-06  1:58     ` Yijing Wang
@ 2014-11-06  4:10       ` Bjorn Helgaas
  2014-11-06  4:54         ` Yijing Wang
  2014-11-06  5:06       ` Jiang Liu
  1 sibling, 1 reply; 65+ messages in thread
From: Bjorn Helgaas @ 2014-11-06  4:10 UTC (permalink / raw)
  To: Yijing Wang
  Cc: Jiang Liu, Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Alexander Gordeev, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86,
	linux-kernel, linux-pci, linux-acpi, linux-arm

On Wed, Nov 5, 2014 at 6:58 PM, Yijing Wang <wangyijing@huawei.com> wrote:

>>> +{
>>> +    return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
>>> +            PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
>>> +            (pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
>>
>> Where does this bit layout come from?  Is this defined in the spec
>> somewhere?  A reference would help.
>
> Currently, more and more Non-PCI device use MSI(or similar MSI mechanism), like DMAR fault irq
> and HPET FSB irq. And we have to add additional code to support the MSI capability.
> So I hope we can decouple MSI code and PCI code, then we can unify all MSI(or Message Based interrupt)
> in one framework.

Was that supposed to answer my question?  If so, I didn't understand
how it explains where the bit layout came from.

Bjorn

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 20/31] PCI/MSI: Kill redundant calling for irq_set_msi_desc() for MSIx interrupts
  2014-11-06  1:32     ` Yijing Wang
  2014-11-06  4:04       ` Bjorn Helgaas
@ 2014-11-06  4:31       ` Jiang Liu
  1 sibling, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-06  4:31 UTC (permalink / raw)
  To: Yijing Wang, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Konrad Rzeszutek Wilk, Andrew Morton,
	Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

On 2014/11/6 9:32, Yijing Wang wrote:
> On 2014/11/6 6:45, Bjorn Helgaas wrote:
>> On Tue, Nov 04, 2014 at 08:01:54PM +0800, Jiang Liu wrote:
>>> It's arch_setup_msi_irq()/arch_setup_msi_irqs()'s responsibility to call
>>> irq_set_msi_desc() to associate IRQ descriptors and MSI descriptors,
>>> so kill the redundant call of irq_set_msi_desc() for MSIx interrupts
>>> in PCI MSI core.
>>
>> "MSI-X" in English text, "msix" in code.
>>
>> The default arch_setup_msi_irq() in drivers/pci/msi.c doesn't call
>> irq_set_msi_desc().  Does it happen somewhere inside chip->setup_irq()?
> 
> Yes.
> 
> I also found this.
> http://www.spinics.net/lists/linux-pci/msg34256.html
> 
>>
>> I don't know how to verify that there are calls in all the places needed.
>> That makes me wonder if the factoring is wrong -- maybe irq_set_msi_desc()
>> could be done in some common place.
> 
> In my idea, place the irq_set_msi_desc() in common MSI core is ok, but currently almost
> all MSI arch code call irq_set_msi_desc() in arch code. So a lot of code need to change.
> And arch code setup MSI and MSI-X in the same way, so if MSI work happy without irq_set_msi_desc(entry->irq, entry)
> in common MSI code, MSI-X should be the same.
Hi Bjorn and Yijing,
	My original plan was to move irq_set_msi_desc() into the common
PCI MSI code. But when implementing this, I found that every
arch_setup_msi_irq()/arch_setup_msi_irqs() needs to set msidesc->irq,
and thus needs to lock the irq descriptor. So I realized we should
rely on arch_setup_msi_irq() to call irq_set_msi_desc() :)
Regards!
Gerry

> 
>>
>>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>>> ---
>>>  drivers/pci/msi.c |    1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>>> index afe974600c7d..da181c59394b 100644
>>> --- a/drivers/pci/msi.c
>>> +++ b/drivers/pci/msi.c
>>> @@ -685,7 +685,6 @@ static void msix_program_entries(struct pci_dev *dev,
>>>  						PCI_MSIX_ENTRY_VECTOR_CTRL;
>>>  
>>>  		entries[i].vector = entry->irq;
>>> -		irq_set_msi_desc(entry->irq, entry);
>>>  		entry->masked = readl(entry->mask_base + offset);
>>>  		msix_mask_irq(entry, 1);
>>>  		i++;
>>> -- 
>>> 1.7.10.4
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> .
>>
> 
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-06  4:10       ` Bjorn Helgaas
@ 2014-11-06  4:54         ` Yijing Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Yijing Wang @ 2014-11-06  4:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jiang Liu, Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Alexander Gordeev, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86,
	linux-kernel, linux-pci, linux-acpi, linux-arm

On 2014/11/6 12:10, Bjorn Helgaas wrote:
> On Wed, Nov 5, 2014 at 6:58 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> 
>>>> +{
>>>> +    return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
>>>> +            PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
>>>> +            (pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
>>>
>>> Where does this bit layout come from?  Is this defined in the spec
>>> somewhere?  A reference would help.
>>
>> Currently, more and more Non-PCI device use MSI(or similar MSI mechanism), like DMAR fault irq
>> and HPET FSB irq. And we have to add additional code to support the MSI capability.
>> So I hope we can decouple MSI code and PCI code, then we can unify all MSI(or Message Based interrupt)
>> in one framework.
> 
> Was that supposed to answer my question?  If so, I didn't understand
> how it explains where the bit layout came from.

No, that was a separate concern of mine: this function uses the PCI
device ID, but more and more non-PCI devices use MSI.

> 
> Bjorn


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-05 23:09   ` Bjorn Helgaas
  2014-11-06  1:58     ` Yijing Wang
@ 2014-11-06  4:58     ` Jiang Liu
  2014-11-06  5:28       ` Bjorn Helgaas
  1 sibling, 1 reply; 65+ messages in thread
From: Jiang Liu @ 2014-11-06  4:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Yijing Wang, Alexander Gordeev,
	Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm-kernel

On 2014/11/6 7:09, Bjorn Helgaas wrote:
> On Tue, Nov 04, 2014 at 08:01:55PM +0800, Jiang Liu wrote:
> 
> In your topic:
> 
>   PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
> 
> There's no need to repeat "PCI MSI".  Please run "git log --oneline
> drivers/pci/msi.c" and make your similar (capitalize the first word).
Hi Bjorn,
	I have been trying to be very careful about commit log messages
since your earlier guidance, but still missed this one :(. Will be even
more careful next time.

> 
>> Enhance PCI MSI core to support hierarchy irqdomain, so the common
>> code could be shared among architectures.
>>
>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>> ---
>>  drivers/pci/Kconfig |    4 ++
>>  drivers/pci/msi.c   |  126 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/msi.h |   11 +++++
>>  3 files changed, 141 insertions(+)
>>
>> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
>> index b9db0f2ce11f..022e89745f86 100644
>> --- a/drivers/pci/Kconfig
>> +++ b/drivers/pci/Kconfig
>> @@ -16,6 +16,10 @@ config PCI_MSI
>>  
>>  	   If you don't know what to do here, say Y.
>>  
>> +config PCI_MSI_IRQ_DOMAIN
>> +	bool
>> +	depends on PCI_MSI && IRQ_DOMAIN_HIERARCHY
>> +
>>  config PCI_DEBUG
>>  	bool "PCI Debugging"
>>  	depends on PCI && DEBUG_KERNEL
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index da181c59394b..7423ee16972f 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -19,6 +19,7 @@
>>  #include <linux/errno.h>
>>  #include <linux/io.h>
>>  #include <linux/slab.h>
>> +#include <linux/irqdomain.h>
>>  
>>  #include "pci.h"
>>  
>> @@ -1098,3 +1099,128 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
>>  	return nvec;
>>  }
>>  EXPORT_SYMBOL(pci_enable_msix_range);
>> +
>> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
> 
> Space, not tab.
Will fix it in the next version.

> 
>> +static inline irq_hw_number_t
>> +msi_get_hwirq(struct pci_dev *pdev, struct msi_desc *msidesc)
> 
> The convention in this file is "struct pci_dev *dev".  And "struct msi_desc
> *desc" (or maybe "*entry").  Try to converge things, not diverge them.
Thanks for the reminder. I'm adding another item to my checklist to go
through before sending out patches :)

> 
>> +{
>> +	return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
>> +		PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
>> +		(pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
> 
> Where does this bit layout come from?  Is this defined in the spec
> somewhere?  A reference would help.
We need a unique number to identify every possible MSI source,
and this ID number is only used within the irqdomain subsystem.
So we used the above algorithm to generate the ID number; there's
no specification for it.
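
To make the layout concrete, here is a minimal standalone sketch of the
packing described above. It is not the kernel code: hwirq_id_t,
sketch_pci_devid() and sketch_msi_hwirq() are hypothetical names, and
unlike the patch it widens every field to 64 bits before shifting. The
field widths in the comment are inferred from the shift counts (11 and
27), not taken from any specification.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's irq_hw_number_t. */
typedef uint64_t hwirq_id_t;

/* Same packing as PCI_DEVID(): bus number in bits 15:8, devfn in bits 7:0. */
static inline uint16_t sketch_pci_devid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

/*
 * Software-only ID used to tell MSI sources apart (assumed layout):
 *   bits 10:0   MSI/MSI-X entry number
 *   bits 26:11  bus/devfn
 *   bits 58:27  PCI domain number
 */
static inline hwirq_id_t sketch_msi_hwirq(uint32_t domain_nr, uint8_t bus,
					  uint8_t devfn, uint16_t entry_nr)
{
	return (hwirq_id_t)(entry_nr & 0x7ff) |
	       ((hwirq_id_t)sketch_pci_devid(bus, devfn) << 11) |
	       ((hwirq_id_t)domain_nr << 27);
}
```

Because the three fields occupy disjoint bit ranges, two sources that
differ in domain, bus/devfn or entry number get different IDs, which is
the only property the irqdomain lookup relies on here.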

> 
>> +}
>> +
>> +static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
>> +			    unsigned int nr_irqs, void *arg)
>> +{
>> +	int i, ret;
>> +	irq_hw_number_t hwirq = arch_msi_irq_domain_get_hwirq(arg);
>> +
>> +	if (irq_find_mapping(domain, hwirq) > 0)
>> +		return -EEXIST;
>> +
>> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
>> +	if (ret >= 0)
> 
> 	if (ret < 0)
> 		return ret;
> 
> and un-indent the mainline code below.  Then it's obvious that this is the
> normal case, not the error case.
Sure. I wanted to use only one return statement, but didn't realize that
the structure makes it look like error handling :)
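
For illustration, the early-return shape suggested above could look
roughly like the standalone sketch below. The parent allocation and the
per-IRQ setup are replaced by hypothetical stubs (sketch_alloc_parent()
and a counter), so this shows only the control flow, not the real
irqdomain calls.

```c
#include <errno.h>

/* Hypothetical stub standing in for the parent-domain allocation. */
static int sketch_alloc_parent(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/*
 * Error case first, then the normal path at the top indentation level.
 * *configured counts how many IRQs were set up (stand-in for the
 * per-IRQ chip and handler setup in the patch).
 */
static int sketch_domain_alloc(int parent_fails, unsigned int nr_irqs,
			       unsigned int *configured)
{
	unsigned int i;
	int ret;

	*configured = 0;

	ret = sketch_alloc_parent(parent_fails);
	if (ret < 0)
		return ret;

	for (i = 0; i < nr_irqs; i++)
		(*configured)++;

	return ret;
}
```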

> 
>> +		for (i = 0; i < nr_irqs; i++) {
>> +			irq_domain_set_hwirq_and_chip(domain, virq + i,
>> +					hwirq + i, &msi_chip, (void *)(long)i);
>> +			__irq_set_handler(virq + i, handle_edge_irq, 0, "edge");
>> +		}
>> +
>> +	return ret;
>> +}
>> +
>> +static void msi_domain_free(struct irq_domain *domain, unsigned int virq,
>> +			    unsigned int nr_irqs)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < nr_irqs; i++) {
>> +		struct msi_desc *msidesc = irq_get_msi_desc(virq);
>> +
>> +		if (msidesc)
>> +			msidesc->irq = 0;
>> +	}
>> +	irq_domain_free_irqs_top(domain, virq, nr_irqs);
>> +}
>> +
>> +static int msi_domain_activate(struct irq_domain *domain,
>> +			       struct irq_data *irq_data)
>> +{
>> +	int ret = 0;
>> +	struct msi_msg msg;
>> +
>> +	/*
>> +	 * irq_data->chip_data is MSI/MSIx offset.
> 
> "MSI-X", as you wrote on the next line.
Sure.

> 
>> +	 * MSI-X message is written per-IRQ, the offset is always 0.
>> +	 * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
>> +	 */
>> +	if (!irq_data->chip_data) {
> 
> 	if (irq_data->chip_data)
> 		return 0;
> 
> and un-indent the mainline code below, and drop the "ret = 0" init above.
> 
>> +		ret = irq_chip_compose_msi_msg(irq_data, &msg);
>> +		if (ret == 0)
> 
> 	if (ret)
> 		return ret;
> 
>> +			write_msi_msg(irq_data->irq, &msg);
>> +	}
>> +
>> +	return ret;
> 	return 0;
>> +}
>> +
>> +static int msi_domain_deactivate(struct irq_domain *domain,
>> +				 struct irq_data *irq_data)
>> +{
>> +	struct msi_msg msg;
>> +
>> +	if (irq_data->chip_data) {
>> +		memset(&msg, 0, sizeof(msg));
>> +		write_msi_msg(irq_data->irq, &msg);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static struct irq_domain_ops msi_domain_ops = {
>> +	.alloc = msi_domain_alloc,
>> +	.free = msi_domain_free,
>> +	.activate = msi_domain_activate,
>> +	.deactivate = msi_domain_deactivate,
>> +};
>> +
>> +struct irq_domain *msi_create_irq_domain(struct irq_domain *parent)
>> +{
>> +	struct irq_domain *domain;
>> +
>> +	domain = irq_domain_add_tree(NULL, &msi_domain_ops, NULL);
>> +	if (domain)
> 
> 	if (!domain)
> 		return NULL;
> 
> and un-indent this:
> 
>> +		domain->parent = parent;
>> +
>> +	return domain;
>> +}
>> +
>> +int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
>> +			      struct pci_dev *dev, void *arg)
>> +{
>> +	int i, virq;
>> +	struct msi_desc *msidesc;
>> +	int node = dev_to_node(&dev->dev);
>> +
>> +	list_for_each_entry(msidesc, &dev->msi_list, list) {
>> +		arch_msi_irq_domain_set_hwirq(arg, msi_get_hwirq(dev, msidesc));
>> +		virq = irq_domain_alloc_irqs(domain, msidesc->nvec_used,
>> +					     node, arg);
>> +		if (virq < 0) {
>> +			/* Special handling for pci_enable_msi_range(). */
>> +			return (type == PCI_CAP_ID_MSI &&
>> +				msidesc->nvec_used > 1) ?  1 : -ENOSPC;	
> 
> I think "if" would be easier to read than this ternary expression.
Sure.
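
A minimal sketch of the "if" form, written as a standalone helper so it
can be read in isolation. sketch_alloc_failure_code() and the SKETCH_
constants are hypothetical names; the constant values mirror
PCI_CAP_ID_MSI (0x05) and PCI_CAP_ID_MSIX (0x11). My reading is that
the positive return value tells pci_enable_msi_range() how many vectors
to retry with, but that is an assumption about the caller.

```c
#include <errno.h>

#define SKETCH_PCI_CAP_ID_MSI	0x05
#define SKETCH_PCI_CAP_ID_MSIX	0x11

/* Return value to hand back when allocating the IRQs failed. */
static int sketch_alloc_failure_code(int type, unsigned int nvec_used)
{
	/* Special handling for pci_enable_msi_range(). */
	if (type == SKETCH_PCI_CAP_ID_MSI && nvec_used > 1)
		return 1;

	return -ENOSPC;
}
```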

> 
>> +		}
>> +		for (i = 0; i < msidesc->nvec_used; i++)
>> +			irq_set_msi_desc_off(virq + i, i, msidesc);
>> +	}
>> +
>> +	list_for_each_entry(msidesc, &dev->msi_list, list)
>> +		if (msidesc->nvec_used == 1)
>> +			dev_dbg(&dev->dev, "irq %d for MSI/MSI-X\n", virq);
>> +		else
>> +			dev_dbg(&dev->dev, "irq [%d-%d] for MSI/MSI-X\n",
>> +				virq, virq + msidesc->nvec_used - 1);
>> +
>> +	return 0;
>> +}
>> +#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
>> diff --git a/include/linux/msi.h b/include/linux/msi.h
>> index 44f4746d033b..05dcd425f82b 100644
>> --- a/include/linux/msi.h
>> +++ b/include/linux/msi.h
>> @@ -75,4 +75,15 @@ struct msi_chip {
>>  	void (*teardown_irq)(struct msi_chip *chip, unsigned int irq);
>>  };
>>  
>> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
> 
> Use a space here, not a tab.
Sure.

> 
>> +extern struct irq_chip msi_chip;
> 
> I don't think "msi_chip" is a good name.  "Chip" only hints that it's a
> semiconductor integrated circuit; it doesn't say anything about what it
> does.  I've suggested "msi_controller" elsewhere.
> 
> Why does this need to be exported?  And why should there be only one in a
> system?
I have changed the interface as below in the next version, so we can
keep "msi_chip" private and support a different irq_chip for each
irqdomain.

struct irq_domain *msi_create_irq_domain(struct device_node *of_node,
                                         struct irq_chip *chip,
                                         struct irq_domain *parent);

> 
>> +extern struct irq_domain *msi_create_irq_domain(struct irq_domain *parent);
>> +extern int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
>> +				     struct pci_dev *dev, void *arg);
>> +
>> +extern irq_hw_number_t arch_msi_irq_domain_get_hwirq(void *arg);
>> +extern void arch_msi_irq_domain_set_hwirq(void *arg, irq_hw_number_t hwirq);
> 
> Look at the rest of the file and notice that the existing code does not use
> "extern" on function declarations.
Sure.

> 
>> +#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
> 
> Use a space here (not a tab), like the #endif just below.
Sure.

Thanks for your review and great comments!
Regards!
Gerry
> 
>>  #endif /* LINUX_MSI_H */
>> -- 
>> 1.7.10.4
>>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-06  1:58     ` Yijing Wang
  2014-11-06  4:10       ` Bjorn Helgaas
@ 2014-11-06  5:06       ` Jiang Liu
  2014-11-06  5:42         ` Yijing Wang
  1 sibling, 1 reply; 65+ messages in thread
From: Jiang Liu @ 2014-11-06  5:06 UTC (permalink / raw)
  To: Yijing Wang, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Alexander Gordeev, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86,
	linux-kernel, linux-pci, linux-acpi, linux-arm-kernel

On 2014/11/6 9:58, Yijing Wang wrote:
>>>  
>>> @@ -1098,3 +1099,128 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
>>>  	return nvec;
>>>  }
>>>  EXPORT_SYMBOL(pci_enable_msix_range);
>>> +
>>> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
>>
>> Space, not tab.
>>
>>> +static inline irq_hw_number_t
>>> +msi_get_hwirq(struct pci_dev *pdev, struct msi_desc *msidesc)
>>
>> The convention in this file is "struct pci_dev *dev".  And "struct msi_desc
>> *desc" (or maybe "*entry").  Try to converge things, not diverge them.
>>
>>> +{
>>> +	return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
>>> +		PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
>>> +		(pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
>>
>> Where does this bit layout come from?  Is this defined in the spec
>> somewhere?  A reference would help.
> 
> Currently, more and more Non-PCI device use MSI(or similar MSI mechanism), like DMAR fault irq
> and HPET FSB irq. And we have to add additional code to support the MSI capability.
> So I hope we can decouple MSI code and PCI code, then we can unify all MSI(or Message Based interrupt)
> in one framework.
Hi Yijing,
	I have a follow-up patch to share more code among MSI/DMAR/HPET,
which is one step in the direction you suggested. I will send out that
patch set soon.
Regards!
Gerry

> 
> Thanks!
> Yijing.
> 
>>
>>> +}
>>> +
>>> +static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
>>> +			    unsigned int nr_irqs, void *arg)
>>> +{
>>> +	int i, ret;
>>> +	irq_hw_number_t hwirq = arch_msi_irq_domain_get_hwirq(arg);
>>> +
>>> +	if (irq_find_mapping(domain, hwirq) > 0)
>>> +		return -EEXIST;
>>> +
>>> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
>>> +	if (ret >= 0)
>>
>> 	if (ret < 0)
>> 		return ret;
>>
>> and un-indent the mainline code below.  Then it's obvious that this is the
>> normal case, not the error case.
>>
>>> +		for (i = 0; i < nr_irqs; i++) {
>>> +			irq_domain_set_hwirq_and_chip(domain, virq + i,
>>> +					hwirq + i, &msi_chip, (void *)(long)i);
>>> +			__irq_set_handler(virq + i, handle_edge_irq, 0, "edge");
>>> +		}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static void msi_domain_free(struct irq_domain *domain, unsigned int virq,
>>> +			    unsigned int nr_irqs)
>>> +{
>>> +	int i;
>>> +
>>> +	for (i = 0; i < nr_irqs; i++) {
>>> +		struct msi_desc *msidesc = irq_get_msi_desc(virq);
>>> +
>>> +		if (msidesc)
>>> +			msidesc->irq = 0;
>>> +	}
>>> +	irq_domain_free_irqs_top(domain, virq, nr_irqs);
>>> +}
>>> +
>>> +static int msi_domain_activate(struct irq_domain *domain,
>>> +			       struct irq_data *irq_data)
>>> +{
>>> +	int ret = 0;
>>> +	struct msi_msg msg;
>>> +
>>> +	/*
>>> +	 * irq_data->chip_data is MSI/MSIx offset.
>>
>> "MSI-X", as you wrote on the next line.
>>
>>> +	 * MSI-X message is written per-IRQ, the offset is always 0.
>>> +	 * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
>>> +	 */
>>> +	if (!irq_data->chip_data) {
>>
>> 	if (irq_data->chip_data)
>> 		return 0;
>>
>> and un-indent the mainline code below, and drop the "ret = 0" init above.
>>
>>> +		ret = irq_chip_compose_msi_msg(irq_data, &msg);
>>> +		if (ret == 0)
>>
>> 	if (ret)
>> 		return ret;
>>
>>> +			write_msi_msg(irq_data->irq, &msg);
>>> +	}
>>> +
>>> +	return ret;
>> 	return 0;
>>> +}
>>> +
>>> +static int msi_domain_deactivate(struct irq_domain *domain,
>>> +				 struct irq_data *irq_data)
>>> +{
>>> +	struct msi_msg msg;
>>> +
>>> +	if (irq_data->chip_data) {
>>> +		memset(&msg, 0, sizeof(msg));
>>> +		write_msi_msg(irq_data->irq, &msg);
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static struct irq_domain_ops msi_domain_ops = {
>>> +	.alloc = msi_domain_alloc,
>>> +	.free = msi_domain_free,
>>> +	.activate = msi_domain_activate,
>>> +	.deactivate = msi_domain_deactivate,
>>> +};
>>> +
>>> +struct irq_domain *msi_create_irq_domain(struct irq_domain *parent)
>>> +{
>>> +	struct irq_domain *domain;
>>> +
>>> +	domain = irq_domain_add_tree(NULL, &msi_domain_ops, NULL);
>>> +	if (domain)
>>
>> 	if (!domain)
>> 		return NULL;
>>
>> and un-indent this:
>>
>>> +		domain->parent = parent;
>>> +
>>> +	return domain;
>>> +}
>>> +
>>> +int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
>>> +			      struct pci_dev *dev, void *arg)
>>> +{
>>> +	int i, virq;
>>> +	struct msi_desc *msidesc;
>>> +	int node = dev_to_node(&dev->dev);
>>> +
>>> +	list_for_each_entry(msidesc, &dev->msi_list, list) {
>>> +		arch_msi_irq_domain_set_hwirq(arg, msi_get_hwirq(dev, msidesc));
>>> +		virq = irq_domain_alloc_irqs(domain, msidesc->nvec_used,
>>> +					     node, arg);
>>> +		if (virq < 0) {
>>> +			/* Special handling for pci_enable_msi_range(). */
>>> +			return (type == PCI_CAP_ID_MSI &&
>>> +				msidesc->nvec_used > 1) ?  1 : -ENOSPC;	
>>
>> I think "if" would be easier to read than this ternary expression.
>>
>>> +		}
>>> +		for (i = 0; i < msidesc->nvec_used; i++)
>>> +			irq_set_msi_desc_off(virq + i, i, msidesc);
>>> +	}
>>> +
>>> +	list_for_each_entry(msidesc, &dev->msi_list, list)
>>> +		if (msidesc->nvec_used == 1)
>>> +			dev_dbg(&dev->dev, "irq %d for MSI/MSI-X\n", virq);
>>> +		else
>>> +			dev_dbg(&dev->dev, "irq [%d-%d] for MSI/MSI-X\n",
>>> +				virq, virq + msidesc->nvec_used - 1);
>>> +
>>> +	return 0;
>>> +}
>>> +#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
>>> diff --git a/include/linux/msi.h b/include/linux/msi.h
>>> index 44f4746d033b..05dcd425f82b 100644
>>> --- a/include/linux/msi.h
>>> +++ b/include/linux/msi.h
>>> @@ -75,4 +75,15 @@ struct msi_chip {
>>>  	void (*teardown_irq)(struct msi_chip *chip, unsigned int irq);
>>>  };
>>>  
>>> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
>>
>> Use a space here, not a tab.
>>
>>> +extern struct irq_chip msi_chip;
>>
>> I don't think "msi_chip" is a good name.  "Chip" only hints that it's a
>> semiconductor integrated circuit; it doesn't say anything about what it
>> does.  I've suggested "msi_controller" elsewhere.
>>
>> Why does this need to be exported?  And why should there be only one in a
>> system?
>>
>>> +extern struct irq_domain *msi_create_irq_domain(struct irq_domain *parent);
>>> +extern int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
>>> +				     struct pci_dev *dev, void *arg);
>>> +
>>> +extern irq_hw_number_t arch_msi_irq_domain_get_hwirq(void *arg);
>>> +extern void arch_msi_irq_domain_set_hwirq(void *arg, irq_hw_number_t hwirq);
>>
>> Look at the rest of the file and notice that the existing code does not use
>> "extern" on function declarations.
>>
>>> +#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
>>
>> Use a space here (not a tab), like the #endif just below.
>>
>>>  #endif /* LINUX_MSI_H */
>>> -- 
>>> 1.7.10.4
>>>
>>
>> .
>>
> 
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-06  4:58     ` Jiang Liu
@ 2014-11-06  5:28       ` Bjorn Helgaas
  0 siblings, 0 replies; 65+ messages in thread
From: Bjorn Helgaas @ 2014-11-06  5:28 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Yijing Wang, Alexander Gordeev,
	Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, x86, linux-kernel, linux-pci, linux-acpi,
	linux-arm

On Wed, Nov 5, 2014 at 9:58 PM, Jiang Liu <jiang.liu@linux.intel.com> wrote:
> On 2014/11/6 7:09, Bjorn Helgaas wrote:
>> On Tue, Nov 04, 2014 at 08:01:55PM +0800, Jiang Liu wrote:

>>> +{
>>> +    return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
>>> +            PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
>>> +            (pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
>>
>> Where does this bit layout come from?  Is this defined in the spec
>> somewhere?  A reference would help.
> We need a unique number to identify every possible MSI source,
> and this ID number is only used within the irqdomain subsystem.
> So we used above algorithm to generate the ID number, there's
> no specification for it.

A comment to that effect would be great.

Bjorn

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-06  5:06       ` Jiang Liu
@ 2014-11-06  5:42         ` Yijing Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Yijing Wang @ 2014-11-06  5:42 UTC (permalink / raw)
  To: Jiang Liu, Bjorn Helgaas
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Alexander Gordeev, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86,
	linux-kernel, linux-pci, linux-acpi, linux-arm-kernel

On 2014/11/6 13:06, Jiang Liu wrote:
> On 2014/11/6 9:58, Yijing Wang wrote:
>>>>  
>>>> @@ -1098,3 +1099,128 @@ int pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries,
>>>>  	return nvec;
>>>>  }
>>>>  EXPORT_SYMBOL(pci_enable_msix_range);
>>>> +
>>>> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
>>>
>>> Space, not tab.
>>>
>>>> +static inline irq_hw_number_t
>>>> +msi_get_hwirq(struct pci_dev *pdev, struct msi_desc *msidesc)
>>>
>>> The convention in this file is "struct pci_dev *dev".  And "struct msi_desc
>>> *desc" (or maybe "*entry").  Try to converge things, not diverge them.
>>>
>>>> +{
>>>> +	return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
>>>> +		PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
>>>> +		(pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
>>>
>>> Where does this bit layout come from?  Is this defined in the spec
>>> somewhere?  A reference would help.
>>
>> Currently, more and more Non-PCI device use MSI(or similar MSI mechanism), like DMAR fault irq
>> and HPET FSB irq. And we have to add additional code to support the MSI capability.
>> So I hope we can decouple MSI code and PCI code, then we can unify all MSI(or Message Based interrupt)
>> in one framework.
> Hi Yijing,
> 	I have a following patch to share more code among MSI/DMAR/HPET,
> which is one step forward as you suggested. Will send out that patch set
> soon.

That's great! :)

> Regards!
> Gerry
> 
>>
>> Thanks!
>> Yijing.
>>
>>>
>>>> +}
>>>> +
>>>> +static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
>>>> +			    unsigned int nr_irqs, void *arg)
>>>> +{
>>>> +	int i, ret;
>>>> +	irq_hw_number_t hwirq = arch_msi_irq_domain_get_hwirq(arg);
>>>> +
>>>> +	if (irq_find_mapping(domain, hwirq) > 0)
>>>> +		return -EEXIST;
>>>> +
>>>> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
>>>> +	if (ret >= 0)
>>>
>>> 	if (ret < 0)
>>> 		return ret;
>>>
>>> and un-indent the mainline code below.  Then it's obvious that this is the
>>> normal case, not the error case.
>>>
>>>> +		for (i = 0; i < nr_irqs; i++) {
>>>> +			irq_domain_set_hwirq_and_chip(domain, virq + i,
>>>> +					hwirq + i, &msi_chip, (void *)(long)i);
>>>> +			__irq_set_handler(virq + i, handle_edge_irq, 0, "edge");
>>>> +		}
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static void msi_domain_free(struct irq_domain *domain, unsigned int virq,
>>>> +			    unsigned int nr_irqs)
>>>> +{
>>>> +	int i;
>>>> +
>>>> +	for (i = 0; i < nr_irqs; i++) {
>>>> +		struct msi_desc *msidesc = irq_get_msi_desc(virq);
>>>> +
>>>> +		if (msidesc)
>>>> +			msidesc->irq = 0;
>>>> +	}
>>>> +	irq_domain_free_irqs_top(domain, virq, nr_irqs);
>>>> +}
>>>> +
>>>> +static int msi_domain_activate(struct irq_domain *domain,
>>>> +			       struct irq_data *irq_data)
>>>> +{
>>>> +	int ret = 0;
>>>> +	struct msi_msg msg;
>>>> +
>>>> +	/*
>>>> +	 * irq_data->chip_data is MSI/MSIx offset.
>>>
>>> "MSI-X", as you wrote on the next line.
>>>
>>>> +	 * MSI-X message is written per-IRQ, the offset is always 0.
>>>> +	 * MSI message denotes a contiguous group of IRQs, written for 0th IRQ.
>>>> +	 */
>>>> +	if (!irq_data->chip_data) {
>>>
>>> 	if (irq_data->chip_data)
>>> 		return 0;
>>>
>>> and un-indent the mainline code below, and drop the "ret = 0" init above.
>>>
>>>> +		ret = irq_chip_compose_msi_msg(irq_data, &msg);
>>>> +		if (ret == 0)
>>>
>>> 	if (ret)
>>> 		return ret;
>>>
>>>> +			write_msi_msg(irq_data->irq, &msg);
>>>> +	}
>>>> +
>>>> +	return ret;
>>> 	return 0;
>>>> +}
>>>> +
>>>> +static int msi_domain_deactivate(struct irq_domain *domain,
>>>> +				 struct irq_data *irq_data)
>>>> +{
>>>> +	struct msi_msg msg;
>>>> +
>>>> +	if (irq_data->chip_data) {
>>>> +		memset(&msg, 0, sizeof(msg));
>>>> +		write_msi_msg(irq_data->irq, &msg);
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static struct irq_domain_ops msi_domain_ops = {
>>>> +	.alloc = msi_domain_alloc,
>>>> +	.free = msi_domain_free,
>>>> +	.activate = msi_domain_activate,
>>>> +	.deactivate = msi_domain_deactivate,
>>>> +};
>>>> +
>>>> +struct irq_domain *msi_create_irq_domain(struct irq_domain *parent)
>>>> +{
>>>> +	struct irq_domain *domain;
>>>> +
>>>> +	domain = irq_domain_add_tree(NULL, &msi_domain_ops, NULL);
>>>> +	if (domain)
>>>
>>> 	if (!domain)
>>> 		return NULL;
>>>
>>> and un-indent this:
>>>
>>>> +		domain->parent = parent;
>>>> +
>>>> +	return domain;
>>>> +}
>>>> +
>>>> +int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
>>>> +			      struct pci_dev *dev, void *arg)
>>>> +{
>>>> +	int i, virq;
>>>> +	struct msi_desc *msidesc;
>>>> +	int node = dev_to_node(&dev->dev);
>>>> +
>>>> +	list_for_each_entry(msidesc, &dev->msi_list, list) {
>>>> +		arch_msi_irq_domain_set_hwirq(arg, msi_get_hwirq(dev, msidesc));
>>>> +		virq = irq_domain_alloc_irqs(domain, msidesc->nvec_used,
>>>> +					     node, arg);
>>>> +		if (virq < 0) {
>>>> +			/* Special handling for pci_enable_msi_range(). */
>>>> +			return (type == PCI_CAP_ID_MSI &&
>>>> +				msidesc->nvec_used > 1) ?  1 : -ENOSPC;	
>>>
>>> I think "if" would be easier to read than this ternary expression.
>>>
>>>> +		}
>>>> +		for (i = 0; i < msidesc->nvec_used; i++)
>>>> +			irq_set_msi_desc_off(virq + i, i, msidesc);
>>>> +	}
>>>> +
>>>> +	list_for_each_entry(msidesc, &dev->msi_list, list)
>>>> +		if (msidesc->nvec_used == 1)
>>>> +			dev_dbg(&dev->dev, "irq %d for MSI/MSI-X\n", virq);
>>>> +		else
>>>> +			dev_dbg(&dev->dev, "irq [%d-%d] for MSI/MSI-X\n",
>>>> +				virq, virq + msidesc->nvec_used - 1);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
>>>> diff --git a/include/linux/msi.h b/include/linux/msi.h
>>>> index 44f4746d033b..05dcd425f82b 100644
>>>> --- a/include/linux/msi.h
>>>> +++ b/include/linux/msi.h
>>>> @@ -75,4 +75,15 @@ struct msi_chip {
>>>>  	void (*teardown_irq)(struct msi_chip *chip, unsigned int irq);
>>>>  };
>>>>  
>>>> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
>>>
>>> Use a space here, not a tab.
>>>
>>>> +extern struct irq_chip msi_chip;
>>>
>>> I don't think "msi_chip" is a good name.  "Chip" only hints that it's a
>>> semiconductor integrated circuit; it doesn't say anything about what it
>>> does.  I've suggested "msi_controller" elsewhere.
>>>
>>> Why does this need to be exported?  And why should there be only one in a
>>> system?
>>>
>>>> +extern struct irq_domain *msi_create_irq_domain(struct irq_domain *parent);
>>>> +extern int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
>>>> +				     struct pci_dev *dev, void *arg);
>>>> +
>>>> +extern irq_hw_number_t arch_msi_irq_domain_get_hwirq(void *arg);
>>>> +extern void arch_msi_irq_domain_set_hwirq(void *arg, irq_hw_number_t hwirq);
>>>
>>> Look at the rest of the file and notice that the existing code does not use
>>> "extern" on function declarations.
>>>
>>>> +#endif	/* CONFIG_PCI_MSI_IRQ_DOMAIN */
>>>
>>> Use a space here (not a tab), like the #endif just below.
>>>
>>>>  #endif /* LINUX_MSI_H */
>>>> -- 
>>>> 1.7.10.4
>>>>
>>>
>>> .
>>>
>>
>>
> 
> .
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 01/31] irqdomain: Introduce new interfaces to support hierarchy irqdomains
  2014-11-05 23:48   ` Thomas Gleixner
@ 2014-11-06  6:09     ` Jiang Liu
  0 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-06  6:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Benjamin Herrenschmidt, Ingo Molnar, H. Peter Anvin,
	Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Jonathan Corbet, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Joerg Roedel, Greg Kroah-Hartman, x86,
	linux-kernel, linux-pci, linux-acpi, linux-arm-kernel, linux-doc



On 2014/11/6 7:48, Thomas Gleixner wrote:
> On Tue, 4 Nov 2014, Jiang Liu wrote:
>>  /* Number of irqs reserved for a legacy isa controller */
>>  #define NUM_ISA_INTERRUPTS	16
>> @@ -64,6 +66,16 @@ struct irq_domain_ops {
>>  	int (*xlate)(struct irq_domain *d, struct device_node *node,
>>  		     const u32 *intspec, unsigned int intsize,
>>  		     unsigned long *out_hwirq, unsigned int *out_type);
>> +
>> +#ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
>> +	/* extended V2 interfaces to support hierarchy irq_domains */
>> +	int (*alloc)(struct irq_domain *d, unsigned int virq,
>> +		     unsigned int nr_irqs, void *arg);
>> +	void (*free)(struct irq_domain *d, unsigned int virq,
>> +		     unsigned int nr_irqs);
>> +	int (*activate)(struct irq_domain *d, struct irq_data *irq_data);
>> +	int (*deactivate)(struct irq_domain *d, struct irq_data *irq_data);
> 
> Why do we have a return value here? Especially the deactivate one
> makes no sense at all.
> 
>> +extern int irq_domain_activate_irq(struct irq_data *irq_data);
>> +extern int irq_domain_deactivate_irq(struct irq_data *irq_data);
> 
> And here.
> 
>> @@ -178,6 +179,7 @@ int irq_startup(struct irq_desc *desc, bool resend)
>>  	irq_state_clr_disabled(desc);
>>  	desc->depth = 0;
>>  
>> +	irq_domain_activate_irq(&desc->irq_data);
> 
> We do not check it and we cannot do here AFAICT.
> 
>>  	if (desc->irq_data.chip->irq_startup) {
>>  		ret = desc->irq_data.chip->irq_startup(&desc->irq_data);
>>  		irq_state_clr_masked(desc);
>> @@ -199,6 +201,7 @@ void irq_shutdown(struct irq_desc *desc)
>>  		desc->irq_data.chip->irq_disable(&desc->irq_data);
>>  	else
>>  		desc->irq_data.chip->irq_mask(&desc->irq_data);
>> +	irq_domain_deactivate_irq(&desc->irq_data);
> 
> Ditto.
> 
> So the return value for irq_domain_deactivate_irq() is silly to begin
> with, but also the return value for irq_domain_activate_irq() does not
> really make sense. We've allocated the resources for the interrupt
> already down the hierarchy chain. So there is no reason why the actual
> activation should fail.
Hi Thomas,
	Fair enough, I have changed them to return void, which also
simplifies the implementation, but adds one or two BUG_ON()s :)
Regards!
Gerry
> 
> Thanks,
> 
> 	tglx
> 
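
A minimal user-space sketch of the change agreed on above: the activate
and deactivate paths return void, and a BUG_ON() guards against invalid
input. The struct layouts, the parent-first activation order, the
active_count field and the count_*() callbacks are assumptions made for
illustration; they are not taken from the patch, and BUG_ON() is mapped
to assert() here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct irq_domain;
struct irq_data;

struct irq_domain_ops {
	/* Both callbacks return void: resources were reserved at alloc time. */
	void (*activate)(struct irq_domain *d, struct irq_data *irq_data);
	void (*deactivate)(struct irq_domain *d, struct irq_data *irq_data);
};

struct irq_domain {
	const struct irq_domain_ops *ops;
	int active_count;		/* illustration only */
};

struct irq_data {
	struct irq_domain *domain;
	struct irq_data *parent_data;	/* next level up the hierarchy */
};

/* User-space substitute for the kernel's BUG_ON(). */
#define BUG_ON(cond) assert(!(cond))

/* Assumed order: activate the parents first, then this level. */
void irq_domain_activate_irq(struct irq_data *irq_data)
{
	BUG_ON(!irq_data || !irq_data->domain);

	if (irq_data->parent_data)
		irq_domain_activate_irq(irq_data->parent_data);
	if (irq_data->domain->ops && irq_data->domain->ops->activate)
		irq_data->domain->ops->activate(irq_data->domain, irq_data);
}

/* Assumed order: deactivate this level first, then the parents. */
void irq_domain_deactivate_irq(struct irq_data *irq_data)
{
	BUG_ON(!irq_data || !irq_data->domain);

	if (irq_data->domain->ops && irq_data->domain->ops->deactivate)
		irq_data->domain->ops->deactivate(irq_data->domain, irq_data);
	if (irq_data->parent_data)
		irq_domain_deactivate_irq(irq_data->parent_data);
}

/* Example callbacks that only count how often a domain was activated. */
void count_activate(struct irq_domain *d, struct irq_data *irq_data)
{
	(void)irq_data;
	d->active_count++;
}

void count_deactivate(struct irq_domain *d, struct irq_data *irq_data)
{
	(void)irq_data;
	d->active_count--;
}
```

With void returns, callers such as irq_startup() and irq_shutdown() have
nothing to check, which matches the review comment that they cannot act
on a failure at that point anyway.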

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-04 12:01 ` [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain Jiang Liu
  2014-11-05 23:09   ` Bjorn Helgaas
@ 2014-11-06 10:01   ` Thomas Gleixner
  2014-11-06 10:30     ` Thomas Gleixner
  2014-11-06 11:41     ` Jiang Liu
  1 sibling, 2 replies; 65+ messages in thread
From: Thomas Gleixner @ 2014-11-06 10:01 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Ingo Molnar, H. Peter Anvin,
	Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Yijing Wang, Alexander Gordeev,
	Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, Suravee Suthikulanit, x86, LKML, linux-pci,
	linux-acpi, LAK

On Tue, 4 Nov 2014, Jiang Liu wrote:
> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
> +static inline irq_hw_number_t
> +msi_get_hwirq(struct pci_dev *pdev, struct msi_desc *msidesc)
> +{
> +	return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
> +		PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
> +		(pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
> +}
> +
> +static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> +			    unsigned int nr_irqs, void *arg)
> +{
> +	int i, ret;
> +	irq_hw_number_t hwirq = arch_msi_irq_domain_get_hwirq(arg);
> +
> +	if (irq_find_mapping(domain, hwirq) > 0)
> +		return -EEXIST;
> +
> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
> +	if (ret >= 0)
> +		for (i = 0; i < nr_irqs; i++) {
> +			irq_domain_set_hwirq_and_chip(domain, virq + i,
> +					hwirq + i, &msi_chip, (void *)(long)i);

I think msi_chip being a single global is problematic. It does
not allow multi-platform kernels to select a chip at boot time and it
does not allow per-domain chip implementations when you have multiple
MSI domains. Aside from that, msi_chip is a pretty bad name for a global.

The solution is rather simple, and MSI is widespread enough to justify
it.

struct irqdomain_msi_data {
	struct irq_chip *irq_chip;
};

We make that a struct so we can accommodate other special things
which might be domain specific rather than architecture specific. One
obvious use case would be to hold the arch_msi_irq_domain_get/set_hwirq
callbacks.

struct irq_domain *msi_create_irq_domain(struct irq_domain *parent,
					 struct irqdomain_msi_data *data)
{
	struct irq_domain *domain;

	domain = irq_domain_add_tree(NULL, &msi_domain_ops, NULL);
	if (domain) {
		domain->parent = parent;
		domain->msi_data = data;
	}
	return domain;
}

Now the above becomes:

	struct irq_chip *msi_chip = domain->msi_data->irq_chip;

	irq_domain_set_hwirq_and_chip(domain, virq + i,
				      hwirq + i, msi_chip, (void *)(long)i);
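
The per-domain lookup can be tried outside the kernel with a small
sketch. The types below are simplified stand-ins, calloc() replaces
irq_domain_add_tree(), and msi_domain_get_chip() is a hypothetical helper
name; none of this is the real kernel API. It only shows that two MSI
domains can carry different chips once the chip pointer lives in
per-domain data instead of a global.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures (assumptions). */
struct irq_chip {
	const char *name;
};

struct irqdomain_msi_data {
	struct irq_chip *irq_chip;
};

struct irq_domain {
	struct irq_domain *parent;
	struct irqdomain_msi_data *msi_data;
};

/* Create a domain that remembers its parent and its own MSI data. */
struct irq_domain *msi_create_irq_domain(struct irq_domain *parent,
					 struct irqdomain_msi_data *data)
{
	/* calloc() stands in for irq_domain_add_tree() in this sketch. */
	struct irq_domain *domain = calloc(1, sizeof(*domain));

	if (!domain)
		return NULL;

	domain->parent = parent;
	domain->msi_data = data;
	return domain;
}

/* Hypothetical helper: fetch the chip selected for this domain. */
struct irq_chip *msi_domain_get_chip(const struct irq_domain *domain)
{
	return domain->msi_data ? domain->msi_data->irq_chip : NULL;
}
```

A platform would fill in one struct irqdomain_msi_data per MSI domain at
boot and pass it to msi_create_irq_domain(), so no global chip symbol has
to be exported.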

> +int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
> +			      struct pci_dev *dev, void *arg)
> +{
> +	int i, virq;
> +	struct msi_desc *msidesc;
> +	int node = dev_to_node(&dev->dev);
> +
> +	list_for_each_entry(msidesc, &dev->msi_list, list) {
> +		arch_msi_irq_domain_set_hwirq(arg, msi_get_hwirq(dev, msidesc));

The arch_xxx callbacks want to be documented. It's not obvious what
they are supposed to do.

Thanks,

	tglx
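
For reference, the hwirq layout used by msi_get_hwirq() in the quoted
patch can be read off its shift constants: the entry number sits in bits
10:0, the 16-bit bus/devfn ID in bits 26:11, and the PCI domain number
from bit 27 upward. The sketch below reproduces that packing in user
space. The function names, the uint64_t stand-in for irq_hw_number_t and
the unpack helpers are assumptions for illustration; widening each field
to 64 bits before shifting is a choice of this sketch, not something the
patch does.

```c
#include <stdint.h>

typedef uint64_t irq_hw_number_t;	/* stand-in; unsigned long in the kernel */

/* Same layout as the kernel's PCI_DEVID(): bus in bits 15:8, devfn in 7:0. */
static inline uint16_t pci_devid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/* Pack domain, bus/devfn and MSI entry number into one hwirq value. */
irq_hw_number_t msi_pack_hwirq(uint32_t pci_domain, uint8_t bus,
			       uint8_t devfn, uint16_t entry_nr)
{
	return (irq_hw_number_t)(entry_nr & 0x7ff) |
	       ((irq_hw_number_t)pci_devid(bus, devfn) << 11) |
	       ((irq_hw_number_t)pci_domain << 27);
}

/* Unpack helpers (hypothetical, for checking the layout). */
uint16_t msi_hwirq_entry(irq_hw_number_t hwirq)
{
	return (uint16_t)(hwirq & 0x7ff);
}

uint16_t msi_hwirq_devid(irq_hw_number_t hwirq)
{
	return (uint16_t)((hwirq >> 11) & 0xffff);
}

uint32_t msi_hwirq_domain(irq_hw_number_t hwirq)
{
	return (uint32_t)(hwirq >> 27);
}
```

Eleven bits for the entry number are enough for the largest MSI-X table,
which has 2048 entries.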

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-06 10:01   ` Thomas Gleixner
@ 2014-11-06 10:30     ` Thomas Gleixner
  2014-11-06 11:41     ` Jiang Liu
  1 sibling, 0 replies; 65+ messages in thread
From: Thomas Gleixner @ 2014-11-06 10:30 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Ingo Molnar, H. Peter Anvin,
	Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Yijing Wang, Alexander Gordeev,
	Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, Suravee Suthikulanit, x86, LKML, linux-pci,
	linux-acpi, LAK

On Thu, 6 Nov 2014, Thomas Gleixner wrote:
> On Tue, 4 Nov 2014, Jiang Liu wrote:
> > +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
> > +static inline irq_hw_number_t
> > +msi_get_hwirq(struct pci_dev *pdev, struct msi_desc *msidesc)
> > +{
> > +	return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
> > +		PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
> > +		(pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
> > +}
> > +
> > +static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> > +			    unsigned int nr_irqs, void *arg)
> > +{
> > +	int i, ret;
> > +	irq_hw_number_t hwirq = arch_msi_irq_domain_get_hwirq(arg);
> > +
> > +	if (irq_find_mapping(domain, hwirq) > 0)
> > +		return -EEXIST;
> > +
> > +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
> > +	if (ret >= 0)
> > +		for (i = 0; i < nr_irqs; i++) {
> > +			irq_domain_set_hwirq_and_chip(domain, virq + i,
> > +					hwirq + i, &msi_chip, (void *)(long)i);
> 
> I think msi_chip being a global unique thing is problematic. It does
> not allow multi platform kernels to select a chip at boot time and it
> does not allow per domain chip implementations when you have multiple
> msi domains. Aside of that msi_chip is a pretty bad name for a global.
> 
> The solution is rather simple and msi is wide spread enough to justify
> that.
> 
> struct irqdomain_msi_data {
>        struct irq_chip       *irq_chip;
> };
> 
> We make that a struct so we can accomodate for other special things
> which might be domain rather than architecture specific. One
> obvious use case would be to hold the arch_msi_irq_domain_get/set_hwirq
> callbacks.

That needs to hand in the domain as an argument as well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-06 10:01   ` Thomas Gleixner
  2014-11-06 10:30     ` Thomas Gleixner
@ 2014-11-06 11:41     ` Jiang Liu
  2014-11-06 11:59       ` Thomas Gleixner
  1 sibling, 1 reply; 65+ messages in thread
From: Jiang Liu @ 2014-11-06 11:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Benjamin Herrenschmidt, Ingo Molnar, H. Peter Anvin,
	Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Yijing Wang, Alexander Gordeev,
	Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, Suravee Suthikulanit, x86, LKML, linux-pci,
	linux-acpi, LAK

On 2014/11/6 18:01, Thomas Gleixner wrote:
> On Tue, 4 Nov 2014, Jiang Liu wrote:
>> +#ifdef	CONFIG_PCI_MSI_IRQ_DOMAIN
>> +static inline irq_hw_number_t
>> +msi_get_hwirq(struct pci_dev *pdev, struct msi_desc *msidesc)
>> +{
>> +	return (irq_hw_number_t)msidesc->msi_attrib.entry_nr |
>> +		PCI_DEVID(pdev->bus->number, pdev->devfn) << 11 |
>> +		(pci_domain_nr(pdev->bus) & 0xFFFFFFFF) << 27;
>> +}
>> +
>> +static int msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
>> +			    unsigned int nr_irqs, void *arg)
>> +{
>> +	int i, ret;
>> +	irq_hw_number_t hwirq = arch_msi_irq_domain_get_hwirq(arg);
>> +
>> +	if (irq_find_mapping(domain, hwirq) > 0)
>> +		return -EEXIST;
>> +
>> +	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
>> +	if (ret >= 0)
>> +		for (i = 0; i < nr_irqs; i++) {
>> +			irq_domain_set_hwirq_and_chip(domain, virq + i,
>> +					hwirq + i, &msi_chip, (void *)(long)i);
> 
> I think msi_chip being a global unique thing is problematic. It does
> not allow multi platform kernels to select a chip at boot time and it
> does not allow per domain chip implementations when you have multiple
> msi domains. Aside of that msi_chip is a pretty bad name for a global.
> 
> The solution is rather simple and msi is wide spread enough to justify
> that.
> 
> struct irqdomain_msi_data {
>        struct irq_chip       *irq_chip;
> };
> 
> We make that a struct so we can accomodate for other special things
> which might be domain rather than architecture specific. One
> obvious use case would be to hold the arch_msi_irq_domain_get/set_hwirq
> callbacks.
> 
> struct irq_domain *msi_create_irq_domain(struct irq_domain *parent,
>        		  			 struct irqdomain_msi_data *data)
> {
>         struct irq_domain *domain;
> 
>         domain = irq_domain_add_tree(NULL, &msi_domain_ops, NULL);
>         if (domain) {
>                 domain->parent = parent;
> 		domain->msi_data = data;
> 	}
>         return domain;
> }
> 
> Now the above becomes:
> 
>     	struct irq_chip *msi_chip = domain->msi_data->irq_chip;
> 
> 	irq_domain_set_hwirq_and_chip(domain, virq + i,
> 				      hwirq + i, msi_chip, (void *)(long)i);
Hi Thomas,
	Actually, this afternoon I have been working on a patch set to
improve MSI support in the way you described above. I'm also trying to
split the MSI code into a PCI-dependent part and a PCI-independent part.
I plan to add a file kernel/irq/msi.c to host the PCI-independent part.
Is that OK, or should I put it under something like drivers/msi/?
The PCI-independent part will be used to support DMAR/HPET/HTIRQ and
some ARM/ARM64 interrupts.
Regards!
Gerry
> 
>> +int msi_irq_domain_alloc_irqs(struct irq_domain *domain, int type,
>> +			      struct pci_dev *dev, void *arg)
>> +{
>> +	int i, virq;
>> +	struct msi_desc *msidesc;
>> +	int node = dev_to_node(&dev->dev);
>> +
>> +	list_for_each_entry(msidesc, &dev->msi_list, list) {
>> +		arch_msi_irq_domain_set_hwirq(arg, msi_get_hwirq(dev, msidesc));
> 
> The arch_xxx callbacks want to be documented. It's not obvious what
> they are supposed to do.
> 
> Thanks,
> 
> 	tglx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 13/31] x86: irq_remapping: Introduce new interfaces to support hierarchy irqdomain
  2014-11-04 12:01 ` [Patch Part2 v4 13/31] x86: irq_remapping: Introduce new interfaces to support hierarchy irqdomain Jiang Liu
@ 2014-11-06 11:43   ` Yijing Wang
  0 siblings, 0 replies; 65+ messages in thread
From: Yijing Wang @ 2014-11-06 11:43 UTC (permalink / raw)
  To: Jiang Liu, Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, x86, Joerg Roedel, Matthias Brugger
  Cc: Tony Luck, Greg Kroah-Hartman, linux-kernel, linux-acpi, iommu,
	linux-pci, Andrew Morton, linux-arm-kernel

> +
> +enum irq_alloc_type {
> +	X86_IRQ_ALLOC_TYPE_IOAPIC = 1,
> +	X86_IRQ_ALLOC_TYPE_HPET,
> +	X86_IRQ_ALLOC_TYPE_MSI,
> +	X86_IRQ_ALLOC_TYPE_MSIX,
> +};

Hi Gerry, why not use X86_IRQ_ALLOC_TYPE_MSI to represent both the MSI and MSI-X types?
Are there differences in how MSI and MSI-X are processed in the irq remapping domain?

>  
> +extern struct irq_domain *irq_remapping_get_ir_irq_domain(
> +				struct irq_alloc_info *info);
> +extern struct irq_domain *irq_remapping_get_irq_domain(
> +				struct irq_alloc_info *info);

These two functions are very similar, and both look up an irq_domain from irq_alloc_info. Is it possible to merge them?

> +extern void irq_remapping_print_chip(struct irq_data *data, struct seq_file *p);
> +
> +/*
> + * Create MSI/MSIx irqdomain for interrupt remapping device, use @parent as
> + * parent irqdomain.
> + */
> +static inline struct irq_domain *
> +arch_create_msi_irq_domain(struct irq_domain *parent)
> +{
> +	return NULL;
> +}
> +
> +/* Get parent irqdomain for interrupt remapping irqdomain */
> +static inline struct irq_domain *arch_get_ir_parent_domain(void)
> +{
> +	return x86_vector_domain;
> +}
> +
>  #else  /* CONFIG_IRQ_REMAP */
>  
>  static inline void setup_irq_remapping_ops(void) { }
> @@ -101,6 +126,20 @@ static inline bool setup_remapped_irq(int irq,
>  {
>  	return false;
>  }
> +
> +static inline struct irq_domain *
> +irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
> +{
> +	return NULL;
> +}
> +
> +static inline struct irq_domain *
> +irq_remapping_get_irq_domain(struct irq_alloc_info *info)
> +{
> +	return NULL;
> +}
> +
> +#define	irq_remapping_print_chip	NULL
>  #endif /* CONFIG_IRQ_REMAP */
>  
>  extern int dmar_alloc_hwirq(void);
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index 63886bafed9f..176ff4372b7d 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -377,7 +377,7 @@ void panic_if_irq_remap(const char *msg)
>  		panic(msg);
>  }
>  
> -static void ir_ack_apic_edge(struct irq_data *data)
> +void ir_ack_apic_edge(struct irq_data *data)
>  {
>  	ack_APIC_irq();
>  }
> @@ -388,6 +388,19 @@ static void ir_ack_apic_level(struct irq_data *data)
>  	eoi_ioapic_irq(data->irq, irqd_cfg(data));
>  }
>  
> +void irq_remapping_print_chip(struct irq_data *data, struct seq_file *p)
> +{
> +	/*
> +	 * Assume interrupt is remapped if the parent irqdomain isn't the
> +	 * vector domain, which is true for MSI, HPET and IOAPIC on x86
> +	 * platforms.
> +	 */
> +	if (data->domain && data->domain->parent != arch_get_ir_parent_domain())
> +		seq_printf(p, " IR-%s", data->chip->name);
> +	else
> +		seq_printf(p, " %s", data->chip->name);
> +}
> +
>  static void ir_print_prefix(struct irq_data *data, struct seq_file *p)
>  {
>  	seq_printf(p, " IR-%s", data->chip->name);
> @@ -409,3 +422,36 @@ bool setup_remapped_irq(int irq, struct irq_cfg *cfg, struct irq_chip *chip)
>  	irq_remap_modify_chip_defaults(chip);
>  	return true;
>  }
> +
> +/**
> + * irq_remapping_get_ir_irq_domain - Get the irqdomain associated the IOMMU
> + *				     device serving @info
> + * @info: interrupt allocation information, used to find the IOMMU device
> + *
> + * It's used to get parent irqdomain for HPET and IOAPIC domains.
> + * Returns pointer to IRQ domain, or NULL on failure.
> + */
> +struct irq_domain *
> +irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
> +{
> +	if (!remap_ops || !remap_ops->get_ir_irq_domain)
> +		return NULL;
> +
> +	return remap_ops->get_ir_irq_domain(info);
> +}
> +
> +/**
> + * irq_remapping_get_irq_domain - Get the irqdomain serving the MSI interrupt
> + * @info: interrupt allocation information, used to find the IOMMU device
> + *
> + * It's used to get irqdomain for MSI/MSIx interrupt allocation.
> + * Returns pointer to IRQ domain, or NULL on failure.
> + */
> +struct irq_domain *
> +irq_remapping_get_irq_domain(struct irq_alloc_info *info)
> +{
> +	if (!remap_ops || !remap_ops->get_irq_domain)
> +		return NULL;
> +
> +	return remap_ops->get_irq_domain(info);
> +}
> diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> index fde250f86e60..8c159d6fac46 100644
> --- a/drivers/iommu/irq_remapping.h
> +++ b/drivers/iommu/irq_remapping.h
> @@ -30,6 +30,8 @@ struct irq_data;
>  struct cpumask;
>  struct pci_dev;
>  struct msi_msg;
> +struct irq_domain;
> +struct irq_alloc_info;
>  
>  extern int disable_irq_remap;
>  extern int irq_remap_broken;
> @@ -81,11 +83,19 @@ struct irq_remap_ops {
>  
>  	/* Setup interrupt remapping for an HPET MSI */
>  	int (*alloc_hpet_msi)(unsigned int, unsigned int);
> +
> +	/* Get the irqdomain associated the IOMMU device */
> +	struct irq_domain *(*get_ir_irq_domain)(struct irq_alloc_info *);
> +
> +	/* Get the MSI irqdomain associated with the IOMMU device */
> +	struct irq_domain *(*get_irq_domain)(struct irq_alloc_info *);
>  };
>  
>  extern struct irq_remap_ops intel_irq_remap_ops;
>  extern struct irq_remap_ops amd_iommu_irq_ops;
>  
> +extern void ir_ack_apic_edge(struct irq_data *data);
> +
>  #else  /* CONFIG_IRQ_REMAP */
>  
>  #define irq_remapping_enabled 0
> 


-- 
Thanks!
Yijing


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain
  2014-11-06 11:41     ` Jiang Liu
@ 2014-11-06 11:59       ` Thomas Gleixner
  0 siblings, 0 replies; 65+ messages in thread
From: Thomas Gleixner @ 2014-11-06 11:59 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Ingo Molnar, H. Peter Anvin,
	Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap, Yinghai Lu,
	Borislav Petkov, Grant Likely, Marc Zyngier, Yingjoe Chen,
	Matthias Brugger, Yijing Wang, Alexander Gordeev,
	Konrad Rzeszutek Wilk, Andrew Morton, Tony Luck, Joerg Roedel,
	Greg Kroah-Hartman, Suravee Suthikulanit, x86, LKML, linux-pci,
	linux-acpi, LAK

On Thu, 6 Nov 2014, Jiang Liu wrote:
> On 2014/11/6 18:01, Thomas Gleixner wrote:
> Hi Thomas,
> 	Actually I'm working on a patch set to improve MSI support in
> the way you described above this afternoon. And I'm also trying to
> split MSI code into PCI dependent part and PCI independent part.
> I plan to add a file kernel/irq/msi.c to host PCI independent part,

kernel/irq/msi.c is fine.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
                   ` (31 preceding siblings ...)
  2014-11-04 14:47 ` [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Joerg Roedel
@ 2014-11-06 13:07 ` Joerg Roedel
  2014-11-06 13:35   ` Jiang Liu
  32 siblings, 1 reply; 65+ messages in thread
From: Joerg Roedel @ 2014-11-06 13:07 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

On Tue, Nov 04, 2014 at 08:01:34PM +0800, Jiang Liu wrote:
> Patch 1-5 enhance irqdomain and irq core to support hierarchy irqdomain
> and stacked irqchip.
> Patch 6-12 implement an irqdomain to manange CPU interrupt vectors, and
> it's the root irqdomain for x86 platforms.
> Patch 13-16 converts Intel and AMD interrupt remapping drivers to
> support hierarchy irqdomain.
> Patch 17-23 converts HPET and MSI to support hierarchy irqdomain.
> Patch 24-27 cleans up unsued code in x86 arch and interrupt remapping
> drivers.
> Patch 28-31 converts DMAR, HTIRQ and UV to support hierarchy irqdomain.

Okay, I looked over the IOMMU changes and did some testing. With Jiang's
fix for the AMD Kaveri boot panic it looks good to me. So for the IOMMU
parts:

Acked-by: Joerg Roedel <jroedel@suse.de>


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms
  2014-11-06 13:07 ` Joerg Roedel
@ 2014-11-06 13:35   ` Jiang Liu
  0 siblings, 0 replies; 65+ messages in thread
From: Jiang Liu @ 2014-11-06 13:35 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Benjamin Herrenschmidt, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Rafael J. Wysocki, Bjorn Helgaas, Randy Dunlap,
	Yinghai Lu, Borislav Petkov, Grant Likely, Marc Zyngier,
	Yingjoe Chen, Matthias Brugger, Konrad Rzeszutek Wilk,
	Andrew Morton, Tony Luck, Greg Kroah-Hartman, x86, linux-kernel,
	linux-pci, linux-acpi, linux-arm-kernel

On 2014/11/6 21:07, Joerg Roedel wrote:
> On Tue, Nov 04, 2014 at 08:01:34PM +0800, Jiang Liu wrote:
>> Patch 1-5 enhance irqdomain and irq core to support hierarchy irqdomain
>> and stacked irqchip.
>> Patch 6-12 implement an irqdomain to manange CPU interrupt vectors, and
>> it's the root irqdomain for x86 platforms.
>> Patch 13-16 converts Intel and AMD interrupt remapping drivers to
>> support hierarchy irqdomain.
>> Patch 17-23 converts HPET and MSI to support hierarchy irqdomain.
>> Patch 24-27 cleans up unsued code in x86 arch and interrupt remapping
>> drivers.
>> Patch 28-31 converts DMAR, HTIRQ and UV to support hierarchy irqdomain.
> 
> Okay, I looked over the IOMMU changes and did some testing. With Jiangs
> fix for the AMD Kaveri boot panic it looks good to me. So for the IOMMU
> parts:
> 
> Acked-by: Joerg Roedel <jroedel@suse.de>
Thanks, Joerg!

> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2014-11-06 13:35 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
2014-11-04 12:01 [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 01/31] irqdomain: Introduce new interfaces to support hierarchy irqdomains Jiang Liu
2014-11-05 23:48   ` Thomas Gleixner
2014-11-06  6:09     ` Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 02/31] irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 03/31] genirq: Introduce helper functions to support stacked irq_chip Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 04/31] genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 05/31] genirq: Add IRQ_SET_MASK_OK_DONE " Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 06/31] x86, irq: Save destination CPU ID in irq_cfg Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 07/31] x86, irq: Use hierarchy irqdomain to manage CPU interrupt vectors Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 08/31] x86, hpet: Use new irqdomain interfaces to allocate/free IRQ Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 09/31] x86, MSI: " Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 10/31] x86, uv: " Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 11/31] x86, htirq: " Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 12/31] x86, dmar: " Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 13/31] x86: irq_remapping: Introduce new interfaces to support hierarchy irqdomain Jiang Liu
2014-11-06 11:43   ` Yijing Wang
2014-11-04 12:01 ` [Patch Part2 v4 14/31] iommu/vt-d: Change prototypes to prepare for enabling " Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 15/31] iommu/vt-d: Enhance Intel IR driver to suppport " Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 16/31] iommu/amd: Enhance AMD " Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 17/31] x86, hpet: Enhance HPET IRQ to support " Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 18/31] PCI/MSI, trivial: Fix minor syntax issues according to coding styles Jiang Liu
2014-11-05 22:10   ` Bjorn Helgaas
2014-11-05 22:10   ` Bjorn Helgaas
2014-11-04 12:01 ` [Patch Part2 v4 19/31] PCI/MSI: Simplify PCI MSI code by initializing msi_desc.nvec_used earlier Jiang Liu
2014-11-05 22:35   ` Bjorn Helgaas
2014-11-04 12:01 ` [Patch Part2 v4 20/31] PCI/MSI: Kill redundant calling for irq_set_msi_desc() for MSIx interrupts Jiang Liu
2014-11-05 22:45   ` Bjorn Helgaas
2014-11-06  1:32     ` Yijing Wang
2014-11-06  4:04       ` Bjorn Helgaas
2014-11-06  4:31       ` Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 21/31] PCI/MSI: enhance PCI MSI core to support hierarchy irqdomain Jiang Liu
2014-11-05 23:09   ` Bjorn Helgaas
2014-11-06  1:58     ` Yijing Wang
2014-11-06  4:10       ` Bjorn Helgaas
2014-11-06  4:54         ` Yijing Wang
2014-11-06  5:06       ` Jiang Liu
2014-11-06  5:42         ` Yijing Wang
2014-11-06  4:58     ` Jiang Liu
2014-11-06  5:28       ` Bjorn Helgaas
2014-11-06 10:01   ` Thomas Gleixner
2014-11-06 10:30     ` Thomas Gleixner
2014-11-06 11:41     ` Jiang Liu
2014-11-06 11:59       ` Thomas Gleixner
2014-11-04 12:01 ` [Patch Part2 v4 22/31] x86, PCI, MSI: Use hierarchy irqdomain to manage MSI interrupts Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 23/31] x86, irq: Directly call native_compose_msi_msg() for DMAR IRQ Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 24/31] iommu/vt-d: Clean up unused MSI related code Jiang Liu
2014-11-04 12:01 ` [Patch Part2 v4 25/31] iommu/amd: " Jiang Liu
2014-11-04 12:02 ` [Patch Part2 v4 26/31] x86: irq_remapping: " Jiang Liu
2014-11-04 12:02 ` [Patch Part2 v4 27/31] x86, irq: Clean up unused MSI related code and interfaces Jiang Liu
2014-11-04 12:02 ` [Patch Part2 v4 28/31] iommu/vt-d: Refine the interfaces to create IRQ for DMAR unit Jiang Liu
2014-11-04 12:02 ` [Patch Part2 v4 29/31] x86, irq: Use hierarchy irqdomain to manage DMAR interrupts Jiang Liu
2014-11-04 12:02 ` [Patch Part2 v4 30/31] x86, htirq: Use hierarchy irqdomain to manage Hypertransport interrupts Jiang Liu
2014-11-04 12:02 ` [Patch Part2 v4 31/31] x86, uv: Use hierarchy irqdomain to manage UV interrupts Jiang Liu
2014-11-04 14:47 ` [Patch Part2 v4 00/31] Enable hierarchy irqdomian on x86 platforms Joerg Roedel
2014-11-04 15:12   ` Jiang Liu
2014-11-04 15:32     ` Joerg Roedel
2014-11-05  8:51     ` Joerg Roedel
2014-11-05  9:04       ` Jiang Liu
2014-11-05  9:41       ` Jiang Liu
2014-11-05  9:58         ` Joerg Roedel
2014-11-05 10:28           ` Jiang Liu
2014-11-05 11:10             ` Joerg Roedel
2014-11-06 13:07 ` Joerg Roedel
2014-11-06 13:35   ` Jiang Liu
