linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch V2 00/13] Introduce dev-msi and interrupt message store
@ 2021-02-26 20:11 Megha Dey
  2021-02-26 20:11 ` [Patch V2 01/13] x86/irq: Add DEV_MSI allocation type Megha Dey
                   ` (12 more replies)
  0 siblings, 13 replies; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

Provide support for device specific MSI implementations for devices which
have their own resource management and interrupt chip. These devices are
not related to PCI and contrary to platform MSI they do not share a
common resource and interrupt chip. They provide their own domain specific
resource management and interrupt chip.

On top of this, add support for Interrupt Message Store or IMS[1], which
is a scalable device specific interrupt mechanism to support devices which
may need more than 2048 interrupts. With IMS, there is theoretically no
upper bound on the number of interrupts a device can support. The normal
IMS use case is for guest usages but it can very well be used by host too.

Introduce a generic IMS irq chip and domain which stores the interrupt
messages as an array in device memory. Allocation and freeing of interrupts
happens via the generic msi_domain_alloc/free_irqs() interface. One only
needs to ensure the interrupt domain is stored in the underlying device struct.

These patches can be divided into following steps:

1. X86 specific preparation for device MSI
   x86/irq: Add DEV_MSI allocation type
   x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI

2. Generic device MSI infrastructure
   platform-msi: Provide default irq_chip:: Ack
   genirq/proc: Take buslock on affinity write
   genirq/msi: Provide and use msi_domain_set_default_info_flags()
   platform-msi: Add device MSI infrastructure
   irqdomain/msi: Provide msi_alloc/free_store() callbacks
   genirq: Set auxiliary data for an interrupt
   genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number

3. Interrupt Message Store (IMS) irq domain/chip implementation for device array
   irqchip: Add IMS (Interrupt Message Store) driver
   iommu/vt-d: Add DEV-MSI support

4. Add platform check for subdevice irq domain
   iommu: Add capability IOMMU_CAP_VIOMMU
   platform-msi: Add platform check for subdevice irq domain

The device IMS (Interrupt Message Storage) should not be enabled in any
virtualization environments unless there is a HYPERCALL domain which
makes the changes in the message store monitored by the hypervisor.[2]
As the initial step, we allow the IMS to be enabled only if we are
running on the bare metal. It's easy to enable IMS in the virtualization
environments if above preconditions are met in the future.

These patches have been tested with the IDXD driver:
        https://github.com/intel/idxd-driver idxd-stage2.5
Most of these patches are originally by Thomas:
        https://lore.kernel.org/linux-hyperv/20200826111628.794979401@linutronix.de/
and are rebased on latest kernel.

This patches series has undergone a lot of changes since first submitted as an RFC
in September 2019. I have divided the changes into 3 stages for better understanding:

Stage 1: Standalone RFC IMS series[3]
-------------------------------------
https://lore.kernel.org/lkml/1568338328-22458-1-git-send-email-megha.dey@linux.intel.com/
V1:
1. Extend existing platform-msi to support IMS
2. platform_msi_domain_alloc_irqs_group is introduced to allocate IMS
   interrupts, tagged with a group ID.
3. To free vectors of a particular group, platform_msi_domain_free_irqs_group
   API in introduced

Stage 2: dev-msi/IMS merged with Dave Jiang's IDXD series[2]
------------------------------------------------------------
V1: (changes from stage 1):
1. Introduced a new list for platform-msi descriptors
2. Introduced dynamic allocation for IMS interrupts
3. shifted creation of ims domain from arch/x86 to drivers/
4. Removed arch specific callbacks
5. Introduced enum platform_msi_type
6. Added more technical details to the cover letter
7. Merge common code between platform-msi.c and ims-msi.c
8. Introduce new structures platform_msi_ops and platform_msi_funcs
9. Addressed Andriy Shevchenko's comments on RFC V1 patch series
10. Dropped the dynamic group allocation scheme.
11. Use RCU lock instead of mutex lock to protect the device list

V2:
1. IMS made dev-msi
2. With recommendations from Jason/Thomas/Dan on making IMS more generic
3. Pass a non-pci generic device(struct device) for IMS management instead of mdev
4. Remove all references to mdev and symbol_get/put
5. Remove all references to IMS in common code and replace with dev-msi
6. Remove dynamic allocation of platform-msi interrupts: no groups,no
   new msi list or list helpers
7. Create a generic dev-msi domain with and without interrupt remapping enabled.
8. Introduce dev_msi_domain_alloc_irqs and dev_msi_domain_free_irqs apis

V3:
1. No need to add support for 2 different dev-msi irq domains, a common
   once can be used for both the cases(with IR enabled/disabled)
2. Add arch specific function to specify additions to msi_prepare callback
   instead of making the callback a weak function
3. Call platform ops directly instead of a wrapper function
4. Make mask/unmask callbacks as void functions
5. dev->msi_domain should be updated at the device driver level before
   calling dev_msi_alloc_irqs()
6. dev_msi_alloc/free_irqs() cannot be used for PCI devices
7. Followed the generic layering scheme: infrastructure bits->arch bits->enabling bits

V4:
1. Make interrupt remapping code more readable
2. Add flush writes to unmask/write and reset ims slots
3. Interrupt Message Storm-> Interrupt Message Store
4. Merge in pasid programming code.

Stage 3: Standalone dev-msi and IMS driver series
-------------------------------------------------
V1:(Changes from Stage 2 V4)[6]
1. Split dev-msi/IMS code from Dave Jiang’s IDXD patch series
2. Set the source-id of all dev-msi interrupt requests to the parent PCI device
3. Separated core irq code from IMS related code
4. Added missing set_desc ops to the IMS msi_domain_ops
5. Added more details in the commit message-test case for auxillary interrupt data
6. Updated the copyright year from 2020 to 2021
7. Updated cover letter
8. Add platform check for subdevice irq domain (Lu Baolu):
   V1->V2:
   - V1 patches:[4]
   - Rename probably_on_bare_metal() with on_bare_metal();
   - Some vendors might use the same name for both bare metal and virtual
     environment. Before we add vendor specific code to distinguish
     between them, let's return false in on_bare_metal(). This won't
     introduce any regression. The only impact is that the coming new
     platform msi feature won't be supported until the vendor specific code
     is provided.
   V2->V3:
   - V2 patches:[5]
   - Add all identified heuristics so far

V1->V2:
1. s/arch_support_pci_device_ims/arch_support_pci_device_msi/g
2. Remove CONFIG_DEVICE_MSI in arch/x86/pci/common.c
3. Added helper functions to get linux IRQ and dev-msi HW IRQ numbers
4. Change the caching mode logic from dynamic to static

Dave Jiang (1):
  genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number

Lu Baolu (2):
  iommu: Add capability IOMMU_CAP_VIOMMU_HINT
  platform-msi: Add platform check for subdevice irq domain

Megha Dey (3):
  genirq: Set auxiliary data for an interrupt
  iommu/vt-d: Add DEV-MSI support
  irqchip: Add IMS (Interrupt Message Store) driver

Thomas Gleixner (7):
  x86/irq: Add DEV_MSI allocation type
  x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI
  platform-msi: Provide default irq_chip:: Ack
  genirq/proc: Take buslock on affinity write
  genirq/msi: Provide and use msi_domain_set_default_info_flags()
  platform-msi: Add device MSI infrastructure
  irqdomain/msi: Provide msi_alloc/free_store() callbacks

 arch/x86/include/asm/hw_irq.h       |   1 +
 arch/x86/include/asm/msi.h          |   4 +-
 arch/x86/kernel/apic/msi.c          |  27 +++--
 arch/x86/pci/common.c               |  72 ++++++++++++
 drivers/base/platform-msi.c         | 141 ++++++++++++++++++++++++
 drivers/iommu/amd/iommu.c           |   2 +
 drivers/iommu/intel/iommu.c         |   5 +
 drivers/iommu/intel/irq_remapping.c |   6 +-
 drivers/iommu/virtio-iommu.c        |   9 ++
 drivers/irqchip/Kconfig             |  14 +++
 drivers/irqchip/Makefile            |   1 +
 drivers/irqchip/irq-ims-msi.c       | 211 ++++++++++++++++++++++++++++++++++++
 drivers/pci/controller/pci-hyperv.c |   2 +-
 drivers/pci/msi.c                   |   7 +-
 include/linux/interrupt.h           |   2 +
 include/linux/iommu.h               |   2 +
 include/linux/irq.h                 |   4 +
 include/linux/irqchip/irq-ims-msi.h |  68 ++++++++++++
 include/linux/irqdomain.h           |   1 +
 include/linux/msi.h                 |  41 +++++++
 kernel/irq/Kconfig                  |   4 +
 kernel/irq/manage.c                 |  38 ++++++-
 kernel/irq/msi.c                    |  79 ++++++++++++++
 23 files changed, 722 insertions(+), 19 deletions(-)
 create mode 100644 drivers/irqchip/irq-ims-msi.c
 create mode 100644 include/linux/irqchip/irq-ims-msi.h

-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 01/13] x86/irq: Add DEV_MSI allocation type
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-02-26 20:11 ` [Patch V2 02/13] x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI Megha Dey
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

From: Thomas Gleixner <tglx@linutronix.de>

For the upcoming device MSI support a new allocation type is
required.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 arch/x86/include/asm/hw_irq.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index d465ece..0531b9c 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -41,6 +41,7 @@ enum irq_alloc_type {
 	X86_IRQ_ALLOC_TYPE_DMAR,
 	X86_IRQ_ALLOC_TYPE_AMDVI,
 	X86_IRQ_ALLOC_TYPE_UV,
+	X86_IRQ_ALLOC_TYPE_DEV_MSI,
 };
 
 struct ioapic_alloc_info {
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 02/13] x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
  2021-02-26 20:11 ` [Patch V2 01/13] x86/irq: Add DEV_MSI allocation type Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-02-26 20:11 ` [Patch V2 03/13] platform-msi: Provide default irq_chip:: Ack Megha Dey
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

From: Thomas Gleixner <tglx@linutronix.de>

Rename it to x86_msi_prepare() and handle the allocation type setup
depending on the device type.

Add a new arch_msi_prepare define which will be utilized by the upcoming
device MSI support. Define it to NULL if not provided by an architecture
in the generic MSI header.

One arch specific function for MSI support is truly enough.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 arch/x86/include/asm/msi.h          |  4 +++-
 arch/x86/kernel/apic/msi.c          | 27 ++++++++++++++++++++-------
 drivers/pci/controller/pci-hyperv.c |  2 +-
 include/linux/msi.h                 |  4 ++++
 4 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index b85147d..9bd214e 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -6,9 +6,11 @@
 
 typedef struct irq_alloc_info msi_alloc_info_t;
 
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		    msi_alloc_info_t *arg);
 
+#define arch_msi_prepare		x86_msi_prepare
+
 /* Structs and defines for the X86 specific MSI message format */
 
 typedef struct x86_msi_data {
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 44ebe25..84b16c7 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -153,26 +153,39 @@ static struct irq_chip pci_msi_controller = {
 	.flags			= IRQCHIP_SKIP_SET_WAKE,
 };
 
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
-		    msi_alloc_info_t *arg)
+static void pci_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
 {
-	struct pci_dev *pdev = to_pci_dev(dev);
-	struct msi_desc *desc = first_pci_msi_entry(pdev);
+	struct msi_desc *desc = first_msi_entry(dev);
 
-	init_irq_alloc_info(arg, NULL);
 	if (desc->msi_attrib.is_msix) {
 		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
 	} else {
 		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
 		arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 	}
+}
+
+static void dev_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
+{
+	arg->type = X86_IRQ_ALLOC_TYPE_DEV_MSI;
+}
+
+int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+		    msi_alloc_info_t *arg)
+{
+	init_irq_alloc_info(arg, NULL);
+
+	if (dev_is_pci(dev))
+		pci_msi_prepare(dev, arg);
+	else
+		dev_msi_prepare(dev, arg);
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(pci_msi_prepare);
+EXPORT_SYMBOL_GPL(x86_msi_prepare);
 
 static struct msi_domain_ops pci_msi_domain_ops = {
-	.msi_prepare	= pci_msi_prepare,
+	.msi_prepare	= x86_msi_prepare,
 };
 
 static struct msi_domain_info pci_msi_domain_info = {
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 27a17a1..ac4fe8b7 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1546,7 +1546,7 @@ static struct irq_chip hv_msi_irq_chip = {
 };
 
 static struct msi_domain_ops hv_msi_ops = {
-	.msi_prepare	= pci_msi_prepare,
+	.msi_prepare	= arch_msi_prepare,
 	.msi_free	= hv_msi_free,
 };
 
diff --git a/include/linux/msi.h b/include/linux/msi.h
index aef35fd..f3e54d2 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -473,4 +473,8 @@ static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
 }
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
 
+#ifndef arch_msi_prepare
+# define arch_msi_prepare	NULL
+#endif
+
 #endif /* LINUX_MSI_H */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 03/13] platform-msi: Provide default irq_chip:: Ack
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
  2021-02-26 20:11 ` [Patch V2 01/13] x86/irq: Add DEV_MSI allocation type Megha Dey
  2021-02-26 20:11 ` [Patch V2 02/13] x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-02-26 20:11 ` [Patch V2 04/13] genirq/proc: Take buslock on affinity write Megha Dey
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

From: Thomas Gleixner <tglx@linutronix.de>

For the upcoming device MSI support it's required to have a default
irq_chip::ack implementation (irq_chip_ack_parent) so the drivers do not
need to care.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 drivers/base/platform-msi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index 2c1e2e0..9d9ccfc 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -101,6 +101,8 @@ static void platform_msi_update_chip_ops(struct msi_domain_info *info)
 		chip->irq_mask = irq_chip_mask_parent;
 	if (!chip->irq_unmask)
 		chip->irq_unmask = irq_chip_unmask_parent;
+	if (!chip->irq_ack)
+		chip->irq_ack = irq_chip_ack_parent;
 	if (!chip->irq_eoi)
 		chip->irq_eoi = irq_chip_eoi_parent;
 	if (!chip->irq_set_affinity)
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 04/13] genirq/proc: Take buslock on affinity write
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
                   ` (2 preceding siblings ...)
  2021-02-26 20:11 ` [Patch V2 03/13] platform-msi: Provide default irq_chip:: Ack Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-02-26 20:11 ` [Patch V2 05/13] genirq/msi: Provide and use msi_domain_set_default_info_flags() Megha Dey
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

From: Thomas Gleixner <tglx@linutronix.de>

Until now interrupt chips which support setting affinity are not locking
the associated bus lock for two reasons:

 - All chips which support affinity setting do not use buslock because they
   just can operated directly on the hardware.

 - All chips which use buslock do not support affinity setting because
   their interrupt chips are not capable. These chips are usually connected
   over a bus like I2C, SPI etc. and have an interrupt output which is
   conneted to CPU interrupt of some sort. So there is no way to set the
   affinity on the chip itself.

Upcoming hardware which is PCIE based sports a non standard MSI(X) variant
which stores the MSI message in RAM which is associated to e.g. a device
queue. The device manages this RAM and writes have to be issued via command
queues or similar mechanisms which is obviously not possible from interrupt
disabled, raw spinlock held context.

The buslock mechanism of irq chips can be utilized to support that. The
affinity write to the chip writes to shadow state, marks it pending and the
irq chip's irq_bus_sync_unlock() callback handles the command queue and
wait for completion similar to the other chip operations on I2C or SPI
buses.

Change the locking in irq_set_affinity() to bus_lock/unlock to help with
that. There are a few other callers than the proc interface, but none of
them is affected by this change as none of them affects an irq chip with
bus lock support.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 kernel/irq/manage.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index dec3f73..85ede4e 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -443,16 +443,16 @@ int irq_update_affinity_desc(unsigned int irq,
 
 int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)
 {
-	struct irq_desc *desc = irq_to_desc(irq);
+	struct irq_desc *desc;
 	unsigned long flags;
 	int ret;
 
+	desc = irq_get_desc_buslock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);
 	if (!desc)
 		return -EINVAL;
 
-	raw_spin_lock_irqsave(&desc->lock, flags);
 	ret = irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask, force);
-	raw_spin_unlock_irqrestore(&desc->lock, flags);
+	irq_put_desc_busunlock(desc, flags);
 	return ret;
 }
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 05/13] genirq/msi: Provide and use msi_domain_set_default_info_flags()
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
                   ` (3 preceding siblings ...)
  2021-02-26 20:11 ` [Patch V2 04/13] genirq/proc: Take buslock on affinity write Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-02-26 20:11 ` [Patch V2 06/13] platform-msi: Add device MSI infrastructure Megha Dey
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

From: Thomas Gleixner <tglx@linutronix.de>

MSI interrupts have some common flags which should be set not only for
PCI/MSI interrupts.

Move the PCI/MSI flag setting into a common function so it can be reused.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 drivers/pci/msi.c   |  7 +------
 include/linux/msi.h |  1 +
 kernel/irq/msi.c    | 24 ++++++++++++++++++++++++
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 3162f88..20d2512 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1492,12 +1492,7 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
 		pci_msi_domain_update_chip_ops(info);
 
-	info->flags |= MSI_FLAG_ACTIVATE_EARLY;
-	if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
-		info->flags |= MSI_FLAG_MUST_REACTIVATE;
-
-	/* PCI-MSI is oneshot-safe */
-	info->chip->flags |= IRQCHIP_ONESHOT_SAFE;
+	msi_domain_set_default_info_flags(info);
 
 	domain = msi_create_irq_domain(fwnode, info, parent);
 	if (!domain)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index f3e54d2..f6e52de 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -454,6 +454,7 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
 void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq,
 			      unsigned int nvec);
 void *platform_msi_get_host_data(struct irq_domain *domain);
+void msi_domain_set_default_info_flags(struct msi_domain_info *info);
 #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
 
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index b338d62..c54316d 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -70,6 +70,30 @@ void get_cached_msi_msg(unsigned int irq, struct msi_msg *msg)
 EXPORT_SYMBOL_GPL(get_cached_msi_msg);
 
 #ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
+void msi_domain_set_default_info_flags(struct msi_domain_info *info)
+{
+	/* Required so that a device latches a valid MSI message on startup */
+	info->flags |= MSI_FLAG_ACTIVATE_EARLY;
+
+	/*
+	 * Interrupt reservation mode allows to stear the MSI message of an
+	 * inactive device to a special (usually spurious interrupt) target.
+	 * This allows to prevent interrupt vector exhaustion e.g. on x86.
+	 * But (PCI)MSI interrupts are activated early - see above - so the
+	 * interrupt request/startup sequence would not try to allocate a
+	 * usable vector which means that the device interrupts would end
+	 * up on the special vector and issue spurious interrupt messages.
+	 * Setting the reactivation flag ensures that when the interrupt
+	 * is requested the activation is invoked again so that a real
+	 * vector can be allocated.
+	 */
+	if (IS_ENABLED(CONFIG_GENERIC_IRQ_RESERVATION_MODE))
+		info->flags |= MSI_FLAG_MUST_REACTIVATE;
+
+	/* MSI is oneshot-safe at least in theory */
+	info->chip->flags |= IRQCHIP_ONESHOT_SAFE;
+}
+
 static inline void irq_chip_write_msi_msg(struct irq_data *data,
 					  struct msi_msg *msg)
 {
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 06/13] platform-msi: Add device MSI infrastructure
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
                   ` (4 preceding siblings ...)
  2021-02-26 20:11 ` [Patch V2 05/13] genirq/msi: Provide and use msi_domain_set_default_info_flags() Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-02-26 20:11 ` [Patch V2 07/13] irqdomain/msi: Provide msi_alloc/free_store() callbacks Megha Dey
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

From: Thomas Gleixner <tglx@linutronix.de>

Add device specific MSI domain infrastructure for devices which have their
own resource management and interrupt chip. These devices are not related
to PCI and contrary to platform MSI they do not share a common resource and
interrupt chip. They provide their own domain specific resource management
and interrupt chip.

This utilizes the new alloc/free override in a non evil way which avoids
having yet another set of specialized alloc/free functions. Just using
msi_domain_alloc/free_irqs() is sufficient

While initially it was suggested and tried to piggyback device MSI on
platform MSI, the better variant is to reimplement platform MSI on top of
device MSI.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 drivers/base/platform-msi.c | 131 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/irqdomain.h   |   1 +
 include/linux/msi.h         |  24 ++++++++
 kernel/irq/Kconfig          |   4 ++
 4 files changed, 160 insertions(+)

diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index 9d9ccfc..6127b3b 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -419,3 +419,134 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
 
 	return err;
 }
+
+#ifdef CONFIG_DEVICE_MSI
+/*
+ * Device specific MSI domain infrastructure for devices which have their
+ * own resource management and interrupt chip. These devices are not
+ * related to PCI and contrary to platform MSI they do not share a common
+ * resource and interrupt chip. They provide their own domain specific
+ * resource management and interrupt chip.
+ */
+
+static void device_msi_free_msi_entries(struct device *dev)
+{
+	struct list_head *msi_list = dev_to_msi_list(dev);
+	struct msi_desc *entry, *tmp;
+
+	list_for_each_entry_safe(entry, tmp, msi_list, list) {
+		list_del(&entry->list);
+		free_msi_entry(entry);
+	}
+}
+
+/**
+ * device_msi_free_irqs - Free MSI interrupts assigned to  a device
+ * @dev:	Pointer to the device
+ *
+ * Frees the interrupt and the MSI descriptors.
+ */
+static void device_msi_free_irqs(struct irq_domain *domain, struct device *dev)
+{
+	__msi_domain_free_irqs(domain, dev);
+	device_msi_free_msi_entries(dev);
+}
+
+/**
+ * device_msi_alloc_irqs - Allocate MSI interrupts for a device
+ * @dev:	Pointer to the device
+ * @nvec:	Number of vectors
+ *
+ * Allocates the required number of MSI descriptors and the corresponding
+ * interrupt descriptors.
+ */
+static int device_msi_alloc_irqs(struct irq_domain *domain, struct device *dev, int nvec)
+{
+	int i, ret = -ENOMEM;
+
+	for (i = 0; i < nvec; i++) {
+		struct msi_desc *entry = alloc_msi_entry(dev, 1, NULL);
+
+		if (!entry)
+			goto fail;
+		list_add_tail(&entry->list, dev_to_msi_list(dev));
+	}
+
+	ret = __msi_domain_alloc_irqs(domain, dev, nvec);
+	if (!ret)
+		return 0;
+fail:
+	device_msi_free_msi_entries(dev);
+	return ret;
+}
+
+static void device_msi_update_dom_ops(struct msi_domain_info *info)
+{
+	if (!info->ops->domain_alloc_irqs)
+		info->ops->domain_alloc_irqs = device_msi_alloc_irqs;
+	if (!info->ops->domain_free_irqs)
+		info->ops->domain_free_irqs = device_msi_free_irqs;
+	if (!info->ops->msi_prepare)
+		info->ops->msi_prepare = arch_msi_prepare;
+}
+
+/**
+ * device_msi_create_msi_irq_domain - Create an irq domain for devices
+ * @fwnode:	Firmware node of the interrupt controller
+ * @info:	MSI domain info to configure the new domain
+ * @parent:	Parent domain
+ */
+struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn,
+						struct msi_domain_info *info,
+						struct irq_domain *parent)
+{
+	struct irq_domain *domain;
+
+	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
+		platform_msi_update_chip_ops(info);
+
+	if (info->flags & MSI_FLAG_USE_DEF_DOM_OPS)
+		device_msi_update_dom_ops(info);
+
+	msi_domain_set_default_info_flags(info);
+
+	domain = msi_create_irq_domain(fn, info, parent);
+	if (domain)
+		irq_domain_update_bus_token(domain, DOMAIN_BUS_DEVICE_MSI);
+	return domain;
+}
+
+#ifdef CONFIG_PCI
+#include <linux/pci.h>
+
+/**
+ * pci_subdevice_msi_create_irq_domain - Create an irq domain for subdevices
+ * @pdev:	Pointer to PCI device for which the subdevice domain is created
+ * @info:	MSI domain info to configure the new domain
+ */
+struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev,
+						       struct msi_domain_info *info)
+{
+	struct irq_domain *domain, *pdev_msi;
+	struct fwnode_handle *fn;
+
+	/*
+	 * Retrieve the MSI domain of the underlying PCI device's MSI
+	 * domain. The PCI device domain's parent domain is also the parent
+	 * domain of the new subdevice domain.
+	 */
+	pdev_msi = dev_get_msi_domain(&pdev->dev);
+	if (!pdev_msi)
+		return NULL;
+
+	fn = irq_domain_alloc_named_fwnode(dev_name(&pdev->dev));
+	if (!fn)
+		return NULL;
+	domain = device_msi_create_irq_domain(fn, info, pdev_msi->parent);
+	if (!domain)
+		irq_domain_free_fwnode(fn);
+	return domain;
+}
+EXPORT_SYMBOL_GPL(pci_subdevice_msi_create_irq_domain);
+#endif /* CONFIG_PCI */
+#endif /* CONFIG_DEVICE_MSI */
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 42d1968..06c88ba 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -85,6 +85,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_TI_SCI_INTA_MSI,
 	DOMAIN_BUS_WAKEUP,
 	DOMAIN_BUS_VMD_MSI,
+	DOMAIN_BUS_DEVICE_MSI,
 };
 
 /**
diff --git a/include/linux/msi.h b/include/linux/msi.h
index f6e52de..46e879c 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -95,6 +95,18 @@ struct ti_sci_inta_msi_desc {
 };
 
 /**
+ * device_msi_desc - Device MSI specific MSI descriptor data
+ * @priv:		Pointer to device specific private data
+ * @priv_iomem:		Pointer to device specific private io memory
+ * @hwirq:		The hardware irq number in the device domain
+ */
+struct device_msi_desc {
+	void		*priv;
+	void __iomem	*priv_iomem;
+	u16		hwirq;
+};
+
+/**
  * struct msi_desc - Descriptor structure for MSI based interrupts
  * @list:	List head for management
  * @irq:	The base interrupt number
@@ -166,6 +178,7 @@ struct msi_desc {
 		struct platform_msi_desc platform;
 		struct fsl_mc_msi_desc fsl_mc;
 		struct ti_sci_inta_msi_desc inta;
+		struct device_msi_desc device_msi;
 	};
 };
 
@@ -457,6 +470,17 @@ void *platform_msi_get_host_data(struct irq_domain *domain);
 void msi_domain_set_default_info_flags(struct msi_domain_info *info);
 #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
 
+#ifdef CONFIG_DEVICE_MSI
+struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn,
+						struct msi_domain_info *info,
+						struct irq_domain *parent);
+
+# ifdef CONFIG_PCI
+struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev,
+						       struct msi_domain_info *info);
+# endif
+#endif /* CONFIG_DEVICE_MSI */
+
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
 void pci_msi_domain_write_msg(struct irq_data *irq_data, struct msi_msg *msg);
 struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index d79ef24..7223327 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -89,6 +89,10 @@ config GENERIC_MSI_IRQ_DOMAIN
 	select IRQ_DOMAIN_HIERARCHY
 	select GENERIC_MSI_IRQ
 
+config DEVICE_MSI
+	bool
+	select GENERIC_MSI_IRQ_DOMAIN
+
 config IRQ_MSI_IOMMU
 	bool
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 07/13] irqdomain/msi: Provide msi_alloc/free_store() callbacks
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
                   ` (5 preceding siblings ...)
  2021-02-26 20:11 ` [Patch V2 06/13] platform-msi: Add device MSI infrastructure Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-03-25 17:08   ` Marc Zyngier
  2021-02-26 20:11 ` [Patch V2 08/13] genirq: Set auxiliary data for an interrupt Megha Dey
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

From: Thomas Gleixner <tglx@linutronix.de>

For devices which don't have a standard storage for MSI messages like the
upcoming IMS (Interrupt Message Store) it's required to allocate storage
space before allocating interrupts and after freeing them.

This could be achieved with the existing callbacks, but that would be
awkward because they operate on msi_alloc_info_t which is not uniform
across architectures. Also these callbacks are invoked per interrupt but
the allocation might have bulk requirements depending on the device.

As such devices can operate on different architectures it is simpler to
have separate callbacks which operate on struct device. The resulting
storage information has to be stored in struct msi_desc so the underlying
irq chip implementation can retrieve it for the relevant operations.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 include/linux/msi.h |  8 ++++++++
 kernel/irq/msi.c    | 11 +++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 46e879c..e915932 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -323,6 +323,10 @@ struct msi_domain_info;
  *			function.
  * @domain_free_irqs:	Optional function to override the default free
  *			function.
+ * @msi_alloc_store:	Optional callback to allocate storage in a device
+ *			specific non-standard MSI store
+ * @msi_alloc_free:	Optional callback to free storage in a device
+ *			specific non-standard MSI store
  *
  * @get_hwirq, @msi_init and @msi_free are callbacks used by
  * msi_create_irq_domain() and related interfaces
@@ -372,6 +376,10 @@ struct msi_domain_ops {
 					     struct device *dev, int nvec);
 	void		(*domain_free_irqs)(struct irq_domain *domain,
 					    struct device *dev);
+	int		(*msi_alloc_store)(struct irq_domain *domain,
+					   struct device *dev, int nvec);
+	void		(*msi_free_store)(struct irq_domain *domain,
+					  struct device *dev);
 };
 
 /**
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index c54316d..047b59d 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -434,6 +434,12 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
 	if (ret)
 		return ret;
 
+	if (ops->msi_alloc_store) {
+		ret = ops->msi_alloc_store(domain, dev, nvec);
+		if (ret)
+			return ret;
+	}
+
 	for_each_msi_entry(desc, dev) {
 		ops->set_desc(&arg, desc);
 
@@ -529,6 +535,8 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
 
 void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
 {
+	struct msi_domain_info *info = domain->host_data;
+	struct msi_domain_ops *ops = info->ops;
 	struct msi_desc *desc;
 
 	for_each_msi_entry(desc, dev) {
@@ -542,6 +550,9 @@ void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
 			desc->irq = 0;
 		}
 	}
+
+	if (ops->msi_free_store)
+		ops->msi_free_store(domain, dev);
 }
 
 /**
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 08/13] genirq: Set auxiliary data for an interrupt
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
                   ` (6 preceding siblings ...)
  2021-02-26 20:11 ` [Patch V2 07/13] irqdomain/msi: Provide msi_alloc/free_store() callbacks Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-03-25 17:23   ` Marc Zyngier
  2021-02-26 20:11 ` [Patch V2 09/13] iommu/vt-d: Add DEV-MSI support Megha Dey
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

Introduce a new function pointer in the irq_chip structure(irq_set_auxdata)
which is responsible for updating data which is stored in a shared register
or data storage. For example, the idxd driver uses the auxiliary data API
to enable/set and disable PASID field that is in the IMS entry (introduced
in a later patch) and that data are not typically present in MSI entry.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 include/linux/interrupt.h |  2 ++
 include/linux/irq.h       |  4 ++++
 kernel/irq/manage.c       | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 967e257..461ed1c 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -496,6 +496,8 @@ extern int irq_get_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
 extern int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
 				 bool state);
 
+int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val);
+
 #ifdef CONFIG_IRQ_FORCED_THREADING
 # ifdef CONFIG_PREEMPT_RT
 #  define force_irqthreads	(true)
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 2efde6a..fc19f32 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -491,6 +491,8 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
  *				irq_request_resources
  * @irq_compose_msi_msg:	optional to compose message content for MSI
  * @irq_write_msi_msg:	optional to write message content for MSI
+ * @irq_set_auxdata:	Optional function to update auxiliary data e.g. in
+ *			shared registers
  * @irq_get_irqchip_state:	return the internal state of an interrupt
  * @irq_set_irqchip_state:	set the internal state of a interrupt
  * @irq_set_vcpu_affinity:	optional to target a vCPU in a virtual machine
@@ -538,6 +540,8 @@ struct irq_chip {
 	void		(*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);
 	void		(*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg);
 
+	int		(*irq_set_auxdata)(struct irq_data *data, unsigned int which, u64 auxval);
+
 	int		(*irq_get_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool *state);
 	int		(*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state);
 
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 85ede4e..68ff559 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2860,3 +2860,35 @@ bool irq_check_status_bit(unsigned int irq, unsigned int bitmask)
 	return res;
 }
 EXPORT_SYMBOL_GPL(irq_check_status_bit);
+
+/**
+ * irq_set_auxdata - Set auxiliary data
+ * @irq:	Interrupt to update
+ * @which:	Selector which data to update
+ * @auxval:	Auxiliary data value
+ *
+ * Function to update auxiliary data for an interrupt, e.g. to update data
+ * which is stored in a shared register or data storage (e.g. IMS).
+ */
+int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val)
+{
+	struct irq_desc *desc;
+	struct irq_data *data;
+	unsigned long flags;
+	int res = -ENODEV;
+
+	desc = irq_get_desc_buslock(irq, &flags, 0);
+	if (!desc)
+		return -EINVAL;
+
+	for (data = &desc->irq_data; data; data = irqd_get_parent_data(data)) {
+		if (data->chip->irq_set_auxdata) {
+			res = data->chip->irq_set_auxdata(data, which, val);
+			break;
+		}
+	}
+
+	irq_put_desc_busunlock(desc, flags);
+	return res;
+}
+EXPORT_SYMBOL_GPL(irq_set_auxdata);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 09/13] iommu/vt-d: Add DEV-MSI support
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
                   ` (7 preceding siblings ...)
  2021-02-26 20:11 ` [Patch V2 08/13] genirq: Set auxiliary data for an interrupt Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-02-26 20:11 ` [Patch V2 10/13] iommu: Add capability IOMMU_CAP_VIOMMU_HINT Megha Dey
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

Add required support in the interrupt remapping driver for devices
which generate dev-msi interrupts and use the intel remapping
domain as the parent domain. Set the source-id of all dev-msi
interrupt requests to the parent PCI device associated with it.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 drivers/iommu/intel/irq_remapping.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 611ef52..2a55e54 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1282,6 +1282,9 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
 		set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
 		break;
+	case X86_IRQ_ALLOC_TYPE_DEV_MSI:
+		set_msi_sid(irte, to_pci_dev(info->desc->dev->parent));
+		break;
 	default:
 		BUG_ON(1);
 		break;
@@ -1325,7 +1328,8 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
 	if (!info || !iommu)
 		return -EINVAL;
 	if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
-	    info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
+	    info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX &&
+	    info->type != X86_IRQ_ALLOC_TYPE_DEV_MSI)
 		return -EINVAL;
 
 	/*
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 10/13] iommu: Add capability IOMMU_CAP_VIOMMU_HINT
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
                   ` (8 preceding siblings ...)
  2021-02-26 20:11 ` [Patch V2 09/13] iommu/vt-d: Add DEV-MSI support Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-02-26 20:11 ` [Patch V2 11/13] platform-msi: Add platform check for subdevice irq domain Megha Dey
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar, Joerg Roedel

From: Lu Baolu <baolu.lu@linux.intel.com>

Some IOMMU specification defines some kind of hint mechanism, through
which BIOS can imply that OS runs in a virtualized environment. For
example, the caching mode defined in VT-d spec and NpCache capability
defined in the AMD IOMMU specification. This hint could also be used
outside of the IOMMU subsystem, where it could be used with other known
means (CPUID, smbios) to sense whether Linux is running in a virtualized
environment. Add a capability bit so that it could be used there.

Cc: Joerg Roedel <joro@8bytes.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 drivers/iommu/amd/iommu.c    | 2 ++
 drivers/iommu/intel/iommu.c  | 5 +++++
 drivers/iommu/virtio-iommu.c | 9 +++++++++
 include/linux/iommu.h        | 2 ++
 4 files changed, 18 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b5..a912318 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2140,6 +2140,8 @@ static bool amd_iommu_capable(enum iommu_cap cap)
 		return (irq_remapping_enabled == 1);
 	case IOMMU_CAP_NOEXEC:
 		return false;
+	case IOMMU_CAP_VIOMMU_HINT:
+		return amd_iommu_np_cache;
 	default:
 		break;
 	}
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..55fa198 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -294,6 +294,7 @@ static inline void context_clear_entry(struct context_entry *context)
  */
 static struct dmar_domain *si_domain;
 static int hw_pass_through = 1;
+static int intel_caching_mode;
 
 #define for_each_domain_iommu(idx, domain)			\
 	for (idx = 0; idx < g_num_of_iommus; idx++)		\
@@ -3253,6 +3254,8 @@ static int __init init_dmars(void)
 
 		if (!ecap_pass_through(iommu->ecap))
 			hw_pass_through = 0;
+		if (cap_caching_mode(iommu->cap))
+			intel_caching_mode = 1;
 		intel_svm_check(iommu);
 	}
 
@@ -5113,6 +5116,8 @@ static bool intel_iommu_capable(enum iommu_cap cap)
 		return domain_update_iommu_snooping(NULL) == 1;
 	if (cap == IOMMU_CAP_INTR_REMAP)
 		return irq_remapping_enabled == 1;
+	if (cap == IOMMU_CAP_VIOMMU_HINT)
+		return intel_caching_mode;
 
 	return false;
 }
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 2bfdd57..e4941ca 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -931,7 +931,16 @@ static int viommu_of_xlate(struct device *dev, struct of_phandle_args *args)
 	return iommu_fwspec_add_ids(dev, args->args, 1);
 }
 
+static bool viommu_capable(enum iommu_cap cap)
+{
+	if (cap == IOMMU_CAP_VIOMMU_HINT)
+		return true;
+
+	return false;
+}
+
 static struct iommu_ops viommu_ops = {
+	.capable		= viommu_capable,
 	.domain_alloc		= viommu_domain_alloc,
 	.domain_free		= viommu_domain_free,
 	.attach_dev		= viommu_attach_dev,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe51..9d0ade4 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -94,6 +94,8 @@ enum iommu_cap {
 					   transactions */
 	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
 	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
+	IOMMU_CAP_VIOMMU_HINT,		/* IOMMU can detect a hit for running in
+					   VM */
 };
 
 /*
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 11/13] platform-msi: Add platform check for subdevice irq domain
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
                   ` (9 preceding siblings ...)
  2021-02-26 20:11 ` [Patch V2 10/13] iommu: Add capability IOMMU_CAP_VIOMMU_HINT Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-02-26 20:11 ` [Patch V2 12/13] irqchip: Add IMS (Interrupt Message Store) driver Megha Dey
  2021-02-26 20:11 ` [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number Megha Dey
  12 siblings, 0 replies; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar, Leon Romanovsky

From: Lu Baolu <baolu.lu@linux.intel.com>

The pci_subdevice_msi_create_irq_domain() should fail if the underlying
platform is not able to support IMS (Interrupt Message Storage). Otherwise,
the isolation of interrupt is not guaranteed.

For x86, IMS is only supported on bare metal for now. We could enable it
in the virtualization environments in the future if interrupt HYPERCALL
domain is supported or the hardware has the capability of interrupt
isolation for subdevices.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/linux-pci/87pn4nk7nn.fsf@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/linux-pci/877dqrnzr3.fsf@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/linux-pci/877dqqmc2h.fsf@nanos.tec.linutronix.de/
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 arch/x86/pci/common.c       | 72 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/base/platform-msi.c |  8 +++++
 include/linux/msi.h         |  2 ++
 3 files changed, 82 insertions(+)

diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 3507f45..64daa6a 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -12,6 +12,8 @@
 #include <linux/init.h>
 #include <linux/dmi.h>
 #include <linux/slab.h>
+#include <linux/iommu.h>
+#include <linux/msi.h>
 
 #include <asm/acpi.h>
 #include <asm/segment.h>
@@ -724,3 +726,73 @@ struct pci_dev *pci_real_dma_dev(struct pci_dev *dev)
 	return dev;
 }
 #endif
+
+/*
+ * We want to figure out which context we are running in. But the hardware
+ * does not introduce a reliable way (instruction, CPUID leaf, MSR, whatever)
+ * which can be manipulated by the VMM to let the OS figure out where it runs.
+ * So we go with the below probably on_bare_metal() function as a replacement
+ * for definitely on_bare_metal() to go forward only for the very simple reason
+ * that this is the only option we have.
+ */
+static const char * const vmm_vendor_name[] = {
+	"QEMU", "Bochs", "KVM", "Xen", "VMware", "VMW", "VMware Inc.",
+	"innotek GmbH", "Oracle Corporation", "Parallels", "BHYVE"
+};
+
+static void read_type0_virtual_machine(const struct dmi_header *dm, void *p)
+{
+	u8 *data = (u8 *)dm + 0x13;
+
+	/* BIOS Information (Type 0) */
+	if (dm->type != 0 || dm->length < 0x14)
+		return;
+
+	/* Bit 4 of BIOS Characteristics Extension Byte 2*/
+	if (*data & BIT(4))
+		*((bool *)p) = true;
+}
+
+static bool smbios_virtual_machine(void)
+{
+	bool bit_present = false;
+
+	dmi_walk(read_type0_virtual_machine, &bit_present);
+
+	return bit_present;
+}
+
+static bool on_bare_metal(struct device *dev)
+{
+	int i;
+
+	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+		return false;
+
+	if (smbios_virtual_machine())
+		return false;
+
+	if (iommu_capable(dev->bus, IOMMU_CAP_VIOMMU_HINT))
+		return false;
+
+	for (i = 0; i < ARRAY_SIZE(vmm_vendor_name); i++)
+		if (dmi_match(DMI_SYS_VENDOR, vmm_vendor_name[i]))
+			return false;
+
+	pr_info("System running on bare metal, report to bugzilla.kernel.org if not the case.");
+
+	return true;
+}
+
+bool arch_support_pci_device_msi(struct pci_dev *pdev)
+{
+	/*
+	 * When we are running in a VMM context, the device IMS could only be
+	 * enabled when the underlying hardware supports interrupt isolation
+	 * of the subdevice, or any mechanism (trap, hypercall) is added so
+	 * that changes in the interrupt message store could be managed by the
+	 * VMM. For now, we only support the device IMS when we are running on
+	 * the bare metal.
+	 */
+	return on_bare_metal(&pdev->dev);
+}
diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c
index 6127b3b..c4a0d9c 100644
--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -519,6 +519,11 @@ struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn,
 #ifdef CONFIG_PCI
 #include <linux/pci.h>
 
+bool __weak arch_support_pci_device_msi(struct pci_dev *pdev)
+{
+	return false;
+}
+
 /**
  * pci_subdevice_msi_create_irq_domain - Create an irq domain for subdevices
  * @pdev:	Pointer to PCI device for which the subdevice domain is created
@@ -530,6 +535,9 @@ struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev,
 	struct irq_domain *domain, *pdev_msi;
 	struct fwnode_handle *fn;
 
+	if (!arch_support_pci_device_msi(pdev))
+		return NULL;
+
 	/*
 	 * Retrieve the MSI domain of the underlying PCI device's MSI
 	 * domain. The PCI device domain's parent domain is also the parent
diff --git a/include/linux/msi.h b/include/linux/msi.h
index e915932..24abec0 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -489,6 +489,8 @@ struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev,
 # endif
 #endif /* CONFIG_DEVICE_MSI */
 
+bool arch_support_pci_device_msi(struct pci_dev *pdev);
+
 #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN
 void pci_msi_domain_write_msg(struct irq_data *irq_data, struct msi_msg *msg);
 struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 12/13] irqchip: Add IMS (Interrupt Message Store) driver
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
                   ` (10 preceding siblings ...)
  2021-02-26 20:11 ` [Patch V2 11/13] platform-msi: Add platform check for subdevice irq domain Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-03-25 17:43   ` Marc Zyngier
  2021-02-26 20:11 ` [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number Megha Dey
  12 siblings, 1 reply; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

Generic IMS(Interrupt Message Store) irq chips and irq domain
implementations for IMS based devices which store the interrupt messages
in an array in device memory.

Allocation and freeing of interrupts happens via the generic
msi_domain_alloc/free_irqs() interface. No special purpose IMS magic
required as long as the interrupt domain is stored in the underlying
device struct. The irq_set_auxdata() is used to program the pasid into
the IMS entry.

[Megha: Fixed compile time errors
        Added necessary dependencies to IMS_MSI_ARRAY config
        Fixed polarity of IMS_VECTOR_CTRL
        Added reads after writes to flush writes to device
        Added set_desc ops to IMS msi domain ops
        Tested the IMS infrastructure with the IDXD driver]

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 drivers/irqchip/Kconfig             |  14 +++
 drivers/irqchip/Makefile            |   1 +
 drivers/irqchip/irq-ims-msi.c       | 211 ++++++++++++++++++++++++++++++++++++
 include/linux/irqchip/irq-ims-msi.h |  68 ++++++++++++
 4 files changed, 294 insertions(+)
 create mode 100644 drivers/irqchip/irq-ims-msi.c
 create mode 100644 include/linux/irqchip/irq-ims-msi.h

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index e74fa20..2fb0c24 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -586,4 +586,18 @@ config MST_IRQ
 	help
 	  Support MStar Interrupt Controller.
 
+config IMS_MSI
+	depends on PCI
+	select DEVICE_MSI
+	bool
+
+config IMS_MSI_ARRAY
+	bool "IMS Interrupt Message Store MSI controller for device memory storage arrays"
+	depends on PCI
+	select IMS_MSI
+	select GENERIC_MSI_IRQ_DOMAIN
+	help
+	  Support for IMS Interrupt Message Store MSI controller
+	  with IMS slot storage in a slot array in device memory
+
 endmenu
diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
index c59b95a..e903201 100644
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -113,3 +113,4 @@ obj-$(CONFIG_LOONGSON_PCH_MSI)		+= irq-loongson-pch-msi.o
 obj-$(CONFIG_MST_IRQ)			+= irq-mst-intc.o
 obj-$(CONFIG_SL28CPLD_INTC)		+= irq-sl28cpld.o
 obj-$(CONFIG_MACH_REALTEK_RTL)		+= irq-realtek-rtl.o
+obj-$(CONFIG_IMS_MSI)			+= irq-ims-msi.o
diff --git a/drivers/irqchip/irq-ims-msi.c b/drivers/irqchip/irq-ims-msi.c
new file mode 100644
index 0000000..fa23207
--- /dev/null
+++ b/drivers/irqchip/irq-ims-msi.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0
+// (C) Copyright 2021 Thomas Gleixner <tglx@linutronix.de>
+/*
+ * Shared interrupt chips and irq domains for IMS devices
+ */
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/msi.h>
+#include <linux/irq.h>
+#include <linux/irqdomain.h>
+
+#include <linux/irqchip/irq-ims-msi.h>
+
+#ifdef CONFIG_IMS_MSI_ARRAY
+
+struct ims_array_data {
+	struct ims_array_info	info;
+	unsigned long		map[0];
+};
+
+static inline void iowrite32_and_flush(u32 value, void __iomem *addr)
+{
+	iowrite32(value, addr);
+	ioread32(addr);
+}
+
+static void ims_array_mask_irq(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+	u32 __iomem *ctrl = &slot->ctrl;
+
+	iowrite32_and_flush(ioread32(ctrl) | IMS_CTRL_VECTOR_MASKBIT, ctrl);
+}
+
+static void ims_array_unmask_irq(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+	u32 __iomem *ctrl = &slot->ctrl;
+
+	iowrite32_and_flush(ioread32(ctrl) & ~IMS_CTRL_VECTOR_MASKBIT, ctrl);
+}
+
+static void ims_array_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+
+	iowrite32(msg->address_lo, &slot->address_lo);
+	iowrite32(msg->address_hi, &slot->address_hi);
+	iowrite32_and_flush(msg->data, &slot->data);
+}
+
+static int ims_array_set_auxdata(struct irq_data *data, unsigned int which,
+				 u64 auxval)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->device_msi.priv_iomem;
+	u32 val, __iomem *ctrl = &slot->ctrl;
+
+	if (which != IMS_AUXDATA_CONTROL_WORD)
+		return -EINVAL;
+	if (auxval & ~(u64)IMS_CONTROL_WORD_AUXMASK)
+		return -EINVAL;
+
+	val = ioread32(ctrl) & IMS_CONTROL_WORD_IRQMASK;
+	iowrite32_and_flush(val | (u32)auxval, ctrl);
+	return 0;
+}
+
+static const struct irq_chip ims_array_msi_controller = {
+	.name			= "IMS",
+	.irq_mask		= ims_array_mask_irq,
+	.irq_unmask		= ims_array_unmask_irq,
+	.irq_write_msi_msg	= ims_array_write_msi_msg,
+	.irq_set_auxdata	= ims_array_set_auxdata,
+	.irq_retrigger		= irq_chip_retrigger_hierarchy,
+	.flags			= IRQCHIP_SKIP_SET_WAKE,
+};
+
+static void ims_array_reset_slot(struct ims_slot __iomem *slot)
+{
+	iowrite32(0, &slot->address_lo);
+	iowrite32(0, &slot->address_hi);
+	iowrite32(0, &slot->data);
+	iowrite32_and_flush(IMS_CTRL_VECTOR_MASKBIT, &slot->ctrl);
+}
+
+static void ims_array_free_msi_store(struct irq_domain *domain,
+				     struct device *dev)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct ims_array_data *ims = info->data;
+	struct msi_desc *entry;
+
+	for_each_msi_entry(entry, dev) {
+		if (entry->device_msi.priv_iomem) {
+			clear_bit(entry->device_msi.hwirq, ims->map);
+			ims_array_reset_slot(entry->device_msi.priv_iomem);
+			entry->device_msi.priv_iomem = NULL;
+			entry->device_msi.hwirq = 0;
+		}
+	}
+}
+
+static int ims_array_alloc_msi_store(struct irq_domain *domain,
+				     struct device *dev, int nvec)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct ims_array_data *ims = info->data;
+	struct msi_desc *entry;
+
+	for_each_msi_entry(entry, dev) {
+		unsigned int idx;
+
+		idx = find_first_zero_bit(ims->map, ims->info.max_slots);
+		if (idx >= ims->info.max_slots)
+			goto fail;
+		set_bit(idx, ims->map);
+		entry->device_msi.priv_iomem = &ims->info.slots[idx];
+		ims_array_reset_slot(entry->device_msi.priv_iomem);
+		entry->device_msi.hwirq = idx;
+	}
+	return 0;
+
+fail:
+	ims_array_free_msi_store(domain, dev);
+	return -ENOSPC;
+}
+
+struct ims_array_domain_template {
+	struct msi_domain_ops	ops;
+	struct msi_domain_info	info;
+};
+
+static void ims_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+	arg->desc = desc;
+	arg->hwirq = desc->device_msi.hwirq;
+}
+
+static const struct ims_array_domain_template ims_array_domain_template = {
+	.ops = {
+		.msi_alloc_store	= ims_array_alloc_msi_store,
+		.msi_free_store		= ims_array_free_msi_store,
+		.set_desc               = ims_set_desc,
+	},
+	.info = {
+		.flags		= MSI_FLAG_USE_DEF_DOM_OPS |
+				  MSI_FLAG_USE_DEF_CHIP_OPS,
+		.handler	= handle_edge_irq,
+		.handler_name	= "edge",
+	},
+};
+
+struct irq_domain *
+pci_ims_array_create_msi_irq_domain(struct pci_dev *pdev,
+				    struct ims_array_info *ims_info)
+{
+	struct ims_array_domain_template *info;
+	struct ims_array_data *data;
+	struct irq_domain *domain;
+	struct irq_chip *chip;
+	unsigned int size;
+
+	/* Allocate new domain storage */
+	info = kmemdup(&ims_array_domain_template,
+		       sizeof(ims_array_domain_template), GFP_KERNEL);
+	if (!info)
+		return NULL;
+	/* Link the ops */
+	info->info.ops = &info->ops;
+
+	/* Allocate ims_info along with the bitmap */
+	size = sizeof(*data);
+	size += BITS_TO_LONGS(ims_info->max_slots) * sizeof(unsigned long);
+	data = kzalloc(size, GFP_KERNEL);
+	if (!data)
+		goto err_info;
+
+	data->info = *ims_info;
+	info->info.data = data;
+
+	/*
+	 * Allocate an interrupt chip because the core needs to be able to
+	 * update it with default callbacks.
+	 */
+	chip = kmemdup(&ims_array_msi_controller,
+		       sizeof(ims_array_msi_controller), GFP_KERNEL);
+	if (!chip)
+		goto err_data;
+	info->info.chip = chip;
+
+	domain = pci_subdevice_msi_create_irq_domain(pdev, &info->info);
+	if (!domain)
+		goto err_chip;
+
+	return domain;
+
+err_chip:
+	kfree(chip);
+err_data:
+	kfree(data);
+err_info:
+	kfree(info);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_ims_array_create_msi_irq_domain);
+
+#endif /* CONFIG_IMS_MSI_ARRAY */
diff --git a/include/linux/irqchip/irq-ims-msi.h b/include/linux/irqchip/irq-ims-msi.h
new file mode 100644
index 0000000..9ba767f
--- /dev/null
+++ b/include/linux/irqchip/irq-ims-msi.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* (C) Copyright 2021 Thomas Gleixner <tglx@linutronix.de> */
+
+#ifndef _LINUX_IRQCHIP_IRQ_IMS_MSI_H
+#define _LINUX_IRQCHIP_IRQ_IMS_MSI_H
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+/**
+ * ims_hw_slot - The hardware layout of an IMS based MSI message
+ * @address_lo:	Lower 32bit address
+ * @address_hi:	Upper 32bit address
+ * @data:	Message data
+ * @ctrl:	Control word
+ *
+ * This structure is used by both the device memory array and the queue
+ * memory variants of IMS.
+ */
+struct ims_slot {
+	u32	address_lo;
+	u32	address_hi;
+	u32	data;
+	u32	ctrl;
+} __packed;
+
+/*
+ * The IMS control word utilizes bit 0-2 for interrupt control. The remaining
+ * bits can contain auxiliary data.
+ */
+#define IMS_CONTROL_WORD_IRQMASK	GENMASK(2, 0)
+#define IMS_CONTROL_WORD_AUXMASK	GENMASK(31, 3)
+
+/* Auxiliary control word data related defines */
+enum {
+	IMS_AUXDATA_CONTROL_WORD,
+};
+
+/* Bit to mask the interrupt in ims_hw_slot::ctrl */
+#define IMS_CTRL_VECTOR_MASKBIT		BIT(0)
+#define IMS_CTRL_PASID_ENABLE           BIT(3)
+#define IMS_CTRL_PASID_SHIFT            12
+
+/* Set pasid and enable bit for the IMS entry */
+static inline u32 ims_ctrl_pasid_aux(unsigned int pasid, bool enable)
+{
+	u32 auxval = pasid << IMS_CTRL_PASID_SHIFT;
+
+	return enable ? auxval | IMS_CTRL_PASID_ENABLE : auxval;
+}
+
+/**
+ * struct ims_array_info - Information to create an IMS array domain
+ * @slots:	Pointer to the start of the array
+ * @max_slots:	Maximum number of slots in the array
+ */
+struct ims_array_info {
+	struct ims_slot		__iomem *slots;
+	unsigned int		max_slots;
+};
+
+struct pci_dev;
+struct irq_domain;
+
+struct irq_domain *pci_ims_array_create_msi_irq_domain(struct pci_dev *pdev,
+						       struct ims_array_info *ims_info);
+
+#endif
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number
  2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
                   ` (11 preceding siblings ...)
  2021-02-26 20:11 ` [Patch V2 12/13] irqchip: Add IMS (Interrupt Message Store) driver Megha Dey
@ 2021-02-26 20:11 ` Megha Dey
  2021-03-25 17:53   ` Marc Zyngier
  12 siblings, 1 reply; 28+ messages in thread
From: Megha Dey @ 2021-02-26 20:11 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, megha.dey, jgg, kvm, iommu,
	alex.williamson, bhelgaas, maz, linux-pci, baolu.lu,
	ravi.v.shankar

From: Dave Jiang <dave.jiang@intel.com>

Add new helpers to get the Linux IRQ number and device specific index
for given device-relative vector so that the drivers don't need to
allocate their own arrays to keep track of the vectors and hwirq for
the multi vector device MSI case.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Megha Dey <megha.dey@intel.com>
---
 include/linux/msi.h |  2 ++
 kernel/irq/msi.c    | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 24abec0..d60a6ba 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -451,6 +451,8 @@ struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode,
 int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec,
 				   irq_write_msi_msg_t write_msi_msg);
 void platform_msi_domain_free_irqs(struct device *dev);
+int msi_irq_vector(struct device *dev, unsigned int nr);
+int dev_msi_hwirq(struct device *dev, unsigned int nr);
 
 /* When an MSI domain is used as an intermediate domain */
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 047b59d..f2a8f55 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -581,4 +581,48 @@ struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain)
 	return (struct msi_domain_info *)domain->host_data;
 }
 
+/**
+ * msi_irq_vector - Get the Linux IRQ number of a device vector
+ * @dev: device to operate on
+ * @nr: device-relative interrupt vector index (0-based).
+ *
+ * Returns the Linux IRQ number of a device vector.
+ */
+int msi_irq_vector(struct device *dev, unsigned int nr)
+{
+	struct msi_desc *entry;
+	int i = 0;
+
+	for_each_msi_entry(entry, dev) {
+		if (i == nr)
+			return entry->irq;
+		i++;
+	}
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(msi_irq_vector);
+
+/**
+ * dev_msi_hwirq - Get the device MSI hw IRQ number of a device vector
+ * @dev: device to operate on
+ * @nr: device-relative interrupt vector index (0-based).
+ *
+ * Return the dev_msi hw IRQ number of a device vector.
+ */
+int dev_msi_hwirq(struct device *dev, unsigned int nr)
+{
+	struct msi_desc *entry;
+	int i = 0;
+
+	for_each_msi_entry(entry, dev) {
+		if (i == nr)
+			return entry->device_msi.hwirq;
+		i++;
+	}
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(dev_msi_hwirq);
+
 #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 07/13] irqdomain/msi: Provide msi_alloc/free_store() callbacks
  2021-02-26 20:11 ` [Patch V2 07/13] irqdomain/msi: Provide msi_alloc/free_store() callbacks Megha Dey
@ 2021-03-25 17:08   ` Marc Zyngier
  2021-03-25 18:44     ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Marc Zyngier @ 2021-03-25 17:08 UTC (permalink / raw)
  To: Megha Dey
  Cc: tglx, linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Fri, 26 Feb 2021 20:11:11 +0000,
Megha Dey <megha.dey@intel.com> wrote:
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> For devices which don't have a standard storage for MSI messages like the
> upcoming IMS (Interrupt Message Store) it's required to allocate storage
> space before allocating interrupts and after freeing them.
> 
> This could be achieved with the existing callbacks, but that would be
> awkward because they operate on msi_alloc_info_t which is not uniform
> across architectures. Also these callbacks are invoked per interrupt but
> the allocation might have bulk requirements depending on the device.
> 
> As such devices can operate on different architectures it is simpler to
> have separate callbacks which operate on struct device. The resulting
> storage information has to be stored in struct msi_desc so the underlying
> irq chip implementation can retrieve it for the relevant operations.
> 
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Megha Dey <megha.dey@intel.com>
> ---
>  include/linux/msi.h |  8 ++++++++
>  kernel/irq/msi.c    | 11 +++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 46e879c..e915932 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -323,6 +323,10 @@ struct msi_domain_info;
>   *			function.
>   * @domain_free_irqs:	Optional function to override the default free
>   *			function.
> + * @msi_alloc_store:	Optional callback to allocate storage in a device
> + *			specific non-standard MSI store
> + * @msi_alloc_free:	Optional callback to free storage in a device
> + *			specific non-standard MSI store
>   *
>   * @get_hwirq, @msi_init and @msi_free are callbacks used by
>   * msi_create_irq_domain() and related interfaces
> @@ -372,6 +376,10 @@ struct msi_domain_ops {
>  					     struct device *dev, int nvec);
>  	void		(*domain_free_irqs)(struct irq_domain *domain,
>  					    struct device *dev);
> +	int		(*msi_alloc_store)(struct irq_domain *domain,
> +					   struct device *dev, int nvec);
> +	void		(*msi_free_store)(struct irq_domain *domain,
> +					  struct device *dev);
>  };
>  
>  /**
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index c54316d..047b59d 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -434,6 +434,12 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
>  	if (ret)
>  		return ret;
>  
> +	if (ops->msi_alloc_store) {
> +		ret = ops->msi_alloc_store(domain, dev, nvec);

What is supposed to happen if we get aliasing devices (similar to what
we have with devices behind a PCI bridge)?

The ITS code goes through all kind of hoops to try and detect this
case when sizing the translation tables (in the .prepare callback),
and I have the feeling that sizing the message store is analogous.

Or do we all have the warm fuzzy feeling that aliasing is a thing of
the past and that we can ignore this potential problem?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 08/13] genirq: Set auxiliary data for an interrupt
  2021-02-26 20:11 ` [Patch V2 08/13] genirq: Set auxiliary data for an interrupt Megha Dey
@ 2021-03-25 17:23   ` Marc Zyngier
  2021-03-25 18:59     ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Marc Zyngier @ 2021-03-25 17:23 UTC (permalink / raw)
  To: Megha Dey
  Cc: tglx, linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Fri, 26 Feb 2021 20:11:12 +0000,
Megha Dey <megha.dey@intel.com> wrote:
> 
> Introduce a new function pointer in the irq_chip structure(irq_set_auxdata)
> which is responsible for updating data which is stored in a shared register
> or data storage. For example, the idxd driver uses the auxiliary data API
> to enable/set and disable PASID field that is in the IMS entry (introduced
> in a later patch) and that data are not typically present in MSI entry.
> 
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Megha Dey <megha.dey@intel.com>
> ---
>  include/linux/interrupt.h |  2 ++
>  include/linux/irq.h       |  4 ++++
>  kernel/irq/manage.c       | 32 ++++++++++++++++++++++++++++++++
>  3 files changed, 38 insertions(+)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 967e257..461ed1c 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -496,6 +496,8 @@ extern int irq_get_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
>  extern int irq_set_irqchip_state(unsigned int irq, enum irqchip_irq_state which,
>  				 bool state);
>  
> +int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val);
> +
>  #ifdef CONFIG_IRQ_FORCED_THREADING
>  # ifdef CONFIG_PREEMPT_RT
>  #  define force_irqthreads	(true)
> diff --git a/include/linux/irq.h b/include/linux/irq.h
> index 2efde6a..fc19f32 100644
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -491,6 +491,8 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
>   *				irq_request_resources
>   * @irq_compose_msi_msg:	optional to compose message content for MSI
>   * @irq_write_msi_msg:	optional to write message content for MSI
> + * @irq_set_auxdata:	Optional function to update auxiliary data e.g. in
> + *			shared registers
>   * @irq_get_irqchip_state:	return the internal state of an interrupt
>   * @irq_set_irqchip_state:	set the internal state of a interrupt
>   * @irq_set_vcpu_affinity:	optional to target a vCPU in a virtual machine
> @@ -538,6 +540,8 @@ struct irq_chip {
>  	void		(*irq_compose_msi_msg)(struct irq_data *data, struct msi_msg *msg);
>  	void		(*irq_write_msi_msg)(struct irq_data *data, struct msi_msg *msg);
>  
> +	int		(*irq_set_auxdata)(struct irq_data *data, unsigned int which, u64 auxval);
> +
>  	int		(*irq_get_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool *state);
>  	int		(*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state);
>  
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 85ede4e..68ff559 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -2860,3 +2860,35 @@ bool irq_check_status_bit(unsigned int irq, unsigned int bitmask)
>  	return res;
>  }
>  EXPORT_SYMBOL_GPL(irq_check_status_bit);
> +
> +/**
> + * irq_set_auxdata - Set auxiliary data
> + * @irq:	Interrupt to update
> + * @which:	Selector which data to update
> + * @auxval:	Auxiliary data value
> + *
> + * Function to update auxiliary data for an interrupt, e.g. to update data
> + * which is stored in a shared register or data storage (e.g. IMS).
> + */
> +int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val)

This looks to me like a massively generalised version of
irq_set_irqchip_state(), only without any defined semantics when it
comes to the 'which' state, making it look like the irqchip version of
an ioctl.

We also have the irq_set_vcpu_affinity() callback that is used to
perpetrate all sort of sins (and I have abused this interface more
than I should admit it).

Can we try and converge on a single interface that allows for
"side-band state" to be communicated, with documented state?

> +{
> +	struct irq_desc *desc;
> +	struct irq_data *data;
> +	unsigned long flags;
> +	int res = -ENODEV;
> +
> +	desc = irq_get_desc_buslock(irq, &flags, 0);
> +	if (!desc)
> +		return -EINVAL;
> +
> +	for (data = &desc->irq_data; data; data = irqd_get_parent_data(data)) {
> +		if (data->chip->irq_set_auxdata) {
> +			res = data->chip->irq_set_auxdata(data, which, val);

And this is where things can break: because you don't define what
'which' is, you can end-up with two stacked layers clashing in their
interpretation of 'which', potentially doing the wrong thing.

Short of having a global, cross architecture definition of all the
possible states, this is frankly dodgy.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 12/13] irqchip: Add IMS (Interrupt Message Store) driver
  2021-02-26 20:11 ` [Patch V2 12/13] irqchip: Add IMS (Interrupt Message Store) driver Megha Dey
@ 2021-03-25 17:43   ` Marc Zyngier
  2021-03-25 19:07     ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Marc Zyngier @ 2021-03-25 17:43 UTC (permalink / raw)
  To: Megha Dey
  Cc: tglx, linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Fri, 26 Feb 2021 20:11:16 +0000,
Megha Dey <megha.dey@intel.com> wrote:
> 
> Generic IMS(Interrupt Message Store) irq chips and irq domain
> implementations for IMS based devices which store the interrupt messages
> in an array in device memory.
> 
> Allocation and freeing of interrupts happens via the generic
> msi_domain_alloc/free_irqs() interface. No special purpose IMS magic
> required as long as the interrupt domain is stored in the underlying
> device struct. The irq_set_auxdata() is used to program the pasid into
> the IMS entry.
> 
> [Megha: Fixed compile time errors
>         Added necessary dependencies to IMS_MSI_ARRAY config
>         Fixed polarity of IMS_VECTOR_CTRL
>         Added reads after writes to flush writes to device
>         Added set_desc ops to IMS msi domain ops
>         Tested the IMS infrastructure with the IDXD driver]
> 
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Megha Dey <megha.dey@intel.com>
> ---
>  drivers/irqchip/Kconfig             |  14 +++
>  drivers/irqchip/Makefile            |   1 +
>  drivers/irqchip/irq-ims-msi.c       | 211 ++++++++++++++++++++++++++++++++++++
>  include/linux/irqchip/irq-ims-msi.h |  68 ++++++++++++
>  4 files changed, 294 insertions(+)
>  create mode 100644 drivers/irqchip/irq-ims-msi.c
>  create mode 100644 include/linux/irqchip/irq-ims-msi.h
> 
> diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
> index e74fa20..2fb0c24 100644
> --- a/drivers/irqchip/Kconfig
> +++ b/drivers/irqchip/Kconfig
> @@ -586,4 +586,18 @@ config MST_IRQ
>  	help
>  	  Support MStar Interrupt Controller.
>  
> +config IMS_MSI
> +	depends on PCI
> +	select DEVICE_MSI
> +	bool
> +
> +config IMS_MSI_ARRAY
> +	bool "IMS Interrupt Message Store MSI controller for device memory storage arrays"
> +	depends on PCI
> +	select IMS_MSI
> +	select GENERIC_MSI_IRQ_DOMAIN
> +	help
> +	  Support for IMS Interrupt Message Store MSI controller
> +	  with IMS slot storage in a slot array in device memory
> +
>  endmenu
> diff --git a/drivers/irqchip/Makefile b/drivers/irqchip/Makefile
> index c59b95a..e903201 100644
> --- a/drivers/irqchip/Makefile
> +++ b/drivers/irqchip/Makefile
> @@ -113,3 +113,4 @@ obj-$(CONFIG_LOONGSON_PCH_MSI)		+= irq-loongson-pch-msi.o
>  obj-$(CONFIG_MST_IRQ)			+= irq-mst-intc.o
>  obj-$(CONFIG_SL28CPLD_INTC)		+= irq-sl28cpld.o
>  obj-$(CONFIG_MACH_REALTEK_RTL)		+= irq-realtek-rtl.o
> +obj-$(CONFIG_IMS_MSI)			+= irq-ims-msi.o
> diff --git a/drivers/irqchip/irq-ims-msi.c b/drivers/irqchip/irq-ims-msi.c
> new file mode 100644
> index 0000000..fa23207
> --- /dev/null
> +++ b/drivers/irqchip/irq-ims-msi.c
> @@ -0,0 +1,211 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// (C) Copyright 2021 Thomas Gleixner <tglx@linutronix.de>
> +/*
> + * Shared interrupt chips and irq domains for IMS devices
> + */
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <linux/msi.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +
> +#include <linux/irqchip/irq-ims-msi.h>
> +
> +#ifdef CONFIG_IMS_MSI_ARRAY

Given that this covers the whole driver, what is this #defined used
for? You might as well make the driver depend on this config option.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number
  2021-02-26 20:11 ` [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number Megha Dey
@ 2021-03-25 17:53   ` Marc Zyngier
  2021-03-26  1:02     ` Dey, Megha
  0 siblings, 1 reply; 28+ messages in thread
From: Marc Zyngier @ 2021-03-25 17:53 UTC (permalink / raw)
  To: Megha Dey
  Cc: tglx, linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Fri, 26 Feb 2021 20:11:17 +0000,
Megha Dey <megha.dey@intel.com> wrote:
> 
> From: Dave Jiang <dave.jiang@intel.com>
> 
> Add new helpers to get the Linux IRQ number and device specific index
> for given device-relative vector so that the drivers don't need to
> allocate their own arrays to keep track of the vectors and hwirq for
> the multi vector device MSI case.
> 
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Megha Dey <megha.dey@intel.com>
> ---
>  include/linux/msi.h |  2 ++
>  kernel/irq/msi.c    | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 46 insertions(+)
> 
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 24abec0..d60a6ba 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -451,6 +451,8 @@ struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode,
>  int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec,
>  				   irq_write_msi_msg_t write_msi_msg);
>  void platform_msi_domain_free_irqs(struct device *dev);
> +int msi_irq_vector(struct device *dev, unsigned int nr);
> +int dev_msi_hwirq(struct device *dev, unsigned int nr);
>  
>  /* When an MSI domain is used as an intermediate domain */
>  int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 047b59d..f2a8f55 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -581,4 +581,48 @@ struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain)
>  	return (struct msi_domain_info *)domain->host_data;
>  }
>  
> +/**
> + * msi_irq_vector - Get the Linux IRQ number of a device vector
> + * @dev: device to operate on
> + * @nr: device-relative interrupt vector index (0-based).
> + *
> + * Returns the Linux IRQ number of a device vector.
> + */
> +int msi_irq_vector(struct device *dev, unsigned int nr)
> +{
> +	struct msi_desc *entry;
> +	int i = 0;
> +
> +	for_each_msi_entry(entry, dev) {
> +		if (i == nr)
> +			return entry->irq;
> +		i++;

This obviously doesn't work with Multi-MSI, does it?

> +	}
> +	WARN_ON_ONCE(1);
> +	return -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(msi_irq_vector);
> +
> +/**
> + * dev_msi_hwirq - Get the device MSI hw IRQ number of a device vector
> + * @dev: device to operate on
> + * @nr: device-relative interrupt vector index (0-based).
> + *
> + * Return the dev_msi hw IRQ number of a device vector.
> + */
> +int dev_msi_hwirq(struct device *dev, unsigned int nr)
> +{
> +	struct msi_desc *entry;
> +	int i = 0;
> +
> +	for_each_msi_entry(entry, dev) {
> +		if (i == nr)
> +			return entry->device_msi.hwirq;
> +		i++;
> +	}
> +	WARN_ON_ONCE(1);
> +	return -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(dev_msi_hwirq);
> +
>  #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */

And what uses these helpers?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 07/13] irqdomain/msi: Provide msi_alloc/free_store() callbacks
  2021-03-25 17:08   ` Marc Zyngier
@ 2021-03-25 18:44     ` Thomas Gleixner
  2021-03-26 10:14       ` Marc Zyngier
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2021-03-25 18:44 UTC (permalink / raw)
  To: Marc Zyngier, Megha Dey
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Thu, Mar 25 2021 at 17:08, Marc Zyngier wrote:
> Megha Dey <megha.dey@intel.com> wrote:
>> @@ -434,6 +434,12 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
>>  	if (ret)
>>  		return ret;
>>  
>> +	if (ops->msi_alloc_store) {
>> +		ret = ops->msi_alloc_store(domain, dev, nvec);
>
> What is supposed to happen if we get aliasing devices (similar to what
> we have with devices behind a PCI bridge)?
>
> The ITS code goes through all kind of hoops to try and detect this
> case when sizing the translation tables (in the .prepare callback),
> and I have the feeling that sizing the message store is analogous.

No. The message store itself is sized upfront by the underlying 'master'
device. Each 'master' device has it's own irqdomain.

This is the allocation for the subdevice and this is not part of PCI and
therefore not subject to PCI aliasing.

  |-----------|
  |  PCI dev  |         <- driver creates irqdomain and manages MSI
  |-----------|            Sizing is either fixed (hardware property)
                           or just managed by that irqdomain/driver
                           with some hardware constraints
       |subdev|         <- subdev gets ^^irqdomain assigned and
                           allocates from it.
       .....
       |subdev|

So this is fundamentally different from ITS because ITS has to size the
translation memory, i.e. where the MSI message is written to by the
device.

IMS just handles the storage of the message in the (sub)device. So if
that needs to be supported on ARM then the issue is not with the
subdevices, the issue is with the 'master' device, but that does not use
that alloc_store() callback as it provides it with the irqdomain it
manages.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 08/13] genirq: Set auxiliary data for an interrupt
  2021-03-25 17:23   ` Marc Zyngier
@ 2021-03-25 18:59     ` Thomas Gleixner
  2021-03-26 10:32       ` Marc Zyngier
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2021-03-25 18:59 UTC (permalink / raw)
  To: Marc Zyngier, Megha Dey
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Thu, Mar 25 2021 at 17:23, Marc Zyngier wrote:
>> +/**
>> + * irq_set_auxdata - Set auxiliary data
>> + * @irq:	Interrupt to update
>> + * @which:	Selector which data to update
>> + * @auxval:	Auxiliary data value
>> + *
>> + * Function to update auxiliary data for an interrupt, e.g. to update data
>> + * which is stored in a shared register or data storage (e.g. IMS).
>> + */
>> +int irq_set_auxdata(unsigned int irq, unsigned int which, u64 val)
>
> This looks to me like a massively generalised version of
> irq_set_irqchip_state(), only without any defined semantics when it
> comes to the 'which' state, making it look like the irqchip version of
> an ioctl.
>
> We also have the irq_set_vcpu_affinity() callback that is used to
> perpetrate all sort of sins (and I have abused this interface more
> than I should admit it).
>
> Can we try and converge on a single interface that allows for
> "side-band state" to be communicated, with documented state?
>
>> +{
>> +	struct irq_desc *desc;
>> +	struct irq_data *data;
>> +	unsigned long flags;
>> +	int res = -ENODEV;
>> +
>> +	desc = irq_get_desc_buslock(irq, &flags, 0);
>> +	if (!desc)
>> +		return -EINVAL;
>> +
>> +	for (data = &desc->irq_data; data; data = irqd_get_parent_data(data)) {
>> +		if (data->chip->irq_set_auxdata) {
>> +			res = data->chip->irq_set_auxdata(data, which, val);
>
> And this is where things can break: because you don't define what
> 'which' is, you can end-up with two stacked layers clashing in their
> interpretation of 'which', potentially doing the wrong thing.
>
> Short of having a global, cross architecture definition of all the
> possible states, this is frankly dodgy.

My bad, I suggested this in the first place.

So what you suggest is to make 'which' an enum and have that in
include/linux/whatever.h so we end up with unique identifiers accross
architectures, irqdomains and whatever, right?

That makes a lot of sense.

Though that leaves the question of the data type for 'val'. While u64 is
probably good enough for most stuff, anything which needs more than that
is left out (again). union as arguments are horrible especially if you
need the counterpart irq_get_auxdata() for which you need a pointer and
then you can't do a forward declaration. Something like this might work
though and avoid to make the pointer business unconditional:

        struct irq_auxdata {
               union {
        	     u64        val;
                     struct foo *foo;
               };
        };

Thanks,

        tglx





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 12/13] irqchip: Add IMS (Interrupt Message Store) driver
  2021-03-25 17:43   ` Marc Zyngier
@ 2021-03-25 19:07     ` Thomas Gleixner
  2021-03-26  1:03       ` Dey, Megha
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2021-03-25 19:07 UTC (permalink / raw)
  To: Marc Zyngier, Megha Dey
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Thu, Mar 25 2021 at 17:43, Marc Zyngier wrote:
> On Fri, 26 Feb 2021 20:11:16 +0000,
> Megha Dey <megha.dey@intel.com> wrote:
>> +
>> +#include <linux/irqchip/irq-ims-msi.h>
>> +
>> +#ifdef CONFIG_IMS_MSI_ARRAY
>
> Given that this covers the whole driver, what is this #defined used
> for? You might as well make the driver depend on this config option.

That's a leftover from the initial version I wrote which had also
support for IMS_MSI_QUEUE to store the message in queue memory, but we
have no use case yet for it.

But yes, as things stand now it does not make any sense and IIRC at the
end they do not share anything in the C file except for some includes at
the very end.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number
  2021-03-25 17:53   ` Marc Zyngier
@ 2021-03-26  1:02     ` Dey, Megha
  2021-03-26 12:58       ` Marc Zyngier
  0 siblings, 1 reply; 28+ messages in thread
From: Dey, Megha @ 2021-03-26  1:02 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: tglx, linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

Hi Marc,

On 3/25/2021 10:53 AM, Marc Zyngier wrote:
> On Fri, 26 Feb 2021 20:11:17 +0000,
> Megha Dey <megha.dey@intel.com> wrote:
>> From: Dave Jiang <dave.jiang@intel.com>
>>
>> Add new helpers to get the Linux IRQ number and device specific index
>> for given device-relative vector so that the drivers don't need to
>> allocate their own arrays to keep track of the vectors and hwirq for
>> the multi vector device MSI case.
>>
>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> Signed-off-by: Megha Dey <megha.dey@intel.com>
>> ---
>>   include/linux/msi.h |  2 ++
>>   kernel/irq/msi.c    | 44 ++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 46 insertions(+)
>>
>> diff --git a/include/linux/msi.h b/include/linux/msi.h
>> index 24abec0..d60a6ba 100644
>> --- a/include/linux/msi.h
>> +++ b/include/linux/msi.h
>> @@ -451,6 +451,8 @@ struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode,
>>   int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec,
>>   				   irq_write_msi_msg_t write_msi_msg);
>>   void platform_msi_domain_free_irqs(struct device *dev);
>> +int msi_irq_vector(struct device *dev, unsigned int nr);
>> +int dev_msi_hwirq(struct device *dev, unsigned int nr);
>>   
>>   /* When an MSI domain is used as an intermediate domain */
>>   int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>> index 047b59d..f2a8f55 100644
>> --- a/kernel/irq/msi.c
>> +++ b/kernel/irq/msi.c
>> @@ -581,4 +581,48 @@ struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain)
>>   	return (struct msi_domain_info *)domain->host_data;
>>   }
>>   
>> +/**
>> + * msi_irq_vector - Get the Linux IRQ number of a device vector
>> + * @dev: device to operate on
>> + * @nr: device-relative interrupt vector index (0-based).
>> + *
>> + * Returns the Linux IRQ number of a device vector.
>> + */
>> +int msi_irq_vector(struct device *dev, unsigned int nr)
>> +{
>> +	struct msi_desc *entry;
>> +	int i = 0;
>> +
>> +	for_each_msi_entry(entry, dev) {
>> +		if (i == nr)
>> +			return entry->irq;
>> +		i++;
> This obviously doesn't work with Multi-MSI, does it?

This API is only for devices that support device MSI interrupts. They 
follow MSI-x format and don't support multi MSI (part of MSI).

Not sure if I am missing something here, can you please let me know?

>
>> +	}
>> +	WARN_ON_ONCE(1);
>> +	return -EINVAL;
>> +}
>> +EXPORT_SYMBOL_GPL(msi_irq_vector);
>> +
>> +/**
>> + * dev_msi_hwirq - Get the device MSI hw IRQ number of a device vector
>> + * @dev: device to operate on
>> + * @nr: device-relative interrupt vector index (0-based).
>> + *
>> + * Return the dev_msi hw IRQ number of a device vector.
>> + */
>> +int dev_msi_hwirq(struct device *dev, unsigned int nr)
>> +{
>> +	struct msi_desc *entry;
>> +	int i = 0;
>> +
>> +	for_each_msi_entry(entry, dev) {
>> +		if (i == nr)
>> +			return entry->device_msi.hwirq;
>> +		i++;
>> +	}
>> +	WARN_ON_ONCE(1);
>> +	return -EINVAL;
>> +}
>> +EXPORT_SYMBOL_GPL(dev_msi_hwirq);
>> +
>>   #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
> And what uses these helpers?]
These helpers are to be used by a driver series(Intel's IDXD driver) 
which is currently stuck due to VFIO refactoring.
>
> Thanks,
>
> 	M.
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 12/13] irqchip: Add IMS (Interrupt Message Store) driver
  2021-03-25 19:07     ` Thomas Gleixner
@ 2021-03-26  1:03       ` Dey, Megha
  0 siblings, 0 replies; 28+ messages in thread
From: Dey, Megha @ 2021-03-26  1:03 UTC (permalink / raw)
  To: Thomas Gleixner, Marc Zyngier
  Cc: linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

Hi Thomas/Marc,

On 3/25/2021 12:07 PM, Thomas Gleixner wrote:
> On Thu, Mar 25 2021 at 17:43, Marc Zyngier wrote:
>> On Fri, 26 Feb 2021 20:11:16 +0000,
>> Megha Dey <megha.dey@intel.com> wrote:
>>> +
>>> +#include <linux/irqchip/irq-ims-msi.h>
>>> +
>>> +#ifdef CONFIG_IMS_MSI_ARRAY
>> Given that this covers the whole driver, what is this #defined used
>> for? You might as well make the driver depend on this config option.
> That's a leftover from the initial version I wrote which had also
> support for IMS_MSI_QUEUE to store the message in queue memory, but we
> have no use case yet for it.
>
> But yes, as things stand now it does not make any sense and IIRC at the
> end they do not share anything in the C file except for some includes at
> the very end.
Sure, I will make this change.
>
> Thanks,
>
>          tglx
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 07/13] irqdomain/msi: Provide msi_alloc/free_store() callbacks
  2021-03-25 18:44     ` Thomas Gleixner
@ 2021-03-26 10:14       ` Marc Zyngier
  0 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2021-03-26 10:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Megha Dey, linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw,
	x86, tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Thu, 25 Mar 2021 18:44:48 +0000,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Thu, Mar 25 2021 at 17:08, Marc Zyngier wrote:
> > Megha Dey <megha.dey@intel.com> wrote:
> >> @@ -434,6 +434,12 @@ int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
> >>  	if (ret)
> >>  		return ret;
> >>  
> >> +	if (ops->msi_alloc_store) {
> >> +		ret = ops->msi_alloc_store(domain, dev, nvec);
> >
> > What is supposed to happen if we get aliasing devices (similar to what
> > we have with devices behind a PCI bridge)?
> >
> > The ITS code goes through all kind of hoops to try and detect this
> > case when sizing the translation tables (in the .prepare callback),
> > and I have the feeling that sizing the message store is analogous.
> 
> No. The message store itself is sized upfront by the underlying 'master'
> device. Each 'master' device has it's own irqdomain.
> 
> This is the allocation for the subdevice and this is not part of PCI and
> therefore not subject to PCI aliasing.

Fair enough. If we are guaranteed that there is no aliasing, then this
point is moot.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 08/13] genirq: Set auxiliary data for an interrupt
  2021-03-25 18:59     ` Thomas Gleixner
@ 2021-03-26 10:32       ` Marc Zyngier
  2021-03-26 15:09         ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Marc Zyngier @ 2021-03-26 10:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Megha Dey, linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw,
	x86, tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Thu, 25 Mar 2021 18:59:48 +0000,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Thu, Mar 25 2021 at 17:23, Marc Zyngier wrote:
> >> +{
> >> +	struct irq_desc *desc;
> >> +	struct irq_data *data;
> >> +	unsigned long flags;
> >> +	int res = -ENODEV;
> >> +
> >> +	desc = irq_get_desc_buslock(irq, &flags, 0);
> >> +	if (!desc)
> >> +		return -EINVAL;
> >> +
> >> +	for (data = &desc->irq_data; data; data = irqd_get_parent_data(data)) {
> >> +		if (data->chip->irq_set_auxdata) {
> >> +			res = data->chip->irq_set_auxdata(data, which, val);
> >
> > And this is where things can break: because you don't define what
> > 'which' is, you can end-up with two stacked layers clashing in their
> > interpretation of 'which', potentially doing the wrong thing.
> >
> > Short of having a global, cross architecture definition of all the
> > possible states, this is frankly dodgy.
> 
> My bad, I suggested this in the first place.
> 
> So what you suggest is to make 'which' an enum and have that in
> include/linux/whatever.h so we end up with unique identifiers accross
> architectures, irqdomains and whatever, right?

Exactly. As long as 'which' is unique and unambiguous, we can solve
the stacking issue (which is oddly starting to smell like the ghost of
the SVR3 STREAMS... /me runs ;-).

Once we have that, I can start killing the irq_set_vcpu_affinity()
abuse I have been guilty of over the past years. Even more, we could
kill irq_set_vcpu_affinity() altogether, because we have a generic way
of passing side-band information from a driver down to the IRQ stack.

> That makes a lot of sense.
> 
> Though that leaves the question of the data type for 'val'. While u64 is
> probably good enough for most stuff, anything which needs more than that
> is left out (again). union as arguments are horrible especially if you
> need the counterpart irq_get_auxdata() for which you need a pointer and
> then you can't do a forward declaration. Something like this might work
> though and avoid to make the pointer business unconditional:
> 
>         struct irq_auxdata {
>                union {
>         	     u64        val;
>                      struct foo *foo;
>                };
>         };

I guess that at some point, irq_get_auxdata() will become a
requirement so we might as well bite the bullet now, and the above
looks like a good start to me.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number
  2021-03-26  1:02     ` Dey, Megha
@ 2021-03-26 12:58       ` Marc Zyngier
  2021-03-30  1:57         ` Dey, Megha
  0 siblings, 1 reply; 28+ messages in thread
From: Marc Zyngier @ 2021-03-26 12:58 UTC (permalink / raw)
  To: Dey, Megha
  Cc: tglx, linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Fri, 26 Mar 2021 01:02:43 +0000,
"Dey, Megha" <megha.dey@intel.com> wrote:
> 
> Hi Marc,
> 
> On 3/25/2021 10:53 AM, Marc Zyngier wrote:
> > On Fri, 26 Feb 2021 20:11:17 +0000,
> > Megha Dey <megha.dey@intel.com> wrote:
> >> From: Dave Jiang <dave.jiang@intel.com>
> >> 
> >> Add new helpers to get the Linux IRQ number and device specific index
> >> for given device-relative vector so that the drivers don't need to
> >> allocate their own arrays to keep track of the vectors and hwirq for
> >> the multi vector device MSI case.
> >> 
> >> Reviewed-by: Tony Luck <tony.luck@intel.com>
> >> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> >> Signed-off-by: Megha Dey <megha.dey@intel.com>
> >> ---
> >>   include/linux/msi.h |  2 ++
> >>   kernel/irq/msi.c    | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >>   2 files changed, 46 insertions(+)
> >> 
> >> diff --git a/include/linux/msi.h b/include/linux/msi.h
> >> index 24abec0..d60a6ba 100644
> >> --- a/include/linux/msi.h
> >> +++ b/include/linux/msi.h
> >> @@ -451,6 +451,8 @@ struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode,
> >>   int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec,
> >>   				   irq_write_msi_msg_t write_msi_msg);
> >>   void platform_msi_domain_free_irqs(struct device *dev);
> >> +int msi_irq_vector(struct device *dev, unsigned int nr);
> >> +int dev_msi_hwirq(struct device *dev, unsigned int nr);
> >>     /* When an MSI domain is used as an intermediate domain */
> >>   int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
> >> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> >> index 047b59d..f2a8f55 100644
> >> --- a/kernel/irq/msi.c
> >> +++ b/kernel/irq/msi.c
> >> @@ -581,4 +581,48 @@ struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain)
> >>   	return (struct msi_domain_info *)domain->host_data;
> >>   }
> >>   +/**
> >> + * msi_irq_vector - Get the Linux IRQ number of a device vector
> >> + * @dev: device to operate on
> >> + * @nr: device-relative interrupt vector index (0-based).
> >> + *
> >> + * Returns the Linux IRQ number of a device vector.
> >> + */
> >> +int msi_irq_vector(struct device *dev, unsigned int nr)
> >> +{
> >> +	struct msi_desc *entry;
> >> +	int i = 0;
> >> +
> >> +	for_each_msi_entry(entry, dev) {
> >> +		if (i == nr)
> >> +			return entry->irq;
> >> +		i++;
> > This obviously doesn't work with Multi-MSI, does it?
> 
> This API is only for devices that support device MSI interrupts. They
> follow MSI-x format and don't support multi MSI (part of MSI).
> 
> Not sure if I am missing something here, can you please let me know?

Nothing in the prototype of the function indicates this limitation,
nor does the documentation. And I'm not sure why you should exclude
part of the MSI functionality here. It can't be for performance
reason, so you might as well make sure this works for all the MSI
variants:

int msi_irq_vector(struct device *dev, unsigned int nr)
{
	struct msi_desc *entry;
	int irq, index = 0;

	for_each_msi_vector(entry, irq, dev) {
		if (index == nr}
			return irq;
		index++;
	}

	return WARN_ON_ONCE(-EINVAL);
}

>
> > 
> >> +	}
> >> +	WARN_ON_ONCE(1);
> >> +	return -EINVAL;
> >> +}
> >> +EXPORT_SYMBOL_GPL(msi_irq_vector);
> >> +
> >> +/**
> >> + * dev_msi_hwirq - Get the device MSI hw IRQ number of a device vector
> >> + * @dev: device to operate on
> >> + * @nr: device-relative interrupt vector index (0-based).
> >> + *
> >> + * Return the dev_msi hw IRQ number of a device vector.
> >> + */
> >> +int dev_msi_hwirq(struct device *dev, unsigned int nr)
> >> +{
> >> +	struct msi_desc *entry;
> >> +	int i = 0;
> >> +
> >> +	for_each_msi_entry(entry, dev) {
> >> +		if (i == nr)
> >> +			return entry->device_msi.hwirq;
> >> +		i++;
> >> +	}
> >> +	WARN_ON_ONCE(1);
> >> +	return -EINVAL;
> >> +}

And this helper would be more generally useful if it returned the n-th
msi_desc entry rather than some obscure field in a substructure.

struct msi_desc *msi_get_nth_desc(struct device *dev, unsigned int nth)
{
	struct msi_desc *entry = NULL;
	unsigned int i = 0;

	for_each_msi_entry(entry, dev) {
		if (i == nth)
			return entry;

		i++;
	}

	WARN_ON_ONCE(!entry);
	return entry;
}

You can always wrap it for your particular use case.

> >> +EXPORT_SYMBOL_GPL(dev_msi_hwirq);
> >> +
> >>   #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
> > And what uses these helpers?]
> These helpers are to be used by a driver series(Intel's IDXD driver)
> which is currently stuck due to VFIO refactoring.

Then I's suggest you keep the helpers together with the actual user,
unless this can generally be useful to existing users (exported
symbols without in-tree users is always a bit odd).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 08/13] genirq: Set auxiliary data for an interrupt
  2021-03-26 10:32       ` Marc Zyngier
@ 2021-03-26 15:09         ` Thomas Gleixner
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Gleixner @ 2021-03-26 15:09 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Megha Dey, linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw,
	x86, tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

On Fri, Mar 26 2021 at 10:32, Marc Zyngier wrote:
> On Thu, 25 Mar 2021 18:59:48 +0000,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> Though that leaves the question of the data type for 'val'. While u64 is
>> probably good enough for most stuff, anything which needs more than that
>> is left out (again). union as arguments are horrible especially if you
>> need the counterpart irq_get_auxdata() for which you need a pointer and
>> then you can't do a forward declaration. Something like this might work
>> though and avoid to make the pointer business unconditional:
>> 
>>         struct irq_auxdata {
>>                union {
>>         	     u64        val;
>>                      struct foo *foo;
>>                };
>>         };
>
> I guess that at some point, irq_get_auxdata() will become a
> requirement so we might as well bite the bullet now, and the above
> looks like a good start to me.

And because it's some time until rustification, we can come up with
something nasty like the below:

#define IRQ_AUXTYPE(t, m)       IRQ_AUXTYPE_ ## t

enum irq_auxdata_type {
        IRQ_AUXTYPE(U64,		val64),
        IRQ_AUXTYPE(IRQSTATE,		istate),
        IRQ_AUXTYPE(VCPU_AFFINITY,	vaff),
};

struct irq_auxdata {
        union {
                u64             val64;
                u8              istate;
                struct vcpu_aff *vaff;
       };
};

This does not protect you from accessing the wrong union member, but it
allows an analyzer to map IRQ_AUXTYPE_U64 to irq_auxdata::val64 and then
check both the call site and the implementation whether they fiddle with
some other member of irq_auxdata or do weird casts.

Maybe it's just nuts and has no value, but I had the urge to write some
nasty macros.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number
  2021-03-26 12:58       ` Marc Zyngier
@ 2021-03-30  1:57         ` Dey, Megha
  0 siblings, 0 replies; 28+ messages in thread
From: Dey, Megha @ 2021-03-30  1:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: tglx, linux-kernel, dave.jiang, ashok.raj, kevin.tian, dwmw, x86,
	tony.luck, dan.j.williams, jgg, kvm, iommu, alex.williamson,
	bhelgaas, linux-pci, baolu.lu, ravi.v.shankar

Hi Marc,

On 3/26/2021 6:28 PM, Marc Zyngier wrote:
> On Fri, 26 Mar 2021 01:02:43 +0000,
> "Dey, Megha" <megha.dey@intel.com> wrote:
>> Hi Marc,
>>
>> On 3/25/2021 10:53 AM, Marc Zyngier wrote:
>>> On Fri, 26 Feb 2021 20:11:17 +0000,
>>> Megha Dey <megha.dey@intel.com> wrote:
>>>> From: Dave Jiang <dave.jiang@intel.com>
>>>>
>>>> Add new helpers to get the Linux IRQ number and device specific index
>>>> for given device-relative vector so that the drivers don't need to
>>>> allocate their own arrays to keep track of the vectors and hwirq for
>>>> the multi vector device MSI case.
>>>>
>>>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>>>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>>>> Signed-off-by: Megha Dey <megha.dey@intel.com>
>>>> ---
>>>>    include/linux/msi.h |  2 ++
>>>>    kernel/irq/msi.c    | 44 ++++++++++++++++++++++++++++++++++++++++++++
>>>>    2 files changed, 46 insertions(+)
>>>>
>>>> diff --git a/include/linux/msi.h b/include/linux/msi.h
>>>> index 24abec0..d60a6ba 100644
>>>> --- a/include/linux/msi.h
>>>> +++ b/include/linux/msi.h
>>>> @@ -451,6 +451,8 @@ struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode,
>>>>    int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec,
>>>>    				   irq_write_msi_msg_t write_msi_msg);
>>>>    void platform_msi_domain_free_irqs(struct device *dev);
>>>> +int msi_irq_vector(struct device *dev, unsigned int nr);
>>>> +int dev_msi_hwirq(struct device *dev, unsigned int nr);
>>>>      /* When an MSI domain is used as an intermediate domain */
>>>>    int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
>>>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>>>> index 047b59d..f2a8f55 100644
>>>> --- a/kernel/irq/msi.c
>>>> +++ b/kernel/irq/msi.c
>>>> @@ -581,4 +581,48 @@ struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain)
>>>>    	return (struct msi_domain_info *)domain->host_data;
>>>>    }
>>>>    +/**
>>>> + * msi_irq_vector - Get the Linux IRQ number of a device vector
>>>> + * @dev: device to operate on
>>>> + * @nr: device-relative interrupt vector index (0-based).
>>>> + *
>>>> + * Returns the Linux IRQ number of a device vector.
>>>> + */
>>>> +int msi_irq_vector(struct device *dev, unsigned int nr)
>>>> +{
>>>> +	struct msi_desc *entry;
>>>> +	int i = 0;
>>>> +
>>>> +	for_each_msi_entry(entry, dev) {
>>>> +		if (i == nr)
>>>> +			return entry->irq;
>>>> +		i++;
>>> This obviously doesn't work with Multi-MSI, does it?
>> This API is only for devices that support device MSI interrupts. They
>> follow MSI-x format and don't support multi MSI (part of MSI).
>>
>> Not sure if I am missing something here, can you please let me know?
> Nothing in the prototype of the function indicates this limitation,
> nor does the documentation. And I'm not sure why you should exclude
> part of the MSI functionality here. It can't be for performance
> reason, so you might as well make sure this works for all the MSI
> variants:
>
> int msi_irq_vector(struct device *dev, unsigned int nr)
> {
> 	struct msi_desc *entry;
> 	int irq, index = 0;
>
> 	for_each_msi_vector(entry, irq, dev) {
> 		if (index == nr}
> 			return irq;
> 		index++;
> 	}
>
> 	return WARN_ON_ONCE(-EINVAL);
> }
Ok, got it!
>>>> +	}
>>>> +	WARN_ON_ONCE(1);
>>>> +	return -EINVAL;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(msi_irq_vector);
>>>> +
>>>> +/**
>>>> + * dev_msi_hwirq - Get the device MSI hw IRQ number of a device vector
>>>> + * @dev: device to operate on
>>>> + * @nr: device-relative interrupt vector index (0-based).
>>>> + *
>>>> + * Return the dev_msi hw IRQ number of a device vector.
>>>> + */
>>>> +int dev_msi_hwirq(struct device *dev, unsigned int nr)
>>>> +{
>>>> +	struct msi_desc *entry;
>>>> +	int i = 0;
>>>> +
>>>> +	for_each_msi_entry(entry, dev) {
>>>> +		if (i == nr)
>>>> +			return entry->device_msi.hwirq;
>>>> +		i++;
>>>> +	}
>>>> +	WARN_ON_ONCE(1);
>>>> +	return -EINVAL;
>>>> +}
> And this helper would be more generally useful if it returned the n-th
> msi_desc entry rather than some obscure field in a substructure.
>
> struct msi_desc *msi_get_nth_desc(struct device *dev, unsigned int nth)
> {
> 	struct msi_desc *entry = NULL;
> 	unsigned int i = 0;
>
> 	for_each_msi_entry(entry, dev) {
> 		if (i == nth)
> 			return entry;
>
> 		i++;
> 	}
>
> 	WARN_ON_ONCE(!entry);
> 	return entry;
> }
>
> You can always wrap it for your particular use case.
Yeah, makes sense.
>
>>>> +EXPORT_SYMBOL_GPL(dev_msi_hwirq);
>>>> +
>>>>    #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
>>> And what uses these helpers?]
>> These helpers are to be used by a driver series(Intel's IDXD driver)
>> which is currently stuck due to VFIO refactoring.
> Then I's suggest you keep the helpers together with the actual user,
> unless this can generally be useful to existing users (exported
> symbols without in-tree users is always a bit odd).
Yeah in the next submission, we will submit this patch series along with 
the user driver patch series.
>
> Thanks,
>
> 	M.
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-03-30  1:59 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-26 20:11 [Patch V2 00/13] Introduce dev-msi and interrupt message store Megha Dey
2021-02-26 20:11 ` [Patch V2 01/13] x86/irq: Add DEV_MSI allocation type Megha Dey
2021-02-26 20:11 ` [Patch V2 02/13] x86/msi: Rename and rework pci_msi_prepare() to cover non-PCI MSI Megha Dey
2021-02-26 20:11 ` [Patch V2 03/13] platform-msi: Provide default irq_chip:: Ack Megha Dey
2021-02-26 20:11 ` [Patch V2 04/13] genirq/proc: Take buslock on affinity write Megha Dey
2021-02-26 20:11 ` [Patch V2 05/13] genirq/msi: Provide and use msi_domain_set_default_info_flags() Megha Dey
2021-02-26 20:11 ` [Patch V2 06/13] platform-msi: Add device MSI infrastructure Megha Dey
2021-02-26 20:11 ` [Patch V2 07/13] irqdomain/msi: Provide msi_alloc/free_store() callbacks Megha Dey
2021-03-25 17:08   ` Marc Zyngier
2021-03-25 18:44     ` Thomas Gleixner
2021-03-26 10:14       ` Marc Zyngier
2021-02-26 20:11 ` [Patch V2 08/13] genirq: Set auxiliary data for an interrupt Megha Dey
2021-03-25 17:23   ` Marc Zyngier
2021-03-25 18:59     ` Thomas Gleixner
2021-03-26 10:32       ` Marc Zyngier
2021-03-26 15:09         ` Thomas Gleixner
2021-02-26 20:11 ` [Patch V2 09/13] iommu/vt-d: Add DEV-MSI support Megha Dey
2021-02-26 20:11 ` [Patch V2 10/13] iommu: Add capability IOMMU_CAP_VIOMMU_HINT Megha Dey
2021-02-26 20:11 ` [Patch V2 11/13] platform-msi: Add platform check for subdevice irq domain Megha Dey
2021-02-26 20:11 ` [Patch V2 12/13] irqchip: Add IMS (Interrupt Message Store) driver Megha Dey
2021-03-25 17:43   ` Marc Zyngier
2021-03-25 19:07     ` Thomas Gleixner
2021-03-26  1:03       ` Dey, Megha
2021-02-26 20:11 ` [Patch V2 13/13] genirq/msi: Provide helpers to return Linux IRQ/dev_msi hw IRQ number Megha Dey
2021-03-25 17:53   ` Marc Zyngier
2021-03-26  1:02     ` Dey, Megha
2021-03-26 12:58       ` Marc Zyngier
2021-03-30  1:57         ` Dey, Megha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).