* [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation
@ 2022-11-24 23:25 Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 01/33] genirq/msi: Rearrange MSI domain flags Thomas Gleixner
                   ` (34 more replies)
  0 siblings, 35 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:25 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

This is V3 of the third part of the effort to provide support for per device
MSI interrupt domains.

Version 2 of this part can be found here:

  https://lore.kernel.org/all/20221121083657.157152924@linutronix.de

This is based on the second part which is available here:

  https://lore.kernel.org/all/20221124225331.464480443@linutronix.de

and from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git devmsi-v3-part2

This third part is available from git too:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git devmsi-v3-part3

This part provides the actual per device domain implementation and related
functionality:

  1) Provide infrastructure to create and remove per device MSI domains

  2) Implement per device MSI domains in the PCI/MSI code and make
     them conditional on the availability of a suitable parent MSI
     domain. This allows converting the existing domains one by one
     and keeps both the legacy and the current "global" PCI/MSI domain
     model working.

  3) Convert the related x86 MSI domains over (vector and remapping).

  4) Provide core infrastructure for dynamic allocations

  5) Provide PCI/MSI-X interfaces for device drivers to do post
     MSI-X enable allocation/free

  6) Enable dynamic allocation support on the x86 MSI parent domains

  7) Provide infrastructure to create PCI/IMS domains

  8) Enable IMS support on the x86 MSI parent domains

  9) Provide a driver for IDXD which demonstrates what IMS domains
     look like.

Changes vs. v2:

  - Rework the domain size initialization and handling (Kevin)

  - Enable IMS only when on real hardware (Kevin)

  - Rename the PCI/MSI irqchip functions (Kevin)
  
  - Update change logs and comments (Kevin)
   
The delta patch vs. V2 is attached below. It's not completely accurate as
it has some changes from part 2 intermingled, but you get the idea.

Thanks,

	tglx
---
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1eb9f9eb4852..4d28967f910d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3656,6 +3656,13 @@ static const struct msi_parent_ops amdvi_msi_parent_ops = {
 	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
 };
 
+static const struct msi_parent_ops virt_amdvi_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "vIR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 {
 	struct fwnode_handle *fn;
@@ -3672,7 +3679,11 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 
 	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_AMDVI);
 	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
-	iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
+
+	if (amd_iommu_np_cache)
+		iommu->ir_domain->msi_parent_ops = &virt_amdvi_msi_parent_ops;
+	else
+		iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
 
 	return 0;
 }
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index f3ee93be9032..a723f53ba472 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -82,7 +82,7 @@ static const struct irq_domain_ops intel_ir_domain_ops;
 
 static void iommu_disable_irq_remapping(struct intel_iommu *iommu);
 static int __init parse_ioapics_under_ir(void);
-static const struct msi_parent_ops dmar_msi_parent_ops;
+static const struct msi_parent_ops dmar_msi_parent_ops, virt_dmar_msi_parent_ops;
 
 static bool ir_pre_enabled(struct intel_iommu *iommu)
 {
@@ -577,7 +577,11 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 
 	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_DMAR);
 	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
-	iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
+
+	if (cap_caching_mode(iommu->cap))
+		iommu->ir_domain->msi_parent_ops = &virt_dmar_msi_parent_ops;
+	else
+		iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
 
 	ir_table->base = page_address(pages);
 	ir_table->bitmap = bitmap;
@@ -1436,6 +1440,13 @@ static const struct msi_parent_ops dmar_msi_parent_ops = {
 	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
 };
 
+static const struct msi_parent_ops virt_dmar_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "vIR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 /*
  * Support of Interrupt Remapping Unit Hotplug
  */
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 622f0fd5f829..9eec1ec19917 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -699,7 +699,7 @@ config PCI_INTEL_IDXD_IMS
 	tristate "Intel IDXD Interrupt Message Store controller"
 	depends on PCI_MSI
 	help
-	  Support for Intel IDXD IMS Interrupt Message Store controller
+	  Support for Intel IDXD Interrupt Message Store (IMS) controller
 	  with IMS slot storage in a slot array in device memory
 
 endmenu
diff --git a/drivers/irqchip/irq-pci-intel-idxd.c b/drivers/irqchip/irq-pci-intel-idxd.c
index 509450b08f47..d33c32787ad5 100644
--- a/drivers/irqchip/irq-pci-intel-idxd.c
+++ b/drivers/irqchip/irq-pci-intel-idxd.c
@@ -128,7 +128,7 @@ static const struct msi_domain_template idxd_ims_template = {
 /**
  * pci_intel_idxd_create_ims_domain - Create a IDXD IMS domain
  * @pdev:	IDXD PCI device to operate on
- * @slots:	Pointer to the mapped slot memory arrray
+ * @slots:	Pointer to the mapped slot memory array
  * @nr_slots:	The number of slots in the array
  *
  * Returns: True on success, false otherwise
diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index dfcaa77108de..d60585875009 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -141,7 +141,7 @@ EXPORT_SYMBOL_GPL(pci_msix_can_alloc_dyn);
  * Return: A struct msi_map
  *
  *	On success msi_map::index contains the allocated index (>= 0) and
- *	msi_map::virq the allocated Linux interrupt number (> 0).
+ *	msi_map::virq contains the allocated Linux interrupt number (> 0).
  *
  *	On fail msi_map::index contains the error code and msi_map::virq
  *	is set to 0.
@@ -376,7 +376,7 @@ EXPORT_SYMBOL(pci_irq_get_affinity);
  * the index - if there is a hardware table - or in case of purely software
  * managed IMS implementation the association happens via the
  * irq_write_msi_msg() callback of the implementation specific interrupt
- * chip, which utilizes the provided @cookie to store the MSI message in
+ * chip, which utilizes the provided @icookie to store the MSI message in
  * the appropriate place.
  *
  * Return: A struct msi_map
diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index ad235d1b0f35..687a986365f9 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -148,14 +148,14 @@ static void pci_device_domain_set_desc(msi_alloc_info_t *arg, struct msi_desc *d
 	arg->hwirq = desc->msi_index;
 }
 
-static void pci_mask_msi(struct irq_data *data)
+static void pci_irq_mask_msi(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
 
 	pci_msi_mask(desc, BIT(data->irq - desc->irq));
 }
 
-static void pci_unmask_msi(struct irq_data *data)
+static void pci_irq_unmask_msi(struct irq_data *data)
 {
 	struct msi_desc *desc = irq_data_get_msi_desc(data);
 
@@ -176,8 +176,8 @@ static void pci_unmask_msi(struct irq_data *data)
 static struct msi_domain_template pci_msi_template = {
 	.chip = {
 		.name			= "PCI-MSI",
-		.irq_mask		= pci_mask_msi,
-		.irq_unmask		= pci_unmask_msi,
+		.irq_mask		= pci_irq_mask_msi,
+		.irq_unmask		= pci_irq_unmask_msi,
 		.irq_write_msi_msg	= pci_msi_domain_write_msg,
 		.flags			= IRQCHIP_ONESHOT_SAFE,
 	},
@@ -192,12 +192,12 @@ static struct msi_domain_template pci_msi_template = {
 	},
 };
 
-static void pci_mask_msix(struct irq_data *data)
+static void pci_irq_mask_msix(struct irq_data *data)
 {
 	pci_msix_mask(irq_data_get_msi_desc(data));
 }
 
-static void pci_unmask_msix(struct irq_data *data)
+static void pci_irq_unmask_msix(struct irq_data *data)
 {
 	pci_msix_unmask(irq_data_get_msi_desc(data));
 }
@@ -213,8 +213,8 @@ static void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *a
 static struct msi_domain_template pci_msix_template = {
 	.chip = {
 		.name			= "PCI-MSIX",
-		.irq_mask		= pci_mask_msix,
-		.irq_unmask		= pci_unmask_msix,
+		.irq_mask		= pci_irq_mask_msix,
+		.irq_unmask		= pci_irq_unmask_msix,
 		.irq_write_msi_msg	= pci_msi_domain_write_msg,
 		.flags			= IRQCHIP_ONESHOT_SAFE,
 	},
@@ -302,7 +302,7 @@ bool pci_setup_msi_device_domain(struct pci_dev *pdev)
  */
 bool pci_setup_msix_device_domain(struct pci_dev *pdev, unsigned int hwsize)
 {
-	if (WARN_ON_ONCE(pdev->msix_enabled))
+	if (WARN_ON_ONCE(pdev->msi_enabled))
 		return false;
 
 	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
diff --git a/include/linux/msi.h b/include/linux/msi.h
index f73d20ccd552..a112b913fff9 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -455,7 +463,10 @@ struct msi_domain_ops {
  * struct msi_domain_info - MSI interrupt domain data
  * @flags:		Flags to decribe features and capabilities
  * @bus_token:		The domain bus token
- * @hwsize:		The hardware table size (0 if unknown/unlimited)
+ * @hwsize:		The hardware table size or the software index limit.
+ *			If 0 then the size is considered unlimited and
+ *			gets initialized to the maximum software index limit
+ *			by the domain creation code.
  * @ops:		The callback data structure
  * @chip:		Optional: associated interrupt chip
  * @chip_data:		Optional: associated interrupt chip data
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 43bc0c6d66ec..c39f75e473ea 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -118,30 +85,22 @@ static int msi_insert_desc(struct device *dev, struct msi_desc *desc,
 			   unsigned int domid, unsigned int index)
 {
 	struct msi_device_data *md = dev->msi.data;
+	struct xarray *xa = &md->__domains[domid].store;
 	unsigned int hwsize;
-	int baseidx, ret;
-
-	baseidx = msi_get_domain_base_index(dev, domid);
-	if (baseidx < 0) {
-		ret = baseidx;
-		goto fail;
-	}
+	int ret;
 
 	hwsize = msi_domain_get_hwsize(dev, domid);
 
 	if (index == MSI_ANY_INDEX) {
-		struct xa_limit limit;
+		struct xa_limit limit = { .min = 0, .max = hwsize - 1 };
 		unsigned int index;
 
-		limit.min = baseidx;
-		limit.max = baseidx + hwsize - 1;
-
-		/* Let the xarray allocate a free index within the limits */
-		ret = xa_alloc(&md->__store, &index, desc, limit, GFP_KERNEL);
+		/* Let the xarray allocate a free index within the limit */
+		ret = xa_alloc(xa, &index, desc, limit, GFP_KERNEL);
 		if (ret)
 			goto fail;
 
-		desc->msi_index = index - baseidx;
+		desc->msi_index = index;
 		return 0;
 	} else {
 		if (index >= hwsize) {
@@ -150,8 +109,7 @@ static int msi_insert_desc(struct device *dev, struct msi_desc *desc,
 		}
 
 		desc->msi_index = index;
-		index += baseidx;
-		ret = xa_insert(&md->__store, index, desc, GFP_KERNEL);
+		ret = xa_insert(xa, index, desc, GFP_KERNEL);
 		if (ret)
 			goto fail;
 		return 0;
@@ -313,10 +268,13 @@ EXPORT_SYMBOL_GPL(get_cached_msi_msg);
 static void msi_device_data_release(struct device *dev, void *res)
 {
 	struct msi_device_data *md = res;
+	int i;
 
-	msi_remove_device_irqdomains(dev, md);
-	WARN_ON_ONCE(!xa_empty(&md->__store));
-	xa_destroy(&md->__store);
+	for (i = 0; i < MSI_MAX_DEVICE_IRQDOMAINS; i++) {
+		msi_remove_device_irq_domain(dev, i);
+		WARN_ON_ONCE(!xa_empty(&md->__domains[i].store));
+		xa_destroy(&md->__domains[i].store);
+	}
 	dev->msi.data = NULL;
 }
 
@@ -348,11 +306,19 @@ int msi_setup_device_data(struct device *dev)
 		return ret;
 	}
 
-	msi_setup_default_irqdomain(dev, md);
+	for (i = 0; i < MSI_MAX_DEVICE_IRQDOMAINS; i++)
+		xa_init_flags(&md->__domains[i].store, XA_FLAGS_ALLOC);
+
+	/*
+	 * If @dev::msi::domain is set and is a global MSI domain, copy the
+	 * pointer into the domain array so all code can operate on domain
+	 * ids. The NULL pointer check is required to keep the legacy
+	 * architecture specific PCI/MSI support working.
+	 */
+	if (dev->msi.domain && !irq_domain_is_msi_parent(dev->msi.domain))
+		md->__domains[MSI_DEFAULT_DOMAIN].domain = dev->msi.domain;
 
-	xa_init_flags(&md->__store, XA_FLAGS_ALLOC);
 	mutex_init(&md->mutex);
-	md->__iter_idx = MSI_XA_MAX_INDEX;
 	dev->msi.data = md;
 	devres_add(dev, md);
 	return 0;
@@ -631,7 +592,7 @@ static struct irq_domain *msi_get_device_domain(struct device *dev, unsigned int
 	if (WARN_ON_ONCE(domid >= MSI_MAX_DEVICE_IRQDOMAINS))
 		return NULL;
 
-	domain = dev->msi.data->__irqdomains[domid];
+	domain = dev->msi.data->__domains[domid].domain;
 	if (!domain)
 		return NULL;
 
@@ -646,18 +607,13 @@ static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid
 	struct msi_domain_info *info;
 	struct irq_domain *domain;
 
-	/*
-	 * Retrieve the MSI domain for range checking. If there is no
-	 * domain or the domain is not a per device domain, then assume
-	 * full MSI range and pray that the calling subsystem knows what it
-	 * is doing.
-	 */
 	domain = msi_get_device_domain(dev, domid);
-	if (domain && irq_domain_is_msi_device(domain)) {
+	if (domain) {
 		info = domain->host_data;
 		return info->hwsize;
 	}
-	return MSI_MAX_INDEX + 1;
+	/* No domain, no size... */
+	return 0;
 }
 
 static inline void irq_chip_write_msi_msg(struct irq_data *data,
@@ -858,6 +814,17 @@ static struct irq_domain *__msi_create_irq_domain(struct fwnode_handle *fwnode,
 {
 	struct irq_domain *domain;
 
+	if (info->hwsize > MSI_XA_DOMAIN_SIZE)
+		return NULL;
+
+	/*
+	 * Hardware size 0 is valid for backwards compatibility and for
+	 * domains which are not backed by a hardware table. Grant the
+	 * maximum index space.
+	 */
+	if (!info->hwsize)
+		info->hwsize = MSI_XA_DOMAIN_SIZE;
+
 	msi_domain_update_dom_ops(info);
 	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
 		msi_domain_update_chip_ops(info);
@@ -997,7 +964,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
 	if (!bundle)
 		return false;
 
-	bundle->info.hwsize = hwsize ? hwsize : MSI_MAX_INDEX;
+	bundle->info.hwsize = hwsize;
 	bundle->info.chip = &bundle->chip;
 	bundle->info.ops = &bundle->ops;
 	bundle->info.data = domain_data;
@@ -1028,7 +995,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
 		goto fail;
 
 	domain->dev = dev;
-	dev->msi.data->__irqdomains[domid] = domain;
+	dev->msi.data->__domains[domid].domain = domain;
 	msi_unlock_descs(dev);
 	return true;
 
@@ -1058,7 +1025,7 @@ void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
 	if (!domain || !irq_domain_is_msi_device(domain))
 		goto unlock;
 
-	dev->msi.data->__irqdomains[domid] = NULL;
+	dev->msi.data->__domains[domid].domain = NULL;
 	info = domain->host_data;
 	irq_domain_remove(domain);
 	kfree(container_of(info, struct msi_domain_template, info));
@@ -1112,9 +1079,10 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev,
 		.last	= virq_base + nvec - 1,
 	};
 	struct msi_desc *desc;
+	struct xarray *xa;
 	int ret, virq;
 
-	if (!msi_ctrl_range_valid(dev, &ctrl))
+	if (!msi_ctrl_valid(dev, &ctrl))
 		return -EINVAL;
 
 	msi_lock_descs(dev);
@@ -1122,8 +1090,10 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev,
 	if (ret)
 		goto unlock;
 
+	xa = &dev->msi.data->__domains[ctrl.domid].store;
+
 	for (virq = virq_base; virq < virq_base + nvec; virq++) {
-		desc = xa_load(&dev->msi.data->__store, virq);
+		desc = xa_load(xa, virq);
 		desc->irq = virq;
 
 		ops->set_desc(arg, desc);
@@ -1257,8 +1227,8 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
 static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain,
 				   struct msi_ctrl *ctrl)
 {
+	struct xarray *xa = &dev->msi.data->__domains[ctrl->domid].store;
 	struct msi_domain_info *info = domain->host_data;
-	struct xarray *xa = &dev->msi.data->__store;
 	struct msi_domain_ops *ops = info->ops;
 	unsigned int vflags = 0, allocated = 0;
 	msi_alloc_info_t arg = { };
@@ -1347,7 +1317,7 @@ static int __msi_domain_alloc_locked(struct device *dev, struct msi_ctrl *ctrl)
 	struct irq_domain *domain;
 	int ret;
 
-	if (!msi_ctrl_range_valid(dev, ctrl))
+	if (!msi_ctrl_valid(dev, ctrl))
 		return -EINVAL;
 
 	domain = msi_get_device_domain(dev, ctrl->domid);
@@ -1527,16 +1497,14 @@ struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, u
 static void __msi_domain_free_irqs(struct device *dev, struct irq_domain *domain,
 				   struct msi_ctrl *ctrl)
 {
+	struct xarray *xa = &dev->msi.data->__domains[ctrl->domid].store;
 	struct msi_domain_info *info = domain->host_data;
-	struct xarray *xa = &dev->msi.data->__store;
 	struct irq_data *irqd;
 	struct msi_desc *desc;
 	unsigned long idx;
-	int i, base;
-
-	base = ctrl->domid * MSI_XA_DOMAIN_SIZE;
+	int i;
 
-	xa_for_each_range(xa, idx, desc, ctrl->first + base, ctrl->last + base) {
+	xa_for_each_range(xa, idx, desc, ctrl->first, ctrl->last) {
 		/* Only handle MSI entries which have an interrupt associated */
 		if (!msi_desc_match(desc, MSI_DESC_ASSOCIATED))
 			continue;
@@ -1561,7 +1529,7 @@ static void msi_domain_free_locked(struct device *dev, struct msi_ctrl *ctrl)
 	struct msi_domain_ops *ops;
 	struct irq_domain *domain;
 
-	if (!msi_ctrl_range_valid(dev, ctrl))
+	if (!msi_ctrl_valid(dev, ctrl))
 		return;
 
 	domain = msi_get_device_domain(dev, ctrl->domid);


* [patch V3 01/33] genirq/msi: Rearrange MSI domain flags
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
@ 2022-11-24 23:25 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 02/33] genirq/msi: Provide struct msi_parent_ops Thomas Gleixner
                   ` (33 subsequent siblings)
  34 siblings, 1 reply; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:25 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe,
	Jason Gunthorpe

These flags got added as necessary and have no obvious structure. For
feature support checks and masking it's convenient to have two blocks of
flags:

   1) Flags to control the internal behaviour like allocating/freeing
      MSI descriptors. Those flags do not need any support from the
      underlying MSI parent domain. They are mostly under the control
      of the outermost domain which implements the actual MSI support.

   2) Flags to expose features, e.g. PCI multi-MSI or requirements
      which can depend on an underlying domain.
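
As a rough illustration of why the two blocks are convenient, here is a
minimal userspace sketch. GENMASK32() is a stand-in for the kernel's
GENMASK(), and both helper functions are hypothetical names made up for
this example; they are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(h, l), 32-bit only */
#define GENMASK32(h, l) \
	((uint32_t)((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l))))

/* A few flags in the bit layout which this patch introduces */
#define MSI_FLAG_ACTIVATE_EARLY	(UINT32_C(1) << 2)
#define MSI_FLAG_FREE_MSI_DESCS	(UINT32_C(1) << 6)
#define MSI_FLAG_MULTI_PCI_MSI	(UINT32_C(1) << 16)
#define MSI_FLAG_PCI_MSIX	(UINT32_C(1) << 17)

#define MSI_GENERIC_FLAGS_MASK	GENMASK32(15, 0)
#define MSI_DOMAIN_FLAGS_MASK	GENMASK32(31, 16)

/* Hypothetical helper: the flags which need support from the parent */
static uint32_t msi_parent_dependent_flags(uint32_t flags)
{
	return flags & MSI_DOMAIN_FLAGS_MASK;
}

/* Hypothetical helper: true if the parent supports all requested features */
static bool msi_parent_supports(uint32_t requested, uint32_t parent_supported)
{
	return (msi_parent_dependent_flags(requested) & ~parent_supported) == 0;
}
```

With the split, a feature check only has to look at the upper block; the
generic flags in the lower block never take part in the comparison.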

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 include/linux/msi.h |   49 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 15 deletions(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -24,6 +24,8 @@
 #include <linux/xarray.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
+#include <linux/bits.h>
+
 #include <asm/msi.h>
 
 /* Dummy shadow structures if an architecture does not define them */
@@ -440,7 +442,16 @@ struct msi_domain_info {
 	void				*data;
 };
 
-/* Flags for msi_domain_info */
+/*
+ * Flags for msi_domain_info
+ *
+ * Bit 0-15:	Generic MSI functionality which is not subject to restriction
+ *		by parent domains
+ *
+ * Bit 16-31:	Functionality which depends on the underlying parent domain and
+ *		can be masked out by msi_parent_ops::init_dev_msi_info() when
+ *		a device MSI domain is initialized.
+ */
 enum {
 	/*
 	 * Init non implemented ops callbacks with default MSI domain
@@ -452,33 +463,41 @@ enum {
 	 * callbacks.
 	 */
 	MSI_FLAG_USE_DEF_CHIP_OPS	= (1 << 1),
-	/* Support multiple PCI MSI interrupts */
-	MSI_FLAG_MULTI_PCI_MSI		= (1 << 2),
-	/* Support PCI MSIX interrupts */
-	MSI_FLAG_PCI_MSIX		= (1 << 3),
 	/* Needs early activate, required for PCI */
-	MSI_FLAG_ACTIVATE_EARLY		= (1 << 4),
+	MSI_FLAG_ACTIVATE_EARLY		= (1 << 2),
 	/*
 	 * Must reactivate when irq is started even when
 	 * MSI_FLAG_ACTIVATE_EARLY has been set.
 	 */
-	MSI_FLAG_MUST_REACTIVATE	= (1 << 5),
-	/* Is level-triggered capable, using two messages */
-	MSI_FLAG_LEVEL_CAPABLE		= (1 << 6),
+	MSI_FLAG_MUST_REACTIVATE	= (1 << 3),
 	/* Populate sysfs on alloc() and destroy it on free() */
-	MSI_FLAG_DEV_SYSFS		= (1 << 7),
-	/* MSI-X entries must be contiguous */
-	MSI_FLAG_MSIX_CONTIGUOUS	= (1 << 8),
+	MSI_FLAG_DEV_SYSFS		= (1 << 4),
 	/* Allocate simple MSI descriptors */
-	MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS	= (1 << 9),
+	MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS	= (1 << 5),
 	/* Free MSI descriptors */
-	MSI_FLAG_FREE_MSI_DESCS		= (1 << 10),
+	MSI_FLAG_FREE_MSI_DESCS		= (1 << 6),
 	/*
 	 * Quirk to handle MSI implementations which do not provide
 	 * masking. Currently known to affect x86, but has to be partially
 	 * handled in the core MSI code.
 	 */
-	MSI_FLAG_NOMASK_QUIRK		= (1 << 11),
+	MSI_FLAG_NOMASK_QUIRK		= (1 << 7),
+
+	/* Mask for the generic functionality */
+	MSI_GENERIC_FLAGS_MASK		= GENMASK(15, 0),
+
+	/* Mask for the domain specific functionality */
+	MSI_DOMAIN_FLAGS_MASK		= GENMASK(31, 16),
+
+	/* Support multiple PCI MSI interrupts */
+	MSI_FLAG_MULTI_PCI_MSI		= (1 << 16),
+	/* Support PCI MSIX interrupts */
+	MSI_FLAG_PCI_MSIX		= (1 << 17),
+	/* Is level-triggered capable, using two messages */
+	MSI_FLAG_LEVEL_CAPABLE		= (1 << 18),
+	/* MSI-X entries must be contiguous */
+	MSI_FLAG_MSIX_CONTIGUOUS	= (1 << 19),
+
 };
 
 int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,



* [patch V3 02/33] genirq/msi: Provide struct msi_parent_ops
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 01/33] genirq/msi: Rearrange MSI domain flags Thomas Gleixner
@ 2022-11-24 23:25 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 03/33] genirq/msi: Provide data structs for per device domains Thomas Gleixner
                   ` (32 subsequent siblings)
  34 siblings, 1 reply; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:25 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

MSI parent domains must have some control over the MSI domains which are
built on top. On domain creation they need to fill in e.g. architecture
specific chip callbacks or msi domain ops to make the outermost domain
parent agnostic which is obviously required for architecture independence
etc.

The structure contains:

    1) A bitfield which exposes the supported functional features. This
       allows checking for features and is also used in the initialization
       callback to mask out unsupported features when the actual domain
       implementation requests a broader range, e.g. on x86 PCI multi-MSI
       is only supported by remapping domains but not by the underlying
       vector domain. The PCI/MSI code can then always request multi-MSI
       support, but the resulting feature set after creation might not
       have it set.

    2) An optional string prefix which is put in front of domain and chip
       names during creation of the MSI domain. That allows keeping the
       naming schemes, e.g. on x86 where PCI-MSI domains have an IR- prefix
       when interrupt remapping is enabled.

    3) An initialization callback to sanity check the domain info of
       the to be created MSI domain, to restrict features and to
       apply changes in MSI ops and interrupt chip callbacks to
       accommodate the particular MSI parent implementation and/or
       the underlying hierarchy.
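
The feature masking from 1) can be pictured with a small userspace
model. The *_model structures and model_init_dev_msi_info() are
hypothetical, reduced stand-ins for illustration only; the real callback
takes more arguments and also adjusts ops and chip callbacks.

```c
#include <stdint.h>

#define MSI_FLAG_MULTI_PCI_MSI	(UINT32_C(1) << 16)
#define MSI_FLAG_PCI_MSIX	(UINT32_C(1) << 17)
#define MSI_DOMAIN_FLAGS_MASK	UINT32_C(0xffff0000)

/* Reduced stand-in for struct msi_parent_ops */
struct msi_parent_ops_model {
	uint32_t	supported_flags;
	const char	*prefix;
};

/* Reduced stand-in for struct msi_domain_info */
struct msi_domain_info_model {
	uint32_t	flags;
};

/*
 * Hypothetical init step: generic flags (bit 0-15) pass through
 * untouched, parent dependent flags (bit 16-31) are restricted to
 * what the parent advertises.
 */
static void model_init_dev_msi_info(const struct msi_parent_ops_model *pops,
				    struct msi_domain_info_model *info)
{
	uint32_t generic = info->flags & ~MSI_DOMAIN_FLAGS_MASK;
	uint32_t feature = info->flags & MSI_DOMAIN_FLAGS_MASK;

	info->flags = generic | (feature & pops->supported_flags);
}
```

A device domain can therefore always request multi-MSI; whether the flag
survives depends only on the parent it ends up on.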

Add a convenience function to delegate the initialization from the
MSI parent domain to an underlying domain in the hierarchy.
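
A minimal userspace sketch of that delegation, assuming a two level
root->parent hierarchy. All *_model names are hypothetical and only
mirror the shape of the kernel code; in particular, letting the root
callback mask with the flags of the MSI parent is an assumption about
how an architecture callback might use the two pointers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct domain_model;

struct parent_ops_model {
	uint32_t	supported_flags;
	bool		(*init_dev_msi_info)(struct domain_model *domain,
					     struct domain_model *msi_parent,
					     uint32_t *child_flags);
};

struct domain_model {
	struct domain_model		*parent;
	const struct parent_ops_model	*msi_parent_ops;
};

/* Hand the initialization to the next domain down the hierarchy */
static bool model_parent_init(struct domain_model *domain,
			      struct domain_model *msi_parent,
			      uint32_t *child_flags)
{
	struct domain_model *parent = domain->parent;

	if (!parent || !parent->msi_parent_ops ||
	    !parent->msi_parent_ops->init_dev_msi_info)
		return false;

	return parent->msi_parent_ops->init_dev_msi_info(parent, msi_parent,
							 child_flags);
}

/* Root callback: does the work, but with the MSI parent's feature set */
static bool model_root_init(struct domain_model *domain,
			    struct domain_model *msi_parent,
			    uint32_t *child_flags)
{
	(void)domain;
	*child_flags &= msi_parent->msi_parent_ops->supported_flags;
	return true;
}
```

An intermediate domain would point its init_dev_msi_info at the
delegating function while still advertising its own, possibly larger,
supported_flags.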

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Renamed arguments and updated comments (Jason)
---
 include/linux/irqdomain.h |    5 +++++
 include/linux/msi.h       |   21 +++++++++++++++++++++
 kernel/irq/msi.c          |   41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)

--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -46,6 +46,7 @@ struct irq_desc;
 struct cpumask;
 struct seq_file;
 struct irq_affinity_desc;
+struct msi_parent_ops;
 
 #define IRQ_DOMAIN_IRQ_SPEC_PARAMS 16
 
@@ -134,6 +135,7 @@ struct irq_domain_chip_generic;
  * @pm_dev:	Pointer to a device that can be utilized for power management
  *		purposes related to the irq domain.
  * @parent:	Pointer to parent irq_domain to support hierarchy irq_domains
+ * @msi_parent_ops: Pointer to MSI parent domain methods for per device domain init
  *
  * Revmap data, used internally by the irq domain code:
  * @revmap_size:	Size of the linear map table @revmap[]
@@ -157,6 +159,9 @@ struct irq_domain {
 #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
 	struct irq_domain		*parent;
 #endif
+#ifdef CONFIG_GENERIC_MSI_IRQ
+	const struct msi_parent_ops	*msi_parent_ops;
+#endif
 
 	/* reverse map data. The linear map gets appended to the irq_domain */
 	irq_hw_number_t			hwirq_max;
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -500,6 +500,27 @@ enum {
 
 };
 
+/**
+ * struct msi_parent_ops - MSI parent domain callbacks and configuration info
+ *
+ * @supported_flags:	Required: The supported MSI flags of the parent domain
+ * @prefix:		Optional: Prefix for the domain and chip name
+ * @init_dev_msi_info:	Required: Callback for MSI parent domains to setup parent
+ *			domain specific domain flags, domain ops and interrupt chip
+ *			callbacks when a per device domain is created.
+ */
+struct msi_parent_ops {
+	u32		supported_flags;
+	const char	*prefix;
+	bool		(*init_dev_msi_info)(struct device *dev, struct irq_domain *domain,
+					     struct irq_domain *msi_parent_domain,
+					     struct msi_domain_info *msi_child_info);
+};
+
+bool msi_parent_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				  struct irq_domain *msi_parent_domain,
+				  struct msi_domain_info *msi_child_info);
+
 int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,
 			    bool force);
 
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -789,6 +789,47 @@ struct irq_domain *msi_create_irq_domain
 	return domain;
 }
 
+/**
+ * msi_parent_init_dev_msi_info - Delegate initialization of device MSI info down
+ *				  in the domain hierarchy
+ * @dev:		The device for which the domain should be created
+ * @domain:		The domain in the hierarchy this op is being called on
+ * @msi_parent_domain:	The IRQ_DOMAIN_FLAG_MSI_PARENT domain for the child to
+ *			be created
+ * @msi_child_info:	The MSI domain info of the IRQ_DOMAIN_FLAG_MSI_DEVICE
+ *			domain to be created
+ *
+ * Return: true on success, false otherwise
+ *
+ * This is the most complex problem of per device MSI domains and the
+ * underlying interrupt domain hierarchy:
+ *
+ * The device domain to be initialized requests the broadest feature set
+ * possible and the underlying domain hierarchy puts restrictions on it.
+ *
+ * That's trivial for a simple parent->child relationship, but it gets
+ * interesting with an intermediate domain: root->parent->child.  The
+ * intermediate 'parent' can expand the capabilities which the 'root'
+ * domain is providing. So that creates a classic hen and egg problem:
+ * Which entity is doing the restrictions/expansions?
+ *
+ * One solution is to let the root domain handle the initialization that's
+ * why there is the @domain and the @msi_parent_domain pointer.
+ */
+bool msi_parent_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				  struct irq_domain *msi_parent_domain,
+				  struct msi_domain_info *msi_child_info)
+{
+	struct irq_domain *parent = domain->parent;
+
+	if (WARN_ON_ONCE(!parent || !parent->msi_parent_ops ||
+			 !parent->msi_parent_ops->init_dev_msi_info))
+		return false;
+
+	return parent->msi_parent_ops->init_dev_msi_info(dev, parent, msi_parent_domain,
+							 msi_child_info);
+}
+
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
 			    int nvec, msi_alloc_info_t *arg)
 {



* [patch V3 03/33] genirq/msi: Provide data structs for per device domains
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 01/33] genirq/msi: Rearrange MSI domain flags Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 02/33] genirq/msi: Provide struct msi_parent_ops Thomas Gleixner
@ 2022-11-24 23:25 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 04/33] genirq/msi: Add size info to struct msi_domain_info Thomas Gleixner
                   ` (31 subsequent siblings)
  34 siblings, 1 reply; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:25 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Provide struct msi_domain_template which contains a bundle of struct
irq_chip, struct msi_domain_ops and struct msi_domain_info and a name
field.

This template is used by MSI device domain implementations to provide the
domain specific functionality, feature bits etc.

When an MSI domain is created the template is duplicated in the core code
so that it can be modified per instance. That means templates can be
marked const at the MSI device domain code.

The template is a bundle to avoid several allocations and duplications
of the involved structures.

The name field is used to construct the final domain and chip name via:

    $PREFIX$NAME-$DEVNAME

where $PREFIX is the optional prefix of the MSI parent domain, $NAME is the
name provided in template::chip and $DEVNAME is the device name, so that
the domain is properly identified. On x86 this results for PCI/MSI in:

   PCI-MSI-0000:3d:00.1 or IR-PCI-MSIX-0000:3d:00.1

depending on the domain type and the availability of remapping.
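
The name construction can be sketched in plain C as below. The helper
name model_build_domain_name() and the truncation handling are
assumptions for illustration; only the 48 byte buffer size and the
$PREFIX$NAME-$DEVNAME scheme come from this patch.

```c
#include <stdio.h>

#define MSI_TEMPLATE_NAME_LEN	48	/* matches name[48] in the template */

/*
 * Hypothetical helper: build "$PREFIX$NAME-$DEVNAME" into @buf.
 * @prefix may be NULL because the parent prefix is optional.
 * Returns 0 on success, -1 if the result does not fit into @buf.
 */
static int model_build_domain_name(char buf[MSI_TEMPLATE_NAME_LEN],
				   const char *prefix, const char *name,
				   const char *devname)
{
	int len = snprintf(buf, MSI_TEMPLATE_NAME_LEN, "%s%s-%s",
			   prefix ? prefix : "", name, devname);

	return (len < 0 || len >= MSI_TEMPLATE_NAME_LEN) ? -1 : 0;
}
```

Called with a NULL prefix, "PCI-MSI" and "0000:3d:00.1" this yields the
first name above; with "IR-" and "PCI-MSIX" it yields the second.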

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Correct changelog (Kevin)
---
 include/linux/msi.h |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -24,6 +24,7 @@
 #include <linux/xarray.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
+#include <linux/irq.h>
 #include <linux/bits.h>
 
 #include <asm/msi.h>
@@ -74,7 +75,6 @@ struct msi_msg {
 
 extern int pci_msi_ignore_mask;
 /* Helper functions */
-struct irq_data;
 struct msi_desc;
 struct pci_dev;
 struct platform_msi_priv_data;
@@ -442,6 +442,20 @@ struct msi_domain_info {
 	void				*data;
 };
 
+/**
+ * struct msi_domain_template - Template for MSI device domains
+ * @name:	Storage for the resulting name. Filled in by the core.
+ * @chip:	Interrupt chip for this domain
+ * @ops:	MSI domain ops
+ * @info:	MSI domain info data
+ */
+struct msi_domain_template {
+	char			name[48];
+	struct irq_chip		chip;
+	struct msi_domain_ops	ops;
+	struct msi_domain_info	info;
+};
+
 /*
  * Flags for msi_domain_info
  *



* [patch V3 04/33] genirq/msi: Add size info to struct msi_domain_info
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (2 preceding siblings ...)
  2022-11-24 23:25 ` [patch V3 03/33] genirq/msi: Provide data structs for per device domains Thomas Gleixner
@ 2022-11-24 23:25 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 05/33] genirq/msi: Split msi_create_irq_domain() Thomas Gleixner
                   ` (30 subsequent siblings)
  34 siblings, 1 reply; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:25 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

To allow proper range checking, especially for dynamic allocations, add a
size field to struct msi_domain_info. If the field is 0 then the size is
unknown or unlimited (up to MSI_MAX_INDEX) to provide backwards
compatibility.
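
The handling of the size field can be modelled in user space as follows.
This is a sketch under stated assumptions, not the kernel implementation:
struct info_model and normalize_hwsize() are hypothetical, and the value of
MSI_MAX_INDEX is assumed to be USHRT_MAX as in include/linux/msi.h at the
time of this series:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed to mirror the kernel definitions; verify against the tree. */
#define MSI_MAX_INDEX		((unsigned int)USHRT_MAX)
#define MSI_XA_DOMAIN_SIZE	(MSI_MAX_INDEX + 1)

/* Hypothetical stand-in for the one msi_domain_info field used here. */
struct info_model {
	unsigned int hwsize;
};

/*
 * Model of the check in the domain creation code: reject sizes beyond
 * the index space and expand 0 to the full index space.
 */
static bool normalize_hwsize(struct info_model *info)
{
	assert(info);

	if (info->hwsize > MSI_XA_DOMAIN_SIZE)
		return false;

	if (!info->hwsize)
		info->hwsize = MSI_XA_DOMAIN_SIZE;
	return true;
}
```

After normalization a size of 0 never reaches the range checks, so callers
which do not know the hardware size keep the full index space.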

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Move the initialization into the common domain creation code
---
 include/linux/msi.h |    5 +++++
 kernel/irq/msi.c    |   11 +++++++++++
 2 files changed, 16 insertions(+)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -422,6 +422,10 @@ struct msi_domain_ops {
  * struct msi_domain_info - MSI interrupt domain data
  * @flags:		Flags to decribe features and capabilities
  * @bus_token:		The domain bus token
+ * @hwsize:		The hardware table size or the software index limit.
+ *			If 0 then the size is considered unlimited and
+ *			gets initialized to the maximum software index limit
+ *			by the domain creation code.
  * @ops:		The callback data structure
  * @chip:		Optional: associated interrupt chip
  * @chip_data:		Optional: associated interrupt chip data
@@ -433,6 +437,7 @@ struct msi_domain_ops {
 struct msi_domain_info {
 	u32				flags;
 	enum irq_domain_bus_token	bus_token;
+	unsigned int			hwsize;
 	struct msi_domain_ops		*ops;
 	struct irq_chip			*chip;
 	void				*chip_data;
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -773,6 +773,17 @@ struct irq_domain *msi_create_irq_domain
 {
 	struct irq_domain *domain;
 
+	if (info->hwsize > MSI_XA_DOMAIN_SIZE)
+		return NULL;
+
+	/*
+	 * Hardware size 0 is valid for backwards compatibility and for
+	 * domains which are not backed by a hardware table. Grant the
+	 * maximum index space.
+	 */
+	if (!info->hwsize)
+		info->hwsize = MSI_XA_DOMAIN_SIZE;
+
 	msi_domain_update_dom_ops(info);
 	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
 		msi_domain_update_chip_ops(info);



* [patch V3 05/33] genirq/msi: Split msi_create_irq_domain()
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (3 preceding siblings ...)
  2022-11-24 23:25 ` [patch V3 04/33] genirq/msi: Add size info to struct msi_domain_info Thomas Gleixner
@ 2022-11-24 23:25 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 06/33] genirq/irqdomain: Add irq_domain::dev for per device MSI domains Thomas Gleixner
                   ` (29 subsequent siblings)
  34 siblings, 1 reply; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:25 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Split the functionality of msi_create_irq_domain() so it can
be reused for creating per device irq domains.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/irq/msi.c |   32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -759,17 +759,10 @@ static void msi_domain_update_chip_ops(s
 		chip->irq_set_affinity = msi_domain_set_affinity;
 }
 
-/**
- * msi_create_irq_domain - Create an MSI interrupt domain
- * @fwnode:	Optional fwnode of the interrupt controller
- * @info:	MSI domain info
- * @parent:	Parent irq domain
- *
- * Return: pointer to the created &struct irq_domain or %NULL on failure
- */
-struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
-					 struct msi_domain_info *info,
-					 struct irq_domain *parent)
+static struct irq_domain *__msi_create_irq_domain(struct fwnode_handle *fwnode,
+						  struct msi_domain_info *info,
+						  unsigned int flags,
+						  struct irq_domain *parent)
 {
 	struct irq_domain *domain;
 
@@ -788,7 +781,7 @@ struct irq_domain *msi_create_irq_domain
 	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
 		msi_domain_update_chip_ops(info);
 
-	domain = irq_domain_create_hierarchy(parent, IRQ_DOMAIN_FLAG_MSI, 0,
+	domain = irq_domain_create_hierarchy(parent, flags | IRQ_DOMAIN_FLAG_MSI, 0,
 					     fwnode, &msi_domain_ops, info);
 
 	if (domain) {
@@ -801,6 +794,21 @@ struct irq_domain *msi_create_irq_domain
 }
 
 /**
+ * msi_create_irq_domain - Create an MSI interrupt domain
+ * @fwnode:	Optional fwnode of the interrupt controller
+ * @info:	MSI domain info
+ * @parent:	Parent irq domain
+ *
+ * Return: pointer to the created &struct irq_domain or %NULL on failure
+ */
+struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
+					 struct msi_domain_info *info,
+					 struct irq_domain *parent)
+{
+	return __msi_create_irq_domain(fwnode, info, 0, parent);
+}
+
+/**
  * msi_parent_init_dev_msi_info - Delegate initialization of device MSI info down
  *				  in the domain hierarchy
  * @dev:		The device for which the domain should be created



* [patch V3 06/33] genirq/irqdomain: Add irq_domain::dev for per device MSI domains
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (4 preceding siblings ...)
  2022-11-24 23:25 ` [patch V3 05/33] genirq/msi: Split msi_create_irq_domain() Thomas Gleixner
@ 2022-11-24 23:25 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] genirq/irqdomain: Add irq_domain:: Dev " tip-bot2 for Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 07/33] genirq/msi: Provide msi_create/free_device_irq_domain() Thomas Gleixner
                   ` (28 subsequent siblings)
  34 siblings, 1 reply; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:25 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Per device domains require the device pointer of the device which
instantiated the domain for some purposes. Add the pointer to struct
irq_domain. It will be used in the next step which provides the
infrastructure to create per device MSI domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/irqdomain.h |    4 ++++
 1 file changed, 4 insertions(+)

--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -132,6 +132,9 @@ struct irq_domain_chip_generic;
  * @gc:		Pointer to a list of generic chips. There is a helper function for
  *		setting up one or more generic chips for interrupt controllers
  *		drivers using the generic chip library which uses this pointer.
+ * @dev:	Pointer to the device which instantiated the irqdomain
+ *		With per device irq domains this is not necessarily the same
+ *		as @pm_dev.
  * @pm_dev:	Pointer to a device that can be utilized for power management
  *		purposes related to the irq domain.
  * @parent:	Pointer to parent irq_domain to support hierarchy irq_domains
@@ -155,6 +158,7 @@ struct irq_domain {
 	struct fwnode_handle		*fwnode;
 	enum irq_domain_bus_token	bus_token;
 	struct irq_domain_chip_generic	*gc;
+	struct device			*dev;
 	struct device			*pm_dev;
 #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
 	struct irq_domain		*parent;



* [patch V3 07/33] genirq/msi: Provide msi_create/free_device_irq_domain()
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (5 preceding siblings ...)
  2022-11-24 23:25 ` [patch V3 06/33] genirq/irqdomain: Add irq_domain::dev for per device MSI domains Thomas Gleixner
@ 2022-11-24 23:25 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 08/33] genirq/msi: Provide msi_match_device_domain() Thomas Gleixner
                   ` (27 subsequent siblings)
  34 siblings, 1 reply; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:25 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Now that all prerequisites are in place, provide the actual interfaces for
creating and removing per device interrupt domains.

MSI device interrupt domains are created from the provided
msi_domain_template which is duplicated so that it can be modified for the
particular device.

The name of the domain and the name of the interrupt chip are composed by
"$(PREFIX)$(CHIPNAME)-$(DEVNAME)"

  $PREFIX:   The optional prefix provided by the underlying MSI parent domain
             via msi_parent_ops::prefix.
  $CHIPNAME: The name of the irq_chip in the template
  $DEVNAME:  The name of the device

The domain is further initialized through an MSI parent domain callback which
fills in the required functionality for the parent domain or domains further
down the hierarchy. This initialization can fail, e.g. when the requested
feature or MSI domain type cannot be supported.

The domain pointer is stored in the pointer array inside of msi_device_data
which is attached to the device.

The domain can be removed via the API or left for disposal via devres when
the device is torn down. The API removal is useful e.g. for PCI to have
separate domains for MSI and MSI-X, which are mutually exclusive and always
occupy the default domain id slot.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Remove unused argument and adopt the hwsize init (Kevin)
    Adapt to the xarray and domain storage split
---
 include/linux/msi.h |    6 ++
 kernel/irq/msi.c    |  138 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 144 insertions(+)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -547,6 +547,12 @@ struct irq_domain *msi_create_irq_domain
 					 struct msi_domain_info *info,
 					 struct irq_domain *parent);
 
+bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
+				  const struct msi_domain_template *template,
+				  unsigned int hwsize, void *domain_data,
+				  void *chip_data);
+void msi_remove_device_irq_domain(struct device *dev, unsigned int domid);
+
 int msi_domain_alloc_irqs_range_locked(struct device *dev, unsigned int domid,
 				       unsigned int first, unsigned int last);
 int msi_domain_alloc_irqs_range(struct device *dev, unsigned int domid,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -240,6 +240,7 @@ static void msi_device_data_release(stru
 	int i;
 
 	for (i = 0; i < MSI_MAX_DEVICE_IRQDOMAINS; i++) {
+		msi_remove_device_irq_domain(dev, i);
 		WARN_ON_ONCE(!xa_empty(&md->__domains[i].store));
 		xa_destroy(&md->__domains[i].store);
 	}
@@ -849,6 +850,143 @@ bool msi_parent_init_dev_msi_info(struct
 							 msi_child_info);
 }
 
+/**
+ * msi_create_device_irq_domain - Create a device MSI interrupt domain
+ * @dev:		Pointer to the device
+ * @domid:		Domain id
+ * @template:		MSI domain info bundle used as template
+ * @hwsize:		Maximum number of MSI table entries (0 if unknown or unlimited)
+ * @domain_data:	Optional pointer to domain specific data which is set in
+ *			msi_domain_info::data
+ * @chip_data:		Optional pointer to chip specific data which is set in
+ *			msi_domain_info::chip_data
+ *
+ * Return: True on success, false otherwise
+ *
+ * There is no firmware node required for this interface because the per
+ * device domains are software constructs which are actually closer to the
+ * hardware reality than any firmware can describe them.
+ *
+ * The domain name and the irq chip name for a MSI device domain are
+ * composed by: "$(PREFIX)$(CHIPNAME)-$(DEVNAME)"
+ *
+ * $PREFIX:   Optional prefix provided by the underlying MSI parent domain
+ *	      via msi_parent_ops::prefix. If that pointer is NULL the prefix
+ *	      is empty.
+ * $CHIPNAME: The name of the irq_chip in @template
+ * $DEVNAME:  The name of the device
+ *
+ * This results in understandable chip names and hardware interrupt numbers
+ * in e.g. /proc/interrupts
+ *
+ * PCI-MSI-0000:00:1c.0     0-edge  Parent domain has no prefix
+ * IR-PCI-MSI-0000:00:1c.4  0-edge  Same with interrupt remapping prefix 'IR-'
+ *
+ * IR-PCI-MSIX-0000:3d:00.0 0-edge  Hardware interrupt numbers reflect
+ * IR-PCI-MSIX-0000:3d:00.0 1-edge  the real MSI-X index on that device
+ * IR-PCI-MSIX-0000:3d:00.0 2-edge
+ *
+ * On IMS domains the hardware interrupt number is either a table entry
+ * index or a purely software managed index but it is guaranteed to be
+ * unique.
+ *
+ * The domain pointer is stored in @dev::msi::data::__domains[]. All
+ * subsequent operations on the domain depend on the domain id.
+ *
+ * The domain is automatically freed when the device is removed via devres
+ * in the context of @dev::msi::data freeing, but it can also be
+ * independently removed via @msi_remove_device_irq_domain().
+ */
+bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
+				  const struct msi_domain_template *template,
+				  unsigned int hwsize, void *domain_data,
+				  void *chip_data)
+{
+	struct irq_domain *domain, *parent = dev->msi.domain;
+	const struct msi_parent_ops *pops;
+	struct msi_domain_template *bundle;
+	struct fwnode_handle *fwnode;
+
+	if (!irq_domain_is_msi_parent(parent))
+		return false;
+
+	if (domid >= MSI_MAX_DEVICE_IRQDOMAINS)
+		return false;
+
+	bundle = kmemdup(template, sizeof(*bundle), GFP_KERNEL);
+	if (!bundle)
+		return false;
+
+	bundle->info.hwsize = hwsize;
+	bundle->info.chip = &bundle->chip;
+	bundle->info.ops = &bundle->ops;
+	bundle->info.data = domain_data;
+	bundle->info.chip_data = chip_data;
+
+	pops = parent->msi_parent_ops;
+	snprintf(bundle->name, sizeof(bundle->name), "%s%s-%s",
+		 pops->prefix ? : "", bundle->chip.name, dev_name(dev));
+	bundle->chip.name = bundle->name;
+
+	fwnode = irq_domain_alloc_named_fwnode(bundle->name);
+	if (!fwnode)
+		goto free_bundle;
+
+	if (msi_setup_device_data(dev))
+		goto free_fwnode;
+
+	msi_lock_descs(dev);
+
+	if (WARN_ON_ONCE(msi_get_device_domain(dev, domid)))
+		goto fail;
+
+	if (!pops->init_dev_msi_info(dev, parent, parent, &bundle->info))
+		goto fail;
+
+	domain = __msi_create_irq_domain(fwnode, &bundle->info, IRQ_DOMAIN_FLAG_MSI_DEVICE, parent);
+	if (!domain)
+		goto fail;
+
+	domain->dev = dev;
+	dev->msi.data->__domains[domid].domain = domain;
+	msi_unlock_descs(dev);
+	return true;
+
+fail:
+	msi_unlock_descs(dev);
+free_fwnode:
+	kfree(fwnode);
+free_bundle:
+	kfree(bundle);
+	return false;
+}
+
+/**
+ * msi_remove_device_irq_domain - Free a device MSI interrupt domain
+ * @dev:	Pointer to the device
+ * @domid:	Domain id
+ */
+void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
+{
+	struct msi_domain_info *info;
+	struct irq_domain *domain;
+
+	msi_lock_descs(dev);
+
+	domain = msi_get_device_domain(dev, domid);
+
+	if (!domain || !irq_domain_is_msi_device(domain))
+		goto unlock;
+
+	dev->msi.data->__domains[domid].domain = NULL;
+	info = domain->host_data;
+	irq_domain_remove(domain);
+	kfree(container_of(info, struct msi_domain_template, info));
+
+unlock:
+	msi_unlock_descs(dev);
+}
+
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
 			    int nvec, msi_alloc_info_t *arg)
 {



* [patch V3 08/33] genirq/msi: Provide msi_match_device_domain()
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (6 preceding siblings ...)
  2022-11-24 23:25 ` [patch V3 07/33] genirq/msi: Provide msi_create/free_device_irq_domain() Thomas Gleixner
@ 2022-11-24 23:25 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-11-24 23:25 ` [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc() Thomas Gleixner
                   ` (26 subsequent siblings)
  34 siblings, 1 reply; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:25 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Provide an interface to match a per device domain bus token. This allows
querying which type of domain is installed for a particular domain id. It
will be used by PCI to avoid frequent create/remove cycles for the MSI and
MSI-X domains.
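
The intended usage pattern can be modelled in user space. The following is
a sketch only: the model_* names are hypothetical and the single slot stands
for one domain id of a device. It assumes that a caller queries the
installed domain type first and recreates the domain only when the type
differs:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the bus tokens of the per device domains. */
enum model_token { MODEL_NONE, MODEL_PCI_MSI, MODEL_PCI_MSIX };

/* One domain id slot of a device plus a counter to make reuse visible. */
struct model_slot {
	enum model_token	installed;
	unsigned int		creations;
};

/* Models the match query: true if a domain exists and the token matches. */
static bool model_match(const struct model_slot *slot, enum model_token token)
{
	assert(slot);
	return slot->installed != MODEL_NONE && slot->installed == token;
}

/* Ensure the slot holds a domain of type want; recreate only on a type change. */
static void model_setup(struct model_slot *slot, enum model_token want)
{
	if (model_match(slot, want))
		return;			/* Right type installed: reuse it */

	slot->installed = want;		/* Models remove followed by create */
	slot->creations++;
}
```

Setting up MSI twice in a row creates the domain once, while a switch from
MSI to MSI-X replaces it. That is the create/remove churn which the match
interface is meant to avoid.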

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/msi.h |    3 +++
 kernel/irq/msi.c    |   25 +++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -553,6 +553,9 @@ bool msi_create_device_irq_domain(struct
 				  void *chip_data);
 void msi_remove_device_irq_domain(struct device *dev, unsigned int domid);
 
+bool msi_match_device_irq_domain(struct device *dev, unsigned int domid,
+				 enum irq_domain_bus_token bus_token);
+
 int msi_domain_alloc_irqs_range_locked(struct device *dev, unsigned int domid,
 				       unsigned int first, unsigned int last);
 int msi_domain_alloc_irqs_range(struct device *dev, unsigned int domid,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -987,6 +987,31 @@ void msi_remove_device_irq_domain(struct
 	msi_unlock_descs(dev);
 }
 
+/**
+ * msi_match_device_irq_domain - Match a device irq domain against a bus token
+ * @dev:	Pointer to the device
+ * @domid:	Domain id
+ * @bus_token:	Bus token to match against the domain bus token
+ *
+ * Return: True if device domain exists and bus tokens match.
+ */
+bool msi_match_device_irq_domain(struct device *dev, unsigned int domid,
+				 enum irq_domain_bus_token bus_token)
+{
+	struct msi_domain_info *info;
+	struct irq_domain *domain;
+	bool ret = false;
+
+	msi_lock_descs(dev);
+	domain = msi_get_device_domain(dev, domid);
+	if (domain && irq_domain_is_msi_device(domain)) {
+		info = domain->host_data;
+		ret = info->bus_token == bus_token;
+	}
+	msi_unlock_descs(dev);
+	return ret;
+}
+
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
 			    int nvec, msi_alloc_info_t *arg)
 {



* [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (7 preceding siblings ...)
  2022-11-24 23:25 ` [patch V3 08/33] genirq/msi: Provide msi_match_device_domain() Thomas Gleixner
@ 2022-11-24 23:25 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
                     ` (3 more replies)
  2022-11-24 23:26 ` [patch V3 10/33] PCI/MSI: Split __pci_write_msi_msg() Thomas Gleixner
                   ` (25 subsequent siblings)
  34 siblings, 4 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:25 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Per device domains provide the real domain size to the core code. This
allows range checking on insertion of MSI descriptors and also paves the
way for dynamic index allocations which are required e.g. for IMS. This
avoids external mechanisms like bitmaps on the device side and just
utilizes the core internal MSI descriptor storage for it.
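
The two checks can be sketched in user space as below. check_index() and
range_valid() are hypothetical names which model the index check on
descriptor insertion and the range check on the allocation control
structure; they are not the kernel functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Model of the insertion check: an index must be below the domain size. */
static int check_index(unsigned int index, unsigned int hwsize)
{
	if (index >= hwsize)
		return -ERANGE;
	return 0;
}

/* Model of the range check: ordered and fully inside the domain size. */
static bool range_valid(unsigned int first, unsigned int last,
			unsigned int hwsize)
{
	return first <= last && first < hwsize && last < hwsize;
}
```

A size of 0, which the model uses for "no domain installed", rejects every
index and every range.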

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Adapt to the new info->hwsize handling and to the new xarray split
---
 kernel/irq/msi.c |   58 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 47 insertions(+), 11 deletions(-)

--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -40,6 +40,7 @@ struct msi_ctrl {
 #define MSI_XA_DOMAIN_SIZE	(MSI_MAX_INDEX + 1)
 
 static void msi_domain_free_locked(struct device *dev, struct msi_ctrl *ctrl);
+static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid);
 static inline int msi_sysfs_create_group(struct device *dev);
 
 
@@ -80,16 +81,28 @@ static void msi_free_desc(struct msi_des
 	kfree(desc);
 }
 
-static int msi_insert_desc(struct msi_device_data *md, struct msi_desc *desc,
+static int msi_insert_desc(struct device *dev, struct msi_desc *desc,
 			   unsigned int domid, unsigned int index)
 {
+	struct msi_device_data *md = dev->msi.data;
 	struct xarray *xa = &md->__domains[domid].store;
+	unsigned int hwsize;
 	int ret;
 
+	hwsize = msi_domain_get_hwsize(dev, domid);
+	if (index >= hwsize) {
+		ret = -ERANGE;
+		goto fail;
+	}
+
 	desc->msi_index = index;
 	ret = xa_insert(xa, index, desc, GFP_KERNEL);
 	if (ret)
-		msi_free_desc(desc);
+		goto fail;
+	return 0;
+
+fail:
+	msi_free_desc(desc);
 	return ret;
 }
 
@@ -117,7 +130,7 @@ int msi_domain_insert_msi_desc(struct de
 	/* Copy type specific data to the new descriptor. */
 	desc->pci = init_desc->pci;
 
-	return msi_insert_desc(dev->msi.data, desc, domid, init_desc->msi_index);
+	return msi_insert_desc(dev, desc, domid, init_desc->msi_index);
 }
 
 static bool msi_desc_match(struct msi_desc *desc, enum msi_desc_filter filter)
@@ -136,11 +149,16 @@ static bool msi_desc_match(struct msi_de
 
 static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
 {
+	unsigned int hwsize;
+
 	if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
-			 !dev->msi.data->__domains[ctrl->domid].domain ||
-			 ctrl->first > ctrl->last ||
-			 ctrl->first > MSI_MAX_INDEX ||
-			 ctrl->last > MSI_MAX_INDEX))
+			 !dev->msi.data->__domains[ctrl->domid].domain))
+		return false;
+
+	hwsize = msi_domain_get_hwsize(dev, ctrl->domid);
+	if (WARN_ON_ONCE(ctrl->first > ctrl->last ||
+			 ctrl->first >= hwsize ||
+			 ctrl->last >= hwsize))
 		return false;
 	return true;
 }
@@ -208,7 +226,7 @@ static int msi_domain_add_simple_msi_des
 		desc = msi_alloc_desc(dev, 1, NULL);
 		if (!desc)
 			goto fail_mem;
-		ret = msi_insert_desc(dev->msi.data, desc, ctrl->domid, idx);
+		ret = msi_insert_desc(dev, desc, ctrl->domid, idx);
 		if (ret)
 			goto fail;
 	}
@@ -407,7 +425,10 @@ unsigned int msi_domain_get_virq(struct
 	if (!dev->msi.data)
 		return 0;
 
-	if (WARN_ON_ONCE(index > MSI_MAX_INDEX || domid >= MSI_MAX_DEVICE_IRQDOMAINS))
+	if (WARN_ON_ONCE(domid >= MSI_MAX_DEVICE_IRQDOMAINS))
+		return 0;
+
+	if (WARN_ON_ONCE(index >= msi_domain_get_hwsize(dev, domid)))
 		return 0;
 
 	/* This check is only valid for the PCI default MSI domain */
@@ -569,6 +590,20 @@ static struct irq_domain *msi_get_device
 	return domain;
 }
 
+static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid)
+{
+	struct msi_domain_info *info;
+	struct irq_domain *domain;
+
+	domain = msi_get_device_domain(dev, domid);
+	if (domain) {
+		info = domain->host_data;
+		return info->hwsize;
+	}
+	/* No domain, no size... */
+	return 0;
+}
+
 static inline void irq_chip_write_msi_msg(struct irq_data *data,
 					  struct msi_msg *msg)
 {
@@ -1359,7 +1394,7 @@ int msi_domain_alloc_irqs_all_locked(str
 	struct msi_ctrl ctrl = {
 		.domid	= domid,
 		.first	= 0,
-		.last	= MSI_MAX_INDEX,
+		.last	= msi_domain_get_hwsize(dev, domid) - 1,
 		.nirqs	= nirqs,
 	};
 
@@ -1473,7 +1508,8 @@ void msi_domain_free_irqs_range(struct d
  */
 void msi_domain_free_irqs_all_locked(struct device *dev, unsigned int domid)
 {
-	msi_domain_free_irqs_range_locked(dev, domid, 0, MSI_MAX_INDEX);
+	msi_domain_free_irqs_range_locked(dev, domid, 0,
+					  msi_domain_get_hwsize(dev, domid) - 1);
 }
 
 /**



* [patch V3 10/33] PCI/MSI: Split __pci_write_msi_msg()
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (8 preceding siblings ...)
  2022-11-24 23:25 ` [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc() Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 11/33] genirq/msi: Provide BUS_DEVICE_PCI_MSI[X] Thomas Gleixner
                   ` (24 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe,
	Ahmed S. Darwish

The upcoming per device MSI domains will create different domains for MSI
and MSI-X. Split the write message function into MSI and MSI-X helpers so
they can be used by those new domain functions separately.

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/msi/msi.c |  104 +++++++++++++++++++++++++-------------------------
 1 file changed, 54 insertions(+), 50 deletions(-)

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -180,6 +180,58 @@ void __pci_read_msi_msg(struct msi_desc
 	}
 }
 
+static inline void pci_write_msg_msi(struct pci_dev *dev, struct msi_desc *desc,
+				     struct msi_msg *msg)
+{
+	int pos = dev->msi_cap;
+	u16 msgctl;
+
+	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+	msgctl &= ~PCI_MSI_FLAGS_QSIZE;
+	msgctl |= desc->pci.msi_attrib.multiple << 4;
+	pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);
+
+	pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_LO, msg->address_lo);
+	if (desc->pci.msi_attrib.is_64) {
+		pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_HI,  msg->address_hi);
+		pci_write_config_word(dev, pos + PCI_MSI_DATA_64, msg->data);
+	} else {
+		pci_write_config_word(dev, pos + PCI_MSI_DATA_32, msg->data);
+	}
+	/* Ensure that the writes are visible in the device */
+	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+}
+
+static inline void pci_write_msg_msix(struct msi_desc *desc, struct msi_msg *msg)
+{
+	void __iomem *base = pci_msix_desc_addr(desc);
+	u32 ctrl = desc->pci.msix_ctrl;
+	bool unmasked = !(ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT);
+
+	if (desc->pci.msi_attrib.is_virtual)
+		return;
+	/*
+	 * The specification mandates that the entry is masked
+	 * when the message is modified:
+	 *
+	 * "If software changes the Address or Data value of an
+	 * entry while the entry is unmasked, the result is
+	 * undefined."
+	 */
+	if (unmasked)
+		pci_msix_write_vector_ctrl(desc, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT);
+
+	writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
+	writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
+	writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
+
+	if (unmasked)
+		pci_msix_write_vector_ctrl(desc, ctrl);
+
+	/* Ensure that the writes are visible in the device */
+	readl(base + PCI_MSIX_ENTRY_DATA);
+}
+
 void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
 	struct pci_dev *dev = msi_desc_to_pci_dev(entry);
@@ -187,63 +239,15 @@ void __pci_write_msi_msg(struct msi_desc
 	if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
 		/* Don't touch the hardware now */
 	} else if (entry->pci.msi_attrib.is_msix) {
-		void __iomem *base = pci_msix_desc_addr(entry);
-		u32 ctrl = entry->pci.msix_ctrl;
-		bool unmasked = !(ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT);
-
-		if (entry->pci.msi_attrib.is_virtual)
-			goto skip;
-
-		/*
-		 * The specification mandates that the entry is masked
-		 * when the message is modified:
-		 *
-		 * "If software changes the Address or Data value of an
-		 * entry while the entry is unmasked, the result is
-		 * undefined."
-		 */
-		if (unmasked)
-			pci_msix_write_vector_ctrl(entry, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT);
-
-		writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
-		writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
-		writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
-
-		if (unmasked)
-			pci_msix_write_vector_ctrl(entry, ctrl);
-
-		/* Ensure that the writes are visible in the device */
-		readl(base + PCI_MSIX_ENTRY_DATA);
+		pci_write_msg_msix(entry, msg);
 	} else {
-		int pos = dev->msi_cap;
-		u16 msgctl;
-
-		pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
-		msgctl &= ~PCI_MSI_FLAGS_QSIZE;
-		msgctl |= entry->pci.msi_attrib.multiple << 4;
-		pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);
-
-		pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_LO,
-				       msg->address_lo);
-		if (entry->pci.msi_attrib.is_64) {
-			pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_HI,
-					       msg->address_hi);
-			pci_write_config_word(dev, pos + PCI_MSI_DATA_64,
-					      msg->data);
-		} else {
-			pci_write_config_word(dev, pos + PCI_MSI_DATA_32,
-					      msg->data);
-		}
-		/* Ensure that the writes are visible in the device */
-		pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+		pci_write_msg_msi(dev, entry, msg);
 	}
 
-skip:
 	entry->msg = *msg;
 
 	if (entry->write_msi_msg)
 		entry->write_msi_msg(entry, entry->write_msi_msg_data);
-
 }
 
 void pci_write_msi_msg(unsigned int irq, struct msi_msg *msg)



* [patch V3 11/33] genirq/msi: Provide BUS_DEVICE_PCI_MSI[X]
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (9 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 10/33] PCI/MSI: Split __pci_write_msi_msg() Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 12/33] PCI/MSI: Add support for per device MSI[X] domains Thomas Gleixner
                   ` (23 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Provide new bus tokens for the upcoming per device PCI/MSI and PCI/MSI-X
interrupt domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/irqdomain_defs.h |    2 ++
 kernel/irq/msi.c               |    4 ++++
 2 files changed, 6 insertions(+)

--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -21,6 +21,8 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_TI_SCI_INTA_MSI,
 	DOMAIN_BUS_WAKEUP,
 	DOMAIN_BUS_VMD_MSI,
+	DOMAIN_BUS_PCI_DEVICE_MSI,
+	DOMAIN_BUS_PCI_DEVICE_MSIX,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1122,6 +1122,8 @@ static bool msi_check_reservation_mode(s
 
 	switch(domain->bus_token) {
 	case DOMAIN_BUS_PCI_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
 	case DOMAIN_BUS_VMD_MSI:
 		break;
 	default:
@@ -1147,6 +1149,8 @@ static int msi_handle_pci_fail(struct ir
 {
 	switch(domain->bus_token) {
 	case DOMAIN_BUS_PCI_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
 	case DOMAIN_BUS_VMD_MSI:
 		if (IS_ENABLED(CONFIG_PCI_MSI))
 			break;



* [patch V3 12/33] PCI/MSI: Add support for per device MSI[X] domains
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (10 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 11/33] genirq/msi: Provide BUS_DEVICE_PCI_MSI[X] Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-11-28  4:46   ` Tian, Kevin
                     ` (2 more replies)
  2022-11-24 23:26 ` [patch V3 13/33] x86/apic/vector: Provide MSI parent domain Thomas Gleixner
                   ` (22 subsequent siblings)
  34 siblings, 3 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe,
	Ahmed S. Darwish

Provide a template and the necessary callbacks to create PCI/MSI and
PCI/MSI-X domains.

The domains are created when MSI or MSI-X is enabled. A domain normally
lives as long as the device. The exception is a switch between the two
modes: if e.g. MSI-X was tried first and failed, the MSI-X domain is
removed and an MSI domain is created. Both are mutually exclusive and
reside in the default domain ID slot of the per device domain pointer array.

Also expand pci_msi_domain_supports() to handle feature checks correctly
even in the case that the per device domain was not yet created by checking
the features supported by the MSI parent.
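
The feature check described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions: the
model_* names and the flag bit values are hypothetical stand-ins, not the
kernel definitions, and the legacy (no domain) case is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; the kernel's MSI_FLAG_* values differ. */
#define MODEL_FLAG_MULTI_MSI	(1u << 0)
#define MODEL_FLAG_MSIX		(1u << 1)

/* Hypothetical stand-in for struct irq_domain, reduced to what the check needs. */
struct model_domain {
	bool is_msi_parent;		/* models irq_domain_is_msi_parent() */
	unsigned int info_flags;	/* models msi_domain_info::flags */
	unsigned int parent_supported;	/* models msi_parent_ops::supported_flags */
};

/* True only when every bit in feature_mask is supported. */
bool model_supports(const struct model_domain *d, unsigned int feature_mask)
{
	unsigned int supported;

	assert(d);
	/* Global domains consult their own info, MSI parents their parent ops. */
	supported = d->is_msi_parent ? d->parent_supported : d->info_flags;
	return (supported & feature_mask) == feature_mask;
}
```

The comparison against the full mask, rather than a plain non-zero test,
is what makes a request for several features fail when any one of them is
missing.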

Add the necessary setup calls into the MSI and MSI-X enable code path.
These setup calls are backwards compatible. They return success when no
MSI parent domain is found, which means the existing global domains and the
legacy allocation path keep working.
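
The interplay of the two setup calls and the single default slot can be
sketched in a userspace model. Everything named model_* is a hypothetical
stand-in for the kernel types and functions in this patch, and the model
assumes that domain creation itself never fails, which the real
msi_create_device_irq_domain() does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum model_token { MODEL_NONE, MODEL_PCI_DEVICE_MSI, MODEL_PCI_DEVICE_MSIX };

struct model_dev {
	bool has_msi_parent;	/* device::msi::domain is an MSI parent domain */
	bool msi_enabled;
	bool msix_enabled;
	enum model_token slot;	/* the single default per device domain slot */
	unsigned int hwsize;	/* size of the domain occupying the slot */
};

/* Models pci_create_device_domain(): report success when there is no MSI parent. */
static bool model_create(struct model_dev *dev, enum model_token token,
			 unsigned int hwsize)
{
	if (!dev->has_msi_parent)
		return true;	/* legacy or global domain path stays in charge */
	dev->slot = token;
	dev->hwsize = hwsize;
	return true;		/* assumption: creation never fails in this model */
}

bool model_setup_msi(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->msix_enabled)
		return false;
	if (dev->slot == MODEL_PCI_DEVICE_MSI)
		return true;	/* domain exists already */
	if (dev->slot == MODEL_PCI_DEVICE_MSIX)
		dev->slot = MODEL_NONE;	/* remove the stale MSI-X domain */
	return model_create(dev, MODEL_PCI_DEVICE_MSI, 1);
}

bool model_setup_msix(struct model_dev *dev, unsigned int hwsize)
{
	assert(dev != NULL);
	if (dev->msi_enabled)
		return false;
	if (dev->slot == MODEL_PCI_DEVICE_MSIX)
		return true;	/* domain exists already */
	if (dev->slot == MODEL_PCI_DEVICE_MSI)
		dev->slot = MODEL_NONE;	/* remove the stale MSI domain */
	return model_create(dev, MODEL_PCI_DEVICE_MSIX, hwsize);
}
```

In this model a device without an MSI parent gets a successful return and
an untouched slot, which corresponds to the backwards compatible behaviour
described above.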

Co-developed-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
V3: Rename the chip callbacks and fix the check in the MSIX domain creation path (Kevin)
---
 drivers/pci/msi/irqdomain.c |  188 +++++++++++++++++++++++++++++++++++++++++++-
 drivers/pci/msi/msi.c       |   16 +++
 drivers/pci/msi/msi.h       |    2 
 3 files changed, 201 insertions(+), 5 deletions(-)

--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -139,6 +139,170 @@ struct irq_domain *pci_msi_create_irq_do
 }
 EXPORT_SYMBOL_GPL(pci_msi_create_irq_domain);
 
+/*
+ * Per device MSI[-X] domain functionality
+ */
+static void pci_device_domain_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+	arg->desc = desc;
+	arg->hwirq = desc->msi_index;
+}
+
+static void pci_irq_mask_msi(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+	pci_msi_mask(desc, BIT(data->irq - desc->irq));
+}
+
+static void pci_irq_unmask_msi(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
+}
+
+#ifdef CONFIG_GENERIC_IRQ_RESERVATION_MODE
+# define MSI_REACTIVATE		MSI_FLAG_MUST_REACTIVATE
+#else
+# define MSI_REACTIVATE		0
+#endif
+
+#define MSI_COMMON_FLAGS	(MSI_FLAG_FREE_MSI_DESCS |	\
+				 MSI_FLAG_ACTIVATE_EARLY |	\
+				 MSI_FLAG_DEV_SYSFS |		\
+				 MSI_REACTIVATE)
+
+static struct msi_domain_template pci_msi_template = {
+	.chip = {
+		.name			= "PCI-MSI",
+		.irq_mask		= pci_irq_mask_msi,
+		.irq_unmask		= pci_irq_unmask_msi,
+		.irq_write_msi_msg	= pci_msi_domain_write_msg,
+		.flags			= IRQCHIP_ONESHOT_SAFE,
+	},
+
+	.ops = {
+		.set_desc		= pci_device_domain_set_desc,
+	},
+
+	.info = {
+		.flags			= MSI_COMMON_FLAGS | MSI_FLAG_MULTI_PCI_MSI,
+		.bus_token		= DOMAIN_BUS_PCI_DEVICE_MSI,
+	},
+};
+
+static void pci_irq_mask_msix(struct irq_data *data)
+{
+	pci_msix_mask(irq_data_get_msi_desc(data));
+}
+
+static void pci_irq_unmask_msix(struct irq_data *data)
+{
+	pci_msix_unmask(irq_data_get_msi_desc(data));
+}
+
+static struct msi_domain_template pci_msix_template = {
+	.chip = {
+		.name			= "PCI-MSIX",
+		.irq_mask		= pci_irq_mask_msix,
+		.irq_unmask		= pci_irq_unmask_msix,
+		.irq_write_msi_msg	= pci_msi_domain_write_msg,
+		.flags			= IRQCHIP_ONESHOT_SAFE,
+	},
+
+	.ops = {
+		.set_desc		= pci_device_domain_set_desc,
+	},
+
+	.info = {
+		.flags			= MSI_COMMON_FLAGS | MSI_FLAG_PCI_MSIX,
+		.bus_token		= DOMAIN_BUS_PCI_DEVICE_MSIX,
+	},
+};
+
+static bool pci_match_device_domain(struct pci_dev *pdev, enum irq_domain_bus_token bus_token)
+{
+	return msi_match_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN, bus_token);
+}
+
+static bool pci_create_device_domain(struct pci_dev *pdev, struct msi_domain_template *tmpl,
+				     unsigned int hwsize)
+{
+	struct irq_domain *domain = dev_get_msi_domain(&pdev->dev);
+
+	if (!domain || !irq_domain_is_msi_parent(domain))
+		return true;
+
+	return msi_create_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN, tmpl,
+					    hwsize, NULL, NULL);
+}
+
+/**
+ * pci_setup_msi_device_domain - Setup a device MSI interrupt domain
+ * @pdev:	The PCI device to create the domain on
+ *
+ * Return:
+ *  True when:
+ *	- The device does not have a MSI parent irq domain associated,
+ *	  which keeps the legacy architecture specific and the global
+ *	  PCI/MSI domain models working
+ *	- The MSI domain exists already
+ *	- The MSI domain was successfully allocated
+ *  False when:
+ *	- MSI-X is enabled
+ *	- The domain creation fails.
+ *
+ * The created MSI domain is preserved until:
+ *	- The device is removed
+ *	- MSI is disabled and a MSI-X domain is created
+ */
+bool pci_setup_msi_device_domain(struct pci_dev *pdev)
+{
+	if (WARN_ON_ONCE(pdev->msix_enabled))
+		return false;
+
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSI))
+		return true;
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
+		msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
+
+	return pci_create_device_domain(pdev, &pci_msi_template, 1);
+}
+
+/**
+ * pci_setup_msix_device_domain - Setup a device MSI-X interrupt domain
+ * @pdev:	The PCI device to create the domain on
+ * @hwsize:	The size of the MSI-X vector table
+ *
+ * Return:
+ *  True when:
+ *	- The device does not have a MSI parent irq domain associated,
+ *	  which keeps the legacy architecture specific and the global
+ *	  PCI/MSI domain models working
+ *	- The MSI-X domain exists already
+ *	- The MSI-X domain was successfully allocated
+ *  False when:
+ *	- MSI is enabled
+ *	- The domain creation fails.
+ *
+ * The created MSI-X domain is preserved until:
+ *	- The device is removed
+ *	- MSI-X is disabled and a MSI domain is created
+ */
+bool pci_setup_msix_device_domain(struct pci_dev *pdev, unsigned int hwsize)
+{
+	if (WARN_ON_ONCE(pdev->msi_enabled))
+		return false;
+
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
+		return true;
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSI))
+		msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
+
+	return pci_create_device_domain(pdev, &pci_msix_template, hwsize);
+}
+
 /**
  * pci_msi_domain_supports - Check for support of a particular feature flag
  * @pdev:		The PCI device to operate on
@@ -152,13 +316,33 @@ bool pci_msi_domain_supports(struct pci_
 {
 	struct msi_domain_info *info;
 	struct irq_domain *domain;
+	unsigned int supported;
 
 	domain = dev_get_msi_domain(&pdev->dev);
 
 	if (!domain || !irq_domain_is_hierarchy(domain))
 		return mode == ALLOW_LEGACY;
-	info = domain->host_data;
-	return (info->flags & feature_mask) == feature_mask;
+
+	if (!irq_domain_is_msi_parent(domain)) {
+		/*
+		 * For "global" PCI/MSI interrupt domains the associated
+		 * msi_domain_info::flags is the authoritative source of
+		 * information.
+		 */
+		info = domain->host_data;
+		supported = info->flags;
+	} else {
+		/*
+		 * For MSI parent domains the supported feature set
+		 * is available in the parent ops. This makes checks
+		 * possible before actually instantiating the
+		 * per device domain because the parent is never
+		 * expanding the PCI/MSI functionality.
+		 */
+		supported = domain->msi_parent_ops->supported_flags;
+	}
+
+	return (supported & feature_mask) == feature_mask;
 }
 
 /*
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -436,6 +436,9 @@ int __pci_enable_msi_range(struct pci_de
 	if (rc)
 		return rc;
 
+	if (!pci_setup_msi_device_domain(dev))
+		return -ENODEV;
+
 	for (;;) {
 		if (affd) {
 			nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
@@ -787,9 +790,13 @@ int __pci_enable_msix_range(struct pci_d
 	if (!pci_msix_validate_entries(dev, entries, nvec, hwsize))
 		return -EINVAL;
 
-	/* PCI_IRQ_VIRTUAL is a horrible hack! */
-	if (nvec > hwsize && !(flags & PCI_IRQ_VIRTUAL))
-		nvec = hwsize;
+	if (hwsize < nvec) {
+		/* Keep the IRQ virtual hackery working */
+		if (flags & PCI_IRQ_VIRTUAL)
+			hwsize = nvec;
+		else
+			nvec = hwsize;
+	}
 
 	if (nvec < minvec)
 		return -ENOSPC;
@@ -798,6 +805,9 @@ int __pci_enable_msix_range(struct pci_d
 	if (rc)
 		return rc;
 
+	if (!pci_setup_msix_device_domain(dev, hwsize))
+		return -ENODEV;
+
 	for (;;) {
 		if (affd) {
 			nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -105,6 +105,8 @@ enum support_mode {
 };
 
 bool pci_msi_domain_supports(struct pci_dev *dev, unsigned int feature_mask, enum support_mode mode);
+bool pci_setup_msi_device_domain(struct pci_dev *pdev);
+bool pci_setup_msix_device_domain(struct pci_dev *pdev, unsigned int hwsize);
 
 /* Legacy (!IRQDOMAIN) fallbacks */
 



* [patch V3 13/33] x86/apic/vector: Provide MSI parent domain
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (11 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 12/33] PCI/MSI: Add support for per device MSI[X] domains Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
                     ` (2 more replies)
  2022-11-24 23:26 ` [patch V3 14/33] PCI/MSI: Remove unused pci_dev_has_special_msi_domain() Thomas Gleixner
                   ` (21 subsequent siblings)
  34 siblings, 3 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Enable MSI parent domain support in the x86 vector domain and fix up the
checks in the iommu implementations to check whether device::msi::domain is
the default MSI parent domain. That keeps the existing logic to protect
e.g. devices behind VMD working.

The interrupt remap PCI/MSI code still works because the underlying vector
domain still provides the same functionality.

None of the other x86 PCI/MSI implementations, e.g. XEN and HyperV, are
affected either. They still work the same way, both at the low level and in
the PCI/MSI implementations they provide.
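
The default MSI parent check used by the iommu code can be sketched as a
userspace model. The model_* types are hypothetical stand-ins and
model_vector_domain only plays the role of x86_vector_domain; the lookup
order (device first, then bus) follows the helper added in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct irq_domain, struct pci_bus and struct pci_dev. */
struct model_domain { const char *name; };

struct model_bus { struct model_domain *msi_domain; };

struct model_pci_dev {
	struct model_domain *msi_domain;	/* models device::msi::domain */
	struct model_bus *bus;
};

/* Plays the role of x86_vector_domain in this model. */
struct model_domain model_vector_domain = { "VECTOR" };

bool model_has_default_parent(const struct model_pci_dev *dev)
{
	const struct model_domain *domain;

	assert(dev && dev->bus);
	domain = dev->msi_domain;
	if (!domain)
		domain = dev->bus->msi_domain;	/* fall back to the bus domain */
	if (!domain)
		return false;
	return domain == &model_vector_domain;
}
```

A device behind a bus with its own MSI domain, such as VMD, fails the
check in this model, so the remapping code would leave its domain pointer
alone.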

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Fix kernel doc (robot)
---
 arch/x86/include/asm/msi.h          |    6 +
 arch/x86/include/asm/pci.h          |    1 
 arch/x86/kernel/apic/msi.c          |  176 ++++++++++++++++++++++++++----------
 drivers/iommu/amd/iommu.c           |    2 
 drivers/iommu/intel/irq_remapping.c |    2 
 5 files changed, 138 insertions(+), 49 deletions(-)

--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -62,4 +62,10 @@ typedef struct x86_msi_addr_hi {
 struct msi_msg;
 u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid);
 
+#define X86_VECTOR_MSI_FLAGS_SUPPORTED					\
+	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX)
+
+#define X86_VECTOR_MSI_FLAGS_REQUIRED					\
+	(MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS)
+
 #endif /* _ASM_X86_MSI_H */
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -92,6 +92,7 @@ void pcibios_scan_root(int bus);
 struct irq_routing_table *pcibios_get_irq_routing_table(void);
 int pcibios_set_irq_routing(struct pci_dev *dev, int pin, int irq);
 
+bool pci_dev_has_default_msi_parent_domain(struct pci_dev *dev);
 
 #define HAVE_PCI_MMAP
 #define arch_can_pci_mmap_wc()	pat_enabled()
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -142,67 +142,131 @@ msi_set_affinity(struct irq_data *irqd,
 	return ret;
 }
 
-/*
- * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
- * which implement the MSI or MSI-X Capability Structure.
+/**
+ * pci_dev_has_default_msi_parent_domain - Check whether the device has the default
+ *					   MSI parent domain associated
+ * @dev:	Pointer to the PCI device
  */
-static struct irq_chip pci_msi_controller = {
-	.name			= "PCI-MSI",
-	.irq_unmask		= pci_msi_unmask_irq,
-	.irq_mask		= pci_msi_mask_irq,
-	.irq_ack		= irq_chip_ack_parent,
-	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.irq_set_affinity	= msi_set_affinity,
-	.flags			= IRQCHIP_SKIP_SET_WAKE |
-				  IRQCHIP_AFFINITY_PRE_STARTUP,
-};
+bool pci_dev_has_default_msi_parent_domain(struct pci_dev *dev)
+{
+	struct irq_domain *domain = dev_get_msi_domain(&dev->dev);
 
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
-		    msi_alloc_info_t *arg)
+	if (!domain)
+		domain = dev_get_msi_domain(&dev->bus->dev);
+	if (!domain)
+		return false;
+
+	return domain == x86_vector_domain;
+}
+
+/**
+ * x86_msi_prepare - Setup of msi_alloc_info_t for allocations
+ * @domain:	The domain for which this setup happens
+ * @dev:	The device for which interrupts are allocated
+ * @nvec:	The number of vectors to allocate
+ * @alloc:	The allocation info structure to initialize
+ *
+ * This function is to be used for all types of MSI domains above the x86
+ * vector domain and any intermediates. It is always invoked from the
+ * top level interrupt domain. The domain specific allocation
+ * functionality is determined via the @domain's bus token which allows to
+ * map the X86 specific allocation type.
+ */
+static int x86_msi_prepare(struct irq_domain *domain, struct device *dev,
+			   int nvec, msi_alloc_info_t *alloc)
 {
-	init_irq_alloc_info(arg, NULL);
-	if (to_pci_dev(dev)->msix_enabled)
-		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
-	else
-		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
+	struct msi_domain_info *info = domain->host_data;
 
-	return 0;
+	init_irq_alloc_info(alloc, NULL);
+
+	switch (info->bus_token) {
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
+		return 0;
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
+		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
+		return 0;
+	default:
+		return -EINVAL;
+	}
 }
-EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
-static struct msi_domain_ops pci_msi_domain_ops = {
-	.msi_prepare	= pci_msi_prepare,
-};
+/**
+ * x86_init_dev_msi_info - Domain info setup for MSI domains
+ * @dev:		The device for which the domain should be created
+ * @domain:		The (root) domain providing this callback
+ * @real_parent:	The real parent domain of the to initialize domain
+ * @info:		The domain info for the to initialize domain
+ *
+ * This function is to be used for all types of MSI domains above the x86
+ * vector domain and any intermediates. The domain specific functionality
+ * is determined via the @real_parent.
+ */
+static bool x86_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				  struct irq_domain *real_parent, struct msi_domain_info *info)
+{
+	const struct msi_parent_ops *pops = real_parent->msi_parent_ops;
+
+	/* MSI parent domain specific settings */
+	switch (real_parent->bus_token) {
+	case DOMAIN_BUS_ANY:
+		/* Only the vector domain can have the ANY token */
+		if (WARN_ON_ONCE(domain != real_parent))
+			return false;
+		info->chip->irq_set_affinity = msi_set_affinity;
+		/* See msi_set_affinity() for the gory details */
+		info->flags |= MSI_FLAG_NOMASK_QUIRK;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	/* Is the target supported? */
+	switch(info->bus_token) {
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	/*
+	 * Mask out the domain specific MSI feature flags which are not
+	 * supported by the real parent.
+	 */
+	info->flags			&= pops->supported_flags;
+	/* Enforce the required flags */
+	info->flags			|= X86_VECTOR_MSI_FLAGS_REQUIRED;
+
+	/* This is always invoked from the top level MSI domain! */
+	info->ops->msi_prepare		= x86_msi_prepare;
+
+	info->chip->irq_ack		= irq_chip_ack_parent;
+	info->chip->irq_retrigger	= irq_chip_retrigger_hierarchy;
+	info->chip->flags		|= IRQCHIP_SKIP_SET_WAKE |
+					   IRQCHIP_AFFINITY_PRE_STARTUP;
 
-static struct msi_domain_info pci_msi_domain_info = {
-	.flags		= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-			  MSI_FLAG_PCI_MSIX | MSI_FLAG_NOMASK_QUIRK,
-
-	.ops		= &pci_msi_domain_ops,
-	.chip		= &pci_msi_controller,
-	.handler	= handle_edge_irq,
-	.handler_name	= "edge",
+	info->handler			= handle_edge_irq;
+	info->handler_name		= "edge";
+
+	return true;
+}
+
+static const struct msi_parent_ops x86_vector_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED,
+	.init_dev_msi_info	= x86_init_dev_msi_info,
 };
 
 struct irq_domain * __init native_create_pci_msi_domain(void)
 {
-	struct fwnode_handle *fn;
-	struct irq_domain *d;
-
 	if (disable_apic)
 		return NULL;
 
-	fn = irq_domain_alloc_named_fwnode("PCI-MSI");
-	if (!fn)
-		return NULL;
-
-	d = pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
-				      x86_vector_domain);
-	if (!d) {
-		irq_domain_free_fwnode(fn);
-		pr_warn("Failed to initialize PCI-MSI irqdomain.\n");
-	}
-	return d;
+	x86_vector_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
+	x86_vector_domain->msi_parent_ops = &x86_vector_msi_parent_ops;
+	return x86_vector_domain;
 }
 
 void __init x86_create_pci_msi_domain(void)
@@ -210,7 +274,25 @@ void __init x86_create_pci_msi_domain(vo
 	x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain();
 }
 
+/* Keep around for hyperV and the remap code below */
+int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+		    msi_alloc_info_t *arg)
+{
+	init_irq_alloc_info(arg, NULL);
+
+	if (to_pci_dev(dev)->msix_enabled)
+		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
+	else
+		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_msi_prepare);
+
 #ifdef CONFIG_IRQ_REMAP
+static struct msi_domain_ops pci_msi_domain_ops = {
+	.msi_prepare	= pci_msi_prepare,
+};
+
 static struct irq_chip pci_msi_ir_controller = {
 	.name			= "IR-PCI-MSI",
 	.irq_unmask		= pci_msi_unmask_irq,
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -812,7 +812,7 @@ static void
 amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu)
 {
 	if (!irq_remapping_enabled || !dev_is_pci(dev) ||
-	    pci_dev_has_special_msi_domain(to_pci_dev(dev)))
+	    !pci_dev_has_default_msi_parent_domain(to_pci_dev(dev)))
 		return;
 
 	dev_set_msi_domain(dev, iommu->msi_domain);
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1107,7 +1107,7 @@ static int reenable_irq_remapping(int ei
  */
 void intel_irq_remap_add_device(struct dmar_pci_notify_info *info)
 {
-	if (!irq_remapping_enabled || pci_dev_has_special_msi_domain(info->dev))
+	if (!irq_remapping_enabled || !pci_dev_has_default_msi_parent_domain(info->dev))
 		return;
 
 	dev_set_msi_domain(&info->dev->dev, map_dev_to_ir(info->dev));



* [patch V3 14/33] PCI/MSI: Remove unused pci_dev_has_special_msi_domain()
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (12 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 13/33] x86/apic/vector: Provide MSI parent domain Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 15/33] iommu/vt-d: Switch to MSI parent domains Thomas Gleixner
                   ` (20 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

The check for special MSI domains like VMD, which prevents the interrupt
remapping code from overwriting device::msi::domain, is no longer required
and has been replaced by an x86 specific version which is aware of MSI
parent domains.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/msi/irqdomain.c |   21 ---------------------
 include/linux/msi.h         |    1 -
 2 files changed, 22 deletions(-)

--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -414,24 +414,3 @@ struct irq_domain *pci_msi_get_device_do
 					     DOMAIN_BUS_PCI_MSI);
 	return dom;
 }
-
-/**
- * pci_dev_has_special_msi_domain - Check whether the device is handled by
- *				    a non-standard PCI-MSI domain
- * @pdev:	The PCI device to check.
- *
- * Returns: True if the device irqdomain or the bus irqdomain is
- * non-standard PCI/MSI.
- */
-bool pci_dev_has_special_msi_domain(struct pci_dev *pdev)
-{
-	struct irq_domain *dom = dev_get_msi_domain(&pdev->dev);
-
-	if (!dom)
-		dom = dev_get_msi_domain(&pdev->bus->dev);
-
-	if (!dom)
-		return true;
-
-	return dom->bus_token != DOMAIN_BUS_PCI_MSI;
-}
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -609,7 +609,6 @@ struct irq_domain *pci_msi_create_irq_do
 					     struct irq_domain *parent);
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
 struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
-bool pci_dev_has_special_msi_domain(struct pci_dev *pdev);
 #else /* CONFIG_PCI_MSI */
 static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
 {



* [patch V3 15/33] iommu/vt-d: Switch to MSI parent domains
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (13 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 14/33] PCI/MSI: Remove unused pci_dev_has_special_msi_domain() Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 16/33] iommu/amd: Switch to MSI base domains Thomas Gleixner
                   ` (19 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and set up per
device domains.
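
How a remapping domain turns into an MSI parent, and how its parent ops
bound the features of a per device domain, can be sketched in a userspace
model. All model_* names and bit values are hypothetical; the masking step
mirrors what x86_init_dev_msi_info() does in patch 13 of this series and
is not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models IRQ_DOMAIN_FLAG_MSI_PARENT; the bit value is hypothetical. */
#define MODEL_DOMAIN_FLAG_MSI_PARENT	(1u << 0)

/* Hypothetical MSI feature bits. */
#define MODEL_MSI_FLAG_PCI_MSIX		(1u << 0)
#define MODEL_MSI_FLAG_MULTI_PCI_MSI	(1u << 1)
#define MODEL_MSI_FLAG_REQUIRED		(1u << 2)

struct model_parent_ops {
	unsigned int supported_flags;
	const char *prefix;
};

struct model_domain {
	unsigned int flags;
	const struct model_parent_ops *msi_parent_ops;
};

static const struct model_parent_ops model_remap_ops = {
	.supported_flags = MODEL_MSI_FLAG_PCI_MSIX | MODEL_MSI_FLAG_MULTI_PCI_MSI,
	.prefix = "IR-",
};

/* Mark an existing domain as MSI parent and attach its parent ops. */
void model_make_msi_parent(struct model_domain *d)
{
	assert(d);
	d->flags |= MODEL_DOMAIN_FLAG_MSI_PARENT;
	d->msi_parent_ops = &model_remap_ops;
}

bool model_is_msi_parent(const struct model_domain *d)
{
	return d && (d->flags & MODEL_DOMAIN_FLAG_MSI_PARENT);
}

/* Flags a per device domain ends up with: mask unsupported, enforce required. */
unsigned int model_child_flags(const struct model_domain *parent, unsigned int wanted)
{
	assert(model_is_msi_parent(parent));
	return (wanted & parent->msi_parent_ops->supported_flags) |
	       MODEL_MSI_FLAG_REQUIRED;
}
```

Because the parent flag and the ops pointer sit on the remapping domain
itself, no separate global PCI/MSI domain is needed in this model.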

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/msi.c          |    2 ++
 drivers/iommu/intel/iommu.h         |    1 -
 drivers/iommu/intel/irq_remapping.c |   27 ++++++++++++---------------
 include/linux/irqdomain_defs.h      |    1 +
 4 files changed, 15 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -217,6 +217,8 @@ static bool x86_init_dev_msi_info(struct
 		/* See msi_set_affinity() for the gory details */
 		info->flags |= MSI_FLAG_NOMASK_QUIRK;
 		break;
+	case DOMAIN_BUS_DMAR:
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		return false;
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -600,7 +600,6 @@ struct intel_iommu {
 #ifdef CONFIG_IRQ_REMAP
 	struct ir_table *ir_table;	/* Interrupt remapping info */
 	struct irq_domain *ir_domain;
-	struct irq_domain *ir_msi_domain;
 #endif
 	struct iommu_device iommu;  /* IOMMU core code handle */
 	int		node;
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -82,6 +82,7 @@ static const struct irq_domain_ops intel
 
 static void iommu_disable_irq_remapping(struct intel_iommu *iommu);
 static int __init parse_ioapics_under_ir(void);
+static const struct msi_parent_ops dmar_msi_parent_ops;
 
 static bool ir_pre_enabled(struct intel_iommu *iommu)
 {
@@ -230,7 +231,7 @@ static struct irq_domain *map_dev_to_ir(
 {
 	struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
 
-	return drhd ? drhd->iommu->ir_msi_domain : NULL;
+	return drhd ? drhd->iommu->ir_domain : NULL;
 }
 
 static int clear_entries(struct irq_2_iommu *irq_iommu)
@@ -573,10 +574,10 @@ static int intel_setup_irq_remapping(str
 		pr_err("IR%d: failed to allocate irqdomain\n", iommu->seq_id);
 		goto out_free_fwnode;
 	}
-	iommu->ir_msi_domain =
-		arch_create_remap_msi_irq_domain(iommu->ir_domain,
-						 "INTEL-IR-MSI",
-						 iommu->seq_id);
+
+	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_DMAR);
+	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
+	iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
 
 	ir_table->base = page_address(pages);
 	ir_table->bitmap = bitmap;
@@ -620,9 +621,6 @@ static int intel_setup_irq_remapping(str
 	return 0;
 
 out_free_ir_domain:
-	if (iommu->ir_msi_domain)
-		irq_domain_remove(iommu->ir_msi_domain);
-	iommu->ir_msi_domain = NULL;
 	irq_domain_remove(iommu->ir_domain);
 	iommu->ir_domain = NULL;
 out_free_fwnode:
@@ -644,13 +642,6 @@ static void intel_teardown_irq_remapping
 	struct fwnode_handle *fn;
 
 	if (iommu && iommu->ir_table) {
-		if (iommu->ir_msi_domain) {
-			fn = iommu->ir_msi_domain->fwnode;
-
-			irq_domain_remove(iommu->ir_msi_domain);
-			irq_domain_free_fwnode(fn);
-			iommu->ir_msi_domain = NULL;
-		}
 		if (iommu->ir_domain) {
 			fn = iommu->ir_domain->fwnode;
 
@@ -1437,6 +1428,12 @@ static const struct irq_domain_ops intel
 	.deactivate = intel_irq_remapping_deactivate,
 };
 
+static const struct msi_parent_ops dmar_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "IR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 /*
  * Support of Interrupt Remapping Unit Hotplug
  */
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -23,6 +23,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_VMD_MSI,
 	DOMAIN_BUS_PCI_DEVICE_MSI,
 	DOMAIN_BUS_PCI_DEVICE_MSIX,
+	DOMAIN_BUS_DMAR,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */



* [patch V3 16/33] iommu/amd: Switch to MSI base domains
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (14 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 15/33] iommu/vt-d: Switch to MSI parent domains Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 17/33] x86/apic/msi: Remove arch_create_remap_msi_irq_domain() Thomas Gleixner
                   ` (18 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and set up per
device domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/msi.c          |    1 +
 drivers/iommu/amd/amd_iommu_types.h |    1 -
 drivers/iommu/amd/iommu.c           |   19 +++++++++++++------
 include/linux/irqdomain_defs.h      |    1 +
 4 files changed, 15 insertions(+), 7 deletions(-)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -218,6 +218,7 @@ static bool x86_init_dev_msi_info(struct
 		info->flags |= MSI_FLAG_NOMASK_QUIRK;
 		break;
 	case DOMAIN_BUS_DMAR:
+	case DOMAIN_BUS_AMDVI:
 		break;
 	default:
 		WARN_ON_ONCE(1);
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -734,7 +734,6 @@ struct amd_iommu {
 	u8 max_counters;
 #ifdef CONFIG_IRQ_REMAP
 	struct irq_domain *ir_domain;
-	struct irq_domain *msi_domain;
 
 	struct amd_irte_ops *irte_ops;
 #endif
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -815,7 +815,7 @@ amd_iommu_set_pci_msi_domain(struct devi
 	    !pci_dev_has_default_msi_parent_domain(to_pci_dev(dev)))
 		return;
 
-	dev_set_msi_domain(dev, iommu->msi_domain);
+	dev_set_msi_domain(dev, iommu->ir_domain);
 }
 
 #else /* CONFIG_IRQ_REMAP */
@@ -3648,6 +3648,12 @@ static struct irq_chip amd_ir_chip = {
 	.irq_compose_msi_msg	= ir_compose_msi_msg,
 };
 
+static const struct msi_parent_ops amdvi_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "IR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 {
 	struct fwnode_handle *fn;
@@ -3655,16 +3661,17 @@ int amd_iommu_create_irq_domain(struct a
 	fn = irq_domain_alloc_named_id_fwnode("AMD-IR", iommu->index);
 	if (!fn)
 		return -ENOMEM;
-	iommu->ir_domain = irq_domain_create_tree(fn, &amd_ir_domain_ops, iommu);
+	iommu->ir_domain = irq_domain_create_hierarchy(arch_get_ir_parent_domain(), 0, 0,
+						       fn, &amd_ir_domain_ops, iommu);
 	if (!iommu->ir_domain) {
 		irq_domain_free_fwnode(fn);
 		return -ENOMEM;
 	}
 
-	iommu->ir_domain->parent = arch_get_ir_parent_domain();
-	iommu->msi_domain = arch_create_remap_msi_irq_domain(iommu->ir_domain,
-							     "AMD-IR-MSI",
-							     iommu->index);
+	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_AMDVI);
+	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
+	iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
+
 	return 0;
 }
 
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -24,6 +24,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_PCI_DEVICE_MSI,
 	DOMAIN_BUS_PCI_DEVICE_MSIX,
 	DOMAIN_BUS_DMAR,
+	DOMAIN_BUS_AMDVI,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */



* [patch V3 17/33] x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (15 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 16/33] iommu/amd: Switch to MSI base domains Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 18/33] genirq/msi: Provide struct msi_map Thomas Gleixner
                   ` (17 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

and related code which is no longer required now that the interrupt remap
code has been converted to MSI parent domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/irq_remapping.h |    4 ---
 arch/x86/kernel/apic/msi.c           |   42 -----------------------------------
 2 files changed, 1 insertion(+), 45 deletions(-)

--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,10 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-/* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
-extern struct irq_domain *
-arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);
-
 /* Get parent irqdomain for interrupt remapping irqdomain */
 static inline struct irq_domain *arch_get_ir_parent_domain(void)
 {
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -277,7 +277,7 @@ void __init x86_create_pci_msi_domain(vo
 	x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain();
 }
 
-/* Keep around for hyperV and the remap code below */
+/* Keep around for hyperV */
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		    msi_alloc_info_t *arg)
 {
@@ -291,46 +291,6 @@ int pci_msi_prepare(struct irq_domain *d
 }
 EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
-#ifdef CONFIG_IRQ_REMAP
-static struct msi_domain_ops pci_msi_domain_ops = {
-	.msi_prepare	= pci_msi_prepare,
-};
-
-static struct irq_chip pci_msi_ir_controller = {
-	.name			= "IR-PCI-MSI",
-	.irq_unmask		= pci_msi_unmask_irq,
-	.irq_mask		= pci_msi_mask_irq,
-	.irq_ack		= irq_chip_ack_parent,
-	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.flags			= IRQCHIP_SKIP_SET_WAKE |
-				  IRQCHIP_AFFINITY_PRE_STARTUP,
-};
-
-static struct msi_domain_info pci_msi_ir_domain_info = {
-	.flags		= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-			  MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
-	.ops		= &pci_msi_domain_ops,
-	.chip		= &pci_msi_ir_controller,
-	.handler	= handle_edge_irq,
-	.handler_name	= "edge",
-};
-
-struct irq_domain *arch_create_remap_msi_irq_domain(struct irq_domain *parent,
-						    const char *name, int id)
-{
-	struct fwnode_handle *fn;
-	struct irq_domain *d;
-
-	fn = irq_domain_alloc_named_id_fwnode(name, id);
-	if (!fn)
-		return NULL;
-	d = pci_msi_create_irq_domain(fn, &pci_msi_ir_domain_info, parent);
-	if (!d)
-		irq_domain_free_fwnode(fn);
-	return d;
-}
-#endif
-
 #ifdef CONFIG_DMAR_TABLE
 /*
  * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the



* [patch V3 18/33] genirq/msi: Provide struct msi_map
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (16 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 17/33] x86/apic/msi: Remove arch_create_remap_msi_irq_domain() Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 19/33] genirq/msi: Provide msi_desc::msi_data Thomas Gleixner
                   ` (16 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

A simple struct to hold a MSI index / Linux interrupt number pair. It will
be returned from the dynamic vector allocation function and handed back to
the corresponding free() function.
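
As a rough illustration of this convention, the following user-space sketch
mirrors the struct and shows how a caller might tell success from failure.
The helpers msi_map_ok() and msi_map_error() are hypothetical names made up
for this example; they are not part of this patch.

```c
#include <errno.h>
#include <stdbool.h>

/* User-space stand-in mirroring the struct added by this patch. */
struct msi_map {
	int	index;
	int	virq;
};

/* Hypothetical helper: a non-negative index means the allocation worked. */
static inline bool msi_map_ok(struct msi_map map)
{
	return map.index >= 0;
}

/* Hypothetical helper: 0 on success, otherwise the negative error code. */
static inline int msi_map_error(struct msi_map map)
{
	return map.index < 0 ? map.index : 0;
}
```

Folding the error code into the index keeps the return value at two ints, so
it can be returned by value without a separate status argument.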

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/msi_api.h |   13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -18,6 +18,19 @@ enum msi_domain_ids {
 	MSI_MAX_DEVICE_IRQDOMAINS,
 };
 
+/**
+ * msi_map - Mapping between MSI index and Linux interrupt number
+ * @index:	The MSI index, e.g. slot in the MSI-X table or
+ *		a software managed index if >= 0. If negative
+ *		the allocation function failed and it contains
+ *		the error code.
+ * @virq:	The associated Linux interrupt number
+ */
+struct msi_map {
+	int	index;
+	int	virq;
+};
+
 unsigned int msi_domain_get_virq(struct device *dev, unsigned int domid, unsigned int index);
 
 /**



* [patch V3 19/33] genirq/msi: Provide msi_desc::msi_data
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (17 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 18/33] genirq/msi: Provide struct msi_map Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] genirq/msi: Provide msi_desc:: Msi_data tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 20/33] genirq/msi: Provide msi_domain_ops::prepare_desc() Thomas Gleixner
                   ` (15 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

The upcoming support for PCI/IMS requires storing some information related
to the message handling in the MSI descriptor, e.g. PASID or a pointer to a
queue.

Provide a generic storage struct which maps over the existing PCI specific
storage, so the size of struct msi_desc does not grow.

This storage struct has two elements:

  1) msi_domain_cookie
  2) msi_instance_cookie

The domain cookie is going to be used to store domain specific information,
e.g. iobase pointer, data pointer.

The instance cookie is going to be handed in when allocating an interrupt
on an IMS domain so the irq chip callbacks of the IMS domain have the
necessary per vector information available. It also comes in handy when
cleaning up the platform MSI code for wire to MSI bridges which need to
hand down the type information to the underlying interrupt domain.

For the core code the cookies are opaque and meaningless. It just stores
the instance cookie during an allocation through the upcoming interfaces
for IMS and wire to MSI bridges.
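
The overlay can be pictured with the user-space sketch below. It is only a
model under stated assumptions: struct pci_desc_stub is a hypothetical
placeholder whose layout does not match the real struct pci_msi_desc, the
__iomem member of the domain cookie is left out, and struct desc_stub stands
in for struct msi_desc.

```c
#include <stdint.h>

/* Stand-ins for the cookie unions introduced by this patch. */
union msi_domain_cookie {
	uint64_t	value;
	void		*ptr;
};

union msi_instance_cookie {
	uint64_t	value;
	void		*ptr;
};

struct msi_desc_data {
	union msi_domain_cookie		dcookie;
	union msi_instance_cookie	icookie;
};

/* Hypothetical placeholder for the PCI specific storage. */
struct pci_desc_stub {
	uint32_t	mask;
	uint32_t	attrib;
	uint64_t	mask_base;
};

/* Both views share the same bytes, so adding @data costs no space. */
struct desc_stub {
	uint16_t	msi_index;
	union {
		struct pci_desc_stub	pci;
		struct msi_desc_data	data;
	};
};

/* The overlay is free only while the generic data is not the larger member. */
_Static_assert(sizeof(struct msi_desc_data) <= sizeof(struct pci_desc_stub),
	       "generic data must fit into the PCI storage");
```

A given descriptor uses only one of the two views, which is why the union is
safe: a PCI/MSI descriptor never carries an instance cookie and vice versa.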

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Rename and split into domain/instance
V3: Update stale changelog (Kevin)
---
 include/linux/msi.h     |   38 +++++++++++++++++++++++++++++++++++++-
 include/linux/msi_api.h |   17 +++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -125,6 +125,38 @@ struct pci_msi_desc {
 	};
 };
 
+/**
+ * union msi_domain_cookie - Opaque MSI domain specific data
+ * @value:	u64 value store
+ * @ptr:	Pointer to domain specific data
+ * @iobase:	Domain specific IOmem pointer
+ *
+ * The content of this data is implementation defined and used by the MSI
+ * domain to store domain specific information which is required for
+ * interrupt chip callbacks.
+ */
+union msi_domain_cookie {
+	u64	value;
+	void	*ptr;
+	void	__iomem *iobase;
+};
+
+/**
+ * struct msi_desc_data - Generic MSI descriptor data
+ * @dcookie:	Cookie for MSI domain specific data which is required
+ *		for irq_chip callbacks
+ * @icookie:	Cookie for the MSI interrupt instance provided by
+ *		the usage site to the allocation function
+ *
+ * The content of this data is implementation defined, e.g. PCI/IMS
+ * implementations define the meaning of the data. The MSI core ignores
+ * this data completely.
+ */
+struct msi_desc_data {
+	union msi_domain_cookie		dcookie;
+	union msi_instance_cookie	icookie;
+};
+
 #define MSI_MAX_INDEX		((unsigned int)USHRT_MAX)
 
 /**
@@ -142,6 +174,7 @@ struct pci_msi_desc {
  *
  * @msi_index:	Index of the msi descriptor
  * @pci:	PCI specific msi descriptor data
+ * @data:	Generic MSI descriptor data
  */
 struct msi_desc {
 	/* Shared device/bus type independent data */
@@ -161,7 +194,10 @@ struct msi_desc {
 	void *write_msi_msg_data;
 
 	u16				msi_index;
-	struct pci_msi_desc		pci;
+	union {
+		struct pci_msi_desc	pci;
+		struct msi_desc_data	data;
+	};
 };
 
 /*
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -19,6 +19,23 @@ enum msi_domain_ids {
 };
 
 /**
+ * union msi_instance_cookie - MSI instance cookie
+ * @value:	u64 value store
+ * @ptr:	Pointer to usage site specific data
+ *
+ * This cookie is handed to the IMS allocation function and stored in the
+ * MSI descriptor for the interrupt chip callbacks.
+ *
+ * The content of this cookie is MSI domain implementation defined.  For
+ * PCI/IMS implementations this could be a PASID or a pointer to queue
+ * memory.
+ */
+union msi_instance_cookie {
+	u64	value;
+	void	*ptr;
+};
+
+/**
  * msi_map - Mapping between MSI index and Linux interrupt number
  * @index:	The MSI index, e.g. slot in the MSI-X table or
  *		a software managed index if >= 0. If negative



* [patch V3 20/33] genirq/msi: Provide msi_domain_ops::prepare_desc()
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (18 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 19/33] genirq/msi: Provide msi_desc::msi_data Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] genirq/msi: Provide msi_domain_ops:: Prepare_desc() tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 21/33] genirq/msi: Provide msi_domain_alloc_irq_at() Thomas Gleixner
                   ` (14 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

The existing MSI domain ops msi_prepare() and set_desc() turned out to be
unsuitable for implementing IMS support.

msi_prepare() does not operate on the MSI descriptors. set_desc() lacks
an irq_domain pointer and has a completely different purpose.

Introduce a prepare_desc() op which allows IMS implementations to amend an
MSI descriptor which was allocated by the core code, e.g. by adjusting the
iomem base or adding some data based on the allocated index. This is way
better than requiring that all IMS domain implementations preallocate the
MSI descriptor and then allocate the interrupt.
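
The intended call order can be modelled in user space as follows. This is a
simplified sketch, not the kernel code: the stub types, SLOT_SIZE and
ims_prepare_desc() are hypothetical, and the real callback also receives the
irq_domain and the msi_alloc_info_t argument.

```c
#include <stddef.h>

#define SLOT_SIZE	16	/* hypothetical size of one message slot */

/* Simplified stand-in for struct msi_desc. */
struct desc_stub {
	unsigned int	index;
	unsigned long	slot_offset;	/* filled in by prepare_desc() */
	int		calls;		/* number of callbacks invoked so far */
	int		prepared_at;	/* position of prepare_desc(), 0 if not run */
	int		set_at;		/* position of set_desc() */
};

struct ops_stub {
	void (*prepare_desc)(struct desc_stub *desc);	/* optional */
	void (*set_desc)(struct desc_stub *desc);	/* always present */
};

/* Example op: derive per descriptor data from the allocated index. */
static void ims_prepare_desc(struct desc_stub *desc)
{
	desc->slot_offset = (unsigned long)desc->index * SLOT_SIZE;
	desc->prepared_at = ++desc->calls;
}

static void stub_set_desc(struct desc_stub *desc)
{
	desc->set_at = ++desc->calls;
}

/* Mirrors the ordering in the allocation loop: prepare first, then set. */
static void alloc_one(const struct ops_stub *ops, struct desc_stub *desc)
{
	if (ops->prepare_desc)
		ops->prepare_desc(desc);
	ops->set_desc(desc);
}
```

Because the callback is optional, existing domains which do not set it see
no change in behaviour.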

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/msi.h |    6 +++++-
 kernel/irq/msi.c    |    3 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -410,6 +410,8 @@ struct msi_domain_info;
  * @msi_init:		Domain specific init function for MSI interrupts
  * @msi_free:		Domain specific function to free a MSI interrupts
  * @msi_prepare:	Prepare the allocation of the interrupts in the domain
+ * @prepare_desc:	Optional function to prepare the allocated MSI descriptor
+ *			in the domain
  * @set_desc:		Set the msi descriptor for an interrupt
  * @domain_alloc_irqs:	Optional function to override the default allocation
  *			function.
@@ -421,7 +423,7 @@ struct msi_domain_info;
  * @get_hwirq, @msi_init and @msi_free are callbacks used by the underlying
  * irqdomain.
  *
- * @msi_check, @msi_prepare and @set_desc are callbacks used by the
+ * @msi_check, @msi_prepare, @prepare_desc and @set_desc are callbacks used by the
  * msi_domain_alloc/free_irqs*() variants.
  *
  * @domain_alloc_irqs, @domain_free_irqs can be used to override the
@@ -444,6 +446,8 @@ struct msi_domain_ops {
 	int		(*msi_prepare)(struct irq_domain *domain,
 				       struct device *dev, int nvec,
 				       msi_alloc_info_t *arg);
+	void		(*prepare_desc)(struct irq_domain *domain, msi_alloc_info_t *arg,
+					struct msi_desc *desc);
 	void		(*set_desc)(msi_alloc_info_t *arg,
 				    struct msi_desc *desc);
 	int		(*domain_alloc_irqs)(struct irq_domain *domain,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1260,6 +1260,9 @@ static int __msi_domain_alloc_irqs(struc
 		if (WARN_ON_ONCE(allocated >= ctrl->nirqs))
 			return -EINVAL;
 
+		if (ops->prepare_desc)
+			ops->prepare_desc(domain, &arg, desc);
+
 		ops->set_desc(&arg, desc);
 
 		virq = __irq_domain_alloc_irqs(domain, -1, desc->nvec_used,



* [patch V3 21/33] genirq/msi: Provide msi_domain_alloc_irq_at()
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (19 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 20/33] genirq/msi: Provide msi_domain_ops::prepare_desc() Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-11-28 14:39   ` Thomas Gleixner
                     ` (2 more replies)
  2022-11-24 23:26 ` [patch V3 22/33] genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN Thomas Gleixner
                   ` (13 subsequent siblings)
  34 siblings, 3 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

For supporting post MSI-X enable allocations and for the upcoming PCI/IMS
support a separate interface is required which allows not only the
allocation of a specific index, but also the allocation of any, i.e. the
next free index. The latter is especially required for IMS because IMS
completely does away with index to functionality mappings which are
often found in MSI/MSI-X implementations.

But even with MSI-X there are devices where only the first few indices have
a fixed functionality and the rest is freely assignable by software,
e.g. to queues.

msi_domain_alloc_irq_at() is also different from the range based interfaces
as it always enforces that the MSI descriptor is allocated by the core code
and not preallocated by the caller like the PCI/MSI[-X] enable code path
does.

msi_domain_alloc_irq_at() can be invoked with the index argument set to
MSI_ANY_INDEX which makes the core code pick the next free index. The irq
domain can provide a prepare_desc() operation callback in its
msi_domain_ops to do domain specific post allocation initialization before
the actual Linux interrupt and the associated interrupt descriptor and
hierarchy allocations are conducted.

The function also takes an optional @icookie argument which is of type
union msi_instance_cookie. This cookie is not used by the core code and is
stored in the allocated msi_desc::data::icookie. The meaning of the cookie
is completely implementation defined. In case of IMS this might be a PASID
or a pointer to a device queue, but for the MSI core it's opaque and not
used in any way.

The function returns a struct msi_map which on success contains the
allocated index number and the Linux interrupt number so the caller can
spare the index to Linux interrupt number lookup.

On failure map::index contains the error code and map::virq is 0.
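
The index handling and the return convention can be sketched in user space
like this. It is a model only: a fixed bool array replaces the xarray,
HWSIZE, struct table_stub and alloc_irq_at() are hypothetical, and the
interrupt number is made up. The error codes follow the ones visible in the
patch (-ERANGE) and the ones the xarray insert and alloc functions return
when the slot is taken or no slot is free (-EBUSY).

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

#define MSI_ANY_INDEX	UINT_MAX
#define HWSIZE		8U	/* hypothetical number of hardware slots */

struct msi_map {
	int	index;
	int	virq;
};

/* Stand-in for the per domain descriptor store. */
struct table_stub {
	bool	used[HWSIZE];
};

static struct msi_map alloc_irq_at(struct table_stub *tbl, unsigned int index)
{
	struct msi_map map = { .index = 0, .virq = 0 };

	if (index == MSI_ANY_INDEX) {
		/* Pick the lowest free slot within the hardware limit */
		for (index = 0; index < HWSIZE && tbl->used[index]; index++)
			;
		if (index == HWSIZE) {
			map.index = -EBUSY;
			return map;
		}
	} else if (index >= HWSIZE) {
		map.index = -ERANGE;
		return map;
	} else if (tbl->used[index]) {
		map.index = -EBUSY;
		return map;
	}

	tbl->used[index] = true;
	map.index = (int)index;
	map.virq = 100 + (int)index;	/* made-up Linux interrupt number */
	return map;
}
```

A caller only has to check map.index: a negative value is the error code,
anything else is the index that was actually allocated.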

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Fix the recursive allocation issue and the index calculation (Reinette)
V3: Fixup stale changelog and typos in comments (Kevin)
    Adapt to the reworked domain/xarray storage model
---
 include/linux/msi.h     |    4 +
 include/linux/msi_api.h |    7 +++
 kernel/irq/msi.c        |  107 +++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 108 insertions(+), 10 deletions(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -80,6 +80,7 @@ struct pci_dev;
 struct platform_msi_priv_data;
 struct device_attribute;
 struct irq_domain;
+struct irq_affinity_desc;
 
 void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
 #ifdef CONFIG_GENERIC_MSI_IRQ
@@ -602,6 +603,9 @@ int msi_domain_alloc_irqs_range(struct d
 				unsigned int first, unsigned int last);
 int msi_domain_alloc_irqs_all_locked(struct device *dev, unsigned int domid, int nirqs);
 
+struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
+				       const struct irq_affinity_desc *affdesc,
+				       union msi_instance_cookie *cookie);
 
 void msi_domain_free_irqs_range_locked(struct device *dev, unsigned int domid,
 				       unsigned int first, unsigned int last);
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -48,6 +48,13 @@ struct msi_map {
 	int	virq;
 };
 
+/*
+ * Constant to be used for dynamic allocations when the allocation is any
+ * free MSI index, which is either an entry in a hardware table or a
+ * software managed index.
+ */
+#define MSI_ANY_INDEX		UINT_MAX
+
 unsigned int msi_domain_get_virq(struct device *dev, unsigned int domid, unsigned int index);
 
 /**
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -90,17 +90,30 @@ static int msi_insert_desc(struct device
 	int ret;
 
 	hwsize = msi_domain_get_hwsize(dev, domid);
-	if (index >= hwsize) {
-		ret = -ERANGE;
-		goto fail;
-	}
 
-	desc->msi_index = index;
-	ret = xa_insert(xa, index, desc, GFP_KERNEL);
-	if (ret)
-		goto fail;
-	return 0;
+	if (index == MSI_ANY_INDEX) {
+		struct xa_limit limit = { .min = 0, .max = hwsize - 1 };
+		unsigned int index;
+
+		/* Let the xarray allocate a free index within the limit */
+		ret = xa_alloc(xa, &index, desc, limit, GFP_KERNEL);
+		if (ret)
+			goto fail;
 
+		desc->msi_index = index;
+		return 0;
+	} else {
+		if (index >= hwsize) {
+			ret = -ERANGE;
+			goto fail;
+		}
+
+		desc->msi_index = index;
+		ret = xa_insert(xa, index, desc, GFP_KERNEL);
+		if (ret)
+			goto fail;
+		return 0;
+	}
 fail:
 	msi_free_desc(desc);
 	return ret;
@@ -294,7 +307,7 @@ int msi_setup_device_data(struct device
 	}
 
 	for (i = 0; i < MSI_MAX_DEVICE_IRQDOMAINS; i++)
-		xa_init(&md->__domains[i].store);
+		xa_init_flags(&md->__domains[i].store, XA_FLAGS_ALLOC);
 
 	/*
 	 * If @dev::msi::domain is set and is a global MSI domain, copy the
@@ -1407,6 +1420,80 @@ int msi_domain_alloc_irqs_all_locked(str
 	return msi_domain_alloc_locked(dev, &ctrl);
 }
 
+/**
+ * msi_domain_alloc_irq_at - Allocate an interrupt from a MSI interrupt domain at
+ *			     a given index - or at the next free index
+ *
+ * @dev:	Pointer to device struct of the device for which the interrupts
+ *		are allocated
+ * @domid:	Id of the interrupt domain to operate on
+ * @index:	Index for allocation. If @index == %MSI_ANY_INDEX the allocation
+ *		uses the next free index.
+ * @affdesc:	Optional pointer to an interrupt affinity descriptor structure
+ * @icookie:	Optional pointer to a domain specific per instance cookie. If
+ *		non-NULL the content of the cookie is stored in msi_desc::data.
+ *		Must be NULL for MSI-X allocations
+ *
+ * This requires a MSI interrupt domain which lets the core code manage the
+ * MSI descriptors.
+ *
+ * Return: struct msi_map
+ *
+ *	On success msi_map::index contains the allocated index number and
+ *	msi_map::virq the corresponding Linux interrupt number
+ *
+ *	On failure msi_map::index contains the error code and msi_map::virq
+ *	is %0.
+ */
+struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
+				       const struct irq_affinity_desc *affdesc,
+				       union msi_instance_cookie *icookie)
+{
+	struct msi_ctrl ctrl = { .domid	= domid, .nirqs = 1, };
+	struct msi_domain_info *info;
+	struct irq_domain *domain;
+	struct msi_map map = { };
+	struct msi_desc *desc;
+	int ret;
+
+	msi_lock_descs(dev);
+	domain = msi_get_device_domain(dev, domid);
+	if (!domain) {
+		map.index = -ENODEV;
+		goto unlock;
+	}
+
+	desc = msi_alloc_desc(dev, 1, affdesc);
+	if (!desc) {
+		map.index = -ENOMEM;
+		goto unlock;
+	}
+
+	if (icookie)
+		desc->data.icookie = *icookie;
+
+	ret = msi_insert_desc(dev, desc, domid, index);
+	if (ret) {
+		map.index = ret;
+		goto unlock;
+	}
+
+	ctrl.first = ctrl.last = desc->msi_index;
+	info = domain->host_data;
+
+	ret = __msi_domain_alloc_irqs(dev, domain, &ctrl);
+	if (ret) {
+		map.index = ret;
+		msi_domain_free_locked(dev, &ctrl);
+	} else {
+		map.index = desc->msi_index;
+		map.virq = desc->irq;
+	}
+unlock:
+	msi_unlock_descs(dev);
+	return map;
+}
+
 static void __msi_domain_free_irqs(struct device *dev, struct irq_domain *domain,
 				   struct msi_ctrl *ctrl)
 {



* [patch V3 22/33] genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (20 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 21/33] genirq/msi: Provide msi_domain_alloc_irq_at() Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 23/33] PCI/MSI: Split MSI-X descriptor setup Thomas Gleixner
                   ` (12 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe,
	Jason Gunthorpe

Provide a new MSI feature flag in preparation for dynamic MSI-X allocation
after the initial MSI-X enable has been done.

This needs to be an explicit MSI interrupt domain feature because quite
a few implementations (both interrupt domains and legacy allocation mode)
have clear expectations that the allocation code is only invoked when MSI-X
is about to be enabled. They either talk to hypervisors or do some other
work and are not prepared to be invoked on an already MSI-X enabled device.

This is also explicitly MSI-X only because rewriting the size of the MSI
entries is only possible when disabling MSI which in turn might cause lost
interrupts on the device.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 include/linux/msi.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -557,7 +557,8 @@ enum {
 	MSI_FLAG_LEVEL_CAPABLE		= (1 << 18),
 	/* MSI-X entries must be contiguous */
 	MSI_FLAG_MSIX_CONTIGUOUS	= (1 << 19),
-
+	/* PCI/MSI-X vectors can be dynamically allocated/freed post MSI-X enable */
+	MSI_FLAG_PCI_MSIX_ALLOC_DYN	= (1 << 20),
 };
 
 /**



* [patch V3 23/33] PCI/MSI: Split MSI-X descriptor setup
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (21 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 22/33] genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 24/33] PCI/MSI: Provide prepare_desc() MSI domain op Thomas Gleixner
                   ` (11 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

The upcoming mechanism to allocate MSI-X vectors after enabling MSI-X needs
to share some of the MSI-X descriptor setup.

The regular descriptor setup on enable has the following code flow:

    1) Allocate descriptor
    2) Setup descriptor with PCI specific data
    3) Insert descriptor
    4) Allocate interrupts which in turn scans the inserted
       descriptors

This cannot be easily changed because the PCI/MSI code needs to handle the
legacy architecture specific allocation model and the irq domain model
where quite a few domains assume that the above flow is how it
works.

Ideally the code flow should look like this:

   1) Invoke allocation at the MSI core
   2) MSI core allocates descriptor
   3) MSI core calls back into the irq domain which fills in
      the domain specific parts

This could be done for underlying parent MSI domains which support
post-enable allocation/free but that would create significantly different
code paths for MSI/MSI-X enable.

Though for dynamic allocation which wants to share the allocation code with
the upcoming PCI/IMS support it's the right thing to do.

Split the MSI-X descriptor setup into the preallocation part which just sets
the index and fills in the horrible hack of virtual IRQs and the real PCI
specific MSI-X setup part which solely depends on the index in the
descriptor. This allows providing a common dynamic allocation interface at
the MSI core level for both PCI/MSI-X and PCI/IMS.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/msi/msi.c |   72 +++++++++++++++++++++++++++++++-------------------
 drivers/pci/msi/msi.h |    2 +
 2 files changed, 47 insertions(+), 27 deletions(-)

--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -569,34 +569,56 @@ static void __iomem *msix_map_region(str
 	return ioremap(phys_addr, nr_entries * PCI_MSIX_ENTRY_SIZE);
 }
 
-static int msix_setup_msi_descs(struct pci_dev *dev, void __iomem *base,
-				struct msix_entry *entries, int nvec,
-				struct irq_affinity_desc *masks)
+/**
+ * msix_prepare_msi_desc - Prepare a half initialized MSI descriptor for operation
+ * @dev:	The PCI device for which the descriptor is prepared
+ * @desc:	The MSI descriptor for preparation
+ *
+ * This is separate from msix_setup_msi_descs() below to handle dynamic
+ * allocations for MSI-X after initial enablement.
+ *
+ * Ideally the whole MSI-X setup would work that way, but there is no way to
+ * support this for the legacy arch_setup_msi_irqs() mechanism and for the
+ * fake irq domains like the x86 XEN one. Sigh...
+ *
+ * The descriptor is zeroed and only @desc::msi_index and @desc::affinity
+ * are set. When called from msix_setup_msi_descs() then the is_virtual
+ * attribute is initialized as well.
+ *
+ * Fill in the rest.
+ */
+void msix_prepare_msi_desc(struct pci_dev *dev, struct msi_desc *desc)
+{
+	desc->nvec_used				= 1;
+	desc->pci.msi_attrib.is_msix		= 1;
+	desc->pci.msi_attrib.is_64		= 1;
+	desc->pci.msi_attrib.default_irq	= dev->irq;
+	desc->pci.mask_base			= dev->msix_base;
+	desc->pci.msi_attrib.can_mask		= !pci_msi_ignore_mask &&
+						  !desc->pci.msi_attrib.is_virtual;
+
+	if (desc->pci.msi_attrib.can_mask) {
+		void __iomem *addr = pci_msix_desc_addr(desc);
+
+		desc->pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
+	}
+}
+
+static int msix_setup_msi_descs(struct pci_dev *dev, struct msix_entry *entries,
+				int nvec, struct irq_affinity_desc *masks)
 {
 	int ret = 0, i, vec_count = pci_msix_vec_count(dev);
 	struct irq_affinity_desc *curmsk;
 	struct msi_desc desc;
-	void __iomem *addr;
 
 	memset(&desc, 0, sizeof(desc));
 
-	desc.nvec_used			= 1;
-	desc.pci.msi_attrib.is_msix	= 1;
-	desc.pci.msi_attrib.is_64	= 1;
-	desc.pci.msi_attrib.default_irq	= dev->irq;
-	desc.pci.mask_base		= base;
-
 	for (i = 0, curmsk = masks; i < nvec; i++, curmsk++) {
 		desc.msi_index = entries ? entries[i].entry : i;
 		desc.affinity = masks ? curmsk : NULL;
 		desc.pci.msi_attrib.is_virtual = desc.msi_index >= vec_count;
-		desc.pci.msi_attrib.can_mask = !pci_msi_ignore_mask &&
-					       !desc.pci.msi_attrib.is_virtual;
 
-		if (desc.pci.msi_attrib.can_mask) {
-			addr = pci_msix_desc_addr(&desc);
-			desc.pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
-		}
+		msix_prepare_msi_desc(dev, &desc);
 
 		ret = msi_insert_msi_desc(&dev->dev, &desc);
 		if (ret)
@@ -629,9 +651,8 @@ static void msix_mask_all(void __iomem *
 		writel(ctrl, base + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }
 
-static int msix_setup_interrupts(struct pci_dev *dev, void __iomem *base,
-				 struct msix_entry *entries, int nvec,
-				 struct irq_affinity *affd)
+static int msix_setup_interrupts(struct pci_dev *dev, struct msix_entry *entries,
+				 int nvec, struct irq_affinity *affd)
 {
 	struct irq_affinity_desc *masks = NULL;
 	int ret;
@@ -640,7 +661,7 @@ static int msix_setup_interrupts(struct
 		masks = irq_create_affinity_masks(nvec, affd);
 
 	msi_lock_descs(&dev->dev);
-	ret = msix_setup_msi_descs(dev, base, entries, nvec, masks);
+	ret = msix_setup_msi_descs(dev, entries, nvec, masks);
 	if (ret)
 		goto out_free;
 
@@ -678,7 +699,6 @@ static int msix_setup_interrupts(struct
 static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 				int nvec, struct irq_affinity *affd)
 {
-	void __iomem *base;
 	int ret, tsize;
 	u16 control;
 
@@ -696,15 +716,13 @@ static int msix_capability_init(struct p
 	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
 	/* Request & Map MSI-X table region */
 	tsize = msix_table_size(control);
-	base = msix_map_region(dev, tsize);
-	if (!base) {
+	dev->msix_base = msix_map_region(dev, tsize);
+	if (!dev->msix_base) {
 		ret = -ENOMEM;
 		goto out_disable;
 	}
 
-	dev->msix_base = base;
-
-	ret = msix_setup_interrupts(dev, base, entries, nvec, affd);
+	ret = msix_setup_interrupts(dev, entries, nvec, affd);
 	if (ret)
 		goto out_disable;
 
@@ -719,7 +737,7 @@ static int msix_capability_init(struct p
 	 * which takes the MSI-X mask bits into account even
 	 * when MSI-X is disabled, which prevents MSI delivery.
 	 */
-	msix_mask_all(base, tsize);
+	msix_mask_all(dev->msix_base, tsize);
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 
 	pcibios_free_irq(dev);
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -84,6 +84,8 @@ static inline __attribute_const__ u32 ms
 	return (1 << (1 << desc->pci.msi_attrib.multi_cap)) - 1;
 }
 
+void msix_prepare_msi_desc(struct pci_dev *dev, struct msi_desc *desc);
+
 /* Subsystem variables */
 extern int pci_msi_enable;
 



* [patch V3 24/33] PCI/MSI: Provide prepare_desc() MSI domain op
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (22 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 23/33] PCI/MSI: Split MSI-X descriptor setup Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 25/33] PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X Thomas Gleixner
                   ` (10 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe,
	Jason Gunthorpe

The setup of MSI descriptors for PCI/MSI-X interrupts depends partially on
the MSI index for which the descriptor is initialized.

Dynamic MSI-X vector allocation after MSI-X enablement allows vectors to be
allocated at a given index or at any free index in the available table
range. The latter requires that the descriptor is initialized after the
MSI core has chosen an index.

Implement the prepare_desc() op in the PCI/MSI-X specific msi_domain_ops,
which is invoked before the core interrupt descriptor and the associated
Linux interrupt number are allocated.

That callback is also provided for the upcoming PCI/IMS implementations so
that the implementation specific interrupt domains can do their domain
specific initialization of the MSI descriptors.
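
The guard in the callback can be illustrated with a small user-space model.
This is only a sketch, not the kernel code: struct model_desc,
model_prepare() and model_prepare_desc_cb() are hypothetical names, and the
index dependent setup is reduced to a single can_mask decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an MSI descriptor */
struct model_desc {
	unsigned int index;	/* index chosen by the caller or by the core */
	void *mask_base;	/* non-NULL once the descriptor has been set up */
	bool can_mask;		/* index dependent attribute */
};

/* Index dependent setup, loosely modelled on msix_prepare_msi_desc() */
static void model_prepare(struct model_desc *desc, void *table,
			  unsigned int hwsize)
{
	desc->mask_base = table;
	/* Assumption: only entries inside the hardware table can be masked */
	desc->can_mask = desc->index < hwsize;
}

/* Callback: do not touch descriptors which were set up at enable time */
static void model_prepare_desc_cb(struct model_desc *desc, void *table,
				  unsigned int hwsize)
{
	if (!desc->mask_base)
		model_prepare(desc, table, hwsize);
}
```

A descriptor created by a dynamic allocation arrives with mask_base unset
and is initialized by the callback, while a descriptor preallocated at
MSI-X enable time passes through unchanged.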

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
V2: Reworded changelog (Bjorn)
---
 drivers/pci/msi/irqdomain.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -202,6 +202,14 @@ static void pci_irq_unmask_msix(struct i
 	pci_msix_unmask(irq_data_get_msi_desc(data));
 }
 
+static void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
+				  struct msi_desc *desc)
+{
+	/* Don't fiddle with preallocated MSI descriptors */
+	if (!desc->pci.mask_base)
+		msix_prepare_msi_desc(to_pci_dev(desc->dev), desc);
+}
+
 static struct msi_domain_template pci_msix_template = {
 	.chip = {
 		.name			= "PCI-MSIX",
@@ -212,6 +220,7 @@ static struct msi_domain_template pci_ms
 	},
 
 	.ops = {
+		.prepare_desc		= pci_msix_prepare_desc,
 		.set_desc		= pci_device_domain_set_desc,
 	},
 



* [patch V3 25/33] PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (23 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 24/33] PCI/MSI: Provide prepare_desc() MSI domain op Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 26/33] x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN Thomas Gleixner
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

MSI-X vectors can be allocated after the initial MSI-X enablement, but this
needs explicit support of the underlying interrupt domains.

Provide a function to query the ability and functions to allocate/free
individual vectors post-enable.

The allocation either requests a specific index in the MSI-X table or, when
the index argument is MSI_ANY_INDEX, allocates the next free vector.

The return value is a struct msi_map which on success contains both index
and the Linux interrupt number. In case of failure index is negative and
the Linux interrupt number is 0.
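
The return convention can be sketched with a user-space model. Everything
below is an assumption made for illustration: the model_* names, the table
size, the interrupt number base and the error codes are hypothetical and
are not taken from the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_TABLE_SIZE	8	/* hypothetical MSI-X table size */
#define MODEL_ANY_INDEX		(~0U)	/* stand-in for MSI_ANY_INDEX */
#define MODEL_VIRQ_BASE		100	/* hypothetical first interrupt number */

/* Mirrors the convention described above for struct msi_map */
struct model_msi_map {
	int index;	/* >= 0 on success, negative error code on failure */
	int virq;	/* > 0 on success, 0 on failure */
};

static bool model_used[MODEL_TABLE_SIZE];

static struct model_msi_map model_alloc_irq_at(unsigned int index)
{
	struct model_msi_map map = { .index = -ENOSPC, .virq = 0 };
	unsigned int i;

	if (index == MODEL_ANY_INDEX) {
		/* Pick the next free index */
		for (i = 0; i < MODEL_TABLE_SIZE; i++) {
			if (!model_used[i]) {
				index = i;
				break;
			}
		}
		if (index == MODEL_ANY_INDEX)
			return map;	/* table is full */
	} else if (index >= MODEL_TABLE_SIZE) {
		map.index = -EINVAL;
		return map;
	} else if (model_used[index]) {
		map.index = -EBUSY;
		return map;
	}

	model_used[index] = true;
	map.index = (int)index;
	map.virq = MODEL_VIRQ_BASE + (int)index;
	return map;
}

static void model_free_irq(struct model_msi_map map)
{
	/* Same sanity check style as the free function in this patch */
	if (map.index < 0 || map.virq <= 0)
		return;
	model_used[map.index] = false;
}
```

A caller only has to check map.index for a negative value to detect
failure; on success both members are valid and can be handed back
unchanged to the free function.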

The allocation function handles a single MSI-X index at a time. That is
sufficient for the most urgent use case, VFIO, to get rid of the 'disable
MSI-X, reallocate, enable MSI-X' cycle, which is prone to lost interrupts
and to redirections to the legacy and obviously unhandled INTx.

Single index allocation is also sufficient for the use cases Jason
Gunthorpe pointed out: allocation of an MSI-X or IMS vector for a network
queue. See Link below.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20211126232735.547996838@linutronix.de
---
V2: Added Link to previous discussions (Bjorn)
---
 drivers/pci/msi/api.c       |   67 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/msi/irqdomain.c |    3 +
 include/linux/pci.h         |    6 +++
 3 files changed, 75 insertions(+), 1 deletion(-)

--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -113,6 +113,73 @@ int pci_enable_msix_range(struct pci_dev
 EXPORT_SYMBOL(pci_enable_msix_range);
 
 /**
+ * pci_msix_can_alloc_dyn - Query whether dynamic allocation after enabling
+ *			    MSI-X is supported
+ *
+ * @dev:	PCI device to operate on
+ *
+ * Return: True if supported, false otherwise
+ */
+bool pci_msix_can_alloc_dyn(struct pci_dev *dev)
+{
+	if (!dev->msix_cap)
+		return false;
+
+	return pci_msi_domain_supports(dev, MSI_FLAG_PCI_MSIX_ALLOC_DYN, DENY_LEGACY);
+}
+EXPORT_SYMBOL_GPL(pci_msix_can_alloc_dyn);
+
+/**
+ * pci_msix_alloc_irq_at - Allocate an MSI-X interrupt after enabling MSI-X
+ *			   at a given MSI-X vector index or any free vector index
+ *
+ * @dev:	PCI device to operate on
+ * @index:	Index to allocate. If @index == MSI_ANY_INDEX this allocates
+ *		the next free index in the MSI-X table
+ * @affdesc:	Optional pointer to an affinity descriptor structure. NULL otherwise
+ *
+ * Return: A struct msi_map
+ *
+ *	On success msi_map::index contains the allocated index (>= 0) and
+ *	msi_map::virq contains the allocated Linux interrupt number (> 0).
+ *
+ *	On fail msi_map::index contains the error code and msi_map::virq
+ *	is set to 0.
+ */
+struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
+				     const struct irq_affinity_desc *affdesc)
+{
+	struct msi_map map = { .index = -ENOTSUPP };
+
+	if (!dev->msix_enabled)
+		return map;
+
+	if (!pci_msix_can_alloc_dyn(dev))
+		return map;
+
+	return msi_domain_alloc_irq_at(&dev->dev, MSI_DEFAULT_DOMAIN, index, affdesc, NULL);
+}
+EXPORT_SYMBOL_GPL(pci_msix_alloc_irq_at);
+
+/**
+ * pci_msix_free_irq - Free an interrupt on a PCI/MSIX interrupt domain
+ *		      which was allocated via pci_msix_alloc_irq_at()
+ *
+ * @dev:	The PCI device to operate on
+ * @map:	A struct msi_map describing the interrupt to free
+ *		as returned from the allocation function.
+ */
+void pci_msix_free_irq(struct pci_dev *dev, struct msi_map map)
+{
+	if (WARN_ON_ONCE(map.index < 0 || map.virq <= 0))
+		return;
+	if (WARN_ON_ONCE(!pci_msix_can_alloc_dyn(dev)))
+		return;
+	msi_domain_free_irqs_range(&dev->dev, MSI_DEFAULT_DOMAIN, map.index, map.index);
+}
+EXPORT_SYMBOL_GPL(pci_msix_free_irq);
+
+/**
  * pci_disable_msix() - Disable MSI-X interrupt mode on device
  * @dev: the PCI device to operate on
  *
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -225,7 +225,8 @@ static struct msi_domain_template pci_ms
 	},
 
 	.info = {
-		.flags			= MSI_COMMON_FLAGS | MSI_FLAG_PCI_MSIX,
+		.flags			= MSI_COMMON_FLAGS | MSI_FLAG_PCI_MSIX |
+					  MSI_FLAG_PCI_MSIX_ALLOC_DYN,
 		.bus_token		= DOMAIN_BUS_PCI_DEVICE_MSIX,
 	},
 };
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -38,6 +38,7 @@
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/resource_ext.h>
+#include <linux/msi_api.h>
 #include <uapi/linux/pci.h>
 
 #include <linux/pci_ids.h>
@@ -1559,6 +1560,11 @@ int pci_alloc_irq_vectors_affinity(struc
 				   unsigned int max_vecs, unsigned int flags,
 				   struct irq_affinity *affd);
 
+bool pci_msix_can_alloc_dyn(struct pci_dev *dev);
+struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
+				     const struct irq_affinity_desc *affdesc);
+void pci_msix_free_irq(struct pci_dev *pdev, struct msi_map map);
+
 void pci_free_irq_vectors(struct pci_dev *dev);
 int pci_irq_vector(struct pci_dev *dev, unsigned int nr);
 const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev, int vec);



* [patch V3 26/33] x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (24 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 25/33] PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 27/33] genirq/msi: Provide constants for PCI/IMS support Thomas Gleixner
                   ` (8 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

x86 MSI irqdomains can handle MSI-X allocation post MSI-X enable out of
the box, both on the vector domain and on the remapping domains.

Add the feature flag to the supported feature list.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/msi.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -63,7 +63,7 @@ struct msi_msg;
 u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid);
 
 #define X86_VECTOR_MSI_FLAGS_SUPPORTED					\
-	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX)
+	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN)
 
 #define X86_VECTOR_MSI_FLAGS_REQUIRED					\
 	(MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS)



* [patch V3 27/33] genirq/msi: Provide constants for PCI/IMS support
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (25 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 26/33] x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 28/33] PCI/MSI: Provide IMS (Interrupt Message Store) support Thomas Gleixner
                   ` (7 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Provide the necessary constants for PCI/IMS support:

  - A new bus token for MSI irqdomain identification
  - A MSI feature flag for the MSI irqdomains to signal support
  - A secondary domain id

The latter expands the device internal domain pointer storage array from 1
to 2 entries. That extra pointer is mostly unused today, but the
alternative solutions would not be free either and would introduce more
complexity all over the place. Trade the 8 bytes for simplicity.
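
How the enum sizes the per device storage can be shown with a reduced
model. The model_* names are hypothetical, and the storage is simplified
to a bare pointer array as described above; the real per device structure
is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

enum model_domain_ids {
	MODEL_DEFAULT_DOMAIN,
	MODEL_SECONDARY_DOMAIN,
	MODEL_MAX_DEVICE_IRQDOMAINS,	/* always last: number of slots */
};

struct model_irq_domain;	/* opaque for this sketch */

/* Hypothetical per device storage: one domain pointer per domain id */
struct model_dev_data {
	struct model_irq_domain *domains[MODEL_MAX_DEVICE_IRQDOMAINS];
};

/* Cost of the secondary slot compared to a single entry array */
static size_t model_extra_bytes(void)
{
	return sizeof(struct model_dev_data) -
	       sizeof(struct model_irq_domain *);
}
```

Adding an enumerator in front of the MAX entry grows the array without
touching any code which iterates up to the MAX value. The extra cost is
one pointer per device, which is 8 bytes on a 64-bit system.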

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/irqdomain_defs.h |    1 +
 include/linux/msi.h            |    2 ++
 include/linux/msi_api.h        |    1 +
 3 files changed, 4 insertions(+)

--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -25,6 +25,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_PCI_DEVICE_MSIX,
 	DOMAIN_BUS_DMAR,
 	DOMAIN_BUS_AMDVI,
+	DOMAIN_BUS_PCI_DEVICE_IMS,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -559,6 +559,8 @@ enum {
 	MSI_FLAG_MSIX_CONTIGUOUS	= (1 << 19),
 	/* PCI/MSI-X vectors can be dynamically allocated/freed post MSI-X enable */
 	MSI_FLAG_PCI_MSIX_ALLOC_DYN	= (1 << 20),
+	/* Support for PCI/IMS */
+	MSI_FLAG_PCI_IMS		= (1 << 21),
 };
 
 /**
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -15,6 +15,7 @@ struct device;
  */
 enum msi_domain_ids {
 	MSI_DEFAULT_DOMAIN,
+	MSI_SECONDARY_DOMAIN,
 	MSI_MAX_DEVICE_IRQDOMAINS,
 };
 



* [patch V3 28/33] PCI/MSI: Provide IMS (Interrupt Message Store) support
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (26 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 27/33] genirq/msi: Provide constants for PCI/IMS support Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
                     ` (2 more replies)
  2022-11-24 23:26 ` [patch V3 29/33] PCI/MSI: Provide pci_ims_alloc/free_irq() Thomas Gleixner
                   ` (6 subsequent siblings)
  34 siblings, 3 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

IMS (Interrupt Message Store) is a new specification which allows
implementation specific storage of MSI messages, in contrast to the
strictly standardized MSI and MSI-X message stores.

This requires new device specific interrupt domains to handle the
implementation defined storage which can be an array in device memory or
host/guest memory which is shared with hardware queues.

Add a function to create IMS domains for PCI devices. IMS domains use
the new per device domain mechanism and are configured by the device driver
via a template. IMS domains are created as secondary device domains so they
work side by side with MSI[-X] on the same device.

The IMS domains have a few constraints:

  - The index space is managed by the core code.

    Device memory based IMS provides a storage array with a fixed size
    which obviously requires an index. But there is no association between
    index and functionality so the core can randomly allocate an index in
    the array.

    System memory based IMS does not have the concept of an index as the
    storage is somewhere in memory. In that case the index is purely
    software based to keep track of the allocations.

  - There is no requirement for consecutive index ranges

    This is currently a limitation of the MSI core and can be implemented
    if there is a justified use case by changing the internal storage from
    xarray to maple_tree. For now it's single vector allocation.

  - The interrupt chip must provide the following callbacks:

	- irq_mask()
	- irq_unmask()
	- irq_write_msi_msg()

  - The interrupt chip must additionally provide the following callbacks
    when the irq_mask(), irq_unmask() and irq_write_msi_msg() callbacks
    cannot operate directly on hardware, e.g. in the case that the
    interrupt message store is in queue memory:

	- irq_bus_lock()
	- irq_bus_unlock()

    These callbacks are invoked from preemptible task context and are
    allowed to sleep. In this case the mandatory callbacks above just
    store the information. The irq_bus_unlock() callback is supposed to
    make the change effective before returning.

  - Interrupt affinity setting is handled by the underlying parent
    interrupt domain and communicated to the IMS domain via
    irq_write_msi_msg(). IMS domains cannot have an irq_set_affinity()
    callback. That's a reasonable restriction similar to the PCI/MSI
    device domain implementations.

The domain is automatically destroyed when the PCI device is removed.
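
The deferred update scheme for message stores which cannot be written
directly can be sketched as a user-space model. This is an illustration
under assumptions, not driver code: struct model_ims_slot and all model_*
functions are hypothetical, and the "hardware" is just a second copy of
the state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Hypothetical IMS slot with a software shadow of the device state */
struct model_ims_slot {
	struct model_msg shadow_msg;	/* written by the mandatory callbacks */
	bool shadow_masked;
	bool dirty;			/* shadow differs from the device state */
	struct model_msg hw_msg;	/* what the device would see */
	bool hw_masked;
};

static void model_bus_lock(struct model_ims_slot *s)
{
	/* A real implementation would take a sleeping lock here */
	(void)s;
}

/* The mandatory callbacks only record the requested state */
static void model_mask(struct model_ims_slot *s)
{
	s->shadow_masked = true;
	s->dirty = true;
}

static void model_unmask(struct model_ims_slot *s)
{
	s->shadow_masked = false;
	s->dirty = true;
}

static void model_write_msg(struct model_ims_slot *s,
			    const struct model_msg *msg)
{
	s->shadow_msg = *msg;
	s->dirty = true;
}

/* The unlock callback makes the recorded state effective */
static void model_bus_unlock(struct model_ims_slot *s)
{
	if (s->dirty) {
		s->hw_msg = s->shadow_msg;
		s->hw_masked = s->shadow_masked;
		s->dirty = false;
	}
}
```

Between lock and unlock the device state stays untouched, so the slow or
sleeping update happens exactly once per sequence, no matter how many
mask, unmask or message writes were recorded in between.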

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Queue memory -> system memory (Kevin)
---
 drivers/pci/msi/irqdomain.c |   59 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h         |    5 +++
 2 files changed, 64 insertions(+)

--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -355,6 +355,65 @@ bool pci_msi_domain_supports(struct pci_
 	return (supported & feature_mask) == feature_mask;
 }
 
+/**
+ * pci_create_ims_domain - Create a secondary IMS domain for a PCI device
+ * @pdev:	The PCI device to operate on
+ * @template:	The MSI info template which describes the domain
+ * @hwsize:	The size of the hardware entry table or 0 if the domain
+ *		is purely software managed
+ * @data:	Optional pointer to domain specific data to be stored
+ *		in msi_domain_info::data
+ *
+ * Return: True on success, false otherwise
+ *
+ * An IMS domain is expected to have the following constraints:
+ *	- The index space is managed by the core code
+ *
+ *	- There is no requirement for consecutive index ranges
+ *
+ *	- The interrupt chip must provide the following callbacks:
+ *		- irq_mask()
+ *		- irq_unmask()
+ *		- irq_write_msi_msg()
+ *
+ *	- The interrupt chip must provide the following optional callbacks
+ *	  when the irq_mask(), irq_unmask() and irq_write_msi_msg() callbacks
+ *	  cannot operate directly on hardware, e.g. in the case that the
+ *	  interrupt message store is in queue memory:
+ *		- irq_bus_lock()
+ *		- irq_bus_unlock()
+ *
+ *	  These callbacks are invoked from preemptible task context and are
+ *	  allowed to sleep. In this case the mandatory callbacks above just
+ *	  store the information. The irq_bus_unlock() callback is supposed
+ *	  to make the change effective before returning.
+ *
+ *	- Interrupt affinity setting is handled by the underlying parent
+ *	  interrupt domain and communicated to the IMS domain via
+ *	  irq_write_msi_msg().
+ *
+ * The domain is automatically destroyed when the PCI device is removed.
+ */
+bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
+			   unsigned int hwsize, void *data)
+{
+	struct irq_domain *domain = dev_get_msi_domain(&pdev->dev);
+
+	if (!domain || !irq_domain_is_msi_parent(domain))
+		return false;
+
+	if (template->info.bus_token != DOMAIN_BUS_PCI_DEVICE_IMS ||
+	    !(template->info.flags & MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS) ||
+	    !(template->info.flags & MSI_FLAG_FREE_MSI_DESCS) ||
+	    !template->chip.irq_mask || !template->chip.irq_unmask ||
+	    !template->chip.irq_write_msi_msg || template->chip.irq_set_affinity)
+		return false;
+
+	return msi_create_device_irq_domain(&pdev->dev, MSI_SECONDARY_DOMAIN, template,
+					    hwsize, data, NULL);
+}
+EXPORT_SYMBOL_GPL(pci_create_ims_domain);
+
 /*
  * Users of the generic MSI infrastructure expect a device to have a single ID,
  * so with DMA aliases we have to pick the least-worst compromise. Devices with
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2487,6 +2487,11 @@ static inline bool pci_is_thunderbolt_at
 void pci_uevent_ers(struct pci_dev *pdev, enum  pci_ers_result err_type);
 #endif
 
+struct msi_domain_template;
+
+bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
+			   unsigned int hwsize, void *data);
+
 #include <linux/dma-mapping.h>
 
 #define pci_printk(level, pdev, fmt, arg...) \



* [patch V3 29/33] PCI/MSI: Provide pci_ims_alloc/free_irq()
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (27 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 28/33] PCI/MSI: Provide IMS (Interrupt Message Store) support Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-11-28  4:47   ` Tian, Kevin
                     ` (2 more replies)
  2022-11-24 23:26 ` [patch V3 30/33] x86/apic/msi: Enable PCI/IMS Thomas Gleixner
                   ` (5 subsequent siblings)
  34 siblings, 3 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Provide a single vector allocation function which allocates the next free
index in the IMS space, and a matching free function which releases it.

All allocated vectors are also released via pci_free_irq_vectors(), which
releases the MSI/MSI-X vectors as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
V3: s/cookie/icookie/ (Kevin)
---
 drivers/pci/msi/api.c |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h   |    3 +++
 2 files changed, 53 insertions(+)

--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -361,6 +361,56 @@ const struct cpumask *pci_irq_get_affini
 EXPORT_SYMBOL(pci_irq_get_affinity);
 
 /**
+ * pci_ims_alloc_irq - Allocate an interrupt on a PCI/IMS interrupt domain
+ * @dev:	The PCI device to operate on
+ * @icookie:	Pointer to an IMS implementation specific cookie for this
+ *		IMS instance (PASID, queue ID, pointer...).
+ *		The cookie content is copied into the MSI descriptor for the
+ *		interrupt chip callbacks or domain specific setup functions.
+ * @affdesc:	Optional pointer to an interrupt affinity descriptor
+ *
+ * There is no index for IMS allocations as IMS is an implementation
+ * specific storage and does not have any direct associations between
+ * index, which might be a pure software construct, and device
+ * functionality. This association is established by the driver either via
+ * the index - if there is a hardware table - or in case of purely software
+ * managed IMS implementation the association happens via the
+ * irq_write_msi_msg() callback of the implementation specific interrupt
+ * chip, which utilizes the provided @icookie to store the MSI message in
+ * the appropriate place.
+ *
+ * Return: A struct msi_map
+ *
+ *	On success msi_map::index contains the allocated index (>= 0) and
+ *	msi_map::virq the allocated Linux interrupt number (> 0).
+ *
+ *	On fail msi_map::index contains the error code and msi_map::virq
+ *	is set to 0.
+ */
+struct msi_map pci_ims_alloc_irq(struct pci_dev *dev, union msi_instance_cookie *icookie,
+				 const struct irq_affinity_desc *affdesc)
+{
+	return msi_domain_alloc_irq_at(&dev->dev, MSI_SECONDARY_DOMAIN, MSI_ANY_INDEX,
+				       affdesc, icookie);
+}
+EXPORT_SYMBOL_GPL(pci_ims_alloc_irq);
+
+/**
+ * pci_ims_free_irq - Free an interrupt on a PCI/IMS interrupt domain
+ *		      which was allocated via pci_ims_alloc_irq()
+ * @dev:	The PCI device to operate on
+ * @map:	A struct msi_map describing the interrupt to free as
+ *		returned from pci_ims_alloc_irq()
+ */
+void pci_ims_free_irq(struct pci_dev *dev, struct msi_map map)
+{
+	if (WARN_ON_ONCE(map.index < 0 || !map.virq))
+		return;
+	msi_domain_free_irqs_range(&dev->dev, MSI_SECONDARY_DOMAIN, map.index, map.index);
+}
+EXPORT_SYMBOL_GPL(pci_ims_free_irq);
+
+/**
  * pci_free_irq_vectors() - Free previously allocated IRQs for a device
  * @dev: the PCI device to operate on
  *
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2491,6 +2491,9 @@ struct msi_domain_template;
 
 bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
 			   unsigned int hwsize, void *data);
+struct msi_map pci_ims_alloc_irq(struct pci_dev *pdev, union msi_instance_cookie *icookie,
+				 const struct irq_affinity_desc *affdesc);
+void pci_ims_free_irq(struct pci_dev *pdev, struct msi_map map);
 
 #include <linux/dma-mapping.h>
 



* [patch V3 30/33] x86/apic/msi: Enable PCI/IMS
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (28 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 29/33] PCI/MSI: Provide pci_ims_alloc/free_irq() Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 31/33] iommu/vt-d: " Thomas Gleixner
                   ` (4 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Enable IMS in the domain init and allocation mapping code, but do not
enable it on the vector domain as discussed in various threads on LKML.

The interrupt remap domains can expand this setting like they do with
PCI multi MSI.
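
The effect of the per parent feature mask can be sketched in user space.
The two flag values match the constants added earlier in this series;
everything else (the model_* names and the reduced ops structure) is a
hypothetical simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FLAG_PCI_MSIX_ALLOC_DYN	(1u << 20)
#define MODEL_FLAG_PCI_IMS		(1u << 21)

/* Reduced stand-in for the parent domain ops */
struct model_parent_ops {
	const char *prefix;
	uint32_t supported_flags;
};

/* Vector domain: dynamic MSI-X allocation, but no IMS */
static const struct model_parent_ops model_vector_ops = {
	.prefix			= "",
	.supported_flags	= MODEL_FLAG_PCI_MSIX_ALLOC_DYN,
};

/* Remapping domain: expands the vector domain feature set with IMS */
static const struct model_parent_ops model_remap_ops = {
	.prefix			= "IR-",
	.supported_flags	= MODEL_FLAG_PCI_MSIX_ALLOC_DYN |
				  MODEL_FLAG_PCI_IMS,
};

/* All requested feature bits must be supported by the parent */
static bool model_parent_supports(const struct model_parent_ops *ops,
				  uint32_t feature_mask)
{
	return (ops->supported_flags & feature_mask) == feature_mask;
}
```

With this layout an IMS domain can only be created below a parent which
advertises the IMS flag, so a device behind the plain vector domain is
rejected while the same device behind a remapping domain is accepted.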

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/msi.c |    5 +++++
 1 file changed, 5 insertions(+)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -184,6 +184,7 @@ static int x86_msi_prepare(struct irq_do
 		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
 		return 0;
 	case DOMAIN_BUS_PCI_DEVICE_MSIX:
+	case DOMAIN_BUS_PCI_DEVICE_IMS:
 		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
 		return 0;
 	default:
@@ -230,6 +231,10 @@ static bool x86_init_dev_msi_info(struct
 	case DOMAIN_BUS_PCI_DEVICE_MSI:
 	case DOMAIN_BUS_PCI_DEVICE_MSIX:
 		break;
+	case DOMAIN_BUS_PCI_DEVICE_IMS:
+		if (!(pops->supported_flags & MSI_FLAG_PCI_IMS))
+			return false;
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		return false;



* [patch V3 31/33] iommu/vt-d: Enable PCI/IMS
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (29 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 30/33] x86/apic/msi: Enable PCI/IMS Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 32/33] iommu/amd: " Thomas Gleixner
                   ` (3 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

PCI/IMS works like PCI/MSI-X in the remapping. Just add the feature flag,
but only when on real hardware.

Virtualized IOMMUs need additional support, e.g. for PASID.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Only enable on real hardware (Kevin)
---
 drivers/iommu/intel/irq_remapping.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -82,7 +82,7 @@ static const struct irq_domain_ops intel
 
 static void iommu_disable_irq_remapping(struct intel_iommu *iommu);
 static int __init parse_ioapics_under_ir(void);
-static const struct msi_parent_ops dmar_msi_parent_ops;
+static const struct msi_parent_ops dmar_msi_parent_ops, virt_dmar_msi_parent_ops;
 
 static bool ir_pre_enabled(struct intel_iommu *iommu)
 {
@@ -577,7 +577,11 @@ static int intel_setup_irq_remapping(str
 
 	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_DMAR);
 	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
-	iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
+
+	if (cap_caching_mode(iommu->cap))
+		iommu->ir_domain->msi_parent_ops = &virt_dmar_msi_parent_ops;
+	else
+		iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
 
 	ir_table->base = page_address(pages);
 	ir_table->bitmap = bitmap;
@@ -1429,11 +1433,20 @@ static const struct irq_domain_ops intel
 };
 
 static const struct msi_parent_ops dmar_msi_parent_ops = {
-	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI |
+				  MSI_FLAG_PCI_IMS,
 	.prefix			= "IR-",
 	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
 };
 
+static const struct msi_parent_ops virt_dmar_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "vIR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 /*
  * Support of Interrupt Remapping Unit Hotplug
  */



* [patch V3 32/33] iommu/amd: Enable PCI/IMS
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (30 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 31/33] iommu/vt-d: " Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-11-24 23:26 ` [patch V3 33/33] irqchip: Add IDXD Interrupt Message Store driver Thomas Gleixner
                   ` (2 subsequent siblings)
  34 siblings, 2 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

PCI/IMS works like PCI/MSI-X in the remapping. Just add the feature flag,
but only when on real hardware.

Virtualized IOMMUs need additional support.
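
A minimal user-space sketch of the selection logic described above, not the
kernel code itself. The demo_* names and flag values are hypothetical
stand-ins; the patch uses amd_iommu_np_cache (and cap_caching_mode() on VT-d)
as the "virtualized" indication.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values only; the kernel defines its own in msi.h. */
#define DEMO_FLAG_MULTI_PCI_MSI	(1u << 0)
#define DEMO_FLAG_PCI_IMS	(1u << 1)

/* Hypothetical, reduced stand-in for struct msi_parent_ops. */
struct demo_parent_ops {
	uint32_t	supported_flags;
	const char	*prefix;
};

/* Real hardware: IMS is advertised. */
static const struct demo_parent_ops demo_real_ops = {
	.supported_flags	= DEMO_FLAG_MULTI_PCI_MSI | DEMO_FLAG_PCI_IMS,
	.prefix			= "IR-",
};

/* Virtualized IOMMU: same feature set minus IMS. */
static const struct demo_parent_ops demo_virt_ops = {
	.supported_flags	= DEMO_FLAG_MULTI_PCI_MSI,
	.prefix			= "vIR-",
};

/* Pick the parent ops once, when the remapping domain is set up. */
static const struct demo_parent_ops *demo_select_parent_ops(bool virtualized)
{
	return virtualized ? &demo_virt_ops : &demo_real_ops;
}

/* True if the chosen parent advertises IMS support. */
static bool demo_ims_supported(const struct demo_parent_ops *ops)
{
	return (ops->supported_flags & DEMO_FLAG_PCI_IMS) != 0;
}
```

Keeping two const ops structures and choosing between them at setup time
avoids modifying a shared feature mask at runtime.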

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Only enable on real hardware (Kevin)
---
 drivers/iommu/amd/iommu.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3649,11 +3649,20 @@ static struct irq_chip amd_ir_chip = {
 };
 
 static const struct msi_parent_ops amdvi_msi_parent_ops = {
-	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI |
+				  MSI_FLAG_PCI_IMS,
 	.prefix			= "IR-",
 	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
 };
 
+static const struct msi_parent_ops virt_amdvi_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "vIR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 {
 	struct fwnode_handle *fn;
@@ -3670,7 +3679,11 @@ int amd_iommu_create_irq_domain(struct a
 
 	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_AMDVI);
 	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
-	iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
+
+	if (amd_iommu_np_cache)
+		iommu->ir_domain->msi_parent_ops = &virt_amdvi_msi_parent_ops;
+	else
+		iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
 
 	return 0;
 }



* [patch V3 33/33] irqchip: Add IDXD Interrupt Message Store driver
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (31 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 32/33] iommu/amd: " Thomas Gleixner
@ 2022-11-24 23:26 ` Thomas Gleixner
  2022-11-28  4:50 ` [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Tian, Kevin
  2022-12-05 11:07 ` Marc Zyngier
  34 siblings, 0 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-24 23:26 UTC (permalink / raw)
  To: LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Provide a driver for the Intel IDXD IMS implementation. The implementation
uses a large message store array in device memory.

The IMS domain implementation is minimal and just provides the required
irq_chip callbacks and one domain callback which prepares the MSI
descriptor for easy usage in the irq_chip callbacks.

The necessary iobase is stored in the irqdomain and the PASID which is
required for operation is handed in via msi_instance_cookie in the
allocation function.

Not much to see here. A few lines of code and a filled-in template are all
that's needed.
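
As a rough illustration of the slot handling described above, the following
user-space sketch mirrors the control word arithmetic of the driver. The bit
positions are taken from the patch; the demo_* helpers are hypothetical and
do no MMIO.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit layout of the control word as defined in the patch. */
#define CTRL_VECTOR_MASKBIT	(1u << 0)
#define CTRL_PASID_ENABLE	(1u << 3)
#define CTRL_PASID_SHIFT	12

/* Same field layout as struct ims_slot in the patch, with stdint types. */
struct demo_ims_slot {
	uint32_t	address_lo;
	uint32_t	address_hi;
	uint32_t	data;
	uint32_t	ctrl;
};

/* Byte offset of a slot in the array; corresponds to "slot += msi_index". */
static size_t demo_slot_offset(unsigned int msi_index)
{
	return (size_t)msi_index * sizeof(struct demo_ims_slot);
}

/* Shift the PASID into place and set the PASID enable bit. */
static uint32_t demo_prepare_ctrl(uint32_t pasid)
{
	return (pasid << CTRL_PASID_SHIFT) | CTRL_PASID_ENABLE;
}

/* Value written on mask: prepared control word plus the mask bit. */
static uint32_t demo_ctrl_masked(uint32_t cval)
{
	return cval | CTRL_VECTOR_MASKBIT;
}

/* Value written on unmask: the prepared control word as is. */
static uint32_t demo_ctrl_unmasked(uint32_t cval)
{
	return cval;
}
```

Because the prepared value is computed once and kept in the descriptor
cookie, mask and unmask each need only a single write to the slot.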

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Update changelog/comments (Kevin)
---
 drivers/irqchip/Kconfig                    |    7 +
 drivers/irqchip/Makefile                   |    1 
 drivers/irqchip/irq-pci-intel-idxd.c       |  143 +++++++++++++++++++++++++++++
 include/linux/irqchip/irq-pci-intel-idxd.h |   22 ++++
 4 files changed, 173 insertions(+)

--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -695,4 +695,11 @@ config SUNPLUS_SP7021_INTC
 	  chained controller, routing all interrupt source in P-Chip to
 	  the primary controller on C-Chip.
 
+config PCI_INTEL_IDXD_IMS
+	tristate "Intel IDXD Interrupt Message Store controller"
+	depends on PCI_MSI
+	help
+	  Support for Intel IDXD Interrupt Message Store (IMS) controller
+	  with IMS slot storage in a slot array in device memory
+
 endmenu
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -121,3 +121,4 @@ obj-$(CONFIG_IRQ_IDT3243X)		+= irq-idt32
 obj-$(CONFIG_APPLE_AIC)			+= irq-apple-aic.o
 obj-$(CONFIG_MCHP_EIC)			+= irq-mchp-eic.o
 obj-$(CONFIG_SUNPLUS_SP7021_INTC)	+= irq-sp7021-intc.o
+obj-$(CONFIG_PCI_INTEL_IDXD_IMS)	+= irq-pci-intel-idxd.o
--- /dev/null
+++ b/drivers/irqchip/irq-pci-intel-idxd.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Interrupt chip and domain for Intel IDXD with hardware array based
+ * interrupt message store (IMS).
+ */
+#include <linux/device.h>
+#include <linux/irq.h>
+#include <linux/irqdomain.h>
+#include <linux/msi.h>
+#include <linux/pci.h>
+
+#include <linux/irqchip/irq-pci-intel-idxd.h>
+
+MODULE_LICENSE("GPL");
+
+/**
+ * struct ims_slot - The hardware layout of a slot in the memory table
+ * @address_lo:	Lower 32bit address
+ * @address_hi:	Upper 32bit address
+ * @data:	Message data
+ * @ctrl:	Control word
+ */
+struct ims_slot {
+	u32	address_lo;
+	u32	address_hi;
+	u32	data;
+	u32	ctrl;
+} __packed;
+
+/* Bit to mask the interrupt in the control word */
+#define CTRL_VECTOR_MASKBIT	BIT(0)
+/* Bit to enable PASID in the control word */
+#define CTRL_PASID_ENABLE	BIT(3)
+/* Position of PASID.LSB in the control word */
+#define CTRL_PASID_SHIFT	12
+
+static inline void iowrite32_and_flush(u32 value, void __iomem *addr)
+{
+	iowrite32(value, addr);
+	ioread32(addr);
+}
+
+static void idxd_mask(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->data.dcookie.iobase;
+	u32 cval = (u32)desc->data.icookie.value;
+
+	iowrite32_and_flush(cval | CTRL_VECTOR_MASKBIT, &slot->ctrl);
+}
+
+static void idxd_unmask(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->data.dcookie.iobase;
+	u32 cval = (u32)desc->data.icookie.value;
+
+	iowrite32_and_flush(cval, &slot->ctrl);
+}
+
+static void idxd_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->data.dcookie.iobase;
+
+	iowrite32(msg->address_lo, &slot->address_lo);
+	iowrite32(msg->address_hi, &slot->address_hi);
+	iowrite32_and_flush(msg->data, &slot->data);
+}
+
+static void idxd_shutdown(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+	struct ims_slot __iomem *slot = desc->data.dcookie.iobase;
+
+	iowrite32(0, &slot->address_lo);
+	iowrite32(0, &slot->address_hi);
+	iowrite32(0, &slot->data);
+	iowrite32_and_flush(CTRL_VECTOR_MASKBIT, &slot->ctrl);
+}
+
+static void idxd_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
+			      struct msi_desc *desc)
+{
+	struct msi_domain_info *info = domain->host_data;
+	struct ims_slot __iomem *slot;
+
+	/* Set up the slot address for the irq_chip callbacks */
+	slot = (__force struct ims_slot __iomem *) info->data;
+	slot += desc->msi_index;
+	desc->data.dcookie.iobase = slot;
+
+	/* Mask the interrupt for paranoia sake */
+	iowrite32_and_flush(CTRL_VECTOR_MASKBIT, &slot->ctrl);
+
+	/*
+	 * The caller provided PASID. Shift it to the proper position
+	 * and set the PASID enable bit.
+	 */
+	desc->data.icookie.value <<= CTRL_PASID_SHIFT;
+	desc->data.icookie.value |= CTRL_PASID_ENABLE;
+
+	arg->hwirq = desc->msi_index;
+}
+
+static const struct msi_domain_template idxd_ims_template = {
+	.chip = {
+		.name			= "PCI-IDXD",
+		.irq_mask		= idxd_mask,
+		.irq_unmask		= idxd_unmask,
+		.irq_write_msi_msg	= idxd_write_msi_msg,
+		.irq_shutdown		= idxd_shutdown,
+		.flags			= IRQCHIP_ONESHOT_SAFE,
+	},
+
+	.ops = {
+		.prepare_desc		= idxd_prepare_desc,
+	},
+
+	.info = {
+		.flags			= MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS |
+					  MSI_FLAG_FREE_MSI_DESCS |
+					  MSI_FLAG_PCI_IMS,
+		.bus_token		= DOMAIN_BUS_PCI_DEVICE_IMS,
+	},
+};
+
+/**
+ * pci_intel_idxd_create_ims_domain - Create an IDXD IMS domain
+ * @pdev:	IDXD PCI device to operate on
+ * @slots:	Pointer to the mapped slot memory array
+ * @nr_slots:	The number of slots in the array
+ *
+ * Returns: True on success, false otherwise
+ *
+ * The domain is automatically destroyed when the @pdev is destroyed
+ */
+bool pci_intel_idxd_create_ims_domain(struct pci_dev *pdev, void __iomem *slots,
+				      unsigned int nr_slots)
+{
+	return pci_create_ims_domain(pdev, &idxd_ims_template, nr_slots, (__force void *)slots);
+}
+EXPORT_SYMBOL_GPL(pci_intel_idxd_create_ims_domain);
--- /dev/null
+++ b/include/linux/irqchip/irq-pci-intel-idxd.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* (C) Copyright 2022 Thomas Gleixner <tglx@linutronix.de> */
+
+#ifndef _LINUX_IRQCHIP_IRQ_PCI_INTEL_IDXD_H
+#define _LINUX_IRQCHIP_IRQ_PCI_INTEL_IDXD_H
+
+#include <linux/msi_api.h>
+#include <linux/bits.h>
+#include <linux/types.h>
+
+/*
+ * Convenience macro to wrap the PASID for interrupt allocation
+ * via pci_ims_alloc_irq(pdev, INTEL_IDXD_DEV_COOKIE(pasid))
+ */
+#define INTEL_IDXD_DEV_COOKIE(pasid)	(union msi_instance_cookie) { .value = (pasid), }
+
+struct pci_dev;
+
+bool pci_intel_idxd_create_ims_domain(struct pci_dev *pdev, void __iomem *slots,
+				      unsigned int nr_slots);
+
+#endif



* RE: [patch V3 12/33] PCI/MSI: Add support for per device MSI[X] domains
  2022-11-24 23:26 ` [patch V3 12/33] PCI/MSI: Add support for per device MSI[X] domains Thomas Gleixner
@ 2022-11-28  4:46   ` Tian, Kevin
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 126+ messages in thread
From: Tian, Kevin @ 2022-11-28  4:46 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Jiang, Dave, Alex Williamson, Williams, Dan J,
	Logan Gunthorpe, Raj, Ashok, Jon Mason, Allen Hubbe,
	Ahmed S. Darwish

> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Friday, November 25, 2022 7:26 AM
> +
> +	if (!irq_domain_is_msi_parent(domain)) {
> +		/*
> +		 * For "global" PCI/MSI interrupt domains the associated
> +		 * msi_domain_info::flags is the authoritative source of
> +		 * information.
> +		 */
> +		info = domain->host_data;
> +		supported = info->flags;
> +	} else {
> +		/*
> +		 * For MSI parent domains the supported feature set
> +		 * is available in the parent ops. This makes checks
> +		 * possible before actually instantiating the
> +		 * per device domain because the parent is never
> +		 * expanding the PCI/MSI functionality.
> +		 */
> +		supported = domain->msi_parent_ops->supported_flags;
> +	}

As discussed in v2, it is probably clearer to also point out that on all
existing architectures it is always the direct parent which imposes the
restrictions. That is why checking the direct parent is sufficient here.
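
For illustration only, a user-space sketch of the lookup quoted above. The
demo_* structure and the exact form of the final mask comparison are
assumptions, not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of what the quoted code looks at. */
struct demo_domain {
	bool		is_msi_parent;		/* irq_domain_is_msi_parent() */
	uint32_t	info_flags;		/* msi_domain_info::flags */
	uint32_t	parent_supported_flags;	/* msi_parent_ops::supported_flags */
};

/*
 * Take the feature set from the direct parent's ops for MSI parent
 * domains, otherwise from the "global" domain's info flags, then
 * require every requested feature bit to be present.
 */
static bool demo_domain_supports(const struct demo_domain *d, uint32_t feature_mask)
{
	uint32_t supported;

	if (!d->is_msi_parent)
		supported = d->info_flags;
	else
		supported = d->parent_supported_flags;

	return (supported & feature_mask) == feature_mask;
}
```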


* RE: [patch V3 29/33] PCI/MSI: Provide pci_ims_alloc/free_irq()
  2022-11-24 23:26 ` [patch V3 29/33] PCI/MSI: Provide pci_ims_alloc/free_irq() Thomas Gleixner
@ 2022-11-28  4:47   ` Tian, Kevin
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 126+ messages in thread
From: Tian, Kevin @ 2022-11-28  4:47 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Jiang, Dave, Alex Williamson, Williams, Dan J,
	Logan Gunthorpe, Raj, Ashok, Jon Mason, Allen Hubbe

> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Friday, November 25, 2022 7:27 AM
> +/**
> + * pci_ims_free_irq - Free an interrupt on a PCI/IMS interrupt domain
> + *		      which was allocated via pci_ims_alloc_irq()
> + * @dev:	The PCI device to operate on
> + * @map:	A struct msi_map describing the interrupt to free as
> + *		returned from pci_ims_alloc_irq()
> + */
> +void pci_ims_free_irq(struct pci_dev *dev, struct msi_map map)
> +{
> +	if (WARN_ON_ONCE(map.index < 0 || !map.virq))
> +		return;

"map.virq <= 0"


* RE: [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (32 preceding siblings ...)
  2022-11-24 23:26 ` [patch V3 33/33] irqchip: Add IDXD Interrupt Message Store driver Thomas Gleixner
@ 2022-11-28  4:50 ` Tian, Kevin
  2022-12-05 11:07 ` Marc Zyngier
  34 siblings, 0 replies; 126+ messages in thread
From: Tian, Kevin @ 2022-11-28  4:50 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Jiang, Dave, Alex Williamson, Williams, Dan J,
	Logan Gunthorpe, Raj, Ashok, Jon Mason, Allen Hubbe

> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Friday, November 25, 2022 7:26 AM
> 
> Changes vs. v2:
> 
>   - Rework the domain size initialization and handling (Kevin)
> 
>   - Enable IMS only when on real hardware (Kevin)
> 
>   - Rename the PCI/MSI irqchip functions (Kevin)
> 
>   - Update change logs and comments (Kevin)
> 
> The delta patch vs. V2 is attached below. It's not completely accurate as
> it has some changes from part 2 intermingled, but you get the idea.
> 

This series looks good to me:

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* Re: [patch V3 21/33] genirq/msi: Provide msi_domain_alloc_irq_at()
  2022-11-24 23:26 ` [patch V3 21/33] genirq/msi: Provide msi_domain_alloc_irq_at() Thomas Gleixner
@ 2022-11-28 14:39   ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 126+ messages in thread
From: Thomas Gleixner @ 2022-11-28 14:39 UTC (permalink / raw)
  To: LKML

On Fri, Nov 25 2022 at 00:26, Thomas Gleixner wrote:
> +struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
> +				       const struct irq_affinity_desc *affdesc,
> +				       union msi_instance_cookie *icookie)
> +{
> +	struct msi_ctrl ctrl = { .domid	= domid, .nirqs = 1, };
> +	struct msi_domain_info *info;

Remove

> +	struct irq_domain *domain;
> +	struct msi_map map = { };
> +	struct msi_desc *desc;
> +	int ret;
> +
> +	msi_lock_descs(dev);
> +	domain = msi_get_device_domain(dev, domid);
> +	if (!domain) {
> +		map.index = -ENODEV;
> +		goto unlock;
> +	}
> +
> +	desc = msi_alloc_desc(dev, 1, affdesc);
> +	if (!desc) {
> +		map.index = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	if (icookie)
> +		desc->data.icookie = *icookie;
> +
> +	ret = msi_insert_desc(dev, desc, domid, index);
> +	if (ret) {
> +		map.index = ret;
> +		goto unlock;
> +	}
> +
> +	ctrl.first = ctrl.last = desc->msi_index;
> +	info = domain->host_data;

Pointless

> +	ret = __msi_domain_alloc_irqs(dev, domain, &ctrl);
> +	if (ret) {
> +		map.index = ret;
> +		msi_domain_free_locked(dev, &ctrl);
> +	} else {
> +		map.index = desc->msi_index;
> +		map.virq = desc->irq;
> +	}
> +unlock:
> +	msi_unlock_descs(dev);
> +	return map;
> +}
> +
>  static void __msi_domain_free_irqs(struct device *dev, struct irq_domain *domain,
>  				   struct msi_ctrl *ctrl)
>  {


* Re: [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation
  2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
                   ` (33 preceding siblings ...)
  2022-11-28  4:50 ` [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Tian, Kevin
@ 2022-12-05 11:07 ` Marc Zyngier
  34 siblings, 0 replies; 126+ messages in thread
From: Marc Zyngier @ 2022-12-05 11:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Greg Kroah-Hartman, Jason Gunthorpe,
	Dave Jiang, Alex Williamson, Kevin Tian, Dan Williams,
	Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Thu, 24 Nov 2022 23:25:45 +0000,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> This is V3 of the third part of the effort to provide support for per device
> MSI interrupt domains.

As for Part-2, I have only glanced at the various changes due to
limited bandwidth, but this seems to be a reasonable approach to the
multi-bus (pun intended!) MSI thingy.

FWIW:

Acked-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.


* [tip: irq/core] iommu/vt-d: Enable PCI/IMS
  2022-11-24 23:26 ` [patch V3 31/33] iommu/vt-d: " Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     4f8d12389509c80f275a12926901d6619f2046c7
Gitweb:        https://git.kernel.org/tip/4f8d12389509c80f275a12926901d6619f2046c7
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:34 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:04 +01:00

iommu/vt-d: Enable PCI/IMS

PCI/IMS works like PCI/MSI-X in the remapping. Just add the feature flag,
but only when on real hardware.

Virtualized IOMMUs need additional support, e.g. for PASID.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232327.081482253@linutronix.de

---
 drivers/iommu/intel/irq_remapping.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 6fab407..a723f53 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -82,7 +82,7 @@ static const struct irq_domain_ops intel_ir_domain_ops;
 
 static void iommu_disable_irq_remapping(struct intel_iommu *iommu);
 static int __init parse_ioapics_under_ir(void);
-static const struct msi_parent_ops dmar_msi_parent_ops;
+static const struct msi_parent_ops dmar_msi_parent_ops, virt_dmar_msi_parent_ops;
 
 static bool ir_pre_enabled(struct intel_iommu *iommu)
 {
@@ -577,7 +577,11 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 
 	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_DMAR);
 	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
-	iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
+
+	if (cap_caching_mode(iommu->cap))
+		iommu->ir_domain->msi_parent_ops = &virt_dmar_msi_parent_ops;
+	else
+		iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
 
 	ir_table->base = page_address(pages);
 	ir_table->bitmap = bitmap;
@@ -1429,11 +1433,20 @@ static const struct irq_domain_ops intel_ir_domain_ops = {
 };
 
 static const struct msi_parent_ops dmar_msi_parent_ops = {
-	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI |
+				  MSI_FLAG_PCI_IMS,
 	.prefix			= "IR-",
 	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
 };
 
+static const struct msi_parent_ops virt_dmar_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "vIR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 /*
  * Support of Interrupt Remapping Unit Hotplug
  */


* [tip: irq/core] iommu/amd: Enable PCI/IMS
  2022-11-24 23:26 ` [patch V3 32/33] iommu/amd: " Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     73c658f384d7a48e0e18ef0bc5458c8c6ea80574
Gitweb:        https://git.kernel.org/tip/73c658f384d7a48e0e18ef0bc5458c8c6ea80574
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:36 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:05 +01:00

iommu/amd: Enable PCI/IMS

PCI/IMS works like PCI/MSI-X in the remapping. Just add the feature flag,
but only when on real hardware.

Virtualized IOMMUs need additional support.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232327.140571546@linutronix.de

---
 drivers/iommu/amd/iommu.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7caccd8..4d28967 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3649,11 +3649,20 @@ static struct irq_chip amd_ir_chip = {
 };
 
 static const struct msi_parent_ops amdvi_msi_parent_ops = {
-	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI |
+				  MSI_FLAG_PCI_IMS,
 	.prefix			= "IR-",
 	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
 };
 
+static const struct msi_parent_ops virt_amdvi_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "vIR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 {
 	struct fwnode_handle *fn;
@@ -3670,7 +3679,11 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 
 	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_AMDVI);
 	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
-	iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
+
+	if (amd_iommu_np_cache)
+		iommu->ir_domain->msi_parent_ops = &virt_amdvi_msi_parent_ops;
+	else
+		iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
 
 	return 0;
 }


* [tip: irq/core] x86/apic/msi: Enable PCI/IMS
  2022-11-24 23:26 ` [patch V3 30/33] x86/apic/msi: Enable PCI/IMS Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     ddd98f1b7b57dad5ae5efbe54154722aa6368b11
Gitweb:        https://git.kernel.org/tip/ddd98f1b7b57dad5ae5efbe54154722aa6368b11
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:32 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:04 +01:00

x86/apic/msi: Enable PCI/IMS

Enable IMS in the domain init and allocation mapping code, but do not
enable it on the vector domain as discussed in various threads on LKML.

The interrupt remap domains can expand this setting like they do with
PCI multi MSI.
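
A simplified user-space sketch of the per bus token check added to the
domain init code. The demo_* enum and flag value are hypothetical; the real
function also sets up the chip and domain info, which is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only; the kernel defines MSI_FLAG_PCI_IMS itself. */
#define DEMO_FLAG_PCI_IMS	(1u << 21)

/* Hypothetical subset of the irq domain bus tokens. */
enum demo_bus_token {
	DEMO_BUS_PCI_DEVICE_MSI,
	DEMO_BUS_PCI_DEVICE_MSIX,
	DEMO_BUS_PCI_DEVICE_IMS,
	DEMO_BUS_OTHER,
};

/*
 * MSI and MSI-X per device domains are always accepted. IMS is only
 * accepted when the parent advertises it, which the plain vector
 * domain does not, while the remapping domains may.
 */
static bool demo_init_dev_msi_info(enum demo_bus_token token,
				   uint32_t parent_supported_flags)
{
	switch (token) {
	case DEMO_BUS_PCI_DEVICE_MSI:
	case DEMO_BUS_PCI_DEVICE_MSIX:
		return true;
	case DEMO_BUS_PCI_DEVICE_IMS:
		return (parent_supported_flags & DEMO_FLAG_PCI_IMS) != 0;
	default:
		/* The kernel warns once here; the sketch just rejects. */
		return false;
	}
}
```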

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232327.022658817@linutronix.de

---
 arch/x86/kernel/apic/msi.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 682f51a..35d5b8f 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -184,6 +184,7 @@ static int x86_msi_prepare(struct irq_domain *domain, struct device *dev,
 		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
 		return 0;
 	case DOMAIN_BUS_PCI_DEVICE_MSIX:
+	case DOMAIN_BUS_PCI_DEVICE_IMS:
 		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
 		return 0;
 	default:
@@ -230,6 +231,10 @@ static bool x86_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 	case DOMAIN_BUS_PCI_DEVICE_MSI:
 	case DOMAIN_BUS_PCI_DEVICE_MSIX:
 		break;
+	case DOMAIN_BUS_PCI_DEVICE_IMS:
+		if (!(pops->supported_flags & MSI_FLAG_PCI_IMS))
+			return false;
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		return false;


* [tip: irq/core] PCI/MSI: Provide pci_ims_alloc/free_irq()
  2022-11-24 23:26 ` [patch V3 29/33] PCI/MSI: Provide pci_ims_alloc/free_irq() Thomas Gleixner
  2022-11-28  4:47   ` Tian, Kevin
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Bjorn Helgaas, Marc Zyngier, x86,
	linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     927889e50cc1f5364ae9ebc2065734dbdfa34362
Gitweb:        https://git.kernel.org/tip/927889e50cc1f5364ae9ebc2065734dbdfa34362
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:31 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:04 +01:00

PCI/MSI: Provide pci_ims_alloc/free_irq()

Provide a single vector allocation function which allocates the next free
index in the IMS space, and a free function which releases it again.

All allocated vectors are also released via pci_free_irq_vectors(), which
releases MSI/MSI-X vectors as well.
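
The return convention of the allocation function can be sketched in user
space as follows. This is a toy model: the demo_* names, the fixed table
size, the -ENOSPC error code and the fake virq numbers are all assumptions
made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Same two fields as struct msi_map. */
struct demo_msi_map {
	int	index;
	int	virq;
};

#define DEMO_NR_SLOTS	4

static bool demo_slot_used[DEMO_NR_SLOTS];

/* Success means index >= 0 and virq > 0, as the free path checks. */
static bool demo_map_is_valid(struct demo_msi_map map)
{
	return map.index >= 0 && map.virq > 0;
}

/* Allocate the next free index; on failure index carries the error code. */
static struct demo_msi_map demo_ims_alloc(void)
{
	struct demo_msi_map map = { .index = -ENOSPC, .virq = 0 };

	for (int i = 0; i < DEMO_NR_SLOTS; i++) {
		if (!demo_slot_used[i]) {
			demo_slot_used[i] = true;
			map.index = i;
			map.virq = 100 + i;	/* fake Linux interrupt number */
			break;
		}
	}
	return map;
}

/* Release one index; invalid maps are ignored. */
static void demo_ims_free(struct demo_msi_map map)
{
	if (!demo_map_is_valid(map) || map.index >= DEMO_NR_SLOTS)
		return;
	demo_slot_used[map.index] = false;
}
```

A caller checks map.index for a negative error code after allocation and
later hands the same map back to the free function.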

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.961711347@linutronix.de

---
 drivers/pci/msi/api.c | 50 ++++++++++++++++++++++++++++++++++++++++++-
 include/linux/pci.h   |  3 +++-
 2 files changed, 53 insertions(+)

diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index c8816db..b8009aa 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -366,6 +366,56 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
 EXPORT_SYMBOL(pci_irq_get_affinity);
 
 /**
+ * pci_ims_alloc_irq - Allocate an interrupt on a PCI/IMS interrupt domain
+ * @dev:	The PCI device to operate on
+ * @icookie:	Pointer to an IMS implementation specific cookie for this
+ *		IMS instance (PASID, queue ID, pointer...).
+ *		The cookie content is copied into the MSI descriptor for the
+ *		interrupt chip callbacks or domain specific setup functions.
+ * @affdesc:	Optional pointer to an interrupt affinity descriptor
+ *
+ * There is no index for IMS allocations as IMS is an implementation
+ * specific storage and does not have any direct associations between
+ * index, which might be a pure software construct, and device
+ * functionality. This association is established by the driver either via
+ * the index - if there is a hardware table - or in case of purely software
+ * managed IMS implementation the association happens via the
+ * irq_write_msi_msg() callback of the implementation specific interrupt
+ * chip, which utilizes the provided @icookie to store the MSI message in
+ * the appropriate place.
+ *
+ * Return: A struct msi_map
+ *
+ *	On success msi_map::index contains the allocated index (>= 0) and
+ *	msi_map::virq the allocated Linux interrupt number (> 0).
+ *
+ *	On failure msi_map::index contains the error code and msi_map::virq
+ *	is set to 0.
+ */
+struct msi_map pci_ims_alloc_irq(struct pci_dev *dev, union msi_instance_cookie *icookie,
+				 const struct irq_affinity_desc *affdesc)
+{
+	return msi_domain_alloc_irq_at(&dev->dev, MSI_SECONDARY_DOMAIN, MSI_ANY_INDEX,
+				       affdesc, icookie);
+}
+EXPORT_SYMBOL_GPL(pci_ims_alloc_irq);
+
+/**
+ * pci_ims_free_irq - Free an interrupt on a PCI/IMS interrupt domain
+ *		      which was allocated via pci_ims_alloc_irq()
+ * @dev:	The PCI device to operate on
+ * @map:	A struct msi_map describing the interrupt to free as
+ *		returned from pci_ims_alloc_irq()
+ */
+void pci_ims_free_irq(struct pci_dev *dev, struct msi_map map)
+{
+	if (WARN_ON_ONCE(map.index < 0 || map.virq <= 0))
+		return;
+	msi_domain_free_irqs_range(&dev->dev, MSI_SECONDARY_DOMAIN, map.index, map.index);
+}
+EXPORT_SYMBOL_GPL(pci_ims_free_irq);
+
+/**
  * pci_free_irq_vectors() - Free previously allocated IRQs for a device
  * @dev: the PCI device to operate on
  *
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1592b63..aa514b5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2491,6 +2491,9 @@ struct msi_domain_template;
 
 bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
 			   unsigned int hwsize, void *data);
+struct msi_map pci_ims_alloc_irq(struct pci_dev *pdev, union msi_instance_cookie *icookie,
+				 const struct irq_affinity_desc *affdesc);
+void pci_ims_free_irq(struct pci_dev *pdev, struct msi_map map);
 
 #include <linux/dma-mapping.h>
 


* [tip: irq/core] genirq/msi: Provide constants for PCI/IMS support
  2022-11-24 23:26 ` [patch V3 27/33] genirq/msi: Provide constants for PCI/IMS support Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     52a50e8785c28723c15867a07c5d5ee3b2ed1c25
Gitweb:        https://git.kernel.org/tip/52a50e8785c28723c15867a07c5d5ee3b2ed1c25
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:28 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:04 +01:00

genirq/msi: Provide constants for PCI/IMS support

Provide the necessary constants for PCI/IMS support:

  - A new bus token for MSI irqdomain identification
  - A MSI feature flag for the MSI irqdomains to signal support
  - A secondary domain id

The latter expands the device internal domain pointer storage array from 1
to 2 entries. That extra pointer is mostly unused today, but the
alternative solutions would not be free either and would introduce more
complexity all over the place. Trade the 8 bytes for simplicity.
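
To illustrate the storage trade-off, here is a small user-space sketch. The
demo_* names are hypothetical and the plain pointer array is an assumed
simplification of the per device MSI data, based on the description above.

```c
#include <stddef.h>

/* Mirrors the shape of enum msi_domain_ids after this change. */
enum demo_msi_domain_ids {
	DEMO_MSI_DEFAULT_DOMAIN,
	DEMO_MSI_SECONDARY_DOMAIN,
	DEMO_MSI_MAX_DEVICE_IRQDOMAINS,
};

struct demo_irq_domain;

/* Assumed layout: one domain pointer per domain id. */
struct demo_msi_device_data {
	struct demo_irq_domain	*domains[DEMO_MSI_MAX_DEVICE_IRQDOMAINS];
};

/* Bytes spent on the domain pointer storage of one device. */
static size_t demo_domain_storage_bytes(void)
{
	return sizeof(((struct demo_msi_device_data *)0)->domains);
}
```

With two domain ids the array holds two pointers, so on a 64-bit system the
secondary domain costs one extra 8 byte pointer per device.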

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.846169830@linutronix.de

---
 include/linux/irqdomain_defs.h | 1 +
 include/linux/msi.h            | 2 ++
 include/linux/msi_api.h        | 1 +
 3 files changed, 4 insertions(+)

diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index 0b2d8a8..c29921f 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -25,6 +25,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_PCI_DEVICE_MSIX,
 	DOMAIN_BUS_DMAR,
 	DOMAIN_BUS_AMDVI,
+	DOMAIN_BUS_PCI_DEVICE_IMS,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 3cb1586..a112b91 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -559,6 +559,8 @@ enum {
 	MSI_FLAG_MSIX_CONTIGUOUS	= (1 << 19),
 	/* PCI/MSI-X vectors can be dynamically allocated/freed post MSI-X enable */
 	MSI_FLAG_PCI_MSIX_ALLOC_DYN	= (1 << 20),
+	/* Support for PCI/IMS */
+	MSI_FLAG_PCI_IMS		= (1 << 21),
 };
 
 /**
diff --git a/include/linux/msi_api.h b/include/linux/msi_api.h
index 5ae72d1..391087a 100644
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -15,6 +15,7 @@ struct device;
  */
 enum msi_domain_ids {
 	MSI_DEFAULT_DOMAIN,
+	MSI_SECONDARY_DOMAIN,
 	MSI_MAX_DEVICE_IRQDOMAINS,
 };
 


* [tip: irq/core] x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN
  2022-11-24 23:26 ` [patch V3 26/33] x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     3ec4b570fe9fbd5ae91d354d7a157a11f7dcf714
Gitweb:        https://git.kernel.org/tip/3ec4b570fe9fbd5ae91d354d7a157a11f7dcf714
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:26 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:04 +01:00

x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN

x86 MSI irqdomains can handle MSI-X allocation post MSI-X enable out of the
box, both on the vector domain and on the remapping domains.

Add the feature flag to the supported feature list.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.787373104@linutronix.de

---
 arch/x86/include/asm/msi.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index 7702958..935c6d4 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -63,7 +63,7 @@ struct msi_msg;
 u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid);
 
 #define X86_VECTOR_MSI_FLAGS_SUPPORTED					\
-	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX)
+	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN)
 
 #define X86_VECTOR_MSI_FLAGS_REQUIRED					\
 	(MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS)


* [tip: irq/core] PCI/MSI: Provide prepare_desc() MSI domain op
  2022-11-24 23:26 ` [patch V3 24/33] PCI/MSI: Provide prepare_desc() MSI domain op Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Jason Gunthorpe, Kevin Tian, Bjorn Helgaas,
	Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     d9abbfee95e4a80dfbdf7030d0541a315ee2879b
Gitweb:        https://git.kernel.org/tip/d9abbfee95e4a80dfbdf7030d0541a315ee2879b
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:23 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:04 +01:00

PCI/MSI: Provide prepare_desc() MSI domain op

The setup of MSI descriptors for PCI/MSI-X interrupts depends partially on
the MSI index for which the descriptor is initialized.

Dynamic MSI-X vector allocation post MSI-X enablement allows allocating
vectors at a given index or at any free index in the available table
range. The latter requires that the descriptor is initialized after the
MSI core has chosen an index.

Implement the prepare_desc() op in the PCI/MSI-X specific msi_domain_ops
which is invoked before the core interrupt descriptor and the associated
Linux interrupt number is allocated.

That callback is also provided for the upcoming PCI/IMS implementations so
the implementation specific interrupt domains can do their domain specific
initialization of the MSI descriptors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.673658806@linutronix.de

---
 drivers/pci/msi/irqdomain.c |  9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index 4736403..8afaef1 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -202,6 +202,14 @@ static void pci_irq_unmask_msix(struct irq_data *data)
 	pci_msix_unmask(irq_data_get_msi_desc(data));
 }
 
+static void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
+				  struct msi_desc *desc)
+{
+	/* Don't fiddle with preallocated MSI descriptors */
+	if (!desc->pci.mask_base)
+		msix_prepare_msi_desc(to_pci_dev(desc->dev), desc);
+}
+
 static const struct msi_domain_template pci_msix_template = {
 	.chip = {
 		.name			= "PCI-MSIX",
@@ -212,6 +220,7 @@ static const struct msi_domain_template pci_msix_template = {
 	},
 
 	.ops = {
+		.prepare_desc		= pci_msix_prepare_desc,
 		.set_desc		= pci_device_domain_set_desc,
 	},
 


* [tip: irq/core] PCI/MSI: Provide IMS (Interrupt Message Store) support
  2022-11-24 23:26 ` [patch V3 28/33] PCI/MSI: Provide IMS (Interrupt Message Store) support Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2024-03-27 16:32   ` [patch V3 28/33] " Bjorn Helgaas
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     4c528684eaf6bf5aa9b4a481ae569729af772bfd
Gitweb:        https://git.kernel.org/tip/4c528684eaf6bf5aa9b4a481ae569729af772bfd
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:29 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:04 +01:00

PCI/MSI: Provide IMS (Interrupt Message Store) support

IMS (Interrupt Message Store) is a new specification which allows
implementation specific storage of MSI messages contrary to the
strict standard specified MSI and MSI-X message stores.

This requires new device specific interrupt domains to handle the
implementation defined storage which can be an array in device memory or
host/guest memory which is shared with hardware queues.

Add a function to create IMS domains for PCI devices. IMS domains are using
the new per device domain mechanism and are configured by the device driver
via a template. IMS domains are created as secondary device domains so they
work side by side with MSI[-X] on the same device.

The IMS domains have a few constraints:

  - The index space is managed by the core code.

    Device memory based IMS provides a storage array with a fixed size
    which obviously requires an index. But there is no association between
    index and functionality so the core can randomly allocate an index in
    the array.

    System memory based IMS does not have the concept of an index as the
    storage is somewhere in memory. In that case the index is purely
    software based to keep track of the allocations.

  - There is no requirement for consecutive index ranges

    This is currently a limitation of the MSI core and can be implemented
    if there is a justified use case by changing the internal storage from
    xarray to maple_tree. For now it's single vector allocation.

  - The interrupt chip must provide the following callbacks:

  	- irq_mask()
	- irq_unmask()
	- irq_write_msi_msg()

   - The interrupt chip must provide the following optional callbacks
     when the irq_mask(), irq_unmask() and irq_write_msi_msg() callbacks
     cannot operate directly on hardware, e.g. in the case that the
     interrupt message store is in queue memory:

     	- irq_bus_lock()
	- irq_bus_unlock()

     These callbacks are invoked from preemptible task context and are
     allowed to sleep. In this case the mandatory callbacks above just
     store the information. The irq_bus_unlock() callback is supposed to
     make the change effective before returning.

   - Interrupt affinity setting is handled by the underlying parent
     interrupt domain and communicated to the IMS domain via
     irq_write_msi_msg(). IMS domains cannot have an irq_set_affinity()
     callback. That's a reasonable restriction similar to the PCI/MSI
     device domain implementations.

The domain is automatically destroyed when the PCI device is removed.
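
The template checks which pci_create_ims_domain() applies can be modelled
in plain userspace C. This is a minimal sketch and not the kernel
implementation: all model_* names, the flag bit positions and the bus token
values are hypothetical stand-ins so that the snippet compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's */
enum model_bus_token {
	MODEL_BUS_PCI_DEVICE_MSIX = 1,
	MODEL_BUS_PCI_DEVICE_IMS,
};

#define MODEL_FLAG_ALLOC_SIMPLE_MSI_DESCS	(1u << 0)
#define MODEL_FLAG_FREE_MSI_DESCS		(1u << 1)

struct model_irq_data { unsigned int index; };

struct model_irq_chip {
	void (*irq_mask)(struct model_irq_data *d);
	void (*irq_unmask)(struct model_irq_data *d);
	void (*irq_write_msi_msg)(struct model_irq_data *d, unsigned int msg);
	int  (*irq_set_affinity)(struct model_irq_data *d, unsigned int cpu);
};

struct model_template {
	struct model_irq_chip	chip;
	enum model_bus_token	bus_token;
	unsigned int		flags;
};

/* Same order as in pci_create_ims_domain(): token, flags, callbacks */
static bool model_ims_template_valid(const struct model_template *t)
{
	const unsigned int required = MODEL_FLAG_ALLOC_SIMPLE_MSI_DESCS |
				      MODEL_FLAG_FREE_MSI_DESCS;

	if (t->bus_token != MODEL_BUS_PCI_DEVICE_IMS)
		return false;
	if ((t->flags & required) != required)
		return false;
	if (!t->chip.irq_mask || !t->chip.irq_unmask || !t->chip.irq_write_msi_msg)
		return false;
	/* Affinity is handled by the parent domain, so this must stay unset */
	return t->chip.irq_set_affinity == NULL;
}

/* Dummy callbacks so that a template can be populated */
static void model_mask(struct model_irq_data *d) { (void)d; }
static void model_unmask(struct model_irq_data *d) { (void)d; }
static void model_write_msg(struct model_irq_data *d, unsigned int msg)
{
	(void)d;
	(void)msg;
}
static int model_set_affinity(struct model_irq_data *d, unsigned int cpu)
{
	(void)d;
	(void)cpu;
	return 0;
}

/* A template which satisfies all of the constraints */
static struct model_template model_good_template(void)
{
	struct model_template t = {
		.chip = {
			.irq_mask		= model_mask,
			.irq_unmask		= model_unmask,
			.irq_write_msi_msg	= model_write_msg,
		},
		.bus_token	= MODEL_BUS_PCI_DEVICE_IMS,
		.flags		= MODEL_FLAG_ALLOC_SIMPLE_MSI_DESCS |
				  MODEL_FLAG_FREE_MSI_DESCS,
	};
	return t;
}
```

A template which additionally sets irq_set_affinity, here via the dummy
model_set_affinity(), is rejected by the model in the same way as a
template which lacks one of the three mandatory callbacks.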

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.904316841@linutronix.de

---
 drivers/pci/msi/irqdomain.c | 59 ++++++++++++++++++++++++++++++++++++-
 include/linux/pci.h         |  5 +++-
 2 files changed, 64 insertions(+)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index deb1930..e33bcc8 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -355,6 +355,65 @@ bool pci_msi_domain_supports(struct pci_dev *pdev, unsigned int feature_mask,
 	return (supported & feature_mask) == feature_mask;
 }
 
+/**
+ * pci_create_ims_domain - Create a secondary IMS domain for a PCI device
+ * @pdev:	The PCI device to operate on
+ * @template:	The MSI info template which describes the domain
+ * @hwsize:	The size of the hardware entry table or 0 if the domain
+ *		is purely software managed
+ * @data:	Optional pointer to domain specific data to be stored
+ *		in msi_domain_info::data
+ *
+ * Return: True on success, false otherwise
+ *
+ * An IMS domain is expected to have the following constraints:
+ *	- The index space is managed by the core code
+ *
+ *	- There is no requirement for consecutive index ranges
+ *
+ *	- The interrupt chip must provide the following callbacks:
+ *		- irq_mask()
+ *		- irq_unmask()
+ *		- irq_write_msi_msg()
+ *
+ *	- The interrupt chip must provide the following optional callbacks
+ *	  when the irq_mask(), irq_unmask() and irq_write_msi_msg() callbacks
+ *	  cannot operate directly on hardware, e.g. in the case that the
+ *	  interrupt message store is in queue memory:
+ *		- irq_bus_lock()
+ *		- irq_bus_unlock()
+ *
+ *	  These callbacks are invoked from preemptible task context and are
+ *	  allowed to sleep. In this case the mandatory callbacks above just
+ *	  store the information. The irq_bus_unlock() callback is supposed
+ *	  to make the change effective before returning.
+ *
+ *	- Interrupt affinity setting is handled by the underlying parent
+ *	  interrupt domain and communicated to the IMS domain via
+ *	  irq_write_msi_msg().
+ *
+ * The domain is automatically destroyed when the PCI device is removed.
+ */
+bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
+			   unsigned int hwsize, void *data)
+{
+	struct irq_domain *domain = dev_get_msi_domain(&pdev->dev);
+
+	if (!domain || !irq_domain_is_msi_parent(domain))
+		return false;
+
+	if (template->info.bus_token != DOMAIN_BUS_PCI_DEVICE_IMS ||
+	    !(template->info.flags & MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS) ||
+	    !(template->info.flags & MSI_FLAG_FREE_MSI_DESCS) ||
+	    !template->chip.irq_mask || !template->chip.irq_unmask ||
+	    !template->chip.irq_write_msi_msg || template->chip.irq_set_affinity)
+		return false;
+
+	return msi_create_device_irq_domain(&pdev->dev, MSI_SECONDARY_DOMAIN, template,
+					    hwsize, data, NULL);
+}
+EXPORT_SYMBOL_GPL(pci_create_ims_domain);
+
 /*
  * Users of the generic MSI infrastructure expect a device to have a single ID,
  * so with DMA aliases we have to pick the least-worst compromise. Devices with
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 68b14ba..1592b63 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2487,6 +2487,11 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
 void pci_uevent_ers(struct pci_dev *pdev, enum  pci_ers_result err_type);
 #endif
 
+struct msi_domain_template;
+
+bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
+			   unsigned int hwsize, void *data);
+
 #include <linux/dma-mapping.h>
 
 #define pci_printk(level, pdev, fmt, arg...) \


* [tip: irq/core] genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN
  2022-11-24 23:26 ` [patch V3 22/33] genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Jason Gunthorpe, Kevin Tian, Marc Zyngier, x86,
	linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     656013915af76b199827f26e18776d897d2b7e7e
Gitweb:        https://git.kernel.org/tip/656013915af76b199827f26e18776d897d2b7e7e
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:20 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:03 +01:00

genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN

Provide a new MSI feature flag in preparation for dynamic MSI-X allocation
after the initial MSI-X enable has been done.

This needs to be an explicit MSI interrupt domain feature because quite
some implementations (both interrupt domains and legacy allocation mode)
have clear expectations that the allocation code is only invoked when MSI-X
is about to be enabled. They either talk to hypervisors or do some other
work and are not prepared to be invoked on an already MSI-X enabled device.

This is also explicit MSI-X only because rewriting the size of the MSI
entries is only possible when disabling MSI which in turn might cause lost
interrupts on the device.
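
Because the capability is advertised per interrupt domain, a caller can
test for the flag before it attempts a post-enable allocation. The sketch
below models that test in userspace C, assuming the
(supported & mask) == mask comparison which pci_msi_domain_supports() uses.
The model_* names, the bit chosen for plain MSI-X support and the -1 error
value are hypothetical.

```c
#include <stdbool.h>

/* Bit positions of the two flags as shown in include/linux/msi.h */
#define MODEL_FLAG_MSIX_CONTIGUOUS	(1u << 19)
#define MODEL_FLAG_PCI_MSIX_ALLOC_DYN	(1u << 20)
/* Hypothetical bit for plain MSI-X support, chosen for illustration only */
#define MODEL_FLAG_PCI_MSIX		(1u << 3)

/* True only if every bit in @feature_mask is set in @supported */
static bool model_domain_supports(unsigned int supported, unsigned int feature_mask)
{
	return (supported & feature_mask) == feature_mask;
}

/* Refuse post-enable allocation on domains which do not advertise it */
static int model_alloc_post_enable(unsigned int supported)
{
	if (!model_domain_supports(supported, MODEL_FLAG_PCI_MSIX_ALLOC_DYN))
		return -1;	/* hypothetical error value */
	return 0;
}
```

A domain which advertises only MODEL_FLAG_PCI_MSIX fails the check, which
keeps implementations that talk to a hypervisor at enable time from being
invoked on an already enabled device.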

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.558843119@linutronix.de

---
 include/linux/msi.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 00c5019..3cb1586 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -557,7 +557,8 @@ enum {
 	MSI_FLAG_LEVEL_CAPABLE		= (1 << 18),
 	/* MSI-X entries must be contiguous */
 	MSI_FLAG_MSIX_CONTIGUOUS	= (1 << 19),
-
+	/* PCI/MSI-X vectors can be dynamically allocated/freed post MSI-X enable */
+	MSI_FLAG_PCI_MSIX_ALLOC_DYN	= (1 << 20),
 };
 
 /**


* [tip: irq/core] PCI/MSI: Split MSI-X descriptor setup
  2022-11-24 23:26 ` [patch V3 23/33] PCI/MSI: Split MSI-X descriptor setup Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Bjorn Helgaas, Marc Zyngier, x86,
	linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     4b844ea1a6bab042be5644a4e88e2fc19bbe853f
Gitweb:        https://git.kernel.org/tip/4b844ea1a6bab042be5644a4e88e2fc19bbe853f
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:21 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:03 +01:00

PCI/MSI: Split MSI-X descriptor setup

The upcoming mechanism to allocate MSI-X vectors after enabling MSI-X needs
to share some of the MSI-X descriptor setup.

The regular descriptor setup on enable has the following code flow:

    1) Allocate descriptor
    2) Setup descriptor with PCI specific data
    3) Insert descriptor
    4) Allocate interrupts which in turn scans the inserted
       descriptors

This cannot be easily changed because the PCI/MSI code needs to handle the
legacy architecture specific allocation model and the irq domain model
where quite some domains have the assumption that the above flow is how it
works.

Ideally the code flow should look like this:

   1) Invoke allocation at the MSI core
   2) MSI core allocates descriptor
   3) MSI core calls back into the irq domain which fills in
      the domain specific parts

This could be done for underlying parent MSI domains which support
post-enable allocation/free but that would create significantly different
code paths for MSI/MSI-X enable.

Though for dynamic allocation which wants to share the allocation code with
the upcoming PCI/IMS support it's the right thing to do.

Split the MSI-X descriptor setup into the preallocation part which just sets
the index and fills in the horrible hack of virtual IRQs and the real PCI
specific MSI-X setup part which solely depends on the index in the
descriptor. This allows to provide a common dynamic allocation interface at
the MSI core level for both PCI/MSI-X and PCI/IMS.
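
The split can be modelled in userspace C as two small functions. This is
only a sketch under simplifying assumptions: the model_* structures are
hypothetical, carry just a subset of the real descriptor fields and leave
out the MMIO read of the vector control word.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_dev {
	void		*msix_base;	/* mapped MSI-X table */
	unsigned int	vec_count;	/* number of hardware table entries */
	bool		ignore_mask;	/* models pci_msi_ignore_mask */
};

struct model_desc {
	unsigned int	msi_index;
	unsigned int	nvec_used;
	bool		is_virtual;
	bool		is_msix;
	bool		can_mask;
	void		*mask_base;
};

/* Preallocation part: only the index and the virtual IRQ hack */
static void model_prealloc_desc(struct model_desc *desc, const struct model_dev *dev,
				unsigned int index)
{
	memset(desc, 0, sizeof(*desc));
	desc->msi_index  = index;
	desc->is_virtual = index >= dev->vec_count;
}

/* PCI specific part: depends solely on what the first part stored */
static void model_prepare_desc(const struct model_dev *dev, struct model_desc *desc)
{
	desc->nvec_used	= 1;
	desc->is_msix	= true;
	desc->mask_base	= dev->msix_base;
	desc->can_mask	= !dev->ignore_mask && !desc->is_virtual;
}
```

Since the second function needs nothing but the device and a descriptor
which already carries an index, it can be invoked from the enable path as
well as from a callback which runs after the core has chosen the index.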

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.616292598@linutronix.de

---
 drivers/pci/msi/msi.c | 72 ++++++++++++++++++++++++++----------------
 drivers/pci/msi/msi.h |  2 +-
 2 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index b8d74df..1f71662 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -569,34 +569,56 @@ static void __iomem *msix_map_region(struct pci_dev *dev,
 	return ioremap(phys_addr, nr_entries * PCI_MSIX_ENTRY_SIZE);
 }
 
-static int msix_setup_msi_descs(struct pci_dev *dev, void __iomem *base,
-				struct msix_entry *entries, int nvec,
-				struct irq_affinity_desc *masks)
+/**
+ * msix_prepare_msi_desc - Prepare a half initialized MSI descriptor for operation
+ * @dev:	The PCI device for which the descriptor is prepared
+ * @desc:	The MSI descriptor for preparation
+ *
+ * This is separate from msix_setup_msi_descs() below to handle dynamic
+ * allocations for MSI-X after initial enablement.
+ *
+ * Ideally the whole MSI-X setup would work that way, but there is no way to
+ * support this for the legacy arch_setup_msi_irqs() mechanism and for the
+ * fake irq domains like the x86 XEN one. Sigh...
+ *
+ * The descriptor is zeroed and only @desc::msi_index and @desc::affinity
+ * are set. When called from msix_setup_msi_descs() then the is_virtual
+ * attribute is initialized as well.
+ *
+ * Fill in the rest.
+ */
+void msix_prepare_msi_desc(struct pci_dev *dev, struct msi_desc *desc)
+{
+	desc->nvec_used				= 1;
+	desc->pci.msi_attrib.is_msix		= 1;
+	desc->pci.msi_attrib.is_64		= 1;
+	desc->pci.msi_attrib.default_irq	= dev->irq;
+	desc->pci.mask_base			= dev->msix_base;
+	desc->pci.msi_attrib.can_mask		= !pci_msi_ignore_mask &&
+						  !desc->pci.msi_attrib.is_virtual;
+
+	if (desc->pci.msi_attrib.can_mask) {
+		void __iomem *addr = pci_msix_desc_addr(desc);
+
+		desc->pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
+	}
+}
+
+static int msix_setup_msi_descs(struct pci_dev *dev, struct msix_entry *entries,
+				int nvec, struct irq_affinity_desc *masks)
 {
 	int ret = 0, i, vec_count = pci_msix_vec_count(dev);
 	struct irq_affinity_desc *curmsk;
 	struct msi_desc desc;
-	void __iomem *addr;
 
 	memset(&desc, 0, sizeof(desc));
 
-	desc.nvec_used			= 1;
-	desc.pci.msi_attrib.is_msix	= 1;
-	desc.pci.msi_attrib.is_64	= 1;
-	desc.pci.msi_attrib.default_irq	= dev->irq;
-	desc.pci.mask_base		= base;
-
 	for (i = 0, curmsk = masks; i < nvec; i++, curmsk++) {
 		desc.msi_index = entries ? entries[i].entry : i;
 		desc.affinity = masks ? curmsk : NULL;
 		desc.pci.msi_attrib.is_virtual = desc.msi_index >= vec_count;
-		desc.pci.msi_attrib.can_mask = !pci_msi_ignore_mask &&
-					       !desc.pci.msi_attrib.is_virtual;
 
-		if (desc.pci.msi_attrib.can_mask) {
-			addr = pci_msix_desc_addr(&desc);
-			desc.pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
-		}
+		msix_prepare_msi_desc(dev, &desc);
 
 		ret = msi_insert_msi_desc(&dev->dev, &desc);
 		if (ret)
@@ -629,9 +651,8 @@ static void msix_mask_all(void __iomem *base, int tsize)
 		writel(ctrl, base + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }
 
-static int msix_setup_interrupts(struct pci_dev *dev, void __iomem *base,
-				 struct msix_entry *entries, int nvec,
-				 struct irq_affinity *affd)
+static int msix_setup_interrupts(struct pci_dev *dev, struct msix_entry *entries,
+				 int nvec, struct irq_affinity *affd)
 {
 	struct irq_affinity_desc *masks = NULL;
 	int ret;
@@ -640,7 +661,7 @@ static int msix_setup_interrupts(struct pci_dev *dev, void __iomem *base,
 		masks = irq_create_affinity_masks(nvec, affd);
 
 	msi_lock_descs(&dev->dev);
-	ret = msix_setup_msi_descs(dev, base, entries, nvec, masks);
+	ret = msix_setup_msi_descs(dev, entries, nvec, masks);
 	if (ret)
 		goto out_free;
 
@@ -678,7 +699,6 @@ out_unlock:
 static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 				int nvec, struct irq_affinity *affd)
 {
-	void __iomem *base;
 	int ret, tsize;
 	u16 control;
 
@@ -696,15 +716,13 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
 	/* Request & Map MSI-X table region */
 	tsize = msix_table_size(control);
-	base = msix_map_region(dev, tsize);
-	if (!base) {
+	dev->msix_base = msix_map_region(dev, tsize);
+	if (!dev->msix_base) {
 		ret = -ENOMEM;
 		goto out_disable;
 	}
 
-	dev->msix_base = base;
-
-	ret = msix_setup_interrupts(dev, base, entries, nvec, affd);
+	ret = msix_setup_interrupts(dev, entries, nvec, affd);
 	if (ret)
 		goto out_disable;
 
@@ -719,7 +737,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	 * which takes the MSI-X mask bits into account even
 	 * when MSI-X is disabled, which prevents MSI delivery.
 	 */
-	msix_mask_all(base, tsize);
+	msix_mask_all(dev->msix_base, tsize);
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 
 	pcibios_free_irq(dev);
diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
index 74408cc..ee53cf0 100644
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -84,6 +84,8 @@ static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc)
 	return (1 << (1 << desc->pci.msi_attrib.multi_cap)) - 1;
 }
 
+void msix_prepare_msi_desc(struct pci_dev *dev, struct msi_desc *desc);
+
 /* Subsystem variables */
 extern int pci_msi_enable;
 


* [tip: irq/core] genirq/msi: Provide msi_domain_alloc_irq_at()
  2022-11-24 23:26 ` [patch V3 21/33] genirq/msi: Provide msi_domain_alloc_irq_at() Thomas Gleixner
  2022-11-28 14:39   ` Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     37fdc15ffe05c0734c30eadb06fdec7a1dbb2702
Gitweb:        https://git.kernel.org/tip/37fdc15ffe05c0734c30eadb06fdec7a1dbb2702
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:18 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:03 +01:00

genirq/msi: Provide msi_domain_alloc_irq_at()

For supporting post MSI-X enable allocations and for the upcoming PCI/IMS
support a separate interface is required which allows not only the
allocation of a specific index, but also the allocation of any, i.e. the
next free index. The latter is especially required for IMS because IMS
completely does away with index to functionality mappings which are
often found in MSI/MSI-X implementations.

But even with MSI-X there are devices where only the first few indices have
a fixed functionality and the rest is freely assignable by software,
e.g. to queues.

msi_domain_alloc_irq_at() is also different from the range based interfaces
as it always enforces that the MSI descriptor is allocated by the core code
and not preallocated by the caller like the PCI/MSI[-X] enable code path
does.

msi_domain_alloc_irq_at() can be invoked with the index argument set to
MSI_ANY_INDEX which makes the core code pick the next free index. The irq
domain can provide a prepare_desc() operation callback in its
msi_domain_ops to do domain specific post allocation initialization before
the actual Linux interrupt and the associated interrupt descriptor and
hierarchy allocations are conducted.

The function also takes an optional @icookie argument which is of type
union msi_instance_cookie. This cookie is not used by the core code and is
stored in the allocated msi_desc::data::icookie. The meaning of the cookie
is completely implementation defined. In case of IMS this might be a PASID
or a pointer to a device queue, but for the MSI core it's opaque and not
used in any way.

The function returns a struct msi_map which on success contains the
allocated index number and the Linux interrupt number so the caller can
spare the index to Linux interrupt number lookup.

On failure map::index contains the error code and map::virq is 0.
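
The index selection and the return convention can be sketched in userspace
C. This is a model under stated assumptions and not the kernel code: a
fixed size bool array stands in for the xarray, MODEL_HWSIZE and the
next_virq counter are hypothetical, and the error codes follow what
xa_alloc() and xa_insert() report for a full range or an occupied slot.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_ANY_INDEX	UINT_MAX	/* same value as MSI_ANY_INDEX */
#define MODEL_HWSIZE	4u		/* hypothetical table size */

struct model_map {
	int index;	/* allocated index, or negative error code */
	int virq;	/* Linux interrupt number, 0 on failure */
};

struct model_domain {
	bool used[MODEL_HWSIZE];
	int  next_virq;	/* hypothetical source of interrupt numbers */
};

static struct model_map model_alloc_irq_at(struct model_domain *dom, unsigned int index)
{
	struct model_map map = { .index = 0, .virq = 0 };

	if (index == MODEL_ANY_INDEX) {
		/* Pick the lowest free slot within the table */
		for (index = 0; index < MODEL_HWSIZE && dom->used[index]; index++)
			;
		if (index == MODEL_HWSIZE) {
			map.index = -EBUSY;
			return map;
		}
	} else if (index >= MODEL_HWSIZE) {
		map.index = -ERANGE;
		return map;
	} else if (dom->used[index]) {
		map.index = -EBUSY;
		return map;
	}

	dom->used[index] = true;
	map.index = (int)index;
	map.virq  = dom->next_virq++;
	return map;
}
```

A caller of such an interface would typically test map.index for a
negative value first and use map.virq only when that test passes.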

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.501359457@linutronix.de

---
 include/linux/msi.h     |   4 +-
 include/linux/msi_api.h |   7 +++-
 kernel/irq/msi.c        | 105 +++++++++++++++++++++++++++++++++++----
 3 files changed, 106 insertions(+), 10 deletions(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index cb0bee3..00c5019 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -80,6 +80,7 @@ struct pci_dev;
 struct platform_msi_priv_data;
 struct device_attribute;
 struct irq_domain;
+struct irq_affinity_desc;
 
 void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
 #ifdef CONFIG_GENERIC_MSI_IRQ
@@ -602,6 +603,9 @@ int msi_domain_alloc_irqs_range(struct device *dev, unsigned int domid,
 				unsigned int first, unsigned int last);
 int msi_domain_alloc_irqs_all_locked(struct device *dev, unsigned int domid, int nirqs);
 
+struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
+				       const struct irq_affinity_desc *affdesc,
+				       union msi_instance_cookie *cookie);
 
 void msi_domain_free_irqs_range_locked(struct device *dev, unsigned int domid,
 				       unsigned int first, unsigned int last);
diff --git a/include/linux/msi_api.h b/include/linux/msi_api.h
index 2e4456e..5ae72d1 100644
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -48,6 +48,13 @@ struct msi_map {
 	int	virq;
 };
 
+/*
+ * Constant to be used for dynamic allocations when the allocation is any
+ * free MSI index, which is either an entry in a hardware table or a
+ * software managed index.
+ */
+#define MSI_ANY_INDEX		UINT_MAX
+
 unsigned int msi_domain_get_virq(struct device *dev, unsigned int domid, unsigned int index);
 
 /**
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 077d1d1..73354c5 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -90,17 +90,30 @@ static int msi_insert_desc(struct device *dev, struct msi_desc *desc,
 	int ret;
 
 	hwsize = msi_domain_get_hwsize(dev, domid);
-	if (index >= hwsize) {
-		ret = -ERANGE;
-		goto fail;
-	}
 
-	desc->msi_index = index;
-	ret = xa_insert(xa, index, desc, GFP_KERNEL);
-	if (ret)
-		goto fail;
-	return 0;
+	if (index == MSI_ANY_INDEX) {
+		struct xa_limit limit = { .min = 0, .max = hwsize - 1 };
+		unsigned int index;
 
+		/* Let the xarray allocate a free index within the limit */
+		ret = xa_alloc(xa, &index, desc, limit, GFP_KERNEL);
+		if (ret)
+			goto fail;
+
+		desc->msi_index = index;
+		return 0;
+	} else {
+		if (index >= hwsize) {
+			ret = -ERANGE;
+			goto fail;
+		}
+
+		desc->msi_index = index;
+		ret = xa_insert(xa, index, desc, GFP_KERNEL);
+		if (ret)
+			goto fail;
+		return 0;
+	}
 fail:
 	msi_free_desc(desc);
 	return ret;
@@ -294,7 +307,7 @@ int msi_setup_device_data(struct device *dev)
 	}
 
 	for (i = 0; i < MSI_MAX_DEVICE_IRQDOMAINS; i++)
-		xa_init(&md->__domains[i].store);
+		xa_init_flags(&md->__domains[i].store, XA_FLAGS_ALLOC);
 
 	/*
 	 * If @dev::msi::domain is set and is a global MSI domain, copy the
@@ -1405,6 +1418,78 @@ int msi_domain_alloc_irqs_all_locked(struct device *dev, unsigned int domid, int
 	return msi_domain_alloc_locked(dev, &ctrl);
 }
 
+/**
+ * msi_domain_alloc_irq_at - Allocate an interrupt from a MSI interrupt domain at
+ *			     a given index - or at the next free index
+ *
+ * @dev:	Pointer to device struct of the device for which the interrupts
+ *		are allocated
+ * @domid:	Id of the interrupt domain to operate on
+ * @index:	Index for allocation. If @index == %MSI_ANY_INDEX the allocation
+ *		uses the next free index.
+ * @affdesc:	Optional pointer to an interrupt affinity descriptor structure
+ * @icookie:	Optional pointer to a domain specific per instance cookie. If
+ *		non-NULL the content of the cookie is stored in msi_desc::data.
+ *		Must be NULL for MSI-X allocations
+ *
+ * This requires a MSI interrupt domain which lets the core code manage the
+ * MSI descriptors.
+ *
+ * Return: struct msi_map
+ *
+ *	On success msi_map::index contains the allocated index number and
+ *	msi_map::virq the corresponding Linux interrupt number
+ *
+ *	On failure msi_map::index contains the error code and msi_map::virq
+ *	is %0.
+ */
+struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
+				       const struct irq_affinity_desc *affdesc,
+				       union msi_instance_cookie *icookie)
+{
+	struct msi_ctrl ctrl = { .domid	= domid, .nirqs = 1, };
+	struct irq_domain *domain;
+	struct msi_map map = { };
+	struct msi_desc *desc;
+	int ret;
+
+	msi_lock_descs(dev);
+	domain = msi_get_device_domain(dev, domid);
+	if (!domain) {
+		map.index = -ENODEV;
+		goto unlock;
+	}
+
+	desc = msi_alloc_desc(dev, 1, affdesc);
+	if (!desc) {
+		map.index = -ENOMEM;
+		goto unlock;
+	}
+
+	if (icookie)
+		desc->data.icookie = *icookie;
+
+	ret = msi_insert_desc(dev, desc, domid, index);
+	if (ret) {
+		map.index = ret;
+		goto unlock;
+	}
+
+	ctrl.first = ctrl.last = desc->msi_index;
+
+	ret = __msi_domain_alloc_irqs(dev, domain, &ctrl);
+	if (ret) {
+		map.index = ret;
+		msi_domain_free_locked(dev, &ctrl);
+	} else {
+		map.index = desc->msi_index;
+		map.virq = desc->irq;
+	}
+unlock:
+	msi_unlock_descs(dev);
+	return map;
+}
+
 static void __msi_domain_free_irqs(struct device *dev, struct irq_domain *domain,
 				   struct msi_ctrl *ctrl)
 {
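
The index handling and the return convention of msi_domain_alloc_irq_at()
shown above can be sketched in userspace. This is a simplified model and not
the kernel implementation: struct model_domain, the MODEL_* constants and the
errno values picked for "full", "out of range" and "busy" are assumptions made
for illustration. Only the two fields of struct msi_map and the "negative
index means error, virq stays 0" rule are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ANY_INDEX  (~0u)  /* stand-in for MSI_ANY_INDEX, value is hypothetical */
#define MODEL_NUM_SLOTS  4u     /* hypothetical number of usable indices */
#define MODEL_VIRQ_BASE  100    /* hypothetical first Linux interrupt number */

/* Same two fields as struct msi_map in the patch. */
struct msi_map {
	int index;
	int virq;
};

/* Hypothetical per device state: which indices are already in use. */
struct model_domain {
	bool used[MODEL_NUM_SLOTS];
};

/* Allocate at @index, or at the next free index for MODEL_ANY_INDEX. */
static struct msi_map model_alloc_irq_at(struct model_domain *dom, unsigned int index)
{
	struct msi_map map = { 0, 0 };

	if (!dom) {
		map.index = -ENODEV;
		return map;
	}

	if (index == MODEL_ANY_INDEX) {
		/* Pick the first unused index */
		for (index = 0; index < MODEL_NUM_SLOTS; index++) {
			if (!dom->used[index])
				break;
		}
		if (index == MODEL_NUM_SLOTS) {
			map.index = -ENOSPC;
			return map;
		}
	} else if (index >= MODEL_NUM_SLOTS) {
		map.index = -EINVAL;
		return map;
	} else if (dom->used[index]) {
		map.index = -EBUSY;
		return map;
	}

	assert(index < MODEL_NUM_SLOTS);
	dom->used[index] = true;

	/* Success: report the index and the associated interrupt number */
	map.index = (int)index;
	map.virq = MODEL_VIRQ_BASE + (int)index;
	return map;
}
```

A caller only has to test map.index < 0 to detect a failure. On failure
map.virq is left at 0, as the kernel-doc of the real function describes.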


* [tip: irq/core] genirq/msi: Provide msi_domain_ops::prepare_desc()
  2022-11-24 23:26 ` [patch V3 20/33] genirq/msi: Provide msi_domain_ops::prepare_desc() Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     828b3c25195d7681ea894c70057324c673755dfc
Gitweb:        https://git.kernel.org/tip/828b3c25195d7681ea894c70057324c673755dfc
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:16 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:03 +01:00

genirq/msi: Provide msi_domain_ops::prepare_desc()

The existing MSI domain ops msi_prepare() and set_desc() turned out to be
unsuitable for implementing IMS support.

msi_prepare() does not operate on the MSI descriptors. set_desc() lacks
an irq_domain pointer and has a completely different purpose.

Introduce a prepare_desc() op which allows IMS implementations to amend an
MSI descriptor which was allocated by the core code, e.g. by adjusting the
iomem base or adding some data based on the allocated index. This is way
better than requiring that all IMS domain implementations preallocate the
MSI descriptor and then allocate the interrupt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.444560717@linutronix.de

---
 include/linux/msi.h | 6 +++++-
 kernel/irq/msi.c    | 3 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index dca3b80..cb0bee3 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -410,6 +410,8 @@ struct msi_domain_info;
  * @msi_init:		Domain specific init function for MSI interrupts
  * @msi_free:		Domain specific function to free a MSI interrupts
  * @msi_prepare:	Prepare the allocation of the interrupts in the domain
+ * @prepare_desc:	Optional function to prepare the allocated MSI descriptor
+ *			in the domain
  * @set_desc:		Set the msi descriptor for an interrupt
  * @domain_alloc_irqs:	Optional function to override the default allocation
  *			function.
@@ -421,7 +423,7 @@ struct msi_domain_info;
  * @get_hwirq, @msi_init and @msi_free are callbacks used by the underlying
  * irqdomain.
  *
- * @msi_check, @msi_prepare and @set_desc are callbacks used by the
+ * @msi_check, @msi_prepare, @prepare_desc and @set_desc are callbacks used by the
  * msi_domain_alloc/free_irqs*() variants.
  *
  * @domain_alloc_irqs, @domain_free_irqs can be used to override the
@@ -444,6 +446,8 @@ struct msi_domain_ops {
 	int		(*msi_prepare)(struct irq_domain *domain,
 				       struct device *dev, int nvec,
 				       msi_alloc_info_t *arg);
+	void		(*prepare_desc)(struct irq_domain *domain, msi_alloc_info_t *arg,
+					struct msi_desc *desc);
 	void		(*set_desc)(msi_alloc_info_t *arg,
 				    struct msi_desc *desc);
 	int		(*domain_alloc_irqs)(struct irq_domain *domain,
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 0536db7..077d1d1 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1257,6 +1257,9 @@ static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain
 		if (WARN_ON_ONCE(allocated >= ctrl->nirqs))
 			return -EINVAL;
 
+		if (ops->prepare_desc)
+			ops->prepare_desc(domain, &arg, desc);
+
 		ops->set_desc(&arg, desc);
 
 		virq = __irq_domain_alloc_irqs(domain, -1, desc->nvec_used,
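
The call order added by this hunk, an optional prepare_desc() followed by the
mandatory set_desc(), can be modelled in userspace as below. All model_* types
and the slot address calculation are hypothetical. They only illustrate the
kind of per descriptor adjustment the commit message mentions ("adjusting the
iomem base or adding some data based on the allocated index").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: index plus one piece of domain specific data. */
struct model_desc {
	unsigned int	msi_index;
	unsigned long	slot_addr;
};

/* Hypothetical allocation info, filled in by set_desc(). */
struct model_alloc_info {
	struct model_desc	*desc;
	unsigned int		hwirq;
};

/* Hypothetical domain: base address and size of the message slots. */
struct model_domain {
	unsigned long	slot_base;
	unsigned long	slot_size;
};

struct model_ops {
	/* Optional, may be NULL */
	void (*prepare_desc)(struct model_domain *domain, struct model_alloc_info *arg,
			     struct model_desc *desc);
	/* Mandatory */
	void (*set_desc)(struct model_alloc_info *arg, struct model_desc *desc);
};

/* IMS-like implementation: derive the slot address from the allocated index. */
static void model_ims_prepare_desc(struct model_domain *domain, struct model_alloc_info *arg,
				   struct model_desc *desc)
{
	(void)arg;
	desc->slot_addr = domain->slot_base + desc->msi_index * domain->slot_size;
}

static void model_set_desc(struct model_alloc_info *arg, struct model_desc *desc)
{
	arg->desc = desc;
	arg->hwirq = desc->msi_index;
}

/* Mirrors the order in the hunk: prepare_desc() if present, then set_desc(). */
static void model_setup_desc(struct model_domain *domain, const struct model_ops *ops,
			     struct model_alloc_info *arg, struct model_desc *desc)
{
	assert(ops && ops->set_desc);

	if (ops->prepare_desc)
		ops->prepare_desc(domain, arg, desc);

	ops->set_desc(arg, desc);
}
```

Domains which leave prepare_desc() unset behave exactly as before, which is
why the hook can be optional.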


* [tip: irq/core] genirq/msi: Provide struct msi_map
  2022-11-24 23:26 ` [patch V3 18/33] genirq/msi: Provide struct msi_map Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     055b6b34405d0c064a3dbd2531c8fef60a64e059
Gitweb:        https://git.kernel.org/tip/055b6b34405d0c064a3dbd2531c8fef60a64e059
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:13 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:03 +01:00

genirq/msi: Provide struct msi_map

A simple struct to hold a MSI index / Linux interrupt number pair. It will
be returned from the dynamic vector allocation function and handed back to
the corresponding free() function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.326410494@linutronix.de

---
 include/linux/msi_api.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/msi_api.h b/include/linux/msi_api.h
index 8640171..4cb7f4c 100644
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -18,6 +18,19 @@ enum msi_domain_ids {
 	MSI_MAX_DEVICE_IRQDOMAINS,
 };
 
+/**
+ * msi_map - Mapping between MSI index and Linux interrupt number
+ * @index:	The MSI index, e.g. slot in the MSI-X table or
+ *		a software managed index if >= 0. If negative
+ *		the allocation function failed and it contains
+ *		the error code.
+ * @virq:	The associated Linux interrupt number
+ */
+struct msi_map {
+	int	index;
+	int	virq;
+};
+
 unsigned int msi_domain_get_virq(struct device *dev, unsigned int domid, unsigned int index);
 
 /**
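
The kernel-doc above overloads msi_map::index: a value >= 0 is a valid MSI
index, a negative value is an error code. A minimal sketch of how a user of
such a map could decode it follows. The model_* helpers are hypothetical and
not part of the kernel API. Only the struct layout is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same layout as in the patch. */
struct msi_map {
	int	index;
	int	virq;
};

/* Hypothetical helper: true if the map describes a successful allocation. */
static bool model_map_ok(struct msi_map map)
{
	return map.index >= 0;
}

/* Hypothetical helper: 0 on success, otherwise the negative error code. */
static int model_map_error(struct msi_map map)
{
	return map.index < 0 ? map.index : 0;
}

/*
 * Hypothetical helper in the usual "value or negative errno" style:
 * returns the Linux interrupt number on success, the error code otherwise.
 */
static int model_map_virq_or_error(struct msi_map map)
{
	return model_map_ok(map) ? map.virq : map.index;
}
```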


* [tip: irq/core] genirq/msi: Provide msi_desc::msi_data
  2022-11-24 23:26 ` [patch V3 19/33] genirq/msi: Provide msi_desc::msi_data Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     2f70cc0b67c0d71abd5ffa4a8de33277308c2034
Gitweb:        https://git.kernel.org/tip/2f70cc0b67c0d71abd5ffa4a8de33277308c2034
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:15 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:03 +01:00

genirq/msi: Provide msi_desc::msi_data

The upcoming support for PCI/IMS requires storing some information related
to the message handling in the MSI descriptor, e.g. PASID or a pointer to a
queue.

Provide a generic storage struct which overlays the existing PCI specific
storage in a union, so the size of struct msi_desc does not grow.

This storage struct has two elements:

  1) msi_domain_cookie
  2) msi_instance_cookie

The domain cookie is going to be used to store domain specific information,
e.g. iobase pointer, data pointer.

The instance cookie is going to be handed in when allocating an interrupt
on an IMS domain so the irq chip callbacks of the IMS domain have the
necessary per vector information available. It also comes in handy when
cleaning up the platform MSI code for wire to MSI bridges which need to
hand down the type information to the underlying interrupt domain.

For the core code the cookies are opaque and meaningless. The core just
stores the instance cookie during an allocation through the upcoming
interfaces for IMS and wire to MSI bridges.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.385036043@linutronix.de

---
 include/linux/msi.h     | 38 +++++++++++++++++++++++++++++++++++++-
 include/linux/msi_api.h | 17 +++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index b5dda4b..dca3b80 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -125,6 +125,38 @@ struct pci_msi_desc {
 	};
 };
 
+/**
+ * union msi_domain_cookie - Opaque MSI domain specific data
+ * @value:	u64 value store
+ * @ptr:	Pointer to domain specific data
+ * @iobase:	Domain specific IOmem pointer
+ *
+ * The content of this data is implementation defined and used by the MSI
+ * domain to store domain specific information which is required for
+ * interrupt chip callbacks.
+ */
+union msi_domain_cookie {
+	u64	value;
+	void	*ptr;
+	void	__iomem *iobase;
+};
+
+/**
+ * struct msi_desc_data - Generic MSI descriptor data
+ * @dcookie:	Cookie for MSI domain specific data which is required
+ *		for irq_chip callbacks
+ * @icookie:	Cookie for the MSI interrupt instance provided by
+ *		the usage site to the allocation function
+ *
+ * The content of this data is implementation defined, e.g. PCI/IMS
+ * implementations define the meaning of the data. The MSI core ignores
+ * this data completely.
+ */
+struct msi_desc_data {
+	union msi_domain_cookie		dcookie;
+	union msi_instance_cookie	icookie;
+};
+
 #define MSI_MAX_INDEX		((unsigned int)USHRT_MAX)
 
 /**
@@ -142,6 +174,7 @@ struct pci_msi_desc {
  *
  * @msi_index:	Index of the msi descriptor
  * @pci:	PCI specific msi descriptor data
+ * @data:	Generic MSI descriptor data
  */
 struct msi_desc {
 	/* Shared device/bus type independent data */
@@ -161,7 +194,10 @@ struct msi_desc {
 	void *write_msi_msg_data;
 
 	u16				msi_index;
-	struct pci_msi_desc		pci;
+	union {
+		struct pci_msi_desc	pci;
+		struct msi_desc_data	data;
+	};
 };
 
 /*
diff --git a/include/linux/msi_api.h b/include/linux/msi_api.h
index 4cb7f4c..2e4456e 100644
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -19,6 +19,23 @@ enum msi_domain_ids {
 };
 
 /**
+ * union msi_instance_cookie - MSI instance cookie
+ * @value:	u64 value store
+ * @ptr:	Pointer to usage site specific data
+ *
+ * This cookie is handed to the IMS allocation function and stored in the
+ * MSI descriptor for the interrupt chip callbacks.
+ *
+ * The content of this cookie is MSI domain implementation defined.  For
+ * PCI/IMS implementations this could be a PASID or a pointer to queue
+ * memory.
+ */
+union msi_instance_cookie {
+	u64	value;
+	void	*ptr;
+};
+
+/**
  * msi_map - Mapping between MSI index and Linux interrupt number
  * @index:	The MSI index, e.g. slot in the MSI-X table or
  *		a software managed index if >= 0. If negative
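
The claim in the commit message that the generic storage does not make
struct msi_desc bigger rests on the union overlay. The userspace sketch below
checks that property for a simplified model. struct model_pci_desc is a
hypothetical 16 byte stand-in and does not reflect the real layout of struct
pci_msi_desc. The cookie unions follow the patch, with uint64_t in place of u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

union model_domain_cookie {
	uint64_t	value;
	void		*ptr;
};

union model_instance_cookie {
	uint64_t	value;
	void		*ptr;
};

struct model_desc_data {
	union model_domain_cookie	dcookie;
	union model_instance_cookie	icookie;
};

/* Hypothetical stand-in for the PCI specific storage (not the real layout) */
struct model_pci_desc {
	uint32_t	msi_mask;
	uint32_t	attribs;
	uint64_t	mask_base;
};

/* Descriptor before the change: PCI storage only */
struct model_msi_desc_old {
	uint16_t		msi_index;
	struct model_pci_desc	pci;
};

/* Descriptor after the change: both views share the same bytes */
struct model_msi_desc_new {
	uint16_t		msi_index;
	union {
		struct model_pci_desc	pci;
		struct model_desc_data	data;
	};
};

/* In this model the overlay costs no extra space */
_Static_assert(sizeof(struct model_msi_desc_new) == sizeof(struct model_msi_desc_old),
	       "union overlay must not grow the descriptor");

/* Store the optional instance cookie, as the allocation function does. */
static void model_store_icookie(struct model_msi_desc_new *desc,
				const union model_instance_cookie *icookie)
{
	if (icookie)
		desc->data.icookie = *icookie;
}
```

Because both members occupy the same memory, a descriptor can carry either
the PCI data or the generic data, not both. That matches the kernel-doc of
msi_domain_alloc_irq_at(), which requires a NULL cookie for MSI-X allocations.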


* [tip: irq/core] x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
  2022-11-24 23:26 ` [patch V3 17/33] x86/apic/msi: Remove arch_create_remap_msi_irq_domain() Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     3d81f920bf95dbf2911a87b9ca7e0167525ea325
Gitweb:        https://git.kernel.org/tip/3d81f920bf95dbf2911a87b9ca7e0167525ea325
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:12 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:03 +01:00

x86/apic/msi: Remove arch_create_remap_msi_irq_domain()

Remove arch_create_remap_msi_irq_domain() and related code, which is no
longer required now that the interrupt remap code has been converted to MSI
parent domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.267353814@linutronix.de

---
 arch/x86/include/asm/irq_remapping.h |  4 +---
 arch/x86/kernel/apic/msi.c           | 42 +---------------------------
 2 files changed, 1 insertion(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 7cc4943..7a2ed15 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,10 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-/* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
-extern struct irq_domain *
-arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);
-
 /* Get parent irqdomain for interrupt remapping irqdomain */
 static inline struct irq_domain *arch_get_ir_parent_domain(void)
 {
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index d198da3..682f51a 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -277,7 +277,7 @@ void __init x86_create_pci_msi_domain(void)
 	x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain();
 }
 
-/* Keep around for hyperV and the remap code below */
+/* Keep around for hyperV */
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		    msi_alloc_info_t *arg)
 {
@@ -291,46 +291,6 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 }
 EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
-#ifdef CONFIG_IRQ_REMAP
-static struct msi_domain_ops pci_msi_domain_ops = {
-	.msi_prepare	= pci_msi_prepare,
-};
-
-static struct irq_chip pci_msi_ir_controller = {
-	.name			= "IR-PCI-MSI",
-	.irq_unmask		= pci_msi_unmask_irq,
-	.irq_mask		= pci_msi_mask_irq,
-	.irq_ack		= irq_chip_ack_parent,
-	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.flags			= IRQCHIP_SKIP_SET_WAKE |
-				  IRQCHIP_AFFINITY_PRE_STARTUP,
-};
-
-static struct msi_domain_info pci_msi_ir_domain_info = {
-	.flags		= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-			  MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
-	.ops		= &pci_msi_domain_ops,
-	.chip		= &pci_msi_ir_controller,
-	.handler	= handle_edge_irq,
-	.handler_name	= "edge",
-};
-
-struct irq_domain *arch_create_remap_msi_irq_domain(struct irq_domain *parent,
-						    const char *name, int id)
-{
-	struct fwnode_handle *fn;
-	struct irq_domain *d;
-
-	fn = irq_domain_alloc_named_id_fwnode(name, id);
-	if (!fn)
-		return NULL;
-	d = pci_msi_create_irq_domain(fn, &pci_msi_ir_domain_info, parent);
-	if (!d)
-		irq_domain_free_fwnode(fn);
-	return d;
-}
-#endif
-
 #ifdef CONFIG_DMAR_TABLE
 /*
  * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the


* [tip: irq/core] iommu/amd: Switch to MSI base domains
  2022-11-24 23:26 ` [patch V3 16/33] iommu/amd: Switch to MSI base domains Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     eb7395d58b1c2c8d5ff76bc2b102d4300a68a67b
Gitweb:        https://git.kernel.org/tip/eb7395d58b1c2c8d5ff76bc2b102d4300a68a67b
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:10 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:02 +01:00

iommu/amd: Switch to MSI base domains

Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and set up per
device domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.209212272@linutronix.de

---
 arch/x86/kernel/apic/msi.c          |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  1 -
 drivers/iommu/amd/iommu.c           | 19 +++++++++++++------
 include/linux/irqdomain_defs.h      |  1 +
 4 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index a8dccb0..d198da3 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -218,6 +218,7 @@ static bool x86_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		info->flags |= MSI_FLAG_NOMASK_QUIRK;
 		break;
 	case DOMAIN_BUS_DMAR:
+	case DOMAIN_BUS_AMDVI:
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 1d0a70c..3d68419 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -734,7 +734,6 @@ struct amd_iommu {
 	u8 max_counters;
 #ifdef CONFIG_IRQ_REMAP
 	struct irq_domain *ir_domain;
-	struct irq_domain *msi_domain;
 
 	struct amd_irte_ops *irte_ops;
 #endif
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 67e209c..7caccd8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -815,7 +815,7 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu)
 	    !pci_dev_has_default_msi_parent_domain(to_pci_dev(dev)))
 		return;
 
-	dev_set_msi_domain(dev, iommu->msi_domain);
+	dev_set_msi_domain(dev, iommu->ir_domain);
 }
 
 #else /* CONFIG_IRQ_REMAP */
@@ -3648,6 +3648,12 @@ static struct irq_chip amd_ir_chip = {
 	.irq_compose_msi_msg	= ir_compose_msi_msg,
 };
 
+static const struct msi_parent_ops amdvi_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "IR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 {
 	struct fwnode_handle *fn;
@@ -3655,16 +3661,17 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 	fn = irq_domain_alloc_named_id_fwnode("AMD-IR", iommu->index);
 	if (!fn)
 		return -ENOMEM;
-	iommu->ir_domain = irq_domain_create_tree(fn, &amd_ir_domain_ops, iommu);
+	iommu->ir_domain = irq_domain_create_hierarchy(arch_get_ir_parent_domain(), 0, 0,
+						       fn, &amd_ir_domain_ops, iommu);
 	if (!iommu->ir_domain) {
 		irq_domain_free_fwnode(fn);
 		return -ENOMEM;
 	}
 
-	iommu->ir_domain->parent = arch_get_ir_parent_domain();
-	iommu->msi_domain = arch_create_remap_msi_irq_domain(iommu->ir_domain,
-							     "AMD-IR-MSI",
-							     iommu->index);
+	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_AMDVI);
+	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
+	iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
+
 	return 0;
 }
 
diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index 3a09396..0b2d8a8 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -24,6 +24,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_PCI_DEVICE_MSI,
 	DOMAIN_BUS_PCI_DEVICE_MSIX,
 	DOMAIN_BUS_DMAR,
+	DOMAIN_BUS_AMDVI,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */
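
The three steps which turn the remapping domain into an MSI parent in the
hunk above (update the bus token, set the parent flag, attach the parent ops)
can be modelled as follows. This is a userspace sketch with hypothetical
model_* names. The detection predicate in particular is an assumption about
what "detect the new parent" amounts to and is not the PCI/MSI core code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FLAG_MSI_PARENT	(1u << 0) /* stand-in for IRQ_DOMAIN_FLAG_MSI_PARENT */

enum model_bus_token {
	MODEL_BUS_ANY,
	MODEL_BUS_DMAR,
	MODEL_BUS_AMDVI,
};

/* Hypothetical, reduced to the prefix field */
struct model_parent_ops {
	const char	*prefix;
};

struct model_domain {
	enum model_bus_token		bus_token;
	unsigned int			flags;
	const struct model_parent_ops	*msi_parent_ops;
};

static const struct model_parent_ops model_amdvi_parent_ops = {
	.prefix	= "IR-",
};

/* Same three assignments as in amd_iommu_create_irq_domain() after the patch */
static void model_mark_msi_parent(struct model_domain *d, enum model_bus_token token,
				  const struct model_parent_ops *ops)
{
	assert(d && ops);

	d->bus_token = token;
	d->flags |= MODEL_FLAG_MSI_PARENT;
	d->msi_parent_ops = ops;
}

/* Assumed check: a usable parent has both the flag and the ops set */
static bool model_is_msi_parent(const struct model_domain *d)
{
	return d && (d->flags & MODEL_FLAG_MSI_PARENT) && d->msi_parent_ops;
}
```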


* [tip: irq/core] PCI/MSI: Remove unused pci_dev_has_special_msi_domain()
  2022-11-24 23:26 ` [patch V3 14/33] PCI/MSI: Remove unused pci_dev_has_special_msi_domain() Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Bjorn Helgaas, Marc Zyngier, x86,
	linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     96bd2f29f00a60245042f4c0ed85c3e27940d821
Gitweb:        https://git.kernel.org/tip/96bd2f29f00a60245042f4c0ed85c3e27940d821
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:07 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:02 +01:00

PCI/MSI: Remove unused pci_dev_has_special_msi_domain()

The check for special MSI domains like VMD, which prevented the interrupt
remapping code from overwriting device::msi::domain, is no longer required
and has been replaced by an x86 specific version which is aware of MSI
parent domains.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.093093200@linutronix.de

---
 drivers/pci/msi/irqdomain.c | 21 ---------------------
 include/linux/msi.h         |  1 -
 2 files changed, 22 deletions(-)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index be3d50f..4736403 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -414,24 +414,3 @@ struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
 					     DOMAIN_BUS_PCI_MSI);
 	return dom;
 }
-
-/**
- * pci_dev_has_special_msi_domain - Check whether the device is handled by
- *				    a non-standard PCI-MSI domain
- * @pdev:	The PCI device to check.
- *
- * Returns: True if the device irqdomain or the bus irqdomain is
- * non-standard PCI/MSI.
- */
-bool pci_dev_has_special_msi_domain(struct pci_dev *pdev)
-{
-	struct irq_domain *dom = dev_get_msi_domain(&pdev->dev);
-
-	if (!dom)
-		dom = dev_get_msi_domain(&pdev->bus->dev);
-
-	if (!dom)
-		return true;
-
-	return dom->bus_token != DOMAIN_BUS_PCI_MSI;
-}
diff --git a/include/linux/msi.h b/include/linux/msi.h
index b4ab005..b5dda4b 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -617,7 +617,6 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 					     struct irq_domain *parent);
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
 struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
-bool pci_dev_has_special_msi_domain(struct pci_dev *pdev);
 #else /* CONFIG_PCI_MSI */
 static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
 {


* [tip: irq/core] iommu/vt-d: Switch to MSI parent domains
  2022-11-24 23:26 ` [patch V3 15/33] iommu/vt-d: Switch to MSI parent domains Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     01290527cfe8bb5aa8bb6c29ba5f3493d75652ca
Gitweb:        https://git.kernel.org/tip/01290527cfe8bb5aa8bb6c29ba5f3493d75652ca
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:08 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:02 +01:00

iommu/vt-d: Switch to MSI parent domains

Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and set up per
device domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.151226317@linutronix.de

---
 arch/x86/kernel/apic/msi.c          |  2 ++
 drivers/iommu/intel/iommu.h         |  1 -
 drivers/iommu/intel/irq_remapping.c | 27 ++++++++++++---------------
 include/linux/irqdomain_defs.h      |  1 +
 4 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index db96bfc..a8dccb0 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -217,6 +217,8 @@ static bool x86_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		/* See msi_set_affinity() for the gory details */
 		info->flags |= MSI_FLAG_NOMASK_QUIRK;
 		break;
+	case DOMAIN_BUS_DMAR:
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		return false;
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 92023df..6eadb86 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -600,7 +600,6 @@ struct intel_iommu {
 #ifdef CONFIG_IRQ_REMAP
 	struct ir_table *ir_table;	/* Interrupt remapping info */
 	struct irq_domain *ir_domain;
-	struct irq_domain *ir_msi_domain;
 #endif
 	struct iommu_device iommu;  /* IOMMU core code handle */
 	int		node;
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 08bbf08..6fab407 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -82,6 +82,7 @@ static const struct irq_domain_ops intel_ir_domain_ops;
 
 static void iommu_disable_irq_remapping(struct intel_iommu *iommu);
 static int __init parse_ioapics_under_ir(void);
+static const struct msi_parent_ops dmar_msi_parent_ops;
 
 static bool ir_pre_enabled(struct intel_iommu *iommu)
 {
@@ -230,7 +231,7 @@ static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
 {
 	struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
 
-	return drhd ? drhd->iommu->ir_msi_domain : NULL;
+	return drhd ? drhd->iommu->ir_domain : NULL;
 }
 
 static int clear_entries(struct irq_2_iommu *irq_iommu)
@@ -573,10 +574,10 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 		pr_err("IR%d: failed to allocate irqdomain\n", iommu->seq_id);
 		goto out_free_fwnode;
 	}
-	iommu->ir_msi_domain =
-		arch_create_remap_msi_irq_domain(iommu->ir_domain,
-						 "INTEL-IR-MSI",
-						 iommu->seq_id);
+
+	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_DMAR);
+	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
+	iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
 
 	ir_table->base = page_address(pages);
 	ir_table->bitmap = bitmap;
@@ -620,9 +621,6 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 	return 0;
 
 out_free_ir_domain:
-	if (iommu->ir_msi_domain)
-		irq_domain_remove(iommu->ir_msi_domain);
-	iommu->ir_msi_domain = NULL;
 	irq_domain_remove(iommu->ir_domain);
 	iommu->ir_domain = NULL;
 out_free_fwnode:
@@ -644,13 +642,6 @@ static void intel_teardown_irq_remapping(struct intel_iommu *iommu)
 	struct fwnode_handle *fn;
 
 	if (iommu && iommu->ir_table) {
-		if (iommu->ir_msi_domain) {
-			fn = iommu->ir_msi_domain->fwnode;
-
-			irq_domain_remove(iommu->ir_msi_domain);
-			irq_domain_free_fwnode(fn);
-			iommu->ir_msi_domain = NULL;
-		}
 		if (iommu->ir_domain) {
 			fn = iommu->ir_domain->fwnode;
 
@@ -1437,6 +1428,12 @@ static const struct irq_domain_ops intel_ir_domain_ops = {
 	.deactivate = intel_irq_remapping_deactivate,
 };
 
+static const struct msi_parent_ops dmar_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "IR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 /*
  * Support of Interrupt Remapping Unit Hotplug
  */
diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index b3f4b7e..3a09396 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -23,6 +23,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_VMD_MSI,
 	DOMAIN_BUS_PCI_DEVICE_MSI,
 	DOMAIN_BUS_PCI_DEVICE_MSIX,
+	DOMAIN_BUS_DMAR,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */


* [tip: irq/core] genirq/msi: Provide BUS_DEVICE_PCI_MSI[X]
  2022-11-24 23:26 ` [patch V3 11/33] genirq/msi: Provide BUS_DEVICE_PCI_MSI[X] Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     29b2f2cfd3f1fd3638799671c3a6758e13943875
Gitweb:        https://git.kernel.org/tip/29b2f2cfd3f1fd3638799671c3a6758e13943875
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:02 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:02 +01:00

genirq/msi: Provide BUS_DEVICE_PCI_MSI[X]

Provide new bus tokens for the upcoming per device PCI/MSI and PCI/MSIX
interrupt domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.917219885@linutronix.de

---
 include/linux/irqdomain_defs.h | 2 ++
 kernel/irq/msi.c               | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index 69035b4..b3f4b7e 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -21,6 +21,8 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_TI_SCI_INTA_MSI,
 	DOMAIN_BUS_WAKEUP,
 	DOMAIN_BUS_VMD_MSI,
+	DOMAIN_BUS_PCI_DEVICE_MSI,
+	DOMAIN_BUS_PCI_DEVICE_MSIX,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 21a7452..0536db7 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1121,6 +1121,8 @@ static bool msi_check_reservation_mode(struct irq_domain *domain,
 
 	switch(domain->bus_token) {
 	case DOMAIN_BUS_PCI_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
 	case DOMAIN_BUS_VMD_MSI:
 		break;
 	default:
@@ -1146,6 +1148,8 @@ static int msi_handle_pci_fail(struct irq_domain *domain, struct msi_desc *desc,
 {
 	switch(domain->bus_token) {
 	case DOMAIN_BUS_PCI_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
 	case DOMAIN_BUS_VMD_MSI:
 		if (IS_ENABLED(CONFIG_PCI_MSI))
 			break;
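
Both hunks above extend a switch on the bus token so that the new per device
tokens take the same path as the existing PCI/MSI tokens. The pattern can be
sketched in userspace like this. The enum is a hypothetical, shortened copy
and model_is_pci_msi_token() is an illustrative helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Shortened, hypothetical copy of the bus token enum */
enum model_bus_token {
	MODEL_BUS_ANY,
	MODEL_BUS_PCI_MSI,
	MODEL_BUS_WAKEUP,
	MODEL_BUS_VMD_MSI,
	MODEL_BUS_PCI_DEVICE_MSI,
	MODEL_BUS_PCI_DEVICE_MSIX,
};

/* True for every token which denotes some flavour of PCI MSI domain. */
static bool model_is_pci_msi_token(enum model_bus_token token)
{
	switch (token) {
	case MODEL_BUS_PCI_MSI:
	case MODEL_BUS_PCI_DEVICE_MSI:	/* new: per device PCI/MSI */
	case MODEL_BUS_PCI_DEVICE_MSIX:	/* new: per device PCI/MSI-X */
	case MODEL_BUS_VMD_MSI:
		return true;
	default:
		return false;
	}
}
```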


* [tip: irq/core] PCI/MSI: Add support for per device MSI[X] domains
  2022-11-24 23:26 ` [patch V3 12/33] PCI/MSI: Add support for per device MSI[X] domains Thomas Gleixner
  2022-11-28  4:46   ` Tian, Kevin
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Thomas Gleixner, Kevin Tian, Bjorn Helgaas,
	Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     e893c81d302e9b2f9ef2258f09d9b696ea67b5b9
Gitweb:        https://git.kernel.org/tip/e893c81d302e9b2f9ef2258f09d9b696ea67b5b9
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:04 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:02 +01:00

PCI/MSI: Add support for per device MSI[X] domains

Provide a template and the necessary callbacks to create PCI/MSI and
PCI/MSI-X domains.

The domains are created when MSI or MSI-X is enabled. A domain normally
lives as long as the device. The exception is a switch between the two
modes: if e.g. MSI-X was tried first and failed, the MSI-X domain is
removed and an MSI domain is created instead. Both are mutually exclusive
and reside in the default domain ID slot of the per device domain pointer
array.

Also expand pci_msi_domain_supports() to handle feature checks correctly
even in the case that the per device domain was not yet created by checking
the features supported by the MSI parent.
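
The feature check described above can be sketched in plain userspace C. This
is only a model, not the kernel code: the sketch_* names and flag values are
hypothetical stand-ins, and the "no domain or no hierarchy" case is folded
into a NULL pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's MSI_FLAG_* bits */
#define SKETCH_FLAG_MULTI_MSI	(1u << 0)
#define SKETCH_FLAG_MSIX	(1u << 1)
#define SKETCH_FLAG_MSIX_DYN	(1u << 2)

enum sketch_mode { SKETCH_ALLOW_LEGACY, SKETCH_DENY_LEGACY };

struct sketch_domain {
	bool		is_msi_parent;
	/* "global" domain: models msi_domain_info::flags */
	unsigned int	info_flags;
	/* MSI parent domain: models msi_parent_ops::supported_flags */
	unsigned int	parent_supported;
};

static bool sketch_domain_supports(const struct sketch_domain *d,
				   unsigned int feature_mask,
				   enum sketch_mode mode)
{
	unsigned int supported;

	/* No usable domain: only the legacy path can satisfy the caller */
	if (!d)
		return mode == SKETCH_ALLOW_LEGACY;

	/* A parent can answer before any per device domain exists */
	supported = d->is_msi_parent ? d->parent_supported : d->info_flags;

	/* Every requested feature bit must be present */
	return (supported & feature_mask) == feature_mask;
}
```

The point of the parent branch is that the answer does not depend on the
per device domain having been instantiated.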

Add the necessary setup calls into the MSI and MSI-X enable code path.
These setup calls are backwards compatible. They return success when no
parent domain is found, which means the existing global domains and the
legacy allocation path keep working.
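
The return semantics of the setup calls can be modelled roughly as follows.
This is a userspace sketch under stated assumptions: all sketch_* names are
hypothetical, the single 'slot' member stands in for the default domain ID
slot, and domain creation is assumed to never fail.

```c
#include <stdbool.h>

enum sketch_token { SKETCH_NONE, SKETCH_DEVICE_MSI, SKETCH_DEVICE_MSIX };

struct sketch_dev {
	bool			has_msi_parent;	/* device sits below an MSI parent domain */
	bool			msi_enabled;
	bool			msix_enabled;
	enum sketch_token	slot;		/* models the default domain ID slot */
	unsigned int		creations;	/* number of domain creations so far */
};

static bool sketch_setup(struct sketch_dev *dev, enum sketch_token want)
{
	/* The other interrupt mode must not be enabled */
	if (want == SKETCH_DEVICE_MSI ? dev->msix_enabled : dev->msi_enabled)
		return false;

	/* A domain of the wanted type already exists: reuse it */
	if (dev->slot == want)
		return true;

	/* A domain of the other type occupies the slot: remove it */
	if (dev->slot != SKETCH_NONE)
		dev->slot = SKETCH_NONE;

	/* No MSI parent: succeed so the legacy/global path can proceed */
	if (!dev->has_msi_parent)
		return true;

	dev->slot = want;
	dev->creations++;
	return true;
}
```

Calling sketch_setup() twice with the same token creates the domain only
once, and a device without an MSI parent never gets a per device domain.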

Co-developed-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.975388241@linutronix.de

---
 drivers/pci/msi/irqdomain.c | 188 ++++++++++++++++++++++++++++++++++-
 drivers/pci/msi/msi.c       |  16 ++-
 drivers/pci/msi/msi.h       |   2 +-
 3 files changed, 201 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index f4338fb..be3d50f 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -139,6 +139,170 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 }
 EXPORT_SYMBOL_GPL(pci_msi_create_irq_domain);
 
+/*
+ * Per device MSI[-X] domain functionality
+ */
+static void pci_device_domain_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+	arg->desc = desc;
+	arg->hwirq = desc->msi_index;
+}
+
+static void pci_irq_mask_msi(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+	pci_msi_mask(desc, BIT(data->irq - desc->irq));
+}
+
+static void pci_irq_unmask_msi(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
+}
+
+#ifdef CONFIG_GENERIC_IRQ_RESERVATION_MODE
+# define MSI_REACTIVATE		MSI_FLAG_MUST_REACTIVATE
+#else
+# define MSI_REACTIVATE		0
+#endif
+
+#define MSI_COMMON_FLAGS	(MSI_FLAG_FREE_MSI_DESCS |	\
+				 MSI_FLAG_ACTIVATE_EARLY |	\
+				 MSI_FLAG_DEV_SYSFS |		\
+				 MSI_REACTIVATE)
+
+static const struct msi_domain_template pci_msi_template = {
+	.chip = {
+		.name			= "PCI-MSI",
+		.irq_mask		= pci_irq_mask_msi,
+		.irq_unmask		= pci_irq_unmask_msi,
+		.irq_write_msi_msg	= pci_msi_domain_write_msg,
+		.flags			= IRQCHIP_ONESHOT_SAFE,
+	},
+
+	.ops = {
+		.set_desc		= pci_device_domain_set_desc,
+	},
+
+	.info = {
+		.flags			= MSI_COMMON_FLAGS | MSI_FLAG_MULTI_PCI_MSI,
+		.bus_token		= DOMAIN_BUS_PCI_DEVICE_MSI,
+	},
+};
+
+static void pci_irq_mask_msix(struct irq_data *data)
+{
+	pci_msix_mask(irq_data_get_msi_desc(data));
+}
+
+static void pci_irq_unmask_msix(struct irq_data *data)
+{
+	pci_msix_unmask(irq_data_get_msi_desc(data));
+}
+
+static const struct msi_domain_template pci_msix_template = {
+	.chip = {
+		.name			= "PCI-MSIX",
+		.irq_mask		= pci_irq_mask_msix,
+		.irq_unmask		= pci_irq_unmask_msix,
+		.irq_write_msi_msg	= pci_msi_domain_write_msg,
+		.flags			= IRQCHIP_ONESHOT_SAFE,
+	},
+
+	.ops = {
+		.set_desc		= pci_device_domain_set_desc,
+	},
+
+	.info = {
+		.flags			= MSI_COMMON_FLAGS | MSI_FLAG_PCI_MSIX,
+		.bus_token		= DOMAIN_BUS_PCI_DEVICE_MSIX,
+	},
+};
+
+static bool pci_match_device_domain(struct pci_dev *pdev, enum irq_domain_bus_token bus_token)
+{
+	return msi_match_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN, bus_token);
+}
+
+static bool pci_create_device_domain(struct pci_dev *pdev, const struct msi_domain_template *tmpl,
+				     unsigned int hwsize)
+{
+	struct irq_domain *domain = dev_get_msi_domain(&pdev->dev);
+
+	if (!domain || !irq_domain_is_msi_parent(domain))
+		return true;
+
+	return msi_create_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN, tmpl,
+					    hwsize, NULL, NULL);
+}
+
+/**
+ * pci_setup_msi_device_domain - Setup a device MSI interrupt domain
+ * @pdev:	The PCI device to create the domain on
+ *
+ * Return:
+ *  True when:
+ *	- The device does not have a MSI parent irq domain associated,
+ *	  which keeps the legacy architecture specific and the global
+ *	  PCI/MSI domain models working
+ *	- The MSI domain exists already
+ *	- The MSI domain was successfully allocated
+ *  False when:
+ *	- MSI-X is enabled
+ *	- The domain creation fails.
+ *
+ * The created MSI domain is preserved until:
+ *	- The device is removed
+ *	- MSI is disabled and a MSI-X domain is created
+ */
+bool pci_setup_msi_device_domain(struct pci_dev *pdev)
+{
+	if (WARN_ON_ONCE(pdev->msix_enabled))
+		return false;
+
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSI))
+		return true;
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
+		msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
+
+	return pci_create_device_domain(pdev, &pci_msi_template, 1);
+}
+
+/**
+ * pci_setup_msix_device_domain - Setup a device MSI-X interrupt domain
+ * @pdev:	The PCI device to create the domain on
+ * @hwsize:	The size of the MSI-X vector table
+ *
+ * Return:
+ *  True when:
+ *	- The device does not have a MSI parent irq domain associated,
+ *	  which keeps the legacy architecture specific and the global
+ *	  PCI/MSI domain models working
+ *	- The MSI-X domain exists already
+ *	- The MSI-X domain was successfully allocated
+ *  False when:
+ *	- MSI is enabled
+ *	- The domain creation fails.
+ *
+ * The created MSI-X domain is preserved until:
+ *	- The device is removed
+ *	- MSI-X is disabled and a MSI domain is created
+ */
+bool pci_setup_msix_device_domain(struct pci_dev *pdev, unsigned int hwsize)
+{
+	if (WARN_ON_ONCE(pdev->msi_enabled))
+		return false;
+
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
+		return true;
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSI))
+		msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
+
+	return pci_create_device_domain(pdev, &pci_msix_template, hwsize);
+}
+
 /**
  * pci_msi_domain_supports - Check for support of a particular feature flag
  * @pdev:		The PCI device to operate on
@@ -152,13 +316,33 @@ bool pci_msi_domain_supports(struct pci_dev *pdev, unsigned int feature_mask,
 {
 	struct msi_domain_info *info;
 	struct irq_domain *domain;
+	unsigned int supported;
 
 	domain = dev_get_msi_domain(&pdev->dev);
 
 	if (!domain || !irq_domain_is_hierarchy(domain))
 		return mode == ALLOW_LEGACY;
-	info = domain->host_data;
-	return (info->flags & feature_mask) == feature_mask;
+
+	if (!irq_domain_is_msi_parent(domain)) {
+		/*
+		 * For "global" PCI/MSI interrupt domains the associated
+		 * msi_domain_info::flags is the authoritative source of
+		 * information.
+		 */
+		info = domain->host_data;
+		supported = info->flags;
+	} else {
+		/*
+		 * For MSI parent domains the supported feature set
+		 * is available in the parent ops. This makes checks
+		 * possible before actually instantiating the
+		 * per device domain because the parent is never
+		 * expanding the PCI/MSI functionality.
+		 */
+		supported = domain->msi_parent_ops->supported_flags;
+	}
+
+	return (supported & feature_mask) == feature_mask;
 }
 
 /*
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 76a3d44..b8d74df 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -436,6 +436,9 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	if (rc)
 		return rc;
 
+	if (!pci_setup_msi_device_domain(dev))
+		return -ENODEV;
+
 	for (;;) {
 		if (affd) {
 			nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
@@ -787,9 +790,13 @@ int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int
 	if (!pci_msix_validate_entries(dev, entries, nvec, hwsize))
 		return -EINVAL;
 
-	/* PCI_IRQ_VIRTUAL is a horrible hack! */
-	if (nvec > hwsize && !(flags & PCI_IRQ_VIRTUAL))
-		nvec = hwsize;
+	if (hwsize < nvec) {
+		/* Keep the IRQ virtual hackery working */
+		if (flags & PCI_IRQ_VIRTUAL)
+			hwsize = nvec;
+		else
+			nvec = hwsize;
+	}
 
 	if (nvec < minvec)
 		return -ENOSPC;
@@ -798,6 +805,9 @@ int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int
 	if (rc)
 		return rc;
 
+	if (!pci_setup_msix_device_domain(dev, hwsize))
+		return -ENODEV;
+
 	for (;;) {
 		if (affd) {
 			nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
index 9d75b6f..74408cc 100644
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -105,6 +105,8 @@ enum support_mode {
 };
 
 bool pci_msi_domain_supports(struct pci_dev *dev, unsigned int feature_mask, enum support_mode mode);
+bool pci_setup_msi_device_domain(struct pci_dev *pdev);
+bool pci_setup_msix_device_domain(struct pci_dev *pdev, unsigned int hwsize);
 
 /* Legacy (!IRQDOMAIN) fallbacks */
 


* [tip: irq/core] x86/apic/vector: Provide MSI parent domain
  2022-11-24 23:26 ` [patch V3 13/33] x86/apic/vector: Provide MSI parent domain Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2023-01-04 12:34   ` [patch V3 13/33] " Jason Gunthorpe
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     db2e709542d52ff480f1acdfb6d0803d1529dc12
Gitweb:        https://git.kernel.org/tip/db2e709542d52ff480f1acdfb6d0803d1529dc12
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:05 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:02 +01:00

x86/apic/vector: Provide MSI parent domain

Enable MSI parent domain support in the x86 vector domain and fix up the
checks in the iommu implementations to check whether device::msi::domain is
the default MSI parent domain. That keeps the existing logic to protect
e.g. devices behind VMD working.
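
The adjusted IOMMU check can be illustrated with a small userspace model.
The sketch_* types and the sketch_remap_takes_over() helper are hypothetical;
only the lookup order (device first, then its bus) follows the patch.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_domain { const char *name; };

struct sketch_bus { struct sketch_domain *msi_domain; };

struct sketch_pci_dev {
	struct sketch_domain	*msi_domain;
	struct sketch_bus	*bus;
};

/* True only if the effective MSI domain is the vector domain itself */
static bool sketch_has_default_parent(const struct sketch_pci_dev *dev,
				      const struct sketch_domain *vector_domain)
{
	const struct sketch_domain *domain = dev->msi_domain;

	/* Fall back to the domain associated with the bus */
	if (!domain)
		domain = dev->bus->msi_domain;
	if (!domain)
		return false;

	return domain == vector_domain;
}

/* Models the IOMMU hook: remapping may only replace the default parent */
static bool sketch_remap_takes_over(bool remapping_enabled,
				    const struct sketch_pci_dev *dev,
				    const struct sketch_domain *vector_domain)
{
	return remapping_enabled &&
	       sketch_has_default_parent(dev, vector_domain);
}
```

A device whose bus carries a different domain, as a device behind VMD would
in this model, is left alone even when remapping is enabled.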

The interrupt remap PCI/MSI code still works because the underlying vector
domain still provides the same functionality.

None of the other x86 PCI/MSI implementations, e.g. XEN and HyperV, are
affected either. They still work the same way, both at the low level and in
the PCI/MSI implementations they provide.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.034672592@linutronix.de

---
 arch/x86/include/asm/msi.h          |   6 +-
 arch/x86/include/asm/pci.h          |   1 +-
 arch/x86/kernel/apic/msi.c          | 176 +++++++++++++++++++--------
 drivers/iommu/amd/iommu.c           |   2 +-
 drivers/iommu/intel/irq_remapping.c |   2 +-
 5 files changed, 138 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index d71c7e8..7702958 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -62,4 +62,10 @@ typedef struct x86_msi_addr_hi {
 struct msi_msg;
 u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid);
 
+#define X86_VECTOR_MSI_FLAGS_SUPPORTED					\
+	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX)
+
+#define X86_VECTOR_MSI_FLAGS_REQUIRED					\
+	(MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS)
+
 #endif /* _ASM_X86_MSI_H */
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index c4789de..b40c462 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -92,6 +92,7 @@ void pcibios_scan_root(int bus);
 struct irq_routing_table *pcibios_get_irq_routing_table(void);
 int pcibios_set_irq_routing(struct pci_dev *dev, int pin, int irq);
 
+bool pci_dev_has_default_msi_parent_domain(struct pci_dev *dev);
 
 #define HAVE_PCI_MMAP
 #define arch_can_pci_mmap_wc()	pat_enabled()
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 71c8751..db96bfc 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -142,67 +142,131 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
 	return ret;
 }
 
-/*
- * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
- * which implement the MSI or MSI-X Capability Structure.
+/**
+ * pci_dev_has_default_msi_parent_domain - Check whether the device has the default
+ *					   MSI parent domain associated
+ * @dev:	Pointer to the PCI device
  */
-static struct irq_chip pci_msi_controller = {
-	.name			= "PCI-MSI",
-	.irq_unmask		= pci_msi_unmask_irq,
-	.irq_mask		= pci_msi_mask_irq,
-	.irq_ack		= irq_chip_ack_parent,
-	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.irq_set_affinity	= msi_set_affinity,
-	.flags			= IRQCHIP_SKIP_SET_WAKE |
-				  IRQCHIP_AFFINITY_PRE_STARTUP,
-};
-
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
-		    msi_alloc_info_t *arg)
+bool pci_dev_has_default_msi_parent_domain(struct pci_dev *dev)
 {
-	init_irq_alloc_info(arg, NULL);
-	if (to_pci_dev(dev)->msix_enabled)
-		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
-	else
-		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
+	struct irq_domain *domain = dev_get_msi_domain(&dev->dev);
 
-	return 0;
+	if (!domain)
+		domain = dev_get_msi_domain(&dev->bus->dev);
+	if (!domain)
+		return false;
+
+	return domain == x86_vector_domain;
 }
-EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
-static struct msi_domain_ops pci_msi_domain_ops = {
-	.msi_prepare	= pci_msi_prepare,
-};
+/**
+ * x86_msi_prepare - Setup of msi_alloc_info_t for allocations
+ * @domain:	The domain for which this setup happens
+ * @dev:	The device for which interrupts are allocated
+ * @nvec:	The number of vectors to allocate
+ * @alloc:	The allocation info structure to initialize
+ *
+ * This function is to be used for all types of MSI domains above the x86
+ * vector domain and any intermediates. It is always invoked from the
+ * top level interrupt domain. The domain specific allocation
+ * functionality is determined via the @domain's bus token which allows to
+ * map the X86 specific allocation type.
+ */
+static int x86_msi_prepare(struct irq_domain *domain, struct device *dev,
+			   int nvec, msi_alloc_info_t *alloc)
+{
+	struct msi_domain_info *info = domain->host_data;
+
+	init_irq_alloc_info(alloc, NULL);
+
+	switch (info->bus_token) {
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
+		return 0;
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
+		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
 
-static struct msi_domain_info pci_msi_domain_info = {
-	.flags		= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-			  MSI_FLAG_PCI_MSIX | MSI_FLAG_NOMASK_QUIRK,
+/**
+ * x86_init_dev_msi_info - Domain info setup for MSI domains
+ * @dev:		The device for which the domain should be created
+ * @domain:		The (root) domain providing this callback
+ * @real_parent:	The real parent domain of the to initialize domain
+ * @info:		The domain info for the to initialize domain
+ *
+ * This function is to be used for all types of MSI domains above the x86
+ * vector domain and any intermediates. The domain specific functionality
+ * is determined via the @real_parent.
+ */
+static bool x86_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				  struct irq_domain *real_parent, struct msi_domain_info *info)
+{
+	const struct msi_parent_ops *pops = real_parent->msi_parent_ops;
+
+	/* MSI parent domain specific settings */
+	switch (real_parent->bus_token) {
+	case DOMAIN_BUS_ANY:
+		/* Only the vector domain can have the ANY token */
+		if (WARN_ON_ONCE(domain != real_parent))
+			return false;
+		info->chip->irq_set_affinity = msi_set_affinity;
+		/* See msi_set_affinity() for the gory details */
+		info->flags |= MSI_FLAG_NOMASK_QUIRK;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return false;
+	}
 
-	.ops		= &pci_msi_domain_ops,
-	.chip		= &pci_msi_controller,
-	.handler	= handle_edge_irq,
-	.handler_name	= "edge",
+	/* Is the target supported? */
+	switch(info->bus_token) {
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	/*
+	 * Mask out the domain specific MSI feature flags which are not
+	 * supported by the real parent.
+	 */
+	info->flags			&= pops->supported_flags;
+	/* Enforce the required flags */
+	info->flags			|= X86_VECTOR_MSI_FLAGS_REQUIRED;
+
+	/* This is always invoked from the top level MSI domain! */
+	info->ops->msi_prepare		= x86_msi_prepare;
+
+	info->chip->irq_ack		= irq_chip_ack_parent;
+	info->chip->irq_retrigger	= irq_chip_retrigger_hierarchy;
+	info->chip->flags		|= IRQCHIP_SKIP_SET_WAKE |
+					   IRQCHIP_AFFINITY_PRE_STARTUP;
+
+	info->handler			= handle_edge_irq;
+	info->handler_name		= "edge";
+
+	return true;
+}
+
+static const struct msi_parent_ops x86_vector_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED,
+	.init_dev_msi_info	= x86_init_dev_msi_info,
 };
 
 struct irq_domain * __init native_create_pci_msi_domain(void)
 {
-	struct fwnode_handle *fn;
-	struct irq_domain *d;
-
 	if (disable_apic)
 		return NULL;
 
-	fn = irq_domain_alloc_named_fwnode("PCI-MSI");
-	if (!fn)
-		return NULL;
-
-	d = pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
-				      x86_vector_domain);
-	if (!d) {
-		irq_domain_free_fwnode(fn);
-		pr_warn("Failed to initialize PCI-MSI irqdomain.\n");
-	}
-	return d;
+	x86_vector_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
+	x86_vector_domain->msi_parent_ops = &x86_vector_msi_parent_ops;
+	return x86_vector_domain;
 }
 
 void __init x86_create_pci_msi_domain(void)
@@ -210,7 +274,25 @@ void __init x86_create_pci_msi_domain(void)
 	x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain();
 }
 
+/* Keep around for hyperV and the remap code below */
+int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+		    msi_alloc_info_t *arg)
+{
+	init_irq_alloc_info(arg, NULL);
+
+	if (to_pci_dev(dev)->msix_enabled)
+		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
+	else
+		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_msi_prepare);
+
 #ifdef CONFIG_IRQ_REMAP
+static struct msi_domain_ops pci_msi_domain_ops = {
+	.msi_prepare	= pci_msi_prepare,
+};
+
 static struct irq_chip pci_msi_ir_controller = {
 	.name			= "IR-PCI-MSI",
 	.irq_unmask		= pci_msi_unmask_irq,
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 72dfe57..67e209c 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -812,7 +812,7 @@ static void
 amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu)
 {
 	if (!irq_remapping_enabled || !dev_is_pci(dev) ||
-	    pci_dev_has_special_msi_domain(to_pci_dev(dev)))
+	    !pci_dev_has_default_msi_parent_domain(to_pci_dev(dev)))
 		return;
 
 	dev_set_msi_domain(dev, iommu->msi_domain);
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index a914eba..08bbf08 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1107,7 +1107,7 @@ error:
  */
 void intel_irq_remap_add_device(struct dmar_pci_notify_info *info)
 {
-	if (!irq_remapping_enabled || pci_dev_has_special_msi_domain(info->dev))
+	if (!irq_remapping_enabled || !pci_dev_has_default_msi_parent_domain(info->dev))
 		return;
 
 	dev_set_msi_domain(&info->dev->dev, map_dev_to_ir(info->dev));


* [tip: irq/core] genirq/msi: Provide msi_match_device_domain()
  2022-11-24 23:25 ` [patch V3 08/33] genirq/msi: Provide msi_match_device_domain() Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     26e91b75bf6108550035355c835bf0c93c885b61
Gitweb:        https://git.kernel.org/tip/26e91b75bf6108550035355c835bf0c93c885b61
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:25:57 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:01 +01:00

genirq/msi: Provide msi_match_device_domain()

Provide an interface to match a per device domain bus token. This allows
querying which type of domain is installed for a particular domain id. It
will be used by PCI to avoid frequent create/remove cycles for the MSI and
MSI-X domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.738047902@linutronix.de

---
 include/linux/msi.h |  3 +++
 kernel/irq/msi.c    | 25 +++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index ef46a3e..b4ab005 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -553,6 +553,9 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
 				  void *chip_data);
 void msi_remove_device_irq_domain(struct device *dev, unsigned int domid);
 
+bool msi_match_device_irq_domain(struct device *dev, unsigned int domid,
+				 enum irq_domain_bus_token bus_token);
+
 int msi_domain_alloc_irqs_range_locked(struct device *dev, unsigned int domid,
 				       unsigned int first, unsigned int last);
 int msi_domain_alloc_irqs_range(struct device *dev, unsigned int domid,
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 8b415bd..7449998 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -986,6 +986,31 @@ unlock:
 	msi_unlock_descs(dev);
 }
 
+/**
+ * msi_match_device_irq_domain - Match a device irq domain against a bus token
+ * @dev:	Pointer to the device
+ * @domid:	Domain id
+ * @bus_token:	Bus token to match against the domain bus token
+ *
+ * Return: True if device domain exists and bus tokens match.
+ */
+bool msi_match_device_irq_domain(struct device *dev, unsigned int domid,
+				 enum irq_domain_bus_token bus_token)
+{
+	struct msi_domain_info *info;
+	struct irq_domain *domain;
+	bool ret = false;
+
+	msi_lock_descs(dev);
+	domain = msi_get_device_domain(dev, domid);
+	if (domain && irq_domain_is_msi_device(domain)) {
+		info = domain->host_data;
+		ret = info->bus_token == bus_token;
+	}
+	msi_unlock_descs(dev);
+	return ret;
+}
+
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
 			    int nvec, msi_alloc_info_t *arg)
 {


* [tip: irq/core] genirq/irqdomain: Add irq_domain::dev for per device MSI domains
  2022-11-24 23:25 ` [patch V3 06/33] genirq/irqdomain: Add irq_domain::dev for per device MSI domains Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     4443664f298d1a2cba25a2e48d53b78f4138209b
Gitweb:        https://git.kernel.org/tip/4443664f298d1a2cba25a2e48d53b78f4138209b
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:25:54 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:01 +01:00

genirq/irqdomain: Add irq_domain::dev for per device MSI domains

Per device domains require the device pointer of the device which
instantiated the domain for some purposes. Add the pointer to struct
irq_domain. It will be used in the next step which provides the
infrastructure to create per device MSI domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.618807601@linutronix.de

---
 include/linux/irqdomain.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index a668cc0..a372086 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -132,6 +132,9 @@ struct irq_domain_chip_generic;
  * @gc:		Pointer to a list of generic chips. There is a helper function for
  *		setting up one or more generic chips for interrupt controllers
  *		drivers using the generic chip library which uses this pointer.
+ * @dev:	Pointer to the device which instantiated the irqdomain
+ *		With per device irq domains this is not necessarily the same
+ *		as @pm_dev.
  * @pm_dev:	Pointer to a device that can be utilized for power management
  *		purposes related to the irq domain.
  * @parent:	Pointer to parent irq_domain to support hierarchy irq_domains
@@ -155,6 +158,7 @@ struct irq_domain {
 	struct fwnode_handle		*fwnode;
 	enum irq_domain_bus_token	bus_token;
 	struct irq_domain_chip_generic	*gc;
+	struct device			*dev;
 	struct device			*pm_dev;
 #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
 	struct irq_domain		*parent;


* [tip: irq/core] PCI/MSI: Split __pci_write_msi_msg()
  2022-11-24 23:26 ` [patch V3 10/33] PCI/MSI: Split __pci_write_msi_msg() Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Thomas Gleixner, Kevin Tian, Bjorn Helgaas,
	Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     8bf5fb3f8fde23ae4ef69f0120f6cf56ad5a462d
Gitweb:        https://git.kernel.org/tip/8bf5fb3f8fde23ae4ef69f0120f6cf56ad5a462d
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:00 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:02 +01:00

PCI/MSI: Split __pci_write_msi_msg()

The upcoming per device MSI domains will create different domains for MSI
and MSI-X. Split the write message function into MSI and MSI-X helpers so
they can be used by those new domain functions separately.
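
The ordering that the MSI-X helper has to preserve (mask, write, restore) can
be sketched in userspace C. The in-memory sketch_entry is a hypothetical
stand-in for one MSI-X table entry, and unlike the kernel, which works on a
cached control word, the model reads the control value from the entry itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CTRL_MASKBIT	0x1u	/* mirrors PCI_MSIX_ENTRY_CTRL_MASKBIT */

struct sketch_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

struct sketch_entry {
	struct sketch_msg	msg;
	uint32_t		ctrl;
	bool			is_virtual;
	bool			wrote_unmasked;	/* set if msg changed while unmasked */
};

static void sketch_write_ctrl(struct sketch_entry *e, uint32_t ctrl)
{
	e->ctrl = ctrl;
}

static void sketch_write_msg_msix(struct sketch_entry *e,
				  const struct sketch_msg *msg)
{
	uint32_t ctrl = e->ctrl;
	bool unmasked = !(ctrl & SKETCH_CTRL_MASKBIT);

	/* Virtual entries have no table slot behind them */
	if (e->is_virtual)
		return;

	/* Mask the entry before the message is modified */
	if (unmasked)
		sketch_write_ctrl(e, ctrl | SKETCH_CTRL_MASKBIT);

	/* Record a violation if the entry is still unmasked here */
	if (!(e->ctrl & SKETCH_CTRL_MASKBIT))
		e->wrote_unmasked = true;
	e->msg = *msg;

	/* Restore the original mask state */
	if (unmasked)
		sketch_write_ctrl(e, ctrl);
}
```

The wrote_unmasked member exists only so that the model can show that the
message never changes while the entry is unmasked.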

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.857982142@linutronix.de

---
 drivers/pci/msi/msi.c | 104 +++++++++++++++++++++--------------------
 1 file changed, 54 insertions(+), 50 deletions(-)

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index d107bde..76a3d44 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -180,6 +180,58 @@ void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 	}
 }
 
+static inline void pci_write_msg_msi(struct pci_dev *dev, struct msi_desc *desc,
+				     struct msi_msg *msg)
+{
+	int pos = dev->msi_cap;
+	u16 msgctl;
+
+	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+	msgctl &= ~PCI_MSI_FLAGS_QSIZE;
+	msgctl |= desc->pci.msi_attrib.multiple << 4;
+	pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);
+
+	pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_LO, msg->address_lo);
+	if (desc->pci.msi_attrib.is_64) {
+		pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_HI,  msg->address_hi);
+		pci_write_config_word(dev, pos + PCI_MSI_DATA_64, msg->data);
+	} else {
+		pci_write_config_word(dev, pos + PCI_MSI_DATA_32, msg->data);
+	}
+	/* Ensure that the writes are visible in the device */
+	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+}
+
+static inline void pci_write_msg_msix(struct msi_desc *desc, struct msi_msg *msg)
+{
+	void __iomem *base = pci_msix_desc_addr(desc);
+	u32 ctrl = desc->pci.msix_ctrl;
+	bool unmasked = !(ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT);
+
+	if (desc->pci.msi_attrib.is_virtual)
+		return;
+	/*
+	 * The specification mandates that the entry is masked
+	 * when the message is modified:
+	 *
+	 * "If software changes the Address or Data value of an
+	 * entry while the entry is unmasked, the result is
+	 * undefined."
+	 */
+	if (unmasked)
+		pci_msix_write_vector_ctrl(desc, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT);
+
+	writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
+	writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
+	writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
+
+	if (unmasked)
+		pci_msix_write_vector_ctrl(desc, ctrl);
+
+	/* Ensure that the writes are visible in the device */
+	readl(base + PCI_MSIX_ENTRY_DATA);
+}
+
 void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
 	struct pci_dev *dev = msi_desc_to_pci_dev(entry);
@@ -187,63 +239,15 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 	if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
 		/* Don't touch the hardware now */
 	} else if (entry->pci.msi_attrib.is_msix) {
-		void __iomem *base = pci_msix_desc_addr(entry);
-		u32 ctrl = entry->pci.msix_ctrl;
-		bool unmasked = !(ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT);
-
-		if (entry->pci.msi_attrib.is_virtual)
-			goto skip;
-
-		/*
-		 * The specification mandates that the entry is masked
-		 * when the message is modified:
-		 *
-		 * "If software changes the Address or Data value of an
-		 * entry while the entry is unmasked, the result is
-		 * undefined."
-		 */
-		if (unmasked)
-			pci_msix_write_vector_ctrl(entry, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT);
-
-		writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
-		writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
-		writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
-
-		if (unmasked)
-			pci_msix_write_vector_ctrl(entry, ctrl);
-
-		/* Ensure that the writes are visible in the device */
-		readl(base + PCI_MSIX_ENTRY_DATA);
+		pci_write_msg_msix(entry, msg);
 	} else {
-		int pos = dev->msi_cap;
-		u16 msgctl;
-
-		pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
-		msgctl &= ~PCI_MSI_FLAGS_QSIZE;
-		msgctl |= entry->pci.msi_attrib.multiple << 4;
-		pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);
-
-		pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_LO,
-				       msg->address_lo);
-		if (entry->pci.msi_attrib.is_64) {
-			pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_HI,
-					       msg->address_hi);
-			pci_write_config_word(dev, pos + PCI_MSI_DATA_64,
-					      msg->data);
-		} else {
-			pci_write_config_word(dev, pos + PCI_MSI_DATA_32,
-					      msg->data);
-		}
-		/* Ensure that the writes are visible in the device */
-		pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+		pci_write_msg_msi(dev, entry, msg);
 	}
 
-skip:
 	entry->msg = *msg;
 
 	if (entry->write_msi_msg)
 		entry->write_msi_msg(entry, entry->write_msi_msg_data);
-
 }
 
 void pci_write_msi_msg(unsigned int irq, struct msi_msg *msg)


* [tip: irq/core] genirq/msi: Add range checking to msi_insert_desc()
  2022-11-24 23:25 ` [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc() Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     02de943b0519c5940094ed8cd10d348a63ab0646
Gitweb:        https://git.kernel.org/tip/02de943b0519c5940094ed8cd10d348a63ab0646
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:25:59 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:02 +01:00

genirq/msi: Add range checking to msi_insert_desc()

Per device domains provide the real domain size to the core code. This
allows range checking on insertion of MSI descriptors and also paves the
way for dynamic index allocations which are required e.g. for IMS. This
avoids external mechanisms like bitmaps on the device side and just
utilizes the core internal MSI descriptor storage for it.
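
For illustration only (not part of the patch), the order of checks can be
modeled in user space. All model_* names and the fixed slot array are
hypothetical stand-ins for the xarray based store. Unlike the kernel
function, this sketch does not free the descriptor on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_SLOTS	16	/* hypothetical capacity of the backing store */

/* Hypothetical model of one per device MSI descriptor store */
struct model_store {
	unsigned int	hwsize;			/* real size of the domain */
	const void	*slots[MODEL_MAX_SLOTS];	/* NULL: index is free */
};

int model_insert_desc(struct model_store *st, const void *desc,
		      unsigned int index)
{
	/* Range check against the domain size comes first, as in the patch */
	if (index >= st->hwsize || index >= MODEL_MAX_SLOTS)
		return -ERANGE;
	/* Occupied index: xa_insert() reports -EBUSY in that case */
	if (st->slots[index])
		return -EBUSY;
	st->slots[index] = desc;
	return 0;
}

/* Usage: a domain with four entries accepts index 3 once and rejects 4 */
void model_insert_desc_demo(void)
{
	struct model_store st = { .hwsize = 4 };
	int desc;

	assert(model_insert_desc(&st, &desc, 3) == 0);
	assert(model_insert_desc(&st, &desc, 3) == -EBUSY);
	assert(model_insert_desc(&st, &desc, 4) == -ERANGE);
}
```

Because the store itself records which indices are in use, a device side
bitmap is not needed to find or validate a free index.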

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.798556374@linutronix.de

---
 kernel/irq/msi.c | 58 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 47 insertions(+), 11 deletions(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 7449998..21a7452 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -40,6 +40,7 @@ struct msi_ctrl {
 #define MSI_XA_DOMAIN_SIZE	(MSI_MAX_INDEX + 1)
 
 static void msi_domain_free_locked(struct device *dev, struct msi_ctrl *ctrl);
+static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid);
 static inline int msi_sysfs_create_group(struct device *dev);
 
 
@@ -80,16 +81,28 @@ static void msi_free_desc(struct msi_desc *desc)
 	kfree(desc);
 }
 
-static int msi_insert_desc(struct msi_device_data *md, struct msi_desc *desc,
+static int msi_insert_desc(struct device *dev, struct msi_desc *desc,
 			   unsigned int domid, unsigned int index)
 {
+	struct msi_device_data *md = dev->msi.data;
 	struct xarray *xa = &md->__domains[domid].store;
+	unsigned int hwsize;
 	int ret;
 
+	hwsize = msi_domain_get_hwsize(dev, domid);
+	if (index >= hwsize) {
+		ret = -ERANGE;
+		goto fail;
+	}
+
 	desc->msi_index = index;
 	ret = xa_insert(xa, index, desc, GFP_KERNEL);
 	if (ret)
-		msi_free_desc(desc);
+		goto fail;
+	return 0;
+
+fail:
+	msi_free_desc(desc);
 	return ret;
 }
 
@@ -117,7 +130,7 @@ int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid,
 	/* Copy type specific data to the new descriptor. */
 	desc->pci = init_desc->pci;
 
-	return msi_insert_desc(dev->msi.data, desc, domid, init_desc->msi_index);
+	return msi_insert_desc(dev, desc, domid, init_desc->msi_index);
 }
 
 static bool msi_desc_match(struct msi_desc *desc, enum msi_desc_filter filter)
@@ -136,11 +149,16 @@ static bool msi_desc_match(struct msi_desc *desc, enum msi_desc_filter filter)
 
 static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
 {
+	unsigned int hwsize;
+
 	if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
-			 !dev->msi.data->__domains[ctrl->domid].domain ||
-			 ctrl->first > ctrl->last ||
-			 ctrl->first > MSI_MAX_INDEX ||
-			 ctrl->last > MSI_MAX_INDEX))
+			 !dev->msi.data->__domains[ctrl->domid].domain))
+		return false;
+
+	hwsize = msi_domain_get_hwsize(dev, ctrl->domid);
+	if (WARN_ON_ONCE(ctrl->first > ctrl->last ||
+			 ctrl->first >= hwsize ||
+			 ctrl->last >= hwsize))
 		return false;
 	return true;
 }
@@ -208,7 +226,7 @@ static int msi_domain_add_simple_msi_descs(struct device *dev, struct msi_ctrl *
 		desc = msi_alloc_desc(dev, 1, NULL);
 		if (!desc)
 			goto fail_mem;
-		ret = msi_insert_desc(dev->msi.data, desc, ctrl->domid, idx);
+		ret = msi_insert_desc(dev, desc, ctrl->domid, idx);
 		if (ret)
 			goto fail;
 	}
@@ -406,7 +424,10 @@ unsigned int msi_domain_get_virq(struct device *dev, unsigned int domid, unsigne
 	if (!dev->msi.data)
 		return 0;
 
-	if (WARN_ON_ONCE(index > MSI_MAX_INDEX || domid >= MSI_MAX_DEVICE_IRQDOMAINS))
+	if (WARN_ON_ONCE(domid >= MSI_MAX_DEVICE_IRQDOMAINS))
+		return 0;
+
+	if (WARN_ON_ONCE(index >= msi_domain_get_hwsize(dev, domid)))
 		return 0;
 
 	/* This check is only valid for the PCI default MSI domain */
@@ -568,6 +589,20 @@ static struct irq_domain *msi_get_device_domain(struct device *dev, unsigned int
 	return domain;
 }
 
+static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid)
+{
+	struct msi_domain_info *info;
+	struct irq_domain *domain;
+
+	domain = msi_get_device_domain(dev, domid);
+	if (domain) {
+		info = domain->host_data;
+		return info->hwsize;
+	}
+	/* No domain, no size... */
+	return 0;
+}
+
 static inline void irq_chip_write_msi_msg(struct irq_data *data,
 					  struct msi_msg *msg)
 {
@@ -1356,7 +1391,7 @@ int msi_domain_alloc_irqs_all_locked(struct device *dev, unsigned int domid, int
 	struct msi_ctrl ctrl = {
 		.domid	= domid,
 		.first	= 0,
-		.last	= MSI_MAX_INDEX,
+		.last	= msi_domain_get_hwsize(dev, domid) - 1,
 		.nirqs	= nirqs,
 	};
 
@@ -1470,7 +1505,8 @@ void msi_domain_free_irqs_range(struct device *dev, unsigned int domid,
  */
 void msi_domain_free_irqs_all_locked(struct device *dev, unsigned int domid)
 {
-	msi_domain_free_irqs_range_locked(dev, domid, 0, MSI_MAX_INDEX);
+	msi_domain_free_irqs_range_locked(dev, domid, 0,
+					  msi_domain_get_hwsize(dev, domid) - 1);
 }
 
 /**

* [tip: irq/core] genirq/msi: Provide msi_create/free_device_irq_domain()
  2022-11-24 23:25 ` [patch V3 07/33] genirq/msi: Provide msi_create/free_device_irq_domain() Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     27a6dea3ebaab3d6f8ded969ec3af710bcbe0c02
Gitweb:        https://git.kernel.org/tip/27a6dea3ebaab3d6f8ded969ec3af710bcbe0c02
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:25:56 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:01 +01:00

genirq/msi: Provide msi_create/free_device_irq_domain()

Now that all prerequisites are in place, provide the actual interfaces for
creating and removing per device interrupt domains.

MSI device interrupt domains are created from the provided
msi_domain_template which is duplicated so that it can be modified for the
particular device.

The name of the domain and the name of the interrupt chip are composed by
"$(PREFIX)$(CHIPNAME)-$(DEVNAME)"

  $PREFIX:   The optional prefix provided by the underlying MSI parent domain
             via msi_parent_ops::prefix.
  $CHIPNAME: The name of the irq_chip in the template
  $DEVNAME:  The name of the device
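
For illustration only, the composition rule can be modeled in user space.
The helper name model_compose_name() is hypothetical. The 48 byte buffer
mirrors the name field of struct msi_domain_template in this series, and
the sketch uses the standard conditional operator where the kernel code
uses the GNU "?:" shorthand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Compose "$(PREFIX)$(CHIPNAME)-$(DEVNAME)" into @buf. A NULL @prefix
 * means the parent domain has no prefix and yields an empty one.
 * Returns the snprintf() result, so a value >= @size signals truncation.
 */
int model_compose_name(char *buf, size_t size, const char *prefix,
		       const char *chipname, const char *devname)
{
	return snprintf(buf, size, "%s%s-%s", prefix ? prefix : "",
			chipname, devname);
}

/* Usage: the two naming variants, with and without remapping prefix */
void model_compose_name_demo(void)
{
	char name[48];

	model_compose_name(name, sizeof(name), "IR-", "PCI-MSIX", "0000:3d:00.0");
	assert(!strcmp(name, "IR-PCI-MSIX-0000:3d:00.0"));

	model_compose_name(name, sizeof(name), NULL, "PCI-MSI", "0000:00:1c.0");
	assert(!strcmp(name, "PCI-MSI-0000:00:1c.0"));
}
```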

The domain is further initialized through an MSI parent domain callback which
fills in the required functionality for the parent domain or domains further
down the hierarchy. This initialization can fail, e.g. when the requested
feature or MSI domain type cannot be supported.

The domain pointer is stored in the pointer array inside of msi_device_data
which is attached to the device.

The domain can be removed via the API or left for disposal via devres when
the device is torn down. The API removal is useful e.g. for PCI to have
separate domains for MSI and MSI-X, which are mutually exclusive and always
occupy the default domain id slot.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.678838546@linutronix.de

---
 include/linux/msi.h |   6 ++-
 kernel/irq/msi.c    | 138 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 144 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 08a0e2a..ef46a3e 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -547,6 +547,12 @@ struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
 					 struct msi_domain_info *info,
 					 struct irq_domain *parent);
 
+bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
+				  const struct msi_domain_template *template,
+				  unsigned int hwsize, void *domain_data,
+				  void *chip_data);
+void msi_remove_device_irq_domain(struct device *dev, unsigned int domid);
+
 int msi_domain_alloc_irqs_range_locked(struct device *dev, unsigned int domid,
 				       unsigned int first, unsigned int last);
 int msi_domain_alloc_irqs_range(struct device *dev, unsigned int domid,
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 0f7fe56..8b415bd 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -240,6 +240,7 @@ static void msi_device_data_release(struct device *dev, void *res)
 	int i;
 
 	for (i = 0; i < MSI_MAX_DEVICE_IRQDOMAINS; i++) {
+		msi_remove_device_irq_domain(dev, i);
 		WARN_ON_ONCE(!xa_empty(&md->__domains[i].store));
 		xa_destroy(&md->__domains[i].store);
 	}
@@ -848,6 +849,143 @@ bool msi_parent_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 							 msi_child_info);
 }
 
+/**
+ * msi_create_device_irq_domain - Create a device MSI interrupt domain
+ * @dev:		Pointer to the device
+ * @domid:		Domain id
+ * @template:		MSI domain info bundle used as template
+ * @hwsize:		Maximum number of MSI table entries (0 if unknown or unlimited)
+ * @domain_data:	Optional pointer to domain specific data which is set in
+ *			msi_domain_info::data
+ * @chip_data:		Optional pointer to chip specific data which is set in
+ *			msi_domain_info::chip_data
+ *
+ * Return: True on success, false otherwise
+ *
+ * There is no firmware node required for this interface because the per
+ * device domains are software constructs which are actually closer to the
+ * hardware reality than any firmware can describe them.
+ *
+ * The domain name and the irq chip name for a MSI device domain are
+ * composed by: "$(PREFIX)$(CHIPNAME)-$(DEVNAME)"
+ *
+ * $PREFIX:   Optional prefix provided by the underlying MSI parent domain
+ *	      via msi_parent_ops::prefix. If that pointer is NULL the prefix
+ *	      is empty.
+ * $CHIPNAME: The name of the irq_chip in @template
+ * $DEVNAME:  The name of the device
+ *
+ * This results in understandable chip names and hardware interrupt numbers
+ * in e.g. /proc/interrupts
+ *
+ * PCI-MSI-0000:00:1c.0     0-edge  Parent domain has no prefix
+ * IR-PCI-MSI-0000:00:1c.4  0-edge  Same with interrupt remapping prefix 'IR-'
+ *
+ * IR-PCI-MSIX-0000:3d:00.0 0-edge  Hardware interrupt numbers reflect
+ * IR-PCI-MSIX-0000:3d:00.0 1-edge  the real MSI-X index on that device
+ * IR-PCI-MSIX-0000:3d:00.0 2-edge
+ *
+ * On IMS domains the hardware interrupt number is either a table entry
+ * index or a purely software managed index but it is guaranteed to be
+ * unique.
+ *
+ * The domain pointer is stored in @dev::msi::data::__domains[]. All
+ * subsequent operations on the domain depend on the domain id.
+ *
+ * The domain is automatically freed when the device is removed via devres
+ * in the context of @dev::msi::data freeing, but it can also be
+ * independently removed via @msi_remove_device_irq_domain().
+ */
+bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
+				  const struct msi_domain_template *template,
+				  unsigned int hwsize, void *domain_data,
+				  void *chip_data)
+{
+	struct irq_domain *domain, *parent = dev->msi.domain;
+	const struct msi_parent_ops *pops;
+	struct msi_domain_template *bundle;
+	struct fwnode_handle *fwnode;
+
+	if (!irq_domain_is_msi_parent(parent))
+		return false;
+
+	if (domid >= MSI_MAX_DEVICE_IRQDOMAINS)
+		return false;
+
+	bundle = kmemdup(template, sizeof(*bundle), GFP_KERNEL);
+	if (!bundle)
+		return false;
+
+	bundle->info.hwsize = hwsize;
+	bundle->info.chip = &bundle->chip;
+	bundle->info.ops = &bundle->ops;
+	bundle->info.data = domain_data;
+	bundle->info.chip_data = chip_data;
+
+	pops = parent->msi_parent_ops;
+	snprintf(bundle->name, sizeof(bundle->name), "%s%s-%s",
+		 pops->prefix ? : "", bundle->chip.name, dev_name(dev));
+	bundle->chip.name = bundle->name;
+
+	fwnode = irq_domain_alloc_named_fwnode(bundle->name);
+	if (!fwnode)
+		goto free_bundle;
+
+	if (msi_setup_device_data(dev))
+		goto free_fwnode;
+
+	msi_lock_descs(dev);
+
+	if (WARN_ON_ONCE(msi_get_device_domain(dev, domid)))
+		goto fail;
+
+	if (!pops->init_dev_msi_info(dev, parent, parent, &bundle->info))
+		goto fail;
+
+	domain = __msi_create_irq_domain(fwnode, &bundle->info, IRQ_DOMAIN_FLAG_MSI_DEVICE, parent);
+	if (!domain)
+		goto fail;
+
+	domain->dev = dev;
+	dev->msi.data->__domains[domid].domain = domain;
+	msi_unlock_descs(dev);
+	return true;
+
+fail:
+	msi_unlock_descs(dev);
+free_fwnode:
+	kfree(fwnode);
+free_bundle:
+	kfree(bundle);
+	return false;
+}
+
+/**
+ * msi_remove_device_irq_domain - Free a device MSI interrupt domain
+ * @dev:	Pointer to the device
+ * @domid:	Domain id
+ */
+void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
+{
+	struct msi_domain_info *info;
+	struct irq_domain *domain;
+
+	msi_lock_descs(dev);
+
+	domain = msi_get_device_domain(dev, domid);
+
+	if (!domain || !irq_domain_is_msi_device(domain))
+		goto unlock;
+
+	dev->msi.data->__domains[domid].domain = NULL;
+	info = domain->host_data;
+	irq_domain_remove(domain);
+	kfree(container_of(info, struct msi_domain_template, info));
+
+unlock:
+	msi_unlock_descs(dev);
+}
+
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
 			    int nvec, msi_alloc_info_t *arg)
 {

* [tip: irq/core] genirq/msi: Add size info to struct msi_domain_info
  2022-11-24 23:25 ` [patch V3 04/33] genirq/msi: Add size info to struct msi_domain_info Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     61bf992fc618503c910416f28afa0b015838b72b
Gitweb:        https://git.kernel.org/tip/61bf992fc618503c910416f28afa0b015838b72b
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:25:51 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:01 +01:00

genirq/msi: Add size info to struct msi_domain_info

To allow proper range checking especially for dynamic allocations add a
size field to struct msi_domain_info. If the field is 0 then the size is
unknown or unlimited (up to MSI_MAX_INDEX) to provide backwards
compatibility.
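
For illustration only, the size handling can be modeled in user space. The
model_* names are hypothetical and the value chosen for the maximum index
is an assumption. The series only shows that the index space is
MSI_MAX_INDEX + 1.

```c
#include <assert.h>
#include <limits.h>

/* Assumed value, see above */
#define MODEL_MAX_INDEX		((unsigned int)USHRT_MAX)
#define MODEL_XA_DOMAIN_SIZE	(MODEL_MAX_INDEX + 1)

/*
 * Returns 0 and normalizes *hwsize, or -1 if the requested size exceeds
 * the index space. The patch fails domain creation by returning NULL in
 * the latter case.
 */
int model_normalize_hwsize(unsigned int *hwsize)
{
	if (*hwsize > MODEL_XA_DOMAIN_SIZE)
		return -1;
	/* Size 0: unknown or unlimited, grant the maximum index space */
	if (!*hwsize)
		*hwsize = MODEL_XA_DOMAIN_SIZE;
	return 0;
}

/* Usage: 0 is widened, a real table size is kept as is */
void model_normalize_hwsize_demo(void)
{
	unsigned int size = 0;

	assert(model_normalize_hwsize(&size) == 0);
	assert(size == MODEL_XA_DOMAIN_SIZE);

	size = 2048;
	assert(model_normalize_hwsize(&size) == 0);
	assert(size == 2048);
}
```

After normalization a range check of the form "index >= hwsize" works for
both domains with a known table size and legacy domains which pass 0.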

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.501144862@linutronix.de

---
 include/linux/msi.h |  5 +++++
 kernel/irq/msi.c    | 11 +++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 7fb8737..08a0e2a 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -422,6 +422,10 @@ struct msi_domain_ops {
  * struct msi_domain_info - MSI interrupt domain data
  * @flags:		Flags to decribe features and capabilities
  * @bus_token:		The domain bus token
+ * @hwsize:		The hardware table size or the software index limit.
+ *			If 0 then the size is considered unlimited and
+ *			gets initialized to the maximum software index limit
+ *			by the domain creation code.
  * @ops:		The callback data structure
  * @chip:		Optional: associated interrupt chip
  * @chip_data:		Optional: associated interrupt chip data
@@ -433,6 +437,7 @@ struct msi_domain_ops {
 struct msi_domain_info {
 	u32				flags;
 	enum irq_domain_bus_token	bus_token;
+	unsigned int			hwsize;
 	struct msi_domain_ops		*ops;
 	struct irq_chip			*chip;
 	void				*chip_data;
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index c368116..0a38905 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -772,6 +772,17 @@ struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
 {
 	struct irq_domain *domain;
 
+	if (info->hwsize > MSI_XA_DOMAIN_SIZE)
+		return NULL;
+
+	/*
+	 * Hardware size 0 is valid for backwards compatibility and for
+	 * domains which are not backed by a hardware table. Grant the
+	 * maximum index space.
+	 */
+	if (!info->hwsize)
+		info->hwsize = MSI_XA_DOMAIN_SIZE;
+
 	msi_domain_update_dom_ops(info);
 	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
 		msi_domain_update_chip_ops(info);

* [tip: irq/core] genirq/msi: Split msi_create_irq_domain()
  2022-11-24 23:25 ` [patch V3 05/33] genirq/msi: Split msi_create_irq_domain() Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     a80c0aceeaffdb3afe9536fe747480e85841da7f
Gitweb:        https://git.kernel.org/tip/a80c0aceeaffdb3afe9536fe747480e85841da7f
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:25:52 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:01 +01:00

genirq/msi: Split msi_create_irq_domain()

Split the functionality of msi_create_irq_domain() so it can
be reused for creating per device irq domains.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.559086358@linutronix.de

---
 kernel/irq/msi.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 0a38905..0f7fe56 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -758,17 +758,10 @@ static void msi_domain_update_chip_ops(struct msi_domain_info *info)
 		chip->irq_set_affinity = msi_domain_set_affinity;
 }
 
-/**
- * msi_create_irq_domain - Create an MSI interrupt domain
- * @fwnode:	Optional fwnode of the interrupt controller
- * @info:	MSI domain info
- * @parent:	Parent irq domain
- *
- * Return: pointer to the created &struct irq_domain or %NULL on failure
- */
-struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
-					 struct msi_domain_info *info,
-					 struct irq_domain *parent)
+static struct irq_domain *__msi_create_irq_domain(struct fwnode_handle *fwnode,
+						  struct msi_domain_info *info,
+						  unsigned int flags,
+						  struct irq_domain *parent)
 {
 	struct irq_domain *domain;
 
@@ -787,7 +780,7 @@ struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
 	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
 		msi_domain_update_chip_ops(info);
 
-	domain = irq_domain_create_hierarchy(parent, IRQ_DOMAIN_FLAG_MSI, 0,
+	domain = irq_domain_create_hierarchy(parent, flags | IRQ_DOMAIN_FLAG_MSI, 0,
 					     fwnode, &msi_domain_ops, info);
 
 	if (domain) {
@@ -800,6 +793,21 @@ struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
 }
 
 /**
+ * msi_create_irq_domain - Create an MSI interrupt domain
+ * @fwnode:	Optional fwnode of the interrupt controller
+ * @info:	MSI domain info
+ * @parent:	Parent irq domain
+ *
+ * Return: pointer to the created &struct irq_domain or %NULL on failure
+ */
+struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
+					 struct msi_domain_info *info,
+					 struct irq_domain *parent)
+{
+	return __msi_create_irq_domain(fwnode, info, 0, parent);
+}
+
+/**
  * msi_parent_init_dev_msi_info - Delegate initialization of device MSI info down
  *				  in the domain hierarchy
  * @dev:		The device for which the domain should be created

* [tip: irq/core] genirq/msi: Provide data structs for per device domains
  2022-11-24 23:25 ` [patch V3 03/33] genirq/msi: Provide data structs for per device domains Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     ebca4396ee18521e9e5d435a15e5d0ab2eb6b009
Gitweb:        https://git.kernel.org/tip/ebca4396ee18521e9e5d435a15e5d0ab2eb6b009
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:25:49 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:01 +01:00

genirq/msi: Provide data structs for per device domains

Provide struct msi_domain_template which contains a bundle of struct
irq_chip, struct msi_domain_ops and struct msi_domain_info and a name
field.

This template is used by MSI device domain implementations to provide the
domain specific functionality, feature bits etc.

When an MSI domain is created the template is duplicated in the core code
so that it can be modified per instance. That means templates can be
marked const at the MSI device domain code.

The template is a bundle to avoid several allocations and duplications
of the involved structures.

The name field is used to construct the final domain and chip name via:

    $PREFIX$NAME-$DEVNAME

where $PREFIX is the optional prefix of the MSI parent domain, $NAME is the
name provided in template::chip and $DEVNAME is the device name, so that the
domain is properly identified. On x86 this results for PCI/MSI in:

   PCI-MSI-0000:3d:00.1 or IR-PCI-MSIX-0000:3d:00.1

depending on the domain type and the availability of remapping.
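
For illustration only, the duplication of a const template can be modeled
in user space. The model_* structures are hypothetical and heavily reduced
stand-ins for the kernel ones; malloc() plus memcpy() take the place of
kmemdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct model_chip { const char *name; };
struct model_ops  { int dummy; };
struct model_info {
	unsigned int		flags;
	struct model_ops	*ops;
	struct model_chip	*chip;
};

struct model_template {
	char			name[48];
	struct model_chip	chip;
	struct model_ops	ops;
	struct model_info	info;
};

/*
 * Duplicate the bundle with a single allocation and point the info
 * pointers at the copy, so later modifications never touch the shared
 * const template. Returns NULL on allocation failure.
 */
struct model_template *model_dup_template(const struct model_template *tmpl)
{
	struct model_template *bundle = malloc(sizeof(*bundle));

	if (!bundle)
		return NULL;
	memcpy(bundle, tmpl, sizeof(*bundle));
	bundle->info.chip = &bundle->chip;
	bundle->info.ops = &bundle->ops;
	return bundle;
}

static const struct model_template model_pci_msix_template = {
	.chip = { .name = "PCI-MSIX" },
	.info = { .flags = 0x1 },
};

/* Usage: the per instance copy is writable, the template stays untouched */
void model_dup_template_demo(void)
{
	struct model_template *bundle = model_dup_template(&model_pci_msix_template);

	assert(bundle && bundle->info.chip == &bundle->chip);
	bundle->info.flags |= 0x2;
	assert(model_pci_msix_template.info.flags == 0x1);
	free(bundle);
}
```

Keeping chip, ops and info in one structure means one allocation and one
free per domain instead of three.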

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.442499757@linutronix.de

---
 include/linux/msi.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 9bf3cba..7fb8737 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -24,6 +24,7 @@
 #include <linux/xarray.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
+#include <linux/irq.h>
 #include <linux/bits.h>
 
 #include <asm/msi.h>
@@ -74,7 +75,6 @@ struct msi_msg {
 
 extern int pci_msi_ignore_mask;
 /* Helper functions */
-struct irq_data;
 struct msi_desc;
 struct pci_dev;
 struct platform_msi_priv_data;
@@ -442,6 +442,20 @@ struct msi_domain_info {
 	void				*data;
 };
 
+/**
+ * struct msi_domain_template - Template for MSI device domains
+ * @name:	Storage for the resulting name. Filled in by the core.
+ * @chip:	Interrupt chip for this domain
+ * @ops:	MSI domain ops
+ * @info:	MSI domain info data
+ */
+struct msi_domain_template {
+	char			name[48];
+	struct irq_chip		chip;
+	struct msi_domain_ops	ops;
+	struct msi_domain_info	info;
+};
+
 /*
  * Flags for msi_domain_info
  *

* [tip: irq/core] genirq/msi: Rearrange MSI domain flags
  2022-11-24 23:25 ` [patch V3 01/33] genirq/msi: Rearrange MSI domain flags Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Jason Gunthorpe, Kevin Tian, Marc Zyngier, x86,
	linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     2d958b02b04f18955b0e15eda531461153c399d4
Gitweb:        https://git.kernel.org/tip/2d958b02b04f18955b0e15eda531461153c399d4
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:25:46 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:01 +01:00

genirq/msi: Rearrange MSI domain flags

These flags got added as necessary and have no obvious structure. For
feature support checks and masking it's convenient to have two blocks of
flags:

   1) Flags to control the internal behaviour like allocating/freeing
      MSI descriptors. Those flags do not need any support from the
      underlying MSI parent domain. They are mostly under the control
      of the outermost domain which implements the actual MSI support.

   2) Flags to expose features, e.g. PCI multi-MSI or requirements
      which can depend on an underlying domain.

No functional change.
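
For illustration only, the benefit of the two blocks can be modeled in
user space. The bit positions are the ones introduced by this patch; the
model_* names and the masking policy in model_restrict_flags() are
hypothetical and not taken from any parent domain implementation.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_GENERIC_FLAGS_MASK	UINT32_C(0x0000ffff)	/* bits 0-15 */
#define MODEL_DOMAIN_FLAGS_MASK		UINT32_C(0xffff0000)	/* bits 16-31 */

#define MODEL_FLAG_ACTIVATE_EARLY	(UINT32_C(1) << 2)
#define MODEL_FLAG_MULTI_PCI_MSI	(UINT32_C(1) << 16)
#define MODEL_FLAG_PCI_MSIX		(UINT32_C(1) << 17)

/*
 * Restrict the flags requested by a device domain to what a parent
 * supports. Generic flags pass through untouched; only the domain
 * specific block is subject to masking.
 */
uint32_t model_restrict_flags(uint32_t requested, uint32_t parent_supported)
{
	uint32_t generic = requested & MODEL_GENERIC_FLAGS_MASK;
	uint32_t domain = requested & MODEL_DOMAIN_FLAGS_MASK & parent_supported;

	return generic | domain;
}

/* Usage: a parent without multi-MSI support drops only that feature */
void model_restrict_flags_demo(void)
{
	uint32_t req = MODEL_FLAG_ACTIVATE_EARLY | MODEL_FLAG_MULTI_PCI_MSI |
		       MODEL_FLAG_PCI_MSIX;

	assert(model_restrict_flags(req, MODEL_FLAG_PCI_MSIX) ==
	       (MODEL_FLAG_ACTIVATE_EARLY | MODEL_FLAG_PCI_MSIX));
}
```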

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.322714918@linutronix.de

---
 include/linux/msi.h | 49 ++++++++++++++++++++++++++++++--------------
 1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 43b8866..a4339eb 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -24,6 +24,8 @@
 #include <linux/xarray.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
+#include <linux/bits.h>
+
 #include <asm/msi.h>
 
 /* Dummy shadow structures if an architecture does not define them */
@@ -440,7 +442,16 @@ struct msi_domain_info {
 	void				*data;
 };
 
-/* Flags for msi_domain_info */
+/*
+ * Flags for msi_domain_info
+ *
+ * Bit 0-15:	Generic MSI functionality which is not subject to restriction
+ *		by parent domains
+ *
+ * Bit 16-31:	Functionality which depends on the underlying parent domain and
+ *		can be masked out by msi_parent_ops::init_dev_msi_info() when
+ *		a device MSI domain is initialized.
+ */
 enum {
 	/*
 	 * Init non implemented ops callbacks with default MSI domain
@@ -452,33 +463,41 @@ enum {
 	 * callbacks.
 	 */
 	MSI_FLAG_USE_DEF_CHIP_OPS	= (1 << 1),
-	/* Support multiple PCI MSI interrupts */
-	MSI_FLAG_MULTI_PCI_MSI		= (1 << 2),
-	/* Support PCI MSIX interrupts */
-	MSI_FLAG_PCI_MSIX		= (1 << 3),
 	/* Needs early activate, required for PCI */
-	MSI_FLAG_ACTIVATE_EARLY		= (1 << 4),
+	MSI_FLAG_ACTIVATE_EARLY		= (1 << 2),
 	/*
 	 * Must reactivate when irq is started even when
 	 * MSI_FLAG_ACTIVATE_EARLY has been set.
 	 */
-	MSI_FLAG_MUST_REACTIVATE	= (1 << 5),
-	/* Is level-triggered capable, using two messages */
-	MSI_FLAG_LEVEL_CAPABLE		= (1 << 6),
+	MSI_FLAG_MUST_REACTIVATE	= (1 << 3),
 	/* Populate sysfs on alloc() and destroy it on free() */
-	MSI_FLAG_DEV_SYSFS		= (1 << 7),
-	/* MSI-X entries must be contiguous */
-	MSI_FLAG_MSIX_CONTIGUOUS	= (1 << 8),
+	MSI_FLAG_DEV_SYSFS		= (1 << 4),
 	/* Allocate simple MSI descriptors */
-	MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS	= (1 << 9),
+	MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS	= (1 << 5),
 	/* Free MSI descriptors */
-	MSI_FLAG_FREE_MSI_DESCS		= (1 << 10),
+	MSI_FLAG_FREE_MSI_DESCS		= (1 << 6),
 	/*
 	 * Quirk to handle MSI implementations which do not provide
 	 * masking. Currently known to affect x86, but has to be partially
 	 * handled in the core MSI code.
 	 */
-	MSI_FLAG_NOMASK_QUIRK		= (1 << 11),
+	MSI_FLAG_NOMASK_QUIRK		= (1 << 7),
+
+	/* Mask for the generic functionality */
+	MSI_GENERIC_FLAGS_MASK		= GENMASK(15, 0),
+
+	/* Mask for the domain specific functionality */
+	MSI_DOMAIN_FLAGS_MASK		= GENMASK(31, 16),
+
+	/* Support multiple PCI MSI interrupts */
+	MSI_FLAG_MULTI_PCI_MSI		= (1 << 16),
+	/* Support PCI MSIX interrupts */
+	MSI_FLAG_PCI_MSIX		= (1 << 17),
+	/* Is level-triggered capable, using two messages */
+	MSI_FLAG_LEVEL_CAPABLE		= (1 << 18),
+	/* MSI-X entries must be contiguous */
+	MSI_FLAG_MSIX_CONTIGUOUS	= (1 << 19),
+
 };
 
 int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,

* [tip: irq/core] genirq/msi: Provide struct msi_parent_ops
  2022-11-24 23:25 ` [patch V3 02/33] genirq/msi: Provide struct msi_parent_ops Thomas Gleixner
@ 2022-12-05 18:25   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 18:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     b78780d93b068706d04f8f2f02bd08db5da01479
Gitweb:        https://git.kernel.org/tip/b78780d93b068706d04f8f2f02bd08db5da01479
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:25:48 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 19:21:01 +01:00

genirq/msi: Provide struct msi_parent_ops

MSI parent domains must have some control over the MSI domains which are
built on top. On domain creation they need to fill in e.g. architecture
specific chip callbacks or msi domain ops to make the outermost domain
parent agnostic which is obviously required for architecture independence
etc.

The structure contains:

    1) A bitfield which exposes the supported functional features. This
       allows checking for features and is also used in the initialization
       callback to mask out unsupported features when the actual domain
       implementation requests a broader range, e.g. on x86 PCI multi-MSI
       is only supported by remapping domains but not by the underlying
       vector domain. The PCI/MSI code can then always request multi-MSI
       support, but the resulting feature set after creation might not
       have it set.

    2) An optional string prefix which is put in front of domain and chip
       names during creation of the MSI domain. That allows keeping the
       naming schemes e.g. on x86 where PCI-MSI domains have an IR- prefix
       when interrupt remapping is enabled.

    3) An initialization callback to sanity check the domain info of
       the to be created MSI domain, to restrict features and to
       apply changes in MSI ops and interrupt chip callbacks to
       accommodate the particular MSI parent implementation and/or
       the underlying hierarchy.

Add a convenience function to delegate the initialization from the
MSI parent domain to an underlying domain in the hierarchy.
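
For illustration only, feature masking and delegation can be modeled in
user space. All model_* names are hypothetical and the structures are
reduced to the members needed here. The device argument of the real
callback is omitted and model_vector_init() is an assumed example of a
bottom level callback, not the x86 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_domain;
struct model_child_info { uint32_t flags; };

struct model_parent_ops {
	uint32_t	supported_flags;
	const char	*prefix;
	bool		(*init_dev_msi_info)(struct model_domain *domain,
					     struct model_domain *real_parent,
					     struct model_child_info *info);
};

struct model_domain {
	struct model_domain		*parent;
	const struct model_parent_ops	*msi_parent_ops;
};

/* Hand the initialization to the next domain down the hierarchy */
bool model_parent_init_dev_msi_info(struct model_domain *domain,
				    struct model_domain *real_parent,
				    struct model_child_info *info)
{
	struct model_domain *parent = domain->parent;

	if (!parent || !parent->msi_parent_ops ||
	    !parent->msi_parent_ops->init_dev_msi_info)
		return false;
	return parent->msi_parent_ops->init_dev_msi_info(parent, real_parent, info);
}

/* Bottom level callback: keep only what the real MSI parent supports */
bool model_vector_init(struct model_domain *domain,
		       struct model_domain *real_parent,
		       struct model_child_info *info)
{
	(void)domain;
	info->flags &= real_parent->msi_parent_ops->supported_flags;
	return true;
}

void model_parent_ops_demo(void)
{
	const uint32_t multi = UINT32_C(1) << 16, msix = UINT32_C(1) << 17;
	const struct model_parent_ops vec_ops = {
		.supported_flags = msix,
		.init_dev_msi_info = model_vector_init,
	};
	const struct model_parent_ops ir_ops = {
		.supported_flags = multi | msix,
		.prefix = "IR-",
		.init_dev_msi_info = model_parent_init_dev_msi_info,
	};
	struct model_domain vector = { .msi_parent_ops = &vec_ops };
	struct model_domain remap = { .parent = &vector, .msi_parent_ops = &ir_ops };
	struct model_child_info info = { .flags = multi | msix };

	/* Remapping parent delegates down, multi-MSI survives */
	assert(ir_ops.init_dev_msi_info(&remap, &remap, &info));
	assert(info.flags == (multi | msix));

	/* Vector domain as direct parent: multi-MSI is masked out */
	assert(vec_ops.init_dev_msi_info(&vector, &vector, &info));
	assert(info.flags == msix);
}
```

The device domain code can always request the full feature set and then
inspect the resulting flags to see what the hierarchy actually granted.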

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.382485843@linutronix.de

---
 include/linux/irqdomain.h |  5 +++++-
 include/linux/msi.h       | 21 +++++++++++++++++++-
 kernel/irq/msi.c          | 41 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 67 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 24b7668..a668cc0 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -46,6 +46,7 @@ struct irq_desc;
 struct cpumask;
 struct seq_file;
 struct irq_affinity_desc;
+struct msi_parent_ops;
 
 #define IRQ_DOMAIN_IRQ_SPEC_PARAMS 16
 
@@ -134,6 +135,7 @@ struct irq_domain_chip_generic;
  * @pm_dev:	Pointer to a device that can be utilized for power management
  *		purposes related to the irq domain.
  * @parent:	Pointer to parent irq_domain to support hierarchy irq_domains
+ * @msi_parent_ops: Pointer to MSI parent domain methods for per device domain init
  *
  * Revmap data, used internally by the irq domain code:
  * @revmap_size:	Size of the linear map table @revmap[]
@@ -157,6 +159,9 @@ struct irq_domain {
 #ifdef	CONFIG_IRQ_DOMAIN_HIERARCHY
 	struct irq_domain		*parent;
 #endif
+#ifdef CONFIG_GENERIC_MSI_IRQ
+	const struct msi_parent_ops	*msi_parent_ops;
+#endif
 
 	/* reverse map data. The linear map gets appended to the irq_domain */
 	irq_hw_number_t			hwirq_max;
diff --git a/include/linux/msi.h b/include/linux/msi.h
index a4339eb..9bf3cba 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -500,6 +500,27 @@ enum {
 
 };
 
+/**
+ * struct msi_parent_ops - MSI parent domain callbacks and configuration info
+ *
+ * @supported_flags:	Required: The supported MSI flags of the parent domain
+ * @prefix:		Optional: Prefix for the domain and chip name
+ * @init_dev_msi_info:	Required: Callback for MSI parent domains to setup parent
+ *			domain specific domain flags, domain ops and interrupt chip
+ *			callbacks when a per device domain is created.
+ */
+struct msi_parent_ops {
+	u32		supported_flags;
+	const char	*prefix;
+	bool		(*init_dev_msi_info)(struct device *dev, struct irq_domain *domain,
+					     struct irq_domain *msi_parent_domain,
+					     struct msi_domain_info *msi_child_info);
+};
+
+bool msi_parent_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				  struct irq_domain *msi_parent_domain,
+				  struct msi_domain_info *msi_child_info);
+
 int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,
 			    bool force);
 
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 8e653f0..c368116 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -788,6 +788,47 @@ struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
 	return domain;
 }
 
+/**
+ * msi_parent_init_dev_msi_info - Delegate initialization of device MSI info down
+ *				  in the domain hierarchy
+ * @dev:		The device for which the domain should be created
+ * @domain:		The domain in the hierarchy this op is being called on
+ * @msi_parent_domain:	The IRQ_DOMAIN_FLAG_MSI_PARENT domain for the child to
+ *			be created
+ * @msi_child_info:	The MSI domain info of the IRQ_DOMAIN_FLAG_MSI_DEVICE
+ *			domain to be created
+ *
+ * Return: true on success, false otherwise
+ *
+ * This is the most complex problem of per device MSI domains and the
+ * underlying interrupt domain hierarchy:
+ *
+ * The device domain to be initialized requests the broadest feature set
+ * possible and the underlying domain hierarchy puts restrictions on it.
+ *
+ * That's trivial for a simple parent->child relationship, but it gets
+ * interesting with an intermediate domain: root->parent->child.  The
+ * intermediate 'parent' can expand the capabilities which the 'root'
+ * domain is providing. So that creates a classic chicken and egg problem:
+ * Which entity is doing the restrictions/expansions?
+ *
+ * One solution is to let the root domain handle the initialization, which
+ * is why there are the @domain and the @msi_parent_domain pointers.
+ */
+bool msi_parent_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				  struct irq_domain *msi_parent_domain,
+				  struct msi_domain_info *msi_child_info)
+{
+	struct irq_domain *parent = domain->parent;
+
+	if (WARN_ON_ONCE(!parent || !parent->msi_parent_ops ||
+			 !parent->msi_parent_ops->init_dev_msi_info))
+		return false;
+
+	return parent->msi_parent_ops->init_dev_msi_info(dev, parent, msi_parent_domain,
+							 msi_child_info);
+}
+
 int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
 			    int nvec, msi_alloc_info_t *arg)
 {

^ permalink raw reply related	[flat|nested] 126+ messages in thread
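
The delegation described in the patch above can be modelled in plain
userspace C. This is only a sketch under stated assumptions: the struct
names, the two feature bits and negotiate() are hypothetical stand-ins,
not kernel definitions. It shows a remapping-style parent handing the
request down to a vector-style root, which then restricts the child's
flags to what the actual MSI parent advertises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only (not the kernel's values). */
#define F_MSIX		(1u << 0)
#define F_MULTI_MSI	(1u << 1)

struct domain;

/* Reduced stand-in for struct msi_parent_ops. */
struct parent_ops {
	uint32_t supported_flags;
	bool (*init_dev_msi_info)(struct domain *domain,
				  struct domain *msi_parent,
				  uint32_t *child_flags);
};

/* Reduced stand-in for struct irq_domain. */
struct domain {
	struct domain *parent;
	const struct parent_ops *ops;
};

/* Root handler: restrict the child to what the MSI parent advertises. */
static bool root_init(struct domain *domain, struct domain *msi_parent,
		      uint32_t *child_flags)
{
	(void)domain;
	assert(msi_parent && msi_parent->ops);
	*child_flags &= msi_parent->ops->supported_flags;
	return true;
}

/* Models msi_parent_init_dev_msi_info(): hand the request to domain->parent. */
static bool delegate_init(struct domain *domain, struct domain *msi_parent,
			  uint32_t *child_flags)
{
	struct domain *parent = domain->parent;

	if (!parent || !parent->ops || !parent->ops->init_dev_msi_info)
		return false;
	return parent->ops->init_dev_msi_info(parent, msi_parent, child_flags);
}

/* Vector-like root: no multi-MSI. Remapping-like parent: adds multi-MSI. */
static const struct parent_ops vector_ops = {
	.supported_flags	= F_MSIX,
	.init_dev_msi_info	= root_init,
};

static const struct parent_ops remap_ops = {
	.supported_flags	= F_MSIX | F_MULTI_MSI,
	.init_dev_msi_info	= delegate_init,
};

/* Hypothetical create-time step: returns the resulting flags, 0 on failure. */
static uint32_t negotiate(struct domain *msi_parent, uint32_t requested)
{
	uint32_t flags = requested;

	if (!msi_parent->ops->init_dev_msi_info(msi_parent, msi_parent, &flags))
		return 0;
	return flags;
}
```

With a remapping-style domain stacked on the root, a request for both
bits keeps both. With the root as direct MSI parent, the same request
comes back with only F_MSIX set, which matches the multi-MSI example in
the changelog.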

* [tip: irq/core] iommu/amd: Enable PCI/IMS
  2022-11-24 23:26 ` [patch V3 32/33] iommu/amd: " Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     fa5745aca1dc819aee6463a2475b5c277f7cf8f6
Gitweb:        https://git.kernel.org/tip/fa5745aca1dc819aee6463a2475b5c277f7cf8f6
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:36 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:35 +01:00

iommu/amd: Enable PCI/IMS

PCI/IMS works like PCI/MSI-X in the remapping. Just add the feature flag,
but only when on real hardware.

Virtualized IOMMUs need additional support.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232327.140571546@linutronix.de

---
 drivers/iommu/amd/iommu.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7caccd8..4d28967 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3649,11 +3649,20 @@ static struct irq_chip amd_ir_chip = {
 };
 
 static const struct msi_parent_ops amdvi_msi_parent_ops = {
-	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI |
+				  MSI_FLAG_PCI_IMS,
 	.prefix			= "IR-",
 	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
 };
 
+static const struct msi_parent_ops virt_amdvi_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "vIR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 {
 	struct fwnode_handle *fn;
@@ -3670,7 +3679,11 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 
 	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_AMDVI);
 	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
-	iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
+
+	if (amd_iommu_np_cache)
+		iommu->ir_domain->msi_parent_ops = &virt_amdvi_msi_parent_ops;
+	else
+		iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] PCI/MSI: Provide pci_ims_alloc/free_irq()
  2022-11-24 23:26 ` [patch V3 29/33] PCI/MSI: Provide pci_ims_alloc/free_irq() Thomas Gleixner
  2022-11-28  4:47   ` Tian, Kevin
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Bjorn Helgaas, Marc Zyngier, x86,
	linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     c9e5bea273834a63b5e9ba90ad94b305ba50704e
Gitweb:        https://git.kernel.org/tip/c9e5bea273834a63b5e9ba90ad94b305ba50704e
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:31 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:35 +01:00

PCI/MSI: Provide pci_ims_alloc/free_irq()

pci_ims_alloc_irq() allocates a single vector at the next free index in the
IMS space. pci_ims_free_irq() releases it again.

All allocated vectors are also released via pci_free_irq_vectors(), which
releases MSI/MSI-X vectors as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.961711347@linutronix.de

---
 drivers/pci/msi/api.c | 50 ++++++++++++++++++++++++++++++++++++++++++-
 include/linux/pci.h   |  3 +++-
 2 files changed, 53 insertions(+)

diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index c8816db..b8009aa 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -366,6 +366,56 @@ const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr)
 EXPORT_SYMBOL(pci_irq_get_affinity);
 
 /**
+ * pci_ims_alloc_irq - Allocate an interrupt on a PCI/IMS interrupt domain
+ * @dev:	The PCI device to operate on
+ * @icookie:	Pointer to an IMS implementation specific cookie for this
+ *		IMS instance (PASID, queue ID, pointer...).
+ *		The cookie content is copied into the MSI descriptor for the
+ *		interrupt chip callbacks or domain specific setup functions.
+ * @affdesc:	Optional pointer to an interrupt affinity descriptor
+ *
+ * There is no index for IMS allocations as IMS is an implementation
+ * specific storage and does not have any direct associations between
+ * index, which might be a pure software construct, and device
+ * functionality. This association is established by the driver either via
+ * the index - if there is a hardware table - or in case of purely software
+ * managed IMS implementation the association happens via the
+ * irq_write_msi_msg() callback of the implementation specific interrupt
+ * chip, which utilizes the provided @icookie to store the MSI message in
+ * the appropriate place.
+ *
+ * Return: A struct msi_map
+ *
+ *	On success msi_map::index contains the allocated index (>= 0) and
+ *	msi_map::virq the allocated Linux interrupt number (> 0).
+ *
+ *	On fail msi_map::index contains the error code and msi_map::virq
+ *	is set to 0.
+ */
+struct msi_map pci_ims_alloc_irq(struct pci_dev *dev, union msi_instance_cookie *icookie,
+				 const struct irq_affinity_desc *affdesc)
+{
+	return msi_domain_alloc_irq_at(&dev->dev, MSI_SECONDARY_DOMAIN, MSI_ANY_INDEX,
+				       affdesc, icookie);
+}
+EXPORT_SYMBOL_GPL(pci_ims_alloc_irq);
+
+/**
+ * pci_ims_free_irq - Free an interrupt on a PCI/IMS interrupt domain
+ *		      which was allocated via pci_ims_alloc_irq()
+ * @dev:	The PCI device to operate on
+ * @map:	A struct msi_map describing the interrupt to free as
+ *		returned from pci_ims_alloc_irq()
+ */
+void pci_ims_free_irq(struct pci_dev *dev, struct msi_map map)
+{
+	if (WARN_ON_ONCE(map.index < 0 || map.virq <= 0))
+		return;
+	msi_domain_free_irqs_range(&dev->dev, MSI_SECONDARY_DOMAIN, map.index, map.index);
+}
+EXPORT_SYMBOL_GPL(pci_ims_free_irq);
+
+/**
  * pci_free_irq_vectors() - Free previously allocated IRQs for a device
  * @dev: the PCI device to operate on
  *
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1592b63..aa514b5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2491,6 +2491,9 @@ struct msi_domain_template;
 
 bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
 			   unsigned int hwsize, void *data);
+struct msi_map pci_ims_alloc_irq(struct pci_dev *pdev, union msi_instance_cookie *icookie,
+				 const struct irq_affinity_desc *affdesc);
+void pci_ims_free_irq(struct pci_dev *pdev, struct msi_map map);
 
 #include <linux/dma-mapping.h>
 

^ permalink raw reply related	[flat|nested] 126+ messages in thread
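
The return convention of pci_ims_alloc_irq() documented in the patch
above can be tried out with a small userspace model. This is a sketch,
not the kernel implementation: the slot array, IMS_SLOTS, VIRQ_BASE, the
model_*() helpers and the choice of -ENOSPC as error code are all
assumptions made for illustration. Only the two fields of struct msi_map
and the success/failure rules are taken from the kernel-doc.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same two fields as the kernel's struct msi_map. */
struct msi_map {
	int index;
	int virq;
};

#define IMS_SLOTS	4	/* hypothetical size of the message store */
#define VIRQ_BASE	100	/* hypothetical first Linux interrupt number */

static bool slot_used[IMS_SLOTS];

/* Stand-in for pci_ims_alloc_irq(): hand out the next free index. */
static struct msi_map model_ims_alloc(void)
{
	/* Failure default: error code in index, virq set to 0 */
	struct msi_map map = { .index = -ENOSPC, .virq = 0 };

	for (int i = 0; i < IMS_SLOTS; i++) {
		if (!slot_used[i]) {
			slot_used[i] = true;
			map.index = i;
			map.virq = VIRQ_BASE + i;
			break;
		}
	}
	return map;
}

/* Success means index >= 0 and virq > 0. */
static bool msi_map_ok(struct msi_map map)
{
	return map.index >= 0 && map.virq > 0;
}

/* Stand-in for pci_ims_free_irq(): refuse maps which describe a failure. */
static bool model_ims_free(struct msi_map map)
{
	if (!msi_map_ok(map))
		return false;
	assert(map.index < IMS_SLOTS);
	slot_used[map.index] = false;
	return true;
}
```

A caller keeps the returned map and passes it back unchanged on free. A
failed allocation yields a map that the free path rejects, which mirrors
the WARN_ON_ONCE() guard in the patch.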

* [tip: irq/core] iommu/vt-d: Enable PCI/IMS
  2022-11-24 23:26 ` [patch V3 31/33] iommu/vt-d: " Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     810531a1af5393f010d6508b1cb48e6650fc5e8f
Gitweb:        https://git.kernel.org/tip/810531a1af5393f010d6508b1cb48e6650fc5e8f
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:34 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:35 +01:00

iommu/vt-d: Enable PCI/IMS

PCI/IMS works like PCI/MSI-X in the remapping. Just add the feature flag,
but only when on real hardware.

Virtualized IOMMUs need additional support, e.g. for PASID.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232327.081482253@linutronix.de

---
 drivers/iommu/intel/irq_remapping.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 6fab407..a723f53 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -82,7 +82,7 @@ static const struct irq_domain_ops intel_ir_domain_ops;
 
 static void iommu_disable_irq_remapping(struct intel_iommu *iommu);
 static int __init parse_ioapics_under_ir(void);
-static const struct msi_parent_ops dmar_msi_parent_ops;
+static const struct msi_parent_ops dmar_msi_parent_ops, virt_dmar_msi_parent_ops;
 
 static bool ir_pre_enabled(struct intel_iommu *iommu)
 {
@@ -577,7 +577,11 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 
 	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_DMAR);
 	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
-	iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
+
+	if (cap_caching_mode(iommu->cap))
+		iommu->ir_domain->msi_parent_ops = &virt_dmar_msi_parent_ops;
+	else
+		iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
 
 	ir_table->base = page_address(pages);
 	ir_table->bitmap = bitmap;
@@ -1429,11 +1433,20 @@ static const struct irq_domain_ops intel_ir_domain_ops = {
 };
 
 static const struct msi_parent_ops dmar_msi_parent_ops = {
-	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI |
+				  MSI_FLAG_PCI_IMS,
 	.prefix			= "IR-",
 	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
 };
 
+static const struct msi_parent_ops virt_dmar_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED |
+				  MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "vIR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 /*
  * Support of Interrupt Remapping Unit Hotplug
  */

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] x86/apic/msi: Enable PCI/IMS
  2022-11-24 23:26 ` [patch V3 30/33] x86/apic/msi: Enable PCI/IMS Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     6e24c887732901140f4e82ba2315c2e15f06f1d6
Gitweb:        https://git.kernel.org/tip/6e24c887732901140f4e82ba2315c2e15f06f1d6
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:32 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:35 +01:00

x86/apic/msi: Enable PCI/IMS

Enable IMS in the domain init and allocation mapping code, but do not
enable it on the vector domain as discussed in various threads on LKML.

The interrupt remap domains can expand this setting like they do with
PCI multi MSI.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232327.022658817@linutronix.de

---
 arch/x86/kernel/apic/msi.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 682f51a..35d5b8f 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -184,6 +184,7 @@ static int x86_msi_prepare(struct irq_domain *domain, struct device *dev,
 		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
 		return 0;
 	case DOMAIN_BUS_PCI_DEVICE_MSIX:
+	case DOMAIN_BUS_PCI_DEVICE_IMS:
 		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
 		return 0;
 	default:
@@ -230,6 +231,10 @@ static bool x86_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 	case DOMAIN_BUS_PCI_DEVICE_MSI:
 	case DOMAIN_BUS_PCI_DEVICE_MSIX:
 		break;
+	case DOMAIN_BUS_PCI_DEVICE_IMS:
+		if (!(pops->supported_flags & MSI_FLAG_PCI_IMS))
+			return false;
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		return false;

^ permalink raw reply related	[flat|nested] 126+ messages in thread
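
Taken together, the three "Enable PCI/IMS" patches gate IMS on the
feature set of the MSI parent. The following userspace sketch models
that gate. The bit values, the enum and both helper functions are
hypothetical; only the flag names, the "IR-"/"vIR-" prefixes and the
shape of the bus token switch follow the diffs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; only the meaning follows the patches. */
#define FLAG_VECTOR_BASE	(1u << 0)
#define FLAG_MULTI_MSI		(1u << 1)
#define FLAG_IMS		(1u << 2)

enum bus_token { TOKEN_MSI, TOKEN_MSIX, TOKEN_IMS, TOKEN_OTHER };

/* Reduced stand-in for struct msi_parent_ops. */
struct parent_ops {
	uint32_t supported_flags;
	const char *prefix;
};

/* Vector domain: no IMS. The NULL prefix is an assumption of this model. */
static const struct parent_ops vector_ops = {
	.supported_flags	= FLAG_VECTOR_BASE,
	.prefix			= NULL,
};

/* Remapping domain on real hardware: expands the set with IMS. */
static const struct parent_ops remap_ops = {
	.supported_flags	= FLAG_VECTOR_BASE | FLAG_MULTI_MSI | FLAG_IMS,
	.prefix			= "IR-",
};

/* Remapping domain of a virtualized IOMMU: IMS stays off. */
static const struct parent_ops virt_remap_ops = {
	.supported_flags	= FLAG_VECTOR_BASE | FLAG_MULTI_MSI,
	.prefix			= "vIR-",
};

/* Models the if/else added to the IOMMU setup functions. */
static const struct parent_ops *pick_remap_ops(bool virtualized)
{
	return virtualized ? &virt_remap_ops : &remap_ops;
}

/* Models the bus token switch in x86_init_dev_msi_info(). */
static bool may_create(const struct parent_ops *pops, enum bus_token token)
{
	assert(pops);

	switch (token) {
	case TOKEN_MSI:
	case TOKEN_MSIX:
		return true;
	case TOKEN_IMS:
		return (pops->supported_flags & FLAG_IMS) != 0;
	default:
		return false;
	}
}
```

In this model an IMS domain can only be created below a remapping parent
on real hardware. The vector domain and the virtualized remapping
parents reject it, while MSI and MSI-X device domains are unaffected.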

* [tip: irq/core] PCI/MSI: Provide IMS (Interrupt Message Store) support
  2022-11-24 23:26 ` [patch V3 28/33] PCI/MSI: Provide IMS (Interrupt Message Store) support Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2024-03-27 16:32   ` [patch V3 28/33] " Bjorn Helgaas
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     0194425af0c87acaad457989a2c6d90dba58e776
Gitweb:        https://git.kernel.org/tip/0194425af0c87acaad457989a2c6d90dba58e776
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:29 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:34 +01:00

PCI/MSI: Provide IMS (Interrupt Message Store) support

IMS (Interrupt Message Store) is a new specification which allows
implementation specific storage of MSI messages, in contrast to the
strictly standardized MSI and MSI-X message stores.

This requires new device specific interrupt domains to handle the
implementation defined storage which can be an array in device memory or
host/guest memory which is shared with hardware queues.

Add a function to create IMS domains for PCI devices. IMS domains are using
the new per device domain mechanism and are configured by the device driver
via a template. IMS domains are created as secondary device domains so they
work side by side with MSI[-X] on the same device.

The IMS domains have a few constraints:

  - The index space is managed by the core code.

    Device memory based IMS provides a storage array with a fixed size
    which obviously requires an index. But there is no association between
    index and functionality so the core can randomly allocate an index in
    the array.

    System memory based IMS does not have the concept of an index as the
    storage is somewhere in memory. In that case the index is purely
    software based to keep track of the allocations.

  - There is no requirement for consecutive index ranges

    This is currently a limitation of the MSI core and can be implemented
    if there is a justified use case by changing the internal storage from
    xarray to maple_tree. For now it's single vector allocation.

  - The interrupt chip must provide the following callbacks:

	- irq_mask()
	- irq_unmask()
	- irq_write_msi_msg()

  - The interrupt chip must additionally provide the following callbacks
    when the irq_mask(), irq_unmask() and irq_write_msi_msg() callbacks
    cannot operate directly on hardware, e.g. in the case that the
    interrupt message store is in queue memory:

	- irq_bus_lock()
	- irq_bus_unlock()

    These callbacks are invoked from preemptible task context and are
    allowed to sleep. In this case the mandatory callbacks above just
    store the information. The irq_bus_unlock() callback is supposed to
    make the change effective before returning.

  - Interrupt affinity setting is handled by the underlying parent
    interrupt domain and communicated to the IMS domain via
    irq_write_msi_msg(). IMS domains cannot have an irq_set_affinity()
    callback. That's a reasonable restriction similar to the PCI/MSI
    device domain implementations.

The domain is automatically destroyed when the PCI device is removed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.904316841@linutronix.de

---
 drivers/pci/msi/irqdomain.c | 59 ++++++++++++++++++++++++++++++++++++-
 include/linux/pci.h         |  5 +++-
 2 files changed, 64 insertions(+)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index deb1930..e33bcc8 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -355,6 +355,65 @@ bool pci_msi_domain_supports(struct pci_dev *pdev, unsigned int feature_mask,
 	return (supported & feature_mask) == feature_mask;
 }
 
+/**
+ * pci_create_ims_domain - Create a secondary IMS domain for a PCI device
+ * @pdev:	The PCI device to operate on
+ * @template:	The MSI info template which describes the domain
+ * @hwsize:	The size of the hardware entry table or 0 if the domain
+ *		is purely software managed
+ * @data:	Optional pointer to domain specific data to be stored
+ *		in msi_domain_info::data
+ *
+ * Return: True on success, false otherwise
+ *
+ * An IMS domain is expected to have the following constraints:
+ *	- The index space is managed by the core code
+ *
+ *	- There is no requirement for consecutive index ranges
+ *
+ *	- The interrupt chip must provide the following callbacks:
+ *		- irq_mask()
+ *		- irq_unmask()
+ *		- irq_write_msi_msg()
+ *
+ *	- The interrupt chip must provide the following optional callbacks
+ *	  when the irq_mask(), irq_unmask() and irq_write_msi_msg() callbacks
+ *	  cannot operate directly on hardware, e.g. in the case that the
+ *	  interrupt message store is in queue memory:
+ *		- irq_bus_lock()
+ *		- irq_bus_unlock()
+ *
+ *	  These callbacks are invoked from preemptible task context and are
+ *	  allowed to sleep. In this case the mandatory callbacks above just
+ *	  store the information. The irq_bus_unlock() callback is supposed
+ *	  to make the change effective before returning.
+ *
+ *	- Interrupt affinity setting is handled by the underlying parent
+ *	  interrupt domain and communicated to the IMS domain via
+ *	  irq_write_msi_msg().
+ *
+ * The domain is automatically destroyed when the PCI device is removed.
+ */
+bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
+			   unsigned int hwsize, void *data)
+{
+	struct irq_domain *domain = dev_get_msi_domain(&pdev->dev);
+
+	if (!domain || !irq_domain_is_msi_parent(domain))
+		return false;
+
+	if (template->info.bus_token != DOMAIN_BUS_PCI_DEVICE_IMS ||
+	    !(template->info.flags & MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS) ||
+	    !(template->info.flags & MSI_FLAG_FREE_MSI_DESCS) ||
+	    !template->chip.irq_mask || !template->chip.irq_unmask ||
+	    !template->chip.irq_write_msi_msg || template->chip.irq_set_affinity)
+		return false;
+
+	return msi_create_device_irq_domain(&pdev->dev, MSI_SECONDARY_DOMAIN, template,
+					    hwsize, data, NULL);
+}
+EXPORT_SYMBOL_GPL(pci_create_ims_domain);
+
 /*
  * Users of the generic MSI infrastructure expect a device to have a single ID,
  * so with DMA aliases we have to pick the least-worst compromise. Devices with
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 68b14ba..1592b63 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2487,6 +2487,11 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev)
 void pci_uevent_ers(struct pci_dev *pdev, enum  pci_ers_result err_type);
 #endif
 
+struct msi_domain_template;
+
+bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
+			   unsigned int hwsize, void *data);
+
 #include <linux/dma-mapping.h>
 
 #define pci_printk(level, pdev, fmt, arg...) \

^ permalink raw reply related	[flat|nested] 126+ messages in thread
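
The bus lock constraint from the changelog above (the mandatory
callbacks just store the information, the unlock callback makes it
effective) can be sketched as a userspace model. Everything here is
hypothetical: struct hw_slot, struct ims_irq and the model_*() functions
are illustration only. The unlock callback which the changelog calls
irq_bus_unlock() is, to my knowledge, named irq_bus_sync_unlock() in
struct irq_chip, hence the name used below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message slot in queue memory; the layout is made up. */
struct hw_slot {
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t data;
	bool masked;
};

/* Hypothetical per-interrupt driver state. */
struct ims_irq {
	struct hw_slot *hw;	/* what the device sees */
	struct hw_slot shadow;	/* pending software copy */
	bool locked;
	bool dirty;
};

/* Models irq_bus_lock(): entered from task context before an update. */
static void model_bus_lock(struct ims_irq *irq)
{
	assert(!irq->locked);
	irq->locked = true;
}

/* The mandatory callbacks only record the requested state. */
static void model_mask(struct ims_irq *irq)
{
	irq->shadow.masked = true;
	irq->dirty = true;
}

static void model_unmask(struct ims_irq *irq)
{
	irq->shadow.masked = false;
	irq->dirty = true;
}

static void model_write_msi_msg(struct ims_irq *irq, uint32_t lo,
				uint32_t hi, uint32_t data)
{
	irq->shadow.addr_lo = lo;
	irq->shadow.addr_hi = hi;
	irq->shadow.data = data;
	irq->dirty = true;
}

/* Models the unlock callback: make the recorded state effective. */
static void model_bus_sync_unlock(struct ims_irq *irq)
{
	assert(irq->locked);
	if (irq->dirty) {
		/* A real driver might issue a queue command and sleep here */
		*irq->hw = irq->shadow;
		irq->dirty = false;
	}
	irq->locked = false;
}
```

Between lock and unlock the device-visible slot stays untouched, and all
recorded changes land in one step on unlock. A real driver would replace
the plain struct copy with whatever its hardware queue protocol needs.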

* [tip: irq/core] genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN
  2022-11-24 23:26 ` [patch V3 22/33] genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Jason Gunthorpe, Kevin Tian, Marc Zyngier, x86,
	linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     b834e3c08fc6c4460c2bce6575cba4705f6301e3
Gitweb:        https://git.kernel.org/tip/b834e3c08fc6c4460c2bce6575cba4705f6301e3
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:20 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:34 +01:00

genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN

Provide a new MSI feature flag in preparation for dynamic MSI-X allocation
after the initial MSI-X enable has been done.

This needs to be an explicit MSI interrupt domain feature because quite
a few implementations (both interrupt domains and legacy allocation mode)
have clear expectations that the allocation code is only invoked when MSI-X
is about to be enabled. They either talk to hypervisors or do some other
work and are not prepared to be invoked on an already MSI-X enabled device.

This is also explicitly MSI-X only because rewriting the size of the MSI
entries is only possible when disabling MSI which in turn might cause lost
interrupts on the device.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.558843119@linutronix.de

---
 include/linux/msi.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 00c5019..3cb1586 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -557,7 +557,8 @@ enum {
 	MSI_FLAG_LEVEL_CAPABLE		= (1 << 18),
 	/* MSI-X entries must be contiguous */
 	MSI_FLAG_MSIX_CONTIGUOUS	= (1 << 19),
-
+	/* PCI/MSI-X vectors can be dynamically allocated/freed post MSI-X enable */
+	MSI_FLAG_PCI_MSIX_ALLOC_DYN	= (1 << 20),
 };
 
 /**

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] PCI/MSI: Provide prepare_desc() MSI domain op
  2022-11-24 23:26 ` [patch V3 24/33] PCI/MSI: Provide prepare_desc() MSI domain op Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Jason Gunthorpe, Kevin Tian, Bjorn Helgaas,
	Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     73bd063ca03493f44e0700cc08824093da9741bc
Gitweb:        https://git.kernel.org/tip/73bd063ca03493f44e0700cc08824093da9741bc
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:23 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:34 +01:00

PCI/MSI: Provide prepare_desc() MSI domain op

The setup of MSI descriptors for PCI/MSI-X interrupts depends partially on
the MSI index for which the descriptor is initialized.

Dynamic MSI-X vector allocation post MSI-X enablement allows allocating
vectors at a given index or at any free index in the available table
range. The latter requires that the descriptor is initialized after the
MSI core has chosen an index.

Implement the prepare_desc() op in the PCI/MSI-X specific msi_domain_ops
which is invoked before the core interrupt descriptor and the associated
Linux interrupt number is allocated.

That callback is also provided for the upcoming PCI/IMS implementations so
the implementation specific interrupt domains can do their domain specific
initialization of the MSI descriptors.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.673658806@linutronix.de

---
 drivers/pci/msi/irqdomain.c |  9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index 4736403..8afaef1 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -202,6 +202,14 @@ static void pci_irq_unmask_msix(struct irq_data *data)
 	pci_msix_unmask(irq_data_get_msi_desc(data));
 }
 
+static void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
+				  struct msi_desc *desc)
+{
+	/* Don't fiddle with preallocated MSI descriptors */
+	if (!desc->pci.mask_base)
+		msix_prepare_msi_desc(to_pci_dev(desc->dev), desc);
+}
+
 static const struct msi_domain_template pci_msix_template = {
 	.chip = {
 		.name			= "PCI-MSIX",
@@ -212,6 +220,7 @@ static const struct msi_domain_template pci_msix_template = {
 	},
 
 	.ops = {
+		.prepare_desc		= pci_msix_prepare_desc,
 		.set_desc		= pci_device_domain_set_desc,
 	},
 

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] genirq/msi: Provide constants for PCI/IMS support
  2022-11-24 23:26 ` [patch V3 27/33] genirq/msi: Provide constants for PCI/IMS support Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     e23d4192bf9b612bce5b24f22719fd3cc6edaa69
Gitweb:        https://git.kernel.org/tip/e23d4192bf9b612bce5b24f22719fd3cc6edaa69
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:28 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:34 +01:00

genirq/msi: Provide constants for PCI/IMS support

Provide the necessary constants for PCI/IMS support:

  - A new bus token for MSI irqdomain identification
  - An MSI feature flag for the MSI irqdomains to signal support
  - A secondary domain id

The latter expands the device internal domain pointer storage array from 1
to 2 entries. That extra pointer is mostly unused today, but the
alternative solutions would not be free either and would introduce more
complexity all over the place. Trade the 8 bytes for simplicity.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.846169830@linutronix.de

---
 include/linux/irqdomain_defs.h | 1 +
 include/linux/msi.h            | 2 ++
 include/linux/msi_api.h        | 1 +
 3 files changed, 4 insertions(+)

diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index 0b2d8a8..c29921f 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -25,6 +25,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_PCI_DEVICE_MSIX,
 	DOMAIN_BUS_DMAR,
 	DOMAIN_BUS_AMDVI,
+	DOMAIN_BUS_PCI_DEVICE_IMS,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 3cb1586..a112b91 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -559,6 +559,8 @@ enum {
 	MSI_FLAG_MSIX_CONTIGUOUS	= (1 << 19),
 	/* PCI/MSI-X vectors can be dynamically allocated/freed post MSI-X enable */
 	MSI_FLAG_PCI_MSIX_ALLOC_DYN	= (1 << 20),
+	/* Support for PCI/IMS */
+	MSI_FLAG_PCI_IMS		= (1 << 21),
 };
 
 /**
diff --git a/include/linux/msi_api.h b/include/linux/msi_api.h
index 5ae72d1..391087a 100644
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -15,6 +15,7 @@ struct device;
  */
 enum msi_domain_ids {
 	MSI_DEFAULT_DOMAIN,
+	MSI_SECONDARY_DOMAIN,
 	MSI_MAX_DEVICE_IRQDOMAINS,
 };
 

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN
  2022-11-24 23:26 ` [patch V3 26/33] x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     486254ad967dbef37fd797dd296fe69b465aa0f9
Gitweb:        https://git.kernel.org/tip/486254ad967dbef37fd797dd296fe69b465aa0f9
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:26 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:34 +01:00

x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN

x86 MSI irqdomains can handle MSI-X allocation post MSI-X enable out of the
box, both on the vector domain and on the remapping domains.

Add the feature flag to the supported feature list.
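
A minimal userspace sketch of what adding a flag to the supported list
means, assuming the parent simply masks the flags a per device domain asks
for against its supported mask. The demo_* helpers and the DEMO_* values are
illustrative; only bit 20 for the dynamic allocation flag is taken from this
series.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_GENERIC_FLAGS_MASK		0x0000ffffu	/* illustrative value */
#define DEMO_FLAG_PCI_MSIX		(1u << 17)	/* illustrative value */
#define DEMO_FLAG_PCI_MSIX_ALLOC_DYN	(1u << 20)	/* bit used in the series */

#define DEMO_SUPPORTED_BEFORE \
	(DEMO_GENERIC_FLAGS_MASK | DEMO_FLAG_PCI_MSIX)
#define DEMO_SUPPORTED_AFTER \
	(DEMO_SUPPORTED_BEFORE | DEMO_FLAG_PCI_MSIX_ALLOC_DYN)

/* Drop every requested feature which the parent does not support. */
static uint32_t demo_mask_flags(uint32_t wanted, uint32_t supported)
{
	return wanted & supported;
}

/* A driver-facing query would then only look at the surviving flags. */
static bool demo_can_alloc_dyn(uint32_t effective)
{
	return (effective & DEMO_FLAG_PCI_MSIX_ALLOC_DYN) != 0;
}
```

With the old mask the dynamic allocation flag is filtered out even if the
per device domain requests it; with the new mask it survives.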

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.787373104@linutronix.de

---
 arch/x86/include/asm/msi.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index 7702958..935c6d4 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -63,7 +63,7 @@ struct msi_msg;
 u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid);
 
 #define X86_VECTOR_MSI_FLAGS_SUPPORTED					\
-	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX)
+	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN)
 
 #define X86_VECTOR_MSI_FLAGS_REQUIRED					\
 	(MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS)

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] PCI/MSI: Split MSI-X descriptor setup
  2022-11-24 23:26 ` [patch V3 23/33] PCI/MSI: Split MSI-X descriptor setup Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Bjorn Helgaas, Marc Zyngier, x86,
	linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     612ad43330d98f800f3784d68e7d8ab66d17a512
Gitweb:        https://git.kernel.org/tip/612ad43330d98f800f3784d68e7d8ab66d17a512
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:21 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:34 +01:00

PCI/MSI: Split MSI-X descriptor setup

The upcoming mechanism to allocate MSI-X vectors after enabling MSI-X needs
to share some of the MSI-X descriptor setup.

The regular descriptor setup on enable has the following code flow:

    1) Allocate descriptor
    2) Setup descriptor with PCI specific data
    3) Insert descriptor
    4) Allocate interrupts which in turn scans the inserted
       descriptors

This cannot be easily changed because the PCI/MSI code needs to handle both
the legacy architecture specific allocation model and the irq domain model,
where quite a few domains assume that the above flow is how it works.

Ideally the code flow should look like this:

   1) Invoke allocation at the MSI core
   2) MSI core allocates descriptor
   3) MSI core calls back into the irq domain which fills in
      the domain specific parts

This could be done for underlying parent MSI domains which support
post-enable allocation/free but that would create significantly different
code paths for MSI/MSI-X enable.

For dynamic allocation, however, which wants to share the allocation code
with the upcoming PCI/IMS support, it's the right thing to do.

Split the MSI-X descriptor setup into the preallocation part which just sets
the index and fills in the horrible hack of virtual IRQs and the real PCI
specific MSI-X setup part which solely depends on the index in the
descriptor. This makes it possible to provide a common dynamic allocation
interface at the MSI core level for both PCI/MSI-X and PCI/IMS.
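
The split can be modeled in plain userspace C as two phases, assuming a
reduced descriptor with only the fields that matter for the argument. The
demo_* names are hypothetical; the real work is done by
msix_setup_msi_descs() and msix_prepare_msi_desc() in the diff below.

```c
#include <stdbool.h>
#include <string.h>

/* Reduced, hypothetical model of an MSI-X descriptor. */
struct demo_desc {
	unsigned int	msi_index;
	unsigned int	nvec_used;
	bool		is_virtual;
	bool		is_msix;
	bool		can_mask;
};

/* Phase 1: the caller only sets the index and the virtual IRQ hack. */
static void demo_prealloc(struct demo_desc *d, unsigned int index,
			  unsigned int hw_vectors)
{
	memset(d, 0, sizeof(*d));
	d->msi_index = index;
	d->is_virtual = index >= hw_vectors;
}

/* Phase 2: everything else is derived from what phase 1 stored. */
static void demo_prepare(struct demo_desc *d, bool ignore_mask)
{
	d->nvec_used = 1;
	d->is_msix = true;
	d->can_mask = !ignore_mask && !d->is_virtual;
}
```

Because the second phase depends on nothing but the descriptor itself, it
can be called from the enable path and from a later dynamic allocation
alike.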

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.616292598@linutronix.de

---
 drivers/pci/msi/msi.c | 72 ++++++++++++++++++++++++++----------------
 drivers/pci/msi/msi.h |  2 +-
 2 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index b8d74df..1f71662 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -569,34 +569,56 @@ static void __iomem *msix_map_region(struct pci_dev *dev,
 	return ioremap(phys_addr, nr_entries * PCI_MSIX_ENTRY_SIZE);
 }
 
-static int msix_setup_msi_descs(struct pci_dev *dev, void __iomem *base,
-				struct msix_entry *entries, int nvec,
-				struct irq_affinity_desc *masks)
+/**
+ * msix_prepare_msi_desc - Prepare a half initialized MSI descriptor for operation
+ * @dev:	The PCI device for which the descriptor is prepared
+ * @desc:	The MSI descriptor for preparation
+ *
+ * This is separate from msix_setup_msi_descs() below to handle dynamic
+ * allocations for MSI-X after initial enablement.
+ *
+ * Ideally the whole MSI-X setup would work that way, but there is no way to
+ * support this for the legacy arch_setup_msi_irqs() mechanism and for the
+ * fake irq domains like the x86 XEN one. Sigh...
+ *
+ * The descriptor is zeroed and only @desc::msi_index and @desc::affinity
+ * are set. When called from msix_setup_msi_descs() then the is_virtual
+ * attribute is initialized as well.
+ *
+ * Fill in the rest.
+ */
+void msix_prepare_msi_desc(struct pci_dev *dev, struct msi_desc *desc)
+{
+	desc->nvec_used				= 1;
+	desc->pci.msi_attrib.is_msix		= 1;
+	desc->pci.msi_attrib.is_64		= 1;
+	desc->pci.msi_attrib.default_irq	= dev->irq;
+	desc->pci.mask_base			= dev->msix_base;
+	desc->pci.msi_attrib.can_mask		= !pci_msi_ignore_mask &&
+						  !desc->pci.msi_attrib.is_virtual;
+
+	if (desc->pci.msi_attrib.can_mask) {
+		void __iomem *addr = pci_msix_desc_addr(desc);
+
+		desc->pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
+	}
+}
+
+static int msix_setup_msi_descs(struct pci_dev *dev, struct msix_entry *entries,
+				int nvec, struct irq_affinity_desc *masks)
 {
 	int ret = 0, i, vec_count = pci_msix_vec_count(dev);
 	struct irq_affinity_desc *curmsk;
 	struct msi_desc desc;
-	void __iomem *addr;
 
 	memset(&desc, 0, sizeof(desc));
 
-	desc.nvec_used			= 1;
-	desc.pci.msi_attrib.is_msix	= 1;
-	desc.pci.msi_attrib.is_64	= 1;
-	desc.pci.msi_attrib.default_irq	= dev->irq;
-	desc.pci.mask_base		= base;
-
 	for (i = 0, curmsk = masks; i < nvec; i++, curmsk++) {
 		desc.msi_index = entries ? entries[i].entry : i;
 		desc.affinity = masks ? curmsk : NULL;
 		desc.pci.msi_attrib.is_virtual = desc.msi_index >= vec_count;
-		desc.pci.msi_attrib.can_mask = !pci_msi_ignore_mask &&
-					       !desc.pci.msi_attrib.is_virtual;
 
-		if (desc.pci.msi_attrib.can_mask) {
-			addr = pci_msix_desc_addr(&desc);
-			desc.pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
-		}
+		msix_prepare_msi_desc(dev, &desc);
 
 		ret = msi_insert_msi_desc(&dev->dev, &desc);
 		if (ret)
@@ -629,9 +651,8 @@ static void msix_mask_all(void __iomem *base, int tsize)
 		writel(ctrl, base + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }
 
-static int msix_setup_interrupts(struct pci_dev *dev, void __iomem *base,
-				 struct msix_entry *entries, int nvec,
-				 struct irq_affinity *affd)
+static int msix_setup_interrupts(struct pci_dev *dev, struct msix_entry *entries,
+				 int nvec, struct irq_affinity *affd)
 {
 	struct irq_affinity_desc *masks = NULL;
 	int ret;
@@ -640,7 +661,7 @@ static int msix_setup_interrupts(struct pci_dev *dev, void __iomem *base,
 		masks = irq_create_affinity_masks(nvec, affd);
 
 	msi_lock_descs(&dev->dev);
-	ret = msix_setup_msi_descs(dev, base, entries, nvec, masks);
+	ret = msix_setup_msi_descs(dev, entries, nvec, masks);
 	if (ret)
 		goto out_free;
 
@@ -678,7 +699,6 @@ out_unlock:
 static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 				int nvec, struct irq_affinity *affd)
 {
-	void __iomem *base;
 	int ret, tsize;
 	u16 control;
 
@@ -696,15 +716,13 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &control);
 	/* Request & Map MSI-X table region */
 	tsize = msix_table_size(control);
-	base = msix_map_region(dev, tsize);
-	if (!base) {
+	dev->msix_base = msix_map_region(dev, tsize);
+	if (!dev->msix_base) {
 		ret = -ENOMEM;
 		goto out_disable;
 	}
 
-	dev->msix_base = base;
-
-	ret = msix_setup_interrupts(dev, base, entries, nvec, affd);
+	ret = msix_setup_interrupts(dev, entries, nvec, affd);
 	if (ret)
 		goto out_disable;
 
@@ -719,7 +737,7 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	 * which takes the MSI-X mask bits into account even
 	 * when MSI-X is disabled, which prevents MSI delivery.
 	 */
-	msix_mask_all(base, tsize);
+	msix_mask_all(dev->msix_base, tsize);
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 
 	pcibios_free_irq(dev);
diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
index 74408cc..ee53cf0 100644
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -84,6 +84,8 @@ static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc)
 	return (1 << (1 << desc->pci.msi_attrib.multi_cap)) - 1;
 }
 
+void msix_prepare_msi_desc(struct pci_dev *dev, struct msi_desc *desc);
+
 /* Subsystem variables */
 extern int pci_msi_enable;
 

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] iommu/amd: Switch to MSI base domains
  2022-11-24 23:26 ` [patch V3 16/33] iommu/amd: Switch to MSI base domains Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     cc7594ffadde77e2825faf1c576230530c829bc3
Gitweb:        https://git.kernel.org/tip/cc7594ffadde77e2825faf1c576230530c829bc3
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:10 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:33 +01:00

iommu/amd: Switch to MSI base domains

Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and set up per
device domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.209212272@linutronix.de

---
 arch/x86/kernel/apic/msi.c          |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  1 -
 drivers/iommu/amd/iommu.c           | 19 +++++++++++++------
 include/linux/irqdomain_defs.h      |  1 +
 4 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index a8dccb0..d198da3 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -218,6 +218,7 @@ static bool x86_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		info->flags |= MSI_FLAG_NOMASK_QUIRK;
 		break;
 	case DOMAIN_BUS_DMAR:
+	case DOMAIN_BUS_AMDVI:
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 1d0a70c..3d68419 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -734,7 +734,6 @@ struct amd_iommu {
 	u8 max_counters;
 #ifdef CONFIG_IRQ_REMAP
 	struct irq_domain *ir_domain;
-	struct irq_domain *msi_domain;
 
 	struct amd_irte_ops *irte_ops;
 #endif
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 67e209c..7caccd8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -815,7 +815,7 @@ amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu)
 	    !pci_dev_has_default_msi_parent_domain(to_pci_dev(dev)))
 		return;
 
-	dev_set_msi_domain(dev, iommu->msi_domain);
+	dev_set_msi_domain(dev, iommu->ir_domain);
 }
 
 #else /* CONFIG_IRQ_REMAP */
@@ -3648,6 +3648,12 @@ static struct irq_chip amd_ir_chip = {
 	.irq_compose_msi_msg	= ir_compose_msi_msg,
 };
 
+static const struct msi_parent_ops amdvi_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "IR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 {
 	struct fwnode_handle *fn;
@@ -3655,16 +3661,17 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 	fn = irq_domain_alloc_named_id_fwnode("AMD-IR", iommu->index);
 	if (!fn)
 		return -ENOMEM;
-	iommu->ir_domain = irq_domain_create_tree(fn, &amd_ir_domain_ops, iommu);
+	iommu->ir_domain = irq_domain_create_hierarchy(arch_get_ir_parent_domain(), 0, 0,
+						       fn, &amd_ir_domain_ops, iommu);
 	if (!iommu->ir_domain) {
 		irq_domain_free_fwnode(fn);
 		return -ENOMEM;
 	}
 
-	iommu->ir_domain->parent = arch_get_ir_parent_domain();
-	iommu->msi_domain = arch_create_remap_msi_irq_domain(iommu->ir_domain,
-							     "AMD-IR-MSI",
-							     iommu->index);
+	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_AMDVI);
+	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
+	iommu->ir_domain->msi_parent_ops = &amdvi_msi_parent_ops;
+
 	return 0;
 }
 
diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index 3a09396..0b2d8a8 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -24,6 +24,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_PCI_DEVICE_MSI,
 	DOMAIN_BUS_PCI_DEVICE_MSIX,
 	DOMAIN_BUS_DMAR,
+	DOMAIN_BUS_AMDVI,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
  2022-11-24 23:26 ` [patch V3 17/33] x86/apic/msi: Remove arch_create_remap_msi_irq_domain() Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     4d5a4ccc519ab0a62e220dc8dcd8bc1c5f8fee10
Gitweb:        https://git.kernel.org/tip/4d5a4ccc519ab0a62e220dc8dcd8bc1c5f8fee10
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:12 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:33 +01:00

x86/apic/msi: Remove arch_create_remap_msi_irq_domain()

Remove arch_create_remap_msi_irq_domain() and the related code, which is no
longer required now that the interrupt remap code has been converted to MSI
parent domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.267353814@linutronix.de

---
 arch/x86/include/asm/irq_remapping.h |  4 +---
 arch/x86/kernel/apic/msi.c           | 42 +---------------------------
 2 files changed, 1 insertion(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 7cc4943..7a2ed15 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -44,10 +44,6 @@ extern int irq_remapping_reenable(int);
 extern int irq_remap_enable_fault_handling(void);
 extern void panic_if_irq_remap(const char *msg);
 
-/* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
-extern struct irq_domain *
-arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int id);
-
 /* Get parent irqdomain for interrupt remapping irqdomain */
 static inline struct irq_domain *arch_get_ir_parent_domain(void)
 {
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index d198da3..682f51a 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -277,7 +277,7 @@ void __init x86_create_pci_msi_domain(void)
 	x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain();
 }
 
-/* Keep around for hyperV and the remap code below */
+/* Keep around for hyperV */
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 		    msi_alloc_info_t *arg)
 {
@@ -291,46 +291,6 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
 }
 EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
-#ifdef CONFIG_IRQ_REMAP
-static struct msi_domain_ops pci_msi_domain_ops = {
-	.msi_prepare	= pci_msi_prepare,
-};
-
-static struct irq_chip pci_msi_ir_controller = {
-	.name			= "IR-PCI-MSI",
-	.irq_unmask		= pci_msi_unmask_irq,
-	.irq_mask		= pci_msi_mask_irq,
-	.irq_ack		= irq_chip_ack_parent,
-	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.flags			= IRQCHIP_SKIP_SET_WAKE |
-				  IRQCHIP_AFFINITY_PRE_STARTUP,
-};
-
-static struct msi_domain_info pci_msi_ir_domain_info = {
-	.flags		= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-			  MSI_FLAG_MULTI_PCI_MSI | MSI_FLAG_PCI_MSIX,
-	.ops		= &pci_msi_domain_ops,
-	.chip		= &pci_msi_ir_controller,
-	.handler	= handle_edge_irq,
-	.handler_name	= "edge",
-};
-
-struct irq_domain *arch_create_remap_msi_irq_domain(struct irq_domain *parent,
-						    const char *name, int id)
-{
-	struct fwnode_handle *fn;
-	struct irq_domain *d;
-
-	fn = irq_domain_alloc_named_id_fwnode(name, id);
-	if (!fn)
-		return NULL;
-	d = pci_msi_create_irq_domain(fn, &pci_msi_ir_domain_info, parent);
-	if (!d)
-		irq_domain_free_fwnode(fn);
-	return d;
-}
-#endif
-
 #ifdef CONFIG_DMAR_TABLE
 /*
  * The Intel IOMMU (ab)uses the high bits of the MSI address to contain the

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] genirq/msi: Provide msi_domain_alloc_irq_at()
  2022-11-24 23:26 ` [patch V3 21/33] genirq/msi: Provide msi_domain_alloc_irq_at() Thomas Gleixner
  2022-11-28 14:39   ` Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     3d393b21740bffbeeae7d4fa534a6b16c3e3e832
Gitweb:        https://git.kernel.org/tip/3d393b21740bffbeeae7d4fa534a6b16c3e3e832
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:18 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:34 +01:00

genirq/msi: Provide msi_domain_alloc_irq_at()

For supporting post MSI-X enable allocations and for the upcoming PCI/IMS
support a separate interface is required which allows not only the
allocation of a specific index, but also the allocation of any, i.e. the
next free index. The latter is especially required for IMS because IMS
completely does away with index to functionality mappings which are
often found in MSI/MSI-X implementations.

But even with MSI-X there are devices where only the first few indices have
a fixed functionality and the rest is freely assignable by software,
e.g. to queues.

msi_domain_alloc_irq_at() is also different from the range based interfaces
as it always enforces that the MSI descriptor is allocated by the core code
and not preallocated by the caller like the PCI/MSI[-X] enable code path
does.

msi_domain_alloc_irq_at() can be invoked with the index argument set to
MSI_ANY_INDEX which makes the core code pick the next free index. The irq
domain can provide a prepare_desc() operation callback in its
msi_domain_ops to do domain specific post allocation initialization before
the actual Linux interrupt and the associated interrupt descriptor and
hierarchy allocations are conducted.

The function also takes an optional @icookie argument which is of type
union msi_instance_cookie. This cookie is not used by the core code and is
stored in the allocated msi_desc::data::icookie. The meaning of the cookie
is completely implementation defined. In case of IMS this might be a PASID
or a pointer to a device queue, but for the MSI core it's opaque and not
used in any way.

The function returns a struct msi_map which on success contains the
allocated index number and the Linux interrupt number so the caller can
spare the index to Linux interrupt number lookup.

On failure map::index contains the error code and map::virq is 0.
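
The index selection and the return convention can be sketched in userspace
as follows. This is a toy model under stated assumptions, not the kernel
implementation: a fixed size bool array stands in for the xarray, the
interrupt numbers are made up, and all demo_* / DEMO_* names are
hypothetical. The error codes mirror what the diff below gets from the
range check and from the xarray insert/alloc calls.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_ANY_INDEX	UINT_MAX	/* same value as MSI_ANY_INDEX */
#define DEMO_HWSIZE	4u		/* made-up hardware table size */

/* Same layout as struct msi_map: negative index means error. */
struct demo_map {
	int	index;
	int	virq;
};

struct demo_domain {
	bool	used[DEMO_HWSIZE];
};

static struct demo_map demo_alloc_irq_at(struct demo_domain *dom,
					 unsigned int index)
{
	struct demo_map map = { .index = -EBUSY, .virq = 0 };
	unsigned int i;

	if (index == DEMO_ANY_INDEX) {
		/* Pick the lowest free index within the table */
		for (i = 0; i < DEMO_HWSIZE && dom->used[i]; i++)
			;
		if (i == DEMO_HWSIZE)
			return map;		/* no free entry left */
		index = i;
	} else if (index >= DEMO_HWSIZE) {
		map.index = -ERANGE;
		return map;
	} else if (dom->used[index]) {
		return map;			/* requested index is taken */
	}

	dom->used[index] = true;
	map.index = (int)index;
	map.virq = 100 + (int)index;		/* made-up interrupt number */
	return map;
}
```

A caller only has to test map.index < 0 to detect a failure and can use
map.virq directly on success, without a separate index to interrupt number
lookup.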

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.501359457@linutronix.de

---
 include/linux/msi.h     |   4 +-
 include/linux/msi_api.h |   7 +++-
 kernel/irq/msi.c        | 105 +++++++++++++++++++++++++++++++++++----
 3 files changed, 106 insertions(+), 10 deletions(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index cb0bee3..00c5019 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -80,6 +80,7 @@ struct pci_dev;
 struct platform_msi_priv_data;
 struct device_attribute;
 struct irq_domain;
+struct irq_affinity_desc;
 
 void __get_cached_msi_msg(struct msi_desc *entry, struct msi_msg *msg);
 #ifdef CONFIG_GENERIC_MSI_IRQ
@@ -602,6 +603,9 @@ int msi_domain_alloc_irqs_range(struct device *dev, unsigned int domid,
 				unsigned int first, unsigned int last);
 int msi_domain_alloc_irqs_all_locked(struct device *dev, unsigned int domid, int nirqs);
 
+struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
+				       const struct irq_affinity_desc *affdesc,
+				       union msi_instance_cookie *cookie);
 
 void msi_domain_free_irqs_range_locked(struct device *dev, unsigned int domid,
 				       unsigned int first, unsigned int last);
diff --git a/include/linux/msi_api.h b/include/linux/msi_api.h
index 2e4456e..5ae72d1 100644
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -48,6 +48,13 @@ struct msi_map {
 	int	virq;
 };
 
+/*
+ * Constant to be used for dynamic allocations when the allocation is any
+ * free MSI index, which is either an entry in a hardware table or a
+ * software managed index.
+ */
+#define MSI_ANY_INDEX		UINT_MAX
+
 unsigned int msi_domain_get_virq(struct device *dev, unsigned int domid, unsigned int index);
 
 /**
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 6370ea5..bd4d4dd 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -90,17 +90,30 @@ static int msi_insert_desc(struct device *dev, struct msi_desc *desc,
 	int ret;
 
 	hwsize = msi_domain_get_hwsize(dev, domid);
-	if (index >= hwsize) {
-		ret = -ERANGE;
-		goto fail;
-	}
 
-	desc->msi_index = index;
-	ret = xa_insert(xa, index, desc, GFP_KERNEL);
-	if (ret)
-		goto fail;
-	return 0;
+	if (index == MSI_ANY_INDEX) {
+		struct xa_limit limit = { .min = 0, .max = hwsize - 1 };
+		unsigned int index;
 
+		/* Let the xarray allocate a free index within the limit */
+		ret = xa_alloc(xa, &index, desc, limit, GFP_KERNEL);
+		if (ret)
+			goto fail;
+
+		desc->msi_index = index;
+		return 0;
+	} else {
+		if (index >= hwsize) {
+			ret = -ERANGE;
+			goto fail;
+		}
+
+		desc->msi_index = index;
+		ret = xa_insert(xa, index, desc, GFP_KERNEL);
+		if (ret)
+			goto fail;
+		return 0;
+	}
 fail:
 	msi_free_desc(desc);
 	return ret;
@@ -294,7 +307,7 @@ int msi_setup_device_data(struct device *dev)
 	}
 
 	for (i = 0; i < MSI_MAX_DEVICE_IRQDOMAINS; i++)
-		xa_init(&md->__domains[i].store);
+		xa_init_flags(&md->__domains[i].store, XA_FLAGS_ALLOC);
 
 	/*
 	 * If @dev::msi::domain is set and is a global MSI domain, copy the
@@ -1402,6 +1415,78 @@ int msi_domain_alloc_irqs_all_locked(struct device *dev, unsigned int domid, int
 	return msi_domain_alloc_locked(dev, &ctrl);
 }
 
+/**
+ * msi_domain_alloc_irq_at - Allocate an interrupt from a MSI interrupt domain at
+ *			     a given index - or at the next free index
+ *
+ * @dev:	Pointer to device struct of the device for which the interrupts
+ *		are allocated
+ * @domid:	Id of the interrupt domain to operate on
+ * @index:	Index for allocation. If @index == %MSI_ANY_INDEX the allocation
+ *		uses the next free index.
+ * @affdesc:	Optional pointer to an interrupt affinity descriptor structure
+ * @icookie:	Optional pointer to a domain specific per instance cookie. If
+ *		non-NULL the content of the cookie is stored in msi_desc::data.
+ *		Must be NULL for MSI-X allocations
+ *
+ * This requires a MSI interrupt domain which lets the core code manage the
+ * MSI descriptors.
+ *
+ * Return: struct msi_map
+ *
+ *	On success msi_map::index contains the allocated index number and
+ *	msi_map::virq the corresponding Linux interrupt number
+ *
+ *	On failure msi_map::index contains the error code and msi_map::virq
+ *	is %0.
+ */
+struct msi_map msi_domain_alloc_irq_at(struct device *dev, unsigned int domid, unsigned int index,
+				       const struct irq_affinity_desc *affdesc,
+				       union msi_instance_cookie *icookie)
+{
+	struct msi_ctrl ctrl = { .domid	= domid, .nirqs = 1, };
+	struct irq_domain *domain;
+	struct msi_map map = { };
+	struct msi_desc *desc;
+	int ret;
+
+	msi_lock_descs(dev);
+	domain = msi_get_device_domain(dev, domid);
+	if (!domain) {
+		map.index = -ENODEV;
+		goto unlock;
+	}
+
+	desc = msi_alloc_desc(dev, 1, affdesc);
+	if (!desc) {
+		map.index = -ENOMEM;
+		goto unlock;
+	}
+
+	if (icookie)
+		desc->data.icookie = *icookie;
+
+	ret = msi_insert_desc(dev, desc, domid, index);
+	if (ret) {
+		map.index = ret;
+		goto unlock;
+	}
+
+	ctrl.first = ctrl.last = desc->msi_index;
+
+	ret = __msi_domain_alloc_irqs(dev, domain, &ctrl);
+	if (ret) {
+		map.index = ret;
+		msi_domain_free_locked(dev, &ctrl);
+	} else {
+		map.index = desc->msi_index;
+		map.virq = desc->irq;
+	}
+unlock:
+	msi_unlock_descs(dev);
+	return map;
+}
+
 static void __msi_domain_free_irqs(struct device *dev, struct irq_domain *domain,
 				   struct msi_ctrl *ctrl)
 {

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] genirq/msi: Provide struct msi_map
  2022-11-24 23:26 ` [patch V3 18/33] genirq/msi: Provide struct msi_map Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     06bff9e347271566e8dd79e7c3eb971660209a00
Gitweb:        https://git.kernel.org/tip/06bff9e347271566e8dd79e7c3eb971660209a00
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:13 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:33 +01:00

genirq/msi: Provide struct msi_map

A simple struct to hold a MSI index / Linux interrupt number pair. It will
be returned from the dynamic vector allocation function and handed back to
the corresponding free() function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.326410494@linutronix.de

---
 include/linux/msi_api.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/msi_api.h b/include/linux/msi_api.h
index 8640171..4cb7f4c 100644
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -18,6 +18,19 @@ enum msi_domain_ids {
 	MSI_MAX_DEVICE_IRQDOMAINS,
 };
 
+/**
+ * msi_map - Mapping between MSI index and Linux interrupt number
+ * @index:	The MSI index, e.g. slot in the MSI-X table or
+ *		a software managed index if >= 0. If negative
+ *		the allocation function failed and it contains
+ *		the error code.
+ * @virq:	The associated Linux interrupt number
+ */
+struct msi_map {
+	int	index;
+	int	virq;
+};
+
 unsigned int msi_domain_get_virq(struct device *dev, unsigned int domid, unsigned int index);
 
 /**

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] genirq/msi: Provide msi_desc::msi_data
  2022-11-24 23:26 ` [patch V3 19/33] genirq/msi: Provide msi_desc::msi_data Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] genirq/msi: Provide msi_desc::msi_data tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     efd42049657e958797a483f793e4064042faa49c
Gitweb:        https://git.kernel.org/tip/efd42049657e958797a483f793e4064042faa49c
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:15 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:33 +01:00

genirq/msi: Provide msi_desc::msi_data

The upcoming support for PCI/IMS requires storing some information related
to the message handling in the MSI descriptor, e.g. PASID or a pointer to a
queue.

Provide a generic storage struct which maps over the existing PCI specific
storage, so the size of struct msi_desc does not grow.

This storage struct has two elements:

  1) msi_domain_cookie
  2) msi_instance_cookie

The domain cookie is going to be used to store domain specific information,
e.g. iobase pointer, data pointer.

The instance cookie is going to be handed in when allocating an interrupt
on an IMS domain so the irq chip callbacks of the IMS domain have the
necessary per vector information available. It also comes in handy when
cleaning up the platform MSI code for wire to MSI bridges which need to
hand down the type information to the underlying interrupt domain.

For the core code the cookies are opaque and meaningless. It just stores
the instance cookie during an allocation through the upcoming interfaces
for IMS and wire to MSI bridges.
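
The overlay described above can be illustrated with a small userspace
sketch. It is not the kernel code: uint64_t stands in for u64, the __iomem
annotation is dropped, and struct demo_pci_msi_desc and struct demo_msi_desc
are hypothetical placeholders whose layout is an assumption made only to
show the size argument. demo_store_pasid() is likewise an illustrative
helper, not a kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the unions added by the patch */
union msi_domain_cookie {
	uint64_t	value;
	void		*ptr;
	void		*iobase;	/* void __iomem * in the kernel */
};

union msi_instance_cookie {
	uint64_t	value;
	void		*ptr;
};

struct msi_desc_data {
	union msi_domain_cookie		dcookie;
	union msi_instance_cookie	icookie;
};

/* Hypothetical placeholder: only the size matters, not the real layout */
struct demo_pci_msi_desc {
	unsigned char	opaque[24];
};

/* Hypothetical, heavily reduced descriptor with the anonymous union */
struct demo_msi_desc {
	uint16_t	msi_index;
	union {
		struct demo_pci_msi_desc	pci;
		struct msi_desc_data		data;
	};
};

/* The generic data fits into the space the PCI data already occupies */
static_assert(sizeof(struct msi_desc_data) <= sizeof(struct demo_pci_msi_desc),
	      "generic data must not enlarge the descriptor");

/* Store a PASID-like value in the instance cookie and read it back */
uint64_t demo_store_pasid(uint64_t pasid)
{
	struct demo_msi_desc desc = { .msi_index = 0 };

	desc.data.icookie.value = pasid;
	return desc.data.icookie.value;
}
```

Because both members share the same storage, a descriptor carries either
the PCI specific data or the generic cookies, never both at the same time.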

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.385036043@linutronix.de

---
 include/linux/msi.h     | 38 +++++++++++++++++++++++++++++++++++++-
 include/linux/msi_api.h | 17 +++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index b5dda4b..dca3b80 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -125,6 +125,38 @@ struct pci_msi_desc {
 	};
 };
 
+/**
+ * union msi_domain_cookie - Opaque MSI domain specific data
+ * @value:	u64 value store
+ * @ptr:	Pointer to domain specific data
+ * @iobase:	Domain specific IOmem pointer
+ *
+ * The content of this data is implementation defined and used by the MSI
+ * domain to store domain specific information which is required for
+ * interrupt chip callbacks.
+ */
+union msi_domain_cookie {
+	u64	value;
+	void	*ptr;
+	void	__iomem *iobase;
+};
+
+/**
+ * struct msi_desc_data - Generic MSI descriptor data
+ * @dcookie:	Cookie for MSI domain specific data which is required
+ *		for irq_chip callbacks
+ * @icookie:	Cookie for the MSI interrupt instance provided by
+ *		the usage site to the allocation function
+ *
+ * The content of this data is implementation defined, e.g. PCI/IMS
+ * implementations define the meaning of the data. The MSI core ignores
+ * this data completely.
+ */
+struct msi_desc_data {
+	union msi_domain_cookie		dcookie;
+	union msi_instance_cookie	icookie;
+};
+
 #define MSI_MAX_INDEX		((unsigned int)USHRT_MAX)
 
 /**
@@ -142,6 +174,7 @@ struct pci_msi_desc {
  *
  * @msi_index:	Index of the msi descriptor
  * @pci:	PCI specific msi descriptor data
+ * @data:	Generic MSI descriptor data
  */
 struct msi_desc {
 	/* Shared device/bus type independent data */
@@ -161,7 +194,10 @@ struct msi_desc {
 	void *write_msi_msg_data;
 
 	u16				msi_index;
-	struct pci_msi_desc		pci;
+	union {
+		struct pci_msi_desc	pci;
+		struct msi_desc_data	data;
+	};
 };
 
 /*
diff --git a/include/linux/msi_api.h b/include/linux/msi_api.h
index 4cb7f4c..2e4456e 100644
--- a/include/linux/msi_api.h
+++ b/include/linux/msi_api.h
@@ -19,6 +19,23 @@ enum msi_domain_ids {
 };
 
 /**
+ * union msi_instance_cookie - MSI instance cookie
+ * @value:	u64 value store
+ * @ptr:	Pointer to usage site specific data
+ *
+ * This cookie is handed to the IMS allocation function and stored in the
+ * MSI descriptor for the interrupt chip callbacks.
+ *
+ * The content of this cookie is MSI domain implementation defined.  For
+ * PCI/IMS implementations this could be a PASID or a pointer to queue
+ * memory.
+ */
+union msi_instance_cookie {
+	u64	value;
+	void	*ptr;
+};
+
+/**
  * msi_map - Mapping between MSI index and Linux interrupt number
  * @index:	The MSI index, e.g. slot in the MSI-X table or
  *		a software managed index if >= 0. If negative

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] genirq/msi: Provide msi_domain_ops::prepare_desc()
  2022-11-24 23:26 ` [patch V3 20/33] genirq/msi: Provide msi_domain_ops::prepare_desc() Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] genirq/msi: Provide msi_domain_ops::prepare_desc() tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     8f986fd7755bec8b8c5776824afa1bd1151986d9
Gitweb:        https://git.kernel.org/tip/8f986fd7755bec8b8c5776824afa1bd1151986d9
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:16 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:33 +01:00

genirq/msi: Provide msi_domain_ops::prepare_desc()

The existing MSI domain ops msi_prepare() and set_desc() turned out to be
unsuitable for implementing IMS support.

msi_prepare() does not operate on the MSI descriptors. set_desc() lacks
an irq_domain pointer and has a completely different purpose.

Introduce a prepare_desc() op which allows IMS implementations to amend an
MSI descriptor which was allocated by the core code, e.g. by adjusting the
iomem base or adding some data based on the allocated index. This is way
better than requiring that all IMS domain implementations preallocate the
MSI descriptor and then allocate the interrupt.
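
The call order in the allocation loop can be sketched in userspace as
follows. All demo_* names are hypothetical mocks, the domain pointer is an
opaque void pointer, and the iomem slot layout (base 0x1000, 16 bytes per
index) is an assumption chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct demo_desc {
	unsigned int	msi_index;
	unsigned long	iobase;		/* amended by prepare_desc() */
};

struct demo_arg {
	struct demo_desc	*desc;
	unsigned int		hwirq;
};

struct demo_ops {
	/* Optional: may be NULL */
	void (*prepare_desc)(void *domain, struct demo_arg *arg,
			     struct demo_desc *desc);
	/* Mandatory */
	void (*set_desc)(struct demo_arg *arg, struct demo_desc *desc);
};

/* Hypothetical IMS style callback: derive the slot address from the index */
static void demo_prepare_desc(void *domain, struct demo_arg *arg,
			      struct demo_desc *desc)
{
	(void)domain;
	(void)arg;
	desc->iobase = 0x1000 + desc->msi_index * 16;
}

static void demo_set_desc(struct demo_arg *arg, struct demo_desc *desc)
{
	arg->desc = desc;
	arg->hwirq = desc->msi_index;
}

/* Same ordering as the hunk in __msi_domain_alloc_irqs() */
void demo_alloc_one(const struct demo_ops *ops, void *domain,
		    struct demo_arg *arg, struct demo_desc *desc)
{
	if (ops->prepare_desc)
		ops->prepare_desc(domain, arg, desc);

	ops->set_desc(arg, desc);
}

/* Run one mock allocation and report the resulting iobase */
unsigned long demo_iobase(unsigned int index, int with_prepare)
{
	struct demo_ops ops = {
		.prepare_desc	= with_prepare ? demo_prepare_desc : NULL,
		.set_desc	= demo_set_desc,
	};
	struct demo_desc desc = { .msi_index = index };
	struct demo_arg arg = { 0 };

	demo_alloc_one(&ops, NULL, &arg, &desc);
	return desc.iobase;
}
```

Domains which do not provide the callback see no change in behaviour, as
the descriptor is handed to set_desc() unmodified.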

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.444560717@linutronix.de

---
 include/linux/msi.h | 6 +++++-
 kernel/irq/msi.c    | 3 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index dca3b80..cb0bee3 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -410,6 +410,8 @@ struct msi_domain_info;
  * @msi_init:		Domain specific init function for MSI interrupts
  * @msi_free:		Domain specific function to free a MSI interrupts
  * @msi_prepare:	Prepare the allocation of the interrupts in the domain
+ * @prepare_desc:	Optional function to prepare the allocated MSI descriptor
+ *			in the domain
  * @set_desc:		Set the msi descriptor for an interrupt
  * @domain_alloc_irqs:	Optional function to override the default allocation
  *			function.
@@ -421,7 +423,7 @@ struct msi_domain_info;
  * @get_hwirq, @msi_init and @msi_free are callbacks used by the underlying
  * irqdomain.
  *
- * @msi_check, @msi_prepare and @set_desc are callbacks used by the
+ * @msi_check, @msi_prepare, @prepare_desc and @set_desc are callbacks used by the
  * msi_domain_alloc/free_irqs*() variants.
  *
  * @domain_alloc_irqs, @domain_free_irqs can be used to override the
@@ -444,6 +446,8 @@ struct msi_domain_ops {
 	int		(*msi_prepare)(struct irq_domain *domain,
 				       struct device *dev, int nvec,
 				       msi_alloc_info_t *arg);
+	void		(*prepare_desc)(struct irq_domain *domain, msi_alloc_info_t *arg,
+					struct msi_desc *desc);
 	void		(*set_desc)(msi_alloc_info_t *arg,
 				    struct msi_desc *desc);
 	int		(*domain_alloc_irqs)(struct irq_domain *domain,
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 0749e66..6370ea5 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1254,6 +1254,9 @@ static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain
 		if (WARN_ON_ONCE(allocated >= ctrl->nirqs))
 			return -EINVAL;
 
+		if (ops->prepare_desc)
+			ops->prepare_desc(domain, &arg, desc);
+
 		ops->set_desc(&arg, desc);
 
 		virq = __irq_domain_alloc_irqs(domain, -1, desc->nvec_used,

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] iommu/vt-d: Switch to MSI parent domains
  2022-11-24 23:26 ` [patch V3 15/33] iommu/vt-d: Switch to MSI parent domains Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     9a945234abea27d45f8d89e1a1b35ab5bf41dd01
Gitweb:        https://git.kernel.org/tip/9a945234abea27d45f8d89e1a1b35ab5bf41dd01
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:08 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:33 +01:00

iommu/vt-d: Switch to MSI parent domains

Remove the global PCI/MSI irqdomain implementation and provide the required
MSI parent ops so the PCI/MSI code can detect the new parent and set up per
device domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.151226317@linutronix.de

---
 arch/x86/kernel/apic/msi.c          |  2 ++
 drivers/iommu/intel/iommu.h         |  1 -
 drivers/iommu/intel/irq_remapping.c | 27 ++++++++++++---------------
 include/linux/irqdomain_defs.h      |  1 +
 4 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index db96bfc..a8dccb0 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -217,6 +217,8 @@ static bool x86_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		/* See msi_set_affinity() for the gory details */
 		info->flags |= MSI_FLAG_NOMASK_QUIRK;
 		break;
+	case DOMAIN_BUS_DMAR:
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		return false;
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 92023df..6eadb86 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -600,7 +600,6 @@ struct intel_iommu {
 #ifdef CONFIG_IRQ_REMAP
 	struct ir_table *ir_table;	/* Interrupt remapping info */
 	struct irq_domain *ir_domain;
-	struct irq_domain *ir_msi_domain;
 #endif
 	struct iommu_device iommu;  /* IOMMU core code handle */
 	int		node;
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 08bbf08..6fab407 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -82,6 +82,7 @@ static const struct irq_domain_ops intel_ir_domain_ops;
 
 static void iommu_disable_irq_remapping(struct intel_iommu *iommu);
 static int __init parse_ioapics_under_ir(void);
+static const struct msi_parent_ops dmar_msi_parent_ops;
 
 static bool ir_pre_enabled(struct intel_iommu *iommu)
 {
@@ -230,7 +231,7 @@ static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
 {
 	struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
 
-	return drhd ? drhd->iommu->ir_msi_domain : NULL;
+	return drhd ? drhd->iommu->ir_domain : NULL;
 }
 
 static int clear_entries(struct irq_2_iommu *irq_iommu)
@@ -573,10 +574,10 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 		pr_err("IR%d: failed to allocate irqdomain\n", iommu->seq_id);
 		goto out_free_fwnode;
 	}
-	iommu->ir_msi_domain =
-		arch_create_remap_msi_irq_domain(iommu->ir_domain,
-						 "INTEL-IR-MSI",
-						 iommu->seq_id);
+
+	irq_domain_update_bus_token(iommu->ir_domain,  DOMAIN_BUS_DMAR);
+	iommu->ir_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
+	iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops;
 
 	ir_table->base = page_address(pages);
 	ir_table->bitmap = bitmap;
@@ -620,9 +621,6 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 	return 0;
 
 out_free_ir_domain:
-	if (iommu->ir_msi_domain)
-		irq_domain_remove(iommu->ir_msi_domain);
-	iommu->ir_msi_domain = NULL;
 	irq_domain_remove(iommu->ir_domain);
 	iommu->ir_domain = NULL;
 out_free_fwnode:
@@ -644,13 +642,6 @@ static void intel_teardown_irq_remapping(struct intel_iommu *iommu)
 	struct fwnode_handle *fn;
 
 	if (iommu && iommu->ir_table) {
-		if (iommu->ir_msi_domain) {
-			fn = iommu->ir_msi_domain->fwnode;
-
-			irq_domain_remove(iommu->ir_msi_domain);
-			irq_domain_free_fwnode(fn);
-			iommu->ir_msi_domain = NULL;
-		}
 		if (iommu->ir_domain) {
 			fn = iommu->ir_domain->fwnode;
 
@@ -1437,6 +1428,12 @@ static const struct irq_domain_ops intel_ir_domain_ops = {
 	.deactivate = intel_irq_remapping_deactivate,
 };
 
+static const struct msi_parent_ops dmar_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED | MSI_FLAG_MULTI_PCI_MSI,
+	.prefix			= "IR-",
+	.init_dev_msi_info	= msi_parent_init_dev_msi_info,
+};
+
 /*
  * Support of Interrupt Remapping Unit Hotplug
  */
diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index b3f4b7e..3a09396 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -23,6 +23,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_VMD_MSI,
 	DOMAIN_BUS_PCI_DEVICE_MSI,
 	DOMAIN_BUS_PCI_DEVICE_MSIX,
+	DOMAIN_BUS_DMAR,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] PCI/MSI: Add support for per device MSI[X] domains
  2022-11-24 23:26 ` [patch V3 12/33] PCI/MSI: Add support for per device MSI[X] domains Thomas Gleixner
  2022-11-28  4:46   ` Tian, Kevin
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Thomas Gleixner, Kevin Tian, Bjorn Helgaas,
	Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     15c72f824b32761696b1854500bb3dedccbbb45a
Gitweb:        https://git.kernel.org/tip/15c72f824b32761696b1854500bb3dedccbbb45a
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:04 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:32 +01:00

PCI/MSI: Add support for per device MSI[X] domains

Provide a template and the necessary callbacks to create PCI/MSI and
PCI/MSI-X domains.

The domains are created when MSI or MSI-X is enabled. A domain normally
lives as long as the device. The exception is a switch between the two
modes: if e.g. MSI-X was tried first and failed, the MSI-X domain is removed
and an MSI domain is created, because both are mutually exclusive and share
the default domain ID slot of the per device domain pointer array.

Also expand pci_msi_domain_supports() to handle feature checks correctly
even in the case that the per device domain was not yet created by checking
the features supported by the MSI parent.
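
A minimal userspace sketch of that feature check, assuming a mock domain
structure. The DEMO_FLAG_* bits and struct demo_domain are hypothetical and
do not match the kernel definitions; only the selection of the flag source
and the mask comparison follow the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, not the kernel values */
#define DEMO_FLAG_MULTI_MSI	(1u << 0)
#define DEMO_FLAG_MSIX		(1u << 1)
#define DEMO_FLAG_ALLOC_DYN	(1u << 2)

/* Mock of the two places the supported features can come from */
struct demo_domain {
	bool		is_msi_parent;
	unsigned int	info_flags;		/* "global" domain: info->flags */
	unsigned int	parent_supported_flags;	/* parent ops: supported_flags */
};

/* True only if every bit in feature_mask is set in supported */
bool demo_supports(unsigned int supported, unsigned int feature_mask)
{
	return (supported & feature_mask) == feature_mask;
}

bool demo_domain_supports(const struct demo_domain *domain,
			  unsigned int feature_mask)
{
	unsigned int supported;

	if (!domain->is_msi_parent)
		supported = domain->info_flags;
	else
		supported = domain->parent_supported_flags;

	return demo_supports(supported, feature_mask);
}
```

The comparison against the full mask matters: a request for two features
fails if the domain provides only one of them.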

Add the necessary setup calls into the MSI and MSI-X enable code path.
These setup calls are backwards compatible. They return success when there
is no parent domain found, which means the existing global domains and the
legacy allocation path keep working.
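
The decision flow of the two setup calls can be modelled in userspace
roughly like this. Everything named demo_* is a hypothetical mock: the
single domain field stands in for the default domain ID slot, and domain
creation is assumed to always succeed, which the real code does not
guarantee.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_kind { DEMO_NONE, DEMO_MSI, DEMO_MSIX };

struct demo_dev {
	bool		has_msi_parent;
	bool		msi_enabled;
	bool		msix_enabled;
	enum demo_kind	domain;		/* mock of the default domain slot */
};

/* No MSI parent: report success and leave the legacy paths alone */
static bool demo_create(struct demo_dev *dev, enum demo_kind kind)
{
	if (!dev->has_msi_parent)
		return true;

	dev->domain = kind;		/* assumption: creation never fails */
	return true;
}

bool demo_setup_msi(struct demo_dev *dev)
{
	if (dev->msix_enabled)
		return false;
	if (dev->domain == DEMO_MSI)
		return true;
	if (dev->domain == DEMO_MSIX)
		dev->domain = DEMO_NONE;	/* remove the MSI-X domain */

	return demo_create(dev, DEMO_MSI);
}

bool demo_setup_msix(struct demo_dev *dev)
{
	if (dev->msi_enabled)
		return false;
	if (dev->domain == DEMO_MSIX)
		return true;
	if (dev->domain == DEMO_MSI)
		dev->domain = DEMO_NONE;	/* remove the MSI domain */

	return demo_create(dev, DEMO_MSIX);
}

/* MSI-X is tried first, fails later on, and the caller falls back to MSI */
enum demo_kind demo_msix_then_msi(bool has_msi_parent)
{
	struct demo_dev dev = { .has_msi_parent = has_msi_parent };

	demo_setup_msix(&dev);
	demo_setup_msi(&dev);
	return dev.domain;
}
```

With a parent domain the slot ends up holding the MSI domain after the
fallback. Without one the slot stays empty and both calls still report
success.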

Co-developed-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.975388241@linutronix.de

---
 drivers/pci/msi/irqdomain.c | 188 ++++++++++++++++++++++++++++++++++-
 drivers/pci/msi/msi.c       |  16 ++-
 drivers/pci/msi/msi.h       |   2 +-
 3 files changed, 201 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index f4338fb..be3d50f 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -139,6 +139,170 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 }
 EXPORT_SYMBOL_GPL(pci_msi_create_irq_domain);
 
+/*
+ * Per device MSI[-X] domain functionality
+ */
+static void pci_device_domain_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
+{
+	arg->desc = desc;
+	arg->hwirq = desc->msi_index;
+}
+
+static void pci_irq_mask_msi(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+	pci_msi_mask(desc, BIT(data->irq - desc->irq));
+}
+
+static void pci_irq_unmask_msi(struct irq_data *data)
+{
+	struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+	pci_msi_unmask(desc, BIT(data->irq - desc->irq));
+}
+
+#ifdef CONFIG_GENERIC_IRQ_RESERVATION_MODE
+# define MSI_REACTIVATE		MSI_FLAG_MUST_REACTIVATE
+#else
+# define MSI_REACTIVATE		0
+#endif
+
+#define MSI_COMMON_FLAGS	(MSI_FLAG_FREE_MSI_DESCS |	\
+				 MSI_FLAG_ACTIVATE_EARLY |	\
+				 MSI_FLAG_DEV_SYSFS |		\
+				 MSI_REACTIVATE)
+
+static const struct msi_domain_template pci_msi_template = {
+	.chip = {
+		.name			= "PCI-MSI",
+		.irq_mask		= pci_irq_mask_msi,
+		.irq_unmask		= pci_irq_unmask_msi,
+		.irq_write_msi_msg	= pci_msi_domain_write_msg,
+		.flags			= IRQCHIP_ONESHOT_SAFE,
+	},
+
+	.ops = {
+		.set_desc		= pci_device_domain_set_desc,
+	},
+
+	.info = {
+		.flags			= MSI_COMMON_FLAGS | MSI_FLAG_MULTI_PCI_MSI,
+		.bus_token		= DOMAIN_BUS_PCI_DEVICE_MSI,
+	},
+};
+
+static void pci_irq_mask_msix(struct irq_data *data)
+{
+	pci_msix_mask(irq_data_get_msi_desc(data));
+}
+
+static void pci_irq_unmask_msix(struct irq_data *data)
+{
+	pci_msix_unmask(irq_data_get_msi_desc(data));
+}
+
+static const struct msi_domain_template pci_msix_template = {
+	.chip = {
+		.name			= "PCI-MSIX",
+		.irq_mask		= pci_irq_mask_msix,
+		.irq_unmask		= pci_irq_unmask_msix,
+		.irq_write_msi_msg	= pci_msi_domain_write_msg,
+		.flags			= IRQCHIP_ONESHOT_SAFE,
+	},
+
+	.ops = {
+		.set_desc		= pci_device_domain_set_desc,
+	},
+
+	.info = {
+		.flags			= MSI_COMMON_FLAGS | MSI_FLAG_PCI_MSIX,
+		.bus_token		= DOMAIN_BUS_PCI_DEVICE_MSIX,
+	},
+};
+
+static bool pci_match_device_domain(struct pci_dev *pdev, enum irq_domain_bus_token bus_token)
+{
+	return msi_match_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN, bus_token);
+}
+
+static bool pci_create_device_domain(struct pci_dev *pdev, const struct msi_domain_template *tmpl,
+				     unsigned int hwsize)
+{
+	struct irq_domain *domain = dev_get_msi_domain(&pdev->dev);
+
+	if (!domain || !irq_domain_is_msi_parent(domain))
+		return true;
+
+	return msi_create_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN, tmpl,
+					    hwsize, NULL, NULL);
+}
+
+/**
+ * pci_setup_msi_device_domain - Setup a device MSI interrupt domain
+ * @pdev:	The PCI device to create the domain on
+ *
+ * Return:
+ *  True when:
+ *	- The device does not have a MSI parent irq domain associated,
+ *	  which keeps the legacy architecture specific and the global
+ *	  PCI/MSI domain models working
+ *	- The MSI domain exists already
+ *	- The MSI domain was successfully allocated
+ *  False when:
+ *	- MSI-X is enabled
+ *	- The domain creation fails.
+ *
+ * The created MSI domain is preserved until:
+ *	- The device is removed
+ *	- MSI is disabled and a MSI-X domain is created
+ */
+bool pci_setup_msi_device_domain(struct pci_dev *pdev)
+{
+	if (WARN_ON_ONCE(pdev->msix_enabled))
+		return false;
+
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSI))
+		return true;
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
+		msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
+
+	return pci_create_device_domain(pdev, &pci_msi_template, 1);
+}
+
+/**
+ * pci_setup_msix_device_domain - Setup a device MSI-X interrupt domain
+ * @pdev:	The PCI device to create the domain on
+ * @hwsize:	The size of the MSI-X vector table
+ *
+ * Return:
+ *  True when:
+ *	- The device does not have a MSI parent irq domain associated,
+ *	  which keeps the legacy architecture specific and the global
+ *	  PCI/MSI domain models working
+ *	- The MSI-X domain exists already
+ *	- The MSI-X domain was successfully allocated
+ *  False when:
+ *	- MSI is enabled
+ *	- The domain creation fails.
+ *
+ * The created MSI-X domain is preserved until:
+ *	- The device is removed
+ *	- MSI-X is disabled and a MSI domain is created
+ */
+bool pci_setup_msix_device_domain(struct pci_dev *pdev, unsigned int hwsize)
+{
+	if (WARN_ON_ONCE(pdev->msi_enabled))
+		return false;
+
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
+		return true;
+	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSI))
+		msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
+
+	return pci_create_device_domain(pdev, &pci_msix_template, hwsize);
+}
+
 /**
  * pci_msi_domain_supports - Check for support of a particular feature flag
  * @pdev:		The PCI device to operate on
@@ -152,13 +316,33 @@ bool pci_msi_domain_supports(struct pci_dev *pdev, unsigned int feature_mask,
 {
 	struct msi_domain_info *info;
 	struct irq_domain *domain;
+	unsigned int supported;
 
 	domain = dev_get_msi_domain(&pdev->dev);
 
 	if (!domain || !irq_domain_is_hierarchy(domain))
 		return mode == ALLOW_LEGACY;
-	info = domain->host_data;
-	return (info->flags & feature_mask) == feature_mask;
+
+	if (!irq_domain_is_msi_parent(domain)) {
+		/*
+		 * For "global" PCI/MSI interrupt domains the associated
+		 * msi_domain_info::flags is the authoritative source of
+		 * information.
+		 */
+		info = domain->host_data;
+		supported = info->flags;
+	} else {
+		/*
+		 * For MSI parent domains the supported feature set
+		 * is available in the parent ops. This makes checks
+		 * possible before actually instantiating the
+		 * per device domain because the parent is never
+		 * expanding the PCI/MSI functionality.
+		 */
+		supported = domain->msi_parent_ops->supported_flags;
+	}
+
+	return (supported & feature_mask) == feature_mask;
 }
 
 /*
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 76a3d44..b8d74df 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -436,6 +436,9 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	if (rc)
 		return rc;
 
+	if (!pci_setup_msi_device_domain(dev))
+		return -ENODEV;
+
 	for (;;) {
 		if (affd) {
 			nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
@@ -787,9 +790,13 @@ int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int
 	if (!pci_msix_validate_entries(dev, entries, nvec, hwsize))
 		return -EINVAL;
 
-	/* PCI_IRQ_VIRTUAL is a horrible hack! */
-	if (nvec > hwsize && !(flags & PCI_IRQ_VIRTUAL))
-		nvec = hwsize;
+	if (hwsize < nvec) {
+		/* Keep the IRQ virtual hackery working */
+		if (flags & PCI_IRQ_VIRTUAL)
+			hwsize = nvec;
+		else
+			nvec = hwsize;
+	}
 
 	if (nvec < minvec)
 		return -ENOSPC;
@@ -798,6 +805,9 @@ int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int
 	if (rc)
 		return rc;
 
+	if (!pci_setup_msix_device_domain(dev, hwsize))
+		return -ENODEV;
+
 	for (;;) {
 		if (affd) {
 			nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
index 9d75b6f..74408cc 100644
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -105,6 +105,8 @@ enum support_mode {
 };
 
 bool pci_msi_domain_supports(struct pci_dev *dev, unsigned int feature_mask, enum support_mode mode);
+bool pci_setup_msi_device_domain(struct pci_dev *pdev);
+bool pci_setup_msix_device_domain(struct pci_dev *pdev, unsigned int hwsize);
 
 /* Legacy (!IRQDOMAIN) fallbacks */
 

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] genirq/msi: Provide BUS_DEVICE_PCI_MSI[X]
  2022-11-24 23:26 ` [patch V3 11/33] genirq/msi: Provide BUS_DEVICE_PCI_MSI[X] Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     bd141a3db40c877e01de8e981edb57c03199d876
Gitweb:        https://git.kernel.org/tip/bd141a3db40c877e01de8e981edb57c03199d876
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:02 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:32 +01:00

genirq/msi: Provide BUS_DEVICE_PCI_MSI[X]

Provide new bus tokens for the upcoming per device PCI/MSI and PCI/MSIX
interrupt domains.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.917219885@linutronix.de

---
 include/linux/irqdomain_defs.h | 2 ++
 kernel/irq/msi.c               | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/irqdomain_defs.h b/include/linux/irqdomain_defs.h
index 69035b4..b3f4b7e 100644
--- a/include/linux/irqdomain_defs.h
+++ b/include/linux/irqdomain_defs.h
@@ -21,6 +21,8 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_TI_SCI_INTA_MSI,
 	DOMAIN_BUS_WAKEUP,
 	DOMAIN_BUS_VMD_MSI,
+	DOMAIN_BUS_PCI_DEVICE_MSI,
+	DOMAIN_BUS_PCI_DEVICE_MSIX,
 };
 
 #endif /* _LINUX_IRQDOMAIN_DEFS_H */
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index ae58692..0749e66 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1118,6 +1118,8 @@ static bool msi_check_reservation_mode(struct irq_domain *domain,
 
 	switch(domain->bus_token) {
 	case DOMAIN_BUS_PCI_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
 	case DOMAIN_BUS_VMD_MSI:
 		break;
 	default:
@@ -1143,6 +1145,8 @@ static int msi_handle_pci_fail(struct irq_domain *domain, struct msi_desc *desc,
 {
 	switch(domain->bus_token) {
 	case DOMAIN_BUS_PCI_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
 	case DOMAIN_BUS_VMD_MSI:
 		if (IS_ENABLED(CONFIG_PCI_MSI))
 			break;

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] PCI/MSI: Remove unused pci_dev_has_special_msi_domain()
  2022-11-24 23:26 ` [patch V3 14/33] PCI/MSI: Remove unused pci_dev_has_special_msi_domain() Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Bjorn Helgaas, Marc Zyngier, x86,
	linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     45c0402457c1ed2f07ee32dc129ae710e0dc288c
Gitweb:        https://git.kernel.org/tip/45c0402457c1ed2f07ee32dc129ae710e0dc288c
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:07 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:33 +01:00

PCI/MSI: Remove unused pci_dev_has_special_msi_domain()

The check for special MSI domains like VMD, which prevents the interrupt
remapping code from overwriting device::msi::domain, is no longer required
and has been replaced by an x86 specific version which is aware of MSI
parent domains.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.093093200@linutronix.de

---
 drivers/pci/msi/irqdomain.c | 21 ---------------------
 include/linux/msi.h         |  1 -
 2 files changed, 22 deletions(-)

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index be3d50f..4736403 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -414,24 +414,3 @@ struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
 					     DOMAIN_BUS_PCI_MSI);
 	return dom;
 }
-
-/**
- * pci_dev_has_special_msi_domain - Check whether the device is handled by
- *				    a non-standard PCI-MSI domain
- * @pdev:	The PCI device to check.
- *
- * Returns: True if the device irqdomain or the bus irqdomain is
- * non-standard PCI/MSI.
- */
-bool pci_dev_has_special_msi_domain(struct pci_dev *pdev)
-{
-	struct irq_domain *dom = dev_get_msi_domain(&pdev->dev);
-
-	if (!dom)
-		dom = dev_get_msi_domain(&pdev->bus->dev);
-
-	if (!dom)
-		return true;
-
-	return dom->bus_token != DOMAIN_BUS_PCI_MSI;
-}
diff --git a/include/linux/msi.h b/include/linux/msi.h
index b4ab005..b5dda4b 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -617,7 +617,6 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 					     struct irq_domain *parent);
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
 struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
-bool pci_dev_has_special_msi_domain(struct pci_dev *pdev);
 #else /* CONFIG_PCI_MSI */
 static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
 {

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] PCI/MSI: Split __pci_write_msi_msg()
  2022-11-24 23:26 ` [patch V3 10/33] PCI/MSI: Split __pci_write_msi_msg() Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ahmed S. Darwish, Thomas Gleixner, Kevin Tian, Bjorn Helgaas,
	Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     877d6c4e93f5091bfa52549bde8fb9ce71d6f7e5
Gitweb:        https://git.kernel.org/tip/877d6c4e93f5091bfa52549bde8fb9ce71d6f7e5
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:00 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:32 +01:00

PCI/MSI: Split __pci_write_msi_msg()

The upcoming per device MSI domains will create different domains for MSI
and MSI-X. Split the write message function into MSI and MSI-X helpers so
they can be used by those new domain functions separately.
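
As an illustration of what the MSI helper does to the message control
word, here is a userspace sketch of the Multiple Message Enable update.
DEMO_MSI_FLAGS_QSIZE mirrors the value of PCI_MSI_FLAGS_QSIZE (0x0070) and
demo_update_msgctl() is a hypothetical name. The config space accesses of
the real helper are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 6:4 of the MSI message control word: Multiple Message Enable */
#define DEMO_MSI_FLAGS_QSIZE	0x0070

/*
 * Replace the Multiple Message Enable field with @multiple, which is
 * the log2 of the number of enabled vectors, and keep all other bits.
 */
uint16_t demo_update_msgctl(uint16_t msgctl, unsigned int multiple)
{
	msgctl &= (uint16_t)~DEMO_MSI_FLAGS_QSIZE;
	msgctl |= (uint16_t)(multiple << 4);
	return msgctl;
}
```

For example, a control word of 0x00f1 with multiple = 2 (four vectors)
becomes 0x00a1: the old field is cleared and the enable bit is preserved.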

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.857982142@linutronix.de

---
 drivers/pci/msi/msi.c | 104 +++++++++++++++++++++--------------------
 1 file changed, 54 insertions(+), 50 deletions(-)

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index d107bde..76a3d44 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -180,6 +180,58 @@ void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 	}
 }
 
+static inline void pci_write_msg_msi(struct pci_dev *dev, struct msi_desc *desc,
+				     struct msi_msg *msg)
+{
+	int pos = dev->msi_cap;
+	u16 msgctl;
+
+	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+	msgctl &= ~PCI_MSI_FLAGS_QSIZE;
+	msgctl |= desc->pci.msi_attrib.multiple << 4;
+	pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);
+
+	pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_LO, msg->address_lo);
+	if (desc->pci.msi_attrib.is_64) {
+		pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_HI,  msg->address_hi);
+		pci_write_config_word(dev, pos + PCI_MSI_DATA_64, msg->data);
+	} else {
+		pci_write_config_word(dev, pos + PCI_MSI_DATA_32, msg->data);
+	}
+	/* Ensure that the writes are visible in the device */
+	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+}
+
+static inline void pci_write_msg_msix(struct msi_desc *desc, struct msi_msg *msg)
+{
+	void __iomem *base = pci_msix_desc_addr(desc);
+	u32 ctrl = desc->pci.msix_ctrl;
+	bool unmasked = !(ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT);
+
+	if (desc->pci.msi_attrib.is_virtual)
+		return;
+	/*
+	 * The specification mandates that the entry is masked
+	 * when the message is modified:
+	 *
+	 * "If software changes the Address or Data value of an
+	 * entry while the entry is unmasked, the result is
+	 * undefined."
+	 */
+	if (unmasked)
+		pci_msix_write_vector_ctrl(desc, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT);
+
+	writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
+	writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
+	writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
+
+	if (unmasked)
+		pci_msix_write_vector_ctrl(desc, ctrl);
+
+	/* Ensure that the writes are visible in the device */
+	readl(base + PCI_MSIX_ENTRY_DATA);
+}
+
 void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
 	struct pci_dev *dev = msi_desc_to_pci_dev(entry);
@@ -187,63 +239,15 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 	if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
 		/* Don't touch the hardware now */
 	} else if (entry->pci.msi_attrib.is_msix) {
-		void __iomem *base = pci_msix_desc_addr(entry);
-		u32 ctrl = entry->pci.msix_ctrl;
-		bool unmasked = !(ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT);
-
-		if (entry->pci.msi_attrib.is_virtual)
-			goto skip;
-
-		/*
-		 * The specification mandates that the entry is masked
-		 * when the message is modified:
-		 *
-		 * "If software changes the Address or Data value of an
-		 * entry while the entry is unmasked, the result is
-		 * undefined."
-		 */
-		if (unmasked)
-			pci_msix_write_vector_ctrl(entry, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT);
-
-		writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
-		writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
-		writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
-
-		if (unmasked)
-			pci_msix_write_vector_ctrl(entry, ctrl);
-
-		/* Ensure that the writes are visible in the device */
-		readl(base + PCI_MSIX_ENTRY_DATA);
+		pci_write_msg_msix(entry, msg);
 	} else {
-		int pos = dev->msi_cap;
-		u16 msgctl;
-
-		pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
-		msgctl &= ~PCI_MSI_FLAGS_QSIZE;
-		msgctl |= entry->pci.msi_attrib.multiple << 4;
-		pci_write_config_word(dev, pos + PCI_MSI_FLAGS, msgctl);
-
-		pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_LO,
-				       msg->address_lo);
-		if (entry->pci.msi_attrib.is_64) {
-			pci_write_config_dword(dev, pos + PCI_MSI_ADDRESS_HI,
-					       msg->address_hi);
-			pci_write_config_word(dev, pos + PCI_MSI_DATA_64,
-					      msg->data);
-		} else {
-			pci_write_config_word(dev, pos + PCI_MSI_DATA_32,
-					      msg->data);
-		}
-		/* Ensure that the writes are visible in the device */
-		pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+		pci_write_msg_msi(dev, entry, msg);
 	}
 
-skip:
 	entry->msg = *msg;
 
 	if (entry->write_msi_msg)
 		entry->write_msi_msg(entry, entry->write_msi_msg_data);
-
 }
 
 void pci_write_msi_msg(unsigned int irq, struct msi_msg *msg)

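Archive note: the MSI-X helper split out above masks the entry, writes the message, then restores the previous mask state. The following is a minimal user-space sketch of that ordering, not kernel code. `toy_msix_entry`, `toy_msg` and `toy_write_msg_msix` are hypothetical names, the table entry is plain memory instead of MMIO, and the control word is read from the entry itself, whereas the patch uses the value cached in the descriptor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit position as PCI_MSIX_ENTRY_CTRL_MASKBIT (bit 0 of vector control). */
#define TOY_CTRL_MASKBIT 0x1u

/* One MSI-X table entry. Plain memory here; MMIO on a real device. */
struct toy_msix_entry {
	uint32_t addr_lo;
	uint32_t addr_hi;
	uint32_t data;
	uint32_t vector_ctrl;
};

struct toy_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Update address/data while the entry is masked, then restore the mask state. */
static void toy_write_msg_msix(struct toy_msix_entry *e, const struct toy_msg *msg)
{
	uint32_t ctrl = e->vector_ctrl;
	bool unmasked = !(ctrl & TOY_CTRL_MASKBIT);

	/* Changing address or data of an unmasked entry is undefined, so mask first. */
	if (unmasked)
		e->vector_ctrl = ctrl | TOY_CTRL_MASKBIT;

	e->addr_lo = msg->address_lo;
	e->addr_hi = msg->address_hi;
	e->data = msg->data;

	/* Only unmask if the entry was unmasked on entry. */
	if (unmasked)
		e->vector_ctrl = ctrl;
}
```

An entry that was already masked stays masked after the update, which is the property the `unmasked` flag preserves.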
^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] genirq/msi: Add range checking to msi_insert_desc()
  2022-11-24 23:25 ` [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc() Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2022-12-13 19:04   ` [patch V3 09/33] " Guenter Roeck
  2023-02-20 17:11   ` [REGRESSION] " Russell King (Oracle)
  3 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     36db3d9003ea85217b357a658cf7b37920c2c38e
Gitweb:        https://git.kernel.org/tip/36db3d9003ea85217b357a658cf7b37920c2c38e
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:25:59 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:32 +01:00

genirq/msi: Add range checking to msi_insert_desc()

Per device domains provide the real domain size to the core code. This
allows range checking on insertion of MSI descriptors and also paves the
way for dynamic index allocations which are required e.g. for IMS. This
avoids external mechanisms like bitmaps on the device side and just
utilizes the core internal MSI descriptor storage for it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232325.798556374@linutronix.de

---
 kernel/irq/msi.c | 53 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 10 deletions(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 7449998..ae58692 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -40,6 +40,7 @@ struct msi_ctrl {
 #define MSI_XA_DOMAIN_SIZE	(MSI_MAX_INDEX + 1)
 
 static void msi_domain_free_locked(struct device *dev, struct msi_ctrl *ctrl);
+static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid);
 static inline int msi_sysfs_create_group(struct device *dev);
 
 
@@ -80,16 +81,28 @@ static void msi_free_desc(struct msi_desc *desc)
 	kfree(desc);
 }
 
-static int msi_insert_desc(struct msi_device_data *md, struct msi_desc *desc,
+static int msi_insert_desc(struct device *dev, struct msi_desc *desc,
 			   unsigned int domid, unsigned int index)
 {
+	struct msi_device_data *md = dev->msi.data;
 	struct xarray *xa = &md->__domains[domid].store;
+	unsigned int hwsize;
 	int ret;
 
+	hwsize = msi_domain_get_hwsize(dev, domid);
+	if (index >= hwsize) {
+		ret = -ERANGE;
+		goto fail;
+	}
+
 	desc->msi_index = index;
 	ret = xa_insert(xa, index, desc, GFP_KERNEL);
 	if (ret)
-		msi_free_desc(desc);
+		goto fail;
+	return 0;
+
+fail:
+	msi_free_desc(desc);
 	return ret;
 }
 
@@ -117,7 +130,7 @@ int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid,
 	/* Copy type specific data to the new descriptor. */
 	desc->pci = init_desc->pci;
 
-	return msi_insert_desc(dev->msi.data, desc, domid, init_desc->msi_index);
+	return msi_insert_desc(dev, desc, domid, init_desc->msi_index);
 }
 
 static bool msi_desc_match(struct msi_desc *desc, enum msi_desc_filter filter)
@@ -136,11 +149,16 @@ static bool msi_desc_match(struct msi_desc *desc, enum msi_desc_filter filter)
 
 static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
 {
+	unsigned int hwsize;
+
 	if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
-			 !dev->msi.data->__domains[ctrl->domid].domain ||
-			 ctrl->first > ctrl->last ||
-			 ctrl->first > MSI_MAX_INDEX ||
-			 ctrl->last > MSI_MAX_INDEX))
+			 !dev->msi.data->__domains[ctrl->domid].domain))
+		return false;
+
+	hwsize = msi_domain_get_hwsize(dev, ctrl->domid);
+	if (WARN_ON_ONCE(ctrl->first > ctrl->last ||
+			 ctrl->first >= hwsize ||
+			 ctrl->last >= hwsize))
 		return false;
 	return true;
 }
@@ -208,7 +226,7 @@ static int msi_domain_add_simple_msi_descs(struct device *dev, struct msi_ctrl *
 		desc = msi_alloc_desc(dev, 1, NULL);
 		if (!desc)
 			goto fail_mem;
-		ret = msi_insert_desc(dev->msi.data, desc, ctrl->domid, idx);
+		ret = msi_insert_desc(dev, desc, ctrl->domid, idx);
 		if (ret)
 			goto fail;
 	}
@@ -568,6 +586,20 @@ static struct irq_domain *msi_get_device_domain(struct device *dev, unsigned int
 	return domain;
 }
 
+static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid)
+{
+	struct msi_domain_info *info;
+	struct irq_domain *domain;
+
+	domain = msi_get_device_domain(dev, domid);
+	if (domain) {
+		info = domain->host_data;
+		return info->hwsize;
+	}
+	/* No domain, no size... */
+	return 0;
+}
+
 static inline void irq_chip_write_msi_msg(struct irq_data *data,
 					  struct msi_msg *msg)
 {
@@ -1356,7 +1388,7 @@ int msi_domain_alloc_irqs_all_locked(struct device *dev, unsigned int domid, int
 	struct msi_ctrl ctrl = {
 		.domid	= domid,
 		.first	= 0,
-		.last	= MSI_MAX_INDEX,
+		.last	= msi_domain_get_hwsize(dev, domid) - 1,
 		.nirqs	= nirqs,
 	};
 
@@ -1470,7 +1502,8 @@ void msi_domain_free_irqs_range(struct device *dev, unsigned int domid,
  */
 void msi_domain_free_irqs_all_locked(struct device *dev, unsigned int domid)
 {
-	msi_domain_free_irqs_range_locked(dev, domid, 0, MSI_MAX_INDEX);
+	msi_domain_free_irqs_range_locked(dev, domid, 0,
+					  msi_domain_get_hwsize(dev, domid) - 1);
 }
 
 /**

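Archive note: the range check added to msi_insert_desc() can be modelled in user space. The sketch below is illustrative only. `toy_domain`, `toy_insert_desc` and the fixed-size slot array are hypothetical stand-ins for the xarray-backed descriptor store; only the `index >= hwsize` comparison and the -ERANGE result mirror the patch. The -EBUSY case models what xa_insert() reports for an occupied index.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_SLOTS 8	/* hypothetical bound, stands in for the xarray */

struct toy_domain {
	unsigned int hwsize;		/* size reported by the device domain */
	void *slots[TOY_MAX_SLOTS];	/* NULL means no descriptor stored */
};

/*
 * Returns 0 on success, -ERANGE if @index lies outside the domain and
 * -EBUSY if the slot is already occupied.
 */
static int toy_insert_desc(struct toy_domain *d, void *desc, unsigned int index)
{
	/* Reject anything beyond the real domain size before touching storage. */
	if (index >= d->hwsize || index >= TOY_MAX_SLOTS)
		return -ERANGE;

	if (d->slots[index])
		return -EBUSY;

	d->slots[index] = desc;
	return 0;
}
```

With `hwsize` set to 4, indices 0 to 3 are accepted once each and index 4 fails with -ERANGE, without any device-side bitmap.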
^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip: irq/core] x86/apic/vector: Provide MSI parent domain
  2022-11-24 23:26 ` [patch V3 13/33] x86/apic/vector: Provide MSI parent domain Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
  2023-01-04 12:34   ` [patch V3 13/33] " Jason Gunthorpe
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-12-05 21:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Kevin Tian, Marc Zyngier, x86, linux-kernel

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     b6d5fc3a5245c65f7c83440460a1566d09cc9038
Gitweb:        https://git.kernel.org/tip/b6d5fc3a5245c65f7c83440460a1566d09cc9038
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 25 Nov 2022 00:26:05 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 05 Dec 2022 22:22:33 +01:00

x86/apic/vector: Provide MSI parent domain

Enable MSI parent domain support in the x86 vector domain and fixup the
checks in the iommu implementations to check whether device::msi::domain is
the default MSI parent domain. That keeps the existing logic to protect
e.g. devices behind VMD working.

The interrupt remap PCI/MSI code still works because the underlying vector
domain still provides the same functionality.

None of the other x86 PCI/MSI implementations, e.g. XEN and HyperV, are
affected either. They still work the same way, both at the low level and in
the PCI/MSI implementations they provide.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221124232326.034672592@linutronix.de

---
 arch/x86/include/asm/msi.h          |   6 +-
 arch/x86/include/asm/pci.h          |   1 +-
 arch/x86/kernel/apic/msi.c          | 176 +++++++++++++++++++--------
 drivers/iommu/amd/iommu.c           |   2 +-
 drivers/iommu/intel/irq_remapping.c |   2 +-
 5 files changed, 138 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/msi.h b/arch/x86/include/asm/msi.h
index d71c7e8..7702958 100644
--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -62,4 +62,10 @@ typedef struct x86_msi_addr_hi {
 struct msi_msg;
 u32 x86_msi_msg_get_destid(struct msi_msg *msg, bool extid);
 
+#define X86_VECTOR_MSI_FLAGS_SUPPORTED					\
+	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX)
+
+#define X86_VECTOR_MSI_FLAGS_REQUIRED					\
+	(MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS)
+
 #endif /* _ASM_X86_MSI_H */
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index c4789de..b40c462 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -92,6 +92,7 @@ void pcibios_scan_root(int bus);
 struct irq_routing_table *pcibios_get_irq_routing_table(void);
 int pcibios_set_irq_routing(struct pci_dev *dev, int pin, int irq);
 
+bool pci_dev_has_default_msi_parent_domain(struct pci_dev *dev);
 
 #define HAVE_PCI_MMAP
 #define arch_can_pci_mmap_wc()	pat_enabled()
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 71c8751..db96bfc 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -142,67 +142,131 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
 	return ret;
 }
 
-/*
- * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
- * which implement the MSI or MSI-X Capability Structure.
+/**
+ * pci_dev_has_default_msi_parent_domain - Check whether the device has the default
+ *					   MSI parent domain associated
+ * @dev:	Pointer to the PCI device
  */
-static struct irq_chip pci_msi_controller = {
-	.name			= "PCI-MSI",
-	.irq_unmask		= pci_msi_unmask_irq,
-	.irq_mask		= pci_msi_mask_irq,
-	.irq_ack		= irq_chip_ack_parent,
-	.irq_retrigger		= irq_chip_retrigger_hierarchy,
-	.irq_set_affinity	= msi_set_affinity,
-	.flags			= IRQCHIP_SKIP_SET_WAKE |
-				  IRQCHIP_AFFINITY_PRE_STARTUP,
-};
-
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
-		    msi_alloc_info_t *arg)
+bool pci_dev_has_default_msi_parent_domain(struct pci_dev *dev)
 {
-	init_irq_alloc_info(arg, NULL);
-	if (to_pci_dev(dev)->msix_enabled)
-		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
-	else
-		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
+	struct irq_domain *domain = dev_get_msi_domain(&dev->dev);
 
-	return 0;
+	if (!domain)
+		domain = dev_get_msi_domain(&dev->bus->dev);
+	if (!domain)
+		return false;
+
+	return domain == x86_vector_domain;
 }
-EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
-static struct msi_domain_ops pci_msi_domain_ops = {
-	.msi_prepare	= pci_msi_prepare,
-};
+/**
+ * x86_msi_prepare - Setup of msi_alloc_info_t for allocations
+ * @domain:	The domain for which this setup happens
+ * @dev:	The device for which interrupts are allocated
+ * @nvec:	The number of vectors to allocate
+ * @alloc:	The allocation info structure to initialize
+ *
+ * This function is to be used for all types of MSI domains above the x86
+ * vector domain and any intermediates. It is always invoked from the
+ * top level interrupt domain. The domain specific allocation
+ * functionality is determined via the @domain's bus token which allows to
+ * map the X86 specific allocation type.
+ */
+static int x86_msi_prepare(struct irq_domain *domain, struct device *dev,
+			   int nvec, msi_alloc_info_t *alloc)
+{
+	struct msi_domain_info *info = domain->host_data;
+
+	init_irq_alloc_info(alloc, NULL);
+
+	switch (info->bus_token) {
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
+		return 0;
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
+		alloc->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
 
-static struct msi_domain_info pci_msi_domain_info = {
-	.flags		= MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-			  MSI_FLAG_PCI_MSIX | MSI_FLAG_NOMASK_QUIRK,
+/**
+ * x86_init_dev_msi_info - Domain info setup for MSI domains
+ * @dev:		The device for which the domain should be created
+ * @domain:		The (root) domain providing this callback
+ * @real_parent:	The real parent domain of the to initialize domain
+ * @info:		The domain info for the to initialize domain
+ *
+ * This function is to be used for all types of MSI domains above the x86
+ * vector domain and any intermediates. The domain specific functionality
+ * is determined via the @real_parent.
+ */
+static bool x86_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				  struct irq_domain *real_parent, struct msi_domain_info *info)
+{
+	const struct msi_parent_ops *pops = real_parent->msi_parent_ops;
+
+	/* MSI parent domain specific settings */
+	switch (real_parent->bus_token) {
+	case DOMAIN_BUS_ANY:
+		/* Only the vector domain can have the ANY token */
+		if (WARN_ON_ONCE(domain != real_parent))
+			return false;
+		info->chip->irq_set_affinity = msi_set_affinity;
+		/* See msi_set_affinity() for the gory details */
+		info->flags |= MSI_FLAG_NOMASK_QUIRK;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return false;
+	}
 
-	.ops		= &pci_msi_domain_ops,
-	.chip		= &pci_msi_controller,
-	.handler	= handle_edge_irq,
-	.handler_name	= "edge",
+	/* Is the target supported? */
+	switch(info->bus_token) {
+	case DOMAIN_BUS_PCI_DEVICE_MSI:
+	case DOMAIN_BUS_PCI_DEVICE_MSIX:
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	/*
+	 * Mask out the domain specific MSI feature flags which are not
+	 * supported by the real parent.
+	 */
+	info->flags			&= pops->supported_flags;
+	/* Enforce the required flags */
+	info->flags			|= X86_VECTOR_MSI_FLAGS_REQUIRED;
+
+	/* This is always invoked from the top level MSI domain! */
+	info->ops->msi_prepare		= x86_msi_prepare;
+
+	info->chip->irq_ack		= irq_chip_ack_parent;
+	info->chip->irq_retrigger	= irq_chip_retrigger_hierarchy;
+	info->chip->flags		|= IRQCHIP_SKIP_SET_WAKE |
+					   IRQCHIP_AFFINITY_PRE_STARTUP;
+
+	info->handler			= handle_edge_irq;
+	info->handler_name		= "edge";
+
+	return true;
+}
+
+static const struct msi_parent_ops x86_vector_msi_parent_ops = {
+	.supported_flags	= X86_VECTOR_MSI_FLAGS_SUPPORTED,
+	.init_dev_msi_info	= x86_init_dev_msi_info,
 };
 
 struct irq_domain * __init native_create_pci_msi_domain(void)
 {
-	struct fwnode_handle *fn;
-	struct irq_domain *d;
-
 	if (disable_apic)
 		return NULL;
 
-	fn = irq_domain_alloc_named_fwnode("PCI-MSI");
-	if (!fn)
-		return NULL;
-
-	d = pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
-				      x86_vector_domain);
-	if (!d) {
-		irq_domain_free_fwnode(fn);
-		pr_warn("Failed to initialize PCI-MSI irqdomain.\n");
-	}
-	return d;
+	x86_vector_domain->flags |= IRQ_DOMAIN_FLAG_MSI_PARENT;
+	x86_vector_domain->msi_parent_ops = &x86_vector_msi_parent_ops;
+	return x86_vector_domain;
 }
 
 void __init x86_create_pci_msi_domain(void)
@@ -210,7 +274,25 @@ void __init x86_create_pci_msi_domain(void)
 	x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain();
 }
 
+/* Keep around for hyperV and the remap code below */
+int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+		    msi_alloc_info_t *arg)
+{
+	init_irq_alloc_info(arg, NULL);
+
+	if (to_pci_dev(dev)->msix_enabled)
+		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
+	else
+		arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pci_msi_prepare);
+
 #ifdef CONFIG_IRQ_REMAP
+static struct msi_domain_ops pci_msi_domain_ops = {
+	.msi_prepare	= pci_msi_prepare,
+};
+
 static struct irq_chip pci_msi_ir_controller = {
 	.name			= "IR-PCI-MSI",
 	.irq_unmask		= pci_msi_unmask_irq,
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 72dfe57..67e209c 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -812,7 +812,7 @@ static void
 amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu)
 {
 	if (!irq_remapping_enabled || !dev_is_pci(dev) ||
-	    pci_dev_has_special_msi_domain(to_pci_dev(dev)))
+	    !pci_dev_has_default_msi_parent_domain(to_pci_dev(dev)))
 		return;
 
 	dev_set_msi_domain(dev, iommu->msi_domain);
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index a914eba..08bbf08 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1107,7 +1107,7 @@ error:
  */
 void intel_irq_remap_add_device(struct dmar_pci_notify_info *info)
 {
-	if (!irq_remapping_enabled || pci_dev_has_special_msi_domain(info->dev))
+	if (!irq_remapping_enabled || !pci_dev_has_default_msi_parent_domain(info->dev))
 		return;
 
 	dev_set_msi_domain(&info->dev->dev, map_dev_to_ir(info->dev));

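Archive note: x86_init_dev_msi_info() first masks the requested feature flags with what the parent supports and then forces the required flags on. A minimal sketch of that two-step negotiation follows. The `TOY_*` flag bits and `toy_negotiate_flags` are hypothetical and chosen only for illustration; they are not the kernel's MSI_FLAG_* values.

```c
#include <stdint.h>

/* Hypothetical flag bits for this example only. */
#define TOY_FLAG_DEF_DOM_OPS	(1u << 0)
#define TOY_FLAG_DEF_CHIP_OPS	(1u << 1)
#define TOY_FLAG_MSIX		(1u << 2)
#define TOY_FLAG_EXOTIC		(1u << 3)	/* not offered by the parent */

#define TOY_SUPPORTED	(TOY_FLAG_DEF_DOM_OPS | TOY_FLAG_DEF_CHIP_OPS | TOY_FLAG_MSIX)
#define TOY_REQUIRED	(TOY_FLAG_DEF_DOM_OPS | TOY_FLAG_DEF_CHIP_OPS)

/* Drop what the parent cannot do, then enforce what it insists on. */
static uint32_t toy_negotiate_flags(uint32_t requested, uint32_t supported,
				    uint32_t required)
{
	requested &= supported;
	requested |= required;
	return requested;
}
```

The order matters: masking first means an unsupported request is dropped silently, and the OR afterwards guarantees the required bits survive even if the caller never asked for them.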
^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-11-24 23:25 ` [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc() Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
@ 2022-12-13 19:04   ` Guenter Roeck
  2022-12-14  9:42     ` Niklas Schnelle
  2023-02-20 17:11   ` [REGRESSION] " Russell King (Oracle)
  3 siblings, 1 reply; 126+ messages in thread
From: Guenter Roeck @ 2022-12-13 19:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

Hi,

On Fri, Nov 25, 2022 at 12:25:59AM +0100, Thomas Gleixner wrote:
> Per device domains provide the real domain size to the core code. This
> allows range checking on insertion of MSI descriptors and also paves the
> way for dynamic index allocations which are required e.g. for IMS. This
> avoids external mechanisms like bitmaps on the device side and just
> utilizes the core internal MSI descriptor storage for it.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---

This patch results in various s390 qemu test failures.
There is a warning backtrace

[   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0

followed by

[   12.684333] virtio_net: probe of virtio0 failed with error -34

and Ethernet interfaces don't instantiate.

When trying to instantiate virtio-pci and booting from it, I see
the same warning backtrace followed by

[    9.943123] virtio_blk: probe of virtio0 failed with error -34

and a crash.

A typical backtrace is

[   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
[   12.675108] Modules linked in:
[   12.675346] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.1.0-03225-g764822972d64 #1
[   12.675512] Hardware name: QEMU 8561 QEMU (KVM/Linux)
[   12.675648] Krnl PSW : 0704c00180000000 00000000001ec4c6 (msi_ctrl_valid+0x2e/0xb0)
[   12.675853]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[   12.675987] Krnl GPRS: 00000000435318a9 0000000000000000 00000000035510a0 0000000000000000
[   12.676069]            0000000000000000 000000000000ffff 0000000000000000 0000037fffb1b6c0
[   12.676151]            0000000000000000 0000037fffb1b658 0000000000000000 0000037fffb1b658
[   12.676232]            0000000002ae4100 00000000035510a0 0000037fffb1b568 0000037fffb1b538
[   12.677127] Krnl Code: 00000000001ec4b8: 58303000		l	%r3,0(%r3)
[   12.677127]            00000000001ec4bc: ec3c000f017f	clij	%r3,1,12,00000000001ec4da
[   12.677127]           #00000000001ec4c2: af000000		mc	0,0
[   12.677127]           >00000000001ec4c6: a7280000		lhi	%r2,0
[   12.677127]            00000000001ec4ca: b9840022		llgcr	%r2,%r2
[   12.677127]            00000000001ec4ce: ebbff0a00004	lmg	%r11,%r15,160(%r15)
[   12.677127]            00000000001ec4d4: c0f400714f1a	brcl	15,0000000001016308
[   12.677127]            00000000001ec4da: b9160033		llgfr	%r3,%r3
[   12.677743] Call Trace:
[   12.677835]  [<00000000001ec4c6>] msi_ctrl_valid+0x2e/0xb0
[   12.677943]  [<00000000001ec58a>] msi_domain_free_descs+0x42/0x120
[   12.678024]  [<00000000001ecaf0>] msi_domain_free_msi_descs_range+0x38/0x48
[   12.678103]  [<00000000009db7ae>] __pci_enable_msix_range+0x44e/0x710
[   12.678186]  [<00000000009d9da4>] pci_alloc_irq_vectors_affinity+0xa4/0x120
[   12.678268]  [<00000000009f5888>] vp_request_msix_vectors+0xb8/0x208
[   12.678348]  [<00000000009f5f24>] vp_find_vqs_msix+0x254/0x2f0
[   12.678428]  [<00000000009f6016>] vp_find_vqs+0x56/0x1f8
[   12.678508]  [<00000000009f4e4e>] vp_modern_find_vqs+0x3e/0x90
[   12.678587]  [<0000000000ad8c14>] virtnet_find_vqs+0x244/0x3e8
[   12.678669]  [<0000000000ad9268>] virtnet_probe+0x4b0/0xca8
[   12.678748]  [<00000000009ed6b4>] virtio_dev_probe+0x1ec/0x418
[   12.678826]  [<0000000000a3c246>] really_probe+0xd6/0x480
[   12.678906]  [<0000000000a3c7a0>] driver_probe_device+0x40/0xf0
[   12.678985]  [<0000000000a3d0e4>] __driver_attach+0xbc/0x228
[   12.679065]  [<0000000000a396c0>] bus_for_each_dev+0x80/0xb8
[   12.679143]  [<0000000000a3b38e>] bus_add_driver+0x1d6/0x260
[   12.679222]  [<0000000000a3dc10>] driver_register+0xa8/0x170
[   12.679312]  [<00000000017b8848>] virtio_net_driver_init+0x88/0xc0

This worked fine in v6.1 and earlier kernels. Bisect log attached.

Guenter

---
# bad: [764822972d64e7f3e6792278ecc7a3b3c81087cd] Merge tag 'nfsd-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
# good: [830b3c68c1fb1e9176028d02ef86f3cf76aa2476] Linux 6.1
git bisect start 'HEAD' 'v6.1'
# good: [01f3cbb296a9ad378167c01758c99557b5bc3208] Merge tag 'soc-dt-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 01f3cbb296a9ad378167c01758c99557b5bc3208
# bad: [e2ed78d5d9ca07a2b9d158ebac366170a2d3083d] Merge tag 'linux-kselftest-kunit-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
git bisect bad e2ed78d5d9ca07a2b9d158ebac366170a2d3083d
# bad: [045e222d0a9dcec152abe0633f538cafd965b12b] Merge tag 'pm-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 045e222d0a9dcec152abe0633f538cafd965b12b
# good: [f10bc40168032962ebee26894bdbdc972cde35bf] Merge tag 'core-debugobjects-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good f10bc40168032962ebee26894bdbdc972cde35bf
# bad: [9d33edb20f7e6943250d6bb96ceaf2368f674d51] Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 9d33edb20f7e6943250d6bb96ceaf2368f674d51
# good: [c459f11f32a022d0f97694030419d16816275a9d] genirq/msi: Remove unused alloc/free interfaces
git bisect good c459f11f32a022d0f97694030419d16816275a9d
# bad: [d51a15af37ce8cf59e73de51dcdce3c9f4944974] irqchip/gic-v2m: Mark a few functions __init
git bisect bad d51a15af37ce8cf59e73de51dcdce3c9f4944974
# bad: [4d5a4ccc519ab0a62e220dc8dcd8bc1c5f8fee10] x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
git bisect bad 4d5a4ccc519ab0a62e220dc8dcd8bc1c5f8fee10
# good: [26e91b75bf6108550035355c835bf0c93c885b61] genirq/msi: Provide msi_match_device_domain()
git bisect good 26e91b75bf6108550035355c835bf0c93c885b61
# bad: [15c72f824b32761696b1854500bb3dedccbbb45a] PCI/MSI: Add support for per device MSI[X] domains
git bisect bad 15c72f824b32761696b1854500bb3dedccbbb45a
# bad: [877d6c4e93f5091bfa52549bde8fb9ce71d6f7e5] PCI/MSI: Split __pci_write_msi_msg()
git bisect bad 877d6c4e93f5091bfa52549bde8fb9ce71d6f7e5
# bad: [36db3d9003ea85217b357a658cf7b37920c2c38e] genirq/msi: Add range checking to msi_insert_desc()
git bisect bad 36db3d9003ea85217b357a658cf7b37920c2c38e
# first bad commit: [36db3d9003ea85217b357a658cf7b37920c2c38e] genirq/msi: Add range checking to msi_insert_desc()

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-13 19:04   ` [patch V3 09/33] " Guenter Roeck
@ 2022-12-14  9:42     ` Niklas Schnelle
  2022-12-15 14:49       ` Thomas Gleixner
  0 siblings, 1 reply; 126+ messages in thread
From: Niklas Schnelle @ 2022-12-14  9:42 UTC (permalink / raw)
  To: Guenter Roeck, Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe,
	Matthew Rosato

On Tue, 2022-12-13 at 11:04 -0800, Guenter Roeck wrote:
> Hi,
> 
> On Fri, Nov 25, 2022 at 12:25:59AM +0100, Thomas Gleixner wrote:
> > Per device domains provide the real domain size to the core code. This
> > allows range checking on insertion of MSI descriptors and also paves the
> > way for dynamic index allocations which are required e.g. for IMS. This
> > avoids external mechanisms like bitmaps on the device side and just
> > utilizes the core internal MSI descriptor storage for it.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > ---
> 
> This patch results in various s390 qemu test failures.
> There is a warning backtrace
> 
> [   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
> 
> followed by
> 
> [   12.684333] virtio_net: probe of virtio0 failed with error -34
> 
> and Ethernet interfaces don't instantiate.
> 
> When trying to instantiate virtio-pci and booting from it, I see
> the same warning backtrace followed by
> 
> [    9.943123] virtio_blk: probe of virtio0 failed with error -34
> 
> and a crash.
> 
> A typical backtrace is
> 
> [   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
> [   12.675108] Modules linked in:
> [   12.675346] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.1.0-03225-g764822972d64 #1
> [   12.675512] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [   12.675648] Krnl PSW : 0704c00180000000 00000000001ec4c6 (msi_ctrl_valid+0x2e/0xb0)
> [   12.675853]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [   12.675987] Krnl GPRS: 00000000435318a9 0000000000000000 00000000035510a0 0000000000000000
> [   12.676069]            0000000000000000 000000000000ffff 0000000000000000 0000037fffb1b6c0
> [   12.676151]            0000000000000000 0000037fffb1b658 0000000000000000 0000037fffb1b658
> [   12.676232]            0000000002ae4100 00000000035510a0 0000037fffb1b568 0000037fffb1b538
> [   12.677127] Krnl Code: 00000000001ec4b8: 58303000		l	%r3,0(%r3)
> [   12.677127]            00000000001ec4bc: ec3c000f017f	clij	%r3,1,12,00000000001ec4da
> [   12.677127]           #00000000001ec4c2: af000000		mc	0,0
> [   12.677127]           >00000000001ec4c6: a7280000		lhi	%r2,0
> [   12.677127]            00000000001ec4ca: b9840022		llgcr	%r2,%r2
> [   12.677127]            00000000001ec4ce: ebbff0a00004	lmg	%r11,%r15,160(%r15)
> [   12.677127]            00000000001ec4d4: c0f400714f1a	brcl	15,0000000001016308
> [   12.677127]            00000000001ec4da: b9160033		llgfr	%r3,%r3
> [   12.677743] Call Trace:
> [   12.677835]  [<00000000001ec4c6>] msi_ctrl_valid+0x2e/0xb0
> [   12.677943]  [<00000000001ec58a>] msi_domain_free_descs+0x42/0x120
> [   12.678024]  [<00000000001ecaf0>] msi_domain_free_msi_descs_range+0x38/0x48
> [   12.678103]  [<00000000009db7ae>] __pci_enable_msix_range+0x44e/0x710
> [   12.678186]  [<00000000009d9da4>] pci_alloc_irq_vectors_affinity+0xa4/0x120
> [   12.678268]  [<00000000009f5888>] vp_request_msix_vectors+0xb8/0x208
> [   12.678348]  [<00000000009f5f24>] vp_find_vqs_msix+0x254/0x2f0
> [   12.678428]  [<00000000009f6016>] vp_find_vqs+0x56/0x1f8
> [   12.678508]  [<00000000009f4e4e>] vp_modern_find_vqs+0x3e/0x90
> [   12.678587]  [<0000000000ad8c14>] virtnet_find_vqs+0x244/0x3e8
> [   12.678669]  [<0000000000ad9268>] virtnet_probe+0x4b0/0xca8
> [   12.678748]  [<00000000009ed6b4>] virtio_dev_probe+0x1ec/0x418
> [   12.678826]  [<0000000000a3c246>] really_probe+0xd6/0x480
> [   12.678906]  [<0000000000a3c7a0>] driver_probe_device+0x40/0xf0
> [   12.678985]  [<0000000000a3d0e4>] __driver_attach+0xbc/0x228
> [   12.679065]  [<0000000000a396c0>] bus_for_each_dev+0x80/0xb8
> [   12.679143]  [<0000000000a3b38e>] bus_add_driver+0x1d6/0x260
> [   12.679222]  [<0000000000a3dc10>] driver_register+0xa8/0x170
> [   12.679312]  [<00000000017b8848>] virtio_net_driver_init+0x88/0xc0
> 
> This worked fine in v6.1 and earlier kernels. Bisect log attached.
> 
> Guenter

Yes, we were about to report the same issue. In linux-next, PCI support
is currently broken for ConnectX-based NICs, NVMe devices, etc. Matthew
Rosato bisected this to the above-mentioned commit on Monday and was, I
believe, still investigating the details.

As far as I'm aware, he has so far tracked this down to code calling
msi_domain_get_hwsize(), which in turn calls msi_get_device_domain().
That returns NULL, so msi_domain_get_hwsize() returns 0. I think this is
related to the fact that we currently don't use IRQ domains.

Thanks,
Niklas

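Archive note: the failure described above can be reproduced in a small user-space model. This is a sketch under stated assumptions, not the kernel code. `toy_ctrl`, `toy_all_range` and `toy_ctrl_valid` are hypothetical names; only the three comparisons mirror the msi_ctrl_valid() hunk of the patch. With a reported size of 0, every range fails validation, and the reported "error -34" is consistent with -ERANGE, which is 34 on Linux.

```c
#include <stdbool.h>

struct toy_ctrl {
	unsigned int first;
	unsigned int last;
};

/* Builds the "all indices" range the way the patch does: [0, hwsize - 1]. */
static struct toy_ctrl toy_all_range(unsigned int hwsize)
{
	/* For hwsize == 0 the unsigned subtraction wraps to UINT_MAX. */
	struct toy_ctrl c = { .first = 0, .last = hwsize - 1 };

	return c;
}

/* Same three comparisons as the patched msi_ctrl_valid(). */
static bool toy_ctrl_valid(const struct toy_ctrl *c, unsigned int hwsize)
{
	if (c->first > c->last || c->first >= hwsize || c->last >= hwsize)
		return false;
	return true;
}
```

For a size of 0 the check `first >= hwsize` is already true for index 0, so no range can ever be valid on a platform that has no irqdomain to report a size.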
^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-14  9:42     ` Niklas Schnelle
@ 2022-12-15 14:49       ` Thomas Gleixner
  2022-12-15 16:23         ` Matthew Rosato
  0 siblings, 1 reply; 126+ messages in thread
From: Thomas Gleixner @ 2022-12-15 14:49 UTC (permalink / raw)
  To: Niklas Schnelle, Guenter Roeck
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe,
	Matthew Rosato

On Wed, Dec 14 2022 at 10:42, Niklas Schnelle wrote:
> On Tue, 2022-12-13 at 11:04 -0800, Guenter Roeck wrote:
>> This patch results in various s390 qemu test failures.
>> There is a warning backtrace
>> 
>> [   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
>> 
>> followed by
>> 
>> [   12.684333] virtio_net: probe of virtio0 failed with error -34
>> 
>> and Ethernet interfaces don't instantiate.
> As far as I'm aware so far he tracked this down to code calling
> msi_domain_get_hwsize() which in turn calls msi_get_device_domain()
> which then returns NULL leading to msi_domain_get_hwsize() returning 0.
> I think this is related to the fact that we currently don't use IRQ
> domains.

Correct and for some stupid reason I thought 0 is a good return value
here :)



diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index bd4d4dd626b4..8fb10f216dc0 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -609,8 +609,8 @@ static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid
 		info = domain->host_data;
 		return info->hwsize;
 	}
-	/* No domain, no size... */
-	return 0;
+	/* No domain, default to MSI_MAX_INDEX */
+	return MSI_MAX_INDEX;
 }
 
 static inline void irq_chip_write_msi_msg(struct irq_data *data,


* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-15 14:49       ` Thomas Gleixner
@ 2022-12-15 16:23         ` Matthew Rosato
  2022-12-15 21:32           ` Guenter Roeck
  2022-12-16  9:53           ` Marc Zyngier
  0 siblings, 2 replies; 126+ messages in thread
From: Matthew Rosato @ 2022-12-15 16:23 UTC (permalink / raw)
  To: Thomas Gleixner, Niklas Schnelle, Guenter Roeck
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On 12/15/22 9:49 AM, Thomas Gleixner wrote:
> On Wed, Dec 14 2022 at 10:42, Niklas Schnelle wrote:
>> On Tue, 2022-12-13 at 11:04 -0800, Guenter Roeck wrote:
>>> This patch results in various s390 qemu test failures.
>>> There is a warning backtrace
>>>
>>> [   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
>>>
>>> followed by
>>>
>>> [   12.684333] virtio_net: probe of virtio0 failed with error -34
>>>
>>> and Ethernet interfaces don't instantiate.
>> As far as I'm aware so far he tracked this down to code calling
>> msi_domain_get_hwsize() which in turn calls msi_get_device_domain()
>> which then returns NULL leading to msi_domain_get_hwsize() returning 0.
>> I think this is related to the fact that we currently don't use IRQ
>> domains.
> 
> Correct and for some stupid reason I thought 0 is a good return value
> here :)
> 
> 
> 
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index bd4d4dd626b4..8fb10f216dc0 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -609,8 +609,8 @@ static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid
>  		info = domain->host_data;
>  		return info->hwsize;
>  	}
> -	/* No domain, no size... */
> -	return 0;
> +	/* No domain, default to MSI_MAX_INDEX */
> +	return MSI_MAX_INDEX;
>  }
>  
>  static inline void irq_chip_write_msi_msg(struct irq_data *data,

Ah, that makes sense. Applying that diff fixes most of the issues I'm seeing, including the virtio one that Guenter mentioned. However, NVMe devices are still broken on s390 with a different backtrace. The bisect for that one points to another patch in part 2 of this work, and it looks like another issue caused by the missing IRQ domain:

40742716f294 genirq/msi: Make msi_add_simple_msi_descs() device domain aware


[    4.308861] ------------[ cut here ]------------
[    4.308865] WARNING: CPU: 7 PID: 9 at kernel/irq/msi.c:167 msi_domain_free_msi_descs_range+0x3c/0xd0
[    4.308877] Modules linked in: mlx5_core aes_s390 nvme des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common pkey zcrypt rng_core autofs4
[    4.308896] CPU: 7 PID: 9 Comm: kworker/u20:0 Not tainted 6.1.0 #179
[    4.308898] Hardware name: IBM 3931 A01 782 (KVM/Linux)
[    4.308900] Workqueue: events_unbound async_run_entry_fn
[    4.308905] Krnl PSW : 0704c00180000000 00000000b6426b78 (msi_domain_free_msi_descs_range+0x40/0xd0)
[    4.308909]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[    4.308911] Krnl GPRS: 0700000080eda0a0 0000000000000000 0000000080eda0a0 0000000000000000
[    4.308913]            0000000000000000 0000000000000000 0000000000000cc0 0000000080eda000
[    4.308914]            00000000b7ddc000 0000000091934aa8 000000000000ffff 0000000000000000
[    4.308915]            0000000080344200 0000000080f2b1c0 0000037fffb5b918 0000037fffb5b8c8
[    4.308924] Krnl Code: 00000000b6426b68: e54cf0ac0000	mvhi	172(%r15),0
[    4.308924]            00000000b6426b6e: ec3c000b017f	clij	%r3,1,12,00000000b6426b84
[    4.308924]           #00000000b6426b74: af000000		mc	0,0
[    4.308924]           >00000000b6426b78: eb9ff0b00004	lmg	%r9,%r15,176(%r15)
[    4.308924]            00000000b6426b7e: 07fe		bcr	15,%r14
[    4.308924]            00000000b6426b80: 47000700		bc	0,1792
[    4.308924]            00000000b6426b84: b90400a5		lgr	%r10,%r5
[    4.308924]            00000000b6426b88: b9040013		lgr	%r1,%r3
[    4.308935] Call Trace:
[    4.308938]  [<00000000b6426b78>] msi_domain_free_msi_descs_range+0x40/0xd0 
[    4.308945]  [<00000000b6bb126e>] pci_free_msi_irqs+0x26/0x48 
[    4.308950]  [<00000000b6baf4d4>] pci_disable_msix+0x6c/0x80 
[    4.308954]  [<00000000b6baf716>] pci_free_irq_vectors+0x26/0x88 
[    4.308956]  [<000003ff7fdfa8f4>] nvme_setup_io_queues+0x18c/0x398 [nvme] 
[    4.308968]  [<000003ff7fdfbf1e>] nvme_probe+0x2e6/0x3b0 [nvme] 
[    4.308972]  [<00000000b6ba44cc>] local_pci_probe+0x44/0x80 
[    4.308974]  [<00000000b6ba46d8>] pci_call_probe+0x50/0x180 
[    4.308976]  [<00000000b6ba5166>] pci_device_probe+0xae/0x110 
[    4.308978]  [<00000000b6c0a19a>] really_probe+0xd2/0x480 
[    4.308982]  [<00000000b6c0a6f8>] driver_probe_device+0x40/0xf0 
[    4.308984]  [<00000000b6c0a80e>] __driver_attach_async_helper+0x66/0xf0 
[    4.308986]  [<00000000b63cfb72>] async_run_entry_fn+0x4a/0x1b0 
[    4.308987]  [<00000000b63c1368>] process_one_work+0x200/0x458 
[    4.308991]  [<00000000b63c1aee>] worker_thread+0x66/0x480 
[    4.308993]  [<00000000b63caa00>] kthread+0x108/0x110 
[    4.308996]  [<00000000b634f2dc>] __ret_from_fork+0x3c/0x58 
[    4.308999]  [<00000000b6f8da2a>] ret_from_fork+0xa/0x40 
[    4.309006] Last Breaking-Event-Address:
[    4.309007]  [<00000000b6426ba8>] msi_domain_free_msi_descs_range+0x70/0xd0
[    4.309009] ---[ end trace 0000000000000000 ]---
[    8.957187] nvme: probe of 0003:00:00.0 failed with error -22
[    8.957216] ------------[ cut here ]------------
[    8.957217] WARNING: CPU: 5 PID: 9 at kernel/irq/msi.c:275 msi_device_data_release+0x76/0xa0
[    8.957229] Modules linked in: mlx5_core aes_s390 nvme des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common pkey zcrypt rng_core autofs4
[    8.957248] CPU: 5 PID: 9 Comm: kworker/u20:0 Tainted: G        W          6.1.0 #179
[    8.957252] Hardware name: IBM 3931 A01 782 (KVM/Linux)
[    8.957254] Workqueue: events_unbound async_run_entry_fn
[    8.957259] Krnl PSW : 0704e00180000000 00000000b642729a (msi_device_data_release+0x7a/0xa0)
[    8.957262]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[    8.957265] Krnl GPRS: a813fdc020800000 00000000928b6840 0000000091934ab8 0000000080344200
[    8.957267]            0000000000000000 0000000091934a80 0000000080d79988 0000000000000000
[    8.957268]            0000000080eda0a0 0000037fffb5bc10 0000000080eda0a0 0000000091934aa8
[    8.957270]            0000000080344200 00000000800e0402 00000000b642724e 0000037fffb5bad0
[    8.957279] Krnl Code: 00000000b642728c: f0a0000407fe	srp	4(11,%r0),2046,0
[    8.957279]            00000000b6427292: 47000700		bc	0,1792
[    8.957279]           #00000000b6427296: af000000		mc	0,0
[    8.957279]           >00000000b642729a: a7f4ffdf		brc	15,00000000b6427258
[    8.957279]            00000000b642729e: af000000		mc	0,0
[    8.957279]            00000000b64272a2: 4120b048		la	%r2,72(%r11)
[    8.957279]            00000000b64272a6: c0e5005a0c4d	brasl	%r14,00000000b6f68b40
[    8.957279]            00000000b64272ac: e548a1180000	mvghi	280(%r10),0
[    8.957290] Call Trace:
[    8.957292]  [<00000000b642729a>] msi_device_data_release+0x7a/0xa0 
[    8.957295] ([<00000000b642724e>] msi_device_data_release+0x2e/0xa0)
[    8.957298]  [<00000000b6c0f608>] release_nodes+0x50/0xd8 
[    8.957305]  [<00000000b6c111aa>] devres_release_all+0xaa/0xf0 
[    8.957308]  [<00000000b6c0a2f2>] really_probe+0x22a/0x480 
[    8.957310]  [<00000000b6c0a6f8>] driver_probe_device+0x40/0xf0 
[    8.957312]  [<00000000b6c0a80e>] __driver_attach_async_helper+0x66/0xf0 
[    8.957314]  [<00000000b63cfb72>] async_run_entry_fn+0x4a/0x1b0 
[    8.957315]  [<00000000b63c1368>] process_one_work+0x200/0x458 
[    8.957320]  [<00000000b63c1aee>] worker_thread+0x66/0x480 
[    8.957322]  [<00000000b63caa00>] kthread+0x108/0x110 
[    8.957325]  [<00000000b634f2dc>] __ret_from_fork+0x3c/0x58 
[    8.957328]  [<00000000b6f8da2a>] ret_from_fork+0xa/0x40 
[    8.957336] Last Breaking-Event-Address:
[    8.957337]  [<00000000b6427254>] msi_device_data_release+0x34/0xa0
[    8.957339] ---[ end trace 0000000000000000 ]---

The line number for the first warning points to the WARN_ON_ONCE() check in msi_ctrl_valid(); specifically, it's the !dev->msi.data->__domains[ctrl->domid].domain condition that is failing.

The second warning is the WARN_ON_ONCE(!xa_empty(&md->__domains[i].store)) check in msi_device_data_release, presumably a victim of backing out after the first error.
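
As a rough illustration of how the two warnings could be linked, here is a
small userspace sketch in C. It is a model under stated assumptions, not the
kernel implementation: model_dev, model_slot and the three helpers are
hypothetical names, and a plain counter stands in for the xarray descriptor
store. The idea is that if the validity check refuses the free request
because no per-device domain is set, the descriptors stay behind and the
later release-time emptiness check trips.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_DOMAINS 2	/* hypothetical bound for the sketch */

/* One per-device slot: a domain pointer plus a descriptor count. */
struct model_slot {
	const void *domain;	/* NULL models "no irqdomain in use" */
	unsigned int ndescs;	/* stands in for the descriptor store */
};

struct model_dev {
	struct model_slot slots[MODEL_MAX_DOMAINS];
};

/* Model of the strict check: the slot must exist and have a domain. */
bool model_ctrl_valid(const struct model_dev *dev, unsigned int domid)
{
	return domid < MODEL_MAX_DOMAINS && dev->slots[domid].domain != NULL;
}

/* Free all descriptors of a slot; refuses when the check fails. */
bool model_free_descs(struct model_dev *dev, unsigned int domid)
{
	if (!model_ctrl_valid(dev, domid))
		return false;	/* first warning: nothing gets freed */
	dev->slots[domid].ndescs = 0;
	return true;
}

/* Release-time check: every store is expected to be empty by now. */
bool model_release_is_clean(const struct model_dev *dev)
{
	for (unsigned int i = 0; i < MODEL_MAX_DOMAINS; i++) {
		if (dev->slots[i].ndescs != 0)
			return false;	/* second warning: leaked descriptors */
	}
	return true;
}
```

In this model, a device with descriptors but no domain fails
model_free_descs() and then fails model_release_is_clean() as a direct
consequence.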


* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-15 16:23         ` Matthew Rosato
@ 2022-12-15 21:32           ` Guenter Roeck
  2022-12-16  9:53           ` Marc Zyngier
  1 sibling, 0 replies; 126+ messages in thread
From: Guenter Roeck @ 2022-12-15 21:32 UTC (permalink / raw)
  To: Matthew Rosato, Thomas Gleixner, Niklas Schnelle
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On 12/15/22 08:23, Matthew Rosato wrote:
> On 12/15/22 9:49 AM, Thomas Gleixner wrote:
>> On Wed, Dec 14 2022 at 10:42, Niklas Schnelle wrote:
>>> On Tue, 2022-12-13 at 11:04 -0800, Guenter Roeck wrote:
>>>> This patch results in various s390 qemu test failures.
>>>> There is a warning backtrace
>>>>
>>>> [   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
>>>>
>>>> followed by
>>>>
>>>> [   12.684333] virtio_net: probe of virtio0 failed with error -34
>>>>
>>>> and Ethernet interfaces don't instantiate.
>>> As far as I'm aware so far he tracked this down to code calling
>>> msi_domain_get_hwsize() which in turn calls msi_get_device_domain()
>>> which then returns NULL leading to msi_domain_get_hwsize() returning 0.
>>> I think this is related to the fact that we currently don't use IRQ
>>> domains.
>>
>> Correct and for some stupid reason I thought 0 is a good return value
>> here :)
>>
>>
>>
>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>> index bd4d4dd626b4..8fb10f216dc0 100644
>> --- a/kernel/irq/msi.c
>> +++ b/kernel/irq/msi.c
>> @@ -609,8 +609,8 @@ static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid
>>   		info = domain->host_data;
>>   		return info->hwsize;
>>   	}
>> -	/* No domain, no size... */
>> -	return 0;
>> +	/* No domain, default to MSI_MAX_INDEX */
>> +	return MSI_MAX_INDEX;
>>   }
>>   
>>   static inline void irq_chip_write_msi_msg(struct irq_data *data,
> 
> Ah, that makes sense...  So, with that diff applied, that fixes most of the issues I'm seeing incl. the virtio one that Guenter mentioned.  But it looks like NVMe devices are still broken on s390 with a different backtrace -- the bisect for that one points to another patch in part2 of this work and looks like another issue with missing irq domain:
> 
> 40742716f294 genirq/msi: Make msi_add_simple_msi_descs() device domain aware
> 
> 
> [    4.308861] ------------[ cut here ]------------
> [    4.308865] WARNING: CPU: 7 PID: 9 at kernel/irq/msi.c:167 msi_domain_free_msi_descs_range+0x3c/0xd0
> [    4.308877] Modules linked in: mlx5_core aes_s390 nvme des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common pkey zcrypt rng_core autofs4
> [    4.308896] CPU: 7 PID: 9 Comm: kworker/u20:0 Not tainted 6.1.0 #179
> [    4.308898] Hardware name: IBM 3931 A01 782 (KVM/Linux)
> [    4.308900] Workqueue: events_unbound async_run_entry_fn
> [    4.308905] Krnl PSW : 0704c00180000000 00000000b6426b78 (msi_domain_free_msi_descs_range+0x40/0xd0)
> [    4.308909]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [    4.308911] Krnl GPRS: 0700000080eda0a0 0000000000000000 0000000080eda0a0 0000000000000000
> [    4.308913]            0000000000000000 0000000000000000 0000000000000cc0 0000000080eda000
> [    4.308914]            00000000b7ddc000 0000000091934aa8 000000000000ffff 0000000000000000
> [    4.308915]            0000000080344200 0000000080f2b1c0 0000037fffb5b918 0000037fffb5b8c8
> [    4.308924] Krnl Code: 00000000b6426b68: e54cf0ac0000	mvhi	172(%r15),0
> [    4.308924]            00000000b6426b6e: ec3c000b017f	clij	%r3,1,12,00000000b6426b84
> [    4.308924]           #00000000b6426b74: af000000		mc	0,0
> [    4.308924]           >00000000b6426b78: eb9ff0b00004	lmg	%r9,%r15,176(%r15)
> [    4.308924]            00000000b6426b7e: 07fe		bcr	15,%r14
> [    4.308924]            00000000b6426b80: 47000700		bc	0,1792
> [    4.308924]            00000000b6426b84: b90400a5		lgr	%r10,%r5
> [    4.308924]            00000000b6426b88: b9040013		lgr	%r1,%r3
> [    4.308935] Call Trace:
> [    4.308938]  [<00000000b6426b78>] msi_domain_free_msi_descs_range+0x40/0xd0
> [    4.308945]  [<00000000b6bb126e>] pci_free_msi_irqs+0x26/0x48
> [    4.308950]  [<00000000b6baf4d4>] pci_disable_msix+0x6c/0x80
> [    4.308954]  [<00000000b6baf716>] pci_free_irq_vectors+0x26/0x88
> [    4.308956]  [<000003ff7fdfa8f4>] nvme_setup_io_queues+0x18c/0x398 [nvme]
> [    4.308968]  [<000003ff7fdfbf1e>] nvme_probe+0x2e6/0x3b0 [nvme]
> [    4.308972]  [<00000000b6ba44cc>] local_pci_probe+0x44/0x80
> [    4.308974]  [<00000000b6ba46d8>] pci_call_probe+0x50/0x180
> [    4.308976]  [<00000000b6ba5166>] pci_device_probe+0xae/0x110
> [    4.308978]  [<00000000b6c0a19a>] really_probe+0xd2/0x480
> [    4.308982]  [<00000000b6c0a6f8>] driver_probe_device+0x40/0xf0
> [    4.308984]  [<00000000b6c0a80e>] __driver_attach_async_helper+0x66/0xf0
> [    4.308986]  [<00000000b63cfb72>] async_run_entry_fn+0x4a/0x1b0
> [    4.308987]  [<00000000b63c1368>] process_one_work+0x200/0x458
> [    4.308991]  [<00000000b63c1aee>] worker_thread+0x66/0x480
> [    4.308993]  [<00000000b63caa00>] kthread+0x108/0x110
> [    4.308996]  [<00000000b634f2dc>] __ret_from_fork+0x3c/0x58
> [    4.308999]  [<00000000b6f8da2a>] ret_from_fork+0xa/0x40
> [    4.309006] Last Breaking-Event-Address:
> [    4.309007]  [<00000000b6426ba8>] msi_domain_free_msi_descs_range+0x70/0xd0
> [    4.309009] ---[ end trace 0000000000000000 ]---
> [    8.957187] nvme: probe of 0003:00:00.0 failed with error -22

With this patch applied, I see the same error on powerpc, followed by

WARNING: CPU: 0 PID: 21 at arch/powerpc/kernel/irq.c:348 .virq_to_hw+0x1c/0x60
Modules linked in:
CPU: 0 PID: 21 Comm: kworker/u2:2 Tainted: G        W        N 6.1.0-10397-g8a1566869bf4 #1
Hardware name: QEMU ppce500 e5500 0x80240020 QEMU e500
Workqueue: events_unbound .async_run_entry_fn
NIP:  c00000000000415c LR: c000000000004150 CTR: 0000000000000000
REGS: c0000000043c2f70 TRAP: 0700   Tainted: G        W        N  (6.1.0-10397-g8a1566869bf4)
MSR:  0000000080029002 <CE,EE,ME>  CR: 24828842  XER: 00000000
IRQMASK: 0
GPR00: c000000000004150 c0000000043c3210 c00000000169dd00 0000000000000000
GPR04: 0000000000000013 0000000000000000 0000000000000000 c000000005000368
GPR08: c0000000050002a8 0000000000000001 c000000005000360 fffffffffffffffd
GPR12: c00000000186a400 c0000000025f2000 c0000000060da000 c00000000433ca40
GPR16: 0000000000000004 c00000000617a020 0000000000000000 0000000000000000
GPR20: 0000000000000001 0000000000000000 c0000000043c3758 c00000000617a020
GPR24: 0000000000000000 0000000000000001 0000000000000001 c0000000043810b8
GPR28: c0000000043810b8 0000000000000041 fffffffffffffff0 c00000000602e4b8
NIP [c00000000000415c] .virq_to_hw+0x1c/0x60
LR [c000000000004150] .virq_to_hw+0x10/0x60
Call Trace:
[c0000000043c3210] [c000000000004150] .virq_to_hw+0x10/0x60 (unreliable)
[c0000000043c3280] [c000000000041d58] .fsl_teardown_msi_irqs+0x48/0xe0
[c0000000043c3310] [c00000000002b204] .arch_teardown_msi_irqs+0x34/0x50
[c0000000043c3380] [c0000000008d6e68] .pci_msi_legacy_teardown_msi_irqs+0x28/0x40
[c0000000043c3400] [c0000000008d66c0] .pci_msi_teardown_msi_irqs+0x30/0xa0
[c0000000043c3480] [c0000000008d590c] .__pci_enable_msix_range+0x5fc/0x990
[c0000000043c35e0] [c0000000008d3934] .pci_alloc_irq_vectors_affinity+0x154/0x1d0
[c0000000043c36c0] [c000000000a74360] .nvme_setup_io_queues+0x2b0/0x9c0
[c0000000043c3830] [c000000000a76298] .nvme_probe+0x538/0x620
[c0000000043c38d0] [c0000000008c6e84] .pci_device_probe+0xc4/0x190
[c0000000043c3960] [c0000000009a9f38] .really_probe+0x108/0x460
[c0000000043c39f0] [c0000000009aa3a4] .driver_probe_device+0x44/0x120
[c0000000043c3a80] [c0000000009aa4e4] .__driver_attach_async_helper+0x64/0x120
[c0000000043c3b10] [c000000000094ca0] .async_run_entry_fn+0x50/0x140
[c0000000043c3bb0] [c000000000081e98] .process_one_work+0x2d8/0x7b0
[c0000000043c3c90] [c000000000082408] .worker_thread+0x98/0x4f0
[c0000000043c3d70] [c00000000008f2ac] .kthread+0x13c/0x150
[c0000000043c3e10] [c0000000000005d8] .ret_from_kernel_thread+0x58/0x60
Instruction dump:
78630020 ebc1fff0 ebe1fff8 7c0803a6 4e800020 7c0802a6 f8010010 f821ff91
480fb275 60000000 7c690074 7929d182 <0b090000> 38210070 e8630008 e8010010
irq event stamp: 96256
hardirqs last  enabled at (96255): [<c000000001114314>] ._raw_spin_unlock_irqrestore+0x84/0xd0
hardirqs last disabled at (96256): [<c000000000010b68>] .program_check_exception+0x38/0x120
softirqs last  enabled at (96188): [<c00000000111557c>] .__do_softirq+0x60c/0x678
softirqs last disabled at (96181): [<c000000000004d30>] .do_softirq_own_stack+0x30/0x50
---[ end trace 0000000000000000 ]---
Kernel attempted to read user page (d8) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x000000d8
Faulting instruction address: 0xc0000000000e5540
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500
Modules linked in:
CPU: 0 PID: 21 Comm: kworker/u2:2 Tainted: G        W        N 6.1.0-10397-g8a1566869bf4 #1
Hardware name: QEMU ppce500 e5500 0x80240020 QEMU e500
Workqueue: events_unbound .async_run_entry_fn
NIP:  c0000000000e5540 LR: c0000000000e44a4 CTR: 0000000000000000
REGS: c0000000043c2c90 TRAP: 0300   Tainted: G        W        N  (6.1.0-10397-g8a1566869bf4)
MSR:  0000000080029002 <CE,EE,ME>  CR: 44828842  XER: 00000000
DEAR: 00000000000000d8 ESR: 0000000000000000 IRQMASK: 1
GPR00: c0000000000e44a4 c0000000043c2f30 c00000000169dd00 00000000000000d8
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
GPR08: 0000000000000001 0000000000000001 c00000000003f5a4 0000000000000001
GPR12: 0000000044828842 c0000000025f2000 00000000000000d8 0000000000000000
GPR16: 0000000000000004 c00000000617a020 c0000000043e8040 0000000000000000
GPR20: 0000000000000001 c0000000019e5918 c00000000182e738 0000000000000000
GPR24: 0000000000000000 0000000000000001 0000000000000000 00000000000000d8
GPR28: 0000000000000001 0000000000000000 0000000000000000 c00000000182e738
NIP [c0000000000e5540] .__lock_acquire+0x2f0/0x1e90
LR [c0000000000e44a4] .lock_acquire+0x144/0x450
Call Trace:
[c0000000043c2f30] [c0000000043c2ff0] 0xc0000000043c2ff0 (unreliable)
[c0000000043c3060] [c0000000000e44a4] .lock_acquire+0x144/0x450
[c0000000043c3160] [c000000001113ebc] ._raw_spin_lock_irqsave+0x5c/0xb0
[c0000000043c31f0] [c00000000003f5a4] .msi_bitmap_free_hwirqs+0x34/0x90
[c0000000043c3280] [c000000000041da4] .fsl_teardown_msi_irqs+0x94/0xe0
[c0000000043c3310] [c00000000002b204] .arch_teardown_msi_irqs+0x34/0x50
[c0000000043c3380] [c0000000008d6e68] .pci_msi_legacy_teardown_msi_irqs+0x28/0x40
[c0000000043c3400] [c0000000008d66c0] .pci_msi_teardown_msi_irqs+0x30/0xa0
[c0000000043c3480] [c0000000008d590c] .__pci_enable_msix_range+0x5fc/0x990
[c0000000043c35e0] [c0000000008d3934] .pci_alloc_irq_vectors_affinity+0x154/0x1d0
[c0000000043c36c0] [c000000000a74360] .nvme_setup_io_queues+0x2b0/0x9c0
[c0000000043c3830] [c000000000a76298] .nvme_probe+0x538/0x620
[c0000000043c38d0] [c0000000008c6e84] .pci_device_probe+0xc4/0x190
[c0000000043c3960] [c0000000009a9f38] .really_probe+0x108/0x460
[c0000000043c39f0] [c0000000009aa3a4] .driver_probe_device+0x44/0x120
[c0000000043c3a80] [c0000000009aa4e4] .__driver_attach_async_helper+0x64/0x120
[c0000000043c3b10] [c000000000094ca0] .async_run_entry_fn+0x50/0x140
[c0000000043c3bb0] [c000000000081e98] .process_one_work+0x2d8/0x7b0
[c0000000043c3c90] [c000000000082408] .worker_thread+0x98/0x4f0
[c0000000043c3d70] [c00000000008f2ac] .kthread+0x13c/0x150
[c0000000043c3e10] [c0000000000005d8] .ret_from_kernel_thread+0x58/0x60
Instruction dump:
2c0a0000 40c200bc 3c82ffe4 3c62ffe3 38841f48 38639c50 4bf6eeb1 60000000
0fe00000 60000000 60000000 60000000 <e9430000> 3d22006c 3929cc98 7c2a4800
---[ end trace 0000000000000000 ]---

Guenter




* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-15 16:23         ` Matthew Rosato
  2022-12-15 21:32           ` Guenter Roeck
@ 2022-12-16  9:53           ` Marc Zyngier
  2022-12-16 13:50             ` Matthew Rosato
  1 sibling, 1 reply; 126+ messages in thread
From: Marc Zyngier @ 2022-12-16  9:53 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: Thomas Gleixner, Niklas Schnelle, Guenter Roeck, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Greg Kroah-Hartman, Jason Gunthorpe,
	Dave Jiang, Alex Williamson, Kevin Tian, Dan Williams,
	Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Thu, 15 Dec 2022 16:23:20 +0000,
Matthew Rosato <mjrosato@linux.ibm.com> wrote:
> 
> On 12/15/22 9:49 AM, Thomas Gleixner wrote:
> > On Wed, Dec 14 2022 at 10:42, Niklas Schnelle wrote:
> >> On Tue, 2022-12-13 at 11:04 -0800, Guenter Roeck wrote:
> >>> This patch results in various s390 qemu test failures.
> >>> There is a warning backtrace
> >>>
> >>> [   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
> >>>
> >>> followed by
> >>>
> >>> [   12.684333] virtio_net: probe of virtio0 failed with error -34
> >>>
> >>> and Ethernet interfaces don't instantiate.
> >> As far as I'm aware so far he tracked this down to code calling
> >> msi_domain_get_hwsize() which in turn calls msi_get_device_domain()
> >> which then returns NULL leading to msi_domain_get_hwsize() returning 0.
> >> I think this is related to the fact that we currently don't use IRQ
> >> domains.
> > 
> > Correct and for some stupid reason I thought 0 is a good return value
> > here :)
> > 
> > 
> > 
> > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> > index bd4d4dd626b4..8fb10f216dc0 100644
> > --- a/kernel/irq/msi.c
> > +++ b/kernel/irq/msi.c
> > @@ -609,8 +609,8 @@ static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid
> >  		info = domain->host_data;
> >  		return info->hwsize;
> >  	}
> > -	/* No domain, no size... */
> > -	return 0;
> > +	/* No domain, default to MSI_MAX_INDEX */
> > +	return MSI_MAX_INDEX;
> >  }
> >  
> >  static inline void irq_chip_write_msi_msg(struct irq_data *data,
> 
> Ah, that makes sense...  So, with that diff applied, that fixes most of the issues I'm seeing incl. the virtio one that Guenter mentioned.  But it looks like NVMe devices are still broken on s390 with a different backtrace -- the bisect for that one points to another patch in part2 of this work and looks like another issue with missing irq domain:
> 
> 40742716f294 genirq/msi: Make msi_add_simple_msi_descs() device domain aware
> 
> 
> [    4.308861] ------------[ cut here ]------------
> [    4.308865] WARNING: CPU: 7 PID: 9 at kernel/irq/msi.c:167 msi_domain_free_msi_descs_range+0x3c/0xd0
> [    4.308877] Modules linked in: mlx5_core aes_s390 nvme des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common pkey zcrypt rng_core autofs4
> [    4.308896] CPU: 7 PID: 9 Comm: kworker/u20:0 Not tainted 6.1.0 #179
> [    4.308898] Hardware name: IBM 3931 A01 782 (KVM/Linux)
> [    4.308900] Workqueue: events_unbound async_run_entry_fn
> [    4.308905] Krnl PSW : 0704c00180000000 00000000b6426b78 (msi_domain_free_msi_descs_range+0x40/0xd0)
> [    4.308909]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [    4.308911] Krnl GPRS: 0700000080eda0a0 0000000000000000 0000000080eda0a0 0000000000000000
> [    4.308913]            0000000000000000 0000000000000000 0000000000000cc0 0000000080eda000
> [    4.308914]            00000000b7ddc000 0000000091934aa8 000000000000ffff 0000000000000000
> [    4.308915]            0000000080344200 0000000080f2b1c0 0000037fffb5b918 0000037fffb5b8c8
> [    4.308924] Krnl Code: 00000000b6426b68: e54cf0ac0000	mvhi	172(%r15),0
> [    4.308924]            00000000b6426b6e: ec3c000b017f	clij	%r3,1,12,00000000b6426b84
> [    4.308924]           #00000000b6426b74: af000000		mc	0,0
> [    4.308924]           >00000000b6426b78: eb9ff0b00004	lmg	%r9,%r15,176(%r15)
> [    4.308924]            00000000b6426b7e: 07fe		bcr	15,%r14
> [    4.308924]            00000000b6426b80: 47000700		bc	0,1792
> [    4.308924]            00000000b6426b84: b90400a5		lgr	%r10,%r5
> [    4.308924]            00000000b6426b88: b9040013		lgr	%r1,%r3
> [    4.308935] Call Trace:
> [    4.308938]  [<00000000b6426b78>] msi_domain_free_msi_descs_range+0x40/0xd0 
> [    4.308945]  [<00000000b6bb126e>] pci_free_msi_irqs+0x26/0x48 
> [    4.308950]  [<00000000b6baf4d4>] pci_disable_msix+0x6c/0x80 
> [    4.308954]  [<00000000b6baf716>] pci_free_irq_vectors+0x26/0x88 
> [    4.308956]  [<000003ff7fdfa8f4>] nvme_setup_io_queues+0x18c/0x398 [nvme] 
> [    4.308968]  [<000003ff7fdfbf1e>] nvme_probe+0x2e6/0x3b0 [nvme] 
> [    4.308972]  [<00000000b6ba44cc>] local_pci_probe+0x44/0x80 
> [    4.308974]  [<00000000b6ba46d8>] pci_call_probe+0x50/0x180 
> [    4.308976]  [<00000000b6ba5166>] pci_device_probe+0xae/0x110 
> [    4.308978]  [<00000000b6c0a19a>] really_probe+0xd2/0x480 
> [    4.308982]  [<00000000b6c0a6f8>] driver_probe_device+0x40/0xf0 
> [    4.308984]  [<00000000b6c0a80e>] __driver_attach_async_helper+0x66/0xf0 
> [    4.308986]  [<00000000b63cfb72>] async_run_entry_fn+0x4a/0x1b0 
> [    4.308987]  [<00000000b63c1368>] process_one_work+0x200/0x458 
> [    4.308991]  [<00000000b63c1aee>] worker_thread+0x66/0x480 
> [    4.308993]  [<00000000b63caa00>] kthread+0x108/0x110 
> [    4.308996]  [<00000000b634f2dc>] __ret_from_fork+0x3c/0x58 
> [    4.308999]  [<00000000b6f8da2a>] ret_from_fork+0xa/0x40 
> [    4.309006] Last Breaking-Event-Address:
> [    4.309007]  [<00000000b6426ba8>] msi_domain_free_msi_descs_range+0x70/0xd0
> [    4.309009] ---[ end trace 0000000000000000 ]---
> [    8.957187] nvme: probe of 0003:00:00.0 failed with error -22
> [    8.957216] ------------[ cut here ]------------
> [    8.957217] WARNING: CPU: 5 PID: 9 at kernel/irq/msi.c:275 msi_device_data_release+0x76/0xa0
> [    8.957229] Modules linked in: mlx5_core aes_s390 nvme des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common pkey zcrypt rng_core autofs4
> [    8.957248] CPU: 5 PID: 9 Comm: kworker/u20:0 Tainted: G        W          6.1.0 #179
> [    8.957252] Hardware name: IBM 3931 A01 782 (KVM/Linux)
> [    8.957254] Workqueue: events_unbound async_run_entry_fn
> [    8.957259] Krnl PSW : 0704e00180000000 00000000b642729a (msi_device_data_release+0x7a/0xa0)
> [    8.957262]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> [    8.957265] Krnl GPRS: a813fdc020800000 00000000928b6840 0000000091934ab8 0000000080344200
> [    8.957267]            0000000000000000 0000000091934a80 0000000080d79988 0000000000000000
> [    8.957268]            0000000080eda0a0 0000037fffb5bc10 0000000080eda0a0 0000000091934aa8
> [    8.957270]            0000000080344200 00000000800e0402 00000000b642724e 0000037fffb5bad0
> [    8.957279] Krnl Code: 00000000b642728c: f0a0000407fe	srp	4(11,%r0),2046,0
> [    8.957279]            00000000b6427292: 47000700		bc	0,1792
> [    8.957279]           #00000000b6427296: af000000		mc	0,0
> [    8.957279]           >00000000b642729a: a7f4ffdf		brc	15,00000000b6427258
> [    8.957279]            00000000b642729e: af000000		mc	0,0
> [    8.957279]            00000000b64272a2: 4120b048		la	%r2,72(%r11)
> [    8.957279]            00000000b64272a6: c0e5005a0c4d	brasl	%r14,00000000b6f68b40
> [    8.957279]            00000000b64272ac: e548a1180000	mvghi	280(%r10),0
> [    8.957290] Call Trace:
> [    8.957292]  [<00000000b642729a>] msi_device_data_release+0x7a/0xa0 
> [    8.957295] ([<00000000b642724e>] msi_device_data_release+0x2e/0xa0)
> [    8.957298]  [<00000000b6c0f608>] release_nodes+0x50/0xd8 
> [    8.957305]  [<00000000b6c111aa>] devres_release_all+0xaa/0xf0 
> [    8.957308]  [<00000000b6c0a2f2>] really_probe+0x22a/0x480 
> [    8.957310]  [<00000000b6c0a6f8>] driver_probe_device+0x40/0xf0 
> [    8.957312]  [<00000000b6c0a80e>] __driver_attach_async_helper+0x66/0xf0 
> [    8.957314]  [<00000000b63cfb72>] async_run_entry_fn+0x4a/0x1b0 
> [    8.957315]  [<00000000b63c1368>] process_one_work+0x200/0x458 
> [    8.957320]  [<00000000b63c1aee>] worker_thread+0x66/0x480 
> [    8.957322]  [<00000000b63caa00>] kthread+0x108/0x110 
> [    8.957325]  [<00000000b634f2dc>] __ret_from_fork+0x3c/0x58 
> [    8.957328]  [<00000000b6f8da2a>] ret_from_fork+0xa/0x40 
> [    8.957336] Last Breaking-Event-Address:
> [    8.957337]  [<00000000b6427254>] msi_device_data_release+0x34/0xa0
> [    8.957339] ---[ end trace 0000000000000000 ]---
> 
> The line number for the first warning points to the WARN_ON check in msi_ctrl_valid -- specifically it's the !dev->msi.data->__domains[ctrl->domid].domain check that is failing.
> 
> The second warning is the WARN_ON_ONCE(!xa_empty(&md->__domains[i].store)) check in msi_device_data_release, presumably a victim of backing out after the first error.
> 

Yeah, the non-irqdomain legacy path definitely winds up here, and we
end up leaking descriptors. If the following hack works for you, I'll
ferry the two fixes to Linus asap.

Thanks,

	M.

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index bd4d4dd626b4..9921dc57f1b4 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -165,7 +165,8 @@ static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
 	unsigned int hwsize;
 
 	if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
-			 !dev->msi.data->__domains[ctrl->domid].domain))
+			 (dev->msi.domain &&
+			  !dev->msi.data->__domains[ctrl->domid].domain))
 		return false;
 
 	hwsize = msi_domain_get_hwsize(dev, ctrl->domid);
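
For readers following along, the effect of the relaxed condition in the hunk
above can be sketched as a small truth table in userspace C. This is only a
model: model_dev and model_ctrl_rejected() are hypothetical names,
dev_domain stands in for dev->msi.domain and per_dev_domain[] for
__domains[].domain. The assumption being illustrated is that a missing
per-device domain is treated as an error only when the device has an
irqdomain at all, so the legacy non-irqdomain path is no longer rejected.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_DOMAINS 2	/* hypothetical bound for the sketch */

struct model_dev {
	const void *dev_domain;				/* models dev->msi.domain */
	const void *per_dev_domain[MODEL_MAX_DOMAINS];	/* models __domains[i].domain */
};

/*
 * Model of the relaxed check: reject an out-of-range domain id, or a
 * missing per-device domain when the device does use an irqdomain.
 */
bool model_ctrl_rejected(const struct model_dev *dev, unsigned int domid)
{
	if (domid >= MODEL_MAX_DOMAINS)
		return true;
	return dev->dev_domain != NULL && dev->per_dev_domain[domid] == NULL;
}
```

A device with neither pointer set (the legacy case) passes, a device with
dev_domain set but an empty slot is still rejected, and an out-of-range
domain id is rejected regardless.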

-- 
Without deviation from the norm, progress is not possible.


* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-16  9:53           ` Marc Zyngier
@ 2022-12-16 13:50             ` Matthew Rosato
  2022-12-16 13:58               ` Marc Zyngier
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Rosato @ 2022-12-16 13:50 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, Niklas Schnelle, Guenter Roeck, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Greg Kroah-Hartman, Jason Gunthorpe,
	Dave Jiang, Alex Williamson, Kevin Tian, Dan Williams,
	Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On 12/16/22 4:53 AM, Marc Zyngier wrote:
> On Thu, 15 Dec 2022 16:23:20 +0000,
> Matthew Rosato <mjrosato@linux.ibm.com> wrote:
>>
>> On 12/15/22 9:49 AM, Thomas Gleixner wrote:
>>> On Wed, Dec 14 2022 at 10:42, Niklas Schnelle wrote:
>>>> On Tue, 2022-12-13 at 11:04 -0800, Guenter Roeck wrote:
>>>>> This patch results in various s390 qemu test failures.
>>>>> There is a warning backtrace
>>>>>
>>>>> [   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
>>>>>
>>>>> followed by
>>>>>
>>>>> [   12.684333] virtio_net: probe of virtio0 failed with error -34
>>>>>
>>>>> and Ethernet interfaces don't instantiate.
>>>> As far as I'm aware so far he tracked this down to code calling
>>>> msi_domain_get_hwsize() which in turn calls msi_get_device_domain()
>>>> which then returns NULL leading to msi_domain_get_hwsize() returning 0.
>>>> I think this is related to the fact that we currently don't use IRQ
>>>> domains.
>>>
>>> Correct and for some stupid reason I thought 0 is a good return value
>>> here :)
>>>
>>>
>>>
>>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>>> index bd4d4dd626b4..8fb10f216dc0 100644
>>> --- a/kernel/irq/msi.c
>>> +++ b/kernel/irq/msi.c
>>> @@ -609,8 +609,8 @@ static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid
>>>  		info = domain->host_data;
>>>  		return info->hwsize;
>>>  	}
>>> -	/* No domain, no size... */
>>> -	return 0;
>>> +	/* No domain, default to MSI_MAX_INDEX */
>>> +	return MSI_MAX_INDEX;
>>>  }
>>>  
>>>  static inline void irq_chip_write_msi_msg(struct irq_data *data,
>>
>> Ah, that makes sense...  So, with that diff applied, that fixes most of the issues I'm seeing incl. the virtio one that Guenter mentioned.  But it looks like NVMe devices are still broken on s390 with a different backtrace -- the bisect for that one points to another patch in part2 of this work and looks like another issue with missing irq domain:
>>
>> 40742716f294 genirq/msi: Make msi_add_simple_msi_descs() device domain aware
>>
>>
>> [    4.308861] ------------[ cut here ]------------
>> [    4.308865] WARNING: CPU: 7 PID: 9 at kernel/irq/msi.c:167 msi_domain_free_msi_descs_range+0x3c/0xd0
>> [    4.308877] Modules linked in: mlx5_core aes_s390 nvme des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common pkey zcrypt rng_core autofs4
>> [    4.308896] CPU: 7 PID: 9 Comm: kworker/u20:0 Not tainted 6.1.0 #179
>> [    4.308898] Hardware name: IBM 3931 A01 782 (KVM/Linux)
>> [    4.308900] Workqueue: events_unbound async_run_entry_fn
>> [    4.308905] Krnl PSW : 0704c00180000000 00000000b6426b78 (msi_domain_free_msi_descs_range+0x40/0xd0)
>> [    4.308909]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>> [    4.308911] Krnl GPRS: 0700000080eda0a0 0000000000000000 0000000080eda0a0 0000000000000000
>> [    4.308913]            0000000000000000 0000000000000000 0000000000000cc0 0000000080eda000
>> [    4.308914]            00000000b7ddc000 0000000091934aa8 000000000000ffff 0000000000000000
>> [    4.308915]            0000000080344200 0000000080f2b1c0 0000037fffb5b918 0000037fffb5b8c8
>> [    4.308924] Krnl Code: 00000000b6426b68: e54cf0ac0000	mvhi	172(%r15),0
>> [    4.308924]            00000000b6426b6e: ec3c000b017f	clij	%r3,1,12,00000000b6426b84
>> [    4.308924]           #00000000b6426b74: af000000		mc	0,0
>> [    4.308924]           >00000000b6426b78: eb9ff0b00004	lmg	%r9,%r15,176(%r15)
>> [    4.308924]            00000000b6426b7e: 07fe		bcr	15,%r14
>> [    4.308924]            00000000b6426b80: 47000700		bc	0,1792
>> [    4.308924]            00000000b6426b84: b90400a5		lgr	%r10,%r5
>> [    4.308924]            00000000b6426b88: b9040013		lgr	%r1,%r3
>> [    4.308935] Call Trace:
>> [    4.308938]  [<00000000b6426b78>] msi_domain_free_msi_descs_range+0x40/0xd0 
>> [    4.308945]  [<00000000b6bb126e>] pci_free_msi_irqs+0x26/0x48 
>> [    4.308950]  [<00000000b6baf4d4>] pci_disable_msix+0x6c/0x80 
>> [    4.308954]  [<00000000b6baf716>] pci_free_irq_vectors+0x26/0x88 
>> [    4.308956]  [<000003ff7fdfa8f4>] nvme_setup_io_queues+0x18c/0x398 [nvme] 
>> [    4.308968]  [<000003ff7fdfbf1e>] nvme_probe+0x2e6/0x3b0 [nvme] 
>> [    4.308972]  [<00000000b6ba44cc>] local_pci_probe+0x44/0x80 
>> [    4.308974]  [<00000000b6ba46d8>] pci_call_probe+0x50/0x180 
>> [    4.308976]  [<00000000b6ba5166>] pci_device_probe+0xae/0x110 
>> [    4.308978]  [<00000000b6c0a19a>] really_probe+0xd2/0x480 
>> [    4.308982]  [<00000000b6c0a6f8>] driver_probe_device+0x40/0xf0 
>> [    4.308984]  [<00000000b6c0a80e>] __driver_attach_async_helper+0x66/0xf0 
>> [    4.308986]  [<00000000b63cfb72>] async_run_entry_fn+0x4a/0x1b0 
>> [    4.308987]  [<00000000b63c1368>] process_one_work+0x200/0x458 
>> [    4.308991]  [<00000000b63c1aee>] worker_thread+0x66/0x480 
>> [    4.308993]  [<00000000b63caa00>] kthread+0x108/0x110 
>> [    4.308996]  [<00000000b634f2dc>] __ret_from_fork+0x3c/0x58 
>> [    4.308999]  [<00000000b6f8da2a>] ret_from_fork+0xa/0x40 
>> [    4.309006] Last Breaking-Event-Address:
>> [    4.309007]  [<00000000b6426ba8>] msi_domain_free_msi_descs_range+0x70/0xd0
>> [    4.309009] ---[ end trace 0000000000000000 ]---
>> [    8.957187] nvme: probe of 0003:00:00.0 failed with error -22
>> [    8.957216] ------------[ cut here ]------------
>> [    8.957217] WARNING: CPU: 5 PID: 9 at kernel/irq/msi.c:275 msi_device_data_release+0x76/0xa0
>> [    8.957229] Modules linked in: mlx5_core aes_s390 nvme des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common pkey zcrypt rng_core autofs4
>> [    8.957248] CPU: 5 PID: 9 Comm: kworker/u20:0 Tainted: G        W          6.1.0 #179
>> [    8.957252] Hardware name: IBM 3931 A01 782 (KVM/Linux)
>> [    8.957254] Workqueue: events_unbound async_run_entry_fn
>> [    8.957259] Krnl PSW : 0704e00180000000 00000000b642729a (msi_device_data_release+0x7a/0xa0)
>> [    8.957262]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
>> [    8.957265] Krnl GPRS: a813fdc020800000 00000000928b6840 0000000091934ab8 0000000080344200
>> [    8.957267]            0000000000000000 0000000091934a80 0000000080d79988 0000000000000000
>> [    8.957268]            0000000080eda0a0 0000037fffb5bc10 0000000080eda0a0 0000000091934aa8
>> [    8.957270]            0000000080344200 00000000800e0402 00000000b642724e 0000037fffb5bad0
>> [    8.957279] Krnl Code: 00000000b642728c: f0a0000407fe	srp	4(11,%r0),2046,0
>> [    8.957279]            00000000b6427292: 47000700		bc	0,1792
>> [    8.957279]           #00000000b6427296: af000000		mc	0,0
>> [    8.957279]           >00000000b642729a: a7f4ffdf		brc	15,00000000b6427258
>> [    8.957279]            00000000b642729e: af000000		mc	0,0
>> [    8.957279]            00000000b64272a2: 4120b048		la	%r2,72(%r11)
>> [    8.957279]            00000000b64272a6: c0e5005a0c4d	brasl	%r14,00000000b6f68b40
>> [    8.957279]            00000000b64272ac: e548a1180000	mvghi	280(%r10),0
>> [    8.957290] Call Trace:
>> [    8.957292]  [<00000000b642729a>] msi_device_data_release+0x7a/0xa0 
>> [    8.957295] ([<00000000b642724e>] msi_device_data_release+0x2e/0xa0)
>> [    8.957298]  [<00000000b6c0f608>] release_nodes+0x50/0xd8 
>> [    8.957305]  [<00000000b6c111aa>] devres_release_all+0xaa/0xf0 
>> [    8.957308]  [<00000000b6c0a2f2>] really_probe+0x22a/0x480 
>> [    8.957310]  [<00000000b6c0a6f8>] driver_probe_device+0x40/0xf0 
>> [    8.957312]  [<00000000b6c0a80e>] __driver_attach_async_helper+0x66/0xf0 
>> [    8.957314]  [<00000000b63cfb72>] async_run_entry_fn+0x4a/0x1b0 
>> [    8.957315]  [<00000000b63c1368>] process_one_work+0x200/0x458 
>> [    8.957320]  [<00000000b63c1aee>] worker_thread+0x66/0x480 
>> [    8.957322]  [<00000000b63caa00>] kthread+0x108/0x110 
>> [    8.957325]  [<00000000b634f2dc>] __ret_from_fork+0x3c/0x58 
>> [    8.957328]  [<00000000b6f8da2a>] ret_from_fork+0xa/0x40 
>> [    8.957336] Last Breaking-Event-Address:
>> [    8.957337]  [<00000000b6427254>] msi_device_data_release+0x34/0xa0
>> [    8.957339] ---[ end trace 0000000000000000 ]---
>>
>> The line number for the first warning points to the WARN_ON check in msi_ctrl_valid -- specifically it's the !dev->msi.data->__domains[ctrl->domid].domain check that is failing.
>>
>> The second warning is the WARN_ON_ONCE(!xa_empty(&md->__domains[i].store)) check in msi_device_data_release, presumably a victim of backing out after the first error.
>>
> 
> Yeah, the non-irqdomain legacy path definitely winds up here, and we
> end up leaking descriptors. If the following hack works for you, I'll
> ferry the two fixes to Linus asap.
> 
> Thanks,
> 
> 	M.
> 
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index bd4d4dd626b4..9921dc57f1b4 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -165,7 +165,8 @@ static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
>  	unsigned int hwsize;
>  
>  	if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
> -			 !dev->msi.data->__domains[ctrl->domid].domain))
> +			 (dev->msi.domain &&
> +			  !dev->msi.data->__domains[ctrl->domid].domain))
>  		return false;
>  
>  	hwsize = msi_domain_get_hwsize(dev, ctrl->domid);
> 

Close, but I had to add an extra ) at the end that was missing :)

With both these fixes applied, it actually then leads to the very next WARN_ON failing in msi_ctrl_valid...  Because ctrl->last == hwsize.  I think Thomas' initial fix for msi_domain_get_hwsize has an off-by-1 error, I think we should return MSI_XA_DOMAIN_SIZE for msi_domain_get_hwsize instead.

Here's what my final squashed diff looks like, and with this applied everything seems to be working again for s390 (Guenter, can you test again on powerpc?).  Thanks all!

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index bd4d4dd626b4..955267bbc2be 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -165,7 +165,8 @@ static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
        unsigned int hwsize;
 
        if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
-                        !dev->msi.data->__domains[ctrl->domid].domain))
+                        (dev->msi.domain &&
+                         !dev->msi.data->__domains[ctrl->domid].domain)))
                return false;
 
        hwsize = msi_domain_get_hwsize(dev, ctrl->domid);
@@ -609,8 +610,8 @@ static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid
                info = domain->host_data;
                return info->hwsize;
        }
-       /* No domain, no size... */
-       return 0;
+       /* No domain, default to MSI_XA_DOMAIN_SIZE */
+       return MSI_XA_DOMAIN_SIZE;
 }
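
As an illustration only (not part of the posted diff), the combined effect of the two hunks above can be modelled in plain user-space C. Every sketch_* / SKETCH_* name is hypothetical, the value 0xffff for the highest index is an assumption made for the example, and the range test is inferred from the "ctrl->last == hwsize" failure described above rather than copied from kernel/irq/msi.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration; not the kernel's definitions. */
#define SKETCH_MAX_INDEX      0xffffu                  /* highest valid index */
#define SKETCH_XA_DOMAIN_SIZE (SKETCH_MAX_INDEX + 1u)  /* count of valid indices */
#define SKETCH_MAX_DOMAINS    2u  /* stand-in for MSI_MAX_DEVICE_IRQDOMAINS */

struct sketch_dev {
	bool has_irqdomain;                       /* models dev->msi.domain != NULL */
	const void *domains[SKETCH_MAX_DOMAINS];  /* models __domains[domid].domain */
	unsigned int hwsize[SKETCH_MAX_DOMAINS];  /* models info->hwsize per domain */
};

struct sketch_ctrl {
	unsigned int domid, first, last;
};

/* With no per-device domain, fall back to the full index space. */
static unsigned int sketch_get_hwsize(const struct sketch_dev *dev,
				      unsigned int domid)
{
	if (dev->domains[domid])
		return dev->hwsize[domid];
	return SKETCH_XA_DOMAIN_SIZE;
}

static bool sketch_ctrl_valid(const struct sketch_dev *dev,
			      const struct sketch_ctrl *ctrl)
{
	unsigned int hwsize;

	/* A missing per-device domain is only an error if irqdomains are in use. */
	if (ctrl->domid >= SKETCH_MAX_DOMAINS ||
	    (dev->has_irqdomain && !dev->domains[ctrl->domid]))
		return false;

	/* hwsize is a count, so the largest acceptable index is hwsize - 1. */
	hwsize = sketch_get_hwsize(dev, ctrl->domid);
	return ctrl->first <= ctrl->last &&
	       ctrl->first < hwsize && ctrl->last < hwsize;
}

/* Legacy device (no irqdomain) freeing the whole range 0..SKETCH_MAX_INDEX. */
static void sketch_selfcheck(void)
{
	struct sketch_dev legacy = { 0 };
	struct sketch_ctrl all = { 0, 0, SKETCH_MAX_INDEX };

	assert(sketch_ctrl_valid(&legacy, &all));
}
```

In this model, returning SKETCH_MAX_INDEX instead of SKETCH_XA_DOMAIN_SIZE from the fallback would reject the full-range request, which is the off-by-one being described.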


* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-16 13:50             ` Matthew Rosato
@ 2022-12-16 13:58               ` Marc Zyngier
  2022-12-16 14:03                 ` Marc Zyngier
                                   ` (2 more replies)
  0 siblings, 3 replies; 126+ messages in thread
From: Marc Zyngier @ 2022-12-16 13:58 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: Thomas Gleixner, Niklas Schnelle, Guenter Roeck, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Greg Kroah-Hartman, Jason Gunthorpe,
	Dave Jiang, Alex Williamson, Kevin Tian, Dan Williams,
	Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Fri, 16 Dec 2022 13:50:59 +0000,
Matthew Rosato <mjrosato@linux.ibm.com> wrote:
> 
> On 12/16/22 4:53 AM, Marc Zyngier wrote:
> > On Thu, 15 Dec 2022 16:23:20 +0000,
> > Matthew Rosato <mjrosato@linux.ibm.com> wrote:
> >>
> >> On 12/15/22 9:49 AM, Thomas Gleixner wrote:
> >>> On Wed, Dec 14 2022 at 10:42, Niklas Schnelle wrote:
> >>>> On Tue, 2022-12-13 at 11:04 -0800, Guenter Roeck wrote:
> >>>>> This patch results in various s390 qemu test failures.
> >>>>> There is a warning backtrace
> >>>>>
> >>>>> [   12.674858] WARNING: CPU: 0 PID: 1 at kernel/irq/msi.c:167 msi_ctrl_valid+0x2a/0xb0
> >>>>>
> >>>>> followed by
> >>>>>
> >>>>> [   12.684333] virtio_net: probe of virtio0 failed with error -34
> >>>>>
> >>>>> and Ethernet interfaces don't instantiate.
> >>>> As far as I'm aware so far he tracked this down to code calling
> >>>> msi_domain_get_hwsize() which in turn calls msi_get_device_domain()
> >>>> which then returns NULL leading to msi_domain_get_hwsize() returning 0.
> >>>> I think this is related to the fact that we currently don't use IRQ
> >>>> domains.
> >>>
> >>> Correct and for some stupid reason I thought 0 is a good return value
> >>> here :)
> >>>
> >>>
> >>>
> >>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> >>> index bd4d4dd626b4..8fb10f216dc0 100644
> >>> --- a/kernel/irq/msi.c
> >>> +++ b/kernel/irq/msi.c
> >>> @@ -609,8 +609,8 @@ static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid
> >>>  		info = domain->host_data;
> >>>  		return info->hwsize;
> >>>  	}
> >>> -	/* No domain, no size... */
> >>> -	return 0;
> >>> +	/* No domain, default to MSI_MAX_INDEX */
> >>> +	return MSI_MAX_INDEX;
> >>>  }
> >>>  
> >>>  static inline void irq_chip_write_msi_msg(struct irq_data *data,
> >>
> >> Ah, that makes sense...  So, with that diff applied, that fixes most of the issues I'm seeing incl. the virtio one that Guenter mentioned.  But it looks like NVMe devices are still broken on s390 with a different backtrace -- the bisect for that one points to another patch in part2 of this work and looks like another issue with missing irq domain:
> >>
> >> 40742716f294 genirq/msi: Make msi_add_simple_msi_descs() device domain aware
> >>
> >>
> >> [    4.308861] ------------[ cut here ]------------
> >> [    4.308865] WARNING: CPU: 7 PID: 9 at kernel/irq/msi.c:167 msi_domain_free_msi_descs_range+0x3c/0xd0
> >> [    4.308877] Modules linked in: mlx5_core aes_s390 nvme des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common pkey zcrypt rng_core autofs4
> >> [    4.308896] CPU: 7 PID: 9 Comm: kworker/u20:0 Not tainted 6.1.0 #179
> >> [    4.308898] Hardware name: IBM 3931 A01 782 (KVM/Linux)
> >> [    4.308900] Workqueue: events_unbound async_run_entry_fn
> >> [    4.308905] Krnl PSW : 0704c00180000000 00000000b6426b78 (msi_domain_free_msi_descs_range+0x40/0xd0)
> >> [    4.308909]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> >> [    4.308911] Krnl GPRS: 0700000080eda0a0 0000000000000000 0000000080eda0a0 0000000000000000
> >> [    4.308913]            0000000000000000 0000000000000000 0000000000000cc0 0000000080eda000
> >> [    4.308914]            00000000b7ddc000 0000000091934aa8 000000000000ffff 0000000000000000
> >> [    4.308915]            0000000080344200 0000000080f2b1c0 0000037fffb5b918 0000037fffb5b8c8
> >> [    4.308924] Krnl Code: 00000000b6426b68: e54cf0ac0000	mvhi	172(%r15),0
> >> [    4.308924]            00000000b6426b6e: ec3c000b017f	clij	%r3,1,12,00000000b6426b84
> >> [    4.308924]           #00000000b6426b74: af000000		mc	0,0
> >> [    4.308924]           >00000000b6426b78: eb9ff0b00004	lmg	%r9,%r15,176(%r15)
> >> [    4.308924]            00000000b6426b7e: 07fe		bcr	15,%r14
> >> [    4.308924]            00000000b6426b80: 47000700		bc	0,1792
> >> [    4.308924]            00000000b6426b84: b90400a5		lgr	%r10,%r5
> >> [    4.308924]            00000000b6426b88: b9040013		lgr	%r1,%r3
> >> [    4.308935] Call Trace:
> >> [    4.308938]  [<00000000b6426b78>] msi_domain_free_msi_descs_range+0x40/0xd0 
> >> [    4.308945]  [<00000000b6bb126e>] pci_free_msi_irqs+0x26/0x48 
> >> [    4.308950]  [<00000000b6baf4d4>] pci_disable_msix+0x6c/0x80 
> >> [    4.308954]  [<00000000b6baf716>] pci_free_irq_vectors+0x26/0x88 
> >> [    4.308956]  [<000003ff7fdfa8f4>] nvme_setup_io_queues+0x18c/0x398 [nvme] 
> >> [    4.308968]  [<000003ff7fdfbf1e>] nvme_probe+0x2e6/0x3b0 [nvme] 
> >> [    4.308972]  [<00000000b6ba44cc>] local_pci_probe+0x44/0x80 
> >> [    4.308974]  [<00000000b6ba46d8>] pci_call_probe+0x50/0x180 
> >> [    4.308976]  [<00000000b6ba5166>] pci_device_probe+0xae/0x110 
> >> [    4.308978]  [<00000000b6c0a19a>] really_probe+0xd2/0x480 
> >> [    4.308982]  [<00000000b6c0a6f8>] driver_probe_device+0x40/0xf0 
> >> [    4.308984]  [<00000000b6c0a80e>] __driver_attach_async_helper+0x66/0xf0 
> >> [    4.308986]  [<00000000b63cfb72>] async_run_entry_fn+0x4a/0x1b0 
> >> [    4.308987]  [<00000000b63c1368>] process_one_work+0x200/0x458 
> >> [    4.308991]  [<00000000b63c1aee>] worker_thread+0x66/0x480 
> >> [    4.308993]  [<00000000b63caa00>] kthread+0x108/0x110 
> >> [    4.308996]  [<00000000b634f2dc>] __ret_from_fork+0x3c/0x58 
> >> [    4.308999]  [<00000000b6f8da2a>] ret_from_fork+0xa/0x40 
> >> [    4.309006] Last Breaking-Event-Address:
> >> [    4.309007]  [<00000000b6426ba8>] msi_domain_free_msi_descs_range+0x70/0xd0
> >> [    4.309009] ---[ end trace 0000000000000000 ]---
> >> [    8.957187] nvme: probe of 0003:00:00.0 failed with error -22
> >> [    8.957216] ------------[ cut here ]------------
> >> [    8.957217] WARNING: CPU: 5 PID: 9 at kernel/irq/msi.c:275 msi_device_data_release+0x76/0xa0
> >> [    8.957229] Modules linked in: mlx5_core aes_s390 nvme des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 nvme_core sha1_s390 sha_common pkey zcrypt rng_core autofs4
> >> [    8.957248] CPU: 5 PID: 9 Comm: kworker/u20:0 Tainted: G        W          6.1.0 #179
> >> [    8.957252] Hardware name: IBM 3931 A01 782 (KVM/Linux)
> >> [    8.957254] Workqueue: events_unbound async_run_entry_fn
> >> [    8.957259] Krnl PSW : 0704e00180000000 00000000b642729a (msi_device_data_release+0x7a/0xa0)
> >> [    8.957262]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> >> [    8.957265] Krnl GPRS: a813fdc020800000 00000000928b6840 0000000091934ab8 0000000080344200
> >> [    8.957267]            0000000000000000 0000000091934a80 0000000080d79988 0000000000000000
> >> [    8.957268]            0000000080eda0a0 0000037fffb5bc10 0000000080eda0a0 0000000091934aa8
> >> [    8.957270]            0000000080344200 00000000800e0402 00000000b642724e 0000037fffb5bad0
> >> [    8.957279] Krnl Code: 00000000b642728c: f0a0000407fe	srp	4(11,%r0),2046,0
> >> [    8.957279]            00000000b6427292: 47000700		bc	0,1792
> >> [    8.957279]           #00000000b6427296: af000000		mc	0,0
> >> [    8.957279]           >00000000b642729a: a7f4ffdf		brc	15,00000000b6427258
> >> [    8.957279]            00000000b642729e: af000000		mc	0,0
> >> [    8.957279]            00000000b64272a2: 4120b048		la	%r2,72(%r11)
> >> [    8.957279]            00000000b64272a6: c0e5005a0c4d	brasl	%r14,00000000b6f68b40
> >> [    8.957279]            00000000b64272ac: e548a1180000	mvghi	280(%r10),0
> >> [    8.957290] Call Trace:
> >> [    8.957292]  [<00000000b642729a>] msi_device_data_release+0x7a/0xa0 
> >> [    8.957295] ([<00000000b642724e>] msi_device_data_release+0x2e/0xa0)
> >> [    8.957298]  [<00000000b6c0f608>] release_nodes+0x50/0xd8 
> >> [    8.957305]  [<00000000b6c111aa>] devres_release_all+0xaa/0xf0 
> >> [    8.957308]  [<00000000b6c0a2f2>] really_probe+0x22a/0x480 
> >> [    8.957310]  [<00000000b6c0a6f8>] driver_probe_device+0x40/0xf0 
> >> [    8.957312]  [<00000000b6c0a80e>] __driver_attach_async_helper+0x66/0xf0 
> >> [    8.957314]  [<00000000b63cfb72>] async_run_entry_fn+0x4a/0x1b0 
> >> [    8.957315]  [<00000000b63c1368>] process_one_work+0x200/0x458 
> >> [    8.957320]  [<00000000b63c1aee>] worker_thread+0x66/0x480 
> >> [    8.957322]  [<00000000b63caa00>] kthread+0x108/0x110 
> >> [    8.957325]  [<00000000b634f2dc>] __ret_from_fork+0x3c/0x58 
> >> [    8.957328]  [<00000000b6f8da2a>] ret_from_fork+0xa/0x40 
> >> [    8.957336] Last Breaking-Event-Address:
> >> [    8.957337]  [<00000000b6427254>] msi_device_data_release+0x34/0xa0
> >> [    8.957339] ---[ end trace 0000000000000000 ]---
> >>
> >> The line number for the first warning points to the WARN_ON check in msi_ctrl_valid -- specifically it's the !dev->msi.data->__domains[ctrl->domid].domain check that is failing.
> >>
> >> The second warning is the WARN_ON_ONCE(!xa_empty(&md->__domains[i].store)) check in msi_device_data_release, presumably a victim of backing out after the first error.
> >>
> > 
> > Yeah, the non-irqdomain legacy path definitely winds up here, and we
> > end up leaking descriptors. If the following hack works for you, I'll
> > ferry the two fixes to Linus asap.
> > 
> > Thanks,
> > 
> > 	M.
> > 
> > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> > index bd4d4dd626b4..9921dc57f1b4 100644
> > --- a/kernel/irq/msi.c
> > +++ b/kernel/irq/msi.c
> > @@ -165,7 +165,8 @@ static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
> >  	unsigned int hwsize;
> >  
> >  	if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
> > -			 !dev->msi.data->__domains[ctrl->domid].domain))
> > +			 (dev->msi.domain &&
> > +			  !dev->msi.data->__domains[ctrl->domid].domain))
> >  		return false;
> >  
> >  	hwsize = msi_domain_get_hwsize(dev, ctrl->domid);
> > 
> 
> Close, but I had to add an extra ) at the end that was missing :)

Hey, I never said I tried to compile the thing! ;-)

> 
> With both these fixes applied, it actually then leads to the very
> next WARN_ON failing in msi_ctrl_valid...  Because ctrl->last ==
> hwsize.  I think Thomas' initial fix for msi_domain_get_hwsize has
> an off-by-1 error, I think we should return MSI_XA_DOMAIN_SIZE for
> msi_domain_get_hwsize instead.

Yes, that's a good point, and that's consistent with what
__msi_create_irq_domain() does already, assuming MSI_XA_DOMAIN_SIZE
when info->hwsize is 0. No reason to do something else here.
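
A minimal sketch of that consistency argument, in stand-alone C with hypothetical fb_* / FB_* names (the constant values are assumptions, and this is not the kernel code): if domain creation treats a zero hwsize as "the whole index space", the no-domain query should report the same size.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; not the kernel's definitions. */
#define FB_MAX_INDEX      0xffffu              /* highest valid index */
#define FB_XA_DOMAIN_SIZE (FB_MAX_INDEX + 1u)  /* count of valid indices */

struct fb_info {
	unsigned int hwsize;  /* 0 means "not specified by the creator" */
};

/* Creation-time default: an unspecified size becomes the full index space. */
static void fb_create_domain(struct fb_info *info)
{
	if (!info->hwsize)
		info->hwsize = FB_XA_DOMAIN_SIZE;
}

/* Query: a NULL info models "no domain" and reports the same default. */
static unsigned int fb_get_hwsize(const struct fb_info *info)
{
	return info ? info->hwsize : FB_XA_DOMAIN_SIZE;
}

/* Both paths should agree, and the highest index must stay below the size. */
static void fb_selfcheck(void)
{
	struct fb_info unset = { 0 };

	fb_create_domain(&unset);
	assert(fb_get_hwsize(&unset) == fb_get_hwsize(NULL));
	assert(FB_MAX_INDEX < fb_get_hwsize(NULL));
}
```

Keeping one default for both paths means a range check written against the size behaves the same whether or not a domain exists.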

I'll update Thomas' patch. Once Guenter confirms that PPC is OK, I'll
send it out.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-16 13:58               ` Marc Zyngier
@ 2022-12-16 14:03                 ` Marc Zyngier
  2022-12-16 14:11                   ` Matthew Rosato
  2022-12-16 15:47                 ` Guenter Roeck
  2022-12-17  0:45                 ` Guenter Roeck
  2 siblings, 1 reply; 126+ messages in thread
From: Marc Zyngier @ 2022-12-16 14:03 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: Thomas Gleixner, Niklas Schnelle, Guenter Roeck, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Greg Kroah-Hartman, Jason Gunthorpe,
	Dave Jiang, Alex Williamson, Kevin Tian, Dan Williams,
	Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Fri, 16 Dec 2022 13:58:59 +0000,
Marc Zyngier <maz@kernel.org> wrote:
> 
> I'll update Thomas' patch. Once Guenter confirms that PPC is OK, I'll
> send it out.

And FWIW, the branch is at [1].

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/msi-fixes-6.2

-- 
Without deviation from the norm, progress is not possible.


* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-16 14:03                 ` Marc Zyngier
@ 2022-12-16 14:11                   ` Matthew Rosato
  2022-12-16 17:30                     ` Marc Zyngier
  0 siblings, 1 reply; 126+ messages in thread
From: Matthew Rosato @ 2022-12-16 14:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, Niklas Schnelle, Guenter Roeck, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Greg Kroah-Hartman, Jason Gunthorpe,
	Dave Jiang, Alex Williamson, Kevin Tian, Dan Williams,
	Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On 12/16/22 9:03 AM, Marc Zyngier wrote:
> On Fri, 16 Dec 2022 13:58:59 +0000,
> Marc Zyngier <maz@kernel.org> wrote:
>>
>> I'll update Thomas' patch. Once Guenter confirms that PPC is OK, I'll
>> send it out.
> 
> And FWIW, the branch is at [1].
> 
> Thanks,
> 
> 	M.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/msi-fixes-6.2
> 

FYI, your patch there mentions MSI_XA_DOMAIN_SIZE in the commit message but the code (and comment) is still returning MSI_MAX_INDEX 


* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-16 13:58               ` Marc Zyngier
  2022-12-16 14:03                 ` Marc Zyngier
@ 2022-12-16 15:47                 ` Guenter Roeck
  2022-12-17  0:45                 ` Guenter Roeck
  2 siblings, 0 replies; 126+ messages in thread
From: Guenter Roeck @ 2022-12-16 15:47 UTC (permalink / raw)
  To: Marc Zyngier, Matthew Rosato
  Cc: Thomas Gleixner, Niklas Schnelle, LKML, x86, Joerg Roedel,
	Will Deacon, linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Greg Kroah-Hartman, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Kevin Tian, Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason,
	Allen Hubbe

On 12/16/22 05:58, Marc Zyngier wrote:
[ ... ]

>>
>> With both these fixes applied, it actually then leads to the very
>> next WARN_ON failing in msi_ctrl_valid...  Because ctrl->last ==
>> hwsize.  I think Thomas' initial fix for msi_domain_get_hwsize has
>> an off-by-1 error, I think we should return MSI_XA_DOMAIN_SIZE for
>> msi_domain_get_hwsize instead.
> 
> Yes, that's a good point, and that's consistent with what
> __msi_create_irq_domain() does already, assuming MSI_XA_DOMAIN_SIZE
> when info->hwsize is 0. No reason to do something else here.
> 
> I'll update Thomas' patch. Once Guenter confirms that PPC is OK, I'll
> send it out.
> 

It wasn't just ppc; that was just the easiest to report. I applied
the two patches on top of the irq merge and will test the resulting
branch (mainline is too broken right now). I hope that will give me
useful results. It will take a while though since my testbed is
still busy testing the most recent release candidates.

Guenter



* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-16 14:11                   ` Matthew Rosato
@ 2022-12-16 17:30                     ` Marc Zyngier
  0 siblings, 0 replies; 126+ messages in thread
From: Marc Zyngier @ 2022-12-16 17:30 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: Thomas Gleixner, Niklas Schnelle, Guenter Roeck, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Greg Kroah-Hartman, Jason Gunthorpe,
	Dave Jiang, Alex Williamson, Kevin Tian, Dan Williams,
	Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On 2022-12-16 14:11, Matthew Rosato wrote:
> On 12/16/22 9:03 AM, Marc Zyngier wrote:
>> On Fri, 16 Dec 2022 13:58:59 +0000,
>> Marc Zyngier <maz@kernel.org> wrote:
>>> 
>>> I'll update Thomas' patch. Once Guenter confirms that PPC is OK, I'll
>>> send it out.
>> 
>> And FWIW, the branch is at [1].
>> 
>> Thanks,
>> 
>> 	M.
>> 
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/msi-fixes-6.2
>> 
> 
> FYI, your patch there mentions MSI_XA_DOMAIN_SIZE in the commit
> message but the code (and comment) is still returning MSI_MAX_INDEX

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/msi-fixes-6.2&id=e982ad82bd8f7931f5788a15dfa3709f7a7ee79f

That was the case initially (amended the commit message, but
didn't commit the code...), then pushed the real stuff a couple of
minutes later. It took about 20 minutes for the mirror to sync
up...

I guess the git mirroring is a bit busy at the moment...

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-16 13:58               ` Marc Zyngier
  2022-12-16 14:03                 ` Marc Zyngier
  2022-12-16 15:47                 ` Guenter Roeck
@ 2022-12-17  0:45                 ` Guenter Roeck
  2022-12-17 10:46                   ` Marc Zyngier
  2 siblings, 1 reply; 126+ messages in thread
From: Guenter Roeck @ 2022-12-17  0:45 UTC (permalink / raw)
  To: Marc Zyngier, Matthew Rosato
  Cc: Thomas Gleixner, Niklas Schnelle, LKML, x86, Joerg Roedel,
	Will Deacon, linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Greg Kroah-Hartman, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Kevin Tian, Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason,
	Allen Hubbe

On 12/16/22 05:58, Marc Zyngier wrote:
[ ... ]

>> With both these fixes applied, it actually then leads to the very
>> next WARN_ON failing in msi_ctrl_valid...  Because ctrl->last ==
>> hwsize.  I think Thomas' initial fix for msi_domain_get_hwsize has
>> an off-by-1 error, I think we should return MSI_XA_DOMAIN_SIZE for
>> msi_domain_get_hwsize instead.
> 
> Yes, that's a good point, and that's consistent with what
> __msi_create_irq_domain() does already, assuming MSI_XA_DOMAIN_SIZE
> when info->hwsize is 0. No reason to do something else here.
> 
> I'll update Thomas' patch. Once Guenter confirms that PPC is OK, I'll
> send it out.
> 
With

7a27b6136dcb (local/testing, testing-msi) genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present
c581d525bb1d genirq/msi: Check for the presence of an irq domain when validating msi_ctrl
9d33edb20f7e Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

I still get the following runtime warning.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 8 at kernel/irq/msi.c:196 .msi_domain_free_descs+0x144/0x170
Modules linked in:
CPU: 0 PID: 8 Comm: kworker/u2:0 Tainted: G                 N 6.1.0-01957-g7a27b6136dcb #1
Hardware name: QEMU ppce500 e5500 0x80240020 QEMU e500
Workqueue: nvme-reset-wq .nvme_reset_work
NIP:  c000000000107d54 LR: c000000000107d44 CTR: 0000000000000000
REGS: c0000000041e74d0 TRAP: 0700   Tainted: G                 N  (6.1.0-01957-g7a27b6136dcb)
MSR:  0000000080029002 <CE,EE,ME>  CR: 44002282  XER: 20000000
IRQMASK: 0
GPR00: c000000000107d44 c0000000041e7770 c000000001629c00 c000000004e748a0
GPR04: 000000005358db0a c000000001ce7a00 c00000000423b5d0 000000004735aaa2
GPR08: 0000000000000002 0000000000000013 c00000000423acc0 c00000000214a998
GPR12: 0000000024002282 c000000002579000 c00000000008e190 c000000004173540
GPR16: 0000000000000000 c0000000043810b8 0000000000000000 0000000000000001
GPR20: c0000000060b22d8 c0000000060a70f0 0000000000000000 c000000001996800
GPR24: c0000000017df6c0 c0000000043810b8 c0000000060b2388 c0000000060b2000
GPR28: ffffffffffffffff c0000000041e7888 c000000006025ac8 c000000004e748a0
NIP [c000000000107d54] .msi_domain_free_descs+0x144/0x170
LR [c000000000107d44] .msi_domain_free_descs+0x134/0x170
Call Trace:
[c0000000041e7770] [c000000000107d44] .msi_domain_free_descs+0x134/0x170 (unreliable)
[c0000000041e7810] [c0000000001085d8] .msi_domain_free_msi_descs_range+0x38/0x70
[c0000000041e78a0] [c0000000008d000c] .pci_msi_teardown_msi_irqs+0x4c/0xa0
[c0000000041e7920] [c0000000008cf9e8] .pci_free_msi_irqs+0x18/0x50
[c0000000041e79a0] [c0000000008cd8d0] .pci_free_irq_vectors+0x80/0xb0
[c0000000041e7a20] [c000000000a6d2a0] .nvme_reset_work+0x870/0x1780
[c0000000041e7bb0] [c000000000080e68] .process_one_work+0x2d8/0x7b0
[c0000000041e7c90] [c0000000000813d8] .worker_thread+0x98/0x4f0
[c0000000041e7d70] [c00000000008e2cc] .kthread+0x13c/0x150
[c0000000041e7e10] [c0000000000005d8] .ret_from_kernel_thread+0x58/0x60
Instruction dump:
7fc3f378 48ff1ca9 60000000 7c7f1b79 41c2002c e8810070 7fc3f378 48ff3491
60000000 813f0000 2c090000 41e2ffb0 <0fe00000> 60000000 60000000 ebc10090
irq event stamp: 98168
hardirqs last  enabled at (98167): [<c00000000110a274>] ._raw_spin_unlock_irqrestore+0x84/0xd0
hardirqs last disabled at (98168): [<c000000000010b58>] .program_check_exception+0x38/0x120
softirqs last  enabled at (97760): [<c00000000110b4dc>] .__do_softirq+0x60c/0x678
softirqs last disabled at (97749): [<c000000000004d20>] .do_softirq_own_stack+0x30/0x50
---[ end trace 0000000000000000 ]---
nvme nvme0: 1/0/0 default/read/poll queues
nvme nvme0: Ignoring bogus Namespace Identifiers
...

The system boots fine, though. This is seen when booting the ppce500 machine with
e5500 CPU and corenet64_smp_defconfig from nvme.

Guenter



* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-17  0:45                 ` Guenter Roeck
@ 2022-12-17 10:46                   ` Marc Zyngier
  2022-12-17 13:36                     ` Guenter Roeck
  0 siblings, 1 reply; 126+ messages in thread
From: Marc Zyngier @ 2022-12-17 10:46 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Matthew Rosato, Thomas Gleixner, Niklas Schnelle, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Greg Kroah-Hartman, Jason Gunthorpe,
	Dave Jiang, Alex Williamson, Kevin Tian, Dan Williams,
	Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy

On Sat, 17 Dec 2022 00:45:50 +0000,
Guenter Roeck <linux@roeck-us.net> wrote:
> 
> On 12/16/22 05:58, Marc Zyngier wrote:
> [ ... ]
> 
> >> With both these fixes applied, it actually then leads to the very
> >> next WARN_ON failing in msi_ctrl_valid...  Because ctrl->last ==
> >> hwsize.  I think Thomas' initial fix for msi_domain_get_hwsize has
> >> an off-by-1 error, I think we should return MSI_XA_DOMAIN_SIZE for
> >> msi_domain_get_hwsize instead.
> > 
> > Yes, that's a good point, and that's consistent with what
> > __msi_create_irq_domain() does already, assuming MSI_XA_DOMAIN_SIZE
> > when info->hwsize is 0. No reason to do something else here.
> > 
> > I'll update Thomas' patch. Once Guenter confirms that PPC is OK, I'll
> > send it out.
> > 
> With
> 
> 7a27b6136dcb (local/testing, testing-msi) genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present
> c581d525bb1d genirq/msi: Check for the presence of an irq domain when validating msi_ctrl
> 9d33edb20f7e Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> 
> I still get the following runtime warning.
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 8 at kernel/irq/msi.c:196 .msi_domain_free_descs+0x144/0x170
> Modules linked in:
> CPU: 0 PID: 8 Comm: kworker/u2:0 Tainted: G                 N 6.1.0-01957-g7a27b6136dcb #1
> Hardware name: QEMU ppce500 e5500 0x80240020 QEMU e500
> Workqueue: nvme-reset-wq .nvme_reset_work
> NIP:  c000000000107d54 LR: c000000000107d44 CTR: 0000000000000000
> REGS: c0000000041e74d0 TRAP: 0700   Tainted: G                 N  (6.1.0-01957-g7a27b6136dcb)
> MSR:  0000000080029002 <CE,EE,ME>  CR: 44002282  XER: 20000000
> IRQMASK: 0
> GPR00: c000000000107d44 c0000000041e7770 c000000001629c00 c000000004e748a0
> GPR04: 000000005358db0a c000000001ce7a00 c00000000423b5d0 000000004735aaa2
> GPR08: 0000000000000002 0000000000000013 c00000000423acc0 c00000000214a998
> GPR12: 0000000024002282 c000000002579000 c00000000008e190 c000000004173540
> GPR16: 0000000000000000 c0000000043810b8 0000000000000000 0000000000000001
> GPR20: c0000000060b22d8 c0000000060a70f0 0000000000000000 c000000001996800
> GPR24: c0000000017df6c0 c0000000043810b8 c0000000060b2388 c0000000060b2000
> GPR28: ffffffffffffffff c0000000041e7888 c000000006025ac8 c000000004e748a0
> NIP [c000000000107d54] .msi_domain_free_descs+0x144/0x170
> LR [c000000000107d44] .msi_domain_free_descs+0x134/0x170
> Call Trace:
> [c0000000041e7770] [c000000000107d44] .msi_domain_free_descs+0x134/0x170 (unreliable)
> [c0000000041e7810] [c0000000001085d8] .msi_domain_free_msi_descs_range+0x38/0x70
> [c0000000041e78a0] [c0000000008d000c] .pci_msi_teardown_msi_irqs+0x4c/0xa0
> [c0000000041e7920] [c0000000008cf9e8] .pci_free_msi_irqs+0x18/0x50
> [c0000000041e79a0] [c0000000008cd8d0] .pci_free_irq_vectors+0x80/0xb0
> [c0000000041e7a20] [c000000000a6d2a0] .nvme_reset_work+0x870/0x1780
> [c0000000041e7bb0] [c000000000080e68] .process_one_work+0x2d8/0x7b0
> [c0000000041e7c90] [c0000000000813d8] .worker_thread+0x98/0x4f0
> [c0000000041e7d70] [c00000000008e2cc] .kthread+0x13c/0x150
> [c0000000041e7e10] [c0000000000005d8] .ret_from_kernel_thread+0x58/0x60
> Instruction dump:
> 7fc3f378 48ff1ca9 60000000 7c7f1b79 41c2002c e8810070 7fc3f378 48ff3491
> 60000000 813f0000 2c090000 41e2ffb0 <0fe00000> 60000000 60000000 ebc10090
> irq event stamp: 98168
> hardirqs last  enabled at (98167): [<c00000000110a274>] ._raw_spin_unlock_irqrestore+0x84/0xd0
> hardirqs last disabled at (98168): [<c000000000010b58>] .program_check_exception+0x38/0x120
> softirqs last  enabled at (97760): [<c00000000110b4dc>] .__do_softirq+0x60c/0x678
> softirqs last disabled at (97749): [<c000000000004d20>] .do_softirq_own_stack+0x30/0x50
> ---[ end trace 0000000000000000 ]---
> nvme nvme0: 1/0/0 default/read/poll queues
> nvme nvme0: Ignoring bogus Namespace Identifiers
> ...
> 
> The system boots fine, though. This is seen when booting the ppce500
> machine with e5500 CPU and corenet64_smp_defconfig from nvme.

+PPC folks.

Thanks for the report.

I managed to reproduce this, although in a limited way (an SMP QEMU
instance wouldn't boot at all). The problem is that the core MSI code
now assumes that if the arch code is in charge of breaking the
association of an MSI with a device, it is also in charge of clearing
the irq in the MSI descriptor.

This matches the s390 behaviour, but not powerpc's, hence the splat
and the leaked MSI descriptors. The minimal fix should be as follows,
which I'll add to the pile of patches.

Thanks,

	M.

diff --git a/arch/powerpc/platforms/4xx/hsta_msi.c b/arch/powerpc/platforms/4xx/hsta_msi.c
index d4f7fff1fc87..e11b57a62b05 100644
--- a/arch/powerpc/platforms/4xx/hsta_msi.c
+++ b/arch/powerpc/platforms/4xx/hsta_msi.c
@@ -115,6 +115,7 @@ static void hsta_teardown_msi_irqs(struct pci_dev *dev)
 		msi_bitmap_free_hwirqs(&ppc4xx_hsta_msi.bmp, irq, 1);
 		pr_debug("%s: Teardown IRQ %u (index %u)\n", __func__,
 			 entry->irq, irq);
+		entry->irq = 0;
 	}
 }
 
diff --git a/arch/powerpc/platforms/cell/axon_msi.c b/arch/powerpc/platforms/cell/axon_msi.c
index 5b012abca773..0c11aad896c7 100644
--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@@ -289,6 +289,7 @@ static void axon_msi_teardown_msi_irqs(struct pci_dev *dev)
 	msi_for_each_desc(entry, &dev->dev, MSI_DESC_ASSOCIATED) {
 		irq_set_msi_desc(entry->irq, NULL);
 		irq_dispose_mapping(entry->irq);
+		entry->irq = 0;
 	}
 }
 
diff --git a/arch/powerpc/platforms/pasemi/msi.c b/arch/powerpc/platforms/pasemi/msi.c
index dc1846660005..166c97fff16d 100644
--- a/arch/powerpc/platforms/pasemi/msi.c
+++ b/arch/powerpc/platforms/pasemi/msi.c
@@ -66,6 +66,7 @@ static void pasemi_msi_teardown_msi_irqs(struct pci_dev *pdev)
 		hwirq = virq_to_hw(entry->irq);
 		irq_set_msi_desc(entry->irq, NULL);
 		irq_dispose_mapping(entry->irq);
+		entry->irq = 0;
 		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, ALLOC_CHUNK);
 	}
 }
diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 73c2d70706c0..57978a44d55b 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -132,6 +132,7 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev)
 		msi_data = irq_get_chip_data(entry->irq);
 		irq_set_msi_desc(entry->irq, NULL);
 		irq_dispose_mapping(entry->irq);
+		entry->irq = 0;
 		msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
 	}
 }
diff --git a/arch/powerpc/sysdev/mpic_u3msi.c b/arch/powerpc/sysdev/mpic_u3msi.c
index 1d8cfdfdf115..492cb03c0b62 100644
--- a/arch/powerpc/sysdev/mpic_u3msi.c
+++ b/arch/powerpc/sysdev/mpic_u3msi.c
@@ -108,6 +108,7 @@ static void u3msi_teardown_msi_irqs(struct pci_dev *pdev)
 		hwirq = virq_to_hw(entry->irq);
 		irq_set_msi_desc(entry->irq, NULL);
 		irq_dispose_mapping(entry->irq);
+		entry->irq = 0;
 		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, 1);
 	}
 }
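
For readers following along, the contract described above can be modelled
in a few lines of userspace C. This is only a sketch, not the kernel
implementation: struct toy_msi_desc, toy_arch_teardown() and
toy_core_free_descs() are hypothetical names, and the core is assumed to
warn and skip (leak) any descriptor whose irq field is still non-zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct msi_desc; only the irq field matters. */
struct toy_msi_desc {
	unsigned int irq;	/* 0 means "no longer associated" */
	bool freed;		/* set once the toy core released it */
};

/*
 * Arch-style teardown. With clear_irq == false this mimics the powerpc
 * behaviour before the patch above: the mapping goes away, but the
 * descriptor still carries the stale irq number.
 */
static void toy_arch_teardown(struct toy_msi_desc *descs, size_t n,
			      bool clear_irq)
{
	for (size_t i = 0; i < n; i++) {
		if (!descs[i].irq)
			continue;
		/* Real code would dispose the mapping and free the hwirq here. */
		if (clear_irq)
			descs[i].irq = 0;
	}
}

/*
 * Core-style free (assumed behaviour): a descriptor that still looks
 * associated triggers a warning and is skipped, i.e. leaked.
 * Returns the number of leaked descriptors.
 */
static size_t toy_core_free_descs(struct toy_msi_desc *descs, size_t n)
{
	size_t leaked = 0;

	for (size_t i = 0; i < n; i++) {
		if (descs[i].irq) {
			fprintf(stderr, "WARN: desc %zu still has irq %u\n",
				i, descs[i].irq);
			leaked++;
			continue;
		}
		descs[i].freed = true;
	}
	return leaked;
}
```

Calling toy_arch_teardown() with clear_irq set to false and then
toy_core_free_descs() reports every descriptor as leaked; with clear_irq
set to true, which corresponds to the added "entry->irq = 0;" lines,
nothing is leaked.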

-- 
Without deviation from the norm, progress is not possible.


* Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-12-17 10:46                   ` Marc Zyngier
@ 2022-12-17 13:36                     ` Guenter Roeck
  0 siblings, 0 replies; 126+ messages in thread
From: Guenter Roeck @ 2022-12-17 13:36 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Matthew Rosato, Thomas Gleixner, Niklas Schnelle, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Greg Kroah-Hartman, Jason Gunthorpe,
	Dave Jiang, Alex Williamson, Kevin Tian, Dan Williams,
	Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy

On 12/17/22 02:46, Marc Zyngier wrote:
> On Sat, 17 Dec 2022 00:45:50 +0000,
> Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On 12/16/22 05:58, Marc Zyngier wrote:
>> [ ... ]
>>
>>>> With both these fixes applied, it actually then leads to the very
>>>> next WARN_ON failing in msi_ctrl_valid...  Because ctrl->last ==
>>>> hwsize.  I think Thomas' initial fix for msi_domain_get_hwsize has
>>>> an off-by-1 error, I think we should return MSI_XA_DOMAIN_SIZE for
>>>> msi_domain_get_hwsize instead.
>>>
>>> Yes, that's a good point, and that's consistent with what
>>> __msi_create_irq_domain() does already, assuming MSI_XA_DOMAIN_SIZE
>>> when info->hwsize is 0. No reason to do something else here.
>>>
>>> I'll update Thomas' patch. Once Guenter confirms that PPC is OK, I'll
>>> send it out.
>>>
>> With
>>
>> 7a27b6136dcb (local/testing, testing-msi) genirq/msi: Return MSI_XA_DOMAIN_SIZE as the maximum MSI index when no domain is present
>> c581d525bb1d genirq/msi: Check for the presence of an irq domain when validating msi_ctrl
>> 9d33edb20f7e Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>
>> I still get the following runtime warning.
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 8 at kernel/irq/msi.c:196 .msi_domain_free_descs+0x144/0x170
>> Modules linked in:
>> CPU: 0 PID: 8 Comm: kworker/u2:0 Tainted: G                 N 6.1.0-01957-g7a27b6136dcb #1
>> Hardware name: QEMU ppce500 e5500 0x80240020 QEMU e500
>> Workqueue: nvme-reset-wq .nvme_reset_work
>> NIP:  c000000000107d54 LR: c000000000107d44 CTR: 0000000000000000
>> REGS: c0000000041e74d0 TRAP: 0700   Tainted: G                 N  (6.1.0-01957-g7a27b6136dcb)
>> MSR:  0000000080029002 <CE,EE,ME>  CR: 44002282  XER: 20000000
>> IRQMASK: 0
>> GPR00: c000000000107d44 c0000000041e7770 c000000001629c00 c000000004e748a0
>> GPR04: 000000005358db0a c000000001ce7a00 c00000000423b5d0 000000004735aaa2
>> GPR08: 0000000000000002 0000000000000013 c00000000423acc0 c00000000214a998
>> GPR12: 0000000024002282 c000000002579000 c00000000008e190 c000000004173540
>> GPR16: 0000000000000000 c0000000043810b8 0000000000000000 0000000000000001
>> GPR20: c0000000060b22d8 c0000000060a70f0 0000000000000000 c000000001996800
>> GPR24: c0000000017df6c0 c0000000043810b8 c0000000060b2388 c0000000060b2000
>> GPR28: ffffffffffffffff c0000000041e7888 c000000006025ac8 c000000004e748a0
>> NIP [c000000000107d54] .msi_domain_free_descs+0x144/0x170
>> LR [c000000000107d44] .msi_domain_free_descs+0x134/0x170
>> Call Trace:
>> [c0000000041e7770] [c000000000107d44] .msi_domain_free_descs+0x134/0x170 (unreliable)
>> [c0000000041e7810] [c0000000001085d8] .msi_domain_free_msi_descs_range+0x38/0x70
>> [c0000000041e78a0] [c0000000008d000c] .pci_msi_teardown_msi_irqs+0x4c/0xa0
>> [c0000000041e7920] [c0000000008cf9e8] .pci_free_msi_irqs+0x18/0x50
>> [c0000000041e79a0] [c0000000008cd8d0] .pci_free_irq_vectors+0x80/0xb0
>> [c0000000041e7a20] [c000000000a6d2a0] .nvme_reset_work+0x870/0x1780
>> [c0000000041e7bb0] [c000000000080e68] .process_one_work+0x2d8/0x7b0
>> [c0000000041e7c90] [c0000000000813d8] .worker_thread+0x98/0x4f0
>> [c0000000041e7d70] [c00000000008e2cc] .kthread+0x13c/0x150
>> [c0000000041e7e10] [c0000000000005d8] .ret_from_kernel_thread+0x58/0x60
>> Instruction dump:
>> 7fc3f378 48ff1ca9 60000000 7c7f1b79 41c2002c e8810070 7fc3f378 48ff3491
>> 60000000 813f0000 2c090000 41e2ffb0 <0fe00000> 60000000 60000000 ebc10090
>> irq event stamp: 98168
>> hardirqs last  enabled at (98167): [<c00000000110a274>] ._raw_spin_unlock_irqrestore+0x84/0xd0
>> hardirqs last disabled at (98168): [<c000000000010b58>] .program_check_exception+0x38/0x120
>> softirqs last  enabled at (97760): [<c00000000110b4dc>] .__do_softirq+0x60c/0x678
>> softirqs last disabled at (97749): [<c000000000004d20>] .do_softirq_own_stack+0x30/0x50
>> ---[ end trace 0000000000000000 ]---
>> nvme nvme0: 1/0/0 default/read/poll queues
>> nvme nvme0: Ignoring bogus Namespace Identifiers
>> ...
>>
>> The system boots fine, though. This is seen when booting the ppce500
>> machine with e5500 CPU and corenet64_smp_defconfig from nvme.
> 
> +PPC folks.
> 
> Thanks for the report.
> 
> I managed to reproduce this, although in a limited way (an SMP QEMU
> instance wouldn't boot at all). The problem is that the core MSI code
> now assumes that if the arch code is in charge of breaking the
> association of an MSI with a device, it is also in charge of clearing
> the irq in the MSI descriptor.
> 
> This matches the s390 behaviour, but not powerpc's, hence the splat
> and the leaked MSI descriptors. The minimal fix should be as follows,
> which I'll add to the pile of patches.
> 

Confirmed, the patch below fixes the ppc problem.

Thanks,
Guenter

> Thanks,
> 
> 	M.
> 
> diff --git a/arch/powerpc/platforms/4xx/hsta_msi.c b/arch/powerpc/platforms/4xx/hsta_msi.c
> index d4f7fff1fc87..e11b57a62b05 100644
> --- a/arch/powerpc/platforms/4xx/hsta_msi.c
> +++ b/arch/powerpc/platforms/4xx/hsta_msi.c
> @@ -115,6 +115,7 @@ static void hsta_teardown_msi_irqs(struct pci_dev *dev)
>   		msi_bitmap_free_hwirqs(&ppc4xx_hsta_msi.bmp, irq, 1);
>   		pr_debug("%s: Teardown IRQ %u (index %u)\n", __func__,
>   			 entry->irq, irq);
> +		entry->irq = 0;
>   	}
>   }
>   
> diff --git a/arch/powerpc/platforms/cell/axon_msi.c b/arch/powerpc/platforms/cell/axon_msi.c
> index 5b012abca773..0c11aad896c7 100644
> --- a/arch/powerpc/platforms/cell/axon_msi.c
> +++ b/arch/powerpc/platforms/cell/axon_msi.c
> @@ -289,6 +289,7 @@ static void axon_msi_teardown_msi_irqs(struct pci_dev *dev)
>   	msi_for_each_desc(entry, &dev->dev, MSI_DESC_ASSOCIATED) {
>   		irq_set_msi_desc(entry->irq, NULL);
>   		irq_dispose_mapping(entry->irq);
> +		entry->irq = 0;
>   	}
>   }
>   
> diff --git a/arch/powerpc/platforms/pasemi/msi.c b/arch/powerpc/platforms/pasemi/msi.c
> index dc1846660005..166c97fff16d 100644
> --- a/arch/powerpc/platforms/pasemi/msi.c
> +++ b/arch/powerpc/platforms/pasemi/msi.c
> @@ -66,6 +66,7 @@ static void pasemi_msi_teardown_msi_irqs(struct pci_dev *pdev)
>   		hwirq = virq_to_hw(entry->irq);
>   		irq_set_msi_desc(entry->irq, NULL);
>   		irq_dispose_mapping(entry->irq);
> +		entry->irq = 0;
>   		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, ALLOC_CHUNK);
>   	}
>   }
> diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
> index 73c2d70706c0..57978a44d55b 100644
> --- a/arch/powerpc/sysdev/fsl_msi.c
> +++ b/arch/powerpc/sysdev/fsl_msi.c
> @@ -132,6 +132,7 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev)
>   		msi_data = irq_get_chip_data(entry->irq);
>   		irq_set_msi_desc(entry->irq, NULL);
>   		irq_dispose_mapping(entry->irq);
> +		entry->irq = 0;
>   		msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
>   	}
>   }
> diff --git a/arch/powerpc/sysdev/mpic_u3msi.c b/arch/powerpc/sysdev/mpic_u3msi.c
> index 1d8cfdfdf115..492cb03c0b62 100644
> --- a/arch/powerpc/sysdev/mpic_u3msi.c
> +++ b/arch/powerpc/sysdev/mpic_u3msi.c
> @@ -108,6 +108,7 @@ static void u3msi_teardown_msi_irqs(struct pci_dev *pdev)
>   		hwirq = virq_to_hw(entry->irq);
>   		irq_set_msi_desc(entry->irq, NULL);
>   		irq_dispose_mapping(entry->irq);
> +		entry->irq = 0;
>   		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, 1);
>   	}
>   }
> 



* Re: [patch V3 13/33] x86/apic/vector: Provide MSI parent domain
  2022-11-24 23:26 ` [patch V3 13/33] x86/apic/vector: Provide MSI parent domain Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
@ 2023-01-04 12:34   ` Jason Gunthorpe
  2023-01-09 20:32     ` Thomas Gleixner
  2023-01-10 12:14     ` Thomas Gleixner
  2 siblings, 2 replies; 126+ messages in thread
From: Jason Gunthorpe @ 2023-01-04 12:34 UTC (permalink / raw)
  To: Thomas Gleixner, Moshe Shemesh, Shay Drory
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman, Dave Jiang,
	Alex Williamson, Kevin Tian, Dan Williams, Logan Gunthorpe,
	Ashok Raj, Jon Mason, Allen Hubbe

On Fri, Nov 25, 2022 at 12:26:05AM +0100, Thomas Gleixner wrote:
> Enable MSI parent domain support in the x86 vector domain and fixup the
> checks in the iommu implementations to check whether device::msi::domain is
> the default MSI parent domain. That keeps the existing logic to protect
> e.g. devices behind VMD working.
> 
> The interrupt remap PCI/MSI code still works because the underlying vector
> domain still provides the same functionality.
> 
> None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are
> affected either. They still work the same way both at the low level and the
> PCI/MSI implementations they provide.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Fix kernel doc (robot)
> ---
>  arch/x86/include/asm/msi.h          |    6 +
>  arch/x86/include/asm/pci.h          |    1 
>  arch/x86/kernel/apic/msi.c          |  176 ++++++++++++++++++++++++++----------
>  drivers/iommu/amd/iommu.c           |    2 
>  drivers/iommu/intel/irq_remapping.c |    2
>  5 files changed, 138 insertions(+), 49 deletions(-)

Our test team has found kmemleak reports on rc1 and bisected them to
this patch.

I don't see an obvious place where the fwnode gets destroyed here, so
maybe it should be like this?

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 955267bbc2be63..cbbcb7fd2bd00d 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1000,7 +1000,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
 fail:
 	msi_unlock_descs(dev);
 free_fwnode:
-	kfree(fwnode);
+	irq_domain_free_fwnode(fwnode); // ???
 free_bundle:
 	kfree(bundle);
 	return false;
@@ -1013,6 +1013,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
  */
 void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
 {
+	struct fwnode_handle *fwnode = NULL;
 	struct msi_domain_info *info;
 	struct irq_domain *domain;
 
@@ -1025,7 +1026,10 @@ void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
 
 	dev->msi.data->__domains[domid].domain = NULL;
 	info = domain->host_data;
+	if (domain->flags & IRQ_DOMAIN_FLAG_MSI_DEVICE)
+		fwnode = domain->fwnode;
 	irq_domain_remove(domain);
+	irq_domain_free_fwnode(fwnode);
 	kfree(container_of(info, struct msi_domain_template, info));
 
 unlock:
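
To illustrate why a plain kfree() of the fwnode would not be enough, here
is a small userspace model. It is a sketch under assumptions, not kernel
code: all toy_* names are hypothetical, and it assumes the fwnode is made
of two allocations (the node itself and a separately allocated name
string), which is what the two object sizes in the kmemleak trace
suggest.

```c
#include <stdlib.h>
#include <string.h>

/* Counts outstanding allocations so a leak is observable. */
static int toy_live_allocs;

static void *toy_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		toy_live_allocs++;
	return p;
}

static void toy_free(void *p)
{
	if (!p)
		return;
	toy_live_allocs--;
	free(p);
}

/* Hypothetical fwnode: the name lives in its own allocation. */
struct toy_fwnode {
	char *name;
};

static struct toy_fwnode *toy_alloc_fwnode(const char *name)
{
	struct toy_fwnode *fw = toy_alloc(sizeof(*fw));
	size_t len;

	if (!fw)
		return NULL;
	len = strlen(name) + 1;
	fw->name = toy_alloc(len);
	if (!fw->name) {
		toy_free(fw);
		return NULL;
	}
	memcpy(fw->name, name, len);
	return fw;
}

/* Dedicated free helper: releases the name and then the node. */
static void toy_free_fwnode(struct toy_fwnode *fw)
{
	if (!fw)
		return;
	toy_free(fw->name);
	toy_free(fw);
}
```

In this model, freeing only the node with toy_free(fw) leaves one
allocation outstanding (the name), while toy_free_fwnode(fw) brings the
counter back to zero. Never calling either on domain removal leaves both
objects behind.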

Thanks,
Jason

kmemleak trace
unreferenced object 0xffff888120ba9a00 (size 96):
  comm "systemd-modules", pid 221, jiffies 4294893411 (age 635.732s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff  ................
    00 00 00 00 00 00 00 00 18 9a ba 20 81 88 ff ff  ........... ....
  backtrace:
    [<00000000bcb7f3b1>] kmalloc_trace+0x27/0x110
    [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
    [<000000005f45a989>] __driver_attach+0x1ff/0x4a0
    [<0000000000dcaab2>] bus_for_each_dev+0x11e/0x1a0
unreferenced object 0xffff888120baa800 (size 32):
  comm "systemd-modules", pid 221, jiffies 4294893411 (age 635.732s)
  hex dump (first 32 bytes):
    50 43 49 2d 4d 53 49 58 2d 30 30 30 30 3a 30 38  PCI-MSIX-0000:08
    3a 30 30 2e 30 00 ff ff 00 00 00 00 00 00 00 00  :00.0...........
  backtrace:
    [<00000000bef783eb>] __kmalloc_node_track_caller+0x4c/0x1b0
    [<00000000f16b54a8>] kvasprintf+0xb0/0x130
    [<0000000078634624>] kasprintf+0xa6/0xd0
    [<00000000f17eea1c>] __irq_domain_alloc_fwnode+0x1ce/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
unreferenced object 0xffff88812bc8ca80 (size 96):
  comm "systemd-modules", pid 221, jiffies 4294893596 (age 634.996s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff  ................
    00 00 00 00 00 00 00 00 98 ca c8 2b 81 88 ff ff  ...........+....
  backtrace:
    [<00000000bcb7f3b1>] kmalloc_trace+0x27/0x110
    [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
    [<000000005f45a989>] __driver_attach+0x1ff/0x4a0
    [<0000000000dcaab2>] bus_for_each_dev+0x11e/0x1a0
unreferenced object 0xffff88812bc8dcc0 (size 32):
  comm "systemd-modules", pid 221, jiffies 4294893596 (age 635.000s)
  hex dump (first 32 bytes):
    50 43 49 2d 4d 53 49 58 2d 30 30 30 30 3a 30 38  PCI-MSIX-0000:08
    3a 30 30 2e 31 00 ff ff 82 97 0b 00 00 00 00 00  :00.1...........
  backtrace:
    [<00000000bef783eb>] __kmalloc_node_track_caller+0x4c/0x1b0
    [<00000000f16b54a8>] kvasprintf+0xb0/0x130
    [<0000000078634624>] kasprintf+0xa6/0xd0
    [<00000000f17eea1c>] __irq_domain_alloc_fwnode+0x1ce/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
unreferenced object 0xffff888108177580 (size 96):
  comm "sh", pid 9721, jiffies 4294943281 (age 436.568s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff  ................
    00 00 00 00 00 00 00 00 98 75 17 08 81 88 ff ff  .........u......
  backtrace:
    [<00000000bcb7f3b1>] kmalloc_trace+0x27/0x110
    [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
    [<000000004aebbb6e>] __device_attach_driver+0x157/0x280
    [<00000000c3894808>] bus_for_each_drv+0x123/0x1a0
unreferenced object 0xffff8881525f1680 (size 32):
  comm "sh", pid 9721, jiffies 4294943281 (age 436.568s)
  hex dump (first 32 bytes):
    50 43 49 2d 4d 53 49 58 2d 30 30 30 30 3a 30 38  PCI-MSIX-0000:08
    3a 30 30 2e 32 00 ff ff 00 00 00 00 00 00 00 00  :00.2...........
  backtrace:
    [<00000000bef783eb>] __kmalloc_node_track_caller+0x4c/0x1b0
    [<00000000f16b54a8>] kvasprintf+0xb0/0x130
    [<0000000078634624>] kasprintf+0xa6/0xd0
    [<00000000f17eea1c>] __irq_domain_alloc_fwnode+0x1ce/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
unreferenced object 0xffff888155ac9f00 (size 96):
  comm "sh", pid 9721, jiffies 4294943493 (age 435.768s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff  ................
    00 00 00 00 00 00 00 00 18 9f ac 55 81 88 ff ff  ...........U....
  backtrace:
    [<00000000bcb7f3b1>] kmalloc_trace+0x27/0x110
    [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
    [<000000004aebbb6e>] __device_attach_driver+0x157/0x280
    [<00000000c3894808>] bus_for_each_drv+0x123/0x1a0
unreferenced object 0xffff88816b4dfd40 (size 32):
  comm "sh", pid 9721, jiffies 4294943493 (age 435.808s)
  hex dump (first 32 bytes):
    50 43 49 2d 4d 53 49 58 2d 30 30 30 30 3a 30 38  PCI-MSIX-0000:08
    3a 30 30 2e 33 00 ff ff 00 00 00 00 00 00 00 00  :00.3...........
  backtrace:
    [<00000000bef783eb>] __kmalloc_node_track_caller+0x4c/0x1b0
    [<00000000f16b54a8>] kvasprintf+0xb0/0x130
    [<0000000078634624>] kasprintf+0xa6/0xd0
    [<00000000f17eea1c>] __irq_domain_alloc_fwnode+0x1ce/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
unreferenced object 0xffff88812e17e380 (size 96):
  comm "sh", pid 9828, jiffies 4294944405 (age 432.160s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff  ................
    00 00 00 00 00 00 00 00 98 e3 17 2e 81 88 ff ff  ................
  backtrace:
    [<00000000bcb7f3b1>] kmalloc_trace+0x27/0x110
    [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
    [<000000004aebbb6e>] __device_attach_driver+0x157/0x280
    [<00000000c3894808>] bus_for_each_drv+0x123/0x1a0
unreferenced object 0xffff8881557a9bc0 (size 32):
  comm "sh", pid 9828, jiffies 4294944405 (age 432.160s)
  hex dump (first 32 bytes):
    50 43 49 2d 4d 53 49 58 2d 30 30 30 30 3a 30 38  PCI-MSIX-0000:08
    3a 30 30 2e 36 00 ff ff 00 00 00 00 00 00 00 00  :00.6...........
  backtrace:
    [<00000000bef783eb>] __kmalloc_node_track_caller+0x4c/0x1b0
    [<00000000f16b54a8>] kvasprintf+0xb0/0x130
    [<0000000078634624>] kasprintf+0xa6/0xd0
    [<00000000f17eea1c>] __irq_domain_alloc_fwnode+0x1ce/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
unreferenced object 0xffff88813f624380 (size 96):
  comm "sh", pid 9828, jiffies 4294944654 (age 431.208s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff  ................
    00 00 00 00 00 00 00 00 98 43 62 3f 81 88 ff ff  .........Cb?....
  backtrace:
    [<00000000bcb7f3b1>] kmalloc_trace+0x27/0x110
    [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
    [<000000004aebbb6e>] __device_attach_driver+0x157/0x280
    [<00000000c3894808>] bus_for_each_drv+0x123/0x1a0
unreferenced object 0xffff88813a95c440 (size 32):
  comm "sh", pid 9828, jiffies 4294944654 (age 431.208s)
  hex dump (first 32 bytes):
    50 43 49 2d 4d 53 49 58 2d 30 30 30 30 3a 30 38  PCI-MSIX-0000:08
    3a 30 30 2e 37 00 ff ff 2f 5f 5f 70 79 63 61 63  :00.7.../__pycac
  backtrace:
    [<00000000bef783eb>] __kmalloc_node_track_caller+0x4c/0x1b0
    [<00000000f16b54a8>] kvasprintf+0xb0/0x130
    [<0000000078634624>] kasprintf+0xa6/0xd0
    [<00000000f17eea1c>] __irq_domain_alloc_fwnode+0x1ce/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000002aec9527>] driver_probe_device+0x49/0x120
unreferenced object 0xffff88813aa3b880 (size 96):
  comm "sh", pid 10020, jiffies 4294950696 (age 407.044s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff  ................
    00 00 00 00 00 00 00 00 98 b8 a3 3a 81 88 ff ff  ...........:....
  backtrace:
    [<00000000bcb7f3b1>] kmalloc_trace+0x27/0x110
    [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000000d688957>] device_driver_attach+0xae/0x1b0
    [<00000000003e203b>] bind_store+0x150/0x1f0
    [<000000003b2d7ae5>] kernfs_fop_write_iter+0x348/0x520
unreferenced object 0xffff888142df4b80 (size 32):
  comm "sh", pid 10020, jiffies 4294950696 (age 407.088s)
  hex dump (first 32 bytes):
    50 43 49 2d 4d 53 49 58 2d 30 30 30 30 3a 30 38  PCI-MSIX-0000:08
    3a 30 30 2e 32 00 ff ff 00 b0 f4 60 01 00 00 00  :00.2......`....
  backtrace:
    [<00000000bef783eb>] __kmalloc_node_track_caller+0x4c/0x1b0
    [<00000000f16b54a8>] kvasprintf+0xb0/0x130
    [<0000000078634624>] kasprintf+0xa6/0xd0
    [<00000000f17eea1c>] __irq_domain_alloc_fwnode+0x1ce/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000000d688957>] device_driver_attach+0xae/0x1b0
unreferenced object 0xffff88816cd32780 (size 96):
  comm "sh", pid 10050, jiffies 4294950903 (age 406.300s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff  ................
    00 00 00 00 00 00 00 00 98 27 d3 6c 81 88 ff ff  .........'.l....
  backtrace:
    [<00000000bcb7f3b1>] kmalloc_trace+0x27/0x110
    [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000000d688957>] device_driver_attach+0xae/0x1b0
    [<00000000003e203b>] bind_store+0x150/0x1f0
    [<000000003b2d7ae5>] kernfs_fop_write_iter+0x348/0x520
unreferenced object 0xffff88816b1df980 (size 32):
  comm "sh", pid 10050, jiffies 4294950903 (age 406.300s)
  hex dump (first 32 bytes):
    50 43 49 2d 4d 53 49 58 2d 30 30 30 30 3a 30 38  PCI-MSIX-0000:08
    3a 30 30 2e 33 00 ff ff 00 00 00 00 00 00 00 00  :00.3...........
  backtrace:
    [<00000000bef783eb>] __kmalloc_node_track_caller+0x4c/0x1b0
    [<00000000f16b54a8>] kvasprintf+0xb0/0x130
    [<0000000078634624>] kasprintf+0xa6/0xd0
    [<00000000f17eea1c>] __irq_domain_alloc_fwnode+0x1ce/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000000d688957>] device_driver_attach+0xae/0x1b0
unreferenced object 0xffff8881620cd580 (size 96):
  comm "test-ovn-2-swit", pid 10619, jiffies 4294958587 (age 375.592s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff  ................
    00 00 00 00 00 00 00 00 98 d5 0c 62 81 88 ff ff  ...........b....
  backtrace:
    [<00000000bcb7f3b1>] kmalloc_trace+0x27/0x110
    [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000000d688957>] device_driver_attach+0xae/0x1b0
    [<00000000003e203b>] bind_store+0x150/0x1f0
    [<000000003b2d7ae5>] kernfs_fop_write_iter+0x348/0x520
unreferenced object 0xffff88815cd13700 (size 32):
  comm "test-ovn-2-swit", pid 10619, jiffies 4294958587 (age 375.636s)
  hex dump (first 32 bytes):
    50 43 49 2d 4d 53 49 58 2d 30 30 30 30 3a 30 38  PCI-MSIX-0000:08
    3a 30 30 2e 32 00 ff ff 80 55 5a 07 00 ea ff ff  :00.2....UZ.....
  backtrace:
    [<00000000bef783eb>] __kmalloc_node_track_caller+0x4c/0x1b0
    [<00000000f16b54a8>] kvasprintf+0xb0/0x130
    [<0000000078634624>] kasprintf+0xa6/0xd0
    [<00000000f17eea1c>] __irq_domain_alloc_fwnode+0x1ce/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000000d688957>] device_driver_attach+0xae/0x1b0
unreferenced object 0xffff88816c302400 (size 96):
  comm "test-ovn-2-swit", pid 10619, jiffies 4294958796 (age 374.800s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 e0 19 8b 83 ff ff ff ff  ................
    00 00 00 00 00 00 00 00 18 24 30 6c 81 88 ff ff  .........$0l....
  backtrace:
    [<00000000bcb7f3b1>] kmalloc_trace+0x27/0x110
    [<000000008cdbc98d>] __irq_domain_alloc_fwnode+0x51/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000000d688957>] device_driver_attach+0xae/0x1b0
    [<00000000003e203b>] bind_store+0x150/0x1f0
    [<000000003b2d7ae5>] kernfs_fop_write_iter+0x348/0x520
unreferenced object 0xffff88812cfd9180 (size 32):
  comm "test-ovn-2-swit", pid 10619, jiffies 4294958796 (age 374.800s)
  hex dump (first 32 bytes):
    50 43 49 2d 4d 53 49 58 2d 30 30 30 30 3a 30 38  PCI-MSIX-0000:08
    3a 30 30 2e 33 00 ff ff 73 00 00 00 00 00 00 00  :00.3...s.......
  backtrace:
    [<00000000bef783eb>] __kmalloc_node_track_caller+0x4c/0x1b0
    [<00000000f16b54a8>] kvasprintf+0xb0/0x130
    [<0000000078634624>] kasprintf+0xa6/0xd0
    [<00000000f17eea1c>] __irq_domain_alloc_fwnode+0x1ce/0x2b0
    [<00000000c57acf9d>] msi_create_device_irq_domain+0x283/0x670
    [<000000009b567982>] __pci_enable_msix_range+0x49e/0xdb0
    [<0000000077cc1445>] pci_alloc_irq_vectors_affinity+0x11f/0x1c0
    [<00000000532e9ef5>] mlx5_irq_table_create+0x24c/0x940 [mlx5_core]
    [<00000000fabd2b80>] mlx5_load+0x1fa/0x680 [mlx5_core]
    [<000000006bb22ae4>] mlx5_init_one+0x485/0x670 [mlx5_core]
    [<00000000eaa5e1ad>] probe_one+0x4c2/0x720 [mlx5_core]
    [<00000000df8efb43>] local_pci_probe+0xd6/0x170
    [<0000000085cb9924>] pci_device_probe+0x231/0x6e0
    [<000000002671d86e>] really_probe+0x1cf/0xaa0
    [<000000002aeba218>] __driver_probe_device+0x18f/0x470
    [<000000000d688957>] device_driver_attach+0xae/0x1b0

* Re: [patch V3 13/33] x86/apic/vector: Provide MSI parent domain
  2023-01-04 12:34   ` [patch V3 13/33] " Jason Gunthorpe
@ 2023-01-09 20:32     ` Thomas Gleixner
  2023-01-10 12:14     ` Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Gleixner @ 2023-01-09 20:32 UTC (permalink / raw)
  To: Jason Gunthorpe, Moshe Shemesh, Shay Drory
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman, Dave Jiang,
	Alex Williamson, Kevin Tian, Dan Williams, Logan Gunthorpe,
	Ashok Raj, Jon Mason, Allen Hubbe

Jason!

On Wed, Jan 04 2023 at 08:34, Jason Gunthorpe wrote:
>
> Our test team has discovered some kmem leak complaints on rc1 and
> bisected it to this patch.
>
> I don't see an obvious way that fwnode gets destroyed here. So maybe
> it should be like this?

I'm back from vacation now. Will have a look tomorrow.

Thanks,

        tglx

* Re: [patch V3 13/33] x86/apic/vector: Provide MSI parent domain
  2023-01-04 12:34   ` [patch V3 13/33] " Jason Gunthorpe
  2023-01-09 20:32     ` Thomas Gleixner
@ 2023-01-10 12:14     ` Thomas Gleixner
  2023-01-10 14:59       ` Jason Gunthorpe
  1 sibling, 1 reply; 126+ messages in thread
From: Thomas Gleixner @ 2023-01-10 12:14 UTC (permalink / raw)
  To: Jason Gunthorpe, Moshe Shemesh, Shay Drory
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman, Dave Jiang,
	Alex Williamson, Kevin Tian, Dan Williams, Logan Gunthorpe,
	Ashok Raj, Jon Mason, Allen Hubbe

Jason,

On Wed, Jan 04 2023 at 08:34, Jason Gunthorpe wrote:
> Our test team has discovered some kmem leak complaints on rc1 and
> bisected it to this patch.
>
> I don't see an obvious way that fwnode gets destroyed here. So maybe
> it should be like this?
>
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 955267bbc2be63..cbbcb7fd2bd00d 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -1000,7 +1000,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
>  fail:
>  	msi_unlock_descs(dev);
>  free_fwnode:
> -	kfree(fwnode);
> +	irq_domain_free_fwnode(fwnode); // ???

That's correct. kfree(fwnode) leaks fwnode->name.

>  free_bundle:
>  	kfree(bundle);
>  	return false;
> @@ -1013,6 +1013,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
>   */
>  void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
>  {
> +	struct fwnode_handle *fwnode = NULL;
>  	struct msi_domain_info *info;
>  	struct irq_domain *domain;
>  
> @@ -1025,7 +1026,10 @@ void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
>  
>  	dev->msi.data->__domains[domid].domain = NULL;
>  	info = domain->host_data;
> +	if (domain->flags & IRQ_DOMAIN_FLAG_MSI_DEVICE)
> +		fwnode = domain->fwnode;

irq_domain_is_msi_device() ?

>  	irq_domain_remove(domain);
> +	irq_domain_free_fwnode(fwnode);

For some reason I thought the fwnode would be handled by
irq_domain_remove() but fwnode_handle_put() is a NOP for the named
fwnodes.
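
The two kmemleak signatures in the reports (a 96 byte object from
kmalloc_trace() and a 32 byte "PCI-MSIX-..." string from kasprintf())
can be illustrated with a minimal userspace sketch. It assumes only
that a named fwnode is a container plus a separately allocated name.
All model_* identifiers and the live_allocs counter are hypothetical
and are not kernel API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for a named fwnode: the container
 * and its name are two separate heap allocations.
 */
struct model_fwnode {
	char *name;
};

/* Outstanding allocations, tracked for illustration only. */
static int live_allocs;

static struct model_fwnode *model_alloc_fwnode(const char *name)
{
	struct model_fwnode *fw = malloc(sizeof(*fw));

	if (!fw)
		return NULL;

	fw->name = malloc(strlen(name) + 1);
	if (!fw->name) {
		free(fw);
		return NULL;
	}
	strcpy(fw->name, name);
	live_allocs += 2;
	return fw;
}

/* Models the kfree(fwnode) error path: only the container is released. */
static void model_free_container_only(struct model_fwnode *fw)
{
	assert(fw);
	free(fw);
	live_allocs -= 1;
}

/* Models a dedicated free helper: name first, then the container. */
static void model_free_fwnode(struct model_fwnode *fw)
{
	assert(fw);
	free(fw->name);
	free(fw);
	live_allocs -= 2;
}
```

In this model, model_free_container_only() leaves one allocation
outstanding (the name), while skipping the free entirely, as the
remove path did, leaves both.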

Care to send a proper patch with changelog?

Thanks,

        tglx

* Re: [patch V3 13/33] x86/apic/vector: Provide MSI parent domain
  2023-01-10 12:14     ` Thomas Gleixner
@ 2023-01-10 14:59       ` Jason Gunthorpe
  2023-01-11 16:02         ` Kalle Valo
  0 siblings, 1 reply; 126+ messages in thread
From: Jason Gunthorpe @ 2023-01-10 14:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Moshe Shemesh, Shay Drory, LKML, x86, Joerg Roedel, Will Deacon,
	linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Marc Zyngier,
	Greg Kroah-Hartman, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Tue, Jan 10, 2023 at 01:14:00PM +0100, Thomas Gleixner wrote:

> Care to send a proper patch with changelog?

Yes, I'll post it in a few days once the test team confirms it

Thanks,
Jason

* Re: [patch V3 13/33] x86/apic/vector: Provide MSI parent domain
  2023-01-10 14:59       ` Jason Gunthorpe
@ 2023-01-11 16:02         ` Kalle Valo
  2023-01-11 16:35           ` Jason Gunthorpe
  0 siblings, 1 reply; 126+ messages in thread
From: Kalle Valo @ 2023-01-11 16:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Thomas Gleixner, Moshe Shemesh, Shay Drory, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman, Dave Jiang,
	Alex Williamson, Kevin Tian, Dan Williams, Logan Gunthorpe,
	Ashok Raj, Jon Mason, Allen Hubbe

Jason Gunthorpe <jgg@nvidia.com> writes:

> On Tue, Jan 10, 2023 at 01:14:00PM +0100, Thomas Gleixner wrote:
>
>> Care to send a proper patch with changelog?
>
> Yes, I'll post it in a few days once the test team confirms it

I think I'm seeing the same leak and it's spamming logs on my test box a
lot. Let me know if you need any help with testing, I can do that pretty
quickly.

unreferenced object 0xffff888113dc7520 (size 96):
  comm "insmod", pid 50676, jiffies 4301551867 (age 1463.666s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 25 68 a5 ff ff ff ff  .........%h.....
    00 00 00 00 00 00 00 00 38 75 dc 13 81 88 ff ff  ........8u......
  backtrace:
    [<ffffffffa3105532>] __kmem_cache_alloc_node+0x1d2/0x2b0
    [<ffffffffa2fdfb45>] kmalloc_trace+0x25/0x60
    [<ffffffffa2cb8b42>] __irq_domain_alloc_fwnode+0x52/0x2b0
    [<ffffffffa2cc6add>] msi_create_device_irq_domain+0x27d/0x630
    [<ffffffffa3aaf5a9>] pci_setup_msi_device_domain+0xe9/0x120
    [<ffffffffa3aababd>] __pci_enable_msi_range+0x3fd/0x5a0
    [<ffffffffa3aa8ac3>] pci_alloc_irq_vectors_affinity+0x153/0x200
    [<ffffffffa3aa8b7c>] pci_alloc_irq_vectors+0xc/0x10
    [<ffffffffc0b75287>] ath11k_pci_alloc_msi+0xb7/0x610 [ath11k_pci]
    [<ffffffffc0b7696e>] ath11k_pci_probe+0x5be/0x1090 [ath11k_pci]
    [<ffffffffa3a8d4e9>] local_pci_probe+0xd9/0x170
    [<ffffffffa3a8f687>] pci_call_probe+0x167/0x440
    [<ffffffffa3a919f6>] pci_device_probe+0xa6/0x100
    [<ffffffffa43c2c09>] really_probe+0x1c9/0xa50
    [<ffffffffa43c361a>] __driver_probe_device+0x18a/0x460
    [<ffffffffa43c393a>] driver_probe_device+0x4a/0x120

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

* Re: [patch V3 13/33] x86/apic/vector: Provide MSI parent domain
  2023-01-11 16:02         ` Kalle Valo
@ 2023-01-11 16:35           ` Jason Gunthorpe
  2023-01-11 17:07             ` Kalle Valo
  0 siblings, 1 reply; 126+ messages in thread
From: Jason Gunthorpe @ 2023-01-11 16:35 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Thomas Gleixner, Moshe Shemesh, Shay Drory, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman, Dave Jiang,
	Alex Williamson, Kevin Tian, Dan Williams, Logan Gunthorpe,
	Ashok Raj, Jon Mason, Allen Hubbe

On Wed, Jan 11, 2023 at 06:02:13PM +0200, Kalle Valo wrote:
> Jason Gunthorpe <jgg@nvidia.com> writes:
> 
> > On Tue, Jan 10, 2023 at 01:14:00PM +0100, Thomas Gleixner wrote:
> >
> >> Care to send a proper patch with changelog?
> >
> > Yes, I'll post it in a few days once the test team confirms it
> 
> I think I'm seeing the same leak and it's spamming logs on my test box a
> lot. Let me know if you need any help with testing, I can do that pretty
> quickly.

https://github.com/jgunthorpe/linux/commits/msi_fwnode_leak

Jason

* Re: [patch V3 13/33] x86/apic/vector: Provide MSI parent domain
  2023-01-11 16:35           ` Jason Gunthorpe
@ 2023-01-11 17:07             ` Kalle Valo
  0 siblings, 0 replies; 126+ messages in thread
From: Kalle Valo @ 2023-01-11 17:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Thomas Gleixner, Moshe Shemesh, Shay Drory, LKML, x86,
	Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman, Dave Jiang,
	Alex Williamson, Kevin Tian, Dan Williams, Logan Gunthorpe,
	Ashok Raj, Jon Mason, Allen Hubbe

Jason Gunthorpe <jgg@nvidia.com> writes:

> On Wed, Jan 11, 2023 at 06:02:13PM +0200, Kalle Valo wrote:
>> Jason Gunthorpe <jgg@nvidia.com> writes:
>> 
>> > On Tue, Jan 10, 2023 at 01:14:00PM +0100, Thomas Gleixner wrote:
>> >
>> >> Care to send a proper patch with changelog?
>> >
>> > Yes, I'll post it in a few days once the test team confirms it
>> 
>> I think I'm seeing the same leak and it's spamming logs on my test box a
>> lot. Let me know if you need any help with testing, I can do that pretty
>> quickly.
>
> https://github.com/jgunthorpe/linux/commits/msi_fwnode_leak

Nice, this fixes the issue for me. I don't see memleaks anymore while
running my ath11k regression tests. Thanks!

Tested-by: Kalle Valo <kvalo@kernel.org>

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

* [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2022-11-24 23:25 ` [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc() Thomas Gleixner
                     ` (2 preceding siblings ...)
  2022-12-13 19:04   ` [patch V3 09/33] " Guenter Roeck
@ 2023-02-20 17:11   ` Russell King (Oracle)
  2023-02-20 18:29     ` Marc Zyngier
  2023-02-20 18:30     ` [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking " Thomas Gleixner
  3 siblings, 2 replies; 126+ messages in thread
From: Russell King (Oracle) @ 2023-02-20 17:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Fri, Nov 25, 2022 at 12:25:59AM +0100, Thomas Gleixner wrote:
> Per device domains provide the real domain size to the core code. This
> allows range checking on insertion of MSI descriptors and also paves the
> way for dynamic index allocations which are required e.g. for IMS. This
> avoids external mechanisms like bitmaps on the device side and just
> utilizes the core internal MSI descriptor storage for it.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Hi Thomas,

This patch appears to cause a regression on Macchiatobin, delaying the
boot by about ten seconds due to all the warnings the kernel now
produces.

> @@ -136,11 +149,16 @@ static bool msi_desc_match(struct msi_de
>  
>  static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
>  {
> +	unsigned int hwsize;
> +
>  	if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
> -			 !dev->msi.data->__domains[ctrl->domid].domain ||
> -			 ctrl->first > ctrl->last ||
> -			 ctrl->first > MSI_MAX_INDEX ||
> -			 ctrl->last > MSI_MAX_INDEX))
> +			 !dev->msi.data->__domains[ctrl->domid].domain))
> +		return false;
> +
> +	hwsize = msi_domain_get_hwsize(dev, ctrl->domid);

This calls msi_get_device_domain() without taking dev->msi.data->mutex,
resulting in the lockdep_assert_held() firing for what seems to be every
MSI created by the Armada 8040 ICU driver, which suggests something isn't
taking the lock as you expect. Please can you take a look and propose a
patch to fix this regression.

Thanks.

[    0.960451] WARNING: CPU: 2 PID: 1 at kernel/irq/msi.c:588 msi_get_device_domain+0x70/0xa0
[    0.967454] Modules linked in:
[    0.969216] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.2.0+ #1134
[    0.974116] Hardware name: Marvell 8040 MACCHIATOBin Single-shot (DT)
[    0.979276] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.984961] pc : msi_get_device_domain+0x70/0xa0
[    0.988292] lr : msi_get_device_domain+0x6c/0xa0
[    0.991623] sp : ffffffc080083460
[    0.993643] x29: ffffffc080083460 x28: 0000000000000000 x27: ffffffc041dcb6c8
[    0.999506] x26: ffffff8101f23810 x25: ffffffc080083668 x24: ffffff8101f23080
[    1.005370] x23: 0000000000000012 x22: ffffff81003d1000 x21: ffffff81025dfd90
[    1.011234] x20: ffffff8101f23810 x19: 0000000000000000 x18: 00000000fffffffd
[    1.017097] x17: 00000000cc510454 x16: 0000000000000051 x15: 0000000000000002
[    1.022960] x14: 00000000000389cb x13: 0000000000000001 x12: 0000000000000040
[    1.028822] x11: ffffff8100400490 x10: ffffff8100400492 x9 : 0000000000000000
[    1.034685] x8 : 0000000000000000 x7 : ffffff81001c8858 x6 : ffffffc0402ad718
[    1.040547] x5 : 00000000ffffffff x4 : ffffff81003d4c80 x3 : 0000000000000000
[    1.046410] x2 : ffffffc0fed09000 x1 : 0000000000000000 x0 : 0000000000000000
[    1.052274] Call trace:
[    1.053422]  msi_get_device_domain+0x70/0xa0
[    1.056404]  msi_ctrl_valid+0x5c/0x94
[    1.058775]  msi_domain_populate_irqs+0x64/0x1b0
[    1.062106]  platform_msi_device_domain_alloc+0x20/0x30
[    1.066048]  mvebu_icu_irq_domain_alloc+0xa0/0x1a0
[    1.069555]  __irq_domain_alloc_irqs+0xf8/0x46c
[    1.072799]  irq_create_fwspec_mapping+0x224/0x320
[    1.076303]  irq_create_of_mapping+0x68/0x90
[    1.079284]  of_irq_get+0x88/0xd0
[    1.081308]  platform_get_irq_optional+0x20/0x114
[    1.084725]  platform_get_irq+0x18/0x50
[    1.087269]  dw8250_probe+0x60/0x6e0
[    1.089552]  platform_probe+0x64/0xd0
[    1.091923]  really_probe+0xb8/0x2d4
[    1.094207]  __driver_probe_device+0x74/0xdc
[    1.097190]  driver_probe_device+0xd0/0x160
[    1.100085]  __driver_attach+0x94/0x1a0
[    1.102631]  bus_for_each_dev+0x6c/0xc0
[    1.105176]  driver_attach+0x20/0x30
[    1.107460]  bus_add_driver+0x148/0x200
[    1.110006]  driver_register+0x74/0x120
[    1.112550]  __platform_driver_register+0x24/0x30
[    1.115966]  dw8250_platform_driver_init+0x18/0x20
[    1.119473]  do_one_initcall+0x70/0x370
[    1.122018]  kernel_init_freeable+0x1d0/0x238
[    1.125087]  kernel_init+0x20/0x120
[    1.127283]  ret_from_fork+0x10/0x20
[    1.129567] ---[ end trace 0000000000000000 ]---

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2023-02-20 17:11   ` [REGRESSION] " Russell King (Oracle)
@ 2023-02-20 18:29     ` Marc Zyngier
  2023-02-20 18:43       ` Thomas Gleixner
                         ` (2 more replies)
  2023-02-20 18:30     ` [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking " Thomas Gleixner
  1 sibling, 3 replies; 126+ messages in thread
From: Marc Zyngier @ 2023-02-20 18:29 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Thomas Gleixner, LKML, x86, Joerg Roedel, Will Deacon, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Mon, 20 Feb 2023 17:11:23 +0000,
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> 
> On Fri, Nov 25, 2022 at 12:25:59AM +0100, Thomas Gleixner wrote:
> > Per device domains provide the real domain size to the core code. This
> > allows range checking on insertion of MSI descriptors and also paves the
> > way for dynamic index allocations which are required e.g. for IMS. This
> > avoids external mechanisms like bitmaps on the device side and just
> > utilizes the core internal MSI descriptor storage for it.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> Hi Thomas,
> 
> This patch appears to cause a regression on Macchiatobin, delaying the
> boot by about ten seconds due to all the warnings the kernel now
> produces.
> 
> > @@ -136,11 +149,16 @@ static bool msi_desc_match(struct msi_de
> >  
> >  static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
> >  {
> > +	unsigned int hwsize;
> > +
> >  	if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
> > -			 !dev->msi.data->__domains[ctrl->domid].domain ||
> > -			 ctrl->first > ctrl->last ||
> > -			 ctrl->first > MSI_MAX_INDEX ||
> > -			 ctrl->last > MSI_MAX_INDEX))
> > +			 !dev->msi.data->__domains[ctrl->domid].domain))
> > +		return false;
> > +
> > +	hwsize = msi_domain_get_hwsize(dev, ctrl->domid);
> 
> This calls msi_get_device_domain() without taking dev->msi.data->mutex,
> resulting in the lockdep_assert_held() firing for what seems to be every
> MSI created by the Armada 8040 ICU driver, which suggests something isn't
> taking the lock as you expect. Please can you take a look and propose a
> patch to fix this regression.

Since you already worked it out, I only had to translate your words
into the patch below, which solves it for me.

Lockdep also reports[1] a possible circular locking dependency between
phy_attach_direct() and rtnetlink_rcv_msg(), which looks interesting.

Thanks,

	M.

[1] https://paste.debian.net/1271454/

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 783a3e6a0b10..13d96495e6d0 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1084,10 +1084,13 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev,
 	struct xarray *xa;
 	int ret, virq;
 
-	if (!msi_ctrl_valid(dev, &ctrl))
-		return -EINVAL;
-
 	msi_lock_descs(dev);
+
+	if (!msi_ctrl_valid(dev, &ctrl)) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
 	ret = msi_domain_add_simple_msi_descs(dev, &ctrl);
 	if (ret)
 		goto unlock;
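
The reordering in the diff above can be modelled in plain C. This is a
single-threaded sketch, not kernel code: a bool stands in for the
descriptor mutex, a counter stands in for lockdep_assert_held() firing,
and the range check is a placeholder assumption. All model_* names are
hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_dev {
	bool locked;		/* stand-in for the MSI descriptor mutex */
	unsigned int hwsize;	/* stand-in for the domain hardware size */
	int splats;		/* counts simulated lockdep warnings */
};

static void model_lock(struct model_dev *d)
{
	d->locked = true;
}

static void model_unlock(struct model_dev *d)
{
	d->locked = false;
}

/* The validity check reads lock-protected state, so it expects the lock. */
static bool model_ctrl_valid(struct model_dev *d, unsigned int first,
			     unsigned int last)
{
	if (!d->locked)
		d->splats++;	/* models lockdep_assert_held() firing */

	/* Placeholder range check, assumed for illustration. */
	return first <= last && last < d->hwsize;
}

/* Ordering before the fix: validate first, lock afterwards. */
static int model_populate_check_first(struct model_dev *d,
				      unsigned int first, unsigned int last)
{
	if (!model_ctrl_valid(d, first, last))
		return -EINVAL;

	model_lock(d);
	/* descriptor population would happen here */
	model_unlock(d);
	return 0;
}

/* Ordering after the fix: lock first, validate under the lock. */
static int model_populate_lock_first(struct model_dev *d,
				     unsigned int first, unsigned int last)
{
	int ret = 0;

	model_lock(d);
	if (!model_ctrl_valid(d, first, last)) {
		ret = -EINVAL;
		goto unlock;
	}
	/* descriptor population would happen here */
unlock:
	model_unlock(d);
	return ret;
}
```

The point of the goto is that the error return now has to drop the
lock, which the early return in the old ordering did not need to do.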

-- 
Without deviation from the norm, progress is not possible.

* Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2023-02-20 17:11   ` [REGRESSION] " Russell King (Oracle)
  2023-02-20 18:29     ` Marc Zyngier
@ 2023-02-20 18:30     ` Thomas Gleixner
  1 sibling, 0 replies; 126+ messages in thread
From: Thomas Gleixner @ 2023-02-20 18:30 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Mon, Feb 20 2023 at 17:11, Russell King wrote:
> On Fri, Nov 25, 2022 at 12:25:59AM +0100, Thomas Gleixner wrote:
> Hi Thomas,
>
> This patch appears to cause a regression on Macchiatobin, delaying the
> boot by about ten seconds due to all the warnings the kernel now
> produces.
>
>> @@ -136,11 +149,16 @@ static bool msi_desc_match(struct msi_de
>>  
>>  static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
>>  {
>> +	unsigned int hwsize;
>> +
>>  	if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
>> -			 !dev->msi.data->__domains[ctrl->domid].domain ||
>> -			 ctrl->first > ctrl->last ||
>> -			 ctrl->first > MSI_MAX_INDEX ||
>> -			 ctrl->last > MSI_MAX_INDEX))
>> +			 !dev->msi.data->__domains[ctrl->domid].domain))
>> +		return false;
>> +
>> +	hwsize = msi_domain_get_hwsize(dev, ctrl->domid);
>
> This calls msi_get_device_domain() without taking dev->msi.data->mutex,
> resulting in the lockdep_assert_held() firing for what seems to be every
> MSI created by the Armada 8040 ICU driver, which suggests something isn't
> taking the lock as you expect. Please can you take a look and propose a
> patch to fix this regression.

Groan. I'll have a look.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2023-02-20 18:29     ` Marc Zyngier
@ 2023-02-20 18:43       ` Thomas Gleixner
  2023-02-20 19:00       ` Russell King (Oracle)
  2023-02-20 19:17       ` Russell King (Oracle)
  2 siblings, 0 replies; 126+ messages in thread
From: Thomas Gleixner @ 2023-02-20 18:43 UTC (permalink / raw)
  To: Marc Zyngier, Russell King (Oracle)
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Greg Kroah-Hartman, Jason Gunthorpe,
	Dave Jiang, Alex Williamson, Kevin Tian, Dan Williams,
	Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Mon, Feb 20 2023 at 18:29, Marc Zyngier wrote:
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 783a3e6a0b10..13d96495e6d0 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -1084,10 +1084,13 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev,
>  	struct xarray *xa;
>  	int ret, virq;
>  
> -	if (!msi_ctrl_valid(dev, &ctrl))
> -		return -EINVAL;
> -
>  	msi_lock_descs(dev);
> +
> +	if (!msi_ctrl_valid(dev, &ctrl)) {
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +
>  	ret = msi_domain_add_simple_msi_descs(dev, &ctrl);
>  	if (ret)
>  		goto unlock;

Yup, you beat me by a minute :)
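
For readers following along, the reordering in the quoted patch can be
modelled in plain userspace C. This is only a sketch: every toy_* name
below is a hypothetical stand-in, the "lock" is a boolean flag, and the
warning counter merely imitates a lockdep_assert_held() splat. It
assumes nothing about the real kernel/irq/msi.c internals beyond what
the patch itself shows.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel objects; not the real API. */
struct toy_dev {
	bool lock_held;       /* models dev->msi.data->mutex being held */
	unsigned int hwsize;  /* models the per device domain size */
	int warnings;         /* counts imitated lockdep assertions */
};

struct toy_ctrl {
	unsigned int first;
	unsigned int last;
};

static void toy_lock(struct toy_dev *dev)   { dev->lock_held = true; }
static void toy_unlock(struct toy_dev *dev) { dev->lock_held = false; }

/* Models msi_ctrl_valid(): looking up the size expects the lock held. */
static bool toy_ctrl_valid(struct toy_dev *dev, const struct toy_ctrl *ctrl)
{
	if (!dev->lock_held)
		dev->warnings++;  /* imitates lockdep_assert_held() firing */

	return ctrl->first <= ctrl->last && ctrl->last < dev->hwsize;
}

/* Ordering before the patch: validate first, lock afterwards. */
static int toy_populate_check_then_lock(struct toy_dev *dev,
					const struct toy_ctrl *ctrl)
{
	if (!toy_ctrl_valid(dev, ctrl))
		return -EINVAL;

	toy_lock(dev);
	/* ... descriptor setup would happen here ... */
	toy_unlock(dev);
	return 0;
}

/* Ordering after the patch: lock first, validate, unlock on every path. */
static int toy_populate_lock_then_check(struct toy_dev *dev,
					const struct toy_ctrl *ctrl)
{
	int ret = 0;

	toy_lock(dev);

	if (!toy_ctrl_valid(dev, ctrl)) {
		ret = -EINVAL;
		goto unlock;
	}
	/* ... descriptor setup would happen here ... */
unlock:
	toy_unlock(dev);
	return ret;
}
```

In this model the first variant bumps the warning counter on every
call, while the second never does and still drops the lock on the
error path.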

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2023-02-20 18:29     ` Marc Zyngier
  2023-02-20 18:43       ` Thomas Gleixner
@ 2023-02-20 19:00       ` Russell King (Oracle)
  2023-02-20 19:17       ` Russell King (Oracle)
  2 siblings, 0 replies; 126+ messages in thread
From: Russell King (Oracle) @ 2023-02-20 19:00 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Thomas Gleixner, LKML, x86, Joerg Roedel, Will Deacon, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Mon, Feb 20, 2023 at 06:29:33PM +0000, Marc Zyngier wrote:
> On Mon, 20 Feb 2023 17:11:23 +0000,
> "Russell King (Oracle)" <linux@armlinux.org.uk> wrote:
> > 
> > On Fri, Nov 25, 2022 at 12:25:59AM +0100, Thomas Gleixner wrote:
> > > Per device domains provide the real domain size to the core code. This
> > > allows range checking on insertion of MSI descriptors and also paves the
> > > way for dynamic index allocations which are required e.g. for IMS. This
> > > avoids external mechanisms like bitmaps on the device side and just
> > > utilizes the core internal MSI descriptor store for it.
> > > 
> > > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > 
> > Hi Thomas,
> > 
> > This patch appears to cause a regression on Macchiatobin, delaying the
> > boot by about ten seconds due to all the warnings the kernel now
> > produces.
> > 
> > > @@ -136,11 +149,16 @@ static bool msi_desc_match(struct msi_de
> > >  
> > >  static bool msi_ctrl_valid(struct device *dev, struct msi_ctrl *ctrl)
> > >  {
> > > +	unsigned int hwsize;
> > > +
> > >  	if (WARN_ON_ONCE(ctrl->domid >= MSI_MAX_DEVICE_IRQDOMAINS ||
> > > -			 !dev->msi.data->__domains[ctrl->domid].domain ||
> > > -			 ctrl->first > ctrl->last ||
> > > -			 ctrl->first > MSI_MAX_INDEX ||
> > > -			 ctrl->last > MSI_MAX_INDEX))
> > > +			 !dev->msi.data->__domains[ctrl->domid].domain))
> > > +		return false;
> > > +
> > > +	hwsize = msi_domain_get_hwsize(dev, ctrl->domid);
> > 
> > This calls msi_get_device_domain() without taking dev->msi.data->mutex,
> > resulting in the lockdep_assert_held() firing for what seems to be every
> > MSI created by the Armada 8040 ICU driver, which suggests something isn't
> > taking the lock as you expect. Please can you take a look and propose a
> > patch to fix this regression.
> 
> Since you already worked it out, I only had to translate your words
> into the patch below, which solves it for me.

Thanks for making incorrect assumptions. I hadn't "worked it out",
I merely reported it and stated the bleeding obvious.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2023-02-20 18:29     ` Marc Zyngier
  2023-02-20 18:43       ` Thomas Gleixner
  2023-02-20 19:00       ` Russell King (Oracle)
@ 2023-02-20 19:17       ` Russell King (Oracle)
  2023-02-20 19:43         ` Andrew Lunn
  2 siblings, 1 reply; 126+ messages in thread
From: Russell King (Oracle) @ 2023-02-20 19:17 UTC (permalink / raw)
  To: Marc Zyngier, Andrew Lunn
  Cc: Thomas Gleixner, LKML, x86, Joerg Roedel, Will Deacon, linux-pci,
	Bjorn Helgaas, Lorenzo Pieralisi, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Mon, Feb 20, 2023 at 06:29:33PM +0000, Marc Zyngier wrote:
> Lockdep also reports[1] a possible circular locking dependency between
> phy_attach_direct() and rtnetlink_rcv_msg(), which looks interesting.
> 
> [1] https://paste.debian.net/1271454/

Adding Andrew, but really this should be in a separate thread, since
this has nothing to do with MSI.

It looks like the open path takes the RTNL lock followed by the phydev
lock, whereas the PHY probe path takes the phydev lock, and then if
there's a SFP attached to the PHY, we end up taking the RTNL lock.
That's going to be utterly horrid to try and solve, and isn't going
to be quick to fix.
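
To make the inversion concrete, here is a small single-threaded model
of the two paths described above. It is a sketch under assumptions:
the toy_* helpers and the two-entry lock table are invented for
illustration and are far simpler than what lockdep really does. It
only records which lock was taken while another was held and flags a
later acquisition in the opposite order.

```c
#include <stdbool.h>

/* Hypothetical lock identifiers for the two locks in the report. */
enum { LOCK_RTNL, LOCK_PHYDEV, NR_LOCKS };

/* order[a][b] is true once lock b has been taken while a was held. */
static bool order[NR_LOCKS][NR_LOCKS];
static bool held[NR_LOCKS];

/* Returns true if taking 'lock' inverts a previously recorded order. */
static bool toy_acquire(int lock)
{
	bool inversion = false;

	for (int h = 0; h < NR_LOCKS; h++) {
		if (!held[h] || h == lock)
			continue;
		if (order[lock][h])
			inversion = true;  /* seen before as lock -> h */
		order[h][lock] = true;     /* record h -> lock */
	}
	held[lock] = true;
	return inversion;
}

static void toy_release(int lock)
{
	held[lock] = false;
}

/* Models the open path: RTNL first, then the phydev lock. */
static bool toy_open_path(void)
{
	bool inv = toy_acquire(LOCK_RTNL);

	inv |= toy_acquire(LOCK_PHYDEV);
	toy_release(LOCK_PHYDEV);
	toy_release(LOCK_RTNL);
	return inv;
}

/* Models the probe path with an SFP: phydev lock first, then RTNL. */
static bool toy_probe_path(void)
{
	bool inv = toy_acquire(LOCK_PHYDEV);

	inv |= toy_acquire(LOCK_RTNL);
	toy_release(LOCK_RTNL);
	toy_release(LOCK_PHYDEV);
	return inv;
}
```

Running the open path and then the probe path makes the second call
report an inversion, even though no deadlock actually happens in a
single thread. That is the same reason a lockdep report is only a
"possible" circular dependency.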

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc()
  2023-02-20 19:17       ` Russell King (Oracle)
@ 2023-02-20 19:43         ` Andrew Lunn
  2023-02-20 20:15           ` phylib locking (was: Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking) " Russell King (Oracle)
  0 siblings, 1 reply; 126+ messages in thread
From: Andrew Lunn @ 2023-02-20 19:43 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Marc Zyngier, Thomas Gleixner, LKML, x86, Joerg Roedel,
	Will Deacon, linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Greg Kroah-Hartman, Jason Gunthorpe, Dave Jiang, Alex Williamson,
	Kevin Tian, Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason,
	Allen Hubbe

On Mon, Feb 20, 2023 at 07:17:11PM +0000, Russell King (Oracle) wrote:
> On Mon, Feb 20, 2023 at 06:29:33PM +0000, Marc Zyngier wrote:
> > Lockdep also reports[1] a possible circular locking dependency between
> > phy_attach_direct() and rtnetlink_rcv_msg(), which looks interesting.
> > 
> > [1] https://paste.debian.net/1271454/
> 
> Adding Andrew, but really this should be in a separate thread, since
> this has nothing to do with MSI.
> 
> It looks like the open path takes the RTNL lock followed by the phydev
> lock, whereas the PHY probe path takes the phydev lock, and then if
> there's a SFP attached to the PHY, we end up taking the RTNL lock.
> That's going to be utterly horrid to try and solve, and isn't going
> to be quick to fix.

What are we actually trying to protect in phy_probe() when we take the
lock and call phydev->drv->probe(phydev) ?

The main purpose of the lock is to protect members of phydev, such as
link, speed, duplex, which can be inconsistent when the lock is not
held. But the PHY is not attached to a MAC yet, so a MAC cannot be
using it, and those members of phydev are not valid yet anyway.

The lock also prevents parallel operation on the device by phylib, but
I cannot think of how that could happen at this early stage in the
life of the PHY.

So maybe we can move the mutex_lock() after the call to
phydev->drv->probe()?

	Andrew

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: phylib locking (was: Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking) to msi_insert_desc()
  2023-02-20 19:43         ` Andrew Lunn
@ 2023-02-20 20:15           ` Russell King (Oracle)
  2023-02-21 14:57             ` Russell King (Oracle)
  0 siblings, 1 reply; 126+ messages in thread
From: Russell King (Oracle) @ 2023-02-20 20:15 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Marc Zyngier, LKML

[dropped most of the Cc as this has probably deviated off topic for
them... and changed the subject]

On Mon, Feb 20, 2023 at 08:43:44PM +0100, Andrew Lunn wrote:
> On Mon, Feb 20, 2023 at 07:17:11PM +0000, Russell King (Oracle) wrote:
> > On Mon, Feb 20, 2023 at 06:29:33PM +0000, Marc Zyngier wrote:
> > > Lockdep also reports[1] a possible circular locking dependency between
> > > phy_attach_direct() and rtnetlink_rcv_msg(), which looks interesting.
> > > 
> > > [1] https://paste.debian.net/1271454/
> > 
> > Adding Andrew, but really this should be in a separate thread, since
> > this has nothing to do with MSI.
> > 
> > It looks like the open path takes the RTNL lock followed by the phydev
> > lock, whereas the PHY probe path takes the phydev lock, and then if
> > there's a SFP attached to the PHY, we end up taking the RTNL lock.
> > That's going to be utterly horrid to try and solve, and isn't going
> > to be quick to fix.
> 
> What are we actually trying to protect in phy_probe() when we take the
> lock and call phydev->drv->probe(phydev) ?
> 
> The main purpose of the lock is to protect members of phydev, such as
> link, speed, duplex, which can be inconsistent when the lock is not
> held. But the PHY is not attached to a MAC yet, so a MAC cannot be
> using it, and those members of phydev are not valid yet anyway.
> 
> The lock also prevents parallel operation on the device by phylib, but
> I cannot think of how that could happen at this early stage in the
> life of the PHY.
> 
> So maybe we can move the mutex_lock() after the call to
> phydev->drv->probe()?

That's what I've been thinking too - I dug back in the history, and
it was a spin_lock_bh(), and before that it was a spin_lock().

The patch that converted it to a spin_lock_bh() is a brilliant
example of a poor commit message "Lock debugging finds a problem"
but doesn't say _what_ the problem was! Going back further still, the
spin_lock() was there from the very beginnings of PHYLIB. So the
reasoning for having a lock here has been lost in the depths of time.

The lock certainly doesn't prevent any interaction with
phy_attach_direct(), so it seems to be utterly pointless to take
the lock in the probe() function.

So yes, I agree, we can move the lock - and I wonder whether we
could just get rid of it completely in phy_probe().

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: phylib locking (was: Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking) to msi_insert_desc()
  2023-02-20 20:15           ` phylib locking (was: Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking) " Russell King (Oracle)
@ 2023-02-21 14:57             ` Russell King (Oracle)
  0 siblings, 0 replies; 126+ messages in thread
From: Russell King (Oracle) @ 2023-02-21 14:57 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Marc Zyngier, LKML

On Mon, Feb 20, 2023 at 08:15:59PM +0000, Russell King (Oracle) wrote:
> [dropped most of the Cc as this has probably deviated off topic for
> them... and changed the subject]
> 
> On Mon, Feb 20, 2023 at 08:43:44PM +0100, Andrew Lunn wrote:
> > On Mon, Feb 20, 2023 at 07:17:11PM +0000, Russell King (Oracle) wrote:
> > > On Mon, Feb 20, 2023 at 06:29:33PM +0000, Marc Zyngier wrote:
> > > > Lockdep also reports[1] a possible circular locking dependency between
> > > > phy_attach_direct() and rtnetlink_rcv_msg(), which looks interesting.
> > > > 
> > > > [1] https://paste.debian.net/1271454/
> > > 
> > > Adding Andrew, but really this should be in a separate thread, since
> > > this has nothing to do with MSI.
> > > 
> > > It looks like the open path takes the RTNL lock followed by the phydev
> > > lock, whereas the PHY probe path takes the phydev lock, and then if
> > > there's a SFP attached to the PHY, we end up taking the RTNL lock.
> > > That's going to be utterly horrid to try and solve, and isn't going
> > > to be quick to fix.
> > 
> > What are we actually trying to protect in phy_probe() when we take the
> > lock and call phydev->drv->probe(phydev) ?
> > 
> > The main purpose of the lock is to protect members of phydev, such as
> > link, speed, duplex, which can be inconsistent when the lock is not
> > held. But the PHY is not attached to a MAC yet, so a MAC cannot be
> > using it, and those members of phydev are not valid yet anyway.
> > 
> > The lock also prevents parallel operation on the device by phylib, but
> > I cannot think of how that could happen at this early stage in the
> > life of the PHY.
> > 
> > So maybe we can move the mutex_lock() after the call to
> > phydev->drv->probe()?
> 
> That's what I've been thinking too - I dug back in the history, and
> it was a spin_lock_bh(), and before that it was a spin_lock().
> 
> The patch that converted it to a spin_lock_bh() is a brilliant
> example of a poor commit message "Lock debugging finds a problem"
> but doesn't say _what_ the problem was! Going back further still, the
> spin_lock() was there from the very beginnings of PHYLIB. So the
> reasoning for having a lock here has been lost in the depths of time.
> 
> The lock certainly doesn't prevent any interaction with
> phy_attach_direct(), so it seems to be utterly pointless to take
> the lock in the probe() function.
> 
> So yes, I agree, we can move the lock - and I wonder whether we
> could just get rid of it completely in phy_probe().

Thinking about this more, I think taking phydev->lock in phy_probe()
and phy_remove() is entirely pointless, so I think we should remove it
from both and be done with this. As I note above, it does nothing to
stop a race between phy_attach_direct() and phy_probe() or even
phy_remove(). So, I think this is entirely sensible:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 71becceb8764..b46a074b27e4 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -3098,8 +3098,6 @@ static int phy_probe(struct device *dev)
 	if (phydrv->flags & PHY_IS_INTERNAL)
 		phydev->is_internal = true;
 
-	mutex_lock(&phydev->lock);
-
 	/* Deassert the reset signal */
 	phy_device_reset(phydev, 0);
 
@@ -3173,8 +3171,6 @@ static int phy_probe(struct device *dev)
 	if (err)
 		phy_device_reset(phydev, 1);
 
-	mutex_unlock(&phydev->lock);
-
 	return err;
 }
 
@@ -3184,9 +3180,7 @@ static int phy_remove(struct device *dev)
 
 	cancel_delayed_work_sync(&phydev->state_queue);
 
-	mutex_lock(&phydev->lock);
 	phydev->state = PHY_DOWN;
-	mutex_unlock(&phydev->lock);
 
 	sfp_bus_del_upstream(phydev->sfp_bus);
 	phydev->sfp_bus = NULL;

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [patch V3 28/33] PCI/MSI: Provide IMS (Interrupt Message Store) support
  2022-11-24 23:26 ` [patch V3 28/33] PCI/MSI: Provide IMS (Interrupt Message Store) support Thomas Gleixner
  2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
  2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
@ 2024-03-27 16:32   ` Bjorn Helgaas
  2024-03-29  1:41     ` Tian, Kevin
  2 siblings, 1 reply; 126+ messages in thread
From: Bjorn Helgaas @ 2024-03-27 16:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Dave Jiang, Alex Williamson, Kevin Tian,
	Dan Williams, Logan Gunthorpe, Ashok Raj, Jon Mason, Allen Hubbe

On Fri, Nov 25, 2022 at 12:26:29AM +0100, Thomas Gleixner wrote:
> IMS (Interrupt Message Store) is a new specification which allows
> implementation specific storage of MSI messages contrary to the
> strict standard specified MSI and MSI-X message stores.
> ...

> + * pci_create_ims_domain - Create a secondary IMS domain for a PCI device

> + * Return: True on success, false otherwise

> +bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
> +			   unsigned int hwsize, void *data)

pci_create_ims_domain() is exported for use by modules, but AFAICT,
there is no in-tree user of this yet.

I assume one is coming, but if there isn't one on the near horizon,
we could/should remove this for now.

Either way, I think "bool" is not the optimal return type because
"pci_create_ims_domain" doesn't lend itself to a "true/false" reading
and most interfaces that actually do something return 0 for success or
a negative errno, so this will look like the opposite in the caller.

I have similar comments about the following interfaces returning
"bool", but they are internal to the PCI core:

  bool pci_create_device_domain()
  bool pci_setup_msi_device_domain()
  bool pci_setup_msix_device_domain()
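
The readability point about return types can be illustrated with two
made-up helpers. This is only a sketch: toy_create_domain_bool() and
toy_create_domain_errno() are hypothetical and do not correspond to
any real PCI/MSI function. They exist only to show how each convention
reads at the call site.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical creator using the bool convention: true means success. */
static bool toy_create_domain_bool(unsigned int hwsize)
{
	return hwsize != 0;
}

/* Hypothetical creator using the errno convention: 0 means success. */
static int toy_create_domain_errno(unsigned int hwsize)
{
	return hwsize ? 0 : -EINVAL;
}

/* Caller of the bool variant: "!" means failure, and the reason is lost. */
static int toy_caller_bool(unsigned int hwsize)
{
	if (!toy_create_domain_bool(hwsize))
		return -ENODEV;  /* caller has to invent an error code */
	return 0;
}

/* Caller of the errno variant: nonzero means failure and is passed on. */
static int toy_caller_errno(unsigned int hwsize)
{
	int ret = toy_create_domain_errno(hwsize);

	if (ret)
		return ret;
	return 0;
}
```

With both styles mixed in one code base, "if (!create(...))" is the
error branch for one helper and the success branch for the other,
which is the confusion described above.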

Bjorn

^ permalink raw reply	[flat|nested] 126+ messages in thread

* RE: [patch V3 28/33] PCI/MSI: Provide IMS (Interrupt Message Store) support
  2024-03-27 16:32   ` [patch V3 28/33] " Bjorn Helgaas
@ 2024-03-29  1:41     ` Tian, Kevin
  0 siblings, 0 replies; 126+ messages in thread
From: Tian, Kevin @ 2024-03-29  1:41 UTC (permalink / raw)
  To: Bjorn Helgaas, Thomas Gleixner
  Cc: LKML, x86, Joerg Roedel, Will Deacon, linux-pci, Bjorn Helgaas,
	Lorenzo Pieralisi, Marc Zyngier, Greg Kroah-Hartman,
	Jason Gunthorpe, Jiang, Dave, Alex Williamson, Williams, Dan J,
	Logan Gunthorpe, Raj, Ashok, Jon Mason, Allen Hubbe

> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Thursday, March 28, 2024 12:33 AM
> 
> On Fri, Nov 25, 2022 at 12:26:29AM +0100, Thomas Gleixner wrote:
> > IMS (Interrupt Message Store) is a new specification which allows
> > implementation specific storage of MSI messages contrary to the
> > strict standard specified MSI and MSI-X message stores.
> > ...
> 
> > + * pci_create_ims_domain - Create a secondary IMS domain for a PCI device
> 
> > + * Return: True on success, false otherwise
> 
> > +bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
> > +			   unsigned int hwsize, void *data)
> 
> pci_create_ims_domain() is exported for use by modules, but AFAICT,
> there is no in-tree user of this yet.
> 
> I assume one is coming, but if there isn't one on the near horizon,
> we could/should remove this for now.
> 

There won't be one in the near term. So I agree it's a good idea to
remove it. Anyway this can be easily added back when the real
user comes.

^ permalink raw reply	[flat|nested] 126+ messages in thread

end of thread, other threads:[~2024-03-29  1:41 UTC | newest]

Thread overview: 126+ messages
2022-11-24 23:25 [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Thomas Gleixner
2022-11-24 23:25 ` [patch V3 01/33] genirq/msi: Rearrange MSI domain flags Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-11-24 23:25 ` [patch V3 02/33] genirq/msi: Provide struct msi_parent_ops Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-11-24 23:25 ` [patch V3 03/33] genirq/msi: Provide data structs for per device domains Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-11-24 23:25 ` [patch V3 04/33] genirq/msi: Add size info to struct msi_domain_info Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-11-24 23:25 ` [patch V3 05/33] genirq/msi: Split msi_create_irq_domain() Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-11-24 23:25 ` [patch V3 06/33] genirq/irqdomain: Add irq_domain::dev for per device MSI domains Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] genirq/irqdomain: Add irq_domain:: Dev " tip-bot2 for Thomas Gleixner
2022-11-24 23:25 ` [patch V3 07/33] genirq/msi: Provide msi_create/free_device_irq_domain() Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-11-24 23:25 ` [patch V3 08/33] genirq/msi: Provide msi_match_device_domain() Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-11-24 23:25 ` [patch V3 09/33] genirq/msi: Add range checking to msi_insert_desc() Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-12-13 19:04   ` [patch V3 09/33] " Guenter Roeck
2022-12-14  9:42     ` Niklas Schnelle
2022-12-15 14:49       ` Thomas Gleixner
2022-12-15 16:23         ` Matthew Rosato
2022-12-15 21:32           ` Guenter Roeck
2022-12-16  9:53           ` Marc Zyngier
2022-12-16 13:50             ` Matthew Rosato
2022-12-16 13:58               ` Marc Zyngier
2022-12-16 14:03                 ` Marc Zyngier
2022-12-16 14:11                   ` Matthew Rosato
2022-12-16 17:30                     ` Marc Zyngier
2022-12-16 15:47                 ` Guenter Roeck
2022-12-17  0:45                 ` Guenter Roeck
2022-12-17 10:46                   ` Marc Zyngier
2022-12-17 13:36                     ` Guenter Roeck
2023-02-20 17:11   ` [REGRESSION] " Russell King (Oracle)
2023-02-20 18:29     ` Marc Zyngier
2023-02-20 18:43       ` Thomas Gleixner
2023-02-20 19:00       ` Russell King (Oracle)
2023-02-20 19:17       ` Russell King (Oracle)
2023-02-20 19:43         ` Andrew Lunn
2023-02-20 20:15           ` phylib locking (was: Re: [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking) " Russell King (Oracle)
2023-02-21 14:57             ` Russell King (Oracle)
2023-02-20 18:30     ` [REGRESSION] Re: [patch V3 09/33] genirq/msi: Add range checking " Thomas Gleixner
2022-11-24 23:26 ` [patch V3 10/33] PCI/MSI: Split __pci_write_msi_msg() Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 11/33] genirq/msi: Provide BUS_DEVICE_PCI_MSI[X] Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 12/33] PCI/MSI: Add support for per device MSI[X] domains Thomas Gleixner
2022-11-28  4:46   ` Tian, Kevin
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 13/33] x86/apic/vector: Provide MSI parent domain Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2023-01-04 12:34   ` [patch V3 13/33] " Jason Gunthorpe
2023-01-09 20:32     ` Thomas Gleixner
2023-01-10 12:14     ` Thomas Gleixner
2023-01-10 14:59       ` Jason Gunthorpe
2023-01-11 16:02         ` Kalle Valo
2023-01-11 16:35           ` Jason Gunthorpe
2023-01-11 17:07             ` Kalle Valo
2022-11-24 23:26 ` [patch V3 14/33] PCI/MSI: Remove unused pci_dev_has_special_msi_domain() Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 15/33] iommu/vt-d: Switch to MSI parent domains Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 16/33] iommu/amd: Switch to MSI base domains Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 17/33] x86/apic/msi: Remove arch_create_remap_msi_irq_domain() Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 18/33] genirq/msi: Provide struct msi_map Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 19/33] genirq/msi: Provide msi_desc::msi_data Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] genirq/msi: Provide msi_desc:: Msi_data tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 20/33] genirq/msi: Provide msi_domain_ops::prepare_desc() Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] genirq/msi: Provide msi_domain_ops:: Prepare_desc() tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 21/33] genirq/msi: Provide msi_domain_alloc_irq_at() Thomas Gleixner
2022-11-28 14:39   ` Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 22/33] genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 23/33] PCI/MSI: Split MSI-X descriptor setup Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 24/33] PCI/MSI: Provide prepare_desc() MSI domain op Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 25/33] PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X Thomas Gleixner
2022-11-24 23:26 ` [patch V3 26/33] x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 27/33] genirq/msi: Provide constants for PCI/IMS support Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 28/33] PCI/MSI: Provide IMS (Interrupt Message Store) support Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2024-03-27 16:32   ` [patch V3 28/33] " Bjorn Helgaas
2024-03-29  1:41     ` Tian, Kevin
2022-11-24 23:26 ` [patch V3 29/33] PCI/MSI: Provide pci_ims_alloc/free_irq() Thomas Gleixner
2022-11-28  4:47   ` Tian, Kevin
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 30/33] x86/apic/msi: Enable PCI/IMS Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 31/33] iommu/vt-d: " Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 32/33] iommu/amd: " Thomas Gleixner
2022-12-05 18:25   ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2022-12-05 21:41   ` tip-bot2 for Thomas Gleixner
2022-11-24 23:26 ` [patch V3 33/33] irqchip: Add IDXD Interrupt Message Store driver Thomas Gleixner
2022-11-28  4:50 ` [patch V3 00/33] genirq, PCI/MSI: Support for per device MSI and PCI/IMS - Part 3 implementation Tian, Kevin
2022-12-05 11:07 ` Marc Zyngier
