linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes
@ 2016-07-19 13:02 Eric Auger
  2016-07-19 13:02 ` [PATCH v11 01/10] genirq/msi: export msi_get_domain_info Eric Auger
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes

This series implements the MSI address mapping/unmapping in the MSI layer.
IOMMU binding happens on pci_enable_msi since this function can sleep and
return errors. On msi_domain_set_affinity, msi_domain_(de)activate, which
are not allowed to sleep, we simply look for the already existing binding.

Irqchips likely to be downstream to iommus (not bypassing MSIs) are supposed
to register their MSI doorbells. This make possible to retrieve their
characteristics, detect whether MSI assignment is safe and report to the
userspace the size/alignment of the guest PA window to provision for MSI
mapping.

A new MSI domain info flag value is introduced to report whether the msi
domain implements IRQ remapping. GIC v3 ITS is the first MSI controller
advertising it. This flag value will be used by VFIO subsystem to
determine whether MSI forwarding is safe.

More details & context can be found at:
http://www.linaro.org/blog/core-dump/kvm-pciemsi-passthrough-armarm64/

Best Regards

Eric

Git: complete series available at
https://github.com/eauger/linux/tree/v4.7-rc7-passthrough-v11

History:

v10 -> v11:
- restored irq_chip msi_doorbell_info since lookup function introduced
  in v10 (taking the chip_data as parameter) did not work for ITS and
  most probably for other irqchips/
- changed the registration API
- eventually tested with GICv3 ITS

v9 -> v10:
- was forced to introduce important changes on parts that were reviewed
  already :-( I took the initiative to replace the irqchip's
  get_doorbell_info callback by a new API, msi-doorbell).
  the new API makes possible to register, lookup doorbells and also compute
  the total requirements and IRQ safety flag used by VFIO.
- also added code in GICv3 ITS to register a global doorbell.

v8 -> v9:
- use a union in irq_chip_msi_doorbell_info + boolean telling whether the
  doorbell is percpu
- decouple irq_data parsing from the actual mapping/unmapping in
  msi_handle_doorbell_mappings
- fix misc style issues

v7 -> v8:
take into account Marc's comments:
- use iommu_msi_msg_pa_to_va with new proto
- change in irq_chip_msi_doorbell_info struct definition:
  prot and size became shared between all doorbells and phys_addr_t __percpu
- cleanups in v2m irqchip
- eventually did not touch MSI_FLAG_IRQ_REMAPPING naming
- On msi_handle_doorbell_mappings, stop on the first irqchip where doorbells
  can be found
- fix resource deallocation on mapping failure in msi_domain_alloc_irqs

v6 -> v7:
- do alloc/map handling on pci_enable_msi and search on msi_(de)domain_activate
- add msi_doorbell_info callback in irq-chip to retrieve the characteristics
  of doorbells

RFC v5 -> patch v6:
- split to ease the review process
- rebase on default iommu domain code (irq_data_to_msi_mapping_domain
  checks IOMMU_DOMAIN_DMA type)
- fix unmap sequence on msi_domain_set_affinity (reported by Marc):
  unmap the previous doorbell when the new one has been mapped & written to
  the device, ie. irq_chip_write_msi_msg.
- "msi: msi_compose wrapper removed" following change above
- add size parameter to iommu_get_reserved_iova API following Marc's request

RFC v4 -> RFC v5:
- take into account Thomas' comments on MSI related patches
  - split "msi: IOMMU map the doorbell address when needed"
  - increase readability and add comments
  - fix style issues
 - split "iommu: Add DOMAIN_ATTR_MSI_MAPPING attribute"
 - platform ITS now advertises IOMMU_CAP_INTR_REMAP
 - fix compilation issue with CONFIG_IOMMU API unset
 - arm-smmu-v3 now advertises DOMAIN_ATTR_MSI_MAPPING

RFC v3 -> v4:
- Move doorbell mapping/unmapping in msi.c
- fix ref count issue on set_affinity: in case of a change in the address
  the previous address is decremented
- doorbell map/unmap now is done on msi composition. Should allow the use
  case for platform MSI controllers
- create dma-reserved-iommu.h/c exposing/implementing a new API dedicated
  to reserved IOVA management (looking like dma-iommu glue)
- series reordering to ease the review:
  - first part is related to IOMMU
  - second related to MSI sub-system
  - third related to VFIO (except arm-smmu IOMMU_CAP_INTR_REMAP removal)
- expose the number of requested IOVA pages through VFIO_IOMMU_GET_INFO
  [this partially addresses Marc's comments on iommu_get/put_single_reserved
   size/alignment problematic - which I did not ignore - but I don't know
   how much I can do at the moment]

RFC v2 -> RFC v3:
- should fix wrong handling of some CONFIG combinations:
  CONFIG_IOVA, CONFIG_IOMMU_API, CONFIG_PCI_MSI_IRQ_DOMAIN
- fix MSI_FLAG_IRQ_REMAPPING setting in GICv3 ITS (although not tested)

PATCH v1 -> RFC v2:
- reverted to RFC since it looks more reasonable ;-) the code is split
  between VFIO, IOMMU, MSI controller and I am not sure I did the right
  choices. Also API need to be further discussed.
- iova API usage in arm-smmu.c.
- MSI controller natively programs the MSI addr with either the PA or IOVA.
  This is not done anymore in vfio-pci driver as suggested by Alex.
- check irq remapping capability of the group

RFC v1 [2] -> PATCH v1:
- use the existing dma map/unmap ioctl interface with a flag to register a
  reserved IOVA range. Use the legacy Rb to store this special vfio_dma.
- a single reserved IOVA contiguous region now is allowed
- use of an RB tree indexed by PA to store allocated reserved slots
- use of a vfio_domain iova_domain to manage iova allocation within the
  window provided by the userspace
- vfio alloc_map/unmap_free take a vfio_group handle
- vfio_group handle is cached in vfio_pci_device
- add ref counting to bindings
- user modality enabled at the end of the series


Eric Auger (10):
  genirq/msi: export msi_get_domain_info
  genirq/msi: msi_compose wrapper
  genirq/irq: introduce msi_doorbell_info
  genirq/msi-doorbell: allow MSI doorbell (un)registration
  genirq/msi-doorbell: msi_doorbell_pages
  genirq/msi-doorbell: msi_doorbell_safe
  irqchip/gicv2m: register the MSI global doorbell
  irqchip/gicv3-its: register the MSI global doorbell
  genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs
  genirq/msi: use the MSI doorbell's IOVA when requested

 drivers/iommu/Kconfig            |   1 +
 drivers/irqchip/irq-gic-v2m.c    |  23 ++++++-
 drivers/irqchip/irq-gic-v3-its.c |  28 ++++++++
 include/linux/irq.h              |  16 ++++-
 include/linux/msi-doorbell.h     |  78 ++++++++++++++++++++++
 kernel/irq/Kconfig               |   4 ++
 kernel/irq/Makefile              |   1 +
 kernel/irq/msi-doorbell.c        | 129 ++++++++++++++++++++++++++++++++++++
 kernel/irq/msi.c                 | 137 +++++++++++++++++++++++++++++++++++----
 9 files changed, 404 insertions(+), 13 deletions(-)
 create mode 100644 include/linux/msi-doorbell.h
 create mode 100644 kernel/irq/msi-doorbell.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 01/10] genirq/msi: export msi_get_domain_info
  2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
@ 2016-07-19 13:02 ` Eric Auger
  2016-07-19 13:02 ` [PATCH v11 02/10] genirq/msi: msi_compose wrapper Eric Auger
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

We plan to use msi_get_domain_info in VFIO module so let's export it.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v2 -> v3:
- remove static implementation in case CONFIG_PCI_MSI_IRQ_DOMAIN is not set
---
 kernel/irq/msi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 38e89ce..9b0ba4a 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -400,5 +400,6 @@ struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain)
 {
 	return (struct msi_domain_info *)domain->host_data;
 }
+EXPORT_SYMBOL_GPL(msi_get_domain_info);
 
 #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v11 02/10] genirq/msi: msi_compose wrapper
  2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
  2016-07-19 13:02 ` [PATCH v11 01/10] genirq/msi: export msi_get_domain_info Eric Auger
@ 2016-07-19 13:02 ` Eric Auger
  2016-07-19 13:02 ` [PATCH v11 03/10] genirq/irq: introduce msi_doorbell_info Eric Auger
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

Currently the MSI message is composed by directly calling
irq_chip_compose_msi_msg and erased by setting the memory to zero.

On some platforms, we will need to complexify this composition to
properly handle MSI emission through IOMMU. Also we will need to track
when the MSI message is erased.

We propose to introduce a common wrapper for actual composition and
erasure, msi_compose.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v4 -> v5:
- just introduce the msi-compose wrapper without adding new
  functionalities

v3 -> v4:
- that code was formely in irq-gic-common.c
  "irqchip/gicv2m/v3-its-pci-msi: IOMMU map the MSI frame when needed"
  also the [un]mapping was done in irq_write_msi_msg; now done on compose

v2 -> v3:
- protect iova/addr manipulation with CONFIG_ARCH_DMA_ADDR_T_64BIT and
  CONFIG_PHYS_ADDR_T_64BIT
- only expose gic_pci_msi_domain_write_msg in case CONFIG_IOMMU_API &
  CONFIG_PCI_MSI_IRQ_DOMAIN are set.
- gic_set/unset_msi_addr duly become static
---
 kernel/irq/msi.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 9b0ba4a..72bf4d6 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -55,6 +55,19 @@ static inline void irq_chip_write_msi_msg(struct irq_data *data,
 	data->chip->irq_write_msi_msg(data, msg);
 }
 
+static int msi_compose(struct irq_data *irq_data,
+		       struct msi_msg *msg, bool erase)
+{
+	int ret = 0;
+
+	if (erase)
+		memset(msg, 0, sizeof(*msg));
+	else
+		ret = irq_chip_compose_msi_msg(irq_data, msg);
+
+	return ret;
+}
+
 /**
  * msi_domain_set_affinity - Generic affinity setter function for MSI domains
  * @irq_data:	The irq data associated to the interrupt
@@ -73,7 +86,7 @@ int msi_domain_set_affinity(struct irq_data *irq_data,
 
 	ret = parent->chip->irq_set_affinity(parent, mask, force);
 	if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) {
-		BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg));
+		BUG_ON(msi_compose(irq_data, &msg, false));
 		irq_chip_write_msi_msg(irq_data, &msg);
 	}
 
@@ -85,7 +98,7 @@ static void msi_domain_activate(struct irq_domain *domain,
 {
 	struct msi_msg msg;
 
-	BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg));
+	BUG_ON(msi_compose(irq_data, &msg, false));
 	irq_chip_write_msi_msg(irq_data, &msg);
 }
 
@@ -94,7 +107,7 @@ static void msi_domain_deactivate(struct irq_domain *domain,
 {
 	struct msi_msg msg;
 
-	memset(&msg, 0, sizeof(msg));
+	msi_compose(irq_data, &msg, true);
 	irq_chip_write_msi_msg(irq_data, &msg);
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v11 03/10] genirq/irq: introduce msi_doorbell_info
  2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
  2016-07-19 13:02 ` [PATCH v11 01/10] genirq/msi: export msi_get_domain_info Eric Auger
  2016-07-19 13:02 ` [PATCH v11 02/10] genirq/msi: msi_compose wrapper Eric Auger
@ 2016-07-19 13:02 ` Eric Auger
  2016-07-19 14:15   ` Thomas Gleixner
  2016-07-19 13:02 ` [PATCH v11 04/10] genirq/msi-doorbell: allow MSI doorbell (un)registration Eric Auger
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

From: Eric Auger <eric.auger@linaro.org>

The purpose is to be able to retrieve the MSI doorbells of an irqchip.
This is now needed since on some platforms those doorbells must be
iommu mapped (in case the MSIs transit through an IOMMU that do not
bypass those transactions).

The assumption is there is a maximum of one doorbell region per cpu. The
doorbell can be global or per cpu.

A doorbell region is characterized by its physical address base, size,
IOMMU protection flag and whether it implements IRQ remapping (aka.
IRQ translation). Those characteristics are shared among all doorbells.

irq_chip msi_doorbell_info callback enables to retrieve the doorbells of
the irqchip.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---
v10 -> v11:
- disappeared in V10 and restored now. struct irq_chip_msi_doorbell_info
  is identifical to the one in V10 (union, irq_remapping field).

v7 -> v8:
- size and prot now are shared among all doorbells
- doorbells now directly points to a percpu phys_addr_t

v7: creation
---
 include/linux/irq.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 4d758a7..2e355d5 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -312,6 +312,18 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
 	return d->hwirq;
 }
 
+/* Describe all the MSI doorbell regions for an irqchip */
+struct irq_chip_msi_doorbell_info {
+	union {
+		phys_addr_t __percpu *percpu_doorbells;
+		phys_addr_t global_doorbell;
+	};
+	bool doorbell_is_percpu;
+	bool irq_remapping;	/* is irq_remapping implemented? */
+	size_t size;				/* size of each doorbell */
+	int prot;				/* iommu protection flag */
+};
+
 /**
  * struct irq_chip - hardware interrupt chip descriptor
  *
@@ -349,6 +361,7 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
  * @irq_get_irqchip_state:	return the internal state of an interrupt
  * @irq_set_irqchip_state:	set the internal state of a interrupt
  * @irq_set_vcpu_affinity:	optional to target a vCPU in a virtual machine
+ * @msi_doorbell_info:	return the MSI doorbell info
  * @ipi_send_single:	send a single IPI to destination cpus
  * @ipi_send_mask:	send an IPI to destination cpus in cpumask
  * @flags:		chip specific flags
@@ -394,7 +407,8 @@ struct irq_chip {
 	int		(*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state);
 
 	int		(*irq_set_vcpu_affinity)(struct irq_data *data, void *vcpu_info);
-
+	struct irq_chip_msi_doorbell_info *(*msi_doorbell_info)(
+							struct irq_data *data);
 	void		(*ipi_send_single)(struct irq_data *data, unsigned int cpu);
 	void		(*ipi_send_mask)(struct irq_data *data, const struct cpumask *dest);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v11 04/10] genirq/msi-doorbell: allow MSI doorbell (un)registration
  2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
                   ` (2 preceding siblings ...)
  2016-07-19 13:02 ` [PATCH v11 03/10] genirq/irq: introduce msi_doorbell_info Eric Auger
@ 2016-07-19 13:02 ` Eric Auger
  2016-07-19 14:22   ` Thomas Gleixner
  2016-07-19 13:02 ` [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages Eric Auger
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

This new API aims at allowing irqchips to allocate & register
the MSI doorbells likely to be iommu mapped.

Later on, other services will be added allowing the VFIO layer
to query information based on all registered doorbells.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- remove void *chip_data argument from register/unregister function
- remove lookup funtions since we restored the struct irq_chip
  msi_doorbell_info ops to realize this function
- reword commit message and title
---
 drivers/iommu/Kconfig        |  1 +
 include/linux/msi-doorbell.h | 52 +++++++++++++++++++++++++++++++++++++
 kernel/irq/Kconfig           |  4 +++
 kernel/irq/Makefile          |  1 +
 kernel/irq/msi-doorbell.c    | 62 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 120 insertions(+)
 create mode 100644 include/linux/msi-doorbell.h
 create mode 100644 kernel/irq/msi-doorbell.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 5ea1610..ba54146 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -78,6 +78,7 @@ config IOMMU_DMA
 config IOMMU_MSI
 	bool
 	select IOMMU_DMA
+	select MSI_DOORBELL
 
 config FSL_PAMU
 	bool "Freescale IOMMU support"
diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
new file mode 100644
index 0000000..c455e33
--- /dev/null
+++ b/include/linux/msi-doorbell.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2016 Eric Auger
+ *
+ * Eric Auger <eric.auger@linaro.org>
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2.  This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#ifndef _LINUX_MSI_DOORBELL_H
+#define _LINUX_MSI_DOORBELL_H
+
+#include <linux/irq.h>
+
+#ifdef CONFIG_MSI_DOORBELL
+
+/**
+ * msi_doorbell_register_global: allocate and register a global doorbell
+ *
+ * @base: physical base address of the global doorbell
+ * @size: size of the global doorbell
+ * @prot: protection/memory attributes
+ * @irq_remapping: is irq_remapping implemented for this doorbell
+ * returns the newly allocated doorbell info handle
+ */
+struct irq_chip_msi_doorbell_info *
+msi_doorbell_register_global(phys_addr_t base, size_t size,
+			     int prot, bool irq_remapping);
+
+/**
+ * msi_doorbell_unregister: remove a doorbell from the list of registered
+ * doorbells and deallocates its
+ * @db: doorbell info to unregister
+ */
+void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db);
+
+#else
+
+static inline struct irq_chip_msi_doorbell_info *
+msi_doorbell_register_global(phys_addr_t base, size_t size,
+			     int prot, bool irq_remapping)
+{
+	return ERR_PTR(-ENOENT);
+}
+
+static inline void
+msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db) {}
+
+#endif /* CONFIG_MSI_DOORBELL */
+
+#endif
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 3bbfd6a..d4faaaa 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -72,6 +72,10 @@ config GENERIC_IRQ_IPI
 config GENERIC_MSI_IRQ
 	bool
 
+# MSI doorbell support (for doorbell IOMMU mapping)
+config MSI_DOORBELL
+	bool
+
 # Generic MSI hierarchical interrupt domain support
 config GENERIC_MSI_IRQ_DOMAIN
 	bool
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index 2ee42e9..be02dfd 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_GENERIC_IRQ_MIGRATION) += cpuhotplug.o
 obj-$(CONFIG_PM_SLEEP) += pm.o
 obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
 obj-$(CONFIG_GENERIC_IRQ_IPI) += ipi.o
+obj-$(CONFIG_MSI_DOORBELL) += msi-doorbell.o
diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
new file mode 100644
index 0000000..0ff541e
--- /dev/null
+++ b/kernel/irq/msi-doorbell.c
@@ -0,0 +1,62 @@
+/*
+ * linux/kernel/irq/msi-doorbell.c
+ *
+ * Copyright (C) 2016 Linaro
+ * Author: Eric Auger <eric.auger@linaro.org>
+ *
+ * This file is licensed under GPLv2.
+ *
+ * This file contains common code to manage MSI doorbells likely
+ * to be iommu mapped. Typically meaningful on ARM.
+ */
+
+#include <linux/slab.h>
+#include <linux/irq.h>
+#include <linux/msi-doorbell.h>
+
+struct irqchip_doorbell {
+	struct irq_chip_msi_doorbell_info info;
+	struct list_head next;
+};
+
+static LIST_HEAD(irqchip_doorbell_list);
+static DEFINE_MUTEX(irqchip_doorbell_mutex);
+
+struct irq_chip_msi_doorbell_info *
+msi_doorbell_register_global(phys_addr_t base, size_t size,
+			     int prot, bool irq_remapping)
+{
+	struct irqchip_doorbell *db;
+
+	db = kmalloc(sizeof(*db), GFP_KERNEL);
+	if (!db)
+		return ERR_PTR(-ENOMEM);
+
+	db->info.doorbell_is_percpu = false;
+	db->info.global_doorbell = base;
+	db->info.size = size;
+	db->info.prot = prot;
+	db->info.irq_remapping = irq_remapping;
+
+	mutex_lock(&irqchip_doorbell_mutex);
+	list_add(&db->next, &irqchip_doorbell_list);
+	mutex_unlock(&irqchip_doorbell_mutex);
+	return &db->info;
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_register_global);
+
+void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
+{
+	struct irqchip_doorbell *db, *tmp;
+
+	mutex_lock(&irqchip_doorbell_mutex);
+	list_for_each_entry_safe(db, tmp, &irqchip_doorbell_list, next) {
+		if (dbinfo == &db->info) {
+			list_del(&db->next);
+			kfree(db);
+			break;
+		}
+	}
+	mutex_unlock(&irqchip_doorbell_mutex);
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages
  2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
                   ` (3 preceding siblings ...)
  2016-07-19 13:02 ` [PATCH v11 04/10] genirq/msi-doorbell: allow MSI doorbell (un)registration Eric Auger
@ 2016-07-19 13:02 ` Eric Auger
  2016-07-19 14:38   ` Thomas Gleixner
  2016-07-19 13:02 ` [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe Eric Auger
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

msi_doorbell_pages sum up the number of iommu pages of a given order
requested to map all the registered doorbells. This function will allow
to dimension the intermediate physical address (IPA) aperture requested
to map the MSI doorbells.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10: creation
---
 include/linux/msi-doorbell.h | 14 ++++++++++++
 kernel/irq/msi-doorbell.c    | 52 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
index c455e33..146bfbc 100644
--- a/include/linux/msi-doorbell.h
+++ b/include/linux/msi-doorbell.h
@@ -35,6 +35,14 @@ msi_doorbell_register_global(phys_addr_t base, size_t size,
  */
 void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db);
 
+/**
+ * msi_doorbell_pages: compute the number of iommu pages of size 1 << order
+ * requested to map all the registered doorbells
+ *
+ * @order: iommu page order
+ */
+int msi_doorbell_pages(unsigned int order);
+
 #else
 
 static inline struct irq_chip_msi_doorbell_info *
@@ -47,6 +55,12 @@ msi_doorbell_register_global(phys_addr_t base, size_t size,
 static inline void
 msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db) {}
 
+static inline int
+msi_doorbell_pages(unsigned int order)
+{
+	return 0;
+}
+
 #endif /* CONFIG_MSI_DOORBELL */
 
 #endif
diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
index 0ff541e..a5bde37 100644
--- a/kernel/irq/msi-doorbell.c
+++ b/kernel/irq/msi-doorbell.c
@@ -60,3 +60,55 @@ void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
 	mutex_unlock(&irqchip_doorbell_mutex);
 }
 EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
+
+static int compute_db_mapping_requirements(phys_addr_t addr, size_t size,
+					   unsigned int order)
+{
+	phys_addr_t offset, granule;
+	unsigned int nb_pages;
+
+	granule = (uint64_t)(1 << order);
+	offset = addr & (granule - 1);
+	size = ALIGN(size + offset, granule);
+	nb_pages = size >> order;
+
+	return nb_pages;
+}
+
+static int
+compute_dbinfo_mapping_requirements(struct irq_chip_msi_doorbell_info *dbinfo,
+				    unsigned int order)
+{
+	int ret = 0;
+
+	if (!dbinfo->doorbell_is_percpu) {
+		ret = compute_db_mapping_requirements(dbinfo->global_doorbell,
+						      dbinfo->size, order);
+	} else {
+		phys_addr_t __percpu *pbase;
+		int cpu;
+
+		for_each_possible_cpu(cpu) {
+			pbase = per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
+			ret += compute_db_mapping_requirements(*pbase,
+							       dbinfo->size,
+							       order);
+		}
+	}
+	return ret;
+}
+
+int msi_doorbell_pages(unsigned int order)
+{
+	struct irqchip_doorbell *db;
+	int ret = 0;
+
+	mutex_lock(&irqchip_doorbell_mutex);
+	list_for_each_entry(db, &irqchip_doorbell_list, next) {
+		ret += compute_dbinfo_mapping_requirements(&db->info, order);
+	}
+	mutex_unlock(&irqchip_doorbell_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_pages);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe
  2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
                   ` (4 preceding siblings ...)
  2016-07-19 13:02 ` [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages Eric Auger
@ 2016-07-19 13:02 ` Eric Auger
  2016-07-20  8:12   ` Thomas Gleixner
  2016-07-19 13:02 ` [PATCH v11 07/10] irqchip/gicv2m: register the MSI global doorbell Eric Auger
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

msi_doorbell_safe returns whether all the registered doorbells
implement irq_remapping.

When assigning a PCIe device whose host controller is upstream to
an IOMMU, we currently do not know whether the MSI controller is
upstream or downstream to the IOMMU.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/linux/msi-doorbell.h | 12 ++++++++++++
 kernel/irq/msi-doorbell.c    | 15 +++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/msi-doorbell.h b/include/linux/msi-doorbell.h
index 146bfbc..9c30141 100644
--- a/include/linux/msi-doorbell.h
+++ b/include/linux/msi-doorbell.h
@@ -43,6 +43,13 @@ void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db);
  */
 int msi_doorbell_pages(unsigned int order);
 
+/**
+ * msi_doorbell_safe: return whether all registered doorbells
+ * do implement irq_remapping and are safe to assign (coarse safety
+ * assessment)
+ */
+bool msi_doorbell_safe(void);
+
 #else
 
 static inline struct irq_chip_msi_doorbell_info *
@@ -61,6 +68,11 @@ msi_doorbell_pages(unsigned int order)
 	return 0;
 }
 
+static inline bool
+msi_doorbell_safe(void)
+{
+	return true;
+}
 #endif /* CONFIG_MSI_DOORBELL */
 
 #endif
diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
index a5bde37..fa5b429 100644
--- a/kernel/irq/msi-doorbell.c
+++ b/kernel/irq/msi-doorbell.c
@@ -112,3 +112,18 @@ int msi_doorbell_pages(unsigned int order)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(msi_doorbell_pages);
+
+bool msi_doorbell_safe(void)
+{
+	struct irqchip_doorbell *db;
+	bool irq_remapping = true;
+
+	mutex_lock(&irqchip_doorbell_mutex);
+	list_for_each_entry(db, &irqchip_doorbell_list, next) {
+		irq_remapping &= db->info.irq_remapping;
+	}
+	mutex_unlock(&irqchip_doorbell_mutex);
+
+	return irq_remapping;
+}
+EXPORT_SYMBOL_GPL(msi_doorbell_safe);
\ No newline at end of file
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v11 07/10] irqchip/gicv2m: register the MSI global doorbell
  2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
                   ` (5 preceding siblings ...)
  2016-07-19 13:02 ` [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe Eric Auger
@ 2016-07-19 13:02 ` Eric Auger
  2016-07-19 13:02 ` [PATCH v11 08/10] irqchip/gicv3-its: " Eric Auger
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

Use the msi-doorbell API to register the global doorbell
and implement the msi_doorbell_info

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v10 -> v11:
- use the new registration API and re-implement the msi_doorbell_info
  ops

v9 -> v10:
- introduce the registration concept in place of msi_doorbell_info
  callback

v8 -> v9:
- use global_doorbell instead of percpu_doorbells

v7 -> v8:
- gicv2m_msi_doorbell_info does not return a pointer to const
- remove spurious !v2m check
- add IOMMU_MMIO flag

v7: creation
---
 drivers/irqchip/irq-gic-v2m.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index ad0d296..25a32da 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -24,6 +24,8 @@
 #include <linux/of_pci.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/iommu.h>
+#include <linux/msi-doorbell.h>
 
 /*
 * MSI_TYPER:
@@ -68,6 +70,7 @@ struct v2m_data {
 	u32 spi_offset;		/* offset to be subtracted from SPI number */
 	unsigned long *bm;	/* MSI vector bitmap */
 	u32 flags;		/* v2m flags for specific implementation */
+	struct irq_chip_msi_doorbell_info *doorbell_info; /* MSI doorbell */
 };
 
 static void gicv2m_mask_msi_irq(struct irq_data *d)
@@ -109,6 +112,14 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 		msg->data -= v2m->spi_offset;
 }
 
+static struct irq_chip_msi_doorbell_info *
+gicv2m_msi_doorbell_info(struct irq_data *data)
+{
+	struct v2m_data *v2m = irq_data_get_irq_chip_data(data);
+
+	return v2m->doorbell_info;
+}
+
 static struct irq_chip gicv2m_irq_chip = {
 	.name			= "GICv2m",
 	.irq_mask		= irq_chip_mask_parent,
@@ -116,6 +127,7 @@ static struct irq_chip gicv2m_irq_chip = {
 	.irq_eoi		= irq_chip_eoi_parent,
 	.irq_set_affinity	= irq_chip_set_affinity_parent,
 	.irq_compose_msi_msg	= gicv2m_compose_msi_msg,
+	.msi_doorbell_info	= gicv2m_msi_doorbell_info,
 };
 
 static int gicv2m_irq_gic_domain_alloc(struct irq_domain *domain,
@@ -366,12 +378,21 @@ static int __init gicv2m_init_one(struct fwnode_handle *fwnode,
 		goto err_iounmap;
 	}
 
+	v2m->doorbell_info =
+		msi_doorbell_register_global(v2m->res.start, sizeof(u32),
+					     IOMMU_WRITE | IOMMU_MMIO, false);
+	if (IS_ERR_OR_NULL(v2m->doorbell_info)) {
+		ret = PTR_ERR(v2m->doorbell_info);
+		goto err_free_bm;
+	}
+
 	list_add_tail(&v2m->entry, &v2m_nodes);
 
 	pr_info("range%pR, SPI[%d:%d]\n", res,
 		v2m->spi_start, (v2m->spi_start + v2m->nr_spis - 1));
 	return 0;
-
+err_free_bm:
+	kfree(v2m->bm);
 err_iounmap:
 	iounmap(v2m->base);
 err_free_v2m:
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v11 08/10] irqchip/gicv3-its: register the MSI global doorbell
  2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
                   ` (6 preceding siblings ...)
  2016-07-19 13:02 ` [PATCH v11 07/10] irqchip/gicv2m: register the MSI global doorbell Eric Auger
@ 2016-07-19 13:02 ` Eric Auger
  2016-07-19 13:02 ` [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs Eric Auger
  2016-07-19 13:02 ` [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested Eric Auger
  9 siblings, 0 replies; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds the registration of the MSI global doorbell in
gicv3-its driver plus the implementation for irq_chip
msi_doorbell_info ops.

This will allow the msi layer to iommu_map this doorbell when
requested.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- adapt to new doorbell registration API and implement msi_doorbell_info
---
 drivers/irqchip/irq-gic-v3-its.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 5eb1f9e..23102e3 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -29,6 +29,8 @@
 #include <linux/of_platform.h>
 #include <linux/percpu.h>
 #include <linux/slab.h>
+#include <linux/iommu.h>
+#include <linux/msi-doorbell.h>
 
 #include <linux/irqchip.h>
 #include <linux/irqchip/arm-gic-v3.h>
@@ -84,6 +86,7 @@ struct its_node {
 	u32			ite_size;
 	u32			device_ids;
 	int			numa_node;
+	struct irq_chip_msi_doorbell_info *doorbell_info;
 };
 
 #define ITS_ITT_ALIGN		SZ_256
@@ -656,6 +659,16 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
 	msg->data		= its_get_event_id(d);
 }
 
+static struct irq_chip_msi_doorbell_info *
+its_msi_doorbell_info(struct irq_data *d)
+{
+	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+	struct its_node *its = its_dev->its;
+
+	return its->doorbell_info;
+}
+
+
 static struct irq_chip its_irq_chip = {
 	.name			= "ITS",
 	.irq_mask		= its_mask_irq,
@@ -663,6 +676,7 @@ static struct irq_chip its_irq_chip = {
 	.irq_eoi		= irq_chip_eoi_parent,
 	.irq_set_affinity	= its_set_affinity,
 	.irq_compose_msi_msg	= its_irq_compose_msi_msg,
+	.msi_doorbell_info	= its_msi_doorbell_info,
 };
 
 /*
@@ -1607,6 +1621,7 @@ static int __init its_probe(struct device_node *node,
 
 	if (of_property_read_bool(node, "msi-controller")) {
 		struct msi_domain_info *info;
+		phys_addr_t translater;
 
 		info = kzalloc(sizeof(*info), GFP_KERNEL);
 		if (!info) {
@@ -1614,10 +1629,23 @@ static int __init its_probe(struct device_node *node,
 			goto out_free_tables;
 		}
 
+		translater = its->phys_base + GITS_TRANSLATER;
+		its->doorbell_info =
+			msi_doorbell_register_global(translater, sizeof(u32),
+						     IOMMU_WRITE | IOMMU_MMIO,
+						     true);
+
+		if (IS_ERR_OR_NULL(its->doorbell_info))  {
+			kfree(info);
+			goto out_free_tables;
+		}
+
+
 		inner_domain = irq_domain_add_tree(node, &its_domain_ops, its);
 		if (!inner_domain) {
 			err = -ENOMEM;
 			kfree(info);
+			msi_doorbell_unregister_global(its->doorbell_info);
 			goto out_free_tables;
 		}
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs
  2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
                   ` (7 preceding siblings ...)
  2016-07-19 13:02 ` [PATCH v11 08/10] irqchip/gicv3-its: " Eric Auger
@ 2016-07-19 13:02 ` Eric Auger
  2016-07-20  9:04   ` Thomas Gleixner
  2016-07-19 13:02 ` [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested Eric Auger
  9 siblings, 1 reply; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

This patch handles the iommu mapping of MSI doorbells that require to
be mapped in an iommu domain. This happens on msi_domain_alloc/free_irqs
since this is called in code that can sleep (pci_enable/disable_msi):
iommu_map/unmap is not stated as atomic. On msi_domain_(de)activate and
msi_domain_set_affinity, which must be atomic, we just lookup for this
pre-allocated/mapped IOVA.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- restore v9 version based on irq_chip msi_doorbell_info

v9 -> v10:
- use irqchip API to lookup for the chip_data's doorbell

v8 -> v9:
- decouple irq_data parsing from the actual mapping/unmapping

v7 -> v8:
- new percpu pointer type
- exit from the irq domain hierarchy parsing on first map/unmap success
- reset desc->irq to 0 on mapping failure

v7: creation
---
 kernel/irq/msi.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 101 insertions(+), 8 deletions(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 72bf4d6..69b5b19 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -14,6 +14,9 @@
 #include <linux/irq.h>
 #include <linux/irqdomain.h>
 #include <linux/msi.h>
+#include <linux/msi-iommu.h>
+#include <linux/iommu.h>
+#include <linux/msi-doorbell.h>
 
 /* Temparory solution for building, will be removed later */
 #include <linux/pci.h>
@@ -322,6 +325,74 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev,
 }
 
 /**
+ * msi_handle_doorbell_mappings: in case the irq data corresponds to an
+ * MSI that requires iommu mapping, traverse the irq domain hierarchy
+ * to retrieve the doorbells to handle and iommu_map/unmap them according
+ * to @map boolean.
+ *
+ * @data: irq data handle
+ * @map: mapping if true, unmapping if false
+ */
+static int msi_handle_doorbell_mappings(struct irq_data *data, bool map)
+{
+	const struct irq_chip_msi_doorbell_info *dbinfo;
+	struct iommu_domain *domain;
+	struct irq_chip *chip;
+	struct device *dev;
+	dma_addr_t iova;
+	int ret = 0, cpu;
+
+	while (data) {
+		dev = msi_desc_to_dev(irq_data_get_msi_desc(data));
+		domain = iommu_msi_domain(dev);
+		if (domain) {
+			chip = irq_data_get_irq_chip(data);
+			if (chip->msi_doorbell_info)
+				break;
+		}
+		data = data->parent_data;
+	}
+
+	if (!data)
+		return 0;
+
+	dbinfo = chip->msi_doorbell_info(data);
+	if (!dbinfo)
+		return -EINVAL;
+
+	if (!dbinfo->doorbell_is_percpu) {
+		if (!map) {
+			iommu_msi_put_doorbell_iova(domain,
+						    dbinfo->global_doorbell);
+			return 0;
+		}
+		return iommu_msi_get_doorbell_iova(domain,
+						   dbinfo->global_doorbell,
+						   dbinfo->size, dbinfo->prot,
+						   &iova);
+	}
+
+	/* percpu doorbells */
+	for_each_possible_cpu(cpu) {
+		phys_addr_t __percpu *db_addr =
+			per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
+
+		if (!map) {
+			iommu_msi_put_doorbell_iova(domain, *db_addr);
+		} else {
+
+			ret = iommu_msi_get_doorbell_iova(domain, *db_addr,
+							  dbinfo->size,
+							  dbinfo->prot, &iova);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+/**
  * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
  * @domain:	The domain to allocate from
  * @dev:	Pointer to device struct of the device for which the interrupts
@@ -352,17 +423,29 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
 
 		virq = __irq_domain_alloc_irqs(domain, virq, desc->nvec_used,
 					       dev_to_node(dev), &arg, false);
-		if (virq < 0) {
-			ret = -ENOSPC;
-			if (ops->handle_error)
-				ret = ops->handle_error(domain, desc, ret);
-			if (ops->msi_finish)
-				ops->msi_finish(&arg, ret);
-			return ret;
-		}
+		if (virq < 0)
+			goto error;
 
 		for (i = 0; i < desc->nvec_used; i++)
 			irq_set_msi_desc_off(virq, i, desc);
+
+		for (i = 0; i < desc->nvec_used; i++) {
+			struct irq_data *d = irq_get_irq_data(virq + i);
+
+			ret = msi_handle_doorbell_mappings(d, true);
+			if (ret)
+				break;
+		}
+		if (ret) {
+			for (; i >= 0; i--) {
+				struct irq_data *d = irq_get_irq_data(virq + i);
+
+				msi_handle_doorbell_mappings(d, false);
+			}
+			irq_domain_free_irqs(virq, desc->nvec_used);
+			desc->irq = 0;
+			goto error;
+		}
 	}
 
 	if (ops->msi_finish)
@@ -377,6 +460,13 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
 	}
 
 	return 0;
+error:
+	ret = -ENOSPC;
+	if (ops->handle_error)
+		ret = ops->handle_error(domain, desc, ret);
+	if (ops->msi_finish)
+		ops->msi_finish(&arg, ret);
+	return ret;
 }
 
 /**
@@ -396,6 +486,9 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
 		 * entry. If that's the case, don't do anything.
 		 */
 		if (desc->irq) {
+			msi_handle_doorbell_mappings(
+				irq_get_irq_data(desc->irq),
+				false);
 			irq_domain_free_irqs(desc->irq, desc->nvec_used);
 			desc->irq = 0;
 		}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested
  2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
                   ` (8 preceding siblings ...)
  2016-07-19 13:02 ` [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs Eric Auger
@ 2016-07-19 13:02 ` Eric Auger
  2016-07-20  9:09   ` Thomas Gleixner
  9 siblings, 1 reply; 30+ messages in thread
From: Eric Auger @ 2016-07-19 13:02 UTC (permalink / raw)
  To: linux-arm-kernel

On MSI message composition we now use the MSI doorbell's IOVA in
place of the doorbell's PA in case the device is upstream to an
IOMMU that requires MSI addresses to be mapped. The doorbell's
allocation and mapping happened on an early stage (pci_enable_msi).

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v8 -> v9:
- Braces on both sides of the 'else' in msi_compose

v7 -> v8:
- use iommu_msi_msg_pa_to_va
- add WARN_ON

v6 -> v7:
- allocation/mapping is done at an earlier stage. We now just perform
  the iova lookup. So it is safe now to be called in a code that cannot
  sleep. iommu_msi_set_doorbell_iova is moved in the dma-reserved-iommu
  API: I think it cleans things up with respect to various #ifdef CONFIGS.

v5:
- use macros to increase the readability
- add comments
- fix a typo that caused a compilation error if CONFIG_IOMMU_API
  is not set
---
 kernel/irq/msi.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 69b5b19..e375544 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
 {
 	int ret = 0;
 
-	if (erase)
+	if (erase) {
 		memset(msg, 0, sizeof(*msg));
-	else
+	} else {
+		struct device *dev;
+
 		ret = irq_chip_compose_msi_msg(irq_data, msg);
+		if (ret)
+			return ret;
+
+		dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
+		WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
+	}
 
 	return ret;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v11 03/10] genirq/irq: introduce msi_doorbell_info
  2016-07-19 13:02 ` [PATCH v11 03/10] genirq/irq: introduce msi_doorbell_info Eric Auger
@ 2016-07-19 14:15   ` Thomas Gleixner
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2016-07-19 14:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 19 Jul 2016, Eric Auger wrote:
>  
> +/* Describe all the MSI doorbell regions for an irqchip */
> +struct irq_chip_msi_doorbell_info {
> +	union {
> +		phys_addr_t __percpu *percpu_doorbells;
> +		phys_addr_t global_doorbell;
> +	};
> +	bool doorbell_is_percpu;
> +	bool irq_remapping;	/* is irq_remapping implemented? */

Please do not use tail comments. Use proper kernel doc for documentation.

> +	size_t size;				/* size of each doorbell */
> +	int prot;				/* iommu protection flag */

Please align the members proper

	union {
		phys_addr_t __percpu	*percpu_doorbells;
		phys_addr_t 		global_doorbell;
	};
	bool	doorbell_is_percpu;
	bool	irq_remapping;

> +};
> +
>  /**
>   * struct irq_chip - hardware interrupt chip descriptor
>   *
> @@ -349,6 +361,7 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
>   * @irq_get_irqchip_state:	return the internal state of an interrupt
>   * @irq_set_irqchip_state:	set the internal state of a interrupt
>   * @irq_set_vcpu_affinity:	optional to target a vCPU in a virtual machine
> + * @msi_doorbell_info:	return the MSI doorbell info
>   * @ipi_send_single:	send a single IPI to destination cpus
>   * @ipi_send_mask:	send an IPI to destination cpus in cpumask
>   * @flags:		chip specific flags
> @@ -394,7 +407,8 @@ struct irq_chip {
>  	int		(*irq_set_irqchip_state)(struct irq_data *data, enum irqchip_irq_state which, bool state);
>  
>  	int		(*irq_set_vcpu_affinity)(struct irq_data *data, void *vcpu_info);
> -
> +	struct irq_chip_msi_doorbell_info *(*msi_doorbell_info)(

  	irq_get_msi_doorbell_info or msi_get_doorbell_info please

> +							struct irq_data *data);

No need for a line break here. Please keep it as a single line.

Thanks

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 04/10] genirq/msi-doorbell: allow MSI doorbell (un)registration
  2016-07-19 13:02 ` [PATCH v11 04/10] genirq/msi-doorbell: allow MSI doorbell (un)registration Eric Auger
@ 2016-07-19 14:22   ` Thomas Gleixner
  2016-07-20  7:50     ` Auger Eric
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-07-19 14:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 19 Jul 2016, Eric Auger wrote:
> +
> +#include <linux/slab.h>
> +#include <linux/irq.h>
> +#include <linux/msi-doorbell.h>
> +
> +struct irqchip_doorbell {
> +	struct irq_chip_msi_doorbell_info info;
> +	struct list_head next;

Again, please align the struct members.

> +};
> +
> +static LIST_HEAD(irqchip_doorbell_list);
> +static DEFINE_MUTEX(irqchip_doorbell_mutex);
> +
> +struct irq_chip_msi_doorbell_info *
> +msi_doorbell_register_global(phys_addr_t base, size_t size,
> +			     int prot, bool irq_remapping)
> +{
> +	struct irqchip_doorbell *db;
> +
> +	db = kmalloc(sizeof(*db), GFP_KERNEL);
> +	if (!db)
> +		return ERR_PTR(-ENOMEM);
> +
> +	db->info.doorbell_is_percpu = false;

Please use kzalloc and get rid of zero initialization. If you add stuff to the
struct then initialization will be automatically 0.

> +void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
> +{
> +	struct irqchip_doorbell *db, *tmp;
> +
> +	mutex_lock(&irqchip_doorbell_mutex);
> +	list_for_each_entry_safe(db, tmp, &irqchip_doorbell_list, next) {

Why do you need that iterator? 

    db = container_of(dbinfo, struct ....., info);

Hmm?

> +		if (dbinfo == &db->info) {
> +			list_del(&db->next);
> +			kfree(db);

Please move the kfree() outside of the lock region. It does not matter much
here, but we really should stop doing random crap in locked regions.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages
  2016-07-19 13:02 ` [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages Eric Auger
@ 2016-07-19 14:38   ` Thomas Gleixner
  2016-07-20  7:50     ` Auger Eric
  2016-07-21 13:38     ` Auger Eric
  0 siblings, 2 replies; 30+ messages in thread
From: Thomas Gleixner @ 2016-07-19 14:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 19 Jul 2016, Eric Auger wrote:
> msi_doorbell_pages sum up the number of iommu pages of a given order

adding () to the function name would make it immediately clear that
msi_doorbell_pages is a function.

> +/**
> + * msi_doorbell_pages: compute the number of iommu pages of size 1 << order
> + * requested to map all the registered doorbells
> + *
> + * @order: iommu page order
> + */

Why are you adding the kernel doc to the header and not to the implementation?

> +int msi_doorbell_pages(unsigned int order);
> +
>  #else
>  
>  static inline struct irq_chip_msi_doorbell_info *
> @@ -47,6 +55,12 @@ msi_doorbell_register_global(phys_addr_t base, size_t size,
>  static inline void
>  msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db) {}
>  
> +static inline int
> +msi_doorbell_pages(unsigned int order)

What's the point of this line break? 

> +{
> +	return 0;
> +}
> +
>  #endif /* CONFIG_MSI_DOORBELL */
>  
>  #endif
> diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
> index 0ff541e..a5bde37 100644
> --- a/kernel/irq/msi-doorbell.c
> +++ b/kernel/irq/msi-doorbell.c
> @@ -60,3 +60,55 @@ void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
>  	mutex_unlock(&irqchip_doorbell_mutex);
>  }
>  EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
> +
> +static int compute_db_mapping_requirements(phys_addr_t addr, size_t size,
> +					   unsigned int order)
> +{
> +	phys_addr_t offset, granule;
> +	unsigned int nb_pages;
> +
> +	granule = (uint64_t)(1 << order);
> +	offset = addr & (granule - 1);
> +	size = ALIGN(size + offset, granule);
> +	nb_pages = size >> order;
> +
> +	return nb_pages;
> +}
> +
> +static int
> +compute_dbinfo_mapping_requirements(struct irq_chip_msi_doorbell_info *dbinfo,
> +				    unsigned int order)

I'm sure you can find even longer function names which require more line
breaks.

> +{
> +	int ret = 0;
> +
> +	if (!dbinfo->doorbell_is_percpu) {
> +		ret = compute_db_mapping_requirements(dbinfo->global_doorbell,
> +						      dbinfo->size, order);
> +	} else {
> +		phys_addr_t __percpu *pbase;
> +		int cpu;
> +
> +		for_each_possible_cpu(cpu) {
> +			pbase = per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
> +			ret += compute_db_mapping_requirements(*pbase,
> +							       dbinfo->size,
> +							       order);
> +		}
> +	}
> +	return ret;
> +}
> +
> +int msi_doorbell_pages(unsigned int order)
> +{
> +	struct irqchip_doorbell *db;
> +	int ret = 0;
> +
> +	mutex_lock(&irqchip_doorbell_mutex);
> +	list_for_each_entry(db, &irqchip_doorbell_list, next) {

Pointless braces

> +		ret += compute_dbinfo_mapping_requirements(&db->info, order);
> +	}
> +	mutex_unlock(&irqchip_doorbell_mutex);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(msi_doorbell_pages);

So here is a general rant about your naming choices.

   struct irqchip_doorbell
   struct irq_chip_msi_doorbell_info

   struct irq_chip {
   	  ....	  *(*msi_doorbell_info);
   }

   irqchip_doorbell_mutex

   msi_doorbell_register_global
   msi_doorbell_unregister_global

   msi_doorbell_pages

This really sucks. Your public functions start sensibly with msi_doorbell.

Though what is the _global postfix for the register/unregister functions for?
Are there _private functions in the pipeline?

msi_doorbell_pages() is not telling me what it does. msi_calc_doorbell_pages()
would describe it right away.

You doorbell info structure can really do with:

    struct msi_doorbell_info;

And the wrapper struct around it is fine with:

    struct msi_doorbell;

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 04/10] genirq/msi-doorbell: allow MSI doorbell (un)registration
  2016-07-19 14:22   ` Thomas Gleixner
@ 2016-07-20  7:50     ` Auger Eric
  0 siblings, 0 replies; 30+ messages in thread
From: Auger Eric @ 2016-07-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,
On 19/07/2016 16:22, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
>> +
>> +#include <linux/slab.h>
>> +#include <linux/irq.h>
>> +#include <linux/msi-doorbell.h>
>> +
>> +struct irqchip_doorbell {
>> +	struct irq_chip_msi_doorbell_info info;
>> +	struct list_head next;
> 
> Again, please align the struct members.
> 
>> +};
>> +
>> +static LIST_HEAD(irqchip_doorbell_list);
>> +static DEFINE_MUTEX(irqchip_doorbell_mutex);
>> +
>> +struct irq_chip_msi_doorbell_info *
>> +msi_doorbell_register_global(phys_addr_t base, size_t size,
>> +			     int prot, bool irq_remapping)
>> +{
>> +	struct irqchip_doorbell *db;
>> +
>> +	db = kmalloc(sizeof(*db), GFP_KERNEL);
>> +	if (!db)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	db->info.doorbell_is_percpu = false;
> 
> Please use kzalloc and get rid of zero initialization. If you add stuff to the
> struct then initialization will be automatically 0.
OK
> 
>> +void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
>> +{
>> +	struct irqchip_doorbell *db, *tmp;
>> +
>> +	mutex_lock(&irqchip_doorbell_mutex);
>> +	list_for_each_entry_safe(db, tmp, &irqchip_doorbell_list, next) {
> 
> Why do you need that iterator? 
> 
>     db = container_of(dbinfo, struct ....., info);
> 
> Hmm?

definitively
> 
>> +		if (dbinfo == &db->info) {
>> +			list_del(&db->next);
>> +			kfree(db);
> 
> Please move the kfree() outside of the lock region. It does not matter much
> here, but we really should stop doing random crap in locked regions.
OK

Thanks

Eric
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages
  2016-07-19 14:38   ` Thomas Gleixner
@ 2016-07-20  7:50     ` Auger Eric
  2016-07-21 13:38     ` Auger Eric
  1 sibling, 0 replies; 30+ messages in thread
From: Auger Eric @ 2016-07-20  7:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,
On 19/07/2016 16:38, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
>> msi_doorbell_pages sum up the number of iommu pages of a given order
> 
> adding () to the function name would make it immediately clear that
> msi_doorbell_pages is a function.
> 
>> +/**
>> + * msi_doorbell_pages: compute the number of iommu pages of size 1 << order
>> + * requested to map all the registered doorbells
>> + *
>> + * @order: iommu page order
>> + */
> 
> Why are you adding the kernel doc to the header and not to the implementation?
> 
>> +int msi_doorbell_pages(unsigned int order);
>> +
>>  #else
>>  
>>  static inline struct irq_chip_msi_doorbell_info *
>> @@ -47,6 +55,12 @@ msi_doorbell_register_global(phys_addr_t base, size_t size,
>>  static inline void
>>  msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db) {}
>>  
>> +static inline int
>> +msi_doorbell_pages(unsigned int order)
> 
> What's the point of this line break?

> 
>> +{
>> +	return 0;
>> +}
>> +
>>  #endif /* CONFIG_MSI_DOORBELL */
>>  
>>  #endif
>> diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
>> index 0ff541e..a5bde37 100644
>> --- a/kernel/irq/msi-doorbell.c
>> +++ b/kernel/irq/msi-doorbell.c
>> @@ -60,3 +60,55 @@ void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
>>  	mutex_unlock(&irqchip_doorbell_mutex);
>>  }
>>  EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
>> +
>> +static int compute_db_mapping_requirements(phys_addr_t addr, size_t size,
>> +					   unsigned int order)
>> +{
>> +	phys_addr_t offset, granule;
>> +	unsigned int nb_pages;
>> +
>> +	granule = (uint64_t)(1 << order);
>> +	offset = addr & (granule - 1);
>> +	size = ALIGN(size + offset, granule);
>> +	nb_pages = size >> order;
>> +
>> +	return nb_pages;
>> +}
>> +
>> +static int
>> +compute_dbinfo_mapping_requirements(struct irq_chip_msi_doorbell_info *dbinfo,
>> +				    unsigned int order)
> 
> I'm sure you can find even longer function names which require more line
> breaks.
> 
>> +{
>> +	int ret = 0;
>> +
>> +	if (!dbinfo->doorbell_is_percpu) {
>> +		ret = compute_db_mapping_requirements(dbinfo->global_doorbell,
>> +						      dbinfo->size, order);
>> +	} else {
>> +		phys_addr_t __percpu *pbase;
>> +		int cpu;
>> +
>> +		for_each_possible_cpu(cpu) {
>> +			pbase = per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
>> +			ret += compute_db_mapping_requirements(*pbase,
>> +							       dbinfo->size,
>> +							       order);
>> +		}
>> +	}
>> +	return ret;
>> +}
>> +
>> +int msi_doorbell_pages(unsigned int order)
>> +{
>> +	struct irqchip_doorbell *db;
>> +	int ret = 0;
>> +
>> +	mutex_lock(&irqchip_doorbell_mutex);
>> +	list_for_each_entry(db, &irqchip_doorbell_list, next) {
> 
> Pointless braces
> 
>> +		ret += compute_dbinfo_mapping_requirements(&db->info, order);
>> +	}
>> +	mutex_unlock(&irqchip_doorbell_mutex);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(msi_doorbell_pages);
> 
> So here is a general rant about your naming choices.
> 
>    struct irqchip_doorbell
>    struct irq_chip_msi_doorbell_info
> 
>    struct irq_chip {
>    	  ....	  *(*msi_doorbell_info);
>    }
> 
>    irqchip_doorbell_mutex
> 
>    msi_doorbell_register_global
>    msi_doorbell_unregister_global
> 
>    msi_doorbell_pages
> 
> This really sucks. Your public functions start sensibly with msi_doorbell.
> 
> Though what is the _global postfix for the register/unregister functions for?
> Are there _private functions in the pipeline?
global is to be opposed to per-cpu (doorbell). Currently gicv2m and
gicv3-its expose a single "global" doorbell and I have not yet coped
with irqchips exposing per-cpu doorbells.
> 
> msi_doorbell_pages() is not telling me what it does. msi_calc_doorbell_pages()
> would describe it right away.
> 
> You doorbell info structure can really do with:
> 
>     struct msi_doorbell_info;
> 
> And the wrapper struct around it is fine with:
> 
>     struct msi_doorbell;
Yes you're right I will revisit the names and fix all style issues you
reported.

Thank you for your time

Eric
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe
  2016-07-19 13:02 ` [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe Eric Auger
@ 2016-07-20  8:12   ` Thomas Gleixner
  2016-07-21 13:38     ` Auger Eric
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-07-20  8:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 19 Jul 2016, Eric Auger wrote:
> +bool msi_doorbell_safe(void)
> +{
> +	struct irqchip_doorbell *db;
> +	bool irq_remapping = true;
> +
> +	mutex_lock(&irqchip_doorbell_mutex);
> +	list_for_each_entry(db, &irqchip_doorbell_list, next) {
> +		irq_remapping &= db->info.irq_remapping;

db->info.irq_remapping is set in msi_doorbell_register(). So you can keep book
about that there. No need to iterate here.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs
  2016-07-19 13:02 ` [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs Eric Auger
@ 2016-07-20  9:04   ` Thomas Gleixner
  2016-07-25 16:21     ` Auger Eric
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-07-20  9:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 19 Jul 2016, Eric Auger wrote:
>  /**
> + * msi_handle_doorbell_mappings: in case the irq data corresponds to an
> + * MSI that requires iommu mapping, traverse the irq domain hierarchy
> + * to retrieve the doorbells to handle and iommu_map/unmap them according
> + * to @map boolean.
> + *
> + * @data: irq data handle
> + * @map: mapping if true, unmapping if false
> + */


Please run that through the kernel doc generator. It does not work that way.

The format is:

/**
 * function_name - Short function description    
 * @arg1:	Description of arg1
 * @argument2:	Description of argument2
 *
 * Long explanation including documentation of the return values.
 */

> +static int msi_handle_doorbell_mappings(struct irq_data *data, bool map)
> +{
> +	const struct irq_chip_msi_doorbell_info *dbinfo;
> +	struct iommu_domain *domain;
> +	struct irq_chip *chip;
> +	struct device *dev;
> +	dma_addr_t iova;
> +	int ret = 0, cpu;
> +
> +	while (data) {
> +		dev = msi_desc_to_dev(irq_data_get_msi_desc(data));
> +		domain = iommu_msi_domain(dev);
> +		if (domain) {
> +			chip = irq_data_get_irq_chip(data);
> +			if (chip->msi_doorbell_info)
> +				break;
> +		}
> +		data = data->parent_data;
> +	}

Please split that out into a seperate function

struct irq_data *msi_get_doorbell_info(data)
{
	.....
		if (chip->msi_doorbell_info)
			return chip->msi_get_doorbell_info(data);
	}
	return NULL;
}

       info = msi_get_doorbell_info(data);
       .....

> +	if (!data)
> +		return 0;
> +
> +	dbinfo = chip->msi_doorbell_info(data);
> +	if (!dbinfo)
> +		return -EINVAL;
> +
> +	if (!dbinfo->doorbell_is_percpu) {
> +		if (!map) {
> +			iommu_msi_put_doorbell_iova(domain,
> +						    dbinfo->global_doorbell);
> +			return 0;
> +		}
> +		return iommu_msi_get_doorbell_iova(domain,
> +						   dbinfo->global_doorbell,
> +						   dbinfo->size, dbinfo->prot,
> +						   &iova);
> +	}

You can spare an indentation level with a helper function

    	if (!dbinfo->doorbell_is_percpu)
		return msi_map_global_doorbell(domain, dbinfo);

> +
> +	/* percpu doorbells */
> +	for_each_possible_cpu(cpu) {
> +		phys_addr_t __percpu *db_addr =
> +			per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
> +
> +		if (!map) {
> +			iommu_msi_put_doorbell_iova(domain, *db_addr);
> +		} else {
> +
> +			ret = iommu_msi_get_doorbell_iova(domain, *db_addr,
> +							  dbinfo->size,
> +							  dbinfo->prot, &iova);
> +			if (ret)
> +				return ret;
> +		}
> +	}

Same here:

	for_each_possible_cpu(cpu) {
		ret = msi_map_percpu_doorbell(domain, cpu);
		if (ret)
			return ret;
	}
     	return 0;
     
Hmm?

> +
> +	return 0;
> +}
> +
> +/**
>   * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
>   * @domain:	The domain to allocate from
>   * @dev:	Pointer to device struct of the device for which the interrupts
> @@ -352,17 +423,29 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
>  
>  		virq = __irq_domain_alloc_irqs(domain, virq, desc->nvec_used,
>  					       dev_to_node(dev), &arg, false);
> -		if (virq < 0) {
> -			ret = -ENOSPC;
> -			if (ops->handle_error)
> -				ret = ops->handle_error(domain, desc, ret);
> -			if (ops->msi_finish)
> -				ops->msi_finish(&arg, ret);
> -			return ret;
> -		}
> +		if (virq < 0)
> +			goto error;
>  
>  		for (i = 0; i < desc->nvec_used; i++)
>  			irq_set_msi_desc_off(virq, i, desc);
> +
> +		for (i = 0; i < desc->nvec_used; i++) {
> +			struct irq_data *d = irq_get_irq_data(virq + i);
> +
> +			ret = msi_handle_doorbell_mappings(d, true);
> +			if (ret)
> +				break;
> +		}
> +		if (ret) {
> +			for (; i >= 0; i--) {
> +				struct irq_data *d = irq_get_irq_data(virq + i);
> +
> +				msi_handle_doorbell_mappings(d, false);
> +			}
> +			irq_domain_free_irqs(virq, desc->nvec_used);
> +			desc->irq = 0;
> +			goto error;

How is that supposed to work? You clear desc->irq and then you call
ops->handle_error.

Why are you adding this extra stuff here? Look at the call sites of
msi_domain_alloc_irqs(). All of them use msi_domain_free_irqs() in case of
error. There is no reason why you can't do the same....

>  /**
> @@ -396,6 +486,9 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
>  		 * entry. If that's the case, don't do anything.
>  		 */
>  		if (desc->irq) {
> +			msi_handle_doorbell_mappings(
> +				irq_get_irq_data(desc->irq),
> +				false);
>  			irq_domain_free_irqs(desc->irq, desc->nvec_used);
>  			desc->irq = 0;

Can you please restructure the code so it reads

		if (desc->irq)
			continue;

		msi_handle_doorbell_mappings(irq_get_irq_data(desc->irq),
					     false);	
		irq_domain_free_irqs(desc->irq, desc->nvec_used);
		desc->irq = 0;

Just blindly whacking stuff into the 80 char limit is not helping readability.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested
  2016-07-19 13:02 ` [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested Eric Auger
@ 2016-07-20  9:09   ` Thomas Gleixner
  2016-07-25 16:31     ` Auger Eric
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-07-20  9:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 19 Jul 2016, Eric Auger wrote:

First of all - valid for all patches:

Subject: sys/subsys: Sentence starts with an uppercase letter

Now for this particular one:

genirq/msi: use the MSI doorbell's IOVA when requested

> On MSI message composition we now use the MSI doorbell's IOVA in
> place of the doorbell's PA in case the device is upstream to an
> IOMMU that requires MSI addresses to be mapped. The doorbell's
> allocation and mapping happened on an early stage (pci_enable_msi).

This changelog is completely useless. At least I cannot figure out what that
patch actually does. And the implementation is not self explaining either.
 
> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
>  {
>  	int ret = 0;
>  
> -	if (erase)
> +	if (erase) {
>  		memset(msg, 0, sizeof(*msg));
> -	else
> +	} else {
> +		struct device *dev;
> +
>  		ret = irq_chip_compose_msi_msg(irq_data, msg);
> +		if (ret)
> +			return ret;
> +
> +		dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
> +		WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));

What the heck is this call doing? And why is there only a WARN_ON and not a
proper error return code handling?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages
  2016-07-19 14:38   ` Thomas Gleixner
  2016-07-20  7:50     ` Auger Eric
@ 2016-07-21 13:38     ` Auger Eric
  1 sibling, 0 replies; 30+ messages in thread
From: Auger Eric @ 2016-07-21 13:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On 19/07/2016 16:38, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
>> msi_doorbell_pages sum up the number of iommu pages of a given order
> 
> adding () to the function name would make it immediately clear that
> msi_doorbell_pages is a function.
> 
>> +/**
>> + * msi_doorbell_pages: compute the number of iommu pages of size 1 << order
>> + * requested to map all the registered doorbells
>> + *
>> + * @order: iommu page order
>> + */
> 
> Why are you adding the kernel doc to the header and not to the implementation?

I am confused by this comment. I was told in the past that it was better
to put the comments in the API header. On your side do you want me to
move all function kernel-doc comments to the implementation.

Looking at kernel-doc-nano-HOWTO.txt, I was not able to find any
indication about the best choice.

I will now run the kernel-doc script to check the conformance of my
comments.

Thank you for your patience!

Best Regards

Eric
> 
>> +int msi_doorbell_pages(unsigned int order);
>> +
>>  #else
>>  
>>  static inline struct irq_chip_msi_doorbell_info *
>> @@ -47,6 +55,12 @@ msi_doorbell_register_global(phys_addr_t base, size_t size,
>>  static inline void
>>  msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *db) {}
>>  
>> +static inline int
>> +msi_doorbell_pages(unsigned int order)
> 
> What's the point of this line break? 
> 
>> +{
>> +	return 0;
>> +}
>> +
>>  #endif /* CONFIG_MSI_DOORBELL */
>>  
>>  #endif
>> diff --git a/kernel/irq/msi-doorbell.c b/kernel/irq/msi-doorbell.c
>> index 0ff541e..a5bde37 100644
>> --- a/kernel/irq/msi-doorbell.c
>> +++ b/kernel/irq/msi-doorbell.c
>> @@ -60,3 +60,55 @@ void msi_doorbell_unregister_global(struct irq_chip_msi_doorbell_info *dbinfo)
>>  	mutex_unlock(&irqchip_doorbell_mutex);
>>  }
>>  EXPORT_SYMBOL_GPL(msi_doorbell_unregister_global);
>> +
>> +static int compute_db_mapping_requirements(phys_addr_t addr, size_t size,
>> +					   unsigned int order)
>> +{
>> +	phys_addr_t offset, granule;
>> +	unsigned int nb_pages;
>> +
>> +	granule = (uint64_t)(1 << order);
>> +	offset = addr & (granule - 1);
>> +	size = ALIGN(size + offset, granule);
>> +	nb_pages = size >> order;
>> +
>> +	return nb_pages;
>> +}
>> +
>> +static int
>> +compute_dbinfo_mapping_requirements(struct irq_chip_msi_doorbell_info *dbinfo,
>> +				    unsigned int order)
> 
> I'm sure you can find even longer function names which require more line
> breaks.
> 
>> +{
>> +	int ret = 0;
>> +
>> +	if (!dbinfo->doorbell_is_percpu) {
>> +		ret = compute_db_mapping_requirements(dbinfo->global_doorbell,
>> +						      dbinfo->size, order);
>> +	} else {
>> +		phys_addr_t __percpu *pbase;
>> +		int cpu;
>> +
>> +		for_each_possible_cpu(cpu) {
>> +			pbase = per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
>> +			ret += compute_db_mapping_requirements(*pbase,
>> +							       dbinfo->size,
>> +							       order);
>> +		}
>> +	}
>> +	return ret;
>> +}
>> +
>> +int msi_doorbell_pages(unsigned int order)
>> +{
>> +	struct irqchip_doorbell *db;
>> +	int ret = 0;
>> +
>> +	mutex_lock(&irqchip_doorbell_mutex);
>> +	list_for_each_entry(db, &irqchip_doorbell_list, next) {
> 
> Pointless braces
> 
>> +		ret += compute_dbinfo_mapping_requirements(&db->info, order);
>> +	}
>> +	mutex_unlock(&irqchip_doorbell_mutex);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(msi_doorbell_pages);
> 
> So here is a general rant about your naming choices.
> 
>    struct irqchip_doorbell
>    struct irq_chip_msi_doorbell_info
> 
>    struct irq_chip {
>    	  ....	  *(*msi_doorbell_info);
>    }
> 
>    irqchip_doorbell_mutex
> 
>    msi_doorbell_register_global
>    msi_doorbell_unregister_global
> 
>    msi_doorbell_pages
> 
> This really sucks. Your public functions start sensibly with msi_doorbell.
> 
> Though what is the _global postfix for the register/unregister functions for?
> Are there _private functions in the pipeline?
> 
> msi_doorbell_pages() is not telling me what it does. msi_calc_doorbell_pages()
> would describe it right away.
> 
> You doorbell info structure can really do with:
> 
>     struct msi_doorbell_info;
> 
> And the wrapper struct around it is fine with:
> 
>     struct msi_doorbell;
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe
  2016-07-20  8:12   ` Thomas Gleixner
@ 2016-07-21 13:38     ` Auger Eric
  2016-07-22 12:44       ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Auger Eric @ 2016-07-21 13:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 20/07/2016 10:12, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
>> +bool msi_doorbell_safe(void)
>> +{
>> +	struct irqchip_doorbell *db;
>> +	bool irq_remapping = true;
>> +
>> +	mutex_lock(&irqchip_doorbell_mutex);
>> +	list_for_each_entry(db, &irqchip_doorbell_list, next) {
>> +		irq_remapping &= db->info.irq_remapping;
> 
> db->info.irq_remapping is set in msi_doorbell_register(). So you can keep book
> about that there. No need to iterate here.
Yes makes sense to store the info at registration time. Currently this
function is not in any fast path but that's cleaner from a general
perspective. I will need to do such iteration at un-registration though.

Thanks

Eric
> 
> Thanks,
> 
> 	tglx
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe
  2016-07-21 13:38     ` Auger Eric
@ 2016-07-22 12:44       ` Thomas Gleixner
  2016-07-22 14:08         ` Auger Eric
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-07-22 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 21 Jul 2016, Auger Eric wrote:
> On 20/07/2016 10:12, Thomas Gleixner wrote:
> > On Tue, 19 Jul 2016, Eric Auger wrote:
> >> +bool msi_doorbell_safe(void)
> >> +{
> >> +	struct irqchip_doorbell *db;
> >> +	bool irq_remapping = true;
> >> +
> >> +	mutex_lock(&irqchip_doorbell_mutex);
> >> +	list_for_each_entry(db, &irqchip_doorbell_list, next) {
> >> +		irq_remapping &= db->info.irq_remapping;
> > 
> > db->info.irq_remapping is set in msi_doorbell_register(). So you can keep book
> > about that there. No need to iterate here.
> Yes makes sense to store the info at registration time. Currently this
> function is not in any fast path but that's cleaner from a general
> perspective. I will need to do such iteration at un-registration though.

Two simple counter should be sufficient.

  nr_registered_bells;
  nr_remapped_bells;

....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe
  2016-07-22 12:44       ` Thomas Gleixner
@ 2016-07-22 14:08         ` Auger Eric
  0 siblings, 0 replies; 30+ messages in thread
From: Auger Eric @ 2016-07-22 14:08 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,
On 22/07/2016 14:44, Thomas Gleixner wrote:
> On Thu, 21 Jul 2016, Auger Eric wrote:
>> On 20/07/2016 10:12, Thomas Gleixner wrote:
>>> On Tue, 19 Jul 2016, Eric Auger wrote:
>>>> +bool msi_doorbell_safe(void)
>>>> +{
>>>> +	struct irqchip_doorbell *db;
>>>> +	bool irq_remapping = true;
>>>> +
>>>> +	mutex_lock(&irqchip_doorbell_mutex);
>>>> +	list_for_each_entry(db, &irqchip_doorbell_list, next) {
>>>> +		irq_remapping &= db->info.irq_remapping;
>>>
>>> db->info.irq_remapping is set in msi_doorbell_register(). So you can keep book
>>> about that there. No need to iterate here.
>> Yes makes sense to store the info at registration time. Currently this
>> function is not in any fast path but that's cleaner from a general
>> perspective. I will need to do such iteration at un-registration though.
> 
> Two simple counter should be sufficient.
> 
>   nr_registered_bells;
>   nr_remapped_bells;

Yes definitively smarter to use counters! mental viscosity.

Thanks

Eric
> 
> ....
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs
  2016-07-20  9:04   ` Thomas Gleixner
@ 2016-07-25 16:21     ` Auger Eric
  2016-07-26  9:00       ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Auger Eric @ 2016-07-25 16:21 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On 20/07/2016 11:04, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
>>  /**
>> + * msi_handle_doorbell_mappings: in case the irq data corresponds to an
>> + * MSI that requires iommu mapping, traverse the irq domain hierarchy
>> + * to retrieve the doorbells to handle and iommu_map/unmap them according
>> + * to @map boolean.
>> + *
>> + * @data: irq data handle
>> + * @map: mapping if true, unmapping if false
>> + */
> 
> 
> Please run that through the kernel doc generator. It does not work that way.
> 
> The format is:
> 
> /**
>  * function_name - Short function description    
>  * @arg1:	Description of arg1
>  * @argument2:	Description of argument2
>  *
>  * Long explanation including documentation of the return values.
>  */
> 
>> +static int msi_handle_doorbell_mappings(struct irq_data *data, bool map)
>> +{
>> +	const struct irq_chip_msi_doorbell_info *dbinfo;
>> +	struct iommu_domain *domain;
>> +	struct irq_chip *chip;
>> +	struct device *dev;
>> +	dma_addr_t iova;
>> +	int ret = 0, cpu;
>> +
>> +	while (data) {
>> +		dev = msi_desc_to_dev(irq_data_get_msi_desc(data));
>> +		domain = iommu_msi_domain(dev);
>> +		if (domain) {
>> +			chip = irq_data_get_irq_chip(data);
>> +			if (chip->msi_doorbell_info)
>> +				break;
>> +		}
>> +		data = data->parent_data;
>> +	}
> 
> Please split that out into a seperate function
> 
> struct irq_data *msi_get_doorbell_info(data)
> {
> 	.....
> 		if (chip->msi_doorbell_info)
> 			return chip->msi_get_doorbell_info(data);
> 	}
> 	return NULL;
> }
> 
>        info = msi_get_doorbell_info(data);
>        .....
> 
>> +	if (!data)
>> +		return 0;
>> +
>> +	dbinfo = chip->msi_doorbell_info(data);
>> +	if (!dbinfo)
>> +		return -EINVAL;
>> +
>> +	if (!dbinfo->doorbell_is_percpu) {
>> +		if (!map) {
>> +			iommu_msi_put_doorbell_iova(domain,
>> +						    dbinfo->global_doorbell);
>> +			return 0;
>> +		}
>> +		return iommu_msi_get_doorbell_iova(domain,
>> +						   dbinfo->global_doorbell,
>> +						   dbinfo->size, dbinfo->prot,
>> +						   &iova);
>> +	}
> 
> You can spare an indentation level with a helper function
> 
>     	if (!dbinfo->doorbell_is_percpu)
> 		return msi_map_global_doorbell(domain, dbinfo);
> 
>> +
>> +	/* percpu doorbells */
>> +	for_each_possible_cpu(cpu) {
>> +		phys_addr_t __percpu *db_addr =
>> +			per_cpu_ptr(dbinfo->percpu_doorbells, cpu);
>> +
>> +		if (!map) {
>> +			iommu_msi_put_doorbell_iova(domain, *db_addr);
>> +		} else {
>> +
>> +			ret = iommu_msi_get_doorbell_iova(domain, *db_addr,
>> +							  dbinfo->size,
>> +							  dbinfo->prot, &iova);
>> +			if (ret)
>> +				return ret;
>> +		}
>> +	}
> 
> Same here:
> 
> 	for_each_possible_cpu(cpu) {
> 		ret = msi_map_percpu_doorbell(domain, cpu);
> 		if (ret)
> 			return ret;
> 	}
>      	return 0;
>      
> Hmm?
> 
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>>   * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain
>>   * @domain:	The domain to allocate from
>>   * @dev:	Pointer to device struct of the device for which the interrupts
>> @@ -352,17 +423,29 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
>>  
>>  		virq = __irq_domain_alloc_irqs(domain, virq, desc->nvec_used,
>>  					       dev_to_node(dev), &arg, false);
>> -		if (virq < 0) {
>> -			ret = -ENOSPC;
>> -			if (ops->handle_error)
>> -				ret = ops->handle_error(domain, desc, ret);
>> -			if (ops->msi_finish)
>> -				ops->msi_finish(&arg, ret);
>> -			return ret;
>> -		}
>> +		if (virq < 0)
>> +			goto error;
>>  
>>  		for (i = 0; i < desc->nvec_used; i++)
>>  			irq_set_msi_desc_off(virq, i, desc);
>> +
>> +		for (i = 0; i < desc->nvec_used; i++) {
>> +			struct irq_data *d = irq_get_irq_data(virq + i);
>> +
>> +			ret = msi_handle_doorbell_mappings(d, true);
>> +			if (ret)
>> +				break;
>> +		}
>> +		if (ret) {
>> +			for (; i >= 0; i--) {
>> +				struct irq_data *d = irq_get_irq_data(virq + i);
>> +
>> +				msi_handle_doorbell_mappings(d, false);
>> +			}
>> +			irq_domain_free_irqs(virq, desc->nvec_used);
>> +			desc->irq = 0;
>> +			goto error;
> 
> How is that supposed to work? You clear desc->irq and then you call
> ops->handle_error.
if I don't clear the desc->irq I enter an infinite loop in pci_enable_msix_range.
This happens because msix_capability_init and pcie_enable_msix returns 1.

In msix_capability_init, at out_avail: we enumerate the msi_desc which have a non
zero irq, hence the returned value equal to 1.

Currently the only handle_error ops I found, pci_msi_domain_handle_error does not
use irq field so works although questionable.

As for the irq_domain_free_irqs I think I can remove it since handled later.

How do you advise to handle the above situation?

Thanks

Eric

> 
> Why are you adding this extra stuff here? Look at the call sites of
> msi_domain_alloc_irqs(). All of them use msi_domain_free_irqs() in case of
> error. There is no reason why you can't do the same....
> 
>>  /**
>> @@ -396,6 +486,9 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
>>  		 * entry. If that's the case, don't do anything.
>>  		 */
>>  		if (desc->irq) {
>> +			msi_handle_doorbell_mappings(
>> +				irq_get_irq_data(desc->irq),
>> +				false);
>>  			irq_domain_free_irqs(desc->irq, desc->nvec_used);
>>  			desc->irq = 0;
> 
> Can you please restructure the code so it reads
> 
> 		if (desc->irq)
> 			continue;
> 
> 		msi_handle_doorbell_mappings(irq_get_irq_data(desc->irq),
> 					     false);	
> 		irq_domain_free_irqs(desc->irq, desc->nvec_used);
> 		desc->irq = 0;
> 
> Just blindly whacking stuff into the 80 char limit is not helping readability.
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested
  2016-07-20  9:09   ` Thomas Gleixner
@ 2016-07-25 16:31     ` Auger Eric
  2016-07-26  9:04       ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Auger Eric @ 2016-07-25 16:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On 20/07/2016 11:09, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
> 
> First of all - valid for all patches:
> 
> Subject: sys/subsys: Sentence starts with an uppercase letter
OK understood.
> 
> Now for this particular one:
> 
> genirq/msi: use the MSI doorbell's IOVA when requested
> 
>> On MSI message composition we now use the MSI doorbell's IOVA in
>> place of the doorbell's PA in case the device is upstream to an
>> IOMMU that requires MSI addresses to be mapped. The doorbell's
>> allocation and mapping happened on an early stage (pci_enable_msi).
> 
> This changelog is completely useless. At least I cannot figure out what that
> patch actually does. And the implementation is not self explaining either.

>  
>> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
>>  {
>>  	int ret = 0;
>>  
>> -	if (erase)
>> +	if (erase) {
>>  		memset(msg, 0, sizeof(*msg));
>> -	else
>> +	} else {
>> +		struct device *dev;
>> +
>>  		ret = irq_chip_compose_msi_msg(irq_data, msg);
>> +		if (ret)
>> +			return ret;
>> +
>> +		dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
>> +		WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
> 
> What the heck is this call doing? And why is there only a WARN_ON and not a
> proper error return code handling?

iommu_msi_msg_pa_to_va is part of the new iommu-msi API introduced in PART I of
this series. This helper function detects the physical address found in the
MSI message has a corresponding allocated IOVA. This happens if the MSI doorbell
is accessed through an IOMMU and this IOMMU do not bypass the MSI addresses
(ARM case). Allocation of this IOVA was performed in the previous patch.

So, if this is the case, the physical address is swapped with the IOVA
address. That way the PCIe device will send the MSI with this IOVA and
the address will be translated by the IOMMU into the target MSI doorbell PA.

Hope this clarifies

Thanks

Eric 
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs
  2016-07-25 16:21     ` Auger Eric
@ 2016-07-26  9:00       ` Thomas Gleixner
  2016-07-26  9:54         ` Auger Eric
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-07-26  9:00 UTC (permalink / raw)
  To: linux-arm-kernel

B1;2802;0cEric,

On Mon, 25 Jul 2016, Auger Eric wrote:
> On 20/07/2016 11:04, Thomas Gleixner wrote:
> > On Tue, 19 Jul 2016, Eric Auger wrote:
> >> +		if (ret) {
> >> +			for (; i >= 0; i--) {
> >> +				struct irq_data *d = irq_get_irq_data(virq + i);
> >> +
> >> +				msi_handle_doorbell_mappings(d, false);
> >> +			}
> >> +			irq_domain_free_irqs(virq, desc->nvec_used);
> >> +			desc->irq = 0;
> >> +			goto error;
> > 
> > How is that supposed to work? You clear desc->irq and then you call
> > ops->handle_error.
>
> if I don't clear the desc->irq I enter an infinite loop in
> pci_enable_msix_range.
>
> This happens because msix_capability_init and pcie_enable_msix returns 1.
> In msix_capability_init, at out_avail: we enumerate the msi_desc which have
> a non zero irq, hence the returned value equal to 1.
>
> Currently the only handle_error ops I found, pci_msi_domain_handle_error
> does not use irq field so works although questionable.

The logic here is: If the allocation does not succeed for the requested number
of interrupts, we tell the caller how many interrupts we were able to set up.
So the caller can decide what to do.

In your case you don't want to have a partial allocation, so instead of
playing silly games with desc->irq you should add a flag which tells the PCI
code that you are not interested in a partial allocation and that it should
return an error code instead.

Something like PCI_DEV_FLAGS_MSI_NO_PARTIAL_ALLOC should do the trick.

> As for the irq_domain_free_irqs I think I can remove it since handled later.

Not only the free_irqs(). You should let the teardown function handle
everything including your doorbell mapping teardown. It's nothing special and
free_msi_irqs() at the end of msix_capability_init() will take care of it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested
  2016-07-25 16:31     ` Auger Eric
@ 2016-07-26  9:04       ` Thomas Gleixner
  2016-07-26 10:02         ` Auger Eric
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Gleixner @ 2016-07-26  9:04 UTC (permalink / raw)
  To: linux-arm-kernel

Eric,

On Mon, 25 Jul 2016, Auger Eric wrote:
> On 20/07/2016 11:09, Thomas Gleixner wrote:
> > On Tue, 19 Jul 2016, Eric Auger wrote:
> >> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
> >>  {
> >>  	int ret = 0;
> >>  
> >> -	if (erase)
> >> +	if (erase) {
> >>  		memset(msg, 0, sizeof(*msg));
> >> -	else
> >> +	} else {
> >> +		struct device *dev;
> >> +
> >>  		ret = irq_chip_compose_msi_msg(irq_data, msg);
> >> +		if (ret)
> >> +			return ret;
> >> +
> >> +		dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
> >> +		WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
> > 
> > What the heck is this call doing? And why is there only a WARN_ON and not a
> > proper error return code handling?
> 
> iommu_msi_msg_pa_to_va is part of the new iommu-msi API introduced in PART I
> of this series. This helper function detects the physical address found in
> the MSI message has a corresponding allocated IOVA. This happens if the MSI
> doorbell is accessed through an IOMMU and this IOMMU do not bypass the MSI
> addresses (ARM case). Allocation of this IOVA was performed in the previous
> patch.
>
> So, if this is the case, the physical address is swapped with the IOVA
> address. That way the PCIe device will send the MSI with this IOVA and
> the address will be translated by the IOMMU into the target MSI doorbell PA.
> 
> Hope this clarifies

No, it does not. You are explaining in great length what that function is
doing, but you are not explaining WHY your don't do a proper return code
handling and just do a WARN_ON() and happily proceed. If that function fails
then the interrupt will not be functional, so WHY on earth are you continuing?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs
  2016-07-26  9:00       ` Thomas Gleixner
@ 2016-07-26  9:54         ` Auger Eric
  2016-07-26 11:01           ` Thomas Gleixner
  0 siblings, 1 reply; 30+ messages in thread
From: Auger Eric @ 2016-07-26  9:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On 26/07/2016 11:00, Thomas Gleixner wrote:
> B1;2802;0cEric,
> 
> On Mon, 25 Jul 2016, Auger Eric wrote:
>> On 20/07/2016 11:04, Thomas Gleixner wrote:
>>> On Tue, 19 Jul 2016, Eric Auger wrote:
>>>> +		if (ret) {
>>>> +			for (; i >= 0; i--) {
>>>> +				struct irq_data *d = irq_get_irq_data(virq + i);
>>>> +
>>>> +				msi_handle_doorbell_mappings(d, false);
>>>> +			}
>>>> +			irq_domain_free_irqs(virq, desc->nvec_used);
>>>> +			desc->irq = 0;
>>>> +			goto error;
>>>
>>> How is that supposed to work? You clear desc->irq and then you call
>>> ops->handle_error.
>>
>> if I don't clear the desc->irq I enter an infinite loop in
>> pci_enable_msix_range.
>>
>> This happens because msix_capability_init and pcie_enable_msix returns 1.
>> In msix_capability_init, at out_avail: we enumerate the msi_desc which have
>> a non zero irq, hence the returned value equal to 1.
>>
>> Currently the only handle_error ops I found, pci_msi_domain_handle_error
>> does not use irq field so works although questionable.
> 
> The logic here is: If the allocation does not succeed for the requested number
> of interrupts, we tell the caller how many interrupts we were able to set up.
> So the caller can decide what to do.
> 
> In your case you don't want to have a partial allocation, so instead of
> playing silly games with desc->irq you should add a flag which tells the PCI
> code that you are not interested in a partial allocation and that it should
> return an error code instead.

In that case can we consider we even succeeded in allocating 1 MSI? In case the
IOMMU mapping fails, the MSI transaction will never reach the target MSI frame
so it is not usable. So when you mean "partial" I understand we did not succeed
in allocating maxvec IRQs, correct? Here we succeeded in allocating 0 IRQ and still
msi_capability_init returns 1.

msi_capability_init doc-comment says "a positive return value indicates the number of
interrupts which could have been allocated."

I understand allocation success currently only depends on the fact virq was allocated
and set to desc->irq. But with that IOMMU stuff doesn't the criteria changes?


> Something like PCI_DEV_FLAGS_MSI_NO_PARTIAL_ALLOC should do the trick.
> 
>> As for the irq_domain_free_irqs I think I can remove it since handled later.
> 
> Not only the free_irqs(). You should let the teardown function handle
> everything including your doorbell mapping teardown. It's nothing special and
> free_msi_irqs() at the end of msix_capability_init() will take care of it.
Yep I was forced to call free_irqs myself since free_msi_irqs was doing nothing
due the fact I resetted the irq field. Wrong thing loop ;-)

Thanks

Eric

> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested
  2016-07-26  9:04       ` Thomas Gleixner
@ 2016-07-26 10:02         ` Auger Eric
  0 siblings, 0 replies; 30+ messages in thread
From: Auger Eric @ 2016-07-26 10:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

On 26/07/2016 11:04, Thomas Gleixner wrote:
> Eric,
> 
> On Mon, 25 Jul 2016, Auger Eric wrote:
>> On 20/07/2016 11:09, Thomas Gleixner wrote:
>>> On Tue, 19 Jul 2016, Eric Auger wrote:
>>>> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
>>>>  {
>>>>  	int ret = 0;
>>>>  
>>>> -	if (erase)
>>>> +	if (erase) {
>>>>  		memset(msg, 0, sizeof(*msg));
>>>> -	else
>>>> +	} else {
>>>> +		struct device *dev;
>>>> +
>>>>  		ret = irq_chip_compose_msi_msg(irq_data, msg);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +
>>>> +		dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
>>>> +		WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
>>>
>>> What the heck is this call doing? And why is there only a WARN_ON and not a
>>> proper error return code handling?
>>
>> iommu_msi_msg_pa_to_va is part of the new iommu-msi API introduced in PART I
>> of this series. This helper function detects the physical address found in
>> the MSI message has a corresponding allocated IOVA. This happens if the MSI
>> doorbell is accessed through an IOMMU and this IOMMU do not bypass the MSI
>> addresses (ARM case). Allocation of this IOVA was performed in the previous
>> patch.
>>
>> So, if this is the case, the physical address is swapped with the IOVA
>> address. That way the PCIe device will send the MSI with this IOVA and
>> the address will be translated by the IOMMU into the target MSI doorbell PA.
>>
>> Hope this clarifies
> 
> No, it does not. You are explaining in great length what that function is
> doing, but you are not explaining WHY your don't do a proper return code
> handling and just do a WARN_ON() and happily proceed. If that function fails
> then the interrupt will not be functional, so WHY on earth are you continuing?
Oh sorry I focused on the function's goal. Originally I could not return an
error since there is a BUG_ON(ret) afterwards. And typically the userspace can
willingly omit to pass IPA range that map MSIs. But now we have this 2 phases where
we first map the MSIs on pci_enable_msi_range and use the IOVA at compose time
I need to analyze again if the userspace can induce a BUG_ON.

Thanks

Eric
> 
> Thanks,
> 
> 	tglx
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs
  2016-07-26  9:54         ` Auger Eric
@ 2016-07-26 11:01           ` Thomas Gleixner
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Gleixner @ 2016-07-26 11:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 26 Jul 2016, Auger Eric wrote:
> On 26/07/2016 11:00, Thomas Gleixner wrote:
> > In your case you don't want to have a partial allocation, so instead of
> > playing silly games with desc->irq you should add a flag which tells the PCI
> > code that you are not interested in a partial allocation and that it should
> > return an error code instead.
> 

> In that case can we consider we even succeeded in allocating 1 MSI? In case
> the IOMMU mapping fails, the MSI transaction will never reach the target MSI
> frame so it is not usable. So when you mean "partial" I understand we did
> not succeed in allocating maxvec IRQs, correct? Here we succeeded in
> allocating 0 IRQ and still msi_capability_init returns 1.
>
> msi_capability_init doc-comment says "a positive return value indicates the
> number of interrupts which could have been allocated."
> 
> I understand allocation success currently only depends on the fact virq was
> allocated and set to desc->irq. But with that IOMMU stuff doesn't the
> criteria changes?

Right. But then you need to express it differently in a consistent way. Not by
hacking around it by setting desc->irq to 0.

Something like a flag field in msi_desc which denotes various properties would
be a possible solution. MSI_IRQ_ALLOCATED and MSI_IRQ_REMAPPED would be
sufficient for now. And the deallocation/cleanup would rely on those flags
rather than checking desc->irq.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2016-07-26 11:01 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-19 13:02 [PATCH v11 00/10] KVM PCIe/MSI passthrough on ARM/ARM64: kernel part 2/3: msi changes Eric Auger
2016-07-19 13:02 ` [PATCH v11 01/10] genirq/msi: export msi_get_domain_info Eric Auger
2016-07-19 13:02 ` [PATCH v11 02/10] genirq/msi: msi_compose wrapper Eric Auger
2016-07-19 13:02 ` [PATCH v11 03/10] genirq/irq: introduce msi_doorbell_info Eric Auger
2016-07-19 14:15   ` Thomas Gleixner
2016-07-19 13:02 ` [PATCH v11 04/10] genirq/msi-doorbell: allow MSI doorbell (un)registration Eric Auger
2016-07-19 14:22   ` Thomas Gleixner
2016-07-20  7:50     ` Auger Eric
2016-07-19 13:02 ` [PATCH v11 05/10] genirq/msi-doorbell: msi_doorbell_pages Eric Auger
2016-07-19 14:38   ` Thomas Gleixner
2016-07-20  7:50     ` Auger Eric
2016-07-21 13:38     ` Auger Eric
2016-07-19 13:02 ` [PATCH v11 06/10] genirq/msi-doorbell: msi_doorbell_safe Eric Auger
2016-07-20  8:12   ` Thomas Gleixner
2016-07-21 13:38     ` Auger Eric
2016-07-22 12:44       ` Thomas Gleixner
2016-07-22 14:08         ` Auger Eric
2016-07-19 13:02 ` [PATCH v11 07/10] irqchip/gicv2m: register the MSI global doorbell Eric Auger
2016-07-19 13:02 ` [PATCH v11 08/10] irqchip/gicv3-its: " Eric Auger
2016-07-19 13:02 ` [PATCH v11 09/10] genirq/msi: map/unmap the MSI doorbells on msi_domain_alloc/free_irqs Eric Auger
2016-07-20  9:04   ` Thomas Gleixner
2016-07-25 16:21     ` Auger Eric
2016-07-26  9:00       ` Thomas Gleixner
2016-07-26  9:54         ` Auger Eric
2016-07-26 11:01           ` Thomas Gleixner
2016-07-19 13:02 ` [PATCH v11 10/10] genirq/msi: use the MSI doorbell's IOVA when requested Eric Auger
2016-07-20  9:09   ` Thomas Gleixner
2016-07-25 16:31     ` Auger Eric
2016-07-26  9:04       ` Thomas Gleixner
2016-07-26 10:02         ` Auger Eric

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).