* [RFC v2 0/9] KVM-VFIO IRQ forward control
@ 2014-09-01 12:52 ` Eric Auger
  0 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: linux-kernel, patches, will.deacon, a.motakis, a.rigo, john.liuli

This RFC proposes an integration of "ARM: Forwarding physical
interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
KVM.

It makes it possible to turn a VFIO platform driver IRQ into a forwarded
IRQ. The direct benefit is that, for a level-sensitive IRQ, a VM
switch can be avoided when the guest completes the virtual IRQ. Before
this series, a maintenance IRQ was triggered on virtual IRQ completion.

When the IRQ is forwarded, the VFIO platform driver does not need to
disable the IRQ anymore. Indeed, on return from the IRQ handler the
IRQ is not deactivated; only its priority is lowered. As a result,
the same IRQ cannot fire again until the guest completes the virtual
IRQ and the GIC automatically deactivates the corresponding physical IRQ.
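
As a rough illustration of the paragraph above, here is a toy model of
one physical interrupt line. It is only a sketch: struct phys_irq and
the three helpers are made-up names, not GIC or kernel code, and the
model ignores priorities and CPU interfaces entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one physical interrupt line; all names are illustrative. */
struct phys_irq {
	bool active;           /* set from delivery until deactivation */
	bool priority_dropped; /* set once the host handler has returned */
};

/* The device asserts the line: returns true if the IRQ can be delivered. */
static bool phys_irq_fire(struct phys_irq *irq)
{
	assert(irq);
	if (irq->active)
		return false;  /* still active, cannot be delivered again */
	irq->active = true;
	irq->priority_dropped = false;
	return true;
}

/* The host handler returns: a forwarded IRQ only gets a priority drop. */
static void host_handler_return(struct phys_irq *irq, bool forwarded)
{
	irq->priority_dropped = true;
	if (!forwarded)
		irq->active = false;  /* classic case: host deactivates */
}

/* The guest completes the virtual IRQ: the physical IRQ is deactivated. */
static void guest_completes_virq(struct phys_irq *irq)
{
	irq->active = false;
}
```

In this model a forwarded IRQ stays active between host_handler_return()
and guest_completes_virq(), so phys_irq_fire() returns false in that
window without the host having to disable the line.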

Injection is still based on irqfd triggering. The only impact on the
irqfd path is that resamplefd is no longer signaled on virtual IRQ
completion, since the completion becomes "transparent".

The current integration is based on an extension of the KVM-VFIO
device, previously used by KVM to interact with VFIO groups. The
patch series now enables KVM to directly interact with a VFIO
platform device. The VFIO external API was extended for that purpose.

The KVM-VFIO device can get/put the VFIO platform device, check its
integrity and type, and get the IRQ number associated with an IRQ index.

The IRQ forward programming is architecture specific (virtual interrupt
controller programming basically). However the whole infrastructure is
kept generic.

From a user point of view, the functionality is provided through new
KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ,
and the capability can be checked with KVM_HAS_DEVICE_ATTR.
Assignment can only be changed when the physical IRQ is not active.
It is the responsibility of the user to do this check.
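
A minimal userspace sketch of how the attribute might be filled in,
assuming the definitions from patch 5/9. The constants below are copied
from that patch and are not in mainline headers; struct fwd_irq and
struct dev_attr are local mirrors of kvm_arch_forwarded_irq and
kvm_device_attr, and make_fwd_attr() is a hypothetical helper. The
sketch stops short of the ioctl itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values copied from patch 5/9 of this series (assumption: unchanged). */
#define KVM_DEV_VFIO_DEVICE               2
#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ   1
#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2

/* Local mirror of struct kvm_arch_forwarded_irq from patch 5/9. */
struct fwd_irq {
	uint32_t fd;    /* file descriptor of the VFIO device */
	uint32_t index; /* VFIO device IRQ index */
	uint32_t gsi;   /* virtual IRQ number */
};
static_assert(sizeof(struct fwd_irq) == 12, "three packed 32-bit fields");

/* Local mirror of struct kvm_device_attr (include/uapi/linux/kvm.h). */
struct dev_attr {
	uint32_t flags;
	uint32_t group;
	uint64_t attr;
	uint64_t addr;  /* userspace address of the attribute data */
};

/*
 * Hypothetical helper: build the attribute that would be passed to
 * KVM_HAS_DEVICE_ATTR (capability check) or KVM_SET_DEVICE_ATTR
 * (actual programming) on the KVM-VFIO device fd.
 */
static struct dev_attr make_fwd_attr(const struct fwd_irq *irq, int forward)
{
	struct dev_attr a;

	memset(&a, 0, sizeof(a));
	a.group = KVM_DEV_VFIO_DEVICE;
	a.attr = forward ? KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
			 : KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ;
	a.addr = (uint64_t)(uintptr_t)irq;
	return a;
}
```

The caller would first probe with KVM_HAS_DEVICE_ATTR, make sure the
physical IRQ is not active, and only then issue KVM_SET_DEVICE_ATTR
with the same attribute.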

This patch series has the following dependencies:
- "ARM: Forwarding physical interrupts to a guest VM"
  (http://lwn.net/Articles/603514/)
- [PATCH v3] irqfd for ARM
- and obviously the VFIO platform driver series:
  [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
  https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html

Integrated pieces can be found at
ssh://git.linaro.org/people/eric.auger/linux.git
on branch 3.17rc3_irqfd_forward_integ_v2

This was tested on Calxeda Midway, assigning the xgmac main IRQ.

v1 -> v2:
- forward control is moved from architecture specific file into generic
  vfio.c module.
  only kvm_arch_set_fwd_state remains architecture specific
- integrate Kim's patch which enables KVM-VFIO for ARM
- fix vgic state bypass in vgic_queue_hwirq
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
  to include/uapi/linux/kvm.h
  also irq_index renamed into index and guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
- vfio_external_get_base_device renamed into vfio_external_base_device
- vfio_external_get_type removed
- kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD

Eric Auger (8):
  KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
    IRQ
  KVM: ARM: VGIC: add forwarded irq rbtree lock
  VFIO: platform: handler tests whether the IRQ is forwarded
  KVM: KVM-VFIO: update user API to program forwarded IRQ
  VFIO: Extend external user API
  KVM: KVM-VFIO: add new VFIO external API hooks
  KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding
    control
  KVM: KVM-VFIO: ARM forwarding control

Kim Phillips (1):
  ARM: KVM: Enable the KVM-VFIO device

 Documentation/virtual/kvm/devices/vfio.txt |  26 ++
 arch/arm/include/asm/kvm_host.h            |   7 +
 arch/arm/kvm/Kconfig                       |   1 +
 arch/arm/kvm/Makefile                      |   4 +-
 arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
 drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
 drivers/vfio/vfio.c                        |  24 ++
 include/kvm/arm_vgic.h                     |   1 +
 include/linux/kvm_host.h                   |  27 ++
 include/linux/vfio.h                       |   3 +
 include/uapi/linux/kvm.h                   |   9 +
 virt/kvm/arm/vgic.c                        |  59 +++-
 virt/kvm/vfio.c                            | 497 ++++++++++++++++++++++++++++-
 13 files changed, 733 insertions(+), 17 deletions(-)
 create mode 100644 arch/arm/kvm/kvm_vfio_arm.c

-- 
1.9.1


* [RFC v2 1/9] KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded IRQ
  2014-09-01 12:52 ` Eric Auger
@ 2014-09-01 12:52   ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: linux-kernel, patches, will.deacon, a.motakis, a.rigo, john.liuli

Fix multiple injection of level-sensitive forwarded IRQs.
With the current code, the second injection fails since the state
bitmaps are not reset (process_maintenance is not called anymore).
The new implementation fully bypasses the vgic state management for
forwarded IRQs (the checks are skipped in vgic_update_irq_pending).
This assumes the forwarded IRQ is injected from the kernel side.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

An attempt was made to reset the states in __kvm_vgic_sync_hwstate by
checking for the emptied LR of the forwarded IRQ. Surprisingly, this does
not seem to work: sometimes a new forwarded IRQ injection is observed
while the LR of the previous instance has not yet been observed as empty.
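
To illustrate the failure mode and the bypass, here is a deliberately
simplified model. It is not the vgic code: struct virq, its two flags
and the helpers are invented for this sketch, and the real state is
kept in per-distributor bitmaps.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified per-IRQ software state; names are illustrative only. */
struct virq {
	bool pending;   /* stands in for the pending state bitmap */
	bool queued;    /* stands in for the "queued" bitmap */
	bool forwarded; /* true when a physical IRQ is mapped to it */
};

/* Try to inject a level-sensitive IRQ at level high. */
static bool inject_level_high(struct virq *v)
{
	assert(v);
	/* Non-forwarded IRQs are rejected while a previous one is in flight. */
	if (!v->forwarded && (v->pending || v->queued))
		return false;
	v->pending = true;
	v->queued = true;
	return true;
}

/* Models the maintenance path that resets the state on guest EOI. */
static void maintenance_on_eoi(struct virq *v)
{
	v->pending = false;
	v->queued = false;
}
```

For a non-forwarded IRQ, a second inject_level_high() only succeeds
after maintenance_on_eoi(). For a forwarded IRQ that path never runs,
so without the bypass the stale flags would block every later injection.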

v1 -> v2:
- fix vgic state bypass in vgic_queue_hwirq
---
 virt/kvm/arm/vgic.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 0007300..8ef495b 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1259,7 +1259,9 @@ static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int irq)
 
 static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq)
 {
-	if (vgic_irq_is_queued(vcpu, irq))
+	bool is_forwarded = (vgic_get_phys_irq(vcpu, irq) > 0);
+
+	if (vgic_irq_is_queued(vcpu, irq) && !is_forwarded)
 		return true; /* level interrupt, already queued */
 
 	if (vgic_queue_irq(vcpu, 0, irq)) {
@@ -1517,14 +1519,18 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 	int edge_triggered, level_triggered;
 	int enabled;
 	bool ret = true;
+	bool is_forwarded;
 
 	spin_lock(&dist->lock);
 
 	vcpu = kvm_get_vcpu(kvm, cpuid);
+	is_forwarded = (vgic_get_phys_irq(vcpu, irq_num) > 0);
+
 	edge_triggered = vgic_irq_is_edge(vcpu, irq_num);
 	level_triggered = !edge_triggered;
 
-	if (!vgic_validate_injection(vcpu, irq_num, level)) {
+	if (!is_forwarded &&
+		!vgic_validate_injection(vcpu, irq_num, level)) {
 		ret = false;
 		goto out;
 	}
@@ -1557,7 +1563,8 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 		goto out;
 	}
 
-	if (level_triggered && vgic_irq_is_queued(vcpu, irq_num)) {
+	if (!is_forwarded &&
+		level_triggered && vgic_irq_is_queued(vcpu, irq_num)) {
 		/*
 		 * Level interrupt in progress, will be picked up
 		 * when EOId.
-- 
1.9.1


* [RFC v2 2/9] KVM: ARM: VGIC: add forwarded irq rbtree lock
  2014-09-01 12:52 ` Eric Auger
@ 2014-09-01 12:52   ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: linux-kernel, patches, will.deacon, a.motakis, a.rigo, john.liuli

Add a lock protecting the rb tree manipulation. The rb tree can be
searched in one thread (the irqfd handler, for instance) while
map/unmap happens in another.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
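The locking pattern can be sketched in plain userspace C as follows.
This is an assumption-laden illustration, not the vgic code: it uses an
unbalanced binary tree and a C11 atomic_flag spinlock in place of the
kernel rb tree and spinlock_t, and every name is made up. It allocates
before taking the lock, since in kernel context a GFP_KERNEL allocation
may sleep and a spinlock must not be held across a sleep, and it uses a
single unlock path.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

struct map_node {
	int virt_irq, phys_irq;
	struct map_node *left, *right;
};

struct irq_map {
	atomic_flag lock;       /* stand-in for the rb tree spinlock */
	struct map_node *root;
};

static void map_init(struct irq_map *m)
{
	assert(m);
	atomic_flag_clear(&m->lock);
	m->root = NULL;
}

static void map_lock(struct irq_map *m)
{
	while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
		; /* spin until the flag is released */
}

static void map_unlock(struct irq_map *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

static int map_insert(struct irq_map *m, int virt_irq, int phys_irq)
{
	struct map_node **link, *node;
	int ret = 0;

	node = calloc(1, sizeof(*node)); /* allocate outside the lock */
	if (!node)
		return -ENOMEM;
	node->virt_irq = virt_irq;
	node->phys_irq = phys_irq;

	map_lock(m);
	for (link = &m->root; *link; ) {
		if (virt_irq < (*link)->virt_irq) {
			link = &(*link)->left;
		} else if (virt_irq > (*link)->virt_irq) {
			link = &(*link)->right;
		} else {
			ret = -EEXIST;
			goto out;
		}
	}
	*link = node;
	node = NULL; /* ownership moved to the tree */
out:
	map_unlock(m);
	free(node);  /* no-op unless the insert was rejected */
	return ret;
}

static int map_lookup(struct irq_map *m, int virt_irq)
{
	struct map_node *n;
	int ret = -ENOENT;

	map_lock(m);
	for (n = m->root; n; n = virt_irq < n->virt_irq ? n->left : n->right) {
		if (n->virt_irq == virt_irq) {
			ret = n->phys_irq;
			break;
		}
	}
	map_unlock(m);
	return ret;
}
```
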
 include/kvm/arm_vgic.h |  1 +
 virt/kvm/arm/vgic.c    | 46 +++++++++++++++++++++++++++++++++++++---------
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 743020f..3da244f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -177,6 +177,7 @@ struct vgic_dist {
 	unsigned long		irq_pending_on_cpu;
 
 	struct rb_root		irq_phys_map;
+	spinlock_t			rb_tree_lock;
 #endif
 };
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 8ef495b..dbc2a5a 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1630,9 +1630,15 @@ static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
 
 int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 {
-	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
-	struct rb_node **new = &root->rb_node, *parent = NULL;
+	struct rb_root *root;
+	struct rb_node **new, *parent = NULL;
 	struct irq_phys_map *new_map;
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+	spin_lock(&dist->rb_tree_lock);
+
+	root = vgic_get_irq_phys_map(vcpu, virt_irq);
+	new = &root->rb_node;
 
 	/* Boilerplate rb_tree code */
 	while (*new) {
@@ -1644,13 +1650,17 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 			new = &(*new)->rb_left;
 		else if (this->virt_irq > virt_irq)
 			new = &(*new)->rb_right;
-		else
+		else {
+			spin_unlock(&dist->rb_tree_lock);
 			return -EEXIST;
+		}
 	}
 
 	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
-	if (!new_map)
+	if (!new_map) {
+		spin_unlock(&dist->rb_tree_lock);
 		return -ENOMEM;
+	}
 
 	new_map->virt_irq = virt_irq;
 	new_map->phys_irq = phys_irq;
@@ -1658,6 +1668,8 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 	rb_link_node(&new_map->node, parent, new);
 	rb_insert_color(&new_map->node, root);
 
+	spin_unlock(&dist->rb_tree_lock);
+
 	return 0;
 }
 
@@ -1685,24 +1697,39 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 
 int vgic_get_phys_irq(struct kvm_vcpu *vcpu, int virt_irq)
 {
-	struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
+	struct irq_phys_map *map;
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	int ret;
+
+	spin_lock(&dist->rb_tree_lock);
+	map = vgic_irq_map_search(vcpu, virt_irq);
 
 	if (map)
-		return map->phys_irq;
+		ret = map->phys_irq;
+	else
+		ret = -ENOENT;
+
+	spin_unlock(&dist->rb_tree_lock);
+	return ret;
 
-	return -ENOENT;
 }
 
 int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
 {
-	struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
+	struct irq_phys_map *map;
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+	spin_lock(&dist->rb_tree_lock);
+
+	map = vgic_irq_map_search(vcpu, virt_irq);
 
 	if (map && map->phys_irq == phys_irq) {
 		rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, virt_irq));
 		kfree(map);
+		spin_unlock(&dist->rb_tree_lock);
 		return 0;
 	}
-
+	spin_unlock(&dist->rb_tree_lock);
 	return -ENOENT;
 }
 
@@ -1898,6 +1925,7 @@ int kvm_vgic_create(struct kvm *kvm)
 	}
 
 	spin_lock_init(&kvm->arch.vgic.lock);
+	spin_lock_init(&kvm->arch.vgic.rb_tree_lock);
 	kvm->arch.vgic.in_kernel = true;
 	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
 	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
-- 
1.9.1


* [RFC v2 3/9] ARM: KVM: Enable the KVM-VFIO device
  2014-09-01 12:52 ` Eric Auger
@ 2014-09-01 12:52   ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli, Kim Phillips

From: Kim Phillips <kim.phillips@linaro.org>

Used by KVM-enabled VFIO-based device passthrough support in QEMU.

Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
---
 arch/arm/kvm/Kconfig  | 1 +
 arch/arm/kvm/Makefile | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index e519a40..aace254 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -24,6 +24,7 @@ config KVM
 	select KVM_MMIO
 	select KVM_ARM_HOST
 	depends on ARM_VIRT_EXT && ARM_LPAE
+	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	---help---
 	  Support hosting virtualized guest machines. You will also
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 859db09..ea1fa76 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
 AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
 
 KVM := ../../../virt/kvm
-kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
+kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
-- 
1.9.1


* [RFC v2 4/9] VFIO: platform: handler tests whether the IRQ is forwarded
  2014-09-01 12:52 ` Eric Auger
@ 2014-09-01 12:52   ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: linux-kernel, patches, will.deacon, a.motakis, a.rigo, john.liuli

When the IRQ is forwarded, the VFIO platform IRQ handler does not
need to disable the IRQ anymore. In that mode, when the handler completes,
the IRQ is not deactivated; only its priority is lowered.

Some other actor (typically a guest) is supposed to deactivate the IRQ,
at which point a new physical IRQ can hit.

In the virtualization use case, the physical IRQ is automatically completed
by the interrupt controller when the guest completes the corresponding
virtual IRQ.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
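The handler decision described above can be modelled in a few lines of
standalone C. This is a sketch under assumptions: struct irq_ctx here is
a cut-down stand-in for the driver's context, FLAG_AUTOMASKED stands in
for VFIO_IRQ_INFO_AUTOMASKED, setting "masked" stands in for
disable_irq_nosync(), and the counter stands in for signalling the
trigger eventfd.

```c
#include <assert.h>
#include <stdbool.h>

#define FLAG_AUTOMASKED (1u << 2) /* stand-in for VFIO_IRQ_INFO_AUTOMASKED */

/* Cut-down, illustrative version of the per-IRQ context. */
struct irq_ctx {
	unsigned int flags;
	bool masked;            /* true once the host has disabled the IRQ */
	bool forwarded;         /* true when the IRQ is in forwarded mode */
	unsigned int signalled; /* counts notifications sent to userspace */
};

/* Returns true when the IRQ was handled (IRQ_HANDLED in the driver). */
static bool model_irq_handler(struct irq_ctx *ctx)
{
	assert(ctx);
	if (ctx->masked)
		return false;

	/* Automask only when the host is the one completing the IRQ. */
	if ((ctx->flags & FLAG_AUTOMASKED) && !ctx->forwarded)
		ctx->masked = true;

	ctx->signalled++;
	return true;
}
```

With forwarded set, two back-to-back calls are both handled and the
context is never masked. Without it, the second call is ignored until
something unmasks the IRQ.
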
 drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 6768508..1f851b2 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -88,13 +88,18 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
 	struct vfio_platform_irq *irq_ctx = dev_id;
 	unsigned long flags;
 	int ret = IRQ_NONE;
+	struct irq_data *d;
+	bool is_forwarded;
 
 	spin_lock_irqsave(&irq_ctx->lock, flags);
 
 	if (!irq_ctx->masked) {
 		ret = IRQ_HANDLED;
+		d = irq_get_irq_data(irq_ctx->hwirq);
+		is_forwarded = irqd_irq_forwarded(d);
 
-		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) {
+		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED &&
+						!is_forwarded) {
 			disable_irq_nosync(irq_ctx->hwirq);
 			irq_ctx->masked = true;
 		}
-- 
1.9.1


* [RFC v2 5/9] KVM: KVM-VFIO: update user API to program forwarded IRQ
  2014-09-01 12:52 ` Eric Auger
@ 2014-09-01 12:52   ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: linux-kernel, patches, will.deacon, a.motakis, a.rigo, john.liuli

Add new device group commands:
- KVM_DEV_VFIO_DEVICE_FORWARD_IRQ and
  KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ

which turn forwarded IRQ mode on and off.

The kvm_arch_forwarded_irq struct describes a forwarded IRQ.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v1 -> v2:
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
  to include/uapi/linux/kvm.h
  also irq_index renamed into index and guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
---
 Documentation/virtual/kvm/devices/vfio.txt | 26 ++++++++++++++++++++++++++
 include/uapi/linux/kvm.h                   |  9 +++++++++
 2 files changed, 35 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..048baa0 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -13,6 +13,7 @@ VFIO-group is held by KVM.
 
 Groups:
   KVM_DEV_VFIO_GROUP
+  KVM_DEV_VFIO_DEVICE
 
 KVM_DEV_VFIO_GROUP attributes:
   KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
@@ -20,3 +21,28 @@ KVM_DEV_VFIO_GROUP attributes:
 
 For each, kvm_device_attr.addr points to an int32_t file descriptor
 for the VFIO group.
+
+KVM_DEV_VFIO_DEVICE attributes:
+  KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
+  KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ
+
+For each, kvm_device_attr.addr points to a kvm_arch_forwarded_irq struct.
+This user API makes it possible to set up a special IRQ handling mode,
+where KVM and a VFIO platform driver collaborate to improve IRQ
+handling performance.
+
+fd is the file descriptor of a valid VFIO device whose physical IRQ,
+referenced by its index, is injected into the guest as virtual IRQ gsi.
+
+On FORWARD_IRQ, the KVM-VFIO device programs:
+- the host, to not complete the physical IRQ itself.
+- the GIC, to automatically complete the physical IRQ when the guest
+  completes the virtual IRQ.
+This avoids trapping the end-of-interrupt for level-sensitive IRQs.
+
+On UNFORWARD_IRQ, handling returns to the mode where the host completes
+the physical IRQ and the guest completes the virtual IRQ.
+
+It is up to the caller of this API to make sure the IRQ is not
+outstanding when FORWARD/UNFORWARD is called. Otherwise it may be
+unclear who is going to complete the IRQ.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index cf3a2ff..8cd7b0e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -947,6 +947,12 @@ struct kvm_device_attr {
 	__u64	addr;		/* userspace address of attr data */
 };
 
+struct kvm_arch_forwarded_irq {
+	__u32 fd; /* file descriptor of the VFIO device */
+	__u32 index; /* VFIO device IRQ index */
+	__u32 gsi; /* gsi, i.e. virtual IRQ number */
+};
+
 #define KVM_DEV_TYPE_FSL_MPIC_20	1
 #define KVM_DEV_TYPE_FSL_MPIC_42	2
 #define KVM_DEV_TYPE_XICS		3
@@ -954,6 +960,9 @@ struct kvm_device_attr {
 #define  KVM_DEV_VFIO_GROUP			1
 #define   KVM_DEV_VFIO_GROUP_ADD			1
 #define   KVM_DEV_VFIO_GROUP_DEL			2
+#define  KVM_DEV_VFIO_DEVICE			2
+#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ			1
+#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ			2
 #define KVM_DEV_TYPE_ARM_VGIC_V2	5
 #define KVM_DEV_TYPE_FLIC		6
 
-- 
1.9.1


* [RFC v2 6/9] VFIO: Extend external user API
  2014-09-01 12:52 ` Eric Auger
  (?)
@ 2014-09-01 12:52   ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: linux-kernel, patches, will.deacon, a.motakis, a.rigo, john.liuli

New functions are added to be called from the ARM KVM-VFIO device:

- vfio_device_get_external_user gets a reference to a vfio device
  from the struct file backing its fd
- vfio_device_put_external_user puts the vfio device
- vfio_external_base_device returns the struct device *,
  useful to access the platform_device

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v1 -> v2:

- vfio_external_get_base_device renamed to vfio_external_base_device
- vfio_external_get_type removed
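
For illustration only, a userspace model of the get/put pattern added by
this patch: the getter first checks that the file really is a device file
by comparing its operations pointer, and only then takes a reference. All
types and names here (file_obj, device_obj, device_fops, ...) are
hypothetical stand-ins for the kernel objects, and the refcount is a plain
integer rather than a kref.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct file_operations. */
struct file_ops {
	int unused;
};

/* The one ops table that identifies a "device" file in this model. */
static const struct file_ops device_fops = { 0 };

struct device_obj {
	int refcount;
};

struct file_obj {
	const struct file_ops *f_op;
	void *private_data;
};

/* Take a reference only if @filep really is a device file. */
static struct device_obj *device_get_external_user(struct file_obj *filep)
{
	struct device_obj *dev;

	if (filep->f_op != &device_fops)
		return NULL; /* the kernel code returns ERR_PTR(-EINVAL) */

	dev = filep->private_data;
	dev->refcount++;
	return dev;
}

/* Release the reference taken by device_get_external_user(). */
static void device_put_external_user(struct device_obj *dev)
{
	dev->refcount--;
}
```

The point of the ops comparison is that private_data is only interpreted
as a device once the file type has been confirmed.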
---
 drivers/vfio/vfio.c  | 24 ++++++++++++++++++++++++
 include/linux/vfio.h |  3 +++
 2 files changed, 27 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 8e84471..282814e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1401,6 +1401,30 @@ void vfio_group_put_external_user(struct vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 
+struct vfio_device *vfio_device_get_external_user(struct file *filep)
+{
+	struct vfio_device *vdev = filep->private_data;
+
+	if (filep->f_op != &vfio_device_fops)
+		return ERR_PTR(-EINVAL);
+
+	vfio_device_get(vdev);
+	return vdev;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_external_user);
+
+void vfio_device_put_external_user(struct vfio_device *vdev)
+{
+	vfio_device_put(vdev);
+}
+EXPORT_SYMBOL_GPL(vfio_device_put_external_user);
+
+struct device *vfio_external_base_device(struct vfio_device *vdev)
+{
+	return vdev->dev;
+}
+EXPORT_SYMBOL_GPL(vfio_external_base_device);
+
 int vfio_external_user_iommu_id(struct vfio_group *group)
 {
 	return iommu_group_id(group->iommu_group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ffe04ed..bd4b6cb 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -99,6 +99,9 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
 					  unsigned long arg);
+extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
+extern void vfio_device_put_external_user(struct vfio_device *vdev);
+extern struct device *vfio_external_base_device(struct vfio_device *vdev);
 
 struct pci_dev;
 #ifdef CONFIG_EEH
-- 
1.9.1


* [RFC v2 7/9] KVM: KVM-VFIO: add new VFIO external API hooks
  2014-09-01 12:52 ` Eric Auger
@ 2014-09-01 12:52   ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: linux-kernel, patches, will.deacon, a.motakis, a.rigo, john.liuli

Add wrapper functions that act as the gateway to the extended
VFIO external user API. Each one resolves the VFIO symbol with
symbol_get() before calling it:
- kvm_vfio_device_get_external_user
- kvm_vfio_device_put_external_user
- kvm_vfio_external_base_device

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v1 -> v2:
- kvm_vfio_external_get_base_device renamed to
  kvm_vfio_external_base_device
- kvm_vfio_external_get_type removed
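
For illustration only, a userspace model of the resolve/call/release
pattern these wrappers follow. The symbol table, sym_get/sym_put and
provider_base_irq are hypothetical stand-ins for symbol_get(),
symbol_put() and an exported VFIO function; nothing here is kernel API.

```c
#include <stddef.h>
#include <string.h>

typedef int (*base_irq_fn)(int index);

/* Hypothetical provider function standing in for an exported symbol. */
static int provider_base_irq(int index)
{
	return 100 + index;
}

/* Hypothetical symbol table; loaded models whether the module is present. */
struct symbol {
	const char *name;
	base_irq_fn fn;
	int loaded;
	int users;
};

static struct symbol symtab[] = {
	{ "provider_base_irq", provider_base_irq, 1, 0 },
};

/* Look up @name and pin it; returns NULL if the provider is absent. */
static struct symbol *sym_get(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
		if (symtab[i].loaded && strcmp(symtab[i].name, name) == 0) {
			symtab[i].users++;
			return &symtab[i];
		}
	}
	return NULL;
}

/* Unpin a symbol obtained with sym_get(). */
static void sym_put(struct symbol *s)
{
	s->users--;
}

/* Gateway: resolve, call, release; fail softly if the provider is absent. */
static int gateway_base_irq(int index)
{
	struct symbol *s = sym_get("provider_base_irq");
	int irq;

	if (!s)
		return -1;
	irq = s->fn(index);
	sym_put(s);
	return irq;
}
```

The caller never holds the provider pinned beyond the call itself, which
mirrors how each wrapper in the diff pairs its symbol_get() with a
symbol_put().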
---
 arch/arm/include/asm/kvm_host.h |  5 +++++
 virt/kvm/vfio.c                 | 45 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..1aee6bb 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -171,6 +171,11 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 
+struct vfio_device;
+struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep);
+void kvm_vfio_device_put_external_user(struct vfio_device *vdev);
+struct device *kvm_vfio_external_base_device(struct vfio_device *vdev);
+
 /* We do not have shadow page tables, hence the empty hooks */
 static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
 {
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index ba1a93f..76dc7a1 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -59,6 +59,51 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
 	symbol_put(vfio_group_put_external_user);
 }
 
+struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
+{
+	struct vfio_device *vdev;
+	struct vfio_device *(*fn)(struct file *);
+
+	fn = symbol_get(vfio_device_get_external_user);
+	if (!fn)
+		return ERR_PTR(-EINVAL);
+
+	vdev = fn(filep);
+
+	symbol_put(vfio_device_get_external_user);
+
+	return vdev;
+}
+
+void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
+{
+	void (*fn)(struct vfio_device *);
+
+	fn = symbol_get(vfio_device_put_external_user);
+	if (!fn)
+		return;
+
+	fn(vdev);
+
+	symbol_put(vfio_device_put_external_user);
+}
+
+struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
+{
+	struct device *(*fn)(struct vfio_device *);
+	struct device *dev;
+
+	fn = symbol_get(vfio_external_base_device);
+	if (!fn)
+		return NULL;
+
+	dev = fn(vdev);
+
+	symbol_put(vfio_external_base_device);
+
+	return dev;
+}
+
 static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
 {
 	long (*fn)(struct vfio_group *, unsigned long);
-- 
1.9.1


* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-01 12:52 ` Eric Auger
  (?)
@ 2014-09-01 12:52   ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: linux-kernel, patches, will.deacon, a.motakis, a.rigo, john.liuli

This patch introduces a new KVM_DEV_VFIO_DEVICE group.

This is a new control channel which enables KVM to cooperate with
viable VFIO devices.

The kvm-vfio device now holds a list of devices (kvm_vfio_device)
in addition to a list of groups (kvm_vfio_group). The new
infrastructure checks the validity of the VFIO device file
descriptor, then gets and holds a reference to the device.

The first command implemented is IRQ forward control:
KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.

It consists of programming the VFIO driver and KVM in a consistent
manner so that an optimized IRQ injection/completion path is set up.
Each kvm_vfio_device holds a list of forwarded IRQs. When putting a
kvm_vfio_device, the implementation makes sure the forwarded IRQs
are set back to the normal (non-forwarded) handling state.

The forwarding programming is architecture specific and is done by
the kvm_arch_set_fwd_state function. Its implementation is given in
a separate patch.

Forwarding control is enabled by the
__KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v1 -> v2:
- __KVM_HAVE_ARCH_KVM_VFIO renamed to __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
- original patch file split into 2 parts: generic part moved to vfio.c
  and ARM specific part (kvm_arch_set_fwd_state)
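
For illustration only, a userspace model of the bookkeeping described
above: a list of tracked devices, each holding its list of forwarded IRQs,
plus the must_put convention of kvm_vfio_forward. The names (tracker,
tracked_dev, track_forward, ...) are hypothetical, plain pointers replace
list_head, and the architecture hook is left out.

```c
#include <stdbool.h>
#include <stdlib.h>

struct fwd_irq {
	struct fwd_irq *next;
	unsigned int index;
	unsigned int gsi;
};

struct tracked_dev {
	struct tracked_dev *next;
	const void *vdev;     /* identity of the VFIO device */
	struct fwd_irq *irqs; /* forwarded IRQs of that device */
};

struct tracker {
	struct tracked_dev *devs;
};

static struct tracked_dev *find_dev(struct tracker *t, const void *vdev)
{
	struct tracked_dev *d;

	for (d = t->devs; d; d = d->next)
		if (d->vdev == vdev)
			return d;
	return NULL;
}

static struct fwd_irq *find_irq(struct tracked_dev *d, unsigned int index)
{
	struct fwd_irq *i;

	for (i = d ? d->irqs : NULL; i; i = i->next)
		if (i->index == index)
			return i;
	return NULL;
}

/*
 * Record a forwarded IRQ. Returns 0 on success, -1 on error.
 * *must_put is false only when a new device entry was created, i.e.
 * when the tracker keeps the caller's reference to the device.
 */
static int track_forward(struct tracker *t, const void *vdev,
			 unsigned int index, unsigned int gsi, bool *must_put)
{
	struct tracked_dev *d = find_dev(t, vdev);
	struct fwd_irq *irq;

	*must_put = true;
	if (find_irq(d, index))
		return -1; /* already forwarded */

	irq = calloc(1, sizeof(*irq));
	if (!irq)
		return -1;
	irq->index = index;
	irq->gsi = gsi;

	if (!d) {
		d = calloc(1, sizeof(*d));
		if (!d) {
			free(irq);
			return -1;
		}
		d->vdev = vdev;
		d->next = t->devs;
		t->devs = d;
		*must_put = false;
	}
	irq->next = d->irqs;
	d->irqs = irq;
	return 0;
}

/* Drop a forwarded IRQ; the device entry goes away with its last IRQ. */
static int track_unforward(struct tracker *t, const void *vdev,
			   unsigned int index)
{
	struct tracked_dev *d = find_dev(t, vdev), **pd;
	struct fwd_irq **pi, *irq;

	if (!d)
		return -1;
	for (pi = &d->irqs; *pi && (*pi)->index != index; pi = &(*pi)->next)
		;
	if (!*pi)
		return -1; /* not forwarded */
	irq = *pi;
	*pi = irq->next;
	free(irq);

	if (!d->irqs) {
		for (pd = &t->devs; *pd != d; pd = &(*pd)->next)
			;
		*pd = d->next;
		free(d);
	}
	return 0;
}
```

In this model the first forward for a device creates its entry and keeps
the reference, later forwards for the same device ask the caller to drop
the extra reference, and removing the last IRQ removes the device entry.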
---
 include/linux/kvm_host.h |  27 +++
 virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 477 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..24350dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1065,6 +1065,21 @@ struct kvm_device_ops {
 		      unsigned long arg);
 };
 
+enum kvm_fwd_irq_action {
+	KVM_VFIO_IRQ_SET_FORWARD,
+	KVM_VFIO_IRQ_SET_NORMAL,
+	KVM_VFIO_IRQ_CLEANUP,
+};
+
+/* internal structure describing a forwarded IRQ */
+struct kvm_fwd_irq {
+	struct list_head link;
+	__u32 index; /* platform device irq index */
+	__u32 hwirq; /* physical IRQ */
+	__u32 gsi; /* virtual IRQ */
+	struct kvm_vcpu *vcpu; /* vcpu to inject into */
+};
+
 void kvm_device_get(struct kvm_device *dev);
 void kvm_device_put(struct kvm_device *dev);
 struct kvm_device *kvm_device_from_filp(struct file *filp);
@@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 extern struct kvm_device_ops kvm_flic_ops;
 
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
+			   enum kvm_fwd_irq_action action);
+
+#else
+static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
+					 enum kvm_fwd_irq_action action)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 76dc7a1..e4a81c4 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -18,14 +18,24 @@
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
+#include <linux/platform_device.h>
 
 struct kvm_vfio_group {
 	struct list_head node;
 	struct vfio_group *vfio_group;
 };
 
+struct kvm_vfio_device {
+	struct list_head node;
+	struct vfio_device *vfio_device;
+	/* list of forwarded IRQs for that VFIO device */
+	struct list_head fwd_irq_list;
+	int fd;
+};
+
 struct kvm_vfio {
 	struct list_head group_list;
+	struct list_head device_list;
 	struct mutex lock;
 	bool noncoherent;
 };
@@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 	return -ENXIO;
 }
 
+/**
+ * kvm_vfio_get_vfio_device - returns the vfio-device corresponding to this fd
+ * @fd: fd of the vfio platform device
+ *
+ * checks it is a vfio device and
+ * increments its ref counter
+ */
+static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
+{
+	struct fd f;
+	struct vfio_device *vdev;
+
+	f = fdget(fd);
+	if (!f.file)
+		return ERR_PTR(-EBADF); /* the caller checks with IS_ERR() */
+	vdev = kvm_vfio_device_get_external_user(f.file);
+	fdput(f);
+	return vdev;
+}
+
+/**
+ * kvm_vfio_put_vfio_device - put the vfio platform device
+ * @vdev: vfio_device to put
+ *
+ * decrements the ref counter
+ */
+static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
+{
+	kvm_vfio_device_put_external_user(vdev);
+}
+
+/**
+ * kvm_vfio_find_device - look for the device in the assigned
+ * device list
+ * @kv: the kvm-vfio device
+ * @vdev: the vfio_device to look for
+ *
+ * returns the associated kvm_vfio_device if the device is known,
+ * meaning at least 1 IRQ is forwarded for this device.
+ * if the device is not registered, returns NULL.
+ */
+struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
+					     struct vfio_device *vdev)
+{
+	struct kvm_vfio_device *kvm_vdev_iter;
+
+	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
+		if (kvm_vdev_iter->vfio_device == vdev)
+			return kvm_vdev_iter;
+	}
+	return NULL;
+}
+
+/**
+ * kvm_vfio_find_irq - look for an irq in the device IRQ list
+ * @kvm_vdev: the kvm_vfio_device
+ * @irq_index: irq index
+ *
+ * returns the forwarded irq struct if it exists, NULL otherwise
+ */
+struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
+				      int irq_index)
+{
+	struct kvm_fwd_irq *fwd_irq_iter;
+
+	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
+		if (fwd_irq_iter->index == irq_index)
+			return fwd_irq_iter;
+	}
+	return NULL;
+}
+
+/**
+ * kvm_vfio_validate_forward - checks whether forwarding an IRQ is meaningful
+ * @kv: the kvm-vfio device
+ * @vdev: vfio_device the IRQ belongs to
+ * @fwd_irq: user struct containing the irq index to forward
+ * @kvm_vdev: set to the kvm_vfio_device that already holds forwarded IRQs
+ * for that VFIO device, if any
+ * @hwirq: set to the irq number the irq index corresponds to
+ *
+ * checks the vfio-device is a platform vfio device,
+ * checks the irq index corresponds to an actual hwirq and
+ * checks this hwirq is not already forwarded
+ * returns < 0 if not a platform device, bad irq index, already forwarded
+ */
+static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
+			    struct vfio_device *vdev,
+			    struct kvm_arch_forwarded_irq *fwd_irq,
+			    struct kvm_vfio_device **kvm_vdev,
+			    int *hwirq)
+{
+	struct device *dev = kvm_vfio_external_base_device(vdev);
+	struct platform_device *platdev;
+
+	*hwirq = -1;
+	*kvm_vdev = NULL;
+	if (strcmp(dev->bus->name, "platform") == 0) {
+		platdev = to_platform_device(dev);
+		*hwirq = platform_get_irq(platdev, fwd_irq->index);
+		if (*hwirq < 0) {
+			kvm_err("%s incorrect index\n", __func__);
+			return -EINVAL;
+		}
+	} else {
+		kvm_err("%s not a platform device\n", __func__);
+		return -EINVAL;
+	}
+	/* is a ref to this device already owned by the KVM-VFIO device? */
+	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
+	if (*kvm_vdev) {
+		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
+			kvm_err("%s irq %d already forwarded\n",
+				__func__, *hwirq);
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+/**
+ * kvm_vfio_validate_unforward - checks an unforward request is meaningful
+ * @kv: the kvm_vfio device
+ * @vdev: the vfio_device the irq to unforward belongs to
+ * @fwd_irq: the user struct that contains the fd and irq index of the irq
+ * @kvm_vdev: set to the kvm_vfio_device the forwarded irq belongs to, if
+ * it exists
+ *
+ * returns 0 if the provided irq is actually forwarded
+ * (a ref to this vfio_device is held and this irq belongs to
+ * the forwarded irqs of this device)
+ * returns -EINVAL otherwise
+ */
+static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
+			      struct vfio_device *vdev,
+			      struct kvm_arch_forwarded_irq *fwd_irq,
+			      struct kvm_vfio_device **kvm_vdev)
+{
+	struct kvm_fwd_irq *pfwd;
+
+	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
+	if (!*kvm_vdev) {
+		kvm_err("%s no forwarded irq for this device\n", __func__);
+		return -EINVAL;
+	}
+	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
+	if (!pfwd) {
+		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->index);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * kvm_vfio_forward - set a forwarded IRQ
+ * @kdev: the kvm device
+ * @vdev: the vfio device the IRQ belongs to
+ * @fwd_irq: the user struct containing the irq index and guest irq
+ * @must_put: tells the caller whether the vfio_device must be put after
+ * the call (the ref must be released if a ref to this device was
+ * already held, or if the device is new and the call failed)
+ *
+ * validate the request, activate forwarding and store which irq of
+ * which device is concerned, so that on unforward or kvm-vfio
+ * destruction everything can be cleaned up.
+ */
+static int kvm_vfio_forward(struct kvm_device *kdev,
+			    struct vfio_device *vdev,
+			    struct kvm_arch_forwarded_irq *fwd_irq,
+			    bool *must_put)
+{
+	int ret;
+	struct kvm_fwd_irq *pfwd = NULL;
+	struct kvm_vfio_device *kvm_vdev = NULL;
+	struct kvm_vfio *kv = kdev->private;
+	int hwirq;
+
+	*must_put = true;
+	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
+					&kvm_vdev, &hwirq);
+	if (ret < 0)
+		return -EINVAL;
+
+	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
+	if (!pfwd)
+		return -ENOMEM;
+	pfwd->index = fwd_irq->index;
+	pfwd->gsi = fwd_irq->gsi;
+	pfwd->hwirq = hwirq;
+	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
+	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
+	if (ret < 0) {
+		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
+		kfree(pfwd);
+		return ret;
+	}
+
+	if (!kvm_vdev) {
+		/* create & insert the new device and keep the ref */
+		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
+		if (!kvm_vdev) {
+			kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_NORMAL);
+			kfree(pfwd);
+			return -ENOMEM;
+		}
+
+		kvm_vdev->vfio_device = vdev;
+		kvm_vdev->fd = fwd_irq->fd;
+		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
+		list_add(&kvm_vdev->node, &kv->device_list);
+		/*
+		 * the only case where we keep the ref:
+		 * new device and forward setting successful
+		 */
+		*must_put = false;
+	}
+
+	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
+
+	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
+		  fwd_irq->fd, hwirq, fwd_irq->gsi);
+
+	return 0;
+}
+
+/**
+ * remove_assigned_device - remove a given device from the list
+ * @kv: the kvm-vfio device
+ * @vdev: the vfio-device to remove
+ *
+ * set all forwarded IRQs back to the normal state, free the forwarded
+ * IRQ list, remove the corresponding kvm_vfio_device from the assigned
+ * device list and put the vfio device.
+ * returns true if the device could be removed, false otherwise
+ */
+bool remove_assigned_device(struct kvm_vfio *kv,
+			    struct vfio_device *vdev)
+{
+	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
+	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
+	bool removed = false;
+	int ret;
+
+	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
+				 &kv->device_list, node) {
+		if (kvm_vdev_iter->vfio_device == vdev) {
+			/* loop on all its forwarded IRQ */
+			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
+						 &kvm_vdev_iter->fwd_irq_list,
+						 link) {
+				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
+						KVM_VFIO_IRQ_SET_NORMAL);
+				if (ret < 0)
+					return false;
+				list_del(&fwd_irq_iter->link);
+				kfree(fwd_irq_iter);
+			}
+			/* all IRQs could be deassigned */
+			list_del(&kvm_vdev_iter->node);
+			kvm_vfio_device_put_external_user(
+				kvm_vdev_iter->vfio_device);
+			kfree(kvm_vdev_iter);
+			removed = true;
+			break;
+		}
+	}
+	return removed;
+}
+
+
+/**
+ * remove_fwd_irq - remove a forwarded irq
+ *
+ * @kv: kvm-vfio device
+ * @kvm_vdev: the kvm_vfio_device the IRQ belongs to
+ * @irq_index: the index of the IRQ
+ *
+ * change the forwarded state of the IRQ, remove the IRQ from
+ * the device forwarded IRQ list. In case it is the last one,
+ * put the device
+ */
+int remove_fwd_irq(struct kvm_vfio *kv,
+		   struct kvm_vfio_device *kvm_vdev,
+		   int irq_index)
+{
+	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
+	int ret = -1;
+
+	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
+				 &kvm_vdev->fwd_irq_list, link) {
+		if (fwd_irq_iter->index == irq_index) {
+			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
+						KVM_VFIO_IRQ_SET_NORMAL);
+			if (ret < 0)
+				break;
+			list_del(&fwd_irq_iter->link);
+			kfree(fwd_irq_iter);
+			ret = 0;
+			break;
+		}
+	}
+	if (list_empty(&kvm_vdev->fwd_irq_list))
+		remove_assigned_device(kv, kvm_vdev->vfio_device);
+
+	return ret;
+}
+
+/**
+ * kvm_vfio_unforward - remove a forwarded IRQ
+ * @kdev: the kvm device
+ * @vdev: the vfio_device
+ * @fwd_irq: user struct
+ * after checking this IRQ effectively is forwarded, change its state,
+ * remove it from the corresponding kvm_vfio_device list
+ */
+static int kvm_vfio_unforward(struct kvm_device *kdev,
+				     struct vfio_device *vdev,
+				     struct kvm_arch_forwarded_irq *fwd_irq)
+{
+	struct kvm_vfio *kv = kdev->private;
+	struct kvm_vfio_device *kvm_vdev;
+	int ret;
+
+	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
+	if (ret < 0)
+		return -EINVAL;
+
+	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
+	if (ret < 0)
+		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
+			__func__, fwd_irq->fd, fwd_irq->index);
+	else
+		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
+			  __func__, fwd_irq->fd, fwd_irq->index);
+	return ret;
+}
+
+
+
+
+/**
+ * kvm_vfio_set_device - the top function for interacting with a vfio
+ * device
+ */
+static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
+{
+	struct kvm_vfio *kv = kdev->private;
+	struct vfio_device *vdev;
+	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
+	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
+
+	switch (attr) {
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
+		bool must_put;
+		int ret;
+
+		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
+			return -EFAULT;
+		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
+		if (IS_ERR(vdev))
+			return PTR_ERR(vdev);
+		mutex_lock(&kv->lock);
+		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
+		if (must_put)
+			kvm_vfio_put_vfio_device(vdev);
+		mutex_unlock(&kv->lock);
+		return ret;
+		}
+	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
+		int ret;
+
+		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
+			return -EFAULT;
+		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
+		if (IS_ERR(vdev))
+			return PTR_ERR(vdev);
+
+		kvm_vfio_device_put_external_user(vdev);
+		mutex_lock(&kv->lock);
+		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
+		mutex_unlock(&kv->lock);
+		return ret;
+	}
+#endif
+	default:
+		return -ENXIO;
+	}
+}
+
+/**
+ * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
+ * @kv: kvm-vfio device
+ *
+ * loop on all held devices and their associated forwarded IRQs,
+ * restore the non forwarded state, remove IRQs and their devices from
+ * the respective lists, put the vfio platform devices
+ *
+ * When this function is called, the vcpus are already destroyed. No
+ * vgic manipulation can happen, hence the KVM_VFIO_IRQ_CLEANUP
+ * kvm_arch_set_fwd_state action
+ */
+int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
+{
+	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
+	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
+
+	/* loop on all the assigned devices */
+	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
+				 &kv->device_list, node) {
+
+		/* loop on all its forwarded IRQ */
+		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
+					 &kvm_vdev_iter->fwd_irq_list, link) {
+			kvm_arch_set_fwd_state(fwd_irq_iter,
+						KVM_VFIO_IRQ_CLEANUP);
+			list_del(&fwd_irq_iter->link);
+			kfree(fwd_irq_iter);
+		}
+		list_del(&kvm_vdev_iter->node);
+		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
+		kfree(kvm_vdev_iter);
+	}
+	return 0;
+}
+
+
 static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
 	case KVM_DEV_VFIO_GROUP:
 		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
+	case KVM_DEV_VFIO_DEVICE:
+		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
 	}
 
 	return -ENXIO;
@@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 		case KVM_DEV_VFIO_GROUP_DEL:
 			return 0;
 		}
-
 		break;
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+	case KVM_DEV_VFIO_DEVICE:
+		switch (attr->attr) {
+		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
+		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
+			return 0;
+		}
+		break;
+#endif
 	}
-
 	return -ENXIO;
 }
 
@@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
 		list_del(&kvg->node);
 		kfree(kvg);
 	}
+	kvm_vfio_put_all_devices(kv);
 
 	kvm_vfio_update_coherency(dev);
 
@@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&kv->group_list);
+	INIT_LIST_HEAD(&kv->device_list);
 	mutex_init(&kv->lock);
 
 	dev->private = kv;
-- 
1.9.1



* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-01 12:52   ` Eric Auger
  0 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: patches, john.liuli, will.deacon, a.rigo, linux-kernel, a.motakis

This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.

This is a new control channel which enables KVM to cooperate with
viable VFIO devices.

The kvm-vfio device now holds a list of devices (kvm_vfio_device)
in addition to a list of groups (kvm_vfio_group). The new
infrastructure makes it possible to check the validity of the VFIO
device file descriptor and to get and hold a reference to it.

The first concrete implemented command is IRQ forward control:
KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.

It consists in programming the VFIO driver and KVM in a consistent manner
so that an optimized IRQ injection/completion is set up. Each
kvm_vfio_device holds a list of forwarded IRQs. When putting a
kvm_vfio_device, the implementation makes sure the forwarded IRQs
are set back to the normal handling state (non forwarded).

The forwarding programming is architecture specific, embodied by the
kvm_arch_set_fwd_state function. Its implementation is given in a
separate patch file.

The forwarding control modality is enabled by the
__KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v1 -> v2:
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
- original patch file separated into 2 parts: generic part moved into vfio.c
  and ARM specific part (kvm_arch_set_fwd_state)
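
For reviewers, here is a minimal userspace sketch of how the new group
could be driven through KVM_SET_DEVICE_ATTR on the kvm-vfio device fd.
It is an illustration only and not part of the patch: the numeric values
given to KVM_DEV_VFIO_DEVICE and to the two attributes are placeholders
(the real ones come from the uapi patch of this series), struct
example_forwarded_irq is an assumed mirror of struct
kvm_arch_forwarded_irq (fd, index, gsi), and the example_* helpers are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Placeholder values, only used if the uapi headers lack the definitions. */
#ifndef KVM_DEV_VFIO_DEVICE
#define KVM_DEV_VFIO_DEVICE 2
#endif
#ifndef KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
#endif
#ifndef KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ
#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2
#endif

/* Assumed layout of the user struct (hypothetical mirror). */
struct example_forwarded_irq {
	uint32_t fd;	/* fd of the VFIO platform device */
	uint32_t index;	/* platform device IRQ index */
	uint32_t gsi;	/* virtual IRQ seen by the guest */
};

/* Fill a device attribute that points at the user struct. */
static void example_fill_attr(struct kvm_device_attr *attr, uint64_t cmd,
			      const struct example_forwarded_irq *irq)
{
	assert(attr && irq);
	memset(attr, 0, sizeof(*attr));
	attr->group = KVM_DEV_VFIO_DEVICE;
	attr->attr = cmd;
	attr->addr = (uint64_t)(uintptr_t)irq;
}

/* Forward (forward != 0) or unforward (forward == 0) one IRQ. */
static int example_set_forward(int kvm_vfio_fd, int forward,
			       const struct example_forwarded_irq *irq)
{
	struct kvm_device_attr attr;

	example_fill_attr(&attr,
			  forward ? KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
				  : KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ,
			  irq);
	return ioctl(kvm_vfio_fd, KVM_SET_DEVICE_ATTR, &attr);
}
```

A VMM would pass the fd returned by KVM_CREATE_DEVICE for the kvm-vfio
device as kvm_vfio_fd; the same structure is used for both commands.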
---
 include/linux/kvm_host.h |  27 +++
 virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 477 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..24350dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1065,6 +1065,21 @@ struct kvm_device_ops {
 		      unsigned long arg);
 };
 
+enum kvm_fwd_irq_action {
+	KVM_VFIO_IRQ_SET_FORWARD,
+	KVM_VFIO_IRQ_SET_NORMAL,
+	KVM_VFIO_IRQ_CLEANUP,
+};
+
+/* internal structure describing a forwarded IRQ */
+struct kvm_fwd_irq {
+	struct list_head link;
+	__u32 index; /* platform device irq index */
+	__u32 hwirq; /* physical IRQ */
+	__u32 gsi; /* virtual IRQ */
+	struct kvm_vcpu *vcpu; /* vcpu to inject into */
+};
+
 void kvm_device_get(struct kvm_device *dev);
 void kvm_device_put(struct kvm_device *dev);
 struct kvm_device *kvm_device_from_filp(struct file *filp);
@@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 extern struct kvm_device_ops kvm_flic_ops;
 
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
+			   enum kvm_fwd_irq_action action);
+
+#else
+static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
+					 enum kvm_fwd_irq_action action)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 76dc7a1..e4a81c4 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -18,14 +18,24 @@
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
+#include <linux/platform_device.h>
 
 struct kvm_vfio_group {
 	struct list_head node;
 	struct vfio_group *vfio_group;
 };
 
+struct kvm_vfio_device {
+	struct list_head node;
+	struct vfio_device *vfio_device;
+	/* list of forwarded IRQs for that VFIO device */
+	struct list_head fwd_irq_list;
+	int fd;
+};
+
 struct kvm_vfio {
 	struct list_head group_list;
+	struct list_head device_list;
 	struct mutex lock;
 	bool noncoherent;
 };
@@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 	return -ENXIO;
 }
 
+/**
+ * kvm_vfio_get_vfio_device - return the vfio_device for this fd
+ * @fd: fd of the vfio platform device
+ *
+ * checks it is a vfio device
+ * increment its ref counter
+ */
+static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
+{
+	struct fd f;
+	struct vfio_device *vdev;
+
+	f = fdget(fd);
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+	vdev = kvm_vfio_device_get_external_user(f.file);
+	fdput(f);
+	return vdev;
+}
+
+/**
+ * kvm_vfio_put_vfio_device - put the vfio platform device
+ * @vdev: vfio_device to put
+ *
+ * decrement the ref counter
+ */
+static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
+{
+	kvm_vfio_device_put_external_user(vdev);
+}
+
+/**
+ * kvm_vfio_find_device - look for the device in the assigned
+ * device list
+ * @kv: the kvm-vfio device
+ * @vdev: the vfio_device to look for
+ *
+ * returns the associated kvm_vfio_device if the device is known,
+ * meaning at least 1 IRQ is forwarded for this device.
+ * if the device is not registered, returns NULL.
+ */
+struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
+					     struct vfio_device *vdev)
+{
+	struct kvm_vfio_device *kvm_vdev_iter;
+
+	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
+		if (kvm_vdev_iter->vfio_device == vdev)
+			return kvm_vdev_iter;
+	}
+	return NULL;
+}
+
+/**
+ * kvm_vfio_find_irq - look for an irq in the device IRQ list
+ * @kvm_vdev: the kvm_vfio_device
+ * @irq_index: irq index
+ *
+ * returns the forwarded irq struct if it exists, NULL in the negative
+ */
+struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
+				      int irq_index)
+{
+	struct kvm_fwd_irq *fwd_irq_iter;
+
+	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
+		if (fwd_irq_iter->index == irq_index)
+			return fwd_irq_iter;
+	}
+	return NULL;
+}
+
+/**
+ * kvm_vfio_validate_forward - check whether forwarding an IRQ is meaningful
+ * @vdev: vfio_device the IRQ belongs to
+ * @fwd_irq: user struct containing the irq_index to forward
+ * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
+ * kvm_vfio_device that holds it
+ * @hwirq: irq number the irq index corresponds to
+ *
+ * checks the vfio-device is a platform vfio device
+ * checks the irq_index corresponds to an actual hwirq and
+ * checks this hwirq is not already forwarded
+ * returns < 0 on following errors:
+ * not a platform device, bad irq index, already forwarded
+ */
+static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
+			    struct vfio_device *vdev,
+			    struct kvm_arch_forwarded_irq *fwd_irq,
+			    struct kvm_vfio_device **kvm_vdev,
+			    int *hwirq)
+{
+	struct device *dev = kvm_vfio_external_base_device(vdev);
+	struct platform_device *platdev;
+
+	*hwirq = -1;
+	*kvm_vdev = NULL;
+	if (strcmp(dev->bus->name, "platform") == 0) {
+		platdev = to_platform_device(dev);
+		*hwirq = platform_get_irq(platdev, fwd_irq->index);
+		if (*hwirq < 0) {
+			kvm_err("%s incorrect index\n", __func__);
+			return -EINVAL;
+		}
+	} else {
+		kvm_err("%s not a platform device\n", __func__);
+		return -EINVAL;
+	}
+	/* is a ref to this device already owned by the KVM-VFIO device? */
+	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
+	if (*kvm_vdev) {
+		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
+			kvm_err("%s irq %d already forwarded\n",
+				__func__, *hwirq);
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+/**
+ * kvm_vfio_validate_unforward - check a deassignment is meaningful
+ * @kv: the kvm_vfio device
+ * @vdev: the vfio_device the irq to deassign belongs to
+ * @fwd_irq: the user struct that contains the fd and irq_index of the irq
+ * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
+ * it exists
+ *
+ * returns 0 if the provided irq effectively is forwarded
+ * (a ref to this vfio_device is held and this irq belongs to
+ * the forwarded irqs of this device)
+ * returns -EINVAL in the negative
+ */
+static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
+			      struct vfio_device *vdev,
+			      struct kvm_arch_forwarded_irq *fwd_irq,
+			      struct kvm_vfio_device **kvm_vdev)
+{
+	struct kvm_fwd_irq *pfwd;
+
+	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
+	if (!*kvm_vdev) {
+		kvm_err("%s no forwarded irq for this device\n", __func__);
+		return -EINVAL;
+	}
+	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
+	if (!pfwd) {
+		kvm_err("%s irq %d not forwarded\n", __func__, fwd_irq->index);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * kvm_vfio_forward - set a forwarded IRQ
+ * @kdev: the kvm device
+ * @vdev: the vfio device the IRQ belongs to
+ * @fwd_irq: the user struct containing the irq_index and guest irq
+ * @must_put: tells the caller whether the vfio_device must be put after
+ * the call (ref must be released in case a ref onto this device was
+ * already held or in case of new device and failure)
+ *
+ * validate the injection, activate forward and store the information
+ * about which irq and which device is concerned so that on deassign or
+ * kvm-vfio destruction everything can be cleaned up.
+ */
+static int kvm_vfio_forward(struct kvm_device *kdev,
+			    struct vfio_device *vdev,
+			    struct kvm_arch_forwarded_irq *fwd_irq,
+			    bool *must_put)
+{
+	int ret;
+	struct kvm_fwd_irq *pfwd = NULL;
+	struct kvm_vfio_device *kvm_vdev = NULL;
+	struct kvm_vfio *kv = kdev->private;
+	int hwirq;
+
+	*must_put = true;
+	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
+					&kvm_vdev, &hwirq);
+	if (ret < 0)
+		return -EINVAL;
+
+	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
+	if (!pfwd)
+		return -ENOMEM;
+	pfwd->index = fwd_irq->index;
+	pfwd->gsi = fwd_irq->gsi;
+	pfwd->hwirq = hwirq;
+	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
+	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
+	if (ret < 0) {
+		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
+		kfree(pfwd);
+		return ret;
+	}
+
+	if (!kvm_vdev) {
+		/* create & insert the new device and keep the ref */
+		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
+		if (!kvm_vdev) {
+			kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_NORMAL);
+			kfree(pfwd);
+			return -ENOMEM;
+		}
+
+		kvm_vdev->vfio_device = vdev;
+		kvm_vdev->fd = fwd_irq->fd;
+		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
+		list_add(&kvm_vdev->node, &kv->device_list);
+		/*
+		 * the only case where we keep the ref:
+		 * new device and forward setting successful
+		 */
+		*must_put = false;
+	}
+
+	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
+
+	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
+		  fwd_irq->fd, hwirq, fwd_irq->gsi);
+
+	return 0;
+}
+
+/**
+ * remove_assigned_device - put a given device from the list
+ * @kv: the kvm-vfio device
+ * @vdev: the vfio-device to remove
+ *
+ * change the state of all forwarded IRQs, free the forwarded IRQ list,
+ * remove the corresponding kvm_vfio_device from the assigned device
+ * list.
+ * returns true if the device could be removed, false in the negative
+ */
+bool remove_assigned_device(struct kvm_vfio *kv,
+			    struct vfio_device *vdev)
+{
+	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
+	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
+	bool removed = false;
+	int ret;
+
+	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
+				 &kv->device_list, node) {
+		if (kvm_vdev_iter->vfio_device == vdev) {
+			/* loop on all its forwarded IRQ */
+			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
+						 &kvm_vdev_iter->fwd_irq_list,
+						 link) {
+				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
+						KVM_VFIO_IRQ_SET_NORMAL);
+				if (ret < 0)
+					return false;
+				list_del(&fwd_irq_iter->link);
+				kfree(fwd_irq_iter);
+			}
+			/* all IRQs could be deassigned */
+			list_del(&kvm_vdev_iter->node);
+			kvm_vfio_device_put_external_user(
+				kvm_vdev_iter->vfio_device);
+			kfree(kvm_vdev_iter);
+			removed = true;
+			break;
+		}
+	}
+	return removed;
+}
+
+
+/**
+ * remove_fwd_irq - remove a forwarded irq
+ *
+ * @kv: kvm-vfio device
+ * @kvm_vdev: the kvm_vfio_device the IRQ belongs to
+ * @irq_index: the index of the IRQ
+ *
+ * change the forwarded state of the IRQ, remove the IRQ from
+ * the device forwarded IRQ list. In case it is the last one,
+ * put the device
+ */
+int remove_fwd_irq(struct kvm_vfio *kv,
+		   struct kvm_vfio_device *kvm_vdev,
+		   int irq_index)
+{
+	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
+	int ret = -1;
+
+	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
+				 &kvm_vdev->fwd_irq_list, link) {
+		if (fwd_irq_iter->index == irq_index) {
+			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
+						KVM_VFIO_IRQ_SET_NORMAL);
+			if (ret < 0)
+				break;
+			list_del(&fwd_irq_iter->link);
+			kfree(fwd_irq_iter);
+			ret = 0;
+			break;
+		}
+	}
+	if (list_empty(&kvm_vdev->fwd_irq_list))
+		remove_assigned_device(kv, kvm_vdev->vfio_device);
+
+	return ret;
+}
+
+/**
+ * kvm_vfio_unforward - remove a forwarded IRQ
+ * @kdev: the kvm device
+ * @vdev: the vfio_device
+ * @fwd_irq: user struct
+ * after checking this IRQ effectively is forwarded, change its state,
+ * remove it from the corresponding kvm_vfio_device list
+ */
+static int kvm_vfio_unforward(struct kvm_device *kdev,
+				     struct vfio_device *vdev,
+				     struct kvm_arch_forwarded_irq *fwd_irq)
+{
+	struct kvm_vfio *kv = kdev->private;
+	struct kvm_vfio_device *kvm_vdev;
+	int ret;
+
+	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
+	if (ret < 0)
+		return -EINVAL;
+
+	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
+	if (ret < 0)
+		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
+			__func__, fwd_irq->fd, fwd_irq->index);
+	else
+		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
+			  __func__, fwd_irq->fd, fwd_irq->index);
+	return ret;
+}
+
+
+
+
+/**
+ * kvm_vfio_set_device - the top function for interacting with a vfio
+ * device
+ */
+static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
+{
+	struct kvm_vfio *kv = kdev->private;
+	struct vfio_device *vdev;
+	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
+	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
+
+	switch (attr) {
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
+		bool must_put;
+		int ret;
+
+		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
+			return -EFAULT;
+		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
+		if (IS_ERR(vdev))
+			return PTR_ERR(vdev);
+		mutex_lock(&kv->lock);
+		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
+		if (must_put)
+			kvm_vfio_put_vfio_device(vdev);
+		mutex_unlock(&kv->lock);
+		return ret;
+		}
+	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
+		int ret;
+
+		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
+			return -EFAULT;
+		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
+		if (IS_ERR(vdev))
+			return PTR_ERR(vdev);
+
+		kvm_vfio_device_put_external_user(vdev);
+		mutex_lock(&kv->lock);
+		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
+		mutex_unlock(&kv->lock);
+		return ret;
+	}
+#endif
+	default:
+		return -ENXIO;
+	}
+}
+
+/**
+ * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
+ * @kv: kvm-vfio device
+ *
+ * loop on all held devices and their associated forwarded IRQs,
+ * restore the non forwarded state, remove IRQs and their devices from
+ * the respective lists, put the vfio platform devices
+ *
+ * When this function is called, the vcpus are already destroyed. No
+ * vgic manipulation can happen, hence the KVM_VFIO_IRQ_CLEANUP
+ * kvm_arch_set_fwd_state action
+ */
+int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
+{
+	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
+	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
+
+	/* loop on all the assigned devices */
+	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
+				 &kv->device_list, node) {
+
+		/* loop on all its forwarded IRQ */
+		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
+					 &kvm_vdev_iter->fwd_irq_list, link) {
+			kvm_arch_set_fwd_state(fwd_irq_iter,
+						KVM_VFIO_IRQ_CLEANUP);
+			list_del(&fwd_irq_iter->link);
+			kfree(fwd_irq_iter);
+		}
+		list_del(&kvm_vdev_iter->node);
+		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
+		kfree(kvm_vdev_iter);
+	}
+	return 0;
+}
+
+
 static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
 	case KVM_DEV_VFIO_GROUP:
 		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
+	case KVM_DEV_VFIO_DEVICE:
+		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
 	}
 
 	return -ENXIO;
@@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 		case KVM_DEV_VFIO_GROUP_DEL:
 			return 0;
 		}
-
 		break;
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+	case KVM_DEV_VFIO_DEVICE:
+		switch (attr->attr) {
+		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
+		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
+			return 0;
+		}
+		break;
+#endif
 	}
-
 	return -ENXIO;
 }
 
@@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
 		list_del(&kvg->node);
 		kfree(kvg);
 	}
+	kvm_vfio_put_all_devices(kv);
 
 	kvm_vfio_update_coherency(dev);
 
@@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&kv->group_list);
+	INIT_LIST_HEAD(&kv->device_list);
 	mutex_init(&kv->lock);
 
 	dev->private = kv;
-- 
1.9.1


* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-01 12:52   ` Eric Auger
  0 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: linux-arm-kernel

This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.

This is a new control channel which enables KVM to cooperate with
viable VFIO devices.

The kvm-vfio device now holds a list of devices (kvm_vfio_device)
in addition to a list of groups (kvm_vfio_group). The new
infrastructure makes it possible to check the validity of the VFIO
device file descriptor and to get and hold a reference to it.

The first concrete implemented command is IRQ forward control:
KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.

It consists in programming the VFIO driver and KVM in a consistent manner
so that an optimized IRQ injection/completion is set up. Each
kvm_vfio_device holds a list of forwarded IRQs. When putting a
kvm_vfio_device, the implementation makes sure the forwarded IRQs
are set back to the normal handling state (non forwarded).

The forwarding programming is architecture specific, embodied by the
kvm_arch_set_fwd_state function. Its implementation is given in a
separate patch file.

The forwarding control modality is enabled by the
__KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.

Signed-off-by: Eric Auger <eric.auger@linaro.org>

---

v1 -> v2:
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
- original patch file separated into 2 parts: generic part moved into vfio.c
  and ARM specific part (kvm_arch_set_fwd_state)
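
To make the bookkeeping described in the commit message easier to
follow, here is a small userspace model of it. It is a sketch under
simplifying assumptions, not the kernel code: plain singly linked lists
replace list_head, an integer id stands in for the vfio_device pointer,
no architecture hook is called, and every model_* name is hypothetical.
It shows the two rules the patch relies on: the reference taken by the
caller is kept only when the first IRQ of a device is forwarded
(must_put is false), and a device is dropped as soon as its last
forwarded IRQ is removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_irq {
	unsigned int index;		/* platform device IRQ index */
	struct model_irq *next;
};

struct model_dev {
	int id;				/* stands in for the vfio_device */
	struct model_irq *irqs;		/* forwarded IRQs of this device */
	struct model_dev *next;
};

struct model_kv {
	struct model_dev *devs;		/* devices with >= 1 forwarded IRQ */
};

static struct model_dev *model_find_dev(struct model_kv *kv, int id)
{
	struct model_dev *d;

	for (d = kv->devs; d; d = d->next)
		if (d->id == id)
			return d;
	return NULL;
}

static struct model_irq *model_find_irq(struct model_dev *d,
					unsigned int index)
{
	struct model_irq *i;

	for (i = d->irqs; i; i = i->next)
		if (i->index == index)
			return i;
	return NULL;
}

/* Returns 0 on success, -1 if already forwarded or out of memory. */
static int model_forward(struct model_kv *kv, int id, unsigned int index,
			 bool *must_put)
{
	struct model_dev *d;
	struct model_irq *irq;

	assert(kv && must_put);
	*must_put = true;
	d = model_find_dev(kv, id);
	if (d && model_find_irq(d, index))
		return -1;
	irq = calloc(1, sizeof(*irq));
	if (!irq)
		return -1;
	irq->index = index;
	if (!d) {
		d = calloc(1, sizeof(*d));
		if (!d) {
			free(irq);
			return -1;
		}
		d->id = id;
		d->next = kv->devs;
		kv->devs = d;
		*must_put = false;	/* new device: the list keeps the ref */
	}
	irq->next = d->irqs;
	d->irqs = irq;
	return 0;
}

/* Returns 0 on success, -1 if the device or the IRQ is not registered. */
static int model_unforward(struct model_kv *kv, int id, unsigned int index)
{
	struct model_dev **pd = &kv->devs, *d;
	struct model_irq **pi, *irq;

	while (*pd && (*pd)->id != id)
		pd = &(*pd)->next;
	if (!*pd)
		return -1;
	d = *pd;
	for (pi = &d->irqs; *pi && (*pi)->index != index; pi = &(*pi)->next)
		;
	if (!*pi)
		return -1;
	irq = *pi;
	*pi = irq->next;
	free(irq);
	if (!d->irqs) {			/* last IRQ gone: drop the device */
		*pd = d->next;
		free(d);
	}
	return 0;
}
```

In the patch itself the same decisions are taken under kv->lock, and the
architecture hook kvm_arch_set_fwd_state() is called before a forwarded
IRQ is added to or removed from the list.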
---
 include/linux/kvm_host.h |  27 +++
 virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 477 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..24350dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1065,6 +1065,21 @@ struct kvm_device_ops {
 		      unsigned long arg);
 };
 
+enum kvm_fwd_irq_action {
+	KVM_VFIO_IRQ_SET_FORWARD,
+	KVM_VFIO_IRQ_SET_NORMAL,
+	KVM_VFIO_IRQ_CLEANUP,
+};
+
+/* internal structure describing a forwarded IRQ */
+struct kvm_fwd_irq {
+	struct list_head link;
+	__u32 index; /* platform device irq index */
+	__u32 hwirq; /* physical IRQ */
+	__u32 gsi; /* virtual IRQ */
+	struct kvm_vcpu *vcpu; /* vcpu to inject into */
+};
+
 void kvm_device_get(struct kvm_device *dev);
 void kvm_device_put(struct kvm_device *dev);
 struct kvm_device *kvm_device_from_filp(struct file *filp);
@@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 extern struct kvm_device_ops kvm_flic_ops;
 
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
+			   enum kvm_fwd_irq_action action);
+
+#else
+static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
+					 enum kvm_fwd_irq_action action)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
 static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 76dc7a1..e4a81c4 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -18,14 +18,24 @@
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
+#include <linux/platform_device.h>
 
 struct kvm_vfio_group {
 	struct list_head node;
 	struct vfio_group *vfio_group;
 };
 
+struct kvm_vfio_device {
+	struct list_head node;
+	struct vfio_device *vfio_device;
+	/* list of forwarded IRQs for that VFIO device */
+	struct list_head fwd_irq_list;
+	int fd;
+};
+
 struct kvm_vfio {
 	struct list_head group_list;
+	struct list_head device_list;
 	struct mutex lock;
 	bool noncoherent;
 };
@@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 	return -ENXIO;
 }
 
+/**
+ * kvm_vfio_get_vfio_device - return the vfio_device for this fd
+ * @fd: fd of the vfio platform device
+ *
+ * checks it is a vfio device
+ * increment its ref counter
+ */
+static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
+{
+	struct fd f;
+	struct vfio_device *vdev;
+
+	f = fdget(fd);
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+	vdev = kvm_vfio_device_get_external_user(f.file);
+	fdput(f);
+	return vdev;
+}
+
+/**
+ * kvm_vfio_put_vfio_device - put the vfio platform device
+ * @vdev: vfio_device to put
+ *
+ * decrement the ref counter
+ */
+static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
+{
+	kvm_vfio_device_put_external_user(vdev);
+}
+
+/**
+ * kvm_vfio_find_device - look for the device in the assigned
+ * device list
+ * @kv: the kvm-vfio device
+ * @vdev: the vfio_device to look for
+ *
+ * returns the associated kvm_vfio_device if the device is known,
+ * meaning at least 1 IRQ is forwarded for this device.
+ * if the device is not registered, returns NULL.
+ */
+struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
+					     struct vfio_device *vdev)
+{
+	struct kvm_vfio_device *kvm_vdev_iter;
+
+	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
+		if (kvm_vdev_iter->vfio_device == vdev)
+			return kvm_vdev_iter;
+	}
+	return NULL;
+}
+
+/**
+ * kvm_vfio_find_irq - look for an irq in the device IRQ list
+ * @kvm_vdev: the kvm_vfio_device
+ * @irq_index: irq index
+ *
+ * returns the forwarded irq struct if it exists, NULL in the negative
+ */
+struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
+				      int irq_index)
+{
+	struct kvm_fwd_irq *fwd_irq_iter;
+
+	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
+		if (fwd_irq_iter->index == irq_index)
+			return fwd_irq_iter;
+	}
+	return NULL;
+}
+
+/**
+ * kvm_vfio_validate_forward - check whether forwarding an IRQ is meaningful
+ * @vdev: vfio_device the IRQ belongs to
+ * @fwd_irq: user struct containing the irq_index to forward
+ * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
+ * kvm_vfio_device that holds it
+ * @hwirq: irq number the irq index corresponds to
+ *
+ * checks the vfio-device is a platform vfio device
+ * checks the irq_index corresponds to an actual hwirq and
+ * checks this hwirq is not already forwarded
+ * returns < 0 on following errors:
+ * not a platform device, bad irq index, already forwarded
+ */
+static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
+			    struct vfio_device *vdev,
+			    struct kvm_arch_forwarded_irq *fwd_irq,
+			    struct kvm_vfio_device **kvm_vdev,
+			    int *hwirq)
+{
+	struct device *dev = kvm_vfio_external_base_device(vdev);
+	struct platform_device *platdev;
+
+	*hwirq = -1;
+	*kvm_vdev = NULL;
+	if (strcmp(dev->bus->name, "platform") == 0) {
+		platdev = to_platform_device(dev);
+		*hwirq = platform_get_irq(platdev, fwd_irq->index);
+		if (*hwirq < 0) {
+			kvm_err("%s incorrect index\n", __func__);
+			return -EINVAL;
+		}
+	} else {
+		kvm_err("%s not a platform device\n", __func__);
+		return -EINVAL;
+	}
+	/* is a ref to this device already owned by the KVM-VFIO device? */
+	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
+	if (*kvm_vdev) {
+		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
+			kvm_err("%s irq %d already forwarded\n",
+				__func__, *hwirq);
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+/**
+ * kvm_vfio_validate_unforward - check a deassignment is meaningful
+ * @kv: the kvm_vfio device
+ * @vdev: the vfio_device the irq to deassign belongs to
+ * @fwd_irq: the user struct that contains the fd and irq_index of the irq
+ * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
+ * it exists
+ *
+ * returns 0 if the provided irq effectively is forwarded
+ * (a ref to this vfio_device is held and this irq belongs to
+ * the forwarded irqs of this device)
+ * returns -EINVAL in the negative
+ */
+static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
+			      struct vfio_device *vdev,
+			      struct kvm_arch_forwarded_irq *fwd_irq,
+			      struct kvm_vfio_device **kvm_vdev)
+{
+	struct kvm_fwd_irq *pfwd;
+
+	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
+	if (!*kvm_vdev) {
+		kvm_err("%s no forwarded irq for this device\n", __func__);
+		return -EINVAL;
+	}
+	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
+	if (!pfwd) {
+		kvm_err("%s irq %d not forwarded\n", __func__, fwd_irq->index);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/**
+ * kvm_vfio_forward - set a forwarded IRQ
+ * @kdev: the kvm device
+ * @vdev: the vfio device the IRQ belongs to
+ * @fwd_irq: the user struct containing the irq_index and guest irq
+ * @must_put: tells the caller whether the vfio_device must be put after
+ * the call (ref must be released in case a ref onto this device was
+ * already held or in case of new device and failure)
+ *
+ * validate the injection, activate forward and store the information
+ * about which irq and which device is concerned so that on deassign or
+ * kvm-vfio destruction everything can be cleaned up.
+ */
+static int kvm_vfio_forward(struct kvm_device *kdev,
+			    struct vfio_device *vdev,
+			    struct kvm_arch_forwarded_irq *fwd_irq,
+			    bool *must_put)
+{
+	int ret;
+	struct kvm_fwd_irq *pfwd = NULL;
+	struct kvm_vfio_device *kvm_vdev = NULL;
+	struct kvm_vfio *kv = kdev->private;
+	int hwirq;
+
+	*must_put = true;
+	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
+					&kvm_vdev, &hwirq);
+	if (ret < 0)
+		return -EINVAL;
+
+	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
+	if (!pfwd)
+		return -ENOMEM;
+	pfwd->index = fwd_irq->index;
+	pfwd->gsi = fwd_irq->gsi;
+	pfwd->hwirq = hwirq;
+	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
+	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
+	if (ret < 0) {
+		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
+		kfree(pfwd);
+		return ret;
+	}
+
+	if (!kvm_vdev) {
+		/* create & insert the new device and keep the ref */
+		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
+		if (!kvm_vdev) {
+			kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_NORMAL);
+			kfree(pfwd);
+			return -ENOMEM;
+		}
+
+		kvm_vdev->vfio_device = vdev;
+		kvm_vdev->fd = fwd_irq->fd;
+		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
+		list_add(&kvm_vdev->node, &kv->device_list);
+		/*
+		 * the only case where we keep the ref:
+		 * new device and forward setting successful
+		 */
+		*must_put = false;
+	}
+
+	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
+
+	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
+		  fwd_irq->fd, hwirq, fwd_irq->gsi);
+
+	return 0;
+}
+
+/**
+ * remove_assigned_device - remove a given device from the list
+ * @kv: the kvm-vfio device
+ * @vdev: the vfio-device to remove
+ *
+ * change the state of all forwarded IRQs, free the forwarded IRQ list,
+ * remove the corresponding kvm_vfio_device from the assigned device
+ * list and put the device.
+ * returns true if the device could be removed, false otherwise
+ */
+bool remove_assigned_device(struct kvm_vfio *kv,
+			    struct vfio_device *vdev)
+{
+	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
+	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
+	bool removed = false;
+	int ret;
+
+	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
+				 &kv->device_list, node) {
+		if (kvm_vdev_iter->vfio_device == vdev) {
+			/* loop on all its forwarded IRQ */
+			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
+						 &kvm_vdev_iter->fwd_irq_list,
+						 link) {
+				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
+						KVM_VFIO_IRQ_SET_NORMAL);
+				if (ret < 0)
+					return false;
+				list_del(&fwd_irq_iter->link);
+				kfree(fwd_irq_iter);
+			}
+			/* all IRQs could be deassigned */
+			list_del(&kvm_vdev_iter->node);
+			kvm_vfio_device_put_external_user(
+				kvm_vdev_iter->vfio_device);
+			kfree(kvm_vdev_iter);
+			removed = true;
+			break;
+		}
+	}
+	return removed;
+}
+
+
+/**
+ * remove_fwd_irq - remove a forwarded irq
+ *
+ * @kv: kvm-vfio device
+ * @kvm_vdev: the kvm_vfio_device the IRQ belongs to
+ * @irq_index: the index of the IRQ
+ *
+ * change the forwarded state of the IRQ, remove the IRQ from
+ * the device forwarded IRQ list. In case it is the last one,
+ * put the device
+ */
+int remove_fwd_irq(struct kvm_vfio *kv,
+		   struct kvm_vfio_device *kvm_vdev,
+		   int irq_index)
+{
+	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
+	int ret = -1;
+
+	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
+				 &kvm_vdev->fwd_irq_list, link) {
+		if (fwd_irq_iter->index == irq_index) {
+			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
+						KVM_VFIO_IRQ_SET_NORMAL);
+			if (ret < 0)
+				break;
+			list_del(&fwd_irq_iter->link);
+			kfree(fwd_irq_iter);
+			ret = 0;
+			break;
+		}
+	}
+	if (list_empty(&kvm_vdev->fwd_irq_list))
+		remove_assigned_device(kv, kvm_vdev->vfio_device);
+
+	return ret;
+}
+
+/**
+ * kvm_vfio_unforward - remove a forwarded IRQ
+ * @kdev: the kvm device
+ * @vdev: the vfio_device
+ * @fwd_irq: user struct
+ * after checking this IRQ effectively is forwarded, change its state,
+ * remove it from the corresponding kvm_vfio_device list
+ */
+static int kvm_vfio_unforward(struct kvm_device *kdev,
+				     struct vfio_device *vdev,
+				     struct kvm_arch_forwarded_irq *fwd_irq)
+{
+	struct kvm_vfio *kv = kdev->private;
+	struct kvm_vfio_device *kvm_vdev;
+	int ret;
+
+	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
+	if (ret < 0)
+		return -EINVAL;
+
+	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
+	if (ret < 0)
+		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
+			__func__, fwd_irq->fd, fwd_irq->index);
+	else
+		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
+			  __func__, fwd_irq->fd, fwd_irq->index);
+	return ret;
+}
+
+
+
+
+/**
+ * kvm_vfio_set_device - the top function for interacting with a vfio
+ * device
+ */
+static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
+{
+	struct kvm_vfio *kv = kdev->private;
+	struct vfio_device *vdev;
+	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
+	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
+
+	switch (attr) {
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
+		bool must_put;
+		int ret;
+
+		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
+			return -EFAULT;
+		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
+		if (IS_ERR(vdev))
+			return PTR_ERR(vdev);
+		mutex_lock(&kv->lock);
+		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
+		if (must_put)
+			kvm_vfio_put_vfio_device(vdev);
+		mutex_unlock(&kv->lock);
+		return ret;
+	}
+	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
+		int ret;
+
+		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
+			return -EFAULT;
+		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
+		if (IS_ERR(vdev))
+			return PTR_ERR(vdev);
+
+		kvm_vfio_device_put_external_user(vdev);
+		mutex_lock(&kv->lock);
+		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
+		mutex_unlock(&kv->lock);
+		return ret;
+	}
+#endif
+	default:
+		return -ENXIO;
+	}
+}
+
+/**
+ * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
+ * @kv: kvm-vfio device
+ *
+ * loop over all held devices and their associated forwarded IRQs,
+ * restore the non-forwarded state, remove IRQs and their devices from
+ * their respective lists, put the vfio platform devices
+ *
+ * When this function is called, the vcpus have already been destroyed.
+ * No vgic manipulation can happen, hence the KVM_VFIO_IRQ_CLEANUP
+ * kvm_arch_set_fwd_state action
+ */
+int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
+{
+	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
+	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
+
+	/* loop on all the assigned devices */
+	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
+				 &kv->device_list, node) {
+
+		/* loop on all its forwarded IRQ */
+		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
+					 &kvm_vdev_iter->fwd_irq_list, link) {
+			kvm_arch_set_fwd_state(fwd_irq_iter,
+						KVM_VFIO_IRQ_CLEANUP);
+			list_del(&fwd_irq_iter->link);
+			kfree(fwd_irq_iter);
+		}
+		list_del(&kvm_vdev_iter->node);
+		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
+		kfree(kvm_vdev_iter);
+	}
+	return 0;
+}
+
+
 static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
 	case KVM_DEV_VFIO_GROUP:
 		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
+	case KVM_DEV_VFIO_DEVICE:
+		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
 	}
 
 	return -ENXIO;
@@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 		case KVM_DEV_VFIO_GROUP_DEL:
 			return 0;
 		}
-
 		break;
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+	case KVM_DEV_VFIO_DEVICE:
+		switch (attr->attr) {
+		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
+		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
+			return 0;
+		}
+		break;
+#endif
 	}
-
 	return -ENXIO;
 }
 
@@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
 		list_del(&kvg->node);
 		kfree(kvg);
 	}
+	kvm_vfio_put_all_devices(kv);
 
 	kvm_vfio_update_coherency(dev);
 
@@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 		return -ENOMEM;
 
 	INIT_LIST_HEAD(&kv->group_list);
+	INIT_LIST_HEAD(&kv->device_list);
 	mutex_init(&kv->lock);
 
 	dev->private = kv;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 101+ messages in thread
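
The must_put contract of kvm_vfio_forward() in the patch above can be
modelled in user space. The sketch below is only an illustration under
stated assumptions: every model_* name is hypothetical and does not
exist in the kernel, and the real VFIO reference counting is reduced to
a plain integer. It shows that the reference taken by the ioctl path is
kept only when the device is new to the list and forwarding succeeded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a vfio_device tracked by KVM-VFIO. */
struct model_dev {
	int refs;	/* references currently held on the device */
	bool tracked;	/* true once the device sits in the device list */
	int nr_fwd;	/* number of forwarded IRQs recorded for it */
};

static void model_get(struct model_dev *d)
{
	d->refs++;
}

static void model_put(struct model_dev *d)
{
	assert(d->refs > 0);
	d->refs--;
}

/*
 * Mirrors the must_put rule: the caller must drop its reference unless
 * the device was not tracked yet and the forward succeeded.
 * arch_ok stands for the result of the architecture-specific step.
 */
static int model_forward(struct model_dev *d, bool arch_ok, bool *must_put)
{
	*must_put = true;
	if (!arch_ok)
		return -1;
	if (!d->tracked) {
		d->tracked = true;
		*must_put = false;	/* the list now owns this reference */
	}
	d->nr_fwd++;
	return 0;
}

/* Caller side, shaped like the FORWARD_IRQ path: get, forward, maybe put. */
static int model_ioctl_forward(struct model_dev *d, bool arch_ok)
{
	bool must_put;
	int ret;

	model_get(d);
	ret = model_forward(d, arch_ok, &must_put);
	if (must_put)
		model_put(d);
	return ret;
}
```

Calling model_ioctl_forward() twice on the same device with arch_ok set
leaves exactly one reference held, whatever the number of forwarded
IRQs; a failed first call leaves none.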

* [RFC v2 9/9] KVM: KVM-VFIO: ARM forwarding control
  2014-09-01 12:52 ` Eric Auger
@ 2014-09-01 12:52   ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-01 12:52 UTC (permalink / raw)
  To: eric.auger, eric.auger, christoffer.dall, marc.zyngier,
	linux-arm-kernel, kvmarm, kvm, alex.williamson, joel.schopp,
	kim.phillips, paulus, gleb, pbonzini
  Cc: linux-kernel, patches, will.deacon, a.motakis, a.rigo, john.liuli

Enable forwarding control for ARM. By defining
__KVM_HAVE_ARCH_KVM_VFIO_FORWARD, the patch enables the
KVM_DEV_VFIO_DEVICE_FORWARD_IRQ and KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ
commands on ARM. As a result, it brings optimized injection/completion
handling for forwarded IRQs. The ARM-specific part is implemented in a
new file, kvm_vfio_arm.c.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
---
 arch/arm/include/asm/kvm_host.h |  2 +
 arch/arm/kvm/Makefile           |  2 +-
 arch/arm/kvm/kvm_vfio_arm.c     | 85 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/kvm/kvm_vfio_arm.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 1aee6bb..dfd3b05 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -25,6 +25,8 @@
 #include <asm/fpstate.h>
 #include <kvm/arm_arch_timer.h>
 
+#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+
 #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
 #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
 #else
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index ea1fa76..26a5a42 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vf
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
+obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o kvm_vfio_arm.o
 obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
 obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
 obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
diff --git a/arch/arm/kvm/kvm_vfio_arm.c b/arch/arm/kvm/kvm_vfio_arm.c
new file mode 100644
index 0000000..0d316b1
--- /dev/null
+++ b/arch/arm/kvm/kvm_vfio_arm.c
@@ -0,0 +1,85 @@
+/*
+ * Copyright (C) 2014 Linaro Ltd.
+ * Authors: Eric Auger <eric.auger@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/kvm_host.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/vfio.h>
+#include <linux/irq.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm.h>
+#include <linux/irq.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+
+/**
+ * kvm_arch_set_fwd_state - change the forwarded state of an IRQ
+ * @pfwd: the forwarded irq struct
+ * @action: action to perform (set forward, set back normal, cleanup)
+ *
+ * programs the GIC and VGIC
+ * returns the VGIC map/unmap return status
+ * It is the responsibility of the caller to make sure the physical IRQ
+ * is not active. There is a critical section between the start of the
+ * VFIO IRQ handler and LR programming.
+ */
+int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
+			   enum kvm_fwd_irq_action action)
+{
+	int ret;
+	struct irq_desc *desc = irq_to_desc(pfwd->hwirq);
+	struct irq_data *d = &desc->irq_data;
+	struct irq_chip *chip = desc->irq_data.chip;
+
+	disable_irq(pfwd->hwirq);
+	/* no fwd state change can happen if the IRQ is in progress */
+	if (irqd_irq_inprogress(d)) {
+		kvm_err("%s cannot change fwd state (IRQ %d in progress)\n",
+			__func__, pfwd->hwirq);
+		enable_irq(pfwd->hwirq);
+		return -1;
+	}
+
+	if (action == KVM_VFIO_IRQ_SET_FORWARD) {
+		irqd_set_irq_forwarded(d);
+		ret = vgic_map_phys_irq(pfwd->vcpu,
+					pfwd->gsi + VGIC_NR_PRIVATE_IRQS,
+					pfwd->hwirq);
+	} else if (action == KVM_VFIO_IRQ_SET_NORMAL) {
+		irqd_clr_irq_forwarded(d);
+		ret = vgic_unmap_phys_irq(pfwd->vcpu,
+					  pfwd->gsi +
+						VGIC_NR_PRIVATE_IRQS,
+					  pfwd->hwirq);
+	} else if (action == KVM_VFIO_IRQ_CLEANUP) {
+		irqd_clr_irq_forwarded(d);
+		/*
+		 * in case the guest did not complete the
+		 * virtual IRQ, complete it on its behalf.
+		 * when cleanup is called, the VCPUs have already
+		 * been freed, so do not manipulate the VGIC
+		 */
+		chip->irq_eoi(d);
+		ret = 0;
+	} else {
+		/* unknown action: the common exit path re-enables the IRQ */
+		ret = -EINVAL;
+	}
+
+	enable_irq(pfwd->hwirq);
+	return ret;
+}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 101+ messages in thread
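
kvm_arch_set_fwd_state() brackets its work between disable_irq() and
enable_irq(), so every path through the function has to re-enable the
IRQ exactly once. One way to make that easy to check is a single exit
label. The sketch below is a user-space model, not kernel code: all
model_* names are hypothetical, the disable depth is a plain counter,
and returning -EBUSY for the in-progress case is an assumption of this
sketch (the patch returns -1).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the three actions used by the patch. */
enum model_action { MODEL_SET_FORWARD, MODEL_SET_NORMAL, MODEL_CLEANUP };

struct model_irq {
	int disable_depth;	/* models nested disable_irq() calls */
	int in_progress;	/* non-zero while a handler is running */
	int forwarded;		/* current forwarded state */
};

static void model_disable(struct model_irq *irq)
{
	irq->disable_depth++;
}

static void model_enable(struct model_irq *irq)
{
	assert(irq->disable_depth > 0);	/* catches an unbalanced enable */
	irq->disable_depth--;
}

/* Every return goes through "out", so disable and enable stay paired. */
static int model_set_fwd_state(struct model_irq *irq, int action)
{
	int ret = 0;

	model_disable(irq);
	if (irq->in_progress) {
		ret = -EBUSY;
		goto out;
	}

	switch (action) {
	case MODEL_SET_FORWARD:
		irq->forwarded = 1;
		break;
	case MODEL_SET_NORMAL:
	case MODEL_CLEANUP:
		irq->forwarded = 0;
		break;
	default:
		ret = -EINVAL;
		break;
	}
out:
	model_enable(irq);
	return ret;
}
```

With this shape, disable_depth is back to zero after any call, whether
the action is valid, unknown, or rejected because the IRQ is in
progress.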

* Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
  2014-09-01 12:52 ` Eric Auger
@ 2014-09-02 21:05   ` Alex Williamson
  -1 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-02 21:05 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, christoffer.dall, marc.zyngier, linux-arm-kernel,
	kvmarm, kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
> This RFC proposes an integration of "ARM: Forwarding physical
> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
> KVM.
> 
> It allows a VFIO platform driver IRQ to be turned into a forwarded
> IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
> switch can be avoided on guest virtual IRQ completion. Before this
> patch, a maintenance IRQ was triggered on the virtual IRQ completion.
> 
> When the IRQ is forwarded, the VFIO platform driver does not need to
> disable the IRQ anymore. Indeed when returning from the IRQ handler
> the IRQ is not deactivated. Only its priority is lowered. This means
> the same IRQ cannot hit before the guest completes the virtual IRQ
> and the GIC automatically deactivates the corresponding physical IRQ.
> 
> Besides, the injection still is based on irqfd triggering. The only
> impact on irqfd process is resamplefd is not called anymore on
> virtual IRQ completion since this latter becomes "transparent".
> 
> The current integration is based on an extension of the KVM-VFIO
> device, previously used by KVM to interact with VFIO groups. The
> patch series now enables KVM to directly interact with a VFIO
> platform device. The VFIO external API was extended for that purpose.
> 
> The KVM-VFIO device can get/put the vfio platform device, check its
> integrity and type, get the IRQ number associated to an IRQ index.
> 
> The IRQ forward programming is architecture specific (virtual interrupt
> controller programming basically). However the whole infrastructure is
> kept generic.
> 
> From a user point of view, the functionality is provided through new
> KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ
> and the capability can be checked with KVM_HAS_DEVICE_ATTR.
> Assignment can only be changed when the physical IRQ is not active.
> It is the responsibility of the user to do this check.
> 
> This patch series has the following dependencies:
> - "ARM: Forwarding physical interrupts to a guest VM"
>   (http://lwn.net/Articles/603514/)
> - [PATCH v3] irqfd for ARM
> - and obviously the VFIO platform driver series:
>   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
>   https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> 
> Integrated pieces can be found at
> ssh://git.linaro.org/people/eric.auger/linux.git
> on branch 3.17rc3_irqfd_forward_integ_v2
> 
> This was tested on Calxeda Midway, assigning the xgmac main IRQ.
> 
> v1 -> v2:
> - forward control is moved from architecture specific file into generic
>   vfio.c module.
>   only kvm_arch_set_fwd_state remains architecture specific
> - integrate Kim's patch which enables KVM-VFIO for ARM
> - fix vgic state bypass in vgic_queue_hwirq
> - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
>   to include/uapi/linux/kvm.h
>   also irq_index renamed into index and guest_irq renamed into gsi
> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> - vfio_external_get_base_device renamed into vfio_external_base_device
> - vfio_external_get_type removed
> - kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> 
> Eric Auger (8):
>   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
>     IRQ
>   KVM: ARM: VGIC: add forwarded irq rbtree lock
>   VFIO: platform: handler tests whether the IRQ is forwarded
>   KVM: KVM-VFIO: update user API to program forwarded IRQ
>   VFIO: Extend external user API
>   KVM: KVM-VFIO: add new VFIO external API hooks
>   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding
>     control
>   KVM: KVM-VFIO: ARM forwarding control
> 
> Kim Phillips (1):
>   ARM: KVM: Enable the KVM-VFIO device
> 
>  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
>  arch/arm/include/asm/kvm_host.h            |   7 +
>  arch/arm/kvm/Kconfig                       |   1 +
>  arch/arm/kvm/Makefile                      |   4 +-
>  arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
>  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
>  drivers/vfio/vfio.c                        |  24 ++
>  include/kvm/arm_vgic.h                     |   1 +
>  include/linux/kvm_host.h                   |  27 ++
>  include/linux/vfio.h                       |   3 +
>  include/uapi/linux/kvm.h                   |   9 +
>  virt/kvm/arm/vgic.c                        |  59 +++-
>  virt/kvm/vfio.c                            | 497 ++++++++++++++++++++++++++++-
>  13 files changed, 733 insertions(+), 17 deletions(-)
>  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
> 

Have we ventured too far in the other direction?  I suppose what I was
hoping to see was something more like:

	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{

		/* get vfio_device */

		/* get mutex */

		/* verify device+irq isn't already forwarded */

		/* allocate device/forwarded irq */

		/* get struct device */

		/* callout to arch code passing struct device, gsi, ... */

		/* if success, add to kv, else free and error */

		/* mutex unlock */
	}
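
The outline above could be modelled roughly as follows. This is a
user-space sketch under stated assumptions, not a proposal for the
actual kernel code: all model_* names are hypothetical, the list is a
plain singly linked list, the mutex is only indicated by comments, and
the architecture callout is a function pointer that receives opaque
identifiers instead of a struct device.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* One forwarded IRQ, identified by (device, IRQ index). Hypothetical. */
struct model_fwd {
	int dev_id;
	int index;
	int gsi;
	struct model_fwd *next;
};

/* Stand-in for the kvm-vfio device state; head would be lock-protected. */
struct model_kv {
	struct model_fwd *head;
};

/* Architecture callout: returns 0 on success, negative errno on failure. */
typedef int (*model_arch_fn)(int dev_id, int index, int gsi);

static struct model_fwd *model_find(struct model_kv *kv, int dev_id, int index)
{
	struct model_fwd *f;

	for (f = kv->head; f; f = f->next)
		if (f->dev_id == dev_id && f->index == index)
			return f;
	return NULL;
}

static int model_forward_irq(struct model_kv *kv, int dev_id, int index,
			     int gsi, model_arch_fn arch)
{
	struct model_fwd *f;
	int ret;

	/* the mutex would be taken here */

	/* verify device+irq isn't already forwarded */
	if (model_find(kv, dev_id, index))
		return -EEXIST;

	/* allocate the forwarded irq entry */
	f = calloc(1, sizeof(*f));
	if (!f)
		return -ENOMEM;
	f->dev_id = dev_id;
	f->index = index;
	f->gsi = gsi;

	/* callout to arch code; generic code never sees platform details */
	ret = arch(dev_id, index, gsi);
	if (ret < 0) {
		free(f);
		return ret;
	}

	/* success: add to the list */
	f->next = kv->head;
	kv->head = f;

	/* the mutex would be released here */
	return 0;
}

/* Two trivial callouts used to exercise both outcomes. */
static int model_arch_ok(int dev_id, int index, int gsi)
{
	(void)dev_id; (void)index; (void)gsi;
	return 0;
}

static int model_arch_fail(int dev_id, int index, int gsi)
{
	(void)dev_id; (void)index; (void)gsi;
	return -EIO;
}
```

In this model the generic side owns the list, the allocation and the
duplicate check, and the architecture side only reports success or
failure.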

Exposing the internal mutex out to arch code, as in v1, was an
indication that we were pushing too much out to arch code, but including
platform_device.h into virt/kvm/vfio.c tells me we're still not
abstracting at the right point.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
  2014-09-02 21:05   ` Alex Williamson
@ 2014-09-05 12:52     ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-05 12:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: eric.auger, christoffer.dall, marc.zyngier, linux-arm-kernel,
	kvmarm, kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On 09/02/2014 11:05 PM, Alex Williamson wrote:
> On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
>> This RFC proposes an integration of "ARM: Forwarding physical
>> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
>> KVM.
>>
>> It allows a VFIO platform driver IRQ to be turned into a forwarded
>> IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
>> switch can be avoided on guest virtual IRQ completion. Before this
>> patch, a maintenance IRQ was triggered on the virtual IRQ completion.
>>
>> When the IRQ is forwarded, the VFIO platform driver does not need to
>> disable the IRQ anymore. Indeed when returning from the IRQ handler
>> the IRQ is not deactivated. Only its priority is lowered. This means
>> the same IRQ cannot hit before the guest completes the virtual IRQ
>> and the GIC automatically deactivates the corresponding physical IRQ.
>>
>> Besides, the injection still is based on irqfd triggering. The only
>> impact on irqfd process is resamplefd is not called anymore on
>> virtual IRQ completion since this latter becomes "transparent".
>>
>> The current integration is based on an extension of the KVM-VFIO
>> device, previously used by KVM to interact with VFIO groups. The
>> patch series now enables KVM to directly interact with a VFIO
>> platform device. The VFIO external API was extended for that purpose.
>>
>> The KVM-VFIO device can get/put the vfio platform device, check its
>> integrity and type, get the IRQ number associated to an IRQ index.
>>
>> The IRQ forward programming is architecture specific (virtual interrupt
>> controller programming basically). However the whole infrastructure is
>> kept generic.
>>
>> From a user point of view, the functionality is provided through new
>> KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ
>> and the capability can be checked with KVM_HAS_DEVICE_ATTR.
>> Assignment can only be changed when the physical IRQ is not active.
>> It is the responsibility of the user to do this check.
>>
>> This patch series has the following dependencies:
>> - "ARM: Forwarding physical interrupts to a guest VM"
>>   (http://lwn.net/Articles/603514/)
>> - [PATCH v3] irqfd for ARM
>> - and obviously the VFIO platform driver series:
>>   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
>>   https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
>>
>> Integrated pieces can be found at
>> ssh://git.linaro.org/people/eric.auger/linux.git
>> on branch 3.17rc3_irqfd_forward_integ_v2
>>
>> This was tested on Calxeda Midway, assigning the xgmac main IRQ.
>>
>> v1 -> v2:
>> - forward control is moved from architecture specific file into generic
>>   vfio.c module.
>>   only kvm_arch_set_fwd_state remains architecture specific
>> - integrate Kim's patch which enables KVM-VFIO for ARM
>> - fix vgic state bypass in vgic_queue_hwirq
>> - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
>>   to include/uapi/linux/kvm.h
>>   also irq_index renamed into index and guest_irq renamed into gsi
>> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
>> - vfio_external_get_base_device renamed into vfio_external_base_device
>> - vfio_external_get_type removed
>> - kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>
>> Eric Auger (8):
>>   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
>>     IRQ
>>   KVM: ARM: VGIC: add forwarded irq rbtree lock
>>   VFIO: platform: handler tests whether the IRQ is forwarded
>>   KVM: KVM-VFIO: update user API to program forwarded IRQ
>>   VFIO: Extend external user API
>>   KVM: KVM-VFIO: add new VFIO external API hooks
>>   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding
>>     control
>>   KVM: KVM-VFIO: ARM forwarding control
>>
>> Kim Phillips (1):
>>   ARM: KVM: Enable the KVM-VFIO device
>>
>>  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
>>  arch/arm/include/asm/kvm_host.h            |   7 +
>>  arch/arm/kvm/Kconfig                       |   1 +
>>  arch/arm/kvm/Makefile                      |   4 +-
>>  arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
>>  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
>>  drivers/vfio/vfio.c                        |  24 ++
>>  include/kvm/arm_vgic.h                     |   1 +
>>  include/linux/kvm_host.h                   |  27 ++
>>  include/linux/vfio.h                       |   3 +
>>  include/uapi/linux/kvm.h                   |   9 +
>>  virt/kvm/arm/vgic.c                        |  59 +++-
>>  virt/kvm/vfio.c                            | 497 ++++++++++++++++++++++++++++-
>>  13 files changed, 733 insertions(+), 17 deletions(-)
>>  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
>>
> 
> Have we ventured too far in the other direction?  I suppose what I was
> hoping to see was something more like:
> 
> 	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> 
> 		/* get vfio_device */
> 
> 		/* get mutex */
> 
> 		/* verify device+irq isn't already forwarded */
> 
> 		/* allocate device/forwarded irq */
> 
> 		/* get struct device */
> 
> 		/* callout to arch code passing struct device, gsi, ... */
> 
> 		/* if success, add to kv, else free and error */
> 
> 		/* mutex unlock */
> 	}
> 
> Exposing the internal mutex out to arch code, as in v1, was an
> indication that we were pushing too much out to arch code, but including
> platform_device.h into virt/kvm/vfio.c tells me we're still not
> abstracting at the right point.  Thanks,
Hi Alex,

Yes it makes sense. I will rework the patch in this direction.
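
For reference, the flow outlined above could be sketched roughly as the
userspace model below. Everything in it is hypothetical (struct
kv_model, model_forward_irq(), the arch callback type and the two
sample callbacks); the point is only that the generic code owns the
list and the locking, while the arch code sees nothing but an opaque
device pointer, an index and a gsi.

```c
#include <errno.h>
#include <stdlib.h>

/* One forwarded (device, irq index) pair tracked by the generic code. */
struct fwd_irq {
	const void *dev;        /* opaque device handle, never dereferenced here */
	unsigned int index;     /* IRQ index within the device */
	unsigned int gsi;       /* guest interrupt number */
	struct fwd_irq *next;
};

/* Hypothetical stand-in for the KVM-VFIO device state. */
struct kv_model {
	struct fwd_irq *head;
};

/* Arch callout: returns 0 on success or a negative errno. */
typedef int (*arch_fwd_fn)(const void *dev, unsigned int index,
			   unsigned int gsi);

/* Two sample arch callbacks, for illustration only. */
int model_arch_ok(const void *dev, unsigned int index, unsigned int gsi)
{
	(void)dev; (void)index; (void)gsi;
	return 0;
}

int model_arch_fail(const void *dev, unsigned int index, unsigned int gsi)
{
	(void)dev; (void)index; (void)gsi;
	return -EINVAL;
}

/*
 * The real code would hold the KVM-VFIO mutex across this whole
 * function; the lock never leaves the generic layer.
 */
int model_forward_irq(struct kv_model *kv, const void *dev,
		      unsigned int index, unsigned int gsi,
		      arch_fwd_fn arch_cb)
{
	struct fwd_irq *f;
	int ret;

	/* verify device+irq isn't already forwarded */
	for (f = kv->head; f; f = f->next)
		if (f->dev == dev && f->index == index)
			return -EEXIST;

	/* allocate the forwarded irq record */
	f = calloc(1, sizeof(*f));
	if (!f)
		return -ENOMEM;
	f->dev = dev;
	f->index = index;
	f->gsi = gsi;

	/* callout to arch code */
	ret = arch_cb(dev, index, gsi);
	if (ret) {
		free(f);
		return ret;
	}

	/* success: add to the list */
	f->next = kv->head;
	kv->head = f;
	return 0;
}
```

With this split, a second request for the same device and index fails
with -EEXIST before the arch code is ever called, and a failing arch
callback leaves the list untouched.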

Thanks

Eric
> 
> Alex
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 1/9] KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded IRQ
  2014-09-01 12:52   ` Eric Auger
@ 2014-09-11  3:09     ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11  3:09 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Mon, Sep 01, 2014 at 02:52:40PM +0200, Eric Auger wrote:
> Fix multiple injection of level sensitive forwarded IRQs.
> With current code, the second injection fails since the state bitmaps
> are not reset (process_maintenance is not called anymore).
> New implementation consists in fully bypassing the vgic state
> management for forwarded IRQ (checks are ignored in
> vgic_update_irq_pending). This obviously assumes the forwarded IRQ is
> injected from kernel side.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> It was attempted to reset the states in __kvm_vgic_sync_hwstate, checking
> the emptied LR of forwarded IRQ. However surprisingly this solution does
> not seem to work. Some times, a new forwarded IRQ injection is observed
> while the LR of the previous instance was not observed as empty.

hmmm, concerning.  It would probably have been helpful overall if you
could start by describing the problem with the current implementation in
the commit message, and then explain the fix...

> 
> v1 -> v2:
> - fix vgic state bypass in vgic_queue_hwirq
> ---
>  virt/kvm/arm/vgic.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 0007300..8ef495b 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1259,7 +1259,9 @@ static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int irq)
>  
>  static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq)
>  {
> -	if (vgic_irq_is_queued(vcpu, irq))
> +	bool is_forwarded =  (vgic_get_phys_irq(vcpu, irq) > 0);

can you create a static function to factor this vgic_get_phys_irq check out, please?

> +
> +	if (vgic_irq_is_queued(vcpu, irq) && !is_forwarded)
>  		return true; /* level interrupt, already queued */

So essentially, if an IRQ is already on an LR and we therefore shouldn't
resample the line, we still resample the line if the IRQ is forwarded?

I think you need to explain this, both to me here and in the code, by
moving the comment that follows the return statement above the check and
commenting this clearly.

>  
>  	if (vgic_queue_irq(vcpu, 0, irq)) {
> @@ -1517,14 +1519,18 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	int edge_triggered, level_triggered;
>  	int enabled;
>  	bool ret = true;
> +	bool is_forwarded;
>  
>  	spin_lock(&dist->lock);
>  
>  	vcpu = kvm_get_vcpu(kvm, cpuid);
> +	is_forwarded = (vgic_get_phys_irq(vcpu, irq_num) > 0);

use your new function here as well.
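
Something along these lines is what I have in mind. This is only a
self-contained userspace sketch: struct fwd_entry, model_get_phys_irq()
and model_irq_is_forwarded() are made-up names, and the lookup is a
plain array scan standing in for vgic_get_phys_irq(). It keeps the
"> 0" test from the patch as is.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical virtual-to-physical IRQ mapping entry. */
struct fwd_entry {
	int virt_irq;
	int phys_irq;
};

/* Stand-in for vgic_get_phys_irq(): physical IRQ, or -ENOENT if unmapped. */
int model_get_phys_irq(const struct fwd_entry *map, int n, int virt_irq)
{
	for (int i = 0; i < n; i++)
		if (map[i].virt_irq == virt_irq)
			return map[i].phys_irq;
	return -ENOENT;
}

/* Single place that decides whether a virtual IRQ is forwarded. */
bool model_irq_is_forwarded(const struct fwd_entry *map, int n, int virt_irq)
{
	return model_get_phys_irq(map, n, virt_irq) > 0;
}
```

Both call sites then read as a single predicate, and the meaning of
"forwarded" is defined in exactly one place.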

> +
>  	edge_triggered = vgic_irq_is_edge(vcpu, irq_num);
>  	level_triggered = !edge_triggered;
>  
> -	if (!vgic_validate_injection(vcpu, irq_num, level)) {
> +	if (!is_forwarded &&
> +		!vgic_validate_injection(vcpu, irq_num, level)) {

I don't see the rationale here either.  If an IRQ is forwarded, why do
you need to do anything if the condition of the line hasn't changed for
a level-triggered IRQ or if you have a falling edge on an edge-triggered
IRQ (assuming active-HIGH)?

>  		ret = false;
>  		goto out;
>  	}
> @@ -1557,7 +1563,8 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  		goto out;
>  	}
>  
> -	if (level_triggered && vgic_irq_is_queued(vcpu, irq_num)) {
> +	if (!is_forwarded &&
> +		level_triggered && vgic_irq_is_queued(vcpu, irq_num)) {

So here it makes sense for SPIs, since you can have an EOIed interrupt
on a CPU that hasn't exited the VM yet, and thus it's still queued, but
you still need to resample the line to respect other CPUs.  Only, we
only ever target a single CPU for SPIs IIRC (the first in the target
list register), so we have to wait for that CPU to exit the VM anyhow.

This leads me to believe that, given a forwarded irq, you can only have
three situations at this point:

(1) is_queued && target_vcpu_in_vm:
The vcpu should resample this line when it exits the VM, because we
check the LRs for IRQs like this one, so we don't have to do anything
and can goto out here.

(2) is_queued && !target_vcpu_in_vm:
You have a bug because you exited the VM which must have done an EOI on
the interrupt, otherwise this function shouldn't have been called!  This
means that we should have cleared the queued state of the interrupt.

(3) !is_queued && whatever:
Set the irq pending bits, so do not goto out.

I'm aware that there's theoretically a race between (1) and (2), but you
should consider target_vcpu_in_vm as "it hasn't been through
__kvm_vgic_sync_hwstate() yet" and this should hold.
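
To make the three cases concrete, here is a small sketch of the decision
as a pure function. The enum and fwd_inject_classify() are invented for
illustration and are not existing vgic code; target_vcpu_in_vm is taken
in the sense described above.

```c
#include <stdbool.h>

/* What the injection path should do for a forwarded level IRQ. */
enum fwd_inject_action {
	FWD_INJECT_SKIP,        /* case (1): vcpu resamples on VM exit */
	FWD_INJECT_BUG,         /* case (2): queued state should be clear */
	FWD_INJECT_SET_PENDING, /* case (3): set the pending bits */
};

enum fwd_inject_action fwd_inject_classify(bool is_queued,
					   bool target_vcpu_in_vm)
{
	if (!is_queued)
		return FWD_INJECT_SET_PENDING;

	return target_vcpu_in_vm ? FWD_INJECT_SKIP : FWD_INJECT_BUG;
}
```

If this classification holds, the forwarded case would not need to
bypass the queued check at all; it would only need the queued state to
be cleared reliably when the vcpu syncs its hardware state.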

Tell me where this breaks?

-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 2/9] KVM: ARM: VGIC: add forwarded irq rbtree lock
  2014-09-01 12:52   ` Eric Auger
@ 2014-09-11  3:09     ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11  3:09 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Mon, Sep 01, 2014 at 02:52:41PM +0200, Eric Auger wrote:
> add a lock related to the rb tree manipulation. The rb tree can be

Ok, I can't hold myself back any longer.  Please begin sentences with a
capital letter. You don't do this in French? :)

> searched in one thread (irqfd handler for instance) and map/unmap
> happen in another.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
>  include/kvm/arm_vgic.h |  1 +
>  virt/kvm/arm/vgic.c    | 46 +++++++++++++++++++++++++++++++++++++---------
>  2 files changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 743020f..3da244f 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -177,6 +177,7 @@ struct vgic_dist {
>  	unsigned long		irq_pending_on_cpu;
>  
>  	struct rb_root		irq_phys_map;
> +	spinlock_t			rb_tree_lock;
>  #endif
>  };
>  
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 8ef495b..dbc2a5a 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1630,9 +1630,15 @@ static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
>  
>  int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
>  {
> -	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
> -	struct rb_node **new = &root->rb_node, *parent = NULL;
> +	struct rb_root *root;
> +	struct rb_node **new, *parent = NULL;
>  	struct irq_phys_map *new_map;
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +
> +	spin_lock(&dist->rb_tree_lock);
> +
> +	root = vgic_get_irq_phys_map(vcpu, virt_irq);
> +	new = &root->rb_node;
>  
>  	/* Boilerplate rb_tree code */
>  	while (*new) {
> @@ -1644,13 +1650,17 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
>  			new = &(*new)->rb_left;
>  		else if (this->virt_irq > virt_irq)
>  			new = &(*new)->rb_right;
> -		else
> +		else {
> +			spin_unlock(&dist->rb_tree_lock);
>  			return -EEXIST;
> +		}

can you initialize a ret variable to -EEXIST in the beginning of this
function, and add an out label above the unlock below, replace this
multi-line statement with a goto out, and set ret = 0 after the while
loop?

>  	}
>  
>  	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
> -	if (!new_map)
> +	if (!new_map) {
> +		spin_unlock(&dist->rb_tree_lock);
>  		return -ENOMEM;

then this becomes ret = -ENOMEM; goto out;

> +	}
>  
>  	new_map->virt_irq = virt_irq;
>  	new_map->phys_irq = phys_irq;
> @@ -1658,6 +1668,8 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
>  	rb_link_node(&new_map->node, parent, new);
>  	rb_insert_color(&new_map->node, root);
>  
> +	spin_unlock(&dist->rb_tree_lock);
> +

aren't you allocating memory with GFP_KERNEL while holding a spinlock
here?
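
A GFP_KERNEL allocation may sleep, which is not allowed while holding a
spinlock. One way out (the other being GFP_ATOMIC) is to allocate before
taking the lock and free the node afterwards if it turns out not to be
needed. The sketch below is a userspace model of that shape combined
with the single-exit style asked for above; struct map_model, the
counter used as a "lock" and the unbalanced binary tree are all
stand-ins, not the real rb tree code.

```c
#include <errno.h>
#include <stdlib.h>

struct map_node {
	int virt_irq;
	int phys_irq;
	struct map_node *left, *right;
};

/* lock_depth models the spinlock: it must be 0 on every return. */
struct map_model {
	struct map_node *root;
	int lock_depth;
};

static void model_lock(struct map_model *m)   { m->lock_depth++; }
static void model_unlock(struct map_model *m) { m->lock_depth--; }

int model_map_phys_irq(struct map_model *m, int virt_irq, int phys_irq)
{
	struct map_node **link, *new_map;
	int ret = -EEXIST;

	/* Allocate before locking: the allocation is allowed to sleep here. */
	new_map = calloc(1, sizeof(*new_map));
	if (!new_map)
		return -ENOMEM;
	new_map->virt_irq = virt_irq;
	new_map->phys_irq = phys_irq;

	model_lock(m);

	link = &m->root;
	while (*link) {
		if (virt_irq < (*link)->virt_irq)
			link = &(*link)->left;
		else if (virt_irq > (*link)->virt_irq)
			link = &(*link)->right;
		else
			goto out;       /* already mapped, ret is -EEXIST */
	}

	*link = new_map;
	new_map = NULL;                 /* ownership moved into the tree */
	ret = 0;
out:
	model_unlock(m);
	free(new_map);                  /* no-op unless the insert was skipped */
	return ret;
}
```

The cost is one wasted allocation on the -EEXIST path, which should be
rare compared to the successful map case.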

>  	return 0;
>  }
>  
> @@ -1685,24 +1697,39 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>  
>  int vgic_get_phys_irq(struct kvm_vcpu *vcpu, int virt_irq)
>  {
> -	struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
> +	struct irq_phys_map *map;
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	int ret;
> +
> +	spin_lock(&dist->rb_tree_lock);
> +	map = vgic_irq_map_search(vcpu, virt_irq);
>  
>  	if (map)
> -		return map->phys_irq;
> +		ret = map->phys_irq;
> +	else
> +		ret =  -ENOENT;

initialize ret to -ENOENT and avoid the else statement.

> +
> +	spin_unlock(&dist->rb_tree_lock);
> +	return ret;
>  
> -	return -ENOENT;
>  }
>  
>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
>  {
> -	struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
> +	struct irq_phys_map *map;
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +
> +	spin_lock(&dist->rb_tree_lock);
> +
> +	map = vgic_irq_map_search(vcpu, virt_irq);
>  
>  	if (map && map->phys_irq == phys_irq) {
>  		rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, virt_irq));
>  		kfree(map);
> +		spin_unlock(&dist->rb_tree_lock);

can kfree sleep?  I don't remember.  In any case, you can unlock before
calling kfree.

>  		return 0;
>  	}
> -
> +	spin_unlock(&dist->rb_tree_lock);
>  	return -ENOENT;

an out label and single unlock location would be preferred here as well
I think.
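
Roughly what I mean for the lookup and unmap paths, as a userspace
sketch. The list-based struct dist_model and the model_* names are
hypothetical and only mirror the shape of the functions in the patch:
ret starts out as -ENOENT, there is one unlock per function, and the
free happens after the lock is dropped.

```c
#include <errno.h>
#include <stdlib.h>

struct irq_map {
	int virt_irq;
	int phys_irq;
	struct irq_map *next;
};

/* lock_depth models the spinlock: it must be 0 on every return. */
struct dist_model {
	struct irq_map *head;
	int lock_depth;
};

static void model_lock(struct dist_model *d)   { d->lock_depth++; }
static void model_unlock(struct dist_model *d) { d->lock_depth--; }

int model_get_phys_irq(struct dist_model *d, int virt_irq)
{
	struct irq_map *map;
	int ret = -ENOENT;

	model_lock(d);
	for (map = d->head; map; map = map->next) {
		if (map->virt_irq == virt_irq) {
			ret = map->phys_irq;
			break;
		}
	}
	model_unlock(d);
	return ret;
}

int model_unmap_phys_irq(struct dist_model *d, int virt_irq, int phys_irq)
{
	struct irq_map **link, *map = NULL;
	int ret = -ENOENT;

	model_lock(d);
	for (link = &d->head; *link; link = &(*link)->next) {
		if ((*link)->virt_irq != virt_irq)
			continue;
		if ((*link)->phys_irq == phys_irq) {
			map = *link;            /* unlink under the lock */
			*link = map->next;
			ret = 0;
		}
		break;
	}
	model_unlock(d);

	free(map);      /* outside the lock; no-op if nothing was unlinked */
	return ret;
}
```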

>  }
>  
> @@ -1898,6 +1925,7 @@ int kvm_vgic_create(struct kvm *kvm)
>  	}
>  
>  	spin_lock_init(&kvm->arch.vgic.lock);
> +	spin_lock_init(&kvm->arch.vgic.rb_tree_lock);
>  	kvm->arch.vgic.in_kernel = true;
>  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 4/9] VFIO: platform: handler tests whether the IRQ is forwarded
  2014-09-01 12:52   ` Eric Auger
@ 2014-09-11  3:10     ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11  3:10 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Mon, Sep 01, 2014 at 02:52:43PM +0200, Eric Auger wrote:
> In case the IRQ is forwarded, the VFIO platform IRQ handler does not
> need to disable the IRQ anymore. In that mode, when the handler completes

add a comma after completes

> the IRQ is not deactivated but only its priority is lowered.
> 
> Some other actor (typically a guest) is supposed to deactivate the IRQ,
> allowing at that time a new physical IRQ to hit.
> 
> In virtualization use case, the physical IRQ is automatically completed
> by the interrupt controller when the guest completes the corresponding
> virtual IRQ.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
>  drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
> index 6768508..1f851b2 100644
> --- a/drivers/vfio/platform/vfio_platform_irq.c
> +++ b/drivers/vfio/platform/vfio_platform_irq.c
> @@ -88,13 +88,18 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
>  	struct vfio_platform_irq *irq_ctx = dev_id;
>  	unsigned long flags;
>  	int ret = IRQ_NONE;
> +	struct irq_data *d;
> +	bool is_forwarded;
>  
>  	spin_lock_irqsave(&irq_ctx->lock, flags);
>  
>  	if (!irq_ctx->masked) {
>  		ret = IRQ_HANDLED;
> +		d = irq_get_irq_data(irq_ctx->hwirq);
> +		is_forwarded = irqd_irq_forwarded(d);
>  
> -		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) {
> +		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED &&
> +						!is_forwarded) {
>  			disable_irq_nosync(irq_ctx->hwirq);
>  			irq_ctx->masked = true;
>  		}
> -- 
> 1.9.1
> 
It makes sense that this all needs to be controlled in the kernel, but
I'm wondering if it would be cleaner / more correct to clear the
AUTOMASKED flag when the IRQ is forwarded and have vfio refuse to set
this flag as long as the IRQ is forwarded?
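
As a rough illustration of that alternative (a userspace model only:
struct irq_ctx_model, MODEL_IRQ_AUTOMASKED and the model_* functions are
invented here and are not the vfio platform code), the handler would
keep testing just the flag, and the forwarded state would be enforced
where the flag is changed:

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_IRQ_AUTOMASKED (1u << 0)  /* stand-in for the AUTOMASKED flag */

struct irq_ctx_model {
	unsigned int flags;
	bool forwarded;
	bool masked;
};

/* Forwarding clears AUTOMASKED; unforwarding leaves the flag alone. */
void model_set_forwarded(struct irq_ctx_model *c, bool fwd)
{
	c->forwarded = fwd;
	if (fwd)
		c->flags &= ~MODEL_IRQ_AUTOMASKED;
}

/* Refuse to set AUTOMASKED while the IRQ is forwarded. */
int model_set_automasked(struct irq_ctx_model *c)
{
	if (c->forwarded)
		return -EBUSY;
	c->flags |= MODEL_IRQ_AUTOMASKED;
	return 0;
}

/* Handler: returns true if handled; no forwarded check needed here. */
bool model_irq_handler(struct irq_ctx_model *c)
{
	if (c->masked)
		return false;
	if (c->flags & MODEL_IRQ_AUTOMASKED)
		c->masked = true;       /* stands in for disable_irq_nosync() */
	return true;
}
```

Whether the flag should be restored automatically on unforward is left
open in this sketch.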

-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 5/9] KVM: KVM-VFIO: update user API to program forwarded IRQ
  2014-09-01 12:52   ` Eric Auger
@ 2014-09-11  3:10     ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11  3:10 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Mon, Sep 01, 2014 at 02:52:44PM +0200, Eric Auger wrote:
> add new device group commands:
> - KVM_DEV_VFIO_DEVICE_FORWARD_IRQ and
>   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ
> 
> which enable to turn forwarded IRQ mode on/off.
> 
> the kvm_arch_forwarded_irq struct embodies a forwarded IRQ
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v1 -> v2:
> - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
>   to include/uapi/linux/kvm.h
>   also irq_index renamed into index and guest_irq renamed into gsi
> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> ---
>  Documentation/virtual/kvm/devices/vfio.txt | 26 ++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h                   |  9 +++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
> index ef51740..048baa0 100644
> --- a/Documentation/virtual/kvm/devices/vfio.txt
> +++ b/Documentation/virtual/kvm/devices/vfio.txt
> @@ -13,6 +13,7 @@ VFIO-group is held by KVM.
>  
>  Groups:
>    KVM_DEV_VFIO_GROUP
> +  KVM_DEV_VFIO_DEVICE
>  
>  KVM_DEV_VFIO_GROUP attributes:
>    KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
> @@ -20,3 +21,28 @@ KVM_DEV_VFIO_GROUP attributes:
>  
>  For each, kvm_device_attr.addr points to an int32_t file descriptor
>  for the VFIO group.
> +
> +KVM_DEV_VFIO_DEVICE attributes:
> +  KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
> +  KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ
> +
> +For each, kvm_device_attr.addr points to a kvm_arch_forwarded_irq struct.
> +This user API makes possible to create a special IRQ handling mode,

  KVM_DEV_VFIO_DEVICE_FORWARD_IRQ enables a special IRQ handling mode on
  hardware that supports it,

> +where KVM and a VFIO platform driver collaborate to improve IRQ
> +handling performance.
> +
> +'fd represents the file descriptor of a valid VFIO device whose physical

fd is described out of context here.  Can you copy the struct definition
into this document, perhaps right after the "For each, ..." line above?

> +IRQ, referenced by its index, is injected into the VM guest irq (gsi).
                                             as a virtual IRQ (specified
					     by the gsi field) into the
					     VM.

> +
> +On FORWARD_IRQ, KVM-VFIO device programs:
   When setting the KVM_DEV_VFIO_DEVICE_FORWARD_IRQ attribute, the
   KVM-VFIO device tells the host (or VFIO?) to not complete the
   physical IRQ, and instead ensures that KVM (or the VM) completes the
   physical IRQ.

> +- the host, to not complete the physical IRQ itself.
> +- the GIC, to automatically complete the physical IRQ when the guest
> +  completes the virtual IRQ.

and drop this bullet form.

> +This avoids trapping the end-of-interrupt for level sensitive IRQ.

avoid this last line, it's specific to ARM.

> +
> +On UNFORWARD_IRQ, one returns to the mode where the host completes the
   When setting the KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ attribute, the
   host (VFIO?) will again complete the physical IRQ and KVM will not...
 
> +physical IRQ and the guest completes the virtual IRQ.
> +
> +It is up to the caller of this API to make sure the IRQ is not
> +outstanding when the FORWARD/UNFORWARD is called. This could lead to

Outstanding? Can you be specific?

Don't refer to FORWARD/UNFORWARD; either refer to these attributes by
their full name or use a clear reference in proper English.

> +some inconsistency on who is going to complete the IRQ.

This sounds like the whole thing is fragile and if userspace doesn't do
things right, IRQ handling of a piece of hardware is going to be
inconsistent?  Is this the case?  If so, we need some stronger
semantics.  If not, this should be rephrased.

> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index cf3a2ff..8cd7b0e 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -947,6 +947,12 @@ struct kvm_device_attr {
>  	__u64	addr;		/* userspace address of attr data */
>  };
>  
> +struct kvm_arch_forwarded_irq {
> +	__u32 fd; /* file desciptor of the VFIO device */
> +	__u32 index; /* VFIO device IRQ index */
> +	__u32 gsi; /* gsi, ie. virtual IRQ number */
> +};
> +
>  #define KVM_DEV_TYPE_FSL_MPIC_20	1
>  #define KVM_DEV_TYPE_FSL_MPIC_42	2
>  #define KVM_DEV_TYPE_XICS		3
> @@ -954,6 +960,9 @@ struct kvm_device_attr {
>  #define  KVM_DEV_VFIO_GROUP			1
>  #define   KVM_DEV_VFIO_GROUP_ADD			1
>  #define   KVM_DEV_VFIO_GROUP_DEL			2
> +#define  KVM_DEV_VFIO_DEVICE			2
> +#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ			1
> +#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ			2
>  #define KVM_DEV_TYPE_ARM_VGIC_V2	5
>  #define KVM_DEV_TYPE_FLIC		6
>  
> -- 
> 1.9.1
> 

Thanks,
-Christoffer
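
For illustration, a sketch of how userspace might prepare the attribute
for the proposed API, assuming the group and attribute numbers and the
struct layout from the quoted hunk. The RFC_* constants and struct
rfc_forwarded_irq are local mirrors of what this RFC proposes, not a
merged kernel ABI; struct sketch_device_attr mirrors struct
kvm_device_attr, and sketch_fill_forward_attr is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the quoted RFC hunk; they are not a merged kernel ABI. */
#define RFC_KVM_DEV_VFIO_DEVICE               2
#define RFC_KVM_DEV_VFIO_DEVICE_FORWARD_IRQ   1
#define RFC_KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2

/* Local mirror of struct kvm_arch_forwarded_irq as proposed in the RFC. */
struct rfc_forwarded_irq {
	uint32_t fd;    /* file descriptor of the VFIO device */
	uint32_t index; /* VFIO device IRQ index */
	uint32_t gsi;   /* virtual IRQ number */
};

/* Local mirror of struct kvm_device_attr from <linux/kvm.h>. */
struct sketch_device_attr {
	uint32_t flags;
	uint32_t group;
	uint64_t attr;
	uint64_t addr; /* userspace address of attr data */
};

/* Hypothetical helper: fill both structs for a forward or unforward request. */
static void sketch_fill_forward_attr(struct sketch_device_attr *attr,
				     struct rfc_forwarded_irq *fwd,
				     int vfio_device_fd, uint32_t index,
				     uint32_t gsi, int forward)
{
	assert(attr != NULL && fwd != NULL && vfio_device_fd >= 0);

	fwd->fd = (uint32_t)vfio_device_fd;
	fwd->index = index;
	fwd->gsi = gsi;

	memset(attr, 0, sizeof(*attr));
	attr->group = RFC_KVM_DEV_VFIO_DEVICE;
	attr->attr = forward ? RFC_KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
			     : RFC_KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ;
	attr->addr = (uint64_t)(uintptr_t)fwd;
}
```

The caller would then pass the filled attribute to the
KVM_SET_DEVICE_ATTR ioctl on the KVM-VFIO device file descriptor. The
open question from the review still applies: the sketch says nothing
about what state the IRQ must be in when the attribute is set.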

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 6/9] VFIO: Extend external user API
  2014-09-01 12:52   ` Eric Auger
@ 2014-09-11  3:10     ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11  3:10 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Mon, Sep 01, 2014 at 02:52:45PM +0200, Eric Auger wrote:
> New functions are added to be called from ARM KVM-VFIO device.

This commit message seems somewhat random.  This patch doesn't deal with
anything ARM specific; it introduces some generic functions that allow
users external to vfio itself to retrieve information about a vfio
platform device.

> 
> - vfio_device_get_external_user enables to get a vfio device from
>   its fd
> - vfio_device_put_external_user puts the vfio device
> - vfio_external_base_device returns the struct device*,
>   useful to access the platform_device
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v1 -> v2:
> 
> - vfio_external_get_base_device renamed into vfio_external_base_device
> - vfio_external_get_type removed
> ---
>  drivers/vfio/vfio.c  | 24 ++++++++++++++++++++++++
>  include/linux/vfio.h |  3 +++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 8e84471..282814e 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1401,6 +1401,30 @@ void vfio_group_put_external_user(struct vfio_group *group)
>  }
>  EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
>  
> +struct vfio_device *vfio_device_get_external_user(struct file *filep)
> +{
> +	struct vfio_device *vdev = filep->private_data;
> +
> +	if (filep->f_op != &vfio_device_fops)
> +		return ERR_PTR(-EINVAL);
> +
> +	vfio_device_get(vdev);
> +	return vdev;
> +}
> +EXPORT_SYMBOL_GPL(vfio_device_get_external_user);
> +
> +void vfio_device_put_external_user(struct vfio_device *vdev)
> +{
> +	vfio_device_put(vdev);
> +}
> +EXPORT_SYMBOL_GPL(vfio_device_put_external_user);
> +
> +struct device *vfio_external_base_device(struct vfio_device *vdev)
> +{
> +	return vdev->dev;
> +}
> +EXPORT_SYMBOL_GPL(vfio_external_base_device);
> +
>  int vfio_external_user_iommu_id(struct vfio_group *group)
>  {
>  	return iommu_group_id(group->iommu_group);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index ffe04ed..bd4b6cb 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -99,6 +99,9 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
>  extern int vfio_external_user_iommu_id(struct vfio_group *group);
>  extern long vfio_external_check_extension(struct vfio_group *group,
>  					  unsigned long arg);
> +extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
> +extern void vfio_device_put_external_user(struct vfio_device *vdev);
> +extern struct device *vfio_external_base_device(struct vfio_device *vdev);
>  
>  struct pci_dev;
>  #ifdef CONFIG_EEH
> -- 
> 1.9.1
> 
Looks good to me,
-Christoffer
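
The get/put pair in this patch follows a common pattern: confirm that
the file really belongs to the expected subsystem by comparing its
operations table, and only then trust private_data and take a
reference. A userspace model of that pattern, with all sketch_* types
and names hypothetical (the kernel version returns ERR_PTR(-EINVAL)
where this sketch returns NULL):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for struct file_operations instances; only their addresses matter. */
struct sketch_fops { int unused; };
static const struct sketch_fops sketch_vfio_device_fops = { 0 };
static const struct sketch_fops sketch_other_fops = { 0 };

struct sketch_vfio_device { int refcount; };

struct sketch_file {
	const struct sketch_fops *f_op;
	void *private_data;
};

/* Returns the device with one extra reference, or NULL for a foreign file. */
static struct sketch_vfio_device *
sketch_device_get_external_user(struct sketch_file *filep)
{
	struct sketch_vfio_device *vdev;

	assert(filep != NULL);
	if (filep->f_op != &sketch_vfio_device_fops)
		return NULL;

	/* private_data is only interpreted after the identity check. */
	vdev = filep->private_data;
	vdev->refcount++;
	return vdev;
}

/* Drops the reference taken by sketch_device_get_external_user(). */
static void sketch_device_put_external_user(struct sketch_vfio_device *vdev)
{
	assert(vdev != NULL && vdev->refcount > 0);
	vdev->refcount--;
}
```

A file opened on anything other than a VFIO device is rejected without
its private_data ever being dereferenced.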

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 7/9] KVM: KVM-VFIO: add new VFIO external API hooks
  2014-09-01 12:52   ` Eric Auger
@ 2014-09-11  3:10     ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11  3:10 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Mon, Sep 01, 2014 at 02:52:46PM +0200, Eric Auger wrote:
> add functions that implement the gateway to the extended

Capital letter when beginning a new sentence.  Also the reference to
'the extended VFIO API' feels a bit weird.  Can't you make your commit
message a little more descriptive of this patch, something along the
lines of:

Provide wrapper functions that allow KVM-VFIO device code to get an
external handle on a struct vfio_device based on a vfio device file
descriptor.  We provide this through three new functions:

(assuming I got this right).



> external VFIO API:
> - kvm_vfio_device_get_external_user
> - kvm_vfio_device_put_external_user
> - kvm_vfio_external_base_device
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v1 -> v2:
> - kvm_vfio_external_get_base_device renamed into
>   kvm_vfio_external_base_device
> - kvm_vfio_external_get_type removed
> ---
>  arch/arm/include/asm/kvm_host.h |  5 +++++
>  virt/kvm/vfio.c                 | 45 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 6dfb404..1aee6bb 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -171,6 +171,11 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
>  unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
>  int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
>  
> +struct vfio_device;
> +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep);
> +void kvm_vfio_device_put_external_user(struct vfio_device *vdev);
> +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev);
> +
>  /* We do not have shadow page tables, hence the empty hooks */
>  static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
>  {
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index ba1a93f..76dc7a1 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -59,6 +59,51 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
>  	symbol_put(vfio_group_put_external_user);
>  }
>  
> +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
> +{
> +	struct vfio_device *vdev;
> +	struct vfio_device *(*fn)(struct file *);
> +
> +	fn = symbol_get(vfio_device_get_external_user);
> +	if (!fn)
> +		return ERR_PTR(-EINVAL);
> +
> +	vdev = fn(filep);
> +
> +	symbol_put(vfio_device_get_external_user);
> +
> +	return vdev;
> +}
> +
> +void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
> +{
> +	void (*fn)(struct vfio_device *);
> +
> +	fn = symbol_get(vfio_device_put_external_user);
> +	if (!fn)
> +		return;
> +
> +	fn(vdev);
> +
> +	symbol_put(vfio_device_put_external_user);
> +}
> +
> +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
> +{
> +	struct device *(*fn)(struct vfio_device *);
> +	struct device *dev;
> +
> +	fn = symbol_get(vfio_external_base_device);
> +	if (!fn)
> +		return NULL;
> +
> +	dev = fn(vdev);
> +
> +	symbol_put(vfio_external_base_device);
> +
> +	return dev;
> +}
> +
>  static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
>  {
>  	long (*fn)(struct vfio_group *, unsigned long);
> -- 
> 1.9.1
> 

otherwise looks good to me!
-Christoffer
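
The three wrappers share one shape: look the symbol up at run time,
fail gracefully if the provider is absent, call through the pointer,
then release the symbol. A userspace model of that shape follows. It
does not use the kernel's symbol_get()/symbol_put(); the lookup table
and every sketch_* name are hypothetical stand-ins chosen so the
example is self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef int (*sketch_fn)(int);

/* One optional symbol with a count of current users. */
struct sketch_symbol {
	const char *name;
	sketch_fn fn;
	int users;
};

static int sketch_double(int x)
{
	return 2 * x;
}

/* Stand-in for the set of symbols exported by an optional provider. */
static struct sketch_symbol sketch_table[] = {
	{ "sketch_double", sketch_double, 0 },
};

/* Look a symbol up by name and pin it; NULL if it is not available. */
static struct sketch_symbol *sketch_symbol_get(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
		if (strcmp(sketch_table[i].name, name) == 0) {
			sketch_table[i].users++;
			return &sketch_table[i];
		}
	}
	return NULL;
}

/* Release a symbol pinned by sketch_symbol_get(). */
static void sketch_symbol_put(struct sketch_symbol *sym)
{
	assert(sym != NULL && sym->users > 0);
	sym->users--;
}

/* Wrapper in the style of the patch: get, check, call, put. */
static int sketch_call_optional(const char *name, int arg, int *result)
{
	struct sketch_symbol *sym = sketch_symbol_get(name);

	if (!sym)
		return -EINVAL;
	*result = sym->fn(arg);
	sketch_symbol_put(sym);
	return 0;
}
```

The caller never links against the provider directly, which is what
lets the KVM side build and run when VFIO is not present.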

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-01 12:52   ` Eric Auger
@ 2014-09-11  3:10     ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11  3:10 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> 
> This is a new control channel which enables KVM to cooperate with
> viable VFIO devices.
> 
> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> in addition to a list of groups (kvm_vfio_group). The new
> infrastructure enables to check the validity of the VFIO device
> file descriptor, get and hold a reference to it.
> 
> The first concrete implemented command is IRQ forward control:
> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> 
> It consists in programing the VFIO driver and KVM in a consistent manner
> so that an optimized IRQ injection/completion is set up. Each
> kvm_vfio_device holds a list of forwarded IRQ. When putting a
> kvm_vfio_device, the implementation makes sure the forwarded IRQs
> are set again in the normal handling state (non forwarded).

'putting a kvm_vfio_device' sounds to me like you're golfing :)

When a kvm_vfio_device is released?

> 
> The forwarding programmming is architecture specific, embodied by the
> kvm_arch_set_fwd_state function. Its implementation is given in a
> separate patch file.

I would drop the last sentence and instead indicate that this is handled
properly when the architecture does not support such a feature.

> 
> The forwarding control modality is enabled by the
> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v1 -> v2:
> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> - original patch file separated into 2 parts: generic part moved in vfio.c
>   and ARM specific part(kvm_arch_set_fwd_state)
> ---
>  include/linux/kvm_host.h |  27 +++
>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 477 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index a4c33b3..24350dc 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
>  		      unsigned long arg);
>  };
>  
> +enum kvm_fwd_irq_action {
> +	KVM_VFIO_IRQ_SET_FORWARD,
> +	KVM_VFIO_IRQ_SET_NORMAL,
> +	KVM_VFIO_IRQ_CLEANUP,

This is KVM internal API, so it would probably be good to document this.
Especially the CLEANUP bit worries me, see below.

> +};
> +
> +/* internal structure describing a forwarded IRQ */
> +struct kvm_fwd_irq {
> +	struct list_head link;

This list entry is local to the kvm vfio device, right? That means you
probably want a struct with just the below fields, and then have a
containing struct in the generic device file, private to its logic.

> +	__u32 index; /* platform device irq index */
> +	__u32 hwirq; /*physical IRQ */
> +	__u32 gsi; /* virtual IRQ */
> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> +};
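
One possible shape for the split suggested above, as a userspace
sketch: a plain data struct that could be shared with architecture
code, plus a private containing struct that carries the list linkage.
All sketch_* names are hypothetical, the vcpu pointer is omitted, and a
hand-rolled singly linked list stands in for the kernel's struct
list_head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Data-only description of a forwarded IRQ, with no list linkage. */
struct sketch_fwd_irq {
	uint32_t index; /* platform device irq index */
	uint32_t hwirq; /* physical IRQ */
	uint32_t gsi;   /* virtual IRQ */
};

struct sketch_list_node {
	struct sketch_list_node *next;
};

/* Containing struct, private to the file that owns the list. */
struct sketch_fwd_irq_entry {
	struct sketch_list_node link;
	struct sketch_fwd_irq fwd;
};

/* Recover the containing struct from a pointer to one of its members. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Push an entry at the head of the list. */
static void sketch_list_push(struct sketch_list_node **head,
			     struct sketch_fwd_irq_entry *entry)
{
	assert(head != NULL && entry != NULL);
	entry->link.next = *head;
	*head = &entry->link;
}

/* Return the forwarded IRQ with the given index, or NULL if none. */
static struct sketch_fwd_irq *sketch_find_irq(struct sketch_list_node *head,
					      uint32_t index)
{
	struct sketch_list_node *n;

	for (n = head; n != NULL; n = n->next) {
		struct sketch_fwd_irq_entry *entry =
			sketch_container_of(n, struct sketch_fwd_irq_entry, link);

		if (entry->fwd.index == index)
			return &entry->fwd;
	}
	return NULL;
}
```

Code outside the owning file only ever sees struct sketch_fwd_irq, so
the list layout can change without touching it.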
> +
>  void kvm_device_get(struct kvm_device *dev);
>  void kvm_device_put(struct kvm_device *dev);
>  struct kvm_device *kvm_device_from_filp(struct file *filp);
> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>  extern struct kvm_device_ops kvm_flic_ops;
>  
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,

what's the 'p' in pfwd?

> +			   enum kvm_fwd_irq_action action);
> +
> +#else
> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> +					 enum kvm_fwd_irq_action action)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>  
>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 76dc7a1..e4a81c4 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -18,14 +18,24 @@
>  #include <linux/slab.h>
>  #include <linux/uaccess.h>
>  #include <linux/vfio.h>
> +#include <linux/platform_device.h>
>  
>  struct kvm_vfio_group {
>  	struct list_head node;
>  	struct vfio_group *vfio_group;
>  };
>  
> +struct kvm_vfio_device {
> +	struct list_head node;
> +	struct vfio_device *vfio_device;
> +	/* list of forwarded IRQs for that VFIO device */
> +	struct list_head fwd_irq_list;
> +	int fd;
> +};
> +
>  struct kvm_vfio {
>  	struct list_head group_list;
> +	struct list_head device_list;
>  	struct mutex lock;
>  	bool noncoherent;
>  };
> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>  	return -ENXIO;
>  }
>  
> +/**
> + * get_vfio_device - returns the vfio-device corresponding to this fd
> + * @fd:fd of the vfio platform device
> + *
> + * checks it is a vfio device
> + * increment its ref counter

why the short lines?  Just write this out in proper English.

> + */
> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> +{
> +	struct fd f;
> +	struct vfio_device *vdev;
> +
> +	f = fdget(fd);
> +	if (!f.file)
> +		return NULL;
> +	vdev = kvm_vfio_device_get_external_user(f.file);
> +	fdput(f);
> +	return vdev;
> +}
> +
> +/**
> + * put_vfio_device: put the vfio platform device
> + * @vdev: vfio_device to put
> + *
> + * decrement the ref counter
> + */
> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> +{
> +	kvm_vfio_device_put_external_user(vdev);
> +}
> +
> +/**
> + * kvm_vfio_find_device - look for the device in the assigned
> + * device list
> + * @kv: the kvm-vfio device
> + * @vdev: the vfio_device to look for
> + *
> + * returns the associated kvm_vfio_device if the device is known,
> + * meaning at least 1 IRQ is forwarded for this device.
> + * in the device is not registered, returns NULL.
> + */

are these functions meant to be exported?  Otherwise they should be
static, and the documentation on these simple list iteration wrappers
seems like overkill imho.

> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> +					     struct vfio_device *vdev)
> +{
> +	struct kvm_vfio_device *kvm_vdev_iter;
> +
> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> +		if (kvm_vdev_iter->vfio_device == vdev)
> +			return kvm_vdev_iter;
> +	}
> +	return NULL;
> +}
> +
> +/**
> + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> + * @kvm_vdev: the kvm_vfio_device
> + * @irq_index: irq index
> + *
> + * returns the forwarded irq struct if it exists, NULL in the negative
> + */
> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> +				      int irq_index)
> +{
> +	struct kvm_fwd_irq *fwd_irq_iter;
> +
> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> +		if (fwd_irq_iter->index == irq_index)
> +			return fwd_irq_iter;
> +	}
> +	return NULL;
> +}
> +
> +/**
> + * validate_forward - checks whether forwarding a given IRQ is meaningful
> + * @vdev:  vfio_device the IRQ belongs to
> + * @fwd_irq: user struct containing the irq_index to forward
> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> + * kvm_vfio_device that holds it
> + * @hwirq: irq numberthe irq index corresponds to
> + *
> + * checks the vfio-device is a platform vfio device
> + * checks the irq_index corresponds to an actual hwirq and
> + * checks this hwirq is not already forwarded
> + * returns < 0 on following errors:
> + * not a platform device, bad irq index, already forwarded
> + */
> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> +			    struct vfio_device *vdev,
> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> +			    struct kvm_vfio_device **kvm_vdev,
> +			    int *hwirq)
> +{
> +	struct device *dev = kvm_vfio_external_base_device(vdev);
> +	struct platform_device *platdev;
> +
> +	*hwirq = -1;
> +	*kvm_vdev = NULL;
> +	if (strcmp(dev->bus->name, "platform") == 0) {
> +		platdev = to_platform_device(dev);
> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> +		if (*hwirq < 0) {
> +			kvm_err("%s incorrect index\n", __func__);
> +			return -EINVAL;
> +		}
> +	} else {
> +		kvm_err("%s not a platform device\n", __func__);
> +		return -EINVAL;
> +	}

need some spacing here.  Also, I would turn this around: first check if
the strcmp fails and error out, then do your next check, etc., to
avoid so many nested statements.

> +	/* is a ref to this device already owned by the KVM-VFIO device? */

this comment is not particularly helpful in its current form, it would
be helpful if you specified that we're checking whether that particular
device/irq combo is already registered.

> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> +	if (*kvm_vdev) {
> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> +			kvm_err("%s irq %d already forwarded\n",
> +				__func__, *hwirq);

don't flood the kernel log because of a user error, just allocate an
error code for this purpose and document it in the ABI, -EEXIST or
something.

> +			return -EINVAL;
> +		}
> +	}
> +	return 0;
> +}
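
To make the "turn this around" suggestion concrete, here is a minimal
userspace sketch of the early-return layout.  It is not the kernel code:
struct fake_dev and fake_get_irq() are hypothetical stand-ins for struct
device and platform_get_irq(), and the error codes are simply the ones the
quoted patch already uses.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct device. */
struct fake_dev {
	const char *bus_name;	/* models dev->bus->name */
	const int *irqs;	/* hwirq number for each irq index */
	unsigned int nr_irqs;
};

/* Hypothetical stand-in for platform_get_irq(): hwirq or negative errno. */
static int fake_get_irq(const struct fake_dev *dev, unsigned int index)
{
	if (index >= dev->nr_irqs)
		return -ENXIO;
	return dev->irqs[index];
}

/* Each check fails fast, so the success path is never nested. */
static int validate_forward(const struct fake_dev *dev, unsigned int index,
			    int *hwirq)
{
	assert(dev && hwirq);
	*hwirq = -1;

	if (strcmp(dev->bus_name, "platform") != 0)
		return -EINVAL;		/* not a platform device */

	*hwirq = fake_get_irq(dev, index);
	if (*hwirq < 0)
		return -EINVAL;		/* bad irq index */

	return 0;
}
```

With this shape, adding a further check is one more flat "if ... return"
block rather than another level of nesting.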
> +
> +/**
> + * validate_unforward: check a deassignment is meaningful
> + * @kv: the kvm_vfio device
> + * @vdev: the vfio_device whose irq to deassign belongs to
> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> + * it exists
> + *
> + * returns 0 if the provided irq effectively is forwarded
> + * (a ref to this vfio_device is hold and this irq belongs to
                                    held
> + * the forwarded irq of this device)
> + * returns -EINVAL in the negative

               ENOENT should be returned if you don't have an entry.
	       EINVAL could be used if you supply an fd that isn't a
	       VFIO device file descriptor, for example.  Again,
	       consider documenting all this in the API.

> + */
> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> +			      struct vfio_device *vdev,
> +			      struct kvm_arch_forwarded_irq *fwd_irq,
> +			      struct kvm_vfio_device **kvm_vdev)
> +{
> +	struct kvm_fwd_irq *pfwd;
> +
> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> +	if (!kvm_vdev) {
> +		kvm_err("%s no forwarded irq for this device\n", __func__);

don't flood the kernel log

> +		return -EINVAL;
> +	}
> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> +	if (!pfwd) {
> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);


> +		return -EINVAL;

same here

> +	}
> +	return 0;
> +}
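
The error-code comments above (-EEXIST for a duplicate forward, -ENOENT
for a missing entry) could look roughly like this.  This is a userspace
sketch only: the fixed-size table, MAX_FWD and the -ENOSPC case are
assumptions made for the illustration, whereas the patch uses linked
lists.  Note that nothing is logged on these user-triggered errors.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FWD 8	/* arbitrary capacity, for the sketch only */

/* Hypothetical table of forwarded irq indexes for one device. */
struct fwd_table {
	unsigned int index[MAX_FWD];
	size_t count;
};

static bool fwd_lookup(const struct fwd_table *t, unsigned int index)
{
	size_t i;

	for (i = 0; i < t->count; i++)
		if (t->index[i] == index)
			return true;
	return false;
}

/* Forward: a duplicate request is reported as -EEXIST. */
static int fwd_add(struct fwd_table *t, unsigned int index)
{
	assert(t);
	if (fwd_lookup(t, index))
		return -EEXIST;
	if (t->count == MAX_FWD)
		return -ENOSPC;
	t->index[t->count++] = index;
	return 0;
}

/* Unforward: a missing entry is reported as -ENOENT. */
static int fwd_remove(struct fwd_table *t, unsigned int index)
{
	size_t i;

	assert(t);
	for (i = 0; i < t->count; i++) {
		if (t->index[i] == index) {
			/* fill the hole with the last entry */
			t->index[i] = t->index[--t->count];
			return 0;
		}
	}
	return -ENOENT;
}
```

Userspace can then tell "already forwarded" and "not forwarded" apart from
a genuinely invalid argument without reading the kernel log.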
> +
> +/**
> + * kvm_vfio_forward - set a forwarded IRQ
> + * @kdev: the kvm device
> + * @vdev: the vfio device the IRQ belongs to
> + * @fwd_irq: the user struct containing the irq_index and guest irq
> + * @must_put: tells the caller whether the vfio_device must be put after
> + * the call (ref must be released in case a ref onto this device was
> + * already hold or in case of new device and failure)
> + *
> + * validate the injection, activate forward and store the information
      Validate
> + * about which irq and which device is concerned so that on deassign or
> + * kvm-vfio destruction everuthing can be cleaned up.
                           everything

I'm not sure I understand this explanation.  Do we have concerned
devices?

I think you want to say something along the lines of: If userspace passed
a valid vfio device and irq handle and the architecture supports
forwarding this combination, register the vfio_device and irq
combination in the ....

> + */
> +static int kvm_vfio_forward(struct kvm_device *kdev,
> +			    struct vfio_device *vdev,
> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> +			    bool *must_put)
> +{
> +	int ret;
> +	struct kvm_fwd_irq *pfwd = NULL;
> +	struct kvm_vfio_device *kvm_vdev = NULL;
> +	struct kvm_vfio *kv = kdev->private;
> +	int hwirq;
> +
> +	*must_put = true;
> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
> +					&kvm_vdev, &hwirq);
> +	if (ret < 0)
> +		return -EINVAL;
> +
> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);

seems a bit pointless to zero-out the memory if you're setting all
fields below.

> +	if (!pfwd)
> +		return -ENOMEM;
> +	pfwd->index = fwd_irq->index;
> +	pfwd->gsi = fwd_irq->gsi;
> +	pfwd->hwirq = hwirq;
> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> +	if (ret < 0) {
> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);

this whole thing feels incredibly broken to me.  Setting a forward
should either work or not work, not something in between that leaves
something to be cleaned up.  Why this two-stage thingy here?

> +		kfree(pfwd);

probably want to move your free-and-return-error to the end of the
function.

> +		return ret;
> +	}
> +
> +	if (!kvm_vdev) {
> +		/* create & insert the new device and keep the ref */
> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);

again, no need for zeroing out the memory.

> +		if (!kvm_vdev) {
> +			kvm_arch_set_fwd_state(pfwd, false);
> +			kfree(pfwd);
> +			return -ENOMEM;
> +		}
> +
> +		kvm_vdev->vfio_device = vdev;
> +		kvm_vdev->fd = fwd_irq->fd;
> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
> +		list_add(&kvm_vdev->node, &kv->device_list);
> +		/*
> +		 * the only case where we keep the ref:
> +		 * new device and forward setting successful
> +		 */
> +		*must_put = false;
> +	}
> +
> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
> +
> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
> +	fwd_irq->fd, hwirq, fwd_irq->gsi);

please indent this to align with the opening parenthesis.

> +
> +	return 0;
> +}
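
For the free-and-return-error comment above, the usual kernel idiom is a
single error label at the end of the function.  A minimal userspace
sketch, assuming a hypothetical arch_set_forward() hook that either fully
succeeds or fails with nothing left to clean up (malloc/free stand in for
kmalloc/kfree):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fwd_irq {
	unsigned int index;
	unsigned int hwirq;
};

/* Hypothetical arch hook: all-or-nothing, rejects hwirq 0 for the demo. */
static int arch_set_forward(const struct fwd_irq *irq)
{
	return irq->hwirq == 0 ? -EINVAL : 0;
}

static int forward_irq(unsigned int index, unsigned int hwirq,
		       struct fwd_irq **out)
{
	struct fwd_irq *irq;
	int ret;

	assert(out);
	irq = malloc(sizeof(*irq));	/* every field is set below */
	if (!irq)
		return -ENOMEM;

	irq->index = index;
	irq->hwirq = hwirq;

	ret = arch_set_forward(irq);
	if (ret < 0)
		goto err_free;

	*out = irq;
	return 0;

err_free:
	free(irq);
	return ret;
}
```

Any later failure point in the function can jump to the same label, so the
unwinding is written once.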
> +
> +/**
> + * remove_assigned_device - put a given device from the list

this isn't a 'put', at least not *just* a put.

> + * @kv: the kvm-vfio device
> + * @vdev: the vfio-device to remove
> + *
> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
> + * remove the corresponding kvm_vfio_device from the assigned device
> + * list.
> + * returns true if the device could be removed, false in the negative
> + */
> +bool remove_assigned_device(struct kvm_vfio *kv,
> +			    struct vfio_device *vdev)
> +{
> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> +	bool removed = false;
> +	int ret;
> +
> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> +				 &kv->device_list, node) {
> +		if (kvm_vdev_iter->vfio_device == vdev) {
> +			/* loop on all its forwarded IRQ */
> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> +						 &kvm_vdev_iter->fwd_irq_list,
> +						 link) {

hmm, seems this function is only called when you have no more forwarded
IRQs, so isn't all of this completely dead (and unnecessary) code?

> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> +						KVM_VFIO_IRQ_SET_NORMAL);
> +				if (ret < 0)
> +					return ret;

you're returning an error code to a bool function, which means you'll
return true when there was an error.  Is this your intention? ;)

if we have an error here, this would be a very very bad situation wouldn't it?

> +				list_del(&fwd_irq_iter->link);
> +				kfree(fwd_irq_iter);
> +			}
> +			/* all IRQs could be deassigned */
> +			list_del(&kvm_vdev_iter->node);
> +			kvm_vfio_device_put_external_user(
> +				kvm_vdev_iter->vfio_device);
> +			kfree(kvm_vdev_iter);
> +			removed = true;
> +			break;
> +		}
> +	}
> +	return removed;
> +}
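
To spell out the bool problem noted above: in C, any non-zero value
converts to true, so returning a negative errno from a bool function
reports success.  A small userspace sketch, where restore_irq() is a
hypothetical stand-in for the arch hook:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for restoring the normal (non-forwarded) state. */
static int restore_irq(int should_fail)
{
	return should_fail ? -EIO : 0;
}

/* Mirrors the quoted pattern: the error is silently turned into true. */
static bool remove_device_buggy(int should_fail)
{
	int ret = restore_irq(should_fail);

	if (ret < 0)
		return ret;	/* -EIO is non-zero, so this is true */
	return true;
}

/* Returning int keeps the error code intact for the caller. */
static int remove_device(int should_fail)
{
	int ret = restore_irq(should_fail);

	assert(ret <= 0);
	return ret;
}
```

The int-returning variant is one possible fix; whether a failure here can
be handled at all is a separate question.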
> +
> +
> +/**
> + * remove_fwd_irq - remove a forwarded irq
> + *
> + * @kv: kvm-vfio device
> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
> + * irq_index: the index of the IRQ
> + *
> + * change the forwarded state of the IRQ, remove the IRQ from
> + * the device forwarded IRQ list. In case it is the last one,
> + * put the device
> + */
> +int remove_fwd_irq(struct kvm_vfio *kv,
> +		   struct kvm_vfio_device *kvm_vdev,
> +		   int irq_index)
> +{
> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> +	int ret = -1;
> +
> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> +				 &kvm_vdev->fwd_irq_list, link) {

hmmm, you can only forward one irq for a specific device once, right?
And you already have a lookup function, so why not call that, and then
remove it?

I'm confused.

> +		if (fwd_irq_iter->index == irq_index) {
> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> +						KVM_VFIO_IRQ_SET_NORMAL);
> +			if (ret < 0)
> +				break;
> +			list_del(&fwd_irq_iter->link);
> +			kfree(fwd_irq_iter);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	if (list_empty(&kvm_vdev->fwd_irq_list))
> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
> +
> +	return ret;
> +}
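
What I have in mind for the lookup comment above is roughly the following.
This is a userspace sketch with a hand-rolled singly linked list instead
of list_head; find_irq(), add_irq() and remove_irq() are hypothetical
names, and the lookup returns the link slot so the caller can unlink
without walking the list a second time.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fwd_irq {
	struct fwd_irq *next;
	unsigned int index;
};

struct fwd_dev {
	struct fwd_irq *irqs;	/* head of the forwarded irq list */
};

/* Single lookup helper: returns the slot pointing at the entry, or NULL. */
static struct fwd_irq **find_irq(struct fwd_dev *dev, unsigned int index)
{
	struct fwd_irq **slot;

	for (slot = &dev->irqs; *slot; slot = &(*slot)->next)
		if ((*slot)->index == index)
			return slot;
	return NULL;
}

static int add_irq(struct fwd_dev *dev, unsigned int index)
{
	struct fwd_irq *irq = malloc(sizeof(*irq));

	if (!irq)
		return -ENOMEM;
	irq->index = index;
	irq->next = dev->irqs;
	dev->irqs = irq;
	return 0;
}

/* Removal reuses the lookup instead of open-coding another loop. */
static int remove_irq(struct fwd_dev *dev, unsigned int index)
{
	struct fwd_irq **slot = find_irq(dev, index);
	struct fwd_irq *irq;

	assert(dev);
	if (!slot)
		return -ENOENT;
	irq = *slot;
	*slot = irq->next;
	free(irq);
	return 0;
}
```

In the kernel version the equivalent would be kvm_vfio_find_irq()
followed by list_del() and kfree().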
> +
> +/**
> + * kvm_vfio_unforward - remove a forwarded IRQ
> + * @kdev: the kvm device
> + * @vdev: the vfio_device
> + * @fwd_irq: user struct
> + * after checking this IRQ effectively is forwarded, change its state,
> + * remove it from the corresponding kvm_vfio_device list
> + */
> +static int kvm_vfio_unforward(struct kvm_device *kdev,
> +				     struct vfio_device *vdev,
> +				     struct kvm_arch_forwarded_irq *fwd_irq)
> +{
> +	struct kvm_vfio *kv = kdev->private;
> +	struct kvm_vfio_device *kvm_vdev;
> +	int ret;
> +
> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
> +	if (ret < 0)
> +		return -EINVAL;

why do you override the return value?  Propagate it.

> +
> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
> +	if (ret < 0)
> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
> +			__func__, fwd_irq->fd, fwd_irq->index);
> +	else
> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
> +			  __func__, fwd_irq->fd, fwd_irq->index);

again with the kernel log here.



> +	return ret;
> +}
> +
> +
> +
> +
> +/**
> + * kvm_vfio_set_device - the top function for interracting with a vfio

                                top?             interacting

> + * device
> + */

probably just skip this comment

> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> +{
> +	struct kvm_vfio *kv = kdev->private;
> +	struct vfio_device *vdev;
> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> +
> +	switch (attr) {
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> +		bool must_put;
> +		int ret;
> +
> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> +			return -EFAULT;
> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> +		if (IS_ERR(vdev))
> +			return PTR_ERR(vdev);

seems like this whole block of code is replicated below, needs
refactoring.

> +		mutex_lock(&kv->lock);
> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
> +		if (must_put)
> +			kvm_vfio_put_vfio_device(vdev);

this must_put looks plain weird.  I think you want to balance your
get/put's always; can't you just get an extra reference in
kvm_vfio_forward() ?
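
A rough userspace sketch of the balanced get/put I mean, with all names
hypothetical: struct dev models the refcounted vfio_device, and struct
registry models a kvm-vfio device list simplified to a single slot.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted device, standing in for struct vfio_device. */
struct dev {
	int refs;
};

static void dev_get(struct dev *d)
{
	d->refs++;
}

static void dev_put(struct dev *d)
{
	assert(d->refs > 0);
	d->refs--;
}

/* Hypothetical single-slot registry of devices with forwarded IRQs. */
struct registry {
	struct dev *held;
};

/* Takes its own reference the first time a device is registered. */
static int forward(struct registry *r, struct dev *d, int arch_fails)
{
	if (arch_fails)
		return -EINVAL;
	if (r->held && r->held != d)
		return -EBUSY;	/* sketch limitation: one device only */
	if (!r->held) {
		dev_get(d);
		r->held = d;
	}
	return 0;
}

/* The caller's get is always matched by the caller's put. */
static int set_device(struct registry *r, struct dev *d, int arch_fails)
{
	int ret;

	dev_get(d);		/* models kvm_vfio_get_vfio_device(fd) */
	ret = forward(r, d, arch_fails);
	dev_put(d);		/* unconditional, no must_put flag */
	return ret;
}
```

The same ordering applies to the unforward path: keep the caller's
reference until the unforward call has returned, then put it.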

> +		mutex_unlock(&kv->lock);
> +		return ret;
> +		}
> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
> +		int ret;
> +
> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> +			return -EFAULT;
> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> +		if (IS_ERR(vdev))
> +			return PTR_ERR(vdev);
> +
> +		kvm_vfio_device_put_external_user(vdev);

you're dropping the reference to the device but referencing it in your
unforward call below?

> +		mutex_lock(&kv->lock);
> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
> +		mutex_unlock(&kv->lock);
> +		return ret;
> +	}
> +#endif
> +	default:
> +		return -ENXIO;
> +	}
> +}
> +
> +/**
> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
> + * @kv: kvm-vfio device
> + *
> + * loop on all got devices and their associated forwarded IRQs

'loop on all got' ?

Restore the non-forwarded state for all registered devices and ...

> + * restore the non forwarded state, remove IRQs and their devices from
> + * the respective list, put the vfio platform devices
> + *
> + * When this function is called, the vcpu already are destroyed. No
                                    the VCPUs are already destroyed.
> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> + * kvm_arch_set_fwd_state action

this last bit didn't make any sense to me.  Also, why are we referring
to the vgic in generic code?

> + */
> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
> +{
> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> +
> +	/* loop on all the assigned devices */

unnecessary comment

> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> +				 &kv->device_list, node) {
> +
> +		/* loop on all its forwarded IRQ */

same

> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> +					 &kvm_vdev_iter->fwd_irq_list, link) {
> +			kvm_arch_set_fwd_state(fwd_irq_iter,
> +						KVM_VFIO_IRQ_CLEANUP);
> +			list_del(&fwd_irq_iter->link);
> +			kfree(fwd_irq_iter);
> +		}
> +		list_del(&kvm_vdev_iter->node);
> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
> +		kfree(kvm_vdev_iter);
> +	}
> +	return 0;
> +}
> +
> +
>  static int kvm_vfio_set_attr(struct kvm_device *dev,
>  			     struct kvm_device_attr *attr)
>  {
>  	switch (attr->group) {
>  	case KVM_DEV_VFIO_GROUP:
>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> +	case KVM_DEV_VFIO_DEVICE:
> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
>  	}
>  
>  	return -ENXIO;
> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>  		case KVM_DEV_VFIO_GROUP_DEL:
>  			return 0;
>  		}
> -
>  		break;
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +	case KVM_DEV_VFIO_DEVICE:
> +		switch (attr->attr) {
> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> +			return 0;
> +		}
> +		break;
> +#endif
>  	}
> -
>  	return -ENXIO;
>  }
>  
> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
>  		list_del(&kvg->node);
>  		kfree(kvg);
>  	}
> +	kvm_vfio_put_all_devices(kv);
>  
>  	kvm_vfio_update_coherency(dev);
>  
> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>  		return -ENOMEM;
>  
>  	INIT_LIST_HEAD(&kv->group_list);
> +	INIT_LIST_HEAD(&kv->device_list);
>  	mutex_init(&kv->lock);
>  
>  	dev->private = kv;
> -- 
> 1.9.1
> 


* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-11  3:10     ` Christoffer Dall
  0 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11  3:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> 
> This is a new control channel which enables KVM to cooperate with
> viable VFIO devices.
> 
> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> in addition to a list of groups (kvm_vfio_group). The new
> infrastructure enables to check the validity of the VFIO device
> file descriptor, get and hold a reference to it.
> 
> The first concrete implemented command is IRQ forward control:
> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> 
> It consists in programing the VFIO driver and KVM in a consistent manner
> so that an optimized IRQ injection/completion is set up. Each
> kvm_vfio_device holds a list of forwarded IRQ. When putting a
> kvm_vfio_device, the implementation makes sure the forwarded IRQs
> are set again in the normal handling state (non forwarded).

'putting a kvm_vfio_device' sounds to me like you're golfing :)

When a kvm_vfio_device is released?

> 
> The forwarding programmming is architecture specific, embodied by the
> kvm_arch_set_fwd_state function. Its implementation is given in a
> separate patch file.

I would drop the last sentence and instead indicate that this is handled
properly when the architecture does not support such a feature.

> 
> The forwarding control modality is enabled by the
> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> 
> ---
> 
> v1 -> v2:
> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> - original patch file separated into 2 parts: generic part moved in vfio.c
>   and ARM specific part(kvm_arch_set_fwd_state)
> ---
>  include/linux/kvm_host.h |  27 +++
>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 477 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index a4c33b3..24350dc 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
>  		      unsigned long arg);
>  };
>  
> +enum kvm_fwd_irq_action {
> +	KVM_VFIO_IRQ_SET_FORWARD,
> +	KVM_VFIO_IRQ_SET_NORMAL,
> +	KVM_VFIO_IRQ_CLEANUP,

This is KVM internal API, so it would probably be good to document this.
Especially the CLEANUP bit worries me, see below.

> +};
> +
> +/* internal structure describing a forwarded IRQ */
> +struct kvm_fwd_irq {
> +	struct list_head link;

this list entry is local to the kvm vfio device, right? that means you
probably want a struct with just the below fields, and then have a
containing struct in the generic device file, private to its logic.

> +	__u32 index; /* platform device irq index */
> +	__u32 hwirq; /*physical IRQ */
> +	__u32 gsi; /* virtual IRQ */
> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> +};
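
A possible shape for the split suggested above, as a userspace sketch.
All names are hypothetical: struct list_node stands in for struct
list_head, entry_of() plays the role of container_of(), and the vcpu
pointer is left out to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal list node, standing in for struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* What the arch hook needs to see: only the IRQ description. */
struct fwd_irq {
	unsigned int index;	/* platform device irq index */
	unsigned int hwirq;	/* physical IRQ */
	unsigned int gsi;	/* virtual IRQ */
};

/* Private to the generic device file: list linkage plus the description. */
struct fwd_irq_entry {
	struct list_node link;
	struct fwd_irq irq;
};

/* Recover the private entry from a pointer to its embedded description. */
static struct fwd_irq_entry *entry_of(struct fwd_irq *irq)
{
	assert(irq);
	return (struct fwd_irq_entry *)((char *)irq -
					offsetof(struct fwd_irq_entry, irq));
}
```

That way the header only exposes the description, and the list linkage
stays an implementation detail of vfio.c.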
> +
>  void kvm_device_get(struct kvm_device *dev);
>  void kvm_device_put(struct kvm_device *dev);
>  struct kvm_device *kvm_device_from_filp(struct file *filp);
> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>  extern struct kvm_device_ops kvm_flic_ops;
>  
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,

what's the 'p' in pfwd?

> +			   enum kvm_fwd_irq_action action);
> +
> +#else
> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> +					 enum kvm_fwd_irq_action action)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>  
>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 76dc7a1..e4a81c4 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -18,14 +18,24 @@
>  #include <linux/slab.h>
>  #include <linux/uaccess.h>
>  #include <linux/vfio.h>
> +#include <linux/platform_device.h>
>  
>  struct kvm_vfio_group {
>  	struct list_head node;
>  	struct vfio_group *vfio_group;
>  };
>  
> +struct kvm_vfio_device {
> +	struct list_head node;
> +	struct vfio_device *vfio_device;
> +	/* list of forwarded IRQs for that VFIO device */
> +	struct list_head fwd_irq_list;
> +	int fd;
> +};
> +
>  struct kvm_vfio {
>  	struct list_head group_list;
> +	struct list_head device_list;
>  	struct mutex lock;
>  	bool noncoherent;
>  };
> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>  	return -ENXIO;
>  }
>  
> +/**
> + * get_vfio_device - returns the vfio-device corresponding to this fd
> + * @fd:fd of the vfio platform device
> + *
> + * checks it is a vfio device
> + * increment its ref counter

why the short lines?  Just write this out in proper English.

> + */
> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> +{
> +	struct fd f;
> +	struct vfio_device *vdev;
> +
> +	f = fdget(fd);
> +	if (!f.file)
> +		return NULL;
> +	vdev = kvm_vfio_device_get_external_user(f.file);
> +	fdput(f);
> +	return vdev;
> +}
> +
> +/**
> + * put_vfio_device: put the vfio platform device
> + * @vdev: vfio_device to put
> + *
> + * decrement the ref counter
> + */
> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> +{
> +	kvm_vfio_device_put_external_user(vdev);
> +}
> +
> +/**
> + * kvm_vfio_find_device - look for the device in the assigned
> + * device list
> + * @kv: the kvm-vfio device
> + * @vdev: the vfio_device to look for
> + *
> + * returns the associated kvm_vfio_device if the device is known,
> + * meaning at least 1 IRQ is forwarded for this device.
> + * in the device is not registered, returns NULL.
> + */

are these functions meant to be exported?  Otherwise they should be
static, and the documentation on these simple list iteration wrappers
seems like overkill imho.

> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> +					     struct vfio_device *vdev)
> +{
> +	struct kvm_vfio_device *kvm_vdev_iter;
> +
> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> +		if (kvm_vdev_iter->vfio_device == vdev)
> +			return kvm_vdev_iter;
> +	}
> +	return NULL;
> +}
> +
> +/**
> + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> + * @kvm_vdev: the kvm_vfio_device
> + * @irq_index: irq index
> + *
> + * returns the forwarded irq struct if it exists, NULL in the negative
> + */
> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> +				      int irq_index)
> +{
> +	struct kvm_fwd_irq *fwd_irq_iter;
> +
> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> +		if (fwd_irq_iter->index == irq_index)
> +			return fwd_irq_iter;
> +	}
> +	return NULL;
> +}
> +
> +/**
> + * validate_forward - checks whether forwarding a given IRQ is meaningful
> + * @vdev:  vfio_device the IRQ belongs to
> + * @fwd_irq: user struct containing the irq_index to forward
> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> + * kvm_vfio_device that holds it
> + * @hwirq: irq numberthe irq index corresponds to
> + *
> + * checks the vfio-device is a platform vfio device
> + * checks the irq_index corresponds to an actual hwirq and
> + * checks this hwirq is not already forwarded
> + * returns < 0 on following errors:
> + * not a platform device, bad irq index, already forwarded
> + */
> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> +			    struct vfio_device *vdev,
> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> +			    struct kvm_vfio_device **kvm_vdev,
> +			    int *hwirq)
> +{
> +	struct device *dev = kvm_vfio_external_base_device(vdev);
> +	struct platform_device *platdev;
> +
> +	*hwirq = -1;
> +	*kvm_vdev = NULL;
> +	if (strcmp(dev->bus->name, "platform") == 0) {
> +		platdev = to_platform_device(dev);
> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> +		if (*hwirq < 0) {
> +			kvm_err("%s incorrect index\n", __func__);
> +			return -EINVAL;
> +		}
> +	} else {
> +		kvm_err("%s not a platform device\n", __func__);
> +		return -EINVAL;
> +	}

need some spacing here.  Also, I would turn this around: first check if
the strcmp fails and error out, then do your next check, etc., to
avoid so many nested statements.

> +	/* is a ref to this device already owned by the KVM-VFIO device? */

this comment is not particularly helpful in its current form, it would
be helpful if you specified that we're checking whether that particular
device/irq combo is already registered.

> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> +	if (*kvm_vdev) {
> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> +			kvm_err("%s irq %d already forwarded\n",
> +				__func__, *hwirq);

don't flood the kernel log because of a user error, just allocate an
error code for this purpose and document it in the ABI, -EEXIST or
something.

> +			return -EINVAL;
> +		}
> +	}
> +	return 0;
> +}
> +
> +/**
> + * validate_unforward: check a deassignment is meaningful
> + * @kv: the kvm_vfio device
> + * @vdev: the vfio_device whose irq to deassign belongs to
> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> + * it exists
> + *
> + * returns 0 if the provided irq effectively is forwarded
> + * (a ref to this vfio_device is hold and this irq belongs to
                                    held
> + * the forwarded irq of this device)
> + * returns -EINVAL in the negative

               ENOENT should be returned if you don't have an entry.
	       EINVAL could be used if you supply an fd that isn't a
	       VFIO device file descriptor, for example.  Again,
	       consider documenting all this in the API.

> + */
> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> +			      struct vfio_device *vdev,
> +			      struct kvm_arch_forwarded_irq *fwd_irq,
> +			      struct kvm_vfio_device **kvm_vdev)
> +{
> +	struct kvm_fwd_irq *pfwd;
> +
> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> +	if (!kvm_vdev) {
> +		kvm_err("%s no forwarded irq for this device\n", __func__);

don't flood the kernel log

> +		return -EINVAL;
> +	}
> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> +	if (!pfwd) {
> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);


> +		return -EINVAL;

same here

> +	}
> +	return 0;
> +}
> +
> +/**
> + * kvm_vfio_forward - set a forwarded IRQ
> + * @kdev: the kvm device
> + * @vdev: the vfio device the IRQ belongs to
> + * @fwd_irq: the user struct containing the irq_index and guest irq
> + * @must_put: tells the caller whether the vfio_device must be put after
> + * the call (ref must be released in case a ref onto this device was
> + * already hold or in case of new device and failure)
> + *
> + * validate the injection, activate forward and store the information
      Validate
> + * about which irq and which device is concerned so that on deassign or
> + * kvm-vfio destruction everuthing can be cleaned up.
                           everything

I'm not sure I understand this explanation.  Do we have concerned
devices?

I think you want to say something along the lines of: If userspace passed
a valid vfio device and irq handle and the architecture supports
forwarding this combination, register the vfio_device and irq
combination in the ....

> + */
> +static int kvm_vfio_forward(struct kvm_device *kdev,
> +			    struct vfio_device *vdev,
> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> +			    bool *must_put)
> +{
> +	int ret;
> +	struct kvm_fwd_irq *pfwd = NULL;
> +	struct kvm_vfio_device *kvm_vdev = NULL;
> +	struct kvm_vfio *kv = kdev->private;
> +	int hwirq;
> +
> +	*must_put = true;
> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
> +					&kvm_vdev, &hwirq);
> +	if (ret < 0)
> +		return -EINVAL;
> +
> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);

seems a bit pointless to zero-out the memory if you're setting all
fields below.

> +	if (!pfwd)
> +		return -ENOMEM;
> +	pfwd->index = fwd_irq->index;
> +	pfwd->gsi = fwd_irq->gsi;
> +	pfwd->hwirq = hwirq;
> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> +	if (ret < 0) {
> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);

this whole thing feels incredibly broken to me.  Setting a forward
should either work or not work, not something in between that leaves
something to be cleaned up.  Why this two-stage thingy here?

> +		kfree(pfwd);

probably want to move your free-and-return-error to the end of the
function.

> +		return ret;
> +	}
> +
> +	if (!kvm_vdev) {
> +		/* create & insert the new device and keep the ref */
> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);

again, no need for zeroing out the memory.

> +		if (!kvm_vdev) {
> +			kvm_arch_set_fwd_state(pfwd, false);
> +			kfree(pfwd);
> +			return -ENOMEM;
> +		}
> +
> +		kvm_vdev->vfio_device = vdev;
> +		kvm_vdev->fd = fwd_irq->fd;
> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
> +		list_add(&kvm_vdev->node, &kv->device_list);
> +		/*
> +		 * the only case where we keep the ref:
> +		 * new device and forward setting successful
> +		 */
> +		*must_put = false;
> +	}
> +
> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
> +
> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
> +	fwd_irq->fd, hwirq, fwd_irq->gsi);

please indent this to align with the opening parenthesis.

> +
> +	return 0;
> +}
> +
> +/**
> + * remove_assigned_device - put a given device from the list

this isn't a 'put', at least not *just* a put.

> + * @kv: the kvm-vfio device
> + * @vdev: the vfio-device to remove
> + *
> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
> + * remove the corresponding kvm_vfio_device from the assigned device
> + * list.
> + * returns true if the device could be removed, false in the negative
> + */
> +bool remove_assigned_device(struct kvm_vfio *kv,
> +			    struct vfio_device *vdev)
> +{
> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> +	bool removed = false;
> +	int ret;
> +
> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> +				 &kv->device_list, node) {
> +		if (kvm_vdev_iter->vfio_device == vdev) {
> +			/* loop on all its forwarded IRQ */
> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> +						 &kvm_vdev_iter->fwd_irq_list,
> +						 link) {

hmm, seems this function is only called when you have no more forwarded
IRQs, so isn't all of this completely dead (and unnecessary) code?

> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> +						KVM_VFIO_IRQ_SET_NORMAL);
> +				if (ret < 0)
> +					return ret;

you're returning an error code to a bool function, which means you'll
return true when there was an error.  Is this your intention? ;)

if we have an error here, this would be a very very bad situation wouldn't it?

> +				list_del(&fwd_irq_iter->link);
> +				kfree(fwd_irq_iter);
> +			}
> +			/* all IRQs could be deassigned */
> +			list_del(&kvm_vdev_iter->node);
> +			kvm_vfio_device_put_external_user(
> +				kvm_vdev_iter->vfio_device);
> +			kfree(kvm_vdev_iter);
> +			removed = true;
> +			break;
> +		}
> +	}
> +	return removed;
> +}
> +
> +
> +/**
> + * remove_fwd_irq - remove a forwarded irq
> + *
> + * @kv: kvm-vfio device
> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
> + * irq_index: the index of the IRQ
> + *
> + * change the forwarded state of the IRQ, remove the IRQ from
> + * the device forwarded IRQ list. In case it is the last one,
> + * put the device
> + */
> +int remove_fwd_irq(struct kvm_vfio *kv,
> +		   struct kvm_vfio_device *kvm_vdev,
> +		   int irq_index)
> +{
> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> +	int ret = -1;
> +
> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> +				 &kvm_vdev->fwd_irq_list, link) {

hmmm, you can only forward one irq for a specific device once, right?
And you already have a lookup function, so why not call that, and then
remove it?

I'm confused.
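
A rough sketch of the "look it up, then remove it" shape, as a user-space
model with a plain singly linked list rather than the kernel's list_head.
All names here (struct fwd_irq, find_fwd_irq, remove_fwd_irq_model,
add_fwd_irq_model) are hypothetical and only serve the illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model of a forwarded-IRQ list entry. */
struct fwd_irq {
	int index;
	struct fwd_irq *next;
};

/* Lookup helper: returns the link that points at the match, or NULL. */
static struct fwd_irq **find_fwd_irq(struct fwd_irq **head, int index)
{
	struct fwd_irq **pp;

	for (pp = head; *pp; pp = &(*pp)->next)
		if ((*pp)->index == index)
			return pp;
	return NULL;
}

/* Removal reuses the lookup instead of open-coding a second list walk. */
static int remove_fwd_irq_model(struct fwd_irq **head, int index)
{
	struct fwd_irq **pp = find_fwd_irq(head, index);
	struct fwd_irq *victim;

	if (!pp)
		return -ENOENT;
	victim = *pp;
	*pp = victim->next;	/* unlink before freeing */
	free(victim);
	return 0;
}

/* Push an entry at the head of the model list; 0 or -ENOMEM. */
static int add_fwd_irq_model(struct fwd_irq **head, int index)
{
	struct fwd_irq *e = malloc(sizeof(*e));

	if (!e)
		return -ENOMEM;
	e->index = index;
	e->next = *head;
	*head = e;
	return 0;
}
```

Since an index can only be forwarded once per device, a single lookup
either finds the one entry or reports that there is nothing to remove.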

> +		if (fwd_irq_iter->index == irq_index) {
> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> +						KVM_VFIO_IRQ_SET_NORMAL);
> +			if (ret < 0)
> +				break;
> +			list_del(&fwd_irq_iter->link);
> +			kfree(fwd_irq_iter);
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	if (list_empty(&kvm_vdev->fwd_irq_list))
> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
> +
> +	return ret;
> +}
> +
> +/**
> + * kvm_vfio_unforward - remove a forwarded IRQ
> + * @kdev: the kvm device
> + * @vdev: the vfio_device
> + * @fwd_irq: user struct
> + * after checking this IRQ effectively is forwarded, change its state,
> + * remove it from the corresponding kvm_vfio_device list
> + */
> +static int kvm_vfio_unforward(struct kvm_device *kdev,
> +				     struct vfio_device *vdev,
> +				     struct kvm_arch_forwarded_irq *fwd_irq)
> +{
> +	struct kvm_vfio *kv = kdev->private;
> +	struct kvm_vfio_device *kvm_vdev;
> +	int ret;
> +
> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
> +	if (ret < 0)
> +		return -EINVAL;

why do you override the return value?  Propagate it.

> +
> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
> +	if (ret < 0)
> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
> +			__func__, fwd_irq->fd, fwd_irq->index);
> +	else
> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
> +			  __func__, fwd_irq->fd, fwd_irq->index);

again with the kernel log here.



> +	return ret;
> +}
> +
> +
> +
> +
> +/**
> + * kvm_vfio_set_device - the top function for interracting with a vfio

                                top?             interacting

> + * device
> + */

probably just skip this comment

> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> +{
> +	struct kvm_vfio *kv = kdev->private;
> +	struct vfio_device *vdev;
> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> +
> +	switch (attr) {
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> +		bool must_put;
> +		int ret;
> +
> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> +			return -EFAULT;
> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> +		if (IS_ERR(vdev))
> +			return PTR_ERR(vdev);

seems like this whole block of code is replicated below, needs
refactoring.

> +		mutex_lock(&kv->lock);
> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
> +		if (must_put)
> +			kvm_vfio_put_vfio_device(vdev);

this must_put looks plain weird.  I think you want to balance your
get/put's always; can't you just get an extra reference in
kvm_vfio_forward() ?
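
A small user-space sketch of what balanced get/put could look like. This
is an assumption about one way to structure it, not the actual VFIO API:
struct dev_model, dev_get(), dev_put(), forward_model() and
set_device_model() are all made-up names, and the refcount is a bare
integer.

```c
#include <errno.h>

/* Hypothetical refcounted device, standing in for a vfio_device. */
struct dev_model {
	int refcount;
};

static void dev_get(struct dev_model *d) { d->refcount++; }
static void dev_put(struct dev_model *d) { d->refcount--; }

/* The forward path takes its own reference only when it keeps the device. */
static int forward_model(struct dev_model *d, int fail)
{
	if (fail)
		return -EINVAL;
	dev_get(d);	/* reference owned by the forwarded-IRQ bookkeeping */
	return 0;
}

/* The caller always pairs its own get with a put; no must_put flag. */
static int set_device_model(struct dev_model *d, int fail)
{
	int ret;

	dev_get(d);			/* lookup reference */
	ret = forward_model(d, fail);
	dev_put(d);			/* always dropped here */
	return ret;
}
```

On success the device ends up with exactly one long-lived reference, and
on failure with none, without the caller needing to know which case hit.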

> +		mutex_unlock(&kv->lock);
> +		return ret;
> +		}
> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
> +		int ret;
> +
> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> +			return -EFAULT;
> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> +		if (IS_ERR(vdev))
> +			return PTR_ERR(vdev);
> +
> +		kvm_vfio_device_put_external_user(vdev);

you're dropping the reference to the device but referencing it in your
unforward call below?

> +		mutex_lock(&kv->lock);
> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
> +		mutex_unlock(&kv->lock);
> +		return ret;
> +	}
> +#endif
> +	default:
> +		return -ENXIO;
> +	}
> +}
> +
> +/**
> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
> + * @kv: kvm-vfio device
> + *
> + * loop on all got devices and their associated forwarded IRQs

'loop on all got' ?

Restore the non-forwarded state for all registered devices and ...

> + * restore the non forwarded state, remove IRQs and their devices from
> + * the respective list, put the vfio platform devices
> + *
> + * When this function is called, the vcpu already are destroyed. No
                                    the VCPUs are already destroyed.
> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> + * kvm_arch_set_fwd_state action

this last bit didn't make any sense to me.  Also, why are we referring
to the vgic in generic code?

> + */
> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
> +{
> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> +
> +	/* loop on all the assigned devices */

unnecessary comment

> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> +				 &kv->device_list, node) {
> +
> +		/* loop on all its forwarded IRQ */

same

> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> +					 &kvm_vdev_iter->fwd_irq_list, link) {
> +			kvm_arch_set_fwd_state(fwd_irq_iter,
> +						KVM_VFIO_IRQ_CLEANUP);
> +			list_del(&fwd_irq_iter->link);
> +			kfree(fwd_irq_iter);
> +		}
> +		list_del(&kvm_vdev_iter->node);
> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
> +		kfree(kvm_vdev_iter);
> +	}
> +	return 0;
> +}
> +
> +
>  static int kvm_vfio_set_attr(struct kvm_device *dev,
>  			     struct kvm_device_attr *attr)
>  {
>  	switch (attr->group) {
>  	case KVM_DEV_VFIO_GROUP:
>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> +	case KVM_DEV_VFIO_DEVICE:
> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
>  	}
>  
>  	return -ENXIO;
> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>  		case KVM_DEV_VFIO_GROUP_DEL:
>  			return 0;
>  		}
> -
>  		break;
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +	case KVM_DEV_VFIO_DEVICE:
> +		switch (attr->attr) {
> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> +			return 0;
> +		}
> +		break;
> +#endif
>  	}
> -
>  	return -ENXIO;
>  }
>  
> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
>  		list_del(&kvg->node);
>  		kfree(kvg);
>  	}
> +	kvm_vfio_put_all_devices(kv);
>  
>  	kvm_vfio_update_coherency(dev);
>  
> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>  		return -ENOMEM;
>  
>  	INIT_LIST_HEAD(&kv->group_list);
> +	INIT_LIST_HEAD(&kv->device_list);
>  	mutex_init(&kv->lock);
>  
>  	dev->private = kv;
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 9/9] KVM: KVM-VFIO: ARM forwarding control
  2014-09-01 12:52   ` Eric Auger
@ 2014-09-11  3:10     ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11  3:10 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Mon, Sep 01, 2014 at 02:52:48PM +0200, Eric Auger wrote:
> Enables forwarding control for ARM. By defining
> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD the patch enables
> KVM_DEV_VFIO_DEVICE_FORWARD/UNFORWARD_IRQ command on ARM. As a
> result it brings an optimized injection/completion handling for
> forwarded IRQ. The ARM specific part is implemented in a new module,

a new module ?!?

you mean file, right?

> kvm_vfio_arm.c
> 
> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> ---
>  arch/arm/include/asm/kvm_host.h |  2 +
>  arch/arm/kvm/Makefile           |  2 +-
>  arch/arm/kvm/kvm_vfio_arm.c     | 85 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 88 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 1aee6bb..dfd3b05 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -25,6 +25,8 @@
>  #include <asm/fpstate.h>
>  #include <kvm/arm_arch_timer.h>
>  
> +#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +
>  #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
>  #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>  #else
> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> index ea1fa76..26a5a42 100644
> --- a/arch/arm/kvm/Makefile
> +++ b/arch/arm/kvm/Makefile
> @@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vf
>  
>  obj-y += kvm-arm.o init.o interrupts.o
>  obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
> -obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
> +obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o kvm_vfio_arm.o
>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
>  obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
>  obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
> diff --git a/arch/arm/kvm/kvm_vfio_arm.c b/arch/arm/kvm/kvm_vfio_arm.c
> new file mode 100644
> index 0000000..0d316b1
> --- /dev/null
> +++ b/arch/arm/kvm/kvm_vfio_arm.c
> @@ -0,0 +1,85 @@
> +/*
> + * Copyright (C) 2014 Linaro Ltd.
> + * Authors: Eric Auger <eric.auger@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/file.h>
> +#include <linux/kvm_host.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/vfio.h>
> +#include <linux/irq.h>
> +#include <asm/kvm_host.h>
> +#include <asm/kvm.h>
> +#include <linux/irq.h>
> +#include <linux/platform_device.h>
> +#include <linux/interrupt.h>
> +
> +/**
> + * kvm_arch_set_fwd_state - change the forwarded state of an IRQ
> + * @pfwd: the forwarded irq struct
> + * @action: action to perform (set forward, set back normal, cleanup)
> + *
> + * programs the GIC and VGIC
> + * returns the VGIC map/unmap return status
> + * It is the responsability of the caller to make sure the physical IRQ
                responsibility
> + * is not active. there is a critical section between the start of the
                     There
> + * VFIO IRQ handler and LR programming.

Did we implement code to ensure this in the previous patch? I don't
think I noticed it.  Doesn't disabling the IRQ have the desired effect?

a critical section? who are the contenders of this, what action should I
take to make sure access to the critical section is serialized?

> + */
> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> +			   enum kvm_fwd_irq_action action)
> +{
> +	int ret;
> +	struct irq_desc *desc = irq_to_desc(pfwd->hwirq);
> +	struct irq_data *d = &desc->irq_data;
> +	struct irq_chip *chip = desc->irq_data.chip;
> +
> +	disable_irq(pfwd->hwirq);
> +	/* no fwd state change can happen if the IRQ is in progress */
> +	if (irqd_irq_inprogress(d)) {
> +		kvm_err("%s cannot change fwd state (IRQ %d in progress\n",
> +			__func__, pfwd->hwirq);
> +		enable_irq(pfwd->hwirq);
> +		return -1;

-1? seems like you're defining some new error code conventions here.
-EBUSY perhaps?

probably want to use a goto label here as well.

> +	}
> +
> +	if (action == KVM_VFIO_IRQ_SET_FORWARD) {
> +		irqd_set_irq_forwarded(d);
> +		ret = vgic_map_phys_irq(pfwd->vcpu,
> +					pfwd->gsi + VGIC_NR_PRIVATE_IRQS,
> +					pfwd->hwirq);
> +	} else if (action == KVM_VFIO_IRQ_SET_NORMAL) {
> +		irqd_clr_irq_forwarded(d);
> +		ret = vgic_unmap_phys_irq(pfwd->vcpu,
> +					  pfwd->gsi +
> +						VGIC_NR_PRIVATE_IRQS,
> +					  pfwd->hwirq);
> +	} else if (action == KVM_VFIO_IRQ_CLEANUP) {
> +		irqd_clr_irq_forwarded(d);
> +		/*
> +		 * in case the guest did not complete the
> +		 * virtual IRQ, let's do it for him.
> +		 * when cleanup is called, VCPU have already
> +		 * been freed, do not manipulate VGIC
> +		 */
> +		chip->irq_eoi(d);
> +		ret = 0;

why can't you do this on SET_NORMAL?

don't you also need to make sure the LR is unqueued from the VCPU and
kick the VCPU?  If there are hidden semantics that this should only ever
be called when the VM is stopped (stopping) then that is not clear, and
I think you need to just have the SET_NORMAL do the dirty work with a
cleaner set of semantics.


> +	} else {
> +		enable_irq(pfwd->hwirq);

you're enabling the irq twice now.
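
For illustration, a user-space model of a single exit path that returns
-EBUSY, uses a goto label, and enables the IRQ exactly once. Everything
here is hypothetical (enum fwd_action, disable_depth, model_disable_irq(),
model_enable_irq(), set_fwd_state_model()); the real GIC/VGIC work is
elided and the disable depth is just a counter.

```c
#include <errno.h>

/* Hypothetical actions, mirroring the three in the patch. */
enum fwd_action { SET_FORWARD, SET_NORMAL, CLEANUP };

/* Counts outstanding disables so an unbalanced enable shows up. */
static int disable_depth;

static void model_disable_irq(void) { disable_depth++; }
static void model_enable_irq(void)  { disable_depth--; }

static int set_fwd_state_model(int action, int in_progress)
{
	int ret;

	model_disable_irq();

	if (in_progress) {
		ret = -EBUSY;	/* errno value rather than a bare -1 */
		goto out;
	}

	switch (action) {
	case SET_FORWARD:
	case SET_NORMAL:
	case CLEANUP:
		ret = 0;	/* real map/unmap work elided */
		break;
	default:
		ret = -EINVAL;
		break;
	}
out:
	model_enable_irq();	/* the only enable, reached on every path */
	return ret;
}
```

Whatever the outcome, disable_depth is back to zero on return, which is
the property the double enable_irq() in the else branch breaks.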

> +		ret = -EINVAL;
> +	}
> +
> +	enable_irq(pfwd->hwirq);
> +	return ret;
> +}
> -- 
> 1.9.1
> 
Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
  2014-09-02 21:05   ` Alex Williamson
@ 2014-09-11  3:10     ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11  3:10 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Eric Auger, eric.auger, marc.zyngier, linux-arm-kernel, kvmarm,
	kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote:
> On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
> > This RFC proposes an integration of "ARM: Forwarding physical
> > interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
> > KVM.
> > 
> > It enables to transform a VFIO platform driver IRQ into a forwarded
> > IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
> > switch can be avoided on guest virtual IRQ completion. Before this
> > patch, a maintenance IRQ was triggered on the virtual IRQ completion.
> > 
> > When the IRQ is forwarded, the VFIO platform driver does not need to
> > disable the IRQ anymore. Indeed when returning from the IRQ handler
> > the IRQ is not deactivated. Only its priority is lowered. This means
> > the same IRQ cannot hit before the guest completes the virtual IRQ
> > and the GIC automatically deactivates the corresponding physical IRQ.
> > 
> > Besides, the injection still is based on irqfd triggering. The only
> > impact on irqfd process is resamplefd is not called anymore on
> > virtual IRQ completion since this latter becomes "transparent".
> > 
> > The current integration is based on an extension of the KVM-VFIO
> > device, previously used by KVM to interact with VFIO groups. The
> > patch serie now enables KVM to directly interact with a VFIO
> > platform device. The VFIO external API was extended for that purpose.
> > 
> > Th KVM-VFIO device can get/put the vfio platform device, check its
> > integrity and type, get the IRQ number associated to an IRQ index.
> > 
> > The IRQ forward programming is architecture specific (virtual interrupt
> > controller programming basically). However the whole infrastructure is
> > kept generic.
> > 
> > from a user point of view, the functionality is provided through new
> > KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ
> > and the capability can be checked with KVM_HAS_DEVICE_ATTR.
> > Assignment can only be changed when the physical IRQ is not active.
> > It is the responsability of the user to do this check.
> > 
> > This patch serie has the following dependencies:
> > - "ARM: Forwarding physical interrupts to a guest VM"
> >   (http://lwn.net/Articles/603514/) in
> > - [PATCH v3] irqfd for ARM
> > - and obviously the VFIO platform driver serie:
> >   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
> >   https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> > 
> > Integrated pieces can be found at
> > ssh://git.linaro.org/people/eric.auger/linux.git
> > on branch 3.17rc3_irqfd_forward_integ_v2
> > 
> > This was was tested on Calxeda Midway, assigning the xgmac main IRQ.
> > 
> > v1 -> v2:
> > - forward control is moved from architecture specific file into generic
> >   vfio.c module.
> >   only kvm_arch_set_fwd_state remains architecture specific
> > - integrate Kim's patch which enables KVM-VFIO for ARM
> > - fix vgic state bypass in vgic_queue_hwirq
> > - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
> >   to include/uapi/linux/kvm.h
> >   also irq_index renamed into index and guest_irq renamed into gsi
> > - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> > - vfio_external_get_base_device renamed into vfio_external_base_device
> > - vfio_external_get_type removed
> > - kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
> > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > 
> > Eric Auger (8):
> >   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
> >     IRQ
> >   KVM: ARM: VGIC: add forwarded irq rbtree lock
> >   VFIO: platform: handler tests whether the IRQ is forwarded
> >   KVM: KVM-VFIO: update user API to program forwarded IRQ
> >   VFIO: Extend external user API
> >   KVM: KVM-VFIO: add new VFIO external API hooks
> >   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding
> >     control
> >   KVM: KVM-VFIO: ARM forwarding control
> > 
> > Kim Phillips (1):
> >   ARM: KVM: Enable the KVM-VFIO device
> > 
> >  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
> >  arch/arm/include/asm/kvm_host.h            |   7 +
> >  arch/arm/kvm/Kconfig                       |   1 +
> >  arch/arm/kvm/Makefile                      |   4 +-
> >  arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
> >  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
> >  drivers/vfio/vfio.c                        |  24 ++
> >  include/kvm/arm_vgic.h                     |   1 +
> >  include/linux/kvm_host.h                   |  27 ++
> >  include/linux/vfio.h                       |   3 +
> >  include/uapi/linux/kvm.h                   |   9 +
> >  virt/kvm/arm/vgic.c                        |  59 +++-
> >  virt/kvm/vfio.c                            | 497 ++++++++++++++++++++++++++++-
> >  13 files changed, 733 insertions(+), 17 deletions(-)
> >  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
> > 
> 
> Have we ventured too far in the other direction?  I suppose what I was
> hoping to see was something more like:
> 
> 	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> 
> 		/* get vfio_device */
> 
> 		/* get mutex */
> 
> 		/* verify device+irq isn't already forwarded */
> 
> 		/* allocate device/forwarded irq */
> 
> 		/* get struct device */
> 
> 		/* callout to arch code passing struct device, gsi, ... */
> 
> 		/* if success, add to kv, else free and error */
> 
> 		/* mutex unlock */
> 	}

I think that's essentially what this patch set is trying to do, but
there are just too many complicated, intertwined cases right now, which
makes the code hard to read.

> 
> Exposing the internal mutex out to arch code, as in v1, was an
> indication that we were pushing too much out to arch code, but including
> platform_device.h into virt/kvm/vfio.c tells me we're still not
> abstracting at the right point.  Thanks,
> 
I raised my eyebrows over the platform device bus thingy here as well,
but on the other hand, there's nothing ARM-specific about referring to
the platform device bus.

I think perhaps it just has to be made more clear that the generic code
deals with translating the device resources in the necessary way, and
currently it only supports vfio-platform devices?

-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11  3:10     ` Christoffer Dall
  (?)
@ 2014-09-11  5:05       ` Alex Williamson
  -1 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-11  5:05 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Eric Auger, eric.auger, marc.zyngier, linux-arm-kernel, kvmarm,
	kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> > This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> > 
> > This is a new control channel which enables KVM to cooperate with
> > viable VFIO devices.
> > 
> > The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> > in addition to a list of groups (kvm_vfio_group). The new
> > infrastructure enables to check the validity of the VFIO device
> > file descriptor, get and hold a reference to it.
> > 
> > The first concrete implemented command is IRQ forward control:
> > KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> > 
> > It consists in programing the VFIO driver and KVM in a consistent manner
> > so that an optimized IRQ injection/completion is set up. Each
> > kvm_vfio_device holds a list of forwarded IRQ. When putting a
> > kvm_vfio_device, the implementation makes sure the forwarded IRQs
> > are set again in the normal handling state (non forwarded).
> 
> 'putting a kvm_vfio_device' sounds like you're golf'ing :)
> 
> When a kvm_vfio_device is released?
> 
> > 
> > The forwarding programmming is architecture specific, embodied by the
> > kvm_arch_set_fwd_state function. Its implementation is given in a
> > separate patch file.
> 
> I would drop the last sentence and instead indicate that this is handled
> properly when the architecture does not support such a feature.
> 
> > 
> > The forwarding control modality is enabled by the
> > __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> > 
> > Signed-off-by: Eric Auger <eric.auger@linaro.org>
> > 
> > ---
> > 
> > v1 -> v2:
> > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > - original patch file separated into 2 parts: generic part moved in vfio.c
> >   and ARM specific part(kvm_arch_set_fwd_state)
> > ---
> >  include/linux/kvm_host.h |  27 +++
> >  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 477 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index a4c33b3..24350dc 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
> >  		      unsigned long arg);
> >  };
> >  
> > +enum kvm_fwd_irq_action {
> > +	KVM_VFIO_IRQ_SET_FORWARD,
> > +	KVM_VFIO_IRQ_SET_NORMAL,
> > +	KVM_VFIO_IRQ_CLEANUP,
> 
> This is KVM internal API, so it would probably be good to document this.
> Especially the CLEANUP bit worries me, see below.

This also doesn't match the user API, which is simply FORWARD/UNFORWARD.
Extra states worry me too.

> > +};
> > +
> > +/* internal structure describing a forwarded IRQ */
> > +struct kvm_fwd_irq {
> > +	struct list_head link;
> 
> this list entry is local to the kvm vfio device, right? that means you
> probably want a struct with just the below fields, and then have a
> containing struct in the generic device file, private to its logic.

Yes, this is part of the abstraction problem.

> > +	__u32 index; /* platform device irq index */

This is a vfio_device irq_index, but vfio_devices support indexes and
sub-indexes.  At this level the API should match vfio, not the specifics
of platform devices not supporting sub-index.

> > +	__u32 hwirq; /*physical IRQ */
> > +	__u32 gsi; /* virtual IRQ */
> > +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/

Not sure I understand why vcpu is necessary.  Also I see a 'get' in the
code below, but not a 'put'.

> > +};
> > +
> >  void kvm_device_get(struct kvm_device *dev);
> >  void kvm_device_put(struct kvm_device *dev);
> >  struct kvm_device *kvm_device_from_filp(struct file *filp);
> > @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
> >  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> >  extern struct kvm_device_ops kvm_flic_ops;
> >  
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> 
> what's the 'p' in pfwd?

p is for pointer?

> > +			   enum kvm_fwd_irq_action action);
> > +
> > +#else
> > +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> > +					 enum kvm_fwd_irq_action action)
> > +{
> > +	return 0;
> > +}
> > +#endif
> > +
> >  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >  
> >  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> > index 76dc7a1..e4a81c4 100644
> > --- a/virt/kvm/vfio.c
> > +++ b/virt/kvm/vfio.c
> > @@ -18,14 +18,24 @@
> >  #include <linux/slab.h>
> >  #include <linux/uaccess.h>
> >  #include <linux/vfio.h>
> > +#include <linux/platform_device.h>
> >  
> >  struct kvm_vfio_group {
> >  	struct list_head node;
> >  	struct vfio_group *vfio_group;
> >  };
> >  
> > +struct kvm_vfio_device {
> > +	struct list_head node;
> > +	struct vfio_device *vfio_device;
> > +	/* list of forwarded IRQs for that VFIO device */
> > +	struct list_head fwd_irq_list;
> > +	int fd;
> > +};
> > +
> >  struct kvm_vfio {
> >  	struct list_head group_list;
> > +	struct list_head device_list;
> >  	struct mutex lock;
> >  	bool noncoherent;
> >  };
> > @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >  	return -ENXIO;
> >  }
> >  
> > +/**
> > + * get_vfio_device - returns the vfio-device corresponding to this fd
> > + * @fd:fd of the vfio platform device
> > + *
> > + * checks it is a vfio device
> > + * increment its ref counter
> 
> why the short lines?  Just write this out in proper English.
> 
> > + */
> > +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> > +{
> > +	struct fd f;
> > +	struct vfio_device *vdev;
> > +
> > +	f = fdget(fd);
> > +	if (!f.file)
> > +		return NULL;
> > +	vdev = kvm_vfio_device_get_external_user(f.file);
> > +	fdput(f);
> > +	return vdev;
> > +}
> > +
> > +/**
> > + * put_vfio_device: put the vfio platform device
> > + * @vdev: vfio_device to put
> > + *
> > + * decrement the ref counter
> > + */
> > +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> > +{
> > +	kvm_vfio_device_put_external_user(vdev);
> > +}
> > +
> > +/**
> > + * kvm_vfio_find_device - look for the device in the assigned
> > + * device list
> > + * @kv: the kvm-vfio device
> > + * @vdev: the vfio_device to look for
> > + *
> > + * returns the associated kvm_vfio_device if the device is known,
> > + * meaning at least 1 IRQ is forwarded for this device.
> > + * in the device is not registered, returns NULL.
> > + */

Why are we talking about forwarded IRQs already?  This is a simple lookup
function; who knows what other users it will have in the future.

> 
> are these functions meant to be exported?  Otherwise they should be
> static, and the documentation on these simple list iteration wrappers
> seems like overkill imho.
> 
> > +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> > +					     struct vfio_device *vdev)
> > +{
> > +	struct kvm_vfio_device *kvm_vdev_iter;
> > +
> > +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> > +		if (kvm_vdev_iter->vfio_device == vdev)
> > +			return kvm_vdev_iter;
> > +	}
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> > + * @kvm_vdev: the kvm_vfio_device
> > + * @irq_index: irq index
> > + *
> > + * returns the forwarded irq struct if it exists, NULL in the negative
> > + */
> > +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> > +				      int irq_index)

+sub-index

It's probably important to note on both of these that they need to be
called with kv->lock held.
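
As a sketch of what a lookup keyed on both index and sub-index could look
like (user-space model only; struct fwd_irq_node and find_fwd_irq() are
hypothetical names, and the locking rule is stated in a comment rather
than enforced):

```c
#include <assert.h>
#include <stddef.h>

struct fwd_irq_node {
	struct fwd_irq_node *next;
	unsigned int index;	/* vfio irq index */
	unsigned int subindex;	/* position within that index */
};

/* Caller is assumed to hold the lock that protects the list. */
static struct fwd_irq_node *find_fwd_irq(struct fwd_irq_node *head,
					 unsigned int index,
					 unsigned int subindex)
{
	struct fwd_irq_node *n;

	for (n = head; n; n = n->next)
		if (n->index == index && n->subindex == subindex)
			return n;
	return NULL;
}
```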

> > +{
> > +	struct kvm_fwd_irq *fwd_irq_iter;
> > +
> > +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> > +		if (fwd_irq_iter->index == irq_index)
> > +			return fwd_irq_iter;
> > +	}
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * validate_forward - checks whether forwarding a given IRQ is meaningful
> > + * @vdev:  vfio_device the IRQ belongs to
> > + * @fwd_irq: user struct containing the irq_index to forward
> > + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> > + * kvm_vfio_device that holds it
> > + * @hwirq: irq numberthe irq index corresponds to
> > + *
> > + * checks the vfio-device is a platform vfio device
> > + * checks the irq_index corresponds to an actual hwirq and
> > + * checks this hwirq is not already forwarded
> > + * returns < 0 on following errors:
> > + * not a platform device, bad irq index, already forwarded
> > + */
> > +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> > +			    struct vfio_device *vdev,
> > +			    struct kvm_arch_forwarded_irq *fwd_irq,
> > +			    struct kvm_vfio_device **kvm_vdev,
> > +			    int *hwirq)
> > +{
> > +	struct device *dev = kvm_vfio_external_base_device(vdev);
> > +	struct platform_device *platdev;
> > +
> > +	*hwirq = -1;
> > +	*kvm_vdev = NULL;
> > +	if (strcmp(dev->bus->name, "platform") == 0) {

Should be testing dev->bus == &platform_bus_type, and ideally
creating a dev_is_platform() macro to make that even cleaner.

However, we're being sort of sneaky in that we're actually doing
something platform device specific here.  Why?  Don't we just need to
make sure that kvm-vfio doesn't have any record of this forward
(-EEXIST) and let the platform device code error out later for this
case?
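
To illustrate the difference between the name comparison in the patch and
a pointer-identity test, here is a small user-space model.  The struct
definitions are simplified stand-ins for the kernel's, and
dev_is_platform_by_name() is a hypothetical helper that only mirrors what
the patch does:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins: only the fields needed for the comparison. */
struct bus_type { const char *name; };
struct device { const struct bus_type *bus; };

static const struct bus_type platform_bus_type = { .name = "platform" };

/* Identity test: true only for devices attached to this exact bus. */
#define dev_is_platform(dev) ((dev)->bus == &platform_bus_type)

/* Name-based test as in the patch, kept for comparison. */
static bool dev_is_platform_by_name(const struct device *dev)
{
	assert(dev);
	return dev->bus && strcmp(dev->bus->name, "platform") == 0;
}
```

A bus that merely happens to be named "platform" passes the string test
but not the identity test, and the identity test avoids a string
comparison altogether.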

> > +		platdev = to_platform_device(dev);
> > +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> > +		if (*hwirq < 0) {
> > +			kvm_err("%s incorrect index\n", __func__);
> > +			return -EINVAL;
> > +		}
> > +	} else {
> > +		kvm_err("%s not a platform device\n", __func__);
> > +		return -EINVAL;
> > +	}
> 
> need some spacing here, also, I would turn this around, first check if
> the strcmp fails, and then error out, then do your next check etc., to
> avoid so many nested statements.
> 
> > +	/* is a ref to this device already owned by the KVM-VFIO device? */
> 
> this comment is not particularly helpful in its current form, it would
> be helpful if you specified that we're checking whether that particular
> device/irq combo is already registered.
> 
> > +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> > +	if (*kvm_vdev) {
> > +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> > +			kvm_err("%s irq %d already forwarded\n",
> > +				__func__, *hwirq);

Why didn't we do this first?

> don't flood the kernel log because of a user error, just allocate an
> error code for this purpose and document it in the ABI, -EEXIST or
> something.
> 
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +
> > +/**
> > + * validate_unforward: check a deassignment is meaningful
> > + * @kv: the kvm_vfio device
> > + * @vdev: the vfio_device whose irq to deassign belongs to
> > + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> > + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> > + * it exists
> > + *
> > + * returns 0 if the provided irq effectively is forwarded
> > + * (a ref to this vfio_device is hold and this irq belongs to
>                                     held
> > + * the forwarded irq of this device)
> > + * returns -EINVAL in the negative
> 
>                ENOENT should be returned if you don't have an entry.
> 	       EINVAL could be used if you supply an fd that isn't a
> 	       VFIO device file descriptor, for example.  Again,
> 	       consider documenting all this in the API.
> 
> > + */
> > +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> > +			      struct vfio_device *vdev,
> > +			      struct kvm_arch_forwarded_irq *fwd_irq,
> > +			      struct kvm_vfio_device **kvm_vdev)
> > +{
> > +	struct kvm_fwd_irq *pfwd;
> > +
> > +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> > +	if (!kvm_vdev) {
> > +		kvm_err("%s no forwarded irq for this device\n", __func__);
> 
> don't flood the kernel log
> 
> > +		return -EINVAL;
> > +	}
> > +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> > +	if (!pfwd) {
> > +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
> 
> > +		return -EINVAL;
> 
> same here
> 
> > +	}
> > +	return 0;
> > +}
> > +
> > +/**
> > + * kvm_vfio_forward - set a forwarded IRQ
> > + * @kdev: the kvm device
> > + * @vdev: the vfio device the IRQ belongs to
> > + * @fwd_irq: the user struct containing the irq_index and guest irq
> > + * @must_put: tells the caller whether the vfio_device must be put after
> > + * the call (ref must be released in case a ref onto this device was
> > + * already hold or in case of new device and failure)
> > + *
> > + * validate the injection, activate forward and store the information
>       Validate
> > + * about which irq and which device is concerned so that on deassign or
> > + * kvm-vfio destruction everuthing can be cleaned up.
>                            everything
> 
> I'm not sure I understand this explanation.  Do we have concerned
> devices?
> 
> I think you want to say something along the lines of: If userspace passed
> a valid vfio device and irq handle and the architecture supports
> forwarding this combination, register the vfio_device and irq
> combination in the ....
> 
> > + */
> > +static int kvm_vfio_forward(struct kvm_device *kdev,
> > +			    struct vfio_device *vdev,
> > +			    struct kvm_arch_forwarded_irq *fwd_irq,
> > +			    bool *must_put)
> > +{
> > +	int ret;
> > +	struct kvm_fwd_irq *pfwd = NULL;
> > +	struct kvm_vfio_device *kvm_vdev = NULL;
> > +	struct kvm_vfio *kv = kdev->private;
> > +	int hwirq;
> > +
> > +	*must_put = true;
> > +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
> > +					&kvm_vdev, &hwirq);
> > +	if (ret < 0)
> > +		return -EINVAL;
> > +
> > +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
> 
> seems a bit pointless to zero-out the memory if you're setting all
> fields below.
> 
> > +	if (!pfwd)
> > +		return -ENOMEM;
> > +	pfwd->index = fwd_irq->index;
> > +	pfwd->gsi = fwd_irq->gsi;
> > +	pfwd->hwirq = hwirq;
> > +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> > +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> > +	if (ret < 0) {
> > +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> 
> this whole thing feels incredibly broken to me.  Setting a forward
> should either work or not work, not something in between that leaves
> something to be cleaned up.  Why this two-stage thingy here?

Yep, I agree.  I also don't see the point of the validate function, just
open code it here and push the platform_get_irq test into
kvm_arch_set_fwd_state.  kvm-vfio doesn't care about the hwirq.

> > +		kfree(pfwd);
> 
> probably want to move your free-and-return-error to the end of the
> function.
> 
> > +		return ret;
> > +	}
> > +
> > +	if (!kvm_vdev) {
> > +		/* create & insert the new device and keep the ref */
> > +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
> 
> again, no need for zeroing out the memory.

I think you also want to allocate this before you setup the forward so
you can eliminate any complicated teardown later.

> > +		if (!kvm_vdev) {
> > +			kvm_arch_set_fwd_state(pfwd, false);

false?  The function takes an enum.

> > +			kfree(pfwd);
> > +			return -ENOMEM;
> > +		}
> > +
> > +		kvm_vdev->vfio_device = vdev;
> > +		kvm_vdev->fd = fwd_irq->fd;
> > +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
> > +		list_add(&kvm_vdev->node, &kv->device_list);
> > +		/*
> > +		 * the only case where we keep the ref:
> > +		 * new device and forward setting successful
> > +		 */
> > +		*must_put = false;
> > +	}
> > +
> > +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
> > +
> > +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
> > +	fwd_irq->fd, hwirq, fwd_irq->gsi);
> 
> please indent this to align with the opening parenthesis.
> 
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * remove_assigned_device - put a given device from the list
> 
> this isn't a 'put', at least not *just* a put.
> 
> > + * @kv: the kvm-vfio device
> > + * @vdev: the vfio-device to remove
> > + *
> > + * change the state of all forwarded IRQs, free the forwarded IRQ list,
> > + * remove the corresponding kvm_vfio_device from the assigned device
> > + * list.
> > + * returns true if the device could be removed, false in the negative
> > + */
> > +bool remove_assigned_device(struct kvm_vfio *kv,
> > +			    struct vfio_device *vdev)
> > +{
> > +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> > +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> > +	bool removed = false;
> > +	int ret;
> > +
> > +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> > +				 &kv->device_list, node) {
> > +		if (kvm_vdev_iter->vfio_device == vdev) {
> > +			/* loop on all its forwarded IRQ */
> > +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> > +						 &kvm_vdev_iter->fwd_irq_list,
> > +						 link) {
> 
> hmm, seems this function is only called when you have no more forwarded
> IRQs, so isn't all of this completely dead (and unnecessary) code?
> 
> > +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> > +						KVM_VFIO_IRQ_SET_NORMAL);
> > +				if (ret < 0)
> > +					return ret;
> 
> you're returning an error code to a bool function, which means you'll
> return true when there was an error.  Is this your intention? ;)
> 
> if we have an error here, this would be a very very bad situation wouldn't it?
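
The conversion being pointed out can be reproduced in a few lines of
plain C.  The function names below are hypothetical and only mirror the
pattern in the patch; the int-returning variant is one possible fix, not
necessarily the one the series will adopt:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mirrors the patch: a negative errno escapes from a bool function. */
static bool remove_returning_bool(int arch_ret)
{
	if (arch_ret < 0)
		return arch_ret;	/* any non-zero value becomes true */
	return true;
}

/* Possible alternative: 0 on success, negative errno on failure. */
static int remove_returning_int(int arch_ret)
{
	if (arch_ret < 0)
		return arch_ret;
	return 0;
}

/* Usage: the bool variant reports "removed" even though removal failed. */
void demo_bool_pitfall(void)
{
	assert(remove_returning_bool(-EINVAL) == true);
	assert(remove_returning_int(-EINVAL) == -EINVAL);
}
```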
> 
> > +				list_del(&fwd_irq_iter->link);
> > +				kfree(fwd_irq_iter);
> > +			}
> > +			/* all IRQs could be deassigned */
> > +			list_del(&kvm_vdev_iter->node);
> > +			kvm_vfio_device_put_external_user(
> > +				kvm_vdev_iter->vfio_device);
> > +			kfree(kvm_vdev_iter);
> > +			removed = true;
> > +			break;
> > +		}
> > +	}
> > +	return removed;
> > +}
> > +
> > +
> > +/**
> > + * remove_fwd_irq - remove a forwarded irq
> > + *
> > + * @kv: kvm-vfio device
> > + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
> > + * irq_index: the index of the IRQ
> > + *
> > + * change the forwarded state of the IRQ, remove the IRQ from
> > + * the device forwarded IRQ list. In case it is the last one,
> > + * put the device
> > + */
> > +int remove_fwd_irq(struct kvm_vfio *kv,
> > +		   struct kvm_vfio_device *kvm_vdev,
> > +		   int irq_index)
> > +{
> > +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> > +	int ret = -1;
> > +
> > +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> > +				 &kvm_vdev->fwd_irq_list, link) {
> 
> hmmm, you can only forward one irq for a specific device once, right?
> And you already have a lookup function, so why not call that, and then
> remove it?
> 
> I'm confused.

Me too, this and the previous function need some more consideration.
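
A sketch of the "look it up, then remove it" shape suggested above, as a
user-space model.  All names are hypothetical, a singly linked list
stands in for the kernel list, and -ENOENT is used for the missing-entry
case as proposed earlier in this review:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fwd_node {
	struct fwd_node *next;
	unsigned int index;
};

/* Stand-in for the arch callout that restores normal handling. */
static int arch_unforward_ok(unsigned int index)
{
	(void)index;
	return 0;
}

/* Returns the link that points at the matching node, or NULL. */
static struct fwd_node **find_link(struct fwd_node **head, unsigned int index)
{
	struct fwd_node **pp;

	for (pp = head; *pp; pp = &(*pp)->next)
		if ((*pp)->index == index)
			return pp;
	return NULL;
}

/* Unforward = lookup, arch callout, unlink, free. */
static int unforward(struct fwd_node **head, unsigned int index,
		     int (*arch_unforward)(unsigned int index))
{
	struct fwd_node **pp;
	struct fwd_node *n;
	int ret;

	assert(head && arch_unforward);

	pp = find_link(head, index);
	if (!pp)
		return -ENOENT;	/* no such forward recorded */

	ret = arch_unforward(index);
	if (ret < 0)
		return ret;	/* keep the record if the callout failed */

	n = *pp;
	*pp = n->next;
	free(n);
	return 0;
}
```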

> > +		if (fwd_irq_iter->index == irq_index) {
> > +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> > +						KVM_VFIO_IRQ_SET_NORMAL);
> > +			if (ret < 0)
> > +				break;
> > +			list_del(&fwd_irq_iter->link);
> > +			kfree(fwd_irq_iter);
> > +			ret = 0;
> > +			break;
> > +		}
> > +	}
> > +	if (list_empty(&kvm_vdev->fwd_irq_list))
> > +		remove_assigned_device(kv, kvm_vdev->vfio_device);
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * kvm_vfio_unforward - remove a forwarded IRQ
> > + * @kdev: the kvm device
> > + * @vdev: the vfio_device
> > + * @fwd_irq: user struct
> > + * after checking this IRQ effectively is forwarded, change its state,
> > + * remove it from the corresponding kvm_vfio_device list
> > + */
> > +static int kvm_vfio_unforward(struct kvm_device *kdev,
> > +				     struct vfio_device *vdev,
> > +				     struct kvm_arch_forwarded_irq *fwd_irq)
> > +{
> > +	struct kvm_vfio *kv = kdev->private;
> > +	struct kvm_vfio_device *kvm_vdev;
> > +	int ret;
> > +
> > +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
> > +	if (ret < 0)
> > +		return -EINVAL;
> 
> why do you override the return value?  Propagate it.
> 
> > +
> > +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
> > +	if (ret < 0)
> > +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
> > +			__func__, fwd_irq->fd, fwd_irq->index);
> > +	else
> > +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
> > +			  __func__, fwd_irq->fd, fwd_irq->index);
> 
> again with the kernel log here.
> 
> 
> 
> > +	return ret;
> > +}
> > +
> > +
> > +
> > +
> > +/**
> > + * kvm_vfio_set_device - the top function for interracting with a vfio
> 
>                                 top?             interacting
> 
> > + * device
> > + */
> 
> probably just skip this comment
> 
> > +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> > +{
> > +	struct kvm_vfio *kv = kdev->private;
> > +	struct vfio_device *vdev;
> > +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
> > +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> > +
> > +	switch (attr) {
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> > +		bool must_put;
> > +		int ret;
> > +
> > +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> > +			return -EFAULT;
> > +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> > +		if (IS_ERR(vdev))
> > +			return PTR_ERR(vdev);
> 
> seems like this whole block of code is replicated below, needs
> refactoring.
> 
> > +		mutex_lock(&kv->lock);
> > +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
> > +		if (must_put)
> > +			kvm_vfio_put_vfio_device(vdev);
> 
> this must_put looks plain weird.  I think you want to balance your
> get/put's always; can't you just get an extra reference in
> kvm_vfio_forward() ?

Yeah, this is very broken.  Every forwarded IRQ should hold a reference
to the vfio_device.  Every unforward should drop a reference.  Trying to
maintain a single reference is a non-goal.
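
A minimal model of that rule, with one reference per forwarded IRQ.
struct dev_model and the helpers are hypothetical user-space stand-ins;
the real get/put would be the vfio external-user calls:

```c
#include <assert.h>

struct dev_model { int refs; };

static void dev_get(struct dev_model *d)
{
	d->refs++;
}

static void dev_put(struct dev_model *d)
{
	assert(d->refs > 0);
	d->refs--;
}

/* Each successful forward keeps the reference taken for it. */
static int forward_one(struct dev_model *d, int arch_ret)
{
	dev_get(d);		/* reference taken when resolving the fd */
	if (arch_ret < 0) {
		dev_put(d);	/* failure: nothing recorded, drop it */
		return arch_ret;
	}
	return 0;		/* success: the forward owns this reference */
}

/* Each unforward drops exactly the reference its forward took. */
static void unforward_one(struct dev_model *d)
{
	dev_put(d);
}
```

With this scheme there is no must_put flag to thread through the
callers: gets and puts are balanced per forward, regardless of how many
IRQs of the same device are forwarded.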

> 
> > +		mutex_unlock(&kv->lock);
> > +		return ret;
> > +		}
> > +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
> > +		int ret;
> > +
> > +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> > +			return -EFAULT;
> > +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> > +		if (IS_ERR(vdev))
> > +			return PTR_ERR(vdev);
> > +
> > +		kvm_vfio_device_put_external_user(vdev);
> 
> you're dropping the reference to the device but referencing it in your
> unforward call below?
> 
> > +		mutex_lock(&kv->lock);
> > +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
> > +		mutex_unlock(&kv->lock);
> > +		return ret;
> > +	}
> > +#endif
> > +	default:
> > +		return -ENXIO;
> > +	}
> > +}
> > +
> > +/**
> > + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
> > + * @kv: kvm-vfio device
> > + *
> > + * loop on all got devices and their associated forwarded IRQs
> 
> 'loop on all got' ?
> 
> Restore the non-forwarded state for all registered devices and ...
> 
> > + * restore the non forwarded state, remove IRQs and their devices from
> > + * the respective list, put the vfio platform devices
> > + *
> > + * When this function is called, the vcpu already are destroyed. No
>                                     the VCPUs are already destroyed.
> > + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> > + * kvm_arch_set_fwd_state action
> 
> this last bit didn't make any sense to me.  Also, why are we referring
> to the vgic in generic code?
> 
> > + */
> > +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
> > +{
> > +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> > +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> > +
> > +	/* loop on all the assigned devices */
> 
> unnecessary comment
> 
> > +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> > +				 &kv->device_list, node) {
> > +
> > +		/* loop on all its forwarded IRQ */
> 
> same
> 
> > +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> > +					 &kvm_vdev_iter->fwd_irq_list, link) {
> > +			kvm_arch_set_fwd_state(fwd_irq_iter,
> > +						KVM_VFIO_IRQ_CLEANUP);
> > +			list_del(&fwd_irq_iter->link);
> > +			kfree(fwd_irq_iter);
> > +		}


Ugh, how many of these cleanup functions do we need?

> > +		list_del(&kvm_vdev_iter->node);
> > +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
> > +		kfree(kvm_vdev_iter);
> > +	}
> > +	return 0;
> > +}
> > +
> > +
> >  static int kvm_vfio_set_attr(struct kvm_device *dev,
> >  			     struct kvm_device_attr *attr)
> >  {
> >  	switch (attr->group) {
> >  	case KVM_DEV_VFIO_GROUP:
> >  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> > +	case KVM_DEV_VFIO_DEVICE:
> > +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> >  	}
> >  
> >  	return -ENXIO;
> > @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> >  		case KVM_DEV_VFIO_GROUP_DEL:
> >  			return 0;
> >  		}
> > -
> >  		break;
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > +	case KVM_DEV_VFIO_DEVICE:
> > +		switch (attr->attr) {
> > +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> > +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> > +			return 0;
> > +		}
> > +		break;
> > +#endif
> >  	}
> > -
> >  	return -ENXIO;
> >  }
> >  
> > @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> >  		list_del(&kvg->node);
> >  		kfree(kvg);
> >  	}
> > +	kvm_vfio_put_all_devices(kv);
> >  
> >  	kvm_vfio_update_coherency(dev);
> >  
> > @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> >  		return -ENOMEM;
> >  
> >  	INIT_LIST_HEAD(&kv->group_list);
> > +	INIT_LIST_HEAD(&kv->device_list);
> >  	mutex_init(&kv->lock);
> >  
> >  	dev->private = kv;
> > -- 
> > 1.9.1
> > 




^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-11  5:05       ` Alex Williamson
  0 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-11  5:05 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: joel.schopp, kim.phillips, eric.auger, kvm, Eric Auger,
	marc.zyngier, john.liuli, patches, will.deacon, linux-kernel,
	a.rigo, gleb, paulus, a.motakis, pbonzini, kvmarm,
	linux-arm-kernel

On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> > This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> > 
> > This is a new control channel which enables KVM to cooperate with
> > viable VFIO devices.
> > 
> > The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> > in addition to a list of groups (kvm_vfio_group). The new
> > infrastructure enables to check the validity of the VFIO device
> > file descriptor, get and hold a reference to it.
> > 
> > The first concrete implemented command is IRQ forward control:
> > KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> > 
> > It consists in programing the VFIO driver and KVM in a consistent manner
> > so that an optimized IRQ injection/completion is set up. Each
> > kvm_vfio_device holds a list of forwarded IRQ. When putting a
> > kvm_vfio_device, the implementation makes sure the forwarded IRQs
> > are set again in the normal handling state (non forwarded).
> 
> 'putting a kvm_vfio_device' sounds to like you're golf'ing :)
> 
> When a kvm_vfio_device is released?
> 
> > 
> > The forwarding programmming is architecture specific, embodied by the
> > kvm_arch_set_fwd_state function. Its implementation is given in a
> > separate patch file.
> 
> I would drop the last sentence and instead indicate that this is handled
> properly when the architecture does not support such a feature.
> 
> > 
> > The forwarding control modality is enabled by the
> > __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> > 
> > Signed-off-by: Eric Auger <eric.auger@linaro.org>
> > 
> > ---
> > 
> > v1 -> v2:
> > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > - original patch file separated into 2 parts: generic part moved in vfio.c
> >   and ARM specific part(kvm_arch_set_fwd_state)
> > ---
> >  include/linux/kvm_host.h |  27 +++
> >  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 477 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index a4c33b3..24350dc 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
> >  		      unsigned long arg);
> >  };
> >  
> > +enum kvm_fwd_irq_action {
> > +	KVM_VFIO_IRQ_SET_FORWARD,
> > +	KVM_VFIO_IRQ_SET_NORMAL,
> > +	KVM_VFIO_IRQ_CLEANUP,
> 
> This is KVM internal API, so it would probably be good to document this.
> Especially the CLEANUP bit worries me, see below.

This also doesn't match the user API, which is simply FORWARD/UNFORWARD.
Extra states worry me too.

> > +};
> > +
> > +/* internal structure describing a forwarded IRQ */
> > +struct kvm_fwd_irq {
> > +	struct list_head link;
> 
> this list entry is local to the kvm vfio device, right? that means you
> probably want a struct with just the below fields, and then have a
> containing struct in the generic device file, private to it's logic.

Yes, this is part of the abstraction problem.

> > +	__u32 index; /* platform device irq index */

This is a vfio_device irq_index, but vfio_devices support indexes and
sub-indexes.  At this level the API should match vfio, not the specifics
of platform devices not supporting sub-index.

> > +	__u32 hwirq; /*physical IRQ */
> > +	__u32 gsi; /* virtual IRQ */
> > +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/

Not sure I understand why vcpu is necessary.  Also I see a 'get' in the
code below, but not a 'put'.

> > +};
> > +
> >  void kvm_device_get(struct kvm_device *dev);
> >  void kvm_device_put(struct kvm_device *dev);
> >  struct kvm_device *kvm_device_from_filp(struct file *filp);
> > @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
> >  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> >  extern struct kvm_device_ops kvm_flic_ops;
> >  
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> 
> what's the 'p' in pfwd?

p is for pointer?

> > +			   enum kvm_fwd_irq_action action);
> > +
> > +#else
> > +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> > +					 enum kvm_fwd_irq_action action)
> > +{
> > +	return 0;
> > +}
> > +#endif
> > +
> >  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >  
> >  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> > index 76dc7a1..e4a81c4 100644
> > --- a/virt/kvm/vfio.c
> > +++ b/virt/kvm/vfio.c
> > @@ -18,14 +18,24 @@
> >  #include <linux/slab.h>
> >  #include <linux/uaccess.h>
> >  #include <linux/vfio.h>
> > +#include <linux/platform_device.h>
> >  
> >  struct kvm_vfio_group {
> >  	struct list_head node;
> >  	struct vfio_group *vfio_group;
> >  };
> >  
> > +struct kvm_vfio_device {
> > +	struct list_head node;
> > +	struct vfio_device *vfio_device;
> > +	/* list of forwarded IRQs for that VFIO device */
> > +	struct list_head fwd_irq_list;
> > +	int fd;
> > +};
> > +
> >  struct kvm_vfio {
> >  	struct list_head group_list;
> > +	struct list_head device_list;
> >  	struct mutex lock;
> >  	bool noncoherent;
> >  };
> > @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >  	return -ENXIO;
> >  }
> >  
> > +/**
> > + * get_vfio_device - returns the vfio-device corresponding to this fd
> > + * @fd:fd of the vfio platform device
> > + *
> > + * checks it is a vfio device
> > + * increment its ref counter
> 
> why the short lines?  Just write this out in proper English.
> 
> > + */
> > +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> > +{
> > +	struct fd f;
> > +	struct vfio_device *vdev;
> > +
> > +	f = fdget(fd);
> > +	if (!f.file)
> > +		return NULL;
> > +	vdev = kvm_vfio_device_get_external_user(f.file);
> > +	fdput(f);
> > +	return vdev;
> > +}
> > +
> > +/**
> > + * put_vfio_device: put the vfio platform device
> > + * @vdev: vfio_device to put
> > + *
> > + * decrement the ref counter
> > + */
> > +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> > +{
> > +	kvm_vfio_device_put_external_user(vdev);
> > +}
> > +
> > +/**
> > + * kvm_vfio_find_device - look for the device in the assigned
> > + * device list
> > + * @kv: the kvm-vfio device
> > + * @vdev: the vfio_device to look for
> > + *
> > + * returns the associated kvm_vfio_device if the device is known,
> > + * meaning at least 1 IRQ is forwarded for this device.
> > + * in the device is not registered, returns NULL.
> > + */

Why are we talking about forwarded IRQs already?  This is a simple lookup
function; who knows what other users it will have in the future.

> 
> are these functions meant to be exported?  Otherwise they should be
> static, and the documentation on these simple list iteration wrappers
> seems like overkill imho.
> 
> > +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> > +					     struct vfio_device *vdev)
> > +{
> > +	struct kvm_vfio_device *kvm_vdev_iter;
> > +
> > +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> > +		if (kvm_vdev_iter->vfio_device == vdev)
> > +			return kvm_vdev_iter;
> > +	}
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> > + * @kvm_vdev: the kvm_vfio_device
> > + * @irq_index: irq index
> > + *
> > + * returns the forwarded irq struct if it exists, NULL in the negative
> > + */
> > +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> > +				      int irq_index)

+sub-index

probably important to note on both of these that they need to be called
with kv->lock held

> > +{
> > +	struct kvm_fwd_irq *fwd_irq_iter;
> > +
> > +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> > +		if (fwd_irq_iter->index == irq_index)
> > +			return fwd_irq_iter;
> > +	}
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * validate_forward - checks whether forwarding a given IRQ is meaningful
> > + * @vdev:  vfio_device the IRQ belongs to
> > + * @fwd_irq: user struct containing the irq_index to forward
> > + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> > + * kvm_vfio_device that holds it
> > + * @hwirq: irq numberthe irq index corresponds to
> > + *
> > + * checks the vfio-device is a platform vfio device
> > + * checks the irq_index corresponds to an actual hwirq and
> > + * checks this hwirq is not already forwarded
> > + * returns < 0 on following errors:
> > + * not a platform device, bad irq index, already forwarded
> > + */
> > +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> > +			    struct vfio_device *vdev,
> > +			    struct kvm_arch_forwarded_irq *fwd_irq,
> > +			    struct kvm_vfio_device **kvm_vdev,
> > +			    int *hwirq)
> > +{
> > +	struct device *dev = kvm_vfio_external_base_device(vdev);
> > +	struct platform_device *platdev;
> > +
> > +	*hwirq = -1;
> > +	*kvm_vdev = NULL;
> > +	if (strcmp(dev->bus->name, "platform") == 0) {

Should be testing dev->bus == &platform_bus_type, and ideally
creating a dev_is_platform() macro to make that even cleaner.
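
A minimal userspace sketch of that test, for illustration only.  The
struct definitions are simplified stand-ins for the kernel's struct
device and struct bus_type, and require_platform() is a hypothetical
helper that is not part of the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the fields used here. */
struct bus_type {
	const char *name;
};

struct device {
	struct bus_type *bus;
};

static struct bus_type platform_bus_type = { .name = "platform" };

/* Pointer identity test instead of strcmp() on the bus name. */
#define dev_is_platform(dev) ((dev)->bus == &platform_bus_type)

/* Hypothetical helper: reject non-platform devices up front. */
static int require_platform(const struct device *dev)
{
	assert(dev != NULL);

	if (!dev_is_platform(dev))
		return -EINVAL;

	return 0;
}
```

Comparing the bus pointer avoids the string compare, and a different
bus object that merely reuses the name "platform" does not match.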

However, we're being sort of sneaky in that we're actually doing
something platform device specific here.  Why?  Don't we just need to
make sure that kvm-vfio doesn't have any record of this forward
(-EEXIST) and let the platform device code error out later for this
case?

> > +		platdev = to_platform_device(dev);
> > +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> > +		if (*hwirq < 0) {
> > +			kvm_err("%s incorrect index\n", __func__);
> > +			return -EINVAL;
> > +		}
> > +	} else {
> > +		kvm_err("%s not a platform device\n", __func__);
> > +		return -EINVAL;
> > +	}
> 
> > need some spacing here.  Also, I would turn this around: first check if
> > the strcmp fails and error out, then do your next check etc., to
> > avoid so many nested statements.
> 
> > +	/* is a ref to this device already owned by the KVM-VFIO device? */
> 
> this comment is not particularly helpful in its current form, it would
> be helpful if you specified that we're checking whether that particular
> device/irq combo is already registered.
> 
> > +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> > +	if (*kvm_vdev) {
> > +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> > +			kvm_err("%s irq %d already forwarded\n",
> > +				__func__, *hwirq);

Why didn't we do this first?

> don't flood the kernel log because of a user error, just allocate an
> error code for this purpose and document it in the ABI, -EEXIST or
> something.
> 
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +
> > +/**
> > + * validate_unforward: check a deassignment is meaningful
> > + * @kv: the kvm_vfio device
> > + * @vdev: the vfio_device whose irq to deassign belongs to
> > + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> > + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> > + * it exists
> > + *
> > + * returns 0 if the provided irq effectively is forwarded
> > + * (a ref to this vfio_device is hold and this irq belongs to
>                                     held
> > + * the forwarded irq of this device)
> > + * returns -EINVAL in the negative
> 
>                ENOENT should be returned if you don't have an entry.
> 	       EINVAL could be used if you supply an fd that isn't a
> 	       VFIO device file descriptor, for example.  Again,
> 	       consider documenting all this in the API.
> 
> > + */
> > +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> > +			      struct vfio_device *vdev,
> > +			      struct kvm_arch_forwarded_irq *fwd_irq,
> > +			      struct kvm_vfio_device **kvm_vdev)
> > +{
> > +	struct kvm_fwd_irq *pfwd;
> > +
> > +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> > +	if (!kvm_vdev) {
> > +		kvm_err("%s no forwarded irq for this device\n", __func__);
> 
> don't flood the kernel log
> 
> > +		return -EINVAL;
> > +	}
> > +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> > +	if (!pfwd) {
> > +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
> 
> > +		return -EINVAL;
> 
> same here
> 
> > +	}
> > +	return 0;
> > +}
> > +
> > +/**
> > + * kvm_vfio_forward - set a forwarded IRQ
> > + * @kdev: the kvm device
> > + * @vdev: the vfio device the IRQ belongs to
> > + * @fwd_irq: the user struct containing the irq_index and guest irq
> > + * @must_put: tells the caller whether the vfio_device must be put after
> > + * the call (ref must be released in case a ref onto this device was
> > + * already hold or in case of new device and failure)
> > + *
> > + * validate the injection, activate forward and store the information
>       Validate
> > + * about which irq and which device is concerned so that on deassign or
> > + * kvm-vfio destruction everuthing can be cleaned up.
>                            everything
> 
> I'm not sure I understand this explanation.  Do we have concerned
> devices?
> 
> I think you want to say something along the lines of: If userspace passed
> a valid vfio device and irq handle and the architecture supports
> forwarding this combination, register the vfio_device and irq
> combination in the ....
> 
> > + */
> > +static int kvm_vfio_forward(struct kvm_device *kdev,
> > +			    struct vfio_device *vdev,
> > +			    struct kvm_arch_forwarded_irq *fwd_irq,
> > +			    bool *must_put)
> > +{
> > +	int ret;
> > +	struct kvm_fwd_irq *pfwd = NULL;
> > +	struct kvm_vfio_device *kvm_vdev = NULL;
> > +	struct kvm_vfio *kv = kdev->private;
> > +	int hwirq;
> > +
> > +	*must_put = true;
> > +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
> > +					&kvm_vdev, &hwirq);
> > +	if (ret < 0)
> > +		return -EINVAL;
> > +
> > +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
> 
> seems a bit pointless to zero-out the memory if you're setting all
> fields below.
> 
> > +	if (!pfwd)
> > +		return -ENOMEM;
> > +	pfwd->index = fwd_irq->index;
> > +	pfwd->gsi = fwd_irq->gsi;
> > +	pfwd->hwirq = hwirq;
> > +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> > +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> > +	if (ret < 0) {
> > +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> 
> this whole thing feels incredibly broken to me.  Setting a forward
> should either work or not work, not something in between that leaves
> something to be cleaned up.  Why this two-stage thingy here?

Yep, I agree.  I also don't see the point of the validate function; just
open code it here and push the platform_get_irq test into
kvm_arch_set_fwd_state.  kvm-vfio doesn't care about the hwirq.

> > +		kfree(pfwd);
> 
> probably want to move your free-and-return-error to the end of the
> function.
> 
> > +		return ret;
> > +	}
> > +
> > +	if (!kvm_vdev) {
> > +		/* create & insert the new device and keep the ref */
> > +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
> 
> again, no need for zeroing out the memory.

I think you also want to allocate this before you set up the forward so
you can eliminate any complicated teardown later.
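
Roughly the ordering I have in mind, as a userspace sketch under
stated assumptions: struct fwd_irq, struct fwd_dev, arch_set_forward()
and forward_irq() are hypothetical stand-ins (the stub hook fails for
odd indexes only to exercise the error path), and malloc()/free()
stand in for the kernel allocators:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-IRQ and per-device bookkeeping. */
struct fwd_irq {
	unsigned int index;
	unsigned int gsi;
};

struct fwd_dev {
	int nr_irqs;
};

/* Stub arch hook: fails for odd indexes so the error path can be tested. */
static int arch_set_forward(const struct fwd_irq *irq)
{
	return (irq->index & 1) ? -EINVAL : 0;
}

static int forward_irq(struct fwd_dev **devp, unsigned int index,
		       unsigned int gsi, struct fwd_irq **out)
{
	struct fwd_dev *new_dev = NULL;
	struct fwd_irq *irq;
	int ret;

	assert(devp != NULL && out != NULL);

	/* Do every allocation first, while nothing needs undoing. */
	irq = malloc(sizeof(*irq));
	if (!irq)
		return -ENOMEM;
	irq->index = index;
	irq->gsi = gsi;

	if (!*devp) {
		new_dev = calloc(1, sizeof(*new_dev));
		if (!new_dev) {
			ret = -ENOMEM;
			goto out_free;
		}
	}

	/* The arch call is the last step that can fail. */
	ret = arch_set_forward(irq);
	if (ret < 0)
		goto out_free;

	/* Past this point only bookkeeping that cannot fail. */
	if (new_dev)
		*devp = new_dev;
	(*devp)->nr_irqs++;
	*out = irq;
	return 0;

out_free:
	free(new_dev);	/* free(NULL) is a no-op */
	free(irq);
	return ret;
}
```

With the allocations done up front, a failure never has to undo an
already established forward, and there is a single free-and-return
path at the end of the function.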

> > +		if (!kvm_vdev) {
> > +			kvm_arch_set_fwd_state(pfwd, false);

false?  The function takes an enum.
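
Worse, KVM_VFIO_IRQ_SET_FORWARD is the first enumerator, so its value
is 0 and passing false asks for forwarding again rather than undoing
it.  A small standalone check; action_name() and
action_from_error_path() are hypothetical helpers, the enum mirrors
the one this patch adds to kvm_host.h:

```c
#include <assert.h>
#include <stdbool.h>

/* Same enumerator order as the enum added to kvm_host.h in this patch. */
enum kvm_fwd_irq_action {
	KVM_VFIO_IRQ_SET_FORWARD,
	KVM_VFIO_IRQ_SET_NORMAL,
	KVM_VFIO_IRQ_CLEANUP,
};

/* Hypothetical helper: what the -ENOMEM error path effectively passes. */
static enum kvm_fwd_irq_action action_from_error_path(void)
{
	return false;	/* converts to 0, the first enumerator */
}

/* Hypothetical helper: name the action a callee would actually see. */
static const char *action_name(enum kvm_fwd_irq_action action)
{
	switch (action) {
	case KVM_VFIO_IRQ_SET_FORWARD:
		return "SET_FORWARD";
	case KVM_VFIO_IRQ_SET_NORMAL:
		return "SET_NORMAL";
	case KVM_VFIO_IRQ_CLEANUP:
		return "CLEANUP";
	}
	assert(!"unknown action");
	return "?";
}
```

Passing the enumerator by name makes the intent visible at the call
site.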

> > +			kfree(pfwd);
> > +			return -ENOMEM;
> > +		}
> > +
> > +		kvm_vdev->vfio_device = vdev;
> > +		kvm_vdev->fd = fwd_irq->fd;
> > +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
> > +		list_add(&kvm_vdev->node, &kv->device_list);
> > +		/*
> > +		 * the only case where we keep the ref:
> > +		 * new device and forward setting successful
> > +		 */
> > +		*must_put = false;
> > +	}
> > +
> > +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
> > +
> > +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
> > +	fwd_irq->fd, hwirq, fwd_irq->gsi);
> 
> please indent this to align with the opening parenthesis.
> 
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * remove_assigned_device - put a given device from the list
> 
> this isn't a 'put', at least not *just* a put.
> 
> > + * @kv: the kvm-vfio device
> > + * @vdev: the vfio-device to remove
> > + *
> > + * change the state of all forwarded IRQs, free the forwarded IRQ list,
> > + * remove the corresponding kvm_vfio_device from the assigned device
> > + * list.
> > + * returns true if the device could be removed, false in the negative
> > + */
> > +bool remove_assigned_device(struct kvm_vfio *kv,
> > +			    struct vfio_device *vdev)
> > +{
> > +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> > +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> > +	bool removed = false;
> > +	int ret;
> > +
> > +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> > +				 &kv->device_list, node) {
> > +		if (kvm_vdev_iter->vfio_device == vdev) {
> > +			/* loop on all its forwarded IRQ */
> > +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> > +						 &kvm_vdev_iter->fwd_irq_list,
> > +						 link) {
> 
> hmm, seems this function is only called when you have no more forwarded
> IRQs, so isn't all of this completely dead (and unnecessary) code?
> 
> > +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> > +						KVM_VFIO_IRQ_SET_NORMAL);
> > +				if (ret < 0)
> > +					return ret;
> 
> you're returning an error code to a bool function, which means you'll
> return true when there was an error.  Is this your intention? ;)
> 
> if we have an error here, this would be a very very bad situation wouldn't it?
> 
> > +				list_del(&fwd_irq_iter->link);
> > +				kfree(fwd_irq_iter);
> > +			}
> > +			/* all IRQs could be deassigned */
> > +			list_del(&kvm_vdev_iter->node);
> > +			kvm_vfio_device_put_external_user(
> > +				kvm_vdev_iter->vfio_device);
> > +			kfree(kvm_vdev_iter);
> > +			removed = true;
> > +			break;
> > +		}
> > +	}
> > +	return removed;
> > +}
> > +
> > +
> > +/**
> > + * remove_fwd_irq - remove a forwarded irq
> > + *
> > + * @kv: kvm-vfio device
> > + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
> > + * irq_index: the index of the IRQ
> > + *
> > + * change the forwarded state of the IRQ, remove the IRQ from
> > + * the device forwarded IRQ list. In case it is the last one,
> > + * put the device
> > + */
> > +int remove_fwd_irq(struct kvm_vfio *kv,
> > +		   struct kvm_vfio_device *kvm_vdev,
> > +		   int irq_index)
> > +{
> > +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> > +	int ret = -1;
> > +
> > +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> > +				 &kvm_vdev->fwd_irq_list, link) {
> 
> hmmm, you can only forward one irq for a specific device once, right?
> And you already have a lookup function, so why not call that, and then
> remove it?
> 
> I'm confused.

Me too, this and the previous function need some more consideration.

> > +		if (fwd_irq_iter->index == irq_index) {
> > +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> > +						KVM_VFIO_IRQ_SET_NORMAL);
> > +			if (ret < 0)
> > +				break;
> > +			list_del(&fwd_irq_iter->link);
> > +			kfree(fwd_irq_iter);
> > +			ret = 0;
> > +			break;
> > +		}
> > +	}
> > +	if (list_empty(&kvm_vdev->fwd_irq_list))
> > +		remove_assigned_device(kv, kvm_vdev->vfio_device);
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * kvm_vfio_unforward - remove a forwarded IRQ
> > + * @kdev: the kvm device
> > + * @vdev: the vfio_device
> > + * @fwd_irq: user struct
> > + * after checking this IRQ effectively is forwarded, change its state,
> > + * remove it from the corresponding kvm_vfio_device list
> > + */
> > +static int kvm_vfio_unforward(struct kvm_device *kdev,
> > +				     struct vfio_device *vdev,
> > +				     struct kvm_arch_forwarded_irq *fwd_irq)
> > +{
> > +	struct kvm_vfio *kv = kdev->private;
> > +	struct kvm_vfio_device *kvm_vdev;
> > +	int ret;
> > +
> > +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
> > +	if (ret < 0)
> > +		return -EINVAL;
> 
> why do you override the return value?  Propagate it.
> 
> > +
> > +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
> > +	if (ret < 0)
> > +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
> > +			__func__, fwd_irq->fd, fwd_irq->index);
> > +	else
> > +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
> > +			  __func__, fwd_irq->fd, fwd_irq->index);
> 
> again with the kernel log here.
> 
> 
> 
> > +	return ret;
> > +}
> > +
> > +
> > +
> > +
> > +/**
> > + * kvm_vfio_set_device - the top function for interracting with a vfio
> 
>                                 top?             interacting
> 
> > + * device
> > + */
> 
> probably just skip this comment
> 
> > +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> > +{
> > +	struct kvm_vfio *kv = kdev->private;
> > +	struct vfio_device *vdev;
> > +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
> > +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> > +
> > +	switch (attr) {
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> > +		bool must_put;
> > +		int ret;
> > +
> > +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> > +			return -EFAULT;
> > +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> > +		if (IS_ERR(vdev))
> > +			return PTR_ERR(vdev);
> 
> seems like this whole block of code is replicated below, needs
> refactoring.
> 
> > +		mutex_lock(&kv->lock);
> > +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
> > +		if (must_put)
> > +			kvm_vfio_put_vfio_device(vdev);
> 
> this must_put looks plain weird.  I think you want to balance your
> get/put's always; can't you just get an extra reference in
> kvm_vfio_forward() ?

Yeah, this is very broken.  Every forwarded IRQ should hold a reference
to the vfio_device.  Every unforward should drop a reference.  Trying to
maintain a single reference is a non-goal.
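
As a sketch of that scheme, with a plain counter standing in for the
real vfio_device reference (struct vdev and all function names here
are hypothetical, and arch_ok simply simulates whether the arch hook
succeeds):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for a refcounted vfio_device. */
struct vdev {
	int refcount;
};

static void vdev_get(struct vdev *v)
{
	v->refcount++;
}

static void vdev_put(struct vdev *v)
{
	assert(v->refcount > 0);
	v->refcount--;
}

/* Each successful forward keeps its own reference. */
static int forward_one(struct vdev *v, bool arch_ok)
{
	if (!arch_ok)
		return -EINVAL;
	vdev_get(v);
	return 0;
}

/* Each unforward drops the reference its forward took. */
static void unforward_one(struct vdev *v)
{
	vdev_put(v);
}

/* ioctl-style wrapper: the lookup reference is always balanced here. */
static int handle_forward(struct vdev *v, bool arch_ok)
{
	int ret;

	vdev_get(v);	/* reference taken when resolving the fd */
	ret = forward_one(v, arch_ok);
	vdev_put(v);	/* dropped unconditionally, no must_put flag */
	return ret;
}
```

The caller's get/put pair is always balanced, so the must_put
out-parameter goes away, and the device stays pinned exactly as long
as at least one forward exists.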

> 
> > +		mutex_unlock(&kv->lock);
> > +		return ret;
> > +		}
> > +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
> > +		int ret;
> > +
> > +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> > +			return -EFAULT;
> > +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> > +		if (IS_ERR(vdev))
> > +			return PTR_ERR(vdev);
> > +
> > +		kvm_vfio_device_put_external_user(vdev);
> 
> > you're dropping the reference to the device but referencing it in your
> > unforward call below?
> 
> > +		mutex_lock(&kv->lock);
> > +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
> > +		mutex_unlock(&kv->lock);
> > +		return ret;
> > +	}
> > +#endif
> > +	default:
> > +		return -ENXIO;
> > +	}
> > +}
> > +
> > +/**
> > + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
> > + * @kv: kvm-vfio device
> > + *
> > + * loop on all got devices and their associated forwarded IRQs
> 
> 'loop on all got' ?
> 
> Restore the non-forwarded state for all registered devices and ...
> 
> > + * restore the non forwarded state, remove IRQs and their devices from
> > + * the respective list, put the vfio platform devices
> > + *
> > + * When this function is called, the vcpu already are destroyed. No
>                                     the VCPUs are already destroyed.
> > + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> > + * kvm_arch_set_fwd_state action
> 
> this last bit didn't make any sense to me.  Also, why are we referring
> to the vgic in generic code?
> 
> > + */
> > +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
> > +{
> > +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> > +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> > +
> > +	/* loop on all the assigned devices */
> 
> unnecessary comment
> 
> > +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> > +				 &kv->device_list, node) {
> > +
> > +		/* loop on all its forwarded IRQ */
> 
> same
> 
> > +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> > +					 &kvm_vdev_iter->fwd_irq_list, link) {
> > +			kvm_arch_set_fwd_state(fwd_irq_iter,
> > +						KVM_VFIO_IRQ_CLEANUP);
> > +			list_del(&fwd_irq_iter->link);
> > +			kfree(fwd_irq_iter);
> > +		}


Ugh, how many of these cleanup functions do we need?

> > +		list_del(&kvm_vdev_iter->node);
> > +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
> > +		kfree(kvm_vdev_iter);
> > +	}
> > +	return 0;
> > +}
> > +
> > +
> >  static int kvm_vfio_set_attr(struct kvm_device *dev,
> >  			     struct kvm_device_attr *attr)
> >  {
> >  	switch (attr->group) {
> >  	case KVM_DEV_VFIO_GROUP:
> >  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> > +	case KVM_DEV_VFIO_DEVICE:
> > +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> >  	}
> >  
> >  	return -ENXIO;
> > @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> >  		case KVM_DEV_VFIO_GROUP_DEL:
> >  			return 0;
> >  		}
> > -
> >  		break;
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > +	case KVM_DEV_VFIO_DEVICE:
> > +		switch (attr->attr) {
> > +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> > +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> > +			return 0;
> > +		}
> > +		break;
> > +#endif
> >  	}
> > -
> >  	return -ENXIO;
> >  }
> >  
> > @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> >  		list_del(&kvg->node);
> >  		kfree(kvg);
> >  	}
> > +	kvm_vfio_put_all_devices(kv);
> >  
> >  	kvm_vfio_update_coherency(dev);
> >  
> > @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> >  		return -ENOMEM;
> >  
> >  	INIT_LIST_HEAD(&kv->group_list);
> > +	INIT_LIST_HEAD(&kv->device_list);
> >  	mutex_init(&kv->lock);
> >  
> >  	dev->private = kv;
> > -- 
> > 1.9.1
> > 

^ permalink raw reply	[flat|nested] 101+ messages in thread

* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-11  5:05       ` Alex Williamson
  0 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-11  5:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> > This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> > 
> > This is a new control channel which enables KVM to cooperate with
> > viable VFIO devices.
> > 
> > The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> > in addition to a list of groups (kvm_vfio_group). The new
> > infrastructure enables to check the validity of the VFIO device
> > file descriptor, get and hold a reference to it.
> > 
> > The first concrete implemented command is IRQ forward control:
> > KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> > 
> > It consists in programing the VFIO driver and KVM in a consistent manner
> > so that an optimized IRQ injection/completion is set up. Each
> > kvm_vfio_device holds a list of forwarded IRQ. When putting a
> > kvm_vfio_device, the implementation makes sure the forwarded IRQs
> > are set again in the normal handling state (non forwarded).
> 
> > 'putting a kvm_vfio_device' sounds to me like you're golf'ing :)
> 
> When a kvm_vfio_device is released?
> 
> > 
> > The forwarding programmming is architecture specific, embodied by the
> > kvm_arch_set_fwd_state function. Its implementation is given in a
> > separate patch file.
> 
> I would drop the last sentence and instead indicate that this is handled
> properly when the architecture does not support such a feature.
> 
> > 
> > The forwarding control modality is enabled by the
> > __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> > 
> > Signed-off-by: Eric Auger <eric.auger@linaro.org>
> > 
> > ---
> > 
> > v1 -> v2:
> > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > - original patch file separated into 2 parts: generic part moved in vfio.c
> >   and ARM specific part(kvm_arch_set_fwd_state)
> > ---
> >  include/linux/kvm_host.h |  27 +++
> >  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 477 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index a4c33b3..24350dc 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
> >  		      unsigned long arg);
> >  };
> >  
> > +enum kvm_fwd_irq_action {
> > +	KVM_VFIO_IRQ_SET_FORWARD,
> > +	KVM_VFIO_IRQ_SET_NORMAL,
> > +	KVM_VFIO_IRQ_CLEANUP,
> 
> This is KVM internal API, so it would probably be good to document this.
> Especially the CLEANUP bit worries me, see below.

This also doesn't match the user API, which is simply FORWARD/UNFORWARD.
Extra states worry me too.

> > +};
> > +
> > +/* internal structure describing a forwarded IRQ */
> > +struct kvm_fwd_irq {
> > +	struct list_head link;
> 
> > this list entry is local to the kvm vfio device, right? that means you
> > probably want a struct with just the below fields, and then have a
> > containing struct in the generic device file, private to its logic.

Yes, this is part of the abstraction problem.

> > +	__u32 index; /* platform device irq index */

This is a vfio_device irq_index, but vfio_devices support indexes and
sub-indexes.  At this level the API should match vfio, not the specifics
of platform devices not supporting sub-index.
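
For example, the bookkeeping could key each forward on the pair, along
these lines.  This is only a sketch: struct fwd_irq_key, struct
fwd_irq and find_fwd_irq() are hypothetical names, and a flat array
stands in for the kernel list:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: a vfio IRQ index plus the sub-index within it. */
struct fwd_irq_key {
	uint32_t index;
	uint32_t subindex;
};

struct fwd_irq {
	struct fwd_irq_key key;
	uint32_t gsi;	/* virtual IRQ */
};

/* Linear lookup on (index, subindex); returns NULL when not found. */
static struct fwd_irq *find_fwd_irq(struct fwd_irq *irqs, size_t count,
				    struct fwd_irq_key key)
{
	size_t i;

	assert(irqs != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (irqs[i].key.index == key.index &&
		    irqs[i].key.subindex == key.subindex)
			return &irqs[i];
	}
	return NULL;
}
```

A platform device would simply always use sub-index 0, while the
interface stays usable for devices that have several IRQs under one
index.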

> > +	__u32 hwirq; /*physical IRQ */
> > +	__u32 gsi; /* virtual IRQ */
> > +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/

Not sure I understand why vcpu is necessary.  Also I see a 'get' in the
code below, but not a 'put'.

> > +};
> > +
> >  void kvm_device_get(struct kvm_device *dev);
> >  void kvm_device_put(struct kvm_device *dev);
> >  struct kvm_device *kvm_device_from_filp(struct file *filp);
> > @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
> >  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> >  extern struct kvm_device_ops kvm_flic_ops;
> >  
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> 
> what's the 'p' in pfwd?

p is for pointer?

> > +			   enum kvm_fwd_irq_action action);
> > +
> > +#else
> > +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> > +					 enum kvm_fwd_irq_action action)
> > +{
> > +	return 0;
> > +}
> > +#endif
> > +
> >  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >  
> >  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> > index 76dc7a1..e4a81c4 100644
> > --- a/virt/kvm/vfio.c
> > +++ b/virt/kvm/vfio.c
> > @@ -18,14 +18,24 @@
> >  #include <linux/slab.h>
> >  #include <linux/uaccess.h>
> >  #include <linux/vfio.h>
> > +#include <linux/platform_device.h>
> >  
> >  struct kvm_vfio_group {
> >  	struct list_head node;
> >  	struct vfio_group *vfio_group;
> >  };
> >  
> > +struct kvm_vfio_device {
> > +	struct list_head node;
> > +	struct vfio_device *vfio_device;
> > +	/* list of forwarded IRQs for that VFIO device */
> > +	struct list_head fwd_irq_list;
> > +	int fd;
> > +};
> > +
> >  struct kvm_vfio {
> >  	struct list_head group_list;
> > +	struct list_head device_list;
> >  	struct mutex lock;
> >  	bool noncoherent;
> >  };
> > @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >  	return -ENXIO;
> >  }
> >  
> > +/**
> > + * get_vfio_device - returns the vfio-device corresponding to this fd
> > + * @fd:fd of the vfio platform device
> > + *
> > + * checks it is a vfio device
> > + * increment its ref counter
> 
> why the short lines?  Just write this out in proper English.
> 
> > + */
> > +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> > +{
> > +	struct fd f;
> > +	struct vfio_device *vdev;
> > +
> > +	f = fdget(fd);
> > +	if (!f.file)
> > +		return NULL;
> > +	vdev = kvm_vfio_device_get_external_user(f.file);
> > +	fdput(f);
> > +	return vdev;
> > +}
> > +
> > +/**
> > + * put_vfio_device: put the vfio platform device
> > + * @vdev: vfio_device to put
> > + *
> > + * decrement the ref counter
> > + */
> > +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> > +{
> > +	kvm_vfio_device_put_external_user(vdev);
> > +}
> > +
> > +/**
> > + * kvm_vfio_find_device - look for the device in the assigned
> > + * device list
> > + * @kv: the kvm-vfio device
> > + * @vdev: the vfio_device to look for
> > + *
> > + * returns the associated kvm_vfio_device if the device is known,
> > + * meaning at least 1 IRQ is forwarded for this device.
> > + * in the device is not registered, returns NULL.
> > + */

Why are we talking about forwarded IRQs already?  This is a simple lookup
function; who knows what other users it will have in the future.

> 
> are these functions meant to be exported?  Otherwise they should be
> static, and the documentation on these simple list iteration wrappers
> seems like overkill imho.
> 
> > +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> > +					     struct vfio_device *vdev)
> > +{
> > +	struct kvm_vfio_device *kvm_vdev_iter;
> > +
> > +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> > +		if (kvm_vdev_iter->vfio_device == vdev)
> > +			return kvm_vdev_iter;
> > +	}
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> > + * @kvm_vdev: the kvm_vfio_device
> > + * @irq_index: irq index
> > + *
> > + * returns the forwarded irq struct if it exists, NULL in the negative
> > + */
> > +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> > +				      int irq_index)

+sub-index

probably important to note on both of these that they need to be called
with kv->lock held

> > +{
> > +	struct kvm_fwd_irq *fwd_irq_iter;
> > +
> > +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> > +		if (fwd_irq_iter->index == irq_index)
> > +			return fwd_irq_iter;
> > +	}
> > +	return NULL;
> > +}
> > +
> > +/**
> > + * validate_forward - checks whether forwarding a given IRQ is meaningful
> > + * @vdev:  vfio_device the IRQ belongs to
> > + * @fwd_irq: user struct containing the irq_index to forward
> > + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> > + * kvm_vfio_device that holds it
> > + * @hwirq: irq numberthe irq index corresponds to
> > + *
> > + * checks the vfio-device is a platform vfio device
> > + * checks the irq_index corresponds to an actual hwirq and
> > + * checks this hwirq is not already forwarded
> > + * returns < 0 on following errors:
> > + * not a platform device, bad irq index, already forwarded
> > + */
> > +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> > +			    struct vfio_device *vdev,
> > +			    struct kvm_arch_forwarded_irq *fwd_irq,
> > +			    struct kvm_vfio_device **kvm_vdev,
> > +			    int *hwirq)
> > +{
> > +	struct device *dev = kvm_vfio_external_base_device(vdev);
> > +	struct platform_device *platdev;
> > +
> > +	*hwirq = -1;
> > +	*kvm_vdev = NULL;
> > +	if (strcmp(dev->bus->name, "platform") == 0) {

Should be testing dev->bus == &platform_bus_type, and ideally
creating a dev_is_platform() macro to make that even cleaner.

However, we're being sort of sneaky in that we're actually doing
something platform device specific here.  Why?  Don't we just need to
make sure that kvm-vfio doesn't have any record of this forward
(-EEXIST) and let the platform device code error out later for this
case?

> > +		platdev = to_platform_device(dev);
> > +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> > +		if (*hwirq < 0) {
> > +			kvm_err("%s incorrect index\n", __func__);
> > +			return -EINVAL;
> > +		}
> > +	} else {
> > +		kvm_err("%s not a platform device\n", __func__);
> > +		return -EINVAL;
> > +	}
> 
> > need some spacing here.  Also, I would turn this around: first check if
> > the strcmp fails and error out, then do your next check etc., to
> > avoid so many nested statements.
> 
> > +	/* is a ref to this device already owned by the KVM-VFIO device? */
> 
> this comment is not particularly helpful in its current form, it would
> be helpful if you specified that we're checking whether that particular
> device/irq combo is already registered.
> 
> > +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> > +	if (*kvm_vdev) {
> > +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> > +			kvm_err("%s irq %d already forwarded\n",
> > +				__func__, *hwirq);

Why didn't we do this first?

> don't flood the kernel log because of a user error, just allocate an
> error code for this purpose and document it in the ABI, -EEXIST or
> something.
> 
> > +			return -EINVAL;
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +
> > +/**
> > + * validate_unforward: check a deassignment is meaningful
> > + * @kv: the kvm_vfio device
> > + * @vdev: the vfio_device whose irq to deassign belongs to
> > + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> > + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> > + * it exists
> > + *
> > + * returns 0 if the provided irq effectively is forwarded
> > + * (a ref to this vfio_device is hold and this irq belongs to
>                                     held
> > + * the forwarded irq of this device)
> > + * returns -EINVAL in the negative
> 
>                ENOENT should be returned if you don't have an entry.
> 	       EINVAL could be used if you supply an fd that isn't a
> 	       VFIO device file descriptor, for example.  Again,
> 	       consider documenting all this in the API.
> 
> > + */
> > +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> > +			      struct vfio_device *vdev,
> > +			      struct kvm_arch_forwarded_irq *fwd_irq,
> > +			      struct kvm_vfio_device **kvm_vdev)
> > +{
> > +	struct kvm_fwd_irq *pfwd;
> > +
> > +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> > +	if (!kvm_vdev) {
> > +		kvm_err("%s no forwarded irq for this device\n", __func__);
> 
> don't flood the kernel log
> 
> > +		return -EINVAL;
> > +	}
> > +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> > +	if (!pfwd) {
> > +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
> 
> > +		return -EINVAL;
> 
> same here
> 
> > +	}
> > +	return 0;
> > +}
> > +
> > +/**
> > + * kvm_vfio_forward - set a forwarded IRQ
> > + * @kdev: the kvm device
> > + * @vdev: the vfio device the IRQ belongs to
> > + * @fwd_irq: the user struct containing the irq_index and guest irq
> > + * @must_put: tells the caller whether the vfio_device must be put after
> > + * the call (ref must be released in case a ref onto this device was
> > + * already hold or in case of new device and failure)
> > + *
> > + * validate the injection, activate forward and store the information
>       Validate
> > + * about which irq and which device is concerned so that on deassign or
> > + * kvm-vfio destruction everuthing can be cleaned up.
>                            everything
> 
> I'm not sure I understand this explanation.  Do we have concerned
> devices?
> 
> I think you want to say something along the lines of: If userspace passed
> a valid vfio device and irq handle and the architecture supports
> forwarding this combination, register the vfio_device and irq
> combination in the ....
> 
> > + */
> > +static int kvm_vfio_forward(struct kvm_device *kdev,
> > +			    struct vfio_device *vdev,
> > +			    struct kvm_arch_forwarded_irq *fwd_irq,
> > +			    bool *must_put)
> > +{
> > +	int ret;
> > +	struct kvm_fwd_irq *pfwd = NULL;
> > +	struct kvm_vfio_device *kvm_vdev = NULL;
> > +	struct kvm_vfio *kv = kdev->private;
> > +	int hwirq;
> > +
> > +	*must_put = true;
> > +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
> > +					&kvm_vdev, &hwirq);
> > +	if (ret < 0)
> > +		return -EINVAL;
> > +
> > +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
> 
> seems a bit pointless to zero-out the memory if you're setting all
> fields below.
> 
> > +	if (!pfwd)
> > +		return -ENOMEM;
> > +	pfwd->index = fwd_irq->index;
> > +	pfwd->gsi = fwd_irq->gsi;
> > +	pfwd->hwirq = hwirq;
> > +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> > +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> > +	if (ret < 0) {
> > +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> 
> this whole thing feels incredibly broken to me.  Setting a forward
> should either work or not work, not something in between that leaves
> something to be cleaned up.  Why this two-stage thingy here?

Yep, I agree.  I also don't see the point of the validate function, just
open code it here and push the platform_get_irq test into
kvm_arch_set_fwd_state.  kvm-vfio doesn't care about the hwirq.

> > +		kfree(pfwd);
> 
> probably want to move your free-and-return-error to the end of the
> function.
> 
> > +		return ret;
> > +	}
> > +
> > +	if (!kvm_vdev) {
> > +		/* create & insert the new device and keep the ref */
> > +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
> 
> again, no need for zeroing out the memory.

I think you also want to allocate this before you set up the forward so
you can eliminate any complicated teardown later.
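
To illustrate the ordering suggested here, a minimal userspace sketch
(not kernel code) follows. It checks for a duplicate, allocates the entry
before calling into arch code, and keeps a single free-and-return-error
exit. All names (struct sketch_fwd_irq, sketch_arch_set_forward(),
sketch_forward_irq()) are hypothetical stand-ins, and the -EEXIST return
is only the error code floated earlier in this review, not a settled ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the kernel's real types. */
struct sketch_fwd_irq {
	int index;                      /* IRQ index within the device */
	int gsi;                        /* guest interrupt number */
	struct sketch_fwd_irq *next;
};

struct sketch_fwd_dev {
	struct sketch_fwd_irq *irqs;    /* list of forwarded IRQs */
};

/* Stand-in for the arch callout: rejects a negative gsi. */
static int sketch_arch_set_forward(const struct sketch_fwd_irq *irq)
{
	return irq->gsi < 0 ? -EINVAL : 0;
}

static struct sketch_fwd_irq *sketch_find_irq(struct sketch_fwd_dev *dev,
					      int index)
{
	struct sketch_fwd_irq *it;

	for (it = dev->irqs; it; it = it->next)
		if (it->index == index)
			return it;
	return NULL;
}

static int sketch_forward_irq(struct sketch_fwd_dev *dev, int index, int gsi)
{
	struct sketch_fwd_irq *irq;
	int ret;

	assert(dev != NULL);

	/* A user error gets an error code, not a log message. */
	if (sketch_find_irq(dev, index))
		return -EEXIST;

	/* Allocate everything first so a failure needs no teardown. */
	irq = malloc(sizeof(*irq));
	if (!irq)
		return -ENOMEM;
	irq->index = index;
	irq->gsi = gsi;

	/* Forwarding either works or it does not; nothing in between. */
	ret = sketch_arch_set_forward(irq);
	if (ret < 0)
		goto out_free;

	irq->next = dev->irqs;
	dev->irqs = irq;
	return 0;

out_free:
	free(irq);
	return ret;
}
```

With this shape, the entry is only linked into the list once the arch
callout has succeeded, so no separate cleanup state is needed.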

> > +		if (!kvm_vdev) {
> > +			kvm_arch_set_fwd_state(pfwd, false);

false?  The function takes an enum.

> > +			kfree(pfwd);
> > +			return -ENOMEM;
> > +		}
> > +
> > +		kvm_vdev->vfio_device = vdev;
> > +		kvm_vdev->fd = fwd_irq->fd;
> > +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
> > +		list_add(&kvm_vdev->node, &kv->device_list);
> > +		/*
> > +		 * the only case where we keep the ref:
> > +		 * new device and forward setting successful
> > +		 */
> > +		*must_put = false;
> > +	}
> > +
> > +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
> > +
> > +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
> > +	fwd_irq->fd, hwirq, fwd_irq->gsi);
> 
> please indent this to align with the opening parenthesis.
> 
> > +
> > +	return 0;
> > +}
> > +
> > +/**
> > + * remove_assigned_device - put a given device from the list
> 
> this isn't a 'put', at least not *just* a put.
> 
> > + * @kv: the kvm-vfio device
> > + * @vdev: the vfio-device to remove
> > + *
> > + * change the state of all forwarded IRQs, free the forwarded IRQ list,
> > + * remove the corresponding kvm_vfio_device from the assigned device
> > + * list.
> > + * returns true if the device could be removed, false in the negative
> > + */
> > +bool remove_assigned_device(struct kvm_vfio *kv,
> > +			    struct vfio_device *vdev)
> > +{
> > +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> > +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> > +	bool removed = false;
> > +	int ret;
> > +
> > +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> > +				 &kv->device_list, node) {
> > +		if (kvm_vdev_iter->vfio_device == vdev) {
> > +			/* loop on all its forwarded IRQ */
> > +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> > +						 &kvm_vdev_iter->fwd_irq_list,
> > +						 link) {
> 
> hmm, seems this function is only called when you have no more forwarded
> IRQs, so isn't all of this completely dead (and unnecessary) code?
> 
> > +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> > +						KVM_VFIO_IRQ_SET_NORMAL);
> > +				if (ret < 0)
> > +					return ret;
> 
> you're returning an error code to a bool function, which means you'll
> return true when there was an error.  Is this your intention? ;)
> 
> if we have an error here, this would be a very very bad situation wouldn't it?
> 
> > +				list_del(&fwd_irq_iter->link);
> > +				kfree(fwd_irq_iter);
> > +			}
> > +			/* all IRQs could be deassigned */
> > +			list_del(&kvm_vdev_iter->node);
> > +			kvm_vfio_device_put_external_user(
> > +				kvm_vdev_iter->vfio_device);
> > +			kfree(kvm_vdev_iter);
> > +			removed = true;
> > +			break;
> > +		}
> > +	}
> > +	return removed;
> > +}
> > +
> > +
> > +/**
> > + * remove_fwd_irq - remove a forwarded irq
> > + *
> > + * @kv: kvm-vfio device
> > + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
> > + * irq_index: the index of the IRQ
> > + *
> > + * change the forwarded state of the IRQ, remove the IRQ from
> > + * the device forwarded IRQ list. In case it is the last one,
> > + * put the device
> > + */
> > +int remove_fwd_irq(struct kvm_vfio *kv,
> > +		   struct kvm_vfio_device *kvm_vdev,
> > +		   int irq_index)
> > +{
> > +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> > +	int ret = -1;
> > +
> > +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> > +				 &kvm_vdev->fwd_irq_list, link) {
> 
> hmmm, you can only forward one irq for a specific device once, right?
> And you already have a lookup function, so why not call that, and then
> remove it?
> 
> I'm confused.

Me too, this and the previous function need some more consideration.
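
As a rough illustration of "look it up, then remove it", here is a
small userspace sketch. It is not the patch's code: the list is a plain
singly linked list rather than the kernel's list_head, and the names
(sketch_find_link(), sketch_remove_irq(), sketch_add_irq()) are
hypothetical. It returns -ENOENT when there is no entry, as suggested
earlier in the review.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical forwarded-IRQ entry. */
struct sketch_irq {
	int index;
	struct sketch_irq *next;
};

/* Return the link that points at the matching entry, or NULL. */
static struct sketch_irq **sketch_find_link(struct sketch_irq **head,
					    int index)
{
	struct sketch_irq **link;

	for (link = head; *link; link = &(*link)->next)
		if ((*link)->index == index)
			return link;
	return NULL;
}

/* Push a new entry at the front of the list. */
static int sketch_add_irq(struct sketch_irq **head, int index)
{
	struct sketch_irq *irq = malloc(sizeof(*irq));

	if (!irq)
		return -ENOMEM;
	irq->index = index;
	irq->next = *head;
	*head = irq;
	return 0;
}

/* One lookup, one unlink: no open-coded loop in the caller. */
static int sketch_remove_irq(struct sketch_irq **head, int index)
{
	struct sketch_irq **link;
	struct sketch_irq *irq;

	assert(head != NULL);

	link = sketch_find_link(head, index);
	if (!link)
		return -ENOENT;

	irq = *link;
	*link = irq->next;
	free(irq);
	return 0;
}
```

The caller can then test whether the list became empty and drop the
device entry, without a second function that walks the same list again.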

> > +		if (fwd_irq_iter->index == irq_index) {
> > +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> > +						KVM_VFIO_IRQ_SET_NORMAL);
> > +			if (ret < 0)
> > +				break;
> > +			list_del(&fwd_irq_iter->link);
> > +			kfree(fwd_irq_iter);
> > +			ret = 0;
> > +			break;
> > +		}
> > +	}
> > +	if (list_empty(&kvm_vdev->fwd_irq_list))
> > +		remove_assigned_device(kv, kvm_vdev->vfio_device);
> > +
> > +	return ret;
> > +}
> > +
> > +/**
> > + * kvm_vfio_unforward - remove a forwarded IRQ
> > + * @kdev: the kvm device
> > + * @vdev: the vfio_device
> > + * @fwd_irq: user struct
> > + * after checking this IRQ effectively is forwarded, change its state,
> > + * remove it from the corresponding kvm_vfio_device list
> > + */
> > +static int kvm_vfio_unforward(struct kvm_device *kdev,
> > +				     struct vfio_device *vdev,
> > +				     struct kvm_arch_forwarded_irq *fwd_irq)
> > +{
> > +	struct kvm_vfio *kv = kdev->private;
> > +	struct kvm_vfio_device *kvm_vdev;
> > +	int ret;
> > +
> > +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
> > +	if (ret < 0)
> > +		return -EINVAL;
> 
> why do you override the return value?  Propagate it.
> 
> > +
> > +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
> > +	if (ret < 0)
> > +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
> > +			__func__, fwd_irq->fd, fwd_irq->index);
> > +	else
> > +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
> > +			  __func__, fwd_irq->fd, fwd_irq->index);
> 
> again with the kernel log here.
> 
> 
> 
> > +	return ret;
> > +}
> > +
> > +
> > +
> > +
> > +/**
> > + * kvm_vfio_set_device - the top function for interracting with a vfio
> 
>                                 top?             interacting
> 
> > + * device
> > + */
> 
> probably just skip this comment
> 
> > +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> > +{
> > +	struct kvm_vfio *kv = kdev->private;
> > +	struct vfio_device *vdev;
> > +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
> > +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> > +
> > +	switch (attr) {
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> > +		bool must_put;
> > +		int ret;
> > +
> > +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> > +			return -EFAULT;
> > +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> > +		if (IS_ERR(vdev))
> > +			return PTR_ERR(vdev);
> 
> seems like this whole block of code is replicated below, needs
> refactoring.
> 
> > +		mutex_lock(&kv->lock);
> > +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
> > +		if (must_put)
> > +			kvm_vfio_put_vfio_device(vdev);
> 
> this must_put looks plain weird.  I think you want to balance your
> get/put's always; can't you just get an extra reference in
> kvm_vfio_forward() ?

Yeah, this is very broken.  Every forwarded IRQ should hold a reference
to the vfio_device.  Every unforward should drop a reference.  Trying to
maintain a single reference is a non-goal.
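
A minimal userspace sketch of that rule, with a plain counter standing
in for the vfio_device reference and hypothetical names throughout
(sketch_dev_get(), sketch_dev_put(), sketch_handle_forward(),
sketch_handle_unforward()): the lookup reference is always balanced in
the caller, and each forwarded IRQ entry owns one reference of its own.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device with a bare reference counter. */
struct sketch_dev {
	int refcount;
};

static void sketch_dev_get(struct sketch_dev *d)
{
	d->refcount++;
}

static void sketch_dev_put(struct sketch_dev *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/*
 * arch_ret simulates the result of the arch callout so that both the
 * success and the failure path can be exercised.
 */
static int sketch_handle_forward(struct sketch_dev *d, int arch_ret)
{
	int ret = 0;

	sketch_dev_get(d);              /* reference from the fd lookup */
	sketch_dev_get(d);              /* reference owned by the new entry */
	if (arch_ret < 0) {
		sketch_dev_put(d);      /* the entry was never created */
		ret = arch_ret;
	}
	sketch_dev_put(d);              /* lookup reference always dropped */
	return ret;
}

static void sketch_handle_unforward(struct sketch_dev *d)
{
	sketch_dev_get(d);              /* reference from the fd lookup */
	sketch_dev_put(d);              /* reference owned by the entry */
	sketch_dev_put(d);              /* lookup reference */
}
```

Every get has a matching put in the same function, so no must_put style
out-parameter is needed to tell the caller what to release.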

> 
> > +		mutex_unlock(&kv->lock);
> > +		return ret;
> > +		}
> > +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
> > +		int ret;
> > +
> > +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> > +			return -EFAULT;
> > +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> > +		if (IS_ERR(vdev))
> > +			return PTR_ERR(vdev);
> > +
> > +		kvm_vfio_device_put_external_user(vdev);
> 
> you're dropping the reference to the device but referencing it in your
> unforward call below?
> 
> > +		mutex_lock(&kv->lock);
> > +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
> > +		mutex_unlock(&kv->lock);
> > +		return ret;
> > +	}
> > +#endif
> > +	default:
> > +		return -ENXIO;
> > +	}
> > +}
> > +
> > +/**
> > + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
> > + * @kv: kvm-vfio device
> > + *
> > + * loop on all got devices and their associated forwarded IRQs
> 
> 'loop on all got' ?
> 
> Restore the non-forwarded state for all registered devices and ...
> 
> > + * restore the non forwarded state, remove IRQs and their devices from
> > + * the respective list, put the vfio platform devices
> > + *
> > + * When this function is called, the vcpu already are destroyed. No
>                                     the VCPUs are already destroyed.
> > + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> > + * kvm_arch_set_fwd_state action
> 
> this last bit didn't make any sense to me.  Also, why are we referring
> to the vgic in generic code?
> 
> > + */
> > +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
> > +{
> > +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> > +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> > +
> > +	/* loop on all the assigned devices */
> 
> unnecessary comment
> 
> > +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> > +				 &kv->device_list, node) {
> > +
> > +		/* loop on all its forwarded IRQ */
> 
> same
> 
> > +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> > +					 &kvm_vdev_iter->fwd_irq_list, link) {
> > +			kvm_arch_set_fwd_state(fwd_irq_iter,
> > +						KVM_VFIO_IRQ_CLEANUP);
> > +			list_del(&fwd_irq_iter->link);
> > +			kfree(fwd_irq_iter);
> > +		}


Ugh, how many of these cleanup functions do we need?

> > +		list_del(&kvm_vdev_iter->node);
> > +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
> > +		kfree(kvm_vdev_iter);
> > +	}
> > +	return 0;
> > +}
> > +
> > +
> >  static int kvm_vfio_set_attr(struct kvm_device *dev,
> >  			     struct kvm_device_attr *attr)
> >  {
> >  	switch (attr->group) {
> >  	case KVM_DEV_VFIO_GROUP:
> >  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> > +	case KVM_DEV_VFIO_DEVICE:
> > +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> >  	}
> >  
> >  	return -ENXIO;
> > @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> >  		case KVM_DEV_VFIO_GROUP_DEL:
> >  			return 0;
> >  		}
> > -
> >  		break;
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > +	case KVM_DEV_VFIO_DEVICE:
> > +		switch (attr->attr) {
> > +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> > +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> > +			return 0;
> > +		}
> > +		break;
> > +#endif
> >  	}
> > -
> >  	return -ENXIO;
> >  }
> >  
> > @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> >  		list_del(&kvg->node);
> >  		kfree(kvg);
> >  	}
> > +	kvm_vfio_put_all_devices(kv);
> >  
> >  	kvm_vfio_update_coherency(dev);
> >  
> > @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> >  		return -ENOMEM;
> >  
> >  	INIT_LIST_HEAD(&kv->group_list);
> > +	INIT_LIST_HEAD(&kv->device_list);
> >  	mutex_init(&kv->lock);
> >  
> >  	dev->private = kv;
> > -- 
> > 1.9.1
> > 

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
  2014-09-11  3:10     ` Christoffer Dall
@ 2014-09-11  5:09       ` Alex Williamson
  -1 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-11  5:09 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Eric Auger, eric.auger, marc.zyngier, linux-arm-kernel, kvmarm,
	kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote:
> > On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
> > > This RFC proposes an integration of "ARM: Forwarding physical
> > > interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
> > > KVM.
> > > 
> > > It enables to transform a VFIO platform driver IRQ into a forwarded
> > > IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
> > > switch can be avoided on guest virtual IRQ completion. Before this
> > > patch, a maintenance IRQ was triggered on the virtual IRQ completion.
> > > 
> > > When the IRQ is forwarded, the VFIO platform driver does not need to
> > > disable the IRQ anymore. Indeed when returning from the IRQ handler
> > > the IRQ is not deactivated. Only its priority is lowered. This means
> > > the same IRQ cannot hit before the guest completes the virtual IRQ
> > > and the GIC automatically deactivates the corresponding physical IRQ.
> > > 
> > > Besides, the injection still is based on irqfd triggering. The only
> > > impact on irqfd process is resamplefd is not called anymore on
> > > virtual IRQ completion since this latter becomes "transparent".
> > > 
> > > The current integration is based on an extension of the KVM-VFIO
> > > device, previously used by KVM to interact with VFIO groups. The
> > > patch serie now enables KVM to directly interact with a VFIO
> > > platform device. The VFIO external API was extended for that purpose.
> > > 
> > > Th KVM-VFIO device can get/put the vfio platform device, check its
> > > integrity and type, get the IRQ number associated to an IRQ index.
> > > 
> > > The IRQ forward programming is architecture specific (virtual interrupt
> > > controller programming basically). However the whole infrastructure is
> > > kept generic.
> > > 
> > > from a user point of view, the functionality is provided through new
> > > KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ
> > > and the capability can be checked with KVM_HAS_DEVICE_ATTR.
> > > Assignment can only be changed when the physical IRQ is not active.
> > > It is the responsability of the user to do this check.
> > > 
> > > This patch serie has the following dependencies:
> > > - "ARM: Forwarding physical interrupts to a guest VM"
> > >   (http://lwn.net/Articles/603514/) in
> > > - [PATCH v3] irqfd for ARM
> > > - and obviously the VFIO platform driver serie:
> > >   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
> > >   https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> > > 
> > > Integrated pieces can be found at
> > > ssh://git.linaro.org/people/eric.auger/linux.git
> > > on branch 3.17rc3_irqfd_forward_integ_v2
> > > 
> > > This was was tested on Calxeda Midway, assigning the xgmac main IRQ.
> > > 
> > > v1 -> v2:
> > > - forward control is moved from architecture specific file into generic
> > >   vfio.c module.
> > >   only kvm_arch_set_fwd_state remains architecture specific
> > > - integrate Kim's patch which enables KVM-VFIO for ARM
> > > - fix vgic state bypass in vgic_queue_hwirq
> > > - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
> > >   to include/uapi/linux/kvm.h
> > >   also irq_index renamed into index and guest_irq renamed into gsi
> > > - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> > > - vfio_external_get_base_device renamed into vfio_external_base_device
> > > - vfio_external_get_type removed
> > > - kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
> > > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > > 
> > > Eric Auger (8):
> > >   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
> > >     IRQ
> > >   KVM: ARM: VGIC: add forwarded irq rbtree lock
> > >   VFIO: platform: handler tests whether the IRQ is forwarded
> > >   KVM: KVM-VFIO: update user API to program forwarded IRQ
> > >   VFIO: Extend external user API
> > >   KVM: KVM-VFIO: add new VFIO external API hooks
> > >   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding
> > >     control
> > >   KVM: KVM-VFIO: ARM forwarding control
> > > 
> > > Kim Phillips (1):
> > >   ARM: KVM: Enable the KVM-VFIO device
> > > 
> > >  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
> > >  arch/arm/include/asm/kvm_host.h            |   7 +
> > >  arch/arm/kvm/Kconfig                       |   1 +
> > >  arch/arm/kvm/Makefile                      |   4 +-
> > >  arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
> > >  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
> > >  drivers/vfio/vfio.c                        |  24 ++
> > >  include/kvm/arm_vgic.h                     |   1 +
> > >  include/linux/kvm_host.h                   |  27 ++
> > >  include/linux/vfio.h                       |   3 +
> > >  include/uapi/linux/kvm.h                   |   9 +
> > >  virt/kvm/arm/vgic.c                        |  59 +++-
> > >  virt/kvm/vfio.c                            | 497 ++++++++++++++++++++++++++++-
> > >  13 files changed, 733 insertions(+), 17 deletions(-)
> > >  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
> > > 
> > 
> > Have we ventured too far in the other direction?  I suppose what I was
> > hoping to see was something more like:
> > 
> > 	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> > 
> > 		/* get vfio_device */
> > 
> > 		/* get mutex */
> > 
> > 		/* verify device+irq isn't already forwarded */
> > 
> > 		/* allocate device/forwarded irq */
> > 
> > 		/* get struct device */
> > 
> > 		/* callout to arch code passing struct device, gsi, ... */
> > 
> > 		/* if success, add to kv, else free and error */
> > 
> > 		/* mutex unlock */
> > 	}
> 
> I think that's essentially what this patch set is trying to do, but
> there are just too many complicated intertwining cases right now that
> makes the code hard to read.
> 
> > 
> > Exposing the internal mutex out to arch code, as in v1, was an
> > indication that we were pushing too much out to arch code, but including
> > platform_device.h into virt/kvm/vfio.c tells me we're still not
> > abstracting at the right point.  Thanks,
> > 
> I raised my eyebrows over the platform device bus thingy here as well,
> but on the other hand, there's nothing ARM-specific about referring to
> the platform device bus.
> 
> I think perhaps it just has to be made more clear that the generic code
> deals with translating the device resources in the necessary way, and
> currently it only supports vfio-platform devices?

Ok, you're probably right, looking at it again it is closer than I
thought.  At the same time, the use of platform device in
virt/kvm/vfio.c is pointless and can easily be pushed out to the arch
code as just another error return case.  vfio.c doesn't need to be aware
of hwirq.  The rest of the code is just overly complicated, with three
different cleanup functions and validation function bloat.  Thanks,

Alex
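
A small userspace sketch of the split described above, under stated
assumptions: the generic side passes only the device and the IRQ index,
and the arch side resolves the hwirq and reports a missing IRQ as an
ordinary error. The names (struct sketch_device,
sketch_arch_set_forward(), sketch_generic_forward()) and the choice of
-ENXIO are hypothetical, not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device description, interpreted only by the arch side. */
struct sketch_device {
	const int *hwirqs;      /* physical IRQ number for each IRQ index */
	size_t nr_irqs;
};

/* Records what the arch side programmed, for inspection only. */
static int sketch_last_hwirq = -1;

/* Arch side: the index-to-hwirq translation lives here. */
static int sketch_arch_set_forward(const struct sketch_device *dev,
				   int index, int gsi)
{
	if (index < 0 || (size_t)index >= dev->nr_irqs)
		return -ENXIO;  /* no such IRQ: just another error return */

	(void)gsi;              /* real code would program the controller */
	sketch_last_hwirq = dev->hwirqs[index];
	return 0;
}

/* Generic side: never sees a hwirq, only propagates the result. */
static int sketch_generic_forward(const struct sketch_device *dev,
				  int index, int gsi)
{
	assert(dev != NULL);
	return sketch_arch_set_forward(dev, index, gsi);
}
```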


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 4/9] VFIO: platform: handler tests whether the IRQ is forwarded
  2014-09-11  3:10     ` Christoffer Dall
@ 2014-09-11  8:44       ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-11  8:44 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:43PM +0200, Eric Auger wrote:
>> In case the IRQ is forwarded, the VFIO platform IRQ handler does not
>> need to disable the IRQ anymore. In that mode, when the handler completes
> 
> add a comma after completes
Hi Christoffer,
ok
> 
>> the IRQ is not deactivated but only its priority is lowered.
>>
>> Some other actor (typically a guest) is supposed to deactivate the IRQ,
>> allowing at that time a new physical IRQ to hit.
>>
>> In virtualization use case, the physical IRQ is automatically completed
>> by the interrupt controller when the guest completes the corresponding
>> virtual IRQ.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>>  drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
>> index 6768508..1f851b2 100644
>> --- a/drivers/vfio/platform/vfio_platform_irq.c
>> +++ b/drivers/vfio/platform/vfio_platform_irq.c
>> @@ -88,13 +88,18 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
>>  	struct vfio_platform_irq *irq_ctx = dev_id;
>>  	unsigned long flags;
>>  	int ret = IRQ_NONE;
>> +	struct irq_data *d;
>> +	bool is_forwarded;
>>  
>>  	spin_lock_irqsave(&irq_ctx->lock, flags);
>>  
>>  	if (!irq_ctx->masked) {
>>  		ret = IRQ_HANDLED;
>> +		d = irq_get_irq_data(irq_ctx->hwirq);
>> +		is_forwarded = irqd_irq_forwarded(d);
>>  
>> -		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) {
>> +		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED &&
>> +						!is_forwarded) {
>>  			disable_irq_nosync(irq_ctx->hwirq);
>>  			irq_ctx->masked = true;
>>  		}
>> -- 
>> 1.9.1
>>
> It makes sense that this all needs to be controlled in the kernel, but
> I'm wondering if it would be cleaner / more correct to clear the
> AUTOMASKED flag when the IRQ is forwarded and have vfio refuse setting
> this flag as long as the irq is forwarded?

If I am not mistaken, even if the user sets AUTOMASKED, the vfio platform
driver never uses that value. AUTOMASKED is only set internally by the
driver, at init time, for level-sensitive IRQs.

It seems to be the same on PCI (for INTx). I do not see the user flag
copied into local storage anywhere. But I prefer to be careful ;-)

If confirmed, although the flag value is exposed in the user API, the
user-supplied value is never used, which removes the need for the check.

Since forwarding is currently fully dynamic, I would need to update
irq_ctx->flags on each vfio_irq_handler call. I don't know if that is
better?
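
To make the two alternatives concrete, here is a userspace sketch, not
the driver code. SKETCH_AUTOMASKED stands in for
VFIO_IRQ_INFO_AUTOMASKED, and struct sketch_irq_ctx and the three
functions are hypothetical. Option A tests the forwarded state on every
interrupt, as the patch does; option B flips the flag when the forwarding
state changes, so the handler only tests the flag.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_AUTOMASKED (1u << 0)     /* hypothetical flag bit */

/* Hypothetical per-IRQ context. */
struct sketch_irq_ctx {
	unsigned int flags;
	bool masked;
};

/* Option A: query the forwarded state on every interrupt. */
static bool sketch_should_mask_checked(const struct sketch_irq_ctx *ctx,
				       bool forwarded)
{
	return !ctx->masked && (ctx->flags & SKETCH_AUTOMASKED) && !forwarded;
}

/* Option B: update the flag once, when the forwarding state changes. */
static void sketch_set_forwarded(struct sketch_irq_ctx *ctx, bool forwarded,
				 bool level_sensitive)
{
	if (forwarded)
		ctx->flags &= ~SKETCH_AUTOMASKED;
	else if (level_sensitive)
		ctx->flags |= SKETCH_AUTOMASKED;
}

/* With option B the handler only needs the flag. */
static bool sketch_should_mask_flag_only(const struct sketch_irq_ctx *ctx)
{
	return !ctx->masked && (ctx->flags & SKETCH_AUTOMASKED);
}
```

Option B moves the cost out of the handler, but it requires a hook at
every forward and unforward, which is the part that does not exist while
the forwarded state is tracked only by the irqchip.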

Best Regards

Eric


> 
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 5/9] KVM: KVM-VFIO: update user API to program forwarded IRQ
  2014-09-11  3:10     ` Christoffer Dall
@ 2014-09-11  8:49       ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-11  8:49 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:44PM +0200, Eric Auger wrote:
>> add new device group commands:
>> - KVM_DEV_VFIO_DEVICE_FORWARD_IRQ and
>>   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ
>>
>> which enable to turn forwarded IRQ mode on/off.
>>
>> the kvm_arch_forwarded_irq struct embodies a forwarded IRQ
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v1 -> v2:
>> - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
>>   to include/uapi/linux/kvm.h
>>   also irq_index renamed into index and guest_irq renamed into gsi
>> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
>> ---
>>  Documentation/virtual/kvm/devices/vfio.txt | 26 ++++++++++++++++++++++++++
>>  include/uapi/linux/kvm.h                   |  9 +++++++++
>>  2 files changed, 35 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
>> index ef51740..048baa0 100644
>> --- a/Documentation/virtual/kvm/devices/vfio.txt
>> +++ b/Documentation/virtual/kvm/devices/vfio.txt
>> @@ -13,6 +13,7 @@ VFIO-group is held by KVM.
>>  
>>  Groups:
>>    KVM_DEV_VFIO_GROUP
>> +  KVM_DEV_VFIO_DEVICE
>>  
>>  KVM_DEV_VFIO_GROUP attributes:
>>    KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
>> @@ -20,3 +21,28 @@ KVM_DEV_VFIO_GROUP attributes:
>>  
>>  For each, kvm_device_attr.addr points to an int32_t file descriptor
>>  for the VFIO group.
>> +
>> +KVM_DEV_VFIO_DEVICE attributes:
>> +  KVM_DEV_VFIO_DEVICE_FORWARD_IRQ
>> +  KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ
>> +
>> +For each, kvm_device_attr.addr points to a kvm_arch_forwarded_irq struct.
>> +This user API makes possible to create a special IRQ handling mode,
> 
>   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ enables a special IRQ handling mode on
>   hardware that supports it,
OK
> 
>> +where KVM and a VFIO platform driver collaborate to improve IRQ
>> +handling performance.
>> +
>> +'fd represents the file descriptor of a valid VFIO device whose physical
> 
> fd is described out of context here.  Can you copy the struct definition
> into this document, perhaps right after the "For each, ..." line above.
yes sure
> 
>> +IRQ, referenced by its index, is injected into the VM guest irq (gsi).
>                                              as a virtual IRQ (specified
> 					     by the gsi field) into the
> 					     VM.
> 
>> +
>> +On FORWARD_IRQ, KVM-VFIO device programs:
>    When setting the  KVM_DEV_VFIO_DEVICE_FORWARD_IRQ attribute, the
>    KVM-VFIO device tells the host (or VFIO?) to not complete the
>    physical IRQ, and instead ensures that KVM (or the VM) completes the
>    physical IRQ.
> 
>> +- the host, to not complete the physical IRQ itself.
>> +- the GIC, to automatically complete the physical IRQ when the guest
>> +  completes the virtual IRQ.
> 
> and drop this bullet form.
ok
> 
>> +This avoids trapping the end-of-interrupt for level sensitive IRQ.
> 
> avoid this last line, it's specific to ARM.
ok
> 
>> +
>> +On UNFORWARD_IRQ, one returns to the mode where the host completes the
>    When setting the KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ attribute, the
>    host (VFIO?) will again complete the physical IRQ and KVM will not...
>  
>> +physical IRQ and the guest completes the virtual IRQ.
>> +
>> +It is up to the caller of this API to make sure the IRQ is not
>> +outstanding when the FORWARD/UNFORWARD is called. This could lead to
> 
> outstanding? can you be specific?
active? and I should add *physical* IRQ
> 
> don't refer to FORWARD/UNFORWARD, either refer to these attributes by
> their full name or use a clear reference in proper English.
ok
> 
>> +some inconsistency on who is going to complete the IRQ.
> 
> This sounds like the whole thing is fragile and if userspace doesn't do
> things right, IRQ handling of a piece of hardware is going to be
> inconsistent?  Is this the case?  If so, we need some stronger
> semantics.  If not, this should be rephrased.
Actually the KVM-VFIO device rejects any attempt to change the
forwarding mode while the physical IRQ is active, so I hope this is
robust. I will change the explanation.
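
As an illustration of that rule, a minimal sketch of the check could look
like the following. This is not the code from the series: the structure, the
function name and the -EBUSY return value are all assumptions made for the
example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-IRQ state, reduced to what the check needs. */
struct sketch_fwd_state {
	bool forwarded;		/* current forwarding mode */
	bool phys_active;	/* physical IRQ currently active */
};

/*
 * Refuse any change of forwarding mode while the physical IRQ is
 * active. -EBUSY is an assumption of this sketch, not a value taken
 * from the patch.
 */
int sketch_set_forwarding(struct sketch_fwd_state *irq, bool forward)
{
	if (irq->phys_active)
		return -EBUSY;

	irq->forwarded = forward;
	return 0;
}
```

The caller then sees an error instead of ending up with an IRQ whose
completion is left ambiguous between the host and the guest.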

Thanks

Eric
> 
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index cf3a2ff..8cd7b0e 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -947,6 +947,12 @@ struct kvm_device_attr {
>>  	__u64	addr;		/* userspace address of attr data */
>>  };
>>  
>> +struct kvm_arch_forwarded_irq {
>> +	__u32 fd; /* file descriptor of the VFIO device */
>> +	__u32 index; /* VFIO device IRQ index */
>> +	__u32 gsi; /* gsi, ie. virtual IRQ number */
>> +};
>> +
>>  #define KVM_DEV_TYPE_FSL_MPIC_20	1
>>  #define KVM_DEV_TYPE_FSL_MPIC_42	2
>>  #define KVM_DEV_TYPE_XICS		3
>> @@ -954,6 +960,9 @@ struct kvm_device_attr {
>>  #define  KVM_DEV_VFIO_GROUP			1
>>  #define   KVM_DEV_VFIO_GROUP_ADD			1
>>  #define   KVM_DEV_VFIO_GROUP_DEL			2
>> +#define  KVM_DEV_VFIO_DEVICE			2
>> +#define   KVM_DEV_VFIO_DEVICE_FORWARD_IRQ			1
>> +#define   KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ			2
>>  #define KVM_DEV_TYPE_ARM_VGIC_V2	5
>>  #define KVM_DEV_TYPE_FLIC		6
>>  
>> -- 
>> 1.9.1
>>
> 
> Thanks,
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 6/9] VFIO: Extend external user API
  2014-09-11  3:10     ` Christoffer Dall
@ 2014-09-11  8:50       ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-11  8:50 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:45PM +0200, Eric Auger wrote:
>> New functions are added to be called from ARM KVM-VFIO device.
> 
> This commit message seems somewhat random.  This patch doesn't deal with
> anything ARM specific, it introduces some generic functions that allow
> users external to vfio itself to retrieve information about a vfio
> platform device.

Yes you're right.
> 
>>
>> - vfio_device_get_external_user enables to get a vfio device from
>>   its fd
>> - vfio_device_put_external_user puts the vfio device
>> - vfio_external_base_device returns the struct device*,
>>   useful to access the platform_device
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v1 -> v2:
>>
>> - vfio_external_get_base_device renamed into vfio_external_base_device
>> - vfio_external_get_type removed
>> ---
>>  drivers/vfio/vfio.c  | 24 ++++++++++++++++++++++++
>>  include/linux/vfio.h |  3 +++
>>  2 files changed, 27 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>> index 8e84471..282814e 100644
>> --- a/drivers/vfio/vfio.c
>> +++ b/drivers/vfio/vfio.c
>> @@ -1401,6 +1401,30 @@ void vfio_group_put_external_user(struct vfio_group *group)
>>  }
>>  EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
>>  
>> +struct vfio_device *vfio_device_get_external_user(struct file *filep)
>> +{
>> +	struct vfio_device *vdev = filep->private_data;
>> +
>> +	if (filep->f_op != &vfio_device_fops)
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	vfio_device_get(vdev);
>> +	return vdev;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_device_get_external_user);
>> +
>> +void vfio_device_put_external_user(struct vfio_device *vdev)
>> +{
>> +	vfio_device_put(vdev);
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_device_put_external_user);
>> +
>> +struct device *vfio_external_base_device(struct vfio_device *vdev)
>> +{
>> +	return vdev->dev;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_external_base_device);
>> +
>>  int vfio_external_user_iommu_id(struct vfio_group *group)
>>  {
>>  	return iommu_group_id(group->iommu_group);
>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>> index ffe04ed..bd4b6cb 100644
>> --- a/include/linux/vfio.h
>> +++ b/include/linux/vfio.h
>> @@ -99,6 +99,9 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
>>  extern int vfio_external_user_iommu_id(struct vfio_group *group);
>>  extern long vfio_external_check_extension(struct vfio_group *group,
>>  					  unsigned long arg);
>> +extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
>> +extern void vfio_device_put_external_user(struct vfio_device *vdev);
>> +extern struct device *vfio_external_base_device(struct vfio_device *vdev);
>>  
>>  struct pci_dev;
>>  #ifdef CONFIG_EEH
>> -- 
>> 1.9.1
>>
> Looks good to me,
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 7/9] KVM: KVM-VFIO: add new VFIO external API hooks
  2014-09-11  3:10     ` Christoffer Dall
@ 2014-09-11  8:51       ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-11  8:51 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:46PM +0200, Eric Auger wrote:
>> add functions that implement the gateway to the extended
> 
> Capital letter when beginning a new sentence.  Also the reference to
> 'the extended VFIO API' feels a bit weird.  Can't you make your commit
> message a little more descriptive of this patch, something along the
> lines of:
> 
> Provide wrapper functions that allow KVM-VFIO device code to get an
> external handle on a struct vfio_device based on a vfio device file
> descriptor.  We provide this through three new functions:
> 
> (assuming I got this right).
sure
> 
> 
> 
>> external VFIO API:
>> - kvm_vfio_device_get_external_user
>> - kvm_vfio_device_put_external_user
>> - kvm_vfio_external_base_device
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v1 -> v2:
>> - kvm_vfio_external_get_base_device renamed into
>>   kvm_vfio_external_base_device
>> - kvm_vfio_external_get_type removed
>> ---
>>  arch/arm/include/asm/kvm_host.h |  5 +++++
>>  virt/kvm/vfio.c                 | 45 +++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 50 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 6dfb404..1aee6bb 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -171,6 +171,11 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
>>  unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
>>  int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
>>  
>> +struct vfio_device;
>> +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep);
>> +void kvm_vfio_device_put_external_user(struct vfio_device *vdev);
>> +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev);
>> +
>>  /* We do not have shadow page tables, hence the empty hooks */
>>  static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
>>  {
>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>> index ba1a93f..76dc7a1 100644
>> --- a/virt/kvm/vfio.c
>> +++ b/virt/kvm/vfio.c
>> @@ -59,6 +59,51 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
>>  	symbol_put(vfio_group_put_external_user);
>>  }
>>  
>> +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
>> +{
>> +	struct vfio_device *vdev;
>> +	struct vfio_device *(*fn)(struct file *);
>> +
>> +	fn = symbol_get(vfio_device_get_external_user);
>> +	if (!fn)
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	vdev = fn(filep);
>> +
>> +	symbol_put(vfio_device_get_external_user);
>> +
>> +	return vdev;
>> +}
>> +
>> +void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
>> +{
>> +	void (*fn)(struct vfio_device *);
>> +
>> +	fn = symbol_get(vfio_device_put_external_user);
>> +	if (!fn)
>> +		return;
>> +
>> +	fn(vdev);
>> +
>> +	symbol_put(vfio_device_put_external_user);
>> +}
>> +
>> +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
>> +{
>> +	struct device *(*fn)(struct vfio_device *);
>> +	struct device *dev;
>> +
>> +	fn = symbol_get(vfio_external_base_device);
>> +	if (!fn)
>> +		return NULL;
>> +
>> +	dev = fn(vdev);
>> +
>> +	symbol_put(vfio_external_base_device);
>> +
>> +	return dev;
>> +}
>> +
>>  static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
>>  {
>>  	long (*fn)(struct vfio_group *, unsigned long);
>> -- 
>> 1.9.1
>>
> 
> otherwise looks good to me!
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11  3:10     ` Christoffer Dall
@ 2014-09-11  9:35       ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-11  9:35 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
>> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
>>
>> This is a new control channel which enables KVM to cooperate with
>> viable VFIO devices.
>>
>> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
>> in addition to a list of groups (kvm_vfio_group). The new
>> infrastructure enables to check the validity of the VFIO device
>> file descriptor, get and hold a reference to it.
>>
>> The first concrete implemented command is IRQ forward control:
>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>>
>> It consists in programming the VFIO driver and KVM in a consistent manner
>> so that an optimized IRQ injection/completion is set up. Each
>> kvm_vfio_device holds a list of forwarded IRQ. When putting a
>> kvm_vfio_device, the implementation makes sure the forwarded IRQs
>> are set again in the normal handling state (non forwarded).
> 
> 'putting a kvm_vfio_device' sounds to me like you're golfing :)
> 
> When a kvm_vfio_device is released?
sure
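
For reference, the release behaviour described in the commit message can be
sketched as below. This is a simplified user-space model with hypothetical
names: a singly linked list stands in for the kernel list_head, and the
arch-specific call that restores normal handling is reduced to clearing a
flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified forwarded-IRQ entry. */
struct sketch_fwd_irq {
	struct sketch_fwd_irq *next;
	unsigned int index;	/* platform device IRQ index */
	bool forwarded;
};

/* Hypothetical stand-in for kvm_vfio_device and its fwd_irq_list. */
struct sketch_kvm_vfio_device {
	struct sketch_fwd_irq *fwd_irqs;
};

/*
 * On release, put every forwarded IRQ back into the normal
 * (non-forwarded) state and empty the list. Returns the number of
 * IRQs that were restored.
 */
unsigned int sketch_release_device(struct sketch_kvm_vfio_device *dev)
{
	unsigned int restored = 0;
	struct sketch_fwd_irq *irq;

	for (irq = dev->fwd_irqs; irq; irq = irq->next) {
		if (irq->forwarded) {
			irq->forwarded = false;	/* arch-specific unforward would go here */
			restored++;
		}
	}
	dev->fwd_irqs = NULL;
	return restored;
}
```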
> 
>>
>> The forwarding programming is architecture specific, embodied by the
>> kvm_arch_set_fwd_state function. Its implementation is given in a
>> separate patch file.
> 
> I would drop the last sentence and instead indicate that this is handled
> properly when the architecture does not support such a feature.
ok
> 
>>
>> The forwarding control modality is enabled by the
>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v1 -> v2:
>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> - original patch file separated into 2 parts: generic part moved in vfio.c
>>   and ARM specific part(kvm_arch_set_fwd_state)
>> ---
>>  include/linux/kvm_host.h |  27 +++
>>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 477 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index a4c33b3..24350dc 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
>>  		      unsigned long arg);
>>  };
>>  
>> +enum kvm_fwd_irq_action {
>> +	KVM_VFIO_IRQ_SET_FORWARD,
>> +	KVM_VFIO_IRQ_SET_NORMAL,
>> +	KVM_VFIO_IRQ_CLEANUP,
> 
> This is KVM internal API, so it would probably be good to document this.
> Especially the CLEANUP bit worries me, see below.
I will document it
> 
>> +};
>> +
>> +/* internal structure describing a forwarded IRQ */
>> +struct kvm_fwd_irq {
>> +	struct list_head link;
> 
> this list entry is local to the kvm vfio device, right? that means you
> probably want a struct with just the below fields, and then have a
> containing struct in the generic device file, private to its logic.
I will introduce 2 separate structs
> 
>> +	__u32 index; /* platform device irq index */
>> +	__u32 hwirq; /* physical IRQ */
>> +	__u32 gsi; /* virtual IRQ */
>> +	struct kvm_vcpu *vcpu; /* vcpu to inject into */
>> +};
>> +
>>  void kvm_device_get(struct kvm_device *dev);
>>  void kvm_device_put(struct kvm_device *dev);
>>  struct kvm_device *kvm_device_from_filp(struct file *filp);
>> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
>>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>>  extern struct kvm_device_ops kvm_flic_ops;
>>  
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> 
> what's the 'p' in pfwd?
will rename
> 
>> +			   enum kvm_fwd_irq_action action);
>> +
>> +#else
>> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
>> +					 enum kvm_fwd_irq_action action)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>  
>>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>> index 76dc7a1..e4a81c4 100644
>> --- a/virt/kvm/vfio.c
>> +++ b/virt/kvm/vfio.c
>> @@ -18,14 +18,24 @@
>>  #include <linux/slab.h>
>>  #include <linux/uaccess.h>
>>  #include <linux/vfio.h>
>> +#include <linux/platform_device.h>
>>  
>>  struct kvm_vfio_group {
>>  	struct list_head node;
>>  	struct vfio_group *vfio_group;
>>  };
>>  
>> +struct kvm_vfio_device {
>> +	struct list_head node;
>> +	struct vfio_device *vfio_device;
>> +	/* list of forwarded IRQs for that VFIO device */
>> +	struct list_head fwd_irq_list;
>> +	int fd;
>> +};
>> +
>>  struct kvm_vfio {
>>  	struct list_head group_list;
>> +	struct list_head device_list;
>>  	struct mutex lock;
>>  	bool noncoherent;
>>  };
>> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>  	return -ENXIO;
>>  }
>>  
>> +/**
>> + * get_vfio_device - returns the vfio-device corresponding to this fd
>> + * @fd:fd of the vfio platform device
>> + *
>> + * checks it is a vfio device
>> + * increment its ref counter
> 
> why the short lines?  Just write this out in proper English.
OK
> 
>> + */
>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
>> +{
>> +	struct fd f;
>> +	struct vfio_device *vdev;
>> +
>> +	f = fdget(fd);
>> +	if (!f.file)
>> +		return NULL;
>> +	vdev = kvm_vfio_device_get_external_user(f.file);
>> +	fdput(f);
>> +	return vdev;
>> +}
>> +
>> +/**
>> + * put_vfio_device: put the vfio platform device
>> + * @vdev: vfio_device to put
>> + *
>> + * decrement the ref counter
>> + */
>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
>> +{
>> +	kvm_vfio_device_put_external_user(vdev);
>> +}
>> +
>> +/**
>> + * kvm_vfio_find_device - look for the device in the assigned
>> + * device list
>> + * @kv: the kvm-vfio device
>> + * @vdev: the vfio_device to look for
>> + *
>> + * returns the associated kvm_vfio_device if the device is known,
>> + * meaning at least 1 IRQ is forwarded for this device.
>> + * if the device is not registered, returns NULL.
>> + */
> 
> are these functions meant to be exported?  Otherwise they should be
> static, and the documentation on these simple list iteration wrappers
> seems like overkill imho.
could be static indeed
> 
>> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
>> +					     struct vfio_device *vdev)
>> +{
>> +	struct kvm_vfio_device *kvm_vdev_iter;
>> +
>> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
>> +		if (kvm_vdev_iter->vfio_device == vdev)
>> +			return kvm_vdev_iter;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +/**
>> + * kvm_vfio_find_irq - look for an irq in the device IRQ list
>> + * @kvm_vdev: the kvm_vfio_device
>> + * @irq_index: irq index
>> + *
>> + * returns the forwarded irq struct if it exists, NULL in the negative
>> + */
>> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
>> +				      int irq_index)
>> +{
>> +	struct kvm_fwd_irq *fwd_irq_iter;
>> +
>> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
>> +		if (fwd_irq_iter->index == irq_index)
>> +			return fwd_irq_iter;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +/**
>> + * validate_forward - checks whether forwarding a given IRQ is meaningful
>> + * @vdev:  vfio_device the IRQ belongs to
>> + * @fwd_irq: user struct containing the irq_index to forward
>> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
>> + * kvm_vfio_device that holds it
>> + * @hwirq: irq number the irq index corresponds to
>> + *
>> + * checks the vfio-device is a platform vfio device
>> + * checks the irq_index corresponds to an actual hwirq and
>> + * checks this hwirq is not already forwarded
>> + * returns < 0 on following errors:
>> + * not a platform device, bad irq index, already forwarded
>> + */
>> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
>> +			    struct vfio_device *vdev,
>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
>> +			    struct kvm_vfio_device **kvm_vdev,
>> +			    int *hwirq)
>> +{
>> +	struct device *dev = kvm_vfio_external_base_device(vdev);
>> +	struct platform_device *platdev;
>> +
>> +	*hwirq = -1;
>> +	*kvm_vdev = NULL;
>> +	if (strcmp(dev->bus->name, "platform") == 0) {
>> +		platdev = to_platform_device(dev);
>> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
>> +		if (*hwirq < 0) {
>> +			kvm_err("%s incorrect index\n", __func__);
>> +			return -EINVAL;
>> +		}
>> +	} else {
>> +		kvm_err("%s not a platform device\n", __func__);
>> +		return -EINVAL;
>> +	}
> 
> need some spacing here, also, I would turn this around, first check if
> the strcmp fails, and then error out, then do your next check etc., to
> avoid so many nested statements.
ok
> 
>> +	/* is a ref to this device already owned by the KVM-VFIO device? */
> 
> this comment is not particularly helpful in its current form, it would
> be helpful if you specified that we're checking whether that particular
> device/irq combo is already registered.
> 
>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
>> +	if (*kvm_vdev) {
>> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
>> +			kvm_err("%s irq %d already forwarded\n",
>> +				__func__, *hwirq);
> 
> don't flood the kernel log because of a user error, just allocate an
> error code for this purpose and document it in the ABI, -EEXIST or
> something.
ok
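To make the two suggestions above concrete (fail fast instead of nesting, and
a dedicated error code instead of a log message), here is a minimal userspace
sketch. It is not the kernel code: `struct fake_dev`, `MAX_IRQS` and
`validate_forward()` are hypothetical stand-ins, and only the errno values
come from the review.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAX_IRQS 8	/* hypothetical upper bound, for the sketch only */

/* Hypothetical stand-in for a device known to the kvm-vfio device. */
struct fake_dev {
	const char *bus_name;		/* e.g. "platform" */
	int nr_irqs;			/* number of valid IRQ indexes */
	bool forwarded[MAX_IRQS];	/* true if that index is forwarded */
};

/* Each check returns as soon as it fails, so nothing needs to be nested. */
static int validate_forward(const struct fake_dev *dev, int index)
{
	if (strcmp(dev->bus_name, "platform") != 0)
		return -EINVAL;		/* not a platform device */

	if (index < 0 || index >= dev->nr_irqs || index >= MAX_IRQS)
		return -EINVAL;		/* bad IRQ index */

	if (dev->forwarded[index])
		return -EEXIST;		/* device/IRQ combo already registered */

	return 0;
}
```

The caller can then propagate the return value unchanged, and userspace can
tell "already forwarded" apart from "invalid argument" without reading the
kernel log.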
> 
>> +			return -EINVAL;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +/**
>> + * validate_unforward: check a deassignment is meaningful
>> + * @kv: the kvm_vfio device
>> + * @vdev: the vfio_device whose irq to deassign belongs to
>> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
>> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
>> + * it exists
>> + *
>> + * returns 0 if the provided irq effectively is forwarded
>> + * (a ref to this vfio_device is hold and this irq belongs to
>                                     held
>> + * the forwarded irq of this device)
>> + * returns -EINVAL in the negative
> 
>                ENOENT should be returned if you don't have an entry.
> 	       EINVAL could be used if you supply an fd that isn't a
> 	       VFIO device file descriptor, for example.  Again,
> 	       consider documenting all this in the API.
> 
>> + */
>> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
>> +			      struct vfio_device *vdev,
>> +			      struct kvm_arch_forwarded_irq *fwd_irq,
>> +			      struct kvm_vfio_device **kvm_vdev)
>> +{
>> +	struct kvm_fwd_irq *pfwd;
>> +
>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
>> +	if (!*kvm_vdev) {
>> +		kvm_err("%s no forwarded irq for this device\n", __func__);
> 
> don't flood the kernel log
ok
> 
>> +		return -EINVAL;
>> +	}
>> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
>> +	if (!pfwd) {
>> +		kvm_err("%s irq %d is not forwarded\n",
>> +			__func__, fwd_irq->index);
> 
> 
>> +		return -EINVAL;
> 
> same here
ok
> 
>> +	}
>> +	return 0;
>> +}
>> +
>> +/**
>> + * kvm_vfio_forward - set a forwarded IRQ
>> + * @kdev: the kvm device
>> + * @vdev: the vfio device the IRQ belongs to
>> + * @fwd_irq: the user struct containing the irq_index and guest irq
>> + * @must_put: tells the caller whether the vfio_device must be put after
>> + * the call (ref must be released in case a ref onto this device was
>> + * already hold or in case of new device and failure)
>> + *
>> + * validate the injection, activate forward and store the information
>       Validate
>> + * about which irq and which device is concerned so that on deassign or
>> + * kvm-vfio destruction everuthing can be cleaned up.
>                            everything
> 
> I'm not sure I understand this explanation.  Do we have concerned
> devices?
> 
> I think you want to say something along the lines of: If userspace passed
> a valid vfio device and irq handle and the architecture supports
> forwarding this combination, register the vfio_device and irq
> combination in the ....
ok
> 
>> + */
>> +static int kvm_vfio_forward(struct kvm_device *kdev,
>> +			    struct vfio_device *vdev,
>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
>> +			    bool *must_put)
>> +{
>> +	int ret;
>> +	struct kvm_fwd_irq *pfwd = NULL;
>> +	struct kvm_vfio_device *kvm_vdev = NULL;
>> +	struct kvm_vfio *kv = kdev->private;
>> +	int hwirq;
>> +
>> +	*must_put = true;
>> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
>> +					&kvm_vdev, &hwirq);
>> +	if (ret < 0)
>> +		return -EINVAL;
>> +
>> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
> 
> seems a bit pointless to zero-out the memory if you're setting all
> fields below.
ok
> 
>> +	if (!pfwd)
>> +		return -ENOMEM;
>> +	pfwd->index = fwd_irq->index;
>> +	pfwd->gsi = fwd_irq->gsi;
>> +	pfwd->hwirq = hwirq;
>> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
>> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
>> +	if (ret < 0) {
>> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> 
> this whole thing feels incredibly broken to me.  Setting a forward
> should either work or not work, not something in between that leaves
> something to be cleaned up.  Why this two-stage thingy here?
I wanted to exploit the return value of vgic_map_phys_irq which is
likely to fail if the phys/virt mapping exists at VGIC level.

I already validated the injection from a KVM_VFIO_DEVICE point of view
(the device/irq is not known internally). But what if another external
component - which does not exist yet - maps the IRQ at VGIC level? Maybe
I need to replace the existing validation check by querying the VGIC at
low level instead of checking KVM-VFIO local variables.
> 
>> +		kfree(pfwd);
> 
> probably want to move your free-and-return-error to the end of the
> function.
ok
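For reference, the free-and-return-error layout suggested above usually looks
like the following. This is a userspace sketch under stated assumptions:
`struct fwd`, `forward()` and the `arch_ret` parameter (which simulates the
result of the architecture hook) are hypothetical, and `malloc()`/`free()`
stand in for the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the forwarded IRQ descriptor. */
struct fwd {
	int index;
};

/* arch_ret simulates what the architecture specific hook would return. */
static int forward(int arch_ret, struct fwd **out)
{
	struct fwd *f;
	int ret;

	f = malloc(sizeof(*f));
	if (!f)
		return -ENOMEM;
	f->index = 0;

	ret = arch_ret;
	if (ret < 0)
		goto out_free;	/* single error exit at the end */

	*out = f;		/* success: ownership passes to the caller */
	return 0;

out_free:
	free(f);
	return ret;
}
```

With a single exit label, any later failure path (for instance the second
allocation) can jump to the same place instead of repeating the free.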
> 
>> +		return ret;
>> +	}
>> +
>> +	if (!kvm_vdev) {
>> +		/* create & insert the new device and keep the ref */
>> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
> 
> again, no need for zeroing out the memory.
ok
> 
>> +		if (!kvm_vdev) {
>> +			kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_NORMAL);
>> +			kfree(pfwd);
>> +			return -ENOMEM;
>> +		}
>> +
>> +		kvm_vdev->vfio_device = vdev;
>> +		kvm_vdev->fd = fwd_irq->fd;
>> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
>> +		list_add(&kvm_vdev->node, &kv->device_list);
>> +		/*
>> +		 * the only case where we keep the ref:
>> +		 * new device and forward setting successful
>> +		 */
>> +		*must_put = false;
>> +	}
>> +
>> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
>> +
>> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
>> +	fwd_irq->fd, hwirq, fwd_irq->gsi);
> 
> please indent this to align with the opening parenthesis.
ok
> 
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * remove_assigned_device - put a given device from the list
> 
> this isn't a 'put', at least not *just* a put.
correct, I will rephrase
> 
>> + * @kv: the kvm-vfio device
>> + * @vdev: the vfio-device to remove
>> + *
>> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
>> + * remove the corresponding kvm_vfio_device from the assigned device
>> + * list.
>> + * returns true if the device could be removed, false in the negative
>> + */
>> +bool remove_assigned_device(struct kvm_vfio *kv,
>> +			    struct vfio_device *vdev)
>> +{
>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>> +	bool removed = false;
>> +	int ret;
>> +
>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
>> +				 &kv->device_list, node) {
>> +		if (kvm_vdev_iter->vfio_device == vdev) {
>> +			/* loop on all its forwarded IRQ */
>> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>> +						 &kvm_vdev_iter->fwd_irq_list,
>> +						 link) {
> 
> hmm, seems this function is only called when you have no more forwarded
> IRQs, so isn't all of this completely dead (and unnecessary) code?
yep I can simplify all that cleanup
> 
>> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
>> +						KVM_VFIO_IRQ_SET_NORMAL);
>> +				if (ret < 0)
>> +					return ret;
> 
> you're returning an error code to a bool function, which means you'll
> return true when there was an error.  Is this your intention? ;)
definitely not!
> 
> if we have an error here, this would be a very very bad situation wouldn't it?
sure. I will simplify this, transform kvm_arch_set_fwd_state into a void
function
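The bool/error-code mix-up is easy to reproduce outside the kernel. The two
helpers below are hypothetical and only illustrate the conversion rule: any
nonzero value returned from a bool function becomes true.

```c
#include <errno.h>
#include <stdbool.h>

/* Buggy shape: the negative errno is converted to bool, i.e. to true. */
static bool remove_buggy(int arch_ret)
{
	if (arch_ret < 0)
		return arch_ret;	/* nonzero, so the caller sees true */
	return true;
}

/* One possible fix: report the failure explicitly as false. */
static bool remove_fixed(int arch_ret)
{
	if (arch_ret < 0)
		return false;
	return true;
}
```

Making the architecture hook return void, as proposed above, removes the
failure path altogether, which is another way out of the same problem.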
> 
>> +				list_del(&fwd_irq_iter->link);
>> +				kfree(fwd_irq_iter);
>> +			}
>> +			/* all IRQs could be deassigned */
>> +			list_del(&kvm_vdev_iter->node);
>> +			kvm_vfio_device_put_external_user(
>> +				kvm_vdev_iter->vfio_device);
>> +			kfree(kvm_vdev_iter);
>> +			removed = true;
>> +			break;
>> +		}
>> +	}
>> +	return removed;
>> +}
>> +
>> +
>> +/**
>> + * remove_fwd_irq - remove a forwarded irq
>> + *
>> + * @kv: kvm-vfio device
>> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
>> + * irq_index: the index of the IRQ
>> + *
>> + * change the forwarded state of the IRQ, remove the IRQ from
>> + * the device forwarded IRQ list. In case it is the last one,
>> + * put the device
>> + */
>> +int remove_fwd_irq(struct kvm_vfio *kv,
>> +		   struct kvm_vfio_device *kvm_vdev,
>> +		   int irq_index)
>> +{
>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>> +	int ret = -1;
>> +
>> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>> +				 &kvm_vdev->fwd_irq_list, link) {
> 
> hmmm, you can only forward one irq for a specific device once, right?
> And you already have a lookup function, so why not call that, and then
> remove it?
> 
> I'm confused.
will fix that
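A sketch of the lookup-then-remove shape suggested above, in plain userspace
C. The small circular list is only modelled loosely on list_head, and
`struct node`, `find_irq()` and `remove_irq()` are hypothetical names; the
-ENOENT return value is an assumption based on the error code discussion
earlier in this review.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct node {
	struct node *prev, *next;
	int index;	/* IRQ index; unused in the list head itself */
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_node(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* The existing lookup: returns the entry or NULL. */
static struct node *find_irq(struct node *head, int index)
{
	for (struct node *n = head->next; n != head; n = n->next)
		if (n->index == index)
			return n;
	return NULL;
}

/* Look the entry up once, then unlink and free it; no second loop. */
static int remove_irq(struct node *head, int index)
{
	struct node *n = find_irq(head, index);

	if (!n)
		return -ENOENT;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	free(n);
	return 0;
}
```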
> 
>> +		if (fwd_irq_iter->index == irq_index) {
>> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
>> +						KVM_VFIO_IRQ_SET_NORMAL);
>> +			if (ret < 0)
>> +				break;
>> +			list_del(&fwd_irq_iter->link);
>> +			kfree(fwd_irq_iter);
>> +			ret = 0;
>> +			break;
>> +		}
>> +	}
>> +	if (list_empty(&kvm_vdev->fwd_irq_list))
>> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
>> +
>> +	return ret;
>> +}
>> +
>> +/**
>> + * kvm_vfio_unforward - remove a forwarded IRQ
>> + * @kdev: the kvm device
>> + * @vdev: the vfio_device
>> + * @fwd_irq: user struct
>> + * after checking this IRQ effectively is forwarded, change its state,
>> + * remove it from the corresponding kvm_vfio_device list
>> + */
>> +static int kvm_vfio_unforward(struct kvm_device *kdev,
>> +				     struct vfio_device *vdev,
>> +				     struct kvm_arch_forwarded_irq *fwd_irq)
>> +{
>> +	struct kvm_vfio *kv = kdev->private;
>> +	struct kvm_vfio_device *kvm_vdev;
>> +	int ret;
>> +
>> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
>> +	if (ret < 0)
>> +		return -EINVAL;
> 
> why do you override the return value?  Propagate it.
ok
> 
>> +
>> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
>> +	if (ret < 0)
>> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
>> +			__func__, fwd_irq->fd, fwd_irq->index);
>> +	else
>> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
>> +			  __func__, fwd_irq->fd, fwd_irq->index);
> 
> again with the kernel log here.
ok
> 
> 
> 
>> +	return ret;
>> +}
>> +
>> +
>> +
>> +
>> +/**
>> + * kvm_vfio_set_device - the top function for interracting with a vfio
> 
>                                 top?             interacting
> 
>> + * device
>> + */
> 
> probably just skip this comment
ok
> 
>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
>> +{
>> +	struct kvm_vfio *kv = kdev->private;
>> +	struct vfio_device *vdev;
>> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
>> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
>> +
>> +	switch (attr) {
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
>> +		bool must_put;
>> +		int ret;
>> +
>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
>> +			return -EFAULT;
>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
>> +		if (IS_ERR(vdev))
>> +			return PTR_ERR(vdev);
> 
> seems like this whole block of code is replicated below, needs
> refactoring.
ok
> 
>> +		mutex_lock(&kv->lock);
>> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
>> +		if (must_put)
>> +			kvm_vfio_put_vfio_device(vdev);
> 
> this must_put looks plain weird.  I think you want to balance your
> get/put's always; can't you just get an extra reference in
> kvm_vfio_forward() ?
I will investigate that. Makes sense
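One way the balanced get/put could look, as a userspace sketch only.
`struct fake_vdev`, `vdev_get()`, `vdev_put()`, `forward()` and
`set_forward()` are hypothetical, the counter stands in for the real
reference counting, and taking one extra reference per successful forward is
an assumption of this sketch, not something decided in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference counted vfio_device. */
struct fake_vdev {
	int refs;
};

static void vdev_get(struct fake_vdev *v)
{
	v->refs++;
}

static void vdev_put(struct fake_vdev *v)
{
	assert(v->refs > 0);
	v->refs--;
}

/* On success the registration keeps its own reference, on failure none. */
static int forward(struct fake_vdev *v, bool arch_ok)
{
	if (!arch_ok)
		return -EINVAL;
	vdev_get(v);	/* long-lived reference owned by the registration */
	return 0;
}

/* Caller: one get, one put, whatever the outcome; no must_put flag. */
static int set_forward(struct fake_vdev *v, bool arch_ok)
{
	int ret;

	vdev_get(v);	/* temporary reference for the duration of the call */
	ret = forward(v, arch_ok);
	vdev_put(v);
	return ret;
}
```

The unforward path would then drop the reference taken in forward(), so each
function releases exactly what it acquired.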
> 
>> +		mutex_unlock(&kv->lock);
>> +		return ret;
>> +		}
>> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
>> +		int ret;
>> +
>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
>> +			return -EFAULT;
>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
>> +		if (IS_ERR(vdev))
>> +			return PTR_ERR(vdev);
>> +
>> +		kvm_vfio_device_put_external_user(vdev);
> 
> you're dropping the reference to the device but referencing it in your
> unforward call below?
thanks for identifying that bug.
> 
>> +		mutex_lock(&kv->lock);
>> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
>> +		mutex_unlock(&kv->lock);
>> +		return ret;
>> +	}
>> +#endif
>> +	default:
>> +		return -ENXIO;
>> +	}
>> +}
>> +
>> +/**
>> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
>> + * @kv: kvm-vfio device
>> + *
>> + * loop on all got devices and their associated forwarded IRQs
> 
> 'loop on all got' ?
> 
> Restore the non-forwarded state for all registered devices and ...
ok
> 
>> + * restore the non forwarded state, remove IRQs and their devices from
>> + * the respective list, put the vfio platform devices
>> + *
>> + * When this function is called, the vcpu already are destroyed. No
>                                     the VCPUs are already destroyed.
>> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
>> + * kvm_arch_set_fwd_state action
> 
> this last bit didn't make any sense to me.  Also, why are we referring
> to the vgic in generic code?
doesn't make sense anymore indeed. I wanted to emphasize the fact that
the VGIC KVM device is destroyed before the KVM VFIO device, which
explains why I need a special CLEANUP cmd (besides the fact I need to
call chip->irq_eoi(d) for the forwarded IRQs).


> 
>> + */
>> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
>> +{
>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
>> +
>> +	/* loop on all the assigned devices */
> 
> unnecessary comment
ok
> 
>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
>> +				 &kv->device_list, node) {
>> +
>> +		/* loop on all its forwarded IRQ */
> 
> same
ok

Thanks for the detailed review

Best Regards

Eric
> 
>> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>> +					 &kvm_vdev_iter->fwd_irq_list, link) {
>> +			kvm_arch_set_fwd_state(fwd_irq_iter,
>> +						KVM_VFIO_IRQ_CLEANUP);
>> +			list_del(&fwd_irq_iter->link);
>> +			kfree(fwd_irq_iter);
>> +		}
>> +		list_del(&kvm_vdev_iter->node);
>> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
>> +		kfree(kvm_vdev_iter);
>> +	}
>> +	return 0;
>> +}
>> +
>> +
>>  static int kvm_vfio_set_attr(struct kvm_device *dev,
>>  			     struct kvm_device_attr *attr)
>>  {
>>  	switch (attr->group) {
>>  	case KVM_DEV_VFIO_GROUP:
>>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
>> +	case KVM_DEV_VFIO_DEVICE:
>> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
>>  	}
>>  
>>  	return -ENXIO;
>> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>>  		case KVM_DEV_VFIO_GROUP_DEL:
>>  			return 0;
>>  		}
>> -
>>  		break;
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> +	case KVM_DEV_VFIO_DEVICE:
>> +		switch (attr->attr) {
>> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> +			return 0;
>> +		}
>> +		break;
>> +#endif
>>  	}
>> -
>>  	return -ENXIO;
>>  }
>>  
>> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
>>  		list_del(&kvg->node);
>>  		kfree(kvg);
>>  	}
>> +	kvm_vfio_put_all_devices(kv);
>>  
>>  	kvm_vfio_update_coherency(dev);
>>  
>> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>>  		return -ENOMEM;
>>  
>>  	INIT_LIST_HEAD(&kv->group_list);
>> +	INIT_LIST_HEAD(&kv->device_list);
>>  	mutex_init(&kv->lock);
>>  
>>  	dev->private = kv;
>> -- 
>> 1.9.1
>>


* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-11  9:35       ` Eric Auger
  0 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-11  9:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
>> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
>>
>> This is a new control channel which enables KVM to cooperate with
>> viable VFIO devices.
>>
>> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
>> in addition to a list of groups (kvm_vfio_group). The new
>> infrastructure enables to check the validity of the VFIO device
>> file descriptor, get and hold a reference to it.
>>
>> The first concrete implemented command is IRQ forward control:
>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>>
>> It consists in programming the VFIO driver and KVM in a consistent manner
>> so that an optimized IRQ injection/completion is set up. Each
>> kvm_vfio_device holds a list of forwarded IRQ. When putting a
>> kvm_vfio_device, the implementation makes sure the forwarded IRQs
>> are set again in the normal handling state (non forwarded).
> 
> 'putting a kvm_vfio_device' sounds like you're golf'ing :)
> 
> When a kvm_vfio_device is released?
sure
> 
>>
>> The forwarding programming is architecture specific, embodied by the
>> kvm_arch_set_fwd_state function. Its implementation is given in a
>> separate patch file.
> 
> I would drop the last sentence and instead indicate that this is handled
> properly when the architecture does not support such a feature.
ok
> 
>>
>> The forwarding control modality is enabled by the
>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> v1 -> v2:
>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> - original patch file separated into 2 parts: generic part moved in vfio.c
>>   and ARM specific part(kvm_arch_set_fwd_state)
>> ---
>>  include/linux/kvm_host.h |  27 +++
>>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 477 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index a4c33b3..24350dc 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
>>  		      unsigned long arg);
>>  };
>>  
>> +enum kvm_fwd_irq_action {
>> +	KVM_VFIO_IRQ_SET_FORWARD,
>> +	KVM_VFIO_IRQ_SET_NORMAL,
>> +	KVM_VFIO_IRQ_CLEANUP,
> 
> This is KVM internal API, so it would probably be good to document this.
> Especially the CLEANUP bit worries me, see below.
I will document it
> 
>> +};
>> +
>> +/* internal structure describing a forwarded IRQ */
>> +struct kvm_fwd_irq {
>> +	struct list_head link;
> 
> this list entry is local to the kvm vfio device, right? that means you
> probably want a struct with just the below fields, and then have a
> containing struct in the generic device file, private to its logic.
I will introduce 2 separate structs
> 
>> +	__u32 index; /* platform device irq index */
>> +	__u32 hwirq; /*physical IRQ */
>> +	__u32 gsi; /* virtual IRQ */
>> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
>> +};
>> +
>>  void kvm_device_get(struct kvm_device *dev);
>>  void kvm_device_put(struct kvm_device *dev);
>>  struct kvm_device *kvm_device_from_filp(struct file *filp);
>> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
>>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>>  extern struct kvm_device_ops kvm_flic_ops;
>>  
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> 
> what's the 'p' in pfwd?
will rename
> 
>> +			   enum kvm_fwd_irq_action action);
>> +
>> +#else
>> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
>> +					 enum kvm_fwd_irq_action action)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>  
>>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>> index 76dc7a1..e4a81c4 100644
>> --- a/virt/kvm/vfio.c
>> +++ b/virt/kvm/vfio.c
>> @@ -18,14 +18,24 @@
>>  #include <linux/slab.h>
>>  #include <linux/uaccess.h>
>>  #include <linux/vfio.h>
>> +#include <linux/platform_device.h>
>>  
>>  struct kvm_vfio_group {
>>  	struct list_head node;
>>  	struct vfio_group *vfio_group;
>>  };
>>  
>> +struct kvm_vfio_device {
>> +	struct list_head node;
>> +	struct vfio_device *vfio_device;
>> +	/* list of forwarded IRQs for that VFIO device */
>> +	struct list_head fwd_irq_list;
>> +	int fd;
>> +};
>> +
>>  struct kvm_vfio {
>>  	struct list_head group_list;
>> +	struct list_head device_list;
>>  	struct mutex lock;
>>  	bool noncoherent;
>>  };
>> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>  	return -ENXIO;
>>  }
>>  
>> +/**
>> + * get_vfio_device - returns the vfio-device corresponding to this fd
>> + * @fd:fd of the vfio platform device
>> + *
>> + * checks it is a vfio device
>> + * increment its ref counter
> 
> why the short lines?  Just write this out in proper English.
OK
> 
>> + */
>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
>> +{
>> +	struct fd f;
>> +	struct vfio_device *vdev;
>> +
>> +	f = fdget(fd);
>> +	if (!f.file)
>> +		return NULL;
>> +	vdev = kvm_vfio_device_get_external_user(f.file);
>> +	fdput(f);
>> +	return vdev;
>> +}
>> +
>> +/**
>> + * put_vfio_device: put the vfio platform device
>> + * @vdev: vfio_device to put
>> + *
>> + * decrement the ref counter
>> + */
>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
>> +{
>> +	kvm_vfio_device_put_external_user(vdev);
>> +}
>> +
>> +/**
>> + * kvm_vfio_find_device - look for the device in the assigned
>> + * device list
>> + * @kv: the kvm-vfio device
>> + * @vdev: the vfio_device to look for
>> + *
>> + * returns the associated kvm_vfio_device if the device is known,
>> + * meaning at least 1 IRQ is forwarded for this device.
>> + * if the device is not registered, returns NULL.
>> + */
> 
> are these functions meant to be exported?  Otherwise they should be
> static, and the documentation on these simple list iteration wrappers
> seems like overkill imho.
could be static indeed
> 
>> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
>> +					     struct vfio_device *vdev)
>> +{
>> +	struct kvm_vfio_device *kvm_vdev_iter;
>> +
>> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
>> +		if (kvm_vdev_iter->vfio_device == vdev)
>> +			return kvm_vdev_iter;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +/**
>> + * kvm_vfio_find_irq - look for an irq in the device IRQ list
>> + * @kvm_vdev: the kvm_vfio_device
>> + * @irq_index: irq index
>> + *
>> + * returns the forwarded irq struct if it exists, NULL in the negative
>> + */
>> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
>> +				      int irq_index)
>> +{
>> +	struct kvm_fwd_irq *fwd_irq_iter;
>> +
>> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
>> +		if (fwd_irq_iter->index == irq_index)
>> +			return fwd_irq_iter;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +/**
>> + * validate_forward - checks whether forwarding a given IRQ is meaningful
>> + * @vdev:  vfio_device the IRQ belongs to
>> + * @fwd_irq: user struct containing the irq_index to forward
>> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
>> + * kvm_vfio_device that holds it
>> + * @hwirq: irq number the irq index corresponds to
>> + *
>> + * checks the vfio-device is a platform vfio device
>> + * checks the irq_index corresponds to an actual hwirq and
>> + * checks this hwirq is not already forwarded
>> + * returns < 0 on following errors:
>> + * not a platform device, bad irq index, already forwarded
>> + */
>> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
>> +			    struct vfio_device *vdev,
>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
>> +			    struct kvm_vfio_device **kvm_vdev,
>> +			    int *hwirq)
>> +{
>> +	struct device *dev = kvm_vfio_external_base_device(vdev);
>> +	struct platform_device *platdev;
>> +
>> +	*hwirq = -1;
>> +	*kvm_vdev = NULL;
>> +	if (strcmp(dev->bus->name, "platform") == 0) {
>> +		platdev = to_platform_device(dev);
>> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
>> +		if (*hwirq < 0) {
>> +			kvm_err("%s incorrect index\n", __func__);
>> +			return -EINVAL;
>> +		}
>> +	} else {
>> +		kvm_err("%s not a platform device\n", __func__);
>> +		return -EINVAL;
>> +	}
> 
> need some spacing here, also, I would turn this around, first check if
> the strcmp fails, and then error out, then do your next check etc., to
> avoid so many nested statements.
ok
> 
>> +	/* is a ref to this device already owned by the KVM-VFIO device? */
> 
> this comment is not particularly helpful in its current form, it would
> be helpful if you specified that we're checking whether that particular
> device/irq combo is already registered.
> 
>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
>> +	if (*kvm_vdev) {
>> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
>> +			kvm_err("%s irq %d already forwarded\n",
>> +				__func__, *hwirq);
> 
> don't flood the kernel log because of a user error, just allocate an
> error code for this purpose and document it in the ABI, -EEXIST or
> something.
ok
> 
>> +			return -EINVAL;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +/**
>> + * validate_unforward: check a deassignment is meaningful
>> + * @kv: the kvm_vfio device
>> + * @vdev: the vfio_device whose irq to deassign belongs to
>> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
>> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
>> + * it exists
>> + *
>> + * returns 0 if the provided irq effectively is forwarded
>> + * (a ref to this vfio_device is hold and this irq belongs to
>                                     held
>> + * the forwarded irq of this device)
>> + * returns -EINVAL in the negative
> 
>                ENOENT should be returned if you don't have an entry.
> 	       EINVAL could be used if you supply an fd that isn't a
> 	       VFIO device file descriptor, for example.  Again,
> 	       consider documenting all this in the API.
> 
>> + */
>> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
>> +			      struct vfio_device *vdev,
>> +			      struct kvm_arch_forwarded_irq *fwd_irq,
>> +			      struct kvm_vfio_device **kvm_vdev)
>> +{
>> +	struct kvm_fwd_irq *pfwd;
>> +
>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
>> +	if (!*kvm_vdev) {
>> +		kvm_err("%s no forwarded irq for this device\n", __func__);
> 
> don't flood the kernel log
ok
> 
>> +		return -EINVAL;
>> +	}
>> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
>> +	if (!pfwd) {
>> +		kvm_err("%s irq %d is not forwarded\n",
>> +			__func__, fwd_irq->index);
> 
> 
>> +		return -EINVAL;
> 
> same here
ok
> 
>> +	}
>> +	return 0;
>> +}
>> +
>> +/**
>> + * kvm_vfio_forward - set a forwarded IRQ
>> + * @kdev: the kvm device
>> + * @vdev: the vfio device the IRQ belongs to
>> + * @fwd_irq: the user struct containing the irq_index and guest irq
>> + * @must_put: tells the caller whether the vfio_device must be put after
>> + * the call (ref must be released in case a ref onto this device was
>> + * already hold or in case of new device and failure)
>> + *
>> + * validate the injection, activate forward and store the information
>       Validate
>> + * about which irq and which device is concerned so that on deassign or
>> + * kvm-vfio destruction everuthing can be cleaned up.
>                            everything
> 
> I'm not sure I understand this explanation.  Do we have concerned
> devices?
> 
> I think you want to say something along the lines of: If userspace passed
> a valid vfio device and irq handle and the architecture supports
> forwarding this combination, register the vfio_device and irq
> combination in the ....
ok
> 
>> + */
>> +static int kvm_vfio_forward(struct kvm_device *kdev,
>> +			    struct vfio_device *vdev,
>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
>> +			    bool *must_put)
>> +{
>> +	int ret;
>> +	struct kvm_fwd_irq *pfwd = NULL;
>> +	struct kvm_vfio_device *kvm_vdev = NULL;
>> +	struct kvm_vfio *kv = kdev->private;
>> +	int hwirq;
>> +
>> +	*must_put = true;
>> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
>> +					&kvm_vdev, &hwirq);
>> +	if (ret < 0)
>> +		return -EINVAL;
>> +
>> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
> 
> seems a bit pointless to zero-out the memory if you're setting all
> fields below.
ok
> 
>> +	if (!pfwd)
>> +		return -ENOMEM;
>> +	pfwd->index = fwd_irq->index;
>> +	pfwd->gsi = fwd_irq->gsi;
>> +	pfwd->hwirq = hwirq;
>> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
>> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
>> +	if (ret < 0) {
>> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> 
> this whole thing feels incredibly broken to me.  Setting a forward
> should either work or not work, not something in between that leaves
> something to be cleaned up.  Why this two-stage thingy here?
I wanted to exploit the return value of vgic_map_phys_irq which is
likely to fail if the phys/virt mapping exists at VGIC level.

I already validated the injection from a KVM_VFIO_DEVICE point of view
(the device/irq is not known internally). But what if another external
component - which does not exist yet - maps the IRQ at VGIC level? Maybe
I need to replace the existing validation check by querying the VGIC at
low level instead of checking KVM-VFIO local variables.
> 
>> +		kfree(pfwd);
> 
> probably want to move your free-and-return-error to the end of the
> function.
ok
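
As a rough userspace sketch of the "free-and-return-error at the end of the function" shape (combined with allocating everything before the forward is activated), assuming hypothetical `struct fwd_irq`, `struct fwd_dev` and `arch_set_forward()` stand-ins rather than the real KVM types:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fwd_irq { int index; };
struct fwd_dev { struct fwd_irq *irq; };

/* Hypothetical arch hook: fails with -EBUSY when 'fail' is non-zero. */
static int arch_set_forward(struct fwd_irq *irq, int fail)
{
	assert(irq);
	return fail ? -EBUSY : 0;
}

static int forward_irq(struct fwd_dev **out, int index, int fail)
{
	struct fwd_irq *irq;
	struct fwd_dev *dev;
	int ret;

	/* Allocate everything first, so no forward has to be undone. */
	irq = malloc(sizeof(*irq));
	if (!irq)
		return -ENOMEM;
	dev = malloc(sizeof(*dev));
	if (!dev) {
		ret = -ENOMEM;
		goto out_free_irq;
	}

	irq->index = index;
	ret = arch_set_forward(irq, fail);
	if (ret < 0)
		goto out_free_dev;

	dev->irq = irq;
	*out = dev;
	return 0;

	/* Single unwind path, releasing in reverse order of allocation. */
out_free_dev:
	free(dev);
out_free_irq:
	free(irq);
	return ret;
}
```

With this ordering the only step that can fail after both allocations is the arch hook itself, so the error path only frees memory.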
> 
>> +		return ret;
>> +	}
>> +
>> +	if (!kvm_vdev) {
>> +		/* create & insert the new device and keep the ref */
>> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
> 
> again, no need for zeroing out the memory.
ok
> 
>> +		if (!kvm_vdev) {
>> +			kvm_arch_set_fwd_state(pfwd, false);
>> +			kfree(pfwd);
>> +			return -ENOMEM;
>> +		}
>> +
>> +		kvm_vdev->vfio_device = vdev;
>> +		kvm_vdev->fd = fwd_irq->fd;
>> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
>> +		list_add(&kvm_vdev->node, &kv->device_list);
>> +		/*
>> +		 * the only case where we keep the ref:
>> +		 * new device and forward setting successful
>> +		 */
>> +		*must_put = false;
>> +	}
>> +
>> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
>> +
>> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
>> +	fwd_irq->fd, hwirq, fwd_irq->gsi);
> 
> please indent this to align with the opening parenthesis.
ok
> 
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * remove_assigned_device - put a given device from the list
> 
> this isn't a 'put', at least not *just* a put.
correct, I will rephrase
> 
>> + * @kv: the kvm-vfio device
>> + * @vdev: the vfio-device to remove
>> + *
>> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
>> + * remove the corresponding kvm_vfio_device from the assigned device
>> + * list.
>> + * returns true if the device could be removed, false in the negative
>> + */
>> +bool remove_assigned_device(struct kvm_vfio *kv,
>> +			    struct vfio_device *vdev)
>> +{
>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>> +	bool removed = false;
>> +	int ret;
>> +
>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
>> +				 &kv->device_list, node) {
>> +		if (kvm_vdev_iter->vfio_device == vdev) {
>> +			/* loop on all its forwarded IRQ */
>> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>> +						 &kvm_vdev_iter->fwd_irq_list,
>> +						 link) {
> 
> hmm, seems this function is only called when you have no more forwarded
> IRQs, so isn't all of this completely dead (and unnecessary) code?
yep I can simplify all that cleanup
> 
>> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
>> +						KVM_VFIO_IRQ_SET_NORMAL);
>> +				if (ret < 0)
>> +					return ret;
> 
> you're returning an error code to a bool function, which means you'll
> return true when there was an error.  Is this your intention? ;)
definitely not!
> 
> if we have an error here, this would be a very very bad situation wouldn't it?
sure. I will simplify this, transform kvm_arch_set_fwd_state into a void
function
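
To make the bool/int problem pointed out above concrete, a small userspace sketch (all names are hypothetical and only mirror the shape of the reviewed function):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a state change that can fail. */
static int set_state(int fail)
{
	return fail ? -EBUSY : 0;
}

/* Buggy shape: a negative errno converts to true when returned as bool. */
static bool remove_device_buggy(int fail)
{
	int ret = set_state(fail);

	if (ret < 0)
		return ret;	/* -EBUSY is non-zero, so the caller sees true */
	return true;
}

/* One possible fix: return the int and let the caller test for < 0. */
static int remove_device_fixed(int fail)
{
	int ret = set_state(fail);

	assert(ret <= 0);	/* the hook returns 0 or a negative errno */
	return ret;
}
```

Making the state change a void function, as proposed above, is the other way out: there is then no error left to mis-convert.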
> 
>> +				list_del(&fwd_irq_iter->link);
>> +				kfree(fwd_irq_iter);
>> +			}
>> +			/* all IRQs could be deassigned */
>> +			list_del(&kvm_vdev_iter->node);
>> +			kvm_vfio_device_put_external_user(
>> +				kvm_vdev_iter->vfio_device);
>> +			kfree(kvm_vdev_iter);
>> +			removed = true;
>> +			break;
>> +		}
>> +	}
>> +	return removed;
>> +}
>> +
>> +
>> +/**
>> + * remove_fwd_irq - remove a forwarded irq
>> + *
>> + * @kv: kvm-vfio device
>> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
>> + * irq_index: the index of the IRQ
>> + *
>> + * change the forwarded state of the IRQ, remove the IRQ from
>> + * the device forwarded IRQ list. In case it is the last one,
>> + * put the device
>> + */
>> +int remove_fwd_irq(struct kvm_vfio *kv,
>> +		   struct kvm_vfio_device *kvm_vdev,
>> +		   int irq_index)
>> +{
>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>> +	int ret = -1;
>> +
>> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>> +				 &kvm_vdev->fwd_irq_list, link) {
> 
> hmmm, you can only forward one irq for a specific device once, right?
> And you already have a lookup function, so why not call that, and then
> remove it?
> 
> I'm confused.
will fix that
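
A possible shape for "look it up, then remove it", sketched in userspace C with a plain singly linked list instead of the kernel's `list_head`. The `find_irq()`, `add_irq()` and `remove_irq()` helpers are hypothetical names for this sketch only:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct fwd_irq {
	int index;
	struct fwd_irq *next;
};

/* Returns the link that points at the matching entry, or NULL. */
static struct fwd_irq **find_irq(struct fwd_irq **head, int index)
{
	struct fwd_irq **pp;

	assert(head);
	for (pp = head; *pp; pp = &(*pp)->next)
		if ((*pp)->index == index)
			return pp;
	return NULL;
}

/* An index can be recorded only once. */
static int add_irq(struct fwd_irq **head, int index)
{
	struct fwd_irq *irq;

	if (find_irq(head, index))
		return -EEXIST;
	irq = malloc(sizeof(*irq));
	if (!irq)
		return -ENOMEM;
	irq->index = index;
	irq->next = *head;
	*head = irq;
	return 0;
}

/* Reuse the lookup, then unlink and free the single match. */
static int remove_irq(struct fwd_irq **head, int index)
{
	struct fwd_irq **pp = find_irq(head, index);
	struct fwd_irq *irq;

	if (!pp)
		return -ENOENT;
	irq = *pp;
	*pp = irq->next;
	free(irq);
	return 0;
}
```

Because an index appears at most once, the removal path needs no second iteration and no "safe" list walk.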
> 
>> +		if (fwd_irq_iter->index == irq_index) {
>> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
>> +						KVM_VFIO_IRQ_SET_NORMAL);
>> +			if (ret < 0)
>> +				break;
>> +			list_del(&fwd_irq_iter->link);
>> +			kfree(fwd_irq_iter);
>> +			ret = 0;
>> +			break;
>> +		}
>> +	}
>> +	if (list_empty(&kvm_vdev->fwd_irq_list))
>> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
>> +
>> +	return ret;
>> +}
>> +
>> +/**
>> + * kvm_vfio_unforward - remove a forwarded IRQ
>> + * @kdev: the kvm device
>> + * @vdev: the vfio_device
>> + * @fwd_irq: user struct
>> + * after checking this IRQ effectively is forwarded, change its state,
>> + * remove it from the corresponding kvm_vfio_device list
>> + */
>> +static int kvm_vfio_unforward(struct kvm_device *kdev,
>> +				     struct vfio_device *vdev,
>> +				     struct kvm_arch_forwarded_irq *fwd_irq)
>> +{
>> +	struct kvm_vfio *kv = kdev->private;
>> +	struct kvm_vfio_device *kvm_vdev;
>> +	int ret;
>> +
>> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
>> +	if (ret < 0)
>> +		return -EINVAL;
> 
> why do you override the return value?  Propagate it.
ok
> 
>> +
>> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
>> +	if (ret < 0)
>> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
>> +			__func__, fwd_irq->fd, fwd_irq->index);
>> +	else
>> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
>> +			  __func__, fwd_irq->fd, fwd_irq->index);
> 
> again with the kernel log here.
ok
> 
> 
> 
>> +	return ret;
>> +}
>> +
>> +
>> +
>> +
>> +/**
>> + * kvm_vfio_set_device - the top function for interracting with a vfio
> 
>                                 top?             interacting
> 
>> + * device
>> + */
> 
> probably just skip this comment
ok
> 
>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
>> +{
>> +	struct kvm_vfio *kv = kdev->private;
>> +	struct vfio_device *vdev;
>> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
>> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
>> +
>> +	switch (attr) {
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
>> +		bool must_put;
>> +		int ret;
>> +
>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
>> +			return -EFAULT;
>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
>> +		if (IS_ERR(vdev))
>> +			return PTR_ERR(vdev);
> 
> seems like this whole block of code is replicated below, needs
> refactoring.
ok
> 
>> +		mutex_lock(&kv->lock);
>> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
>> +		if (must_put)
>> +			kvm_vfio_put_vfio_device(vdev);
> 
> this must_put looks plain weird.  I think you want to balance your
> get/put's always; can't you just get an extra reference in
> kvm_vfio_forward() ?
I will investigate that. Makes sense
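
One way the balanced get/put could look, as a userspace sketch with a bare counter. `struct dev_ref`, `dev_get()`, `dev_put()`, `forward()` and `set_device()` are hypothetical names, not the VFIO external API:

```c
#include <assert.h>
#include <errno.h>

struct dev_ref { int refcount; };

static void dev_get(struct dev_ref *d)
{
	assert(d);
	d->refcount++;
}

static void dev_put(struct dev_ref *d)
{
	assert(d && d->refcount > 0);
	d->refcount--;
}

/* On success the registry takes its own, extra reference. */
static int forward(struct dev_ref *d, int fail)
{
	if (fail)
		return -EBUSY;
	dev_get(d);
	return 0;
}

/* The caller's get and put are always paired, no must_put flag needed. */
static int set_device(struct dev_ref *d, int fail)
{
	int ret;

	dev_get(d);		/* reference taken by the fd lookup */
	ret = forward(d, fail);
	dev_put(d);		/* dropped unconditionally, after the last use */
	return ret;
}
```

After a failed call the count is back where it started; after a successful one exactly the registry's reference remains.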
> 
>> +		mutex_unlock(&kv->lock);
>> +		return ret;
>> +		}
>> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
>> +		int ret;
>> +
>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
>> +			return -EFAULT;
>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
>> +		if (IS_ERR(vdev))
>> +			return PTR_ERR(vdev);
>> +
>> +		kvm_vfio_device_put_external_user(vdev);
> 
> you're dropping the reference to the device but referencing it in your
> unforward call below?
thanks for identifying that bug.
> 
>> +		mutex_lock(&kv->lock);
>> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
>> +		mutex_unlock(&kv->lock);
>> +		return ret;
>> +	}
>> +#endif
>> +	default:
>> +		return -ENXIO;
>> +	}
>> +}
>> +
>> +/**
>> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
>> + * @kv: kvm-vfio device
>> + *
>> + * loop on all got devices and their associated forwarded IRQs
> 
> 'loop on all got' ?
> 
> Restore the non-forwarded state for all registered devices and ...
ok
> 
>> + * restore the non forwarded state, remove IRQs and their devices from
>> + * the respective list, put the vfio platform devices
>> + *
>> + * When this function is called, the vcpu already are destroyed. No
>                                     the VCPUs are already destroyed.
>> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
>> + * kvm_arch_set_fwd_state action
> 
> this last bit didn't make any sense to me.  Also, why are we referring
> to the vgic in generic code?
doesn't make sense anymore indeed. I wanted to emphasize the fact that
the VGIC KVM device is destroyed before the KVM VFIO device, which
explains why I need a special CLEANUP cmd (besides the fact that I need
to call chip->irq_eoi(d) for the forwarded IRQs).


> 
>> + */
>> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
>> +{
>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
>> +
>> +	/* loop on all the assigned devices */
> 
> unnecessary comment
ok
> 
>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
>> +				 &kv->device_list, node) {
>> +
>> +		/* loop on all its forwarded IRQ */
> 
> same
ok

Thanks for the detailed review

Best Regards

Eric
> 
>> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>> +					 &kvm_vdev_iter->fwd_irq_list, link) {
>> +			kvm_arch_set_fwd_state(fwd_irq_iter,
>> +						KVM_VFIO_IRQ_CLEANUP);
>> +			list_del(&fwd_irq_iter->link);
>> +			kfree(fwd_irq_iter);
>> +		}
>> +		list_del(&kvm_vdev_iter->node);
>> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
>> +		kfree(kvm_vdev_iter);
>> +	}
>> +	return 0;
>> +}
>> +
>> +
>>  static int kvm_vfio_set_attr(struct kvm_device *dev,
>>  			     struct kvm_device_attr *attr)
>>  {
>>  	switch (attr->group) {
>>  	case KVM_DEV_VFIO_GROUP:
>>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
>> +	case KVM_DEV_VFIO_DEVICE:
>> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
>>  	}
>>  
>>  	return -ENXIO;
>> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>>  		case KVM_DEV_VFIO_GROUP_DEL:
>>  			return 0;
>>  		}
>> -
>>  		break;
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> +	case KVM_DEV_VFIO_DEVICE:
>> +		switch (attr->attr) {
>> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> +			return 0;
>> +		}
>> +		break;
>> +#endif
>>  	}
>> -
>>  	return -ENXIO;
>>  }
>>  
>> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
>>  		list_del(&kvg->node);
>>  		kfree(kvg);
>>  	}
>> +	kvm_vfio_put_all_devices(kv);
>>  
>>  	kvm_vfio_update_coherency(dev);
>>  
>> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>>  		return -ENOMEM;
>>  
>>  	INIT_LIST_HEAD(&kv->group_list);
>> +	INIT_LIST_HEAD(&kv->device_list);
>>  	mutex_init(&kv->lock);
>>  
>>  	dev->private = kv;
>> -- 
>> 1.9.1
>>


* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11  5:05       ` Alex Williamson
@ 2014-09-11 12:04         ` Eric Auger
From: Eric Auger @ 2014-09-11 12:04 UTC (permalink / raw)
  To: Alex Williamson, Christoffer Dall
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	joel.schopp, kim.phillips, paulus, gleb, pbonzini, linux-kernel,
	patches, will.deacon, a.motakis, a.rigo, john.liuli

On 09/11/2014 07:05 AM, Alex Williamson wrote:
> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
>> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
>>> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
>>>
>>> This is a new control channel which enables KVM to cooperate with
>>> viable VFIO devices.
>>>
>>> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
>>> in addition to a list of groups (kvm_vfio_group). The new
>>> infrastructure enables to check the validity of the VFIO device
>>> file descriptor, get and hold a reference to it.
>>>
>>> The first concrete implemented command is IRQ forward control:
>>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>>>
>>> It consists in programing the VFIO driver and KVM in a consistent manner
>>> so that an optimized IRQ injection/completion is set up. Each
>>> kvm_vfio_device holds a list of forwarded IRQ. When putting a
>>> kvm_vfio_device, the implementation makes sure the forwarded IRQs
>>> are set again in the normal handling state (non forwarded).
>>
>> 'putting a kvm_vfio_device' sounds to me like you're golfing :)
>>
>> When a kvm_vfio_device is released?
>>
>>>
>>> The forwarding programmming is architecture specific, embodied by the
>>> kvm_arch_set_fwd_state function. Its implementation is given in a
>>> separate patch file.
>>
>> I would drop the last sentence and instead indicate that this is handled
>> properly when the architecture does not support such a feature.
>>
>>>
>>> The forwarding control modality is enabled by the
>>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> ---
>>>
>>> v1 -> v2:
>>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>> - original patch file separated into 2 parts: generic part moved in vfio.c
>>>   and ARM specific part(kvm_arch_set_fwd_state)
>>> ---
>>>  include/linux/kvm_host.h |  27 +++
>>>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
>>>  2 files changed, 477 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index a4c33b3..24350dc 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
>>>  		      unsigned long arg);
>>>  };
>>>  
>>> +enum kvm_fwd_irq_action {
>>> +	KVM_VFIO_IRQ_SET_FORWARD,
>>> +	KVM_VFIO_IRQ_SET_NORMAL,
>>> +	KVM_VFIO_IRQ_CLEANUP,
>>
>> This is KVM internal API, so it would probably be good to document this.
>> Especially the CLEANUP bit worries me, see below.
> 
> This also doesn't match the user API, which is simply FORWARD/UNFORWARD.
Hi Alex,

will change that.
> Extra states worry me too.

I tried to explain the two motivations behind them. Please let me know
if that makes sense.
> 
>>> +};
>>> +
>>> +/* internal structure describing a forwarded IRQ */
>>> +struct kvm_fwd_irq {
>>> +	struct list_head link;
>>
>> this list entry is local to the kvm vfio device, right? that means you
>> probably want a struct with just the below fields, and then have a
>> containing struct in the generic device file, private to its logic.
> 
> Yes, this is part of the abstraction problem.
OK will fix that.
> 
>>> +	__u32 index; /* platform device irq index */
> 
> This is a vfio_device irq_index, but vfio_devices support indexes and
> sub-indexes.  At this level the API should match vfio, not the specifics
> of platform devices not supporting sub-index.
I will add sub-indexes then.
> 
>>> +	__u32 hwirq; /*physical IRQ */
>>> +	__u32 gsi; /* virtual IRQ */
>>> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> 
> Not sure I understand why vcpu is necessary.
vcpu is used when providing the physical IRQ/virtual IRQ mapping to the
virtual GIC. I can remove it from the struct and add a struct kvm_vcpu *
param to kvm_arch_set_fwd_state if you prefer.

> Also I see a 'get' in the code below, but not a 'put'.
Sorry, I do not understand your comment here. Which 'get' do you mean?
> 
>>> +};
>>> +
>>>  void kvm_device_get(struct kvm_device *dev);
>>>  void kvm_device_put(struct kvm_device *dev);
>>>  struct kvm_device *kvm_device_from_filp(struct file *filp);
>>> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
>>>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>>>  extern struct kvm_device_ops kvm_flic_ops;
>>>  
>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
>>
>> what's the 'p' in pfwd?
> 
> p is for pointer?
yes it was ;-)
> 
>>> +			   enum kvm_fwd_irq_action action);
>>> +
>>> +#else
>>> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
>>> +					 enum kvm_fwd_irq_action action)
>>> +{
>>> +	return 0;
>>> +}
>>> +#endif
>>> +
>>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>>  
>>>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
>>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>>> index 76dc7a1..e4a81c4 100644
>>> --- a/virt/kvm/vfio.c
>>> +++ b/virt/kvm/vfio.c
>>> @@ -18,14 +18,24 @@
>>>  #include <linux/slab.h>
>>>  #include <linux/uaccess.h>
>>>  #include <linux/vfio.h>
>>> +#include <linux/platform_device.h>
>>>  
>>>  struct kvm_vfio_group {
>>>  	struct list_head node;
>>>  	struct vfio_group *vfio_group;
>>>  };
>>>  
>>> +struct kvm_vfio_device {
>>> +	struct list_head node;
>>> +	struct vfio_device *vfio_device;
>>> +	/* list of forwarded IRQs for that VFIO device */
>>> +	struct list_head fwd_irq_list;
>>> +	int fd;
>>> +};
>>> +
>>>  struct kvm_vfio {
>>>  	struct list_head group_list;
>>> +	struct list_head device_list;
>>>  	struct mutex lock;
>>>  	bool noncoherent;
>>>  };
>>> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>>  	return -ENXIO;
>>>  }
>>>  
>>> +/**
>>> + * get_vfio_device - returns the vfio-device corresponding to this fd
>>> + * @fd:fd of the vfio platform device
>>> + *
>>> + * checks it is a vfio device
>>> + * increment its ref counter
>>
>> why the short lines?  Just write this out in proper English.
>>
>>> + */
>>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
>>> +{
>>> +	struct fd f;
>>> +	struct vfio_device *vdev;
>>> +
>>> +	f = fdget(fd);
>>> +	if (!f.file)
>>> +		return NULL;
>>> +	vdev = kvm_vfio_device_get_external_user(f.file);
>>> +	fdput(f);
>>> +	return vdev;
>>> +}
>>> +
>>> +/**
>>> + * put_vfio_device: put the vfio platform device
>>> + * @vdev: vfio_device to put
>>> + *
>>> + * decrement the ref counter
>>> + */
>>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
>>> +{
>>> +	kvm_vfio_device_put_external_user(vdev);
>>> +}
>>> +
>>> +/**
>>> + * kvm_vfio_find_device - look for the device in the assigned
>>> + * device list
>>> + * @kv: the kvm-vfio device
>>> + * @vdev: the vfio_device to look for
>>> + *
>>> + * returns the associated kvm_vfio_device if the device is known,
>>> + * meaning at least 1 IRQ is forwarded for this device.
>>> + * in the device is not registered, returns NULL.
>>> + */
> 
> Why are we talking about forwarded IRQs already, this is a simple lookup
> function, who knows what other users it will have in the future.
I will correct
> 
>>
>> are these functions meant to be exported?  Otherwise they should be
>> static, and the documentation on these simple list iteration wrappers
>> seems like overkill imho.
>>
>>> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
>>> +					     struct vfio_device *vdev)
>>> +{
>>> +	struct kvm_vfio_device *kvm_vdev_iter;
>>> +
>>> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
>>> +		if (kvm_vdev_iter->vfio_device == vdev)
>>> +			return kvm_vdev_iter;
>>> +	}
>>> +	return NULL;
>>> +}
>>> +
>>> +/**
>>> + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
>>> + * @kvm_vdev: the kvm_vfio_device
>>> + * @irq_index: irq index
>>> + *
>>> + * returns the forwarded irq struct if it exists, NULL in the negative
>>> + */
>>> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
>>> +				      int irq_index)
> 
> +sub-index
OK
> 
> probably important to note on both of these that they need to be called
> with kv->lock
OK
> 
>>> +{
>>> +	struct kvm_fwd_irq *fwd_irq_iter;
>>> +
>>> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
>>> +		if (fwd_irq_iter->index == irq_index)
>>> +			return fwd_irq_iter;
>>> +	}
>>> +	return NULL;
>>> +}
>>> +
>>> +/**
>>> + * validate_forward - checks whether forwarding a given IRQ is meaningful
>>> + * @vdev:  vfio_device the IRQ belongs to
>>> + * @fwd_irq: user struct containing the irq_index to forward
>>> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
>>> + * kvm_vfio_device that holds it
>>> + * @hwirq: irq numberthe irq index corresponds to
>>> + *
>>> + * checks the vfio-device is a platform vfio device
>>> + * checks the irq_index corresponds to an actual hwirq and
>>> + * checks this hwirq is not already forwarded
>>> + * returns < 0 on following errors:
>>> + * not a platform device, bad irq index, already forwarded
>>> + */
>>> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
>>> +			    struct vfio_device *vdev,
>>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
>>> +			    struct kvm_vfio_device **kvm_vdev,
>>> +			    int *hwirq)
>>> +{
>>> +	struct device *dev = kvm_vfio_external_base_device(vdev);
>>> +	struct platform_device *platdev;
>>> +
>>> +	*hwirq = -1;
>>> +	*kvm_vdev = NULL;
>>> +	if (strcmp(dev->bus->name, "platform") == 0) {
> 
> Should be testing dev->bus_type == &platform_bus_type, and ideally
> creating a dev_is_platform() macro to make that even cleaner.
OK
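
A userspace sketch of why comparing the bus pointer is more robust than comparing the bus name. The `struct bus` and `struct device` types here are hypothetical stand-ins, not the kernel definitions, and `dev_is_platform()` is written in the form suggested above:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the driver-model types. */
struct bus { const char *name; };
struct device { const struct bus *bus; };

static const struct bus platform_bus = { "platform" };
/* A different bus object that happens to carry the same name. */
static const struct bus lookalike_bus = { "platform" };

/* Identity test: true only for devices sitting on platform_bus itself. */
#define dev_is_platform(dev) ((dev)->bus == &platform_bus)

/* Name test, as in the reviewed patch: fooled by the lookalike. */
static bool is_platform_by_name(const struct device *dev)
{
	assert(dev && dev->bus);
	return strcmp(dev->bus->name, "platform") == 0;
}
```

The pointer comparison is also cheaper, since it avoids a string walk on every call.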
> 
> However, we're being sort of sneaky here that we're actually doing
> something platform device specific here.  Why?  Don't we just need to
> make sure that kvm-vfio doesn't have any record of this forward
> (-EEXIST) and let the platform device code error out later for this
> case?
After answering Christoffer's comments, I think I should check that
the IRQ is not already mapped at VGIC level. In that case I
would need to split the validate function into 2 parts:
- the generic part only checks that the vfio_device/irq_index is not
already recorded. I do not need the hwirq for that.
- the ARM-specific part checks that no GIC mapping exists (needs the hwirq)

Would that be OK for both of you?
> 
>>> +		platdev = to_platform_device(dev);
>>> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
>>> +		if (*hwirq < 0) {
>>> +			kvm_err("%s incorrect index\n", __func__);
>>> +			return -EINVAL;
>>> +		}
>>> +	} else {
>>> +		kvm_err("%s not a platform device\n", __func__);
>>> +		return -EINVAL;
>>> +	}
>>
>> need some spacing here, also, I would turn this around, first check if
>> the strcmp fails, and then error out, then do your next check etc., to
>> avoid so many nested statements.
>>
>>> +	/* is a ref to this device already owned by the KVM-VFIO device? */
>>
>> this comment is not particularly helpful in its current form, it would
>> be helpful if you specified that we're checking whether that particular
>> device/irq combo is already registered.
>>
>>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
>>> +	if (*kvm_vdev) {
>>> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
>>> +			kvm_err("%s irq %d already forwarded\n",
>>> +				__func__, *hwirq);
> 
> Why didn't we do this first?
see above comment
> 
>> don't flood the kernel log because of a user error, just allocate an
>> error code for this purpose and document it in the ABI, -EEXIST or
>> something.
>>
>>> +			return -EINVAL;
>>> +		}
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>> +/**
>>> + * validate_unforward: check a deassignment is meaningful
>>> + * @kv: the kvm_vfio device
>>> + * @vdev: the vfio_device whose irq to deassign belongs to
>>> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
>>> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
>>> + * it exists
>>> + *
>>> + * returns 0 if the provided irq effectively is forwarded
>>> + * (a ref to this vfio_device is hold and this irq belongs to
>>                                     held
>>> + * the forwarded irq of this device)
>>> + * returns -EINVAL in the negative
>>
>>                ENOENT should be returned if you don't have an entry.
>> 	       EINVAL could be used if you supply an fd that isn't a
>> 	       VFIO device file descriptor, for example.  Again,
>> 	       consider documenting all this in the API.
>>
>>> + */
>>> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
>>> +			      struct vfio_device *vdev,
>>> +			      struct kvm_arch_forwarded_irq *fwd_irq,
>>> +			      struct kvm_vfio_device **kvm_vdev)
>>> +{
>>> +	struct kvm_fwd_irq *pfwd;
>>> +
>>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
>>> +	if (!kvm_vdev) {
>>> +		kvm_err("%s no forwarded irq for this device\n", __func__);
>>
>> don't flood the kernel log
>>
>>> +		return -EINVAL;
>>> +	}
>>> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
>>> +	if (!pfwd) {
>>> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
>>
>>> +		return -EINVAL;
>>
>> same here
I do not understand. With the current functions I need to first retrieve
the device and then iterate over the IRQs of that device.
>>
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>> +/**
>>> + * kvm_vfio_forward - set a forwarded IRQ
>>> + * @kdev: the kvm device
>>> + * @vdev: the vfio device the IRQ belongs to
>>> + * @fwd_irq: the user struct containing the irq_index and guest irq
>>> + * @must_put: tells the caller whether the vfio_device must be put after
>>> + * the call (ref must be released in case a ref onto this device was
>>> + * already hold or in case of new device and failure)
>>> + *
>>> + * validate the injection, activate forward and store the information
>>       Validate
>>> + * about which irq and which device is concerned so that on deassign or
>>> + * kvm-vfio destruction everuthing can be cleaned up.
>>                            everything
>>
>> I'm not sure I understand this explanation.  Do we have concerned
>> devices?
>>
>> I think you want to say something along the lines of: If userspace passed
>> a valid vfio device and irq handle and the architecture supports
>> forwarding this combination, register the vfio_device and irq
>> combination in the ....
>>
>>> + */
>>> +static int kvm_vfio_forward(struct kvm_device *kdev,
>>> +			    struct vfio_device *vdev,
>>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
>>> +			    bool *must_put)
>>> +{
>>> +	int ret;
>>> +	struct kvm_fwd_irq *pfwd = NULL;
>>> +	struct kvm_vfio_device *kvm_vdev = NULL;
>>> +	struct kvm_vfio *kv = kdev->private;
>>> +	int hwirq;
>>> +
>>> +	*must_put = true;
>>> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
>>> +					&kvm_vdev, &hwirq);
>>> +	if (ret < 0)
>>> +		return -EINVAL;
>>> +
>>> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
>>
>> seems a bit pointless to zero-out the memory if you're setting all
>> fields below.
>>
>>> +	if (!pfwd)
>>> +		return -ENOMEM;
>>> +	pfwd->index = fwd_irq->index;
>>> +	pfwd->gsi = fwd_irq->gsi;
>>> +	pfwd->hwirq = hwirq;
>>> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
>>> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
>>> +	if (ret < 0) {
>>> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
>>
>> this whole thing feels incredibly broken to me.  Setting a forward
>> should either work or not work, not something in between that leaves
>> something to be cleaned up.  Why this two-stage thingy here?
> 
> Yep, I agree.  I also don't see the point of the validate function, just
> open code it here and push the platform_get_irq test into
> kvm_arch_set_fwd_state.  kvm-vfio doesn't care about the hwirq.
> 
>>> +		kfree(pfwd);
>>
>> probably want to move your free-and-return-error to the end of the
>> function.
>>
>>> +		return ret;
>>> +	}
>>> +
>>> +	if (!kvm_vdev) {
>>> +		/* create & insert the new device and keep the ref */
>>> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
>>
>> again, no need for zeroing out the memory.
> 
> I think you also want to allocate this before you setup the forward so
> you can eliminate any complicated teardown later.
ok
> 
>>> +		if (!kvm_vdev) {
>>> +			kvm_arch_set_fwd_state(pfwd, false);
> 
> false?  The function takes an enum.
Thanks for identifying that bug.
> 
>>> +			kfree(pfwd);
>>> +			return -ENOMEM;
>>> +		}
>>> +
>>> +		kvm_vdev->vfio_device = vdev;
>>> +		kvm_vdev->fd = fwd_irq->fd;
>>> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
>>> +		list_add(&kvm_vdev->node, &kv->device_list);
>>> +		/*
>>> +		 * the only case where we keep the ref:
>>> +		 * new device and forward setting successful
>>> +		 */
>>> +		*must_put = false;
>>> +	}
>>> +
>>> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
>>> +
>>> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
>>> +	fwd_irq->fd, hwirq, fwd_irq->gsi);
>>
>> please indent this to align with the opening parenthesis.
>>
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/**
>>> + * remove_assigned_device - put a given device from the list
>>
>> this isn't a 'put', at least not *just* a put.
>>
>>> + * @kv: the kvm-vfio device
>>> + * @vdev: the vfio-device to remove
>>> + *
>>> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
>>> + * remove the corresponding kvm_vfio_device from the assigned device
>>> + * list.
>>> + * returns true if the device could be removed, false in the negative
>>> + */
>>> +bool remove_assigned_device(struct kvm_vfio *kv,
>>> +			    struct vfio_device *vdev)
>>> +{
>>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
>>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>>> +	bool removed = false;
>>> +	int ret;
>>> +
>>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
>>> +				 &kv->device_list, node) {
>>> +		if (kvm_vdev_iter->vfio_device == vdev) {
>>> +			/* loop on all its forwarded IRQ */
>>> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>>> +						 &kvm_vdev_iter->fwd_irq_list,
>>> +						 link) {
>>
>> hmm, seems this function is only called when you have no more forwarded
>> IRQs, so isn't all of this completely dead (and unnecessary) code?
>>
>>> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
>>> +						KVM_VFIO_IRQ_SET_NORMAL);
>>> +				if (ret < 0)
>>> +					return ret;
>>
>> you're returning an error code to a bool function, which means you'll
>> return true when there was an error.  Is this your intention? ;)
>>
>> if we have an error here, this would be a very very bad situation wouldn't it?
>>
>>> +				list_del(&fwd_irq_iter->link);
>>> +				kfree(fwd_irq_iter);
>>> +			}
>>> +			/* all IRQs could be deassigned */
>>> +			list_del(&kvm_vdev_iter->node);
>>> +			kvm_vfio_device_put_external_user(
>>> +				kvm_vdev_iter->vfio_device);
>>> +			kfree(kvm_vdev_iter);
>>> +			removed = true;
>>> +			break;
>>> +		}
>>> +	}
>>> +	return removed;
>>> +}
>>> +
>>> +
>>> +/**
>>> + * remove_fwd_irq - remove a forwarded irq
>>> + *
>>> + * @kv: kvm-vfio device
>>> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
>>> + * irq_index: the index of the IRQ
>>> + *
>>> + * change the forwarded state of the IRQ, remove the IRQ from
>>> + * the device forwarded IRQ list. In case it is the last one,
>>> + * put the device
>>> + */
>>> +int remove_fwd_irq(struct kvm_vfio *kv,
>>> +		   struct kvm_vfio_device *kvm_vdev,
>>> +		   int irq_index)
>>> +{
>>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>>> +	int ret = -1;
>>> +
>>> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>>> +				 &kvm_vdev->fwd_irq_list, link) {
>>
>> hmmm, you can only forward one irq for a specific device once, right?
>> And you already have a lookup function, so why not call that, and then
>> remove it?
>>
>> I'm confused.

> 
> Me too, this and the previous function need some more consideration.
understood
> 
>>> +		if (fwd_irq_iter->index == irq_index) {
>>> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
>>> +						KVM_VFIO_IRQ_SET_NORMAL);
>>> +			if (ret < 0)
>>> +				break;
>>> +			list_del(&fwd_irq_iter->link);
>>> +			kfree(fwd_irq_iter);
>>> +			ret = 0;
>>> +			break;
>>> +		}
>>> +	}
>>> +	if (list_empty(&kvm_vdev->fwd_irq_list))
>>> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +/**
>>> + * kvm_vfio_unforward - remove a forwarded IRQ
>>> + * @kdev: the kvm device
>>> + * @vdev: the vfio_device
>>> + * @fwd_irq: user struct
>>> + * after checking this IRQ effectively is forwarded, change its state,
>>> + * remove it from the corresponding kvm_vfio_device list
>>> + */
>>> +static int kvm_vfio_unforward(struct kvm_device *kdev,
>>> +				     struct vfio_device *vdev,
>>> +				     struct kvm_arch_forwarded_irq *fwd_irq)
>>> +{
>>> +	struct kvm_vfio *kv = kdev->private;
>>> +	struct kvm_vfio_device *kvm_vdev;
>>> +	int ret;
>>> +
>>> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
>>> +	if (ret < 0)
>>> +		return -EINVAL;
>>
>> why do you override the return value?  Propagate it.
>>
>>> +
>>> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
>>> +	if (ret < 0)
>>> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
>>> +			__func__, fwd_irq->fd, fwd_irq->index);
>>> +	else
>>> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
>>> +			  __func__, fwd_irq->fd, fwd_irq->index);
>>
>> again with the kernel log here.
>>
>>
>>
>>> +	return ret;
>>> +}
>>> +
>>> +
>>> +
>>> +
>>> +/**
>>> + * kvm_vfio_set_device - the top function for interracting with a vfio
>>
>>                                 top?             interacting
>>
>>> + * device
>>> + */
>>
>> probably just skip this comment
>>
>>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
>>> +{
>>> +	struct kvm_vfio *kv = kdev->private;
>>> +	struct vfio_device *vdev;
>>> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
>>> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
>>> +
>>> +	switch (attr) {
>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
>>> +		bool must_put;
>>> +		int ret;
>>> +
>>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
>>> +			return -EFAULT;
>>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
>>> +		if (IS_ERR(vdev))
>>> +			return PTR_ERR(vdev);
>>
>> seems like this whole block of code is replicated below, needs
>> refactoring.
>>
>>> +		mutex_lock(&kv->lock);
>>> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
>>> +		if (must_put)
>>> +			kvm_vfio_put_vfio_device(vdev);
>>
>> this must_put looks plain weird.  I think you want to balance your
>> get/put's always; can't you just get an extra reference in
>> kvm_vfio_forward() ?
> 
> Yeah, this is very broken.  Every forwarded IRQ should hold a reference
> to the vfio_device.  Every unforward should drop a reference.  Trying to
> maintain a single reference is a non-goal.
OK will do that.
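
A rough user-space sketch of the balanced scheme suggested above, under the assumption that each forwarded IRQ takes its own reference and the lookup reference is always dropped by the caller. All model_* names are hypothetical; the real code would use the vfio external user get/put helpers.

```c
#include <assert.h>

/* Hypothetical user-space model of a reference-counted vfio_device. */
struct model_vfio_device {
	int refcount;
};

static void model_get(struct model_vfio_device *vdev)
{
	vdev->refcount++;
}

static void model_put(struct model_vfio_device *vdev)
{
	assert(vdev->refcount > 0);
	vdev->refcount--;
}

/* Each forwarded IRQ holds its own reference; undo it if the arch call fails. */
static int model_forward(struct model_vfio_device *vdev, int arch_ret)
{
	model_get(vdev);
	if (arch_ret < 0) {
		model_put(vdev);
		return arch_ret;
	}
	return 0;
}

/* Each unforward drops the reference taken by the matching forward. */
static void model_unforward(struct model_vfio_device *vdev)
{
	model_put(vdev);
}

/* Model of the set_device path: the lookup reference is always dropped. */
static int model_set_device_forward(struct model_vfio_device *vdev,
				    int arch_ret)
{
	int ret;

	model_get(vdev);		/* reference taken by the fd lookup */
	ret = model_forward(vdev, arch_ret);
	model_put(vdev);		/* unconditional, no must_put flag */
	return ret;
}
```

With this shape the count after N successful forwards is N, a failed forward leaves it unchanged, and the must_put out-parameter is no longer needed.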
> 
>>
>>> +		mutex_unlock(&kv->lock);
>>> +		return ret;
>>> +		}
>>> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
>>> +		int ret;
>>> +
>>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
>>> +			return -EFAULT;
>>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
>>> +		if (IS_ERR(vdev))
>>> +			return PTR_ERR(vdev);
>>> +
>>> +		kvm_vfio_device_put_external_user(vdev);
>>
>> you're dropping the reference to the device but referencing it in your
>> unfoward call below?
>>
>>> +		mutex_lock(&kv->lock);
>>> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
>>> +		mutex_unlock(&kv->lock);
>>> +		return ret;
>>> +	}
>>> +#endif
>>> +	default:
>>> +		return -ENXIO;
>>> +	}
>>> +}
>>> +
>>> +/**
>>> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
>>> + * @kv: kvm-vfio device
>>> + *
>>> + * loop on all got devices and their associated forwarded IRQs
>>
>> 'loop on all got' ?
>>
>> Restore the non-forwarded state for all registered devices and ...
>>
>>> + * restore the non forwarded state, remove IRQs and their devices from
>>> + * the respective list, put the vfio platform devices
>>> + *
>>> + * When this function is called, the vcpu already are destroyed. No
>>                                     the VCPUs are already destroyed.
>>> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
>>> + * kvm_arch_set_fwd_state action
>>
>> this last bit didn't make any sense to me.  Also, why are we referring
>> to the vgic in generic code?
>>
>>> + */
>>> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
>>> +{
>>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
>>> +
>>> +	/* loop on all the assigned devices */
>>
>> unnecessary comment
>>
>>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
>>> +				 &kv->device_list, node) {
>>> +
>>> +		/* loop on all its forwarded IRQ */
>>
>> same
>>
>>> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>>> +					 &kvm_vdev_iter->fwd_irq_list, link) {
>>> +			kvm_arch_set_fwd_state(fwd_irq_iter,
>>> +						KVM_VFIO_IRQ_CLEANUP);
>>> +			list_del(&fwd_irq_iter->link);
>>> +			kfree(fwd_irq_iter);
>>> +		}
> 
> 
> Ugh, how many of these cleanup functions do we need?
will simplify!

Thanks

Best Regards

Eric
> 
>>> +		list_del(&kvm_vdev_iter->node);
>>> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
>>> +		kfree(kvm_vdev_iter);
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>> +
>>>  static int kvm_vfio_set_attr(struct kvm_device *dev,
>>>  			     struct kvm_device_attr *attr)
>>>  {
>>>  	switch (attr->group) {
>>>  	case KVM_DEV_VFIO_GROUP:
>>>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
>>> +	case KVM_DEV_VFIO_DEVICE:
>>> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
>>>  	}
>>>  
>>>  	return -ENXIO;
>>> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>>>  		case KVM_DEV_VFIO_GROUP_DEL:
>>>  			return 0;
>>>  		}
>>> -
>>>  		break;
>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>> +	case KVM_DEV_VFIO_DEVICE:
>>> +		switch (attr->attr) {
>>> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>>> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>>> +			return 0;
>>> +		}
>>> +		break;
>>> +#endif
>>>  	}
>>> -
>>>  	return -ENXIO;
>>>  }
>>>  
>>> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
>>>  		list_del(&kvg->node);
>>>  		kfree(kvg);
>>>  	}
>>> +	kvm_vfio_put_all_devices(kv);
>>>  
>>>  	kvm_vfio_update_coherency(dev);
>>>  
>>> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>>>  		return -ENOMEM;
>>>  
>>>  	INIT_LIST_HEAD(&kv->group_list);
>>> +	INIT_LIST_HEAD(&kv->device_list);
>>>  	mutex_init(&kv->lock);
>>>  
>>>  	dev->private = kv;
>>> -- 
>>> 1.9.1
>>>
> 
> 
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-11 12:04         ` Eric Auger
  0 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-11 12:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/11/2014 07:05 AM, Alex Williamson wrote:
> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
>> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
>>> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
>>>
>>> This is a new control channel which enables KVM to cooperate with
>>> viable VFIO devices.
>>>
>>> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
>>> in addition to a list of groups (kvm_vfio_group). The new
>>> infrastructure enables to check the validity of the VFIO device
>>> file descriptor, get and hold a reference to it.
>>>
>>> The first concrete implemented command is IRQ forward control:
>>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>>>
>>> It consists in programing the VFIO driver and KVM in a consistent manner
>>> so that an optimized IRQ injection/completion is set up. Each
>>> kvm_vfio_device holds a list of forwarded IRQ. When putting a
>>> kvm_vfio_device, the implementation makes sure the forwarded IRQs
>>> are set again in the normal handling state (non forwarded).
>>
>> 'putting a kvm_vfio_device' sounds to like you're golf'ing :)
>>
>> When a kvm_vfio_device is released?
>>
>>>
>>> The forwarding programmming is architecture specific, embodied by the
>>> kvm_arch_set_fwd_state function. Its implementation is given in a
>>> separate patch file.
>>
>> I would drop the last sentence and instead indicate that this is handled
>> properly when the architecture does not support such a feature.
>>
>>>
>>> The forwarding control modality is enabled by the
>>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>>
>>> ---
>>>
>>> v1 -> v2:
>>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>> - original patch file separated into 2 parts: generic part moved in vfio.c
>>>   and ARM specific part(kvm_arch_set_fwd_state)
>>> ---
>>>  include/linux/kvm_host.h |  27 +++
>>>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
>>>  2 files changed, 477 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index a4c33b3..24350dc 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
>>>  		      unsigned long arg);
>>>  };
>>>  
>>> +enum kvm_fwd_irq_action {
>>> +	KVM_VFIO_IRQ_SET_FORWARD,
>>> +	KVM_VFIO_IRQ_SET_NORMAL,
>>> +	KVM_VFIO_IRQ_CLEANUP,
>>
>> This is KVM internal API, so it would probably be good to document this.
>> Especially the CLEANUP bit worries me, see below.
> 
> This also doesn't match the user API, which is simply FORWARD/UNFORWARD.
Hi Alex,

will change that.
> Extra states worry me too.

I tried to explain the two motivations behind them. Please let me know
if it makes sense.
> 
>>> +};
>>> +
>>> +/* internal structure describing a forwarded IRQ */
>>> +struct kvm_fwd_irq {
>>> +	struct list_head link;
>>
>> this list entry is local to the kvm vfio device, right? that means you
>> probably want a struct with just the below fields, and then have a
>> containing struct in the generic device file, private to it's logic.
> 
> Yes, this is part of the abstraction problem.
OK will fix that.
> 
>>> +	__u32 index; /* platform device irq index */
> 
> This is a vfio_device irq_index, but vfio_devices support indexes and
> sub-indexes.  At this level the API should match vfio, not the specifics
> of platform devices not supporting sub-index.
I will add sub-indexes then.
> 
>>> +	__u32 hwirq; /*physical IRQ */
>>> +	__u32 gsi; /* virtual IRQ */
>>> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> 
> Not sure I understand why vcpu is necessary.
vcpu is used when providing the physical IRQ/virtual IRQ mapping to the
virtual GIC. I can remove it from the struct and add a struct kvm_vcpu *
parameter to kvm_arch_set_fwd_state if you prefer.

> Also I see a 'get' in the code below, but not a 'put'.
Sorry, I do not understand your comment here. Which 'get' do you mean?
> 
>>> +};
>>> +
>>>  void kvm_device_get(struct kvm_device *dev);
>>>  void kvm_device_put(struct kvm_device *dev);
>>>  struct kvm_device *kvm_device_from_filp(struct file *filp);
>>> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
>>>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
>>>  extern struct kvm_device_ops kvm_flic_ops;
>>>  
>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
>>
>> what's the 'p' in pfwd?
> 
> p is for pointer?
yes it was ;-)
> 
>>> +			   enum kvm_fwd_irq_action action);
>>> +
>>> +#else
>>> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
>>> +					 enum kvm_fwd_irq_action action)
>>> +{
>>> +	return 0;
>>> +}
>>> +#endif
>>> +
>>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>>  
>>>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
>>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>>> index 76dc7a1..e4a81c4 100644
>>> --- a/virt/kvm/vfio.c
>>> +++ b/virt/kvm/vfio.c
>>> @@ -18,14 +18,24 @@
>>>  #include <linux/slab.h>
>>>  #include <linux/uaccess.h>
>>>  #include <linux/vfio.h>
>>> +#include <linux/platform_device.h>
>>>  
>>>  struct kvm_vfio_group {
>>>  	struct list_head node;
>>>  	struct vfio_group *vfio_group;
>>>  };
>>>  
>>> +struct kvm_vfio_device {
>>> +	struct list_head node;
>>> +	struct vfio_device *vfio_device;
>>> +	/* list of forwarded IRQs for that VFIO device */
>>> +	struct list_head fwd_irq_list;
>>> +	int fd;
>>> +};
>>> +
>>>  struct kvm_vfio {
>>>  	struct list_head group_list;
>>> +	struct list_head device_list;
>>>  	struct mutex lock;
>>>  	bool noncoherent;
>>>  };
>>> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>>  	return -ENXIO;
>>>  }
>>>  
>>> +/**
>>> + * get_vfio_device - returns the vfio-device corresponding to this fd
>>> + * @fd:fd of the vfio platform device
>>> + *
>>> + * checks it is a vfio device
>>> + * increment its ref counter
>>
>> why the short lines?  Just write this out in proper English.
>>
>>> + */
>>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
>>> +{
>>> +	struct fd f;
>>> +	struct vfio_device *vdev;
>>> +
>>> +	f = fdget(fd);
>>> +	if (!f.file)
>>> +		return NULL;
>>> +	vdev = kvm_vfio_device_get_external_user(f.file);
>>> +	fdput(f);
>>> +	return vdev;
>>> +}
>>> +
>>> +/**
>>> + * put_vfio_device: put the vfio platform device
>>> + * @vdev: vfio_device to put
>>> + *
>>> + * decrement the ref counter
>>> + */
>>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
>>> +{
>>> +	kvm_vfio_device_put_external_user(vdev);
>>> +}
>>> +
>>> +/**
>>> + * kvm_vfio_find_device - look for the device in the assigned
>>> + * device list
>>> + * @kv: the kvm-vfio device
>>> + * @vdev: the vfio_device to look for
>>> + *
>>> + * returns the associated kvm_vfio_device if the device is known,
>>> + * meaning at least 1 IRQ is forwarded for this device.
>>> + * in the device is not registered, returns NULL.
>>> + */
> 
> Why are we talking about forwarded IRQs already, this is a simple lookup
> function, who knows what other users it will have in the future.
I will correct that.
> 
>>
>> are these functions meant to be exported?  Otherwise they should be
>> static, and the documentation on these simple list iteration wrappers
>> seems like overkill imho.
>>
>>> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
>>> +					     struct vfio_device *vdev)
>>> +{
>>> +	struct kvm_vfio_device *kvm_vdev_iter;
>>> +
>>> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
>>> +		if (kvm_vdev_iter->vfio_device == vdev)
>>> +			return kvm_vdev_iter;
>>> +	}
>>> +	return NULL;
>>> +}
>>> +
>>> +/**
>>> + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
>>> + * @kvm_vdev: the kvm_vfio_device
>>> + * @irq_index: irq index
>>> + *
>>> + * returns the forwarded irq struct if it exists, NULL in the negative
>>> + */
>>> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
>>> +				      int irq_index)
> 
> +sub-index
OK
> 
> probably important to note on both of these that they need to be called
> with kv->lock
OK
> 
>>> +{
>>> +	struct kvm_fwd_irq *fwd_irq_iter;
>>> +
>>> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
>>> +		if (fwd_irq_iter->index == irq_index)
>>> +			return fwd_irq_iter;
>>> +	}
>>> +	return NULL;
>>> +}
>>> +
>>> +/**
>>> + * validate_forward - checks whether forwarding a given IRQ is meaningful
>>> + * @vdev:  vfio_device the IRQ belongs to
>>> + * @fwd_irq: user struct containing the irq_index to forward
>>> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
>>> + * kvm_vfio_device that holds it
>>> + * @hwirq: irq numberthe irq index corresponds to
>>> + *
>>> + * checks the vfio-device is a platform vfio device
>>> + * checks the irq_index corresponds to an actual hwirq and
>>> + * checks this hwirq is not already forwarded
>>> + * returns < 0 on following errors:
>>> + * not a platform device, bad irq index, already forwarded
>>> + */
>>> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
>>> +			    struct vfio_device *vdev,
>>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
>>> +			    struct kvm_vfio_device **kvm_vdev,
>>> +			    int *hwirq)
>>> +{
>>> +	struct device *dev = kvm_vfio_external_base_device(vdev);
>>> +	struct platform_device *platdev;
>>> +
>>> +	*hwirq = -1;
>>> +	*kvm_vdev = NULL;
>>> +	if (strcmp(dev->bus->name, "platform") == 0) {
> 
> Should be testing dev->bus_type == &platform_bus_type, and ideally
> creating a dev_is_platform() macro to make that even cleaner.
OK
> 
> However, we're being sort of sneaky here that we're actually doing
> something platform device specific here.  Why?  Don't we just need to
> make sure that kvm-vfio doesn't have any record of this forward
> (-EEXIST) and let the platform device code error out later for this
> case?
After answering Christoffer's comments, I think I should check that the
IRQ is not already mapped at VGIC level. In that case I would need to
split the validate function into 2 parts:
- the generic part only checks that the vfio_device/irq_index is not
already recorded. I do not need the hwirq for that.
- the ARM-specific part checks that no GIC mapping exists (needs the hwirq)

Would it be OK for both of you?
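
To make the proposed split concrete, here is a minimal user-space sketch. It assumes the generic part returns -EEXIST for an already recorded device/index pair, as suggested in the review; the -EBUSY value for an existing GIC mapping, the fixed-size record table and all model_* names are assumptions made only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one forwarded (device, index) pair. */
struct model_fwd_irq {
	const void *vdev;
	unsigned int index;
};

/* Hypothetical stand-in for the kvm-vfio device bookkeeping. */
struct model_kvm_vfio {
	struct model_fwd_irq recs[8];
	size_t nr;
};

/* Generic part: needs only (device, index), never the hwirq. */
static int model_generic_validate(const struct model_kvm_vfio *kv,
				  const void *vdev, unsigned int index)
{
	size_t i;

	for (i = 0; i < kv->nr; i++) {
		if (kv->recs[i].vdev == vdev && kv->recs[i].index == index)
			return -EEXIST;
	}
	return 0;
}

/* Arch part: rejects a hwirq that is out of range or already mapped. */
static int model_arch_validate(const bool *hwirq_mapped, size_t nr_hwirq,
			       int hwirq)
{
	if (hwirq < 0 || (size_t)hwirq >= nr_hwirq)
		return -EINVAL;
	return hwirq_mapped[hwirq] ? -EBUSY : 0;
}
```

The generic check can run first under kv->lock, and the hwirq lookup then moves entirely behind the architecture hook.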
> 
>>> +		platdev = to_platform_device(dev);
>>> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
>>> +		if (*hwirq < 0) {
>>> +			kvm_err("%s incorrect index\n", __func__);
>>> +			return -EINVAL;
>>> +		}
>>> +	} else {
>>> +		kvm_err("%s not a platform device\n", __func__);
>>> +		return -EINVAL;
>>> +	}
>>
>> need some spaceing here, also, I would turn this around, first check if
>> the strcmp fails, and then error out, then do you next check etc., to
>> avoid so many nested statements.
>>
>>> +	/* is a ref to this device already owned by the KVM-VFIO device? */
>>
>> this comment is not particularly helpful in its current form, it would
>> be helpful if you specified that we're checking whether that particular
>> device/irq combo is already registered.
>>
>>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
>>> +	if (*kvm_vdev) {
>>> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
>>> +			kvm_err("%s irq %d already forwarded\n",
>>> +				__func__, *hwirq);
> 
> Why didn't we do this first?
see above comment
> 
>> don't flood the kernel log because of a user error, just allocate an
>> error code for this purpose and document it in the ABI, -EEXIST or
>> something.
>>
>>> +			return -EINVAL;
>>> +		}
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>> +/**
>>> + * validate_unforward: check a deassignment is meaningful
>>> + * @kv: the kvm_vfio device
>>> + * @vdev: the vfio_device whose irq to deassign belongs to
>>> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
>>> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
>>> + * it exists
>>> + *
>>> + * returns 0 if the provided irq effectively is forwarded
>>> + * (a ref to this vfio_device is hold and this irq belongs to
>>                                     held
>>> + * the forwarded irq of this device)
>>> + * returns -EINVAL in the negative
>>
>>                ENOENT should be returned if you don't have an entry.
>> 	       EINVAL could be used if you supply an fd that isn't a
>> 	       VFIO device file descriptor, for example.  Again,
>> 	       consider documenting all this in the API.
>>
>>> + */
>>> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
>>> +			      struct vfio_device *vdev,
>>> +			      struct kvm_arch_forwarded_irq *fwd_irq,
>>> +			      struct kvm_vfio_device **kvm_vdev)
>>> +{
>>> +	struct kvm_fwd_irq *pfwd;
>>> +
>>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
>>> +	if (!kvm_vdev) {
>>> +		kvm_err("%s no forwarded irq for this device\n", __func__);
>>
>> don't flood the kernel log
>>
>>> +		return -EINVAL;
>>> +	}
>>> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
>>> +	if (!pfwd) {
>>> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
>>
>>> +		return -EINVAL;
>>
>> same here
I do not understand. With the current functions I need to first retrieve
the device and then iterate over the IRQs of that device.
>>
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>> +/**
>>> + * kvm_vfio_forward - set a forwarded IRQ
>>> + * @kdev: the kvm device
>>> + * @vdev: the vfio device the IRQ belongs to
>>> + * @fwd_irq: the user struct containing the irq_index and guest irq
>>> + * @must_put: tells the caller whether the vfio_device must be put after
>>> + * the call (ref must be released in case a ref onto this device was
>>> + * already hold or in case of new device and failure)
>>> + *
>>> + * validate the injection, activate forward and store the information
>>       Validate
>>> + * about which irq and which device is concerned so that on deassign or
>>> + * kvm-vfio destruction everuthing can be cleaned up.
>>                            everything
>>
>> I'm not sure I understand this explanation.  Do we have concerned
>> devices?
>>
>> I think you want to say something along the lines of: If userspace passed
>> a valid vfio device and irq handle and the architecture supports
>> forwarding this combination, register the vfio_device and irq
>> combination in the ....
>>
>>> + */
>>> +static int kvm_vfio_forward(struct kvm_device *kdev,
>>> +			    struct vfio_device *vdev,
>>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
>>> +			    bool *must_put)
>>> +{
>>> +	int ret;
>>> +	struct kvm_fwd_irq *pfwd = NULL;
>>> +	struct kvm_vfio_device *kvm_vdev = NULL;
>>> +	struct kvm_vfio *kv = kdev->private;
>>> +	int hwirq;
>>> +
>>> +	*must_put = true;
>>> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
>>> +					&kvm_vdev, &hwirq);
>>> +	if (ret < 0)
>>> +		return -EINVAL;
>>> +
>>> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
>>
>> seems a bit pointless to zero-out the memory if you're setting all
>> fields below.
>>
>>> +	if (!pfwd)
>>> +		return -ENOMEM;
>>> +	pfwd->index = fwd_irq->index;
>>> +	pfwd->gsi = fwd_irq->gsi;
>>> +	pfwd->hwirq = hwirq;
>>> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
>>> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
>>> +	if (ret < 0) {
>>> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
>>
>> this whole thing feels incredibly broken to me.  Setting a forward
>> should either work or not work, not something in between that leaves
>> something to be cleaned up.  Why this two-stage thingy here?
> 
> Yep, I agree.  I also don't see the point of the validate function, just
> open code it here and push the platform_get_irq test into
> kvm_arch_set_fwd_state.  kvm-vfio doesn't care about the hwirq.
> 
>>> +		kfree(pfwd);
>>
>> probably want to move your free-and-return-error to the end of the
>> function.
>>
>>> +		return ret;
>>> +	}
>>> +
>>> +	if (!kvm_vdev) {
>>> +		/* create & insert the new device and keep the ref */
>>> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
>>
>> again, no need for zeroing out the memory.
> 
> I think you also want to allocate this before you setup the forward so
> you can eliminate any complicated teardown later.
ok
> 
>>> +		if (!kvm_vdev) {
>>> +			kvm_arch_set_fwd_state(pfwd, false);
> 
> false?  The function takes an enum.
Thanks for identifying that bug.
> 
>>> +			kfree(pfwd);
>>> +			return -ENOMEM;
>>> +		}
>>> +
>>> +		kvm_vdev->vfio_device = vdev;
>>> +		kvm_vdev->fd = fwd_irq->fd;
>>> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
>>> +		list_add(&kvm_vdev->node, &kv->device_list);
>>> +		/*
>>> +		 * the only case where we keep the ref:
>>> +		 * new device and forward setting successful
>>> +		 */
>>> +		*must_put = false;
>>> +	}
>>> +
>>> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
>>> +
>>> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
>>> +	fwd_irq->fd, hwirq, fwd_irq->gsi);
>>
>> please indent this to align with the opening parenthesis.
>>
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/**
>>> + * remove_assigned_device - put a given device from the list
>>
>> this isn't a 'put', at least not *just* a put.
>>
>>> + * @kv: the kvm-vfio device
>>> + * @vdev: the vfio-device to remove
>>> + *
>>> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
>>> + * remove the corresponding kvm_vfio_device from the assigned device
>>> + * list.
>>> + * returns true if the device could be removed, false in the negative
>>> + */
>>> +bool remove_assigned_device(struct kvm_vfio *kv,
>>> +			    struct vfio_device *vdev)
>>> +{
>>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
>>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>>> +	bool removed = false;
>>> +	int ret;
>>> +
>>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
>>> +				 &kv->device_list, node) {
>>> +		if (kvm_vdev_iter->vfio_device == vdev) {
>>> +			/* loop on all its forwarded IRQ */
>>> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>>> +						 &kvm_vdev_iter->fwd_irq_list,
>>> +						 link) {
>>
>> hmm, seems this function is only called when you have no more forwarded
>> IRQs, so isn't all of this completely dead (and unnecessary) code?
>>
>>> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
>>> +						KVM_VFIO_IRQ_SET_NORMAL);
>>> +				if (ret < 0)
>>> +					return ret;
>>
>> you're returning an error code to a bool function, which means you'll
>> return true when there was an error.  Is this your intention? ;)
>>
>> if we have an error here, this would be a very very bad situation wouldn't it?
>>
>>> +				list_del(&fwd_irq_iter->link);
>>> +				kfree(fwd_irq_iter);
>>> +			}
>>> +			/* all IRQs could be deassigned */
>>> +			list_del(&kvm_vdev_iter->node);
>>> +			kvm_vfio_device_put_external_user(
>>> +				kvm_vdev_iter->vfio_device);
>>> +			kfree(kvm_vdev_iter);
>>> +			removed = true;
>>> +			break;
>>> +		}
>>> +	}
>>> +	return removed;
>>> +}
>>> +
>>> +
>>> +/**
>>> + * remove_fwd_irq - remove a forwarded irq
>>> + *
>>> + * @kv: kvm-vfio device
>>> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
>>> + * irq_index: the index of the IRQ
>>> + *
>>> + * change the forwarded state of the IRQ, remove the IRQ from
>>> + * the device forwarded IRQ list. In case it is the last one,
>>> + * put the device
>>> + */
>>> +int remove_fwd_irq(struct kvm_vfio *kv,
>>> +		   struct kvm_vfio_device *kvm_vdev,
>>> +		   int irq_index)
>>> +{
>>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>>> +	int ret = -1;
>>> +
>>> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>>> +				 &kvm_vdev->fwd_irq_list, link) {
>>
>> hmmm, you can only forward one irq for a specific device once, right?
>> And you already have a lookup function, so why not call that, and then
>> remove it?
>>
>> I'm confused.

> 
> Me too, this and the previous function need some more consideration.
understood
> 
>>> +		if (fwd_irq_iter->index == irq_index) {
>>> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
>>> +						KVM_VFIO_IRQ_SET_NORMAL);
>>> +			if (ret < 0)
>>> +				break;
>>> +			list_del(&fwd_irq_iter->link);
>>> +			kfree(fwd_irq_iter);
>>> +			ret = 0;
>>> +			break;
>>> +		}
>>> +	}
>>> +	if (list_empty(&kvm_vdev->fwd_irq_list))
>>> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +/**
>>> + * kvm_vfio_unforward - remove a forwarded IRQ
>>> + * @kdev: the kvm device
>>> + * @vdev: the vfio_device
>>> + * @fwd_irq: user struct
>>> + * after checking this IRQ effectively is forwarded, change its state,
>>> + * remove it from the corresponding kvm_vfio_device list
>>> + */
>>> +static int kvm_vfio_unforward(struct kvm_device *kdev,
>>> +				     struct vfio_device *vdev,
>>> +				     struct kvm_arch_forwarded_irq *fwd_irq)
>>> +{
>>> +	struct kvm_vfio *kv = kdev->private;
>>> +	struct kvm_vfio_device *kvm_vdev;
>>> +	int ret;
>>> +
>>> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
>>> +	if (ret < 0)
>>> +		return -EINVAL;
>>
>> why do you override the return value?  Propagate it.
>>
>>> +
>>> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
>>> +	if (ret < 0)
>>> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
>>> +			__func__, fwd_irq->fd, fwd_irq->index);
>>> +	else
>>> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
>>> +			  __func__, fwd_irq->fd, fwd_irq->index);
>>
>> again with the kernel log here.
>>
>>
>>
>>> +	return ret;
>>> +}
>>> +
>>> +
>>> +
>>> +
>>> +/**
>>> + * kvm_vfio_set_device - the top function for interracting with a vfio
>>
>>                                 top?             interacting
>>
>>> + * device
>>> + */
>>
>> probably just skip this comment
>>
>>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
>>> +{
>>> +	struct kvm_vfio *kv = kdev->private;
>>> +	struct vfio_device *vdev;
>>> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
>>> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
>>> +
>>> +	switch (attr) {
>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
>>> +		bool must_put;
>>> +		int ret;
>>> +
>>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
>>> +			return -EFAULT;
>>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
>>> +		if (IS_ERR(vdev))
>>> +			return PTR_ERR(vdev);
>>
>> seems like this whole block of code is replicated below, needs
>> refactoring.
>>
>>> +		mutex_lock(&kv->lock);
>>> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
>>> +		if (must_put)
>>> +			kvm_vfio_put_vfio_device(vdev);
>>
>> this must_put looks plain weird.  I think you want to balance your
>> get/put's always; can't you just get an extra reference in
>> kvm_vfio_forward() ?
> 
> Yeah, this is very broken.  Every forwarded IRQ should hold a reference
> to the vfio_device.  Every unforward should drop a reference.  Trying to
> maintain a single reference is a non-goal.
OK will do that.
> 
>>
>>> +		mutex_unlock(&kv->lock);
>>> +		return ret;
>>> +		}
>>> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
>>> +		int ret;
>>> +
>>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
>>> +			return -EFAULT;
>>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
>>> +		if (IS_ERR(vdev))
>>> +			return PTR_ERR(vdev);
>>> +
>>> +		kvm_vfio_device_put_external_user(vdev);
>>
>> you're dropping the reference to the device but referencing it in your
>> unforward call below?
>>
>>> +		mutex_lock(&kv->lock);
>>> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
>>> +		mutex_unlock(&kv->lock);
>>> +		return ret;
>>> +	}
>>> +#endif
>>> +	default:
>>> +		return -ENXIO;
>>> +	}
>>> +}
>>> +
>>> +/**
>>> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
>>> + * @kv: kvm-vfio device
>>> + *
>>> + * loop on all got devices and their associated forwarded IRQs
>>
>> 'loop on all got' ?
>>
>> Restore the non-forwarded state for all registered devices and ...
>>
>>> + * restore the non forwarded state, remove IRQs and their devices from
>>> + * the respective list, put the vfio platform devices
>>> + *
>>> + * When this function is called, the vcpu already are destroyed. No
>>                                     the VCPUs are already destroyed.
>>> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
>>> + * kvm_arch_set_fwd_state action
>>
>> this last bit didn't make any sense to me.  Also, why are we referring
>> to the vgic in generic code?
>>
>>> + */
>>> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
>>> +{
>>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
>>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
>>> +
>>> +	/* loop on all the assigned devices */
>>
>> unnecessary comment
>>
>>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
>>> +				 &kv->device_list, node) {
>>> +
>>> +		/* loop on all its forwarded IRQ */
>>
>> same
>>
>>> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
>>> +					 &kvm_vdev_iter->fwd_irq_list, link) {
>>> +			kvm_arch_set_fwd_state(fwd_irq_iter,
>>> +						KVM_VFIO_IRQ_CLEANUP);
>>> +			list_del(&fwd_irq_iter->link);
>>> +			kfree(fwd_irq_iter);
>>> +		}
> 
> 
> Ugh, how many of these cleanup functions do we need?
will simplify!

Thanks

Best Regards

Eric
> 
>>> +		list_del(&kvm_vdev_iter->node);
>>> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
>>> +		kfree(kvm_vdev_iter);
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>> +
>>>  static int kvm_vfio_set_attr(struct kvm_device *dev,
>>>  			     struct kvm_device_attr *attr)
>>>  {
>>>  	switch (attr->group) {
>>>  	case KVM_DEV_VFIO_GROUP:
>>>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
>>> +	case KVM_DEV_VFIO_DEVICE:
>>> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
>>>  	}
>>>  
>>>  	return -ENXIO;
>>> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>>>  		case KVM_DEV_VFIO_GROUP_DEL:
>>>  			return 0;
>>>  		}
>>> -
>>>  		break;
>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>> +	case KVM_DEV_VFIO_DEVICE:
>>> +		switch (attr->attr) {
>>> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>>> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>>> +			return 0;
>>> +		}
>>> +		break;
>>> +#endif
>>>  	}
>>> -
>>>  	return -ENXIO;
>>>  }
>>>  
>>> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
>>>  		list_del(&kvg->node);
>>>  		kfree(kvg);
>>>  	}
>>> +	kvm_vfio_put_all_devices(kv);
>>>  
>>>  	kvm_vfio_update_coherency(dev);
>>>  
>>> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>>>  		return -ENOMEM;
>>>  
>>>  	INIT_LIST_HEAD(&kv->group_list);
>>> +	INIT_LIST_HEAD(&kv->device_list);
>>>  	mutex_init(&kv->lock);
>>>  
>>>  	dev->private = kv;
>>> -- 
>>> 1.9.1
>>>
> 
> 
> 


* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11  9:35       ` Eric Auger
  (?)
@ 2014-09-11 15:47         ` Alex Williamson
  -1 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-11 15:47 UTC (permalink / raw)
  To: Eric Auger
  Cc: Christoffer Dall, eric.auger, marc.zyngier, linux-arm-kernel,
	kvmarm, kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, 2014-09-11 at 11:35 +0200, Eric Auger wrote:
> On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> > On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> >> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> >>
> >> This is a new control channel which enables KVM to cooperate with
> >> viable VFIO devices.
> >>
> >> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> >> in addition to a list of groups (kvm_vfio_group). The new
> >> infrastructure enables to check the validity of the VFIO device
> >> file descriptor, get and hold a reference to it.
> >>
> >> The first concrete implemented command is IRQ forward control:
> >> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> >>
> >> It consists in programing the VFIO driver and KVM in a consistent manner
> >> so that an optimized IRQ injection/completion is set up. Each
> >> kvm_vfio_device holds a list of forwarded IRQ. When putting a
> >> kvm_vfio_device, the implementation makes sure the forwarded IRQs
> >> are set again in the normal handling state (non forwarded).
> > 
> > 'putting a kvm_vfio_device' sounds like you're golf'ing :)
> > 
> > When a kvm_vfio_device is released?
> sure
> > 
> >>
> >> The forwarding programmming is architecture specific, embodied by the
> >> kvm_arch_set_fwd_state function. Its implementation is given in a
> >> separate patch file.
> > 
> > I would drop the last sentence and instead indicate that this is handled
> > properly when the architecture does not support such a feature.
> ok
> > 
> >>
> >> The forwarding control modality is enabled by the
> >> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >>
> >> ---
> >>
> >> v1 -> v2:
> >> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> - original patch file separated into 2 parts: generic part moved in vfio.c
> >>   and ARM specific part(kvm_arch_set_fwd_state)
> >> ---
> >>  include/linux/kvm_host.h |  27 +++
> >>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
> >>  2 files changed, 477 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index a4c33b3..24350dc 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
> >>  		      unsigned long arg);
> >>  };
> >>  
> >> +enum kvm_fwd_irq_action {
> >> +	KVM_VFIO_IRQ_SET_FORWARD,
> >> +	KVM_VFIO_IRQ_SET_NORMAL,
> >> +	KVM_VFIO_IRQ_CLEANUP,
> > 
> > This is KVM internal API, so it would probably be good to document this.
> > Especially the CLEANUP bit worries me, see below.
> I will document it
> > 
> >> +};
> >> +
> >> +/* internal structure describing a forwarded IRQ */
> >> +struct kvm_fwd_irq {
> >> +	struct list_head link;
> > 
> > this list entry is local to the kvm vfio device, right? that means you
> > probably want a struct with just the below fields, and then have a
> > containing struct in the generic device file, private to its logic.
> I will introduce 2 separate structs
> > 
> >> +	__u32 index; /* platform device irq index */
> >> +	__u32 hwirq; /*physical IRQ */
> >> +	__u32 gsi; /* virtual IRQ */
> >> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> >> +};
> >> +
> >>  void kvm_device_get(struct kvm_device *dev);
> >>  void kvm_device_put(struct kvm_device *dev);
> >>  struct kvm_device *kvm_device_from_filp(struct file *filp);
> >> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
> >>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> >>  extern struct kvm_device_ops kvm_flic_ops;
> >>  
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> > 
> > what's the 'p' in pfwd?
> will rename
> > 
> >> +			   enum kvm_fwd_irq_action action);
> >> +
> >> +#else
> >> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> >> +					 enum kvm_fwd_irq_action action)
> >> +{
> >> +	return 0;
> >> +}
> >> +#endif
> >> +
> >>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >>  
> >>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> >> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> >> index 76dc7a1..e4a81c4 100644
> >> --- a/virt/kvm/vfio.c
> >> +++ b/virt/kvm/vfio.c
> >> @@ -18,14 +18,24 @@
> >>  #include <linux/slab.h>
> >>  #include <linux/uaccess.h>
> >>  #include <linux/vfio.h>
> >> +#include <linux/platform_device.h>
> >>  
> >>  struct kvm_vfio_group {
> >>  	struct list_head node;
> >>  	struct vfio_group *vfio_group;
> >>  };
> >>  
> >> +struct kvm_vfio_device {
> >> +	struct list_head node;
> >> +	struct vfio_device *vfio_device;
> >> +	/* list of forwarded IRQs for that VFIO device */
> >> +	struct list_head fwd_irq_list;
> >> +	int fd;
> >> +};
> >> +
> >>  struct kvm_vfio {
> >>  	struct list_head group_list;
> >> +	struct list_head device_list;
> >>  	struct mutex lock;
> >>  	bool noncoherent;
> >>  };
> >> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >>  	return -ENXIO;
> >>  }
> >>  
> >> +/**
> >> + * get_vfio_device - returns the vfio-device corresponding to this fd
> >> + * @fd:fd of the vfio platform device
> >> + *
> >> + * checks it is a vfio device
> >> + * increment its ref counter
> > 
> > why the short lines?  Just write this out in proper English.
> OK
> > 
> >> + */
> >> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> >> +{
> >> +	struct fd f;
> >> +	struct vfio_device *vdev;
> >> +
> >> +	f = fdget(fd);
> >> +	if (!f.file)
> >> +		return NULL;
> >> +	vdev = kvm_vfio_device_get_external_user(f.file);
> >> +	fdput(f);
> >> +	return vdev;
> >> +}
> >> +
> >> +/**
> >> + * put_vfio_device: put the vfio platform device
> >> + * @vdev: vfio_device to put
> >> + *
> >> + * decrement the ref counter
> >> + */
> >> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> >> +{
> >> +	kvm_vfio_device_put_external_user(vdev);
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_find_device - look for the device in the assigned
> >> + * device list
> >> + * @kv: the kvm-vfio device
> >> + * @vdev: the vfio_device to look for
> >> + *
> >> + * returns the associated kvm_vfio_device if the device is known,
> >> + * meaning at least 1 IRQ is forwarded for this device.
> >> + * in the device is not registered, returns NULL.
> >> + */
> > 
> > are these functions meant to be exported?  Otherwise they should be
> > static, and the documentation on these simple list iteration wrappers
> > seems like overkill imho.
> could be static indeed
> > 
> >> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> >> +					     struct vfio_device *vdev)
> >> +{
> >> +	struct kvm_vfio_device *kvm_vdev_iter;
> >> +
> >> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> >> +		if (kvm_vdev_iter->vfio_device == vdev)
> >> +			return kvm_vdev_iter;
> >> +	}
> >> +	return NULL;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> >> + * @kvm_vdev: the kvm_vfio_device
> >> + * @irq_index: irq index
> >> + *
> >> + * returns the forwarded irq struct if it exists, NULL in the negative
> >> + */
> >> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> >> +				      int irq_index)
> >> +{
> >> +	struct kvm_fwd_irq *fwd_irq_iter;
> >> +
> >> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> >> +		if (fwd_irq_iter->index == irq_index)
> >> +			return fwd_irq_iter;
> >> +	}
> >> +	return NULL;
> >> +}
> >> +
> >> +/**
> >> + * validate_forward - checks whether forwarding a given IRQ is meaningful
> >> + * @vdev:  vfio_device the IRQ belongs to
> >> + * @fwd_irq: user struct containing the irq_index to forward
> >> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> >> + * kvm_vfio_device that holds it
> >> + * @hwirq: irq numberthe irq index corresponds to
> >> + *
> >> + * checks the vfio-device is a platform vfio device
> >> + * checks the irq_index corresponds to an actual hwirq and
> >> + * checks this hwirq is not already forwarded
> >> + * returns < 0 on following errors:
> >> + * not a platform device, bad irq index, already forwarded
> >> + */
> >> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> >> +			    struct vfio_device *vdev,
> >> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >> +			    struct kvm_vfio_device **kvm_vdev,
> >> +			    int *hwirq)
> >> +{
> >> +	struct device *dev = kvm_vfio_external_base_device(vdev);
> >> +	struct platform_device *platdev;
> >> +
> >> +	*hwirq = -1;
> >> +	*kvm_vdev = NULL;
> >> +	if (strcmp(dev->bus->name, "platform") == 0) {
> >> +		platdev = to_platform_device(dev);
> >> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> >> +		if (*hwirq < 0) {
> >> +			kvm_err("%s incorrect index\n", __func__);
> >> +			return -EINVAL;
> >> +		}
> >> +	} else {
> >> +		kvm_err("%s not a platform device\n", __func__);
> >> +		return -EINVAL;
> >> +	}
> > 
> > need some spacing here, also, I would turn this around, first check if
> > the strcmp fails, and then error out, then do your next check etc., to
> > avoid so many nested statements.
> ok
> > 
> >> +	/* is a ref to this device already owned by the KVM-VFIO device? */
> > 
> > this comment is not particularly helpful in its current form, it would
> > be helpful if you specified that we're checking whether that particular
> > device/irq combo is already registered.
> > 
> >> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >> +	if (*kvm_vdev) {
> >> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> >> +			kvm_err("%s irq %d already forwarded\n",
> >> +				__func__, *hwirq);
> > 
> > don't flood the kernel log because of a user error, just allocate an
> > error code for this purpose and document it in the ABI, -EEXIST or
> > something.
> ok
> > 
> >> +			return -EINVAL;
> >> +		}
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * validate_unforward: check a deassignment is meaningful
> >> + * @kv: the kvm_vfio device
> >> + * @vdev: the vfio_device whose irq to deassign belongs to
> >> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> >> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> >> + * it exists
> >> + *
> >> + * returns 0 if the provided irq effectively is forwarded
> >> + * (a ref to this vfio_device is hold and this irq belongs to
> >                                     held
> >> + * the forwarded irq of this device)
> >> + * returns -EINVAL in the negative
> > 
> >                ENOENT should be returned if you don't have an entry.
> > 	       EINVAL could be used if you supply an fd that isn't a
> > 	       VFIO device file descriptor, for example.  Again,
> > 	       consider documenting all this in the API.
> > 
> >> + */
> >> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> >> +			      struct vfio_device *vdev,
> >> +			      struct kvm_arch_forwarded_irq *fwd_irq,
> >> +			      struct kvm_vfio_device **kvm_vdev)
> >> +{
> >> +	struct kvm_fwd_irq *pfwd;
> >> +
> >> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >> +	if (!kvm_vdev) {
> >> +		kvm_err("%s no forwarded irq for this device\n", __func__);
> > 
> > don't flood the kernel log
> ok
> > 
> >> +		return -EINVAL;
> >> +	}
> >> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> >> +	if (!pfwd) {
> >> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
> > 
> > 
> >> +		return -EINVAL;
> > 
> > same here
> ok
> > 
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_forward - set a forwarded IRQ
> >> + * @kdev: the kvm device
> >> + * @vdev: the vfio device the IRQ belongs to
> >> + * @fwd_irq: the user struct containing the irq_index and guest irq
> >> + * @must_put: tells the caller whether the vfio_device must be put after
> >> + * the call (ref must be released in case a ref onto this device was
> >> + * already hold or in case of new device and failure)
> >> + *
> >> + * validate the injection, activate forward and store the information
> >       Validate
> >> + * about which irq and which device is concerned so that on deassign or
> >> + * kvm-vfio destruction everuthing can be cleaned up.
> >                            everything
> > 
> > I'm not sure I understand this explanation.  Do we have concerned
> > devices?
> > 
> > I think you want to say something along the lines of: If userspace passed
> > a valid vfio device and irq handle and the architecture supports
> > forwarding this combination, register the vfio_device and irq
> > combination in the ....
> ok
> > 
> >> + */
> >> +static int kvm_vfio_forward(struct kvm_device *kdev,
> >> +			    struct vfio_device *vdev,
> >> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >> +			    bool *must_put)
> >> +{
> >> +	int ret;
> >> +	struct kvm_fwd_irq *pfwd = NULL;
> >> +	struct kvm_vfio_device *kvm_vdev = NULL;
> >> +	struct kvm_vfio *kv = kdev->private;
> >> +	int hwirq;
> >> +
> >> +	*must_put = true;
> >> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
> >> +					&kvm_vdev, &hwirq);
> >> +	if (ret < 0)
> >> +		return -EINVAL;
> >> +
> >> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
> > 
> > seems a bit pointless to zero-out the memory if you're setting all
> > fields below.
> ok
> > 
> >> +	if (!pfwd)
> >> +		return -ENOMEM;
> >> +	pfwd->index = fwd_irq->index;
> >> +	pfwd->gsi = fwd_irq->gsi;
> >> +	pfwd->hwirq = hwirq;
> >> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> >> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> >> +	if (ret < 0) {
> >> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> > 
> > this whole thing feels incredibly broken to me.  Setting a forward
> > should either work or not work, not something in between that leaves
> > something to be cleaned up.  Why this two-stage thingy here?
> I wanted to exploit the return value of vgic_map_phys_irq which is
> likely to fail if the phys/virt mapping exists at VGIC level.
> 
> I already validated the injection from a KVM_VFIO_DEVICE point of view
> (the device/irq is not known internally). But what if another external
> component - which does not exist yet - maps the IRQ at VGIC level? Maybe
> I need to replace the existing validation check by querying the VGIC at
> low level instead of checking KVM-VFIO local variables.

The kvm-vfio interface needs to follow the user API, an IRQ is either
forwarded or not forwarded.  We're either tracking it or not tracking
it.  This limbo state doesn't make any sense to track here.  The
kvm-vfio level validation (testing for duplicates) should be device
agnostic.  kvm_arch_set_fwd_state() is where any lower level tests
should be done.
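
A minimal userspace sketch of that split, under stated assumptions: the generic layer only answers "tracked or not tracked" and rejects duplicates, while a hypothetical arch hook (arch_set_forward here, standing in for kvm_arch_set_fwd_state()) performs any lower-level test. The names, the fixed-size table, and the -EBUSY failure for index 3 are all invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_IRQS 4

/* Generic bookkeeping: an IRQ index is tracked or it is not. */
static bool tracked[NR_IRQS];

static bool is_forwarded(unsigned int index)
{
	assert(index < NR_IRQS);
	return tracked[index];
}

/*
 * Hypothetical arch hook. It refuses index 3 to model a low-level
 * conflict that the generic layer cannot know about.
 */
static int arch_set_forward(unsigned int index, bool enable)
{
	if (enable && index == 3)
		return -EBUSY;
	return 0;
}

static int generic_forward(unsigned int index)
{
	int ret;

	if (index >= NR_IRQS)
		return -EINVAL;
	if (tracked[index])		/* device-agnostic duplicate test */
		return -EEXIST;

	ret = arch_set_forward(index, true);
	if (ret < 0)
		return ret;		/* nothing tracked, nothing to clean up */

	tracked[index] = true;
	return 0;
}

static int generic_unforward(unsigned int index)
{
	int ret;

	if (index >= NR_IRQS)
		return -EINVAL;
	if (!tracked[index])
		return -ENOENT;

	ret = arch_set_forward(index, false);
	if (ret < 0)
		return ret;		/* still forwarded, so still tracked */

	tracked[index] = false;
	return 0;
}
```

Because the table is only updated after the hook succeeds, a failed forward leaves no intermediate state behind and no separate cleanup action is needed on that path.
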

> > 
> >> +		kfree(pfwd);
> > 
> > probably want to move your free-and-return-error to the end of the
> > function.
> ok
> > 
> >> +		return ret;
> >> +	}
> >> +
> >> +	if (!kvm_vdev) {
> >> +		/* create & insert the new device and keep the ref */
> >> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
> > 
> > again, no need for zeroing out the memory.
> ok
> > 
> >> +		if (!kvm_vdev) {
> >> +			kvm_arch_set_fwd_state(pfwd, false);
> >> +			kfree(pfwd);
> >> +			return -ENOMEM;
> >> +		}
> >> +
> >> +		kvm_vdev->vfio_device = vdev;
> >> +		kvm_vdev->fd = fwd_irq->fd;
> >> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
> >> +		list_add(&kvm_vdev->node, &kv->device_list);
> >> +		/*
> >> +		 * the only case where we keep the ref:
> >> +		 * new device and forward setting successful
> >> +		 */
> >> +		*must_put = false;
> >> +	}
> >> +
> >> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
> >> +
> >> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
> >> +	fwd_irq->fd, hwirq, fwd_irq->gsi);
> > 
> > please indent this to align with the opening parenthesis.
> ok
> > 
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * remove_assigned_device - put a given device from the list
> > 
> > this isn't a 'put', at least not *just* a put.
> correct, I will rephrase
> > 
> >> + * @kv: the kvm-vfio device
> >> + * @vdev: the vfio-device to remove
> >> + *
> >> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
> >> + * remove the corresponding kvm_vfio_device from the assigned device
> >> + * list.
> >> + * returns true if the device could be removed, false in the negative
> >> + */
> >> +bool remove_assigned_device(struct kvm_vfio *kv,
> >> +			    struct vfio_device *vdev)
> >> +{
> >> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> >> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >> +	bool removed = false;
> >> +	int ret;
> >> +
> >> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> >> +				 &kv->device_list, node) {
> >> +		if (kvm_vdev_iter->vfio_device == vdev) {
> >> +			/* loop on all its forwarded IRQ */
> >> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >> +						 &kvm_vdev_iter->fwd_irq_list,
> >> +						 link) {
> > 
> > hmm, seems this function is only called when you have no more forwarded
> > IRQs, so isn't all of this completely dead (and unnecessary) code?
> yep I can simplify all that cleanup
> > 
> >> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> >> +						KVM_VFIO_IRQ_SET_NORMAL);
> >> +				if (ret < 0)
> >> +					return ret;
> > 
> > you're returning an error code to a bool function, which means you'll
> > return true when there was an error.  Is this your intention? ;)
> definitely not!
> > 
> > if we have an error here, this would be a very very bad situation wouldn't it?
> sure. I will simplify this, transform kvm_arch_set_fwd_state into a void
> function

Please no, kvm_arch_set_fwd_state() needs to indicate to kvm-vfio
whether the requested forward was set up, how can it do that without a
return?
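
A small sketch of the two return-type issues raised in this part of the review, using hypothetical helper names (arch_set_normal, remove_reporting_bool, remove_reporting_int) rather than the functions from the patch. It shows that a negative errno returned from a bool function is converted to true, and that an int return lets the caller see the hook's error unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical arch hook that always fails, for demonstration only. */
static int arch_set_normal(void)
{
	return -EINVAL;
}

/* Buggy shape: the error code is squashed into a bool. */
static bool remove_reporting_bool(void)
{
	int ret = arch_set_normal();

	if (ret < 0)
		return ret;	/* -EINVAL is nonzero, so the caller sees true */
	return true;
}

/* Propagating shape: the caller receives the hook's error code. */
static int remove_reporting_int(void)
{
	int ret = arch_set_normal();

	if (ret < 0)
		return ret;
	return 0;
}
```

The bool variant reports success in exactly the case where the hook failed, which is why keeping an int return on the hook and propagating it is the safer design.
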

> > 
> >> +				list_del(&fwd_irq_iter->link);
> >> +				kfree(fwd_irq_iter);
> >> +			}
> >> +			/* all IRQs could be deassigned */
> >> +			list_del(&kvm_vdev_iter->node);
> >> +			kvm_vfio_device_put_external_user(
> >> +				kvm_vdev_iter->vfio_device);
> >> +			kfree(kvm_vdev_iter);
> >> +			removed = true;
> >> +			break;
> >> +		}
> >> +	}
> >> +	return removed;
> >> +}
> >> +
> >> +
> >> +/**
> >> + * remove_fwd_irq - remove a forwarded irq
> >> + *
> >> + * @kv: kvm-vfio device
> >> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
> >> + * irq_index: the index of the IRQ
> >> + *
> >> + * change the forwarded state of the IRQ, remove the IRQ from
> >> + * the device forwarded IRQ list. In case it is the last one,
> >> + * put the device
> >> + */
> >> +int remove_fwd_irq(struct kvm_vfio *kv,
> >> +		   struct kvm_vfio_device *kvm_vdev,
> >> +		   int irq_index)
> >> +{
> >> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >> +	int ret = -1;
> >> +
> >> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >> +				 &kvm_vdev->fwd_irq_list, link) {
> > 
> > hmmm, you can only forward one irq for a specific device once, right?
> > And you already have a lookup function, so why not call that, and then
> > remove it?
> > 
> > I'm confused.
> will fix that
> > 
> >> +		if (fwd_irq_iter->index == irq_index) {
> >> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> >> +						KVM_VFIO_IRQ_SET_NORMAL);
> >> +			if (ret < 0)
> >> +				break;
> >> +			list_del(&fwd_irq_iter->link);
> >> +			kfree(fwd_irq_iter);
> >> +			ret = 0;
> >> +			break;
> >> +		}
> >> +	}
> >> +	if (list_empty(&kvm_vdev->fwd_irq_list))
> >> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_unforward - remove a forwarded IRQ
> >> + * @kdev: the kvm device
> >> + * @vdev: the vfio_device
> >> + * @fwd_irq: user struct
> >> + * after checking this IRQ effectively is forwarded, change its state,
> >> + * remove it from the corresponding kvm_vfio_device list
> >> + */
> >> +static int kvm_vfio_unforward(struct kvm_device *kdev,
> >> +				     struct vfio_device *vdev,
> >> +				     struct kvm_arch_forwarded_irq *fwd_irq)
> >> +{
> >> +	struct kvm_vfio *kv = kdev->private;
> >> +	struct kvm_vfio_device *kvm_vdev;
> >> +	int ret;
> >> +
> >> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
> >> +	if (ret < 0)
> >> +		return -EINVAL;
> > 
> > why do you override the return value?  Propagate it.
> ok
> > 
> >> +
> >> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
> >> +	if (ret < 0)
> >> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
> >> +			__func__, fwd_irq->fd, fwd_irq->index);
> >> +	else
> >> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
> >> +			  __func__, fwd_irq->fd, fwd_irq->index);
> > 
> > again with the kernel log here.
> ok
> > 
> > 
> > 
> >> +	return ret;
> >> +}
> >> +
> >> +
> >> +
> >> +
> >> +/**
> >> + * kvm_vfio_set_device - the top function for interracting with a vfio
> > 
> >                                 top?             interacting
> > 
> >> + * device
> >> + */
> > 
> > probably just skip this comment
> ok
> > 
> >> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> >> +{
> >> +	struct kvm_vfio *kv = kdev->private;
> >> +	struct vfio_device *vdev;
> >> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
> >> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> >> +
> >> +	switch (attr) {
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> >> +		bool must_put;
> >> +		int ret;
> >> +
> >> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> >> +			return -EFAULT;
> >> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> >> +		if (IS_ERR(vdev))
> >> +			return PTR_ERR(vdev);
> > 
> > seems like this whole block of code is replicated below, needs
> > refactoring.
> ok
> > 
> >> +		mutex_lock(&kv->lock);
> >> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
> >> +		if (must_put)
> >> +			kvm_vfio_put_vfio_device(vdev);
> > 
> > this must_put looks plain weird.  I think you want to balance your
> > get/put's always; can't you just get an extra reference in
> > kvm_vfio_forward() ?
> I will investigate that. Makes sense
> > 
> >> +		mutex_unlock(&kv->lock);
> >> +		return ret;
> >> +		}
> >> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
> >> +		int ret;
> >> +
> >> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> >> +			return -EFAULT;
> >> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> >> +		if (IS_ERR(vdev))
> >> +			return PTR_ERR(vdev);
> >> +
> >> +		kvm_vfio_device_put_external_user(vdev);
> > 
> > you're dropping the reference to the device but referencing it in your
> > unforward call below?
> thanks for identifying that bug.
> > 
> >> +		mutex_lock(&kv->lock);
> >> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
> >> +		mutex_unlock(&kv->lock);
> >> +		return ret;
> >> +	}
> >> +#endif
> >> +	default:
> >> +		return -ENXIO;
> >> +	}
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
> >> + * @kv: kvm-vfio device
> >> + *
> >> + * loop on all got devices and their associated forwarded IRQs
> > 
> > 'loop on all got' ?
> > 
> > Restore the non-forwarded state for all registered devices and ...
> ok
> > 
> >> + * restore the non forwarded state, remove IRQs and their devices from
> >> + * the respective list, put the vfio platform devices
> >> + *
> >> + * When this function is called, the vcpu already are destroyed. No
> >                                     the VCPUs are already destroyed.
> >> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> >> + * kvm_arch_set_fwd_state action
> > 
> > this last bit didn't make any sense to me.  Also, why are we referring
> > to the vgic in generic code?
> doesn't make sense anymore indeed. I wanted to emphasize the fact that
> VGIC KVM device is destroyed before the KVM VFIO device and this
> explains why I need a special CLEANUP cmd (besides the fact I need to
> call chip->irq_eoi(d) for the forwarded IRQs);

Nope, still doesn't make sense or justify the additional state for me.

> > 
> >> + */
> >> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
> >> +{
> >> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> >> +
> >> +	/* loop on all the assigned devices */
> > 
> > unnecessary comment
> ok
> > 
> >> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> >> +				 &kv->device_list, node) {
> >> +
> >> +		/* loop on all its forwarded IRQ */
> > 
> > same
> ok
> 
> Thanks for the detailed review
> 
> Best Regards
> 
> Eric
> > 
> >> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >> +					 &kvm_vdev_iter->fwd_irq_list, link) {
> >> +			kvm_arch_set_fwd_state(fwd_irq_iter,
> >> +						KVM_VFIO_IRQ_CLEANUP);
> >> +			list_del(&fwd_irq_iter->link);
> >> +			kfree(fwd_irq_iter);
> >> +		}
> >> +		list_del(&kvm_vdev_iter->node);
> >> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
> >> +		kfree(kvm_vdev_iter);
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +
> >>  static int kvm_vfio_set_attr(struct kvm_device *dev,
> >>  			     struct kvm_device_attr *attr)
> >>  {
> >>  	switch (attr->group) {
> >>  	case KVM_DEV_VFIO_GROUP:
> >>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> >> +	case KVM_DEV_VFIO_DEVICE:
> >> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> >>  	}
> >>  
> >>  	return -ENXIO;
> >> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> >>  		case KVM_DEV_VFIO_GROUP_DEL:
> >>  			return 0;
> >>  		}
> >> -
> >>  		break;
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> +	case KVM_DEV_VFIO_DEVICE:
> >> +		switch (attr->attr) {
> >> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >> +			return 0;
> >> +		}
> >> +		break;
> >> +#endif
> >>  	}
> >> -
> >>  	return -ENXIO;
> >>  }
> >>  
> >> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> >>  		list_del(&kvg->node);
> >>  		kfree(kvg);
> >>  	}
> >> +	kvm_vfio_put_all_devices(kv);
> >>  
> >>  	kvm_vfio_update_coherency(dev);
> >>  
> >> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> >>  		return -ENOMEM;
> >>  
> >>  	INIT_LIST_HEAD(&kv->group_list);
> >> +	INIT_LIST_HEAD(&kv->device_list);
> >>  	mutex_init(&kv->lock);
> >>  
> >>  	dev->private = kv;
> >> -- 
> >> 1.9.1
> >>
> 





* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-11 15:47         ` Alex Williamson
  0 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-11 15:47 UTC (permalink / raw)
  To: Eric Auger
  Cc: joel.schopp, kim.phillips, eric.auger, kvm, patches,
	marc.zyngier, john.liuli, will.deacon, linux-kernel, a.rigo,
	gleb, paulus, Christoffer Dall, a.motakis, pbonzini, kvmarm,
	linux-arm-kernel

On Thu, 2014-09-11 at 11:35 +0200, Eric Auger wrote:
> On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> > On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> >> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> >>
> >> This is a new control channel which enables KVM to cooperate with
> >> viable VFIO devices.
> >>
> >> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> >> in addition to a list of groups (kvm_vfio_group). The new
> >> infrastructure enables to check the validity of the VFIO device
> >> file descriptor, get and hold a reference to it.
> >>
> >> The first concrete implemented command is IRQ forward control:
> >> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> >>
> >> It consists in programing the VFIO driver and KVM in a consistent manner
> >> so that an optimized IRQ injection/completion is set up. Each
> >> kvm_vfio_device holds a list of forwarded IRQ. When putting a
> >> kvm_vfio_device, the implementation makes sure the forwarded IRQs
> >> are set again in the normal handling state (non forwarded).
> > 
> > 'putting a kvm_vfio_device' sounds to me like you're golf'ing :)
> > 
> > When a kvm_vfio_device is released?
> sure
> > 
> >>
> >> The forwarding programmming is architecture specific, embodied by the
> >> kvm_arch_set_fwd_state function. Its implementation is given in a
> >> separate patch file.
> > 
> > I would drop the last sentence and instead indicate that this is handled
> > properly when the architecture does not support such a feature.
> ok
> > 
> >>
> >> The forwarding control modality is enabled by the
> >> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >>
> >> ---
> >>
> >> v1 -> v2:
> >> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> - original patch file separated into 2 parts: generic part moved in vfio.c
> >>   and ARM specific part(kvm_arch_set_fwd_state)
> >> ---
> >>  include/linux/kvm_host.h |  27 +++
> >>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
> >>  2 files changed, 477 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index a4c33b3..24350dc 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
> >>  		      unsigned long arg);
> >>  };
> >>  
> >> +enum kvm_fwd_irq_action {
> >> +	KVM_VFIO_IRQ_SET_FORWARD,
> >> +	KVM_VFIO_IRQ_SET_NORMAL,
> >> +	KVM_VFIO_IRQ_CLEANUP,
> > 
> > This is KVM internal API, so it would probably be good to document this.
> > Especially the CLEANUP bit worries me, see below.
> I will document it
> > 
> >> +};
> >> +
> >> +/* internal structure describing a forwarded IRQ */
> >> +struct kvm_fwd_irq {
> >> +	struct list_head link;
> > 
> > this list entry is local to the kvm vfio device, right? that means you
> > probably want a struct with just the below fields, and then have a
> > containing struct in the generic device file, private to its logic.
> I will introduce 2 separate structs
> > 
> >> +	__u32 index; /* platform device irq index */
> >> +	__u32 hwirq; /*physical IRQ */
> >> +	__u32 gsi; /* virtual IRQ */
> >> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> >> +};
> >> +
> >>  void kvm_device_get(struct kvm_device *dev);
> >>  void kvm_device_put(struct kvm_device *dev);
> >>  struct kvm_device *kvm_device_from_filp(struct file *filp);
> >> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
> >>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> >>  extern struct kvm_device_ops kvm_flic_ops;
> >>  
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> > 
> > what's the 'p' in pfwd?
> will rename
> > 
> >> +			   enum kvm_fwd_irq_action action);
> >> +
> >> +#else
> >> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> >> +					 enum kvm_fwd_irq_action action)
> >> +{
> >> +	return 0;
> >> +}
> >> +#endif
> >> +
> >>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >>  
> >>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> >> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> >> index 76dc7a1..e4a81c4 100644
> >> --- a/virt/kvm/vfio.c
> >> +++ b/virt/kvm/vfio.c
> >> @@ -18,14 +18,24 @@
> >>  #include <linux/slab.h>
> >>  #include <linux/uaccess.h>
> >>  #include <linux/vfio.h>
> >> +#include <linux/platform_device.h>
> >>  
> >>  struct kvm_vfio_group {
> >>  	struct list_head node;
> >>  	struct vfio_group *vfio_group;
> >>  };
> >>  
> >> +struct kvm_vfio_device {
> >> +	struct list_head node;
> >> +	struct vfio_device *vfio_device;
> >> +	/* list of forwarded IRQs for that VFIO device */
> >> +	struct list_head fwd_irq_list;
> >> +	int fd;
> >> +};
> >> +
> >>  struct kvm_vfio {
> >>  	struct list_head group_list;
> >> +	struct list_head device_list;
> >>  	struct mutex lock;
> >>  	bool noncoherent;
> >>  };
> >> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >>  	return -ENXIO;
> >>  }
> >>  
> >> +/**
> >> + * get_vfio_device - returns the vfio-device corresponding to this fd
> >> + * @fd:fd of the vfio platform device
> >> + *
> >> + * checks it is a vfio device
> >> + * increment its ref counter
> > 
> > why the short lines?  Just write this out in proper English.
> OK
> > 
> >> + */
> >> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> >> +{
> >> +	struct fd f;
> >> +	struct vfio_device *vdev;
> >> +
> >> +	f = fdget(fd);
> >> +	if (!f.file)
> >> +		return NULL;
> >> +	vdev = kvm_vfio_device_get_external_user(f.file);
> >> +	fdput(f);
> >> +	return vdev;
> >> +}
> >> +
> >> +/**
> >> + * put_vfio_device: put the vfio platform device
> >> + * @vdev: vfio_device to put
> >> + *
> >> + * decrement the ref counter
> >> + */
> >> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> >> +{
> >> +	kvm_vfio_device_put_external_user(vdev);
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_find_device - look for the device in the assigned
> >> + * device list
> >> + * @kv: the kvm-vfio device
> >> + * @vdev: the vfio_device to look for
> >> + *
> >> + * returns the associated kvm_vfio_device if the device is known,
> >> + * meaning at least 1 IRQ is forwarded for this device.
> >> + * in the device is not registered, returns NULL.
> >> + */
> > 
> > are these functions meant to be exported?  Otherwise they should be
> > static, and the documentation on these simple list iteration wrappers
> > seems like overkill imho.
> could be static indeed
> > 
> >> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> >> +					     struct vfio_device *vdev)
> >> +{
> >> +	struct kvm_vfio_device *kvm_vdev_iter;
> >> +
> >> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> >> +		if (kvm_vdev_iter->vfio_device == vdev)
> >> +			return kvm_vdev_iter;
> >> +	}
> >> +	return NULL;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> >> + * @kvm_vdev: the kvm_vfio_device
> >> + * @irq_index: irq index
> >> + *
> >> + * returns the forwarded irq struct if it exists, NULL in the negative
> >> + */
> >> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> >> +				      int irq_index)
> >> +{
> >> +	struct kvm_fwd_irq *fwd_irq_iter;
> >> +
> >> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> >> +		if (fwd_irq_iter->index == irq_index)
> >> +			return fwd_irq_iter;
> >> +	}
> >> +	return NULL;
> >> +}
> >> +
> >> +/**
> >> + * validate_forward - checks whether forwarding a given IRQ is meaningful
> >> + * @vdev:  vfio_device the IRQ belongs to
> >> + * @fwd_irq: user struct containing the irq_index to forward
> >> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> >> + * kvm_vfio_device that holds it
> >> + * @hwirq: irq numberthe irq index corresponds to
> >> + *
> >> + * checks the vfio-device is a platform vfio device
> >> + * checks the irq_index corresponds to an actual hwirq and
> >> + * checks this hwirq is not already forwarded
> >> + * returns < 0 on following errors:
> >> + * not a platform device, bad irq index, already forwarded
> >> + */
> >> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> >> +			    struct vfio_device *vdev,
> >> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >> +			    struct kvm_vfio_device **kvm_vdev,
> >> +			    int *hwirq)
> >> +{
> >> +	struct device *dev = kvm_vfio_external_base_device(vdev);
> >> +	struct platform_device *platdev;
> >> +
> >> +	*hwirq = -1;
> >> +	*kvm_vdev = NULL;
> >> +	if (strcmp(dev->bus->name, "platform") == 0) {
> >> +		platdev = to_platform_device(dev);
> >> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> >> +		if (*hwirq < 0) {
> >> +			kvm_err("%s incorrect index\n", __func__);
> >> +			return -EINVAL;
> >> +		}
> >> +	} else {
> >> +		kvm_err("%s not a platform device\n", __func__);
> >> +		return -EINVAL;
> >> +	}
> > 
> > need some spacing here. Also, I would turn this around: first check if
> > the strcmp fails and error out, then do your next check etc., to
> > avoid so many nested statements.
> ok
> > 
> >> +	/* is a ref to this device already owned by the KVM-VFIO device? */
> > 
> > this comment is not particularly helpful in its current form, it would
> > be helpful if you specified that we're checking whether that particular
> > device/irq combo is already registered.
> > 
> >> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >> +	if (*kvm_vdev) {
> >> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> >> +			kvm_err("%s irq %d already forwarded\n",
> >> +				__func__, *hwirq);
> > 
> > don't flood the kernel log because of a user error, just allocate an
> > error code for this purpose and document it in the ABI, -EEXIST or
> > something.
> ok
> > 
> >> +			return -EINVAL;
> >> +		}
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * validate_unforward: check a deassignment is meaningful
> >> + * @kv: the kvm_vfio device
> >> + * @vdev: the vfio_device whose irq to deassign belongs to
> >> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> >> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> >> + * it exists
> >> + *
> >> + * returns 0 if the provided irq effectively is forwarded
> >> + * (a ref to this vfio_device is hold and this irq belongs to
> >                                     held
> >> + * the forwarded irq of this device)
> >> + * returns -EINVAL in the negative
> > 
> >                ENOENT should be returned if you don't have an entry.
> > 	       EINVAL could be used if you supply an fd that isn't a
> > 	       VFIO device file descriptor, for example.  Again,
> > 	       consider documenting all this in the API.
> > 
> >> + */
> >> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> >> +			      struct vfio_device *vdev,
> >> +			      struct kvm_arch_forwarded_irq *fwd_irq,
> >> +			      struct kvm_vfio_device **kvm_vdev)
> >> +{
> >> +	struct kvm_fwd_irq *pfwd;
> >> +
> >> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >> +	if (!kvm_vdev) {
> >> +		kvm_err("%s no forwarded irq for this device\n", __func__);
> > 
> > don't flood the kernel log
> ok
> > 
> >> +		return -EINVAL;
> >> +	}
> >> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> >> +	if (!pfwd) {
> >> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
> > 
> > 
> >> +		return -EINVAL;
> > 
> > same here
> ok
> > 
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_forward - set a forwarded IRQ
> >> + * @kdev: the kvm device
> >> + * @vdev: the vfio device the IRQ belongs to
> >> + * @fwd_irq: the user struct containing the irq_index and guest irq
> >> + * @must_put: tells the caller whether the vfio_device must be put after
> >> + * the call (ref must be released in case a ref onto this device was
> >> + * already hold or in case of new device and failure)
> >> + *
> >> + * validate the injection, activate forward and store the information
> >       Validate
> >> + * about which irq and which device is concerned so that on deassign or
> >> + * kvm-vfio destruction everuthing can be cleaned up.
> >                            everything
> > 
> > I'm not sure I understand this explanation.  Do we have concerned
> > devices?
> > 
> > I think you want to say something along the lines of: If userspace passed
> > a valid vfio device and irq handle and the architecture supports
> > forwarding this combination, register the vfio_device and irq
> > combination in the ....
> ok
> > 
> >> + */
> >> +static int kvm_vfio_forward(struct kvm_device *kdev,
> >> +			    struct vfio_device *vdev,
> >> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >> +			    bool *must_put)
> >> +{
> >> +	int ret;
> >> +	struct kvm_fwd_irq *pfwd = NULL;
> >> +	struct kvm_vfio_device *kvm_vdev = NULL;
> >> +	struct kvm_vfio *kv = kdev->private;
> >> +	int hwirq;
> >> +
> >> +	*must_put = true;
> >> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
> >> +					&kvm_vdev, &hwirq);
> >> +	if (ret < 0)
> >> +		return -EINVAL;
> >> +
> >> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
> > 
> > seems a bit pointless to zero-out the memory if you're setting all
> > fields below.
> ok
> > 
> >> +	if (!pfwd)
> >> +		return -ENOMEM;
> >> +	pfwd->index = fwd_irq->index;
> >> +	pfwd->gsi = fwd_irq->gsi;
> >> +	pfwd->hwirq = hwirq;
> >> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> >> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> >> +	if (ret < 0) {
> >> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> > 
> > this whole thing feels incredibly broken to me.  Setting a forward
> > should either work or not work, not something in between that leaves
> > something to be cleaned up.  Why this two-stage thingy here?
> I wanted to exploit the return value of vgic_map_phys_irq which is
> likely to fail if the phys/virt mapping exists at VGIC level.
> 
> I already validated the injection from a KVM_VFIO_DEVICE point of view
> (the device/irq is not known internally). But what if another external
> component - which does not exist yet - maps the IRQ at VGIC level? Maybe
> I need to replace the existing validation check by querying the VGIC at
> low level instead of checking KVM-VFIO local variables.

The kvm-vfio interface needs to follow the user API: an IRQ is either
forwarded or not forwarded.  We're either tracking it or not tracking
it.  This limbo state doesn't make any sense to track here.  The
kvm-vfio level validation (testing for duplicates) should be device
agnostic.  kvm_arch_set_fwd_state() is where any lower level tests
should be done.
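
A minimal sketch of that split, written as a userspace model rather than
kernel code.  Every name in it (fwd_dev, forward_irq, unforward_irq,
arch_set_fwd_fn, arch_ok, arch_busy) is hypothetical and only stands in
for the kvm-vfio bookkeeping and for kvm_arch_set_fwd_state().  The
-EEXIST and -ENOENT codes follow the suggestions made earlier in this
review; the -EBUSY returned by the toy arch hook is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace model; names are illustrative, not kernel API. */
struct fwd_irq {
	struct fwd_irq *next;
	unsigned int index;	/* device IRQ index */
	unsigned int gsi;	/* virtual IRQ */
};

struct fwd_dev {
	struct fwd_irq *irqs;	/* IRQs currently tracked as forwarded */
};

/* Stand-in for kvm_arch_set_fwd_state(): 0 on success, -errno on failure. */
typedef int (*arch_set_fwd_fn)(const struct fwd_irq *irq, int forward);

struct fwd_irq *find_irq(struct fwd_dev *dev, unsigned int index)
{
	struct fwd_irq *it;

	for (it = dev->irqs; it; it = it->next)
		if (it->index == index)
			return it;
	return NULL;
}

int forward_irq(struct fwd_dev *dev, unsigned int index, unsigned int gsi,
		arch_set_fwd_fn arch)
{
	struct fwd_irq *irq;
	int ret;

	/* Generic, device-agnostic check: only look for duplicates. */
	if (find_irq(dev, index))
		return -EEXIST;

	irq = malloc(sizeof(*irq));
	if (!irq)
		return -ENOMEM;
	irq->index = index;
	irq->gsi = gsi;

	/* Lower-level checks live in the arch hook; it works or it fails. */
	ret = arch(irq, 1);
	if (ret < 0) {
		free(irq);	/* nothing was tracked, nothing to clean up */
		return ret;
	}

	irq->next = dev->irqs;
	dev->irqs = irq;
	return 0;
}

int unforward_irq(struct fwd_dev *dev, unsigned int index,
		  arch_set_fwd_fn arch)
{
	struct fwd_irq **pp, *irq;
	int ret;

	for (pp = &dev->irqs; *pp; pp = &(*pp)->next)
		if ((*pp)->index == index)
			break;
	if (!*pp)
		return -ENOENT;

	irq = *pp;
	ret = arch(irq, 0);
	if (ret < 0)
		return ret;	/* still forwarded, so keep tracking it */

	*pp = irq->next;
	free(irq);
	return 0;
}

/* Toy arch hooks: one always succeeds, one always refuses. */
int arch_ok(const struct fwd_irq *irq, int forward)
{
	(void)irq; (void)forward;
	return 0;
}

int arch_busy(const struct fwd_irq *irq, int forward)
{
	(void)irq; (void)forward;
	return -EBUSY;
}
```

With this shape, an entry sits in the list only while the arch hook has
accepted the forward, so the tracked state is always one of the two
states the user API exposes.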

> > 
> >> +		kfree(pfwd);
> > 
> > probably want to move your free-and-return-error to the end of the
> > function.
> ok
> > 
> >> +		return ret;
> >> +	}
> >> +
> >> +	if (!kvm_vdev) {
> >> +		/* create & insert the new device and keep the ref */
> >> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
> > 
> > again, no need for zeroing out the memory.
> ok
> > 
> >> +		if (!kvm_vdev) {
> >> +			kvm_arch_set_fwd_state(pfwd, false);
> >> +			kfree(pfwd);
> >> +			return -ENOMEM;
> >> +		}
> >> +
> >> +		kvm_vdev->vfio_device = vdev;
> >> +		kvm_vdev->fd = fwd_irq->fd;
> >> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
> >> +		list_add(&kvm_vdev->node, &kv->device_list);
> >> +		/*
> >> +		 * the only case where we keep the ref:
> >> +		 * new device and forward setting successful
> >> +		 */
> >> +		*must_put = false;
> >> +	}
> >> +
> >> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
> >> +
> >> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
> >> +	fwd_irq->fd, hwirq, fwd_irq->gsi);
> > 
> > please indent this to align with the opening parenthesis.
> ok
> > 
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * remove_assigned_device - put a given device from the list
> > 
> > this isn't a 'put', at least not *just* a put.
> correct, I will rephrase
> > 
> >> + * @kv: the kvm-vfio device
> >> + * @vdev: the vfio-device to remove
> >> + *
> >> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
> >> + * remove the corresponding kvm_vfio_device from the assigned device
> >> + * list.
> >> + * returns true if the device could be removed, false in the negative
> >> + */
> >> +bool remove_assigned_device(struct kvm_vfio *kv,
> >> +			    struct vfio_device *vdev)
> >> +{
> >> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> >> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >> +	bool removed = false;
> >> +	int ret;
> >> +
> >> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> >> +				 &kv->device_list, node) {
> >> +		if (kvm_vdev_iter->vfio_device == vdev) {
> >> +			/* loop on all its forwarded IRQ */
> >> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >> +						 &kvm_vdev_iter->fwd_irq_list,
> >> +						 link) {
> > 
> > hmm, seems this function is only called when you have no more forwarded
> > IRQs, so isn't all of this completely dead (and unnecessary) code?
> yep I can simplify all that cleanup
> > 
> >> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> >> +						KVM_VFIO_IRQ_SET_NORMAL);
> >> +				if (ret < 0)
> >> +					return ret;
> > 
> > you're returning an error code to a bool function, which means you'll
> > return true when there was an error.  Is this your intention? ;)
> definitely not!
> > 
> > if we have an error here, this would be a very very bad situation wouldn't it?
> sure. I will simplify this, transform kvm_arch_set_fwd_state into a void
> function

Please no, kvm_arch_set_fwd_state() needs to indicate to kvm-vfio
whether the requested forward was set up.  How can it do that without a
return value?
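
To illustrate why the int return should stay and be propagated, here is
a small standalone C sketch.  The names (arch_set_fwd_state_stub,
remove_device_as_bool, remove_device_as_int) are hypothetical, and the
stub's rule of rejecting hwirq 0 is an assumption made only so that one
path fails.  It shows the pitfall pointed out above: a negative errno
returned from a bool function converts to true.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical arch hook: refuses hwirq 0, accepts everything else. */
int arch_set_fwd_state_stub(unsigned int hwirq, bool forward)
{
	(void)forward;
	return hwirq == 0 ? -EINVAL : 0;
}

/* Pitfall: the error code is squeezed through a bool return type. */
bool remove_device_as_bool(unsigned int hwirq)
{
	int ret = arch_set_fwd_state_stub(hwirq, false);

	if (ret < 0)
		return ret;	/* any non-zero value converts to true */
	return true;
}

/* Keeping int end to end lets the caller see the failure. */
int remove_device_as_int(unsigned int hwirq)
{
	return arch_set_fwd_state_stub(hwirq, false);
}
```

The bool variant reports success for both the failing and the working
hwirq, while the int variant lets the caller tell them apart.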

> > 
> >> +				list_del(&fwd_irq_iter->link);
> >> +				kfree(fwd_irq_iter);
> >> +			}
> >> +			/* all IRQs could be deassigned */
> >> +			list_del(&kvm_vdev_iter->node);
> >> +			kvm_vfio_device_put_external_user(
> >> +				kvm_vdev_iter->vfio_device);
> >> +			kfree(kvm_vdev_iter);
> >> +			removed = true;
> >> +			break;
> >> +		}
> >> +	}
> >> +	return removed;
> >> +}
> >> +
> >> +
> >> +/**
> >> + * remove_fwd_irq - remove a forwarded irq
> >> + *
> >> + * @kv: kvm-vfio device
> >> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
> >> + * irq_index: the index of the IRQ
> >> + *
> >> + * change the forwarded state of the IRQ, remove the IRQ from
> >> + * the device forwarded IRQ list. In case it is the last one,
> >> + * put the device
> >> + */
> >> +int remove_fwd_irq(struct kvm_vfio *kv,
> >> +		   struct kvm_vfio_device *kvm_vdev,
> >> +		   int irq_index)
> >> +{
> >> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >> +	int ret = -1;
> >> +
> >> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >> +				 &kvm_vdev->fwd_irq_list, link) {
> > 
> > hmmm, you can only forward one irq for a specific device once, right?
> > And you already have a lookup function, so why not call that, and then
> > remove it?
> > 
> > I'm confused.
> will fix that
> > 
> >> +		if (fwd_irq_iter->index == irq_index) {
> >> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> >> +						KVM_VFIO_IRQ_SET_NORMAL);
> >> +			if (ret < 0)
> >> +				break;
> >> +			list_del(&fwd_irq_iter->link);
> >> +			kfree(fwd_irq_iter);
> >> +			ret = 0;
> >> +			break;
> >> +		}
> >> +	}
> >> +	if (list_empty(&kvm_vdev->fwd_irq_list))
> >> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_unforward - remove a forwarded IRQ
> >> + * @kdev: the kvm device
> >> + * @vdev: the vfio_device
> >> + * @fwd_irq: user struct
> >> + * after checking this IRQ effectively is forwarded, change its state,
> >> + * remove it from the corresponding kvm_vfio_device list
> >> + */
> >> +static int kvm_vfio_unforward(struct kvm_device *kdev,
> >> +				     struct vfio_device *vdev,
> >> +				     struct kvm_arch_forwarded_irq *fwd_irq)
> >> +{
> >> +	struct kvm_vfio *kv = kdev->private;
> >> +	struct kvm_vfio_device *kvm_vdev;
> >> +	int ret;
> >> +
> >> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
> >> +	if (ret < 0)
> >> +		return -EINVAL;
> > 
> > why do you override the return value?  Propagate it.
> ok
> > 
> >> +
> >> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
> >> +	if (ret < 0)
> >> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
> >> +			__func__, fwd_irq->fd, fwd_irq->index);
> >> +	else
> >> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
> >> +			  __func__, fwd_irq->fd, fwd_irq->index);
> > 
> > again with the kernel log here.
> ok
> > 
> > 
> > 
> >> +	return ret;
> >> +}
> >> +
> >> +
> >> +
> >> +
> >> +/**
> >> + * kvm_vfio_set_device - the top function for interracting with a vfio
> > 
> >                                 top?             interacting
> > 
> >> + * device
> >> + */
> > 
> > probably just skip this comment
> ok
> > 
> >> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> >> +{
> >> +	struct kvm_vfio *kv = kdev->private;
> >> +	struct vfio_device *vdev;
> >> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
> >> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> >> +
> >> +	switch (attr) {
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> >> +		bool must_put;
> >> +		int ret;
> >> +
> >> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> >> +			return -EFAULT;
> >> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> >> +		if (IS_ERR(vdev))
> >> +			return PTR_ERR(vdev);
> > 
> > seems like this whole block of code is replicated below, needs
> > refactoring.
> ok
> > 
> >> +		mutex_lock(&kv->lock);
> >> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
> >> +		if (must_put)
> >> +			kvm_vfio_put_vfio_device(vdev);
> > 
> > this must_put looks plain weird.  I think you want to balance your
> > get/put's always; can't you just get an extra reference in
> > kvm_vfio_forward() ?
> I will investigate that. Makes sense
> > 
> >> +		mutex_unlock(&kv->lock);
> >> +		return ret;
> >> +		}
> >> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
> >> +		int ret;
> >> +
> >> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> >> +			return -EFAULT;
> >> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> >> +		if (IS_ERR(vdev))
> >> +			return PTR_ERR(vdev);
> >> +
> >> +		kvm_vfio_device_put_external_user(vdev);
> > 
> > you're dropping the reference to the device but referencing it in your
> > unforward call below?
> thanks for identifying that bug.
> > 
> >> +		mutex_lock(&kv->lock);
> >> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
> >> +		mutex_unlock(&kv->lock);
> >> +		return ret;
> >> +	}
> >> +#endif
> >> +	default:
> >> +		return -ENXIO;
> >> +	}
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
> >> + * @kv: kvm-vfio device
> >> + *
> >> + * loop on all got devices and their associated forwarded IRQs
> > 
> > 'loop on all got' ?
> > 
> > Restore the non-forwarded state for all registered devices and ...
> ok
> > 
> >> + * restore the non forwarded state, remove IRQs and their devices from
> >> + * the respective list, put the vfio platform devices
> >> + *
> >> + * When this function is called, the vcpu already are destroyed. No
> >                                     the VCPUs are already destroyed.
> >> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> >> + * kvm_arch_set_fwd_state action
> > 
> > this last bit didn't make any sense to me.  Also, why are we referring
> > to the vgic in generic code?
> doesn't make sense anymore indeed. I wanted to emphasize the fact that
> VGIC KVM device is destroyed before the KVM VFIO device and this
> explains why I need a special CLEANUP cmd (besides the fact I need to
> call chip->irq_eoi(d) for the forwarded IRQs);

Nope, still doesn't make sense or justify the additional state for me.

> > 
> >> + */
> >> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
> >> +{
> >> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> >> +
> >> +	/* loop on all the assigned devices */
> > 
> > unnecessary comment
> ok
> > 
> >> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> >> +				 &kv->device_list, node) {
> >> +
> >> +		/* loop on all its forwarded IRQ */
> > 
> > same
> ok
> 
> Thanks for the detailed review
> 
> Best Regards
> 
> Eric
> > 
> >> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >> +					 &kvm_vdev_iter->fwd_irq_list, link) {
> >> +			kvm_arch_set_fwd_state(fwd_irq_iter,
> >> +						KVM_VFIO_IRQ_CLEANUP);
> >> +			list_del(&fwd_irq_iter->link);
> >> +			kfree(fwd_irq_iter);
> >> +		}
> >> +		list_del(&kvm_vdev_iter->node);
> >> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
> >> +		kfree(kvm_vdev_iter);
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +
> >>  static int kvm_vfio_set_attr(struct kvm_device *dev,
> >>  			     struct kvm_device_attr *attr)
> >>  {
> >>  	switch (attr->group) {
> >>  	case KVM_DEV_VFIO_GROUP:
> >>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> >> +	case KVM_DEV_VFIO_DEVICE:
> >> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> >>  	}
> >>  
> >>  	return -ENXIO;
> >> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> >>  		case KVM_DEV_VFIO_GROUP_DEL:
> >>  			return 0;
> >>  		}
> >> -
> >>  		break;
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> +	case KVM_DEV_VFIO_DEVICE:
> >> +		switch (attr->attr) {
> >> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >> +			return 0;
> >> +		}
> >> +		break;
> >> +#endif
> >>  	}
> >> -
> >>  	return -ENXIO;
> >>  }
> >>  
> >> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> >>  		list_del(&kvg->node);
> >>  		kfree(kvg);
> >>  	}
> >> +	kvm_vfio_put_all_devices(kv);
> >>  
> >>  	kvm_vfio_update_coherency(dev);
> >>  
> >> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> >>  		return -ENOMEM;
> >>  
> >>  	INIT_LIST_HEAD(&kv->group_list);
> >> +	INIT_LIST_HEAD(&kv->device_list);
> >>  	mutex_init(&kv->lock);
> >>  
> >>  	dev->private = kv;
> >> -- 
> >> 1.9.1
> >>
> 

^ permalink raw reply	[flat|nested] 101+ messages in thread

> >> +			    struct vfio_device *vdev,
> >> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >> +			    struct kvm_vfio_device **kvm_vdev,
> >> +			    int *hwirq)
> >> +{
> >> +	struct device *dev = kvm_vfio_external_base_device(vdev);
> >> +	struct platform_device *platdev;
> >> +
> >> +	*hwirq = -1;
> >> +	*kvm_vdev = NULL;
> >> +	if (strcmp(dev->bus->name, "platform") == 0) {
> >> +		platdev = to_platform_device(dev);
> >> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> >> +		if (*hwirq < 0) {
> >> +			kvm_err("%s incorrect index\n", __func__);
> >> +			return -EINVAL;
> >> +		}
> >> +	} else {
> >> +		kvm_err("%s not a platform device\n", __func__);
> >> +		return -EINVAL;
> >> +	}
> > 
> > need some spacing here, also, I would turn this around, first check if
> > the strcmp fails, and then error out, then do your next check etc., to
> > avoid so many nested statements.
> ok
> > 
> >> +	/* is a ref to this device already owned by the KVM-VFIO device? */
> > 
> > this comment is not particularly helpful in its current form, it would
> > be helpful if you specified that we're checking whether that particular
> > device/irq combo is already registered.
> > 
> >> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >> +	if (*kvm_vdev) {
> >> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> >> +			kvm_err("%s irq %d already forwarded\n",
> >> +				__func__, *hwirq);
> > 
> > don't flood the kernel log because of a user error, just allocate an
> > error code for this purpose and document it in the ABI, -EEXIST or
> > something.
> ok
> > 
> >> +			return -EINVAL;
> >> +		}
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * validate_unforward: check a deassignment is meaningful
> >> + * @kv: the kvm_vfio device
> >> + * @vdev: the vfio_device whose irq to deassign belongs to
> >> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> >> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> >> + * it exists
> >> + *
> >> + * returns 0 if the provided irq effectively is forwarded
> >> + * (a ref to this vfio_device is hold and this irq belongs to
> >                                     held
> >> + * the forwarded irq of this device)
> >> + * returns -EINVAL in the negative
> > 
> >                ENOENT should be returned if you don't have an entry.
> > 	       EINVAL could be used if you supply an fd that isn't a
> > 	       VFIO device file descriptor, for example.  Again,
> > 	       consider documenting all this in the API.
> > 
> >> + */
> >> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> >> +			      struct vfio_device *vdev,
> >> +			      struct kvm_arch_forwarded_irq *fwd_irq,
> >> +			      struct kvm_vfio_device **kvm_vdev)
> >> +{
> >> +	struct kvm_fwd_irq *pfwd;
> >> +
> >> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >> +	if (!kvm_vdev) {
> >> +		kvm_err("%s no forwarded irq for this device\n", __func__);
> > 
> > don't flood the kernel log
> ok
> > 
> >> +		return -EINVAL;
> >> +	}
> >> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> >> +	if (!pfwd) {
> >> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
> > 
> > 
> >> +		return -EINVAL;
> > 
> > same here
> ok
> > 
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_forward - set a forwarded IRQ
> >> + * @kdev: the kvm device
> >> + * @vdev: the vfio device the IRQ belongs to
> >> + * @fwd_irq: the user struct containing the irq_index and guest irq
> >> + * @must_put: tells the caller whether the vfio_device must be put after
> >> + * the call (ref must be released in case a ref onto this device was
> >> + * already hold or in case of new device and failure)
> >> + *
> >> + * validate the injection, activate forward and store the information
> >       Validate
> >> + * about which irq and which device is concerned so that on deassign or
> >> + * kvm-vfio destruction everuthing can be cleaned up.
> >                            everything
> > 
> > I'm not sure I understand this explanation.  Do we have concerned
> > devices?
> > 
> > I think you want to say something along the lines of: If userspace passed
> > a valid vfio device and irq handle and the architecture supports
> > forwarding this combination, register the vfio_device and irq
> > combination in the ....
> ok
> > 
> >> + */
> >> +static int kvm_vfio_forward(struct kvm_device *kdev,
> >> +			    struct vfio_device *vdev,
> >> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >> +			    bool *must_put)
> >> +{
> >> +	int ret;
> >> +	struct kvm_fwd_irq *pfwd = NULL;
> >> +	struct kvm_vfio_device *kvm_vdev = NULL;
> >> +	struct kvm_vfio *kv = kdev->private;
> >> +	int hwirq;
> >> +
> >> +	*must_put = true;
> >> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
> >> +					&kvm_vdev, &hwirq);
> >> +	if (ret < 0)
> >> +		return -EINVAL;
> >> +
> >> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
> > 
> > seems a bit pointless to zero-out the memory if you're setting all
> > fields below.
> ok
> > 
> >> +	if (!pfwd)
> >> +		return -ENOMEM;
> >> +	pfwd->index = fwd_irq->index;
> >> +	pfwd->gsi = fwd_irq->gsi;
> >> +	pfwd->hwirq = hwirq;
> >> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> >> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> >> +	if (ret < 0) {
> >> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> > 
> > this whole thing feels incredibly broken to me.  Setting a forward
> > should either work or not work, not something in between that leaves
> > something to be cleaned up.  Why this two-stage thingy here?
> I wanted to exploit the return value of vgic_map_phys_irq which is
> likely to fail if the phys/virt mapping exists at VGIC level.
> 
> I already validated the injection from a KVM_VFIO_DEVICE point of view
> (the device/irq is not known internally). But what if another external
> component - which does not exist yet - maps the IRQ at VGIC level? Maybe
> I need to replace the existing validation check by querying the VGIC at
> low level instead of checking KVM-VFIO local variables.

The kvm-vfio interface needs to follow the user API: an IRQ is either
forwarded or not forwarded, and we are either tracking it or not tracking
it.  A limbo state in between doesn't make any sense to track here.  The
kvm-vfio level validation (testing for duplicates) should be device
agnostic.  kvm_arch_set_fwd_state() is where any lower level tests
should be done.
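
A minimal userspace sketch of that all-or-nothing shape, assuming a plain
singly linked list in place of the kernel list helpers.  Every name here
(fwd_entry, fwd_table, fwd_add, arch_forward_ok, arch_forward_busy) is
hypothetical and not taken from the patch.  The duplicate test only looks
at the tracked device/index pairs, the arch call-out is the only place a
lower level test can fail, and a failure leaves nothing tracked:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names come from the patch. */
struct fwd_entry {
	struct fwd_entry *next;
	const void *device;	/* stands in for struct vfio_device * */
	unsigned int index;	/* vfio irq index */
};

struct fwd_table {
	struct fwd_entry *head;
};

/* Stand-in for the arch call-out: 0 on success, negative errno on failure. */
typedef int (*arch_forward_fn)(const void *device, unsigned int index);

/* Illustrative arch stubs: one always succeeds, one always refuses. */
static int arch_forward_ok(const void *device, unsigned int index)
{
	(void)device;
	(void)index;
	return 0;
}

static int arch_forward_busy(const void *device, unsigned int index)
{
	(void)device;
	(void)index;
	return -EBUSY;
}

static struct fwd_entry *fwd_find(struct fwd_table *t, const void *device,
				  unsigned int index)
{
	struct fwd_entry *e;

	for (e = t->head; e; e = e->next)
		if (e->device == device && e->index == index)
			return e;
	return NULL;
}

static int fwd_add(struct fwd_table *t, const void *device,
		   unsigned int index, arch_forward_fn arch_forward)
{
	struct fwd_entry *e;
	int ret;

	/* Device-agnostic check: is this device/index pair already tracked? */
	if (fwd_find(t, device, index))
		return -EEXIST;

	e = malloc(sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->device = device;
	e->index = index;

	/* Any lower level test happens inside the arch call-out. */
	ret = arch_forward(device, index);
	if (ret < 0)
		goto out_free;

	/* Only a fully forwarded IRQ is ever added to the table. */
	e->next = t->head;
	t->head = e;
	return 0;

out_free:
	free(e);
	return ret;
}
```

With this shape there is no cleanup action to issue after a failed
forward: the entry is either in the table and forwarded, or it does not
exist.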

> > 
> >> +		kfree(pfwd);
> > 
> > probably want to move your free-and-return-error to the end of the
> > function.
> ok
> > 
> >> +		return ret;
> >> +	}
> >> +
> >> +	if (!kvm_vdev) {
> >> +		/* create & insert the new device and keep the ref */
> >> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
> > 
> > again, no need for zeroing out the memory.
> ok
> > 
> >> +		if (!kvm_vdev) {
> >> +			kvm_arch_set_fwd_state(pfwd, false);
> >> +			kfree(pfwd);
> >> +			return -ENOMEM;
> >> +		}
> >> +
> >> +		kvm_vdev->vfio_device = vdev;
> >> +		kvm_vdev->fd = fwd_irq->fd;
> >> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
> >> +		list_add(&kvm_vdev->node, &kv->device_list);
> >> +		/*
> >> +		 * the only case where we keep the ref:
> >> +		 * new device and forward setting successful
> >> +		 */
> >> +		*must_put = false;
> >> +	}
> >> +
> >> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
> >> +
> >> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
> >> +	fwd_irq->fd, hwirq, fwd_irq->gsi);
> > 
> > please indent this to align with the opening parenthesis.
> ok
> > 
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +/**
> >> + * remove_assigned_device - put a given device from the list
> > 
> > this isn't a 'put', at least not *just* a put.
> correct, I will rephrase
> > 
> >> + * @kv: the kvm-vfio device
> >> + * @vdev: the vfio-device to remove
> >> + *
> >> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
> >> + * remove the corresponding kvm_vfio_device from the assigned device
> >> + * list.
> >> + * returns true if the device could be removed, false in the negative
> >> + */
> >> +bool remove_assigned_device(struct kvm_vfio *kv,
> >> +			    struct vfio_device *vdev)
> >> +{
> >> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> >> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >> +	bool removed = false;
> >> +	int ret;
> >> +
> >> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> >> +				 &kv->device_list, node) {
> >> +		if (kvm_vdev_iter->vfio_device == vdev) {
> >> +			/* loop on all its forwarded IRQ */
> >> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >> +						 &kvm_vdev_iter->fwd_irq_list,
> >> +						 link) {
> > 
> > hmm, seems this function is only called when you have no more forwarded
> > IRQs, so isn't all of this completely dead (and unnecessary) code?
> yep I can simplify all that cleanup
> > 
> >> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> >> +						KVM_VFIO_IRQ_SET_NORMAL);
> >> +				if (ret < 0)
> >> +					return ret;
> > 
> > you're returning an error code to a bool function, which means you'll
> > return true when there was an error.  Is this your intention? ;)
> definitely not!
> > 
> > if we have an error here, this would be a very very bad situation wouldn't it?
> sure. I will simplify this, transform kvm_arch_set_fwd_state into a void
> function

Please no, kvm_arch_set_fwd_state() needs to indicate to kvm-vfio
whether the requested forward was set up.  How can it do that without a
return value?
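
To illustrate both points (the error code returned from a bool function
in the quoted hunk, and the need for an int result here), a small
standalone sketch.  The function names are made up for the example and
arch_unforward() is only a stand-in for an arch hook that can fail:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an arch hook that can fail. */
static int arch_unforward(int should_fail)
{
	return should_fail ? -EBUSY : 0;
}

/* Buggy shape: a negative errno converted to bool becomes true. */
static bool remove_reports_bool(int should_fail)
{
	int ret = arch_unforward(should_fail);

	if (ret < 0)
		return ret;	/* non-zero, so the caller sees "removed" */
	return true;
}

/* Fixed shape: the int result is propagated unchanged. */
static int remove_reports_int(int should_fail)
{
	return arch_unforward(should_fail);
}
```

The bool variant reports success whether or not the hook failed, so the
caller cannot tell the two cases apart; the int variant keeps the error
code intact all the way up.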

> > 
> >> +				list_del(&fwd_irq_iter->link);
> >> +				kfree(fwd_irq_iter);
> >> +			}
> >> +			/* all IRQs could be deassigned */
> >> +			list_del(&kvm_vdev_iter->node);
> >> +			kvm_vfio_device_put_external_user(
> >> +				kvm_vdev_iter->vfio_device);
> >> +			kfree(kvm_vdev_iter);
> >> +			removed = true;
> >> +			break;
> >> +		}
> >> +	}
> >> +	return removed;
> >> +}
> >> +
> >> +
> >> +/**
> >> + * remove_fwd_irq - remove a forwarded irq
> >> + *
> >> + * @kv: kvm-vfio device
> >> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
> >> + * irq_index: the index of the IRQ
> >> + *
> >> + * change the forwarded state of the IRQ, remove the IRQ from
> >> + * the device forwarded IRQ list. In case it is the last one,
> >> + * put the device
> >> + */
> >> +int remove_fwd_irq(struct kvm_vfio *kv,
> >> +		   struct kvm_vfio_device *kvm_vdev,
> >> +		   int irq_index)
> >> +{
> >> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >> +	int ret = -1;
> >> +
> >> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >> +				 &kvm_vdev->fwd_irq_list, link) {
> > 
> > hmmm, you can only forward one irq for a specific device once, right?
> > And you already have a lookup function, so why not call that, and then
> > remove it?
> > 
> > I'm confused.
> will fix that
> > 
> >> +		if (fwd_irq_iter->index == irq_index) {
> >> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> >> +						KVM_VFIO_IRQ_SET_NORMAL);
> >> +			if (ret < 0)
> >> +				break;
> >> +			list_del(&fwd_irq_iter->link);
> >> +			kfree(fwd_irq_iter);
> >> +			ret = 0;
> >> +			break;
> >> +		}
> >> +	}
> >> +	if (list_empty(&kvm_vdev->fwd_irq_list))
> >> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
> >> +
> >> +	return ret;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_unforward - remove a forwarded IRQ
> >> + * @kdev: the kvm device
> >> + * @vdev: the vfio_device
> >> + * @fwd_irq: user struct
> >> + * after checking this IRQ effectively is forwarded, change its state,
> >> + * remove it from the corresponding kvm_vfio_device list
> >> + */
> >> +static int kvm_vfio_unforward(struct kvm_device *kdev,
> >> +				     struct vfio_device *vdev,
> >> +				     struct kvm_arch_forwarded_irq *fwd_irq)
> >> +{
> >> +	struct kvm_vfio *kv = kdev->private;
> >> +	struct kvm_vfio_device *kvm_vdev;
> >> +	int ret;
> >> +
> >> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
> >> +	if (ret < 0)
> >> +		return -EINVAL;
> > 
> > why do you override the return value?  Propagate it.
> ok
> > 
> >> +
> >> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
> >> +	if (ret < 0)
> >> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
> >> +			__func__, fwd_irq->fd, fwd_irq->index);
> >> +	else
> >> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
> >> +			  __func__, fwd_irq->fd, fwd_irq->index);
> > 
> > again with the kernel log here.
> ok
> > 
> > 
> > 
> >> +	return ret;
> >> +}
> >> +
> >> +
> >> +
> >> +
> >> +/**
> >> + * kvm_vfio_set_device - the top function for interracting with a vfio
> > 
> >                                 top?             interacting
> > 
> >> + * device
> >> + */
> > 
> > probably just skip this comment
> ok
> > 
> >> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> >> +{
> >> +	struct kvm_vfio *kv = kdev->private;
> >> +	struct vfio_device *vdev;
> >> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
> >> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> >> +
> >> +	switch (attr) {
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> >> +		bool must_put;
> >> +		int ret;
> >> +
> >> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> >> +			return -EFAULT;
> >> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> >> +		if (IS_ERR(vdev))
> >> +			return PTR_ERR(vdev);
> > 
> > seems like this whole block of code is replicated below, needs
> > refactoring.
> ok
> > 
> >> +		mutex_lock(&kv->lock);
> >> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
> >> +		if (must_put)
> >> +			kvm_vfio_put_vfio_device(vdev);
> > 
> > this must_put looks plain weird.  I think you want to balance your
> > get/put's always; can't you just get an extra reference in
> > kvm_vfio_forward() ?
> I will investigate that. Makes sense
> > 
> >> +		mutex_unlock(&kv->lock);
> >> +		return ret;
> >> +		}
> >> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
> >> +		int ret;
> >> +
> >> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> >> +			return -EFAULT;
> >> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> >> +		if (IS_ERR(vdev))
> >> +			return PTR_ERR(vdev);
> >> +
> >> +		kvm_vfio_device_put_external_user(vdev);
> > 
> > you're dropping the reference to the device but referencing it in your
> > unforward call below?
> thanks for identifying that bug.
> > 
> >> +		mutex_lock(&kv->lock);
> >> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
> >> +		mutex_unlock(&kv->lock);
> >> +		return ret;
> >> +	}
> >> +#endif
> >> +	default:
> >> +		return -ENXIO;
> >> +	}
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
> >> + * @kv: kvm-vfio device
> >> + *
> >> + * loop on all got devices and their associated forwarded IRQs
> > 
> > 'loop on all got' ?
> > 
> > Restore the non-forwarded state for all registered devices and ...
> ok
> > 
> >> + * restore the non forwarded state, remove IRQs and their devices from
> >> + * the respective list, put the vfio platform devices
> >> + *
> >> + * When this function is called, the vcpu already are destroyed. No
> >                                     the VCPUs are already destroyed.
> >> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> >> + * kvm_arch_set_fwd_state action
> > 
> > this last bit didn't make any sense to me.  Also, why are we referring
> > to the vgic in generic code?
> doesn't make sense anymore indeed. I wanted to emphasize the fact that
> VGIC KVM device is destroyed before the KVM VFIO device and this
> explains why I need a special CLEANUP cmd (besides the fact I need to
> call chip->irq_eoi(d) for the forwarded IRQs);

Nope, still doesn't make sense or justify the additional state for me.

> > 
> >> + */
> >> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
> >> +{
> >> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> >> +
> >> +	/* loop on all the assigned devices */
> > 
> > unnecessary comment
> ok
> > 
> >> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> >> +				 &kv->device_list, node) {
> >> +
> >> +		/* loop on all its forwarded IRQ */
> > 
> > same
> ok
> 
> Thanks for the detailed review
> 
> Best Regards
> 
> Eric
> > 
> >> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >> +					 &kvm_vdev_iter->fwd_irq_list, link) {
> >> +			kvm_arch_set_fwd_state(fwd_irq_iter,
> >> +						KVM_VFIO_IRQ_CLEANUP);
> >> +			list_del(&fwd_irq_iter->link);
> >> +			kfree(fwd_irq_iter);
> >> +		}
> >> +		list_del(&kvm_vdev_iter->node);
> >> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
> >> +		kfree(kvm_vdev_iter);
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +
> >>  static int kvm_vfio_set_attr(struct kvm_device *dev,
> >>  			     struct kvm_device_attr *attr)
> >>  {
> >>  	switch (attr->group) {
> >>  	case KVM_DEV_VFIO_GROUP:
> >>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> >> +	case KVM_DEV_VFIO_DEVICE:
> >> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> >>  	}
> >>  
> >>  	return -ENXIO;
> >> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> >>  		case KVM_DEV_VFIO_GROUP_DEL:
> >>  			return 0;
> >>  		}
> >> -
> >>  		break;
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> +	case KVM_DEV_VFIO_DEVICE:
> >> +		switch (attr->attr) {
> >> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >> +			return 0;
> >> +		}
> >> +		break;
> >> +#endif
> >>  	}
> >> -
> >>  	return -ENXIO;
> >>  }
> >>  
> >> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> >>  		list_del(&kvg->node);
> >>  		kfree(kvg);
> >>  	}
> >> +	kvm_vfio_put_all_devices(kv);
> >>  
> >>  	kvm_vfio_update_coherency(dev);
> >>  
> >> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> >>  		return -ENOMEM;
> >>  
> >>  	INIT_LIST_HEAD(&kv->group_list);
> >> +	INIT_LIST_HEAD(&kv->device_list);
> >>  	mutex_init(&kv->lock);
> >>  
> >>  	dev->private = kv;
> >> -- 
> >> 1.9.1
> >>
> 


* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11 12:04         ` Eric Auger
@ 2014-09-11 15:59           ` Alex Williamson
  -1 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-11 15:59 UTC (permalink / raw)
  To: Eric Auger
  Cc: Christoffer Dall, eric.auger, marc.zyngier, linux-arm-kernel,
	kvmarm, kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, 2014-09-11 at 14:04 +0200, Eric Auger wrote:
> On 09/11/2014 07:05 AM, Alex Williamson wrote:
> > On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> >> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> >>> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> >>>
> >>> This is a new control channel which enables KVM to cooperate with
> >>> viable VFIO devices.
> >>>
> >>> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> >>> in addition to a list of groups (kvm_vfio_group). The new
> >>> infrastructure enables to check the validity of the VFIO device
> >>> file descriptor, get and hold a reference to it.
> >>>
> >>> The first concrete implemented command is IRQ forward control:
> >>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> >>>
> >>> It consists in programming the VFIO driver and KVM in a consistent manner
> >>> so that an optimized IRQ injection/completion is set up. Each
> >>> kvm_vfio_device holds a list of forwarded IRQ. When putting a
> >>> kvm_vfio_device, the implementation makes sure the forwarded IRQs
> >>> are set again in the normal handling state (non forwarded).
> >>
> >> 'putting a kvm_vfio_device' sounds to me like you're golfing :)
> >>
> >> When a kvm_vfio_device is released?
> >>
> >>>
> >>> The forwarding programming is architecture specific, embodied by the
> >>> kvm_arch_set_fwd_state function. Its implementation is given in a
> >>> separate patch file.
> >>
> >> I would drop the last sentence and instead indicate that this is handled
> >> properly when the architecture does not support such a feature.
> >>
> >>>
> >>> The forwarding control modality is enabled by the
> >>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> >>>
> >>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >>>
> >>> ---
> >>>
> >>> v1 -> v2:
> >>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> - original patch file separated into 2 parts: generic part moved in vfio.c
> >>>   and ARM specific part(kvm_arch_set_fwd_state)
> >>> ---
> >>>  include/linux/kvm_host.h |  27 +++
> >>>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
> >>>  2 files changed, 477 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >>> index a4c33b3..24350dc 100644
> >>> --- a/include/linux/kvm_host.h
> >>> +++ b/include/linux/kvm_host.h
> >>> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
> >>>  		      unsigned long arg);
> >>>  };
> >>>  
> >>> +enum kvm_fwd_irq_action {
> >>> +	KVM_VFIO_IRQ_SET_FORWARD,
> >>> +	KVM_VFIO_IRQ_SET_NORMAL,
> >>> +	KVM_VFIO_IRQ_CLEANUP,
> >>
> >> This is KVM internal API, so it would probably be good to document this.
> >> Especially the CLEANUP bit worries me, see below.
> > 
> > This also doesn't match the user API, which is simply FORWARD/UNFORWARD.
> Hi Alex,
> 
> will change that.
> > Extra states worry me too.
> 
> I tried to explain the 2 motivations behind it. Please let me know if it
> makes sense.

Not really.  It seems like it's just a leak of arch specific handling
out into common code.

> >>> +};
> >>> +
> >>> +/* internal structure describing a forwarded IRQ */
> >>> +struct kvm_fwd_irq {
> >>> +	struct list_head link;
> >>
> >> this list entry is local to the kvm vfio device, right? that means you
> >> probably want a struct with just the below fields, and then have a
> >> containing struct in the generic device file, private to its logic.
> > 
> > Yes, this is part of the abstraction problem.
> OK will fix that.
> > 
> >>> +	__u32 index; /* platform device irq index */
> > 
> > This is a vfio_device irq_index, but vfio_devices support indexes and
> > sub-indexes.  At this level the API should match vfio, not the specifics
> > of platform devices not supporting sub-index.
> I will add sub-indexes then.
> > 
> >>> +	__u32 hwirq; /*physical IRQ */
> >>> +	__u32 gsi; /* virtual IRQ */
> >>> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> > 
> > Not sure I understand why vcpu is necessary.
> vcpu is used when providing the physical IRQ/virtual IRQ mapping to the
> virtual GIC. I can remove it from the struct and add a struct kvm_vcpu *
> param to kvm_arch_set_fwd_state if you prefer.

The kvm-vfio API for this interface doesn't allow the user to indicate
which vcpu to inject to.  On x86, it would be the programming of the
interrupt controller that would decide that.  In the code here we
arbitrarily pick vcpu0.  It feels both architecture specific and a bit
unspecified.

> 
> > Also I see a 'get' in the code below, but not a 'put'.
> Sorry I do not understand your comment here? What 'get' do you mention?

I suppose vcpus don't subscribe to the get/put philosophy, I was
expecting a reference count, but there is none.  How do we know that
vcpu pointer is still valid later?

> > 
> >>> +};
> >>> +
> >>>  void kvm_device_get(struct kvm_device *dev);
> >>>  void kvm_device_put(struct kvm_device *dev);
> >>>  struct kvm_device *kvm_device_from_filp(struct file *filp);
> >>> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
> >>>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> >>>  extern struct kvm_device_ops kvm_flic_ops;
> >>>  
> >>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> >>
> >> what's the 'p' in pfwd?
> > 
> > p is for pointer?
> yes it was ;-)
> > 
> >>> +			   enum kvm_fwd_irq_action action);
> >>> +
> >>> +#else
> >>> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> >>> +					 enum kvm_fwd_irq_action action)
> >>> +{
> >>> +	return 0;
> >>> +}
> >>> +#endif
> >>> +
> >>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >>>  
> >>>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> >>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> >>> index 76dc7a1..e4a81c4 100644
> >>> --- a/virt/kvm/vfio.c
> >>> +++ b/virt/kvm/vfio.c
> >>> @@ -18,14 +18,24 @@
> >>>  #include <linux/slab.h>
> >>>  #include <linux/uaccess.h>
> >>>  #include <linux/vfio.h>
> >>> +#include <linux/platform_device.h>
> >>>  
> >>>  struct kvm_vfio_group {
> >>>  	struct list_head node;
> >>>  	struct vfio_group *vfio_group;
> >>>  };
> >>>  
> >>> +struct kvm_vfio_device {
> >>> +	struct list_head node;
> >>> +	struct vfio_device *vfio_device;
> >>> +	/* list of forwarded IRQs for that VFIO device */
> >>> +	struct list_head fwd_irq_list;
> >>> +	int fd;
> >>> +};
> >>> +
> >>>  struct kvm_vfio {
> >>>  	struct list_head group_list;
> >>> +	struct list_head device_list;
> >>>  	struct mutex lock;
> >>>  	bool noncoherent;
> >>>  };
> >>> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >>>  	return -ENXIO;
> >>>  }
> >>>  
> >>> +/**
> >>> + * get_vfio_device - returns the vfio-device corresponding to this fd
> >>> + * @fd:fd of the vfio platform device
> >>> + *
> >>> + * checks it is a vfio device
> >>> + * increment its ref counter
> >>
> >> why the short lines?  Just write this out in proper English.
> >>
> >>> + */
> >>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> >>> +{
> >>> +	struct fd f;
> >>> +	struct vfio_device *vdev;
> >>> +
> >>> +	f = fdget(fd);
> >>> +	if (!f.file)
> >>> +		return NULL;
> >>> +	vdev = kvm_vfio_device_get_external_user(f.file);
> >>> +	fdput(f);
> >>> +	return vdev;
> >>> +}
> >>> +
> >>> +/**
> >>> + * put_vfio_device: put the vfio platform device
> >>> + * @vdev: vfio_device to put
> >>> + *
> >>> + * decrement the ref counter
> >>> + */
> >>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> >>> +{
> >>> +	kvm_vfio_device_put_external_user(vdev);
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_find_device - look for the device in the assigned
> >>> + * device list
> >>> + * @kv: the kvm-vfio device
> >>> + * @vdev: the vfio_device to look for
> >>> + *
> >>> + * returns the associated kvm_vfio_device if the device is known,
> >>> + * meaning at least 1 IRQ is forwarded for this device.
> >>> + * in the device is not registered, returns NULL.
> >>> + */
> > 
> > Why are we talking about forwarded IRQs already, this is a simple lookup
> > function, who knows what other users it will have in the future.
> I will correct
> > 
> >>
> >> are these functions meant to be exported?  Otherwise they should be
> >> static, and the documentation on these simple list iteration wrappers
> >> seems like overkill imho.
> >>
> >>> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> >>> +					     struct vfio_device *vdev)
> >>> +{
> >>> +	struct kvm_vfio_device *kvm_vdev_iter;
> >>> +
> >>> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> >>> +		if (kvm_vdev_iter->vfio_device == vdev)
> >>> +			return kvm_vdev_iter;
> >>> +	}
> >>> +	return NULL;
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> >>> + * @kvm_vdev: the kvm_vfio_device
> >>> + * @irq_index: irq index
> >>> + *
> >>> + * returns the forwarded irq struct if it exists, NULL in the negative
> >>> + */
> >>> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> >>> +				      int irq_index)
> > 
> > +sub-index
> OK
> > 
> > probably important to note on both of these that they need to be called
> > with kv->lock
> OK
> > 
> >>> +{
> >>> +	struct kvm_fwd_irq *fwd_irq_iter;
> >>> +
> >>> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> >>> +		if (fwd_irq_iter->index == irq_index)
> >>> +			return fwd_irq_iter;
> >>> +	}
> >>> +	return NULL;
> >>> +}
> >>> +
> >>> +/**
> >>> + * validate_forward - checks whether forwarding a given IRQ is meaningful
> >>> + * @vdev:  vfio_device the IRQ belongs to
> >>> + * @fwd_irq: user struct containing the irq_index to forward
> >>> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> >>> + * kvm_vfio_device that holds it
> >>> + * @hwirq: irq numberthe irq index corresponds to
> >>> + *
> >>> + * checks the vfio-device is a platform vfio device
> >>> + * checks the irq_index corresponds to an actual hwirq and
> >>> + * checks this hwirq is not already forwarded
> >>> + * returns < 0 on following errors:
> >>> + * not a platform device, bad irq index, already forwarded
> >>> + */
> >>> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> >>> +			    struct vfio_device *vdev,
> >>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >>> +			    struct kvm_vfio_device **kvm_vdev,
> >>> +			    int *hwirq)
> >>> +{
> >>> +	struct device *dev = kvm_vfio_external_base_device(vdev);
> >>> +	struct platform_device *platdev;
> >>> +
> >>> +	*hwirq = -1;
> >>> +	*kvm_vdev = NULL;
> >>> +	if (strcmp(dev->bus->name, "platform") == 0) {
> > 
> > Should be testing dev->bus == &platform_bus_type, and ideally
> > creating a dev_is_platform() macro to make that even cleaner.
> OK
> > 
> > However, we're being sort of sneaky here that we're actually doing
> > something platform device specific here.  Why?  Don't we just need to
> > make sure that kvm-vfio doesn't have any record of this forward
> > (-EEXIST) and let the platform device code error out later for this
> > case?
> After having answered to Christoffer's comments, I think I should check
> whether the IRQ is not already mapped at VGIC level. In that case I
> would need to split the validate function into 2 parts:
> - generic part only checks the vfio_device/irq_index is not already
> recorded. I do not need the hwirq for that.
> - arm specific part checks no GIC mapping does exist (need the hwirq)
> 
> Would it be OK for both of you?

No, the generic part is fine, but that's all we should have in kvm-vfio.
The ARM specific part should be done as part of the arch call-out.
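
A minimal sketch of that split, written as a userspace mock rather than
kernel code. All names here (kvm_vfio_mock, generic_check_forward,
mock_forward, mock_arch_set_fwd) are hypothetical and only illustrate the
idea: the generic side checks whether the device/index pair is already
recorded and returns -EEXIST, and everything else is left to the arch
call-out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct fwd_entry {
	const void *vdev;	/* identifies the vfio_device */
	unsigned int index;	/* vfio irq index */
};

#define MAX_FWD 8

struct kvm_vfio_mock {
	struct fwd_entry fwd[MAX_FWD];
	size_t nr_fwd;
};

/* Arch call-out type: owns hwirq lookup and interrupt controller checks. */
typedef int (*arch_set_fwd_fn)(const void *vdev, unsigned int index);

/* Generic part: only asks whether this device/index pair is recorded. */
static int generic_check_forward(const struct kvm_vfio_mock *kv,
				 const void *vdev, unsigned int index)
{
	size_t i;

	assert(kv != NULL);
	for (i = 0; i < kv->nr_fwd; i++) {
		if (kv->fwd[i].vdev == vdev && kv->fwd[i].index == index)
			return -EEXIST;
	}
	return 0;
}

static int mock_forward(struct kvm_vfio_mock *kv, const void *vdev,
			unsigned int index, arch_set_fwd_fn arch_set_fwd)
{
	int ret;

	ret = generic_check_forward(kv, vdev, index);
	if (ret)
		return ret;
	if (kv->nr_fwd == MAX_FWD)
		return -ENOSPC;

	/* Everything device- or GIC-specific succeeds or fails here. */
	ret = arch_set_fwd(vdev, index);
	if (ret)
		return ret;

	/* Record the forward only once the arch side has accepted it. */
	kv->fwd[kv->nr_fwd].vdev = vdev;
	kv->fwd[kv->nr_fwd].index = index;
	kv->nr_fwd++;
	return 0;
}

/* Example arch hook: pretend only indexes 0 and 1 map to a hwirq. */
static int mock_arch_set_fwd(const void *vdev, unsigned int index)
{
	(void)vdev;
	return index < 2 ? 0 : -EINVAL;
}
```

With this shape the generic code never sees the hwirq, and a failed arch
call leaves nothing behind to clean up.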

> > 
> >>> +		platdev = to_platform_device(dev);
> >>> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> >>> +		if (*hwirq < 0) {
> >>> +			kvm_err("%s incorrect index\n", __func__);
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	} else {
> >>> +		kvm_err("%s not a platform device\n", __func__);
> >>> +		return -EINVAL;
> >>> +	}
> >>
> >> need some spacing here, also, I would turn this around, first check if
> >> the strcmp fails, and then error out, then do your next check etc., to
> >> avoid so many nested statements.
> >>
> >>> +	/* is a ref to this device already owned by the KVM-VFIO device? */
> >>
> >> this comment is not particularly helpful in its current form, it would
> >> be helpful if you specified that we're checking whether that particular
> >> device/irq combo is already registered.
> >>
> >>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >>> +	if (*kvm_vdev) {
> >>> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> >>> +			kvm_err("%s irq %d already forwarded\n",
> >>> +				__func__, *hwirq);
> > 
> > Why didn't we do this first?
> see above comment
> > 
> >> don't flood the kernel log because of a user error, just allocate an
> >> error code for this purpose and document it in the ABI, -EEXIST or
> >> something.
> >>
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/**
> >>> + * validate_unforward: check a deassignment is meaningful
> >>> + * @kv: the kvm_vfio device
> >>> + * @vdev: the vfio_device whose irq to deassign belongs to
> >>> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> >>> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> >>> + * it exists
> >>> + *
> >>> + * returns 0 if the provided irq effectively is forwarded
> >>> + * (a ref to this vfio_device is hold and this irq belongs to
> >>                                     held
> >>> + * the forwarded irq of this device)
> >>> + * returns -EINVAL in the negative
> >>
> >>                ENOENT should be returned if you don't have an entry.
> >> 	       EINVAL could be used if you supply an fd that isn't a
> >> 	       VFIO device file descriptor, for example.  Again,
> >> 	       consider documenting all this in the API.
> >>
> >>> + */
> >>> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> >>> +			      struct vfio_device *vdev,
> >>> +			      struct kvm_arch_forwarded_irq *fwd_irq,
> >>> +			      struct kvm_vfio_device **kvm_vdev)
> >>> +{
> >>> +	struct kvm_fwd_irq *pfwd;
> >>> +
> >>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >>> +	if (!kvm_vdev) {
> >>> +		kvm_err("%s no forwarded irq for this device\n", __func__);
> >>
> >> don't flood the kernel log
> >>
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> >>> +	if (!pfwd) {
> >>> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
> >>
> >>> +		return -EINVAL;
> >>
> >> same here
> I do not understand. With current functions I need to first retrieve the
> device and then iterate on IRQs of that device.
> >>
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_forward - set a forwarded IRQ
> >>> + * @kdev: the kvm device
> >>> + * @vdev: the vfio device the IRQ belongs to
> >>> + * @fwd_irq: the user struct containing the irq_index and guest irq
> >>> + * @must_put: tells the caller whether the vfio_device must be put after
> >>> + * the call (ref must be released in case a ref onto this device was
> >>> + * already hold or in case of new device and failure)
> >>> + *
> >>> + * validate the injection, activate forward and store the information
> >>       Validate
> >>> + * about which irq and which device is concerned so that on deassign or
> >>> + * kvm-vfio destruction everuthing can be cleaned up.
> >>                            everything
> >>
> >> I'm not sure I understand this explanation.  Do we have concerned
> >> devices?
> >>
> >> I think you want to say something along the lines of: If userspace passed
> >> a valid vfio device and irq handle and the architecture supports
> >> forwarding this combination, register the vfio_device and irq
> >> combination in the ....
> >>
> >>> + */
> >>> +static int kvm_vfio_forward(struct kvm_device *kdev,
> >>> +			    struct vfio_device *vdev,
> >>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >>> +			    bool *must_put)
> >>> +{
> >>> +	int ret;
> >>> +	struct kvm_fwd_irq *pfwd = NULL;
> >>> +	struct kvm_vfio_device *kvm_vdev = NULL;
> >>> +	struct kvm_vfio *kv = kdev->private;
> >>> +	int hwirq;
> >>> +
> >>> +	*must_put = true;
> >>> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
> >>> +					&kvm_vdev, &hwirq);
> >>> +	if (ret < 0)
> >>> +		return -EINVAL;
> >>> +
> >>> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
> >>
> >> seems a bit pointless to zero-out the memory if you're setting all
> >> fields below.
> >>
> >>> +	if (!pfwd)
> >>> +		return -ENOMEM;
> >>> +	pfwd->index = fwd_irq->index;
> >>> +	pfwd->gsi = fwd_irq->gsi;
> >>> +	pfwd->hwirq = hwirq;
> >>> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> >>> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> >>> +	if (ret < 0) {
> >>> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> >>
> >> this whole thing feels incredibly broken to me.  Setting a forward
> >> should either work or not work, not something in between that leaves
> >> something to be cleaned up.  Why this two-stage thingy here?
> > 
> > Yep, I agree.  I also don't see the point of the validate function, just
> > open code it here and push the platform_get_irq test into
> > kvm_arch_set_fwd_state.  kvm-vfio doesn't care about the hwirq.
> > 
> >>> +		kfree(pfwd);
> >>
> >> probably want to move your free-and-return-error to the end of the
> >> function.
> >>
> >>> +		return ret;
> >>> +	}
> >>> +
> >>> +	if (!kvm_vdev) {
> >>> +		/* create & insert the new device and keep the ref */
> >>> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
> >>
> >> again, no need for zeroing out the memory.
> > 
> > I think you also want to allocate this before you setup the forward so
> > you can eliminate any complicated teardown later.
> ok
> > 
> >>> +		if (!kvm_vdev) {
> >>> +			kvm_arch_set_fwd_state(pfwd, false);
> > 
> > false?  The function takes an enum.
> Thanks for identifying that bug.
> > 
> >>> +			kfree(pfwd);
> >>> +			return -ENOMEM;
> >>> +		}
> >>> +
> >>> +		kvm_vdev->vfio_device = vdev;
> >>> +		kvm_vdev->fd = fwd_irq->fd;
> >>> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
> >>> +		list_add(&kvm_vdev->node, &kv->device_list);
> >>> +		/*
> >>> +		 * the only case where we keep the ref:
> >>> +		 * new device and forward setting successful
> >>> +		 */
> >>> +		*must_put = false;
> >>> +	}
> >>> +
> >>> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
> >>> +
> >>> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
> >>> +	fwd_irq->fd, hwirq, fwd_irq->gsi);
> >>
> >> please indent this to align with the opening parenthesis.
> >>
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/**
> >>> + * remove_assigned_device - put a given device from the list
> >>
> >> this isn't a 'put', at least not *just* a put.
> >>
> >>> + * @kv: the kvm-vfio device
> >>> + * @vdev: the vfio-device to remove
> >>> + *
> >>> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
> >>> + * remove the corresponding kvm_vfio_device from the assigned device
> >>> + * list.
> >>> + * returns true if the device could be removed, false in the negative
> >>> + */
> >>> +bool remove_assigned_device(struct kvm_vfio *kv,
> >>> +			    struct vfio_device *vdev)
> >>> +{
> >>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> >>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >>> +	bool removed = false;
> >>> +	int ret;
> >>> +
> >>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> >>> +				 &kv->device_list, node) {
> >>> +		if (kvm_vdev_iter->vfio_device == vdev) {
> >>> +			/* loop on all its forwarded IRQ */
> >>> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >>> +						 &kvm_vdev_iter->fwd_irq_list,
> >>> +						 link) {
> >>
> >> hmm, seems this function is only called when you have no more forwarded
> >> IRQs, so isn't all of this completely dead (and unnecessary) code?
> >>
> >>> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> >>> +						KVM_VFIO_IRQ_SET_NORMAL);
> >>> +				if (ret < 0)
> >>> +					return ret;
> >>
> >> you're returning an error code to a bool function, which means you'll
> >> return true when there was an error.  Is this your intention? ;)
> >>
> >> if we have an error here, this would be a very very bad situation wouldn't it?
> >>
> >>> +				list_del(&fwd_irq_iter->link);
> >>> +				kfree(fwd_irq_iter);
> >>> +			}
> >>> +			/* all IRQs could be deassigned */
> >>> +			list_del(&kvm_vdev_iter->node);
> >>> +			kvm_vfio_device_put_external_user(
> >>> +				kvm_vdev_iter->vfio_device);
> >>> +			kfree(kvm_vdev_iter);
> >>> +			removed = true;
> >>> +			break;
> >>> +		}
> >>> +	}
> >>> +	return removed;
> >>> +}
> >>> +
> >>> +
> >>> +/**
> >>> + * remove_fwd_irq - remove a forwarded irq
> >>> + *
> >>> + * @kv: kvm-vfio device
> >>> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
> >>> + * irq_index: the index of the IRQ
> >>> + *
> >>> + * change the forwarded state of the IRQ, remove the IRQ from
> >>> + * the device forwarded IRQ list. In case it is the last one,
> >>> + * put the device
> >>> + */
> >>> +int remove_fwd_irq(struct kvm_vfio *kv,
> >>> +		   struct kvm_vfio_device *kvm_vdev,
> >>> +		   int irq_index)
> >>> +{
> >>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >>> +	int ret = -1;
> >>> +
> >>> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >>> +				 &kvm_vdev->fwd_irq_list, link) {
> >>
> >> hmmm, you can only forward one irq for a specific device once, right?
> >> And you already have a lookup function, so why not call that, and then
> >> remove it?
> >>
> >> I'm confused.
> 
> > 
> > Me too, this and the previous function need some more consideration.
> understood
> > 
> >>> +		if (fwd_irq_iter->index == irq_index) {
> >>> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> >>> +						KVM_VFIO_IRQ_SET_NORMAL);
> >>> +			if (ret < 0)
> >>> +				break;
> >>> +			list_del(&fwd_irq_iter->link);
> >>> +			kfree(fwd_irq_iter);
> >>> +			ret = 0;
> >>> +			break;
> >>> +		}
> >>> +	}
> >>> +	if (list_empty(&kvm_vdev->fwd_irq_list))
> >>> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_unforward - remove a forwarded IRQ
> >>> + * @kdev: the kvm device
> >>> + * @vdev: the vfio_device
> >>> + * @fwd_irq: user struct
> >>> + * after checking this IRQ effectively is forwarded, change its state,
> >>> + * remove it from the corresponding kvm_vfio_device list
> >>> + */
> >>> +static int kvm_vfio_unforward(struct kvm_device *kdev,
> >>> +				     struct vfio_device *vdev,
> >>> +				     struct kvm_arch_forwarded_irq *fwd_irq)
> >>> +{
> >>> +	struct kvm_vfio *kv = kdev->private;
> >>> +	struct kvm_vfio_device *kvm_vdev;
> >>> +	int ret;
> >>> +
> >>> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
> >>> +	if (ret < 0)
> >>> +		return -EINVAL;
> >>
> >> why do you override the return value?  Propagate it.
> >>
> >>> +
> >>> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
> >>> +	if (ret < 0)
> >>> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
> >>> +			__func__, fwd_irq->fd, fwd_irq->index);
> >>> +	else
> >>> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
> >>> +			  __func__, fwd_irq->fd, fwd_irq->index);
> >>
> >> again with the kernel log here.
> >>
> >>
> >>
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +
> >>> +
> >>> +
> >>> +/**
> >>> + * kvm_vfio_set_device - the top function for interracting with a vfio
> >>
> >>                                 top?             interacting
> >>
> >>> + * device
> >>> + */
> >>
> >> probably just skip this comment
> >>
> >>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> >>> +{
> >>> +	struct kvm_vfio *kv = kdev->private;
> >>> +	struct vfio_device *vdev;
> >>> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
> >>> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> >>> +
> >>> +	switch (attr) {
> >>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> >>> +		bool must_put;
> >>> +		int ret;
> >>> +
> >>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> >>> +			return -EFAULT;
> >>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> >>> +		if (IS_ERR(vdev))
> >>> +			return PTR_ERR(vdev);
> >>
> >> seems like this whole block of code is replicated below, needs
> >> refactoring.
> >>
> >>> +		mutex_lock(&kv->lock);
> >>> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
> >>> +		if (must_put)
> >>> +			kvm_vfio_put_vfio_device(vdev);
> >>
> >> this must_put looks plain weird.  I think you want to balance your
> >> get/put's always; can't you just get an extra reference in
> >> kvm_vfio_forward() ?
> > 
> > Yeah, this is very broken.  Every forwarded IRQ should hold a reference
> > to the vfio_device.  Every unforward should drop a reference.  Trying to
> > maintain a single reference is a non-goal.
> OK will do that.
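
A rough userspace sketch of the one-reference-per-forward rule, under the
assumption that a plain counter stands in for the vfio_device reference.
The names (mock_vdev, mock_fwd_irq, fwd_add, fwd_del) are hypothetical and
are not the kvm-vfio code. Each forward takes its own reference, each
unforward finds the entry with a single lookup, unlinks it, and drops that
reference, so no must_put flag is needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted vfio_device. */
struct mock_vdev {
	int refcount;
};

struct mock_fwd_irq {
	struct mock_fwd_irq *next;
	struct mock_vdev *vdev;
	unsigned int index;
};

static void mock_vdev_get(struct mock_vdev *vdev)
{
	vdev->refcount++;
}

static void mock_vdev_put(struct mock_vdev *vdev)
{
	assert(vdev->refcount > 0);
	vdev->refcount--;
}

/* Return the link pointing at the matching entry, or at the NULL tail. */
static struct mock_fwd_irq **fwd_find(struct mock_fwd_irq **head,
				      struct mock_vdev *vdev,
				      unsigned int index)
{
	struct mock_fwd_irq **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->vdev == vdev && (*pp)->index == index)
			break;
	}
	return pp;
}

/* Each successful forward owns exactly one reference on the device. */
static int fwd_add(struct mock_fwd_irq **head, struct mock_vdev *vdev,
		   unsigned int index)
{
	struct mock_fwd_irq *fwd;

	if (*fwd_find(head, vdev, index))
		return -EEXIST;

	fwd = malloc(sizeof(*fwd));
	if (!fwd)
		return -ENOMEM;

	mock_vdev_get(vdev);
	fwd->vdev = vdev;
	fwd->index = index;
	fwd->next = *head;
	*head = fwd;
	return 0;
}

/* Unforward: one lookup, unlink the entry, drop its reference. */
static int fwd_del(struct mock_fwd_irq **head, struct mock_vdev *vdev,
		   unsigned int index)
{
	struct mock_fwd_irq **pp = fwd_find(head, vdev, index);
	struct mock_fwd_irq *fwd = *pp;

	if (!fwd)
		return -ENOENT;

	*pp = fwd->next;
	mock_vdev_put(fwd->vdev);
	free(fwd);
	return 0;
}
```

Because gets and puts are paired per entry, the last unforward leaves the
device with only the references it had before any forward was set.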
> > 
> >>
> >>> +		mutex_unlock(&kv->lock);
> >>> +		return ret;
> >>> +		}
> >>> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
> >>> +		int ret;
> >>> +
> >>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> >>> +			return -EFAULT;
> >>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> >>> +		if (IS_ERR(vdev))
> >>> +			return PTR_ERR(vdev);
> >>> +
> >>> +		kvm_vfio_device_put_external_user(vdev);
> >>
> >> you're dropping the reference to the device but referencing it in your
> >> unforward call below?
> >>
> >>> +		mutex_lock(&kv->lock);
> >>> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
> >>> +		mutex_unlock(&kv->lock);
> >>> +		return ret;
> >>> +	}
> >>> +#endif
> >>> +	default:
> >>> +		return -ENXIO;
> >>> +	}
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
> >>> + * @kv: kvm-vfio device
> >>> + *
> >>> + * loop on all got devices and their associated forwarded IRQs
> >>
> >> 'loop on all got' ?
> >>
> >> Restore the non-forwarded state for all registered devices and ...
> >>
> >>> + * restore the non forwarded state, remove IRQs and their devices from
> >>> + * the respective list, put the vfio platform devices
> >>> + *
> >>> + * When this function is called, the vcpu already are destroyed. No
> >>                                     the VCPUs are already destroyed.
> >>> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> >>> + * kvm_arch_set_fwd_state action
> >>
> >> this last bit didn't make any sense to me.  Also, why are we referring
> >> to the vgic in generic code?
> >>
> >>> + */
> >>> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
> >>> +{
> >>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> >>> +
> >>> +	/* loop on all the assigned devices */
> >>
> >> unnecessary comment
> >>
> >>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> >>> +				 &kv->device_list, node) {
> >>> +
> >>> +		/* loop on all its forwarded IRQ */
> >>
> >> same
> >>
> >>> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >>> +					 &kvm_vdev_iter->fwd_irq_list, link) {
> >>> +			kvm_arch_set_fwd_state(fwd_irq_iter,
> >>> +						KVM_VFIO_IRQ_CLEANUP);
> >>> +			list_del(&fwd_irq_iter->link);
> >>> +			kfree(fwd_irq_iter);
> >>> +		}
> > 
> > 
> > Ugh, how many of these cleanup functions do we need?
> will simplify!
> 
> Thanks
> 
> Best Regards
> 
> Eric
> > 
> >>> +		list_del(&kvm_vdev_iter->node);
> >>> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
> >>> +		kfree(kvm_vdev_iter);
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +
> >>>  static int kvm_vfio_set_attr(struct kvm_device *dev,
> >>>  			     struct kvm_device_attr *attr)
> >>>  {
> >>>  	switch (attr->group) {
> >>>  	case KVM_DEV_VFIO_GROUP:
> >>>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> >>> +	case KVM_DEV_VFIO_DEVICE:
> >>> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> >>>  	}
> >>>  
> >>>  	return -ENXIO;
> >>> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> >>>  		case KVM_DEV_VFIO_GROUP_DEL:
> >>>  			return 0;
> >>>  		}
> >>> -
> >>>  		break;
> >>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> +	case KVM_DEV_VFIO_DEVICE:
> >>> +		switch (attr->attr) {
> >>> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >>> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >>> +			return 0;
> >>> +		}
> >>> +		break;
> >>> +#endif
> >>>  	}
> >>> -
> >>>  	return -ENXIO;
> >>>  }
> >>>  
> >>> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> >>>  		list_del(&kvg->node);
> >>>  		kfree(kvg);
> >>>  	}
> >>> +	kvm_vfio_put_all_devices(kv);
> >>>  
> >>>  	kvm_vfio_update_coherency(dev);
> >>>  
> >>> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> >>>  		return -ENOMEM;
> >>>  
> >>>  	INIT_LIST_HEAD(&kv->group_list);
> >>> +	INIT_LIST_HEAD(&kv->device_list);
> >>>  	mutex_init(&kv->lock);
> >>>  
> >>>  	dev->private = kv;
> >>> -- 
> >>> 1.9.1
> >>>
> > 
> > 
> > 
> 




^ permalink raw reply	[flat|nested] 101+ messages in thread

* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-11 15:59           ` Alex Williamson
  0 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-11 15:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2014-09-11 at 14:04 +0200, Eric Auger wrote:
> On 09/11/2014 07:05 AM, Alex Williamson wrote:
> > On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> >> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> >>> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> >>>
> >>> This is a new control channel which enables KVM to cooperate with
> >>> viable VFIO devices.
> >>>
> >>> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> >>> in addition to a list of groups (kvm_vfio_group). The new
> >>> infrastructure enables to check the validity of the VFIO device
> >>> file descriptor, get and hold a reference to it.
> >>>
> >>> The first concrete implemented command is IRQ forward control:
> >>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> >>>
> >>> It consists in programing the VFIO driver and KVM in a consistent manner
> >>> so that an optimized IRQ injection/completion is set up. Each
> >>> kvm_vfio_device holds a list of forwarded IRQ. When putting a
> >>> kvm_vfio_device, the implementation makes sure the forwarded IRQs
> >>> are set again in the normal handling state (non forwarded).
> >>
> >> 'putting a kvm_vfio_device' sounds to me like you're golfing :)
> >>
> >> When a kvm_vfio_device is released?
> >>
> >>>
> >>> The forwarding programmming is architecture specific, embodied by the
> >>> kvm_arch_set_fwd_state function. Its implementation is given in a
> >>> separate patch file.
> >>
> >> I would drop the last sentence and instead indicate that this is handled
> >> properly when the architecture does not support such a feature.
> >>
> >>>
> >>> The forwarding control modality is enabled by the
> >>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> >>>
> >>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >>>
> >>> ---
> >>>
> >>> v1 -> v2:
> >>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> - original patch file separated into 2 parts: generic part moved in vfio.c
> >>>   and ARM specific part(kvm_arch_set_fwd_state)
> >>> ---
> >>>  include/linux/kvm_host.h |  27 +++
> >>>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
> >>>  2 files changed, 477 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >>> index a4c33b3..24350dc 100644
> >>> --- a/include/linux/kvm_host.h
> >>> +++ b/include/linux/kvm_host.h
> >>> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
> >>>  		      unsigned long arg);
> >>>  };
> >>>  
> >>> +enum kvm_fwd_irq_action {
> >>> +	KVM_VFIO_IRQ_SET_FORWARD,
> >>> +	KVM_VFIO_IRQ_SET_NORMAL,
> >>> +	KVM_VFIO_IRQ_CLEANUP,
> >>
> >> This is KVM internal API, so it would probably be good to document this.
> >> Especially the CLEANUP bit worries me, see below.
> > 
> > This also doesn't match the user API, which is simply FORWARD/UNFORWARD.
> Hi Alex,
> 
> will change that.
> > Extra states worry me too.
> 
> I tried to explain the 2 motivations behind it. Please let me know if it
> makes sense.

Not really.  It seems like it's just a leak of arch specific handling
out into common code.

> >>> +};
> >>> +
> >>> +/* internal structure describing a forwarded IRQ */
> >>> +struct kvm_fwd_irq {
> >>> +	struct list_head link;
> >>
> >> this list entry is local to the kvm vfio device, right? that means you
> >> probably want a struct with just the below fields, and then have a
> >> containing struct in the generic device file, private to its logic.
> > 
> > Yes, this is part of the abstraction problem.
> OK will fix that.
> > 
> >>> +	__u32 index; /* platform device irq index */
> > 
> > This is a vfio_device irq_index, but vfio_devices support indexes and
> > sub-indexes.  At this level the API should match vfio, not the specifics
> > of platform devices not supporting sub-index.
> I will add sub-indexes then.
> > 
> >>> +	__u32 hwirq; /*physical IRQ */
> >>> +	__u32 gsi; /* virtual IRQ */
> >>> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> > 
> > Not sure I understand why vcpu is necessary.
> vcpu is used when providing the physical IRQ/virtual IRQ mapping to the
> virtual GIC. I can remove it from the struct and add a vcpu struct * param to
> kvm_arch_set_fwd_state if you prefer.

The kvm-vfio API for this interface doesn't allow the user to indicate
which vcpu to inject to.  On x86, it would be the programming of the
interrupt controller that would decide that.  In the code here we
arbitrarily pick vcpu0.  It feels both architecture specific and a bit
unspecified.

> 
> > Also I see a 'get' in the code below, but not a 'put'.
> Sorry I do not understand your comment here? What 'get' do you mention?

I suppose vcpus don't subscribe to the get/put philosophy, I was
expecting a reference count, but there is none.  How do we know that
vcpu pointer is still valid later?

> > 
> >>> +};
> >>> +
> >>>  void kvm_device_get(struct kvm_device *dev);
> >>>  void kvm_device_put(struct kvm_device *dev);
> >>>  struct kvm_device *kvm_device_from_filp(struct file *filp);
> >>> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
> >>>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> >>>  extern struct kvm_device_ops kvm_flic_ops;
> >>>  
> >>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> >>
> >> what's the 'p' in pfwd?
> > 
> > p is for pointer?
> yes it was ;-)
> > 
> >>> +			   enum kvm_fwd_irq_action action);
> >>> +
> >>> +#else
> >>> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> >>> +					 enum kvm_fwd_irq_action action)
> >>> +{
> >>> +	return 0;
> >>> +}
> >>> +#endif
> >>> +
> >>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >>>  
> >>>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> >>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> >>> index 76dc7a1..e4a81c4 100644
> >>> --- a/virt/kvm/vfio.c
> >>> +++ b/virt/kvm/vfio.c
> >>> @@ -18,14 +18,24 @@
> >>>  #include <linux/slab.h>
> >>>  #include <linux/uaccess.h>
> >>>  #include <linux/vfio.h>
> >>> +#include <linux/platform_device.h>
> >>>  
> >>>  struct kvm_vfio_group {
> >>>  	struct list_head node;
> >>>  	struct vfio_group *vfio_group;
> >>>  };
> >>>  
> >>> +struct kvm_vfio_device {
> >>> +	struct list_head node;
> >>> +	struct vfio_device *vfio_device;
> >>> +	/* list of forwarded IRQs for that VFIO device */
> >>> +	struct list_head fwd_irq_list;
> >>> +	int fd;
> >>> +};
> >>> +
> >>>  struct kvm_vfio {
> >>>  	struct list_head group_list;
> >>> +	struct list_head device_list;
> >>>  	struct mutex lock;
> >>>  	bool noncoherent;
> >>>  };
> >>> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >>>  	return -ENXIO;
> >>>  }
> >>>  
> >>> +/**
> >>> + * get_vfio_device - returns the vfio-device corresponding to this fd
> >>> + * @fd:fd of the vfio platform device
> >>> + *
> >>> + * checks it is a vfio device
> >>> + * increment its ref counter
> >>
> >> why the short lines?  Just write this out in proper English.
> >>
> >>> + */
> >>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> >>> +{
> >>> +	struct fd f;
> >>> +	struct vfio_device *vdev;
> >>> +
> >>> +	f = fdget(fd);
> >>> +	if (!f.file)
> >>> +		return NULL;
> >>> +	vdev = kvm_vfio_device_get_external_user(f.file);
> >>> +	fdput(f);
> >>> +	return vdev;
> >>> +}
> >>> +
> >>> +/**
> >>> + * put_vfio_device: put the vfio platform device
> >>> + * @vdev: vfio_device to put
> >>> + *
> >>> + * decrement the ref counter
> >>> + */
> >>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> >>> +{
> >>> +	kvm_vfio_device_put_external_user(vdev);
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_find_device - look for the device in the assigned
> >>> + * device list
> >>> + * @kv: the kvm-vfio device
> >>> + * @vdev: the vfio_device to look for
> >>> + *
> >>> + * returns the associated kvm_vfio_device if the device is known,
> >>> + * meaning at least 1 IRQ is forwarded for this device.
> >>> + * in the device is not registered, returns NULL.
> >>> + */
> > 
> > Why are we talking about forwarded IRQs already, this is a simple lookup
> > function, who knows what other users it will have in the future.
> I will correct
> > 
> >>
> >> are these functions meant to be exported?  Otherwise they should be
> >> static, and the documentation on these simple list iteration wrappers
> >> seems like overkill imho.
> >>
> >>> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> >>> +					     struct vfio_device *vdev)
> >>> +{
> >>> +	struct kvm_vfio_device *kvm_vdev_iter;
> >>> +
> >>> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> >>> +		if (kvm_vdev_iter->vfio_device == vdev)
> >>> +			return kvm_vdev_iter;
> >>> +	}
> >>> +	return NULL;
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> >>> + * @kvm_vdev: the kvm_vfio_device
> >>> + * @irq_index: irq index
> >>> + *
> >>> + * returns the forwarded irq struct if it exists, NULL in the negative
> >>> + */
> >>> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> >>> +				      int irq_index)
> > 
> > +sub-index
> OK
> > 
> > probably important to note on both of these that they need to be called
> > with kv->lock
> OK
> > 
> >>> +{
> >>> +	struct kvm_fwd_irq *fwd_irq_iter;
> >>> +
> >>> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> >>> +		if (fwd_irq_iter->index == irq_index)
> >>> +			return fwd_irq_iter;
> >>> +	}
> >>> +	return NULL;
> >>> +}
> >>> +
> >>> +/**
> >>> + * validate_forward - checks whether forwarding a given IRQ is meaningful
> >>> + * @vdev:  vfio_device the IRQ belongs to
> >>> + * @fwd_irq: user struct containing the irq_index to forward
> >>> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> >>> + * kvm_vfio_device that holds it
> >>> + * @hwirq: irq numberthe irq index corresponds to
> >>> + *
> >>> + * checks the vfio-device is a platform vfio device
> >>> + * checks the irq_index corresponds to an actual hwirq and
> >>> + * checks this hwirq is not already forwarded
> >>> + * returns < 0 on following errors:
> >>> + * not a platform device, bad irq index, already forwarded
> >>> + */
> >>> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> >>> +			    struct vfio_device *vdev,
> >>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >>> +			    struct kvm_vfio_device **kvm_vdev,
> >>> +			    int *hwirq)
> >>> +{
> >>> +	struct device *dev = kvm_vfio_external_base_device(vdev);
> >>> +	struct platform_device *platdev;
> >>> +
> >>> +	*hwirq = -1;
> >>> +	*kvm_vdev = NULL;
> >>> +	if (strcmp(dev->bus->name, "platform") == 0) {
> > 
> > Should be testing dev->bus_type == &platform_bus_type, and ideally
> > creating a dev_is_platform() macro to make that even cleaner.
> OK
> > 
> > However, we're being sort of sneaky here that we're actually doing
> > something platform device specific here.  Why?  Don't we just need to
> > make sure that kvm-vfio doesn't have any record of this forward
> > (-EEXIST) and let the platform device code error out later for this
> > case?
> After having answered to Christoffer's comments, I think I should check
> whether the IRQ is not already mapped at VGIC level. In that case I
> would need to split the validate function into 2 parts:
> - generic part only checks the vfio_device/irq_index is not already
> recorded. I do not need the hwirq for that.
> - arm specific part checks no GIC mapping does exist (need the hwirq)
> 
> Would it be OK for both of you?

No, the generic part is fine, but that's all we should have in kvm-vfio.
The ARM specific part should be done as part of the arch call-out.

> > 
> >>> +		platdev = to_platform_device(dev);
> >>> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> >>> +		if (*hwirq < 0) {
> >>> +			kvm_err("%s incorrect index\n", __func__);
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	} else {
> >>> +		kvm_err("%s not a platform device\n", __func__);
> >>> +		return -EINVAL;
> >>> +	}
> >>
> >> need some spacing here, also, I would turn this around, first check if
> >> the strcmp fails, and then error out, then do your next check etc., to
> >> avoid so many nested statements.
> >>
> >>> +	/* is a ref to this device already owned by the KVM-VFIO device? */
> >>
> >> this comment is not particularly helpful in its current form, it would
> >> be helpful if you specified that we're checking whether that particular
> >> device/irq combo is already registered.
> >>
> >>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >>> +	if (*kvm_vdev) {
> >>> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> >>> +			kvm_err("%s irq %d already forwarded\n",
> >>> +				__func__, *hwirq);
> > 
> > Why didn't we do this first?
> see above comment
> > 
> >> don't flood the kernel log because of a user error, just allocate an
> >> error code for this purpose and document it in the ABI, -EEXIST or
> >> something.
> >>
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/**
> >>> + * validate_unforward: check a deassignment is meaningful
> >>> + * @kv: the kvm_vfio device
> >>> + * @vdev: the vfio_device whose irq to deassign belongs to
> >>> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> >>> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> >>> + * it exists
> >>> + *
> >>> + * returns 0 if the provided irq effectively is forwarded
> >>> + * (a ref to this vfio_device is hold and this irq belongs to
> >>                                     held
> >>> + * the forwarded irq of this device)
> >>> + * returns -EINVAL in the negative
> >>
> >>                ENOENT should be returned if you don't have an entry.
> >> 	       EINVAL could be used if you supply an fd that isn't a
> >> 	       VFIO device file descriptor, for example.  Again,
> >> 	       consider documenting all this in the API.
> >>
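
A rough sketch of the error-code convention discussed here: -EEXIST for
an already forwarded IRQ, -ENOENT for an IRQ that was never forwarded,
-EINVAL for an unusable fd, and no logging on user error. The table and
the fake_* names are assumptions made for illustration; the real code
uses kernel lists and vfio_device references.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_MAX_FWD 8

/* Hypothetical record of one forwarded (device fd, IRQ index) pair. */
struct fake_fwd {
	bool used;
	int fd;
	int index;
};

static struct fake_fwd *fake_find(struct fake_fwd *tbl, int fd, int index)
{
	for (size_t i = 0; i < FAKE_MAX_FWD; i++)
		if (tbl[i].used && tbl[i].fd == fd && tbl[i].index == index)
			return &tbl[i];
	return NULL;
}

/* Returns 0, -EINVAL (bad fd), -EEXIST (duplicate) or -ENOSPC (table full). */
static int fake_forward(struct fake_fwd *tbl, int fd, int index)
{
	if (fd < 0)
		return -EINVAL;
	if (fake_find(tbl, fd, index))
		return -EEXIST;
	for (size_t i = 0; i < FAKE_MAX_FWD; i++) {
		if (!tbl[i].used) {
			tbl[i] = (struct fake_fwd){ true, fd, index };
			return 0;
		}
	}
	return -ENOSPC;
}

/* Returns 0, -EINVAL (bad fd) or -ENOENT (pair was never forwarded). */
static int fake_unforward(struct fake_fwd *tbl, int fd, int index)
{
	struct fake_fwd *e;

	if (fd < 0)
		return -EINVAL;
	e = fake_find(tbl, fd, index);
	if (!e)
		return -ENOENT;
	e->used = false;
	return 0;
}
```

Distinct codes let userspace tell "nothing to undo" apart from "bad
argument" without reading the kernel log.
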
> >>> + */
> >>> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> >>> +			      struct vfio_device *vdev,
> >>> +			      struct kvm_arch_forwarded_irq *fwd_irq,
> >>> +			      struct kvm_vfio_device **kvm_vdev)
> >>> +{
> >>> +	struct kvm_fwd_irq *pfwd;
> >>> +
> >>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >>> +	if (!kvm_vdev) {
> >>> +		kvm_err("%s no forwarded irq for this device\n", __func__);
> >>
> >> don't flood the kernel log
> >>
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> >>> +	if (!pfwd) {
> >>> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
> >>
> >>> +		return -EINVAL;
> >>
> >> same here
> I do not understand. With current functions I need to first retrieve the
> device and then iterate on IRQs of that device.
> >>
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_forward - set a forwarded IRQ
> >>> + * @kdev: the kvm device
> >>> + * @vdev: the vfio device the IRQ belongs to
> >>> + * @fwd_irq: the user struct containing the irq_index and guest irq
> >>> + * @must_put: tells the caller whether the vfio_device must be put after
> >>> + * the call (ref must be released in case a ref onto this device was
> >>> + * already hold or in case of new device and failure)
> >>> + *
> >>> + * validate the injection, activate forward and store the information
> >>       Validate
> >>> + * about which irq and which device is concerned so that on deassign or
> >>> + * kvm-vfio destruction everuthing can be cleaned up.
> >>                            everything
> >>
> >> I'm not sure I understand this explanation.  Do we have concerned
> >> devices?
> >>
> >> I think you want to say something along the lines of: If userspace passed
> >> a valid vfio device and irq handle and the architecture supports
> >> forwarding this combination, register the vfio_device and irq
> >> combination in the ....
> >>
> >>> + */
> >>> +static int kvm_vfio_forward(struct kvm_device *kdev,
> >>> +			    struct vfio_device *vdev,
> >>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >>> +			    bool *must_put)
> >>> +{
> >>> +	int ret;
> >>> +	struct kvm_fwd_irq *pfwd = NULL;
> >>> +	struct kvm_vfio_device *kvm_vdev = NULL;
> >>> +	struct kvm_vfio *kv = kdev->private;
> >>> +	int hwirq;
> >>> +
> >>> +	*must_put = true;
> >>> +	ret = kvm_vfio_validate_forward(kv, vdev, fwd_irq,
> >>> +					&kvm_vdev, &hwirq);
> >>> +	if (ret < 0)
> >>> +		return -EINVAL;
> >>> +
> >>> +	pfwd = kzalloc(sizeof(*pfwd), GFP_KERNEL);
> >>
> >> seems a bit pointless to zero-out the memory if you're setting all
> >> fields below.
> >>
> >>> +	if (!pfwd)
> >>> +		return -ENOMEM;
> >>> +	pfwd->index = fwd_irq->index;
> >>> +	pfwd->gsi = fwd_irq->gsi;
> >>> +	pfwd->hwirq = hwirq;
> >>> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> >>> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> >>> +	if (ret < 0) {
> >>> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> >>
> >> this whole thing feels incredibly broken to me.  Setting a forward
> >> should either work or not work, not something in between that leaves
> >> something to be cleaned up.  Why this two-stage thingy here?
> > 
> > Yep, I agree.  I also don't see the point of the validate function, just
> > open code it here and push the platform_get_irq test into
> > kvm_arch_set_fwd_state.  kvm-vfio doesn't care about the hwirq.
> > 
> >>> +		kfree(pfwd);
> >>
> >> probably want to move your free-and-return-error to the end of the
> >> function.
> >>
> >>> +		return ret;
> >>> +	}
> >>> +
> >>> +	if (!kvm_vdev) {
> >>> +		/* create & insert the new device and keep the ref */
> >>> +		kvm_vdev = kzalloc(sizeof(*kvm_vdev), GFP_KERNEL);
> >>
> >> again, no need for zeroing out the memory.
> > 
> > I think you also want to allocate this before you setup the forward so
> > you can eliminate any complicated teardown later.
> ok
> > 
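
One way to read the two suggestions above (allocate before setting up
the forward, and keep free-and-return at the end of the function) is
sketched below in userspace C. fake_arch_forward is a hypothetical
stand-in for the arch hook and is assumed to either fully succeed or
leave nothing behind.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fake_irq { int index; };
struct fake_device { int fd; };

/* Hypothetical arch hook: fails when 'fail' is non-zero, no partial state. */
static int fake_arch_forward(const struct fake_irq *irq, int fail)
{
	(void)irq;
	return fail ? -EBUSY : 0;
}

/*
 * All allocations happen before the arch call, so a failure at any
 * point only needs frees, never an undo of hardware state.
 */
static int fake_forward(int fd, int index, int arch_fails,
			struct fake_device **out_dev,
			struct fake_irq **out_irq)
{
	struct fake_device *dev;
	struct fake_irq *irq;
	int ret;

	assert(out_dev != NULL && out_irq != NULL);

	dev = malloc(sizeof(*dev));
	if (!dev)
		return -ENOMEM;

	irq = malloc(sizeof(*irq));
	if (!irq) {
		ret = -ENOMEM;
		goto free_dev;
	}

	dev->fd = fd;
	irq->index = index;

	ret = fake_arch_forward(irq, arch_fails);
	if (ret < 0)
		goto free_irq;

	*out_dev = dev;
	*out_irq = irq;
	return 0;

free_irq:
	free(irq);
free_dev:
	free(dev);
	return ret;
}
```

With this ordering the error labels unwind in reverse order of
acquisition and there is a single place where each resource is freed.
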
> >>> +		if (!kvm_vdev) {
> >>> +			kvm_arch_set_fwd_state(pfwd, false);
> > 
> > false?  The function takes an enum.
> Thanks for identifying that bug.
> > 
> >>> +			kfree(pfwd);
> >>> +			return -ENOMEM;
> >>> +		}
> >>> +
> >>> +		kvm_vdev->vfio_device = vdev;
> >>> +		kvm_vdev->fd = fwd_irq->fd;
> >>> +		INIT_LIST_HEAD(&kvm_vdev->fwd_irq_list);
> >>> +		list_add(&kvm_vdev->node, &kv->device_list);
> >>> +		/*
> >>> +		 * the only case where we keep the ref:
> >>> +		 * new device and forward setting successful
> >>> +		 */
> >>> +		*must_put = false;
> >>> +	}
> >>> +
> >>> +	list_add(&pfwd->link, &kvm_vdev->fwd_irq_list);
> >>> +
> >>> +	kvm_debug("forwarding set for fd=%d, hwirq=%d, gsi=%d\n",
> >>> +	fwd_irq->fd, hwirq, fwd_irq->gsi);
> >>
> >> please indent this to align with the opening parenthesis.
> >>
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/**
> >>> + * remove_assigned_device - put a given device from the list
> >>
> >> this isn't a 'put', at least not *just* a put.
> >>
> >>> + * @kv: the kvm-vfio device
> >>> + * @vdev: the vfio-device to remove
> >>> + *
> >>> + * change the state of all forwarded IRQs, free the forwarded IRQ list,
> >>> + * remove the corresponding kvm_vfio_device from the assigned device
> >>> + * list.
> >>> + * returns true if the device could be removed, false in the negative
> >>> + */
> >>> +bool remove_assigned_device(struct kvm_vfio *kv,
> >>> +			    struct vfio_device *vdev)
> >>> +{
> >>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> >>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >>> +	bool removed = false;
> >>> +	int ret;
> >>> +
> >>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> >>> +				 &kv->device_list, node) {
> >>> +		if (kvm_vdev_iter->vfio_device == vdev) {
> >>> +			/* loop on all its forwarded IRQ */
> >>> +			list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >>> +						 &kvm_vdev_iter->fwd_irq_list,
> >>> +						 link) {
> >>
> >> hmm, seems this function is only called when you have no more forwarded
> >> IRQs, so isn't all of this completely dead (and unnecessary) code?
> >>
> >>> +				ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> >>> +						KVM_VFIO_IRQ_SET_NORMAL);
> >>> +				if (ret < 0)
> >>> +					return ret;
> >>
> >> you're returning an error code to a bool function, which means you'll
> >> return true when there was an error.  Is this your intention? ;)
> >>
> >> if we have an error here, this would be a very very bad situation wouldn't it?
> >>
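
The bool problem pointed out above can be reproduced in a few lines of
plain C. Both functions are hypothetical reductions of the quoted code:
in C any non-zero value converts to true, so a negative errno returned
through a bool reads as success.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Buggy shape: a negative errno returned through a bool becomes true. */
static bool fake_remove_buggy(int arch_ret)
{
	if (arch_ret < 0)
		return arch_ret;	/* converted to true */
	return true;
}

/* One possible fix: return int, 0 on success, negative errno on failure. */
static int fake_remove_fixed(int arch_ret)
{
	if (arch_ret < 0)
		return arch_ret;
	return 0;
}
```
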
> >>> +				list_del(&fwd_irq_iter->link);
> >>> +				kfree(fwd_irq_iter);
> >>> +			}
> >>> +			/* all IRQs could be deassigned */
> >>> +			list_del(&kvm_vdev_iter->node);
> >>> +			kvm_vfio_device_put_external_user(
> >>> +				kvm_vdev_iter->vfio_device);
> >>> +			kfree(kvm_vdev_iter);
> >>> +			removed = true;
> >>> +			break;
> >>> +		}
> >>> +	}
> >>> +	return removed;
> >>> +}
> >>> +
> >>> +
> >>> +/**
> >>> + * remove_fwd_irq - remove a forwarded irq
> >>> + *
> >>> + * @kv: kvm-vfio device
> >>> + * kvm_vdev: the kvm_vfio_device the IRQ belongs to
> >>> + * irq_index: the index of the IRQ
> >>> + *
> >>> + * change the forwarded state of the IRQ, remove the IRQ from
> >>> + * the device forwarded IRQ list. In case it is the last one,
> >>> + * put the device
> >>> + */
> >>> +int remove_fwd_irq(struct kvm_vfio *kv,
> >>> +		   struct kvm_vfio_device *kvm_vdev,
> >>> +		   int irq_index)
> >>> +{
> >>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >>> +	int ret = -1;
> >>> +
> >>> +	list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >>> +				 &kvm_vdev->fwd_irq_list, link) {
> >>
> >> hmmm, you can only forward one irq for a specific device once, right?
> >> And you already have a lookup function, so why not call that, and then
> >> remove it?
> >>
> >> I'm confused.
> 
> > 
> > Me too, this and the previous function need some more consideration.
> understood
> > 
> >>> +		if (fwd_irq_iter->index == irq_index) {
> >>> +			ret = kvm_arch_set_fwd_state(fwd_irq_iter,
> >>> +						KVM_VFIO_IRQ_SET_NORMAL);
> >>> +			if (ret < 0)
> >>> +				break;
> >>> +			list_del(&fwd_irq_iter->link);
> >>> +			kfree(fwd_irq_iter);
> >>> +			ret = 0;
> >>> +			break;
> >>> +		}
> >>> +	}
> >>> +	if (list_empty(&kvm_vdev->fwd_irq_list))
> >>> +		remove_assigned_device(kv, kvm_vdev->vfio_device);
> >>> +
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_unforward - remove a forwarded IRQ
> >>> + * @kdev: the kvm device
> >>> + * @vdev: the vfio_device
> >>> + * @fwd_irq: user struct
> >>> + * after checking this IRQ effectively is forwarded, change its state,
> >>> + * remove it from the corresponding kvm_vfio_device list
> >>> + */
> >>> +static int kvm_vfio_unforward(struct kvm_device *kdev,
> >>> +				     struct vfio_device *vdev,
> >>> +				     struct kvm_arch_forwarded_irq *fwd_irq)
> >>> +{
> >>> +	struct kvm_vfio *kv = kdev->private;
> >>> +	struct kvm_vfio_device *kvm_vdev;
> >>> +	int ret;
> >>> +
> >>> +	ret = kvm_vfio_validate_unforward(kv, vdev, fwd_irq, &kvm_vdev);
> >>> +	if (ret < 0)
> >>> +		return -EINVAL;
> >>
> >> why do you override the return value?  Propagate it.
> >>
> >>> +
> >>> +	ret = remove_fwd_irq(kv, kvm_vdev, fwd_irq->index);
> >>> +	if (ret < 0)
> >>> +		kvm_err("%s fail unforwarding (fd=%d, index=%d)\n",
> >>> +			__func__, fwd_irq->fd, fwd_irq->index);
> >>> +	else
> >>> +		kvm_debug("%s unforwarding IRQ (fd=%d, index=%d)\n",
> >>> +			  __func__, fwd_irq->fd, fwd_irq->index);
> >>
> >> again with the kernel log here.
> >>
> >>
> >>
> >>> +	return ret;
> >>> +}
> >>> +
> >>> +
> >>> +
> >>> +
> >>> +/**
> >>> + * kvm_vfio_set_device - the top function for interracting with a vfio
> >>
> >>                                 top?             interacting
> >>
> >>> + * device
> >>> + */
> >>
> >> probably just skip this comment
> >>
> >>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> >>> +{
> >>> +	struct kvm_vfio *kv = kdev->private;
> >>> +	struct vfio_device *vdev;
> >>> +	struct kvm_arch_forwarded_irq fwd_irq; /* user struct */
> >>> +	int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> >>> +
> >>> +	switch (attr) {
> >>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> +	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> >>> +		bool must_put;
> >>> +		int ret;
> >>> +
> >>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> >>> +			return -EFAULT;
> >>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> >>> +		if (IS_ERR(vdev))
> >>> +			return PTR_ERR(vdev);
> >>
> >> seems like this whole block of code is replicated below, needs
> >> refactoring.
> >>
> >>> +		mutex_lock(&kv->lock);
> >>> +		ret = kvm_vfio_forward(kdev, vdev, &fwd_irq, &must_put);
> >>> +		if (must_put)
> >>> +			kvm_vfio_put_vfio_device(vdev);
> >>
> >> this must_put looks plain weird.  I think you want to balance your
> >> get/put's always; can't you just get an extra reference in
> >> kvm_vfio_forward() ?
> > 
> > Yeah, this is very broken.  Every forwarded IRQ should hold a reference
> > to the vfio_device.  Every unforward should drop a reference.  Trying to
> > maintain a single reference is a non-goal.
> OK will do that.
> > 
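
A small sketch of the "one reference per forwarded IRQ" rule described
above, as an alternative to the must_put flag. The counter-based
fake_vdev is an assumption for illustration; the kernel code would use
the vfio_device get/put helpers instead.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical refcounted device; 'refs' counts outstanding users. */
struct fake_vdev {
	int refs;
	int nr_forwarded;
};

static void fake_get(struct fake_vdev *v)
{
	v->refs++;
}

static void fake_put(struct fake_vdev *v)
{
	assert(v->refs > 0);
	v->refs--;
}

/* Each successful forward keeps its own reference. */
static int fake_forward(struct fake_vdev *v, int arch_ret)
{
	fake_get(v);
	if (arch_ret < 0) {
		fake_put(v);	/* failure: balance the get immediately */
		return arch_ret;
	}
	v->nr_forwarded++;
	return 0;
}

/* Each unforward drops the reference taken by the matching forward. */
static int fake_unforward(struct fake_vdev *v)
{
	if (v->nr_forwarded == 0)
		return -ENOENT;
	v->nr_forwarded--;
	fake_put(v);
	return 0;
}
```

Because every get has exactly one matching put, the caller no longer
needs to be told whether it should drop a reference.
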
> >>
> >>> +		mutex_unlock(&kv->lock);
> >>> +		return ret;
> >>> +		}
> >>> +	case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: {
> >>> +		int ret;
> >>> +
> >>> +		if (copy_from_user(&fwd_irq, argp, sizeof(fwd_irq)))
> >>> +			return -EFAULT;
> >>> +		vdev = kvm_vfio_get_vfio_device(fwd_irq.fd);
> >>> +		if (IS_ERR(vdev))
> >>> +			return PTR_ERR(vdev);
> >>> +
> >>> +		kvm_vfio_device_put_external_user(vdev);
> >>
> >> you're dropping the reference to the device but referencing it in your
> >> unforward call below?
> >>
> >>> +		mutex_lock(&kv->lock);
> >>> +		ret = kvm_vfio_unforward(kdev, vdev, &fwd_irq);
> >>> +		mutex_unlock(&kv->lock);
> >>> +		return ret;
> >>> +	}
> >>> +#endif
> >>> +	default:
> >>> +		return -ENXIO;
> >>> +	}
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_put_all_devices - cancel forwarded IRQs and put all devices
> >>> + * @kv: kvm-vfio device
> >>> + *
> >>> + * loop on all got devices and their associated forwarded IRQs
> >>
> >> 'loop on all got' ?
> >>
> >> Restore the non-forwarded state for all registered devices and ...
> >>
> >>> + * restore the non forwarded state, remove IRQs and their devices from
> >>> + * the respective list, put the vfio platform devices
> >>> + *
> >>> + * When this function is called, the vcpu already are destroyed. No
> >>                                     the VCPUs are already destroyed.
> >>> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> >>> + * kvm_arch_set_fwd_state action
> >>
> >> this last bit didn't make any sense to me.  Also, why are we referring
> >> to the vgic in generic code?
> >>
> >>> + */
> >>> +int kvm_vfio_put_all_devices(struct kvm_vfio *kv)
> >>> +{
> >>> +	struct kvm_fwd_irq *fwd_irq_iter, *tmp_irq;
> >>> +	struct kvm_vfio_device *kvm_vdev_iter, *tmp_vdev;
> >>> +
> >>> +	/* loop on all the assigned devices */
> >>
> >> unnecessary comment
> >>
> >>> +	list_for_each_entry_safe(kvm_vdev_iter, tmp_vdev,
> >>> +				 &kv->device_list, node) {
> >>> +
> >>> +		/* loop on all its forwarded IRQ */
> >>
> >> same
> >>
> >>> +		list_for_each_entry_safe(fwd_irq_iter, tmp_irq,
> >>> +					 &kvm_vdev_iter->fwd_irq_list, link) {
> >>> +			kvm_arch_set_fwd_state(fwd_irq_iter,
> >>> +						KVM_VFIO_IRQ_CLEANUP);
> >>> +			list_del(&fwd_irq_iter->link);
> >>> +			kfree(fwd_irq_iter);
> >>> +		}
> > 
> > 
> > Ugh, how many of these cleanup functions do we need?
> will simplify!
> 
> Thanks
> 
> Best Regards
> 
> Eric
> > 
> >>> +		list_del(&kvm_vdev_iter->node);
> >>> +		kvm_vfio_device_put_external_user(kvm_vdev_iter->vfio_device);
> >>> +		kfree(kvm_vdev_iter);
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +
> >>>  static int kvm_vfio_set_attr(struct kvm_device *dev,
> >>>  			     struct kvm_device_attr *attr)
> >>>  {
> >>>  	switch (attr->group) {
> >>>  	case KVM_DEV_VFIO_GROUP:
> >>>  		return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> >>> +	case KVM_DEV_VFIO_DEVICE:
> >>> +		return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> >>>  	}
> >>>  
> >>>  	return -ENXIO;
> >>> @@ -267,10 +706,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> >>>  		case KVM_DEV_VFIO_GROUP_DEL:
> >>>  			return 0;
> >>>  		}
> >>> -
> >>>  		break;
> >>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> +	case KVM_DEV_VFIO_DEVICE:
> >>> +		switch (attr->attr) {
> >>> +		case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >>> +		case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >>> +			return 0;
> >>> +		}
> >>> +		break;
> >>> +#endif
> >>>  	}
> >>> -
> >>>  	return -ENXIO;
> >>>  }
> >>>  
> >>> @@ -284,6 +730,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> >>>  		list_del(&kvg->node);
> >>>  		kfree(kvg);
> >>>  	}
> >>> +	kvm_vfio_put_all_devices(kv);
> >>>  
> >>>  	kvm_vfio_update_coherency(dev);
> >>>  
> >>> @@ -306,6 +753,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> >>>  		return -ENOMEM;
> >>>  
> >>>  	INIT_LIST_HEAD(&kv->group_list);
> >>> +	INIT_LIST_HEAD(&kv->device_list);
> >>>  	mutex_init(&kv->lock);
> >>>  
> >>>  	dev->private = kv;
> >>> -- 
> >>> 1.9.1
> >>>
> > 
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 4/9] VFIO: platform: handler tests whether the IRQ is forwarded
  2014-09-11  8:44       ` Eric Auger
@ 2014-09-11 17:05         ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 17:05 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, Sep 11, 2014 at 10:44:02AM +0200, Eric Auger wrote:
> On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> > On Mon, Sep 01, 2014 at 02:52:43PM +0200, Eric Auger wrote:
> >> In case the IRQ is forwarded, the VFIO platform IRQ handler does not
> >> need to disable the IRQ anymore. In that mode, when the handler completes
> > 
> > add a comma after completes
> Hi Christoffer,
> ok
> > 
> >> the IRQ is not deactivated but only its priority is lowered.
> >>
> >> Some other actor (typically a guest) is supposed to deactivate the IRQ,
> >> allowing at that time a new physical IRQ to hit.
> >>
> >> In virtualization use case, the physical IRQ is automatically completed
> >> by the interrupt controller when the guest completes the corresponding
> >> virtual IRQ.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >> ---
> >>  drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
> >> index 6768508..1f851b2 100644
> >> --- a/drivers/vfio/platform/vfio_platform_irq.c
> >> +++ b/drivers/vfio/platform/vfio_platform_irq.c
> >> @@ -88,13 +88,18 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
> >>  	struct vfio_platform_irq *irq_ctx = dev_id;
> >>  	unsigned long flags;
> >>  	int ret = IRQ_NONE;
> >> +	struct irq_data *d;
> >> +	bool is_forwarded;
> >>  
> >>  	spin_lock_irqsave(&irq_ctx->lock, flags);
> >>  
> >>  	if (!irq_ctx->masked) {
> >>  		ret = IRQ_HANDLED;
> >> +		d = irq_get_irq_data(irq_ctx->hwirq);
> >> +		is_forwarded = irqd_irq_forwarded(d);
> >>  
> >> -		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) {
> >> +		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED &&
> >> +						!is_forwarded) {
> >>  			disable_irq_nosync(irq_ctx->hwirq);
> >>  			irq_ctx->masked = true;
> >>  		}
> >> -- 
> >> 1.9.1
> >>
> > It makes sense that these need to all be controlled in the kernel, but
> > I'm wondering if it would be cleaner / more correct to clear the
> > AUTOMASKED flag when the IRQ is forwarded and have vfio refuse setting
> > this flag as long as the irq is forwarded?
> 
> If I am not wrong, even if the user sets AUTOMASKED, this info never is
> exploited by the vfio platform driver. AUTOMASKED only is set internally
> to the driver, on init, for level sensitive IRQs.
> 
> It seems to be the same on PCI (for INTx). I do not see anywhere the
> user flag curectly copied into a local storage. But I prefer to be
> careful ;-)
> 
> If confirmed, although the flag value is exposed in the user API, the
> user set value never is exploited so this removes the need to check.
> 
> Since the forwarded IRQ modality is currently fully dynamic, I would
> need to update the irq_ctx->flags on each vfio_irq_handler call. I don't
> know if it's better?
> 
I'm not an expert on vfio, so I'll leave that to Alex Williamson to
answer, but I'm just worried that we need to special-case the forwarded
IRQ here, and if that may get lost elsewhere in the vfio code.  If the
AUTOMASKED flag covers specifically this behavior, then why don't we
simply clear/set that flag when forwarding/unforwarding the specific
IRQ?
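
To make the alternative concrete, here is a rough userspace sketch of
toggling the flag at forward/unforward time so the handler needs no
special case. FAKE_IRQ_INFO_AUTOMASKED is a local stand-in for the flag
from <linux/vfio.h>, the fake_* names are hypothetical, and locking is
left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in; the real flag is defined in <linux/vfio.h>. */
#define FAKE_IRQ_INFO_AUTOMASKED (1u << 2)

struct fake_irq_ctx {
	unsigned int flags;
	bool level_sensitive;
	bool masked;
};

/* Forwarding clears the flag: the guest deactivates the IRQ instead. */
static void fake_set_forwarded(struct fake_irq_ctx *ctx, bool forwarded)
{
	if (forwarded)
		ctx->flags &= ~FAKE_IRQ_INFO_AUTOMASKED;
	else if (ctx->level_sensitive)
		ctx->flags |= FAKE_IRQ_INFO_AUTOMASKED;
}

/* Returns true if this invocation masked the IRQ. */
static bool fake_handler_masks(struct fake_irq_ctx *ctx)
{
	if (ctx->masked)
		return false;
	if (ctx->flags & FAKE_IRQ_INFO_AUTOMASKED) {
		ctx->masked = true;	/* stands in for disable_irq_nosync() */
		return true;
	}
	return false;
}
```

The trade-off is that the flag value reported to userspace would then
change with the forwarding state, which may or may not be desirable.
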

-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 4/9] VFIO: platform: handler tests whether the IRQ is forwarded
  2014-09-11  8:44       ` Eric Auger
@ 2014-09-11 17:08         ` Antonios Motakis
  -1 siblings, 0 replies; 101+ messages in thread
From: Antonios Motakis @ 2014-09-11 17:08 UTC (permalink / raw)
  To: Eric Auger
  Cc: Christoffer Dall, eric.auger, Marc Zyngier, Linux ARM Kernel,
	kvm-arm, KVM devel mailing list, Alex Williamson, Joel Schopp,
	Kim Phillips, paulus, Gleb Natapov, Paolo Bonzini, Linux Kernel,
	patches, Will Deacon, alvise rigo, john.liuli

On Thu, Sep 11, 2014 at 10:44 AM, Eric Auger <eric.auger@linaro.org> wrote:
>
> On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> > On Mon, Sep 01, 2014 at 02:52:43PM +0200, Eric Auger wrote:
> >> In case the IRQ is forwarded, the VFIO platform IRQ handler does not
> >> need to disable the IRQ anymore. In that mode, when the handler completes
> >
> > add a comma after completes
> Hi Christoffer,
> ok
> >
> >> the IRQ is not deactivated but only its priority is lowered.
> >>
> >> Some other actor (typically a guest) is supposed to deactivate the IRQ,
> >> allowing at that time a new physical IRQ to hit.
> >>
> >> In virtualization use case, the physical IRQ is automatically completed
> >> by the interrupt controller when the guest completes the corresponding
> >> virtual IRQ.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >> ---
> >>  drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
> >> index 6768508..1f851b2 100644
> >> --- a/drivers/vfio/platform/vfio_platform_irq.c
> >> +++ b/drivers/vfio/platform/vfio_platform_irq.c
> >> @@ -88,13 +88,18 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
> >>      struct vfio_platform_irq *irq_ctx = dev_id;
> >>      unsigned long flags;
> >>      int ret = IRQ_NONE;
> >> +    struct irq_data *d;
> >> +    bool is_forwarded;
> >>
> >>      spin_lock_irqsave(&irq_ctx->lock, flags);
> >>
> >>      if (!irq_ctx->masked) {
> >>              ret = IRQ_HANDLED;
> >> +            d = irq_get_irq_data(irq_ctx->hwirq);
> >> +            is_forwarded = irqd_irq_forwarded(d);
> >>
> >> -            if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) {
> >> +            if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED &&
> >> +                                            !is_forwarded) {
> >>                      disable_irq_nosync(irq_ctx->hwirq);
> >>                      irq_ctx->masked = true;
> >>              }
> >> --
> >> 1.9.1
> >>
> > It makes sense that these need to all be controlled in the kernel, but
> > I'm wondering if it would be cleaner / more correct to clear the
> > AUTOMASKED flag when the IRQ is forwarded and have vfio refuse setting
> > this flag as long as the irq is forwarded?
>
> If I am not wrong, even if the user sets AUTOMASKED, this info never is
> exploited by the vfio platform driver. AUTOMASKED only is set internally
> to the driver, on init, for level sensitive IRQs.
>
> It seems to be the same on PCI (for INTx). I do not see anywhere the
> user flag curectly copied into a local storage. But I prefer to be
> careful ;-)
>
> If confirmed, although the flag value is exposed in the user API, the
> user set value never is exploited so this removes the need to check.
>

Yeah, the way the API is right now the AUTOMASKED flag is only to be
communicated by the kernel to the user, never the other way around.

IMHO there shouldn't be a need to change that. The flag is there just
to inform the user of the kernel behavior for non-forwarded IRQs (and
it's still true if the user unforwards the IRQ later). The user
decides the mode of operation, but it might still be a bit of
information he wants to know.

> Since the forwarded IRQ modality is currently fully dynamic, I would
> need to update the irq_ctx->flags on each vfio_irq_handler call. I don't
> know if it's better?
>
> Best Regards
>
> Eric
>
>
> >
> > -Christoffer
> >
>



-- 
Antonios Motakis
Virtual Open Systems

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 5/9] KVM: KVM-VFIO: update user API to program forwarded IRQ
  2014-09-11  8:49       ` Eric Auger
@ 2014-09-11 17:08         ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 17:08 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, Sep 11, 2014 at 10:49:08AM +0200, Eric Auger wrote:
> On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> > On Mon, Sep 01, 2014 at 02:52:44PM +0200, Eric Auger wrote:

[...]

> >> +
> >> +It is up to the caller of this API to make sure the IRQ is not
> >> +outstanding when the FORWARD/UNFORWARD is called. This could lead to
> > 
> > outstanding? can you be specific?
> active? and I should add *physical* IRQ
> > 
> > don't refer to FOWARD/UNFORWARD, either refer to these attributes by
> > their full name or use a clear reference in proper English.
> ok
> > 
> >> +some inconsistency on who is going to complete the IRQ.
> > 
> > This sounds like the whole thing is fragile and if userspace doesn't do
> > things right, IRQ handling of a piece of hardware is going to be
> > inconsistent?  Is this the case?  If so, we need some stronger
> > semantics.  If not, this should be rephrased.
> Actually the KVM-VFIO device rejects any attempt to change the
> forwarding mode if the physical IRQ is active. So I hope this is robust
> and will change the explanation.
> 
ok, so what is the proposed method if the IRQ is indeed active? Should
user space loop around and retry, or can user space make sure somehow?  If
user space should simply retry a number of times, we should probably
return a proper error code for this case (-EINTR?).
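
A rough sketch of what the "retry a number of times" option could look
like from user space, purely for discussion. The callback type, the
helper names and the choice of -EINTR as the "physical IRQ is active"
code are all assumptions; nothing here is an agreed ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical callback standing in for the ioctl that changes the
 * forwarding mode. Returns 0 on success or a negative errno.
 */
typedef int (*sketch_set_fwd_fn)(void *opaque);

/*
 * Retry only while the call reports "IRQ currently active". -EINTR is
 * the code floated in this thread, used here as an assumption.
 */
static int sketch_set_fwd_retry(sketch_set_fwd_fn fn, void *opaque,
				int max_tries)
{
	int ret = -EINVAL;	/* returned if max_tries <= 0 */
	int i;

	assert(fn != NULL);
	for (i = 0; i < max_tries; i++) {
		ret = fn(opaque);
		if (ret != -EINTR)
			break;
	}
	return ret;
}

/* Test double: fails with -EINTR until the counter drops to zero. */
static int sketch_fake_set_fwd(void *opaque)
{
	int *remaining = opaque;

	if (*remaining > 0) {
		(*remaining)--;
		return -EINTR;
	}
	return 0;
}
```

A bounded loop like this only helps if the kernel returns a code that is
distinct from the "bad argument" cases, which is the point of the
question above.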

-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11  5:05       ` Alex Williamson
@ 2014-09-11 17:10         ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 17:10 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Eric Auger, eric.auger, marc.zyngier, linux-arm-kernel, kvmarm,
	kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Wed, Sep 10, 2014 at 11:05:49PM -0600, Alex Williamson wrote:
> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> > On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:

[...]

> > >  
> > > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > > +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> > 
> > what's the 'p' in pfwd?
> 
> p is for pointer?
> 

shouldn't the type declaration spell out quite clearly to me that I'm
dealing with a pointer?

[...]

> > 
> > need some spaceing here, also, I would turn this around, first check if
> > the strcmp fails, and then error out, then do you next check etc., to
> > avoid so many nested statements.
> > 
> > > +	/* is a ref to this device already owned by the KVM-VFIO device? */
> > 
> > this comment is not particularly helpful in its current form, it would
> > be helpful if you specified that we're checking whether that particular
> > device/irq combo is already registered.
> > 
> > > +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> > > +	if (*kvm_vdev) {
> > > +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> > > +			kvm_err("%s irq %d already forwarded\n",
> > > +				__func__, *hwirq);
> 
> Why didn't we do this first?
> 
huh?

-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11 12:04         ` Eric Auger
@ 2014-09-11 17:22           ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 17:22 UTC (permalink / raw)
  To: Eric Auger
  Cc: Alex Williamson, eric.auger, marc.zyngier, linux-arm-kernel,
	kvmarm, kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, Sep 11, 2014 at 02:04:39PM +0200, Eric Auger wrote:
> On 09/11/2014 07:05 AM, Alex Williamson wrote:
> > On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> >> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> >>> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> >>>
> >>> This is a new control channel which enables KVM to cooperate with
> >>> viable VFIO devices.
> >>>
> >>> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> >>> in addition to a list of groups (kvm_vfio_group). The new
> >>> infrastructure enables to check the validity of the VFIO device
> >>> file descriptor, get and hold a reference to it.
> >>>
> >>> The first concrete implemented command is IRQ forward control:
> >>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> >>>
> >>> It consists in programing the VFIO driver and KVM in a consistent manner
> >>> so that an optimized IRQ injection/completion is set up. Each
> >>> kvm_vfio_device holds a list of forwarded IRQ. When putting a
> >>> kvm_vfio_device, the implementation makes sure the forwarded IRQs
> >>> are set again in the normal handling state (non forwarded).
> >>
> >> 'putting a kvm_vfio_device' sounds to like you're golf'ing :)
> >>
> >> When a kvm_vfio_device is released?
> >>
> >>>
> >>> The forwarding programmming is architecture specific, embodied by the
> >>> kvm_arch_set_fwd_state function. Its implementation is given in a
> >>> separate patch file.
> >>
> >> I would drop the last sentence and instead indicate that this is handled
> >> properly when the architecture does not support such a feature.
> >>
> >>>
> >>> The forwarding control modality is enabled by the
> >>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> >>>
> >>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >>>
> >>> ---
> >>>
> >>> v1 -> v2:
> >>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> - original patch file separated into 2 parts: generic part moved in vfio.c
> >>>   and ARM specific part(kvm_arch_set_fwd_state)
> >>> ---
> >>>  include/linux/kvm_host.h |  27 +++
> >>>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
> >>>  2 files changed, 477 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >>> index a4c33b3..24350dc 100644
> >>> --- a/include/linux/kvm_host.h
> >>> +++ b/include/linux/kvm_host.h
> >>> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
> >>>  		      unsigned long arg);
> >>>  };
> >>>  
> >>> +enum kvm_fwd_irq_action {
> >>> +	KVM_VFIO_IRQ_SET_FORWARD,
> >>> +	KVM_VFIO_IRQ_SET_NORMAL,
> >>> +	KVM_VFIO_IRQ_CLEANUP,
> >>
> >> This is KVM internal API, so it would probably be good to document this.
> >> Especially the CLEANUP bit worries me, see below.
> > 
> > This also doesn't match the user API, which is simply FORWARD/UNFORWARD.
> Hi Alex,
> 
> will change that.
> > Extra states worry me too.
> 
> I tried to explained the 2 motivations behind. Please let me know if it
> makes sense.
> > 
> >>> +};
> >>> +
> >>> +/* internal structure describing a forwarded IRQ */
> >>> +struct kvm_fwd_irq {
> >>> +	struct list_head link;
> >>
> >> this list entry is local to the kvm vfio device, right? that means you
> >> probably want a struct with just the below fields, and then have a
> >> containing struct in the generic device file, private to it's logic.
> > 
> > Yes, this is part of the abstraction problem.
> OK will fix that.
> > 
> >>> +	__u32 index; /* platform device irq index */
> > 
> > This is a vfio_device irq_index, but vfio_devices support indexes and
> > sub-indexes.  At this level the API should match vfio, not the specifics
> > of platform devices not supporting sub-index.
> I will add sub-indexes then.
> > 
> >>> +	__u32 hwirq; /*physical IRQ */
> >>> +	__u32 gsi; /* virtual IRQ */
> >>> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> > 
> > Not sure I understand why vcpu is necessary.
> vcpu is used when providing the physical IRQ/virtual IRQ mapping to the
> virtual GIC. I can remove it from and add a vcpu struct * param to
> kvm_arch_set_fwd_state if you prefer.
> 
>   Also I see a 'get' in the code below, but not a 'put'.
> Sorry I do not understand your comment here? What 'get' do you mention?

he means kvm_get_vcpu(), but you are ok on that one. The kvm naming of
this function is unfortunate, because it doesn't increment any refcounts
but just resolves to an entry in the vcpu array.
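
For readers unfamiliar with the distinction, a small standalone sketch
of a lookup-style "get" that takes no reference and therefore needs no
matching "put". The types and names are hypothetical and are not the
KVM definitions; the bounds check is an addition for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_VCPUS 4

struct sketch_vcpu {
	int id;
};

struct sketch_vm {
	struct sketch_vcpu *vcpus[SKETCH_MAX_VCPUS];
	int users;	/* refcount on the VM, untouched by the lookup */
};

/*
 * Lookup-style "get": resolves an index to an array entry. No
 * reference is taken, so callers have nothing to release afterwards.
 */
static struct sketch_vcpu *sketch_get_vcpu(struct sketch_vm *vm, int i)
{
	assert(vm != NULL);
	if (i < 0 || i >= SKETCH_MAX_VCPUS)
		return NULL;
	return vm->vcpus[i];
}
```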

> > 
> >>> +};
> >>> +
> >>>  void kvm_device_get(struct kvm_device *dev);
> >>>  void kvm_device_put(struct kvm_device *dev);
> >>>  struct kvm_device *kvm_device_from_filp(struct file *filp);
> >>> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
> >>>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> >>>  extern struct kvm_device_ops kvm_flic_ops;
> >>>  
> >>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> >>
> >> what's the 'p' in pfwd?
> > 
> > p is for pointer?
> yes it was ;-)
> > 
> >>> +			   enum kvm_fwd_irq_action action);
> >>> +
> >>> +#else
> >>> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> >>> +					 enum kvm_fwd_irq_action action)
> >>> +{
> >>> +	return 0;
> >>> +}
> >>> +#endif
> >>> +
> >>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >>>  
> >>>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> >>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> >>> index 76dc7a1..e4a81c4 100644
> >>> --- a/virt/kvm/vfio.c
> >>> +++ b/virt/kvm/vfio.c
> >>> @@ -18,14 +18,24 @@
> >>>  #include <linux/slab.h>
> >>>  #include <linux/uaccess.h>
> >>>  #include <linux/vfio.h>
> >>> +#include <linux/platform_device.h>
> >>>  
> >>>  struct kvm_vfio_group {
> >>>  	struct list_head node;
> >>>  	struct vfio_group *vfio_group;
> >>>  };
> >>>  
> >>> +struct kvm_vfio_device {
> >>> +	struct list_head node;
> >>> +	struct vfio_device *vfio_device;
> >>> +	/* list of forwarded IRQs for that VFIO device */
> >>> +	struct list_head fwd_irq_list;
> >>> +	int fd;
> >>> +};
> >>> +
> >>>  struct kvm_vfio {
> >>>  	struct list_head group_list;
> >>> +	struct list_head device_list;
> >>>  	struct mutex lock;
> >>>  	bool noncoherent;
> >>>  };
> >>> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >>>  	return -ENXIO;
> >>>  }
> >>>  
> >>> +/**
> >>> + * get_vfio_device - returns the vfio-device corresponding to this fd
> >>> + * @fd:fd of the vfio platform device
> >>> + *
> >>> + * checks it is a vfio device
> >>> + * increment its ref counter
> >>
> >> why the short lines?  Just write this out in proper English.
> >>
> >>> + */
> >>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> >>> +{
> >>> +	struct fd f;
> >>> +	struct vfio_device *vdev;
> >>> +
> >>> +	f = fdget(fd);
> >>> +	if (!f.file)
> >>> +		return NULL;
> >>> +	vdev = kvm_vfio_device_get_external_user(f.file);
> >>> +	fdput(f);
> >>> +	return vdev;
> >>> +}
> >>> +
> >>> +/**
> >>> + * put_vfio_device: put the vfio platform device
> >>> + * @vdev: vfio_device to put
> >>> + *
> >>> + * decrement the ref counter
> >>> + */
> >>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> >>> +{
> >>> +	kvm_vfio_device_put_external_user(vdev);
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_find_device - look for the device in the assigned
> >>> + * device list
> >>> + * @kv: the kvm-vfio device
> >>> + * @vdev: the vfio_device to look for
> >>> + *
> >>> + * returns the associated kvm_vfio_device if the device is known,
> >>> + * meaning at least 1 IRQ is forwarded for this device.
> >>> + * in the device is not registered, returns NULL.
> >>> + */
> > 
> > Why are we talking about forwarded IRQs already, this is a simple lookup
> > function, who knows what other users it will have in the future.
> I will correct
> > 
> >>
> >> are these functions meant to be exported?  Otherwise they should be
> >> static, and the documentation on these simple list iteration wrappers
> >> seems like overkill imho.
> >>
> >>> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> >>> +					     struct vfio_device *vdev)
> >>> +{
> >>> +	struct kvm_vfio_device *kvm_vdev_iter;
> >>> +
> >>> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> >>> +		if (kvm_vdev_iter->vfio_device == vdev)
> >>> +			return kvm_vdev_iter;
> >>> +	}
> >>> +	return NULL;
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> >>> + * @kvm_vdev: the kvm_vfio_device
> >>> + * @irq_index: irq index
> >>> + *
> >>> + * returns the forwarded irq struct if it exists, NULL in the negative
> >>> + */
> >>> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> >>> +				      int irq_index)
> > 
> > +sub-index
> OK
> > 
> > probably important to note on both of these that they need to be called
> > with kv->lock
> OK
> > 
> >>> +{
> >>> +	struct kvm_fwd_irq *fwd_irq_iter;
> >>> +
> >>> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> >>> +		if (fwd_irq_iter->index == irq_index)
> >>> +			return fwd_irq_iter;
> >>> +	}
> >>> +	return NULL;
> >>> +}
> >>> +
> >>> +/**
> >>> + * validate_forward - checks whether forwarding a given IRQ is meaningful
> >>> + * @vdev:  vfio_device the IRQ belongs to
> >>> + * @fwd_irq: user struct containing the irq_index to forward
> >>> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> >>> + * kvm_vfio_device that holds it
> >>> + * @hwirq: irq numberthe irq index corresponds to
> >>> + *
> >>> + * checks the vfio-device is a platform vfio device
> >>> + * checks the irq_index corresponds to an actual hwirq and
> >>> + * checks this hwirq is not already forwarded
> >>> + * returns < 0 on following errors:
> >>> + * not a platform device, bad irq index, already forwarded
> >>> + */
> >>> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> >>> +			    struct vfio_device *vdev,
> >>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >>> +			    struct kvm_vfio_device **kvm_vdev,
> >>> +			    int *hwirq)
> >>> +{
> >>> +	struct device *dev = kvm_vfio_external_base_device(vdev);
> >>> +	struct platform_device *platdev;
> >>> +
> >>> +	*hwirq = -1;
> >>> +	*kvm_vdev = NULL;
> >>> +	if (strcmp(dev->bus->name, "platform") == 0) {
> > 
> > Should be testing dev->bus_type == &platform_bus_type, and ideally
> > creating a dev_is_platform() macro to make that even cleaner.
> OK
> > 
> > However, we're being sort of sneaky here that we're actually doing
> > something platform device specific here.  Why?  Don't we just need to
> > make sure that kvm-vfio doesn't have any record of this forward
> > (-EEXIST) and let the platform device code error out later for this
> > case?
> After having answered to Christoffer's comments, I think I should check
> whether the IRQ is not already mapped at VGIC level. In that case I
> would need to split the validate function into 2 parts:
> - generic part only checks the vfio_device/irq_index is not already
> recorded. I do not need the hwirq for that.
> - arm specific part checks no GIC mapping does exist (need the hwirq)
> 
> Would it be OK for both of you?

the latter being in the arch specific code, it sounds like. But sure,
there's a lot of cleanup to be made here, so go with that approach and
get a second version out soon, then we can have another look.
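
To illustrate the proposed split, a standalone sketch with a generic
check on the device/index pair followed by an arch-specific hook that
needs the hwirq. All names are hypothetical, a flat table stands in for
the kernel lists, and -EBUSY for the arch-level conflict is an arbitrary
choice for the sketch. Checks return early instead of nesting, and
nothing is logged.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_fwd_entry {
	const void *vdev;	/* identity of the VFIO device */
	uint32_t index;		/* VFIO IRQ index */
};

struct sketch_fwd_table {
	const struct sketch_fwd_entry *entries;
	size_t count;
};

/* Generic part: is this device/index pair already recorded? No hwirq. */
static int sketch_validate_generic(const struct sketch_fwd_table *t,
				   const void *vdev, uint32_t index)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < t->count; i++) {
		if (t->entries[i].vdev == vdev &&
		    t->entries[i].index == index)
			return -EEXIST;
	}
	return 0;
}

/* Arch part: hypothetical hook reporting an existing hwirq mapping. */
typedef bool (*sketch_hwirq_mapped_fn)(int hwirq);

static int sketch_validate_forward(const struct sketch_fwd_table *t,
				   const void *vdev, uint32_t index,
				   int hwirq, sketch_hwirq_mapped_fn mapped)
{
	int ret = sketch_validate_generic(t, vdev, index);

	if (ret)
		return ret;
	if (mapped && mapped(hwirq))
		return -EBUSY;	/* assumption: code for an arch-level clash */
	return 0;
}

/* Test double: pretends hwirq 42 already has a mapping. */
static bool sketch_hwirq_42_mapped(int hwirq)
{
	return hwirq == 42;
}
```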

> > 
> >>> +		platdev = to_platform_device(dev);
> >>> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> >>> +		if (*hwirq < 0) {
> >>> +			kvm_err("%s incorrect index\n", __func__);
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	} else {
> >>> +		kvm_err("%s not a platform device\n", __func__);
> >>> +		return -EINVAL;
> >>> +	}
> >>
> >> need some spaceing here, also, I would turn this around, first check if
> >> the strcmp fails, and then error out, then do you next check etc., to
> >> avoid so many nested statements.
> >>
> >>> +	/* is a ref to this device already owned by the KVM-VFIO device? */
> >>
> >> this comment is not particularly helpful in its current form, it would
> >> be helpful if you specified that we're checking whether that particular
> >> device/irq combo is already registered.
> >>
> >>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >>> +	if (*kvm_vdev) {
> >>> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> >>> +			kvm_err("%s irq %d already forwarded\n",
> >>> +				__func__, *hwirq);
> > 
> > Why didn't we do this first?
> see above comment
> > 
> >> don't flood the kernel log because of a user error, just allocate an
> >> error code for this purpose and document it in the ABI, -EEXIST or
> >> something.
> >>
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/**
> >>> + * validate_unforward: check a deassignment is meaningful
> >>> + * @kv: the kvm_vfio device
> >>> + * @vdev: the vfio_device whose irq to deassign belongs to
> >>> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> >>> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> >>> + * it exists
> >>> + *
> >>> + * returns 0 if the provided irq effectively is forwarded
> >>> + * (a ref to this vfio_device is hold and this irq belongs to
> >>                                     held
> >>> + * the forwarded irq of this device)
> >>> + * returns -EINVAL in the negative
> >>
> >>                ENOENT should be returned if you don't have an entry.
> >> 	       EINVAL could be used if you supply an fd that isn't a
> >> 	       VFIO device file descriptor, for example.  Again,
> >> 	       consider documenting all this in the API.
> >>
> >>> + */
> >>> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> >>> +			      struct vfio_device *vdev,
> >>> +			      struct kvm_arch_forwarded_irq *fwd_irq,
> >>> +			      struct kvm_vfio_device **kvm_vdev)
> >>> +{
> >>> +	struct kvm_fwd_irq *pfwd;
> >>> +
> >>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >>> +	if (!kvm_vdev) {
> >>> +		kvm_err("%s no forwarded irq for this device\n", __func__);
> >>
> >> don't flood the kernel log
> >>
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> >>> +	if (!pfwd) {
> >>> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
> >>
> >>> +		return -EINVAL;
> >>
> >> same here
> I do not understand. With current functions I need to first retrieve the
> device and then iterate on IRQs of that device.
> >>

I'm just saying you shouldn't print to the kernel log because the user
did something stupid.
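
Something along these lines, as a standalone sketch with hypothetical
names: the unforward check reports "no such entry" through the return
value only. Note also that it tests the looked-up pointer (*out) rather
than the out-parameter itself; the quoted hunk above tests kvm_vdev,
which looks like it was meant to be *kvm_vdev.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_dev {
	const uint32_t *fwd_indexes;	/* IRQ indexes currently forwarded */
	size_t nr_fwd;
};

/*
 * Returns 0 and sets *out if @index is forwarded on @dev, -ENOENT
 * otherwise. @dev may be NULL, standing for a failed device lookup.
 * No message is logged on a user error.
 */
static int sketch_validate_unforward(const struct sketch_dev *dev,
				     uint32_t index,
				     const struct sketch_dev **out)
{
	size_t i;

	assert(out != NULL);
	*out = dev;
	if (!*out)	/* check the result, not the out-parameter */
		return -ENOENT;

	for (i = 0; i < dev->nr_fwd; i++) {
		if (dev->fwd_indexes[i] == index)
			return 0;
	}
	return -ENOENT;
}
```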

-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-11 17:22           ` Christoffer Dall
  0 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 17:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 11, 2014 at 02:04:39PM +0200, Eric Auger wrote:
> On 09/11/2014 07:05 AM, Alex Williamson wrote:
> > On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> >> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> >>> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> >>>
> >>> This is a new control channel which enables KVM to cooperate with
> >>> viable VFIO devices.
> >>>
> >>> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> >>> in addition to a list of groups (kvm_vfio_group). The new
> >>> infrastructure enables to check the validity of the VFIO device
> >>> file descriptor, get and hold a reference to it.
> >>>
> >>> The first concrete implemented command is IRQ forward control:
> >>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> >>>
> >>> It consists in programing the VFIO driver and KVM in a consistent manner
> >>> so that an optimized IRQ injection/completion is set up. Each
> >>> kvm_vfio_device holds a list of forwarded IRQ. When putting a
> >>> kvm_vfio_device, the implementation makes sure the forwarded IRQs
> >>> are set again in the normal handling state (non forwarded).
> >>
> >> 'putting a kvm_vfio_device' sounds to like you're golf'ing :)
> >>
> >> When a kvm_vfio_device is released?
> >>
> >>>
> >>> The forwarding programmming is architecture specific, embodied by the
> >>> kvm_arch_set_fwd_state function. Its implementation is given in a
> >>> separate patch file.
> >>
> >> I would drop the last sentence and instead indicate that this is handled
> >> properly when the architecture does not support such a feature.
> >>
> >>>
> >>> The forwarding control modality is enabled by the
> >>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> >>>
> >>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >>>
> >>> ---
> >>>
> >>> v1 -> v2:
> >>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> - original patch file separated into 2 parts: generic part moved in vfio.c
> >>>   and ARM specific part(kvm_arch_set_fwd_state)
> >>> ---
> >>>  include/linux/kvm_host.h |  27 +++
> >>>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
> >>>  2 files changed, 477 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >>> index a4c33b3..24350dc 100644
> >>> --- a/include/linux/kvm_host.h
> >>> +++ b/include/linux/kvm_host.h
> >>> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
> >>>  		      unsigned long arg);
> >>>  };
> >>>  
> >>> +enum kvm_fwd_irq_action {
> >>> +	KVM_VFIO_IRQ_SET_FORWARD,
> >>> +	KVM_VFIO_IRQ_SET_NORMAL,
> >>> +	KVM_VFIO_IRQ_CLEANUP,
> >>
> >> This is KVM internal API, so it would probably be good to document this.
> >> Especially the CLEANUP bit worries me, see below.
> > 
> > This also doesn't match the user API, which is simply FORWARD/UNFORWARD.
> Hi Alex,
> 
> will change that.
> > Extra states worry me too.
> 
> I tried to explained the 2 motivations behind. Please let me know if it
> makes sense.
> > 
> >>> +};
> >>> +
> >>> +/* internal structure describing a forwarded IRQ */
> >>> +struct kvm_fwd_irq {
> >>> +	struct list_head link;
> >>
> >> this list entry is local to the kvm vfio device, right? that means you
> >> probably want a struct with just the below fields, and then have a
> >> containing struct in the generic device file, private to it's logic.
> > 
> > Yes, this is part of the abstraction problem.
> OK will fix that.
> > 
> >>> +	__u32 index; /* platform device irq index */
> > 
> > This is a vfio_device irq_index, but vfio_devices support indexes and
> > sub-indexes.  At this level the API should match vfio, not the specifics
> > of platform devices not supporting sub-index.
> I will add sub-indexes then.
> > 
> >>> +	__u32 hwirq; /*physical IRQ */
> >>> +	__u32 gsi; /* virtual IRQ */
> >>> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> > 
> > Not sure I understand why vcpu is necessary.
> vcpu is used when providing the physical IRQ/virtual IRQ mapping to the
> virtual GIC. I can remove it from and add a vcpu struct * param to
> kvm_arch_set_fwd_state if you prefer.
> 
>   Also I see a 'get' in the code below, but not a 'put'.
> Sorry I do not understand your comment here? What 'get' do you mention?

he means kvm_get_vcpu(), but you are ok on that one. The kvm naming of
this function is unfortunate, because it doesn't increment any refcounts
but just resolves to an entry in the vcpu array.

> > 
> >>> +};
> >>> +
> >>>  void kvm_device_get(struct kvm_device *dev);
> >>>  void kvm_device_put(struct kvm_device *dev);
> >>>  struct kvm_device *kvm_device_from_filp(struct file *filp);
> >>> @@ -1075,6 +1090,18 @@ extern struct kvm_device_ops kvm_vfio_ops;
> >>>  extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> >>>  extern struct kvm_device_ops kvm_flic_ops;
> >>>  
> >>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>> +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> >>
> >> what's the 'p' in pfwd?
> > 
> > p is for pointer?
> yes it was ;-)
> > 
> >>> +			   enum kvm_fwd_irq_action action);
> >>> +
> >>> +#else
> >>> +static inline int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> >>> +					 enum kvm_fwd_irq_action action)
> >>> +{
> >>> +	return 0;
> >>> +}
> >>> +#endif
> >>> +
> >>>  #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >>>  
> >>>  static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> >>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> >>> index 76dc7a1..e4a81c4 100644
> >>> --- a/virt/kvm/vfio.c
> >>> +++ b/virt/kvm/vfio.c
> >>> @@ -18,14 +18,24 @@
> >>>  #include <linux/slab.h>
> >>>  #include <linux/uaccess.h>
> >>>  #include <linux/vfio.h>
> >>> +#include <linux/platform_device.h>
> >>>  
> >>>  struct kvm_vfio_group {
> >>>  	struct list_head node;
> >>>  	struct vfio_group *vfio_group;
> >>>  };
> >>>  
> >>> +struct kvm_vfio_device {
> >>> +	struct list_head node;
> >>> +	struct vfio_device *vfio_device;
> >>> +	/* list of forwarded IRQs for that VFIO device */
> >>> +	struct list_head fwd_irq_list;
> >>> +	int fd;
> >>> +};
> >>> +
> >>>  struct kvm_vfio {
> >>>  	struct list_head group_list;
> >>> +	struct list_head device_list;
> >>>  	struct mutex lock;
> >>>  	bool noncoherent;
> >>>  };
> >>> @@ -246,12 +256,441 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >>>  	return -ENXIO;
> >>>  }
> >>>  
> >>> +/**
> >>> + * get_vfio_device - returns the vfio-device corresponding to this fd
> >>> + * @fd:fd of the vfio platform device
> >>> + *
> >>> + * checks it is a vfio device
> >>> + * increment its ref counter
> >>
> >> why the short lines?  Just write this out in proper English.
> >>
> >>> + */
> >>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> >>> +{
> >>> +	struct fd f;
> >>> +	struct vfio_device *vdev;
> >>> +
> >>> +	f = fdget(fd);
> >>> +	if (!f.file)
> >>> +		return NULL;
> >>> +	vdev = kvm_vfio_device_get_external_user(f.file);
> >>> +	fdput(f);
> >>> +	return vdev;
> >>> +}
> >>> +
> >>> +/**
> >>> + * put_vfio_device: put the vfio platform device
> >>> + * @vdev: vfio_device to put
> >>> + *
> >>> + * decrement the ref counter
> >>> + */
> >>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> >>> +{
> >>> +	kvm_vfio_device_put_external_user(vdev);
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_find_device - look for the device in the assigned
> >>> + * device list
> >>> + * @kv: the kvm-vfio device
> >>> + * @vdev: the vfio_device to look for
> >>> + *
> >>> + * returns the associated kvm_vfio_device if the device is known,
> >>> + * meaning at least 1 IRQ is forwarded for this device.
> >>> + * in the device is not registered, returns NULL.
> >>> + */
> > 
> > Why are we talking about forwarded IRQs already, this is a simple lookup
> > function, who knows what other users it will have in the future.
> I will correct
> > 
> >>
> >> are these functions meant to be exported?  Otherwise they should be
> >> static, and the documentation on these simple list iteration wrappers
> >> seems like overkill imho.
> >>
> >>> +struct kvm_vfio_device *kvm_vfio_find_device(struct kvm_vfio *kv,
> >>> +					     struct vfio_device *vdev)
> >>> +{
> >>> +	struct kvm_vfio_device *kvm_vdev_iter;
> >>> +
> >>> +	list_for_each_entry(kvm_vdev_iter, &kv->device_list, node) {
> >>> +		if (kvm_vdev_iter->vfio_device == vdev)
> >>> +			return kvm_vdev_iter;
> >>> +	}
> >>> +	return NULL;
> >>> +}
> >>> +
> >>> +/**
> >>> + * kvm_vfio_find_irq - look for a an irq in the device IRQ list
> >>> + * @kvm_vdev: the kvm_vfio_device
> >>> + * @irq_index: irq index
> >>> + *
> >>> + * returns the forwarded irq struct if it exists, NULL in the negative
> >>> + */
> >>> +struct kvm_fwd_irq *kvm_vfio_find_irq(struct kvm_vfio_device *kvm_vdev,
> >>> +				      int irq_index)
> > 
> > +sub-index
> OK
> > 
> > probably important to note on both of these that they need to be called
> > with kv->lock
> OK
> > 
> >>> +{
> >>> +	struct kvm_fwd_irq *fwd_irq_iter;
> >>> +
> >>> +	list_for_each_entry(fwd_irq_iter, &kvm_vdev->fwd_irq_list, link) {
> >>> +		if (fwd_irq_iter->index == irq_index)
> >>> +			return fwd_irq_iter;
> >>> +	}
> >>> +	return NULL;
> >>> +}
> >>> +
> >>> +/**
> >>> + * validate_forward - checks whether forwarding a given IRQ is meaningful
> >>> + * @vdev:  vfio_device the IRQ belongs to
> >>> + * @fwd_irq: user struct containing the irq_index to forward
> >>> + * @kvm_vdev: if a forwarded IRQ already exists for that VFIO device,
> >>> + * kvm_vfio_device that holds it
> >>> + * @hwirq: irq numberthe irq index corresponds to
> >>> + *
> >>> + * checks the vfio-device is a platform vfio device
> >>> + * checks the irq_index corresponds to an actual hwirq and
> >>> + * checks this hwirq is not already forwarded
> >>> + * returns < 0 on following errors:
> >>> + * not a platform device, bad irq index, already forwarded
> >>> + */
> >>> +static int kvm_vfio_validate_forward(struct kvm_vfio *kv,
> >>> +			    struct vfio_device *vdev,
> >>> +			    struct kvm_arch_forwarded_irq *fwd_irq,
> >>> +			    struct kvm_vfio_device **kvm_vdev,
> >>> +			    int *hwirq)
> >>> +{
> >>> +	struct device *dev = kvm_vfio_external_base_device(vdev);
> >>> +	struct platform_device *platdev;
> >>> +
> >>> +	*hwirq = -1;
> >>> +	*kvm_vdev = NULL;
> >>> +	if (strcmp(dev->bus->name, "platform") == 0) {
> > 
> > Should be testing dev->bus == &platform_bus_type, and ideally
> > creating a dev_is_platform() macro to make that even cleaner.
> OK
> > 
> > However, we're being sort of sneaky here that we're actually doing
> > something platform device specific here.  Why?  Don't we just need to
> > make sure that kvm-vfio doesn't have any record of this forward
> > (-EEXIST) and let the platform device code error out later for this
> > case?
> After having answered to Christoffer's comments, I think I should check
> whether the IRQ is not already mapped at VGIC level. In that case I
> would need to split the validate function into 2 parts:
> - generic part only checks the vfio_device/irq_index is not already
> recorded. I do not need the hwirq for that.
> - arm specific part checks no GIC mapping does exist (need the hwirq)
> 
> Would it be OK for both of you?

The latter belongs in the arch specific code, it sounds like, but sure,
there's a lot of cleanup to be done here, so go with that approach and
get a second version out soon, then we can have another look.

> > 
> >>> +		platdev = to_platform_device(dev);
> >>> +		*hwirq = platform_get_irq(platdev, fwd_irq->index);
> >>> +		if (*hwirq < 0) {
> >>> +			kvm_err("%s incorrect index\n", __func__);
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	} else {
> >>> +		kvm_err("%s not a platform device\n", __func__);
> >>> +		return -EINVAL;
> >>> +	}
> >>
> >> need some spacing here, also, I would turn this around, first check if
> >> the strcmp fails, and then error out, then do your next check etc., to
> >> avoid so many nested statements.
> >>
> >>> +	/* is a ref to this device already owned by the KVM-VFIO device? */
> >>
> >> this comment is not particularly helpful in its current form, it would
> >> be helpful if you specified that we're checking whether that particular
> >> device/irq combo is already registered.
> >>
> >>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >>> +	if (*kvm_vdev) {
> >>> +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> >>> +			kvm_err("%s irq %d already forwarded\n",
> >>> +				__func__, *hwirq);
> > 
> > Why didn't we do this first?
> see above comment
> > 
> >> don't flood the kernel log because of a user error, just allocate an
> >> error code for this purpose and document it in the ABI, -EEXIST or
> >> something.
> >>
> >>> +			return -EINVAL;
> >>> +		}
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >>> +/**
> >>> + * validate_unforward: check a deassignment is meaningful
> >>> + * @kv: the kvm_vfio device
> >>> + * @vdev: the vfio_device whose irq to deassign belongs to
> >>> + * @fwd_irq: the user struct that contains the fd and irq_index of the irq
> >>> + * @kvm_vdev: the kvm_vfio_device the forwarded irq belongs to, if
> >>> + * it exists
> >>> + *
> >>> + * returns 0 if the provided irq effectively is forwarded
> >>> + * (a ref to this vfio_device is hold and this irq belongs to
> >>                                     held
> >>> + * the forwarded irq of this device)
> >>> + * returns -EINVAL in the negative
> >>
> >>                ENOENT should be returned if you don't have an entry.
> >> 	       EINVAL could be used if you supply an fd that isn't a
> >> 	       VFIO device file descriptor, for example.  Again,
> >> 	       consider documenting all this in the API.
> >>
> >>> + */
> >>> +static int kvm_vfio_validate_unforward(struct kvm_vfio *kv,
> >>> +			      struct vfio_device *vdev,
> >>> +			      struct kvm_arch_forwarded_irq *fwd_irq,
> >>> +			      struct kvm_vfio_device **kvm_vdev)
> >>> +{
> >>> +	struct kvm_fwd_irq *pfwd;
> >>> +
> >>> +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> >>> +	if (!kvm_vdev) {
> >>> +		kvm_err("%s no forwarded irq for this device\n", __func__);
> >>
> >> don't flood the kernel log
> >>
> >>> +		return -EINVAL;
> >>> +	}
> >>> +	pfwd = kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index);
> >>> +	if (!pfwd) {
> >>> +		kvm_err("%s irq %d is not forwarded\n", __func__, fwd_irq->fd);
> >>
> >>> +		return -EINVAL;
> >>
> >> same here
> I do not understand. With current functions I need to first retrieve the
> device and then iterate on IRQs of that device.
> >>

I'm just saying you shouldn't print to the kernel log because the user
did something stupid.

-Christoffer
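
A minimal user-space sketch of the error-code convention discussed in
this message, assuming a plain linked list in place of the kernel list
helpers: a forward request fails with -EEXIST when the device/index pair
is already recorded, an unforward request fails with -ENOENT when it is
not, and neither path writes to the log. The names fwd_record,
find_record, validate_forward and validate_unforward are hypothetical
and are not the kvm-vfio code. The lookup result is tested through *out;
the quoted patch tests the output pointer itself (!kvm_vdev), which
looks unintended.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one recorded (device, irq index) forward. */
struct fwd_record {
	const void *vdev;          /* opaque device identity */
	unsigned int index;        /* vfio irq index */
	struct fwd_record *next;
};

/* The caller is assumed to hold the list lock, as noted in the review. */
static struct fwd_record *find_record(struct fwd_record *head,
				      const void *vdev, unsigned int index)
{
	struct fwd_record *r;

	for (r = head; r; r = r->next)
		if (r->vdev == vdev && r->index == index)
			return r;
	return NULL;
}

/* Forward request: fail quietly with -EEXIST if already recorded. */
int validate_forward(struct fwd_record *head,
		     const void *vdev, unsigned int index)
{
	assert(vdev != NULL);
	return find_record(head, vdev, index) ? -EEXIST : 0;
}

/* Unforward request: fail quietly with -ENOENT if nothing is recorded. */
int validate_unforward(struct fwd_record *head,
		       const void *vdev, unsigned int index,
		       struct fwd_record **out)
{
	assert(vdev != NULL && out != NULL);
	*out = find_record(head, vdev, index);
	return *out ? 0 : -ENOENT;  /* test *out, not the out pointer */
}
```

Distinct error codes let user space tell "already forwarded" from "never
forwarded" without reading the kernel log.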

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11 15:59           ` Alex Williamson
@ 2014-09-11 17:24             ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 17:24 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Eric Auger, eric.auger, marc.zyngier, linux-arm-kernel, kvmarm,
	kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, Sep 11, 2014 at 09:59:24AM -0600, Alex Williamson wrote:
> On Thu, 2014-09-11 at 14:04 +0200, Eric Auger wrote:
> > On 09/11/2014 07:05 AM, Alex Williamson wrote:
> > > On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> > >> On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> > >>> This patch introduces a new KVM_DEV_VFIO_DEVICE attribute.
> > >>>
> > >>> This is a new control channel which enables KVM to cooperate with
> > >>> viable VFIO devices.
> > >>>
> > >>> The kvm-vfio device now holds a list of devices (kvm_vfio_device)
> > >>> in addition to a list of groups (kvm_vfio_group). The new
> > >>> infrastructure enables to check the validity of the VFIO device
> > >>> file descriptor, get and hold a reference to it.
> > >>>
> > >>> The first concrete implemented command is IRQ forward control:
> > >>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> > >>>
> > >>> It consists in programing the VFIO driver and KVM in a consistent manner
> > >>> so that an optimized IRQ injection/completion is set up. Each
> > >>> kvm_vfio_device holds a list of forwarded IRQ. When putting a
> > >>> kvm_vfio_device, the implementation makes sure the forwarded IRQs
> > >>> are set again in the normal handling state (non forwarded).
> > >>
> > >> 'putting a kvm_vfio_device' sounds like you're golf'ing :)
> > >>
> > >> When a kvm_vfio_device is released?
> > >>
> > >>>
> > >>> The forwarding programmming is architecture specific, embodied by the
> > >>> kvm_arch_set_fwd_state function. Its implementation is given in a
> > >>> separate patch file.
> > >>
> > >> I would drop the last sentence and instead indicate that this is handled
> > >> properly when the architecture does not support such a feature.
> > >>
> > >>>
> > >>> The forwarding control modality is enabled by the
> > >>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD define.
> > >>>
> > >>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> > >>>
> > >>> ---
> > >>>
> > >>> v1 -> v2:
> > >>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > >>> - original patch file separated into 2 parts: generic part moved in vfio.c
> > >>>   and ARM specific part(kvm_arch_set_fwd_state)
> > >>> ---
> > >>>  include/linux/kvm_host.h |  27 +++
> > >>>  virt/kvm/vfio.c          | 452 ++++++++++++++++++++++++++++++++++++++++++++++-
> > >>>  2 files changed, 477 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > >>> index a4c33b3..24350dc 100644
> > >>> --- a/include/linux/kvm_host.h
> > >>> +++ b/include/linux/kvm_host.h
> > >>> @@ -1065,6 +1065,21 @@ struct kvm_device_ops {
> > >>>  		      unsigned long arg);
> > >>>  };
> > >>>  
> > >>> +enum kvm_fwd_irq_action {
> > >>> +	KVM_VFIO_IRQ_SET_FORWARD,
> > >>> +	KVM_VFIO_IRQ_SET_NORMAL,
> > >>> +	KVM_VFIO_IRQ_CLEANUP,
> > >>
> > >> This is KVM internal API, so it would probably be good to document this.
> > >> Especially the CLEANUP bit worries me, see below.
> > > 
> > > This also doesn't match the user API, which is simply FORWARD/UNFORWARD.
> > Hi Alex,
> > 
> > will change that.
> > > Extra states worry me too.
> > 
> > I tried to explain the 2 motivations behind them. Please let me know if
> > it makes sense.
> 
> Not really.  It seems like it's just a leak of arch specific handling
> out into common code.
> 
> > >>> +};
> > >>> +
> > >>> +/* internal structure describing a forwarded IRQ */
> > >>> +struct kvm_fwd_irq {
> > >>> +	struct list_head link;
> > >>
> > >> this list entry is local to the kvm vfio device, right? that means you
> > >> probably want a struct with just the below fields, and then have a
> > >> containing struct in the generic device file, private to its logic.
> > > 
> > > Yes, this is part of the abstraction problem.
> > OK will fix that.
> > > 
> > >>> +	__u32 index; /* platform device irq index */
> > > 
> > > This is a vfio_device irq_index, but vfio_devices support indexes and
> > > sub-indexes.  At this level the API should match vfio, not the specifics
> > > of platform devices not supporting sub-index.
> > I will add sub-indexes then.
> > > 
> > >>> +	__u32 hwirq; /*physical IRQ */
> > >>> +	__u32 gsi; /* virtual IRQ */
> > >>> +	struct kvm_vcpu *vcpu; /* vcpu to inject into*/
> > > 
> > > Not sure I understand why vcpu is necessary.
> > vcpu is used when providing the physical IRQ/virtual IRQ mapping to the
> > virtual GIC. I can remove it from the struct and add a struct kvm_vcpu *
> > param to kvm_arch_set_fwd_state if you prefer.
> 
> The kvm-vfio API for this interface doesn't allow the user to indicate
> which vcpu to inject to.  On x86, it would be the programming of the
> interrupt controller that would decide that.  In the code here we
> arbitrarily pick vcpu0.  It feels both architecture specific and a bit
> unspecified.
> 
> > 
> > >   Also I see a 'get' in the code below, but not a 'put'.
> > Sorry I do not understand your comment here? What 'get' do you mention?
> 
> I suppose vcpus don't subscribe to the get/put philosophy, I was
> expecting a reference count, but there is none.  How do we know that
> vcpu pointer is still valid later?
> 

Because it will stay valid for as long as you can have a handle to this
instance of the kvm vfio device?

The only way for it to go away is when the VM is completely dying, but
the KVM device API should keep it alive with a reference, right?

-Christoffer
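
The struct split suggested earlier in this message (a description with
only the IRQ fields, wrapped by a container that is private to the
generic code) could look roughly like the user-space sketch below. All
names here (list_node, fwd_irq_desc, fwd_irq_node, node_of, find_desc)
are hypothetical; node_of() mirrors the idea of the kernel's
container_of(), and the vcpu field is left out because its placement is
still under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Arch-visible description: only the IRQ data, no list linkage. */
struct fwd_irq_desc {
	uint32_t index;   /* vfio irq index */
	uint32_t hwirq;   /* physical IRQ */
	uint32_t gsi;     /* virtual IRQ */
};

/* Hypothetical container, private to the generic kvm-vfio code. */
struct fwd_irq_node {
	struct list_node link;
	struct fwd_irq_desc desc;
};

/* Same idea as container_of(), spelled out with offsetof(). */
#define node_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	assert(n != NULL && head != NULL);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Walk the private list and hand out only the description. */
struct fwd_irq_desc *find_desc(struct list_node *head, uint32_t index)
{
	struct list_node *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct fwd_irq_node *n =
			node_of(pos, struct fwd_irq_node, link);

		if (n->desc.index == index)
			return &n->desc;
	}
	return NULL;
}
```

With this layout the arch code never sees the list linkage, so the list
can change without touching the arch interface.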

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 2/9] KVM: ARM: VGIC: add forwarded irq rbtree lock
  2014-09-11  3:09     ` Christoffer Dall
@ 2014-09-11 17:31       ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-11 17:31 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On 09/11/2014 05:09 AM, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:41PM +0200, Eric Auger wrote:
>> add a lock related to the rb tree manipulation. The rb tree can be
> 
> Ok, I can't hold myself back any longer.  Please begin sentences with a
> capital letter. You don't do this in French? :)
Hi Christoffer,

Yep, that's understood ;-) We definitely do. I am just discovering that
it is also expected in commits and comments ;-)
> 
>> searched in one thread (irqfd handler for instance) and map/unmap
>> happen in another.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>> ---
>>  include/kvm/arm_vgic.h |  1 +
>>  virt/kvm/arm/vgic.c    | 46 +++++++++++++++++++++++++++++++++++++---------
>>  2 files changed, 38 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 743020f..3da244f 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -177,6 +177,7 @@ struct vgic_dist {
>>  	unsigned long		irq_pending_on_cpu;
>>  
>>  	struct rb_root		irq_phys_map;
>> +	spinlock_t			rb_tree_lock;
>>  #endif
>>  };
>>  
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 8ef495b..dbc2a5a 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1630,9 +1630,15 @@ static struct rb_root *vgic_get_irq_phys_map(struct kvm_vcpu *vcpu,
>>  
>>  int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
>>  {
>> -	struct rb_root *root = vgic_get_irq_phys_map(vcpu, virt_irq);
>> -	struct rb_node **new = &root->rb_node, *parent = NULL;
>> +	struct rb_root *root;
>> +	struct rb_node **new, *parent = NULL;
>>  	struct irq_phys_map *new_map;
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +
>> +	spin_lock(&dist->rb_tree_lock);
>> +
>> +	root = vgic_get_irq_phys_map(vcpu, virt_irq);
>> +	new = &root->rb_node;
>>  
>>  	/* Boilerplate rb_tree code */
>>  	while (*new) {
>> @@ -1644,13 +1650,17 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
>>  			new = &(*new)->rb_left;
>>  		else if (this->virt_irq > virt_irq)
>>  			new = &(*new)->rb_right;
>> -		else
>> +		else {
>> +			spin_unlock(&dist->rb_tree_lock);
>>  			return -EEXIST;
>> +		}
> 
> can you initialize a ret variable to -EEXIST in the beginning of this
> function, and add an out label above the unlock below, replace this
> multi-line statement with a goto out, and set ret = 0 after the while
> loop?
sure
> 
>>  	}
>>  
>>  	new_map = kzalloc(sizeof(*new_map), GFP_KERNEL);
>> -	if (!new_map)
>> +	if (!new_map) {
>> +		spin_unlock(&dist->rb_tree_lock);
>>  		return -ENOMEM;
> 
> then this becomes ret = -ENOMEM; goto out;
OK
> 
>> +	}
>>  
>>  	new_map->virt_irq = virt_irq;
>>  	new_map->phys_irq = phys_irq;
>> @@ -1658,6 +1668,8 @@ int vgic_map_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
>>  	rb_link_node(&new_map->node, parent, new);
>>  	rb_insert_color(&new_map->node, root);
>>  
>> +	spin_unlock(&dist->rb_tree_lock);
>> +
> 
> aren't you allocating memory with GFP_KERNEL while holding a spinlock
> here?
Oops. Thanks for noticing. I will move the lock.
> 
>>  	return 0;
>>  }
>>  
>> @@ -1685,24 +1697,39 @@ static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
>>  
>>  int vgic_get_phys_irq(struct kvm_vcpu *vcpu, int virt_irq)
>>  {
>> -	struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
>> +	struct irq_phys_map *map;
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	int ret;
>> +
>> +	spin_lock(&dist->rb_tree_lock);
>> +	map = vgic_irq_map_search(vcpu, virt_irq);
>>  
>>  	if (map)
>> -		return map->phys_irq;
>> +		ret = map->phys_irq;
>> +	else
>> +		ret =  -ENOENT;
> 
> initialize ret to -ENOENT and avoid the else statement.
ok
> 
>> +
>> +	spin_unlock(&dist->rb_tree_lock);
>> +	return ret;
>>  
>> -	return -ENOENT;
>>  }
>>  
>>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, int virt_irq, int phys_irq)
>>  {
>> -	struct irq_phys_map *map = vgic_irq_map_search(vcpu, virt_irq);
>> +	struct irq_phys_map *map;
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +
>> +	spin_lock(&dist->rb_tree_lock);
>> +
>> +	map = vgic_irq_map_search(vcpu, virt_irq);
>>  
>>  	if (map && map->phys_irq == phys_irq) {
>>  		rb_erase(&map->node, vgic_get_irq_phys_map(vcpu, virt_irq));
>>  		kfree(map);
>> +		spin_unlock(&dist->rb_tree_lock);
> 
> can kfree sleep?  I don't remember.  In any case, you can unlock before
> calling kfree.
No, it can't, but I will move it anyway.
> 
>>  		return 0;
>>  	}
>> -
>> +	spin_unlock(&dist->rb_tree_lock);
>>  	return -ENOENT;
> 
> an out label and single unlock location would be preferred here as well
> I think.
ok

Thanks

Eric
> 
>>  }
>>  
>> @@ -1898,6 +1925,7 @@ int kvm_vgic_create(struct kvm *kvm)
>>  	}
>>  
>>  	spin_lock_init(&kvm->arch.vgic.lock);
>> +	spin_lock_init(&kvm->arch.vgic.rb_tree_lock);
>>  	kvm->arch.vgic.in_kernel = true;
>>  	kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
>>  	kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
>> -- 
>> 1.9.1
>>
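
The review points in this message (a single unlock site reached through
an out label, ret preset to the error value, no sleeping allocation
under the lock, freeing after the unlock) can be sketched in user space
as follows. This is only a model under stated assumptions: toy_lock is a
flag that stands in for spinlock_t, a sorted linked list stands in for
the rb tree, and calloc()/free() stand in for kzalloc()/kfree().
Allocating before taking the lock is one possible way to fix the
GFP_KERNEL issue; the thread does not say which fix the next version
will use.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy lock standing in for spinlock_t; it only tracks "held". */
struct toy_lock { int held; };
static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

struct irq_map {
	int virt_irq, phys_irq;
	struct irq_map *next;
};

struct dist {
	struct toy_lock lock;
	struct irq_map *maps;   /* sorted list standing in for the rb tree */
};

int map_phys_irq(struct dist *d, int virt_irq, int phys_irq)
{
	struct irq_map **pos, *new_map;
	int ret = -EEXIST;

	/* Allocate first: a sleeping allocation must not run under the lock. */
	new_map = calloc(1, sizeof(*new_map));
	if (!new_map)
		return -ENOMEM;
	new_map->virt_irq = virt_irq;
	new_map->phys_irq = phys_irq;

	toy_lock_acquire(&d->lock);
	for (pos = &d->maps; *pos; pos = &(*pos)->next) {
		if ((*pos)->virt_irq == virt_irq)
			goto out;       /* duplicate: ret is still -EEXIST */
		if ((*pos)->virt_irq > virt_irq)
			break;
	}
	new_map->next = *pos;
	*pos = new_map;
	new_map = NULL;                 /* ownership moved to the list */
	ret = 0;
out:
	toy_lock_release(&d->lock);
	free(new_map);                  /* no-op on success (NULL) */
	return ret;
}

int get_phys_irq(struct dist *d, int virt_irq)
{
	struct irq_map *m;
	int ret = -ENOENT;              /* preset, so no else branch */

	toy_lock_acquire(&d->lock);
	for (m = d->maps; m; m = m->next) {
		if (m->virt_irq == virt_irq) {
			ret = m->phys_irq;
			break;
		}
	}
	toy_lock_release(&d->lock);
	return ret;
}

int unmap_phys_irq(struct dist *d, int virt_irq, int phys_irq)
{
	struct irq_map **pos, *victim = NULL;
	int ret = -ENOENT;

	toy_lock_acquire(&d->lock);
	for (pos = &d->maps; *pos; pos = &(*pos)->next) {
		if ((*pos)->virt_irq == virt_irq &&
		    (*pos)->phys_irq == phys_irq) {
			victim = *pos;
			*pos = victim->next;
			ret = 0;
			break;
		}
	}
	toy_lock_release(&d->lock);
	free(victim);                   /* freed after the unlock */
	return ret;
}
```

Each function has exactly one release call, so a new early exit cannot
return with the lock still held.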


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11  9:35       ` Eric Auger
@ 2014-09-11 17:32         ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 17:32 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, Sep 11, 2014 at 11:35:56AM +0200, Eric Auger wrote:
> On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> > On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:

[...]

> >> +	if (!pfwd)
> >> +		return -ENOMEM;
> >> +	pfwd->index = fwd_irq->index;
> >> +	pfwd->gsi = fwd_irq->gsi;
> >> +	pfwd->hwirq = hwirq;
> >> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> >> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> >> +	if (ret < 0) {
> >> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> > 
> > this whole thing feels incredibly broken to me.  Setting a forward
> > should either work or not work, not something in between that leaves
> > something to be cleaned up.  Why this two-stage thingy here?
> I wanted to exploit the return value of vgic_map_phys_irq which is
> likely to fail if the phys/virt mapping exists at VGIC level.

Then just have kvm_arch_set_fwd_state return -EEXIST, and it is the
responsibility of that function itself to clean up after whatever it
was doing, not to rely on its caller to call a cleanup function.

> 
> I already validated the injection from a KVM_VFIO_DEVICE point of view
> (the device/irq is not known internally). But what if another external
> component - which does not exist yet - maps the IRQ at VGIC level? Maybe
> I need to replace the existing validation check by querying the VGIC at
> low level instead of checking KVM-VFIO local variables.

No need to over-complicate this. In this case, kvm_arch_set_fwd_state()
will simply fail (gracefully), as I said above, and you just return to
the user, "sorry, couldn't do what you asked me because of this error
code".


[...]

> >> + *
> >> + * When this function is called, the vcpu already are destroyed. No
> >                                     the VCPUs are already destroyed.
> >> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> >> + * kvm_arch_set_fwd_state action
> > 
> > this last bit didn't make any sense to me.  Also, why are we referring
> > to the vgic in generic code?
> doesn't make sense anymore indeed. I wanted to emphasize the fact that
> VGIC KVM device is destroyed before the KVM VFIO device and this
> explains why I need a special CLEANUP cmd (besides the fact I need to
> call chip->irq_eoi(d) for the forwarded IRQs);
> 

I don't think it explains why you need a special CLEANUP cmd.  When the
vgic is going away it must clean up its state.  When the kvm vfio device
goes away, it must unforward any forwarded IRQs, and the architecture
specific implementation MUST correctly unforward such IRQs - as a single
operation!

Hope this helps.
-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-11 17:32         ` Christoffer Dall
  0 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 17:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 11, 2014 at 11:35:56AM +0200, Eric Auger wrote:
> On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> > On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:

[...]

> >> +	if (!pfwd)
> >> +		return -ENOMEM;
> >> +	pfwd->index = fwd_irq->index;
> >> +	pfwd->gsi = fwd_irq->gsi;
> >> +	pfwd->hwirq = hwirq;
> >> +	pfwd->vcpu = kvm_get_vcpu(kdev->kvm, 0);
> >> +	ret = kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_SET_FORWARD);
> >> +	if (ret < 0) {
> >> +		kvm_arch_set_fwd_state(pfwd, KVM_VFIO_IRQ_CLEANUP);
> > 
> > this whole thing feels incredibly broken to me.  Setting a forward
> > should either work or not work, not something in between that leaves
> > something to be cleaned up.  Why this two-stage thingy here?
> I wanted to exploit the return value of vgic_map_phys_irq which is
> likely to fail if the phys/virt mapping exists at VGIC level.

then just have the kvm_arch_set_fwd_state return with -EXIST and it is
the responsibility of that function itself to cleanup from whatever it
was doing, not to rely on its caller to call a cleanup function.

> 
> I already validated the injection from a KVM_VFIO_DEVICE point of view
> (the device/irq is not known internally). But what if another external
> component - which does not exist yet - maps the IRQ at VGIC level? Maybe
> I need to replace the existing validation check by querying the VGIC at
> low level instead of checking KVM-VFIO local variables.

No need to over-complicate this, in this case, the
kvm_arch_set_fwd_state() will simply fail (gracefully), as I said above,
and you just return to the user, "sorry, couldn't do what you asked me
because of this error code".
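
A minimal user-space sketch of the all-or-nothing behaviour asked for
above, under stated assumptions: the struct and the model_* functions
are hypothetical stand-ins and are not taken from the patch.  It only
illustrates a forward operation that undoes its own partial work and
reports -EEXIST, so the caller has nothing to clean up.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-IRQ state the arch code would own. */
struct fwd_irq_model {
	bool mapped;    /* phys/virt mapping installed at the (modelled) VGIC level */
	bool forwarded; /* physical IRQ marked as forwarded */
};

/* Modelled mapping step: refuses a second mapping with -EEXIST. */
static int model_map_phys_irq(struct fwd_irq_model *irq)
{
	if (irq->mapped)
		return -EEXIST;
	irq->mapped = true;
	return 0;
}

/*
 * All-or-nothing forward: on failure every partial step is undone here,
 * so the caller never needs a separate CLEANUP action.
 */
static int model_set_forward(struct fwd_irq_model *irq)
{
	int ret;

	if (irq->forwarded)
		return -EEXIST;

	irq->forwarded = true;         /* step 1: mark the physical IRQ */
	ret = model_map_phys_irq(irq); /* step 2: install the mapping */
	if (ret < 0) {
		irq->forwarded = false; /* roll back step 1 before returning */
		return ret;
	}
	return 0;
}
```

With this shape the caller just propagates the error code to the user;
a failed call leaves the state exactly as it was before the call.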


[...]

> >> + *
> >> + * When this function is called, the vcpu already are destroyed. No
> >                                     the VCPUs are already destroyed.
> >> + * vgic manipulation can happen hence the KVM_VFIO_IRQ_CLEANUP
> >> + * kvm_arch_set_fwd_state action
> > 
> > this last bit didn't make any sense to me.  Also, why are we referring
> > to the vgic in generic code?
> doesn't make sense anymore indeed. I wanted to emphasize the fact that
> VGIC KVM device is destroyed before the KVM VFIO device and this
> explains why I need a special CLEANUP cmd (besides the fact I need to
> call chip->irq_eoi(d) for the forwarded IRQs);
> 

I don't think it explains why you need a special CLEANUP cmd.  When the
vgic is going away it must clean up its state.  When the kvm vfio device
goes away, it must unforward any forwarded IRQs, and the architecture
specific implementation MUST correctly unforward such IRQs - as a single
operation!
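
As a rough illustration of "unforward as a single operation", here is a
small user-space model.  Everything in it is an assumption made for the
sketch (the table, the field names, the model_* functions); the
"active" flag merely stands in for the completion of an in-flight
physical IRQ that was mentioned earlier in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one IRQ tracked by the (modelled) KVM-VFIO device. */
struct fwd_entry_model {
	bool forwarded; /* IRQ is currently forwarded */
	bool active;    /* a physical IRQ is still awaiting completion */
};

/*
 * Single unforward operation: completes any in-flight IRQ and drops the
 * forwarded state in one call, with no separate cleanup action.
 */
static void model_unforward(struct fwd_entry_model *e)
{
	if (!e->forwarded)
		return;
	e->active = false;    /* stands in for completing the physical IRQ */
	e->forwarded = false;
}

/* Device teardown: unforward everything still forwarded, return the count. */
static size_t model_device_release(struct fwd_entry_model *tbl, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (tbl[i].forwarded) {
			model_unforward(&tbl[i]);
			count++;
		}
	}
	return count;
}
```

The teardown path and the user-requested unforward path share the same
function, so there is only one place that has to get the sequence right.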

Hope this helps.
-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 4/9] VFIO: platform: handler tests whether the IRQ is forwarded
  2014-09-11 17:05         ` Christoffer Dall
@ 2014-09-11 18:07           ` Alex Williamson
  -1 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-11 18:07 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Eric Auger, eric.auger, marc.zyngier, linux-arm-kernel, kvmarm,
	kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, 2014-09-11 at 19:05 +0200, Christoffer Dall wrote:
> On Thu, Sep 11, 2014 at 10:44:02AM +0200, Eric Auger wrote:
> > On 09/11/2014 05:10 AM, Christoffer Dall wrote:
> > > On Mon, Sep 01, 2014 at 02:52:43PM +0200, Eric Auger wrote:
> > >> In case the IRQ is forwarded, the VFIO platform IRQ handler does not
> > >> need to disable the IRQ anymore. In that mode, when the handler completes
> > > 
> > > add a comma after completes
> > Hi Christoffer,
> > ok
> > > 
> > >> the IRQ is not deactivated but only its priority is lowered.
> > >>
> > >> Some other actor (typically a guest) is supposed to deactivate the IRQ,
> > >> allowing at that time a new physical IRQ to hit.
> > >>
> > >> In virtualization use case, the physical IRQ is automatically completed
> > >> by the interrupt controller when the guest completes the corresponding
> > >> virtual IRQ.
> > >>
> > >> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> > >> ---
> > >>  drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
> > >>  1 file changed, 6 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
> > >> index 6768508..1f851b2 100644
> > >> --- a/drivers/vfio/platform/vfio_platform_irq.c
> > >> +++ b/drivers/vfio/platform/vfio_platform_irq.c
> > >> @@ -88,13 +88,18 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
> > >>  	struct vfio_platform_irq *irq_ctx = dev_id;
> > >>  	unsigned long flags;
> > >>  	int ret = IRQ_NONE;
> > >> +	struct irq_data *d;
> > >> +	bool is_forwarded;
> > >>  
> > >>  	spin_lock_irqsave(&irq_ctx->lock, flags);
> > >>  
> > >>  	if (!irq_ctx->masked) {
> > >>  		ret = IRQ_HANDLED;
> > >> +		d = irq_get_irq_data(irq_ctx->hwirq);
> > >> +		is_forwarded = irqd_irq_forwarded(d);
> > >>  
> > >> -		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED) {
> > >> +		if (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED &&
> > >> +						!is_forwarded) {
> > >>  			disable_irq_nosync(irq_ctx->hwirq);
> > >>  			irq_ctx->masked = true;
> > >>  		}
> > >> -- 
> > >> 1.9.1
> > >>
> > > It makes sense that these needs to be all controlled in the kernel, but
> > > I'm wondering if it would be cleaner / more correct to clear the
> > > AUTOMASKED flag when the IRQ is forwarded and have vfio refuse setting
> > > this flag as long as the irq is forwarded?
> > 
> > If I am not wrong, even if the user sets AUTOMASKED, this info never is
> > exploited by the vfio platform driver. AUTOMASKED only is set internally
> > to the driver, on init, for level sensitive IRQs.
> > 
> > It seems to be the same on PCI (for INTx). I do not see anywhere the
> > user flag curectly copied into a local storage. But I prefer to be
> > careful ;-)
> > 
> > If confirmed, although the flag value is exposed in the user API, the
> > user set value never is exploited so this removes the need to check.
> > 
> > the forwarded IRQ modality being fully dynamic currently, then I would
> > need to update the irq_ctx->flags on each vfio_irq_handler call. I don't
> > know if its better?
> > 
> I'm not an expert on vfio, so I'll leave that to Alex Williamson to
> answer, but I'm just worried that we need to special-case the forwarded
> IRQ here, and if that may get lost elsewhere in the vfio code.  If the
> AUTOMASKED flag covers specifically this behavior, then why don't we
> simply clear/set that flag when forwarding/unforwarding the specific
> IRQ?

The way that VFIO_IRQ_INFO_AUTOMASKED is being used here is unique to
the platform device vfio backend.  In the rest of VFIO,
VFIO_IRQ_INFO_AUTOMASKED is simply a flag bit exposed via
VFIO_DEVICE_GET_IRQ_INFO.  The flags field of struct vfio_irq_info is
output-only.  vfio-pci knows by the IRQ index whether it is edge or
level.  I do agree though that changing the flag bit, or better yet a
bool, rather than adding extra tests that need to be handled at each
usage, seems less error prone.

Things could get confusing for userspace though if suddenly
VFIO_DEVICE_GET_IRQ_INFO starts calling the index edge triggered once
forwarding mode is enabled.  Thanks,

Alex
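
One way to read the "better yet a bool" suggestion is sketched below as
a user-space model.  This is only an assumption about how it could
look: the struct, the MODEL_IRQ_INFO_AUTOMASKED value and the model_*
functions are hypothetical.  The handler consults an internal bool that
is updated when forwarding changes, while the flags reported to
userspace stay untouched, which would avoid the confusion described
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model-only flag bit; the value is arbitrary for this sketch. */
#define MODEL_IRQ_INFO_AUTOMASKED (1u << 2)

struct irq_ctx_model {
	uint32_t info_flags; /* what an IRQ info query would report; never changed below */
	bool automasked;     /* internal: should the handler mask on delivery? */
	bool masked;
};

static void model_irq_init(struct irq_ctx_model *c, bool level_sensitive)
{
	c->info_flags = level_sensitive ? MODEL_IRQ_INFO_AUTOMASKED : 0;
	c->automasked = level_sensitive;
	c->masked = false;
}

/* Called on forward/unforward; only the internal bool is touched. */
static void model_set_forwarded(struct irq_ctx_model *c, bool forwarded)
{
	c->automasked = !forwarded &&
			(c->info_flags & MODEL_IRQ_INFO_AUTOMASKED) != 0;
}

/* Handler body: returns true when the IRQ was handled. */
static bool model_irq_handler(struct irq_ctx_model *c)
{
	if (c->masked)
		return false;
	if (c->automasked)
		c->masked = true; /* stands in for disable_irq_nosync() */
	return true;
}
```

The handler keeps a single test, and no other code path needs to know
whether the IRQ is forwarded.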


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11 17:10         ` Christoffer Dall
@ 2014-09-11 18:14           ` Alex Williamson
  -1 siblings, 0 replies; 101+ messages in thread
From: Alex Williamson @ 2014-09-11 18:14 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Eric Auger, eric.auger, marc.zyngier, linux-arm-kernel, kvmarm,
	kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, 2014-09-11 at 19:10 +0200, Christoffer Dall wrote:
> On Wed, Sep 10, 2014 at 11:05:49PM -0600, Alex Williamson wrote:
> > On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> > > On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> 
> [...]
> 
> > > >  
> > > > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > > > +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> > > 
> > > what's the 'p' in pfwd?
> > 
> > p is for pointer?
> > 
> 
> shouldn't the type declaration spell out quite clearly to me that I'm
> dealing with a pointer?

Sure.  In the cases where I've done similar things it's more a matter of
not needing to come up with another variable, for instance if I need
both a struct and a struct* I might call them foo and pfoo if I can't
come up with anything more meaningful.


> [...]
> 
> > > 
> > > need some spacing here, also, I would turn this around, first check if
> > > the strcmp fails, and then error out, then do your next check etc., to
> > > avoid so many nested statements.
> > > 
> > > > +	/* is a ref to this device already owned by the KVM-VFIO device? */
> > > 
> > > this comment is not particularly helpful in its current form, it would
> > > be helpful if you specified that we're checking whether that particular
> > > device/irq combo is already registered.
> > > 
> > > > +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> > > > +	if (*kvm_vdev) {
> > > > +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> > > > +			kvm_err("%s irq %d already forwarded\n",
> > > > +				__func__, *hwirq);
> > 
> > Why didn't we do this first?
> > 
> huh?

The code is doing:

1. can the arch forward this irq
2. are we already forwarding this irq

It's backwards, test for duplicates locally before calling out into arch
code.  Besides, I think the arch code here should go away and just be
another error condition for the call-out.  Thanks,

Alex
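
The ordering argued for above can be sketched in user space as follows.
All names are hypothetical, and the -EINVAL returned by the modelled
arch call-out is an arbitrary choice for the sketch; the point is only
that the local duplicate test runs before any arch code is called.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of what the KVM-VFIO device already tracks locally. */
struct fwd_list_model {
	const int *indexes; /* IRQ indexes already forwarded for this device */
	size_t count;
	int arch_calls;     /* how many times the modelled arch call-out ran */
};

static bool model_find_irq(const struct fwd_list_model *l, int index)
{
	size_t i;

	for (i = 0; i < l->count; i++)
		if (l->indexes[i] == index)
			return true;
	return false;
}

/* Modelled arch call-out; 'arch_ok' stands in for whatever the arch decides. */
static int model_arch_forward(struct fwd_list_model *l, bool arch_ok)
{
	l->arch_calls++;
	return arch_ok ? 0 : -EINVAL;
}

/* Cheap local duplicate check first, arch call-out second. */
static int model_validate_forward(struct fwd_list_model *l, int index,
				  bool arch_ok)
{
	if (model_find_irq(l, index))
		return -EEXIST;
	return model_arch_forward(l, arch_ok);
}
```

A duplicate request is rejected without the arch code ever being
entered, and an arch refusal is simply one more error return.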


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 1/9] KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded IRQ
  2014-09-11  3:09     ` Christoffer Dall
@ 2014-09-11 18:17       ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-09-11 18:17 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On 09/11/2014 05:09 AM, Christoffer Dall wrote:
> On Mon, Sep 01, 2014 at 02:52:40PM +0200, Eric Auger wrote:
>> Fix multiple injection of level sensitive forwarded IRQs.
>> With current code, the second injection fails since the state bitmaps
>> are not reset (process_maintenance is not called anymore).
>> New implementation consists in fully bypassing the vgic state
>> management for forwarded IRQ (checks are ignored in
>> vgic_update_irq_pending). This obviously assumes the forwarded IRQ is
>> injected from kernel side.
>>
>> Signed-off-by: Eric Auger <eric.auger@linaro.org>
>>
>> ---
>>
>> It was attempted to reset the states in __kvm_vgic_sync_hwstate, checking
>> the emptied LR of forwarded IRQ. However surprisingly this solution does
>> not seem to work. Some times, a new forwarded IRQ injection is observed
>> while the LR of the previous instance was not observed as empty.
> 
> hmmm, concerning.  It would probably have been helpful overall if you
> could start by describing the problem with the current implementation in
> the commit message, and then explain the fix...
> 
>>
>> v1 -> v2:
>> - fix vgic state bypass in vgic_queue_hwirq
>> ---
>>  virt/kvm/arm/vgic.c | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index 0007300..8ef495b 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1259,7 +1259,9 @@ static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int irq)
>>  
>>  static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq)
>>  {
>> -	if (vgic_irq_is_queued(vcpu, irq))
>> +	bool is_forwarded =  (vgic_get_phys_irq(vcpu, irq) > 0);
> 
> can you create a static function to factor this vgic_get_phys_irq check out, please?
yes sure
> 
>> +
>> +	if (vgic_irq_is_queued(vcpu, irq) && !is_forwarded)
>>  		return true; /* level interrupt, already queued */
> 
> so essentially if an IRQ is already on a LR so we shouldn't resample the
> line, then we still resample the line if the IRQ is forwarded?
> 
> I think you need to explain this, both to me here, and also in the code
> by moving the comment following the return statement above the check and
> comment this clearly.
Well, I admit it may look a bit pushy! When we discussed this issue with
Marc, the outcome was that the vgic states were not accurate with
forwarded IRQs and that the VGIC state may be fully bypassed. Since the
first injection still sets the state - and I did not want to modify this -
the 2nd one would fail due to that check and to vgic_validate_injection.
It may be cleaner not to update the states when injecting the fwd irq
either.
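
For the helper requested above, a minimal sketch could look like this.
It is a user-space model under assumptions: the table and the model_*
names are hypothetical, and only the "> 0 means forwarded" test is
taken from the patch.

```c
#include <stdbool.h>

#define MODEL_NR_IRQS 8

/* Hypothetical virt->phys table; entries <= 0 mean "no physical IRQ mapped". */
static int model_phys_irq[MODEL_NR_IRQS];

/* Stand-in for the lookup the patch open-codes at each call site. */
static int model_get_phys_irq(int virt_irq)
{
	if (virt_irq < 0 || virt_irq >= MODEL_NR_IRQS)
		return -1;
	return model_phys_irq[virt_irq];
}

/* The factored-out predicate: one place that defines "forwarded". */
static bool model_irq_is_forwarded(int virt_irq)
{
	return model_get_phys_irq(virt_irq) > 0;
}
```

Both call sites in the patch would then use the predicate instead of
repeating the comparison.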

> 
>>  
>>  	if (vgic_queue_irq(vcpu, 0, irq)) {
>> @@ -1517,14 +1519,18 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>>  	int edge_triggered, level_triggered;
>>  	int enabled;
>>  	bool ret = true;
>> +	bool is_forwarded;
>>  
>>  	spin_lock(&dist->lock);
>>  
>>  	vcpu = kvm_get_vcpu(kvm, cpuid);
>> +	is_forwarded = (vgic_get_phys_irq(vcpu, irq_num) > 0);
> 
> use your new function here as well.
ok
> 
>> +
>>  	edge_triggered = vgic_irq_is_edge(vcpu, irq_num);
>>  	level_triggered = !edge_triggered;
>>  
>> -	if (!vgic_validate_injection(vcpu, irq_num, level)) {
>> +	if (!is_forwarded &&
>> +		!vgic_validate_injection(vcpu, irq_num, level)) {
> 
> I don't see the rationale here either.  If an IRQ is forwarded, why do
> you need to do anything if the condition of the line hasn't changed for
> a level-triggered IRQ or if you have a falling edge on an edge-triggered
> IRQ (assuming active-HIGH)?
To me this cannot even happen. A second fwd irq can only hit if
the same virtual IRQ was completed, which in turn completed the
corresponding phys IRQ. Still, the problem is that on the 1st injection we
updated the VGIC state. I acknowledge this is a hack to work around the
fact that the 1st injection updates the state and nothing resets it. So on
subsequent injections - and even on the 1st one - I never check the state.
> 
>>  		ret = false;
>>  		goto out;
>>  	}
>> @@ -1557,7 +1563,8 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>>  		goto out;
>>  	}
>>  
>> -	if (level_triggered && vgic_irq_is_queued(vcpu, irq_num)) {
>> +	if (!is_forwarded &&
>> +		level_triggered && vgic_irq_is_queued(vcpu, irq_num)) {
> 
> So here it's making sense for SPIs since you can have an EOIed interrupt
> on a CPU that didn't exit the VM yet, and thus it's still queued, but
> you still need to resample the line to respect other CPUs.  Only, we
> ever only target a single CPU for SPIs IIRC (the first in the target
> list register) so we have to wait for that CPU to exit the VM anyhow.
> 
> This leads me to believe that, given a forwarded irq, you can only have
> three situations at this point:
> 
> (1) is_queued && target_vcpu_in_vm:
> The vcpu should resample this line when it exits the VM, because we
> check the LRs for IRQs like this one, so we don't have to do anything
> and go to out here.
> 
> (2) is_queued && !target_vcpu_in_vm:
> You have a bug because you exited the VM which must have done an EOI on
> the interrupt, otherwise this function shouldn't have been called!  This
> means that we should have cleared the queued state of the interrupt.
> 
> (3) !is_queued && whatever:
> Set the irq pending bits, so do not goto out.
> 
> I'm aware that there's theoretically a race between (1) and (2), but you
> should consider target_vcpu_in_vm as "it hasn't been through
> __kvm_vgic_sync_hwstate() yet" and this should hold.
I will prepare something accurate for next week.
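
The three situations listed in the quoted text can be written down as a
small decision function.  This is only a sketch of that reasoning in
user space: the enum and function names are hypothetical, and
target_vcpu_in_vm is taken to mean "has not been through
__kvm_vgic_sync_hwstate() yet", as stated above.

```c
#include <stdbool.h>

enum model_inject_action {
	MODEL_SKIP,        /* case (1): the vcpu resamples the line on VM exit */
	MODEL_BUG,         /* case (2): queued state should already be cleared */
	MODEL_SET_PENDING, /* case (3): set the irq pending bits */
};

/* Classify an injection of a forwarded IRQ into one of the three cases. */
static enum model_inject_action
model_classify_injection(bool is_queued, bool target_vcpu_in_vm)
{
	if (!is_queued)
		return MODEL_SET_PENDING;
	return target_vcpu_in_vm ? MODEL_SKIP : MODEL_BUG;
}
```

Case (3) does not depend on where the target vcpu is, which is why the
queued test comes first.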

Thanks

Best Regards

Eric
> 
> Tell me where this breaks?
> 
> -Christoffer
> 


^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
  2014-09-11 18:14           ` Alex Williamson
@ 2014-09-11 21:59             ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 21:59 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Eric Auger, eric.auger, marc.zyngier, linux-arm-kernel, kvmarm,
	kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, Sep 11, 2014 at 12:14:10PM -0600, Alex Williamson wrote:
> On Thu, 2014-09-11 at 19:10 +0200, Christoffer Dall wrote:
> > On Wed, Sep 10, 2014 at 11:05:49PM -0600, Alex Williamson wrote:
> > > On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> > > > On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> > 
> > [...]
> > 
> > > > >  
> > > > > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > > > > +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> > > > 
> > > > what's the 'p' in pfwd?
> > > 
> > > p is for pointer?
> > > 
> > 
> > shouldn't the type declaration spell out quite clearly to me that I'm
> > dealing with a pointer?
> 
> Sure.  In the cases where I've done similar things it's more a matter of
> not needing to come up with another variable, for instance if I need
> both a struct and a struct* I might call them foo and pfoo if I can't
> come up with anything more meaningful.
> 
> 
> > [...]
> > 
> > > > 
> > > > need some spacing here, also, I would turn this around, first check if
> > > > the strcmp fails, and then error out, then do your next check etc., to
> > > > avoid so many nested statements.
> > > > 
> > > > > +	/* is a ref to this device already owned by the KVM-VFIO device? */
> > > > 
> > > > this comment is not particularly helpful in its current form, it would
> > > > be helpful if you specified that we're checking whether that particular
> > > > device/irq combo is already registered.
> > > > 
> > > > > +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> > > > > +	if (*kvm_vdev) {
> > > > > +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> > > > > +			kvm_err("%s irq %d already forwarded\n",
> > > > > +				__func__, *hwirq);
> > > 
> > > Why didn't we do this first?
> > > 
> > huh?
> 
> The code is doing:
> 
> 1. can the arch forward this irq
> 2. are we already forwarding this irq
> 
> It's backwards, test for duplicates locally before calling out into arch
> code.  Besides, I think the arch code here should go away and just be
> another error condition for the call-out.  Thanks,
> 
Ah, right, you meant for the whole check.  I agree completely.

-Christoffer

^ permalink raw reply	[flat|nested] 101+ messages in thread

* [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control
@ 2014-09-11 21:59             ` Christoffer Dall
  0 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 21:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 11, 2014 at 12:14:10PM -0600, Alex Williamson wrote:
> On Thu, 2014-09-11 at 19:10 +0200, Christoffer Dall wrote:
> > On Wed, Sep 10, 2014 at 11:05:49PM -0600, Alex Williamson wrote:
> > > On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> > > > On Mon, Sep 01, 2014 at 02:52:47PM +0200, Eric Auger wrote:
> > 
> > [...]
> > 
> > > > >  
> > > > > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > > > > +int kvm_arch_set_fwd_state(struct kvm_fwd_irq *pfwd,
> > > > 
> > > > what's the 'p' in pfwd?
> > > 
> > > p is for pointer?
> > > 
> > 
> > shouldn't the type declation spell out quite clearly to me that I'm
> > dealing with a pointer?
> 
> Sure.  In the cases where I've done similar things it's more a matter of
> not needing to come up with another variable, for instance if I need
> both a struct and a struct* I might call them foo and pfoo if I can't
> come up with anything more meaningful.
> 
> 
> > [...]
> > 
> > > > 
> > > > need some spaceing here, also, I would turn this around, first check if
> > > > the strcmp fails, and then error out, then do you next check etc., to
> > > > avoid so many nested statements.
> > > > 
> > > > > +	/* is a ref to this device already owned by the KVM-VFIO device? */
> > > > 
> > > > this comment is not particularly helpful in its current form, it would
> > > > be helpful if you specified that we're checking whether that particular
> > > > device/irq combo is already registered.
> > > > 
> > > > > +	*kvm_vdev = kvm_vfio_find_device(kv, vdev);
> > > > > +	if (*kvm_vdev) {
> > > > > +		if (kvm_vfio_find_irq(*kvm_vdev, fwd_irq->index)) {
> > > > > +			kvm_err("%s irq %d already forwarded\n",
> > > > > +				__func__, *hwirq);
> > > 
> > > Why didn't we do this first?
> > > 
> > huh?
> 
> The code is doing:
> 
> 1. can the arch forward this irq
> 2. are we already forwarding this irq
> 
> It's backwards, test for duplicates locally before calling out into arch
> code.  Besides, I think the arch code here should go away and just be
> another error condition for the call-out.  Thanks,
> 
Ah, right, you meant for the whole check.  I agree completely.

-Christoffer
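
For readers following the thread, the ordering agreed on above can be
sketched in C roughly as follows. This is only an illustration under
stated assumptions: struct fwd_table, find_forwarded(), forward_irq(),
the arch_fwd_fn hook type and the errno values are hypothetical
stand-ins, not the code or error codes of the patch under review. The
point it shows is that the duplicate test runs locally first, and a
refusal from the arch call-out is handled as one more error return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FWD 8

/* Hypothetical stand-in for the KVM-VFIO device's forwarded-IRQ list. */
struct fwd_entry { int dev_id; int index; };
struct fwd_table { struct fwd_entry e[MAX_FWD]; size_t n; };

/* Hypothetical arch hook: returns 0 on success or a negative errno. */
typedef int (*arch_fwd_fn)(int dev_id, int index);

/* Is this device/irq combination already registered as forwarded? */
static bool find_forwarded(const struct fwd_table *t, int dev_id, int index)
{
	for (size_t i = 0; i < t->n; i++)
		if (t->e[i].dev_id == dev_id && t->e[i].index == index)
			return true;
	return false;
}

static int forward_irq(struct fwd_table *t, int dev_id, int index,
		       arch_fwd_fn arch)
{
	int ret;

	assert(t && arch);

	/* 1. Cheap local test first: reject duplicates before any call-out. */
	if (find_forwarded(t, dev_id, index))
		return -EEXIST;
	if (t->n == MAX_FWD)
		return -ENOMEM;

	/* 2. Only then call out; "cannot forward" is just another error. */
	ret = arch(dev_id, index);
	if (ret)
		return ret;

	/* 3. Record the new forwarded IRQ only after the arch code accepted it. */
	t->e[t->n].dev_id = dev_id;
	t->e[t->n].index = index;
	t->n++;
	return 0;
}

/* Example arch hooks used to exercise forward_irq(). */
static int arch_ok(int dev_id, int index)
{
	(void)dev_id; (void)index;
	return 0;
}

static int arch_reject(int dev_id, int index)
{
	(void)dev_id; (void)index;
	return -ENXIO;
}
```

With this shape, a second request for the same device/index pair fails
locally without the arch hook ever being invoked, which is the behaviour
Alex asks for.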

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [RFC v2 1/9] KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded IRQ
  2014-09-11 18:17       ` Eric Auger
@ 2014-09-11 22:14         ` Christoffer Dall
  -1 siblings, 0 replies; 101+ messages in thread
From: Christoffer Dall @ 2014-09-11 22:14 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	alex.williamson, joel.schopp, kim.phillips, paulus, gleb,
	pbonzini, linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli

On Thu, Sep 11, 2014 at 08:17:49PM +0200, Eric Auger wrote:
> On 09/11/2014 05:09 AM, Christoffer Dall wrote:
> > On Mon, Sep 01, 2014 at 02:52:40PM +0200, Eric Auger wrote:
> >> Fix multiple injection of level sensitive forwarded IRQs.
> >> With current code, the second injection fails since the state bitmaps
> >> are not reset (process_maintenance is not called anymore).
> >> New implementation consists in fully bypassing the vgic state
> >> management for forwarded IRQ (checks are ignored in
> >> vgic_update_irq_pending). This obviously assumes the forwarded IRQ is
> >> injected from kernel side.
> >>
> >> Signed-off-by: Eric Auger <eric.auger@linaro.org>
> >>
> >> ---
> >>
> >> It was attempted to reset the states in __kvm_vgic_sync_hwstate, checking
> >> the emptied LR of forwarded IRQ. However surprisingly this solution does
> >> not seem to work. Sometimes, a new forwarded IRQ injection is observed
> >> while the LR of the previous instance was not observed as empty.
> > 
> > hmmm, concerning.  It would probably have been helpful overall if you
> > could start by describing the problem with the current implementation in
> > the commit message, and then explain the fix...
> > 
> >>
> >> v1 -> v2:
> >> - fix vgic state bypass in vgic_queue_hwirq
> >> ---
> >>  virt/kvm/arm/vgic.c | 13 ++++++++++---
> >>  1 file changed, 10 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> >> index 0007300..8ef495b 100644
> >> --- a/virt/kvm/arm/vgic.c
> >> +++ b/virt/kvm/arm/vgic.c
> >> @@ -1259,7 +1259,9 @@ static bool vgic_queue_sgi(struct kvm_vcpu *vcpu, int irq)
> >>  
> >>  static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq)
> >>  {
> >> -	if (vgic_irq_is_queued(vcpu, irq))
> >> +	bool is_forwarded =  (vgic_get_phys_irq(vcpu, irq) > 0);
> > 
> > can you create a static function to factor this vgic_get_phys_irq check out, please?
> yes sure
> > 
> >> +
> >> +	if (vgic_irq_is_queued(vcpu, irq) && !is_forwarded)
> >>  		return true; /* level interrupt, already queued */
> > 
> > so essentially if an IRQ is already on a LR so we shouldn't resample the
> > line, then we still resample the line if the IRQ is forwarded?
> > 
> > I think you need to explain this, both to me here, and also in the code
> > by moving the comment following the return statement above the check and
> > comment this clearly.
> Well, I admit it may look a bit pushy! When we discussed this issue with
> Marc, the outcome was that the vgic states were not accurate with
> forwarded IRQs and VGIC state may be fully bypassed.

Can you explain this in more detail?  Perhaps with a concrete example?

> Since the first
> injection still sets the state - and I did not want to modify this - the
> 2nd one would fail due to that check, and the validate_injection. It may be
> cleaner to not update the states when injecting the fwd irq too.

Hmmm, I don't think I understand you here.  I think you need to think
about the whole flow of things here and understand any possible sequence
of events combined with any potential state you may have.  Perhaps this
is better deferred to a face-to-face discussion.

> 
> > 
> >>  
> >>  	if (vgic_queue_irq(vcpu, 0, irq)) {
> >> @@ -1517,14 +1519,18 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
> >>  	int edge_triggered, level_triggered;
> >>  	int enabled;
> >>  	bool ret = true;
> >> +	bool is_forwarded;
> >>  
> >>  	spin_lock(&dist->lock);
> >>  
> >>  	vcpu = kvm_get_vcpu(kvm, cpuid);
> >> +	is_forwarded = (vgic_get_phys_irq(vcpu, irq_num) > 0);
> > 
> > use your new function here as well.
> ok
> > 
> >> +
> >>  	edge_triggered = vgic_irq_is_edge(vcpu, irq_num);
> >>  	level_triggered = !edge_triggered;
> >>  
> >> -	if (!vgic_validate_injection(vcpu, irq_num, level)) {
> >> +	if (!is_forwarded &&
> >> +		!vgic_validate_injection(vcpu, irq_num, level)) {
> > 
> > I don't see the rationale here either.  If an IRQ is forwarded, why do
> > you need to do anything if the condition of the line hasn't changed for
> > a level-triggered IRQ or if you have a falling edge on an edge-triggered
> > IRQ (assuming active-HIGH)?
> To me this cannot even happen. A second fwd irq can only hit if
> the same virtual IRQ was completed and that completed the corresponding phys
> IRQ. Still the problem is that on the 1st injection we updated the VGIC
> state.

Updated the VGIC state?  Be more specific!

> I acknowledge this is a hack to work around the fact that the 1st injection
> updates the state and nothing resets it. So on subsequent injections -
> and even on the 1st one - I never check the state.

Is the case here that you propagate the line state onto the vcpu pending
state when somebody calls this inject function, so you use this as a
chance to resample the line?

If so, we need to document this clearly (and you need to convince me
that this is in fact the right thing we're doing overall), and we may
have to reword and refactor some of this to not have these weird-looking
corner cases with outrageously complicated comments describing state in
what's basically becoming a non-trivial state machine.

> > 
> >>  		ret = false;
> >>  		goto out;
> >>  	}
> >> @@ -1557,7 +1563,8 @@ static bool vgic_update_irq_pending(struct kvm *kvm, int cpuid,
> >>  		goto out;
> >>  	}
> >>  
> >> -	if (level_triggered && vgic_irq_is_queued(vcpu, irq_num)) {
> >> +	if (!is_forwarded &&
> >> +		level_triggered && vgic_irq_is_queued(vcpu, irq_num)) {
> > 
> > So here it makes sense for SPIs since you can have an EOIed interrupt
> > on a CPU that didn't exit the VM yet, and thus it's still queued, but
> > you still need to resample the line to respect other CPUs.  Only, we
> > only ever target a single CPU for SPIs IIRC (the first in the target
> > list register) so we have to wait for that CPU to exit the VM anyhow.
> > 
> > This leads me to believe that, given a forwarded irq, you can only have
> > three situations at this point:
> > 
> > (1) is_queued && target_vcpu_in_vm:
> > The vcpu should resample this line when it exits the VM, because we
> > check the LRs for IRQs like this one, so we don't have to do anything
> > and go to out here.
> > 
> > (2) is_queued && !target_vcpu_in_vm:
> > You have a bug because you exited the VM which must have done an EOI on
> > the interrupt, otherwise this function shouldn't have been called!  This
> > means that we should have cleared the queued state of the interrupt.
> > 
> > (3) !is_queued && whatever:
> > Set the irq pending bits, so do not goto out.
> > 
> > I'm aware that there's theoretically a race between (1) and (2), but you
> > should consider target_vcpu_in_vm as "it hasn't been through
> > __kvm_vgic_sync_hwstate() yet" and this should hold.
> I will prepare something accurate for next week.
> 
Yeah, I think we need to talk this through at LCU14.

-Christoffer
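
The three situations listed above for a forwarded IRQ can be written
down as a small decision function. The sketch below is illustrative
only and rests on assumptions: enum fwd_action,
classify_forwarded_injection() and irq_is_forwarded() are hypothetical
names, not functions from virt/kvm/arm/vgic.c. irq_is_forwarded() shows
one possible shape for the static helper requested in the review and
takes the value that vgic_get_phys_irq() would return; target_vcpu_in_vm
is read as "has not been through __kvm_vgic_sync_hwstate() yet".

```c
#include <assert.h>
#include <stdbool.h>

enum fwd_action {
	FWD_SKIP,        /* case 1: target vcpu resamples the line on VM exit */
	FWD_BUG,         /* case 2: queued state should already be cleared */
	FWD_SET_PENDING, /* case 3: set the irq pending bits */
};

/*
 * Hypothetical helper factoring out the check repeated in the patch.
 * phys_irq stands in for the return value of vgic_get_phys_irq().
 */
static bool irq_is_forwarded(int phys_irq)
{
	return phys_irq > 0;
}

/* Map the (is_queued, target_vcpu_in_vm) pair onto the three cases. */
static enum fwd_action classify_forwarded_injection(bool is_queued,
						    bool target_vcpu_in_vm)
{
	if (!is_queued)
		return FWD_SET_PENDING;	/* (3) !is_queued && whatever */
	if (target_vcpu_in_vm)
		return FWD_SKIP;	/* (1) is_queued && in VM */
	return FWD_BUG;			/* (2) is_queued && !in VM */
}
```

Writing the cases as an explicit enumeration makes the "should not
happen" state (2) visible to the caller instead of hiding it behind a
bypassed check.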

^ permalink raw reply	[flat|nested] 101+ messages in thread

* RE: [RFC v2 0/9] KVM-VFIO IRQ forward control
  2014-09-11  5:09       ` Alex Williamson
  (?)
@ 2014-11-17 11:25         ` Wu, Feng
  -1 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2014-11-17 11:25 UTC (permalink / raw)
  To: Alex Williamson, Christoffer Dall
  Cc: Eric Auger, eric.auger, marc.zyngier, linux-arm-kernel, kvmarm,
	kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli, Wu, Feng




> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Alex Williamson
> Sent: Thursday, September 11, 2014 1:10 PM
> To: Christoffer Dall
> Cc: Eric Auger; eric.auger@st.com; marc.zyngier@arm.com;
> linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> kvm@vger.kernel.org; joel.schopp@amd.com; kim.phillips@freescale.com;
> paulus@samba.org; gleb@kernel.org; pbonzini@redhat.com;
> linux-kernel@vger.kernel.org; patches@linaro.org; will.deacon@arm.com;
> a.motakis@virtualopensystems.com; a.rigo@virtualopensystems.com;
> john.liuli@huawei.com
> Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
> 
> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> > On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote:
> > > On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
> > > > This RFC proposes an integration of "ARM: Forwarding physical
> > > > interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
> > > > KVM.
> > > >
> > > > It enables transforming a VFIO platform driver IRQ into a forwarded
> > > > IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
> > > > switch can be avoided on guest virtual IRQ completion. Before this
> > > > patch, a maintenance IRQ was triggered on the virtual IRQ completion.
> > > >
> > > > When the IRQ is forwarded, the VFIO platform driver does not need to
> > > > disable the IRQ anymore. Indeed when returning from the IRQ handler
> > > > the IRQ is not deactivated. Only its priority is lowered. This means
> > > > the same IRQ cannot hit before the guest completes the virtual IRQ
> > > > and the GIC automatically deactivates the corresponding physical IRQ.
> > > >
> > > > Besides, the injection still is based on irqfd triggering. The only
> > > > impact on irqfd process is resamplefd is not called anymore on
> > > > virtual IRQ completion since this latter becomes "transparent".
> > > >
> > > > The current integration is based on an extension of the KVM-VFIO
> > > > device, previously used by KVM to interact with VFIO groups. The
> > > > patch series now enables KVM to directly interact with a VFIO
> > > > platform device. The VFIO external API was extended for that purpose.
> > > >
> > > > The KVM-VFIO device can get/put the vfio platform device, check its
> > > > integrity and type, get the IRQ number associated to an IRQ index.
> > > >
> > > > The IRQ forward programming is architecture specific (virtual interrupt
> > > > controller programming basically). However the whole infrastructure is
> > > > kept generic.
> > > >
> > > > From a user point of view, the functionality is provided through new
> > > > KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ
> > > > and the capability can be checked with KVM_HAS_DEVICE_ATTR.
> > > > Assignment can only be changed when the physical IRQ is not active.
> > > > It is the responsibility of the user to do this check.
> > > >
> > > > This patch series has the following dependencies:
> > > > - "ARM: Forwarding physical interrupts to a guest VM"
> > > >   (http://lwn.net/Articles/603514/)
> > > > - [PATCH v3] irqfd for ARM
> > > > - and obviously the VFIO platform driver series:
> > > >   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
> > > >   https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> > > >
> > > > Integrated pieces can be found at
> > > > ssh://git.linaro.org/people/eric.auger/linux.git
> > > > on branch 3.17rc3_irqfd_forward_integ_v2
> > > >
> > > > This was tested on Calxeda Midway, assigning the xgmac main IRQ.
> > > >
> > > > v1 -> v2:
> > > > - forward control is moved from architecture specific file into generic
> > > >   vfio.c module.
> > > >   only kvm_arch_set_fwd_state remains architecture specific
> > > > - integrate Kim's patch which enables KVM-VFIO for ARM
> > > > - fix vgic state bypass in vgic_queue_hwirq
> > > > - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
> > > >   to include/uapi/linux/kvm.h
> > > >   also irq_index renamed into index and guest_irq renamed into gsi
> > > > - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> > > > - vfio_external_get_base_device renamed into vfio_external_base_device
> > > > - vfio_external_get_type removed
> > > > - kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
> > > > - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > > >
> > > > Eric Auger (8):
> > > >   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
> > > >     IRQ
> > > >   KVM: ARM: VGIC: add forwarded irq rbtree lock
> > > >   VFIO: platform: handler tests whether the IRQ is forwarded
> > > >   KVM: KVM-VFIO: update user API to program forwarded IRQ
> > > >   VFIO: Extend external user API
> > > >   KVM: KVM-VFIO: add new VFIO external API hooks
> > > >   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding
> > > >     control
> > > >   KVM: KVM-VFIO: ARM forwarding control
> > > >
> > > > Kim Phillips (1):
> > > >   ARM: KVM: Enable the KVM-VFIO device
> > > >
> > > >  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
> > > >  arch/arm/include/asm/kvm_host.h            |   7 +
> > > >  arch/arm/kvm/Kconfig                       |   1 +
> > > >  arch/arm/kvm/Makefile                      |   4 +-
> > > >  arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
> > > >  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
> > > >  drivers/vfio/vfio.c                        |  24 ++
> > > >  include/kvm/arm_vgic.h                     |   1 +
> > > >  include/linux/kvm_host.h                   |  27 ++
> > > >  include/linux/vfio.h                       |   3 +
> > > >  include/uapi/linux/kvm.h                   |   9 +
> > > >  virt/kvm/arm/vgic.c                        |  59 +++-
> > > >  virt/kvm/vfio.c                            | 497 ++++++++++++++++++++++++++++-
> > > >  13 files changed, 733 insertions(+), 17 deletions(-)
> > > >  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
> > > >
> > >
> > > Have we ventured too far in the other direction?  I suppose what I was
> > > hoping to see was something more like:
> > >
> > > 	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> > >
> > > 		/* get vfio_device */
> > >
> > > 		/* get mutex */
> > >
> > > 		/* verify device+irq isn't already forwarded */
> > >
> > > 		/* allocate device/forwarded irq */
> > >
> > > 		/* get struct device */
> > >
> > > 		/* callout to arch code passing struct device, gsi, ... */
> > >
> > > 		/* if success, add to kv, else free and error */
> > >
> > > 		/* mutex unlock */
> > > 	}
> >
> > I think that's essentially what this patch set is trying to do, but
> > there are just too many complicated intertwining cases right now that
> > > make the code hard to read.
> >
> > >
> > > Exposing the internal mutex out to arch code, as in v1, was an
> > > indication that we were pushing too much out to arch code, but including
> > > platform_device.h into virt/kvm/vfio.c tells me we're still not
> > > abstracting at the right point.  Thanks,
> > >
> > I raised my eyebrows over the platform device bus thingy here as well,
> > but on the other hand, there's nothing ARM-specific about referring to
> > the platform device bus.
> >
> > I think perhaps it just has to be made more clear that the generic code
> > deals with translating the device resources in the necessary way, and
> > currently it only supports vfio-platform devices?
> 
> Ok, you're probably right, looking at it again it is closer than I
> thought.  At the same time, the use of platform device in
> virt/kvm/vfio.c is pointless and can easily be pushed out to the arch
> code as just another error return case.  vfio.c doesn't need to be aware
> of hwirq.  The rest of the code is just overly complicated, with three
> different cleanup functions and validation function bloat.  Thanks,
> 
> Alex


Hi Alex, could you please tell me the current status of this patch set?
As you mentioned in another thread, some pieces of this patch set (such as
kvm_vfio_device_get_external_user()) can be leveraged for VT-d Posted-Interrupts.

Thanks,
Feng


^ permalink raw reply	[flat|nested] 101+ messages in thread

* RE: [RFC v2 0/9] KVM-VFIO IRQ forward control
@ 2014-11-17 11:25         ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2014-11-17 11:25 UTC (permalink / raw)
  To: Alex Williamson, Christoffer Dall
  Cc: Eric Auger, eric.auger, marc.zyngier, linux-arm-kernel, kvmarm,
	kvm, joel.schopp, kim.phillips, paulus, gleb, pbonzini,
	linux-kernel, patches, will.deacon, a.motakis, a.rigo,
	john.liuli,



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Alex Williamson
> Sent: Thursday, September 11, 2014 1:10 PM
> To: Christoffer Dall
> Cc: Eric Auger; eric.auger@st.com; marc.zyngier@arm.com;
> linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> kvm@vger.kernel.org; joel.schopp@amd.com; kim.phillips@freescale.com;
> paulus@samba.org; gleb@kernel.org; pbonzini@redhat.com;
> linux-kernel@vger.kernel.org; patches@linaro.org; will.deacon@arm.com;
> a.motakis@virtualopensystems.com; a.rigo@virtualopensystems.com;
> john.liuli@huawei.com
> Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
> 
> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> > On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote:
> > > On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
> > > > This RFC proposes an integration of "ARM: Forwarding physical
> > > > interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
> > > > KVM.
> > > >
> > > > It enables to transform a VFIO platform driver IRQ into a forwarded
> > > > IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
> > > > switch can be avoided on guest virtual IRQ completion. Before this
> > > > patch, a maintenance IRQ was triggered on the virtual IRQ completion.
> > > >
> > > > When the IRQ is forwarded, the VFIO platform driver does not need to
> > > > disable the IRQ anymore. Indeed when returning from the IRQ handler
> > > > the IRQ is not deactivated. Only its priority is lowered. This means
> > > > the same IRQ cannot hit before the guest completes the virtual IRQ
> > > > and the GIC automatically deactivates the corresponding physical IRQ.
> > > >
> > > > Besides, the injection still is based on irqfd triggering. The only
> > > > impact on irqfd process is resamplefd is not called anymore on
> > > > virtual IRQ completion since this latter becomes "transparent".
> > > >
> > > > The current integration is based on an extension of the KVM-VFIO
> > > > device, previously used by KVM to interact with VFIO groups. The
> > > > patch serie now enables KVM to directly interact with a VFIO
> > > > platform device. The VFIO external API was extended for that purpose.
> > > >
> > > > Th KVM-VFIO device can get/put the vfio platform device, check its
> > > > integrity and type, get the IRQ number associated to an IRQ index.
> > > >
> > > > The IRQ forward programming is architecture specific (virtual interrupt
> > > > controller programming basically). However the whole infrastructure is
> > > > kept generic.
> > > >
> > > > from a user point of view, the functionality is provided through new
> > > > KVM-VFIO device commands,
> KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ
> > > > and the capability can be checked with KVM_HAS_DEVICE_ATTR.
> > > > Assignment can only be changed when the physical IRQ is not active.
> > > > It is the responsability of the user to do this check.
> > > >
> > > > This patch serie has the following dependencies:
> > > > - "ARM: Forwarding physical interrupts to a guest VM"
> > > >   (http://lwn.net/Articles/603514/) in
> > > > - [PATCH v3] irqfd for ARM
> > > > - and obviously the VFIO platform driver serie:
> > > >   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
> > > >   https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> > > >
> > > > Integrated pieces can be found at
> > > > ssh://git.linaro.org/people/eric.auger/linux.git
> > > > on branch 3.17rc3_irqfd_forward_integ_v2
> > > >
> > > > This was was tested on Calxeda Midway, assigning the xgmac main IRQ.
> > > >
> > > > v1 -> v2:
> > > > - forward control is moved from architecture specific file into generic
> > > >   vfio.c module.
> > > >   only kvm_arch_set_fwd_state remains architecture specific
> > > > - integrate Kim's patch which enables KVM-VFIO for ARM
> > > > - fix vgic state bypass in vgic_queue_hwirq
> > > > - struct kvm_arch_forwarded_irq moved from
> arch/arm/include/uapi/asm/kvm.h
> > > >   to include/uapi/linux/kvm.h
> > > >   also irq_index renamed into index and guest_irq renamed into gsi
> > > > - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> > > > - vfio_external_get_base_device renamed into vfio_external_base_device
> > > > - vfio_external_get_type removed
> > > > - kvm_vfio_external_get_base_device renamed into
> kvm_vfio_external_base_device
> > > > - __KVM_HAVE_ARCH_KVM_VFIO renamed into
> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> > > >
> > > > Eric Auger (8):
> > > >   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
> > > >     IRQ
> > > >   KVM: ARM: VGIC: add forwarded irq rbtree lock
> > > >   VFIO: platform: handler tests whether the IRQ is forwarded
> > > >   KVM: KVM-VFIO: update user API to program forwarded IRQ
> > > >   VFIO: Extend external user API
> > > >   KVM: KVM-VFIO: add new VFIO external API hooks
> > > >   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ
> forwarding
> > > >     control
> > > >   KVM: KVM-VFIO: ARM forwarding control
> > > >
> > > > Kim Phillips (1):
> > > >   ARM: KVM: Enable the KVM-VFIO device
> > > >
> > > >  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
> > > >  arch/arm/include/asm/kvm_host.h            |   7 +
> > > >  arch/arm/kvm/Kconfig                       |   1 +
> > > >  arch/arm/kvm/Makefile                      |   4 +-
> > > >  arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
> > > >  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
> > > >  drivers/vfio/vfio.c                        |  24 ++
> > > >  include/kvm/arm_vgic.h                     |   1 +
> > > >  include/linux/kvm_host.h                   |  27 ++
> > > >  include/linux/vfio.h                       |   3 +
> > > >  include/uapi/linux/kvm.h                   |   9 +
> > > >  virt/kvm/arm/vgic.c                        |  59 +++-
> > > >  virt/kvm/vfio.c                            | 497
> ++++++++++++++++++++++++++++-
> > > >  13 files changed, 733 insertions(+), 17 deletions(-)
> > > >  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
> > > >
> > >
> > > Have we ventured too far in the other direction?  I suppose what I was
> > > hoping to see was something more like:
> > >
> > > 	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> > >
> > > 		/* get vfio_device */
> > >
> > > 		/* get mutex */
> > >
> > > 		/* verify device+irq isn't already forwarded */
> > >
> > > 		/* allocate device/forwarded irq */
> > >
> > > 		/* get struct device */
> > >
> > > 		/* callout to arch code passing struct device, gsi, ... */
> > >
> > > 		/* if success, add to kv, else free and error */
> > >
> > > 		/* mutex unlock */
> > > 	}
> >
> > I think that's essentially what this patch set is trying to do, but
> > there are just too many complicated intertwining cases right now that
> > makes the code hard to read.
> >
> > >
> > > Exposing the internal mutex out to arch code, as in v1, was an
> > > indication that we were pushing too much out to arch code, but including
> > > platform_device.h into virt/kvm/vfio.c tells me we're still not
> > > abstracting at the right point.  Thanks,
> > >
> > I raised my eyebrows over the platform device bus thingy here as well,
> > but on the other hand, there's nothing ARM-specific about referring to
> > the platform device bus.
> >
> > I think perhaps it just has to be made more clear that the generic code
> > deals with translating the device resources in the necessary way, and
> > currently it only supports vfio-platform devices?
> 
> Ok, you're probably right, looking at it again it is closer than I
> thought.  At the same time, the use of platform device in
> virt/kvm/vfio.c is pointless and can easily be pushed out to the arch
> code as just another error return case.  vfio.c doesn't need to be aware
> of hwirq.  The rest of the code is just overly complicated, with three
> different cleanup functions and validation function bloat.  Thanks,
> 
> Alex
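
The flow Alex outlines above for KVM_DEV_VFIO_DEVICE_FORWARD_IRQ can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the code from the patch set: struct fwd_irq, struct kv_model, model_init(), model_arch_forward() and model_forward_irq() are hypothetical names, a pthread mutex stands in for the KVM-VFIO device mutex, and an integer dev_id stands in for the vfio_device / struct device.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical record of one forwarded IRQ (not the kernel's structure). */
struct fwd_irq {
	int dev_id;           /* stands in for the vfio_device / struct device */
	unsigned int index;   /* IRQ index within the device */
	unsigned int gsi;     /* guest interrupt number */
	struct fwd_irq *next;
};

/* Hypothetical stand-in for the KVM-VFIO device state ("kv" in the outline). */
struct kv_model {
	pthread_mutex_t lock;
	struct fwd_irq *fwd_list;
};

void model_init(struct kv_model *kv)
{
	pthread_mutex_init(&kv->lock, NULL);
	kv->fwd_list = NULL;
}

/* Hypothetical arch callout. It rejects gsi 0 only so the error path can run. */
int model_arch_forward(int dev_id, unsigned int index, unsigned int gsi)
{
	(void)dev_id;
	(void)index;
	return gsi == 0 ? -EINVAL : 0;
}

int model_forward_irq(struct kv_model *kv, int dev_id,
		      unsigned int index, unsigned int gsi)
{
	struct fwd_irq *it, *fwd;
	int ret;

	pthread_mutex_lock(&kv->lock);

	/* Verify that device+irq is not already forwarded. */
	for (it = kv->fwd_list; it; it = it->next) {
		if (it->dev_id == dev_id && it->index == index) {
			ret = -EEXIST;
			goto out;
		}
	}

	/* Allocate the forwarded-IRQ record. */
	fwd = malloc(sizeof(*fwd));
	if (!fwd) {
		ret = -ENOMEM;
		goto out;
	}
	fwd->dev_id = dev_id;
	fwd->index = index;
	fwd->gsi = gsi;

	/* Call out to arch code; on failure, free and return the error. */
	ret = model_arch_forward(dev_id, index, gsi);
	if (ret) {
		free(fwd);
		goto out;
	}

	/* On success, add the record to kv. */
	fwd->next = kv->fwd_list;
	kv->fwd_list = fwd;
out:
	pthread_mutex_unlock(&kv->lock);
	return ret;
}
```

With a single exit label, the mutex is released on every path and the arch callout failure is just another error return, which is the shape the review asks for.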


Hi Alex, could you please tell me what the current status of this patch set is?
As you mentioned in another thread, some parts of this patch set (such as
kvm_vfio_device_get_external_user()) can be leveraged for VT-d Posted-Interrupts.

Thanks,
Feng



* Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
  2014-11-17 11:25         ` Wu, Feng
  (?)
@ 2014-11-17 13:41           ` Eric Auger
  -1 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-11-17 13:41 UTC (permalink / raw)
  To: Wu, Feng, Alex Williamson, Christoffer Dall
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	joel.schopp, kim.phillips, paulus, gleb, pbonzini, linux-kernel,
	patches, will.deacon, a.motakis, a.rigo, john.liuli

Hi Feng,

I will submit a PATCH v3 release at the end of this week.

Best Regards

Eric

On 11/17/2014 12:25 PM, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: linux-kernel-owner@vger.kernel.org
>> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Alex Williamson
>> Sent: Thursday, September 11, 2014 1:10 PM
>> To: Christoffer Dall
>> Cc: Eric Auger; eric.auger@st.com; marc.zyngier@arm.com;
>> linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
>> kvm@vger.kernel.org; joel.schopp@amd.com; kim.phillips@freescale.com;
>> paulus@samba.org; gleb@kernel.org; pbonzini@redhat.com;
>> linux-kernel@vger.kernel.org; patches@linaro.org; will.deacon@arm.com;
>> a.motakis@virtualopensystems.com; a.rigo@virtualopensystems.com;
>> john.liuli@huawei.com
>> Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
>>
>> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
>>> On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote:
>>>> On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
>>>>> This RFC proposes an integration of "ARM: Forwarding physical
>>>>> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
>>>>> KVM.
>>>>>
>>>>> It enables transforming a VFIO platform driver IRQ into a forwarded
>>>>> IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
>>>>> switch can be avoided on guest virtual IRQ completion. Before this
>>>>> patch, a maintenance IRQ was triggered on the virtual IRQ completion.
>>>>>
>>>>> When the IRQ is forwarded, the VFIO platform driver does not need to
>>>>> disable the IRQ anymore. Indeed when returning from the IRQ handler
>>>>> the IRQ is not deactivated. Only its priority is lowered. This means
>>>>> the same IRQ cannot hit before the guest completes the virtual IRQ
>>>>> and the GIC automatically deactivates the corresponding physical IRQ.
>>>>>
>>>>> Besides, the injection still is based on irqfd triggering. The only
>>>>> impact on irqfd process is resamplefd is not called anymore on
>>>>> virtual IRQ completion since this latter becomes "transparent".
>>>>>
>>>>> The current integration is based on an extension of the KVM-VFIO
>>>>> device, previously used by KVM to interact with VFIO groups. The
>>>>> patch series now enables KVM to directly interact with a VFIO
>>>>> platform device. The VFIO external API was extended for that purpose.
>>>>>
>>>>> The KVM-VFIO device can get/put the vfio platform device, check its
>>>>> integrity and type, get the IRQ number associated to an IRQ index.
>>>>>
>>>>> The IRQ forward programming is architecture specific (virtual interrupt
>>>>> controller programming basically). However the whole infrastructure is
>>>>> kept generic.
>>>>>
>>>>> From a user point of view, the functionality is provided through new
>>>>> KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ,
>>>>> and the capability can be checked with KVM_HAS_DEVICE_ATTR.
>>>>> Assignment can only be changed when the physical IRQ is not active.
>>>>> It is the responsibility of the user to do this check.
>>>>>
>>>>> This patch series has the following dependencies:
>>>>> - "ARM: Forwarding physical interrupts to a guest VM"
>>>>>   (http://lwn.net/Articles/603514/)
>>>>> - [PATCH v3] irqfd for ARM
>>>>> - and obviously the VFIO platform driver series:
>>>>>   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
>>>>>   https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
>>>>>
>>>>> Integrated pieces can be found at
>>>>> ssh://git.linaro.org/people/eric.auger/linux.git
>>>>> on branch 3.17rc3_irqfd_forward_integ_v2
>>>>>
>>>>> This was tested on Calxeda Midway, assigning the xgmac main IRQ.
>>>>>
>>>>> v1 -> v2:
>>>>> - forward control is moved from architecture specific file into generic
>>>>>   vfio.c module.
>>>>>   only kvm_arch_set_fwd_state remains architecture specific
>>>>> - integrate Kim's patch which enables KVM-VFIO for ARM
>>>>> - fix vgic state bypass in vgic_queue_hwirq
>>>>> - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
>>>>>   to include/uapi/linux/kvm.h
>>>>>   also irq_index renamed into index and guest_irq renamed into gsi
>>>>> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
>>>>> - vfio_external_get_base_device renamed into vfio_external_base_device
>>>>> - vfio_external_get_type removed
>>>>> - kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
>>>>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>>>>
>>>>> Eric Auger (8):
>>>>>   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
>>>>>     IRQ
>>>>>   KVM: ARM: VGIC: add forwarded irq rbtree lock
>>>>>   VFIO: platform: handler tests whether the IRQ is forwarded
>>>>>   KVM: KVM-VFIO: update user API to program forwarded IRQ
>>>>>   VFIO: Extend external user API
>>>>>   KVM: KVM-VFIO: add new VFIO external API hooks
>>>>>   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ
>>>>>     forwarding control
>>>>>   KVM: KVM-VFIO: ARM forwarding control
>>>>>
>>>>> Kim Phillips (1):
>>>>>   ARM: KVM: Enable the KVM-VFIO device
>>>>>
>>>>>  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
>>>>>  arch/arm/include/asm/kvm_host.h            |   7 +
>>>>>  arch/arm/kvm/Kconfig                       |   1 +
>>>>>  arch/arm/kvm/Makefile                      |   4 +-
>>>>>  arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
>>>>>  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
>>>>>  drivers/vfio/vfio.c                        |  24 ++
>>>>>  include/kvm/arm_vgic.h                     |   1 +
>>>>>  include/linux/kvm_host.h                   |  27 ++
>>>>>  include/linux/vfio.h                       |   3 +
>>>>>  include/uapi/linux/kvm.h                   |   9 +
>>>>>  virt/kvm/arm/vgic.c                        |  59 +++-
>>>>>  virt/kvm/vfio.c                            | 497 ++++++++++++++++++++++++++++-
>>>>>  13 files changed, 733 insertions(+), 17 deletions(-)
>>>>>  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
>>>>>
>>>>
>>>> Have we ventured too far in the other direction?  I suppose what I was
>>>> hoping to see was something more like:
>>>>
>>>> 	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
>>>>
>>>> 		/* get vfio_device */
>>>>
>>>> 		/* get mutex */
>>>>
>>>> 		/* verify device+irq isn't already forwarded */
>>>>
>>>> 		/* allocate device/forwarded irq */
>>>>
>>>> 		/* get struct device */
>>>>
>>>> 		/* callout to arch code passing struct device, gsi, ... */
>>>>
>>>> 		/* if success, add to kv, else free and error */
>>>>
>>>> 		/* mutex unlock */
>>>> 	}
>>>
>>> I think that's essentially what this patch set is trying to do, but
>>> there are just too many complicated intertwining cases right now that
>>> makes the code hard to read.
>>>
>>>>
>>>> Exposing the internal mutex out to arch code, as in v1, was an
>>>> indication that we were pushing too much out to arch code, but including
>>>> platform_device.h into virt/kvm/vfio.c tells me we're still not
>>>> abstracting at the right point.  Thanks,
>>>>
>>> I raised my eyebrows over the platform device bus thingy here as well,
>>> but on the other hand, there's nothing ARM-specific about referring to
>>> the platform device bus.
>>>
>>> I think perhaps it just has to be made more clear that the generic code
>>> deals with translating the device resources in the necessary way, and
>>> currently it only supports vfio-platform devices?
>>
>> Ok, you're probably right, looking at it again it is closer than I
>> thought.  At the same time, the use of platform device in
>> virt/kvm/vfio.c is pointless and can easily be pushed out to the arch
>> code as just another error return case.  vfio.c doesn't need to be aware
>> of hwirq.  The rest of the code is just overly complicated, with three
>> different cleanup functions and validation function bloat.  Thanks,
>>
>> Alex
> 
> 
> Hi Alex, could you please tell me what the current status of this patch set is?
> As you mentioned in another thread, some parts of this patch set (such as
> kvm_vfio_device_get_external_user()) can be leveraged for VT-d Posted-Interrupts.
> 
> Thanks,
> Feng
> 



* [RFC v2 0/9] KVM-VFIO IRQ forward control
@ 2014-11-17 13:41           ` Eric Auger
  0 siblings, 0 replies; 101+ messages in thread
From: Eric Auger @ 2014-11-17 13:41 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Feng,

I will submit a PATCH v3 release at the end of this week.

Best Regards

Eric

On 11/17/2014 12:25 PM, Wu, Feng wrote:
> 
> 
>> -----Original Message-----
>> From: linux-kernel-owner at vger.kernel.org
>> [mailto:linux-kernel-owner at vger.kernel.org] On Behalf Of Alex Williamson
>> Sent: Thursday, September 11, 2014 1:10 PM
>> To: Christoffer Dall
>> Cc: Eric Auger; eric.auger at st.com; marc.zyngier at arm.com;
>> linux-arm-kernel at lists.infradead.org; kvmarm at lists.cs.columbia.edu;
>> kvm at vger.kernel.org; joel.schopp at amd.com; kim.phillips at freescale.com;
>> paulus at samba.org; gleb at kernel.org; pbonzini at redhat.com;
>> linux-kernel at vger.kernel.org; patches at linaro.org; will.deacon at arm.com;
>> a.motakis at virtualopensystems.com; a.rigo at virtualopensystems.com;
>> john.liuli at huawei.com
>> Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
>>
>> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
>>> On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote:
>>>> On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
>>>>> This RFC proposes an integration of "ARM: Forwarding physical
>>>>> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
>>>>> KVM.
>>>>>
>>>>> It enables to transform a VFIO platform driver IRQ into a forwarded
>>>>> IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
>>>>> switch can be avoided on guest virtual IRQ completion. Before this
>>>>> patch, a maintenance IRQ was triggered on the virtual IRQ completion.
>>>>>
>>>>> When the IRQ is forwarded, the VFIO platform driver does not need to
>>>>> disable the IRQ anymore. Indeed when returning from the IRQ handler
>>>>> the IRQ is not deactivated. Only its priority is lowered. This means
>>>>> the same IRQ cannot hit before the guest completes the virtual IRQ
>>>>> and the GIC automatically deactivates the corresponding physical IRQ.
>>>>>
>>>>> Besides, the injection still is based on irqfd triggering. The only
>>>>> impact on irqfd process is resamplefd is not called anymore on
>>>>> virtual IRQ completion since this latter becomes "transparent".
>>>>>
>>>>> The current integration is based on an extension of the KVM-VFIO
>>>>> device, previously used by KVM to interact with VFIO groups. The
>>>>> patch serie now enables KVM to directly interact with a VFIO
>>>>> platform device. The VFIO external API was extended for that purpose.
>>>>>
>>>>> Th KVM-VFIO device can get/put the vfio platform device, check its
>>>>> integrity and type, get the IRQ number associated to an IRQ index.
>>>>>
>>>>> The IRQ forward programming is architecture specific (virtual interrupt
>>>>> controller programming basically). However the whole infrastructure is
>>>>> kept generic.
>>>>>
>>>>> from a user point of view, the functionality is provided through new
>>>>> KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ
>>>>> and the capability can be checked with KVM_HAS_DEVICE_ATTR.
>>>>> Assignment can only be changed when the physical IRQ is not active.
>>>>> It is the responsability of the user to do this check.
>>>>>
>>>>> This patch serie has the following dependencies:
>>>>> - "ARM: Forwarding physical interrupts to a guest VM"
>>>>>   (http://lwn.net/Articles/603514/) in
>>>>> - [PATCH v3] irqfd for ARM
>>>>> - and obviously the VFIO platform driver serie:
>>>>>   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
>>>>>   https://www.mail-archive.com/kvm at vger.kernel.org/msg103247.html
>>>>>
>>>>> Integrated pieces can be found at
>>>>> ssh://git.linaro.org/people/eric.auger/linux.git
>>>>> on branch 3.17rc3_irqfd_forward_integ_v2
>>>>>
>>>>> This was was tested on Calxeda Midway, assigning the xgmac main IRQ.
>>>>>
>>>>> v1 -> v2:
>>>>> - forward control is moved from architecture specific file into generic
>>>>>   vfio.c module.
>>>>>   only kvm_arch_set_fwd_state remains architecture specific
>>>>> - integrate Kim's patch which enables KVM-VFIO for ARM
>>>>> - fix vgic state bypass in vgic_queue_hwirq
>>>>> - struct kvm_arch_forwarded_irq moved from
>>>>>   arch/arm/include/uapi/asm/kvm.h
>>>>>   to include/uapi/linux/kvm.h
>>>>>   also irq_index renamed into index and guest_irq renamed into gsi
>>>>> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
>>>>> - vfio_external_get_base_device renamed into vfio_external_base_device
>>>>> - vfio_external_get_type removed
>>>>> - kvm_vfio_external_get_base_device renamed into
>>>>>   kvm_vfio_external_base_device
>>>>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into
>>>>>   __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>>>>
>>>>> Eric Auger (8):
>>>>>   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
>>>>>     IRQ
>>>>>   KVM: ARM: VGIC: add forwarded irq rbtree lock
>>>>>   VFIO: platform: handler tests whether the IRQ is forwarded
>>>>>   KVM: KVM-VFIO: update user API to program forwarded IRQ
>>>>>   VFIO: Extend external user API
>>>>>   KVM: KVM-VFIO: add new VFIO external API hooks
>>>>>   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ
>>>>>     forwarding control
>>>>>   KVM: KVM-VFIO: ARM forwarding control
>>>>>
>>>>> Kim Phillips (1):
>>>>>   ARM: KVM: Enable the KVM-VFIO device
>>>>>
>>>>>  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
>>>>>  arch/arm/include/asm/kvm_host.h            |   7 +
>>>>>  arch/arm/kvm/Kconfig                       |   1 +
>>>>>  arch/arm/kvm/Makefile                      |   4 +-
>>>>>  arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
>>>>>  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
>>>>>  drivers/vfio/vfio.c                        |  24 ++
>>>>>  include/kvm/arm_vgic.h                     |   1 +
>>>>>  include/linux/kvm_host.h                   |  27 ++
>>>>>  include/linux/vfio.h                       |   3 +
>>>>>  include/uapi/linux/kvm.h                   |   9 +
>>>>>  virt/kvm/arm/vgic.c                        |  59 +++-
>>>>>  virt/kvm/vfio.c                            | 497 ++++++++++++++++++++++++++++-
>>>>>  13 files changed, 733 insertions(+), 17 deletions(-)
>>>>>  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
>>>>>
>>>>
>>>> Have we ventured too far in the other direction?  I suppose what I was
>>>> hoping to see was something more like:
>>>>
>>>> 	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
>>>>
>>>> 		/* get vfio_device */
>>>>
>>>> 		/* get mutex */
>>>>
>>>> 		/* verify device+irq isn't already forwarded */
>>>>
>>>> 		/* allocate device/forwarded irq */
>>>>
>>>> 		/* get struct device */
>>>>
>>>> 		/* callout to arch code passing struct device, gsi, ... */
>>>>
>>>> 		/* if success, add to kv, else free and error */
>>>>
>>>> 		/* mutex unlock */
>>>> 	}
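
The flow outlined in the quoted pseudo-code can be modelled in user space
roughly as follows. This is a sketch under stated assumptions, not the
code from the patch set: struct fwd_irq, struct kv_model,
arch_set_forward() and forward_irq() are all hypothetical names, an int
stands in for the vfio_device reference, a pthread mutex stands in for
the kernel mutex, and the arch callout is a stub that only rejects gsi 0.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins, not taken from the patches. */
struct fwd_irq {
	int dev_id;		/* stands in for the vfio_device reference */
	unsigned int index;	/* IRQ index within the device */
	unsigned int gsi;	/* guest interrupt the IRQ is forwarded to */
	struct fwd_irq *next;
};

struct kv_model {
	pthread_mutex_t lock;
	struct fwd_irq *head;	/* list of currently forwarded IRQs */
};

/* Placeholder for the arch callout; here it only rejects gsi 0. */
static int arch_set_forward(int dev_id, unsigned int index, unsigned int gsi)
{
	(void)dev_id;
	(void)index;
	return gsi == 0 ? -EINVAL : 0;
}

static int forward_irq(struct kv_model *kv, int dev_id,
		       unsigned int index, unsigned int gsi)
{
	struct fwd_irq *f, *it;
	int ret;

	assert(kv != NULL);
	pthread_mutex_lock(&kv->lock);			/* get mutex */

	/* verify device+irq isn't already forwarded */
	for (it = kv->head; it; it = it->next) {
		if (it->dev_id == dev_id && it->index == index) {
			ret = -EEXIST;
			goto unlock;
		}
	}

	f = malloc(sizeof(*f));				/* allocate forwarded irq */
	if (!f) {
		ret = -ENOMEM;
		goto unlock;
	}
	f->dev_id = dev_id;
	f->index = index;
	f->gsi = gsi;

	ret = arch_set_forward(dev_id, index, gsi);	/* callout to arch code */
	if (ret) {
		free(f);				/* on failure: free and error */
		goto unlock;
	}

	f->next = kv->head;				/* on success: add to kv */
	kv->head = f;
unlock:
	pthread_mutex_unlock(&kv->lock);		/* mutex unlock */
	return ret;
}
```

With a single exit path through the unlock label, the generic side needs
no separate cleanup helpers, and the arch side sees only the device
handle, the index and the gsi.
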
>>>
>>> I think that's essentially what this patch set is trying to do, but
>>> there are just too many complicated intertwining cases right now that
>>> make the code hard to read.
>>>
>>>>
>>>> Exposing the internal mutex out to arch code, as in v1, was an
>>>> indication that we were pushing too much out to arch code, but including
>>>> platform_device.h into virt/kvm/vfio.c tells me we're still not
>>>> abstracting at the right point.  Thanks,
>>>>
>>> I raised my eyebrows over the platform device bus thingy here as well,
>>> but on the other hand, there's nothing ARM-specific about referring to
>>> the platform device bus.
>>>
>>> I think perhaps it just has to be made more clear that the generic code
>>> deals with translating the device resources in the necessary way, and
>>> currently it only supports vfio-platform devices?
>>
>> Ok, you're probably right, looking at it again it is closer than I
>> thought.  At the same time, the use of platform device in
>> virt/kvm/vfio.c is pointless and can easily be pushed out to the arch
>> code as just another error return case.  vfio.c doesn't need to be aware
>> of hwirq.  The rest of the code is just overly complicated, with three
>> different cleanup functions and validation function bloat.  Thanks,
>>
>> Alex
> 
> 
> Hi Alex, could you please tell me the current status of this patch set?
> As you mentioned in another thread, parts of this patch set (such as
> kvm_vfio_device_get_external_user(), etc.) can be leveraged for VT-d
> Posted-Interrupts.
> 
> Thanks,
> Feng
> 
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo at vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 101+ messages in thread

* RE: [RFC v2 0/9] KVM-VFIO IRQ forward control
  2014-11-17 13:41           ` Eric Auger
  (?)
@ 2014-11-17 13:45             ` Wu, Feng
  -1 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2014-11-17 13:45 UTC (permalink / raw)
  To: Eric Auger, Alex Williamson, Christoffer Dall
  Cc: eric.auger, marc.zyngier, linux-arm-kernel, kvmarm, kvm,
	joel.schopp, kim.phillips, paulus, gleb, pbonzini, linux-kernel,
	patches, will.deacon, a.motakis, a.rigo, john.liuli, Wu, Feng



> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> Behalf Of Eric Auger
> Sent: Monday, November 17, 2014 9:42 PM
> To: Wu, Feng; Alex Williamson; Christoffer Dall
> Cc: eric.auger@st.com; marc.zyngier@arm.com;
> linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> kvm@vger.kernel.org; joel.schopp@amd.com; kim.phillips@freescale.com;
> paulus@samba.org; gleb@kernel.org; pbonzini@redhat.com;
> linux-kernel@vger.kernel.org; patches@linaro.org; will.deacon@arm.com;
> a.motakis@virtualopensystems.com; a.rigo@virtualopensystems.com;
> john.liuli@huawei.com
> Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
> 
> Hi Feng,
> 
> I will submit a PATCH v3 release end of this week.
> 
> Best Regards
> 
> Eric

Thanks for the update, Eric!

Thanks,
Feng

> 
> On 11/17/2014 12:25 PM, Wu, Feng wrote:
> >
> >
> >> -----Original Message-----
> >> From: linux-kernel-owner@vger.kernel.org
> >> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Alex Williamson
> >> Sent: Thursday, September 11, 2014 1:10 PM
> >> To: Christoffer Dall
> >> Cc: Eric Auger; eric.auger@st.com; marc.zyngier@arm.com;
> >> linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> >> kvm@vger.kernel.org; joel.schopp@amd.com; kim.phillips@freescale.com;
> >> paulus@samba.org; gleb@kernel.org; pbonzini@redhat.com;
> >> linux-kernel@vger.kernel.org; patches@linaro.org; will.deacon@arm.com;
> >> a.motakis@virtualopensystems.com; a.rigo@virtualopensystems.com;
> >> john.liuli@huawei.com
> >> Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
> >>
> >> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> >>> On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote:
> >>>> On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
> >>>>> This RFC proposes an integration of "ARM: Forwarding physical
> >>>>> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
> >>>>> KVM.
> >>>>>
> >>>>> It enables transforming a VFIO platform driver IRQ into a forwarded
> >>>>> IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
> >>>>> switch can be avoided on guest virtual IRQ completion. Before this
> >>>>> patch, a maintenance IRQ was triggered on the virtual IRQ completion.
> >>>>>
> >>>>> When the IRQ is forwarded, the VFIO platform driver does not need to
> >>>>> disable the IRQ anymore. Indeed when returning from the IRQ handler
> >>>>> the IRQ is not deactivated. Only its priority is lowered. This means
> >>>>> the same IRQ cannot hit before the guest completes the virtual IRQ
> >>>>> and the GIC automatically deactivates the corresponding physical IRQ.
> >>>>>
> >>>>> Besides, the injection is still based on irqfd triggering. The only
> >>>>> impact on the irqfd process is that resamplefd is no longer called on
> >>>>> virtual IRQ completion, since the latter becomes "transparent".
> >>>>>
> >>>>> The current integration is based on an extension of the KVM-VFIO
> >>>>> device, previously used by KVM to interact with VFIO groups. The
> >>>>> patch series now enables KVM to directly interact with a VFIO
> >>>>> platform device. The VFIO external API was extended for that purpose.
> >>>>>
> >>>>> The KVM-VFIO device can get/put the VFIO platform device, check its
> >>>>> integrity and type, and get the IRQ number associated with an IRQ index.
> >>>>>
> >>>>> The IRQ forward programming is architecture specific (virtual interrupt
> >>>>> controller programming basically). However the whole infrastructure is
> >>>>> kept generic.
> >>>>>
> >>>>> From a user point of view, the functionality is provided through new
> >>>>> KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ,
> >>>>> and the capability can be checked with KVM_HAS_DEVICE_ATTR.
> >>>>> Assignment can only be changed when the physical IRQ is not active.
> >>>>> It is the responsibility of the user to do this check.
> >>>>>
> >>>>> This patch series has the following dependencies:
> >>>>> - "ARM: Forwarding physical interrupts to a guest VM"
> >>>>>   (http://lwn.net/Articles/603514/)
> >>>>> - [PATCH v3] irqfd for ARM
> >>>>> - and obviously the VFIO platform driver series:
> >>>>>   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
> >>>>>   https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> >>>>>
> >>>>> Integrated pieces can be found at
> >>>>> ssh://git.linaro.org/people/eric.auger/linux.git
> >>>>> on branch 3.17rc3_irqfd_forward_integ_v2
> >>>>>
> >>>>> This was tested on Calxeda Midway, assigning the xgmac main IRQ.
> >>>>>
> >>>>> v1 -> v2:
> >>>>> - forward control is moved from architecture specific file into generic
> >>>>>   vfio.c module.
> >>>>>   only kvm_arch_set_fwd_state remains architecture specific
> >>>>> - integrate Kim's patch which enables KVM-VFIO for ARM
> >>>>> - fix vgic state bypass in vgic_queue_hwirq
> >>>>> - struct kvm_arch_forwarded_irq moved from
> >>>>>   arch/arm/include/uapi/asm/kvm.h
> >>>>>   to include/uapi/linux/kvm.h
> >>>>>   also irq_index renamed into index and guest_irq renamed into gsi
> >>>>> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> >>>>> - vfio_external_get_base_device renamed into vfio_external_base_device
> >>>>> - vfio_external_get_type removed
> >>>>> - kvm_vfio_external_get_base_device renamed into
> >>>>>   kvm_vfio_external_base_device
> >>>>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into
> >>>>>   __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>>>>
> >>>>> Eric Auger (8):
> >>>>>   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
> >>>>>     IRQ
> >>>>>   KVM: ARM: VGIC: add forwarded irq rbtree lock
> >>>>>   VFIO: platform: handler tests whether the IRQ is forwarded
> >>>>>   KVM: KVM-VFIO: update user API to program forwarded IRQ
> >>>>>   VFIO: Extend external user API
> >>>>>   KVM: KVM-VFIO: add new VFIO external API hooks
> >>>>>   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ
> >>>>>     forwarding control
> >>>>>   KVM: KVM-VFIO: ARM forwarding control
> >>>>>
> >>>>> Kim Phillips (1):
> >>>>>   ARM: KVM: Enable the KVM-VFIO device
> >>>>>
> >>>>>  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
> >>>>>  arch/arm/include/asm/kvm_host.h            |   7 +
> >>>>>  arch/arm/kvm/Kconfig                       |   1 +
> >>>>>  arch/arm/kvm/Makefile                      |   4 +-
> >>>>>  arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
> >>>>>  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
> >>>>>  drivers/vfio/vfio.c                        |  24 ++
> >>>>>  include/kvm/arm_vgic.h                     |   1 +
> >>>>>  include/linux/kvm_host.h                   |  27 ++
> >>>>>  include/linux/vfio.h                       |   3 +
> >>>>>  include/uapi/linux/kvm.h                   |   9 +
> >>>>>  virt/kvm/arm/vgic.c                        |  59 +++-
> >>>>>  virt/kvm/vfio.c                            | 497 ++++++++++++++++++++++++++++-
> >>>>>  13 files changed, 733 insertions(+), 17 deletions(-)
> >>>>>  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
> >>>>>
> >>>>
> >>>> Have we ventured too far in the other direction?  I suppose what I was
> >>>> hoping to see was something more like:
> >>>>
> >>>> 	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> >>>>
> >>>> 		/* get vfio_device */
> >>>>
> >>>> 		/* get mutex */
> >>>>
> >>>> 		/* verify device+irq isn't already forwarded */
> >>>>
> >>>> 		/* allocate device/forwarded irq */
> >>>>
> >>>> 		/* get struct device */
> >>>>
> >>>> 		/* callout to arch code passing struct device, gsi, ... */
> >>>>
> >>>> 		/* if success, add to kv, else free and error */
> >>>>
> >>>> 		/* mutex unlock */
> >>>> 	}
> >>>
> >>> I think that's essentially what this patch set is trying to do, but
> >>> there are just too many complicated intertwining cases right now that
> >>> make the code hard to read.
> >>>
> >>>>
> >>>> Exposing the internal mutex out to arch code, as in v1, was an
> >>>> indication that we were pushing too much out to arch code, but including
> >>>> platform_device.h into virt/kvm/vfio.c tells me we're still not
> >>>> abstracting at the right point.  Thanks,
> >>>>
> >>> I raised my eyebrows over the platform device bus thingy here as well,
> >>> but on the other hand, there's nothing ARM-specific about referring to
> >>> the platform device bus.
> >>>
> >>> I think perhaps it just has to be made more clear that the generic code
> >>> deals with translating the device resources in the necessary way, and
> >>> currently it only supports vfio-platform devices?
> >>
> >> Ok, you're probably right, looking at it again it is closer than I
> >> thought.  At the same time, the use of platform device in
> >> virt/kvm/vfio.c is pointless and can easily be pushed out to the arch
> >> code as just another error return case.  vfio.c doesn't need to be aware
> >> of hwirq.  The rest of the code is just overly complicated, with three
> >> different cleanup functions and validation function bloat.  Thanks,
> >>
> >> Alex
> >
> >
> > Hi Alex, could you please tell me the current status of this patch set?
> > As you mentioned in another thread, parts of this patch set (such as
> > kvm_vfio_device_get_external_user(), etc.) can be leveraged for VT-d
> > Posted-Interrupts.
> >
> > Thanks,
> > Feng
> >

^ permalink raw reply	[flat|nested] 101+ messages in thread

* [RFC v2 0/9] KVM-VFIO IRQ forward control
@ 2014-11-17 13:45             ` Wu, Feng
  0 siblings, 0 replies; 101+ messages in thread
From: Wu, Feng @ 2014-11-17 13:45 UTC (permalink / raw)
  To: linux-arm-kernel



> -----Original Message-----
> From: kvm-owner at vger.kernel.org [mailto:kvm-owner at vger.kernel.org] On
> Behalf Of Eric Auger
> Sent: Monday, November 17, 2014 9:42 PM
> To: Wu, Feng; Alex Williamson; Christoffer Dall
> Cc: eric.auger at st.com; marc.zyngier at arm.com;
> linux-arm-kernel at lists.infradead.org; kvmarm at lists.cs.columbia.edu;
> kvm at vger.kernel.org; joel.schopp at amd.com; kim.phillips at freescale.com;
> paulus at samba.org; gleb at kernel.org; pbonzini at redhat.com;
> linux-kernel at vger.kernel.org; patches at linaro.org; will.deacon at arm.com;
> a.motakis at virtualopensystems.com; a.rigo at virtualopensystems.com;
> john.liuli at huawei.com
> Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
> 
> Hi Feng,
> 
> I will submit a PATCH v3 release end of this week.
> 
> Best Regards
> 
> Eric

Thanks for the update, Eric!

Thanks,
Feng

> 
> On 11/17/2014 12:25 PM, Wu, Feng wrote:
> >
> >
> >> -----Original Message-----
> >> From: linux-kernel-owner at vger.kernel.org
> >> [mailto:linux-kernel-owner at vger.kernel.org] On Behalf Of Alex Williamson
> >> Sent: Thursday, September 11, 2014 1:10 PM
> >> To: Christoffer Dall
> >> Cc: Eric Auger; eric.auger at st.com; marc.zyngier at arm.com;
> >> linux-arm-kernel at lists.infradead.org; kvmarm at lists.cs.columbia.edu;
> >> kvm at vger.kernel.org; joel.schopp at amd.com; kim.phillips at freescale.com;
> >> paulus at samba.org; gleb at kernel.org; pbonzini at redhat.com;
> >> linux-kernel at vger.kernel.org; patches at linaro.org; will.deacon at arm.com;
> >> a.motakis at virtualopensystems.com; a.rigo at virtualopensystems.com;
> >> john.liuli at huawei.com
> >> Subject: Re: [RFC v2 0/9] KVM-VFIO IRQ forward control
> >>
> >> On Thu, 2014-09-11 at 05:10 +0200, Christoffer Dall wrote:
> >>> On Tue, Sep 02, 2014 at 03:05:41PM -0600, Alex Williamson wrote:
> >>>> On Mon, 2014-09-01 at 14:52 +0200, Eric Auger wrote:
> >>>>> This RFC proposes an integration of "ARM: Forwarding physical
> >>>>> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
> >>>>> KVM.
> >>>>>
> >>>>> It enables transforming a VFIO platform driver IRQ into a forwarded
> >>>>> IRQ. The direct benefit is that, for a level sensitive IRQ, a VM
> >>>>> switch can be avoided on guest virtual IRQ completion. Before this
> >>>>> patch, a maintenance IRQ was triggered on the virtual IRQ completion.
> >>>>>
> >>>>> When the IRQ is forwarded, the VFIO platform driver does not need to
> >>>>> disable the IRQ anymore. Indeed when returning from the IRQ handler
> >>>>> the IRQ is not deactivated. Only its priority is lowered. This means
> >>>>> the same IRQ cannot hit before the guest completes the virtual IRQ
> >>>>> and the GIC automatically deactivates the corresponding physical IRQ.
> >>>>>
> >>>>> Besides, the injection is still based on irqfd triggering. The only
> >>>>> impact on the irqfd process is that resamplefd is no longer called on
> >>>>> virtual IRQ completion, since the latter becomes "transparent".
> >>>>>
> >>>>> The current integration is based on an extension of the KVM-VFIO
> >>>>> device, previously used by KVM to interact with VFIO groups. The
> >>>>> patch series now enables KVM to directly interact with a VFIO
> >>>>> platform device. The VFIO external API was extended for that purpose.
> >>>>>
> >>>>> The KVM-VFIO device can get/put the VFIO platform device, check its
> >>>>> integrity and type, and get the IRQ number associated with an IRQ index.
> >>>>>
> >>>>> The IRQ forward programming is architecture specific (virtual interrupt
> >>>>> controller programming basically). However the whole infrastructure is
> >>>>> kept generic.
> >>>>>
> >>>>> From a user point of view, the functionality is provided through new
> >>>>> KVM-VFIO device commands, KVM_DEV_VFIO_DEVICE_(UN)FORWARD_IRQ,
> >>>>> and the capability can be checked with KVM_HAS_DEVICE_ATTR.
> >>>>> Assignment can only be changed when the physical IRQ is not active.
> >>>>> It is the responsibility of the user to do this check.
> >>>>>
> >>>>> This patch series has the following dependencies:
> >>>>> - "ARM: Forwarding physical interrupts to a guest VM"
> >>>>>   (http://lwn.net/Articles/603514/)
> >>>>> - [PATCH v3] irqfd for ARM
> >>>>> - and obviously the VFIO platform driver series:
> >>>>>   [RFC PATCH v6 00/20] VFIO support for platform devices on ARM
> >>>>>   https://www.mail-archive.com/kvm@vger.kernel.org/msg103247.html
> >>>>>
> >>>>> Integrated pieces can be found at
> >>>>> ssh://git.linaro.org/people/eric.auger/linux.git
> >>>>> on branch 3.17rc3_irqfd_forward_integ_v2
> >>>>>
> >>>>> This was tested on Calxeda Midway, assigning the xgmac main IRQ.
> >>>>>
> >>>>> v1 -> v2:
> >>>>> - forward control is moved from architecture specific file into generic
> >>>>>   vfio.c module.
> >>>>>   only kvm_arch_set_fwd_state remains architecture specific
> >>>>> - integrate Kim's patch which enables KVM-VFIO for ARM
> >>>>> - fix vgic state bypass in vgic_queue_hwirq
> >>>>> - struct kvm_arch_forwarded_irq moved from
> >>>>>   arch/arm/include/uapi/asm/kvm.h
> >>>>>   to include/uapi/linux/kvm.h
> >>>>>   also irq_index renamed into index and guest_irq renamed into gsi
> >>>>> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> >>>>> - vfio_external_get_base_device renamed into vfio_external_base_device
> >>>>> - vfio_external_get_type removed
> >>>>> - kvm_vfio_external_get_base_device renamed into
> >>>>>   kvm_vfio_external_base_device
> >>>>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into
> >>>>>   __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>>>>
> >>>>> Eric Auger (8):
> >>>>>   KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded
> >>>>>     IRQ
> >>>>>   KVM: ARM: VGIC: add forwarded irq rbtree lock
> >>>>>   VFIO: platform: handler tests whether the IRQ is forwarded
> >>>>>   KVM: KVM-VFIO: update user API to program forwarded IRQ
> >>>>>   VFIO: Extend external user API
> >>>>>   KVM: KVM-VFIO: add new VFIO external API hooks
> >>>>>   KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ
> >>>>>     forwarding control
> >>>>>   KVM: KVM-VFIO: ARM forwarding control
> >>>>>
> >>>>> Kim Phillips (1):
> >>>>>   ARM: KVM: Enable the KVM-VFIO device
> >>>>>
> >>>>>  Documentation/virtual/kvm/devices/vfio.txt |  26 ++
> >>>>>  arch/arm/include/asm/kvm_host.h            |   7 +
> >>>>>  arch/arm/kvm/Kconfig                       |   1 +
> >>>>>  arch/arm/kvm/Makefile                      |   4 +-
> >>>>>  arch/arm/kvm/kvm_vfio_arm.c                |  85 +++++
> >>>>>  drivers/vfio/platform/vfio_platform_irq.c  |   7 +-
> >>>>>  drivers/vfio/vfio.c                        |  24 ++
> >>>>>  include/kvm/arm_vgic.h                     |   1 +
> >>>>>  include/linux/kvm_host.h                   |  27 ++
> >>>>>  include/linux/vfio.h                       |   3 +
> >>>>>  include/uapi/linux/kvm.h                   |   9 +
> >>>>>  virt/kvm/arm/vgic.c                        |  59 +++-
> >>>>>  virt/kvm/vfio.c                            | 497 ++++++++++++++++++++++++++++-
> >>>>>  13 files changed, 733 insertions(+), 17 deletions(-)
> >>>>>  create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
> >>>>>
> >>>>
> >>>> Have we ventured too far in the other direction?  I suppose what I was
> >>>> hoping to see was something more like:
> >>>>
> >>>> 	case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:{
> >>>>
> >>>> 		/* get vfio_device */
> >>>>
> >>>> 		/* get mutex */
> >>>>
> >>>> 		/* verify device+irq isn't already forwarded */
> >>>>
> >>>> 		/* allocate device/forwarded irq */
> >>>>
> >>>> 		/* get struct device */
> >>>>
> >>>> 		/* callout to arch code passing struct device, gsi, ... */
> >>>>
> >>>> 		/* if success, add to kv, else free and error */
> >>>>
> >>>> 		/* mutex unlock */
> >>>> 	}
> >>>
> >>> I think that's essentially what this patch set is trying to do, but
> >>> there are just too many complicated intertwining cases right now that
> >>> make the code hard to read.
> >>>
> >>>>
> >>>> Exposing the internal mutex out to arch code, as in v1, was an
> >>>> indication that we were pushing too much out to arch code, but including
> >>>> platform_device.h into virt/kvm/vfio.c tells me we're still not
> >>>> abstracting at the right point.  Thanks,
> >>>>
> >>> I raised my eyebrows over the platform device bus thingy here as well,
> >>> but on the other hand, there's nothing ARM-specific about referring to
> >>> the platform device bus.
> >>>
> >>> I think perhaps it just has to be made more clear that the generic code
> >>> deals with translating the device resources in the necessary way, and
> >>> currently it only supports vfio-platform devices?
> >>
> >> Ok, you're probably right, looking at it again it is closer than I
> >> thought.  At the same time, the use of platform device in
> >> virt/kvm/vfio.c is pointless and can easily be pushed out to the arch
> >> code as just another error return case.  vfio.c doesn't need to be aware
> >> of hwirq.  The rest of the code is just overly complicated, with three
> >> different cleanup functions and validation function bloat.  Thanks,
> >>
> >> Alex
> >
> >
> > Hi Alex, could you please tell me the current status of this patch set?
> > As you mentioned in another thread, some parts of this patch set (such as
> > kvm_vfio_device_get_external_user()) can be leveraged for VT-d
> > Posted-Interrupts.
> >
> > Thanks,
> > Feng
> >
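For readers following the review, the FORWARD_IRQ flow that Alex outlines in
the quoted comment block above (take the mutex, reject a device+irq pair that
is already forwarded, allocate the entry, call out to arch code, keep the
entry only on success, unlock) can be modelled in plain user-space C. This is
only a sketch of the control flow, not code from the patch set: every name
below (kvm_vfio_model, fwd_irq, model_forward_irq, arch_accept, arch_reject)
is hypothetical, an integer device_id stands in for struct device, and a
pthread mutex stands in for the kernel mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical record of one forwarded IRQ (not the patch set's struct). */
struct fwd_irq {
	int device_id;          /* stand-in for struct device */
	unsigned int index;     /* IRQ index within the device */
	unsigned int gsi;       /* guest IRQ number */
	struct fwd_irq *next;
};

/* Hypothetical model of the KVM-VFIO device state. */
struct kvm_vfio_model {
	pthread_mutex_t lock;
	struct fwd_irq *fwd_list;
};

/* Arch callout: returns 0 on success, negative errno on failure. */
typedef int (*arch_fwd_fn)(int device_id, unsigned int index, unsigned int gsi);

/* Example callouts, used only to exercise both outcomes. */
int arch_accept(int device_id, unsigned int index, unsigned int gsi)
{
	(void)device_id; (void)index; (void)gsi;
	return 0;
}

int arch_reject(int device_id, unsigned int index, unsigned int gsi)
{
	(void)device_id; (void)index; (void)gsi;
	return -EINVAL;
}

/* Caller must hold kv->lock. */
static struct fwd_irq *find_fwd(struct kvm_vfio_model *kv, int device_id,
				unsigned int index)
{
	struct fwd_irq *f;

	for (f = kv->fwd_list; f; f = f->next)
		if (f->device_id == device_id && f->index == index)
			return f;
	return NULL;
}

int model_forward_irq(struct kvm_vfio_model *kv, int device_id,
		      unsigned int index, unsigned int gsi,
		      arch_fwd_fn arch_set_fwd)
{
	struct fwd_irq *fwd;
	int ret;

	assert(kv != NULL && arch_set_fwd != NULL);

	pthread_mutex_lock(&kv->lock);

	/* verify device+irq isn't already forwarded */
	if (find_fwd(kv, device_id, index)) {
		ret = -EEXIST;
		goto out;
	}

	/* allocate the forwarded irq entry */
	fwd = malloc(sizeof(*fwd));
	if (!fwd) {
		ret = -ENOMEM;
		goto out;
	}
	fwd->device_id = device_id;
	fwd->index = index;
	fwd->gsi = gsi;

	/* callout to arch code; on failure free the entry and report */
	ret = arch_set_fwd(device_id, index, gsi);
	if (ret) {
		free(fwd);
		goto out;
	}

	/* success: record the entry */
	fwd->next = kv->fwd_list;
	kv->fwd_list = fwd;
out:
	pthread_mutex_unlock(&kv->lock);
	return ret;
}
```

In this model the generic side never looks at a hwirq or a platform device;
anything bus-specific would live behind the arch callout and surface only as
an error return, which is the split suggested in the review above.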


end of thread, other threads:[~2014-11-17 13:53 UTC | newest]

Thread overview: 101+ messages
2014-09-01 12:52 [RFC v2 0/9] KVM-VFIO IRQ forward control Eric Auger
2014-09-01 12:52 ` Eric Auger
2014-09-01 12:52 ` [RFC v2 1/9] KVM: ARM: VGIC: fix multiple injection of level sensitive forwarded IRQ Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-11  3:09   ` Christoffer Dall
2014-09-11  3:09     ` Christoffer Dall
2014-09-11 18:17     ` Eric Auger
2014-09-11 18:17       ` Eric Auger
2014-09-11 22:14       ` Christoffer Dall
2014-09-11 22:14         ` Christoffer Dall
2014-09-01 12:52 ` [RFC v2 2/9] KVM: ARM: VGIC: add forwarded irq rbtree lock Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-11  3:09   ` Christoffer Dall
2014-09-11  3:09     ` Christoffer Dall
2014-09-11 17:31     ` Eric Auger
2014-09-11 17:31       ` Eric Auger
2014-09-01 12:52 ` [RFC v2 3/9] ARM: KVM: Enable the KVM-VFIO device Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-01 12:52 ` [RFC v2 4/9] VFIO: platform: handler tests whether the IRQ is forwarded Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-11  3:10   ` Christoffer Dall
2014-09-11  3:10     ` Christoffer Dall
2014-09-11  8:44     ` Eric Auger
2014-09-11  8:44       ` Eric Auger
2014-09-11 17:05       ` Christoffer Dall
2014-09-11 17:05         ` Christoffer Dall
2014-09-11 18:07         ` Alex Williamson
2014-09-11 18:07           ` Alex Williamson
2014-09-11 17:08       ` Antonios Motakis
2014-09-11 17:08         ` Antonios Motakis
2014-09-01 12:52 ` [RFC v2 5/9] KVM: KVM-VFIO: update user API to program forwarded IRQ Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-11  3:10   ` Christoffer Dall
2014-09-11  3:10     ` Christoffer Dall
2014-09-11  8:49     ` Eric Auger
2014-09-11  8:49       ` Eric Auger
2014-09-11 17:08       ` Christoffer Dall
2014-09-11 17:08         ` Christoffer Dall
2014-09-01 12:52 ` [RFC v2 6/9] VFIO: Extend external user API Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-11  3:10   ` Christoffer Dall
2014-09-11  3:10     ` Christoffer Dall
2014-09-11  8:50     ` Eric Auger
2014-09-11  8:50       ` Eric Auger
2014-09-01 12:52 ` [RFC v2 7/9] KVM: KVM-VFIO: add new VFIO external API hooks Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-11  3:10   ` Christoffer Dall
2014-09-11  3:10     ` Christoffer Dall
2014-09-11  8:51     ` Eric Auger
2014-09-11  8:51       ` Eric Auger
2014-09-01 12:52 ` [RFC v2 8/9] KVM: KVM-VFIO: generic KVM_DEV_VFIO_DEVICE command and IRQ forwarding control Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-11  3:10   ` Christoffer Dall
2014-09-11  3:10     ` Christoffer Dall
2014-09-11  5:05     ` Alex Williamson
2014-09-11  5:05       ` Alex Williamson
2014-09-11  5:05       ` Alex Williamson
2014-09-11 12:04       ` Eric Auger
2014-09-11 12:04         ` Eric Auger
2014-09-11 15:59         ` Alex Williamson
2014-09-11 15:59           ` Alex Williamson
2014-09-11 17:24           ` Christoffer Dall
2014-09-11 17:24             ` Christoffer Dall
2014-09-11 17:22         ` Christoffer Dall
2014-09-11 17:22           ` Christoffer Dall
2014-09-11 17:10       ` Christoffer Dall
2014-09-11 17:10         ` Christoffer Dall
2014-09-11 18:14         ` Alex Williamson
2014-09-11 18:14           ` Alex Williamson
2014-09-11 21:59           ` Christoffer Dall
2014-09-11 21:59             ` Christoffer Dall
2014-09-11  9:35     ` Eric Auger
2014-09-11  9:35       ` Eric Auger
2014-09-11 15:47       ` Alex Williamson
2014-09-11 15:47         ` Alex Williamson
2014-09-11 15:47         ` Alex Williamson
2014-09-11 17:32       ` Christoffer Dall
2014-09-11 17:32         ` Christoffer Dall
2014-09-01 12:52 ` [RFC v2 9/9] KVM: KVM-VFIO: ARM " Eric Auger
2014-09-01 12:52   ` Eric Auger
2014-09-11  3:10   ` Christoffer Dall
2014-09-11  3:10     ` Christoffer Dall
2014-09-02 21:05 ` [RFC v2 0/9] KVM-VFIO IRQ forward control Alex Williamson
2014-09-02 21:05   ` Alex Williamson
2014-09-05 12:52   ` Eric Auger
2014-09-05 12:52     ` Eric Auger
2014-09-11  3:10   ` Christoffer Dall
2014-09-11  3:10     ` Christoffer Dall
2014-09-11  5:09     ` Alex Williamson
2014-09-11  5:09       ` Alex Williamson
2014-11-17 11:25       ` Wu, Feng
2014-11-17 11:25         ` Wu, Feng
2014-11-17 11:25         ` Wu, Feng
2014-11-17 13:41         ` Eric Auger
2014-11-17 13:41           ` Eric Auger
2014-11-17 13:41           ` Eric Auger
2014-11-17 13:45           ` Wu, Feng
2014-11-17 13:45             ` Wu, Feng
2014-11-17 13:45             ` Wu, Feng
