linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation
@ 2016-07-05 11:22 Andre Przywara
  2016-07-05 11:22 ` [PATCH v8 01/17] KVM: arm/arm64: move redistributor kvm_io_devices Andre Przywara
                   ` (18 more replies)
  0 siblings, 19 replies; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

this series allows those KVM guests that use an emulated GICv3 to use LPIs
as well, though at the moment this is limited to emulated PCI devices.
This is based on kvmarm/queue, which now only features the new VGIC
implementation.

This time there are only smaller corrections to the KVM ITS emulation support:
I addressed the review comments, which pointed out some missing
vgic_put_irq() calls. The GICv2 init sequence has also changed, so that
we can now bail out of a KVM_DEVICE init without leaking a HYP mapping.
A bug in the MAPC emulation was fixed as well, which had allowed multiple
mappings of the same collection ID.
The KVM_DEVICE init sequence now has some checks to ensure the right
order. The requirements are a bit stricter than for the GICv2/GICv3
devices: we need to set up the mapping address before calling the
INIT ioctl. This apparently has some implications for QEMU; I still need
to be convinced that we should follow QEMU's approach. It seems a bit
ugly to stash the ITS init into the existing GICv3 code, especially
since the ITS is a separate, optional device.

You can find all of this code (and the prerequisites) in the
its-emul/v8 branch of my repository [1].
This has been briefly tested on the model and on GICv3 hardware.
If you have GICv3 capable hardware, please test it on your setup.
Also of course any review comments are very welcome!

Cheers,
Andre.

Changelog v7..v8:
- rebase on old-VGIC removal patch
- add missing vgic_put_irq()s
- check and ensure proper ITS initialisation sequence
- avoid double collection mapping
- rename vits_ function prefixes to vgic_its_
- properly set up PENDBASER (now for the new VGIC)
- change vgic_v2_probe init order to allow clean exit

Changelog v6..v7:
- use kref reference counting
- remove RCU usage from lpi_list, use spinlock instead
- copy list of LPIs before accessing guest memory
- introduce kvm_io_bus_get_dev()
- refactor parts of arm-gic-v3.h header file
- provide proper initial values for redistributor and ITS base registers
- rework sanitisation of base registers
- rework VGIC MMIO dispatching to differentiate between VGIC parts
- smaller fixes, also comments and commit messages amended

Changelog v5..v6:
- remove its_list from VGIC code
- add lpi_list and accessor functions
- introduce reference counting to struct vgic_irq
- replace its_lock spinlock with its_cmd and its_lock mutexes
- simplify guest memory accesses (due to the new mutexes)
- avoid unnecessary affinity updates
- refine base register address masking
- introduce sanity checks for PROPBASER and PENDBASER
- implement BASER<n> registers
- pass struct vgic_its directly into the MMIO handlers
- convert KVM_SIGNAL_MSI ioctl into an MMIO write
- add explicit INIT ioctl to the ITS KVM device
- adjust comments and commit messages

Changelog v4..v5:
- adapting to final new VGIC (MMIO handlers, etc.)
- new KVM device to model an ITS, multiple instances allowed
- move redistributor data into struct vgic_cpu
- separate distributor and ITS(es)
- various bug fixes and amended comments after review comments

Changelog v3..v4:
- adapting to new VGIC (changes in IRQ injection mechanism)

Changelog v2..v3:
- adapt to 4.3-rc and Christoffer's timer rework
- adapt spin locks on handling PROPBASER/PENDBASER registers
- rework locking in ITS command handling (dropping dist where needed)
- only clear LPI pending bit if LPI could actually be queued
- simplify GICR_CTLR handling
- properly free ITTEs (including our pending bitmap)
- fix corner cases with unmapped collections
- keep retire_lr() around
- rename vgic_handle_base_register to vgic_reg64_access()
- use kcalloc instead of kmalloc
- minor fixes, renames and added comments

Changelog v1..v2:
- fix issues when using non-ITS GICv3 emulation
- streamline frame address initialization (new patch 05/15)
- preallocate buffer memory for reading from guest's memory
- move locking into the actual command handlers
- preallocate memory for new structures if needed
- use non-atomic __set_bit() and __clear_bit() when under the lock
- add INT command handler to allow LPI injection from the guest
- rewrite CWRITER handler to align with new locking scheme
- remove unneeded CONFIG_HAVE_KVM_MSI #ifdefs
- check memory table size against our LPI limit (65536 interrupts)
- observe initial gap of 1024 interrupts in pending table
- use term "configuration table" to be in line with the spec
- clarify and extend documentation on API extensions
- introduce new KVM_CAP_MSI_DEVID capability to advertise device ID requirement
- update, fix and add many comments
- minor style changes as requested by reviewers

---------------

The GICv3 ITS (Interrupt Translation Service) is a part of the
ARM GICv3 interrupt controller [3] used for implementing MSIs.
It specifies a new kind of interrupt (LPIs), which are mapped to
establish a connection between a device, its MSI payload value and
the target processor the IRQ is eventually delivered to.
In order to allow using MSIs in an ARM64 KVM guest, we emulate this
ITS widget in the kernel.
The ITS works by reading commands written by software (from the guest
in our case) into a (guest allocated) memory region and establishing
the mapping between a device, the MSI payload and the target CPU.
We parse these commands and update our internal data structures to
reflect those changes. On an MSI injection we iterate those
structures to learn the LPI number we have to inject.
For the time being we use simple lists to hold the data; this is
good enough for the small number of entries each of the components
currently has. Should this become a performance bottleneck in the
future, those lists can be replaced with arrays or trees if needed.

Most of the code lives in a separate source file (vgic-its.c), though
there are some changes necessary in the existing VGIC files.

For the time being this series gives us the ability to use emulated
PCI devices that can use MSIs in the guest. Those have to be
triggered by letting the userland device emulation simulate the MSI
write with the KVM_SIGNAL_MSI ioctl. This will be translated into
the proper LPI by the ITS emulation and injected into the guest in
the usual way (just with a higher IRQ number).

This series is based on kvmarm/queue and can be found at the
its-emul/v8 branch of this repository [1].
For this to be used you need a GICv3 host machine (a fast model would
do), though it does not rely on any host ITS bits (neither in hardware
nor in software).

To test this you can use the kvmtool patches available in the "its-v6"
branch here [2].
Start a guest with: "$ lkvm run --irqchip=gicv3-its --force-pci"
and see the ITS being used for instance by the virtio devices.

[1]: git://linux-arm.org/linux-ap.git
     http://www.linux-arm.org/git?p=linux-ap.git;a=log;h=refs/heads/its-emul/v8
[2]: git://linux-arm.org/kvmtool.git
     http://www.linux-arm.org/git?p=kvmtool.git;a=log;h=refs/heads/its-v6
[3]: http://arminfo.emea.arm.com/help/topic/com.arm.doc.ihi0069a/IHI0069A_gic_architecture_specification.pdf

Andre Przywara (17):
  KVM: arm/arm64: move redistributor kvm_io_devices
  KVM: arm/arm64: check return value for kvm_register_vgic_device
  KVM: extend struct kvm_msi to hold a 32-bit device ID
  KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities
  KVM: kvm_io_bus: add kvm_io_bus_get_dev() call
  KVM: arm/arm64: VGIC: add refcounting for IRQs
  irqchip: refactor and add GICv3 definitions
  KVM: arm64: handle ITS related GICv3 redistributor registers
  KVM: arm64: introduce ITS emulation file with MMIO framework
  KVM: arm64: introduce new KVM ITS device
  KVM: arm64: implement basic ITS register handlers
  KVM: arm64: connect LPIs to the VGIC emulation
  KVM: arm64: read initial LPI pending table
  KVM: arm64: allow updates of LPI configuration table
  KVM: arm64: implement ITS command queue command handlers
  KVM: arm64: implement MSI injection in ITS emulation
  KVM: arm64: enable ITS emulation as a virtual MSI controller

 Documentation/virtual/kvm/api.txt              |   14 +-
 Documentation/virtual/kvm/devices/arm-vgic.txt |   25 +-
 arch/arm/include/asm/kvm_host.h                |    2 +-
 arch/arm/kvm/arm.c                             |    3 +-
 arch/arm64/include/asm/kvm_host.h              |    2 +-
 arch/arm64/include/uapi/asm/kvm.h              |    2 +
 arch/arm64/kvm/Kconfig                         |    1 +
 arch/arm64/kvm/Makefile                        |    1 +
 arch/arm64/kvm/reset.c                         |    8 +-
 include/kvm/arm_vgic.h                         |   66 +-
 include/linux/irqchip/arm-gic-v3.h             |  165 ++-
 include/linux/kvm_host.h                       |    2 +
 include/uapi/linux/kvm.h                       |    7 +-
 virt/kvm/arm/vgic/vgic-init.c                  |    9 +-
 virt/kvm/arm/vgic/vgic-its.c                   | 1425 ++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-kvm-device.c            |   22 +-
 virt/kvm/arm/vgic/vgic-mmio-v2.c               |   48 +-
 virt/kvm/arm/vgic/vgic-mmio-v3.c               |  301 ++++-
 virt/kvm/arm/vgic/vgic-mmio.c                  |   61 +-
 virt/kvm/arm/vgic/vgic-mmio.h                  |   45 +-
 virt/kvm/arm/vgic/vgic-v2.c                    |   12 +-
 virt/kvm/arm/vgic/vgic-v3.c                    |   29 +-
 virt/kvm/arm/vgic/vgic.c                       |  108 +-
 virt/kvm/arm/vgic/vgic.h                       |   37 +-
 virt/kvm/kvm_main.c                            |   24 +
 25 files changed, 2216 insertions(+), 203 deletions(-)
 create mode 100644 virt/kvm/arm/vgic/vgic-its.c

-- 
2.9.0

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 01/17] KVM: arm/arm64: move redistributor kvm_io_devices
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
@ 2016-07-05 11:22 ` Andre Przywara
  2016-07-05 11:22 ` [PATCH v8 02/17] KVM: arm/arm64: check return value for kvm_register_vgic_device Andre Przywara
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

Logically a GICv3 redistributor is assigned to a (v)CPU, so we should
aim to keep redistributor related variables out of our struct vgic_dist.

Let's start by replacing the redistributor related kvm_io_device array
with two members in our existing struct vgic_cpu, which are naturally
per-VCPU and thus don't require any allocation / freeing.
So apart from being a better fit with the redistributor design, this
saves some code as well.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 include/kvm/arm_vgic.h           |  8 +++++++-
 virt/kvm/arm/vgic/vgic-init.c    |  1 -
 virt/kvm/arm/vgic/vgic-mmio-v3.c | 22 ++++++++--------------
 3 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 1264037..5142e2a 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -145,7 +145,6 @@ struct vgic_dist {
 	struct vgic_irq		*spis;
 
 	struct vgic_io_device	dist_iodev;
-	struct vgic_io_device	*redist_iodevs;
 };
 
 struct vgic_v2_cpu_if {
@@ -193,6 +192,13 @@ struct vgic_cpu {
 	struct list_head ap_list_head;
 
 	u64 live_lrs;
+
+	/*
+	 * Members below are used with GICv3 emulation only and represent
+	 * parts of the redistributor.
+	 */
+	struct vgic_io_device	rd_iodev;
+	struct vgic_io_device	sgi_iodev;
 };
 
 int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index a1442f7..90cae48 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -271,7 +271,6 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
 	dist->initialized = false;
 
 	kfree(dist->spis);
-	kfree(dist->redist_iodevs);
 	dist->nr_spis = 0;
 
 	mutex_unlock(&kvm->lock);
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index a0c515a..fc7b6c9 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -285,21 +285,14 @@ unsigned int vgic_v3_init_dist_iodev(struct vgic_io_device *dev)
 
 int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t redist_base_address)
 {
-	int nr_vcpus = atomic_read(&kvm->online_vcpus);
 	struct kvm_vcpu *vcpu;
-	struct vgic_io_device *devices;
 	int c, ret = 0;
 
-	devices = kmalloc(sizeof(struct vgic_io_device) * nr_vcpus * 2,
-			  GFP_KERNEL);
-	if (!devices)
-		return -ENOMEM;
-
 	kvm_for_each_vcpu(c, vcpu, kvm) {
 		gpa_t rd_base = redist_base_address + c * SZ_64K * 2;
 		gpa_t sgi_base = rd_base + SZ_64K;
-		struct vgic_io_device *rd_dev = &devices[c * 2];
-		struct vgic_io_device *sgi_dev = &devices[c * 2 + 1];
+		struct vgic_io_device *rd_dev = &vcpu->arch.vgic_cpu.rd_iodev;
+		struct vgic_io_device *sgi_dev = &vcpu->arch.vgic_cpu.sgi_iodev;
 
 		kvm_iodevice_init(&rd_dev->dev, &kvm_io_gic_ops);
 		rd_dev->base_addr = rd_base;
@@ -335,14 +328,15 @@ int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t redist_base_address)
 	if (ret) {
 		/* The current c failed, so we start with the previous one. */
 		for (c--; c >= 0; c--) {
+			struct vgic_cpu *vgic_cpu;
+
+			vcpu = kvm_get_vcpu(kvm, c);
+			vgic_cpu = &vcpu->arch.vgic_cpu;
 			kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS,
-						  &devices[c * 2].dev);
+						  &vgic_cpu->rd_iodev.dev);
 			kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS,
-						  &devices[c * 2 + 1].dev);
+						  &vgic_cpu->sgi_iodev.dev);
 		}
-		kfree(devices);
-	} else {
-		kvm->arch.vgic.redist_iodevs = devices;
 	}
 
 	return ret;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 02/17] KVM: arm/arm64: check return value for kvm_register_vgic_device
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
  2016-07-05 11:22 ` [PATCH v8 01/17] KVM: arm/arm64: move redistributor kvm_io_devices Andre Przywara
@ 2016-07-05 11:22 ` Andre Przywara
  2016-07-05 11:22 ` [PATCH v8 03/17] KVM: extend struct kvm_msi to hold a 32-bit device ID Andre Przywara
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

kvm_register_device_ops() can return an error, so let's check its return
value and propagate it up the call chain.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 virt/kvm/arm/vgic/vgic-kvm-device.c | 15 +++++++++------
 virt/kvm/arm/vgic/vgic-v2.c         | 11 ++++++++---
 virt/kvm/arm/vgic/vgic-v3.c         | 15 +++++++++++++--
 virt/kvm/arm/vgic/vgic.h            |  2 +-
 4 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-kvm-device.c b/virt/kvm/arm/vgic/vgic-kvm-device.c
index 0130c4b..2f24f13 100644
--- a/virt/kvm/arm/vgic/vgic-kvm-device.c
+++ b/virt/kvm/arm/vgic/vgic-kvm-device.c
@@ -210,20 +210,24 @@ static void vgic_destroy(struct kvm_device *dev)
 	kfree(dev);
 }
 
-void kvm_register_vgic_device(unsigned long type)
+int kvm_register_vgic_device(unsigned long type)
 {
+	int ret = -ENODEV;
+
 	switch (type) {
 	case KVM_DEV_TYPE_ARM_VGIC_V2:
-		kvm_register_device_ops(&kvm_arm_vgic_v2_ops,
-					KVM_DEV_TYPE_ARM_VGIC_V2);
+		ret = kvm_register_device_ops(&kvm_arm_vgic_v2_ops,
+					      KVM_DEV_TYPE_ARM_VGIC_V2);
 		break;
 #ifdef CONFIG_KVM_ARM_VGIC_V3
 	case KVM_DEV_TYPE_ARM_VGIC_V3:
-		kvm_register_device_ops(&kvm_arm_vgic_v3_ops,
-					KVM_DEV_TYPE_ARM_VGIC_V3);
+		ret = kvm_register_device_ops(&kvm_arm_vgic_v3_ops,
+					      KVM_DEV_TYPE_ARM_VGIC_V3);
 		break;
 #endif
 	}
+
+	return ret;
 }
 
 /** vgic_attr_regs_access: allows user space to read/write VGIC registers
@@ -428,4 +432,3 @@ struct kvm_device_ops kvm_arm_vgic_v3_ops = {
 };
 
 #endif /* CONFIG_KVM_ARM_VGIC_V3 */
-
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index e31405e..079bf67 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -332,20 +332,25 @@ int vgic_v2_probe(const struct gic_kvm_info *info)
 	vtr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VTR);
 	kvm_vgic_global_state.nr_lr = (vtr & 0x3f) + 1;
 
+	ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V2);
+	if (ret) {
+		kvm_err("Cannot register GICv2 KVM device\n");
+		iounmap(kvm_vgic_global_state.vctrl_base);
+		return ret;
+	}
+
 	ret = create_hyp_io_mappings(kvm_vgic_global_state.vctrl_base,
 				     kvm_vgic_global_state.vctrl_base +
 					 resource_size(&info->vctrl),
 				     info->vctrl.start);
-
 	if (ret) {
 		kvm_err("Cannot map VCTRL into hyp\n");
+		kvm_unregister_device_ops(KVM_DEV_TYPE_ARM_VGIC_V2);
 		iounmap(kvm_vgic_global_state.vctrl_base);
 		return ret;
 	}
 
 	kvm_vgic_global_state.can_emulate_gicv2 = true;
-	kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V2);
-
 	kvm_vgic_global_state.vcpu_base = info->vcpu.start;
 	kvm_vgic_global_state.type = VGIC_V2;
 	kvm_vgic_global_state.max_gic_vcpus = VGIC_V2_MAX_CPUS;
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 346b4ad..e48a22e 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -296,6 +296,7 @@ out:
 int vgic_v3_probe(const struct gic_kvm_info *info)
 {
 	u32 ich_vtr_el2 = kvm_call_hyp(__vgic_v3_get_ich_vtr_el2);
+	int ret;
 
 	/*
 	 * The ListRegs field is 5 bits, but there is a architectural
@@ -319,12 +320,22 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
 	} else {
 		kvm_vgic_global_state.vcpu_base = info->vcpu.start;
 		kvm_vgic_global_state.can_emulate_gicv2 = true;
-		kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V2);
+		ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V2);
+		if (ret) {
+			kvm_err("Cannot register GICv2 KVM device.\n");
+			return ret;
+		}
 		kvm_info("vgic-v2@%llx\n", info->vcpu.start);
 	}
+	ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V3);
+	if (ret) {
+		kvm_err("Cannot register GICv3 KVM device.\n");
+		kvm_unregister_device_ops(KVM_DEV_TYPE_ARM_VGIC_V2);
+		return ret;
+	}
+
 	if (kvm_vgic_global_state.vcpu_base == 0)
 		kvm_info("disabling GICv2 emulation\n");
-	kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V3);
 
 	kvm_vgic_global_state.vctrl_base = NULL;
 	kvm_vgic_global_state.type = VGIC_V3;
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index 7b300ca..c752152 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -124,7 +124,7 @@ static inline int vgic_register_redist_iodevs(struct kvm *kvm,
 }
 #endif
 
-void kvm_register_vgic_device(unsigned long type);
+int kvm_register_vgic_device(unsigned long type);
 int vgic_lazy_init(struct kvm *kvm);
 int vgic_init(struct kvm *kvm);
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 03/17] KVM: extend struct kvm_msi to hold a 32-bit device ID
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
  2016-07-05 11:22 ` [PATCH v8 01/17] KVM: arm/arm64: move redistributor kvm_io_devices Andre Przywara
  2016-07-05 11:22 ` [PATCH v8 02/17] KVM: arm/arm64: check return value for kvm_register_vgic_device Andre Przywara
@ 2016-07-05 11:22 ` Andre Przywara
  2016-07-06 21:06   ` Christoffer Dall
  2016-07-05 11:22 ` [PATCH v8 04/17] KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities Andre Przywara
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

The ARM GICv3 ITS MSI controller requires a device ID to be able to
assign the proper interrupt vector. On real hardware, this ID is
sampled from the bus. To be able to emulate an ITS controller, extend
the KVM MSI interface to let userspace provide such a device ID. For
PCI devices, the device ID is simply the 16-bit bus-device-function
triplet, which should be easily available to the userland tool.

There is also a new KVM capability which advertises whether the
current VM requires a device ID to be set along with the MSI data.
This flag is still reported as not available everywhere; we will
enable it later when ITS emulation is used.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
---
 Documentation/virtual/kvm/api.txt | 12 ++++++++++--
 include/uapi/linux/kvm.h          |  5 ++++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 09efa9e..6551311 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2175,10 +2175,18 @@ struct kvm_msi {
 	__u32 address_hi;
 	__u32 data;
 	__u32 flags;
-	__u8  pad[16];
+	__u32 devid;
+	__u8  pad[12];
 };
 
-No flags are defined so far. The corresponding field must be 0.
+flags: KVM_MSI_VALID_DEVID: devid contains a valid value
+devid: If KVM_MSI_VALID_DEVID is set, contains a unique device identifier
+       for the device that wrote the MSI message.
+       For PCI, this is usually a BFD identifier in the lower 16 bits.
+
+The per-VM KVM_CAP_MSI_DEVID capability advertises the need to provide
+the device ID. If this capability is not set, userland cannot rely on
+the kernel to allow the KVM_MSI_VALID_DEVID flag being set.
 
 
 4.71 KVM_CREATE_PIT2
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 05ebf47..7de96f5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_ARM_PMU_V3 126
 #define KVM_CAP_VCPU_ATTRIBUTES 127
 #define KVM_CAP_MAX_VCPU_ID 128
+#define KVM_CAP_MSI_DEVID 129
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1024,12 +1025,14 @@ struct kvm_one_reg {
 	__u64 addr;
 };
 
+#define KVM_MSI_VALID_DEVID	(1U << 0)
 struct kvm_msi {
 	__u32 address_lo;
 	__u32 address_hi;
 	__u32 data;
 	__u32 flags;
-	__u8  pad[16];
+	__u32 devid;
+	__u8  pad[12];
 };
 
 struct kvm_arm_device_addr {
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 04/17] KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (2 preceding siblings ...)
  2016-07-05 11:22 ` [PATCH v8 03/17] KVM: extend struct kvm_msi to hold a 32-bit device ID Andre Przywara
@ 2016-07-05 11:22 ` Andre Przywara
  2016-07-05 11:22 ` [PATCH v8 05/17] KVM: kvm_io_bus: add kvm_io_bus_get_dev() call Andre Przywara
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

KVM capabilities can be a per-VM property, though ARM/ARM64 currently
does not pass the VM pointer on to the architecture specific
capability handlers.
Add a "struct kvm *" parameter to those functions to later allow proper
per-VM capability reporting.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_host.h   | 2 +-
 arch/arm/kvm/arm.c                | 2 +-
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/kvm/reset.c            | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 96387d4..3c40facd 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -281,7 +281,7 @@ static inline void __cpu_reset_hyp_mode(phys_addr_t boot_pgd_ptr,
 	 */
 }
 
-static inline int kvm_arch_dev_ioctl_check_extension(long ext)
+static inline int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	return 0;
 }
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c74483f..557e390 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -201,7 +201,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_MAX_VCPUS;
 		break;
 	default:
-		r = kvm_arch_dev_ioctl_check_extension(ext);
+		r = kvm_arch_dev_ioctl_check_extension(kvm, ext);
 		break;
 	}
 	return r;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 49095fc..ebe8904 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -47,7 +47,7 @@
 
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
-int kvm_arch_dev_ioctl_check_extension(long ext);
+int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext);
 unsigned long kvm_hyp_reset_entry(void);
 void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
 
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 7be24f2..3989833 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -65,7 +65,7 @@ static bool cpu_has_32bit_el1(void)
  * We currently assume that the number of HW registers is uniform
  * across all CPUs (see cpuinfo_sanity_check).
  */
-int kvm_arch_dev_ioctl_check_extension(long ext)
+int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 {
 	int r;
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 05/17] KVM: kvm_io_bus: add kvm_io_bus_get_dev() call
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (3 preceding siblings ...)
  2016-07-05 11:22 ` [PATCH v8 04/17] KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities Andre Przywara
@ 2016-07-05 11:22 ` Andre Przywara
  2016-07-06 21:15   ` Christoffer Dall
  2016-07-05 11:22 ` [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs Andre Przywara
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

The kvm_io_bus framework is a nice place to hold information about the
various MMIO regions of kernel-emulated devices.
Add a call to retrieve the kvm_io_device structure which is associated
with a certain MMIO address. This avoids duplicating kvm_io_bus'
knowledge of MMIO regions and saves users from having to fake MMIO
calls just to find the device a certain MMIO address belongs to.
This will be used by the ITS emulation to get the associated ITS device
when someone triggers an MSI via an ioctl from userspace.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 include/linux/kvm_host.h |  2 ++
 virt/kvm/kvm_main.c      | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0640ee9..614a981 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -164,6 +164,8 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 			    int len, struct kvm_io_device *dev);
 int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 			      struct kvm_io_device *dev);
+struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+					 gpa_t addr);
 
 #ifdef CONFIG_KVM_ASYNC_PF
 struct kvm_async_pf {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ef54b4c..bd2eb92 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3496,6 +3496,30 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 	return r;
 }
 
+struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
+					 gpa_t addr)
+{
+	struct kvm_io_bus *bus;
+	int dev_idx, srcu_idx;
+	struct kvm_io_device *iodev = NULL;
+
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+
+	bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
+
+	dev_idx = kvm_io_bus_get_first_dev(bus, addr, 1);
+	if (dev_idx < 0)
+		goto out_unlock;
+
+	iodev = bus->range[dev_idx].dev;
+
+out_unlock:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+
+	return iodev;
+}
+EXPORT_SYMBOL_GPL(kvm_io_bus_get_dev);
+
 static struct notifier_block kvm_cpu_notifier = {
 	.notifier_call = kvm_cpu_hotplug,
 };
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (4 preceding siblings ...)
  2016-07-05 11:22 ` [PATCH v8 05/17] KVM: kvm_io_bus: add kvm_io_bus_get_dev() call Andre Przywara
@ 2016-07-05 11:22 ` Andre Przywara
  2016-07-07 13:13   ` Christoffer Dall
  2016-07-07 15:00   ` Marc Zyngier
  2016-07-05 11:22 ` [PATCH v8 07/17] irqchip: refactor and add GICv3 definitions Andre Przywara
                   ` (12 subsequent siblings)
  18 siblings, 2 replies; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

At the moment our struct vgic_irq instances are statically allocated at
guest creation time, so getting a pointer to an IRQ structure is trivial
and safe. LPIs are more dynamic: they can be mapped and unmapped at any
time during the guest's _runtime_.
In preparation for supporting LPIs we introduce reference counting for
those structures using the kernel's kref infrastructure.
Since private IRQs and SPIs are statically allocated, the refcount never
drops to 0 at the moment, but we increase it whenever an IRQ gets onto a
VCPU list and decrease it when it is removed from that list.
This introduces vgic_put_irq(), which wraps kref_put() and hides the
release function from the callers.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 include/kvm/arm_vgic.h           |  1 +
 virt/kvm/arm/vgic/vgic-init.c    |  2 ++
 virt/kvm/arm/vgic/vgic-mmio-v2.c |  8 +++++++
 virt/kvm/arm/vgic/vgic-mmio-v3.c | 20 +++++++++++------
 virt/kvm/arm/vgic/vgic-mmio.c    | 25 ++++++++++++++++++++-
 virt/kvm/arm/vgic/vgic-v2.c      |  1 +
 virt/kvm/arm/vgic/vgic-v3.c      |  1 +
 virt/kvm/arm/vgic/vgic.c         | 48 +++++++++++++++++++++++++++++++---------
 virt/kvm/arm/vgic/vgic.h         |  1 +
 9 files changed, 89 insertions(+), 18 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 5142e2a..450b4da 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -96,6 +96,7 @@ struct vgic_irq {
 	bool active;			/* not used for LPIs */
 	bool enabled;
 	bool hw;			/* Tied to HW IRQ */
+	struct kref refcount;		/* Used for LPIs */
 	u32 hwintid;			/* HW INTID number */
 	union {
 		u8 targets;			/* GICv2 target VCPUs mask */
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 90cae48..ac3c1a5 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -177,6 +177,7 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
 		spin_lock_init(&irq->irq_lock);
 		irq->vcpu = NULL;
 		irq->target_vcpu = vcpu0;
+		kref_init(&irq->refcount);
 		if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
 			irq->targets = 0;
 		else
@@ -211,6 +212,7 @@ static void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
 		irq->vcpu = NULL;
 		irq->target_vcpu = vcpu;
 		irq->targets = 1U << vcpu->vcpu_id;
+		kref_init(&irq->refcount);
 		if (vgic_irq_is_sgi(i)) {
 			/* SGIs */
 			irq->enabled = 1;
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
index a213936..4152348 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
@@ -102,6 +102,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
 		irq->source |= 1U << source_vcpu->vcpu_id;
 
 		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
+		vgic_put_irq(source_vcpu->kvm, irq);
 	}
 }
 
@@ -116,6 +117,8 @@ static unsigned long vgic_mmio_read_target(struct kvm_vcpu *vcpu,
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
 		val |= (u64)irq->targets << (i * 8);
+
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 
 	return val;
@@ -143,6 +146,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
 		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
 
 		spin_unlock(&irq->irq_lock);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
@@ -157,6 +161,8 @@ static unsigned long vgic_mmio_read_sgipend(struct kvm_vcpu *vcpu,
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
 		val |= (u64)irq->source << (i * 8);
+
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 	return val;
 }
@@ -178,6 +184,7 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
 			irq->pending = false;
 
 		spin_unlock(&irq->irq_lock);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
@@ -201,6 +208,7 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
 		} else {
 			spin_unlock(&irq->irq_lock);
 		}
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index fc7b6c9..bfcafbd 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -80,15 +80,17 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
 {
 	int intid = VGIC_ADDR_TO_INTID(addr, 64);
 	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
+	unsigned long ret = 0;
 
 	if (!irq)
 		return 0;
 
 	/* The upper word is RAZ for us. */
-	if (addr & 4)
-		return 0;
+	if (!(addr & 4))
+		ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
 
-	return extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
+	vgic_put_irq(vcpu->kvm, irq);
+	return ret;
 }
 
 static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
@@ -96,15 +98,17 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
 				    unsigned long val)
 {
 	int intid = VGIC_ADDR_TO_INTID(addr, 64);
-	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
-
-	if (!irq)
-		return;
+	struct vgic_irq *irq;
 
 	/* The upper word is WI for us since we don't implement Aff3. */
 	if (addr & 4)
 		return;
 
+	irq = vgic_get_irq(vcpu->kvm, NULL, intid);
+
+	if (!irq)
+		return;
+
 	spin_lock(&irq->irq_lock);
 
 	/* We only care about and preserve Aff0, Aff1 and Aff2. */
@@ -112,6 +116,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
 	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
 
 	spin_unlock(&irq->irq_lock);
+	vgic_put_irq(vcpu->kvm, irq);
 }
 
 static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
@@ -445,5 +450,6 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
 		irq->pending = true;
 
 		vgic_queue_irq_unlock(vcpu->kvm, irq);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index 9f6fab7..5e79e01 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -56,6 +56,8 @@ unsigned long vgic_mmio_read_enable(struct kvm_vcpu *vcpu,
 
 		if (irq->enabled)
 			value |= (1U << i);
+
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 
 	return value;
@@ -74,6 +76,8 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
 		spin_lock(&irq->irq_lock);
 		irq->enabled = true;
 		vgic_queue_irq_unlock(vcpu->kvm, irq);
+
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
@@ -92,6 +96,7 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
 		irq->enabled = false;
 
 		spin_unlock(&irq->irq_lock);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
@@ -108,6 +113,8 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
 
 		if (irq->pending)
 			value |= (1U << i);
+
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 
 	return value;
@@ -129,6 +136,7 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
 			irq->soft_pending = true;
 
 		vgic_queue_irq_unlock(vcpu->kvm, irq);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
@@ -152,6 +160,7 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
 		}
 
 		spin_unlock(&irq->irq_lock);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
@@ -168,6 +177,8 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
 
 		if (irq->active)
 			value |= (1U << i);
+
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 
 	return value;
@@ -242,6 +253,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
 	for_each_set_bit(i, &val, len * 8) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 		vgic_mmio_change_active(vcpu, irq, false);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 	vgic_change_active_finish(vcpu, intid);
 }
@@ -257,6 +269,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
 	for_each_set_bit(i, &val, len * 8) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 		vgic_mmio_change_active(vcpu, irq, true);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 	vgic_change_active_finish(vcpu, intid);
 }
@@ -272,6 +285,8 @@ unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
 		val |= (u64)irq->priority << (i * 8);
+
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 
 	return val;
@@ -298,6 +313,8 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
 		/* Narrow the priority range to what we actually support */
 		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
 		spin_unlock(&irq->irq_lock);
+
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
@@ -313,6 +330,8 @@ unsigned long vgic_mmio_read_config(struct kvm_vcpu *vcpu,
 
 		if (irq->config == VGIC_CONFIG_EDGE)
 			value |= (2U << (i * 2));
+
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 
 	return value;
@@ -326,7 +345,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
 	int i;
 
 	for (i = 0; i < len * 4; i++) {
-		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
+		struct vgic_irq *irq;
 
 		/*
 		 * The configuration cannot be changed for SGIs in general,
@@ -337,14 +356,18 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
 		if (intid + i < VGIC_NR_PRIVATE_IRQS)
 			continue;
 
+		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 		spin_lock(&irq->irq_lock);
+
 		if (test_bit(i * 2 + 1, &val)) {
 			irq->config = VGIC_CONFIG_EDGE;
 		} else {
 			irq->config = VGIC_CONFIG_LEVEL;
 			irq->pending = irq->line_level | irq->soft_pending;
 		}
+
 		spin_unlock(&irq->irq_lock);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index 079bf67..0bf6709 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -124,6 +124,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
 		}
 
 		spin_unlock(&irq->irq_lock);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index e48a22e..f0ac064 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -113,6 +113,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 		}
 
 		spin_unlock(&irq->irq_lock);
+		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
 
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 69b61ab..ae80894 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -48,13 +48,20 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
 struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
 			      u32 intid)
 {
-	/* SGIs and PPIs */
-	if (intid <= VGIC_MAX_PRIVATE)
-		return &vcpu->arch.vgic_cpu.private_irqs[intid];
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	struct vgic_irq *irq;
 
-	/* SPIs */
-	if (intid <= VGIC_MAX_SPI)
-		return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
+	if (intid <= VGIC_MAX_PRIVATE) {        /* SGIs and PPIs */
+		irq = &vcpu->arch.vgic_cpu.private_irqs[intid];
+		kref_get(&irq->refcount);
+		return irq;
+	}
+
+	if (intid <= VGIC_MAX_SPI) {            /* SPIs */
+		irq = &dist->spis[intid - VGIC_NR_PRIVATE_IRQS];
+		kref_get(&irq->refcount);
+		return irq;
+	}
 
 	/* LPIs are not yet covered */
 	if (intid >= VGIC_MIN_LPI)
@@ -64,6 +71,17 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
 	return NULL;
 }
 
+/* The refcount should never drop to 0 at the moment. */
+static void vgic_irq_release(struct kref *ref)
+{
+	WARN_ON(1);
+}
+
+void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
+{
+	kref_put(&irq->refcount, vgic_irq_release);
+}
+
 /**
  * kvm_vgic_target_oracle - compute the target vcpu for an irq
  *
@@ -236,6 +254,7 @@ retry:
 		goto retry;
 	}
 
+	kref_get(&irq->refcount);
 	list_add_tail(&irq->ap_list, &vcpu->arch.vgic_cpu.ap_list_head);
 	irq->vcpu = vcpu;
 
@@ -269,14 +288,17 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 	if (!irq)
 		return -EINVAL;
 
-	if (irq->hw != mapped_irq)
+	if (irq->hw != mapped_irq) {
+		vgic_put_irq(kvm, irq);
 		return -EINVAL;
+	}
 
 	spin_lock(&irq->irq_lock);
 
 	if (!vgic_validate_injection(irq, level)) {
 		/* Nothing to see here, move along... */
 		spin_unlock(&irq->irq_lock);
+		vgic_put_irq(kvm, irq);
 		return 0;
 	}
 
@@ -288,6 +310,7 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
 	}
 
 	vgic_queue_irq_unlock(kvm, irq);
+	vgic_put_irq(kvm, irq);
 
 	return 0;
 }
@@ -330,25 +353,28 @@ int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
 	irq->hwintid = phys_irq;
 
 	spin_unlock(&irq->irq_lock);
+	vgic_put_irq(vcpu->kvm, irq);
 
 	return 0;
 }
 
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
 {
-	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
-
-	BUG_ON(!irq);
+	struct vgic_irq *irq;
 
 	if (!vgic_initialized(vcpu->kvm))
 		return -EAGAIN;
 
+	irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
+	BUG_ON(!irq);
+
 	spin_lock(&irq->irq_lock);
 
 	irq->hw = false;
 	irq->hwintid = 0;
 
 	spin_unlock(&irq->irq_lock);
+	vgic_put_irq(vcpu->kvm, irq);
 
 	return 0;
 }
@@ -386,6 +412,7 @@ retry:
 			list_del(&irq->ap_list);
 			irq->vcpu = NULL;
 			spin_unlock(&irq->irq_lock);
+			vgic_put_irq(vcpu->kvm, irq);
 			continue;
 		}
 
@@ -614,6 +641,7 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
 	spin_lock(&irq->irq_lock);
 	map_is_active = irq->hw && irq->active;
 	spin_unlock(&irq->irq_lock);
+	vgic_put_irq(vcpu->kvm, irq);
 
 	return map_is_active;
 }
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index c752152..5b79c34 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -38,6 +38,7 @@ struct vgic_vmcr {
 
 struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
 			      u32 intid);
+void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
 bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
 void vgic_kick_vcpus(struct kvm *kvm);
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 07/17] irqchip: refactor and add GICv3 definitions
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (5 preceding siblings ...)
  2016-07-05 11:22 ` [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs Andre Przywara
@ 2016-07-05 11:22 ` Andre Przywara
  2016-07-05 11:23 ` [PATCH v8 08/17] KVM: arm64: handle ITS related GICv3 redistributor registers Andre Przywara
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:22 UTC (permalink / raw)
  To: linux-arm-kernel

arm-gic-v3.h contains bit and register definitions for the GICv3 and ITS,
at least for the bits that we currently care about.
The ITS emulation needs more definitions, so add them and refactor
the memory attribute #defines to be more universally usable.
To avoid changing all users, we still provide some of the old definitions,
now expressed with the help of the new macros.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 include/linux/irqchip/arm-gic-v3.h | 165 +++++++++++++++++++++++--------------
 1 file changed, 105 insertions(+), 60 deletions(-)

diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index bfbd707..2699aa0 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -112,34 +112,53 @@
 #define GICR_WAKER_ProcessorSleep	(1U << 1)
 #define GICR_WAKER_ChildrenAsleep	(1U << 2)
 
-#define GICR_PROPBASER_NonShareable	(0U << 10)
-#define GICR_PROPBASER_InnerShareable	(1U << 10)
-#define GICR_PROPBASER_OuterShareable	(2U << 10)
-#define GICR_PROPBASER_SHAREABILITY_MASK (3UL << 10)
-#define GICR_PROPBASER_nCnB		(0U << 7)
-#define GICR_PROPBASER_nC		(1U << 7)
-#define GICR_PROPBASER_RaWt		(2U << 7)
-#define GICR_PROPBASER_RaWb		(3U << 7)
-#define GICR_PROPBASER_WaWt		(4U << 7)
-#define GICR_PROPBASER_WaWb		(5U << 7)
-#define GICR_PROPBASER_RaWaWt		(6U << 7)
-#define GICR_PROPBASER_RaWaWb		(7U << 7)
-#define GICR_PROPBASER_CACHEABILITY_MASK (7U << 7)
-#define GICR_PROPBASER_IDBITS_MASK	(0x1f)
-
-#define GICR_PENDBASER_NonShareable	(0U << 10)
-#define GICR_PENDBASER_InnerShareable	(1U << 10)
-#define GICR_PENDBASER_OuterShareable	(2U << 10)
-#define GICR_PENDBASER_SHAREABILITY_MASK (3UL << 10)
-#define GICR_PENDBASER_nCnB		(0U << 7)
-#define GICR_PENDBASER_nC		(1U << 7)
-#define GICR_PENDBASER_RaWt		(2U << 7)
-#define GICR_PENDBASER_RaWb		(3U << 7)
-#define GICR_PENDBASER_WaWt		(4U << 7)
-#define GICR_PENDBASER_WaWb		(5U << 7)
-#define GICR_PENDBASER_RaWaWt		(6U << 7)
-#define GICR_PENDBASER_RaWaWb		(7U << 7)
-#define GICR_PENDBASER_CACHEABILITY_MASK (7U << 7)
+#define GIC_BASER_CACHE_nCnB		0ULL
+#define GIC_BASER_CACHE_SameAsInner	0ULL
+#define GIC_BASER_CACHE_nC		1ULL
+#define GIC_BASER_CACHE_RaWt		2ULL
+#define GIC_BASER_CACHE_RaWb		3ULL
+#define GIC_BASER_CACHE_WaWt		4ULL
+#define GIC_BASER_CACHE_WaWb		5ULL
+#define GIC_BASER_CACHE_RaWaWt		6ULL
+#define GIC_BASER_CACHE_RaWaWb		7ULL
+#define GIC_BASER_CACHE_MASK		7ULL
+#define GIC_BASER_NonShareable		0ULL
+#define GIC_BASER_InnerShareable	1ULL
+#define GIC_BASER_OuterShareable	2ULL
+#define GIC_BASER_SHAREABILITY_MASK	3ULL
+
+#define GIC_BASER_CACHEABILITY(reg, inner_outer, type)			\
+	(GIC_BASER_CACHE_##type << reg##_##inner_outer##_CACHEABILITY_SHIFT)
+
+#define GIC_BASER_SHAREABILITY(reg, type)				\
+	(GIC_BASER_##type << reg##_SHAREABILITY_SHIFT)
+
+#define GICR_PROPBASER_SHAREABILITY_SHIFT		(10)
+#define GICR_PROPBASER_INNER_CACHEABILITY_SHIFT		(7)
+#define GICR_PROPBASER_OUTER_CACHEABILITY_SHIFT		(56)
+#define GICR_PROPBASER_SHAREABILITY_MASK				\
+	GIC_BASER_SHAREABILITY(GICR_PROPBASER, SHAREABILITY_MASK)
+#define GICR_PROPBASER_CACHEABILITY_MASK				\
+	GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, MASK)
+
+#define GICR_PROPBASER_InnerShareable					\
+	GIC_BASER_SHAREABILITY(GICR_PROPBASER, InnerShareable)
+#define GICR_PROPBASER_nC GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, nC)
+#define GICR_PROPBASER_WaWb GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, WaWb)
+#define GICR_PROPBASER_IDBITS_MASK			(0x1f)
+
+#define GICR_PENDBASER_SHAREABILITY_SHIFT		(10)
+#define GICR_PENDBASER_INNER_CACHEABILITY_SHIFT		(7)
+#define GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT		(56)
+#define GICR_PENDBASER_SHAREABILITY_MASK				\
+	GIC_BASER_SHAREABILITY(GICR_PENDBASER, SHAREABILITY_MASK)
+#define GICR_PENDBASER_CACHEABILITY_MASK				\
+	GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, MASK)
+
+#define GICR_PENDBASER_InnerShareable					\
+	GIC_BASER_SHAREABILITY(GICR_PENDBASER, InnerShareable)
+#define GICR_PENDBASER_nC GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, nC)
+#define GICR_PENDBASER_WaWb GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, WaWb)
 
 /*
  * Re-Distributor registers, offsets from SGI_base
@@ -175,53 +194,62 @@
 #define GITS_CWRITER			0x0088
 #define GITS_CREADR			0x0090
 #define GITS_BASER			0x0100
+#define GITS_IDREGS_BASE		0xffd0
+#define GITS_PIDR0			0xffe0
+#define GITS_PIDR1			0xffe4
 #define GITS_PIDR2			GICR_PIDR2
+#define GITS_PIDR4			0xffd0
+#define GITS_CIDR0			0xfff0
+#define GITS_CIDR1			0xfff4
+#define GITS_CIDR2			0xfff8
+#define GITS_CIDR3			0xfffc
 
 #define GITS_TRANSLATER			0x10040
 
 #define GITS_CTLR_ENABLE		(1U << 0)
 #define GITS_CTLR_QUIESCENT		(1U << 31)
 
+#define GITS_TYPER_PLPIS		(1UL << 0)
+#define GITS_TYPER_IDBITS_SHIFT		8
 #define GITS_TYPER_DEVBITS_SHIFT	13
 #define GITS_TYPER_DEVBITS(r)		((((r) >> GITS_TYPER_DEVBITS_SHIFT) & 0x1f) + 1)
 #define GITS_TYPER_PTA			(1UL << 19)
-
-#define GITS_CBASER_VALID		(1UL << 63)
-#define GITS_CBASER_nCnB		(0UL << 59)
-#define GITS_CBASER_nC			(1UL << 59)
-#define GITS_CBASER_RaWt		(2UL << 59)
-#define GITS_CBASER_RaWb		(3UL << 59)
-#define GITS_CBASER_WaWt		(4UL << 59)
-#define GITS_CBASER_WaWb		(5UL << 59)
-#define GITS_CBASER_RaWaWt		(6UL << 59)
-#define GITS_CBASER_RaWaWb		(7UL << 59)
-#define GITS_CBASER_CACHEABILITY_MASK	(7UL << 59)
-#define GITS_CBASER_NonShareable	(0UL << 10)
-#define GITS_CBASER_InnerShareable	(1UL << 10)
-#define GITS_CBASER_OuterShareable	(2UL << 10)
-#define GITS_CBASER_SHAREABILITY_MASK	(3UL << 10)
+#define GITS_TYPER_HWCOLLCNT_SHIFT	24
+
+#define GITS_CBASER_VALID			(1UL << 63)
+#define GITS_CBASER_SHAREABILITY_SHIFT		(10)
+#define GITS_CBASER_INNER_CACHEABILITY_SHIFT	(59)
+#define GITS_CBASER_OUTER_CACHEABILITY_SHIFT	(53)
+#define GITS_CBASER_SHAREABILITY_MASK					\
+	GIC_BASER_SHAREABILITY(GITS_CBASER, SHAREABILITY_MASK)
+#define GITS_CBASER_CACHEABILITY_MASK					\
+	GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, MASK)
+
+#define GITS_CBASER_InnerShareable					\
+	GIC_BASER_SHAREABILITY(GITS_CBASER, InnerShareable)
+#define GITS_CBASER_nC GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nC)
+#define GITS_CBASER_WaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWb)
 
 #define GITS_BASER_NR_REGS		8
 
-#define GITS_BASER_VALID		(1UL << 63)
-#define GITS_BASER_nCnB			(0UL << 59)
-#define GITS_BASER_nC			(1UL << 59)
-#define GITS_BASER_RaWt			(2UL << 59)
-#define GITS_BASER_RaWb			(3UL << 59)
-#define GITS_BASER_WaWt			(4UL << 59)
-#define GITS_BASER_WaWb			(5UL << 59)
-#define GITS_BASER_RaWaWt		(6UL << 59)
-#define GITS_BASER_RaWaWb		(7UL << 59)
-#define GITS_BASER_CACHEABILITY_MASK	(7UL << 59)
-#define GITS_BASER_TYPE_SHIFT		(56)
+#define GITS_BASER_VALID			(1UL << 63)
+#define GITS_BASER_INDIRECT			(1ULL << 62)
+#define GITS_BASER_INNER_CACHEABILITY_SHIFT	(59)
+#define GITS_BASER_OUTER_CACHEABILITY_SHIFT	(53)
+#define GITS_BASER_INNER_CACHEABILITY_MASK				\
+	GIC_BASER_CACHEABILITY(GITS_BASER, INNER, MASK)
+#define GITS_BASER_SHAREABILITY_MASK					\
+	GIC_BASER_SHAREABILITY(GITS_BASER, SHAREABILITY_MASK)
+
+#define GITS_BASER_nC GIC_BASER_CACHEABILITY(GITS_BASER, INNER, nC)
+#define GITS_BASER_WaWb GIC_BASER_CACHEABILITY(GITS_BASER, INNER, WaWb)
+#define GITS_BASER_TYPE_SHIFT			(56)
 #define GITS_BASER_TYPE(r)		(((r) >> GITS_BASER_TYPE_SHIFT) & 7)
-#define GITS_BASER_ENTRY_SIZE_SHIFT	(48)
+#define GITS_BASER_ENTRY_SIZE_SHIFT		(48)
 #define GITS_BASER_ENTRY_SIZE(r)	((((r) >> GITS_BASER_ENTRY_SIZE_SHIFT) & 0xff) + 1)
-#define GITS_BASER_NonShareable		(0UL << 10)
-#define GITS_BASER_InnerShareable	(1UL << 10)
-#define GITS_BASER_OuterShareable	(2UL << 10)
 #define GITS_BASER_SHAREABILITY_SHIFT	(10)
-#define GITS_BASER_SHAREABILITY_MASK	(3UL << GITS_BASER_SHAREABILITY_SHIFT)
+#define GITS_BASER_InnerShareable					\
+	GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)
 #define GITS_BASER_PAGE_SIZE_SHIFT	(8)
 #define GITS_BASER_PAGE_SIZE_4K		(0UL << GITS_BASER_PAGE_SIZE_SHIFT)
 #define GITS_BASER_PAGE_SIZE_16K	(1UL << GITS_BASER_PAGE_SIZE_SHIFT)
@@ -243,7 +271,10 @@
  */
 #define GITS_CMD_MAPD			0x08
 #define GITS_CMD_MAPC			0x09
-#define GITS_CMD_MAPVI			0x0a
+#define GITS_CMD_MAPTI			0x0a
+/* older GIC documentation used MAPVI for this command */
+#define GITS_CMD_MAPVI			GITS_CMD_MAPTI
+#define GITS_CMD_MAPI			0x0b
 #define GITS_CMD_MOVI			0x01
 #define GITS_CMD_DISCARD		0x0f
 #define GITS_CMD_INV			0x0c
@@ -254,6 +285,20 @@
 #define GITS_CMD_SYNC			0x05
 
 /*
+ * ITS error numbers
+ */
+#define E_ITS_MOVI_UNMAPPED_INTERRUPT           0x010107
+#define E_ITS_MOVI_UNMAPPED_COLLECTION          0x010109
+#define E_ITS_CLEAR_UNMAPPED_INTERRUPT          0x010507
+#define E_ITS_MAPC_PROCNUM_OOR                  0x010902
+#define E_ITS_MAPTI_UNMAPPED_DEVICE             0x010a04
+#define E_ITS_MAPTI_PHYSICALID_OOR              0x010a06
+#define E_ITS_INV_UNMAPPED_INTERRUPT            0x010c07
+#define E_ITS_INVALL_UNMAPPED_COLLECTION        0x010d09
+#define E_ITS_MOVALL_PROCNUM_OOR                0x010e01
+#define E_ITS_DISCARD_UNMAPPED_INTERRUPT        0x010f07
+
+/*
  * CPU interface registers
  */
 #define ICC_CTLR_EL1_EOImode_drop_dir	(0U << 1)
-- 
2.9.0


* [PATCH v8 08/17] KVM: arm64: handle ITS related GICv3 redistributor registers
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (6 preceding siblings ...)
  2016-07-05 11:22 ` [PATCH v8 07/17] irqchip: refactor and add GICv3 definitions Andre Przywara
@ 2016-07-05 11:23 ` Andre Przywara
  2016-07-08 15:40   ` Christoffer Dall
  2016-07-05 11:23 ` [PATCH v8 09/17] KVM: arm64: introduce ITS emulation file with MMIO framework Andre Przywara
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

In the GICv3 redistributor there are the PENDBASER and PROPBASER
registers, which we did not emulate so far, as they only make sense
when an ITS is present. In preparation for that, emulate those MMIO
accesses by storing the 64-bit data written to them in variables
which the ITS emulation later reads.
We also sanitise the registers, making sure RES0 regions are respected
and checking for valid memory attributes.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 include/kvm/arm_vgic.h           |  13 ++++
 virt/kvm/arm/vgic/vgic-mmio-v3.c | 143 ++++++++++++++++++++++++++++++++++++++-
 virt/kvm/arm/vgic/vgic-mmio.h    |   8 +++
 virt/kvm/arm/vgic/vgic-v3.c      |  11 ++-
 4 files changed, 171 insertions(+), 4 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 450b4da..f6f860d 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -146,6 +146,14 @@ struct vgic_dist {
 	struct vgic_irq		*spis;
 
 	struct vgic_io_device	dist_iodev;
+
+	/*
+	 * Contains the address of the LPI configuration table.
+	 * Since we report GICR_TYPER.CommonLPIAff as 0b00, we can share
+	 * one address across all redistributors.
+	 * GICv3 spec: 6.1.2 "LPI Configuration tables"
+	 */
+	u64			propbaser;
 };
 
 struct vgic_v2_cpu_if {
@@ -200,6 +208,11 @@ struct vgic_cpu {
 	 */
 	struct vgic_io_device	rd_iodev;
 	struct vgic_io_device	sgi_iodev;
+
+	/* Points to the LPI pending tables for the redistributor */
+	u64 pendbaser;
+
+	bool lpis_enabled;
 };
 
 int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index bfcafbd..9dd8632 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -29,6 +29,19 @@ static unsigned long extract_bytes(unsigned long data, unsigned int offset,
 	return (data >> (offset * 8)) & GENMASK_ULL(num * 8 - 1, 0);
 }
 
+/* allows updates of any half of a 64-bit register (or the whole thing) */
+static u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
+			    unsigned long val)
+{
+	int lower = (offset & 4) * 8;
+	int upper = lower + 8 * len - 1;
+
+	reg &= ~GENMASK_ULL(upper, lower);
+	val &= GENMASK_ULL(len * 8 - 1, 0);
+
+	return reg | ((u64)val << lower);
+}
+
 static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
 					    gpa_t addr, unsigned int len)
 {
@@ -152,6 +165,132 @@ static unsigned long vgic_mmio_read_v3_idregs(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+/* We want to avoid outer shareable. */
+u64 vgic_sanitise_shareability(u64 reg)
+{
+	switch (reg & GIC_BASER_SHAREABILITY_MASK) {
+	case GIC_BASER_OuterShareable:
+		return GIC_BASER_InnerShareable;
+	default:
+		return reg;
+	}
+}
+
+/* Non-cacheable or same-as-inner are OK. */
+u64 vgic_sanitise_outer_cacheability(u64 reg)
+{
+	switch (reg & GIC_BASER_CACHE_MASK) {
+	case GIC_BASER_CACHE_SameAsInner:
+	case GIC_BASER_CACHE_nC:
+		return reg;
+	default:
+		return GIC_BASER_CACHE_nC;
+	}
+}
+
+/* Avoid any inner non-cacheable mapping. */
+u64 vgic_sanitise_inner_cacheability(u64 reg)
+{
+	switch (reg & GIC_BASER_CACHE_MASK) {
+	case GIC_BASER_CACHE_nCnB:
+	case GIC_BASER_CACHE_nC:
+		return GIC_BASER_CACHE_RaWb;
+	default:
+		return reg;
+	}
+}
+
+u64 vgic_sanitise_field(u64 reg, int field_shift, u64 field_mask,
+			u64 (*sanitise_fn)(u64))
+{
+	u64 field = (reg >> field_shift) & field_mask;
+
+	field = sanitise_fn(field) << field_shift;
+	return (reg & ~(field_mask << field_shift)) | field;
+}
+
+static u64 vgic_sanitise_pendbaser(u64 reg)
+{
+	reg = vgic_sanitise_field(reg, GICR_PENDBASER_SHAREABILITY_SHIFT,
+				  GIC_BASER_SHAREABILITY_MASK,
+				  vgic_sanitise_shareability);
+	reg = vgic_sanitise_field(reg, GICR_PENDBASER_INNER_CACHEABILITY_SHIFT,
+				  GIC_BASER_CACHE_MASK,
+				  vgic_sanitise_inner_cacheability);
+	reg = vgic_sanitise_field(reg, GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT,
+				  GIC_BASER_CACHE_MASK,
+				  vgic_sanitise_outer_cacheability);
+	return reg;
+}
+
+static u64 vgic_sanitise_propbaser(u64 reg)
+{
+	reg = vgic_sanitise_field(reg, GICR_PROPBASER_SHAREABILITY_SHIFT,
+				  GIC_BASER_SHAREABILITY_MASK,
+				  vgic_sanitise_shareability);
+	reg = vgic_sanitise_field(reg, GICR_PROPBASER_INNER_CACHEABILITY_SHIFT,
+				  GIC_BASER_CACHE_MASK,
+				  vgic_sanitise_inner_cacheability);
+	reg = vgic_sanitise_field(reg, GICR_PROPBASER_OUTER_CACHEABILITY_SHIFT,
+				  GIC_BASER_CACHE_MASK,
+				  vgic_sanitise_outer_cacheability);
+	return reg;
+}
+
+#define PROPBASER_RES0_MASK						\
+	(GENMASK_ULL(63, 59) | GENMASK_ULL(55, 52) | GENMASK_ULL(6, 5))
+#define PENDBASER_RES0_MASK						\
+	(BIT_ULL(63) | GENMASK_ULL(61, 59) | GENMASK_ULL(55, 52) |	\
+	 GENMASK_ULL(15, 12) | GENMASK_ULL(6, 0))
+
+static unsigned long vgic_mmio_read_propbase(struct kvm_vcpu *vcpu,
+					     gpa_t addr, unsigned int len)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+
+	return extract_bytes(dist->propbaser, addr & 7, len);
+}
+
+static void vgic_mmio_write_propbase(struct kvm_vcpu *vcpu,
+				     gpa_t addr, unsigned int len,
+				     unsigned long val)
+{
+	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	/* Storing a value with LPIs already enabled is undefined */
+	if (vgic_cpu->lpis_enabled)
+		return;
+
+	dist->propbaser = update_64bit_reg(dist->propbaser, addr & 4, len, val);
+	dist->propbaser &= ~PROPBASER_RES0_MASK;
+	dist->propbaser = vgic_sanitise_propbaser(dist->propbaser);
+}
+
+static unsigned long vgic_mmio_read_pendbase(struct kvm_vcpu *vcpu,
+					     gpa_t addr, unsigned int len)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	return extract_bytes(vgic_cpu->pendbaser, addr & 7, len);
+}
+
+static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
+				     gpa_t addr, unsigned int len,
+				     unsigned long val)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	/* Storing a value with LPIs already enabled is undefined */
+	if (vgic_cpu->lpis_enabled)
+		return;
+
+	vgic_cpu->pendbaser = update_64bit_reg(vgic_cpu->pendbaser,
+					       addr & 4, len, val);
+	vgic_cpu->pendbaser &= ~PENDBASER_RES0_MASK;
+	vgic_cpu->pendbaser = vgic_sanitise_pendbaser(vgic_cpu->pendbaser);
+}
+
 /*
  * The GICv3 per-IRQ registers are split to control PPIs and SGIs in the
  * redistributors, while SPIs are covered by registers in the distributor
@@ -232,10 +371,10 @@ static const struct vgic_register_region vgic_v3_rdbase_registers[] = {
 		vgic_mmio_read_v3r_typer, vgic_mmio_write_wi, 8,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_LENGTH(GICR_PROPBASER,
-		vgic_mmio_read_raz, vgic_mmio_write_wi, 8,
+		vgic_mmio_read_propbase, vgic_mmio_write_propbase, 8,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_LENGTH(GICR_PENDBASER,
-		vgic_mmio_read_raz, vgic_mmio_write_wi, 8,
+		vgic_mmio_read_pendbase, vgic_mmio_write_pendbase, 8,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
 	REGISTER_DESC_WITH_LENGTH(GICR_IDREGS,
 		vgic_mmio_read_v3_idregs, vgic_mmio_write_wi, 48,
diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
index 8509014..e863ccc 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.h
+++ b/virt/kvm/arm/vgic/vgic-mmio.h
@@ -147,4 +147,12 @@ unsigned int vgic_v2_init_dist_iodev(struct vgic_io_device *dev);
 
 unsigned int vgic_v3_init_dist_iodev(struct vgic_io_device *dev);
 
+#ifdef CONFIG_KVM_ARM_VGIC_V3
+u64 vgic_sanitise_outer_cacheability(u64 reg);
+u64 vgic_sanitise_inner_cacheability(u64 reg);
+u64 vgic_sanitise_shareability(u64 reg);
+u64 vgic_sanitise_field(u64 reg, int field_shift, u64 field_mask,
+			u64 (*sanitise_fn)(u64));
+#endif
+
 #endif
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index f0ac064..6f8f31f 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -191,6 +191,11 @@ void vgic_v3_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcrp)
 	vmcrp->pmr  = (vmcr & ICH_VMCR_PMR_MASK) >> ICH_VMCR_PMR_SHIFT;
 }
 
+#define INITIAL_PENDBASER_VALUE						  \
+	(GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWb)		| \
+	GIC_BASER_CACHEABILITY(GICR_PENDBASER, OUTER, SameAsInner)	| \
+	GIC_BASER_SHAREABILITY(GICR_PENDBASER, InnerShareable))
+
 void vgic_v3_enable(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *vgic_v3 = &vcpu->arch.vgic_cpu.vgic_v3;
@@ -208,10 +213,12 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
 	 * way, so we force SRE to 1 to demonstrate this to the guest.
 	 * This goes with the spec allowing the value to be RAO/WI.
 	 */
-	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
+	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
 		vgic_v3->vgic_sre = ICC_SRE_EL1_SRE;
-	else
+		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;
+	} else {
 		vgic_v3->vgic_sre = 0;
+	}
 
 	/* Get the show on the road... */
 	vgic_v3->vgic_hcr = ICH_HCR_EN;
-- 
2.9.0


* [PATCH v8 09/17] KVM: arm64: introduce ITS emulation file with MMIO framework
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (7 preceding siblings ...)
  2016-07-05 11:23 ` [PATCH v8 08/17] KVM: arm64: handle ITS related GICv3 redistributor registers Andre Przywara
@ 2016-07-05 11:23 ` Andre Przywara
  2016-07-08 13:34   ` Marc Zyngier
  2016-07-05 11:23 ` [PATCH v8 10/17] KVM: arm64: introduce new KVM ITS device Andre Przywara
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

The ARM GICv3 ITS emulation code goes into a separate file, but needs
to be connected to the GICv3 emulation, to which it is an optional
extension.
The ITS MMIO handlers require the respective ITS pointer to be passed in,
so we amend the existing VGIC MMIO framework to let it cope with that.
We also introduce the basic ITS data structure and initialize it, but
don't return any success yet, as we are not yet ready for the show.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arch/arm64/kvm/Makefile          |   1 +
 include/kvm/arm_vgic.h           |  14 +++++-
 virt/kvm/arm/vgic/vgic-its.c     | 100 +++++++++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-mmio-v2.c |  40 +++++++--------
 virt/kvm/arm/vgic/vgic-mmio-v3.c | 104 ++++++++++++++++++++++++++-------------
 virt/kvm/arm/vgic/vgic-mmio.c    |  36 +++++++++++---
 virt/kvm/arm/vgic/vgic-mmio.h    |  31 +++++++++---
 virt/kvm/arm/vgic/vgic.h         |   7 +++
 8 files changed, 266 insertions(+), 67 deletions(-)
 create mode 100644 virt/kvm/arm/vgic/vgic-its.c

diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index f00b2cd..a5b9664 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -29,5 +29,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v2.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v3.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-kvm-device.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
 kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index f6f860d..f606641 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -108,15 +108,27 @@ struct vgic_irq {
 };
 
 struct vgic_register_region;
+struct vgic_its;
 
 struct vgic_io_device {
 	gpa_t base_addr;
-	struct kvm_vcpu *redist_vcpu;
+	union {
+		struct kvm_vcpu *redist_vcpu;
+		struct vgic_its *its;
+	};
 	const struct vgic_register_region *regions;
 	int nr_regions;
 	struct kvm_io_device dev;
 };
 
+struct vgic_its {
+	/* The base address of the ITS control register frame */
+	gpa_t			vgic_its_base;
+
+	bool			enabled;
+	struct vgic_io_device	iodev;
+};
+
 struct vgic_dist {
 	bool			in_kernel;
 	bool			ready;
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
new file mode 100644
index 0000000..ab8d244
--- /dev/null
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -0,0 +1,100 @@
+/*
+ * GICv3 ITS emulation
+ *
+ * Copyright (C) 2015,2016 ARM Ltd.
+ * Author: Andre Przywara <andre.przywara@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/interrupt.h>
+
+#include <linux/irqchip/arm-gic-v3.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
+
+#include "vgic.h"
+#include "vgic-mmio.h"
+
+#define REGISTER_ITS_DESC(off, rd, wr, length, acc)		\
+{								\
+	.reg_offset = off,					\
+	.len = length,						\
+	.access_flags = acc,					\
+	.iodev_type = IODEV_ITS,				\
+	.its_read = rd,						\
+	.its_write = wr,					\
+}
+
+static unsigned long its_mmio_read_raz(struct kvm *kvm, struct vgic_its *its,
+				       gpa_t addr, unsigned int len)
+{
+	return 0;
+}
+
+static void its_mmio_write_wi(struct kvm *kvm, struct vgic_its *its,
+			      gpa_t addr, unsigned int len, unsigned long val)
+{
+	/* Ignore */
+}
+
+static struct vgic_register_region its_registers[] = {
+	REGISTER_ITS_DESC(GITS_CTLR,
+		its_mmio_read_raz, its_mmio_write_wi, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_ITS_DESC(GITS_IIDR,
+		its_mmio_read_raz, its_mmio_write_wi, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_ITS_DESC(GITS_TYPER,
+		its_mmio_read_raz, its_mmio_write_wi, 8,
+		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
+	REGISTER_ITS_DESC(GITS_CBASER,
+		its_mmio_read_raz, its_mmio_write_wi, 8,
+		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
+	REGISTER_ITS_DESC(GITS_CWRITER,
+		its_mmio_read_raz, its_mmio_write_wi, 8,
+		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
+	REGISTER_ITS_DESC(GITS_CREADR,
+		its_mmio_read_raz, its_mmio_write_wi, 8,
+		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
+	REGISTER_ITS_DESC(GITS_BASER,
+		its_mmio_read_raz, its_mmio_write_wi, 0x40,
+		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
+	REGISTER_ITS_DESC(GITS_IDREGS_BASE,
+		its_mmio_read_raz, its_mmio_write_wi, 0x30,
+		VGIC_ACCESS_32bit),
+};
+
+int vits_register(struct kvm *kvm, struct vgic_its *its)
+{
+	struct vgic_io_device *iodev = &its->iodev;
+	int ret;
+
+	iodev->regions = its_registers;
+	iodev->nr_regions = ARRAY_SIZE(its_registers);
+	kvm_iodevice_init(&iodev->dev, &kvm_io_gic_ops);
+
+	iodev->base_addr = its->vgic_its_base;
+	iodev->its = its;
+	mutex_lock(&kvm->slots_lock);
+	ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, iodev->base_addr,
+				      SZ_64K, &iodev->dev);
+	mutex_unlock(&kvm->slots_lock);
+
+	return ret;
+}
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
index 4152348..bca5bf7 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
@@ -291,67 +291,67 @@ static void vgic_mmio_write_vcpuif(struct kvm_vcpu *vcpu,
 }
 
 static const struct vgic_register_region vgic_v2_dist_registers[] = {
-	REGISTER_DESC_WITH_LENGTH(GIC_DIST_CTRL,
+	REGISTER_DESC_WITH_LENGTH(GIC_DIST_CTRL, IODEV_DIST,
 		vgic_mmio_read_v2_misc, vgic_mmio_write_v2_misc, 12,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_IGROUP,
+	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_IGROUP, IODEV_DIST,
 		vgic_mmio_read_rao, vgic_mmio_write_wi, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_SET,
+	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_SET, IODEV_DIST,
 		vgic_mmio_read_enable, vgic_mmio_write_senable, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_CLEAR,
+	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_CLEAR, IODEV_DIST,
 		vgic_mmio_read_enable, vgic_mmio_write_cenable, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_SET,
+	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_SET, IODEV_DIST,
 		vgic_mmio_read_pending, vgic_mmio_write_spending, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_CLEAR,
+	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_CLEAR, IODEV_DIST,
 		vgic_mmio_read_pending, vgic_mmio_write_cpending, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_SET,
+	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_SET, IODEV_DIST,
 		vgic_mmio_read_active, vgic_mmio_write_sactive, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_CLEAR,
+	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_CLEAR, IODEV_DIST,
 		vgic_mmio_read_active, vgic_mmio_write_cactive, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PRI,
+	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PRI, IODEV_DIST,
 		vgic_mmio_read_priority, vgic_mmio_write_priority, 8,
 		VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_TARGET,
+	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_TARGET, IODEV_DIST,
 		vgic_mmio_read_target, vgic_mmio_write_target, 8,
 		VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_CONFIG,
+	REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_CONFIG, IODEV_DIST,
 		vgic_mmio_read_config, vgic_mmio_write_config, 2,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GIC_DIST_SOFTINT,
+	REGISTER_DESC_WITH_LENGTH(GIC_DIST_SOFTINT, IODEV_DIST,
 		vgic_mmio_read_raz, vgic_mmio_write_sgir, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GIC_DIST_SGI_PENDING_CLEAR,
+	REGISTER_DESC_WITH_LENGTH(GIC_DIST_SGI_PENDING_CLEAR, IODEV_DIST,
 		vgic_mmio_read_sgipend, vgic_mmio_write_sgipendc, 16,
 		VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
-	REGISTER_DESC_WITH_LENGTH(GIC_DIST_SGI_PENDING_SET,
+	REGISTER_DESC_WITH_LENGTH(GIC_DIST_SGI_PENDING_SET, IODEV_DIST,
 		vgic_mmio_read_sgipend, vgic_mmio_write_sgipends, 16,
 		VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
 };
 
 static const struct vgic_register_region vgic_v2_cpu_registers[] = {
-	REGISTER_DESC_WITH_LENGTH(GIC_CPU_CTRL,
+	REGISTER_DESC_WITH_LENGTH(GIC_CPU_CTRL, IODEV_CPUIF,
 		vgic_mmio_read_vcpuif, vgic_mmio_write_vcpuif, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GIC_CPU_PRIMASK,
+	REGISTER_DESC_WITH_LENGTH(GIC_CPU_PRIMASK, IODEV_CPUIF,
 		vgic_mmio_read_vcpuif, vgic_mmio_write_vcpuif, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GIC_CPU_BINPOINT,
+	REGISTER_DESC_WITH_LENGTH(GIC_CPU_BINPOINT, IODEV_CPUIF,
 		vgic_mmio_read_vcpuif, vgic_mmio_write_vcpuif, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GIC_CPU_ALIAS_BINPOINT,
+	REGISTER_DESC_WITH_LENGTH(GIC_CPU_ALIAS_BINPOINT, IODEV_CPUIF,
 		vgic_mmio_read_vcpuif, vgic_mmio_write_vcpuif, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GIC_CPU_ACTIVEPRIO,
+	REGISTER_DESC_WITH_LENGTH(GIC_CPU_ACTIVEPRIO, IODEV_CPUIF,
 		vgic_mmio_read_raz, vgic_mmio_write_wi, 16,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GIC_CPU_IDENT,
+	REGISTER_DESC_WITH_LENGTH(GIC_CPU_IDENT, IODEV_CPUIF,
 		vgic_mmio_read_vcpuif, vgic_mmio_write_vcpuif, 4,
 		VGIC_ACCESS_32bit),
 };
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 9dd8632..75e5728 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -42,6 +42,16 @@ static u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
 	return reg | ((u64)val << lower);
 }
 
+bool vgic_has_its(struct kvm *kvm)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+
+	if (dist->vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
+		return false;
+
+	return false;
+}
+
 static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
 					    gpa_t addr, unsigned int len)
 {
@@ -132,6 +142,32 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
 	vgic_put_irq(vcpu->kvm, irq);
 }
 
+static unsigned long vgic_mmio_read_v3r_ctlr(struct kvm_vcpu *vcpu,
+					     gpa_t addr, unsigned int len)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	return vgic_cpu->lpis_enabled ? GICR_CTLR_ENABLE_LPIS : 0;
+}
+
+
+static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu *vcpu,
+				     gpa_t addr, unsigned int len,
+				     unsigned long val)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	bool was_enabled = vgic_cpu->lpis_enabled;
+
+	if (!vgic_has_its(vcpu->kvm))
+		return;
+
+	vgic_cpu->lpis_enabled = val & GICR_CTLR_ENABLE_LPIS;
+
+	if (!was_enabled && vgic_cpu->lpis_enabled) {
+		/* Eventually do something */
+	}
+}
+
 static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
 					      gpa_t addr, unsigned int len)
 {
@@ -298,12 +334,13 @@ static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
  * We take some special care here to fix the calculation of the register
  * offset.
  */
-#define REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(off, rd, wr, bpi, acc)	\
+#define REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(off, type, rd, wr, bpi, acc) \
 	{								\
 		.reg_offset = off,					\
 		.bits_per_irq = bpi,					\
 		.len = (bpi * VGIC_NR_PRIVATE_IRQS) / 8,		\
 		.access_flags = acc,					\
+		.iodev_type = type,					\
 		.read = vgic_mmio_read_raz,				\
 		.write = vgic_mmio_write_wi,				\
 	}, {								\
@@ -311,108 +348,109 @@ static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
 		.bits_per_irq = bpi,					\
 		.len = (bpi * (1024 - VGIC_NR_PRIVATE_IRQS)) / 8,	\
 		.access_flags = acc,					\
+		.iodev_type = type,					\
 		.read = rd,						\
 		.write = wr,						\
 	}
 
 static const struct vgic_register_region vgic_v3_dist_registers[] = {
-	REGISTER_DESC_WITH_LENGTH(GICD_CTLR,
+	REGISTER_DESC_WITH_LENGTH(GICD_CTLR, IODEV_DIST,
 		vgic_mmio_read_v3_misc, vgic_mmio_write_v3_misc, 16,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IGROUPR,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IGROUPR, IODEV_DIST,
 		vgic_mmio_read_rao, vgic_mmio_write_wi, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISENABLER,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISENABLER, IODEV_DIST,
 		vgic_mmio_read_enable, vgic_mmio_write_senable, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICENABLER,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICENABLER, IODEV_DIST,
 		vgic_mmio_read_enable, vgic_mmio_write_cenable, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISPENDR,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISPENDR, IODEV_DIST,
 		vgic_mmio_read_pending, vgic_mmio_write_spending, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICPENDR,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICPENDR, IODEV_DIST,
 		vgic_mmio_read_pending, vgic_mmio_write_cpending, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISACTIVER,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISACTIVER, IODEV_DIST,
 		vgic_mmio_read_active, vgic_mmio_write_sactive, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICACTIVER,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICACTIVER, IODEV_DIST,
 		vgic_mmio_read_active, vgic_mmio_write_cactive, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IPRIORITYR,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IPRIORITYR, IODEV_DIST,
 		vgic_mmio_read_priority, vgic_mmio_write_priority, 8,
 		VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ITARGETSR,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ITARGETSR, IODEV_DIST,
 		vgic_mmio_read_raz, vgic_mmio_write_wi, 8,
 		VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICFGR,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICFGR, IODEV_DIST,
 		vgic_mmio_read_config, vgic_mmio_write_config, 2,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IGRPMODR,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IGRPMODR, IODEV_DIST,
 		vgic_mmio_read_raz, vgic_mmio_write_wi, 1,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IROUTER,
+	REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IROUTER, IODEV_DIST,
 		vgic_mmio_read_irouter, vgic_mmio_write_irouter, 64,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICD_IDREGS,
+	REGISTER_DESC_WITH_LENGTH(GICD_IDREGS, IODEV_DIST,
 		vgic_mmio_read_v3_idregs, vgic_mmio_write_wi, 48,
 		VGIC_ACCESS_32bit),
 };
 
 static const struct vgic_register_region vgic_v3_rdbase_registers[] = {
-	REGISTER_DESC_WITH_LENGTH(GICR_CTLR,
-		vgic_mmio_read_raz, vgic_mmio_write_wi, 4,
+	REGISTER_DESC_WITH_LENGTH(GICR_CTLR, IODEV_REDIST,
+		vgic_mmio_read_v3r_ctlr, vgic_mmio_write_v3r_ctlr, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_IIDR,
+	REGISTER_DESC_WITH_LENGTH(GICR_IIDR, IODEV_REDIST,
 		vgic_mmio_read_v3r_iidr, vgic_mmio_write_wi, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_TYPER,
+	REGISTER_DESC_WITH_LENGTH(GICR_TYPER, IODEV_REDIST,
 		vgic_mmio_read_v3r_typer, vgic_mmio_write_wi, 8,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_PROPBASER,
+	REGISTER_DESC_WITH_LENGTH(GICR_PROPBASER, IODEV_REDIST,
 		vgic_mmio_read_propbase, vgic_mmio_write_propbase, 8,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_PENDBASER,
+	REGISTER_DESC_WITH_LENGTH(GICR_PENDBASER, IODEV_REDIST,
 		vgic_mmio_read_pendbase, vgic_mmio_write_pendbase, 8,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_IDREGS,
+	REGISTER_DESC_WITH_LENGTH(GICR_IDREGS, IODEV_REDIST,
 		vgic_mmio_read_v3_idregs, vgic_mmio_write_wi, 48,
 		VGIC_ACCESS_32bit),
 };
 
 static const struct vgic_register_region vgic_v3_sgibase_registers[] = {
-	REGISTER_DESC_WITH_LENGTH(GICR_IGROUPR0,
+	REGISTER_DESC_WITH_LENGTH(GICR_IGROUPR0, IODEV_REDIST,
 		vgic_mmio_read_rao, vgic_mmio_write_wi, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_ISENABLER0,
+	REGISTER_DESC_WITH_LENGTH(GICR_ISENABLER0, IODEV_REDIST,
 		vgic_mmio_read_enable, vgic_mmio_write_senable, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_ICENABLER0,
+	REGISTER_DESC_WITH_LENGTH(GICR_ICENABLER0, IODEV_REDIST,
 		vgic_mmio_read_enable, vgic_mmio_write_cenable, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_ISPENDR0,
+	REGISTER_DESC_WITH_LENGTH(GICR_ISPENDR0, IODEV_REDIST,
 		vgic_mmio_read_pending, vgic_mmio_write_spending, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_ICPENDR0,
+	REGISTER_DESC_WITH_LENGTH(GICR_ICPENDR0, IODEV_REDIST,
 		vgic_mmio_read_pending, vgic_mmio_write_cpending, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_ISACTIVER0,
+	REGISTER_DESC_WITH_LENGTH(GICR_ISACTIVER0, IODEV_REDIST,
 		vgic_mmio_read_active, vgic_mmio_write_sactive, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_ICACTIVER0,
+	REGISTER_DESC_WITH_LENGTH(GICR_ICACTIVER0, IODEV_REDIST,
 		vgic_mmio_read_active, vgic_mmio_write_cactive, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_IPRIORITYR0,
+	REGISTER_DESC_WITH_LENGTH(GICR_IPRIORITYR0, IODEV_REDIST,
 		vgic_mmio_read_priority, vgic_mmio_write_priority, 32,
 		VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_ICFGR0,
+	REGISTER_DESC_WITH_LENGTH(GICR_ICFGR0, IODEV_REDIST,
 		vgic_mmio_read_config, vgic_mmio_write_config, 8,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_IGRPMODR0,
+	REGISTER_DESC_WITH_LENGTH(GICR_IGRPMODR0, IODEV_REDIST,
 		vgic_mmio_read_raz, vgic_mmio_write_wi, 4,
 		VGIC_ACCESS_32bit),
-	REGISTER_DESC_WITH_LENGTH(GICR_NSACR,
+	REGISTER_DESC_WITH_LENGTH(GICR_NSACR, IODEV_REDIST,
 		vgic_mmio_read_raz, vgic_mmio_write_wi, 4,
 		VGIC_ACCESS_32bit),
 };
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index 5e79e01..a097c1a 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -473,8 +473,7 @@ static int dispatch_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
 {
 	struct vgic_io_device *iodev = kvm_to_vgic_iodev(dev);
 	const struct vgic_register_region *region;
-	struct kvm_vcpu *r_vcpu;
-	unsigned long data;
+	unsigned long data = 0;
 
 	region = vgic_find_mmio_region(iodev->regions, iodev->nr_regions,
 				       addr - iodev->base_addr);
@@ -483,8 +482,20 @@ static int dispatch_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
 		return 0;
 	}
 
-	r_vcpu = iodev->redist_vcpu ? iodev->redist_vcpu : vcpu;
-	data = region->read(r_vcpu, addr, len);
+	switch (region->iodev_type) {
+	case IODEV_CPUIF:
+		return 1;
+	case IODEV_DIST:
+		data = region->read(vcpu, addr, len);
+		break;
+	case IODEV_REDIST:
+		data = region->read(iodev->redist_vcpu, addr, len);
+		break;
+	case IODEV_ITS:
+		data = region->its_read(vcpu->kvm, iodev->its, addr, len);
+		break;
+	}
+
 	vgic_data_host_to_mmio_bus(val, len, data);
 	return 0;
 }
@@ -494,7 +505,6 @@ static int dispatch_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
 {
 	struct vgic_io_device *iodev = kvm_to_vgic_iodev(dev);
 	const struct vgic_register_region *region;
-	struct kvm_vcpu *r_vcpu;
 	unsigned long data = vgic_data_mmio_bus_to_host(val, len);
 
 	region = vgic_find_mmio_region(iodev->regions, iodev->nr_regions,
@@ -505,8 +515,20 @@ static int dispatch_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
 	if (!check_region(region, addr, len))
 		return 0;
 
-	r_vcpu = iodev->redist_vcpu ? iodev->redist_vcpu : vcpu;
-	region->write(r_vcpu, addr, len, data);
+	switch (region->iodev_type) {
+	case IODEV_CPUIF:
+		break;
+	case IODEV_DIST:
+		region->write(vcpu, addr, len, data);
+		break;
+	case IODEV_REDIST:
+		region->write(iodev->redist_vcpu, addr, len, data);
+		break;
+	case IODEV_ITS:
+		region->its_write(vcpu->kvm, iodev->its, addr, len, data);
+		break;
+	}
+
 	return 0;
 }
 
diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
index e863ccc..23e97a7 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.h
+++ b/virt/kvm/arm/vgic/vgic-mmio.h
@@ -16,15 +16,32 @@
 #ifndef __KVM_ARM_VGIC_MMIO_H__
 #define __KVM_ARM_VGIC_MMIO_H__
 
+enum iodev_type {
+	IODEV_CPUIF,
+	IODEV_DIST,
+	IODEV_REDIST,
+	IODEV_ITS
+};
+
 struct vgic_register_region {
 	unsigned int reg_offset;
 	unsigned int len;
 	unsigned int bits_per_irq;
 	unsigned int access_flags;
-	unsigned long (*read)(struct kvm_vcpu *vcpu, gpa_t addr,
-			      unsigned int len);
-	void (*write)(struct kvm_vcpu *vcpu, gpa_t addr, unsigned int len,
-		      unsigned long val);
+	enum iodev_type iodev_type;
+	union {
+		unsigned long (*read)(struct kvm_vcpu *vcpu, gpa_t addr,
+				      unsigned int len);
+		unsigned long (*its_read)(struct kvm *kvm, struct vgic_its *its,
+					  gpa_t addr, unsigned int len);
+	};
+	union {
+		void (*write)(struct kvm_vcpu *vcpu, gpa_t addr,
+			      unsigned int len, unsigned long val);
+		void (*its_write)(struct kvm *kvm, struct vgic_its *its,
+				  gpa_t addr, unsigned int len,
+				  unsigned long val);
+	};
 };
 
 extern struct kvm_io_device_ops kvm_io_gic_ops;
@@ -57,22 +74,24 @@ extern struct kvm_io_device_ops kvm_io_gic_ops;
  * The _WITH_LENGTH version instantiates registers with a fixed length
  * and is mutually exclusive with the _PER_IRQ version.
  */
-#define REGISTER_DESC_WITH_BITS_PER_IRQ(off, rd, wr, bpi, acc)		\
+#define REGISTER_DESC_WITH_BITS_PER_IRQ(off, type, rd, wr, bpi, acc)	\
 	{								\
 		.reg_offset = off,					\
 		.bits_per_irq = bpi,					\
 		.len = bpi * 1024 / 8,					\
 		.access_flags = acc,					\
+		.iodev_type = type,					\
 		.read = rd,						\
 		.write = wr,						\
 	}
 
-#define REGISTER_DESC_WITH_LENGTH(off, rd, wr, length, acc)		\
+#define REGISTER_DESC_WITH_LENGTH(off, type, rd, wr, length, acc)	\
 	{								\
 		.reg_offset = off,					\
 		.bits_per_irq = 0,					\
 		.len = length,						\
 		.access_flags = acc,					\
+		.iodev_type = type,					\
 		.read = rd,						\
 		.write = wr,						\
 	}
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index 5b79c34..31807c1 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -72,6 +72,7 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu);
 int vgic_v3_probe(const struct gic_kvm_info *info);
 int vgic_v3_map_resources(struct kvm *kvm);
 int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t dist_base_address);
+bool vgic_has_its(struct kvm *kvm);
 #else
 static inline void vgic_v3_process_maintenance(struct kvm_vcpu *vcpu)
 {
@@ -123,6 +124,12 @@ static inline int vgic_register_redist_iodevs(struct kvm *kvm,
 {
 	return -ENODEV;
 }
+
+static inline bool vgic_has_its(struct kvm *kvm)
+{
+	return false;
+}
+
 #endif
 
 int kvm_register_vgic_device(unsigned long type);
-- 
2.9.0


* [PATCH v8 10/17] KVM: arm64: introduce new KVM ITS device
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (8 preceding siblings ...)
  2016-07-05 11:23 ` [PATCH v8 09/17] KVM: arm64: introduce ITS emulation file with MMIO framework Andre Przywara
@ 2016-07-05 11:23 ` Andre Przywara
  2016-07-05 11:23 ` [PATCH v8 11/17] KVM: arm64: implement basic ITS register handlers Andre Przywara
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

Introduce a new KVM device that represents an ARM Interrupt Translation
Service (ITS) controller. Since there can be multiple of these per guest,
we can't piggyback on the existing GICv3 distributor device, but instead
create a new type of KVM device.
On the KVM_CREATE_DEVICE ioctl we allocate and initialize the ITS data
structure and store the pointer in the kvm_device data.
Upon an explicit init ioctl from userland (after having set up the MMIO
address) we register the handlers with the kvm_io_bus framework.
Any reference to an ITS thus has to go via this interface.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 Documentation/virtual/kvm/devices/arm-vgic.txt |  25 +++--
 arch/arm/kvm/arm.c                             |   1 +
 arch/arm64/include/uapi/asm/kvm.h              |   2 +
 include/kvm/arm_vgic.h                         |   3 +
 include/uapi/linux/kvm.h                       |   2 +
 virt/kvm/arm/vgic/vgic-its.c                   | 142 ++++++++++++++++++++++++-
 virt/kvm/arm/vgic/vgic-kvm-device.c            |   7 +-
 virt/kvm/arm/vgic/vgic-mmio-v3.c               |   2 +-
 virt/kvm/arm/vgic/vgic.h                       |   8 ++
 9 files changed, 182 insertions(+), 10 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt b/Documentation/virtual/kvm/devices/arm-vgic.txt
index 59541d4..89182f8 100644
--- a/Documentation/virtual/kvm/devices/arm-vgic.txt
+++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
@@ -4,16 +4,22 @@ ARM Virtual Generic Interrupt Controller (VGIC)
 Device types supported:
   KVM_DEV_TYPE_ARM_VGIC_V2     ARM Generic Interrupt Controller v2.0
   KVM_DEV_TYPE_ARM_VGIC_V3     ARM Generic Interrupt Controller v3.0
+  KVM_DEV_TYPE_ARM_VGIC_ITS    ARM Interrupt Translation Service Controller
 
-Only one VGIC instance may be instantiated through either this API or the
-legacy KVM_CREATE_IRQCHIP api.  The created VGIC will act as the VM interrupt
-controller, requiring emulated user-space devices to inject interrupts to the
-VGIC instead of directly to CPUs.
+Only one VGIC instance of the V2/V3 types above may be instantiated through
+either this API or the legacy KVM_CREATE_IRQCHIP api.  The created VGIC will
+act as the VM interrupt controller, requiring emulated user-space devices to
+inject interrupts to the VGIC instead of directly to CPUs.
 
 Creating a guest GICv3 device requires a host GICv3 as well.
 GICv3 implementations with hardware compatibility support allow a guest GICv2
 as well.
 
+Creating a virtual ITS controller requires a host GICv3 (but does not depend
+on having physical ITS controllers).
+There can be multiple ITS controllers per guest, each of which has to have
+a separate, non-overlapping MMIO region.
+
 Groups:
   KVM_DEV_ARM_VGIC_GRP_ADDR
   Attributes:
@@ -39,6 +45,13 @@ Groups:
       Only valid for KVM_DEV_TYPE_ARM_VGIC_V3.
       This address needs to be 64K aligned.
 
+    KVM_VGIC_V3_ADDR_TYPE_ITS (rw, 64-bit)
+      Base address in the guest physical address space of the GICv3 ITS
+      control register frame. The ITS allows MSI(-X) interrupts to be
+      injected into guests. This extension is optional. If the kernel
+      does not support the ITS, the call returns -ENODEV.
+      Only valid for KVM_DEV_TYPE_ARM_VGIC_ITS.
+      This address needs to be 64K aligned and the region covers 128K.
 
   KVM_DEV_ARM_VGIC_GRP_DIST_REGS
   Attributes:
@@ -109,8 +122,8 @@ Groups:
   KVM_DEV_ARM_VGIC_GRP_CTRL
   Attributes:
     KVM_DEV_ARM_VGIC_CTRL_INIT
-      request the initialization of the VGIC, no additional parameter in
-      kvm_device_attr.addr.
+      request the initialization of the VGIC or ITS, no additional parameter
+      in kvm_device_attr.addr.
   Errors:
     -ENXIO: VGIC not properly configured as required prior to calling
      this attribute
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 557e390..0d5c255 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -20,6 +20,7 @@
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/kvm_host.h>
+#include <linux/list.h>
 #include <linux/module.h>
 #include <linux/vmalloc.h>
 #include <linux/fs.h>
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index f209ea1..f8c257b 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -87,9 +87,11 @@ struct kvm_regs {
 /* Supported VGICv3 address types  */
 #define KVM_VGIC_V3_ADDR_TYPE_DIST	2
 #define KVM_VGIC_V3_ADDR_TYPE_REDIST	3
+#define KVM_VGIC_ITS_ADDR_TYPE		4
 
 #define KVM_VGIC_V3_DIST_SIZE		SZ_64K
 #define KVM_VGIC_V3_REDIST_SIZE		(2 * SZ_64K)
+#define KVM_VGIC_V3_ITS_SIZE		SZ_64K
 
 #define KVM_ARM_VCPU_POWER_OFF		0 /* CPU is started in OFF state */
 #define KVM_ARM_VCPU_EL1_32BIT		1 /* CPU running a 32bit VM */
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index f606641..eb82c7d 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -126,6 +126,7 @@ struct vgic_its {
 	gpa_t			vgic_its_base;
 
 	bool			enabled;
+	bool			initialized;
 	struct vgic_io_device	iodev;
 };
 
@@ -159,6 +160,8 @@ struct vgic_dist {
 
 	struct vgic_io_device	dist_iodev;
 
+	bool			has_its;
+
 	/*
 	 * Contains the address of the LPI configuration table.
 	 * Since we report GICR_TYPER.CommonLPIAff as 0b00, we can share
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7de96f5..d8c4c32 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1077,6 +1077,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_FLIC		KVM_DEV_TYPE_FLIC
 	KVM_DEV_TYPE_ARM_VGIC_V3,
 #define KVM_DEV_TYPE_ARM_VGIC_V3	KVM_DEV_TYPE_ARM_VGIC_V3
+	KVM_DEV_TYPE_ARM_VGIC_ITS,
+#define KVM_DEV_TYPE_ARM_VGIC_ITS	KVM_DEV_TYPE_ARM_VGIC_ITS
 	KVM_DEV_TYPE_MAX,
 };
 
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index ab8d244..d49bdad 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -21,6 +21,7 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 #include <linux/interrupt.h>
+#include <linux/uaccess.h>
 
 #include <linux/irqchip/arm-gic-v3.h>
 
@@ -80,7 +81,7 @@ static struct vgic_register_region its_registers[] = {
 		VGIC_ACCESS_32bit),
 };
 
-int vits_register(struct kvm *kvm, struct vgic_its *its)
+static int vgic_its_register(struct kvm *kvm, struct vgic_its *its)
 {
 	struct vgic_io_device *iodev = &its->iodev;
 	int ret;
@@ -98,3 +99,142 @@ int vits_register(struct kvm *kvm, struct vgic_its *its)
 
 	return ret;
 }
+
+static int vgic_its_create(struct kvm_device *dev, u32 type)
+{
+	struct vgic_its *its;
+
+	if (type != KVM_DEV_TYPE_ARM_VGIC_ITS)
+		return -ENODEV;
+
+	its = kzalloc(sizeof(struct vgic_its), GFP_KERNEL);
+	if (!its)
+		return -ENOMEM;
+
+	its->vgic_its_base = VGIC_ADDR_UNDEF;
+
+	dev->kvm->arch.vgic.has_its = true;
+	its->initialized = false;
+	its->enabled = false;
+
+	dev->private = its;
+
+	return 0;
+}
+
+static void vgic_its_destroy(struct kvm_device *kvm_dev)
+{
+	struct vgic_its *its = kvm_dev->private;
+
+	kfree(its);
+}
+
+static int vgic_its_has_attr(struct kvm_device *dev,
+			     struct kvm_device_attr *attr)
+{
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_ADDR:
+		switch (attr->attr) {
+		case KVM_VGIC_ITS_ADDR_TYPE:
+			return 0;
+		}
+		break;
+	case KVM_DEV_ARM_VGIC_GRP_CTRL:
+		switch (attr->attr) {
+		case KVM_DEV_ARM_VGIC_CTRL_INIT:
+			return 0;
+		}
+		break;
+	}
+	return -ENXIO;
+}
+
+static int vgic_its_set_attr(struct kvm_device *dev,
+			     struct kvm_device_attr *attr)
+{
+	struct vgic_its *its = dev->private;
+	int ret;
+
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_ADDR: {
+		u64 __user *uaddr = (u64 __user *)(long)attr->addr;
+		unsigned long type = (unsigned long)attr->attr;
+		u64 addr;
+
+		if (type != KVM_VGIC_ITS_ADDR_TYPE)
+			return -ENODEV;
+
+		if (its->initialized)
+			return -EBUSY;
+
+		if (copy_from_user(&addr, uaddr, sizeof(addr)))
+			return -EFAULT;
+
+		ret = vgic_check_ioaddr(dev->kvm, &its->vgic_its_base,
+					addr, SZ_64K);
+		if (ret)
+			return ret;
+
+		its->vgic_its_base = addr;
+
+		return 0;
+	}
+	case KVM_DEV_ARM_VGIC_GRP_CTRL:
+		switch (attr->attr) {
+		case KVM_DEV_ARM_VGIC_CTRL_INIT:
+			if (its->initialized)
+				return 0;
+
+			if (IS_VGIC_ADDR_UNDEF(its->vgic_its_base))
+				return -ENXIO;
+
+			ret = vgic_its_register(dev->kvm, its);
+			if (ret)
+				return ret;
+
+			its->initialized = true;
+			return 0;
+		}
+		break;
+	}
+	return -ENXIO;
+}
+
+static int vgic_its_get_attr(struct kvm_device *dev,
+			     struct kvm_device_attr *attr)
+{
+	switch (attr->group) {
+	case KVM_DEV_ARM_VGIC_GRP_ADDR: {
+		struct vgic_its *its = dev->private;
+		u64 addr = its->vgic_its_base;
+		u64 __user *uaddr = (u64 __user *)(long)attr->addr;
+		unsigned long type = (unsigned long)attr->attr;
+
+		if (type != KVM_VGIC_ITS_ADDR_TYPE)
+			return -ENODEV;
+
+		if (copy_to_user(uaddr, &addr, sizeof(addr)))
+			return -EFAULT;
+		break;
+	}
+	default:
+		return -ENXIO;
+	}
+
+	return 0;
+}
+
+struct kvm_device_ops kvm_arm_vgic_its_ops = {
+	.name = "kvm-arm-vgic-its",
+	.create = vgic_its_create,
+	.destroy = vgic_its_destroy,
+	.set_attr = vgic_its_set_attr,
+	.get_attr = vgic_its_get_attr,
+	.has_attr = vgic_its_has_attr,
+};
+
+int kvm_vgic_register_its_device(void)
+{
+	return kvm_register_device_ops(&kvm_arm_vgic_its_ops,
+				       KVM_DEV_TYPE_ARM_VGIC_ITS);
+}
diff --git a/virt/kvm/arm/vgic/vgic-kvm-device.c b/virt/kvm/arm/vgic/vgic-kvm-device.c
index 2f24f13..1813f93 100644
--- a/virt/kvm/arm/vgic/vgic-kvm-device.c
+++ b/virt/kvm/arm/vgic/vgic-kvm-device.c
@@ -21,8 +21,8 @@
 
 /* common helpers */
 
-static int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
-			     phys_addr_t addr, phys_addr_t alignment)
+int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
+		      phys_addr_t addr, phys_addr_t alignment)
 {
 	if (addr & ~KVM_PHYS_MASK)
 		return -E2BIG;
@@ -223,6 +223,9 @@ int kvm_register_vgic_device(unsigned long type)
 	case KVM_DEV_TYPE_ARM_VGIC_V3:
 		ret = kvm_register_device_ops(&kvm_arm_vgic_v3_ops,
 					      KVM_DEV_TYPE_ARM_VGIC_V3);
+		if (ret)
+			break;
+		ret = kvm_vgic_register_its_device();
 		break;
 #endif
 	}
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 75e5728..062ff95 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -49,7 +49,7 @@ bool vgic_has_its(struct kvm *kvm)
 	if (dist->vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
 		return false;
 
-	return false;
+	return dist->has_its;
 }
 
 static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index 31807c1..9dc7207 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -42,6 +42,9 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
 bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
 void vgic_kick_vcpus(struct kvm *kvm);
 
+int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
+		      phys_addr_t addr, phys_addr_t alignment);
+
 void vgic_v2_process_maintenance(struct kvm_vcpu *vcpu);
 void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu);
 void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr);
@@ -73,6 +76,7 @@ int vgic_v3_probe(const struct gic_kvm_info *info);
 int vgic_v3_map_resources(struct kvm *kvm);
 int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t dist_base_address);
 bool vgic_has_its(struct kvm *kvm);
+int kvm_vgic_register_its_device(void);
 #else
 static inline void vgic_v3_process_maintenance(struct kvm_vcpu *vcpu)
 {
@@ -130,6 +134,10 @@ static inline bool vgic_has_its(struct kvm *kvm)
 	return false;
 }
 
+static inline int kvm_vgic_register_its_device(void)
+{
+	return -ENODEV;
+}
 #endif
 
 int kvm_register_vgic_device(unsigned long type);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 11/17] KVM: arm64: implement basic ITS register handlers
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (9 preceding siblings ...)
  2016-07-05 11:23 ` [PATCH v8 10/17] KVM: arm64: introduce new KVM ITS device Andre Przywara
@ 2016-07-05 11:23 ` Andre Przywara
  2016-07-08 14:58   ` Marc Zyngier
  2016-07-05 11:23 ` [PATCH v8 12/17] KVM: arm64: connect LPIs to the VGIC emulation Andre Przywara
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

Add emulation for some basic MMIO registers used in the ITS emulation.
This includes:
- GITS_{CTLR,TYPER,IIDR}
- ID registers
- GITS_{CBASER,CREADR,CWRITER}
  (which implement the ITS command buffer handling)
- GITS_BASER<n>

Most of the handlers are pretty straightforward; only the CWRITER
handler is a bit more involved, as it takes the new cmd_lock mutex and
then iterates over the command buffer.
The registers holding base addresses and attributes are sanitised before
storing them.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 include/kvm/arm_vgic.h           |  16 ++
 virt/kvm/arm/vgic/vgic-its.c     | 376 +++++++++++++++++++++++++++++++++++++--
 virt/kvm/arm/vgic/vgic-mmio-v3.c |   8 +-
 virt/kvm/arm/vgic/vgic-mmio.h    |   6 +
 virt/kvm/arm/vgic/vgic.c         |  12 +-
 5 files changed, 401 insertions(+), 17 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index eb82c7d..17d3929 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -22,6 +22,7 @@
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <kvm/iodev.h>
+#include <linux/list.h>
 
 #define VGIC_V3_MAX_CPUS	255
 #define VGIC_V2_MAX_CPUS	8
@@ -128,6 +129,21 @@ struct vgic_its {
 	bool			enabled;
 	bool			initialized;
 	struct vgic_io_device	iodev;
+
+	/* These registers correspond to GITS_BASER{0,1} */
+	u64			baser_device_table;
+	u64			baser_coll_table;
+
+	/* Protects the command queue */
+	struct mutex		cmd_lock;
+	u64			cbaser;
+	u32			creadr;
+	u32			cwriter;
+
+	/* Protects the device and collection lists */
+	struct mutex		its_lock;
+	struct list_head	device_list;
+	struct list_head	collection_list;
 };
 
 struct vgic_dist {
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index d49bdad..a9336a4 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -21,6 +21,7 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 #include <linux/interrupt.h>
+#include <linux/list.h>
 #include <linux/uaccess.h>
 
 #include <linux/irqchip/arm-gic-v3.h>
@@ -32,6 +33,307 @@
 #include "vgic.h"
 #include "vgic-mmio.h"
 
+struct its_device {
+	struct list_head dev_list;
+
+	/* the head for the list of ITTEs */
+	struct list_head itt_head;
+	u32 device_id;
+};
+
+#define COLLECTION_NOT_MAPPED ((u32)~0)
+
+struct its_collection {
+	struct list_head coll_list;
+
+	u32 collection_id;
+	u32 target_addr;
+};
+
+#define its_is_collection_mapped(coll) ((coll) && \
+				((coll)->target_addr != COLLECTION_NOT_MAPPED))
+
+struct its_itte {
+	struct list_head itte_list;
+
+	struct its_collection *collection;
+	u32 lpi;
+	u32 event_id;
+};
+
+#define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
+
+static unsigned long vgic_mmio_read_its_ctlr(struct kvm *kvm,
+					     struct vgic_its *its,
+					     gpa_t addr, unsigned int len)
+{
+	u32 reg = 0;
+
+	mutex_lock(&its->cmd_lock);
+	if (its->creadr == its->cwriter)
+		reg |= GITS_CTLR_QUIESCENT;
+	if (its->enabled)
+		reg |= GITS_CTLR_ENABLE;
+	mutex_unlock(&its->cmd_lock);
+
+	return reg;
+}
+
+static void vgic_mmio_write_its_ctlr(struct kvm *kvm, struct vgic_its *its,
+				     gpa_t addr, unsigned int len,
+				     unsigned long val)
+{
+	its->enabled = !!(val & GITS_CTLR_ENABLE);
+}
+
+static unsigned long vgic_mmio_read_its_typer(struct kvm *kvm,
+					      struct vgic_its *its,
+					      gpa_t addr, unsigned int len)
+{
+	u64 reg = GITS_TYPER_PLPIS;
+
+	/*
+	 * We use linear CPU numbers for redistributor addressing,
+	 * so GITS_TYPER.PTA is 0.
+	 * Also we force all PROPBASER registers to be the same, so
+	 * CommonLPIAff is 0 as well.
+	 * To avoid memory waste in the guest, we keep the number of IDBits and
+	 * DevBits low - at least for the time being.
+	 */
+	reg |= 0x0f << GITS_TYPER_DEVBITS_SHIFT;
+	reg |= 0x0f << GITS_TYPER_IDBITS_SHIFT;
+
+	return extract_bytes(reg, addr & 7, len);
+}
+
+static unsigned long vgic_mmio_read_its_iidr(struct kvm *kvm,
+					     struct vgic_its *its,
+					     gpa_t addr, unsigned int len)
+{
+	return (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
+}
+
+static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
+					       struct vgic_its *its,
+					       gpa_t addr, unsigned int len)
+{
+	switch (addr & 0xffff) {
+	case GITS_PIDR0:
+		return 0x92;	/* part number, bits[7:0] */
+	case GITS_PIDR1:
+		return 0xb4;	/* part number, bits[11:8] */
+	case GITS_PIDR2:
+		return GIC_PIDR2_ARCH_GICv3 | 0x0b;
+	case GITS_PIDR4:
+		return 0x40;	/* This is a 64K software visible page */
+	/* The following are the ID registers for (any) GIC. */
+	case GITS_CIDR0:
+		return 0x0d;
+	case GITS_CIDR1:
+		return 0xf0;
+	case GITS_CIDR2:
+		return 0x05;
+	case GITS_CIDR3:
+		return 0xb1;
+	}
+
+	return 0;
+}
+
+/* Requires the its_lock to be held. */
+static void its_free_itte(struct kvm *kvm, struct its_itte *itte)
+{
+	list_del(&itte->itte_list);
+	kfree(itte);
+}
+
+static int vits_handle_command(struct kvm *kvm, struct vgic_its *its,
+			       u64 *its_cmd)
+{
+	return -ENODEV;
+}
+
+static u64 vgic_sanitise_its_baser(u64 reg)
+{
+	reg = vgic_sanitise_field(reg, GITS_BASER_SHAREABILITY_SHIFT,
+				  GIC_BASER_SHAREABILITY_MASK,
+				  vgic_sanitise_shareability);
+	reg = vgic_sanitise_field(reg, GITS_BASER_INNER_CACHEABILITY_SHIFT,
+				  GIC_BASER_CACHE_MASK,
+				  vgic_sanitise_inner_cacheability);
+	reg = vgic_sanitise_field(reg, GITS_BASER_OUTER_CACHEABILITY_SHIFT,
+				  GIC_BASER_CACHE_MASK,
+				  vgic_sanitise_outer_cacheability);
+	return reg;
+}
+
+static u64 vgic_sanitise_its_cbaser(u64 reg)
+{
+	reg = vgic_sanitise_field(reg, GITS_CBASER_SHAREABILITY_SHIFT,
+				  GIC_BASER_SHAREABILITY_MASK,
+				  vgic_sanitise_shareability);
+	reg = vgic_sanitise_field(reg, GITS_CBASER_INNER_CACHEABILITY_SHIFT,
+				  GIC_BASER_CACHE_MASK,
+				  vgic_sanitise_inner_cacheability);
+	reg = vgic_sanitise_field(reg, GITS_CBASER_OUTER_CACHEABILITY_SHIFT,
+				  GIC_BASER_CACHE_MASK,
+				  vgic_sanitise_outer_cacheability);
+	return reg;
+}
+
+static unsigned long vgic_mmio_read_its_cbaser(struct kvm *kvm,
+					       struct vgic_its *its,
+					       gpa_t addr, unsigned int len)
+{
+	return extract_bytes(its->cbaser, addr & 7, len);
+}
+
+static void vgic_mmio_write_its_cbaser(struct kvm *kvm, struct vgic_its *its,
+				       gpa_t addr, unsigned int len,
+				       unsigned long val)
+{
+	/* When GITS_CTLR.Enable is 1, this register is RO. */
+	if (its->enabled)
+		return;
+
+	mutex_lock(&its->cmd_lock);
+	its->cbaser = update_64bit_reg(its->cbaser, addr & 7, len, val);
+	/* Sanitise the physical address to be 64k aligned. */
+	its->cbaser &= ~GENMASK_ULL(15, 12);
+	its->cbaser = vgic_sanitise_its_cbaser(its->cbaser);
+	its->creadr = 0;
+	/*
+	 * CWRITER is architecturally UNKNOWN on reset, but we need to reset
+	 * it to CREADR to make sure we start with an empty command buffer.
+	 */
+	its->cwriter = its->creadr;
+	mutex_unlock(&its->cmd_lock);
+}
+
+#define ITS_CMD_BUFFER_SIZE(baser)	((((baser) & 0xff) + 1) << 12)
+#define ITS_CMD_SIZE			32
+
+/*
+ * By writing to CWRITER the guest announces new commands to be processed.
+ * To avoid any races in the first place, we take the cmd_lock, which
+ * protects our ring buffer variables, so that there is only one user
+ * per ITS handling commands at a given time.
+ */
+static void vgic_mmio_write_its_cwriter(struct kvm *kvm, struct vgic_its *its,
+					gpa_t addr, unsigned int len,
+					unsigned long val)
+{
+	gpa_t cbaser;
+	u64 cmd_buf[4];
+	u32 reg;
+
+	if (!its)
+		return;
+
+	cbaser = CBASER_ADDRESS(its->cbaser);
+
+	reg = update_64bit_reg(its->cwriter & 0xfffe0, addr & 7, len, val);
+	reg &= 0xfffe0;
+	if (reg >= ITS_CMD_BUFFER_SIZE(its->cbaser))
+		return;
+
+	mutex_lock(&its->cmd_lock);
+
+	its->cwriter = reg;
+
+	while (its->cwriter != its->creadr) {
+		int ret = kvm_read_guest(kvm, cbaser + its->creadr,
+					 cmd_buf, ITS_CMD_SIZE);
+		/*
+		 * If kvm_read_guest() fails, this could be due to the guest
+		 * programming a bogus value in CBASER or something else going
+		 * wrong from which we cannot easily recover.
+		 * We just ignore that command then.
+		 */
+		if (!ret)
+			vits_handle_command(kvm, its, cmd_buf);
+
+		its->creadr += ITS_CMD_SIZE;
+		if (its->creadr == ITS_CMD_BUFFER_SIZE(its->cbaser))
+			its->creadr = 0;
+	}
+
+	mutex_unlock(&its->cmd_lock);
+}
+
+static unsigned long vgic_mmio_read_its_cwriter(struct kvm *kvm,
+						struct vgic_its *its,
+						gpa_t addr, unsigned int len)
+{
+	return extract_bytes(its->cwriter & 0xfffe0, addr & 0x7, len);
+}
+
+static unsigned long vgic_mmio_read_its_creadr(struct kvm *kvm,
+					       struct vgic_its *its,
+					       gpa_t addr, unsigned int len)
+{
+	return extract_bytes(its->creadr & 0xfffe0, addr & 0x7, len);
+}
+
+#define BASER_INDEX(addr) (((addr) / sizeof(u64)) & 0x7)
+static unsigned long vgic_mmio_read_its_baser(struct kvm *kvm,
+					      struct vgic_its *its,
+					      gpa_t addr, unsigned int len)
+{
+	u64 reg;
+
+	switch (BASER_INDEX(addr)) {
+	case 0:
+		reg = its->baser_device_table;
+		break;
+	case 1:
+		reg = its->baser_coll_table;
+		break;
+	default:
+		reg = 0;
+		break;
+	}
+
+	return extract_bytes(reg, addr & 7, len);
+}
+
+#define GITS_BASER_RO_MASK	(GENMASK_ULL(52, 48) | GENMASK_ULL(58, 56))
+static void vgic_mmio_write_its_baser(struct kvm *kvm,
+				      struct vgic_its *its,
+				      gpa_t addr, unsigned int len,
+				      unsigned long val)
+{
+	u64 reg, *regptr;
+	u64 entry_size, device_type;
+
+	/* When GITS_CTLR.Enable is 1, we ignore write accesses. */
+	if (its->enabled)
+		return;
+
+	switch (BASER_INDEX(addr)) {
+	case 0:
+		regptr = &its->baser_device_table;
+		entry_size = 8;
+		device_type = GITS_BASER_TYPE_DEVICE;
+		break;
+	case 1:
+		regptr = &its->baser_coll_table;
+		entry_size = 8;
+		device_type = GITS_BASER_TYPE_COLLECTION;
+		break;
+	default:
+		return;
+	}
+
+	reg = update_64bit_reg(*regptr, addr & 7, len, val);
+	reg &= ~GITS_BASER_RO_MASK;
+	reg |= (entry_size - 1) << GITS_BASER_ENTRY_SIZE_SHIFT;
+	reg |= device_type << GITS_BASER_TYPE_SHIFT;
+	reg = vgic_sanitise_its_baser(reg);
+
+	*regptr = reg;
+}
+
 #define REGISTER_ITS_DESC(off, rd, wr, length, acc)		\
 {								\
 	.reg_offset = off,					\
@@ -42,8 +344,8 @@
 	.its_write = wr,					\
 }
 
-static unsigned long its_mmio_read_raz(struct kvm *kvm, struct vgic_its *its,
-				       gpa_t addr, unsigned int len)
+unsigned long its_mmio_read_raz(struct kvm *kvm, struct vgic_its *its,
+				gpa_t addr, unsigned int len)
 {
 	return 0;
 }
@@ -56,28 +358,28 @@ static void its_mmio_write_wi(struct kvm *kvm, struct vgic_its *its,
 
 static struct vgic_register_region its_registers[] = {
 	REGISTER_ITS_DESC(GITS_CTLR,
-		its_mmio_read_raz, its_mmio_write_wi, 4,
+		vgic_mmio_read_its_ctlr, vgic_mmio_write_its_ctlr, 4,
 		VGIC_ACCESS_32bit),
 	REGISTER_ITS_DESC(GITS_IIDR,
-		its_mmio_read_raz, its_mmio_write_wi, 4,
+		vgic_mmio_read_its_iidr, its_mmio_write_wi, 4,
 		VGIC_ACCESS_32bit),
 	REGISTER_ITS_DESC(GITS_TYPER,
-		its_mmio_read_raz, its_mmio_write_wi, 8,
+		vgic_mmio_read_its_typer, its_mmio_write_wi, 8,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
 	REGISTER_ITS_DESC(GITS_CBASER,
-		its_mmio_read_raz, its_mmio_write_wi, 8,
+		vgic_mmio_read_its_cbaser, vgic_mmio_write_its_cbaser, 8,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
 	REGISTER_ITS_DESC(GITS_CWRITER,
-		its_mmio_read_raz, its_mmio_write_wi, 8,
+		vgic_mmio_read_its_cwriter, vgic_mmio_write_its_cwriter, 8,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
 	REGISTER_ITS_DESC(GITS_CREADR,
-		its_mmio_read_raz, its_mmio_write_wi, 8,
+		vgic_mmio_read_its_creadr, its_mmio_write_wi, 8,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
 	REGISTER_ITS_DESC(GITS_BASER,
-		its_mmio_read_raz, its_mmio_write_wi, 0x40,
+		vgic_mmio_read_its_baser, vgic_mmio_write_its_baser, 0x40,
 		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
 	REGISTER_ITS_DESC(GITS_IDREGS_BASE,
-		its_mmio_read_raz, its_mmio_write_wi, 0x30,
+		vgic_mmio_read_its_idregs, its_mmio_write_wi, 0x30,
 		VGIC_ACCESS_32bit),
 };
 
@@ -100,6 +402,18 @@ static int vgic_its_register(struct kvm *kvm, struct vgic_its *its)
 	return ret;
 }
 
+#define INITIAL_BASER_VALUE						  \
+	(GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWb)		| \
+	 GIC_BASER_CACHEABILITY(GITS_BASER, OUTER, SameAsInner)		| \
+	 GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)		| \
+	 ((8ULL - 1) << GITS_BASER_ENTRY_SIZE_SHIFT)			| \
+	 GITS_BASER_PAGE_SIZE_64K)
+
+#define INITIAL_PROPBASER_VALUE						  \
+	(GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWb)		| \
+	 GIC_BASER_CACHEABILITY(GICR_PROPBASER, OUTER, SameAsInner)	| \
+	 GIC_BASER_SHAREABILITY(GICR_PROPBASER, InnerShareable))
+
 static int vgic_its_create(struct kvm_device *dev, u32 type)
 {
 	struct vgic_its *its;
@@ -111,12 +425,25 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
 	if (!its)
 		return -ENOMEM;
 
+	mutex_init(&its->its_lock);
+	mutex_init(&its->cmd_lock);
+
 	its->vgic_its_base = VGIC_ADDR_UNDEF;
 
+	INIT_LIST_HEAD(&its->device_list);
+	INIT_LIST_HEAD(&its->collection_list);
+
 	dev->kvm->arch.vgic.has_its = true;
 	its->initialized = false;
 	its->enabled = false;
 
+	its->baser_device_table = INITIAL_BASER_VALUE			|
+		((u64)GITS_BASER_TYPE_DEVICE << GITS_BASER_TYPE_SHIFT)	|
+		GITS_BASER_INDIRECT;
+	its->baser_coll_table = INITIAL_BASER_VALUE |
+		((u64)GITS_BASER_TYPE_COLLECTION << GITS_BASER_TYPE_SHIFT);
+	dev->kvm->arch.vgic.propbaser = INITIAL_PROPBASER_VALUE;
+
 	dev->private = its;
 
 	return 0;
@@ -124,7 +451,36 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
 
 static void vgic_its_destroy(struct kvm_device *kvm_dev)
 {
+	struct kvm *kvm = kvm_dev->kvm;
 	struct vgic_its *its = kvm_dev->private;
+	struct its_device *dev;
+	struct its_itte *itte;
+	struct list_head *dev_cur, *dev_temp;
+	struct list_head *cur, *temp;
+
+	/*
+	 * We may end up here without the lists ever having been initialized.
+	 * Check this and bail out early to avoid dereferencing a NULL pointer.
+	 */
+	if (!its->device_list.next)
+		return;
+
+	mutex_lock(&its->its_lock);
+	list_for_each_safe(dev_cur, dev_temp, &its->device_list) {
+		dev = container_of(dev_cur, struct its_device, dev_list);
+		list_for_each_safe(cur, temp, &dev->itt_head) {
+			itte = container_of(cur, struct its_itte, itte_list);
+			its_free_itte(kvm, itte);
+		}
+		list_del(dev_cur);
+		kfree(dev);
+	}
+
+	list_for_each_safe(cur, temp, &its->collection_list) {
+		list_del(cur);
+		kfree(container_of(cur, struct its_collection, coll_list));
+	}
+	mutex_unlock(&its->its_lock);
 
 	kfree(its);
 }
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 062ff95..370e89e 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -23,15 +23,15 @@
 #include "vgic-mmio.h"
 
 /* extract @num bytes at @offset bytes offset in data */
-static unsigned long extract_bytes(unsigned long data, unsigned int offset,
-				   unsigned int num)
+unsigned long extract_bytes(unsigned long data, unsigned int offset,
+			    unsigned int num)
 {
 	return (data >> (offset * 8)) & GENMASK_ULL(num * 8 - 1, 0);
 }
 
 /* allows updates of any half of a 64-bit register (or the whole thing) */
-static u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
-			    unsigned long val)
+u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
+		     unsigned long val)
 {
 	int lower = (offset & 4) * 8;
 	int upper = lower + 8 * len - 1;
diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
index 23e97a7..513bb5c 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.h
+++ b/virt/kvm/arm/vgic/vgic-mmio.h
@@ -106,6 +106,12 @@ unsigned long vgic_data_mmio_bus_to_host(const void *val, unsigned int len);
 void vgic_data_host_to_mmio_bus(void *buf, unsigned int len,
 				unsigned long data);
 
+unsigned long extract_bytes(unsigned long data, unsigned int offset,
+			    unsigned int num);
+
+u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
+		     unsigned long val);
+
 unsigned long vgic_mmio_read_raz(struct kvm_vcpu *vcpu,
 				 gpa_t addr, unsigned int len);
 
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index ae80894..a5d9a10 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -33,10 +33,16 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
 
 /*
  * Locking order is always:
- *   vgic_cpu->ap_list_lock
- *     vgic_irq->irq_lock
+ * its->cmd_lock (mutex)
+ *   its->its_lock (mutex)
+ *     vgic_cpu->ap_list_lock
+ *       vgic_irq->irq_lock
  *
- * (that is, always take the ap_list_lock before the struct vgic_irq lock).
+ * If you need to take multiple locks, always take the upper lock first,
+ * then the lower ones, e.g. first take the its_lock, then the irq_lock.
+ * If you are already holding a lock and need to take a higher one, you
+ * have to drop the lower ranking lock first and re-acquire it after having
+ * taken the upper one.
  *
  * When taking more than one ap_list_lock at the same time, always take the
  * lowest numbered VCPU's ap_list_lock first, so:
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 12/17] KVM: arm64: connect LPIs to the VGIC emulation
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (10 preceding siblings ...)
  2016-07-05 11:23 ` [PATCH v8 11/17] KVM: arm64: implement basic ITS register handlers Andre Przywara
@ 2016-07-05 11:23 ` Andre Przywara
  2016-07-11 16:20   ` Marc Zyngier
  2016-07-05 11:23 ` [PATCH v8 13/17] KVM: arm64: read initial LPI pending table Andre Przywara
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

LPIs are dynamically created (mapped) at guest runtime and their
actual number can be quite high, but is mostly assigned using a very
sparse allocation scheme. So arrays are not an ideal data structure
to hold the information.
We use a spin-lock protected linked list to hold all mapped LPIs,
represented by their struct vgic_irq. This lock ranks between the
ap_list_lock and the vgic_irq lock in our locking order.
Also we store a pointer to that struct vgic_irq in our struct its_itte,
so we can easily access it.
Eventually we call our new vgic_its_get_lpi() from vgic_get_irq(), so
the VGIC code gets transparent access to LPIs.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 include/kvm/arm_vgic.h        |  6 ++++++
 virt/kvm/arm/vgic/vgic-init.c |  3 +++
 virt/kvm/arm/vgic/vgic-its.c  | 32 +++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-v3.c   |  2 ++
 virt/kvm/arm/vgic/vgic.c      | 48 +++++++++++++++++++++++++++++++++++--------
 virt/kvm/arm/vgic/vgic.h      |  7 +++++++
 6 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 17d3929..5aff85c 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -77,6 +77,7 @@ enum vgic_irq_config {
 
 struct vgic_irq {
 	spinlock_t irq_lock;		/* Protects the content of the struct */
+	struct list_head lpi_entry;	/* Used to link all LPIs together */
 	struct list_head ap_list;
 
 	struct kvm_vcpu *vcpu;		/* SGIs and PPIs: The VCPU
@@ -185,6 +186,11 @@ struct vgic_dist {
 	 * GICv3 spec: 6.1.2 "LPI Configuration tables"
 	 */
 	u64			propbaser;
+
+	/* Protects the lpi_list and the count value below. */
+	spinlock_t		lpi_list_lock;
+	struct list_head	lpi_list_head;
+	int			lpi_list_count;
 };
 
 struct vgic_v2_cpu_if {
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index ac3c1a5..535e713 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -157,6 +157,9 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
 	struct kvm_vcpu *vcpu0 = kvm_get_vcpu(kvm, 0);
 	int i;
 
+	INIT_LIST_HEAD(&dist->lpi_list_head);
+	spin_lock_init(&dist->lpi_list_lock);
+
 	dist->spis = kcalloc(nr_spis, sizeof(struct vgic_irq), GFP_KERNEL);
 	if (!dist->spis)
 		return  -ENOMEM;
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index a9336a4..1e2e649 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -33,6 +33,31 @@
 #include "vgic.h"
 #include "vgic-mmio.h"
 
+/*
+ * Iterate over the VM's list of mapped LPIs to find the one with a
+ * matching interrupt ID and return a reference to the IRQ structure.
+ */
+struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	struct vgic_irq *irq = NULL;
+
+	spin_lock(&dist->lpi_list_lock);
+	list_for_each_entry(irq, &dist->lpi_list_head, lpi_entry) {
+		if (irq->intid != intid)
+			continue;
+
+		kref_get(&irq->refcount);
+		goto out_unlock;
+	}
+	irq = NULL;
+
+out_unlock:
+	spin_unlock(&dist->lpi_list_lock);
+
+	return irq;
+}
+
 struct its_device {
 	struct list_head dev_list;
 
@@ -56,11 +81,17 @@ struct its_collection {
 struct its_itte {
 	struct list_head itte_list;
 
+	struct vgic_irq *irq;
 	struct its_collection *collection;
 	u32 lpi;
 	u32 event_id;
 };
 
+/* To be used as an iterator this macro misses the enclosing parentheses */
+#define for_each_lpi_its(dev, itte, its) \
+	list_for_each_entry(dev, &(its)->device_list, dev_list) \
+		list_for_each_entry(itte, &(dev)->itt_head, itte_list)
+
 #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
 
 static unsigned long vgic_mmio_read_its_ctlr(struct kvm *kvm,
@@ -144,6 +175,7 @@ static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
 static void its_free_itte(struct kvm *kvm, struct its_itte *itte)
 {
 	list_del(&itte->itte_list);
+	vgic_put_irq(kvm, itte->irq);
 	kfree(itte);
 }
 
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 6f8f31f..0506543 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -81,6 +81,8 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 		else
 			intid = val & GICH_LR_VIRTUALID;
 		irq = vgic_get_irq(vcpu->kvm, vcpu, intid);
+		if (!irq)	/* An LPI could have been unmapped. */
+			continue;
 
 		spin_lock(&irq->irq_lock);
 
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index a5d9a10..72b2516 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -36,7 +36,8 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
  * its->cmd_lock (mutex)
  *   its->its_lock (mutex)
  *     vgic_cpu->ap_list_lock
- *       vgic_irq->irq_lock
+ *       kvm->lpi_list_lock
+ *         vgic_irq->irq_lock
  *
  * If you need to take multiple locks, always take the upper lock first,
  * then the lower ones, e.g. first take the its_lock, then the irq_lock.
@@ -69,23 +70,54 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
 		return irq;
 	}
 
-	/* LPIs are not yet covered */
-	if (intid >= VGIC_MIN_LPI)
+	if (intid < VGIC_MIN_LPI) {
+		WARN(1, "Looking up struct vgic_irq for reserved INTID");
 		return NULL;
+	}
 
-	WARN(1, "Looking up struct vgic_irq for reserved INTID");
-	return NULL;
+	/* LPIs */
+	return vgic_its_get_lpi(kvm, intid);
 }
 
-/* The refcount should never drop to 0 at the moment. */
+/*
+ * We can't do anything in here, because we lack the kvm pointer to
+ * lock and remove the item from the lpi_list. So we keep this function
+ * empty and use the return value of kref_put() to trigger the freeing.
+ */
 static void vgic_irq_release(struct kref *ref)
 {
-	WARN_ON(1);
+}
+
+static void __vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq, bool locked)
+{
+	struct vgic_dist *dist;
+
+	if (!kref_put(&irq->refcount, vgic_irq_release))
+		return;
+
+	if (irq->intid < VGIC_MIN_LPI)
+		return;
+
+	dist = &kvm->arch.vgic;
+
+	if (!locked)
+		spin_lock(&dist->lpi_list_lock);
+	list_del(&irq->lpi_entry);
+	dist->lpi_list_count--;
+	if (!locked)
+		spin_unlock(&dist->lpi_list_lock);
+
+	kfree(irq);
+}
+
+void vgic_put_irq_locked(struct kvm *kvm, struct vgic_irq *irq)
+{
+	__vgic_put_irq(kvm, irq, true);
 }
 
 void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 {
-	kref_put(&irq->refcount, vgic_irq_release);
+	__vgic_put_irq(kvm, irq, false);
 }
 
 /**
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index 9dc7207..eef9ec1 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -39,6 +39,7 @@ struct vgic_vmcr {
 struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
 			      u32 intid);
 void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
+void vgic_put_irq_locked(struct kvm *kvm, struct vgic_irq *irq);
 bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
 void vgic_kick_vcpus(struct kvm *kvm);
 
@@ -77,6 +78,7 @@ int vgic_v3_map_resources(struct kvm *kvm);
 int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t dist_base_address);
 bool vgic_has_its(struct kvm *kvm);
 int kvm_vgic_register_its_device(void);
+struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid);
 #else
 static inline void vgic_v3_process_maintenance(struct kvm_vcpu *vcpu)
 {
@@ -138,6 +140,11 @@ static inline int kvm_vgic_register_its_device(void)
 {
 	return -ENODEV;
 }
+
+static inline struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid)
+{
+	return NULL;
+}
 #endif
 
 int kvm_register_vgic_device(unsigned long type);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v8 13/17] KVM: arm64: read initial LPI pending table
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (11 preceding siblings ...)
  2016-07-05 11:23 ` [PATCH v8 12/17] KVM: arm64: connect LPIs to the VGIC emulation Andre Przywara
@ 2016-07-05 11:23 ` Andre Przywara
  2016-07-11 16:50   ` Marc Zyngier
  2016-07-05 11:23 ` [PATCH v8 14/17] KVM: arm64: allow updates of LPI configuration table Andre Przywara
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

The LPI pending status for a GICv3 redistributor is held in a table
in (guest) memory. To achieve reasonable performance, we cache this
data in our struct vgic_irq. The initial pending state must be read
from guest memory upon enabling LPIs for this redistributor.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 virt/kvm/arm/vgic/vgic-its.c | 81 ++++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic.h     |  6 ++++
 2 files changed, 87 insertions(+)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 1e2e649..29bb4fe 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -93,6 +93,81 @@ struct its_itte {
 		list_for_each_entry(itte, &(dev)->itt_head, itte_list)
 
 #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
+#define PENDBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 16))
+
+static int vgic_its_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	struct vgic_irq *irq;
+	u32 *intids;
+	int irq_count = dist->lpi_list_count, i = 0;
+
+	/*
+	 * We use the current value of the list length, which may change
+	 * after the kmalloc. We don't care, because the guest shouldn't
+	 * change anything while the command handling is still running,
+	 * and in the worst case we would miss a new IRQ, which the guest
+	 * wouldn't expect to be covered by this command anyway.
+	 */
+	intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL);
+	if (!intids)
+		return -ENOMEM;
+
+	spin_lock(&dist->lpi_list_lock);
+	list_for_each_entry(irq, &dist->lpi_list_head, lpi_entry) {
+		if (kref_get_unless_zero(&irq->refcount)) {
+			intids[i++] = irq->intid;
+			vgic_put_irq_locked(kvm, irq);
+		}
+		if (i == irq_count)
+			break;
+	}
+	spin_unlock(&dist->lpi_list_lock);
+
+	*intid_ptr = intids;
+	return irq_count;
+}
+
+/*
+ * Scan the whole LPI pending table and sync the pending bit in there
+ * with our own data structures. This relies on the LPIs having been
+ * mapped before.
+ */
+static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
+{
+	gpa_t pendbase = PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
+	struct vgic_irq *irq;
+	u8 pendmask;
+	int ret = 0;
+	u32 *intids;
+	int nr_irqs, i;
+
+	nr_irqs = vgic_its_copy_lpi_list(vcpu->kvm, &intids);
+	if (nr_irqs < 0)
+		return nr_irqs;
+
+	for (i = 0; i < nr_irqs; i++) {
+		int byte_offset, bit_nr;
+
+		byte_offset = intids[i] / BITS_PER_BYTE;
+		bit_nr = intids[i] % BITS_PER_BYTE;
+
+		ret = kvm_read_guest(vcpu->kvm, pendbase + byte_offset,
+				     &pendmask, 1);
+		if (ret) {
+			kfree(intids);
+			return ret;
+		}
+
+		irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
+		spin_lock(&irq->irq_lock);
+		irq->pending = pendmask & (1U << bit_nr);
+		vgic_queue_irq_unlock(vcpu->kvm, irq);
+		vgic_put_irq(vcpu->kvm, irq);
+	}
+	kfree(intids);
+	return ret;
+}
 
 static unsigned long vgic_mmio_read_its_ctlr(struct kvm *vcpu,
 					     struct vgic_its *its,
@@ -415,6 +490,12 @@ static struct vgic_register_region its_registers[] = {
 		VGIC_ACCESS_32bit),
 };
 
+/* This is called on setting the LPI enable bit in the redistributor. */
+void vgic_enable_lpis(struct kvm_vcpu *vcpu)
+{
+	its_sync_lpi_pending_table(vcpu);
+}
+
 static int vgic_its_register(struct kvm *kvm, struct vgic_its *its)
 {
 	struct vgic_io_device *iodev = &its->iodev;
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index eef9ec1..4a9165f 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -25,6 +25,7 @@
 #define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == VGIC_ADDR_UNDEF)
 
 #define INTERRUPT_ID_BITS_SPIS	10
+#define INTERRUPT_ID_BITS_ITS	16
 #define VGIC_PRI_BITS		5
 
 #define vgic_irq_is_sgi(intid) ((intid) < VGIC_NR_SGIS)
@@ -79,6 +80,7 @@ int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t dist_base_address);
 bool vgic_has_its(struct kvm *kvm);
 int kvm_vgic_register_its_device(void);
 struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid);
+void vgic_enable_lpis(struct kvm_vcpu *vcpu);
 #else
 static inline void vgic_v3_process_maintenance(struct kvm_vcpu *vcpu)
 {
@@ -145,6 +147,10 @@ static inline struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid)
 {
 	return NULL;
 }
+
+static inline void vgic_enable_lpis(struct kvm_vcpu *vcpu)
+{
+}
 #endif
 
 int kvm_register_vgic_device(unsigned long type);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread
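The pending-table lookup in its_sync_lpi_pending_table() above is plain byte/bit arithmetic: each LPI owns exactly one bit in the table, indexed by its INTID. A minimal stand-alone sketch (the helper names are illustrative, not from the kernel source):

```c
#include <stdint.h>

#define BITS_PER_BYTE	8

/* Byte within the pending table that holds this LPI's pending bit. */
static uint32_t lpi_byte_offset(uint32_t intid)
{
	return intid / BITS_PER_BYTE;
}

/* Bit number of this LPI within that byte. */
static uint32_t lpi_bit_nr(uint32_t intid)
{
	return intid % BITS_PER_BYTE;
}

/* Nonzero if the LPI is marked pending in the byte read from guest memory. */
static int lpi_is_pending(uint8_t pendmask, uint32_t intid)
{
	return !!(pendmask & (1U << lpi_bit_nr(intid)));
}
```

Since the first valid LPI INTID is 8192, the first 1024 bytes of the table cover the SGI/PPI/SPI range and are never addressed by these helpers.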

* [PATCH v8 14/17] KVM: arm64: allow updates of LPI configuration table
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (12 preceding siblings ...)
  2016-07-05 11:23 ` [PATCH v8 13/17] KVM: arm64: read initial LPI pending table Andre Przywara
@ 2016-07-05 11:23 ` Andre Przywara
  2016-07-11 16:59   ` Marc Zyngier
  2016-07-05 11:23 ` [PATCH v8 15/17] KVM: arm64: implement ITS command queue command handlers Andre Przywara
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

The (system-wide) LPI configuration table is held in (guest) memory.
To achieve reasonable performance, we cache this data in our struct
vgic_irq. If the guest updates the configuration data (which consists
of the enable bit and the priority value), it issues an INV or INVALL
command so that we can update our cached information.
Provide functions that update that information, either for a single
LPI or for all LPIs mapped to a specific collection.
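As a quick illustration of the layout this patch relies on: one byte per LPI in the property table carries the enable bit (bit 0) and the priority (bits 7:2). A hedged stand-alone sketch of the decoding; the helper name update_cfg() is made up for this example:

```c
#include <stdint.h>

#define LPI_PROP_ENABLED	(1 << 0)	/* assumed: bit 0 of the property byte */
#define LPI_PROP_ENABLE_BIT(p)	((p) & LPI_PROP_ENABLED)
#define LPI_PROP_PRIORITY(p)	((p) & 0xfc)

/* Decode one property-table byte into the cached per-IRQ fields. */
static void update_cfg(uint8_t prop, uint8_t *priority, int *enabled)
{
	*priority = LPI_PROP_PRIORITY(prop);
	*enabled = !!LPI_PROP_ENABLE_BIT(prop);
}
```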

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 virt/kvm/arm/vgic/vgic-its.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 29bb4fe..5de71bd 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -94,6 +94,51 @@ struct its_itte {
 
 #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
 #define PENDBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 16))
+#define PROPBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
+
+#define GIC_LPI_OFFSET 8192
+
+#define LPI_PROP_ENABLE_BIT(p)	((p) & LPI_PROP_ENABLED)
+#define LPI_PROP_PRIORITY(p)	((p) & 0xfc)
+
+/*
+ * Reads the configuration data for a given LPI from guest memory and
+ * updates the fields in struct vgic_irq.
+ * If filter_vcpu is not NULL, the update is applied only if the IRQ
+ * targets this VCPU; if it is NULL, the update applies unconditionally.
+ */
+static int update_lpi_config_filtered(struct kvm *kvm, struct vgic_irq *irq,
+				      struct kvm_vcpu *filter_vcpu)
+{
+	u64 propbase = PROPBASER_ADDRESS(kvm->arch.vgic.propbaser);
+	u8 prop;
+	int ret;
+
+	ret = kvm_read_guest(kvm, propbase + irq->intid - GIC_LPI_OFFSET,
+			     &prop, 1);
+
+	if (ret)
+		return ret;
+
+	spin_lock(&irq->irq_lock);
+
+	if (!filter_vcpu || filter_vcpu == irq->target_vcpu) {
+		irq->priority = LPI_PROP_PRIORITY(prop);
+		irq->enabled = LPI_PROP_ENABLE_BIT(prop);
+
+		vgic_queue_irq_unlock(kvm, irq);
+	} else {
+		spin_unlock(&irq->irq_lock);
+	}
+
+	return 0;
+}
+
+/* Updates the priority and enable bit for a given LPI. */
+int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq)
+{
+	return update_lpi_config_filtered(kvm, irq, NULL);
+}
 
 static int vgic_its_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
 {
-- 
2.9.0


* [PATCH v8 15/17] KVM: arm64: implement ITS command queue command handlers
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (13 preceding siblings ...)
  2016-07-05 11:23 ` [PATCH v8 14/17] KVM: arm64: allow updates of LPI configuration table Andre Przywara
@ 2016-07-05 11:23 ` Andre Przywara
  2016-07-11 17:17   ` Marc Zyngier
  2016-07-05 11:23 ` [PATCH v8 16/17] KVM: arm64: implement MSI injection in ITS emulation Andre Przywara
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

On a GICv3 the connection between a device, an event ID, the LPI
number and the target CPU is stored in in-memory tables, but the
format of those tables is not specified by the architecture. Instead
software uses a command queue in a ring buffer, which lets each ITS
implementation use its own format.
Implement handlers for the various ITS commands and let them store
the requested relation in our own data structures. Those data
structures are protected by the its_lock mutex.
Our internal ring buffer read and write pointers are protected by the
its_cmd mutex, so that at most one VCPU per ITS can handle commands at
any given time.
Error handling is very basic at the moment, as we don't have a good
way of communicating errors to the guest (usually an SError).
The INT command handler is missing at this point, as we only gain the
capability of actually injecting MSIs into the guest later on.
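Each ITS command is four 64-bit words, and every field is a plain bit range within one of those words. A stand-alone sketch of the field extraction used by the handlers below (guest-endianness conversion omitted for brevity; names shortened from the patch's its_cmd_* helpers):

```c
#include <stdint.h>

/* Extract a bit field of 'size' bits at 'shift' from command word 'word'. */
static uint64_t cmd_field(const uint64_t *cmd, int word, int shift, int size)
{
	return (cmd[word] >> shift) & ((1ULL << size) - 1);
}

#define cmd_get_command(c)	cmd_field(c, 0,  0,  8)
#define cmd_get_deviceid(c)	cmd_field(c, 0, 32, 32)
#define cmd_get_collection(c)	cmd_field(c, 2,  0, 16)
```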

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 virt/kvm/arm/vgic/vgic-its.c | 609 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 605 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 5de71bd..432daed 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -58,6 +58,43 @@ out_unlock:
 	return irq;
 }
 
+/*
+ * Creates a new (reference to a) struct vgic_irq for a given LPI.
+ * If this LPI is already mapped on another ITS, we increase its refcount
+ * and return a pointer to the existing structure.
+ * If this is a "new" LPI, we allocate and initialize a new struct vgic_irq.
+ * This function returns a pointer to the _unlocked_ structure.
+ */
+static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	struct vgic_irq *irq = vgic_its_get_lpi(kvm, intid);
+
+	/* In this case there is no put, since we keep the reference. */
+	if (irq)
+		return irq;
+
+	irq = kzalloc(sizeof(struct vgic_irq), GFP_KERNEL);
+
+	if (!irq)
+		return NULL;
+
+	INIT_LIST_HEAD(&irq->lpi_entry);
+	INIT_LIST_HEAD(&irq->ap_list);
+	spin_lock_init(&irq->irq_lock);
+
+	irq->config = VGIC_CONFIG_EDGE;
+	kref_init(&irq->refcount);
+	irq->intid = intid;
+
+	spin_lock(&dist->lpi_list_lock);
+	list_add_tail(&irq->lpi_entry, &dist->lpi_list_head);
+	dist->lpi_list_count++;
+	spin_unlock(&dist->lpi_list_lock);
+
+	return irq;
+}
+
 struct its_device {
 	struct list_head dev_list;
 
@@ -87,6 +124,43 @@ struct its_itte {
 	u32 event_id;
 };
 
+/*
+ * Finds and returns a device in the device table for an ITS.
+ * Must be called with the its_lock held.
+ */
+static struct its_device *find_its_device(struct vgic_its *its, u32 device_id)
+{
+	struct its_device *device;
+
+	list_for_each_entry(device, &its->device_list, dev_list)
+		if (device_id == device->device_id)
+			return device;
+
+	return NULL;
+}
+
+/*
+ * Finds and returns an interrupt translation table entry (ITTE) for a given
+ * Device ID/Event ID pair on an ITS.
+ * Must be called with the its_lock held.
+ */
+static struct its_itte *find_itte(struct vgic_its *its, u32 device_id,
+				  u32 event_id)
+{
+	struct its_device *device;
+	struct its_itte *itte;
+
+	device = find_its_device(its, device_id);
+	if (device == NULL)
+		return NULL;
+
+	list_for_each_entry(itte, &device->itt_head, itte_list)
+		if (itte->event_id == event_id)
+			return itte;
+
+	return NULL;
+}
+
 /* To be used as an iterator this macro misses the enclosing parentheses */
 #define for_each_lpi_its(dev, itte, its) \
 	list_for_each_entry(dev, &(its)->device_list, dev_list) \
@@ -98,6 +172,22 @@ struct its_itte {
 
 #define GIC_LPI_OFFSET 8192
 
+/*
+ * Finds and returns a collection in the ITS collection table.
+ * Must be called with the its_lock held.
+ */
+static struct its_collection *find_collection(struct vgic_its *its, int coll_id)
+{
+	struct its_collection *collection;
+
+	list_for_each_entry(collection, &its->collection_list, coll_list) {
+		if (coll_id == collection->collection_id)
+			return collection;
+	}
+
+	return NULL;
+}
+
 #define LPI_PROP_ENABLE_BIT(p)	((p) & LPI_PROP_ENABLED)
 #define LPI_PROP_PRIORITY(p)	((p) & 0xfc)
 
@@ -135,7 +225,7 @@ static int update_lpi_config_filtered(struct kvm *kvm, struct vgic_irq *irq,
 }
 
 /* Updates the priority and enable bit for a given LPI. */
-int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq)
+static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq)
 {
 	return update_lpi_config_filtered(kvm, irq, NULL);
 }
@@ -174,6 +264,48 @@ static int vgic_its_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
 }
 
 /*
+ * Promotes the ITS view of affinity of an ITTE (which redistributor this LPI
+ * is targeting) to the VGIC's view, which deals with target VCPUs.
+ * Needs to be called whenever either the collection for an LPI has
+ * changed or the collection itself got retargeted.
+ */
+static void update_affinity_itte(struct kvm *kvm, struct its_itte *itte)
+{
+	struct kvm_vcpu *vcpu;
+
+	vcpu = kvm_get_vcpu(kvm, itte->collection->target_addr);
+
+	spin_lock(&itte->irq->irq_lock);
+	itte->irq->target_vcpu = vcpu;
+	spin_unlock(&itte->irq->irq_lock);
+}
+
+/*
+ * Updates the target VCPU for every LPI targeting this collection.
+ * Must be called with the its_lock held.
+ */
+static void update_affinity_collection(struct kvm *kvm, struct vgic_its *its,
+				       struct its_collection *coll)
+{
+	struct its_device *device;
+	struct its_itte *itte;
+
+	for_each_lpi_its(device, itte, its) {
+		if (!itte->collection || coll != itte->collection)
+			continue;
+
+		update_affinity_itte(kvm, itte);
+	}
+}
+
+static u32 max_lpis_propbaser(u64 propbaser)
+{
+	int nr_idbits = (propbaser & 0x1f) + 1;
+
+	return 1U << min(nr_idbits, INTERRUPT_ID_BITS_ITS);
+}
+
+/*
  * Scan the whole LPI pending table and sync the pending bit in there
  * with our own data structures. This relies on the LPI being
  * mapped before.
@@ -299,10 +431,479 @@ static void its_free_itte(struct kvm *kvm, struct its_itte *itte)
 	kfree(itte);
 }
 
-static int vits_handle_command(struct kvm *kvm, struct vgic_its *its,
+static u64 its_cmd_mask_field(u64 *its_cmd, int word, int shift, int size)
+{
+	return (le64_to_cpu(its_cmd[word]) >> shift) & (BIT_ULL(size) - 1);
+}
+
+#define its_cmd_get_command(cmd)	its_cmd_mask_field(cmd, 0,  0,  8)
+#define its_cmd_get_deviceid(cmd)	its_cmd_mask_field(cmd, 0, 32, 32)
+#define its_cmd_get_id(cmd)		its_cmd_mask_field(cmd, 1,  0, 32)
+#define its_cmd_get_physical_id(cmd)	its_cmd_mask_field(cmd, 1, 32, 32)
+#define its_cmd_get_collection(cmd)	its_cmd_mask_field(cmd, 2,  0, 16)
+#define its_cmd_get_target_addr(cmd)	its_cmd_mask_field(cmd, 2, 16, 32)
+#define its_cmd_get_validbit(cmd)	its_cmd_mask_field(cmd, 2, 63,  1)
+
+/* The DISCARD command frees an Interrupt Translation Table Entry (ITTE). */
+static int vgic_its_cmd_handle_discard(struct kvm *kvm, struct vgic_its *its,
+				   u64 *its_cmd)
+{
+	u32 device_id;
+	u32 event_id;
+	struct its_itte *itte;
+	int ret = E_ITS_DISCARD_UNMAPPED_INTERRUPT;
+
+	device_id = its_cmd_get_deviceid(its_cmd);
+	event_id = its_cmd_get_id(its_cmd);
+
+	mutex_lock(&its->its_lock);
+	itte = find_itte(its, device_id, event_id);
+	if (itte && itte->collection) {
+		/*
+		 * Though the spec talks about removing the pending state, we
+		 * don't bother here since we clear the ITTE anyway and the
+		 * pending state is a property of the ITTE struct.
+		 */
+		its_free_itte(kvm, itte);
+		ret = 0;
+	}
+
+	mutex_unlock(&its->its_lock);
+	return ret;
+}
+
+/* The MOVI command moves an ITTE to a different collection. */
+static int vgic_its_cmd_handle_movi(struct kvm *kvm, struct vgic_its *its,
+				u64 *its_cmd)
+{
+	u32 device_id = its_cmd_get_deviceid(its_cmd);
+	u32 event_id = its_cmd_get_id(its_cmd);
+	u32 coll_id = its_cmd_get_collection(its_cmd);
+	struct kvm_vcpu *vcpu;
+	struct its_itte *itte;
+	struct its_collection *collection;
+	int ret = 0;
+
+	mutex_lock(&its->its_lock);
+	itte = find_itte(its, device_id, event_id);
+	if (!itte) {
+		ret = E_ITS_MOVI_UNMAPPED_INTERRUPT;
+		goto out_unlock;
+	}
+	if (!its_is_collection_mapped(itte->collection)) {
+		ret = E_ITS_MOVI_UNMAPPED_COLLECTION;
+		goto out_unlock;
+	}
+
+	collection = find_collection(its, coll_id);
+	if (!its_is_collection_mapped(collection)) {
+		ret = E_ITS_MOVI_UNMAPPED_COLLECTION;
+		goto out_unlock;
+	}
+
+	itte->collection = collection;
+	vcpu = kvm_get_vcpu(kvm, collection->target_addr);
+
+	spin_lock(&itte->irq->irq_lock);
+	itte->irq->target_vcpu = vcpu;
+	spin_unlock(&itte->irq->irq_lock);
+
+out_unlock:
+	mutex_unlock(&its->its_lock);
+	return ret;
+}
+
+static void vgic_its_init_collection(struct vgic_its *its,
+				 struct its_collection *collection,
+				 u32 coll_id)
+{
+	collection->collection_id = coll_id;
+	collection->target_addr = COLLECTION_NOT_MAPPED;
+
+	list_add_tail(&collection->coll_list, &its->collection_list);
+}
+
+/* The MAPTI and MAPI commands map LPIs to ITTEs. */
+static int vgic_its_cmd_handle_mapi(struct kvm *kvm, struct vgic_its *its,
+				u64 *its_cmd, u8 subcmd)
+{
+	u32 device_id = its_cmd_get_deviceid(its_cmd);
+	u32 event_id = its_cmd_get_id(its_cmd);
+	u32 coll_id = its_cmd_get_collection(its_cmd);
+	struct its_itte *itte;
+	struct its_device *device;
+	struct its_collection *collection, *new_coll = NULL;
+	int lpi_nr;
+	int ret = 0;
+
+	mutex_lock(&its->its_lock);
+
+	device = find_its_device(its, device_id);
+	if (!device) {
+		ret = E_ITS_MAPTI_UNMAPPED_DEVICE;
+		goto out_unlock;
+	}
+
+	collection = find_collection(its, coll_id);
+	if (!collection) {
+		new_coll = kzalloc(sizeof(struct its_collection), GFP_KERNEL);
+		if (!new_coll) {
+			ret = -ENOMEM;
+			goto out_unlock;
+		}
+	}
+
+	if (subcmd == GITS_CMD_MAPTI)
+		lpi_nr = its_cmd_get_physical_id(its_cmd);
+	else
+		lpi_nr = event_id;
+	if (lpi_nr < GIC_LPI_OFFSET ||
+	    lpi_nr >= max_lpis_propbaser(kvm->arch.vgic.propbaser))
+		return E_ITS_MAPTI_PHYSICALID_OOR;
+
+	itte = find_itte(its, device_id, event_id);
+	if (!itte) {
+		itte = kzalloc(sizeof(struct its_itte), GFP_KERNEL);
+		if (!itte) {
+			kfree(new_coll);
+			ret = -ENOMEM;
+			goto out_unlock;
+		}
+
+		itte->event_id	= event_id;
+		list_add_tail(&itte->itte_list, &device->itt_head);
+	}
+
+	if (!collection) {
+		collection = new_coll;
+		vgic_its_init_collection(its, collection, coll_id);
+	}
+
+	itte->collection = collection;
+	itte->lpi = lpi_nr;
+	itte->irq = vgic_add_lpi(kvm, lpi_nr);
+	update_affinity_itte(kvm, itte);
+
+	/*
+	 * We "cache" the configuration table entries in our struct vgic_irq.
+	 * However we only have those structs for mapped IRQs, so we read in
+	 * the respective config data from memory here upon mapping the LPI.
+	 */
+	update_lpi_config(kvm, itte->irq);
+
+out_unlock:
+	mutex_unlock(&its->its_lock);
+
+	return ret;
+}
+
+/* Requires the its_lock to be held. */
+static void vgic_its_unmap_device(struct kvm *kvm, struct its_device *device)
+{
+	struct its_itte *itte, *temp;
+
+	/*
+	 * The spec says that unmapping a device with still valid
+	 * ITTEs associated is UNPREDICTABLE. We remove all ITTEs,
+	 * since we cannot leave the memory unreferenced.
+	 */
+	list_for_each_entry_safe(itte, temp, &device->itt_head, itte_list)
+		its_free_itte(kvm, itte);
+
+	list_del(&device->dev_list);
+	kfree(device);
+}
+
+/* MAPD maps or unmaps a device ID to Interrupt Translation Tables (ITTs). */
+static int vgic_its_cmd_handle_mapd(struct kvm *kvm, struct vgic_its *its,
+				u64 *its_cmd)
+{
+	bool valid = its_cmd_get_validbit(its_cmd);
+	u32 device_id = its_cmd_get_deviceid(its_cmd);
+	struct its_device *device;
+	int ret = 0;
+
+	mutex_lock(&its->its_lock);
+
+	device = find_its_device(its, device_id);
+	if (device)
+		vgic_its_unmap_device(kvm, device);
+
+	/*
+	 * The spec does not say whether unmapping a not-mapped device
+	 * is an error, so we are done in any case.
+	 */
+	if (!valid)
+		goto out_unlock;
+
+	device = kzalloc(sizeof(struct its_device), GFP_KERNEL);
+	if (!device) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	device->device_id = device_id;
+	INIT_LIST_HEAD(&device->itt_head);
+
+	list_add_tail(&device->dev_list, &its->device_list);
+
+out_unlock:
+	mutex_unlock(&its->its_lock);
+	return ret;
+}
+
+/* The MAPC command maps collection IDs to redistributors. */
+static int vgic_its_cmd_handle_mapc(struct kvm *kvm, struct vgic_its *its,
+				u64 *its_cmd)
+{
+	u16 coll_id;
+	u32 target_addr;
+	struct its_collection *collection;
+	bool valid;
+	int ret = 0;
+
+	valid = its_cmd_get_validbit(its_cmd);
+	coll_id = its_cmd_get_collection(its_cmd);
+	target_addr = its_cmd_get_target_addr(its_cmd);
+
+	if (target_addr >= atomic_read(&kvm->online_vcpus))
+		return E_ITS_MAPC_PROCNUM_OOR;
+
+	mutex_lock(&its->its_lock);
+
+	collection = find_collection(its, coll_id);
+
+	if (!valid) {
+		struct its_device *device;
+		struct its_itte *itte;
+		/*
+		 * Clearing the mapping for that collection ID removes the
+		 * entry from the list. If there wasn't any before, we can
+		 * go home early.
+		 */
+		if (!collection)
+			goto out_unlock;
+
+		for_each_lpi_its(device, itte, its)
+			if (itte->collection &&
+			    itte->collection->collection_id == coll_id)
+				itte->collection = NULL;
+
+		list_del(&collection->coll_list);
+		kfree(collection);
+	} else {
+		if (!collection) {
+			collection = kzalloc(sizeof(struct its_collection),
+					     GFP_KERNEL);
+			if (!collection) {
+				ret = -ENOMEM;
+				goto out_unlock;
+			}
+
+			vgic_its_init_collection(its, collection, coll_id);
+			collection->target_addr = target_addr;
+		} else {
+			collection->target_addr = target_addr;
+			update_affinity_collection(kvm, its, collection);
+		}
+	}
+
+out_unlock:
+	mutex_unlock(&its->its_lock);
+
+	return ret;
+}
+
+/* The CLEAR command removes the pending state for a particular LPI. */
+static int vgic_its_cmd_handle_clear(struct kvm *kvm, struct vgic_its *its,
+				 u64 *its_cmd)
+{
+	u32 device_id;
+	u32 event_id;
+	struct its_itte *itte;
+	int ret = 0;
+
+	device_id = its_cmd_get_deviceid(its_cmd);
+	event_id = its_cmd_get_id(its_cmd);
+
+	mutex_lock(&its->its_lock);
+
+	itte = find_itte(its, device_id, event_id);
+	if (!itte) {
+		ret = E_ITS_CLEAR_UNMAPPED_INTERRUPT;
+		goto out_unlock;
+	}
+
+	itte->irq->pending = false;
+
+out_unlock:
+	mutex_unlock(&its->its_lock);
+	return ret;
+}
+
+/* The INV command syncs the configuration bits from the memory table. */
+static int vgic_its_cmd_handle_inv(struct kvm *kvm, struct vgic_its *its,
+			       u64 *its_cmd)
+{
+	u32 device_id;
+	u32 event_id;
+	struct its_itte *itte;
+	int ret;
+
+	device_id = its_cmd_get_deviceid(its_cmd);
+	event_id = its_cmd_get_id(its_cmd);
+
+	mutex_lock(&its->its_lock);
+
+	itte = find_itte(its, device_id, event_id);
+	if (!itte) {
+		ret = E_ITS_INV_UNMAPPED_INTERRUPT;
+		goto out_unlock;
+	}
+
+	ret = update_lpi_config(kvm, itte->irq);
+
+out_unlock:
+	mutex_unlock(&its->its_lock);
+	return ret;
+}
+
+/*
+ * The INVALL command requests flushing of all IRQ data in this collection.
+ * Find the VCPU mapped to that collection, then iterate over the VM's list
+ * of mapped LPIs and update the configuration for each IRQ which targets
+ * the specified vcpu. The configuration will be read from the in-memory
+ * configuration table.
+ */
+static int vgic_its_cmd_handle_invall(struct kvm *kvm, struct vgic_its *its,
+				  u64 *its_cmd)
+{
+	u32 coll_id = its_cmd_get_collection(its_cmd);
+	struct its_collection *collection;
+	struct kvm_vcpu *vcpu;
+	struct vgic_irq *irq;
+	u32 *intids;
+	int irq_count, i;
+
+	mutex_lock(&its->its_lock);
+
+	collection = find_collection(its, coll_id);
+	if (!its_is_collection_mapped(collection))
+		return E_ITS_INVALL_UNMAPPED_COLLECTION;
+
+	vcpu = kvm_get_vcpu(kvm, collection->target_addr);
+
+	irq_count = vgic_its_copy_lpi_list(kvm, &intids);
+	if (irq_count < 0)
+		return irq_count;
+
+	for (i = 0; i < irq_count; i++) {
+		irq = vgic_get_irq(kvm, NULL, intids[i]);
+		if (!irq)
+			continue;
+		update_lpi_config_filtered(kvm, irq, vcpu);
+		vgic_put_irq(kvm, irq);
+	}
+
+	kfree(intids);
+
+	mutex_unlock(&its->its_lock);
+
+	return 0;
+}
+
+/*
+ * The MOVALL command moves the pending state of all IRQs targeting one
+ * redistributor to another. We don't hold the pending state in the VCPUs,
+ * but in the IRQs instead, so there is really not much to do for us here.
+ * However the spec says that no IRQ must target the old redistributor
+ * afterwards, so we make sure that no LPI is using the associated target_vcpu.
+ * This command affects all LPIs in the system.
+ */
+static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
+				  u64 *its_cmd)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	u32 target1_addr = its_cmd_get_target_addr(its_cmd);
+	u32 target2_addr = its_cmd_mask_field(its_cmd, 3, 16, 32);
+	struct kvm_vcpu *vcpu1, *vcpu2;
+	struct vgic_irq *irq;
+
+	if (target1_addr >= atomic_read(&kvm->online_vcpus) ||
+	    target2_addr >= atomic_read(&kvm->online_vcpus))
+		return E_ITS_MOVALL_PROCNUM_OOR;
+
+	if (target1_addr == target2_addr)
+		return 0;
+
+	vcpu1 = kvm_get_vcpu(kvm, target1_addr);
+	vcpu2 = kvm_get_vcpu(kvm, target2_addr);
+
+	spin_lock(&dist->lpi_list_lock);
+
+	list_for_each_entry(irq, &dist->lpi_list_head, lpi_entry) {
+		spin_lock(&irq->irq_lock);
+
+		if (irq->target_vcpu == vcpu1)
+			irq->target_vcpu = vcpu2;
+
+		spin_unlock(&irq->irq_lock);
+	}
+
+	spin_unlock(&dist->lpi_list_lock);
+
+	return 0;
+}
+
+/*
+ * This function is called with the its_cmd lock held, but the ITS data
+ * structure lock dropped. It is within the responsibility of the actual
+ * command handlers to take care of proper locking when needed.
+ */
+static int vgic_its_handle_command(struct kvm *kvm, struct vgic_its *its,
 			       u64 *its_cmd)
 {
-	return -ENODEV;
+	u8 cmd = its_cmd_get_command(its_cmd);
+	int ret = -ENODEV;
+
+	switch (cmd) {
+	case GITS_CMD_MAPD:
+		ret = vgic_its_cmd_handle_mapd(kvm, its, its_cmd);
+		break;
+	case GITS_CMD_MAPC:
+		ret = vgic_its_cmd_handle_mapc(kvm, its, its_cmd);
+		break;
+	case GITS_CMD_MAPI:
+		ret = vgic_its_cmd_handle_mapi(kvm, its, its_cmd, cmd);
+		break;
+	case GITS_CMD_MAPTI:
+		ret = vgic_its_cmd_handle_mapi(kvm, its, its_cmd, cmd);
+		break;
+	case GITS_CMD_MOVI:
+		ret = vgic_its_cmd_handle_movi(kvm, its, its_cmd);
+		break;
+	case GITS_CMD_DISCARD:
+		ret = vgic_its_cmd_handle_discard(kvm, its, its_cmd);
+		break;
+	case GITS_CMD_CLEAR:
+		ret = vgic_its_cmd_handle_clear(kvm, its, its_cmd);
+		break;
+	case GITS_CMD_MOVALL:
+		ret = vgic_its_cmd_handle_movall(kvm, its, its_cmd);
+		break;
+	case GITS_CMD_INV:
+		ret = vgic_its_cmd_handle_inv(kvm, its, its_cmd);
+		break;
+	case GITS_CMD_INVALL:
+		ret = vgic_its_cmd_handle_invall(kvm, its, its_cmd);
+		break;
+	case GITS_CMD_SYNC:
+		/* we ignore this command: we are in sync all of the time */
+		ret = 0;
+		break;
+	}
+
+	return ret;
 }
 
 static u64 vgic_sanitise_its_baser(u64 reg)
@@ -403,7 +1004,7 @@ static void vgic_mmio_write_its_cwriter(struct kvm *kvm, struct vgic_its *its,
 		 * We just ignore that command then.
 		 */
 		if (!ret)
-			vits_handle_command(kvm, its, cmd_buf);
+			vgic_its_handle_command(kvm, its, cmd_buf);
 
 		its->creadr += ITS_CMD_SIZE;
 		if (its->creadr == ITS_CMD_BUFFER_SIZE(its->cbaser))
-- 
2.9.0


* [PATCH v8 16/17] KVM: arm64: implement MSI injection in ITS emulation
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (14 preceding siblings ...)
  2016-07-05 11:23 ` [PATCH v8 15/17] KVM: arm64: implement ITS command queue command handlers Andre Przywara
@ 2016-07-05 11:23 ` Andre Przywara
  2016-07-05 11:23 ` [PATCH v8 17/17] KVM: arm64: enable ITS emulation as a virtual MSI controller Andre Przywara
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

When userland wants to inject an MSI into the guest, it uses the
KVM_SIGNAL_MSI ioctl, which carries the doorbell address along with
the payload and the device ID.
We convert this into an MMIO write to the ITS translation register,
so that the kvm_io_bus framework's knowledge of the different ITSes
makes us end up in the right one.
The device ID is combined with the payload into a 64-bit write.
Inside the handler we use our wrapper functions to iterate the linked
lists and find the proper Interrupt Translation Table Entry, and thus
the corresponding struct vgic_irq, on which we finally set the pending
bit.
We provide a VGIC emulation model specific routine for the actual MSI
injection. The wrapper functions return an error for models that do
not (yet) implement MSIs (like the GICv2 emulation).
We also provide the handler for the ITS "INT" command, which allows a
guest to trigger an MSI via the ITS command queue. Since this path
already knows the right ITS, we call the MMIO handler function
directly, without going through the kvm_io_bus framework.
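The address handling described above is small enough to sketch: the two 32-bit halves from struct kvm_msi are combined into the doorbell address, and stepping back one 64K frame yields the base address that was registered on the KVM IO bus for this ITS. This is only an illustration under those assumptions; the helper name is made up:

```c
#include <stdint.h>

#define SZ_64K	0x10000ULL

/*
 * Combine the split doorbell address from a KVM_SIGNAL_MSI request and
 * step back one 64K frame, as vgic_its_inject_msi() does to find the
 * MMIO region registered for this ITS.
 */
static uint64_t its_frame_base(uint32_t address_hi, uint32_t address_lo)
{
	uint64_t address = ((uint64_t)address_hi << 32) | address_lo;

	return address - SZ_64K;
}
```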

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 virt/kvm/arm/vgic/vgic-its.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic.h     |  6 ++++
 2 files changed, 76 insertions(+)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 432daed..79a1b80 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -423,6 +423,61 @@ static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
 	return 0;
 }
 
+static void vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
+				 u32 devid, u32 eventid)
+{
+	struct its_itte *itte;
+
+	if (!its->enabled)
+		return;
+
+	mutex_lock(&its->its_lock);
+
+	itte = find_itte(its, devid, eventid);
+	/* Triggering an unmapped IRQ gets silently dropped. */
+	if (itte && its_is_collection_mapped(itte->collection)) {
+		struct kvm_vcpu *vcpu;
+
+		vcpu = kvm_get_vcpu(kvm, itte->collection->target_addr);
+		if (vcpu && vcpu->arch.vgic_cpu.lpis_enabled) {
+			spin_lock(&itte->irq->irq_lock);
+			itte->irq->pending = true;
+			vgic_queue_irq_unlock(kvm, itte->irq);
+		}
+	}
+
+	mutex_unlock(&its->its_lock);
+}
+
+/*
+ * Dispatches an incoming MSI request to the KVM IO bus, which will redirect
+ * it for us to the proper ITS and the translation register write handler.
+ */
+int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
+{
+	u64 address;
+	struct kvm_io_device *kvm_io_dev;
+	struct vgic_io_device *iodev;
+
+	if (!vgic_has_its(kvm))
+		return -ENODEV;
+
+	if (!(msi->flags & KVM_MSI_VALID_DEVID))
+		return -EINVAL;
+
+	address = (u64)msi->address_hi << 32 | msi->address_lo;
+	address -= SZ_64K;
+
+	kvm_io_dev = kvm_io_bus_get_dev(kvm, KVM_MMIO_BUS, address);
+	if (!kvm_io_dev)
+		return -ENODEV;
+
+	iodev = container_of(kvm_io_dev, struct vgic_io_device, dev);
+	vgic_its_trigger_msi(kvm, iodev->its, msi->devid, msi->data);
+
+	return 0;
+}
+
 /* Requires the its_lock to be held. */
 static void its_free_itte(struct kvm *kvm, struct its_itte *itte)
 {
@@ -855,6 +910,18 @@ static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
 	return 0;
 }
 
+/* The INT command injects the LPI associated with that DevID/EvID pair. */
+static int vgic_its_cmd_handle_int(struct kvm *kvm, struct vgic_its *its,
+			       u64 *its_cmd)
+{
+	u32 msi_data = its_cmd_get_id(its_cmd);
+	u64 msi_devid = its_cmd_get_deviceid(its_cmd);
+
+	vgic_its_trigger_msi(kvm, its, msi_devid, msi_data);
+
+	return 0;
+}
+
 /*
  * This function is called with the its_cmd lock held, but the ITS data
  * structure lock dropped. It is within the responsibility of the actual
@@ -891,6 +958,9 @@ static int vgic_its_handle_command(struct kvm *kvm, struct vgic_its *its,
 	case GITS_CMD_MOVALL:
 		ret = vgic_its_cmd_handle_movall(kvm, its, its_cmd);
 		break;
+	case GITS_CMD_INT:
+		ret = vgic_its_cmd_handle_int(kvm, its, its_cmd);
+		break;
 	case GITS_CMD_INV:
 		ret = vgic_its_cmd_handle_inv(kvm, its, its_cmd);
 		break;
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index 4a9165f..b3e5678 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -81,6 +81,7 @@ bool vgic_has_its(struct kvm *kvm);
 int kvm_vgic_register_its_device(void);
 struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid);
 void vgic_enable_lpis(struct kvm_vcpu *vcpu);
+int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi);
 #else
 static inline void vgic_v3_process_maintenance(struct kvm_vcpu *vcpu)
 {
@@ -151,6 +152,11 @@ static inline struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid)
 static inline void vgic_enable_lpis(struct kvm_vcpu *vcpu)
 {
 }
+
+static inline int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
+{
+	return -ENODEV;
+}
 #endif
 
 int kvm_register_vgic_device(unsigned long type);
-- 
2.9.0


* [PATCH v8 17/17] KVM: arm64: enable ITS emulation as a virtual MSI controller
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (15 preceding siblings ...)
  2016-07-05 11:23 ` [PATCH v8 16/17] KVM: arm64: implement MSI injection in ITS emulation Andre Przywara
@ 2016-07-05 11:23 ` Andre Przywara
  2016-07-06  8:52 ` [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Auger Eric
  2016-07-11 17:36 ` Marc Zyngier
  18 siblings, 0 replies; 49+ messages in thread
From: Andre Przywara @ 2016-07-05 11:23 UTC (permalink / raw)
  To: linux-arm-kernel

Now that all of the ITS emulation functionality is in place, we
advertise MSI functionality to userland and, if userland has
configured it, also expose the ITS device to the guest.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 Documentation/virtual/kvm/api.txt |  2 +-
 arch/arm64/kvm/Kconfig            |  1 +
 arch/arm64/kvm/reset.c            |  6 ++++++
 include/kvm/arm_vgic.h            |  5 +++++
 virt/kvm/arm/vgic/vgic-init.c     |  3 +++
 virt/kvm/arm/vgic/vgic-mmio-v3.c  | 14 ++++++++++----
 virt/kvm/arm/vgic/vgic.c          |  8 ++++++++
 7 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 6551311..07049ea 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2162,7 +2162,7 @@ after pausing the vcpu, but before it is resumed.
 4.71 KVM_SIGNAL_MSI
 
 Capability: KVM_CAP_SIGNAL_MSI
-Architectures: x86
+Architectures: x86 arm64
 Type: vm ioctl
 Parameters: struct kvm_msi (in)
 Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index aa2e34e..9d2eff0 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM
 	select HAVE_KVM_IRQFD
 	select KVM_ARM_VGIC_V3
 	select KVM_ARM_PMU if HW_PERF_EVENTS
+	select HAVE_KVM_MSI
 	---help---
 	  Support hosting virtualized guest machines.
 	  We don't support KVM with 16K page tables yet, due to the multiple
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3989833..d8c3140 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -86,6 +86,12 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VCPU_ATTRIBUTES:
 		r = 1;
 		break;
+	case KVM_CAP_MSI_DEVID:
+		if (!kvm)
+			r = -EINVAL;
+		else
+			r = kvm->arch.vgic.msis_require_devid;
+		break;
 	default:
 		r = 0;
 	}
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 5aff85c..c64db0f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -155,6 +155,9 @@ struct vgic_dist {
 	/* vGIC model the kernel emulates for the guest (GICv2 or GICv3) */
 	u32			vgic_model;
 
+	/* Do injected MSIs require an additional device ID? */
+	bool			msis_require_devid;
+
 	int			nr_spis;
 
 	/* TODO: Consider moving to global state */
@@ -300,4 +303,6 @@ static inline int kvm_vgic_get_max_vcpus(void)
 	return kvm_vgic_global_state.max_gic_vcpus;
 }
 
+int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
+
 #endif /* __KVM_ARM_VGIC_H */
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 535e713..01a60dc 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -258,6 +258,9 @@ int vgic_init(struct kvm *kvm)
 	if (ret)
 		goto out;
 
+	if (vgic_has_its(kvm))
+		dist->msis_require_devid = true;
+
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_vgic_vcpu_init(vcpu);
 
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 370e89e..c7c7a87 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -66,7 +66,12 @@ static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
 	case GICD_TYPER:
 		value = vcpu->kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS;
 		value = (value >> 5) - 1;
-		value |= (INTERRUPT_ID_BITS_SPIS - 1) << 19;
+		if (vgic_has_its(vcpu->kvm)) {
+			value |= (INTERRUPT_ID_BITS_ITS - 1) << 19;
+			value |= GICD_TYPER_LPIS;
+		} else {
+			value |= (INTERRUPT_ID_BITS_SPIS - 1) << 19;
+		}
 		break;
 	case GICD_IIDR:
 		value = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
@@ -163,9 +168,8 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu *vcpu,
 
 	vgic_cpu->lpis_enabled = val & GICR_CTLR_ENABLE_LPIS;
 
-	if (!was_enabled && vgic_cpu->lpis_enabled) {
-		/* Eventually do something */
-	}
+	if (!was_enabled && vgic_cpu->lpis_enabled)
+		vgic_enable_lpis(vcpu);
 }
 
 static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
@@ -179,6 +183,8 @@ static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
 	value |= ((target_vcpu_id & 0xffff) << 8);
 	if (target_vcpu_id == atomic_read(&vcpu->kvm->online_vcpus) - 1)
 		value |= GICR_TYPER_LAST;
+	if (vgic_has_its(vcpu->kvm))
+		value |= GICR_TYPER_PLPIS;
 
 	return extract_bytes(value, addr & 7, len);
 }
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 72b2516..c4f3aba 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -683,3 +683,11 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
 
 	return map_is_active;
 }
+
+int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
+{
+	if (vgic_has_its(kvm))
+		return vgic_its_inject_msi(kvm, msi);
+	else
+		return -ENODEV;
+}
-- 
2.9.0


* [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (16 preceding siblings ...)
  2016-07-05 11:23 ` [PATCH v8 17/17] KVM: arm64: enable ITS emulation as a virtual MSI controller Andre Przywara
@ 2016-07-06  8:52 ` Auger Eric
  2016-07-11 17:36 ` Marc Zyngier
  18 siblings, 0 replies; 49+ messages in thread
From: Auger Eric @ 2016-07-06  8:52 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Andre,
On 05/07/2016 13:22, Andre Przywara wrote:
> Hi,
> 
> this series allows those KVM guests that use an emulated GICv3 to use LPIs
> as well, though at the moment this is limited to emulated PCI devices.
> This is based on kvmarm/queue, which now only features the new VGIC
> implementation.
> 
> This time only smaller corrections for the KVM ITS emulation support:
> I addressed the review comments, which pointed out some vgic_put_irq()
> omissions. Also the GICv2 init sequence has changed, so that we can now
> bail out of a KVM_DEVICE init without leaking a HYP mapping.
> Also a bug in the MAPC emulation was fixed, which allowed multiple
> mappings of the same collection ID.
> The KVM_DEVICE init sequence now has some checks to ensure the right
> order. The requirements are a bit stricter than for the GICv2/GICv3
> devices: we need to set up the mapping address before calling the
> INIT ioctl. This apparently has some implications for QEMU; I still
> need to be convinced that we should follow QEMU's approach. It looks
> a bit ugly to stash the ITS init into the existing GICv3 code,
> especially since the ITS is a separate, optional device.
> 
> You can find all of this code (and the prerequisites) in the
> its-emul/v8 branch of my repository [1].
> This has been briefly tested on the model and on GICv3 hardware.
> If you have GICv3 capable hardware, please test it on your setup.
> Also of course any review comments are very welcome!
> 
> Cheers,
> Andre.
> 
> Changelog v7..v8:
> - rebase on old-VGIC removal patch
> - add missing vgic_put_irq()s
> - check and ensure proper ITS initialisation sequence
> - avoid double collection mapping
> - rename vits_ function prefixes to vgic_its_
> - properly setup PENDBASER (for new VGIC now)
> - change vgic_v2_probe init order to allow clean exit
> 
> Changelog v6..v7:
> - use kref reference counting
> - remove RCU usage from lpi_list, use spinlock instead
> - copy list of LPIs before accessing guest memory
> - introduce kvm_io_bus_get_dev()
> - refactor parts of arm-gic-v3.h header file
> - provide proper initial values for redistributor and ITS base registers
> - rework sanitisation of base registers
> - rework VGIC MMIO dispatching to differentiate between VGIC parts
> - smaller fixes, also comments and commit messages amended
> 
> Changelog v5..v6:
> - remove its_list from VGIC code
> - add lpi_list and accessor functions
> - introduce reference counting to struct vgic_irq
> - replace its_lock spinlock with its_cmd and its_lock mutexes
> - simplify guest memory accesses (due to the new mutexes)
> - avoid unnecessary affinity updates
> - refine base register address masking
> - introduce sanity checks for PROPBASER and PENDBASER
> - implement BASER<n> registers
> - pass struct vgic_its directly into the MMIO handlers
> - convert KVM_SIGNAL_MSI ioctl into an MMIO write
> - add explicit INIT ioctl to the ITS KVM device
> - adjusting comments and commit messages
> 
> Changelog v4..v5:
> - adapting to final new VGIC (MMIO handlers, etc.)
> - new KVM device to model an ITS, multiple instances allowed
> - move redistributor data into struct vgic_cpu
> - separate distributor and ITS(es)
> - various bug fixes and amended comments after review comments
> 
> Changelog v3..v4:
> - adapting to new VGIC (changes in IRQ injection mechanism)
> 
> Changelog v2..v3:
> - adapt to 4.3-rc and Christoffer's timer rework
> - adapt spin locks on handling PROPBASER/PENDBASER registers
> - rework locking in ITS command handling (dropping dist where needed)
> - only clear LPI pending bit if LPI could actually be queued
> - simplify GICR_CTLR handling
> - properly free ITTEs (including our pending bitmap)
> - fix corner cases with unmapped collections
> - keep retire_lr() around
> - rename vgic_handle_base_register to vgic_reg64_access()
> - use kcalloc instead of kmalloc
> - minor fixes, renames and added comments
> 
> Changelog v1..v2
> - fix issues when using non-ITS GICv3 emulation
> - streamline frame address initialization (new patch 05/15)
> - preallocate buffer memory for reading from guest's memory
> - move locking into the actual command handlers
> -   preallocate memory for new structures if needed
> - use non-atomic __set_bit() and __clear_bit() when under the lock
> - add INT command handler to allow LPI injection from the guest
> - rewrite CWRITER handler to align with new locking scheme
> - remove unneeded CONFIG_HAVE_KVM_MSI #ifdefs
> - check memory table size against our LPI limit (65536 interrupts)
> - observe initial gap of 1024 interrupts in pending table
> - use term "configuration table" to be in line with the spec
> - clarify and extend documentation on API extensions
> - introduce new KVM_CAP_MSI_DEVID capability to advertise device ID requirement
> - update, fix and add many comments
> - minor style changes as requested by reviewers
> 
> ---------------
> 
> The GICv3 ITS (Interrupt Translation Service) is a part of the
> ARM GICv3 interrupt controller [3] used for implementing MSIs.
> It specifies a new kind of interrupts (LPIs), which are mapped to
> establish a connection between a device, its MSI payload value and
> the target processor the IRQ is eventually delivered to.
> In order to allow using MSIs in an ARM64 KVM guest, we emulate this
> ITS widget in the kernel.
> The ITS works by reading commands written by software (from the guest
> in our case) into a (guest allocated) memory region and establishing
> the mapping between a device, the MSI payload and the target CPU.
> We parse these commands and update our internal data structures to
> reflect those changes. On an MSI injection we iterate those
> structures to learn the LPI number we have to inject.
> For the time being we use simple lists to hold the data, this is
> good enough for the small number of entries each of the components
> currently have. Should this become a performance bottleneck in the
> future, those can be extended to arrays or trees if needed.
> 
> Most of the code lives in a separate source file (vgic-its.c), though
> there are some changes necessary in the existing VGIC files.
> 
> For the time being this series gives us the ability to use emulated
> PCI devices that can use MSIs in the guest. Those have to be
> triggered by letting the userland device emulation simulate the MSI
> write with the KVM_SIGNAL_MSI ioctl. This will be translated into
> the proper LPI by the ITS emulation and injected into the guest in
> the usual way (just with a higher IRQ number).
> 
> This series is based on kvmarm/queue and can be found at the
> its-emul/v8 branch of this repository [1].
> For this to be used you need a GICv3 host machine (a fast model would
> do), though it does not rely on any host ITS bits (neither in hardware
> or software).
> 
> To test this you can use the kvmtool patches available in the "its-v6"
> branch here [2].
> Start a guest with: "$ lkvm run --irqchip=gicv3-its --force-pci"
> and see the ITS being used for instance by the virtio devices.
> 
> [1]: git://linux-arm.org/linux-ap.git
>      http://www.linux-arm.org/git?p=linux-ap.git;a=log;h=refs/heads/its-emul/v8
> [2]: git://linux-arm.org/kvmtool.git
>      http://www.linux-arm.org/git?p=kvmtool.git;a=log;h=refs/heads/its-v6
> [3]: http://arminfo.emea.arm.com/help/topic/com.arm.doc.ihi0069a/IHI0069A_gic_architecture_specification.pdf
> 
> Andre Przywara (17):
>   KVM: arm/arm64: move redistributor kvm_io_devices
>   KVM: arm/arm64: check return value for kvm_register_vgic_device
>   KVM: extend struct kvm_msi to hold a 32-bit device ID
>   KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities
>   KVM: kvm_io_bus: add kvm_io_bus_get_dev() call
>   KVM: arm/arm64: VGIC: add refcounting for IRQs
>   irqchip: refactor and add GICv3 definitions
>   KVM: arm64: handle ITS related GICv3 redistributor registers
>   KVM: arm64: introduce ITS emulation file with MMIO framework
>   KVM: arm64: introduce new KVM ITS device
>   KVM: arm64: implement basic ITS register handlers
>   KVM: arm64: connect LPIs to the VGIC emulation
>   KVM: arm64: read initial LPI pending table
>   KVM: arm64: allow updates of LPI configuration table
>   KVM: arm64: implement ITS command queue command handlers
>   KVM: arm64: implement MSI injection in ITS emulation
>   KVM: arm64: enable ITS emulation as a virtual MSI controller
> 
>  Documentation/virtual/kvm/api.txt              |   14 +-
>  Documentation/virtual/kvm/devices/arm-vgic.txt |   25 +-
>  arch/arm/include/asm/kvm_host.h                |    2 +-
>  arch/arm/kvm/arm.c                             |    3 +-
>  arch/arm64/include/asm/kvm_host.h              |    2 +-
>  arch/arm64/include/uapi/asm/kvm.h              |    2 +
>  arch/arm64/kvm/Kconfig                         |    1 +
>  arch/arm64/kvm/Makefile                        |    1 +
>  arch/arm64/kvm/reset.c                         |    8 +-
>  include/kvm/arm_vgic.h                         |   66 +-
>  include/linux/irqchip/arm-gic-v3.h             |  165 ++-
>  include/linux/kvm_host.h                       |    2 +
>  include/uapi/linux/kvm.h                       |    7 +-
>  virt/kvm/arm/vgic/vgic-init.c                  |    9 +-
>  virt/kvm/arm/vgic/vgic-its.c                   | 1425 ++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic-kvm-device.c            |   22 +-
>  virt/kvm/arm/vgic/vgic-mmio-v2.c               |   48 +-
>  virt/kvm/arm/vgic/vgic-mmio-v3.c               |  301 ++++-
>  virt/kvm/arm/vgic/vgic-mmio.c                  |   61 +-
>  virt/kvm/arm/vgic/vgic-mmio.h                  |   45 +-
>  virt/kvm/arm/vgic/vgic-v2.c                    |   12 +-
>  virt/kvm/arm/vgic/vgic-v3.c                    |   29 +-
>  virt/kvm/arm/vgic/vgic.c                       |  108 +-
>  virt/kvm/arm/vgic/vgic.h                       |   37 +-
>  virt/kvm/kvm_main.c                            |   24 +
>  25 files changed, 2216 insertions(+), 203 deletions(-)
>  create mode 100644 virt/kvm/arm/vgic/vgic-its.c
> 

Tested-by: Eric Auger <eric.auger@redhat.com>

The code was tested on Cavium ThunderX with qemu virtio-net-pci and
vhost-net (with and without your fix found on your branch).

Cheers

Eric


* [PATCH v8 03/17] KVM: extend struct kvm_msi to hold a 32-bit device ID
  2016-07-05 11:22 ` [PATCH v8 03/17] KVM: extend struct kvm_msi to hold a 32-bit device ID Andre Przywara
@ 2016-07-06 21:06   ` Christoffer Dall
  2016-07-06 21:54     ` André Przywara
  0 siblings, 1 reply; 49+ messages in thread
From: Christoffer Dall @ 2016-07-06 21:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 05, 2016 at 12:22:55PM +0100, Andre Przywara wrote:
> The ARM GICv3 ITS MSI controller requires a device ID to be able to
> assign the proper interrupt vector. On real hardware, this ID is
> sampled from the bus. To be able to emulate an ITS controller, extend
> the KVM MSI interface to let userspace provide such a device ID. For
> PCI devices, the device ID is simply the 16-bit bus-device-function
> triplet, which should be easily available to the userland tool.
> 
> Also there is a new KVM capability which advertises whether the
> current VM requires a device ID to be set along with the MSI data.
> This flag is still reported as not available everywhere, later we will
> enable it when ITS emulation is used.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Reviewed-by: Eric Auger <eric.auger@linaro.org>
> ---
>  Documentation/virtual/kvm/api.txt | 12 ++++++++++--
>  include/uapi/linux/kvm.h          |  5 ++++-
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 09efa9e..6551311 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2175,10 +2175,18 @@ struct kvm_msi {
>  	__u32 address_hi;
>  	__u32 data;
>  	__u32 flags;
> -	__u8  pad[16];
> +	__u32 devid;
> +	__u8  pad[12];
>  };
>  
> -No flags are defined so far. The corresponding field must be 0.
> +flags: KVM_MSI_VALID_DEVID: devid contains a valid value
> +devid: If KVM_MSI_VALID_DEVID is set, contains a unique device identifier
> +       for the device that wrote the MSI message.
> +       For PCI, this is usually a BDF identifier in the lower 16 bits.
> +
> +The per-VM KVM_CAP_MSI_DEVID capability advertises the need to provide
> +the device ID. If this capability is not set, userland cannot rely on
> +the kernel to allow the KVM_MSI_VALID_DEVID flag being set.

If KVM_CAP_MSI_DEVID is set, is it an error to provide a struct kvm_msi
without the KVM_MSI_VALID_DEVID flag set, or not necessarily?

>  
>  
>  4.71 KVM_CREATE_PIT2
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 05ebf47..7de96f5 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_ARM_PMU_V3 126
>  #define KVM_CAP_VCPU_ATTRIBUTES 127
>  #define KVM_CAP_MAX_VCPU_ID 128
> +#define KVM_CAP_MSI_DEVID 129
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -1024,12 +1025,14 @@ struct kvm_one_reg {
>  	__u64 addr;
>  };
>  
> +#define KVM_MSI_VALID_DEVID	(1U << 0)
>  struct kvm_msi {
>  	__u32 address_lo;
>  	__u32 address_hi;
>  	__u32 data;
>  	__u32 flags;
> -	__u8  pad[16];
> +	__u32 devid;
> +	__u8  pad[12];
>  };
>  
>  struct kvm_arm_device_addr {
> -- 
> 2.9.0
> 

Looks good to me, but you probably need an ack from Paolo or Radim
before we can queue this.

FWIW: Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

-Christoffer


* [PATCH v8 05/17] KVM: kvm_io_bus: add kvm_io_bus_get_dev() call
  2016-07-05 11:22 ` [PATCH v8 05/17] KVM: kvm_io_bus: add kvm_io_bus_get_dev() call Andre Przywara
@ 2016-07-06 21:15   ` Christoffer Dall
  2016-07-06 21:36     ` André Przywara
  0 siblings, 1 reply; 49+ messages in thread
From: Christoffer Dall @ 2016-07-06 21:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 05, 2016 at 12:22:57PM +0100, Andre Przywara wrote:
> The kvm_io_bus framework is a nice place of holding information about
> various MMIO regions for kernel emulated devices.
> Add a call to retrieve the kvm_io_device structure which is associated
> with a certain MMIO address. This avoids duplicating kvm_io_bus'
> knowledge of MMIO regions and having to fake MMIO accesses if a user
> needs to know the device a certain MMIO address belongs to.
> This will be used by the ITS emulation to get the associated ITS device
> when someone triggers an MSI via an ioctl from userspace.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> ---
>  include/linux/kvm_host.h |  2 ++
>  virt/kvm/kvm_main.c      | 24 ++++++++++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 0640ee9..614a981 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -164,6 +164,8 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
>  			    int len, struct kvm_io_device *dev);
>  int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>  			      struct kvm_io_device *dev);
> +struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> +					 gpa_t addr);
>  
>  #ifdef CONFIG_KVM_ASYNC_PF
>  struct kvm_async_pf {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ef54b4c..bd2eb92 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3496,6 +3496,30 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>  	return r;
>  }
>  

do you need to hold kvm->slots_lock here like the other functions
touching the io_bus framework here, which have comments specifying that?

> +struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
> +					 gpa_t addr)
> +{
> +	struct kvm_io_bus *bus;
> +	int dev_idx, srcu_idx;
> +	struct kvm_io_device *iodev = NULL;
> +
> +	srcu_idx = srcu_read_lock(&kvm->srcu);
> +
> +	bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
> +
> +	dev_idx = kvm_io_bus_get_first_dev(bus, addr, 1);
> +	if (dev_idx < 0)
> +		goto out_unlock;
> +
> +	iodev = bus->range[dev_idx].dev;
> +
> +out_unlock:
> +	srcu_read_unlock(&kvm->srcu, srcu_idx);
> +
> +	return iodev;
> +}
> +EXPORT_SYMBOL_GPL(kvm_io_bus_get_dev);
> +
>  static struct notifier_block kvm_cpu_notifier = {
>  	.notifier_call = kvm_cpu_hotplug,
>  };
> -- 
> 2.9.0
> 

You also need an ack here, but my comment-comment above notwithstanding:

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

-Christoffer


* [PATCH v8 05/17] KVM: kvm_io_bus: add kvm_io_bus_get_dev() call
  2016-07-06 21:15   ` Christoffer Dall
@ 2016-07-06 21:36     ` André Przywara
  0 siblings, 0 replies; 49+ messages in thread
From: André Przywara @ 2016-07-06 21:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Christoffer,

On 06/07/16 22:15, Christoffer Dall wrote:
> On Tue, Jul 05, 2016 at 12:22:57PM +0100, Andre Przywara wrote:
>> The kvm_io_bus framework is a nice place of holding information about
>> various MMIO regions for kernel emulated devices.
>> Add a call to retrieve the kvm_io_device structure which is associated
>> with a certain MMIO address. This avoids duplicating kvm_io_bus'
>> knowledge of MMIO regions and having to fake MMIO accesses if a user
>> needs to know the device a certain MMIO address belongs to.
>> This will be used by the ITS emulation to get the associated ITS device
>> when someone triggers an MSI via an ioctl from userspace.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  include/linux/kvm_host.h |  2 ++
>>  virt/kvm/kvm_main.c      | 24 ++++++++++++++++++++++++
>>  2 files changed, 26 insertions(+)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 0640ee9..614a981 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -164,6 +164,8 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
>>  			    int len, struct kvm_io_device *dev);
>>  int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>>  			      struct kvm_io_device *dev);
>> +struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>> +					 gpa_t addr);
>>  
>>  #ifdef CONFIG_KVM_ASYNC_PF
>>  struct kvm_async_pf {
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index ef54b4c..bd2eb92 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -3496,6 +3496,30 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>>  	return r;
>>  }
>>  
> 
> do you need to hold kvm->slots_lock here like the other functions
> touching the io_bus framework here, which have comments specifying that?

AFAICT this comment is outdated; the slots_lock only needs to be taken
when one changes the kvm_io_bus (adding or removing devices).
Readers (looking up a device) use RCU. I looked at the other readers,
also in other architectures; none of them takes the lock AFAICS when
they just get an entry.

Paolo, Radim, can you confirm this?
Shall I send a patch that removes the misleading comments
at virt/kvm/kvm_main.c, lines 3347, 3365 and 3414?

Cheers,
Andre.

>> +struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>> +					 gpa_t addr)
>> +{
>> +	struct kvm_io_bus *bus;
>> +	int dev_idx, srcu_idx;
>> +	struct kvm_io_device *iodev = NULL;
>> +
>> +	srcu_idx = srcu_read_lock(&kvm->srcu);
>> +
>> +	bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
>> +
>> +	dev_idx = kvm_io_bus_get_first_dev(bus, addr, 1);
>> +	if (dev_idx < 0)
>> +		goto out_unlock;
>> +
>> +	iodev = bus->range[dev_idx].dev;
>> +
>> +out_unlock:
>> +	srcu_read_unlock(&kvm->srcu, srcu_idx);
>> +
>> +	return iodev;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_io_bus_get_dev);
>> +
>>  static struct notifier_block kvm_cpu_notifier = {
>>  	.notifier_call = kvm_cpu_hotplug,
>>  };
>> -- 
>> 2.9.0
>>
> 
> You also need an ack here, but my comment-comment above notwithstanding:
> 
> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
> 
> -Christoffer
> 


* [PATCH v8 03/17] KVM: extend struct kvm_msi to hold a 32-bit device ID
  2016-07-06 21:06   ` Christoffer Dall
@ 2016-07-06 21:54     ` André Przywara
  2016-07-07  9:37       ` Christoffer Dall
  0 siblings, 1 reply; 49+ messages in thread
From: André Przywara @ 2016-07-06 21:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/07/16 22:06, Christoffer Dall wrote:
> On Tue, Jul 05, 2016 at 12:22:55PM +0100, Andre Przywara wrote:
>> The ARM GICv3 ITS MSI controller requires a device ID to be able to
>> assign the proper interrupt vector. On real hardware, this ID is
>> sampled from the bus. To be able to emulate an ITS controller, extend
>> the KVM MSI interface to let userspace provide such a device ID. For
>> PCI devices, the device ID is simply the 16-bit bus-device-function
>> triplet, which should be easily available to the userland tool.
>>
>> Also there is a new KVM capability which advertises whether the
>> current VM requires a device ID to be set along with the MSI data.
>> This flag is still reported as not available everywhere, later we will
>> enable it when ITS emulation is used.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Reviewed-by: Eric Auger <eric.auger@linaro.org>
>> ---
>>  Documentation/virtual/kvm/api.txt | 12 ++++++++++--
>>  include/uapi/linux/kvm.h          |  5 ++++-
>>  2 files changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 09efa9e..6551311 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2175,10 +2175,18 @@ struct kvm_msi {
>>  	__u32 address_hi;
>>  	__u32 data;
>>  	__u32 flags;
>> -	__u8  pad[16];
>> +	__u32 devid;
>> +	__u8  pad[12];
>>  };
>>  
>> -No flags are defined so far. The corresponding field must be 0.
>> +flags: KVM_MSI_VALID_DEVID: devid contains a valid value
>> +devid: If KVM_MSI_VALID_DEVID is set, contains a unique device identifier
>> +       for the device that wrote the MSI message.
>> +       For PCI, this is usually a BDF identifier in the lower 16 bits.
>> +
>> +The per-VM KVM_CAP_MSI_DEVID capability advertises the need to provide
>> +the device ID. If this capability is not set, userland cannot rely on
>> +the kernel to allow the KVM_MSI_VALID_DEVID flag being set.
> 
> If KVM_CAP_MSI_DEVID is set, is it an error to provide a struct kvm_msi
> without the KVM_MSI_VALID_DEVID flag set, or not necessarily?

At the moment we return an error when the bit is not set. But
theoretically a guest could have other (in-kernel emulated) MSI
controllers which don't require a device ID, though currently there is none.
The reason for both the KVM capability and the flag on the ioctl is for
mutual assurance that both userland and the kernel know about this ITS
specific extension.
This KVM_SIGNAL_MSI API is rather generic, so I don't want to
unnecessarily slam the door on other emulated MSI controllers.

>>  4.71 KVM_CREATE_PIT2
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 05ebf47..7de96f5 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -866,6 +866,7 @@ struct kvm_ppc_smmu_info {
>>  #define KVM_CAP_ARM_PMU_V3 126
>>  #define KVM_CAP_VCPU_ATTRIBUTES 127
>>  #define KVM_CAP_MAX_VCPU_ID 128
>> +#define KVM_CAP_MSI_DEVID 129
>>  
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>  
>> @@ -1024,12 +1025,14 @@ struct kvm_one_reg {
>>  	__u64 addr;
>>  };
>>  
>> +#define KVM_MSI_VALID_DEVID	(1U << 0)
>>  struct kvm_msi {
>>  	__u32 address_lo;
>>  	__u32 address_hi;
>>  	__u32 data;
>>  	__u32 flags;
>> -	__u8  pad[16];
>> +	__u32 devid;
>> +	__u8  pad[12];
>>  };
>>  
>>  struct kvm_arm_device_addr {
>> -- 
>> 2.9.0
>>
> 
> Looks good to me, but you probably need an ack from Paolo or Radim
> before we can queue this.
> 
> FWIW: Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

Thanks!
Andre


* [PATCH v8 03/17] KVM: extend struct kvm_msi to hold a 32-bit device ID
  2016-07-06 21:54     ` André Przywara
@ 2016-07-07  9:37       ` Christoffer Dall
  0 siblings, 0 replies; 49+ messages in thread
From: Christoffer Dall @ 2016-07-07  9:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 06, 2016 at 10:54:58PM +0100, André Przywara wrote:
> On 06/07/16 22:06, Christoffer Dall wrote:
> > On Tue, Jul 05, 2016 at 12:22:55PM +0100, Andre Przywara wrote:
> >> The ARM GICv3 ITS MSI controller requires a device ID to be able to
> >> assign the proper interrupt vector. On real hardware, this ID is
> >> sampled from the bus. To be able to emulate an ITS controller, extend
> >> the KVM MSI interface to let userspace provide such a device ID. For
> >> PCI devices, the device ID is simply the 16-bit bus-device-function
> >> triplet, which should be easily available to the userland tool.
> >>
> >> Also there is a new KVM capability which advertises whether the
> >> current VM requires a device ID to be set along with the MSI data.
> >> This flag is still reported as not available everywhere, later we will
> >> enable it when ITS emulation is used.
> >>
> >> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >> Reviewed-by: Eric Auger <eric.auger@linaro.org>
> >> ---
> >>  Documentation/virtual/kvm/api.txt | 12 ++++++++++--
> >>  include/uapi/linux/kvm.h          |  5 ++++-
> >>  2 files changed, 14 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index 09efa9e..6551311 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -2175,10 +2175,18 @@ struct kvm_msi {
> >>  	__u32 address_hi;
> >>  	__u32 data;
> >>  	__u32 flags;
> >> -	__u8  pad[16];
> >> +	__u32 devid;
> >> +	__u8  pad[12];
> >>  };
> >>  
> >> -No flags are defined so far. The corresponding field must be 0.
> >> +flags: KVM_MSI_VALID_DEVID: devid contains a valid value
> >> +devid: If KVM_MSI_VALID_DEVID is set, contains a unique device identifier
> >> +       for the device that wrote the MSI message.
> >> +       For PCI, this is usually a BDF identifier in the lower 16 bits.
> >> +
> >> +The per-VM KVM_CAP_MSI_DEVID capability advertises the need to provide
> >> +the device ID. If this capability is not set, userland cannot rely on
> >> +the kernel to allow the KVM_MSI_VALID_DEVID flag being set.
> > 
> > If KVM_CAP_MSI_DEVID is set, is it an error to provide a struct kvm_msi
> > without the KVM_MSI_VALID_DEVID flag set, or not necessarily?
> 
> At the moment we return an error when the bit is not set. But
> theoretically a guest could have other (in-kernel emulated) MSI
> controllers which don't require a device ID, though currently there is none.
> The reason for both the KVM capability and the flag on the ioctl is for
> mutual assurance that both userland and the kernel know about this ITS
> specific extension.
> This KVM_SIGNAL_MSI API is rather generic, so I don't want to
> unnecessarily slam the door on other emulated MSI controllers.
> 

That's a reasonable argument: you could have multiple MSI
controllers with and without device IDs.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs
  2016-07-05 11:22 ` [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs Andre Przywara
@ 2016-07-07 13:13   ` Christoffer Dall
  2016-07-07 15:00   ` Marc Zyngier
  1 sibling, 0 replies; 49+ messages in thread
From: Christoffer Dall @ 2016-07-07 13:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 05, 2016 at 12:22:58PM +0100, Andre Przywara wrote:
> At the moment our struct vgic_irq instances are statically allocated at
> guest creation time, so getting a pointer to an IRQ structure is trivial
> and safe. LPIs are more dynamic: they can be mapped and unmapped at any
> time during the guest's _runtime_.
> In preparation for supporting LPIs we introduce reference counting for
> those structures using the kernel's kref infrastructure.
> Since private IRQs and SPIs are statically allocated, the refcount never
> drops to 0 at the moment, but we increase it when an IRQ gets onto a VCPU
> list and decrease it when it gets removed.
> This introduces vgic_put_irq(), which wraps kref_put and hides the
> release function from the callers.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  include/kvm/arm_vgic.h           |  1 +
>  virt/kvm/arm/vgic/vgic-init.c    |  2 ++
>  virt/kvm/arm/vgic/vgic-mmio-v2.c |  8 +++++++
>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 20 +++++++++++------
>  virt/kvm/arm/vgic/vgic-mmio.c    | 25 ++++++++++++++++++++-
>  virt/kvm/arm/vgic/vgic-v2.c      |  1 +
>  virt/kvm/arm/vgic/vgic-v3.c      |  1 +
>  virt/kvm/arm/vgic/vgic.c         | 48 +++++++++++++++++++++++++++++++---------
>  virt/kvm/arm/vgic/vgic.h         |  1 +
>  9 files changed, 89 insertions(+), 18 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 5142e2a..450b4da 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -96,6 +96,7 @@ struct vgic_irq {
>  	bool active;			/* not used for LPIs */
>  	bool enabled;
>  	bool hw;			/* Tied to HW IRQ */
> +	struct kref refcount;		/* Used for LPIs */
>  	u32 hwintid;			/* HW INTID number */
>  	union {
>  		u8 targets;			/* GICv2 target VCPUs mask */
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 90cae48..ac3c1a5 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -177,6 +177,7 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
>  		spin_lock_init(&irq->irq_lock);
>  		irq->vcpu = NULL;
>  		irq->target_vcpu = vcpu0;
> +		kref_init(&irq->refcount);
>  		if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
>  			irq->targets = 0;
>  		else
> @@ -211,6 +212,7 @@ static void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>  		irq->vcpu = NULL;
>  		irq->target_vcpu = vcpu;
>  		irq->targets = 1U << vcpu->vcpu_id;
> +		kref_init(&irq->refcount);
>  		if (vgic_irq_is_sgi(i)) {
>  			/* SGIs */
>  			irq->enabled = 1;
> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
> index a213936..4152348 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
> @@ -102,6 +102,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
>  		irq->source |= 1U << source_vcpu->vcpu_id;
>  
>  		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
> +		vgic_put_irq(source_vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -116,6 +117,8 @@ static unsigned long vgic_mmio_read_target(struct kvm_vcpu *vcpu,
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
>  		val |= (u64)irq->targets << (i * 8);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return val;
> @@ -143,6 +146,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
>  		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -157,6 +161,8 @@ static unsigned long vgic_mmio_read_sgipend(struct kvm_vcpu *vcpu,
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
>  		val |= (u64)irq->source << (i * 8);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  	return val;
>  }
> @@ -178,6 +184,7 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
>  			irq->pending = false;
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -201,6 +208,7 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
>  		} else {
>  			spin_unlock(&irq->irq_lock);
>  		}
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> index fc7b6c9..bfcafbd 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> @@ -80,15 +80,17 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
>  {
>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> +	unsigned long ret = 0;
>  
>  	if (!irq)
>  		return 0;
>  
>  	/* The upper word is RAZ for us. */
> -	if (addr & 4)
> -		return 0;
> +	if (!(addr & 4))
> +		ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>  
> -	return extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
> +	vgic_put_irq(vcpu->kvm, irq);
> +	return ret;
>  }
>  
>  static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> @@ -96,15 +98,17 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>  				    unsigned long val)
>  {
>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
> -	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> -
> -	if (!irq)
> -		return;
> +	struct vgic_irq *irq;
>  
>  	/* The upper word is WI for us since we don't implement Aff3. */
>  	if (addr & 4)
>  		return;
>  
> +	irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> +
> +	if (!irq)
> +		return;
> +
>  	spin_lock(&irq->irq_lock);
>  
>  	/* We only care about and preserve Aff0, Aff1 and Aff2. */
> @@ -112,6 +116,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>  	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
>  
>  	spin_unlock(&irq->irq_lock);
> +	vgic_put_irq(vcpu->kvm, irq);
>  }
>  
>  static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
> @@ -445,5 +450,6 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
>  		irq->pending = true;
>  
>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index 9f6fab7..5e79e01 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -56,6 +56,8 @@ unsigned long vgic_mmio_read_enable(struct kvm_vcpu *vcpu,
>  
>  		if (irq->enabled)
>  			value |= (1U << i);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return value;
> @@ -74,6 +76,8 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
>  		spin_lock(&irq->irq_lock);
>  		irq->enabled = true;
>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -92,6 +96,7 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
>  		irq->enabled = false;
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -108,6 +113,8 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
>  
>  		if (irq->pending)
>  			value |= (1U << i);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return value;
> @@ -129,6 +136,7 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
>  			irq->soft_pending = true;
>  
>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -152,6 +160,7 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
>  		}
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -168,6 +177,8 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
>  
>  		if (irq->active)
>  			value |= (1U << i);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return value;
> @@ -242,6 +253,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
>  	for_each_set_bit(i, &val, len * 8) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  		vgic_mmio_change_active(vcpu, irq, false);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  	vgic_change_active_finish(vcpu, intid);
>  }
> @@ -257,6 +269,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
>  	for_each_set_bit(i, &val, len * 8) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  		vgic_mmio_change_active(vcpu, irq, true);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  	vgic_change_active_finish(vcpu, intid);
>  }
> @@ -272,6 +285,8 @@ unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
>  		val |= (u64)irq->priority << (i * 8);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return val;
> @@ -298,6 +313,8 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
>  		/* Narrow the priority range to what we actually support */
>  		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
>  		spin_unlock(&irq->irq_lock);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -313,6 +330,8 @@ unsigned long vgic_mmio_read_config(struct kvm_vcpu *vcpu,
>  
>  		if (irq->config == VGIC_CONFIG_EDGE)
>  			value |= (2U << (i * 2));
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return value;
> @@ -326,7 +345,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>  	int i;
>  
>  	for (i = 0; i < len * 4; i++) {
> -		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> +		struct vgic_irq *irq;
>  
>  		/*
>  		 * The configuration cannot be changed for SGIs in general,
> @@ -337,14 +356,18 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>  		if (intid + i < VGIC_NR_PRIVATE_IRQS)
>  			continue;
>  
> +		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  		spin_lock(&irq->irq_lock);
> +
>  		if (test_bit(i * 2 + 1, &val)) {
>  			irq->config = VGIC_CONFIG_EDGE;
>  		} else {
>  			irq->config = VGIC_CONFIG_LEVEL;
>  			irq->pending = irq->line_level | irq->soft_pending;
>  		}
> +
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index 079bf67..0bf6709 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -124,6 +124,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  		}
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index e48a22e..f0ac064 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -113,6 +113,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>  		}
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 69b61ab..ae80894 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -48,13 +48,20 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  			      u32 intid)
>  {
> -	/* SGIs and PPIs */
> -	if (intid <= VGIC_MAX_PRIVATE)
> -		return &vcpu->arch.vgic_cpu.private_irqs[intid];
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	struct vgic_irq *irq;
>  
> -	/* SPIs */
> -	if (intid <= VGIC_MAX_SPI)
> -		return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
> +	if (intid <= VGIC_MAX_PRIVATE) {        /* SGIs and PPIs */
> +		irq = &vcpu->arch.vgic_cpu.private_irqs[intid];
> +		kref_get(&irq->refcount);
> +		return irq;
> +	}
> +
> +	if (intid <= VGIC_MAX_SPI) {            /* SPIs */
> +		irq = &dist->spis[intid - VGIC_NR_PRIVATE_IRQS];
> +		kref_get(&irq->refcount);
> +		return irq;
> +	}

It appears to me that this could be made nicer (the diff as well) by
just setting the IRQ variable and at the end of the function do:

if (irq)
	kref_get(&irq->refcount);
return irq;

Perhaps a later patch will do something like that or make it obvious why
that's not a good idea.

Again, not a reason for a respin on its own.

>  
>  	/* LPIs are not yet covered */
>  	if (intid >= VGIC_MIN_LPI)
> @@ -64,6 +71,17 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  	return NULL;
>  }
>  
> +/* The refcount should never drop to 0 at the moment. */
> +static void vgic_irq_release(struct kref *ref)
> +{
> +	WARN_ON(1);
> +}
> +
> +void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> +{
> +	kref_put(&irq->refcount, vgic_irq_release);
> +}
> +
>  /**
>   * kvm_vgic_target_oracle - compute the target vcpu for an irq
>   *
> @@ -236,6 +254,7 @@ retry:
>  		goto retry;
>  	}
>  
> +	kref_get(&irq->refcount);
>  	list_add_tail(&irq->ap_list, &vcpu->arch.vgic_cpu.ap_list_head);
>  	irq->vcpu = vcpu;
>  
> @@ -269,14 +288,17 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	if (!irq)
>  		return -EINVAL;
>  
> -	if (irq->hw != mapped_irq)
> +	if (irq->hw != mapped_irq) {
> +		vgic_put_irq(kvm, irq);
>  		return -EINVAL;
> +	}
>  
>  	spin_lock(&irq->irq_lock);
>  
>  	if (!vgic_validate_injection(irq, level)) {
>  		/* Nothing to see here, move along... */
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(kvm, irq);
>  		return 0;
>  	}
>  
> @@ -288,6 +310,7 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	}
>  
>  	vgic_queue_irq_unlock(kvm, irq);
> +	vgic_put_irq(kvm, irq);
>  
>  	return 0;
>  }
> @@ -330,25 +353,28 @@ int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
>  	irq->hwintid = phys_irq;
>  
>  	spin_unlock(&irq->irq_lock);
> +	vgic_put_irq(vcpu->kvm, irq);
>  
>  	return 0;
>  }
>  
>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  {
> -	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> -
> -	BUG_ON(!irq);
> +	struct vgic_irq *irq;
>  
>  	if (!vgic_initialized(vcpu->kvm))
>  		return -EAGAIN;
>  
> +	irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> +	BUG_ON(!irq);
> +
>  	spin_lock(&irq->irq_lock);
>  
>  	irq->hw = false;
>  	irq->hwintid = 0;
>  
>  	spin_unlock(&irq->irq_lock);
> +	vgic_put_irq(vcpu->kvm, irq);
>  
>  	return 0;
>  }
> @@ -386,6 +412,7 @@ retry:
>  			list_del(&irq->ap_list);
>  			irq->vcpu = NULL;
>  			spin_unlock(&irq->irq_lock);
> +			vgic_put_irq(vcpu->kvm, irq);
>  			continue;
>  		}
>  
> @@ -614,6 +641,7 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  	spin_lock(&irq->irq_lock);
>  	map_is_active = irq->hw && irq->active;
>  	spin_unlock(&irq->irq_lock);
> +	vgic_put_irq(vcpu->kvm, irq);
>  
>  	return map_is_active;
>  }
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index c752152..5b79c34 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -38,6 +38,7 @@ struct vgic_vmcr {
>  
>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  			      u32 intid);
> +void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
>  bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
>  void vgic_kick_vcpus(struct kvm *kvm);
>  
> -- 
> 2.9.0
> 

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>


* [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs
  2016-07-05 11:22 ` [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs Andre Przywara
  2016-07-07 13:13   ` Christoffer Dall
@ 2016-07-07 15:00   ` Marc Zyngier
  2016-07-08 10:28     ` Andre Przywara
  1 sibling, 1 reply; 49+ messages in thread
From: Marc Zyngier @ 2016-07-07 15:00 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/07/16 12:22, Andre Przywara wrote:
> At the moment our struct vgic_irq instances are statically allocated at
> guest creation time, so getting a pointer to an IRQ structure is trivial
> and safe. LPIs are more dynamic: they can be mapped and unmapped at any
> time during the guest's _runtime_.
> In preparation for supporting LPIs we introduce reference counting for
> those structures using the kernel's kref infrastructure.
> Since private IRQs and SPIs are statically allocated, the refcount never
> drops to 0 at the moment, but we increase it when an IRQ gets onto a VCPU
> list and decrease it when it gets removed.
> This introduces vgic_put_irq(), which wraps kref_put and hides the
> release function from the callers.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  include/kvm/arm_vgic.h           |  1 +
>  virt/kvm/arm/vgic/vgic-init.c    |  2 ++
>  virt/kvm/arm/vgic/vgic-mmio-v2.c |  8 +++++++
>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 20 +++++++++++------
>  virt/kvm/arm/vgic/vgic-mmio.c    | 25 ++++++++++++++++++++-
>  virt/kvm/arm/vgic/vgic-v2.c      |  1 +
>  virt/kvm/arm/vgic/vgic-v3.c      |  1 +
>  virt/kvm/arm/vgic/vgic.c         | 48 +++++++++++++++++++++++++++++++---------
>  virt/kvm/arm/vgic/vgic.h         |  1 +
>  9 files changed, 89 insertions(+), 18 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 5142e2a..450b4da 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -96,6 +96,7 @@ struct vgic_irq {
>  	bool active;			/* not used for LPIs */
>  	bool enabled;
>  	bool hw;			/* Tied to HW IRQ */
> +	struct kref refcount;		/* Used for LPIs */
>  	u32 hwintid;			/* HW INTID number */
>  	union {
>  		u8 targets;			/* GICv2 target VCPUs mask */
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 90cae48..ac3c1a5 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -177,6 +177,7 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
>  		spin_lock_init(&irq->irq_lock);
>  		irq->vcpu = NULL;
>  		irq->target_vcpu = vcpu0;
> +		kref_init(&irq->refcount);
>  		if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
>  			irq->targets = 0;
>  		else
> @@ -211,6 +212,7 @@ static void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>  		irq->vcpu = NULL;
>  		irq->target_vcpu = vcpu;
>  		irq->targets = 1U << vcpu->vcpu_id;
> +		kref_init(&irq->refcount);
>  		if (vgic_irq_is_sgi(i)) {
>  			/* SGIs */
>  			irq->enabled = 1;
> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
> index a213936..4152348 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
> @@ -102,6 +102,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
>  		irq->source |= 1U << source_vcpu->vcpu_id;
>  
>  		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
> +		vgic_put_irq(source_vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -116,6 +117,8 @@ static unsigned long vgic_mmio_read_target(struct kvm_vcpu *vcpu,
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
>  		val |= (u64)irq->targets << (i * 8);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return val;
> @@ -143,6 +146,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
>  		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -157,6 +161,8 @@ static unsigned long vgic_mmio_read_sgipend(struct kvm_vcpu *vcpu,
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
>  		val |= (u64)irq->source << (i * 8);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  	return val;
>  }
> @@ -178,6 +184,7 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
>  			irq->pending = false;
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -201,6 +208,7 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
>  		} else {
>  			spin_unlock(&irq->irq_lock);
>  		}
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> index fc7b6c9..bfcafbd 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> @@ -80,15 +80,17 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
>  {
>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> +	unsigned long ret = 0;
>  
>  	if (!irq)
>  		return 0;
>  
>  	/* The upper word is RAZ for us. */
> -	if (addr & 4)
> -		return 0;
> +	if (!(addr & 4))
> +		ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>  
> -	return extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
> +	vgic_put_irq(vcpu->kvm, irq);
> +	return ret;
>  }
>  
>  static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> @@ -96,15 +98,17 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>  				    unsigned long val)
>  {
>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
> -	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> -
> -	if (!irq)
> -		return;
> +	struct vgic_irq *irq;
>  
>  	/* The upper word is WI for us since we don't implement Aff3. */
>  	if (addr & 4)
>  		return;
>  
> +	irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> +
> +	if (!irq)
> +		return;
> +
>  	spin_lock(&irq->irq_lock);
>  
>  	/* We only care about and preserve Aff0, Aff1 and Aff2. */
> @@ -112,6 +116,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>  	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
>  
>  	spin_unlock(&irq->irq_lock);
> +	vgic_put_irq(vcpu->kvm, irq);
>  }
>  
>  static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
> @@ -445,5 +450,6 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
>  		irq->pending = true;
>  
>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index 9f6fab7..5e79e01 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -56,6 +56,8 @@ unsigned long vgic_mmio_read_enable(struct kvm_vcpu *vcpu,
>  
>  		if (irq->enabled)
>  			value |= (1U << i);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return value;
> @@ -74,6 +76,8 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
>  		spin_lock(&irq->irq_lock);
>  		irq->enabled = true;
>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -92,6 +96,7 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
>  		irq->enabled = false;
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -108,6 +113,8 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
>  
>  		if (irq->pending)
>  			value |= (1U << i);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return value;
> @@ -129,6 +136,7 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
>  			irq->soft_pending = true;
>  
>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -152,6 +160,7 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
>  		}
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -168,6 +177,8 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
>  
>  		if (irq->active)
>  			value |= (1U << i);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return value;
> @@ -242,6 +253,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
>  	for_each_set_bit(i, &val, len * 8) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  		vgic_mmio_change_active(vcpu, irq, false);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  	vgic_change_active_finish(vcpu, intid);
>  }
> @@ -257,6 +269,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
>  	for_each_set_bit(i, &val, len * 8) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  		vgic_mmio_change_active(vcpu, irq, true);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  	vgic_change_active_finish(vcpu, intid);
>  }
> @@ -272,6 +285,8 @@ unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
>  		val |= (u64)irq->priority << (i * 8);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return val;
> @@ -298,6 +313,8 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
>  		/* Narrow the priority range to what we actually support */
>  		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
>  		spin_unlock(&irq->irq_lock);
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> @@ -313,6 +330,8 @@ unsigned long vgic_mmio_read_config(struct kvm_vcpu *vcpu,
>  
>  		if (irq->config == VGIC_CONFIG_EDGE)
>  			value |= (2U << (i * 2));
> +
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
>  	return value;
> @@ -326,7 +345,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>  	int i;
>  
>  	for (i = 0; i < len * 4; i++) {
> -		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> +		struct vgic_irq *irq;
>  
>  		/*
>  		 * The configuration cannot be changed for SGIs in general,
> @@ -337,14 +356,18 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>  		if (intid + i < VGIC_NR_PRIVATE_IRQS)
>  			continue;
>  
> +		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  		spin_lock(&irq->irq_lock);
> +
>  		if (test_bit(i * 2 + 1, &val)) {
>  			irq->config = VGIC_CONFIG_EDGE;
>  		} else {
>  			irq->config = VGIC_CONFIG_LEVEL;
>  			irq->pending = irq->line_level | irq->soft_pending;
>  		}
> +
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index 079bf67..0bf6709 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -124,6 +124,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  		}
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index e48a22e..f0ac064 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -113,6 +113,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>  		}
>  
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 69b61ab..ae80894 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -48,13 +48,20 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  			      u32 intid)
>  {
> -	/* SGIs and PPIs */
> -	if (intid <= VGIC_MAX_PRIVATE)
> -		return &vcpu->arch.vgic_cpu.private_irqs[intid];
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	struct vgic_irq *irq;
>  
> -	/* SPIs */
> -	if (intid <= VGIC_MAX_SPI)
> -		return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
> +	if (intid <= VGIC_MAX_PRIVATE) {        /* SGIs and PPIs */
> +		irq = &vcpu->arch.vgic_cpu.private_irqs[intid];
> +		kref_get(&irq->refcount);
> +		return irq;
> +	}
> +
> +	if (intid <= VGIC_MAX_SPI) {            /* SPIs */
> +		irq = &dist->spis[intid - VGIC_NR_PRIVATE_IRQS];
> +		kref_get(&irq->refcount);
> +		return irq;
> +	}
>  
>  	/* LPIs are not yet covered */
>  	if (intid >= VGIC_MIN_LPI)
> @@ -64,6 +71,17 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  	return NULL;
>  }
>  
> +/* The refcount should never drop to 0 at the moment. */
> +static void vgic_irq_release(struct kref *ref)
> +{
> +	WARN_ON(1);
> +}
> +
> +void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> +{
> +	kref_put(&irq->refcount, vgic_irq_release);
> +}
> +
>  /**
>   * kvm_vgic_target_oracle - compute the target vcpu for an irq
>   *
> @@ -236,6 +254,7 @@ retry:
>  		goto retry;
>  	}
>  
> +	kref_get(&irq->refcount);

Could you use vgic_get_irq() instead? I know it is slightly overkill,
but I can already tell that we'll need to add some tracing in both the
put and get helpers in order to do some debugging. Having straight
kref_get/put is going to make this tracing difficult, so let's not go there.

>  	list_add_tail(&irq->ap_list, &vcpu->arch.vgic_cpu.ap_list_head);
>  	irq->vcpu = vcpu;
>  
> @@ -269,14 +288,17 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	if (!irq)
>  		return -EINVAL;
>  
> -	if (irq->hw != mapped_irq)
> +	if (irq->hw != mapped_irq) {
> +		vgic_put_irq(kvm, irq);
>  		return -EINVAL;
> +	}
>  
>  	spin_lock(&irq->irq_lock);
>  
>  	if (!vgic_validate_injection(irq, level)) {
>  		/* Nothing to see here, move along... */
>  		spin_unlock(&irq->irq_lock);
> +		vgic_put_irq(kvm, irq);
>  		return 0;
>  	}
>  
> @@ -288,6 +310,7 @@ static int vgic_update_irq_pending(struct kvm *kvm, int cpuid,
>  	}
>  
>  	vgic_queue_irq_unlock(kvm, irq);
> +	vgic_put_irq(kvm, irq);
>  
>  	return 0;
>  }
> @@ -330,25 +353,28 @@ int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
>  	irq->hwintid = phys_irq;
>  
>  	spin_unlock(&irq->irq_lock);
> +	vgic_put_irq(vcpu->kvm, irq);
>  
>  	return 0;
>  }
>  
>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  {
> -	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> -
> -	BUG_ON(!irq);
> +	struct vgic_irq *irq;
>  
>  	if (!vgic_initialized(vcpu->kvm))
>  		return -EAGAIN;
>  
> +	irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> +	BUG_ON(!irq);
> +
>  	spin_lock(&irq->irq_lock);
>  
>  	irq->hw = false;
>  	irq->hwintid = 0;
>  
>  	spin_unlock(&irq->irq_lock);
> +	vgic_put_irq(vcpu->kvm, irq);
>  
>  	return 0;
>  }
> @@ -386,6 +412,7 @@ retry:
>  			list_del(&irq->ap_list);
>  			irq->vcpu = NULL;
>  			spin_unlock(&irq->irq_lock);
> +			vgic_put_irq(vcpu->kvm, irq);
>  			continue;
>  		}
>  
> @@ -614,6 +641,7 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  	spin_lock(&irq->irq_lock);
>  	map_is_active = irq->hw && irq->active;
>  	spin_unlock(&irq->irq_lock);
> +	vgic_put_irq(vcpu->kvm, irq);
>  
>  	return map_is_active;
>  }
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index c752152..5b79c34 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -38,6 +38,7 @@ struct vgic_vmcr {
>  
>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  			      u32 intid);
> +void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
>  bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
>  void vgic_kick_vcpus(struct kvm *kvm);
>  
> 

Otherwise looks good.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs
  2016-07-07 15:00   ` Marc Zyngier
@ 2016-07-08 10:28     ` Andre Przywara
  2016-07-08 10:50       ` Marc Zyngier
  0 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-08 10:28 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 07/07/16 16:00, Marc Zyngier wrote:
> On 05/07/16 12:22, Andre Przywara wrote:
>> At the moment our struct vgic_irq instances are statically allocated at
>> guest creation time, so getting a pointer to an IRQ structure is trivial
>> and safe. LPIs are more dynamic: they can be mapped and unmapped at any
>> time during the guest's _runtime_.
>> In preparation for supporting LPIs we introduce reference counting for
>> those structures using the kernel's kref infrastructure.
>> Since private IRQs and SPIs are statically allocated, the refcount never
>> drops to 0 at the moment, but we increase it when an IRQ gets onto a VCPU
>> list and decrease it when it gets removed.
>> This introduces vgic_put_irq(), which wraps kref_put and hides the
>> release function from the callers.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  include/kvm/arm_vgic.h           |  1 +
>>  virt/kvm/arm/vgic/vgic-init.c    |  2 ++
>>  virt/kvm/arm/vgic/vgic-mmio-v2.c |  8 +++++++
>>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 20 +++++++++++------
>>  virt/kvm/arm/vgic/vgic-mmio.c    | 25 ++++++++++++++++++++-
>>  virt/kvm/arm/vgic/vgic-v2.c      |  1 +
>>  virt/kvm/arm/vgic/vgic-v3.c      |  1 +
>>  virt/kvm/arm/vgic/vgic.c         | 48 +++++++++++++++++++++++++++++++---------
>>  virt/kvm/arm/vgic/vgic.h         |  1 +
>>  9 files changed, 89 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 5142e2a..450b4da 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -96,6 +96,7 @@ struct vgic_irq {
>>  	bool active;			/* not used for LPIs */
>>  	bool enabled;
>>  	bool hw;			/* Tied to HW IRQ */
>> +	struct kref refcount;		/* Used for LPIs */
>>  	u32 hwintid;			/* HW INTID number */
>>  	union {
>>  		u8 targets;			/* GICv2 target VCPUs mask */
>> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
>> index 90cae48..ac3c1a5 100644
>> --- a/virt/kvm/arm/vgic/vgic-init.c
>> +++ b/virt/kvm/arm/vgic/vgic-init.c
>> @@ -177,6 +177,7 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
>>  		spin_lock_init(&irq->irq_lock);
>>  		irq->vcpu = NULL;
>>  		irq->target_vcpu = vcpu0;
>> +		kref_init(&irq->refcount);
>>  		if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
>>  			irq->targets = 0;
>>  		else
>> @@ -211,6 +212,7 @@ static void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>>  		irq->vcpu = NULL;
>>  		irq->target_vcpu = vcpu;
>>  		irq->targets = 1U << vcpu->vcpu_id;
>> +		kref_init(&irq->refcount);
>>  		if (vgic_irq_is_sgi(i)) {
>>  			/* SGIs */
>>  			irq->enabled = 1;
>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
>> index a213936..4152348 100644
>> --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
>> @@ -102,6 +102,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
>>  		irq->source |= 1U << source_vcpu->vcpu_id;
>>  
>>  		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
>> +		vgic_put_irq(source_vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> @@ -116,6 +117,8 @@ static unsigned long vgic_mmio_read_target(struct kvm_vcpu *vcpu,
>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>  
>>  		val |= (u64)irq->targets << (i * 8);
>> +
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  
>>  	return val;
>> @@ -143,6 +146,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
>>  		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
>>  
>>  		spin_unlock(&irq->irq_lock);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> @@ -157,6 +161,8 @@ static unsigned long vgic_mmio_read_sgipend(struct kvm_vcpu *vcpu,
>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>  
>>  		val |= (u64)irq->source << (i * 8);
>> +
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  	return val;
>>  }
>> @@ -178,6 +184,7 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
>>  			irq->pending = false;
>>  
>>  		spin_unlock(&irq->irq_lock);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> @@ -201,6 +208,7 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
>>  		} else {
>>  			spin_unlock(&irq->irq_lock);
>>  		}
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>> index fc7b6c9..bfcafbd 100644
>> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>> @@ -80,15 +80,17 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
>>  {
>>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>>  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>> +	unsigned long ret = 0;
>>  
>>  	if (!irq)
>>  		return 0;
>>  
>>  	/* The upper word is RAZ for us. */
>> -	if (addr & 4)
>> -		return 0;
>> +	if (!(addr & 4))
>> +		ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>>  
>> -	return extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>> +	vgic_put_irq(vcpu->kvm, irq);
>> +	return ret;
>>  }
>>  
>>  static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>> @@ -96,15 +98,17 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>  				    unsigned long val)
>>  {
>>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>> -	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>> -
>> -	if (!irq)
>> -		return;
>> +	struct vgic_irq *irq;
>>  
>>  	/* The upper word is WI for us since we don't implement Aff3. */
>>  	if (addr & 4)
>>  		return;
>>  
>> +	irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>> +
>> +	if (!irq)
>> +		return;
>> +
>>  	spin_lock(&irq->irq_lock);
>>  
>>  	/* We only care about and preserve Aff0, Aff1 and Aff2. */
>> @@ -112,6 +116,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>  	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
>>  
>>  	spin_unlock(&irq->irq_lock);
>> +	vgic_put_irq(vcpu->kvm, irq);
>>  }
>>  
>>  static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
>> @@ -445,5 +450,6 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
>>  		irq->pending = true;
>>  
>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
>> index 9f6fab7..5e79e01 100644
>> --- a/virt/kvm/arm/vgic/vgic-mmio.c
>> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
>> @@ -56,6 +56,8 @@ unsigned long vgic_mmio_read_enable(struct kvm_vcpu *vcpu,
>>  
>>  		if (irq->enabled)
>>  			value |= (1U << i);
>> +
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  
>>  	return value;
>> @@ -74,6 +76,8 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
>>  		spin_lock(&irq->irq_lock);
>>  		irq->enabled = true;
>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>> +
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> @@ -92,6 +96,7 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
>>  		irq->enabled = false;
>>  
>>  		spin_unlock(&irq->irq_lock);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> @@ -108,6 +113,8 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
>>  
>>  		if (irq->pending)
>>  			value |= (1U << i);
>> +
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  
>>  	return value;
>> @@ -129,6 +136,7 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
>>  			irq->soft_pending = true;
>>  
>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> @@ -152,6 +160,7 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
>>  		}
>>  
>>  		spin_unlock(&irq->irq_lock);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> @@ -168,6 +177,8 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
>>  
>>  		if (irq->active)
>>  			value |= (1U << i);
>> +
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  
>>  	return value;
>> @@ -242,6 +253,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
>>  	for_each_set_bit(i, &val, len * 8) {
>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>  		vgic_mmio_change_active(vcpu, irq, false);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  	vgic_change_active_finish(vcpu, intid);
>>  }
>> @@ -257,6 +269,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
>>  	for_each_set_bit(i, &val, len * 8) {
>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>  		vgic_mmio_change_active(vcpu, irq, true);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  	vgic_change_active_finish(vcpu, intid);
>>  }
>> @@ -272,6 +285,8 @@ unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>  
>>  		val |= (u64)irq->priority << (i * 8);
>> +
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  
>>  	return val;
>> @@ -298,6 +313,8 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
>>  		/* Narrow the priority range to what we actually support */
>>  		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
>>  		spin_unlock(&irq->irq_lock);
>> +
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> @@ -313,6 +330,8 @@ unsigned long vgic_mmio_read_config(struct kvm_vcpu *vcpu,
>>  
>>  		if (irq->config == VGIC_CONFIG_EDGE)
>>  			value |= (2U << (i * 2));
>> +
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  
>>  	return value;
>> @@ -326,7 +345,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>>  	int i;
>>  
>>  	for (i = 0; i < len * 4; i++) {
>> -		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>> +		struct vgic_irq *irq;
>>  
>>  		/*
>>  		 * The configuration cannot be changed for SGIs in general,
>> @@ -337,14 +356,18 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>>  		if (intid + i < VGIC_NR_PRIVATE_IRQS)
>>  			continue;
>>  
>> +		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>  		spin_lock(&irq->irq_lock);
>> +
>>  		if (test_bit(i * 2 + 1, &val)) {
>>  			irq->config = VGIC_CONFIG_EDGE;
>>  		} else {
>>  			irq->config = VGIC_CONFIG_LEVEL;
>>  			irq->pending = irq->line_level | irq->soft_pending;
>>  		}
>> +
>>  		spin_unlock(&irq->irq_lock);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
>> index 079bf67..0bf6709 100644
>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>> @@ -124,6 +124,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>>  		}
>>  
>>  		spin_unlock(&irq->irq_lock);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>> index e48a22e..f0ac064 100644
>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>> @@ -113,6 +113,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>>  		}
>>  
>>  		spin_unlock(&irq->irq_lock);
>> +		vgic_put_irq(vcpu->kvm, irq);
>>  	}
>>  }
>>  
>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>> index 69b61ab..ae80894 100644
>> --- a/virt/kvm/arm/vgic/vgic.c
>> +++ b/virt/kvm/arm/vgic/vgic.c
>> @@ -48,13 +48,20 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
>>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>  			      u32 intid)
>>  {
>> -	/* SGIs and PPIs */
>> -	if (intid <= VGIC_MAX_PRIVATE)
>> -		return &vcpu->arch.vgic_cpu.private_irqs[intid];
>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>> +	struct vgic_irq *irq;
>>  
>> -	/* SPIs */
>> -	if (intid <= VGIC_MAX_SPI)
>> -		return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
>> +	if (intid <= VGIC_MAX_PRIVATE) {        /* SGIs and PPIs */
>> +		irq = &vcpu->arch.vgic_cpu.private_irqs[intid];
>> +		kref_get(&irq->refcount);
>> +		return irq;
>> +	}
>> +
>> +	if (intid <= VGIC_MAX_SPI) {            /* SPIs */
>> +		irq = &dist->spis[intid - VGIC_NR_PRIVATE_IRQS];
>> +		kref_get(&irq->refcount);
>> +		return irq;
>> +	}
>>  
>>  	/* LPIs are not yet covered */
>>  	if (intid >= VGIC_MIN_LPI)
>> @@ -64,6 +71,17 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>  	return NULL;
>>  }
>>  
>> +/* The refcount should never drop to 0 at the moment. */
>> +static void vgic_irq_release(struct kref *ref)
>> +{
>> +	WARN_ON(1);
>> +}
>> +
>> +void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>> +{
>> +	kref_put(&irq->refcount, vgic_irq_release);
>> +}
>> +
>>  /**
>>   * kvm_vgic_target_oracle - compute the target vcpu for an irq
>>   *
>> @@ -236,6 +254,7 @@ retry:
>>  		goto retry;
>>  	}
>>  
>> +	kref_get(&irq->refcount);
> 
> Could you use vgic_get_irq() instead? I know it is slightly overkill,
> but I can already tell that we'll need to add some tracing in both the
> put and get helpers in order to do some debugging. Having straight
> kref_get/put is going to make this tracing difficult, so let's not go there.

I'd rather not.
1) Putting the IRQ on the ap_list is the "other user" of the
refcounting; I don't want to mix that unnecessarily with the
vgic_get_irq() (as in: get the struct by its number) use case. Keeping
them separate may actually help tracing, since we can have separate
tracepoints to distinguish the two.
2) This would violate the locking order, since we hold the irq_lock here
and possibly take the lpi_list_lock in vgic_get_irq().
I don't think we can or should drop the irq_lock and re-take it just for
this.

I am happy to revisit this though once we actually get the tracepoints.

Cheers,
Andre.


* [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs
  2016-07-08 10:28     ` Andre Przywara
@ 2016-07-08 10:50       ` Marc Zyngier
  2016-07-08 12:54         ` André Przywara
  0 siblings, 1 reply; 49+ messages in thread
From: Marc Zyngier @ 2016-07-08 10:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/07/16 11:28, Andre Przywara wrote:
> Hi,
> 
> On 07/07/16 16:00, Marc Zyngier wrote:
>> On 05/07/16 12:22, Andre Przywara wrote:
>>> At the moment our struct vgic_irqs are statically allocated at guest
>>> creation time, so getting a pointer to an IRQ structure is trivial and
>>> safe. LPIs are more dynamic: they can be mapped and unmapped at any time
>>> during the guest's _runtime_.
>>> In preparation for supporting LPIs we introduce reference counting for
>>> those structures using the kernel's kref infrastructure.
>>> Since private IRQs and SPIs are statically allocated, the refcount never
>>> drops to 0 at the moment, but we increase it when an IRQ gets onto a VCPU
>>> list and decrease it when it gets removed.
>>> This introduces vgic_put_irq(), which wraps kref_put and hides the
>>> release function from the callers.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> ---
>>>  include/kvm/arm_vgic.h           |  1 +
>>>  virt/kvm/arm/vgic/vgic-init.c    |  2 ++
>>>  virt/kvm/arm/vgic/vgic-mmio-v2.c |  8 +++++++
>>>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 20 +++++++++++------
>>>  virt/kvm/arm/vgic/vgic-mmio.c    | 25 ++++++++++++++++++++-
>>>  virt/kvm/arm/vgic/vgic-v2.c      |  1 +
>>>  virt/kvm/arm/vgic/vgic-v3.c      |  1 +
>>>  virt/kvm/arm/vgic/vgic.c         | 48 +++++++++++++++++++++++++++++++---------
>>>  virt/kvm/arm/vgic/vgic.h         |  1 +
>>>  9 files changed, 89 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>> index 5142e2a..450b4da 100644
>>> --- a/include/kvm/arm_vgic.h
>>> +++ b/include/kvm/arm_vgic.h
>>> @@ -96,6 +96,7 @@ struct vgic_irq {
>>>  	bool active;			/* not used for LPIs */
>>>  	bool enabled;
>>>  	bool hw;			/* Tied to HW IRQ */
>>> +	struct kref refcount;		/* Used for LPIs */
>>>  	u32 hwintid;			/* HW INTID number */
>>>  	union {
>>>  		u8 targets;			/* GICv2 target VCPUs mask */
>>> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
>>> index 90cae48..ac3c1a5 100644
>>> --- a/virt/kvm/arm/vgic/vgic-init.c
>>> +++ b/virt/kvm/arm/vgic/vgic-init.c
>>> @@ -177,6 +177,7 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
>>>  		spin_lock_init(&irq->irq_lock);
>>>  		irq->vcpu = NULL;
>>>  		irq->target_vcpu = vcpu0;
>>> +		kref_init(&irq->refcount);
>>>  		if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
>>>  			irq->targets = 0;
>>>  		else
>>> @@ -211,6 +212,7 @@ static void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>>>  		irq->vcpu = NULL;
>>>  		irq->target_vcpu = vcpu;
>>>  		irq->targets = 1U << vcpu->vcpu_id;
>>> +		kref_init(&irq->refcount);
>>>  		if (vgic_irq_is_sgi(i)) {
>>>  			/* SGIs */
>>>  			irq->enabled = 1;
>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>> index a213936..4152348 100644
>>> --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>> @@ -102,6 +102,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
>>>  		irq->source |= 1U << source_vcpu->vcpu_id;
>>>  
>>>  		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
>>> +		vgic_put_irq(source_vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> @@ -116,6 +117,8 @@ static unsigned long vgic_mmio_read_target(struct kvm_vcpu *vcpu,
>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>  
>>>  		val |= (u64)irq->targets << (i * 8);
>>> +
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  
>>>  	return val;
>>> @@ -143,6 +146,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
>>>  		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
>>>  
>>>  		spin_unlock(&irq->irq_lock);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> @@ -157,6 +161,8 @@ static unsigned long vgic_mmio_read_sgipend(struct kvm_vcpu *vcpu,
>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>  
>>>  		val |= (u64)irq->source << (i * 8);
>>> +
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  	return val;
>>>  }
>>> @@ -178,6 +184,7 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
>>>  			irq->pending = false;
>>>  
>>>  		spin_unlock(&irq->irq_lock);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> @@ -201,6 +208,7 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
>>>  		} else {
>>>  			spin_unlock(&irq->irq_lock);
>>>  		}
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>>> index fc7b6c9..bfcafbd 100644
>>> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
>>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>>> @@ -80,15 +80,17 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
>>>  {
>>>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>>>  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>>> +	unsigned long ret = 0;
>>>  
>>>  	if (!irq)
>>>  		return 0;
>>>  
>>>  	/* The upper word is RAZ for us. */
>>> -	if (addr & 4)
>>> -		return 0;
>>> +	if (!(addr & 4))
>>> +		ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>>>  
>>> -	return extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>>> +	vgic_put_irq(vcpu->kvm, irq);
>>> +	return ret;
>>>  }
>>>  
>>>  static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>> @@ -96,15 +98,17 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>>  				    unsigned long val)
>>>  {
>>>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>>> -	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>>> -
>>> -	if (!irq)
>>> -		return;
>>> +	struct vgic_irq *irq;
>>>  
>>>  	/* The upper word is WI for us since we don't implement Aff3. */
>>>  	if (addr & 4)
>>>  		return;
>>>  
>>> +	irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>>> +
>>> +	if (!irq)
>>> +		return;
>>> +
>>>  	spin_lock(&irq->irq_lock);
>>>  
>>>  	/* We only care about and preserve Aff0, Aff1 and Aff2. */
>>> @@ -112,6 +116,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>>  	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
>>>  
>>>  	spin_unlock(&irq->irq_lock);
>>> +	vgic_put_irq(vcpu->kvm, irq);
>>>  }
>>>  
>>>  static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
>>> @@ -445,5 +450,6 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
>>>  		irq->pending = true;
>>>  
>>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
>>> index 9f6fab7..5e79e01 100644
>>> --- a/virt/kvm/arm/vgic/vgic-mmio.c
>>> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
>>> @@ -56,6 +56,8 @@ unsigned long vgic_mmio_read_enable(struct kvm_vcpu *vcpu,
>>>  
>>>  		if (irq->enabled)
>>>  			value |= (1U << i);
>>> +
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  
>>>  	return value;
>>> @@ -74,6 +76,8 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
>>>  		spin_lock(&irq->irq_lock);
>>>  		irq->enabled = true;
>>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>>> +
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> @@ -92,6 +96,7 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
>>>  		irq->enabled = false;
>>>  
>>>  		spin_unlock(&irq->irq_lock);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> @@ -108,6 +113,8 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
>>>  
>>>  		if (irq->pending)
>>>  			value |= (1U << i);
>>> +
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  
>>>  	return value;
>>> @@ -129,6 +136,7 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
>>>  			irq->soft_pending = true;
>>>  
>>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> @@ -152,6 +160,7 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
>>>  		}
>>>  
>>>  		spin_unlock(&irq->irq_lock);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> @@ -168,6 +177,8 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
>>>  
>>>  		if (irq->active)
>>>  			value |= (1U << i);
>>> +
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  
>>>  	return value;
>>> @@ -242,6 +253,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
>>>  	for_each_set_bit(i, &val, len * 8) {
>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>  		vgic_mmio_change_active(vcpu, irq, false);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  	vgic_change_active_finish(vcpu, intid);
>>>  }
>>> @@ -257,6 +269,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
>>>  	for_each_set_bit(i, &val, len * 8) {
>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>  		vgic_mmio_change_active(vcpu, irq, true);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  	vgic_change_active_finish(vcpu, intid);
>>>  }
>>> @@ -272,6 +285,8 @@ unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>  
>>>  		val |= (u64)irq->priority << (i * 8);
>>> +
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  
>>>  	return val;
>>> @@ -298,6 +313,8 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
>>>  		/* Narrow the priority range to what we actually support */
>>>  		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
>>>  		spin_unlock(&irq->irq_lock);
>>> +
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> @@ -313,6 +330,8 @@ unsigned long vgic_mmio_read_config(struct kvm_vcpu *vcpu,
>>>  
>>>  		if (irq->config == VGIC_CONFIG_EDGE)
>>>  			value |= (2U << (i * 2));
>>> +
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  
>>>  	return value;
>>> @@ -326,7 +345,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>>>  	int i;
>>>  
>>>  	for (i = 0; i < len * 4; i++) {
>>> -		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>> +		struct vgic_irq *irq;
>>>  
>>>  		/*
>>>  		 * The configuration cannot be changed for SGIs in general,
>>> @@ -337,14 +356,18 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>>>  		if (intid + i < VGIC_NR_PRIVATE_IRQS)
>>>  			continue;
>>>  
>>> +		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>  		spin_lock(&irq->irq_lock);
>>> +
>>>  		if (test_bit(i * 2 + 1, &val)) {
>>>  			irq->config = VGIC_CONFIG_EDGE;
>>>  		} else {
>>>  			irq->config = VGIC_CONFIG_LEVEL;
>>>  			irq->pending = irq->line_level | irq->soft_pending;
>>>  		}
>>> +
>>>  		spin_unlock(&irq->irq_lock);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
>>> index 079bf67..0bf6709 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>>> @@ -124,6 +124,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>>>  		}
>>>  
>>>  		spin_unlock(&irq->irq_lock);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>>> index e48a22e..f0ac064 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>> @@ -113,6 +113,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>>>  		}
>>>  
>>>  		spin_unlock(&irq->irq_lock);
>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>>  }
>>>  
>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>>> index 69b61ab..ae80894 100644
>>> --- a/virt/kvm/arm/vgic/vgic.c
>>> +++ b/virt/kvm/arm/vgic/vgic.c
>>> @@ -48,13 +48,20 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
>>>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>>  			      u32 intid)
>>>  {
>>> -	/* SGIs and PPIs */
>>> -	if (intid <= VGIC_MAX_PRIVATE)
>>> -		return &vcpu->arch.vgic_cpu.private_irqs[intid];
>>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>>> +	struct vgic_irq *irq;
>>>  
>>> -	/* SPIs */
>>> -	if (intid <= VGIC_MAX_SPI)
>>> -		return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
>>> +	if (intid <= VGIC_MAX_PRIVATE) {        /* SGIs and PPIs */
>>> +		irq = &vcpu->arch.vgic_cpu.private_irqs[intid];
>>> +		kref_get(&irq->refcount);
>>> +		return irq;
>>> +	}
>>> +
>>> +	if (intid <= VGIC_MAX_SPI) {            /* SPIs */
>>> +		irq = &dist->spis[intid - VGIC_NR_PRIVATE_IRQS];
>>> +		kref_get(&irq->refcount);
>>> +		return irq;
>>> +	}
>>>  
>>>  	/* LPIs are not yet covered */
>>>  	if (intid >= VGIC_MIN_LPI)
>>> @@ -64,6 +71,17 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>>  	return NULL;
>>>  }
>>>  
>>> +/* The refcount should never drop to 0 at the moment. */
>>> +static void vgic_irq_release(struct kref *ref)
>>> +{
>>> +	WARN_ON(1);
>>> +}
>>> +
>>> +void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>>> +{
>>> +	kref_put(&irq->refcount, vgic_irq_release);
>>> +}
>>> +
>>>  /**
>>>   * kvm_vgic_target_oracle - compute the target vcpu for an irq
>>>   *
>>> @@ -236,6 +254,7 @@ retry:
>>>  		goto retry;
>>>  	}
>>>  
>>> +	kref_get(&irq->refcount);
>>
>> Could you use vgic_get_irq() instead? I know it is slightly overkill,
>> but I can already tell that we'll need to add some tracing in both the
>> put and get helpers in order to do some debugging. Having straight
>> kref_get/put is going to make this tracing difficult, so let's not go there.
> 
> I'd rather not.
> 1) Putting the IRQ on the ap_list is the "other user" of the
> refcounting, I don't want to mix that unnecessarily with the
> vgic_get_irq() (as in: get the struct by the number) use case. That may
> actually help tracing, since we can have separate tracepoints to
> distinguish them.

And yet you end up doing a vgic_put_irq() in the fold operation, which
is wrong, by the way, as the interrupt is still in the ap_list. This
should be moved to the prune operation.

> 2) This would violate the locking order, since we hold the irq_lock here
> and possibly take the lpi_list_lock in vgic_get_irq().
> I don't think we can or should drop the irq_lock and re-take it just for
> this.

That's a much more convincing argument. And when you take the above into
account, you realize that your locking is not correct. You shouldn't be
dropping the refcount in fold, but in prune, meaning that you're holding
the ap_lock and the irq_lock, same as when you inserted the interrupt in
the list.

This outlines an essential principle: if your locking/refcounting is
not symmetric, it is likely that you're doing something wrong, and that
should bother you really badly.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs
  2016-07-08 10:50       ` Marc Zyngier
@ 2016-07-08 12:54         ` André Przywara
  2016-07-08 13:09           ` Marc Zyngier
  0 siblings, 1 reply; 49+ messages in thread
From: André Przywara @ 2016-07-08 12:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/07/16 11:50, Marc Zyngier wrote:
> On 08/07/16 11:28, Andre Przywara wrote:
>> Hi,
>>
>> On 07/07/16 16:00, Marc Zyngier wrote:
>>> On 05/07/16 12:22, Andre Przywara wrote:
>>>> At the moment our struct vgic_irqs are statically allocated at guest
>>>> creation time, so getting a pointer to an IRQ structure is trivial and
>>>> safe. LPIs are more dynamic: they can be mapped and unmapped at any time
>>>> during the guest's _runtime_.
>>>> In preparation for supporting LPIs we introduce reference counting for
>>>> those structures using the kernel's kref infrastructure.
>>>> Since private IRQs and SPIs are statically allocated, the refcount never
>>>> drops to 0 at the moment, but we increase it when an IRQ gets onto a VCPU
>>>> list and decrease it when it gets removed.
>>>> This introduces vgic_put_irq(), which wraps kref_put and hides the
>>>> release function from the callers.
>>>>
>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>>> ---
>>>>  include/kvm/arm_vgic.h           |  1 +
>>>>  virt/kvm/arm/vgic/vgic-init.c    |  2 ++
>>>>  virt/kvm/arm/vgic/vgic-mmio-v2.c |  8 +++++++
>>>>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 20 +++++++++++------
>>>>  virt/kvm/arm/vgic/vgic-mmio.c    | 25 ++++++++++++++++++++-
>>>>  virt/kvm/arm/vgic/vgic-v2.c      |  1 +
>>>>  virt/kvm/arm/vgic/vgic-v3.c      |  1 +
>>>>  virt/kvm/arm/vgic/vgic.c         | 48 +++++++++++++++++++++++++++++++---------
>>>>  virt/kvm/arm/vgic/vgic.h         |  1 +
>>>>  9 files changed, 89 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>>> index 5142e2a..450b4da 100644
>>>> --- a/include/kvm/arm_vgic.h
>>>> +++ b/include/kvm/arm_vgic.h
>>>> @@ -96,6 +96,7 @@ struct vgic_irq {
>>>>  	bool active;			/* not used for LPIs */
>>>>  	bool enabled;
>>>>  	bool hw;			/* Tied to HW IRQ */
>>>> +	struct kref refcount;		/* Used for LPIs */
>>>>  	u32 hwintid;			/* HW INTID number */
>>>>  	union {
>>>>  		u8 targets;			/* GICv2 target VCPUs mask */
>>>> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
>>>> index 90cae48..ac3c1a5 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-init.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-init.c
>>>> @@ -177,6 +177,7 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
>>>>  		spin_lock_init(&irq->irq_lock);
>>>>  		irq->vcpu = NULL;
>>>>  		irq->target_vcpu = vcpu0;
>>>> +		kref_init(&irq->refcount);
>>>>  		if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
>>>>  			irq->targets = 0;
>>>>  		else
>>>> @@ -211,6 +212,7 @@ static void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>>>>  		irq->vcpu = NULL;
>>>>  		irq->target_vcpu = vcpu;
>>>>  		irq->targets = 1U << vcpu->vcpu_id;
>>>> +		kref_init(&irq->refcount);
>>>>  		if (vgic_irq_is_sgi(i)) {
>>>>  			/* SGIs */
>>>>  			irq->enabled = 1;
>>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>>> index a213936..4152348 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>>> @@ -102,6 +102,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
>>>>  		irq->source |= 1U << source_vcpu->vcpu_id;
>>>>  
>>>>  		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
>>>> +		vgic_put_irq(source_vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> @@ -116,6 +117,8 @@ static unsigned long vgic_mmio_read_target(struct kvm_vcpu *vcpu,
>>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>  
>>>>  		val |= (u64)irq->targets << (i * 8);
>>>> +
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  
>>>>  	return val;
>>>> @@ -143,6 +146,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
>>>>  		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
>>>>  
>>>>  		spin_unlock(&irq->irq_lock);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> @@ -157,6 +161,8 @@ static unsigned long vgic_mmio_read_sgipend(struct kvm_vcpu *vcpu,
>>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>  
>>>>  		val |= (u64)irq->source << (i * 8);
>>>> +
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  	return val;
>>>>  }
>>>> @@ -178,6 +184,7 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
>>>>  			irq->pending = false;
>>>>  
>>>>  		spin_unlock(&irq->irq_lock);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> @@ -201,6 +208,7 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
>>>>  		} else {
>>>>  			spin_unlock(&irq->irq_lock);
>>>>  		}
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>>>> index fc7b6c9..bfcafbd 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>>>> @@ -80,15 +80,17 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
>>>>  {
>>>>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>>>>  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>>>> +	unsigned long ret = 0;
>>>>  
>>>>  	if (!irq)
>>>>  		return 0;
>>>>  
>>>>  	/* The upper word is RAZ for us. */
>>>> -	if (addr & 4)
>>>> -		return 0;
>>>> +	if (!(addr & 4))
>>>> +		ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>>>>  
>>>> -	return extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>>>> +	vgic_put_irq(vcpu->kvm, irq);
>>>> +	return ret;
>>>>  }
>>>>  
>>>>  static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>>> @@ -96,15 +98,17 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>>>  				    unsigned long val)
>>>>  {
>>>>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>>>> -	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>>>> -
>>>> -	if (!irq)
>>>> -		return;
>>>> +	struct vgic_irq *irq;
>>>>  
>>>>  	/* The upper word is WI for us since we don't implement Aff3. */
>>>>  	if (addr & 4)
>>>>  		return;
>>>>  
>>>> +	irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>>>> +
>>>> +	if (!irq)
>>>> +		return;
>>>> +
>>>>  	spin_lock(&irq->irq_lock);
>>>>  
>>>>  	/* We only care about and preserve Aff0, Aff1 and Aff2. */
>>>> @@ -112,6 +116,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>>>  	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
>>>>  
>>>>  	spin_unlock(&irq->irq_lock);
>>>> +	vgic_put_irq(vcpu->kvm, irq);
>>>>  }
>>>>  
>>>>  static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
>>>> @@ -445,5 +450,6 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
>>>>  		irq->pending = true;
>>>>  
>>>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
>>>> index 9f6fab7..5e79e01 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-mmio.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
>>>> @@ -56,6 +56,8 @@ unsigned long vgic_mmio_read_enable(struct kvm_vcpu *vcpu,
>>>>  
>>>>  		if (irq->enabled)
>>>>  			value |= (1U << i);
>>>> +
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  
>>>>  	return value;
>>>> @@ -74,6 +76,8 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
>>>>  		spin_lock(&irq->irq_lock);
>>>>  		irq->enabled = true;
>>>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>>>> +
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> @@ -92,6 +96,7 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
>>>>  		irq->enabled = false;
>>>>  
>>>>  		spin_unlock(&irq->irq_lock);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> @@ -108,6 +113,8 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
>>>>  
>>>>  		if (irq->pending)
>>>>  			value |= (1U << i);
>>>> +
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  
>>>>  	return value;
>>>> @@ -129,6 +136,7 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
>>>>  			irq->soft_pending = true;
>>>>  
>>>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> @@ -152,6 +160,7 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
>>>>  		}
>>>>  
>>>>  		spin_unlock(&irq->irq_lock);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> @@ -168,6 +177,8 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
>>>>  
>>>>  		if (irq->active)
>>>>  			value |= (1U << i);
>>>> +
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  
>>>>  	return value;
>>>> @@ -242,6 +253,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
>>>>  	for_each_set_bit(i, &val, len * 8) {
>>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>  		vgic_mmio_change_active(vcpu, irq, false);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  	vgic_change_active_finish(vcpu, intid);
>>>>  }
>>>> @@ -257,6 +269,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
>>>>  	for_each_set_bit(i, &val, len * 8) {
>>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>  		vgic_mmio_change_active(vcpu, irq, true);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  	vgic_change_active_finish(vcpu, intid);
>>>>  }
>>>> @@ -272,6 +285,8 @@ unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
>>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>  
>>>>  		val |= (u64)irq->priority << (i * 8);
>>>> +
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  
>>>>  	return val;
>>>> @@ -298,6 +313,8 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
>>>>  		/* Narrow the priority range to what we actually support */
>>>>  		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
>>>>  		spin_unlock(&irq->irq_lock);
>>>> +
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> @@ -313,6 +330,8 @@ unsigned long vgic_mmio_read_config(struct kvm_vcpu *vcpu,
>>>>  
>>>>  		if (irq->config == VGIC_CONFIG_EDGE)
>>>>  			value |= (2U << (i * 2));
>>>> +
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  
>>>>  	return value;
>>>> @@ -326,7 +345,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>>>>  	int i;
>>>>  
>>>>  	for (i = 0; i < len * 4; i++) {
>>>> -		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>> +		struct vgic_irq *irq;
>>>>  
>>>>  		/*
>>>>  		 * The configuration cannot be changed for SGIs in general,
>>>> @@ -337,14 +356,18 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>>>>  		if (intid + i < VGIC_NR_PRIVATE_IRQS)
>>>>  			continue;
>>>>  
>>>> +		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>  		spin_lock(&irq->irq_lock);
>>>> +
>>>>  		if (test_bit(i * 2 + 1, &val)) {
>>>>  			irq->config = VGIC_CONFIG_EDGE;
>>>>  		} else {
>>>>  			irq->config = VGIC_CONFIG_LEVEL;
>>>>  			irq->pending = irq->line_level | irq->soft_pending;
>>>>  		}
>>>> +
>>>>  		spin_unlock(&irq->irq_lock);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
>>>> index 079bf67..0bf6709 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>>>> @@ -124,6 +124,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>>>>  		}
>>>>  
>>>>  		spin_unlock(&irq->irq_lock);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>>>> index e48a22e..f0ac064 100644
>>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>>> @@ -113,6 +113,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>>>>  		}
>>>>  
>>>>  		spin_unlock(&irq->irq_lock);
>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>  	}
>>>>  }
>>>>  
>>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>>>> index 69b61ab..ae80894 100644
>>>> --- a/virt/kvm/arm/vgic/vgic.c
>>>> +++ b/virt/kvm/arm/vgic/vgic.c
>>>> @@ -48,13 +48,20 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
>>>>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>>>  			      u32 intid)
>>>>  {
>>>> -	/* SGIs and PPIs */
>>>> -	if (intid <= VGIC_MAX_PRIVATE)
>>>> -		return &vcpu->arch.vgic_cpu.private_irqs[intid];
>>>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>>>> +	struct vgic_irq *irq;
>>>>  
>>>> -	/* SPIs */
>>>> -	if (intid <= VGIC_MAX_SPI)
>>>> -		return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
>>>> +	if (intid <= VGIC_MAX_PRIVATE) {        /* SGIs and PPIs */
>>>> +		irq = &vcpu->arch.vgic_cpu.private_irqs[intid];
>>>> +		kref_get(&irq->refcount);
>>>> +		return irq;
>>>> +	}
>>>> +
>>>> +	if (intid <= VGIC_MAX_SPI) {            /* SPIs */
>>>> +		irq = &dist->spis[intid - VGIC_NR_PRIVATE_IRQS];
>>>> +		kref_get(&irq->refcount);
>>>> +		return irq;
>>>> +	}
>>>>  
>>>>  	/* LPIs are not yet covered */
>>>>  	if (intid >= VGIC_MIN_LPI)
>>>> @@ -64,6 +71,17 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>>>  	return NULL;
>>>>  }
>>>>  
>>>> +/* The refcount should never drop to 0 at the moment. */
>>>> +static void vgic_irq_release(struct kref *ref)
>>>> +{
>>>> +	WARN_ON(1);
>>>> +}
>>>> +
>>>> +void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>>>> +{
>>>> +	kref_put(&irq->refcount, vgic_irq_release);
>>>> +}
>>>> +
>>>>  /**
>>>>   * kvm_vgic_target_oracle - compute the target vcpu for an irq
>>>>   *
>>>> @@ -236,6 +254,7 @@ retry:
>>>>  		goto retry;
>>>>  	}
>>>>  
>>>> +	kref_get(&irq->refcount);
>>>
>>> Could you use vgic_get_irq() instead? I know it is slightly overkill,
>>> but I can already tell that we'll need to add some tracing in both the
>>> put and get helpers in order to do some debugging. Having straight
>>> kref_get/put is going to make this tracing difficult, so let's not go there.
>>
>> I'd rather not.
>> 1) Putting the IRQ on the ap_list is the "other user" of the
>> refcounting, I don't want to mix that unnecessarily with the
>> vgic_get_irq() (as in: get the struct by the number) use case. That may
>> actually help tracing, since we can have separate tracepoints to
>> distinguish them.
> 
> And yet you end up doing a vgic_put_irq() in the fold operation. Which
> is wrong, by the way, as the interrupt is still in the ap_list. This
> should be moved to the prune operation.
> 
>> 2) This would violate the locking order, since we hold the irq_lock here
>> and possibly take the lpi_list_lock in vgic_get_irq().
>> I don't think we can or should drop the irq_lock and re-take it just for
>> this.
> 
> That's a much more convincing argument. And when you take the above into
> account, you realize that your locking is not correct. You shouldn't be
> dropping the refcount in fold, but in prune, meaning that you're holding
> the ap_lock and the irq_lock, same as when you inserted the interrupt in
> the list.
> 
> This is outlining an essential principle: if your locking/refcounting is
> not symmetric, it is likely that you're doing something wrong, and that
> should bother you really badly.

Can you point me to the exact location where it's not symmetric?
I just looked at it again and can't find the issue.
I "put" it in v[23]_fold because we did a vgic_get_irq a few lines
before to translate the LR's intid into our struct vgic_irq pointer.
The vgic_get_irq() call isn't in this patch, because we used it already
before and this patch is just adding the respective puts.
The only asymmetry I could find is the expected one when an IRQ is added
to and removed from the ap_list.

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs
  2016-07-08 12:54         ` André Przywara
@ 2016-07-08 13:09           ` Marc Zyngier
  2016-07-08 13:14             ` André Przywara
  0 siblings, 1 reply; 49+ messages in thread
From: Marc Zyngier @ 2016-07-08 13:09 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/07/16 13:54, André Przywara wrote:
> On 08/07/16 11:50, Marc Zyngier wrote:
>> On 08/07/16 11:28, Andre Przywara wrote:
>>> Hi,
>>>
>>> On 07/07/16 16:00, Marc Zyngier wrote:
>>>> On 05/07/16 12:22, Andre Przywara wrote:
>>>>> At the moment our struct vgic_irq's are statically allocated at guest
>>>>> creation time. So getting a pointer to an IRQ structure is trivial and
>>>>> safe. LPIs are more dynamic, they can be mapped and unmapped at any time
>>>>> during the guest's _runtime_.
>>>>> In preparation for supporting LPIs we introduce reference counting for
>>>>> those structures using the kernel's kref infrastructure.
>>>>> Since private IRQs and SPIs are statically allocated, the refcount never
>>>>> drops to 0 at the moment, but we increase it when an IRQ gets onto a VCPU
>>>>> list and decrease it when it gets removed.
>>>>> This introduces vgic_put_irq(), which wraps kref_put and hides the
>>>>> release function from the callers.
>>>>>
>>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>>>> ---
>>>>>  include/kvm/arm_vgic.h           |  1 +
>>>>>  virt/kvm/arm/vgic/vgic-init.c    |  2 ++
>>>>>  virt/kvm/arm/vgic/vgic-mmio-v2.c |  8 +++++++
>>>>>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 20 +++++++++++------
>>>>>  virt/kvm/arm/vgic/vgic-mmio.c    | 25 ++++++++++++++++++++-
>>>>>  virt/kvm/arm/vgic/vgic-v2.c      |  1 +
>>>>>  virt/kvm/arm/vgic/vgic-v3.c      |  1 +
>>>>>  virt/kvm/arm/vgic/vgic.c         | 48 +++++++++++++++++++++++++++++++---------
>>>>>  virt/kvm/arm/vgic/vgic.h         |  1 +
>>>>>  9 files changed, 89 insertions(+), 18 deletions(-)
>>>>>
>>>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>>>> index 5142e2a..450b4da 100644
>>>>> --- a/include/kvm/arm_vgic.h
>>>>> +++ b/include/kvm/arm_vgic.h
>>>>> @@ -96,6 +96,7 @@ struct vgic_irq {
>>>>>  	bool active;			/* not used for LPIs */
>>>>>  	bool enabled;
>>>>>  	bool hw;			/* Tied to HW IRQ */
>>>>> +	struct kref refcount;		/* Used for LPIs */
>>>>>  	u32 hwintid;			/* HW INTID number */
>>>>>  	union {
>>>>>  		u8 targets;			/* GICv2 target VCPUs mask */
>>>>> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
>>>>> index 90cae48..ac3c1a5 100644
>>>>> --- a/virt/kvm/arm/vgic/vgic-init.c
>>>>> +++ b/virt/kvm/arm/vgic/vgic-init.c
>>>>> @@ -177,6 +177,7 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
>>>>>  		spin_lock_init(&irq->irq_lock);
>>>>>  		irq->vcpu = NULL;
>>>>>  		irq->target_vcpu = vcpu0;
>>>>> +		kref_init(&irq->refcount);
>>>>>  		if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
>>>>>  			irq->targets = 0;
>>>>>  		else
>>>>> @@ -211,6 +212,7 @@ static void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>>>>>  		irq->vcpu = NULL;
>>>>>  		irq->target_vcpu = vcpu;
>>>>>  		irq->targets = 1U << vcpu->vcpu_id;
>>>>> +		kref_init(&irq->refcount);
>>>>>  		if (vgic_irq_is_sgi(i)) {
>>>>>  			/* SGIs */
>>>>>  			irq->enabled = 1;
>>>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>>>> index a213936..4152348 100644
>>>>> --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>>>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
>>>>> @@ -102,6 +102,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
>>>>>  		irq->source |= 1U << source_vcpu->vcpu_id;
>>>>>  
>>>>>  		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
>>>>> +		vgic_put_irq(source_vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> @@ -116,6 +117,8 @@ static unsigned long vgic_mmio_read_target(struct kvm_vcpu *vcpu,
>>>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>>  
>>>>>  		val |= (u64)irq->targets << (i * 8);
>>>>> +
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  
>>>>>  	return val;
>>>>> @@ -143,6 +146,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
>>>>>  		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
>>>>>  
>>>>>  		spin_unlock(&irq->irq_lock);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> @@ -157,6 +161,8 @@ static unsigned long vgic_mmio_read_sgipend(struct kvm_vcpu *vcpu,
>>>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>>  
>>>>>  		val |= (u64)irq->source << (i * 8);
>>>>> +
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  	return val;
>>>>>  }
>>>>> @@ -178,6 +184,7 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
>>>>>  			irq->pending = false;
>>>>>  
>>>>>  		spin_unlock(&irq->irq_lock);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> @@ -201,6 +208,7 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
>>>>>  		} else {
>>>>>  			spin_unlock(&irq->irq_lock);
>>>>>  		}
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>>>>> index fc7b6c9..bfcafbd 100644
>>>>> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
>>>>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>>>>> @@ -80,15 +80,17 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
>>>>>  {
>>>>>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>>>>>  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>>>>> +	unsigned long ret = 0;
>>>>>  
>>>>>  	if (!irq)
>>>>>  		return 0;
>>>>>  
>>>>>  	/* The upper word is RAZ for us. */
>>>>> -	if (addr & 4)
>>>>> -		return 0;
>>>>> +	if (!(addr & 4))
>>>>> +		ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>>>>>  
>>>>> -	return extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>>>>> +	vgic_put_irq(vcpu->kvm, irq);
>>>>> +	return ret;
>>>>>  }
>>>>>  
>>>>>  static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>>>> @@ -96,15 +98,17 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>>>>  				    unsigned long val)
>>>>>  {
>>>>>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>>>>> -	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>>>>> -
>>>>> -	if (!irq)
>>>>> -		return;
>>>>> +	struct vgic_irq *irq;
>>>>>  
>>>>>  	/* The upper word is WI for us since we don't implement Aff3. */
>>>>>  	if (addr & 4)
>>>>>  		return;
>>>>>  
>>>>> +	irq = vgic_get_irq(vcpu->kvm, NULL, intid);
>>>>> +
>>>>> +	if (!irq)
>>>>> +		return;
>>>>> +
>>>>>  	spin_lock(&irq->irq_lock);
>>>>>  
>>>>>  	/* We only care about and preserve Aff0, Aff1 and Aff2. */
>>>>> @@ -112,6 +116,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>>>>>  	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
>>>>>  
>>>>>  	spin_unlock(&irq->irq_lock);
>>>>> +	vgic_put_irq(vcpu->kvm, irq);
>>>>>  }
>>>>>  
>>>>>  static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
>>>>> @@ -445,5 +450,6 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
>>>>>  		irq->pending = true;
>>>>>  
>>>>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
>>>>> index 9f6fab7..5e79e01 100644
>>>>> --- a/virt/kvm/arm/vgic/vgic-mmio.c
>>>>> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
>>>>> @@ -56,6 +56,8 @@ unsigned long vgic_mmio_read_enable(struct kvm_vcpu *vcpu,
>>>>>  
>>>>>  		if (irq->enabled)
>>>>>  			value |= (1U << i);
>>>>> +
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  
>>>>>  	return value;
>>>>> @@ -74,6 +76,8 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
>>>>>  		spin_lock(&irq->irq_lock);
>>>>>  		irq->enabled = true;
>>>>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>>>>> +
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> @@ -92,6 +96,7 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
>>>>>  		irq->enabled = false;
>>>>>  
>>>>>  		spin_unlock(&irq->irq_lock);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> @@ -108,6 +113,8 @@ unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu,
>>>>>  
>>>>>  		if (irq->pending)
>>>>>  			value |= (1U << i);
>>>>> +
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  
>>>>>  	return value;
>>>>> @@ -129,6 +136,7 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
>>>>>  			irq->soft_pending = true;
>>>>>  
>>>>>  		vgic_queue_irq_unlock(vcpu->kvm, irq);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> @@ -152,6 +160,7 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
>>>>>  		}
>>>>>  
>>>>>  		spin_unlock(&irq->irq_lock);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> @@ -168,6 +177,8 @@ unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu,
>>>>>  
>>>>>  		if (irq->active)
>>>>>  			value |= (1U << i);
>>>>> +
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  
>>>>>  	return value;
>>>>> @@ -242,6 +253,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
>>>>>  	for_each_set_bit(i, &val, len * 8) {
>>>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>>  		vgic_mmio_change_active(vcpu, irq, false);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  	vgic_change_active_finish(vcpu, intid);
>>>>>  }
>>>>> @@ -257,6 +269,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
>>>>>  	for_each_set_bit(i, &val, len * 8) {
>>>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>>  		vgic_mmio_change_active(vcpu, irq, true);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  	vgic_change_active_finish(vcpu, intid);
>>>>>  }
>>>>> @@ -272,6 +285,8 @@ unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
>>>>>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>>  
>>>>>  		val |= (u64)irq->priority << (i * 8);
>>>>> +
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  
>>>>>  	return val;
>>>>> @@ -298,6 +313,8 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
>>>>>  		/* Narrow the priority range to what we actually support */
>>>>>  		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
>>>>>  		spin_unlock(&irq->irq_lock);
>>>>> +
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> @@ -313,6 +330,8 @@ unsigned long vgic_mmio_read_config(struct kvm_vcpu *vcpu,
>>>>>  
>>>>>  		if (irq->config == VGIC_CONFIG_EDGE)
>>>>>  			value |= (2U << (i * 2));
>>>>> +
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  
>>>>>  	return value;
>>>>> @@ -326,7 +345,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>>>>>  	int i;
>>>>>  
>>>>>  	for (i = 0; i < len * 4; i++) {
>>>>> -		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>> +		struct vgic_irq *irq;
>>>>>  
>>>>>  		/*
>>>>>  		 * The configuration cannot be changed for SGIs in general,
>>>>> @@ -337,14 +356,18 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>>>>>  		if (intid + i < VGIC_NR_PRIVATE_IRQS)
>>>>>  			continue;
>>>>>  
>>>>> +		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>>>>>  		spin_lock(&irq->irq_lock);
>>>>> +
>>>>>  		if (test_bit(i * 2 + 1, &val)) {
>>>>>  			irq->config = VGIC_CONFIG_EDGE;
>>>>>  		} else {
>>>>>  			irq->config = VGIC_CONFIG_LEVEL;
>>>>>  			irq->pending = irq->line_level | irq->soft_pending;
>>>>>  		}
>>>>> +
>>>>>  		spin_unlock(&irq->irq_lock);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
>>>>> index 079bf67..0bf6709 100644
>>>>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>>>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>>>>> @@ -124,6 +124,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>>>>>  		}
>>>>>  
>>>>>  		spin_unlock(&irq->irq_lock);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>>>>> index e48a22e..f0ac064 100644
>>>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>>>> @@ -113,6 +113,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>>>>>  		}
>>>>>  
>>>>>  		spin_unlock(&irq->irq_lock);
>>>>> +		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>>  }
>>>>>  
>>>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>>>>> index 69b61ab..ae80894 100644
>>>>> --- a/virt/kvm/arm/vgic/vgic.c
>>>>> +++ b/virt/kvm/arm/vgic/vgic.c
>>>>> @@ -48,13 +48,20 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
>>>>>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>>>>  			      u32 intid)
>>>>>  {
>>>>> -	/* SGIs and PPIs */
>>>>> -	if (intid <= VGIC_MAX_PRIVATE)
>>>>> -		return &vcpu->arch.vgic_cpu.private_irqs[intid];
>>>>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>>>>> +	struct vgic_irq *irq;
>>>>>  
>>>>> -	/* SPIs */
>>>>> -	if (intid <= VGIC_MAX_SPI)
>>>>> -		return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
>>>>> +	if (intid <= VGIC_MAX_PRIVATE) {        /* SGIs and PPIs */
>>>>> +		irq = &vcpu->arch.vgic_cpu.private_irqs[intid];
>>>>> +		kref_get(&irq->refcount);
>>>>> +		return irq;
>>>>> +	}
>>>>> +
>>>>> +	if (intid <= VGIC_MAX_SPI) {            /* SPIs */
>>>>> +		irq = &dist->spis[intid - VGIC_NR_PRIVATE_IRQS];
>>>>> +		kref_get(&irq->refcount);
>>>>> +		return irq;
>>>>> +	}
>>>>>  
>>>>>  	/* LPIs are not yet covered */
>>>>>  	if (intid >= VGIC_MIN_LPI)
>>>>> @@ -64,6 +71,17 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>>>>  	return NULL;
>>>>>  }
>>>>>  
>>>>> +/* The refcount should never drop to 0 at the moment. */
>>>>> +static void vgic_irq_release(struct kref *ref)
>>>>> +{
>>>>> +	WARN_ON(1);
>>>>> +}
>>>>> +
>>>>> +void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>>>>> +{
>>>>> +	kref_put(&irq->refcount, vgic_irq_release);
>>>>> +}
>>>>> +
>>>>>  /**
>>>>>   * kvm_vgic_target_oracle - compute the target vcpu for an irq
>>>>>   *
>>>>> @@ -236,6 +254,7 @@ retry:
>>>>>  		goto retry;
>>>>>  	}
>>>>>  
>>>>> +	kref_get(&irq->refcount);
>>>>
>>>> Could you use vgic_get_irq() instead? I know it is slightly overkill,
>>>> but I can already tell that we'll need to add some tracing in both the
>>>> put and get helpers in order to do some debugging. Having straight
>>>> kref_get/put is going to make this tracing difficult, so let's not go there.
>>>
>>> I'd rather not.
>>> 1) Putting the IRQ on the ap_list is the "other user" of the
>>> refcounting, I don't want to mix that unnecessarily with the
>>> vgic_get_irq() (as in: get the struct by the number) use case. That may
>>> actually help tracing, since we can have separate tracepoints to
>>> distinguish them.
>>
>> And yet you end up doing a vgic_put_irq() in the fold operation. Which
>> is wrong, by the way, as the interrupt is still in the ap_list. This
>> should be moved to the prune operation.
>>
>>> 2) This would violate the locking order, since we hold the irq_lock here
>>> and possibly take the lpi_list_lock in vgic_get_irq().
>>> I don't think we can or should drop the irq_lock and re-take it just for
>>> this.
>>
>> That's a much more convincing argument. And when you take the above into
>> account, you realize that your locking is not correct. You shouldn't be
>> dropping the refcount in fold, but in prune, meaning that you're holding
>> the ap_lock and the irq_lock, same as when you inserted the interrupt in
>> the list.
>>
>> This is outlining an essential principle: if your locking/refcounting is
>> not symmetric, it is likely that you're doing something wrong, and that
>> should bother you really badly.
> 
> Can you point me to the exact location where it's not symmetric?
> I just looked at it again and can't find the issue.
> I "put" it in v[23]_fold because we did a vgic_get_irq a few lines
> before to translate the LR's intid into our struct vgic_irq pointer.

Right. I misread that one, apologies.

> The vgic_get_irq() call isn't in this patch, because we used it already
> before and this patch is just adding the respective puts.
> The only asymmetry I could find is the expected one when an IRQ is added
> to and removed from the ap_list.

So I assume this is the counterpart of the kref_get call?

@@ -386,6 +412,7 @@ retry:
 			list_del(&irq->ap_list);
 			irq->vcpu = NULL;
 			spin_unlock(&irq->irq_lock);
+			vgic_put_irq(vcpu->kvm, irq);
 			continue;

If that's the case, please add a comment, because it is really hard to
tell which vgic_put_irq() balances a kref_get() and which balances a
vgic_get_irq().

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs
  2016-07-08 13:09           ` Marc Zyngier
@ 2016-07-08 13:14             ` André Przywara
  0 siblings, 0 replies; 49+ messages in thread
From: André Przywara @ 2016-07-08 13:14 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/07/16 14:09, Marc Zyngier wrote:
> On 08/07/16 13:54, André Przywara wrote:
>> On 08/07/16 11:50, Marc Zyngier wrote:
>>> On 08/07/16 11:28, Andre Przywara wrote:
>>>> Hi,
>>>>
>>>> On 07/07/16 16:00, Marc Zyngier wrote:
>>>>> On 05/07/16 12:22, Andre Przywara wrote:

....

>>>>>> @@ -236,6 +254,7 @@ retry:
>>>>>>  		goto retry;
>>>>>>  	}
>>>>>>  
>>>>>> +	kref_get(&irq->refcount);
>>>>>
>>>>> Could you use vgic_get_irq() instead? I know it is slightly overkill,
>>>>> but I can already tell that we'll need to add some tracing in both the
>>>>> put and get helpers in order to do some debugging. Having straight
>>>>> kref_get/put is going to make this tracing difficult, so let's not go there.
>>>>
>>>> I'd rather not.
>>>> 1) Putting the IRQ on the ap_list is the "other user" of the
>>>> refcounting, I don't want to mix that unnecessarily with the
>>>> vgic_get_irq() (as in: get the struct by the number) use case. That may
>>>> actually help tracing, since we can have separate tracepoints to
>>>> distinguish them.
>>>
>>> And yet you end up doing a vgic_put_irq() in the fold operation. Which
>>> is wrong, by the way, as the interrupt is still in the ap_list. This
>>> should be moved to the prune operation.
>>>
>>>> 2) This would violate the locking order, since we hold the irq_lock here
>>>> and possibly take the lpi_list_lock in vgic_get_irq().
>>>> I don't think we can or should drop the irq_lock and re-take it just for
>>>> this.
>>>
>>> That's a much more convincing argument. And when you take the above into
>>> account, you realize that your locking is not correct. You shouldn't be
>>> dropping the refcount in fold, but in prune, meaning that you're holding
>>> the ap_lock and the irq_lock, same as when you inserted the interrupt in
>>> the list.
>>>
>>> This is outlining an essential principle: if your locking/refcounting is
>>> not symmetric, it is likely that you're doing something wrong, and that
>>> should bother you really badly.
>>
>> Can you point me to the exact location where it's not symmetric?
>> I just looked at it again and can't find the issue.
>> I "put" it in v[23]_fold because we did a vgic_get_irq a few lines
>> before to translate the LR's intid into our struct vgic_irq pointer.
> 
> Right. I misread that one, apologies.
> 
>> The vgic_get_irq() call isn't in this patch, because we used it already
>> before and this patch is just adding the respective puts.
>> The only asymmetry I could find is the expected one when an IRQ is added
>> to and removed from the ap_list.
> 
> So I assume this is the counterpart of the kref_get call?

Yes.

> 
> @@ -386,6 +412,7 @@ retry:
>  			list_del(&irq->ap_list);
>  			irq->vcpu = NULL;
>  			spin_unlock(&irq->irq_lock);
> +			vgic_put_irq(vcpu->kvm, irq);
>  			continue;
> 
> If that's the case, please add a comment, because it is really hard to
> tell which vgic_put_irq() balances a kref_get() and which balances a
> vgic_get_irq().

Yes, that's what I thought as well just _after_ having hit the Send
button ...

I think much of the confusion stems from the fact that we used
vgic_get_irq() before, only that it wasn't a "get" in the refcounting
get-put sense.

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 09/17] KVM: arm64: introduce ITS emulation file with MMIO framework
  2016-07-05 11:23 ` [PATCH v8 09/17] KVM: arm64: introduce ITS emulation file with MMIO framework Andre Przywara
@ 2016-07-08 13:34   ` Marc Zyngier
  2016-07-08 13:55     ` Marc Zyngier
  2016-07-08 14:04     ` André Przywara
  0 siblings, 2 replies; 49+ messages in thread
From: Marc Zyngier @ 2016-07-08 13:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/07/16 12:23, Andre Przywara wrote:
> The ARM GICv3 ITS emulation code goes into a separate file, but needs
> to be connected to the GICv3 emulation, of which it is an optional part.
> The ITS MMIO handlers require the respective ITS pointer to be passed in,
> so we amend the existing VGIC MMIO framework to let it cope with that.
> Also we introduce the basic ITS data structure and initialize it, but
> don't return any success yet, as we are not yet ready for the show.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  arch/arm64/kvm/Makefile          |   1 +
>  include/kvm/arm_vgic.h           |  14 +++++-
>  virt/kvm/arm/vgic/vgic-its.c     | 100 +++++++++++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic-mmio-v2.c |  40 +++++++--------
>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 104 ++++++++++++++++++++++++++-------------
>  virt/kvm/arm/vgic/vgic-mmio.c    |  36 +++++++++++---
>  virt/kvm/arm/vgic/vgic-mmio.h    |  31 +++++++++---
>  virt/kvm/arm/vgic/vgic.h         |   7 +++
>  8 files changed, 266 insertions(+), 67 deletions(-)
>  create mode 100644 virt/kvm/arm/vgic/vgic-its.c
> 
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index f00b2cd..a5b9664 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -29,5 +29,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v2.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v3.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-kvm-device.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index f6f860d..f606641 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -108,15 +108,27 @@ struct vgic_irq {
>  };
>  
>  struct vgic_register_region;
> +struct vgic_its;
>  
>  struct vgic_io_device {
>  	gpa_t base_addr;
> -	struct kvm_vcpu *redist_vcpu;
> +	union {
> +		struct kvm_vcpu *redist_vcpu;
> +		struct vgic_its *its;
> +	};

The only question that springs to mind is...

>  	const struct vgic_register_region *regions;
>  	int nr_regions;
>  	struct kvm_io_device dev;
>  };
>  
> +struct vgic_its {
> +	/* The base address of the ITS control register frame */
> +	gpa_t			vgic_its_base;
> +
> +	bool			enabled;
> +	struct vgic_io_device	iodev;
> +};
> +
>  struct vgic_dist {
>  	bool			in_kernel;
>  	bool			ready;
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> new file mode 100644
> index 0000000..ab8d244
> --- /dev/null
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -0,0 +1,100 @@
> +/*
> + * GICv3 ITS emulation
> + *
> + * Copyright (C) 2015,2016 ARM Ltd.
> + * Author: Andre Przywara <andre.przywara@arm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/cpu.h>
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +#include <linux/interrupt.h>
> +
> +#include <linux/irqchip/arm-gic-v3.h>
> +
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_mmu.h>
> +
> +#include "vgic.h"
> +#include "vgic-mmio.h"
> +
> +#define REGISTER_ITS_DESC(off, rd, wr, length, acc)		\
> +{								\
> +	.reg_offset = off,					\
> +	.len = length,						\
> +	.access_flags = acc,					\
> +	.iodev_type = IODEV_ITS,				\

... why isn't this at the device level? It doesn't make much sense to
have it at the register level (we never access a register in isolation,
we always access it relatively to a device).

And given that the *only* time you actually evaluate this flag is in
dispatch_mmio_read/write, there is zero benefit in duplicating it all
over the place.

Smaller structures, smaller patch. Am I missing something?

	M.
-- 
Jazz is not dead. It just smells funny...

* [PATCH v8 09/17] KVM: arm64: introduce ITS emulation file with MMIO framework
  2016-07-08 13:34   ` Marc Zyngier
@ 2016-07-08 13:55     ` Marc Zyngier
  2016-07-08 14:04     ` André Przywara
  1 sibling, 0 replies; 49+ messages in thread
From: Marc Zyngier @ 2016-07-08 13:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/07/16 14:34, Marc Zyngier wrote:
> On 05/07/16 12:23, Andre Przywara wrote:
>> The ARM GICv3 ITS emulation code goes into a separate file, but needs
>> to be connected to the GICv3 emulation, of which it is an optional part.
>> The ITS MMIO handlers require the respective ITS pointer to be passed in,
>> so we amend the existing VGIC MMIO framework to let it cope with that.
>> Also we introduce the basic ITS data structure and initialize it, but
>> don't return any success yet, as we are not yet ready for the show.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  arch/arm64/kvm/Makefile          |   1 +
>>  include/kvm/arm_vgic.h           |  14 +++++-
>>  virt/kvm/arm/vgic/vgic-its.c     | 100 +++++++++++++++++++++++++++++++++++++
>>  virt/kvm/arm/vgic/vgic-mmio-v2.c |  40 +++++++--------
>>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 104 ++++++++++++++++++++++++++-------------
>>  virt/kvm/arm/vgic/vgic-mmio.c    |  36 +++++++++++---
>>  virt/kvm/arm/vgic/vgic-mmio.h    |  31 +++++++++---
>>  virt/kvm/arm/vgic/vgic.h         |   7 +++
>>  8 files changed, 266 insertions(+), 67 deletions(-)
>>  create mode 100644 virt/kvm/arm/vgic/vgic-its.c
>>
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index f00b2cd..a5b9664 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -29,5 +29,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v2.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v3.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-kvm-device.o
>> +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index f6f860d..f606641 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -108,15 +108,27 @@ struct vgic_irq {
>>  };
>>  
>>  struct vgic_register_region;
>> +struct vgic_its;
>>  
>>  struct vgic_io_device {
>>  	gpa_t base_addr;
>> -	struct kvm_vcpu *redist_vcpu;
>> +	union {
>> +		struct kvm_vcpu *redist_vcpu;
>> +		struct vgic_its *its;
>> +	};
> 
> The only question that springs to mind is...
> 
>>  	const struct vgic_register_region *regions;
>>  	int nr_regions;
>>  	struct kvm_io_device dev;
>>  };
>>  
>> +struct vgic_its {
>> +	/* The base address of the ITS control register frame */
>> +	gpa_t			vgic_its_base;
>> +
>> +	bool			enabled;
>> +	struct vgic_io_device	iodev;
>> +};
>> +
>>  struct vgic_dist {
>>  	bool			in_kernel;
>>  	bool			ready;
>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>> new file mode 100644
>> index 0000000..ab8d244
>> --- /dev/null
>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>> @@ -0,0 +1,100 @@
>> +/*
>> + * GICv3 ITS emulation
>> + *
>> + * Copyright (C) 2015,2016 ARM Ltd.
>> + * Author: Andre Przywara <andre.przywara@arm.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/cpu.h>
>> +#include <linux/kvm.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/interrupt.h>
>> +
>> +#include <linux/irqchip/arm-gic-v3.h>
>> +
>> +#include <asm/kvm_emulate.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/kvm_mmu.h>
>> +
>> +#include "vgic.h"
>> +#include "vgic-mmio.h"
>> +
>> +#define REGISTER_ITS_DESC(off, rd, wr, length, acc)		\
>> +{								\
>> +	.reg_offset = off,					\
>> +	.len = length,						\
>> +	.access_flags = acc,					\
>> +	.iodev_type = IODEV_ITS,				\
> 
> ... why isn't this at the device level? It doesn't make much sense to
> have it at the register level (we never access a register in isolation,
> we always access it relatively to a device).
> 
> And given that the *only* time you actually evaluate this flag is in
> dispatch_mmio_read/write, there is zero benefit in duplicating it all
> over the place.
> 
> Smaller structures, smaller patch. Am I missing something?

And for the record, here's what I've cooked on top of your patch:

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index c64db0f..95eab74 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -112,12 +112,20 @@ struct vgic_irq {
 struct vgic_register_region;
 struct vgic_its;
 
+enum iodev_type {
+	IODEV_CPUIF,
+	IODEV_DIST,
+	IODEV_REDIST,
+	IODEV_ITS,
+};
+
 struct vgic_io_device {
 	gpa_t base_addr;
 	union {
 		struct kvm_vcpu *redist_vcpu;
 		struct vgic_its *its;
 	};
+	enum iodev_type iodev_type;
 	const struct vgic_register_region *regions;
 	int nr_regions;
 	struct kvm_io_device dev;
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 4459a59..7c7d16b 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -1162,7 +1162,6 @@ static void vgic_mmio_write_its_baser(struct kvm *kvm,
 	.reg_offset = off,					\
 	.len = length,						\
 	.access_flags = acc,					\
-	.iodev_type = IODEV_ITS,				\
 	.its_read = rd,						\
 	.its_write = wr,					\
 }
@@ -1219,6 +1218,7 @@ static int vgic_its_register(struct kvm *kvm, struct vgic_its *its)
 
 	iodev->regions = its_registers;
 	iodev->nr_regions = ARRAY_SIZE(its_registers);
+	iodev->iodev_type = IODEV_ITS;
 	kvm_iodevice_init(&iodev->dev, &kvm_io_gic_ops);
 
 	iodev->base_addr = its->vgic_its_base;
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
index bca5bf7..52af312 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
@@ -360,6 +360,7 @@ unsigned int vgic_v2_init_dist_iodev(struct vgic_io_device *dev)
 {
 	dev->regions = vgic_v2_dist_registers;
 	dev->nr_regions = ARRAY_SIZE(vgic_v2_dist_registers);
+	dev->iodev_type = IODEV_DIST;
 
 	kvm_iodevice_init(&dev->dev, &kvm_io_gic_ops);
 
@@ -437,6 +438,7 @@ int vgic_v2_cpuif_uaccess(struct kvm_vcpu *vcpu, bool is_write,
 	struct vgic_io_device dev = {
 		.regions = vgic_v2_cpu_registers,
 		.nr_regions = ARRAY_SIZE(vgic_v2_cpu_registers),
+		.iodev_type = IODEV_CPUIF,
 	};
 
 	return vgic_uaccess(vcpu, &dev, is_write, offset, val);
@@ -448,6 +450,7 @@ int vgic_v2_dist_uaccess(struct kvm_vcpu *vcpu, bool is_write,
 	struct vgic_io_device dev = {
 		.regions = vgic_v2_dist_registers,
 		.nr_regions = ARRAY_SIZE(vgic_v2_dist_registers),
+		.iodev_type = IODEV_DIST,
 	};
 
 	return vgic_uaccess(vcpu, &dev, is_write, offset, val);
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index c7c7a87..d1d2020 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -346,7 +346,6 @@ static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
 		.bits_per_irq = bpi,					\
 		.len = (bpi * VGIC_NR_PRIVATE_IRQS) / 8,		\
 		.access_flags = acc,					\
-		.iodev_type = type,					\
 		.read = vgic_mmio_read_raz,				\
 		.write = vgic_mmio_write_wi,				\
 	}, {								\
@@ -354,7 +353,6 @@ static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
 		.bits_per_irq = bpi,					\
 		.len = (bpi * (1024 - VGIC_NR_PRIVATE_IRQS)) / 8,	\
 		.access_flags = acc,					\
-		.iodev_type = type,					\
 		.read = rd,						\
 		.write = wr,						\
 	}
@@ -465,6 +463,7 @@ unsigned int vgic_v3_init_dist_iodev(struct vgic_io_device *dev)
 {
 	dev->regions = vgic_v3_dist_registers;
 	dev->nr_regions = ARRAY_SIZE(vgic_v3_dist_registers);
+	dev->iodev_type = IODEV_DIST;
 
 	kvm_iodevice_init(&dev->dev, &kvm_io_gic_ops);
 
@@ -486,6 +485,7 @@ int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t redist_base_address)
 		rd_dev->base_addr = rd_base;
 		rd_dev->regions = vgic_v3_rdbase_registers;
 		rd_dev->nr_regions = ARRAY_SIZE(vgic_v3_rdbase_registers);
+		rd_dev->iodev_type = IODEV_REDIST;
 		rd_dev->redist_vcpu = vcpu;
 
 		mutex_lock(&kvm->slots_lock);
@@ -500,6 +500,7 @@ int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t redist_base_address)
 		sgi_dev->base_addr = sgi_base;
 		sgi_dev->regions = vgic_v3_sgibase_registers;
 		sgi_dev->nr_regions = ARRAY_SIZE(vgic_v3_sgibase_registers);
+		sgi_dev->iodev_type = IODEV_REDIST;
 		sgi_dev->redist_vcpu = vcpu;
 
 		mutex_lock(&kvm->slots_lock);
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index a097c1a..97bf8e7 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -482,7 +482,7 @@ static int dispatch_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
 		return 0;
 	}
 
-	switch (region->iodev_type) {
+	switch (iodev->iodev_type) {
 	case IODEV_CPUIF:
 		return 1;
 	case IODEV_DIST:
@@ -515,7 +515,7 @@ static int dispatch_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
 	if (!check_region(region, addr, len))
 		return 0;
 
-	switch (region->iodev_type) {
+	switch (iodev->iodev_type) {
 	case IODEV_CPUIF:
 		break;
 	case IODEV_DIST:
diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
index 513bb5c..b6950f3 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.h
+++ b/virt/kvm/arm/vgic/vgic-mmio.h
@@ -16,19 +16,11 @@
 #ifndef __KVM_ARM_VGIC_MMIO_H__
 #define __KVM_ARM_VGIC_MMIO_H__
 
-enum iodev_type {
-	IODEV_CPUIF,
-	IODEV_DIST,
-	IODEV_REDIST,
-	IODEV_ITS
-};
-
 struct vgic_register_region {
 	unsigned int reg_offset;
 	unsigned int len;
 	unsigned int bits_per_irq;
 	unsigned int access_flags;
-	enum iodev_type iodev_type;
 	union {
 		unsigned long (*read)(struct kvm_vcpu *vcpu, gpa_t addr,
 				      unsigned int len);
@@ -80,7 +72,6 @@ extern struct kvm_io_device_ops kvm_io_gic_ops;
 		.bits_per_irq = bpi,					\
 		.len = bpi * 1024 / 8,					\
 		.access_flags = acc,					\
-		.iodev_type = type,					\
 		.read = rd,						\
 		.write = wr,						\
 	}
@@ -91,7 +82,6 @@ extern struct kvm_io_device_ops kvm_io_gic_ops;
 		.bits_per_irq = 0,					\
 		.len = length,						\
 		.access_flags = acc,					\
-		.iodev_type = type,					\
 		.read = rd,						\
 		.write = wr,						\
 	}

It works exactly the same way, except that I don't have to type
each and every register. I'll leave you to clean the rest of the
patch! ;-)

	M.
-- 
Jazz is not dead. It just smells funny...

* [PATCH v8 09/17] KVM: arm64: introduce ITS emulation file with MMIO framework
  2016-07-08 13:34   ` Marc Zyngier
  2016-07-08 13:55     ` Marc Zyngier
@ 2016-07-08 14:04     ` André Przywara
  1 sibling, 0 replies; 49+ messages in thread
From: André Przywara @ 2016-07-08 14:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/07/16 14:34, Marc Zyngier wrote:
> On 05/07/16 12:23, Andre Przywara wrote:
>> The ARM GICv3 ITS emulation code goes into a separate file, but needs
>> to be connected to the GICv3 emulation, of which it is an optional part.
>> The ITS MMIO handlers require the respective ITS pointer to be passed in,
>> so we amend the existing VGIC MMIO framework to let it cope with that.
>> Also we introduce the basic ITS data structure and initialize it, but
>> don't return any success yet, as we are not yet ready for the show.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  arch/arm64/kvm/Makefile          |   1 +
>>  include/kvm/arm_vgic.h           |  14 +++++-
>>  virt/kvm/arm/vgic/vgic-its.c     | 100 +++++++++++++++++++++++++++++++++++++
>>  virt/kvm/arm/vgic/vgic-mmio-v2.c |  40 +++++++--------
>>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 104 ++++++++++++++++++++++++++-------------
>>  virt/kvm/arm/vgic/vgic-mmio.c    |  36 +++++++++++---
>>  virt/kvm/arm/vgic/vgic-mmio.h    |  31 +++++++++---
>>  virt/kvm/arm/vgic/vgic.h         |   7 +++
>>  8 files changed, 266 insertions(+), 67 deletions(-)
>>  create mode 100644 virt/kvm/arm/vgic/vgic-its.c
>>
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index f00b2cd..a5b9664 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -29,5 +29,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v2.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v3.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-kvm-device.o
>> +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index f6f860d..f606641 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -108,15 +108,27 @@ struct vgic_irq {
>>  };
>>  
>>  struct vgic_register_region;
>> +struct vgic_its;
>>  
>>  struct vgic_io_device {
>>  	gpa_t base_addr;
>> -	struct kvm_vcpu *redist_vcpu;
>> +	union {
>> +		struct kvm_vcpu *redist_vcpu;
>> +		struct vgic_its *its;
>> +	};
> 
> The only question that springs to mind is...
> 
>>  	const struct vgic_register_region *regions;
>>  	int nr_regions;
>>  	struct kvm_io_device dev;
>>  };
>>  
>> +struct vgic_its {
>> +	/* The base address of the ITS control register frame */
>> +	gpa_t			vgic_its_base;
>> +
>> +	bool			enabled;
>> +	struct vgic_io_device	iodev;
>> +};
>> +
>>  struct vgic_dist {
>>  	bool			in_kernel;
>>  	bool			ready;
>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>> new file mode 100644
>> index 0000000..ab8d244
>> --- /dev/null
>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>> @@ -0,0 +1,100 @@
>> +/*
>> + * GICv3 ITS emulation
>> + *
>> + * Copyright (C) 2015,2016 ARM Ltd.
>> + * Author: Andre Przywara <andre.przywara@arm.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/cpu.h>
>> +#include <linux/kvm.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/interrupt.h>
>> +
>> +#include <linux/irqchip/arm-gic-v3.h>
>> +
>> +#include <asm/kvm_emulate.h>
>> +#include <asm/kvm_arm.h>
>> +#include <asm/kvm_mmu.h>
>> +
>> +#include "vgic.h"
>> +#include "vgic-mmio.h"
>> +
>> +#define REGISTER_ITS_DESC(off, rd, wr, length, acc)		\
>> +{								\
>> +	.reg_offset = off,					\
>> +	.len = length,						\
>> +	.access_flags = acc,					\
>> +	.iodev_type = IODEV_ITS,				\
> 
> ... why isn't this at the device level? It doesn't make much sense to
> have it at the register level (we never access a register in isolation,
> we always access it relatively to a device).
> 
> And given that the *only* time you actually evaluate this flag is in
> dispatch_mmio_read/write, there is zero benefit in duplicating it all
> over the place.
> 
> Smaller structures, smaller patch. Am I missing something?

Looks possible. I think I tried something like this in the beginning and
hit some wall - possibly the one in my head ;-). Also I found it saner
to have the type associated with the declaration instead of adjusting
it in the registration function.
But looking again, it indeed seems better to do it your way (TM).

Cheers,
Andre

* [PATCH v8 11/17] KVM: arm64: implement basic ITS register handlers
  2016-07-05 11:23 ` [PATCH v8 11/17] KVM: arm64: implement basic ITS register handlers Andre Przywara
@ 2016-07-08 14:58   ` Marc Zyngier
  2016-07-11  9:00     ` Andre Przywara
  0 siblings, 1 reply; 49+ messages in thread
From: Marc Zyngier @ 2016-07-08 14:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/07/16 12:23, Andre Przywara wrote:
> Add emulation for some basic MMIO registers used in the ITS emulation.
> This includes:
> - GITS_{CTLR,TYPER,IIDR}
> - ID registers
> - GITS_{CBASER,CREADR,CWRITER}
>   (which implement the ITS command buffer handling)
> - GITS_BASER<n>
> 
> Most of the handlers are pretty straightforward; only the CWRITER
> handler is a bit more involved, taking the new its_cmd mutex and
> then iterating over the command buffer.
> The registers holding base addresses and attributes are sanitised before
> storing them.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  include/kvm/arm_vgic.h           |  16 ++
>  virt/kvm/arm/vgic/vgic-its.c     | 376 +++++++++++++++++++++++++++++++++++++--
>  virt/kvm/arm/vgic/vgic-mmio-v3.c |   8 +-
>  virt/kvm/arm/vgic/vgic-mmio.h    |   6 +
>  virt/kvm/arm/vgic/vgic.c         |  12 +-
>  5 files changed, 401 insertions(+), 17 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index eb82c7d..17d3929 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -22,6 +22,7 @@
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
>  #include <kvm/iodev.h>
> +#include <linux/list.h>
>  
>  #define VGIC_V3_MAX_CPUS	255
>  #define VGIC_V2_MAX_CPUS	8
> @@ -128,6 +129,21 @@ struct vgic_its {
>  	bool			enabled;
>  	bool			initialized;
>  	struct vgic_io_device	iodev;
> +
> +	/* These registers correspond to GITS_BASER{0,1} */
> +	u64			baser_device_table;
> +	u64			baser_coll_table;
> +
> +	/* Protects the command queue */
> +	struct mutex		cmd_lock;
> +	u64			cbaser;
> +	u32			creadr;
> +	u32			cwriter;
> +
> +	/* Protects the device and collection lists */
> +	struct mutex		its_lock;
> +	struct list_head	device_list;
> +	struct list_head	collection_list;
>  };
>  
>  struct vgic_dist {
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index d49bdad..a9336a4 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -21,6 +21,7 @@
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  #include <linux/interrupt.h>
> +#include <linux/list.h>
>  #include <linux/uaccess.h>
>  
>  #include <linux/irqchip/arm-gic-v3.h>
> @@ -32,6 +33,307 @@
>  #include "vgic.h"
>  #include "vgic-mmio.h"
>  
> +struct its_device {
> +	struct list_head dev_list;
> +
> +	/* the head for the list of ITTEs */
> +	struct list_head itt_head;
> +	u32 device_id;
> +};
> +
> +#define COLLECTION_NOT_MAPPED ((u32)~0)
> +
> +struct its_collection {
> +	struct list_head coll_list;
> +
> +	u32 collection_id;
> +	u32 target_addr;
> +};
> +
> +#define its_is_collection_mapped(coll) ((coll) && \
> +				((coll)->target_addr != COLLECTION_NOT_MAPPED))
> +
> +struct its_itte {
> +	struct list_head itte_list;
> +
> +	struct its_collection *collection;
> +	u32 lpi;
> +	u32 event_id;
> +};
> +
> +#define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
> +
> +static unsigned long vgic_mmio_read_its_ctlr(struct kvm *vcpu,
> +					     struct vgic_its *its,
> +					     gpa_t addr, unsigned int len)
> +{
> +	u32 reg = 0;
> +
> +	mutex_lock(&its->cmd_lock);
> +	if (its->creadr == its->cwriter)
> +		reg |= GITS_CTLR_QUIESCENT;
> +	if (its->enabled)
> +		reg |= GITS_CTLR_ENABLE;
> +	mutex_unlock(&its->cmd_lock);
> +
> +	return reg;
> +}
> +
> +static void vgic_mmio_write_its_ctlr(struct kvm *kvm, struct vgic_its *its,
> +				     gpa_t addr, unsigned int len,
> +				     unsigned long val)
> +{
> +	its->enabled = !!(val & GITS_CTLR_ENABLE);
> +}
> +
> +static unsigned long vgic_mmio_read_its_typer(struct kvm *kvm,
> +					      struct vgic_its *its,
> +					      gpa_t addr, unsigned int len)
> +{
> +	u64 reg = GITS_TYPER_PLPIS;
> +
> +	/*
> +	 * We use linear CPU numbers for redistributor addressing,
> +	 * so GITS_TYPER.PTA is 0.
> +	 * Also we force all PROPBASER registers to be the same, so
> +	 * CommonLPIAff is 0 as well.
> +	 * To avoid memory waste in the guest, we keep the number of IDBits and
> + * DevBits low - at least for the time being.
> +	 */
> +	reg |= 0x0f << GITS_TYPER_DEVBITS_SHIFT;
> +	reg |= 0x0f << GITS_TYPER_IDBITS_SHIFT;
> +
> +	return extract_bytes(reg, addr & 7, len);
> +}
> +
> +static unsigned long vgic_mmio_read_its_iidr(struct kvm *kvm,
> +					     struct vgic_its *its,
> +					     gpa_t addr, unsigned int len)
> +{
> +	return (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
> +}
> +
> +static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
> +					       struct vgic_its *its,
> +					       gpa_t addr, unsigned int len)
> +{
> +	switch (addr & 0xffff) {
> +	case GITS_PIDR0:
> +		return 0x92;	/* part number, bits[7:0] */
> +	case GITS_PIDR1:
> +		return 0xb4;	/* part number, bits[11:8] */
> +	case GITS_PIDR2:
> +		return GIC_PIDR2_ARCH_GICv3 | 0x0b;
> +	case GITS_PIDR4:
> +		return 0x40;	/* This is a 64K software visible page */
> +	/* The following are the ID registers for (any) GIC. */
> +	case GITS_CIDR0:
> +		return 0x0d;
> +	case GITS_CIDR1:
> +		return 0xf0;
> +	case GITS_CIDR2:
> +		return 0x05;
> +	case GITS_CIDR3:
> +		return 0xb1;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Requires the its_lock to be held. */
> +static void its_free_itte(struct kvm *kvm, struct its_itte *itte)
> +{
> +	list_del(&itte->itte_list);
> +	kfree(itte);
> +}
> +
> +static int vits_handle_command(struct kvm *kvm, struct vgic_its *its,
> +			       u64 *its_cmd)
> +{
> +	return -ENODEV;
> +}
> +
> +static u64 vgic_sanitise_its_baser(u64 reg)
> +{
> +	reg = vgic_sanitise_field(reg, GITS_BASER_SHAREABILITY_SHIFT,
> +				  GIC_BASER_SHAREABILITY_MASK,
> +				  vgic_sanitise_shareability);
> +	reg = vgic_sanitise_field(reg, GITS_BASER_INNER_CACHEABILITY_SHIFT,
> +				  GIC_BASER_CACHE_MASK,
> +				  vgic_sanitise_inner_cacheability);
> +	reg = vgic_sanitise_field(reg, GITS_BASER_OUTER_CACHEABILITY_SHIFT,
> +				  GIC_BASER_CACHE_MASK,
> +				  vgic_sanitise_outer_cacheability);
> +	return reg;
> +}
> +
> +static u64 vgic_sanitise_its_cbaser(u64 reg)
> +{
> +	reg = vgic_sanitise_field(reg, GITS_CBASER_SHAREABILITY_SHIFT,
> +				  GIC_BASER_SHAREABILITY_MASK,

-ECOPYPASTE : GITS_CBASER_SHAREABILITY_MASK

> +				  vgic_sanitise_shareability);
> +	reg = vgic_sanitise_field(reg, GITS_CBASER_INNER_CACHEABILITY_SHIFT,
> +				  GIC_BASER_CACHE_MASK,

Same here?

> +				  vgic_sanitise_inner_cacheability);
> +	reg = vgic_sanitise_field(reg, GITS_CBASER_OUTER_CACHEABILITY_SHIFT,
> +				  GIC_BASER_CACHE_MASK,

And here?

> +				  vgic_sanitise_outer_cacheability);
> +	return reg;
> +}
> +
> +static unsigned long vgic_mmio_read_its_cbaser(struct kvm *kvm,
> +					       struct vgic_its *its,
> +					       gpa_t addr, unsigned int len)
> +{
> +	return extract_bytes(its->cbaser, addr & 7, len);
> +}
> +
> +static void vgic_mmio_write_its_cbaser(struct kvm *kvm, struct vgic_its *its,
> +				       gpa_t addr, unsigned int len,
> +				       unsigned long val)
> +{
> +	/* When GITS_CTLR.Enable is 1, this register is RO. */
> +	if (its->enabled)
> +		return;
> +
> +	mutex_lock(&its->cmd_lock);
> +	its->cbaser = update_64bit_reg(its->cbaser, addr & 7, len, val);
> +	/* Sanitise the physical address to be 64k aligned. */
> +	its->cbaser &= ~GENMASK_ULL(15, 12);

So you're not supporting 52-bit addresses, as you're forcing the low
address bits to zero.

> +	its->cbaser = vgic_sanitise_its_cbaser(its->cbaser);
> +	its->creadr = 0;
> +	/*
> +	 * CWRITER is architecturally UNKNOWN on reset, but we need to reset
> +	 * it to CREADR to make sure we start with an empty command buffer.
> +	 */
> +	its->cwriter = its->creadr;
> +	mutex_unlock(&its->cmd_lock);
> +}
> +
> +#define ITS_CMD_BUFFER_SIZE(baser)	((((baser) & 0xff) + 1) << 12)
> +#define ITS_CMD_SIZE			32
> +
> +/*
> + * By writing to CWRITER the guest announces new commands to be processed.
> + * To avoid any races in the first place, we take the its_cmd lock, which
> + * protects our ring buffer variables, so that there is only one user
> + * per ITS handling commands at a given time.
> + */
> +static void vgic_mmio_write_its_cwriter(struct kvm *kvm, struct vgic_its *its,
> +					gpa_t addr, unsigned int len,
> +					unsigned long val)
> +{
> +	gpa_t cbaser;
> +	u64 cmd_buf[4];
> +	u32 reg;
> +
> +	if (!its)
> +		return;
> +
> +	cbaser = CBASER_ADDRESS(its->cbaser);
> +
> +	reg = update_64bit_reg(its->cwriter & 0xfffe0, addr & 7, len, val);
> +	reg &= 0xfffe0;
> +	if (reg > ITS_CMD_BUFFER_SIZE(its->cbaser))
> +		return;
> +
> +	mutex_lock(&its->cmd_lock);
> +
> +	its->cwriter = reg;
> +
> +	while (its->cwriter != its->creadr) {
> +		int ret = kvm_read_guest(kvm, cbaser + its->creadr,
> +					 cmd_buf, ITS_CMD_SIZE);
> +		/*
> +		 * If kvm_read_guest() fails, this could be due to the guest
> +		 * programming a bogus value in CBASER or something else going
> +		 * wrong from which we cannot easily recover.
> +		 * We just ignore that command then.
> +		 */
> +		if (!ret)
> +			vits_handle_command(kvm, its, cmd_buf);
> +
> +		its->creadr += ITS_CMD_SIZE;
> +		if (its->creadr == ITS_CMD_BUFFER_SIZE(its->cbaser))
> +			its->creadr = 0;
> +	}
> +
> +	mutex_unlock(&its->cmd_lock);
> +}
> +
> +static unsigned long vgic_mmio_read_its_cwriter(struct kvm *kvm,
> +						struct vgic_its *its,
> +						gpa_t addr, unsigned int len)
> +{
> +	return extract_bytes(its->cwriter & 0xfffe0, addr & 0x7, len);
> +}
> +
> +static unsigned long vgic_mmio_read_its_creadr(struct kvm *kvm,
> +					       struct vgic_its *its,
> +					       gpa_t addr, unsigned int len)
> +{
> +	return extract_bytes(its->creadr & 0xfffe0, addr & 0x7, len);
> +}
> +
> +#define BASER_INDEX(addr) (((addr) / sizeof(u64)) & 0x7)
> +static unsigned long vgic_mmio_read_its_baser(struct kvm *kvm,
> +					      struct vgic_its *its,
> +					      gpa_t addr, unsigned int len)
> +{
> +	u64 reg;
> +
> +	switch (BASER_INDEX(addr)) {
> +	case 0:
> +		reg = its->baser_device_table;
> +		break;
> +	case 1:
> +		reg = its->baser_coll_table;
> +		break;
> +	default:
> +		reg = 0;
> +		break;
> +	}
> +
> +	return extract_bytes(reg, addr & 7, len);
> +}
> +
> +#define GITS_BASER_RO_MASK	(GENMASK_ULL(52, 48) | GENMASK_ULL(58, 56))
> +static void vgic_mmio_write_its_baser(struct kvm *kvm,
> +				      struct vgic_its *its,
> +				      gpa_t addr, unsigned int len,
> +				      unsigned long val)
> +{
> +	u64 reg, *regptr;
> +	u64 entry_size, device_type;
> +
> +	/* When GITS_CTLR.Enable is 1, we ignore write accesses. */
> +	if (its->enabled)
> +		return;
> +
> +	switch (BASER_INDEX(addr)) {
> +	case 0:
> +		regptr = &its->baser_device_table;
> +		entry_size = 8;
> +		device_type = GITS_BASER_TYPE_DEVICE;
> +		break;
> +	case 1:
> +		regptr = &its->baser_coll_table;
> +		entry_size = 8;
> +		device_type = GITS_BASER_TYPE_COLLECTION;
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	reg = update_64bit_reg(*regptr, addr & 7, len, val);
> +	reg &= ~GITS_BASER_RO_MASK;
> +	reg |= (entry_size - 1) << GITS_BASER_ENTRY_SIZE_SHIFT;
> +	reg |= device_type << GITS_BASER_TYPE_SHIFT;
> +	reg = vgic_sanitise_its_baser(reg);

So you claim to support indirect tables on collections too? That's
pretty odd. I'd be happier if you filtered that out on collections.

Also, you're supporting any page size (which is fine on its own), but
also not doing anything regarding the width of the address (52 bits are
only valid for 64kB ITS pages). This is completely inconsistent with
what you're doing with GITS_CBASER.

I'd suggest you reduce the scope to a single supported page size (64kB),
and decide whether you want to support 52bit PAs or not. Either way
would be valid, but it has to be consistent across the board.

It may not be of great importance right now, but it is going to be
really critical for save/restore, and we'd better get it right from the
beginning.

> +
> +	*regptr = reg;
> +}
> +
>  #define REGISTER_ITS_DESC(off, rd, wr, length, acc)		\
>  {								\
>  	.reg_offset = off,					\
> @@ -42,8 +344,8 @@
>  	.its_write = wr,					\
>  }
>  
> -static unsigned long its_mmio_read_raz(struct kvm *kvm, struct vgic_its *its,
> -				       gpa_t addr, unsigned int len)
> +unsigned long its_mmio_read_raz(struct kvm *kvm, struct vgic_its *its,
> +				gpa_t addr, unsigned int len)
>  {
>  	return 0;
>  }
> @@ -56,28 +358,28 @@ static void its_mmio_write_wi(struct kvm *kvm, struct vgic_its *its,
>  
>  static struct vgic_register_region its_registers[] = {
>  	REGISTER_ITS_DESC(GITS_CTLR,
> -		its_mmio_read_raz, its_mmio_write_wi, 4,
> +		vgic_mmio_read_its_ctlr, vgic_mmio_write_its_ctlr, 4,
>  		VGIC_ACCESS_32bit),
>  	REGISTER_ITS_DESC(GITS_IIDR,
> -		its_mmio_read_raz, its_mmio_write_wi, 4,
> +		vgic_mmio_read_its_iidr, its_mmio_write_wi, 4,
>  		VGIC_ACCESS_32bit),
>  	REGISTER_ITS_DESC(GITS_TYPER,
> -		its_mmio_read_raz, its_mmio_write_wi, 8,
> +		vgic_mmio_read_its_typer, its_mmio_write_wi, 8,
>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>  	REGISTER_ITS_DESC(GITS_CBASER,
> -		its_mmio_read_raz, its_mmio_write_wi, 8,
> +		vgic_mmio_read_its_cbaser, vgic_mmio_write_its_cbaser, 8,
>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>  	REGISTER_ITS_DESC(GITS_CWRITER,
> -		its_mmio_read_raz, its_mmio_write_wi, 8,
> +		vgic_mmio_read_its_cwriter, vgic_mmio_write_its_cwriter, 8,
>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>  	REGISTER_ITS_DESC(GITS_CREADR,
> -		its_mmio_read_raz, its_mmio_write_wi, 8,
> +		vgic_mmio_read_its_creadr, its_mmio_write_wi, 8,
>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>  	REGISTER_ITS_DESC(GITS_BASER,
> -		its_mmio_read_raz, its_mmio_write_wi, 0x40,
> +		vgic_mmio_read_its_baser, vgic_mmio_write_its_baser, 0x40,
>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>  	REGISTER_ITS_DESC(GITS_IDREGS_BASE,
> -		its_mmio_read_raz, its_mmio_write_wi, 0x30,
> +		vgic_mmio_read_its_idregs, its_mmio_write_wi, 0x30,
>  		VGIC_ACCESS_32bit),
>  };
>  
> @@ -100,6 +402,18 @@ static int vgic_its_register(struct kvm *kvm, struct vgic_its *its)
>  	return ret;
>  }
>  
> +#define INITIAL_BASER_VALUE						  \
> +	(GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWb)		| \
> +	 GIC_BASER_CACHEABILITY(GITS_BASER, OUTER, SameAsInner)		| \
> +	 GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)		| \
> +	 ((8ULL - 1) << GITS_BASER_ENTRY_SIZE_SHIFT)			| \
> +	 GITS_BASER_PAGE_SIZE_64K)
> +
> +#define INITIAL_PROPBASER_VALUE						  \
> +	(GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWb)		| \
> +	 GIC_BASER_CACHEABILITY(GICR_PROPBASER, OUTER, SameAsInner)	| \
> +	 GIC_BASER_SHAREABILITY(GICR_PROPBASER, InnerShareable))
> +
>  static int vgic_its_create(struct kvm_device *dev, u32 type)
>  {
>  	struct vgic_its *its;
> @@ -111,12 +425,25 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
>  	if (!its)
>  		return -ENOMEM;
>  
> +	mutex_init(&its->its_lock);
> +	mutex_init(&its->cmd_lock);
> +
>  	its->vgic_its_base = VGIC_ADDR_UNDEF;
>  
> +	INIT_LIST_HEAD(&its->device_list);
> +	INIT_LIST_HEAD(&its->collection_list);
> +
>  	dev->kvm->arch.vgic.has_its = true;
>  	its->initialized = false;
>  	its->enabled = false;
>  
> +	its->baser_device_table = INITIAL_BASER_VALUE			|
> +		((u64)GITS_BASER_TYPE_DEVICE << GITS_BASER_TYPE_SHIFT)	|
> +		GITS_BASER_INDIRECT;

It is a bit odd to advertise the indirect flag as a reset value, but I
don't see anything that indicates it is not allowed...

> +	its->baser_coll_table = INITIAL_BASER_VALUE |
> +		((u64)GITS_BASER_TYPE_COLLECTION << GITS_BASER_TYPE_SHIFT);
> +	dev->kvm->arch.vgic.propbaser = INITIAL_PROPBASER_VALUE;
> +
>  	dev->private = its;
>  
>  	return 0;
> @@ -124,7 +451,36 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
>  
>  static void vgic_its_destroy(struct kvm_device *kvm_dev)
>  {
> +	struct kvm *kvm = kvm_dev->kvm;
>  	struct vgic_its *its = kvm_dev->private;
> +	struct its_device *dev;
> +	struct its_itte *itte;
> +	struct list_head *dev_cur, *dev_temp;
> +	struct list_head *cur, *temp;
> +
> +	/*
> +	 * We may end up here without the lists ever having been initialized.
> +	 * Check this and bail out early to avoid dereferencing a NULL pointer.
> +	 */
> +	if (!its->device_list.next)
> +		return;
> +
> +	mutex_lock(&its->its_lock);
> +	list_for_each_safe(dev_cur, dev_temp, &its->device_list) {
> +		dev = container_of(dev_cur, struct its_device, dev_list);
> +		list_for_each_safe(cur, temp, &dev->itt_head) {
> +			itte = (container_of(cur, struct its_itte, itte_list));
> +			its_free_itte(kvm, itte);
> +		}
> +		list_del(dev_cur);
> +		kfree(dev);
> +	}
> +
> +	list_for_each_safe(cur, temp, &its->collection_list) {
> +		list_del(cur);
> +		kfree(container_of(cur, struct its_collection, coll_list));
> +	}
> +	mutex_unlock(&its->its_lock);
>  
>  	kfree(its);
>  }
> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> index 062ff95..370e89e 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> @@ -23,15 +23,15 @@
>  #include "vgic-mmio.h"
>  
>  /* extract @num bytes at @offset bytes offset in data */
> -static unsigned long extract_bytes(unsigned long data, unsigned int offset,
> -				   unsigned int num)
> +unsigned long extract_bytes(unsigned long data, unsigned int offset,
> +			    unsigned int num)
>  {
>  	return (data >> (offset * 8)) & GENMASK_ULL(num * 8 - 1, 0);
>  }
>  
>  /* allows updates of any half of a 64-bit register (or the whole thing) */
> -static u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
> -			    unsigned long val)
> +u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
> +		     unsigned long val)
>  {
>  	int lower = (offset & 4) * 8;
>  	int upper = lower + 8 * len - 1;
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
> index 23e97a7..513bb5c 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.h
> +++ b/virt/kvm/arm/vgic/vgic-mmio.h
> @@ -106,6 +106,12 @@ unsigned long vgic_data_mmio_bus_to_host(const void *val, unsigned int len);
>  void vgic_data_host_to_mmio_bus(void *buf, unsigned int len,
>  				unsigned long data);
>  
> +unsigned long extract_bytes(unsigned long data, unsigned int offset,
> +			    unsigned int num);
> +
> +u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
> +		     unsigned long val);
> +
>  unsigned long vgic_mmio_read_raz(struct kvm_vcpu *vcpu,
>  				 gpa_t addr, unsigned int len);
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index ae80894..a5d9a10 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -33,10 +33,16 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
>  
>  /*
>   * Locking order is always:
> - *   vgic_cpu->ap_list_lock
> - *     vgic_irq->irq_lock
> + * its->cmd_lock (mutex)
> + *   its->its_lock (mutex)
> + *     vgic_cpu->ap_list_lock
> + *       vgic_irq->irq_lock
>   *
> - * (that is, always take the ap_list_lock before the struct vgic_irq lock).
> + * If you need to take multiple locks, always take the upper lock first,
> + * then the lower ones, e.g. first take the its_lock, then the irq_lock.
> + * If you are already holding a lock and need to take a higher one, you
> + * have to drop the lower ranking lock first and re-acquire it after having
> + * taken the upper one.
>   *
>   * When taking more than one ap_list_lock at the same time, always take the
>   * lowest numbered VCPU's ap_list_lock first, so:
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* [PATCH v8 08/17] KVM: arm64: handle ITS related GICv3 redistributor registers
  2016-07-05 11:23 ` [PATCH v8 08/17] KVM: arm64: handle ITS related GICv3 redistributor registers Andre Przywara
@ 2016-07-08 15:40   ` Christoffer Dall
  2016-07-11  7:45     ` André Przywara
  0 siblings, 1 reply; 49+ messages in thread
From: Christoffer Dall @ 2016-07-08 15:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 05, 2016 at 12:23:00PM +0100, Andre Przywara wrote:
> In the GICv3 redistributor there are the PENDBASER and PROPBASER
> registers which we did not emulate so far, as they only make sense
> when having an ITS. In preparation for that emulate those MMIO
> accesses by storing the 64-bit data written into it into a variable
> which we later read in the ITS emulation.
> We also sanitise the registers, making sure RES0 regions are respected
> and checking for valid memory attributes.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  include/kvm/arm_vgic.h           |  13 ++++
>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 143 ++++++++++++++++++++++++++++++++++++++-
>  virt/kvm/arm/vgic/vgic-mmio.h    |   8 +++
>  virt/kvm/arm/vgic/vgic-v3.c      |  11 ++-
>  4 files changed, 171 insertions(+), 4 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 450b4da..f6f860d 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -146,6 +146,14 @@ struct vgic_dist {
>  	struct vgic_irq		*spis;
>  
>  	struct vgic_io_device	dist_iodev;
> +
> +	/*
> +	 * Contains the address of the LPI configuration table.

Is this field the address or the actual format of the GICR_PROPBASER ?

If the former, the type should potentially be gpa_t, if the latter, you
shouldn't say that this is just the address :)

> +	 * Since we report GICR_TYPER.CommonLPIAff as 0b00, we can share
> +	 * one address across all redistributors.
> +	 * GICv3 spec: 6.1.2 "LPI Configuration tables"
> +	 */
> +	u64			propbaser;
>  };
>  
>  struct vgic_v2_cpu_if {
> @@ -200,6 +208,11 @@ struct vgic_cpu {
>  	 */
>  	struct vgic_io_device	rd_iodev;
>  	struct vgic_io_device	sgi_iodev;
> +
> +	/* Points to the LPI pending tables for the redistributor */
> +	u64 pendbaser;

ditto

> +
> +	bool lpis_enabled;
>  };
>  
>  int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> index bfcafbd..9dd8632 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> @@ -29,6 +29,19 @@ static unsigned long extract_bytes(unsigned long data, unsigned int offset,
>  	return (data >> (offset * 8)) & GENMASK_ULL(num * 8 - 1, 0);
>  }
>  
> +/* allows updates of any half of a 64-bit register (or the whole thing) */

when I see this function I have to read the bit fiddling to understand
the parameters and semantics.

actually, I'm not sure I read this correctly, but could you mean:

  /*
   * Update @len bytes at @offset bytes offset in @reg from the least
   * significant bytes in @val?
   */

Also, I don't get why this would be limited to either halves or the whole
of a 64-bit register, but maybe I'm reading the code wrong.

Didn't we have a reviewed function for the vgic that was called
update_bytes or insert_bytes before that we could use, or did we never
actually review that?


> +static u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
> +			    unsigned long val)
> +{
> +	int lower = (offset & 4) * 8;
> +	int upper = lower + 8 * len - 1;
> +
> +	reg &= ~GENMASK_ULL(upper, lower);
> +	val &= GENMASK_ULL(len * 8 - 1, 0);
> +
> +	return reg | ((u64)val << lower);

this casting is weird.  Why is val not either a u64 or a u32?  Or the
whole lot could be unsigned long/unsigned int like extract_bytes.

> +}
> +
>  static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
>  					    gpa_t addr, unsigned int len)
>  {
> @@ -152,6 +165,132 @@ static unsigned long vgic_mmio_read_v3_idregs(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>  
> +/* We want to avoid outer shareable. */
> +u64 vgic_sanitise_shareability(u64 reg)
> +{
> +	switch (reg & GIC_BASER_SHAREABILITY_MASK) {
> +	case GIC_BASER_OuterShareable:
> +		return GIC_BASER_InnerShareable;
> +	default:
> +		return reg;
> +	}
> +}
> +
> +/* Non-cacheable or same-as-inner are OK. */
> +u64 vgic_sanitise_outer_cacheability(u64 reg)
> +{
> +	switch (reg & GIC_BASER_CACHE_MASK) {
> +	case GIC_BASER_CACHE_SameAsInner:
> +	case GIC_BASER_CACHE_nC:
> +		return reg;
> +	default:
> +		return GIC_BASER_CACHE_nC;
> +	}
> +}
> +
> +/* Avoid any inner non-cacheable mapping. */
> +u64 vgic_sanitise_inner_cacheability(u64 reg)
> +{
> +	switch (reg & GIC_BASER_CACHE_MASK) {
> +	case GIC_BASER_CACHE_nCnB:
> +	case GIC_BASER_CACHE_nC:
> +		return GIC_BASER_CACHE_RaWb;
> +	default:
> +		return reg;
> +	}
> +}

nit: My OCD sets in here because the functions above are not ordered
similarly to the calls below :)

also, why do you need to apply the mask in the sanitize functions when
you've just applied it in the callers?  It would make more sense to me
if you didn't and named the parameters @field instead of @reg in the
sanitize functions.

> +
> +u64 vgic_sanitise_field(u64 reg, int field_shift, u64 field_mask,
> +			u64 (*sanitise_fn)(u64))
> +{
> +	u64 field = (reg >> field_shift) & field_mask;
> +
> +	field = sanitise_fn(field) << field_shift;
> +	return (reg & ~(field_mask << field_shift)) | field;
> +}
> +
> +static u64 vgic_sanitise_pendbaser(u64 reg)
> +{
> +	reg = vgic_sanitise_field(reg, GICR_PENDBASER_SHAREABILITY_SHIFT,
> +				  GIC_BASER_SHAREABILITY_MASK,
> +				  vgic_sanitise_shareability);
> +	reg = vgic_sanitise_field(reg, GICR_PENDBASER_INNER_CACHEABILITY_SHIFT,
> +				  GIC_BASER_CACHE_MASK,
> +				  vgic_sanitise_inner_cacheability);
> +	reg = vgic_sanitise_field(reg, GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT,
> +				  GIC_BASER_CACHE_MASK,
> +				  vgic_sanitise_outer_cacheability);
> +	return reg;
> +}
> +
> +static u64 vgic_sanitise_propbaser(u64 reg)
> +{
> +	reg = vgic_sanitise_field(reg, GICR_PROPBASER_SHAREABILITY_SHIFT,
> +				  GIC_BASER_SHAREABILITY_MASK,
> +				  vgic_sanitise_shareability);
> +	reg = vgic_sanitise_field(reg, GICR_PROPBASER_INNER_CACHEABILITY_SHIFT,
> +				  GIC_BASER_CACHE_MASK,
> +				  vgic_sanitise_inner_cacheability);
> +	reg = vgic_sanitise_field(reg, GICR_PROPBASER_OUTER_CACHEABILITY_SHIFT,
> +				  GIC_BASER_CACHE_MASK,
> +				  vgic_sanitise_outer_cacheability);
> +	return reg;
> +}

assuming the defines themselves are correct (I didn't check) this looks
good otherwise.

> +
> +#define PROPBASER_RES0_MASK						\
> +	(GENMASK_ULL(63, 59) | GENMASK_ULL(55, 52) | GENMASK_ULL(6, 5))

Why is the middle one for bits 55:52?  Is it not 55:48?  Perhaps larger
physical addresses compared to the revision of the GICv3 spec I have at
hand?

> +#define PENDBASER_RES0_MASK						\
> +	(BIT_ULL(63) | GENMASK_ULL(61, 59) | GENMASK_ULL(55, 52) |	\
> +	 GENMASK_ULL(15, 12) | GENMASK_ULL(6, 0))
> +
> +static unsigned long vgic_mmio_read_propbase(struct kvm_vcpu *vcpu,
> +					     gpa_t addr, unsigned int len)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +
> +	return extract_bytes(dist->propbaser, addr & 7, len);
> +}
> +
> +static void vgic_mmio_write_propbase(struct kvm_vcpu *vcpu,
> +				     gpa_t addr, unsigned int len,
> +				     unsigned long val)
> +{
> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +
> +	/* Storing a value with LPIs already enabled is undefined */
> +	if (vgic_cpu->lpis_enabled)
> +		return;
> +
> +	dist->propbaser = update_64bit_reg(dist->propbaser, addr & 4, len, val);
> +	dist->propbaser &= ~PROPBASER_RES0_MASK;
> +	dist->propbaser = vgic_sanitise_propbaser(dist->propbaser);

what prevents multiple writers messing with dist->propbaser or the
lips_enabled being set in the middle of us fiddling with dist_propbaser?

> +}
> +
> +static unsigned long vgic_mmio_read_pendbase(struct kvm_vcpu *vcpu,
> +					     gpa_t addr, unsigned int len)
> +{
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +
> +	return extract_bytes(vgic_cpu->pendbaser, addr & 7, len);
> +}
> +
> +static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
> +				     gpa_t addr, unsigned int len,
> +				     unsigned long val)
> +{
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +
> +	/* Storing a value with LPIs already enabled is undefined */
> +	if (vgic_cpu->lpis_enabled)
> +		return;
> +
> +	vgic_cpu->pendbaser = update_64bit_reg(vgic_cpu->pendbaser,
> +					       addr & 4, len, val);
> +	vgic_cpu->pendbaser &= ~PENDBASER_RES0_MASK;
> +	vgic_cpu->pendbaser = vgic_sanitise_pendbaser(vgic_cpu->pendbaser);
> +}
> +
>  /*
>   * The GICv3 per-IRQ registers are split to control PPIs and SGIs in the
>   * redistributors, while SPIs are covered by registers in the distributor
> @@ -232,10 +371,10 @@ static const struct vgic_register_region vgic_v3_rdbase_registers[] = {
>  		vgic_mmio_read_v3r_typer, vgic_mmio_write_wi, 8,
>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>  	REGISTER_DESC_WITH_LENGTH(GICR_PROPBASER,
> -		vgic_mmio_read_raz, vgic_mmio_write_wi, 8,
> +		vgic_mmio_read_propbase, vgic_mmio_write_propbase, 8,
>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>  	REGISTER_DESC_WITH_LENGTH(GICR_PENDBASER,
> -		vgic_mmio_read_raz, vgic_mmio_write_wi, 8,
> +		vgic_mmio_read_pendbase, vgic_mmio_write_pendbase, 8,
>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>  	REGISTER_DESC_WITH_LENGTH(GICR_IDREGS,
>  		vgic_mmio_read_v3_idregs, vgic_mmio_write_wi, 48,
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
> index 8509014..e863ccc 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.h
> +++ b/virt/kvm/arm/vgic/vgic-mmio.h
> @@ -147,4 +147,12 @@ unsigned int vgic_v2_init_dist_iodev(struct vgic_io_device *dev);
>  
>  unsigned int vgic_v3_init_dist_iodev(struct vgic_io_device *dev);
>  
> +#ifdef CONFIG_KVM_ARM_VGIC_V3
> +u64 vgic_sanitise_outer_cacheability(u64 reg);
> +u64 vgic_sanitise_inner_cacheability(u64 reg);
> +u64 vgic_sanitise_shareability(u64 reg);
> +u64 vgic_sanitise_field(u64 reg, int field_shift, u64 field_mask,
> +			u64 (*sanitise_fn)(u64));
> +#endif
> +
>  #endif
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index f0ac064..6f8f31f 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -191,6 +191,11 @@ void vgic_v3_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcrp)
>  	vmcrp->pmr  = (vmcr & ICH_VMCR_PMR_MASK) >> ICH_VMCR_PMR_SHIFT;
>  }
>  
> +#define INITIAL_PENDBASER_VALUE						  \
> +	(GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWb)		| \
> +	GIC_BASER_CACHEABILITY(GICR_PENDBASER, OUTER, SameAsInner)	| \
> +	GIC_BASER_SHAREABILITY(GICR_PENDBASER, InnerShareable))
> +
>  void vgic_v3_enable(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_v3_cpu_if *vgic_v3 = &vcpu->arch.vgic_cpu.vgic_v3;
> @@ -208,10 +213,12 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
>  	 * way, so we force SRE to 1 to demonstrate this to the guest.
>  	 * This goes with the spec allowing the value to be RAO/WI.
>  	 */
> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
>  		vgic_v3->vgic_sre = ICC_SRE_EL1_SRE;
> -	else
> +		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;

why is pendbaser initialized, but not propbaser?  Do I have an outdated
spec?  Both seem to not have any predefined reset value.

> +	} else {
>  		vgic_v3->vgic_sre = 0;
> +	}
>  
>  	/* Get the show on the road... */
>  	vgic_v3->vgic_hcr = ICH_HCR_EN;
> -- 
> 2.9.0
> 

Thanks,
-Christoffer


* [PATCH v8 08/17] KVM: arm64: handle ITS related GICv3 redistributor registers
  2016-07-08 15:40   ` Christoffer Dall
@ 2016-07-11  7:45     ` André Przywara
  0 siblings, 0 replies; 49+ messages in thread
From: André Przywara @ 2016-07-11  7:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/07/16 16:40, Christoffer Dall wrote:

Hi Christoffer,

thanks very much for taking a look!

> On Tue, Jul 05, 2016 at 12:23:00PM +0100, Andre Przywara wrote:
>> In the GICv3 redistributor there are the PENDBASER and PROPBASER
>> registers which we did not emulate so far, as they only make sense
>> when having an ITS. In preparation for that emulate those MMIO
>> accesses by storing the 64-bit data written into it into a variable
>> which we later read in the ITS emulation.
>> We also sanitise the registers, making sure RES0 regions are respected
>> and checking for valid memory attributes.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  include/kvm/arm_vgic.h           |  13 ++++
>>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 143 ++++++++++++++++++++++++++++++++++++++-
>>  virt/kvm/arm/vgic/vgic-mmio.h    |   8 +++
>>  virt/kvm/arm/vgic/vgic-v3.c      |  11 ++-
>>  4 files changed, 171 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 450b4da..f6f860d 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -146,6 +146,14 @@ struct vgic_dist {
>>  	struct vgic_irq		*spis;
>>  
>>  	struct vgic_io_device	dist_iodev;
>> +
>> +	/*
>> +	 * Contains the address of the LPI configuration table.
> 
> Is this field the address or the actual format of the GICR_PROPBASER ?
> 
> If the former, the type should potentially be gpa_t, if the latter, you
> shouldn't say that this is just the address :)

I think we had this discussion before, I am happy to use a wording that
makes everyone happy ;-)
I already changed it to "_Contains_ the address" to note that it's the
whole register, but that holding the _address_ is the actual purpose of
PROPBASER. A less ambiguous comment is warmly welcome ...

>> +	 * Since we report GICR_TYPER.CommonLPIAff as 0b00, we can share
>> +	 * one address across all redistributors.
>> +	 * GICv3 spec: 6.1.2 "LPI Configuration tables"
>> +	 */
>> +	u64			propbaser;
>>  };
>>  
>>  struct vgic_v2_cpu_if {
>> @@ -200,6 +208,11 @@ struct vgic_cpu {
>>  	 */
>>  	struct vgic_io_device	rd_iodev;
>>  	struct vgic_io_device	sgi_iodev;
>> +
>> +	/* Points to the LPI pending tables for the redistributor */
>> +	u64 pendbaser;
> 
> ditto
> 
>> +
>> +	bool lpis_enabled;
>>  };
>>  
>>  int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>> index bfcafbd..9dd8632 100644
>> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>> @@ -29,6 +29,19 @@ static unsigned long extract_bytes(unsigned long data, unsigned int offset,
>>  	return (data >> (offset * 8)) & GENMASK_ULL(num * 8 - 1, 0);
>>  }
>>  
>> +/* allows updates of any half of a 64-bit register (or the whole thing) */
> 
> when I see this function I have to read the bit fiddling to understand
> the parameters and semantics.
> 
> actually, I'm not sure I read this correctly, but could you mean:
> 
>   /*
>    * Update @len bytes at @offset bytes offset in @reg from the least
>    * significant bytes in @val?
>    */

Yes, that's what it does, though I am not sure that explanation is more
helpful, as it describes the code but not its intent.
The main purpose of that function is to allow updates of one half of a
register while keeping the other half intact.

> Also, I don't get why this would be limited to either halves or the whole
> of a 64-bit register, but maybe I'm reading the code wrong.

Every GIC register that can be accessed as 64-bit can also be accessed
as 32-bit, but nothing smaller, so we don't need any byte or halfword
handling. Restricting this function to these two cases made it simpler
(though this is admittedly not obvious ;-)

> Didn't we have a reviewed function for the vgic that was called
> update_bytes or insert_bytes before that we could use, or did we never
> actually review that?

I forgot, I think there was a function, but that had issues as well, IIRC.

> 
>> +static u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
>> +			    unsigned long val)
>> +{
>> +	int lower = (offset & 4) * 8;
>> +	int upper = lower + 8 * len - 1;
>> +
>> +	reg &= ~GENMASK_ULL(upper, lower);
>> +	val &= GENMASK_ULL(len * 8 - 1, 0);
>> +
>> +	return reg | ((u64)val << lower);
> 
> this casting is weird.  Why is val not either a u64 or a u32?  Or the
> whole lot could be unsigned long/unsigned int like extract_bytes.

The idea is to make val naturally the largest possible type that could
be subject to MMIO operations and also to let it match the type used in
the MMIO handlers.
This was done with 32-bit support already in mind.

> 
>> +}
>> +
>>  static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
>>  					    gpa_t addr, unsigned int len)
>>  {
>> @@ -152,6 +165,132 @@ static unsigned long vgic_mmio_read_v3_idregs(struct kvm_vcpu *vcpu,
>>  	return 0;
>>  }
>>  
>> +/* We want to avoid outer shareable. */
>> +u64 vgic_sanitise_shareability(u64 reg)
>> +{
>> +	switch (reg & GIC_BASER_SHAREABILITY_MASK) {
>> +	case GIC_BASER_OuterShareable:
>> +		return GIC_BASER_InnerShareable;
>> +	default:
>> +		return reg;
>> +	}
>> +}
>> +
>> +/* Non-cacheable or same-as-inner are OK. */
>> +u64 vgic_sanitise_outer_cacheability(u64 reg)
>> +{
>> +	switch (reg & GIC_BASER_CACHE_MASK) {
>> +	case GIC_BASER_CACHE_SameAsInner:
>> +	case GIC_BASER_CACHE_nC:
>> +		return reg;
>> +	default:
>> +		return GIC_BASER_CACHE_nC;
>> +	}
>> +}
>> +
>> +/* Avoid any inner non-cacheable mapping. */
>> +u64 vgic_sanitise_inner_cacheability(u64 reg)
>> +{
>> +	switch (reg & GIC_BASER_CACHE_MASK) {
>> +	case GIC_BASER_CACHE_nCnB:
>> +	case GIC_BASER_CACHE_nC:
>> +		return GIC_BASER_CACHE_RaWb;
>> +	default:
>> +		return reg;
>> +	}
>> +}
> 
> nit: My OCD sets in here because the functions above are not ordered
> similarly to the calls below :)

Then let me fix this before you get into any trouble ...

> also, why do you need to apply the mask in the sanitize functions when
> you've just applied it in the callers?  It would make more sense to me
> if you didn't and named the parameters @field instead of @reg in the
> sanitize functions.

OK, can do.

>> +
>> +u64 vgic_sanitise_field(u64 reg, int field_shift, u64 field_mask,
>> +			u64 (*sanitise_fn)(u64))
>> +{
>> +	u64 field = (reg >> field_shift) & field_mask;
>> +
>> +	field = sanitise_fn(field) << field_shift;
>> +	return (reg & ~(field_mask << field_shift)) | field;
>> +}
>> +
>> +static u64 vgic_sanitise_pendbaser(u64 reg)
>> +{
>> +	reg = vgic_sanitise_field(reg, GICR_PENDBASER_SHAREABILITY_SHIFT,
>> +				  GIC_BASER_SHAREABILITY_MASK,
>> +				  vgic_sanitise_shareability);
>> +	reg = vgic_sanitise_field(reg, GICR_PENDBASER_INNER_CACHEABILITY_SHIFT,
>> +				  GIC_BASER_CACHE_MASK,
>> +				  vgic_sanitise_inner_cacheability);
>> +	reg = vgic_sanitise_field(reg, GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT,
>> +				  GIC_BASER_CACHE_MASK,
>> +				  vgic_sanitise_outer_cacheability);
>> +	return reg;
>> +}
>> +
>> +static u64 vgic_sanitise_propbaser(u64 reg)
>> +{
>> +	reg = vgic_sanitise_field(reg, GICR_PROPBASER_SHAREABILITY_SHIFT,
>> +				  GIC_BASER_SHAREABILITY_MASK,
>> +				  vgic_sanitise_shareability);
>> +	reg = vgic_sanitise_field(reg, GICR_PROPBASER_INNER_CACHEABILITY_SHIFT,
>> +				  GIC_BASER_CACHE_MASK,
>> +				  vgic_sanitise_inner_cacheability);
>> +	reg = vgic_sanitise_field(reg, GICR_PROPBASER_OUTER_CACHEABILITY_SHIFT,
>> +				  GIC_BASER_CACHE_MASK,
>> +				  vgic_sanitise_outer_cacheability);
>> +	return reg;
>> +}
> 
> assuming the defines themselves are correct (I didn't check) this looks
> good otherwise.
> 
>> +
>> +#define PROPBASER_RES0_MASK						\
>> +	(GENMASK_ULL(63, 59) | GENMASK_ULL(55, 52) | GENMASK_ULL(6, 5))
> 
> Why is the middle one for bits 55:52?  Is it not 55:48?  Perhaps larger
> physical addresses compared to the revision of the GICv3 spec I have at
> hand?

Possibly. The latest public version is issue B from December 2015, and
IIRC this was one of the changes.

> 
>> +#define PENDBASER_RES0_MASK						\
>> +	(BIT_ULL(63) | GENMASK_ULL(61, 59) | GENMASK_ULL(55, 52) |	\
>> +	 GENMASK_ULL(15, 12) | GENMASK_ULL(6, 0))
>> +
>> +static unsigned long vgic_mmio_read_propbase(struct kvm_vcpu *vcpu,
>> +					     gpa_t addr, unsigned int len)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +
>> +	return extract_bytes(dist->propbaser, addr & 7, len);
>> +}
>> +
>> +static void vgic_mmio_write_propbase(struct kvm_vcpu *vcpu,
>> +				     gpa_t addr, unsigned int len,
>> +				     unsigned long val)
>> +{
>> +	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> +
>> +	/* Storing a value with LPIs already enabled is undefined */
>> +	if (vgic_cpu->lpis_enabled)
>> +		return;
>> +
>> +	dist->propbaser = update_64bit_reg(dist->propbaser, addr & 4, len, val);
>> +	dist->propbaser &= ~PROPBASER_RES0_MASK;
>> +	dist->propbaser = vgic_sanitise_propbaser(dist->propbaser);
> 
> what prevents multiple writers messing with dist->propbaser or the

Good point! A rather easy fix would be to use a local variable here and
do just one update at the end. A more involved one would be to use
per-VCPU propbaser variables. But eventually we only need _one_ value,
and we don't have a VCPU pointer when we need to access it.

> lips_enabled being set in the middle of us fiddling with dist_propbaser?
  ^^^^
I am biting my lips on that matter ;-)

More seriously, I don't think this is a problem, since this is specified
as "undefined". So if I am not mistaken we can happily go on with
changing the value anyway. Any illegal value should be caught by
kvm_read_guest().

>> +}
>> +
>> +static unsigned long vgic_mmio_read_pendbase(struct kvm_vcpu *vcpu,
>> +					     gpa_t addr, unsigned int len)
>> +{
>> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> +
>> +	return extract_bytes(vgic_cpu->pendbaser, addr & 7, len);
>> +}
>> +
>> +static void vgic_mmio_write_pendbase(struct kvm_vcpu *vcpu,
>> +				     gpa_t addr, unsigned int len,
>> +				     unsigned long val)
>> +{
>> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>> +
>> +	/* Storing a value with LPIs already enabled is undefined */
>> +	if (vgic_cpu->lpis_enabled)
>> +		return;
>> +
>> +	vgic_cpu->pendbaser = update_64bit_reg(vgic_cpu->pendbaser,
>> +					       addr & 4, len, val);
>> +	vgic_cpu->pendbaser &= ~PENDBASER_RES0_MASK;
>> +	vgic_cpu->pendbaser = vgic_sanitise_pendbaser(vgic_cpu->pendbaser);
>> +}
>> +
>>  /*
>>   * The GICv3 per-IRQ registers are split to control PPIs and SGIs in the
>>   * redistributors, while SPIs are covered by registers in the distributor
>> @@ -232,10 +371,10 @@ static const struct vgic_register_region vgic_v3_rdbase_registers[] = {
>>  		vgic_mmio_read_v3r_typer, vgic_mmio_write_wi, 8,
>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>  	REGISTER_DESC_WITH_LENGTH(GICR_PROPBASER,
>> -		vgic_mmio_read_raz, vgic_mmio_write_wi, 8,
>> +		vgic_mmio_read_propbase, vgic_mmio_write_propbase, 8,
>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>  	REGISTER_DESC_WITH_LENGTH(GICR_PENDBASER,
>> -		vgic_mmio_read_raz, vgic_mmio_write_wi, 8,
>> +		vgic_mmio_read_pendbase, vgic_mmio_write_pendbase, 8,
>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>  	REGISTER_DESC_WITH_LENGTH(GICR_IDREGS,
>>  		vgic_mmio_read_v3_idregs, vgic_mmio_write_wi, 48,
>> diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
>> index 8509014..e863ccc 100644
>> --- a/virt/kvm/arm/vgic/vgic-mmio.h
>> +++ b/virt/kvm/arm/vgic/vgic-mmio.h
>> @@ -147,4 +147,12 @@ unsigned int vgic_v2_init_dist_iodev(struct vgic_io_device *dev);
>>  
>>  unsigned int vgic_v3_init_dist_iodev(struct vgic_io_device *dev);
>>  
>> +#ifdef CONFIG_KVM_ARM_VGIC_V3
>> +u64 vgic_sanitise_outer_cacheability(u64 reg);
>> +u64 vgic_sanitise_inner_cacheability(u64 reg);
>> +u64 vgic_sanitise_shareability(u64 reg);
>> +u64 vgic_sanitise_field(u64 reg, int field_shift, u64 field_mask,
>> +			u64 (*sanitise_fn)(u64));
>> +#endif
>> +
>>  #endif
>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>> index f0ac064..6f8f31f 100644
>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>> @@ -191,6 +191,11 @@ void vgic_v3_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcrp)
>>  	vmcrp->pmr  = (vmcr & ICH_VMCR_PMR_MASK) >> ICH_VMCR_PMR_SHIFT;
>>  }
>>  
>> +#define INITIAL_PENDBASER_VALUE						  \
>> +	(GIC_BASER_CACHEABILITY(GICR_PENDBASER, INNER, RaWb)		| \
>> +	GIC_BASER_CACHEABILITY(GICR_PENDBASER, OUTER, SameAsInner)	| \
>> +	GIC_BASER_SHAREABILITY(GICR_PENDBASER, InnerShareable))
>> +
>>  void vgic_v3_enable(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vgic_v3_cpu_if *vgic_v3 = &vcpu->arch.vgic_cpu.vgic_v3;
>> @@ -208,10 +213,12 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
>>  	 * way, so we force SRE to 1 to demonstrate this to the guest.
>>  	 * This goes with the spec allowing the value to be RAO/WI.
>>  	 */
>> -	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
>> +	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
>>  		vgic_v3->vgic_sre = ICC_SRE_EL1_SRE;
>> -	else
>> +		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;
> 
> why is pendbaser initialized, but not propbaser?  Do I have an outdated
> spec?  Both seem to not have any predefined reset value.

pendbaser is per core, while propbaser is ultimately one value per ITS.
So the propbaser initialisation is in vgic_its_create() in patch 11/17.

Thanks again for the comments!

Cheers,
Andre.

> 
>> +	} else {
>>  		vgic_v3->vgic_sre = 0;
>> +	}
>>  
>>  	/* Get the show on the road... */
>>  	vgic_v3->vgic_hcr = ICH_HCR_EN;
>> -- 
>> 2.9.0
>>
> 
> Thanks,
> -Christoffer
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 11/17] KVM: arm64: implement basic ITS register handlers
  2016-07-08 14:58   ` Marc Zyngier
@ 2016-07-11  9:00     ` Andre Przywara
  2016-07-11 14:21       ` Marc Zyngier
  0 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-11  9:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 08/07/16 15:58, Marc Zyngier wrote:
> On 05/07/16 12:23, Andre Przywara wrote:
>> Add emulation for some basic MMIO registers used in the ITS emulation.
>> This includes:
>> - GITS_{CTLR,TYPER,IIDR}
>> - ID registers
>> - GITS_{CBASER,CREADR,CWRITER}
>>   (which implement the ITS command buffer handling)
>> - GITS_BASER<n>
>>
>> Most of the handlers are pretty straightforward, only the CWRITER
>> handler is a bit more involved, taking the new its_cmd mutex and
>> then iterating over the command buffer.
>> The registers holding base addresses and attributes are sanitised before
>> storing them.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  include/kvm/arm_vgic.h           |  16 ++
>>  virt/kvm/arm/vgic/vgic-its.c     | 376 +++++++++++++++++++++++++++++++++++++--
>>  virt/kvm/arm/vgic/vgic-mmio-v3.c |   8 +-
>>  virt/kvm/arm/vgic/vgic-mmio.h    |   6 +
>>  virt/kvm/arm/vgic/vgic.c         |  12 +-
>>  5 files changed, 401 insertions(+), 17 deletions(-)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index eb82c7d..17d3929 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -22,6 +22,7 @@
>>  #include <linux/spinlock.h>
>>  #include <linux/types.h>
>>  #include <kvm/iodev.h>
>> +#include <linux/list.h>
>>  
>>  #define VGIC_V3_MAX_CPUS	255
>>  #define VGIC_V2_MAX_CPUS	8
>> @@ -128,6 +129,21 @@ struct vgic_its {
>>  	bool			enabled;
>>  	bool			initialized;
>>  	struct vgic_io_device	iodev;
>> +
>> +	/* These registers correspond to GITS_BASER{0,1} */
>> +	u64			baser_device_table;
>> +	u64			baser_coll_table;
>> +
>> +	/* Protects the command queue */
>> +	struct mutex		cmd_lock;
>> +	u64			cbaser;
>> +	u32			creadr;
>> +	u32			cwriter;
>> +
>> +	/* Protects the device and collection lists */
>> +	struct mutex		its_lock;
>> +	struct list_head	device_list;
>> +	struct list_head	collection_list;
>>  };
>>  
>>  struct vgic_dist {
>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>> index d49bdad..a9336a4 100644
>> --- a/virt/kvm/arm/vgic/vgic-its.c
>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>> @@ -21,6 +21,7 @@
>>  #include <linux/kvm.h>
>>  #include <linux/kvm_host.h>
>>  #include <linux/interrupt.h>
>> +#include <linux/list.h>
>>  #include <linux/uaccess.h>
>>  
>>  #include <linux/irqchip/arm-gic-v3.h>
>> @@ -32,6 +33,307 @@
>>  #include "vgic.h"
>>  #include "vgic-mmio.h"
>>  
>> +struct its_device {
>> +	struct list_head dev_list;
>> +
>> +	/* the head for the list of ITTEs */
>> +	struct list_head itt_head;
>> +	u32 device_id;
>> +};
>> +
>> +#define COLLECTION_NOT_MAPPED ((u32)~0)
>> +
>> +struct its_collection {
>> +	struct list_head coll_list;
>> +
>> +	u32 collection_id;
>> +	u32 target_addr;
>> +};
>> +
>> +#define its_is_collection_mapped(coll) ((coll) && \
>> +				((coll)->target_addr != COLLECTION_NOT_MAPPED))
>> +
>> +struct its_itte {
>> +	struct list_head itte_list;
>> +
>> +	struct its_collection *collection;
>> +	u32 lpi;
>> +	u32 event_id;
>> +};
>> +
>> +#define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
>> +
>> +static unsigned long vgic_mmio_read_its_ctlr(struct kvm *kvm,
>> +					     struct vgic_its *its,
>> +					     gpa_t addr, unsigned int len)
>> +{
>> +	u32 reg = 0;
>> +
>> +	mutex_lock(&its->cmd_lock);
>> +	if (its->creadr == its->cwriter)
>> +		reg |= GITS_CTLR_QUIESCENT;
>> +	if (its->enabled)
>> +		reg |= GITS_CTLR_ENABLE;
>> +	mutex_unlock(&its->cmd_lock);
>> +
>> +	return reg;
>> +}
>> +
>> +static void vgic_mmio_write_its_ctlr(struct kvm *kvm, struct vgic_its *its,
>> +				     gpa_t addr, unsigned int len,
>> +				     unsigned long val)
>> +{
>> +	its->enabled = !!(val & GITS_CTLR_ENABLE);
>> +}
>> +
>> +static unsigned long vgic_mmio_read_its_typer(struct kvm *kvm,
>> +					      struct vgic_its *its,
>> +					      gpa_t addr, unsigned int len)
>> +{
>> +	u64 reg = GITS_TYPER_PLPIS;
>> +
>> +	/*
>> +	 * We use linear CPU numbers for redistributor addressing,
>> +	 * so GITS_TYPER.PTA is 0.
>> +	 * Also we force all PROPBASER registers to be the same, so
>> +	 * CommonLPIAff is 0 as well.
>> +	 * To avoid memory waste in the guest, we keep the number of IDBits and
>> +	 * DevBits low - at least for the time being.
>> +	 */
>> +	reg |= 0x0f << GITS_TYPER_DEVBITS_SHIFT;
>> +	reg |= 0x0f << GITS_TYPER_IDBITS_SHIFT;
>> +
>> +	return extract_bytes(reg, addr & 7, len);
>> +}
>> +
>> +static unsigned long vgic_mmio_read_its_iidr(struct kvm *kvm,
>> +					     struct vgic_its *its,
>> +					     gpa_t addr, unsigned int len)
>> +{
>> +	return (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
>> +}
>> +
>> +static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
>> +					       struct vgic_its *its,
>> +					       gpa_t addr, unsigned int len)
>> +{
>> +	switch (addr & 0xffff) {
>> +	case GITS_PIDR0:
>> +		return 0x92;	/* part number, bits[7:0] */
>> +	case GITS_PIDR1:
>> +		return 0xb4;	/* part number, bits[11:8] */
>> +	case GITS_PIDR2:
>> +		return GIC_PIDR2_ARCH_GICv3 | 0x0b;
>> +	case GITS_PIDR4:
>> +		return 0x40;	/* This is a 64K software visible page */
>> +	/* The following are the ID registers for (any) GIC. */
>> +	case GITS_CIDR0:
>> +		return 0x0d;
>> +	case GITS_CIDR1:
>> +		return 0xf0;
>> +	case GITS_CIDR2:
>> +		return 0x05;
>> +	case GITS_CIDR3:
>> +		return 0xb1;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/* Requires the its_lock to be held. */
>> +static void its_free_itte(struct kvm *kvm, struct its_itte *itte)
>> +{
>> +	list_del(&itte->itte_list);
>> +	kfree(itte);
>> +}
>> +
>> +static int vits_handle_command(struct kvm *kvm, struct vgic_its *its,
>> +			       u64 *its_cmd)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>> +static u64 vgic_sanitise_its_baser(u64 reg)
>> +{
>> +	reg = vgic_sanitise_field(reg, GITS_BASER_SHAREABILITY_SHIFT,
>> +				  GIC_BASER_SHAREABILITY_MASK,
>> +				  vgic_sanitise_shareability);
>> +	reg = vgic_sanitise_field(reg, GITS_BASER_INNER_CACHEABILITY_SHIFT,
>> +				  GIC_BASER_CACHE_MASK,
>> +				  vgic_sanitise_inner_cacheability);
>> +	reg = vgic_sanitise_field(reg, GITS_BASER_OUTER_CACHEABILITY_SHIFT,
>> +				  GIC_BASER_CACHE_MASK,
>> +				  vgic_sanitise_outer_cacheability);
>> +	return reg;
>> +}
>> +
>> +static u64 vgic_sanitise_its_cbaser(u64 reg)
>> +{
>> +	reg = vgic_sanitise_field(reg, GITS_CBASER_SHAREABILITY_SHIFT,
>> +				  GIC_BASER_SHAREABILITY_MASK,
> 
> -ECOPYPASTE : GITS_CBASER_SHAREABILITY_MASK
> 
>> +				  vgic_sanitise_shareability);
>> +	reg = vgic_sanitise_field(reg, GITS_CBASER_INNER_CACHEABILITY_SHIFT,
>> +				  GIC_BASER_CACHE_MASK,
> 
> Same here?
> 
>> +				  vgic_sanitise_inner_cacheability);
>> +	reg = vgic_sanitise_field(reg, GITS_CBASER_OUTER_CACHEABILITY_SHIFT,
>> +				  GIC_BASER_CACHE_MASK,
> 
> And here?

Those shareability _masks_ are all the same.
I can use specific #defines for that one value, if that makes you happy,
though I wanted to avoid too many definitions.

>> +				  vgic_sanitise_outer_cacheability);
>> +	return reg;
>> +}
>> +
>> +static unsigned long vgic_mmio_read_its_cbaser(struct kvm *kvm,
>> +					       struct vgic_its *its,
>> +					       gpa_t addr, unsigned int len)
>> +{
>> +	return extract_bytes(its->cbaser, addr & 7, len);
>> +}
>> +
>> +static void vgic_mmio_write_its_cbaser(struct kvm *kvm, struct vgic_its *its,
>> +				       gpa_t addr, unsigned int len,
>> +				       unsigned long val)
>> +{
>> +	/* When GITS_CTLR.Enable is 1, this register is RO. */
>> +	if (its->enabled)
>> +		return;
>> +
>> +	mutex_lock(&its->cmd_lock);
>> +	its->cbaser = update_64bit_reg(its->cbaser, addr & 7, len, val);
>> +	/* Sanitise the physical address to be 64k aligned. */
>> +	its->cbaser &= ~GENMASK_ULL(15, 12);
> 
> So you're not supporting 52bit addresses, as you're forcing the bottom
> addresses to zero.

Yes, I decided to go with 48bits.

>> +	its->cbaser = vgic_sanitise_its_cbaser(its->cbaser);
>> +	its->creadr = 0;
>> +	/*
>> +	 * CWRITER is architecturally UNKNOWN on reset, but we need to reset
>> +	 * it to CREADR to make sure we start with an empty command buffer.
>> +	 */
>> +	its->cwriter = its->creadr;
>> +	mutex_unlock(&its->cmd_lock);
>> +}
>> +
>> +#define ITS_CMD_BUFFER_SIZE(baser)	((((baser) & 0xff) + 1) << 12)
>> +#define ITS_CMD_SIZE			32
>> +
>> +/*
>> + * By writing to CWRITER the guest announces new commands to be processed.
>> + * To avoid any races in the first place, we take the its_cmd lock, which
>> + * protects our ring buffer variables, so that there is only one user
>> + * per ITS handling commands at a given time.
>> + */
>> +static void vgic_mmio_write_its_cwriter(struct kvm *kvm, struct vgic_its *its,
>> +					gpa_t addr, unsigned int len,
>> +					unsigned long val)
>> +{
>> +	gpa_t cbaser;
>> +	u64 cmd_buf[4];
>> +	u32 reg;
>> +
>> +	if (!its)
>> +		return;
>> +
>> +	cbaser = CBASER_ADDRESS(its->cbaser);
>> +
>> +	reg = update_64bit_reg(its->cwriter & 0xfffe0, addr & 7, len, val);
>> +	reg &= 0xfffe0;
>> +	if (reg > ITS_CMD_BUFFER_SIZE(its->cbaser))
>> +		return;
>> +
>> +	mutex_lock(&its->cmd_lock);
>> +
>> +	its->cwriter = reg;
>> +
>> +	while (its->cwriter != its->creadr) {
>> +		int ret = kvm_read_guest(kvm, cbaser + its->creadr,
>> +					 cmd_buf, ITS_CMD_SIZE);
>> +		/*
>> +		 * If kvm_read_guest() fails, this could be due to the guest
>> +		 * programming a bogus value in CBASER or something else going
>> +		 * wrong from which we cannot easily recover.
>> +		 * We just ignore that command then.
>> +		 */
>> +		if (!ret)
>> +			vits_handle_command(kvm, its, cmd_buf);
>> +
>> +		its->creadr += ITS_CMD_SIZE;
>> +		if (its->creadr == ITS_CMD_BUFFER_SIZE(its->cbaser))
>> +			its->creadr = 0;
>> +	}
>> +
>> +	mutex_unlock(&its->cmd_lock);
>> +}
>> +
>> +static unsigned long vgic_mmio_read_its_cwriter(struct kvm *kvm,
>> +						struct vgic_its *its,
>> +						gpa_t addr, unsigned int len)
>> +{
>> +	return extract_bytes(its->cwriter & 0xfffe0, addr & 0x7, len);
>> +}
>> +
>> +static unsigned long vgic_mmio_read_its_creadr(struct kvm *kvm,
>> +					       struct vgic_its *its,
>> +					       gpa_t addr, unsigned int len)
>> +{
>> +	return extract_bytes(its->creadr & 0xfffe0, addr & 0x7, len);
>> +}
>> +
>> +#define BASER_INDEX(addr) (((addr) / sizeof(u64)) & 0x7)
>> +static unsigned long vgic_mmio_read_its_baser(struct kvm *kvm,
>> +					      struct vgic_its *its,
>> +					      gpa_t addr, unsigned int len)
>> +{
>> +	u64 reg;
>> +
>> +	switch (BASER_INDEX(addr)) {
>> +	case 0:
>> +		reg = its->baser_device_table;
>> +		break;
>> +	case 1:
>> +		reg = its->baser_coll_table;
>> +		break;
>> +	default:
>> +		reg = 0;
>> +		break;
>> +	}
>> +
>> +	return extract_bytes(reg, addr & 7, len);
>> +}
>> +
>> +#define GITS_BASER_RO_MASK	(GENMASK_ULL(52, 48) | GENMASK_ULL(58, 56))
>> +static void vgic_mmio_write_its_baser(struct kvm *kvm,
>> +				      struct vgic_its *its,
>> +				      gpa_t addr, unsigned int len,
>> +				      unsigned long val)
>> +{
>> +	u64 reg, *regptr;
>> +	u64 entry_size, device_type;
>> +
>> +	/* When GITS_CTLR.Enable is 1, we ignore write accesses. */
>> +	if (its->enabled)
>> +		return;
>> +
>> +	switch (BASER_INDEX(addr)) {
>> +	case 0:
>> +		regptr = &its->baser_device_table;
>> +		entry_size = 8;
>> +		device_type = GITS_BASER_TYPE_DEVICE;
>> +		break;
>> +	case 1:
>> +		regptr = &its->baser_coll_table;
>> +		entry_size = 8;
>> +		device_type = GITS_BASER_TYPE_COLLECTION;
>> +		break;
>> +	default:
>> +		return;
>> +	}
>> +
>> +	reg = update_64bit_reg(*regptr, addr & 7, len, val);
>> +	reg &= ~GITS_BASER_RO_MASK;
>> +	reg |= (entry_size - 1) << GITS_BASER_ENTRY_SIZE_SHIFT;
>> +	reg |= device_type << GITS_BASER_TYPE_SHIFT;
>> +	reg = vgic_sanitise_its_baser(reg);
> 
> So you claim to support indirect tables on collections too? That's
> pretty odd. I'd be happier if you filtered that out on collections.

Sure, can do. Just wondering what would be the reason for that? Is there
anything that causes trouble with supporting indirect collection tables?

> Also, you're supporting any page size (which is fine on its own), but
> also not doing anything regarding the width of the address (52bits are
> only valid for 64kB ITS pages). This is completely inconsistent with
> what you're doing with GITS_CBASER.
> 
> I'd suggest you reduce the scope to a single supported page size (64kB),
> and decide whether you want to support 52bit PAs or not. Either way
> would be valid, but it has to be consistent across the board.

My intention was to support all page sizes (we need 4K for AArch32,
don't we?), but only 48 bits of PA (as KVM doesn't support more than
48bits atm anyway, if I am not mistaken).

So I will clear bits 15:12 if the page size is 64K. Does that make sense?

> It may not be of great importance right now, but it is going to be
> really critical for save/restore, and we'd better get it right from the
> beginning.
> 
>> +
>> +	*regptr = reg;
>> +}
>> +
>>  #define REGISTER_ITS_DESC(off, rd, wr, length, acc)		\
>>  {								\
>>  	.reg_offset = off,					\
>> @@ -42,8 +344,8 @@
>>  	.its_write = wr,					\
>>  }
>>  
>> -static unsigned long its_mmio_read_raz(struct kvm *kvm, struct vgic_its *its,
>> -				       gpa_t addr, unsigned int len)
>> +unsigned long its_mmio_read_raz(struct kvm *kvm, struct vgic_its *its,
>> +				gpa_t addr, unsigned int len)
>>  {
>>  	return 0;
>>  }
>> @@ -56,28 +358,28 @@ static void its_mmio_write_wi(struct kvm *kvm, struct vgic_its *its,
>>  
>>  static struct vgic_register_region its_registers[] = {
>>  	REGISTER_ITS_DESC(GITS_CTLR,
>> -		its_mmio_read_raz, its_mmio_write_wi, 4,
>> +		vgic_mmio_read_its_ctlr, vgic_mmio_write_its_ctlr, 4,
>>  		VGIC_ACCESS_32bit),
>>  	REGISTER_ITS_DESC(GITS_IIDR,
>> -		its_mmio_read_raz, its_mmio_write_wi, 4,
>> +		vgic_mmio_read_its_iidr, its_mmio_write_wi, 4,
>>  		VGIC_ACCESS_32bit),
>>  	REGISTER_ITS_DESC(GITS_TYPER,
>> -		its_mmio_read_raz, its_mmio_write_wi, 8,
>> +		vgic_mmio_read_its_typer, its_mmio_write_wi, 8,
>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>  	REGISTER_ITS_DESC(GITS_CBASER,
>> -		its_mmio_read_raz, its_mmio_write_wi, 8,
>> +		vgic_mmio_read_its_cbaser, vgic_mmio_write_its_cbaser, 8,
>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>  	REGISTER_ITS_DESC(GITS_CWRITER,
>> -		its_mmio_read_raz, its_mmio_write_wi, 8,
>> +		vgic_mmio_read_its_cwriter, vgic_mmio_write_its_cwriter, 8,
>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>  	REGISTER_ITS_DESC(GITS_CREADR,
>> -		its_mmio_read_raz, its_mmio_write_wi, 8,
>> +		vgic_mmio_read_its_creadr, its_mmio_write_wi, 8,
>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>  	REGISTER_ITS_DESC(GITS_BASER,
>> -		its_mmio_read_raz, its_mmio_write_wi, 0x40,
>> +		vgic_mmio_read_its_baser, vgic_mmio_write_its_baser, 0x40,
>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>  	REGISTER_ITS_DESC(GITS_IDREGS_BASE,
>> -		its_mmio_read_raz, its_mmio_write_wi, 0x30,
>> +		vgic_mmio_read_its_idregs, its_mmio_write_wi, 0x30,
>>  		VGIC_ACCESS_32bit),
>>  };
>>  
>> @@ -100,6 +402,18 @@ static int vgic_its_register(struct kvm *kvm, struct vgic_its *its)
>>  	return ret;
>>  }
>>  
>> +#define INITIAL_BASER_VALUE						  \
>> +	(GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWb)		| \
>> +	 GIC_BASER_CACHEABILITY(GITS_BASER, OUTER, SameAsInner)		| \
>> +	 GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)		| \
>> +	 ((8ULL - 1) << GITS_BASER_ENTRY_SIZE_SHIFT)			| \
>> +	 GITS_BASER_PAGE_SIZE_64K)
>> +
>> +#define INITIAL_PROPBASER_VALUE						  \
>> +	(GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWb)		| \
>> +	 GIC_BASER_CACHEABILITY(GICR_PROPBASER, OUTER, SameAsInner)	| \
>> +	 GIC_BASER_SHAREABILITY(GICR_PROPBASER, InnerShareable))
>> +
>>  static int vgic_its_create(struct kvm_device *dev, u32 type)
>>  {
>>  	struct vgic_its *its;
>> @@ -111,12 +425,25 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
>>  	if (!its)
>>  		return -ENOMEM;
>>  
>> +	mutex_init(&its->its_lock);
>> +	mutex_init(&its->cmd_lock);
>> +
>>  	its->vgic_its_base = VGIC_ADDR_UNDEF;
>>  
>> +	INIT_LIST_HEAD(&its->device_list);
>> +	INIT_LIST_HEAD(&its->collection_list);
>> +
>>  	dev->kvm->arch.vgic.has_its = true;
>>  	its->initialized = false;
>>  	its->enabled = false;
>>  
>> +	its->baser_device_table = INITIAL_BASER_VALUE			|
>> +		((u64)GITS_BASER_TYPE_DEVICE << GITS_BASER_TYPE_SHIFT)	|
>> +		GITS_BASER_INDIRECT;
> 
> It is a bit odd to advertize the indirect flag as a reset value, but I
> don't see anything that indicates it is not allowed...

I find it really confusing as to which fields are supposed to indicate
support on reset and which are just taking part in that
"write-and-see-if-it-sticks" game.

I take it now there are no requirements on the reset state and
everything is negotiated via writing to the register?
In this case I'd move the indirect indication from here to the write
function above.

Cheers,
Andre.


>> +	its->baser_coll_table = INITIAL_BASER_VALUE |
>> +		((u64)GITS_BASER_TYPE_COLLECTION << GITS_BASER_TYPE_SHIFT);
>> +	dev->kvm->arch.vgic.propbaser = INITIAL_PROPBASER_VALUE;
>> +
>>  	dev->private = its;
>>  
>>  	return 0;
>> @@ -124,7 +451,36 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
>>  
>>  static void vgic_its_destroy(struct kvm_device *kvm_dev)
>>  {
>> +	struct kvm *kvm = kvm_dev->kvm;
>>  	struct vgic_its *its = kvm_dev->private;
>> +	struct its_device *dev;
>> +	struct its_itte *itte;
>> +	struct list_head *dev_cur, *dev_temp;
>> +	struct list_head *cur, *temp;
>> +
>> +	/*
>> +	 * We may end up here without the lists ever having been initialized.
>> +	 * Check this and bail out early to avoid dereferencing a NULL pointer.
>> +	 */
>> +	if (!its->device_list.next)
>> +		return;
>> +
>> +	mutex_lock(&its->its_lock);
>> +	list_for_each_safe(dev_cur, dev_temp, &its->device_list) {
>> +		dev = container_of(dev_cur, struct its_device, dev_list);
>> +		list_for_each_safe(cur, temp, &dev->itt_head) {
>> +			itte = (container_of(cur, struct its_itte, itte_list));
>> +			its_free_itte(kvm, itte);
>> +		}
>> +		list_del(dev_cur);
>> +		kfree(dev);
>> +	}
>> +
>> +	list_for_each_safe(cur, temp, &its->collection_list) {
>> +		list_del(cur);
>> +		kfree(container_of(cur, struct its_collection, coll_list));
>> +	}
>> +	mutex_unlock(&its->its_lock);
>>  
>>  	kfree(its);
>>  }
>> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>> index 062ff95..370e89e 100644
>> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
>> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
>> @@ -23,15 +23,15 @@
>>  #include "vgic-mmio.h"
>>  
>>  /* extract @num bytes at @offset bytes offset in data */
>> -static unsigned long extract_bytes(unsigned long data, unsigned int offset,
>> -				   unsigned int num)
>> +unsigned long extract_bytes(unsigned long data, unsigned int offset,
>> +			    unsigned int num)
>>  {
>>  	return (data >> (offset * 8)) & GENMASK_ULL(num * 8 - 1, 0);
>>  }
>>  
>>  /* allows updates of any half of a 64-bit register (or the whole thing) */
>> -static u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
>> -			    unsigned long val)
>> +u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
>> +		     unsigned long val)
>>  {
>>  	int lower = (offset & 4) * 8;
>>  	int upper = lower + 8 * len - 1;
>> diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
>> index 23e97a7..513bb5c 100644
>> --- a/virt/kvm/arm/vgic/vgic-mmio.h
>> +++ b/virt/kvm/arm/vgic/vgic-mmio.h
>> @@ -106,6 +106,12 @@ unsigned long vgic_data_mmio_bus_to_host(const void *val, unsigned int len);
>>  void vgic_data_host_to_mmio_bus(void *buf, unsigned int len,
>>  				unsigned long data);
>>  
>> +unsigned long extract_bytes(unsigned long data, unsigned int offset,
>> +			    unsigned int num);
>> +
>> +u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
>> +		     unsigned long val);
>> +
>>  unsigned long vgic_mmio_read_raz(struct kvm_vcpu *vcpu,
>>  				 gpa_t addr, unsigned int len);
>>  
>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>> index ae80894..a5d9a10 100644
>> --- a/virt/kvm/arm/vgic/vgic.c
>> +++ b/virt/kvm/arm/vgic/vgic.c
>> @@ -33,10 +33,16 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
>>  
>>  /*
>>   * Locking order is always:
>> - *   vgic_cpu->ap_list_lock
>> - *     vgic_irq->irq_lock
>> + * its->cmd_lock (mutex)
>> + *   its->its_lock (mutex)
>> + *     vgic_cpu->ap_list_lock
>> + *       vgic_irq->irq_lock
>>   *
>> - * (that is, always take the ap_list_lock before the struct vgic_irq lock).
>> + * If you need to take multiple locks, always take the upper lock first,
>> + * then the lower ones, e.g. first take the its_lock, then the irq_lock.
>> + * If you are already holding a lock and need to take a higher one, you
>> + * have to drop the lower ranking lock first and re-acquire it after having
>> + * taken the upper one.
>>   *
>>   * When taking more than one ap_list_lock at the same time, always take the
>>   * lowest numbered VCPU's ap_list_lock first, so:
>>
> 
> Thanks,
> 
> 	M.
> 


* [PATCH v8 11/17] KVM: arm64: implement basic ITS register handlers
  2016-07-11  9:00     ` Andre Przywara
@ 2016-07-11 14:21       ` Marc Zyngier
  0 siblings, 0 replies; 49+ messages in thread
From: Marc Zyngier @ 2016-07-11 14:21 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/07/16 10:00, Andre Przywara wrote:
> Hi,
> 
> On 08/07/16 15:58, Marc Zyngier wrote:
>> On 05/07/16 12:23, Andre Przywara wrote:
>>> Add emulation for some basic MMIO registers used in the ITS emulation.
>>> This includes:
>>> - GITS_{CTLR,TYPER,IIDR}
>>> - ID registers
>>> - GITS_{CBASER,CREADR,CWRITER}
>>>   (which implement the ITS command buffer handling)
>>> - GITS_BASER<n>
>>>
>>> Most of the handlers are pretty straightforward, only the CWRITER
>>> handler is a bit more involved, taking the new its_cmd mutex and
>>> then iterating over the command buffer.
>>> The registers holding base addresses and attributes are sanitised before
>>> storing them.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> ---
>>>  include/kvm/arm_vgic.h           |  16 ++
>>>  virt/kvm/arm/vgic/vgic-its.c     | 376 +++++++++++++++++++++++++++++++++++++--
>>>  virt/kvm/arm/vgic/vgic-mmio-v3.c |   8 +-
>>>  virt/kvm/arm/vgic/vgic-mmio.h    |   6 +
>>>  virt/kvm/arm/vgic/vgic.c         |  12 +-
>>>  5 files changed, 401 insertions(+), 17 deletions(-)
>>>

[...]

>>> +/* Requires the its_lock to be held. */
>>> +static void its_free_itte(struct kvm *kvm, struct its_itte *itte)
>>> +{
>>> +	list_del(&itte->itte_list);
>>> +	kfree(itte);
>>> +}
>>> +
>>> +static int vits_handle_command(struct kvm *kvm, struct vgic_its *its,
>>> +			       u64 *its_cmd)
>>> +{
>>> +	return -ENODEV;
>>> +}
>>> +
>>> +static u64 vgic_sanitise_its_baser(u64 reg)
>>> +{
>>> +	reg = vgic_sanitise_field(reg, GITS_BASER_SHAREABILITY_SHIFT,
>>> +				  GIC_BASER_SHAREABILITY_MASK,
>>> +				  vgic_sanitise_shareability);
>>> +	reg = vgic_sanitise_field(reg, GITS_BASER_INNER_CACHEABILITY_SHIFT,
>>> +				  GIC_BASER_CACHE_MASK,
>>> +				  vgic_sanitise_inner_cacheability);
>>> +	reg = vgic_sanitise_field(reg, GITS_BASER_OUTER_CACHEABILITY_SHIFT,
>>> +				  GIC_BASER_CACHE_MASK,
>>> +				  vgic_sanitise_outer_cacheability);
>>> +	return reg;
>>> +}
>>> +
>>> +static u64 vgic_sanitise_its_cbaser(u64 reg)
>>> +{
>>> +	reg = vgic_sanitise_field(reg, GITS_CBASER_SHAREABILITY_SHIFT,
>>> +				  GIC_BASER_SHAREABILITY_MASK,
>>
>> -ECOPYPASTE : GITS_CBASER_SHAREABILITY_MASK
>>
>>> +				  vgic_sanitise_shareability);
>>> +	reg = vgic_sanitise_field(reg, GITS_CBASER_INNER_CACHEABILITY_SHIFT,
>>> +				  GIC_BASER_CACHE_MASK,
>>
>> Same here?
>>
>>> +				  vgic_sanitise_inner_cacheability);
>>> +	reg = vgic_sanitise_field(reg, GITS_CBASER_OUTER_CACHEABILITY_SHIFT,
>>> +				  GIC_BASER_CACHE_MASK,
>>
>> And here?
> 
> Those shareability _masks_ are all the same.
> I can use specific #defines for that one value, if that makes you happy,
> though I wanted to avoid too many definitions.

What is the point of having #defines if their name doesn't indicate what
they do? You might as well call it BOB, or leave the raw value... And
no, don't do any of that. Just add the required defines.

> 
>>> +				  vgic_sanitise_outer_cacheability);
>>> +	return reg;
>>> +}
>>> +
>>> +static unsigned long vgic_mmio_read_its_cbaser(struct kvm *kvm,
>>> +					       struct vgic_its *its,
>>> +					       gpa_t addr, unsigned int len)
>>> +{
>>> +	return extract_bytes(its->cbaser, addr & 7, len);
>>> +}
>>> +
>>> +static void vgic_mmio_write_its_cbaser(struct kvm *kvm, struct vgic_its *its,
>>> +				       gpa_t addr, unsigned int len,
>>> +				       unsigned long val)
>>> +{
>>> +	/* When GITS_CTLR.Enable is 1, this register is RO. */
>>> +	if (its->enabled)
>>> +		return;
>>> +
>>> +	mutex_lock(&its->cmd_lock);
>>> +	its->cbaser = update_64bit_reg(its->cbaser, addr & 7, len, val);
>>> +	/* Sanitise the physical address to be 64k aligned. */
>>> +	its->cbaser &= ~GENMASK_ULL(15, 12);
>>
>> So you're not supporting 52bit addresses, as you're forcing the bottom
>> addresses to zero.
> 
> Yes, I decided to go with 48bits.
> 
>>> +	its->cbaser = vgic_sanitise_its_cbaser(its->cbaser);
>>> +	its->creadr = 0;
>>> +	/*
>>> +	 * CWRITER is architecturally UNKNOWN on reset, but we need to reset
>>> +	 * it to CREADR to make sure we start with an empty command buffer.
>>> +	 */
>>> +	its->cwriter = its->creadr;
>>> +	mutex_unlock(&its->cmd_lock);
>>> +}
>>> +
>>> +#define ITS_CMD_BUFFER_SIZE(baser)	((((baser) & 0xff) + 1) << 12)
>>> +#define ITS_CMD_SIZE			32
>>> +
>>> +/*
>>> + * By writing to CWRITER the guest announces new commands to be processed.
>>> + * To avoid any races in the first place, we take the its_cmd lock, which
>>> + * protects our ring buffer variables, so that there is only one user
>>> + * per ITS handling commands at a given time.
>>> + */
>>> +static void vgic_mmio_write_its_cwriter(struct kvm *kvm, struct vgic_its *its,
>>> +					gpa_t addr, unsigned int len,
>>> +					unsigned long val)
>>> +{
>>> +	gpa_t cbaser;
>>> +	u64 cmd_buf[4];
>>> +	u32 reg;
>>> +
>>> +	if (!its)
>>> +		return;
>>> +
>>> +	cbaser = CBASER_ADDRESS(its->cbaser);
>>> +
>>> +	reg = update_64bit_reg(its->cwriter & 0xfffe0, addr & 7, len, val);
>>> +	reg &= 0xfffe0;
>>> +	if (reg > ITS_CMD_BUFFER_SIZE(its->cbaser))
>>> +		return;
>>> +
>>> +	mutex_lock(&its->cmd_lock);
>>> +
>>> +	its->cwriter = reg;
>>> +
>>> +	while (its->cwriter != its->creadr) {
>>> +		int ret = kvm_read_guest(kvm, cbaser + its->creadr,
>>> +					 cmd_buf, ITS_CMD_SIZE);
>>> +		/*
>>> +		 * If kvm_read_guest() fails, this could be due to the guest
>>> +		 * programming a bogus value in CBASER or something else going
>>> +		 * wrong from which we cannot easily recover.
>>> +		 * We just ignore that command then.
>>> +		 */
>>> +		if (!ret)
>>> +			vits_handle_command(kvm, its, cmd_buf);
>>> +
>>> +		its->creadr += ITS_CMD_SIZE;
>>> +		if (its->creadr == ITS_CMD_BUFFER_SIZE(its->cbaser))
>>> +			its->creadr = 0;
>>> +	}
>>> +
>>> +	mutex_unlock(&its->cmd_lock);
>>> +}
>>> +
>>> +static unsigned long vgic_mmio_read_its_cwriter(struct kvm *kvm,
>>> +						struct vgic_its *its,
>>> +						gpa_t addr, unsigned int len)
>>> +{
>>> +	return extract_bytes(its->cwriter & 0xfffe0, addr & 0x7, len);
>>> +}
>>> +
>>> +static unsigned long vgic_mmio_read_its_creadr(struct kvm *kvm,
>>> +					       struct vgic_its *its,
>>> +					       gpa_t addr, unsigned int len)
>>> +{
>>> +	return extract_bytes(its->creadr & 0xfffe0, addr & 0x7, len);
>>> +}
>>> +
>>> +#define BASER_INDEX(addr) (((addr) / sizeof(u64)) & 0x7)
>>> +static unsigned long vgic_mmio_read_its_baser(struct kvm *kvm,
>>> +					      struct vgic_its *its,
>>> +					      gpa_t addr, unsigned int len)
>>> +{
>>> +	u64 reg;
>>> +
>>> +	switch (BASER_INDEX(addr)) {
>>> +	case 0:
>>> +		reg = its->baser_device_table;
>>> +		break;
>>> +	case 1:
>>> +		reg = its->baser_coll_table;
>>> +		break;
>>> +	default:
>>> +		reg = 0;
>>> +		break;
>>> +	}
>>> +
>>> +	return extract_bytes(reg, addr & 7, len);
>>> +}
>>> +
>>> +#define GITS_BASER_RO_MASK	(GENMASK_ULL(52, 48) | GENMASK_ULL(58, 56))
>>> +static void vgic_mmio_write_its_baser(struct kvm *kvm,
>>> +				      struct vgic_its *its,
>>> +				      gpa_t addr, unsigned int len,
>>> +				      unsigned long val)
>>> +{
>>> +	u64 reg, *regptr;
>>> +	u64 entry_size, device_type;
>>> +
>>> +	/* When GITS_CTLR.Enable is 1, we ignore write accesses. */
>>> +	if (its->enabled)
>>> +		return;
>>> +
>>> +	switch (BASER_INDEX(addr)) {
>>> +	case 0:
>>> +		regptr = &its->baser_device_table;
>>> +		entry_size = 8;
>>> +		device_type = GITS_BASER_TYPE_DEVICE;
>>> +		break;
>>> +	case 1:
>>> +		regptr = &its->baser_coll_table;
>>> +		entry_size = 8;
>>> +		device_type = GITS_BASER_TYPE_COLLECTION;
>>> +		break;
>>> +	default:
>>> +		return;
>>> +	}
>>> +
>>> +	reg = update_64bit_reg(*regptr, addr & 7, len, val);
>>> +	reg &= ~GITS_BASER_RO_MASK;
>>> +	reg |= (entry_size - 1) << GITS_BASER_ENTRY_SIZE_SHIFT;
>>> +	reg |= device_type << GITS_BASER_TYPE_SHIFT;
>>> +	reg = vgic_sanitise_its_baser(reg);
>>
>> So you claim to support indirect tables on collections too? That's
>> pretty odd. I'd be happier if you filtered that out on collections.
> 
> Sure, can do. Just wondering what would be the reason for that? Is there
> anything that causes troubles on supporting indirect collection tables?

The main problem is that they don't make any sense, as this is not a
sparse space. A stupid GIC driver may use it and actively waste memory
(and performance on a real HW implementation).

> 
>> Also, you're supporting any page size (which is fine on its own), but
>> also not doing anything regarding the width of the address (52bits are
>> only valid for 64kB ITS pages). This is completely inconsistent with
>> what you're doing with GITS_CBASER.
>>
>> I'd suggest you reduce the scope to a single supported page size (64kB),
>> and decide whether you want to support 52bit PAs or not. Either way
>> would be valid, but it has to be consistent across the board.
> 
> My intention was to support all page sizes (we need 4K for AArch32,
> don't we?), but only 48 bits of PA (as KVM doesn't support more than
> 48bits atm anyway, if I am not mistaken).

Page size is the ITS page size, nothing to do with the CPU at all. So
there is absolutely no requirement to support a particular page size
(you just need to support one).

> So I will clear bits 15:12 if the page size is 64K. Does that make sense?

Don't bother with the "if". Stick to 64kB pages.
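To make that concrete, here is a minimal userspace sketch of such a sanitisation, sticking to 64kB ITS pages and a 48-bit PA. The field offsets follow the GITS_BASER layout in the GICv3 spec, but the helper name is invented here and is not the kernel's actual vgic_sanitise_its_baser():

```c
#include <stdint.h>

#define BASER_PAGE_SIZE_MASK	(0x3ULL << 8)	/* GITS_BASER.Page_Size */
#define BASER_PAGE_SIZE_64K	(0x2ULL << 8)
#define BASER_PA_64K_LOW_MASK	(0xfULL << 12)	/* PA[51:48] when 64K pages */

static uint64_t sanitise_baser_page_size(uint64_t reg)
{
	/* Only ever advertise 64kB ITS pages, whatever the guest wrote. */
	reg &= ~BASER_PAGE_SIZE_MASK;
	reg |= BASER_PAGE_SIZE_64K;

	/*
	 * With 64kB pages, bits [15:12] carry PA bits [51:48]; clearing
	 * them limits the table address to a 48-bit PA.
	 */
	reg &= ~BASER_PA_64K_LOW_MASK;

	return reg;
}
```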

> 
>> It may not be of great importance right now, but it is going to be
>> really critical for save/restore, and we'd better get it right from the
>> beginning.
>>
>>> +
>>> +	*regptr = reg;
>>> +}
>>> +
>>>  #define REGISTER_ITS_DESC(off, rd, wr, length, acc)		\
>>>  {								\
>>>  	.reg_offset = off,					\
>>> @@ -42,8 +344,8 @@
>>>  	.its_write = wr,					\
>>>  }
>>>  
>>> -static unsigned long its_mmio_read_raz(struct kvm *kvm, struct vgic_its *its,
>>> -				       gpa_t addr, unsigned int len)
>>> +unsigned long its_mmio_read_raz(struct kvm *kvm, struct vgic_its *its,
>>> +				gpa_t addr, unsigned int len)
>>>  {
>>>  	return 0;
>>>  }
>>> @@ -56,28 +358,28 @@ static void its_mmio_write_wi(struct kvm *kvm, struct vgic_its *its,
>>>  
>>>  static struct vgic_register_region its_registers[] = {
>>>  	REGISTER_ITS_DESC(GITS_CTLR,
>>> -		its_mmio_read_raz, its_mmio_write_wi, 4,
>>> +		vgic_mmio_read_its_ctlr, vgic_mmio_write_its_ctlr, 4,
>>>  		VGIC_ACCESS_32bit),
>>>  	REGISTER_ITS_DESC(GITS_IIDR,
>>> -		its_mmio_read_raz, its_mmio_write_wi, 4,
>>> +		vgic_mmio_read_its_iidr, its_mmio_write_wi, 4,
>>>  		VGIC_ACCESS_32bit),
>>>  	REGISTER_ITS_DESC(GITS_TYPER,
>>> -		its_mmio_read_raz, its_mmio_write_wi, 8,
>>> +		vgic_mmio_read_its_typer, its_mmio_write_wi, 8,
>>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>>  	REGISTER_ITS_DESC(GITS_CBASER,
>>> -		its_mmio_read_raz, its_mmio_write_wi, 8,
>>> +		vgic_mmio_read_its_cbaser, vgic_mmio_write_its_cbaser, 8,
>>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>>  	REGISTER_ITS_DESC(GITS_CWRITER,
>>> -		its_mmio_read_raz, its_mmio_write_wi, 8,
>>> +		vgic_mmio_read_its_cwriter, vgic_mmio_write_its_cwriter, 8,
>>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>>  	REGISTER_ITS_DESC(GITS_CREADR,
>>> -		its_mmio_read_raz, its_mmio_write_wi, 8,
>>> +		vgic_mmio_read_its_creadr, its_mmio_write_wi, 8,
>>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>>  	REGISTER_ITS_DESC(GITS_BASER,
>>> -		its_mmio_read_raz, its_mmio_write_wi, 0x40,
>>> +		vgic_mmio_read_its_baser, vgic_mmio_write_its_baser, 0x40,
>>>  		VGIC_ACCESS_64bit | VGIC_ACCESS_32bit),
>>>  	REGISTER_ITS_DESC(GITS_IDREGS_BASE,
>>> -		its_mmio_read_raz, its_mmio_write_wi, 0x30,
>>> +		vgic_mmio_read_its_idregs, its_mmio_write_wi, 0x30,
>>>  		VGIC_ACCESS_32bit),
>>>  };
>>>  
>>> @@ -100,6 +402,18 @@ static int vgic_its_register(struct kvm *kvm, struct vgic_its *its)
>>>  	return ret;
>>>  }
>>>  
>>> +#define INITIAL_BASER_VALUE						  \
>>> +	(GIC_BASER_CACHEABILITY(GITS_BASER, INNER, RaWb)		| \
>>> +	 GIC_BASER_CACHEABILITY(GITS_BASER, OUTER, SameAsInner)		| \
>>> +	 GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)		| \
>>> +	 ((8ULL - 1) << GITS_BASER_ENTRY_SIZE_SHIFT)			| \
>>> +	 GITS_BASER_PAGE_SIZE_64K)
>>> +
>>> +#define INITIAL_PROPBASER_VALUE						  \
>>> +	(GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWb)		| \
>>> +	 GIC_BASER_CACHEABILITY(GICR_PROPBASER, OUTER, SameAsInner)	| \
>>> +	 GIC_BASER_SHAREABILITY(GICR_PROPBASER, InnerShareable))
>>> +
>>>  static int vgic_its_create(struct kvm_device *dev, u32 type)
>>>  {
>>>  	struct vgic_its *its;
>>> @@ -111,12 +425,25 @@ static int vgic_its_create(struct kvm_device *dev, u32 type)
>>>  	if (!its)
>>>  		return -ENOMEM;
>>>  
>>> +	mutex_init(&its->its_lock);
>>> +	mutex_init(&its->cmd_lock);
>>> +
>>>  	its->vgic_its_base = VGIC_ADDR_UNDEF;
>>>  
>>> +	INIT_LIST_HEAD(&its->device_list);
>>> +	INIT_LIST_HEAD(&its->collection_list);
>>> +
>>>  	dev->kvm->arch.vgic.has_its = true;
>>>  	its->initialized = false;
>>>  	its->enabled = false;
>>>  
>>> +	its->baser_device_table = INITIAL_BASER_VALUE			|
>>> +		((u64)GITS_BASER_TYPE_DEVICE << GITS_BASER_TYPE_SHIFT)	|
>>> +		GITS_BASER_INDIRECT;
>>
>> It is a bit odd to advertise the indirect flag as a reset value, but I
>> don't see anything that indicates it is not allowed...
> 
> I find it really confusing as to what fields are supposed to indicate
> support on reset and which are just taking part in that
> "write-and-see-if-it-sticks" game.

Reset values are usually "UNKNOWN".

> I take it now there are no requirements on the reset state and
> everything is negotiated via writing to the register?
> In this case I'd move the indirect indication from here to the write
> function above.

Please do.
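For the record, moving the Indirect indication from the reset value into the write path could look roughly like the sketch below. The names are invented for illustration and do not match the actual patch; the point is only that the device table lets the bit stick while the collection table filters it out:

```c
#include <stdint.h>

#define GITS_BASER_INDIRECT_BIT	(1ULL << 62)

/* Device table: an indirect (two-level) table may make sense, let it stick. */
static uint64_t sanitise_device_baser_sketch(uint64_t val)
{
	return val;
}

/* Collection table: flat ID space, so always report a flat table. */
static uint64_t sanitise_coll_baser_sketch(uint64_t val)
{
	return val & ~GITS_BASER_INDIRECT_BIT;
}
```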

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 12/17] KVM: arm64: connect LPIs to the VGIC emulation
  2016-07-05 11:23 ` [PATCH v8 12/17] KVM: arm64: connect LPIs to the VGIC emulation Andre Przywara
@ 2016-07-11 16:20   ` Marc Zyngier
  0 siblings, 0 replies; 49+ messages in thread
From: Marc Zyngier @ 2016-07-11 16:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/07/16 12:23, Andre Przywara wrote:
> LPIs are dynamically created (mapped) at guest runtime and their
> actual number can be quite high, but is mostly assigned using a very
> sparse allocation scheme. So arrays are not an ideal data structure
> to hold the information.
> We use a spin-lock protected linked list to hold all mapped LPIs,
> represented by their struct vgic_irq. This lock is grouped between the
> ap_list_lock and the vgic_irq lock in our locking order.
> Also we store a pointer to that struct vgic_irq in our struct its_itte,
> so we can easily access it.
> Eventually we call our new vgic_its_get_lpi() from vgic_get_irq(), so
> the VGIC code gets transparent access to LPIs.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  include/kvm/arm_vgic.h        |  6 ++++++
>  virt/kvm/arm/vgic/vgic-init.c |  3 +++
>  virt/kvm/arm/vgic/vgic-its.c  | 32 +++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic-v3.c   |  2 ++
>  virt/kvm/arm/vgic/vgic.c      | 48 +++++++++++++++++++++++++++++++++++--------
>  virt/kvm/arm/vgic/vgic.h      |  7 +++++++
>  6 files changed, 90 insertions(+), 8 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 17d3929..5aff85c 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -77,6 +77,7 @@ enum vgic_irq_config {
>  
>  struct vgic_irq {
>  	spinlock_t irq_lock;		/* Protects the content of the struct */
> +	struct list_head lpi_entry;	/* Used to link all LPIs together */

Maybe name that field consistently with the one that just follows?

>  	struct list_head ap_list;
>  
>  	struct kvm_vcpu *vcpu;		/* SGIs and PPIs: The VCPU
> @@ -185,6 +186,11 @@ struct vgic_dist {
>  	 * GICv3 spec: 6.1.2 "LPI Configuration tables"
>  	 */
>  	u64			propbaser;
> +
> +	/* Protects the lpi_list and the count value below. */
> +	spinlock_t		lpi_list_lock;
> +	struct list_head	lpi_list_head;
> +	int			lpi_list_count;
>  };
>  
>  struct vgic_v2_cpu_if {
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index ac3c1a5..535e713 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -157,6 +157,9 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
>  	struct kvm_vcpu *vcpu0 = kvm_get_vcpu(kvm, 0);
>  	int i;
>  
> +	INIT_LIST_HEAD(&dist->lpi_list_head);
> +	spin_lock_init(&dist->lpi_list_lock);
> +
>  	dist->spis = kcalloc(nr_spis, sizeof(struct vgic_irq), GFP_KERNEL);
>  	if (!dist->spis)
>  		return  -ENOMEM;
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index a9336a4..1e2e649 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -33,6 +33,31 @@
>  #include "vgic.h"
>  #include "vgic-mmio.h"
>  
> +/*
> + * Iterate over the VM's list of mapped LPIs to find the one with a
> + * matching interrupt ID and return a reference to the IRQ structure.
> + */
> +struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid)

Why is this in the ITS code? This shouldn't be made a first class API,
but instead kept close to the vgic_get_irq() code.

> +{
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	struct vgic_irq *irq = NULL;
> +
> +	spin_lock(&dist->lpi_list_lock);
> +	list_for_each_entry(irq, &dist->lpi_list_head, lpi_entry) {
> +		if (irq->intid != intid)
> +			continue;
> +
> +		kref_get(&irq->refcount);

Please add a comment stating that the refcount is incremented, and that
vgic_put_irq() is to be called to drop the reference. And moving the
function away (as well as making it static) will remove any doubt about
its use.
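The refcount contract being asked for can be sketched in plain userspace C like this (stand-in types only; the kernel version uses struct vgic_irq, kref_get() and the dist->lpi_list_lock spinlock):

```c
#include <stddef.h>

struct irq_stub {
	unsigned int intid;
	int refcount;
	struct irq_stub *next;
};

/*
 * Returns the matching entry with its refcount already raised; the
 * caller owns that reference and must drop it again (vgic_put_irq()
 * in the kernel) once it is done with the structure.
 */
static struct irq_stub *lookup_lpi(struct irq_stub *head, unsigned int intid)
{
	struct irq_stub *irq;

	/* the list lock would be taken here in the real code */
	for (irq = head; irq; irq = irq->next) {
		if (irq->intid == intid) {
			irq->refcount++;	/* kref_get() */
			return irq;
		}
	}
	/* ... and released here */
	return NULL;
}
```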

> +		goto out_unlock;
> +	}
> +	irq = NULL;
> +
> +out_unlock:
> +	spin_unlock(&dist->lpi_list_lock);
> +
> +	return irq;
> +}
> +
>  struct its_device {
>  	struct list_head dev_list;
>  
> @@ -56,11 +81,17 @@ struct its_collection {
>  struct its_itte {
>  	struct list_head itte_list;
>  
> +	struct vgic_irq *irq;
>  	struct its_collection *collection;
>  	u32 lpi;
>  	u32 event_id;
>  };
>  
> +/* To be used as an iterator this macro misses the enclosing parentheses */
> +#define for_each_lpi_its(dev, itte, its) \
> +	list_for_each_entry(dev, &(its)->device_list, dev_list) \
> +		list_for_each_entry(itte, &(dev)->itt_head, itte_list)

Where is this macro used? Please move it in the appropriate patch.

> +
>  #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
>  
>  static unsigned long vgic_mmio_read_its_ctlr(struct kvm *vcpu,
> @@ -144,6 +175,7 @@ static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
>  static void its_free_itte(struct kvm *kvm, struct its_itte *itte)
>  {
>  	list_del(&itte->itte_list);
> +	vgic_put_irq(kvm, itte->irq);

Who does the "get"? Where is it populated?

>  	kfree(itte);
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 6f8f31f..0506543 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -81,6 +81,8 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>  		else
>  			intid = val & GICH_LR_VIRTUALID;
>  		irq = vgic_get_irq(vcpu->kvm, vcpu, intid);
> +		if (!irq)	/* An LPI could have been unmapped. */
> +			continue;
>  
>  		spin_lock(&irq->irq_lock);
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index a5d9a10..72b2516 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -36,7 +36,8 @@ struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
>   * its->cmd_lock (mutex)
>   *   its->its_lock (mutex)
>   *     vgic_cpu->ap_list_lock
> - *       vgic_irq->irq_lock
> + *       kvm->lpi_list_lock
> + *         vgic_irq->irq_lock
>   *
>   * If you need to take multiple locks, always take the upper lock first,
>   * then the lower ones, e.g. first take the its_lock, then the irq_lock.
> @@ -69,23 +70,54 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  		return irq;
>  	}
>  
> -	/* LPIs are not yet covered */
> -	if (intid >= VGIC_MIN_LPI)
> +	if (intid < VGIC_MIN_LPI) {
> +		WARN(1, "Looking up struct vgic_irq for reserved INTID");
>  		return NULL;
> +	}
>  
> -	WARN(1, "Looking up struct vgic_irq for reserved INTID");
> -	return NULL;
> +	/* LPIs */
> +	return vgic_its_get_lpi(kvm, intid);
>  }
>  
> -/* The refcount should never drop to 0 at the moment. */
> +/*
> + * We can't do anything in here, because we lack the kvm pointer to
> + * lock and remove the item from the lpi_list. So we keep this function
> + * empty and use the return value of kref_put() to trigger the freeing.
> + */
>  static void vgic_irq_release(struct kref *ref)
>  {
> -	WARN_ON(1);
> +}
> +
> +static void __vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq, bool locked)
> +{
> +	struct vgic_dist *dist;
> +
> +	if (!kref_put(&irq->refcount, vgic_irq_release))
> +		return;
> +
> +	if (irq->intid < VGIC_MIN_LPI)
> +		return;
> +
> +	dist = &kvm->arch.vgic;
> +
> +	if (!locked)
> +		spin_lock(&dist->lpi_list_lock);
> +	list_del(&irq->lpi_entry);
> +	dist->lpi_list_count--;
> +	if (!locked)
> +		spin_unlock(&dist->lpi_list_lock);
> +
> +	kfree(irq);
> +}
> +
> +void vgic_put_irq_locked(struct kvm *kvm, struct vgic_irq *irq)
> +{
> +	__vgic_put_irq(kvm, irq, true);
>  }
>  
>  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>  {
> -	kref_put(&irq->refcount, vgic_irq_release);
> +	__vgic_put_irq(kvm, irq, false);
>  }

The usual idiom in the kernel is to have __vgic_put_irq() to always work
on locked irqs, and vgic_put_irq() to explicitly take the lock (and call
__vgic_put_irq). Please follow that rule, as it helps people unfamiliar
with the code to understand it more easily.

Also, the "locked" version isn't used in this patch. Maybe move it where
it is actually used?
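The idiom being described is roughly the following, shown with trivial userspace stand-ins (the real code uses spin_lock()/spin_unlock() on dist->lpi_list_lock): the double-underscore variant assumes the lock is already held, and the plain variant takes and drops it around a call to the former.

```c
/* trivial lock stand-in, purely for illustration */
struct dist_stub {
	int lpi_list_lock;
};

static void __put_irq(struct dist_stub *dist, int *refcount)
{
	/* precondition: dist->lpi_list_lock is held by the caller */
	(*refcount)--;
}

static void put_irq(struct dist_stub *dist, int *refcount)
{
	dist->lpi_list_lock = 1;	/* spin_lock() in the kernel */
	__put_irq(dist, refcount);
	dist->lpi_list_lock = 0;	/* spin_unlock() */
}
```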

>  
>  /**
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index 9dc7207..eef9ec1 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -39,6 +39,7 @@ struct vgic_vmcr {
>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  			      u32 intid);
>  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
> +void vgic_put_irq_locked(struct kvm *kvm, struct vgic_irq *irq);
>  bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
>  void vgic_kick_vcpus(struct kvm *kvm);
>  
> @@ -77,6 +78,7 @@ int vgic_v3_map_resources(struct kvm *kvm);
>  int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t dist_base_address);
>  bool vgic_has_its(struct kvm *kvm);
>  int kvm_vgic_register_its_device(void);
> +struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid);
>  #else
>  static inline void vgic_v3_process_maintenance(struct kvm_vcpu *vcpu)
>  {
> @@ -138,6 +140,11 @@ static inline int kvm_vgic_register_its_device(void)
>  {
>  	return -ENODEV;
>  }
> +
> +static inline struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid)
> +{
> +	return NULL;
> +}
>  #endif
>  
>  int kvm_register_vgic_device(unsigned long type);
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* [PATCH v8 13/17] KVM: arm64: read initial LPI pending table
  2016-07-05 11:23 ` [PATCH v8 13/17] KVM: arm64: read initial LPI pending table Andre Przywara
@ 2016-07-11 16:50   ` Marc Zyngier
  2016-07-11 17:38     ` Andre Przywara
  2016-07-12 11:33     ` Andre Przywara
  0 siblings, 2 replies; 49+ messages in thread
From: Marc Zyngier @ 2016-07-11 16:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/07/16 12:23, Andre Przywara wrote:
> The LPI pending status for a GICv3 redistributor is held in a table
> in (guest) memory. To achieve reasonable performance, we cache this
> data in our struct vgic_irq. The initial pending state must be read
> from guest memory upon enabling LPIs for this redistributor.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 81 ++++++++++++++++++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic.h     |  6 ++++
>  2 files changed, 87 insertions(+)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index 1e2e649..29bb4fe 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -93,6 +93,81 @@ struct its_itte {
>  		list_for_each_entry(itte, &(dev)->itt_head, itte_list)
>  
>  #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
> +#define PENDBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 16))

52 bits again. Pick a side!

> +
> +static int vgic_its_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
> +{
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	struct vgic_irq *irq;
> +	u32 *intids;
> +	int irq_count = dist->lpi_list_count, i = 0;
> +
> +	/*
> +	 * We use the current value of the list length, which may change
> +	 * after the kmalloc. We don't care, because the guest shouldn't
> +	 * change anything while the command handling is still running,
> +	 * and in the worst case we would miss a new IRQ, which one wouldn't
> +	 * expect to be covered by this command anyway.
> +	 */
> +	intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL);
> +	if (!intids)
> +		return -ENOMEM;
> +
> +	spin_lock(&dist->lpi_list_lock);
> +	list_for_each_entry(irq, &dist->lpi_list_head, lpi_entry) {
> +		if (kref_get_unless_zero(&irq->refcount)) {
> +			intids[i] = irq->intid;
> +			vgic_put_irq_locked(kvm, irq);

This is ugly. You know you're not going to free the irq, since its
refcount was at least one when you did kref_get_unless_zero(). Why not
do a simple
kref_put (possibly in a macro so that you can hide the dummy release
function)?
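The suggested pattern amounts to something like this, with invented userspace stand-ins for kref_get_unless_zero() and kref_put(): the snapshot loop raises the count itself, so the matching put can never be the one that drops it to zero, and a no-op release is sufficient.

```c
struct kref_stub {
	int count;
};

/* takes a reference and returns nonzero, unless the count already hit zero */
static int kref_get_unless_zero_stub(struct kref_stub *k)
{
	if (k->count == 0)
		return 0;
	k->count++;
	return 1;
}

/*
 * Matching put for the snapshot loop: we just took a reference above,
 * so the count cannot reach zero here and no unlocked free path (or
 * vgic_put_irq_locked()) is needed; the release callback stays a dummy.
 */
static void kref_put_stub(struct kref_stub *k)
{
	k->count--;
}
```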

> +		}
> +		if (i++ == irq_count)
> +			break;
> +	}
> +	spin_unlock(&dist->lpi_list_lock);
> +
> +	*intid_ptr = intids;
> +	return irq_count;
> +}
> +
> +/*
> + * Scan the whole LPI pending table and sync the pending bit in there
> + * with our own data structures. This relies on the LPI being
> + * mapped before.
> + */
> +static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
> +{
> +	gpa_t pendbase = PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
> +	struct vgic_irq *irq;
> +	u8 pendmask;
> +	int ret = 0;
> +	u32 *intids;
> +	int nr_irqs, i;
> +
> +	nr_irqs = vgic_its_copy_lpi_list(vcpu->kvm, &intids);
> +	if (nr_irqs < 0)
> +		return nr_irqs;
> +
> +	for (i = 0; i < nr_irqs; i++) {
> +		int byte_offset, bit_nr;
> +
> +		byte_offset = intids[i] / BITS_PER_BYTE;
> +		bit_nr = intids[i] % BITS_PER_BYTE;
> +
> +		ret = kvm_read_guest(vcpu->kvm, pendbase + byte_offset,
> +				     &pendmask, 1);

How about having a small cache of the last read offset and data? If LPIs
are contiguously allocated, you save yourself quite a few (expensive)
userspace accesses.
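A sketch of that caching, in userspace C with a flat array standing in for guest memory (read_guest_byte() is an invented stand-in for kvm_read_guest()): since eight LPIs share one pending-table byte, contiguously allocated intids mostly hit the cached byte.

```c
#include <stdint.h>

static int read_guest_byte(const uint8_t *table, uint64_t off, uint8_t *out)
{
	*out = table[off];	/* kvm_read_guest() in the kernel */
	return 0;
}

static int sync_pending(const uint8_t *table, const uint32_t *intids,
			int nr, uint8_t *pending_out)
{
	int64_t last_off = -1;
	uint8_t byte = 0;
	int i;

	for (i = 0; i < nr; i++) {
		int64_t off = intids[i] / 8;
		int bit = intids[i] % 8;

		/* only access guest memory when moving to a new byte */
		if (off != last_off) {
			if (read_guest_byte(table, off, &byte))
				return -1;
			last_off = off;
		}
		pending_out[i] = !!(byte & (1u << bit));
	}
	return 0;
}
```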

> +		if (ret) {
> +			kfree(intids);
> +			return ret;
> +		}
> +
> +		irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
> +		spin_lock(&irq->irq_lock);
> +		irq->pending = pendmask & (1U << bit_nr);
> +		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +		vgic_put_irq(vcpu->kvm, irq);
> +	}
> +
> +	return ret;
> +}
>  
>  static unsigned long vgic_mmio_read_its_ctlr(struct kvm *vcpu,
>  					     struct vgic_its *its,
> @@ -415,6 +490,12 @@ static struct vgic_register_region its_registers[] = {
>  		VGIC_ACCESS_32bit),
>  };
>  
> +/* This is called on setting the LPI enable bit in the redistributor. */
> +void vgic_enable_lpis(struct kvm_vcpu *vcpu)
> +{
> +	its_sync_lpi_pending_table(vcpu);
> +}
> +
>  static int vgic_its_register(struct kvm *kvm, struct vgic_its *its)
>  {
>  	struct vgic_io_device *iodev = &its->iodev;
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index eef9ec1..4a9165f 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -25,6 +25,7 @@
>  #define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == VGIC_ADDR_UNDEF)
>  
>  #define INTERRUPT_ID_BITS_SPIS	10
> +#define INTERRUPT_ID_BITS_ITS	16

Do we have a plan for a userspace-accessible property for this? I can
imagine userspace willing to have bigger LPI space...

>  #define VGIC_PRI_BITS		5
>  
>  #define vgic_irq_is_sgi(intid) ((intid) < VGIC_NR_SGIS)
> @@ -79,6 +80,7 @@ int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t dist_base_address);
>  bool vgic_has_its(struct kvm *kvm);
>  int kvm_vgic_register_its_device(void);
>  struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid);
> +void vgic_enable_lpis(struct kvm_vcpu *vcpu);
>  #else
>  static inline void vgic_v3_process_maintenance(struct kvm_vcpu *vcpu)
>  {
> @@ -145,6 +147,10 @@ static inline struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid)
>  {
>  	return NULL;
>  }
> +
> +static inline void vgic_enable_lpis(struct kvm_vcpu *vcpu)
> +{
> +}
>  #endif
>  
>  int kvm_register_vgic_device(unsigned long type);
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* [PATCH v8 14/17] KVM: arm64: allow updates of LPI configuration table
  2016-07-05 11:23 ` [PATCH v8 14/17] KVM: arm64: allow updates of LPI configuration table Andre Przywara
@ 2016-07-11 16:59   ` Marc Zyngier
  0 siblings, 0 replies; 49+ messages in thread
From: Marc Zyngier @ 2016-07-11 16:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/07/16 12:23, Andre Przywara wrote:
> The (system-wide) LPI configuration table is held in a table in
> (guest) memory. To achieve reasonable performance, we cache this data
> in our struct vgic_irq. If the guest updates the configuration data
> (which consists of the enable bit and the priority value), it issues
> an INV or INVALL command to allow us to update our information.
> Provide functions that update that information for one LPI or all LPIs
> mapped to a specific collection.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index 29bb4fe..5de71bd 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -94,6 +94,51 @@ struct its_itte {
>  
>  #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
>  #define PENDBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 16))
> +#define PROPBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))

52 bits...

> +
> +#define GIC_LPI_OFFSET 8192
> +
> +#define LPI_PROP_ENABLE_BIT(p)	((p) & LPI_PROP_ENABLED)
> +#define LPI_PROP_PRIORITY(p)	((p) & 0xfc)
> +
> +/*
> + * Reads the configuration data for a given LPI from guest memory and
> + * updates the fields in struct vgic_irq.
> + * If filter_vcpu is not NULL, applies only if the IRQ is targeting this
> + * VCPU. Unconditionally applies if filter_vcpu is NULL.
> + */
> +static int update_lpi_config_filtered(struct kvm *kvm, struct vgic_irq *irq,
> +				      struct kvm_vcpu *filter_vcpu)
> +{
> +	u64 propbase = PROPBASER_ADDRESS(kvm->arch.vgic.propbaser);
> +	u8 prop;
> +	int ret;
> +
> +	ret = kvm_read_guest(kvm, propbase + irq->intid - GIC_LPI_OFFSET,
> +			     &prop, 1);
> +
> +	if (ret)
> +		return ret;
> +
> +	spin_lock(&irq->irq_lock);
> +
> +	if (!filter_vcpu || filter_vcpu == irq->target_vcpu) {
> +		irq->priority = LPI_PROP_PRIORITY(prop);
> +		irq->enabled = LPI_PROP_ENABLE_BIT(prop);
> +
> +		vgic_queue_irq_unlock(kvm, irq);
> +	} else {
> +		spin_unlock(&irq->irq_lock);
> +	}
> +
> +	return 0;
> +}
> +
> +/* Updates the priority and enable bit for a given LPI. */
> +int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq)

static?

> +{
> +	return update_lpi_config_filtered(kvm, irq, NULL);
> +}

I think you can drop the "_filtered" thing, and just have an
update_lpi_config() that takes a vcpu parameter. The comment at the top
is clear enough about the use case.
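The suggested interface boils down to this (a sketch with invented stand-in types, not the actual kernel structures): one function taking an optional vcpu, where NULL means "apply unconditionally".

```c
#include <stddef.h>

struct vcpu_stub {
	int id;
};

struct lpi_irq_stub {
	struct vcpu_stub *target_vcpu;
	int enabled;
};

/*
 * Applies the new enable state only if the IRQ targets filter_vcpu;
 * a NULL filter_vcpu applies it unconditionally.
 */
static int update_lpi_config_sketch(struct lpi_irq_stub *irq, int enabled,
				    struct vcpu_stub *filter_vcpu)
{
	if (!filter_vcpu || filter_vcpu == irq->target_vcpu)
		irq->enabled = enabled;
	return 0;
}
```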

>  
>  static int vgic_its_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
>  {
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* [PATCH v8 15/17] KVM: arm64: implement ITS command queue command handlers
  2016-07-05 11:23 ` [PATCH v8 15/17] KVM: arm64: implement ITS command queue command handlers Andre Przywara
@ 2016-07-11 17:17   ` Marc Zyngier
  2016-07-11 17:47     ` Andre Przywara
  0 siblings, 1 reply; 49+ messages in thread
From: Marc Zyngier @ 2016-07-11 17:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/07/16 12:23, Andre Przywara wrote:
> The connection between a device, an event ID, the LPI number and the
> allocated CPU is stored in in-memory tables in a GICv3, but their
> format is not specified by the spec. Instead software uses a command
> queue in a ring buffer to let the ITS implementation use their own
> format.
> Implement handlers for the various ITS commands and let them store
> the requested relation into our own data structures. Those data
> structures are protected by the its_lock mutex.
> Our internal ring buffer read and write pointers are protected by the
> its_cmd mutex, so that at most one VCPU per ITS can handle commands at
> any given time.
> Error handling is very basic at the moment, as we don't have a good
> way of communicating errors to the guest (usually a SError).
> The INT command handler is missing at this point, as we gain the
> capability of actually injecting MSIs into the guest only later on.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 609 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 605 insertions(+), 4 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index 5de71bd..432daed 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -58,6 +58,43 @@ out_unlock:
>  	return irq;
>  }
>  
> +/*
> + * Creates a new (reference to a) struct vgic_irq for a given LPI.
> + * If this LPI is already mapped on another ITS, we increase its refcount
> + * and return a pointer to the existing structure.
> + * If this is a "new" LPI, we allocate and initialize a new struct vgic_irq.
> + * This function returns a pointer to the _unlocked_ structure.
> + */
> +static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid)
> +{
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	struct vgic_irq *irq = vgic_its_get_lpi(kvm, intid);

So this thing doesn't return with any lock held...

> +
> +	/* In this case there is no put, since we keep the reference. */
> +	if (irq)
> +		return irq;
> +
> +	irq = kzalloc(sizeof(struct vgic_irq), GFP_KERNEL);
> +
> +	if (!irq)
> +		return NULL;
> +
> +	INIT_LIST_HEAD(&irq->lpi_entry);
> +	INIT_LIST_HEAD(&irq->ap_list);
> +	spin_lock_init(&irq->irq_lock);
> +
> +	irq->config = VGIC_CONFIG_EDGE;
> +	kref_init(&irq->refcount);
> +	irq->intid = intid;

which means that two callers can allocate their own irq structure...

> +
> +	spin_lock(&dist->lpi_list_lock);
> +	list_add_tail(&irq->lpi_entry, &dist->lpi_list_head);
> +	dist->lpi_list_count++;
> +	spin_unlock(&dist->lpi_list_lock);

and insert it. Not too bad if they are different LPIs, but leading to
Armageddon if they are the same. You absolutely need to check for the
presence of the interrupt in this list *while holding the lock*.
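A userspace sketch of the race-free shape (allocate first, then re-check for the intid while the list lock is held, and throw the new allocation away if someone beat us to it). The types and the lock comments are stand-ins, not the kernel code:

```c
#include <stdlib.h>

struct lpi {
	unsigned int intid;
	struct lpi *next;
};

static struct lpi *add_lpi(struct lpi **head, unsigned int intid)
{
	struct lpi *irq, *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	new->intid = intid;

	/* spin_lock(&dist->lpi_list_lock) in the kernel */
	for (irq = *head; irq; irq = irq->next) {
		if (irq->intid == intid) {
			/* lost the race: reuse the existing entry */
			free(new);
			/* spin_unlock() */
			return irq;
		}
	}
	new->next = *head;
	*head = new;
	/* spin_unlock() */
	return new;
}
```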

> +
> +	return irq;
> +}
> +
>  struct its_device {
>  	struct list_head dev_list;
>  
> @@ -87,6 +124,43 @@ struct its_itte {
>  	u32 event_id;
>  };
>  
> +/*
> + * Find and returns a device in the device table for an ITS.
> + * Must be called with the its_lock held.
> + */
> +static struct its_device *find_its_device(struct vgic_its *its, u32 device_id)
> +{
> +	struct its_device *device;
> +
> +	list_for_each_entry(device, &its->device_list, dev_list)
> +		if (device_id == device->device_id)
> +			return device;
> +
> +	return NULL;
> +}
> +
> +/*
> + * Find and returns an interrupt translation table entry (ITTE) for a given
> + * Device ID/Event ID pair on an ITS.
> + * Must be called with the its_lock held.
> + */
> +static struct its_itte *find_itte(struct vgic_its *its, u32 device_id,
> +				  u32 event_id)
> +{
> +	struct its_device *device;
> +	struct its_itte *itte;
> +
> +	device = find_its_device(its, device_id);
> +	if (device == NULL)
> +		return NULL;
> +
> +	list_for_each_entry(itte, &device->itt_head, itte_list)
> +		if (itte->event_id == event_id)
> +			return itte;
> +
> +	return NULL;
> +}
> +
>  /* To be used as an iterator this macro misses the enclosing parentheses */
>  #define for_each_lpi_its(dev, itte, its) \
>  	list_for_each_entry(dev, &(its)->device_list, dev_list) \
> @@ -98,6 +172,22 @@ struct its_itte {
>  
>  #define GIC_LPI_OFFSET 8192
>  
> +/*
> + * Finds and returns a collection in the ITS collection table.
> + * Must be called with the its_lock held.
> + */
> +static struct its_collection *find_collection(struct vgic_its *its, int coll_id)
> +{
> +	struct its_collection *collection;
> +
> +	list_for_each_entry(collection, &its->collection_list, coll_list) {
> +		if (coll_id == collection->collection_id)
> +			return collection;
> +	}
> +
> +	return NULL;
> +}
> +
>  #define LPI_PROP_ENABLE_BIT(p)	((p) & LPI_PROP_ENABLED)
>  #define LPI_PROP_PRIORITY(p)	((p) & 0xfc)
>  
> @@ -135,7 +225,7 @@ static int update_lpi_config_filtered(struct kvm *kvm, struct vgic_irq *irq,
>  }
>  
>  /* Updates the priority and enable bit for a given LPI. */
> -int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq)
> +static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq)
>  {
>  	return update_lpi_config_filtered(kvm, irq, NULL);
>  }
> @@ -174,6 +264,48 @@ static int vgic_its_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
>  }
>  
>  /*
> + * Promotes the ITS view of affinity of an ITTE (which redistributor this LPI
> + * is targeting) to the VGIC's view, which deals with target VCPUs.
> + * Needs to be called whenever either the collection for a LPIs has
> + * changed or the collection itself got retargeted.
> + */
> +static void update_affinity_itte(struct kvm *kvm, struct its_itte *itte)
> +{
> +	struct kvm_vcpu *vcpu;
> +
> +	vcpu = kvm_get_vcpu(kvm, itte->collection->target_addr);
> +
> +	spin_lock(&itte->irq->irq_lock);
> +	itte->irq->target_vcpu = vcpu;
> +	spin_unlock(&itte->irq->irq_lock);
> +}
> +
> +/*
> + * Updates the target VCPU for every LPI targeting this collection.
> + * Must be called with the its_lock held.
> + */
> +static void update_affinity_collection(struct kvm *kvm, struct vgic_its *its,
> +				       struct its_collection *coll)
> +{
> +	struct its_device *device;
> +	struct its_itte *itte;
> +
> +	for_each_lpi_its(device, itte, its) {
> +		if (!itte->collection || coll != itte->collection)
> +			continue;
> +
> +		update_affinity_itte(kvm, itte);
> +	}
> +}
> +
> +static u32 max_lpis_propbaser(u64 propbaser)
> +{
> +	int nr_idbits = (propbaser & 0x1f) + 1;
> +
> +	return 1U << min(nr_idbits, INTERRUPT_ID_BITS_ITS);
> +}
> +
> +/*
>   * Scan the whole LPI pending table and sync the pending bit in there
>   * with our own data structures. This relies on the LPI being
>   * mapped before.
> @@ -299,10 +431,479 @@ static void its_free_itte(struct kvm *kvm, struct its_itte *itte)
>  	kfree(itte);
>  }
>  
> -static int vits_handle_command(struct kvm *kvm, struct vgic_its *its,
> +static u64 its_cmd_mask_field(u64 *its_cmd, int word, int shift, int size)
> +{
> +	return (le64_to_cpu(its_cmd[word]) >> shift) & (BIT_ULL(size) - 1);
> +}
> +
> +#define its_cmd_get_command(cmd)	its_cmd_mask_field(cmd, 0,  0,  8)
> +#define its_cmd_get_deviceid(cmd)	its_cmd_mask_field(cmd, 0, 32, 32)
> +#define its_cmd_get_id(cmd)		its_cmd_mask_field(cmd, 1,  0, 32)
> +#define its_cmd_get_physical_id(cmd)	its_cmd_mask_field(cmd, 1, 32, 32)
> +#define its_cmd_get_collection(cmd)	its_cmd_mask_field(cmd, 2,  0, 16)
> +#define its_cmd_get_target_addr(cmd)	its_cmd_mask_field(cmd, 2, 16, 32)
> +#define its_cmd_get_validbit(cmd)	its_cmd_mask_field(cmd, 2, 63,  1)
> +
> +/* The DISCARD command frees an Interrupt Translation Table Entry (ITTE). */
> +static int vgic_its_cmd_handle_discard(struct kvm *kvm, struct vgic_its *its,
> +				   u64 *its_cmd)
> +{
> +	u32 device_id;
> +	u32 event_id;
> +	struct its_itte *itte;
> +	int ret = E_ITS_DISCARD_UNMAPPED_INTERRUPT;
> +
> +	device_id = its_cmd_get_deviceid(its_cmd);
> +	event_id = its_cmd_get_id(its_cmd);
> +
> +	mutex_lock(&its->its_lock);
> +	itte = find_itte(its, device_id, event_id);
> +	if (itte && itte->collection) {
> +		/*
> +		 * Though the spec talks about removing the pending state, we
> +		 * don't bother here since we clear the ITTE anyway and the
> +		 * pending state is a property of the ITTE struct.
> +		 */
> +		its_free_itte(kvm, itte);
> +		ret = 0;
> +	}
> +
> +	mutex_unlock(&its->its_lock);
> +	return ret;
> +}
> +
> +/* The MOVI command moves an ITTE to a different collection. */
> +static int vgic_its_cmd_handle_movi(struct kvm *kvm, struct vgic_its *its,
> +				u64 *its_cmd)
> +{
> +	u32 device_id = its_cmd_get_deviceid(its_cmd);
> +	u32 event_id = its_cmd_get_id(its_cmd);
> +	u32 coll_id = its_cmd_get_collection(its_cmd);
> +	struct kvm_vcpu *vcpu;
> +	struct its_itte *itte;
> +	struct its_collection *collection;
> +	int ret = 0;
> +
> +	mutex_lock(&its->its_lock);
> +	itte = find_itte(its, device_id, event_id);
> +	if (!itte) {
> +		ret = E_ITS_MOVI_UNMAPPED_INTERRUPT;
> +		goto out_unlock;
> +	}
> +	if (!its_is_collection_mapped(itte->collection)) {
> +		ret = E_ITS_MOVI_UNMAPPED_COLLECTION;
> +		goto out_unlock;
> +	}
> +
> +	collection = find_collection(its, coll_id);
> +	if (!its_is_collection_mapped(collection)) {
> +		ret = E_ITS_MOVI_UNMAPPED_COLLECTION;
> +		goto out_unlock;
> +	}
> +
> +	itte->collection = collection;
> +	vcpu = kvm_get_vcpu(kvm, collection->target_addr);
> +
> +	spin_lock(&itte->irq->irq_lock);
> +	itte->irq->target_vcpu = vcpu;
> +	spin_unlock(&itte->irq->irq_lock);
> +
> +out_unlock:
> +	mutex_unlock(&its->its_lock);
> +	return ret;
> +}
> +
> +static void vgic_its_init_collection(struct vgic_its *its,
> +				 struct its_collection *collection,
> +				 u32 coll_id)
> +{
> +	collection->collection_id = coll_id;
> +	collection->target_addr = COLLECTION_NOT_MAPPED;
> +
> +	list_add_tail(&collection->coll_list, &its->collection_list);
> +}
> +
> +/* The MAPTI and MAPI commands map LPIs to ITTEs. */
> +static int vgic_its_cmd_handle_mapi(struct kvm *kvm, struct vgic_its *its,
> +				u64 *its_cmd, u8 subcmd)
> +{
> +	u32 device_id = its_cmd_get_deviceid(its_cmd);
> +	u32 event_id = its_cmd_get_id(its_cmd);
> +	u32 coll_id = its_cmd_get_collection(its_cmd);
> +	struct its_itte *itte;
> +	struct its_device *device;
> +	struct its_collection *collection, *new_coll = NULL;
> +	int lpi_nr;
> +	int ret = 0;
> +
> +	mutex_lock(&its->its_lock);
> +
> +	device = find_its_device(its, device_id);
> +	if (!device) {
> +		ret = E_ITS_MAPTI_UNMAPPED_DEVICE;
> +		goto out_unlock;
> +	}
> +
> +	collection = find_collection(its, coll_id);
> +	if (!collection) {
> +		new_coll = kzalloc(sizeof(struct its_collection), GFP_KERNEL);
> +		if (!new_coll) {
> +			ret = -ENOMEM;
> +			goto out_unlock;
> +		}
> +	}
> +
> +	if (subcmd == GITS_CMD_MAPTI)
> +		lpi_nr = its_cmd_get_physical_id(its_cmd);
> +	else
> +		lpi_nr = event_id;
> +	if (lpi_nr < GIC_LPI_OFFSET ||
> +	    lpi_nr >= max_lpis_propbaser(kvm->arch.vgic.propbaser)) {
> +		kfree(new_coll);
> +		ret = E_ITS_MAPTI_PHYSICALID_OOR;
> +		goto out_unlock;
> +	}
> +
> +	itte = find_itte(its, device_id, event_id);
> +	if (!itte) {
> +		itte = kzalloc(sizeof(struct its_itte), GFP_KERNEL);
> +		if (!itte) {
> +			kfree(new_coll);
> +			ret = -ENOMEM;
> +			goto out_unlock;
> +		}
> +
> +		itte->event_id	= event_id;
> +		list_add_tail(&itte->itte_list, &device->itt_head);
> +	}
> +
> +	if (!collection) {
> +		collection = new_coll;
> +		vgic_its_init_collection(its, collection, coll_id);
> +	}
> +
> +	itte->collection = collection;
> +	itte->lpi = lpi_nr;
> +	itte->irq = vgic_add_lpi(kvm, lpi_nr);
> +	update_affinity_itte(kvm, itte);
> +
> +	/*
> +	 * We "cache" the configuration table entries in our struct vgic_irq's.
> +	 * However we only have those structs for mapped IRQs, so we read in
> +	 * the respective config data from memory here upon mapping the LPI.
> +	 */
> +	update_lpi_config(kvm, itte->irq);
> +
> +out_unlock:
> +	mutex_unlock(&its->its_lock);
> +
> +	return ret;
> +}
> +
> +/* Requires the its_lock to be held. */
> +static void vgic_its_unmap_device(struct kvm *kvm, struct its_device *device)
> +{
> +	struct its_itte *itte, *temp;
> +
> +	/*
> +	 * The spec says that unmapping a device with still valid
> +	 * ITTEs associated is UNPREDICTABLE. We remove all ITTEs,
> +	 * since we cannot leave the memory unreferenced.
> +	 */
> +	list_for_each_entry_safe(itte, temp, &device->itt_head, itte_list)
> +		its_free_itte(kvm, itte);
> +
> +	list_del(&device->dev_list);
> +	kfree(device);
> +}
> +
> +/* MAPD maps or unmaps a device ID to Interrupt Translation Tables (ITTs). */
> +static int vgic_its_cmd_handle_mapd(struct kvm *kvm, struct vgic_its *its,
> +				u64 *its_cmd)
> +{
> +	bool valid = its_cmd_get_validbit(its_cmd);
> +	u32 device_id = its_cmd_get_deviceid(its_cmd);
> +	struct its_device *device;
> +	int ret = 0;
> +
> +	mutex_lock(&its->its_lock);
> +
> +	device = find_its_device(its, device_id);
> +	if (device)
> +		vgic_its_unmap_device(kvm, device);
> +
> +	/*
> +	 * The spec does not say whether unmapping a not-mapped device
> +	 * is an error, so we are done in any case.
> +	 */
> +	if (!valid)
> +		goto out_unlock;
> +
> +	device = kzalloc(sizeof(struct its_device), GFP_KERNEL);
> +	if (!device) {
> +		ret = -ENOMEM;
> +		goto out_unlock;
> +	}
> +
> +	device->device_id = device_id;
> +	INIT_LIST_HEAD(&device->itt_head);
> +
> +	list_add_tail(&device->dev_list, &its->device_list);
> +
> +out_unlock:
> +	mutex_unlock(&its->its_lock);
> +	return ret;
> +}
> +
> +/* The MAPC command maps collection IDs to redistributors. */
> +static int vgic_its_cmd_handle_mapc(struct kvm *kvm, struct vgic_its *its,
> +				u64 *its_cmd)
> +{
> +	u16 coll_id;
> +	u32 target_addr;
> +	struct its_collection *collection;
> +	bool valid;
> +	int ret = 0;
> +
> +	valid = its_cmd_get_validbit(its_cmd);
> +	coll_id = its_cmd_get_collection(its_cmd);
> +	target_addr = its_cmd_get_target_addr(its_cmd);
> +
> +	if (target_addr >= atomic_read(&kvm->online_vcpus))
> +		return E_ITS_MAPC_PROCNUM_OOR;
> +
> +	mutex_lock(&its->its_lock);
> +
> +	collection = find_collection(its, coll_id);
> +
> +	if (!valid) {
> +		struct its_device *device;
> +		struct its_itte *itte;
> +		/*
> +		 * Clearing the mapping for that collection ID removes the
> +		 * entry from the list. If there wasn't any before, we can
> +		 * go home early.
> +		 */
> +		if (!collection)
> +			goto out_unlock;
> +
> +		for_each_lpi_its(device, itte, its)
> +			if (itte->collection &&
> +			    itte->collection->collection_id == coll_id)
> +				itte->collection = NULL;
> +
> +		list_del(&collection->coll_list);
> +		kfree(collection);
> +	} else {
> +		if (!collection) {
> +			collection = kzalloc(sizeof(struct its_collection),
> +					     GFP_KERNEL);
> +			if (!collection) {
> +				ret = -ENOMEM;
> +				goto out_unlock;
> +			}
> +
> +			vgic_its_init_collection(its, collection, coll_id);
> +			collection->target_addr = target_addr;
> +		} else {
> +			collection->target_addr = target_addr;
> +			update_affinity_collection(kvm, its, collection);
> +		}
> +	}
> +
> +out_unlock:
> +	mutex_unlock(&its->its_lock);
> +
> +	return ret;
> +}
> +
> +/* The CLEAR command removes the pending state for a particular LPI. */
> +static int vgic_its_cmd_handle_clear(struct kvm *kvm, struct vgic_its *its,
> +				 u64 *its_cmd)
> +{
> +	u32 device_id;
> +	u32 event_id;
> +	struct its_itte *itte;
> +	int ret = 0;
> +
> +	device_id = its_cmd_get_deviceid(its_cmd);
> +	event_id = its_cmd_get_id(its_cmd);
> +
> +	mutex_lock(&its->its_lock);
> +
> +	itte = find_itte(its, device_id, event_id);
> +	if (!itte) {
> +		ret = E_ITS_CLEAR_UNMAPPED_INTERRUPT;
> +		goto out_unlock;
> +	}
> +
> +	itte->irq->pending = false;
> +
> +out_unlock:
> +	mutex_unlock(&its->its_lock);
> +	return ret;
> +}
> +
> +/* The INV command syncs the configuration bits from the memory table. */
> +static int vgic_its_cmd_handle_inv(struct kvm *kvm, struct vgic_its *its,
> +			       u64 *its_cmd)
> +{
> +	u32 device_id;
> +	u32 event_id;
> +	struct its_itte *itte;
> +	int ret;
> +
> +	device_id = its_cmd_get_deviceid(its_cmd);
> +	event_id = its_cmd_get_id(its_cmd);
> +
> +	mutex_lock(&its->its_lock);
> +
> +	itte = find_itte(its, device_id, event_id);
> +	if (!itte) {
> +		ret = E_ITS_INV_UNMAPPED_INTERRUPT;
> +		goto out_unlock;
> +	}
> +
> +	ret = update_lpi_config(kvm, itte->irq);
> +
> +out_unlock:
> +	mutex_unlock(&its->its_lock);
> +	return ret;
> +}
> +
> +/*
> + * The INVALL command requests flushing of all IRQ data in this collection.
> + * Find the VCPU mapped to that collection, then iterate over the VM's list
> + * of mapped LPIs and update the configuration for each IRQ which targets
> + * the specified vcpu. The configuration will be read from the in-memory
> + * configuration table.
> + */
> +static int vgic_its_cmd_handle_invall(struct kvm *kvm, struct vgic_its *its,
> +				  u64 *its_cmd)
> +{
> +	u32 coll_id = its_cmd_get_collection(its_cmd);
> +	struct its_collection *collection;
> +	struct kvm_vcpu *vcpu;
> +	struct vgic_irq *irq;
> +	u32 *intids;
> +	int irq_count, i;
> +
> +	mutex_lock(&its->its_lock);
> +
> +	collection = find_collection(its, coll_id);
> +	if (!its_is_collection_mapped(collection)) {
> +		mutex_unlock(&its->its_lock);
> +		return E_ITS_INVALL_UNMAPPED_COLLECTION;
> +	}
> +
> +	vcpu = kvm_get_vcpu(kvm, collection->target_addr);
> +
> +	irq_count = vgic_its_copy_lpi_list(kvm, &intids);
> +	if (irq_count < 0) {
> +		mutex_unlock(&its->its_lock);
> +		return irq_count;
> +	}
> +
> +	for (i = 0; i < irq_count; i++) {
> +		irq = vgic_get_irq(kvm, NULL, intids[i]);
> +		if (!irq)
> +			continue;
> +		update_lpi_config_filtered(kvm, irq, vcpu);
> +		vgic_put_irq_locked(kvm, irq);

Where is the lpi_list_lock taken? And why would we need it since we've
copied everything already? By the look of it, this vgic_put_irq_locked
should not exist at all, as the only other use case is quite dubious.

> +	}
> +
> +	kfree(intids);
> +
> +	mutex_unlock(&its->its_lock);
> +
> +	return 0;
> +}
> +
> +/*
> + * The MOVALL command moves the pending state of all IRQs targeting one
> + * redistributor to another. We don't hold the pending state in the VCPUs,
> + * but in the IRQs instead, so there is really not much to do for us here.
> + * However the spec says that no IRQ must target the old redistributor
> + * afterwards, so we make sure that no LPI is using the associated target_vcpu.
> + * This command affects all LPIs in the system.
> + */
> +static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
> +				  u64 *its_cmd)
> +{
> +	struct vgic_dist *dist = &kvm->arch.vgic;
> +	u32 target1_addr = its_cmd_get_target_addr(its_cmd);
> +	u32 target2_addr = its_cmd_mask_field(its_cmd, 3, 16, 32);
> +	struct kvm_vcpu *vcpu1, *vcpu2;
> +	struct vgic_irq *irq;
> +
> +	if (target1_addr >= atomic_read(&kvm->online_vcpus) ||
> +	    target2_addr >= atomic_read(&kvm->online_vcpus))
> +		return E_ITS_MOVALL_PROCNUM_OOR;
> +
> +	if (target1_addr == target2_addr)
> +		return 0;
> +
> +	vcpu1 = kvm_get_vcpu(kvm, target1_addr);
> +	vcpu2 = kvm_get_vcpu(kvm, target2_addr);
> +
> +	spin_lock(&dist->lpi_list_lock);
> +
> +	list_for_each_entry(irq, &dist->lpi_list_head, lpi_entry) {
> +		spin_lock(&irq->irq_lock);
> +
> +		if (irq->target_vcpu == vcpu1)
> +			irq->target_vcpu = vcpu2;
> +
> +		spin_unlock(&irq->irq_lock);
> +	}
> +
> +	spin_unlock(&dist->lpi_list_lock);
> +
> +	return 0;
> +}
> +
> +/*
> + * This function is called with the its_cmd lock held, but the ITS data
> + * structure lock dropped. It is within the responsibility of the actual
> + * command handlers to take care of proper locking when needed.
> + */
> +static int vgic_its_handle_command(struct kvm *kvm, struct vgic_its *its,
>  			       u64 *its_cmd)
>  {
> -	return -ENODEV;
> +	u8 cmd = its_cmd_get_command(its_cmd);
> +	int ret = -ENODEV;
> +
> +	switch (cmd) {
> +	case GITS_CMD_MAPD:
> +		ret = vgic_its_cmd_handle_mapd(kvm, its, its_cmd);
> +		break;
> +	case GITS_CMD_MAPC:
> +		ret = vgic_its_cmd_handle_mapc(kvm, its, its_cmd);
> +		break;
> +	case GITS_CMD_MAPI:
> +		ret = vgic_its_cmd_handle_mapi(kvm, its, its_cmd, cmd);
> +		break;
> +	case GITS_CMD_MAPTI:
> +		ret = vgic_its_cmd_handle_mapi(kvm, its, its_cmd, cmd);
> +		break;
> +	case GITS_CMD_MOVI:
> +		ret = vgic_its_cmd_handle_movi(kvm, its, its_cmd);
> +		break;
> +	case GITS_CMD_DISCARD:
> +		ret = vgic_its_cmd_handle_discard(kvm, its, its_cmd);
> +		break;
> +	case GITS_CMD_CLEAR:
> +		ret = vgic_its_cmd_handle_clear(kvm, its, its_cmd);
> +		break;
> +	case GITS_CMD_MOVALL:
> +		ret = vgic_its_cmd_handle_movall(kvm, its, its_cmd);
> +		break;
> +	case GITS_CMD_INV:
> +		ret = vgic_its_cmd_handle_inv(kvm, its, its_cmd);
> +		break;
> +	case GITS_CMD_INVALL:
> +		ret = vgic_its_cmd_handle_invall(kvm, its, its_cmd);
> +		break;
> +	case GITS_CMD_SYNC:
> +		/* we ignore this command: we are in sync all of the time */
> +		ret = 0;
> +		break;
> +	}

Given that most commands do take the its mutex, it would make a lot of
sense to move the locking here, and remove it from all of the other
commands. This will streamline the code.

> +
> +	return ret;
>  }
>  
>  static u64 vgic_sanitise_its_baser(u64 reg)
> @@ -403,7 +1004,7 @@ static void vgic_mmio_write_its_cwriter(struct kvm *kvm, struct vgic_its *its,
>  		 * We just ignore that command then.
>  		 */
>  		if (!ret)
> -			vits_handle_command(kvm, its, cmd_buf);
> +			vgic_its_handle_command(kvm, its, cmd_buf);

Care to solve this function renaming nit?

>  
>  		its->creadr += ITS_CMD_SIZE;
>  		if (its->creadr == ITS_CMD_BUFFER_SIZE(its->cbaser))
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation
  2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
                   ` (17 preceding siblings ...)
  2016-07-06  8:52 ` [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Auger Eric
@ 2016-07-11 17:36 ` Marc Zyngier
  18 siblings, 0 replies; 49+ messages in thread
From: Marc Zyngier @ 2016-07-11 17:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 05/07/16 12:22, Andre Przywara wrote:
> Hi,
> 
> this series allows those KVM guests that use an emulated GICv3 to use LPIs
> as well, though at the moment this is limited to emulated PCI devices.
> This is based on kvmarm/queue, which now only features the new VGIC
> implementation.
> 
> This time only smaller corrections for the KVM ITS emulation support:
> I addressed the review comments, which pointed out some vgic_put_irq()
> omissions. Also the GICv2 init sequence has changed, so that we can now
> bail out of a KVM_DEVICE init without leaking a HYP mapping.
> Also a bug in the MAPC emulation was fixed, which allowed multiple
> mappings of the same collection ID.
> The KVM_DEVICE init sequence now has some checks to ensure the right
> order. The requirements are a bit stricter than for the GICv2/GICv3
> devices: we need to set up the mapping address before calling the
> INIT ioctl. This apparently has some implications for QEMU; I just need
> to be convinced that we should follow QEMU's approach. It seems a bit
> ugly to stash the ITS init into the existing GICv3 code, especially
> since the ITS is a separate, optional device.
> 
> You can find all of this code (and the prerequisites) in the
> its-emul/v8 branch of my repository [1].
> This has been briefly tested on the model and on GICv3 hardware.
> If you have GICv3 capable hardware, please test it on your setup.
> Also of course any review comments are very welcome!
> 
> Cheers,
> Andre.
> 
> Changelog v7..v8:
> - rebase on old-VGIC removal patch
> - add missing vgic_put_irq()s
> - check and ensure proper ITS initialisation sequence
> - avoid double collection mapping
> - renaming vits_ function prefixes to vgic_its_
> - properly setup PENDBASER (for new VGIC now)
> - change vgic_v2_probe init order to allow clean exit
> 
> Changelog v6..v7:
> - use kref reference counting
> - remove RCU usage from lpi_list, use spinlock instead
> - copy list of LPIs before accessing guest memory
> - introduce kvm_io_bus_get_dev()
> - refactor parts of arm-gic-v3.h header file
> - provide proper initial values for redistributor and ITS base registers
> - rework sanitisation of base registers
> - rework VGIC MMIO dispatching to differentiate between VGIC parts
> - smaller fixes, also comments and commit messages amended
> 
> Changelog v5..v6:
> - remove its_list from VGIC code
> - add lpi_list and accessor functions
> - introduce reference counting to struct vgic_irq
> - replace its_lock spinlock with its_cmd and its_lock mutexes
> - simplify guest memory accesses (due to the new mutexes)
> - avoid unnecessary affinity updates
> - refine base register address masking
> - introduce sanity checks for PROPBASER and PENDBASER
> - implement BASER<n> registers
> - pass struct vgic_its directly into the MMIO handlers
> - convert KVM_SIGNAL_MSI ioctl into an MMIO write
> - add explicit INIT ioctl to the ITS KVM device
> - adjusting comments and commit messages
> 
> Changelog v4..v5:
> - adapting to final new VGIC (MMIO handlers, etc.)
> - new KVM device to model an ITS, multiple instances allowed
> - move redistributor data into struct vgic_cpu
> - separate distributor and ITS(es)
> - various bug fixes and amended comments after review comments
> 
> Changelog v3..v4:
> - adapting to new VGIC (changes in IRQ injection mechanism)
> 
> Changelog v2..v3:
> - adapt to 4.3-rc and Christoffer's timer rework
> - adapt spin locks on handling PROPBASER/PENDBASER registers
> - rework locking in ITS command handling (dropping dist where needed)
> - only clear LPI pending bit if LPI could actually be queued
> - simplify GICR_CTLR handling
> - properly free ITTEs (including our pending bitmap)
> - fix corner cases with unmapped collections
> - keep retire_lr() around
> - rename vgic_handle_base_register to vgic_reg64_access()
> - use kcalloc instead of kmalloc
> - minor fixes, renames and added comments
> 
> Changelog v1..v2
> - fix issues when using non-ITS GICv3 emulation
> - streamline frame address initialization (new patch 05/15)
> - preallocate buffer memory for reading from guest's memory
> - move locking into the actual command handlers
> -   preallocate memory for new structures if needed
> - use non-atomic __set_bit() and __clear_bit() when under the lock
> - add INT command handler to allow LPI injection from the guest
> - rewrite CWRITER handler to align with new locking scheme
> - remove unneeded CONFIG_HAVE_KVM_MSI #ifdefs
> - check memory table size against our LPI limit (65536 interrupts)
> - observe initial gap of 1024 interrupts in pending table
> - use term "configuration table" to be in line with the spec
> - clarify and extend documentation on API extensions
> - introduce new KVM_CAP_MSI_DEVID capability to advertise device ID requirement
> - update, fix and add many comments
> - minor style changes as requested by reviewers

I'm done for this round. Some issues are relatively cosmetic and can be
fixed pretty quickly. Some others need more attention. Overall, you seem
to leave crumbs of previous designs, which makes it hard to follow
sometimes (I love SW archaeology as much as the next guy, but the GIC
has stopped exciting me a while ago...).

Looking forward to v9.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* [PATCH v8 13/17] KVM: arm64: read initial LPI pending table
  2016-07-11 16:50   ` Marc Zyngier
@ 2016-07-11 17:38     ` Andre Przywara
  2016-07-12 11:33     ` Andre Przywara
  1 sibling, 0 replies; 49+ messages in thread
From: Andre Przywara @ 2016-07-11 17:38 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 11/07/16 17:50, Marc Zyngier wrote:
> On 05/07/16 12:23, Andre Przywara wrote:
>> The LPI pending status for a GICv3 redistributor is held in a table
>> in (guest) memory. To achieve reasonable performance, we cache this
>> data in our struct vgic_irq. The initial pending state must be read
>> from guest memory upon enabling LPIs for this redistributor.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  virt/kvm/arm/vgic/vgic-its.c | 81 ++++++++++++++++++++++++++++++++++++++++++++
>>  virt/kvm/arm/vgic/vgic.h     |  6 ++++
>>  2 files changed, 87 insertions(+)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>> index 1e2e649..29bb4fe 100644
>> --- a/virt/kvm/arm/vgic/vgic-its.c
>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>> @@ -93,6 +93,81 @@ struct its_itte {
>>  		list_for_each_entry(itte, &(dev)->itt_head, itte_list)
>>  
>>  #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
>> +#define PENDBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 16))
> 
> 52 bits again. Pick a side!

Well, this is the architecturally described address field in the
register. It's only used to mask the (ideally already) sanitised value.
But you are right that I should clear bits 51:48 upon the guest writing
the register (also true for the other registers). Will fix this.

>> +
>> +static int vgic_its_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
>> +{
>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>> +	struct vgic_irq *irq;
>> +	u32 *intids;
>> +	int irq_count = dist->lpi_list_count, i = 0;
>> +
>> +	/*
>> +	 * We use the current value of the list length, which may change
>> +	 * after the kmalloc. We don't care, because the guest shouldn't
>> +	 * change anything while the command handling is still running,
>> +	 * and in the worst case we would miss a new IRQ, which one wouldn't
>> +	 * expect to be covered by this command anyway.
>> +	 */
>> +	intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL);
>> +	if (!intids)
>> +		return -ENOMEM;
>> +
>> +	spin_lock(&dist->lpi_list_lock);
>> +	list_for_each_entry(irq, &dist->lpi_list_head, lpi_entry) {
>> +		if (kref_get_unless_zero(&irq->refcount)) {
>> +			intids[i] = irq->intid;
>> +			vgic_put_irq_locked(kvm, irq);
> 
> This is ugly. You know you're not going to free the irq, since it was at
> least one when you did kref_get_unless_zero(). Why not doing a simple
> kref_put (possibly in a macro so that you can hide the dummy release
> function)?

Do I know that? What prevents another user (ap_list or ITTE) from removing
its reference meanwhile? The lpi_list_lock does not help here, since
that just protects the lpi_list, but not any references.
But I am wondering whether I actually still need that unless_zero
version, let me think about that.

>> +		}
>> +		if (++i == irq_count)
>> +			break;
>> +	}
>> +	spin_unlock(&dist->lpi_list_lock);
>> +
>> +	*intid_ptr = intids;
>> +	return irq_count;
>> +}
>> +
>> +/*
>> + * Scan the whole LPI pending table and sync the pending bit in there
>> + * with our own data structures. This relies on the LPI being
>> + * mapped before.
>> + */
>> +static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
>> +{
>> +	gpa_t pendbase = PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
>> +	struct vgic_irq *irq;
>> +	u8 pendmask;
>> +	int ret = 0;
>> +	u32 *intids;
>> +	int nr_irqs, i;
>> +
>> +	nr_irqs = vgic_its_copy_lpi_list(vcpu->kvm, &intids);
>> +	if (nr_irqs < 0)
>> +		return nr_irqs;
>> +
>> +	for (i = 0; i < nr_irqs; i++) {
>> +		int byte_offset, bit_nr;
>> +
>> +		byte_offset = intids[i] / BITS_PER_BYTE;
>> +		bit_nr = intids[i] % BITS_PER_BYTE;
>> +
>> +		ret = kvm_read_guest(vcpu->kvm, pendbase + byte_offset,
>> +				     &pendmask, 1);
> 
> How about having a small cache of the last read offset and data? If LPIs
> are contiguously allocated, you save yourself quite a few (expensive)
> userspace accesses.

Sounds good.

> 
>> +		if (ret) {
>> +			kfree(intids);
>> +			return ret;
>> +		}
>> +
>> +		irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
>> +		spin_lock(&irq->irq_lock);
>> +		irq->pending = pendmask & (1U << bit_nr);
>> +		vgic_queue_irq_unlock(vcpu->kvm, irq);
>> +		vgic_put_irq(vcpu->kvm, irq);
>> +	}
>> +
>> +	return ret;
>> +}
>>  
>>  static unsigned long vgic_mmio_read_its_ctlr(struct kvm *vcpu,
>>  					     struct vgic_its *its,
>> @@ -415,6 +490,12 @@ static struct vgic_register_region its_registers[] = {
>>  		VGIC_ACCESS_32bit),
>>  };
>>  
>> +/* This is called on setting the LPI enable bit in the redistributor. */
>> +void vgic_enable_lpis(struct kvm_vcpu *vcpu)
>> +{
>> +	its_sync_lpi_pending_table(vcpu);
>> +}
>> +
>>  static int vgic_its_register(struct kvm *kvm, struct vgic_its *its)
>>  {
>>  	struct vgic_io_device *iodev = &its->iodev;
>> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
>> index eef9ec1..4a9165f 100644
>> --- a/virt/kvm/arm/vgic/vgic.h
>> +++ b/virt/kvm/arm/vgic/vgic.h
>> @@ -25,6 +25,7 @@
>>  #define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == VGIC_ADDR_UNDEF)
>>  
>>  #define INTERRUPT_ID_BITS_SPIS	10
>> +#define INTERRUPT_ID_BITS_ITS	16
> 
> Do we have plan for a userspace-accessible property for this? I can
> imagine userspace willing to have bigger LPI space...

The idea was to add an attribute later via the kvm_device API to set up
LPI parameters. That's why I introduced the init call to signal that
setup is finished and we can use those numbers (or defaults).
Since we can check for the existence of attributes via the has_attr
interface, we can add them at any time without breaking something.
I can look into how involved it is to add something for the number of LPI
(bits) now.

Cheers,
Andre.


>>  #define VGIC_PRI_BITS		5
>>  
>>  #define vgic_irq_is_sgi(intid) ((intid) < VGIC_NR_SGIS)
>> @@ -79,6 +80,7 @@ int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t dist_base_address);
>>  bool vgic_has_its(struct kvm *kvm);
>>  int kvm_vgic_register_its_device(void);
>>  struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid);
>> +void vgic_enable_lpis(struct kvm_vcpu *vcpu);
>>  #else
>>  static inline void vgic_v3_process_maintenance(struct kvm_vcpu *vcpu)
>>  {
>> @@ -145,6 +147,10 @@ static inline struct vgic_irq *vgic_its_get_lpi(struct kvm *kvm, u32 intid)
>>  {
>>  	return NULL;
>>  }
>> +
>> +static inline void vgic_enable_lpis(struct kvm_vcpu *vcpu)
>> +{
>> +}
>>  #endif
>>  
>>  int kvm_register_vgic_device(unsigned long type);
>>
> 
> Thanks,
> 
> 	M.
> 


* [PATCH v8 15/17] KVM: arm64: implement ITS command queue command handlers
  2016-07-11 17:17   ` Marc Zyngier
@ 2016-07-11 17:47     ` Andre Przywara
  2016-07-11 17:52       ` Marc Zyngier
  0 siblings, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-11 17:47 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 11/07/16 18:17, Marc Zyngier wrote:
> On 05/07/16 12:23, Andre Przywara wrote:
>> The connection between a device, an event ID, the LPI number and the
>> allocated CPU is stored in in-memory tables in a GICv3, but their
>> format is not specified by the spec. Instead software uses a command
>> queue in a ring buffer to let the ITS implementation use their own
>> format.
>> Implement handlers for the various ITS commands and let them store
>> the requested relation into our own data structures. Those data
>> structures are protected by the its_lock mutex.
>> Our internal ring buffer read and write pointers are protected by the
>> its_cmd mutex, so that at most one VCPU per ITS can handle commands at
>> any given time.
>> Error handling is very basic at the moment, as we don't have a good
>> way of communicating errors to the guest (usually a SError).
>> The INT command handler is missing at this point, as we gain the
>> capability of actually injecting MSIs into the guest only later on.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  virt/kvm/arm/vgic/vgic-its.c | 609 ++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 605 insertions(+), 4 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>> index 5de71bd..432daed 100644
>> --- a/virt/kvm/arm/vgic/vgic-its.c
>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>> @@ -58,6 +58,43 @@ out_unlock:
>>  	return irq;
>>  }
>>  
>> +/*
>> + * Creates a new (reference to a) struct vgic_irq for a given LPI.
>> + * If this LPI is already mapped on another ITS, we increase its refcount
>> + * and return a pointer to the existing structure.
>> + * If this is a "new" LPI, we allocate and initialize a new struct vgic_irq.
>> + * This function returns a pointer to the _unlocked_ structure.
>> + */
>> +static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid)
>> +{
>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>> +	struct vgic_irq *irq = vgic_its_get_lpi(kvm, intid);
> 
> So this thing doesn't return with any lock held...
> 
>> +
>> +	/* In this case there is no put, since we keep the reference. */
>> +	if (irq)
>> +		return irq;
>> +
>> +	irq = kzalloc(sizeof(struct vgic_irq), GFP_KERNEL);
>> +
>> +	if (!irq)
>> +		return NULL;
>> +
>> +	INIT_LIST_HEAD(&irq->lpi_entry);
>> +	INIT_LIST_HEAD(&irq->ap_list);
>> +	spin_lock_init(&irq->irq_lock);
>> +
>> +	irq->config = VGIC_CONFIG_EDGE;
>> +	kref_init(&irq->refcount);
>> +	irq->intid = intid;
> 
> which means that two callers can allocate their own irq structure...

In practice this will never happen, because the only caller
(handle_mapi) takes the its_lock mutex. But I see that this is fragile
and not safe. I guess I can search the list again after having taken the
lock.

>> +
>> +	spin_lock(&dist->lpi_list_lock);
>> +	list_add_tail(&irq->lpi_entry, &dist->lpi_list_head);
>> +	dist->lpi_list_count++;
>> +	spin_unlock(&dist->lpi_list_lock);
> 
> and insert it. Not too bad if they are different LPIs, but leading to
> Armageddon if they are the same. You absolutely need to check for the
> the presence of the interrupt in this list *while holding the lock*.
> 
>> +
>> +	return irq;
>> +}
>> +
>>  struct its_device {
>>  	struct list_head dev_list;
>>  

....

>> +/*
>> + * The INVALL command requests flushing of all IRQ data in this collection.
>> + * Find the VCPU mapped to that collection, then iterate over the VM's list
>> + * of mapped LPIs and update the configuration for each IRQ which targets
>> + * the specified vcpu. The configuration will be read from the in-memory
>> + * configuration table.
>> + */
>> +static int vgic_its_cmd_handle_invall(struct kvm *kvm, struct vgic_its *its,
>> +				  u64 *its_cmd)
>> +{
>> +	u32 coll_id = its_cmd_get_collection(its_cmd);
>> +	struct its_collection *collection;
>> +	struct kvm_vcpu *vcpu;
>> +	struct vgic_irq *irq;
>> +	u32 *intids;
>> +	int irq_count, i;
>> +
>> +	mutex_lock(&its->its_lock);
>> +
>> +	collection = find_collection(its, coll_id);
>> +	if (!its_is_collection_mapped(collection)) {
>> +		mutex_unlock(&its->its_lock);
>> +		return E_ITS_INVALL_UNMAPPED_COLLECTION;
>> +	}
>> +
>> +	vcpu = kvm_get_vcpu(kvm, collection->target_addr);
>> +
>> +	irq_count = vgic_its_copy_lpi_list(kvm, &intids);
>> +	if (irq_count < 0) {
>> +		mutex_unlock(&its->its_lock);
>> +		return irq_count;
>> +	}
>> +
>> +	for (i = 0; i < irq_count; i++) {
>> +		irq = vgic_get_irq(kvm, NULL, intids[i]);
>> +		if (!irq)
>> +			continue;
>> +		update_lpi_config_filtered(kvm, irq, vcpu);
>> +		vgic_put_irq_locked(kvm, irq);
> 
> Where is the lpi_list_lock taken?

Argh, good catch!

> And why would we need it since we've
> copied everything already? By the look of it, this vgic_put_irq_locked
> should not exist at all, as the only other use case is quite dubious.

Possibly, I don't like it either. Let me check if I can kill that sucker.

Cheers,
Andre.

> 
>> +	}
>> +
>> +	kfree(intids);
>> +
>> +	mutex_unlock(&its->its_lock);
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * The MOVALL command moves the pending state of all IRQs targeting one
>> + * redistributor to another. We don't hold the pending state in the VCPUs,
>> + * but in the IRQs instead, so there is really not much to do for us here.
>> + * However the spec says that no IRQ must target the old redistributor
>> + * afterwards, so we make sure that no LPI is using the associated target_vcpu.
>> + * This command affects all LPIs in the system.
>> + */
>> +static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
>> +				  u64 *its_cmd)
>> +{
>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>> +	u32 target1_addr = its_cmd_get_target_addr(its_cmd);
>> +	u32 target2_addr = its_cmd_mask_field(its_cmd, 3, 16, 32);
>> +	struct kvm_vcpu *vcpu1, *vcpu2;
>> +	struct vgic_irq *irq;
>> +
>> +	if (target1_addr >= atomic_read(&kvm->online_vcpus) ||
>> +	    target2_addr >= atomic_read(&kvm->online_vcpus))
>> +		return E_ITS_MOVALL_PROCNUM_OOR;
>> +
>> +	if (target1_addr == target2_addr)
>> +		return 0;
>> +
>> +	vcpu1 = kvm_get_vcpu(kvm, target1_addr);
>> +	vcpu2 = kvm_get_vcpu(kvm, target2_addr);
>> +
>> +	spin_lock(&dist->lpi_list_lock);
>> +
>> +	list_for_each_entry(irq, &dist->lpi_list_head, lpi_entry) {
>> +		spin_lock(&irq->irq_lock);
>> +
>> +		if (irq->target_vcpu == vcpu1)
>> +			irq->target_vcpu = vcpu2;
>> +
>> +		spin_unlock(&irq->irq_lock);
>> +	}
>> +
>> +	spin_unlock(&dist->lpi_list_lock);
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * This function is called with the its_cmd lock held, but the ITS data
>> + * structure lock dropped. It is within the responsibility of the actual
>> + * command handlers to take care of proper locking when needed.
>> + */
>> +static int vgic_its_handle_command(struct kvm *kvm, struct vgic_its *its,
>>  			       u64 *its_cmd)
>>  {
>> -	return -ENODEV;
>> +	u8 cmd = its_cmd_get_command(its_cmd);
>> +	int ret = -ENODEV;
>> +
>> +	switch (cmd) {
>> +	case GITS_CMD_MAPD:
>> +		ret = vgic_its_cmd_handle_mapd(kvm, its, its_cmd);
>> +		break;
>> +	case GITS_CMD_MAPC:
>> +		ret = vgic_its_cmd_handle_mapc(kvm, its, its_cmd);
>> +		break;
>> +	case GITS_CMD_MAPI:
>> +		ret = vgic_its_cmd_handle_mapi(kvm, its, its_cmd, cmd);
>> +		break;
>> +	case GITS_CMD_MAPTI:
>> +		ret = vgic_its_cmd_handle_mapi(kvm, its, its_cmd, cmd);
>> +		break;
>> +	case GITS_CMD_MOVI:
>> +		ret = vgic_its_cmd_handle_movi(kvm, its, its_cmd);
>> +		break;
>> +	case GITS_CMD_DISCARD:
>> +		ret = vgic_its_cmd_handle_discard(kvm, its, its_cmd);
>> +		break;
>> +	case GITS_CMD_CLEAR:
>> +		ret = vgic_its_cmd_handle_clear(kvm, its, its_cmd);
>> +		break;
>> +	case GITS_CMD_MOVALL:
>> +		ret = vgic_its_cmd_handle_movall(kvm, its, its_cmd);
>> +		break;
>> +	case GITS_CMD_INV:
>> +		ret = vgic_its_cmd_handle_inv(kvm, its, its_cmd);
>> +		break;
>> +	case GITS_CMD_INVALL:
>> +		ret = vgic_its_cmd_handle_invall(kvm, its, its_cmd);
>> +		break;
>> +	case GITS_CMD_SYNC:
>> +		/* we ignore this command: we are in sync all of the time */
>> +		ret = 0;
>> +		break;
>> +	}
> 
> Given that most commands do take the its mutex, it would make a lot of
> sense to move the locking here, and remove it from all of the other
> commands. This will streamline the code.
> 
>> +
>> +	return ret;
>>  }
>>  
>>  static u64 vgic_sanitise_its_baser(u64 reg)
>> @@ -403,7 +1004,7 @@ static void vgic_mmio_write_its_cwriter(struct kvm *kvm, struct vgic_its *its,
>>  		 * We just ignore that command then.
>>  		 */
>>  		if (!ret)
>> -			vits_handle_command(kvm, its, cmd_buf);
>> +			vgic_its_handle_command(kvm, its, cmd_buf);
> 
> Care to solve this function renaming nit?
> 
>>  
>>  		its->creadr += ITS_CMD_SIZE;
>>  		if (its->creadr == ITS_CMD_BUFFER_SIZE(its->cbaser))
>>
> 
> Thanks,
> 
> 	M.
> 
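Marc's suggestion above — taking the its_lock once in the dispatcher instead of in every handler — can be sketched in userspace C. This is an illustrative stand-in, not the patch code: a pthread mutex replaces the kernel mutex, the command constants mirror the GITS encodings from the patch, and a `lock_held` flag plays the role of a lockdep assertion. A side benefit of this shape is that error paths in handlers (such as the early return in the quoted INVALL handler) can no longer leave the mutex held.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t its_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_held;

/* Values mirror the GITS command encodings used in the patch. */
enum { GITS_CMD_SYNC = 0x05, GITS_CMD_MAPC = 0x09 };

/* Handlers now simply rely on the dispatcher holding its_lock. */
static int handle_mapc(void)
{
	assert(lock_held);	/* lockdep-style check in this sketch */
	return 0;
}

static int handle_command(unsigned char cmd)
{
	int ret = -1;		/* -ENODEV in the real code */

	pthread_mutex_lock(&its_lock);
	lock_held = 1;
	switch (cmd) {
	case GITS_CMD_MAPC:
		ret = handle_mapc();
		break;
	case GITS_CMD_SYNC:
		ret = 0;	/* we are in sync all of the time */
		break;
	}
	lock_held = 0;
	pthread_mutex_unlock(&its_lock);
	return ret;
}
```

With this structure, a handler that returns early can never leak the lock, because lock and unlock live in exactly one place.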

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 15/17] KVM: arm64: implement ITS command queue command handlers
  2016-07-11 17:47     ` Andre Przywara
@ 2016-07-11 17:52       ` Marc Zyngier
  0 siblings, 0 replies; 49+ messages in thread
From: Marc Zyngier @ 2016-07-11 17:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/07/16 18:47, Andre Przywara wrote:
> Hi,
> 
> On 11/07/16 18:17, Marc Zyngier wrote:
>> On 05/07/16 12:23, Andre Przywara wrote:
>>> The connection between a device, an event ID, the LPI number and the
>>> allocated CPU is stored in in-memory tables in a GICv3, but their
>>> format is not specified by the spec. Instead software uses a command
>>> queue in a ring buffer to let the ITS implementation use its own
>>> format.
>>> Implement handlers for the various ITS commands and let them store
>>> the requested relation into our own data structures. Those data
>>> structures are protected by the its_lock mutex.
>>> Our internal ring buffer read and write pointers are protected by the
>>> its_cmd mutex, so that at most one VCPU per ITS can handle commands at
>>> any given time.
>>> Error handling is very basic at the moment, as we don't have a good
>>> way of communicating errors to the guest (usually an SError).
>>> The INT command handler is missing at this point, as we gain the
>>> capability of actually injecting MSIs into the guest only later on.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> ---
>>>  virt/kvm/arm/vgic/vgic-its.c | 609 ++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 605 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>>> index 5de71bd..432daed 100644
>>> --- a/virt/kvm/arm/vgic/vgic-its.c
>>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>>> @@ -58,6 +58,43 @@ out_unlock:
>>>  	return irq;
>>>  }
>>>  
>>> +/*
>>> + * Creates a new (reference to a) struct vgic_irq for a given LPI.
>>> + * If this LPI is already mapped on another ITS, we increase its refcount
>>> + * and return a pointer to the existing structure.
>>> + * If this is a "new" LPI, we allocate and initialize a new struct vgic_irq.
>>> + * This function returns a pointer to the _unlocked_ structure.
>>> + */
>>> +static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid)
>>> +{
>>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>>> +	struct vgic_irq *irq = vgic_its_get_lpi(kvm, intid);
>>
>> So this thing doesn't return with any lock held...
>>
>>> +
>>> +	/* In this case there is no put, since we keep the reference. */
>>> +	if (irq)
>>> +		return irq;
>>> +
>>> +	irq = kzalloc(sizeof(struct vgic_irq), GFP_KERNEL);
>>> +
>>> +	if (!irq)
>>> +		return NULL;
>>> +
>>> +	INIT_LIST_HEAD(&irq->lpi_entry);
>>> +	INIT_LIST_HEAD(&irq->ap_list);
>>> +	spin_lock_init(&irq->irq_lock);
>>> +
>>> +	irq->config = VGIC_CONFIG_EDGE;
>>> +	kref_init(&irq->refcount);
>>> +	irq->intid = intid;
>>
>> which means that two callers can allocate their own irq structure...
> 
> In practice this will never happen, because the only caller
> (handle_mapi) takes the its_lock mutex. But I see that this is fragile

Given that the its_lock is per ITS, and that we're dealing with global
objects, this doesn't protect against anything. I can have two VCPUs
firing MAPIs on two ITSs, and hit that path with reasonable chances of
creating mayhem.

> and not safe. I guess I can search the list again after having taken the
> lock.

Please do.
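The agreed fix — allocate speculatively, then re-search the global list *while holding the lock* and only insert when the INTID is still absent — can be sketched in userspace C. Everything here (a singly-linked list, a pthread mutex for the lpi_list_lock, a plain int for the kref) is a simplified stand-in for the kernel structures, not the actual patch:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct vgic_irq {
	struct vgic_irq *next;
	unsigned int intid;
	int refcount;
};

static struct vgic_irq *lpi_list;
static pthread_mutex_t lpi_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold lpi_list_lock. */
static struct vgic_irq *find_lpi_locked(unsigned int intid)
{
	struct vgic_irq *irq;

	for (irq = lpi_list; irq; irq = irq->next)
		if (irq->intid == intid)
			return irq;
	return NULL;
}

static struct vgic_irq *add_lpi(unsigned int intid)
{
	struct vgic_irq *irq, *old;

	/* Allocate outside the lock; we may have to throw this away. */
	irq = calloc(1, sizeof(*irq));
	if (!irq)
		return NULL;
	irq->intid = intid;
	irq->refcount = 1;

	pthread_mutex_lock(&lpi_list_lock);
	old = find_lpi_locked(intid);
	if (old) {
		/* Another ITS won the race: take a reference instead. */
		old->refcount++;
		pthread_mutex_unlock(&lpi_list_lock);
		free(irq);
		return old;
	}
	irq->next = lpi_list;
	lpi_list = irq;
	pthread_mutex_unlock(&lpi_list_lock);
	return irq;
}
```

Two VCPUs issuing MAPI for the same LPI on different ITSs now converge on a single structure instead of inserting duplicates.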

> 
>>> +
>>> +	spin_lock(&dist->lpi_list_lock);
>>> +	list_add_tail(&irq->lpi_entry, &dist->lpi_list_head);
>>> +	dist->lpi_list_count++;
>>> +	spin_unlock(&dist->lpi_list_lock);
>>
>> and insert it. Not too bad if they are different LPIs, but leading to
>> Armageddon if they are the same. You absolutely need to check for
>> the presence of the interrupt in this list *while holding the lock*.
>>
>>> +
>>> +	return irq;
>>> +}
>>> +
>>>  struct its_device {
>>>  	struct list_head dev_list;
>>>  
> 
> ....
> 
>>> +/*
>>> + * The INVALL command requests flushing of all IRQ data in this collection.
>>> + * Find the VCPU mapped to that collection, then iterate over the VM's list
>>> + * of mapped LPIs and update the configuration for each IRQ which targets
>>> + * the specified vcpu. The configuration will be read from the in-memory
>>> + * configuration table.
>>> + */
>>> +static int vgic_its_cmd_handle_invall(struct kvm *kvm, struct vgic_its *its,
>>> +				  u64 *its_cmd)
>>> +{
>>> +	u32 coll_id = its_cmd_get_collection(its_cmd);
>>> +	struct its_collection *collection;
>>> +	struct kvm_vcpu *vcpu;
>>> +	struct vgic_irq *irq;
>>> +	u32 *intids;
>>> +	int irq_count, i;
>>> +
>>> +	mutex_lock(&its->its_lock);
>>> +
>>> +	collection = find_collection(its, coll_id);
>>> +	if (!its_is_collection_mapped(collection))
>>> +		return E_ITS_INVALL_UNMAPPED_COLLECTION;
>>> +
>>> +	vcpu = kvm_get_vcpu(kvm, collection->target_addr);
>>> +
>>> +	irq_count = vgic_its_copy_lpi_list(kvm, &intids);
>>> +	if (irq_count < 0)
>>> +		return irq_count;
>>> +
>>> +	for (i = 0; i < irq_count; i++) {
>>> +		irq = vgic_get_irq(kvm, NULL, intids[i]);
>>> +		if (!irq)
>>> +			continue;
>>> +		update_lpi_config_filtered(kvm, irq, vcpu);
>>> +		vgic_put_irq_locked(kvm, irq);
>>
>> Where is the lpi_list_lock taken?
> 
> Argh, good catch!
> 
>> And why would we need it since we've
>> copied everything already? By the look of it, this vgic_put_irq_locked
>> should not exist at all, as the only other use case is quite dubious.
> 
> Possibly, I don't like it either. Let me check if I can kill that sucker.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 13/17] KVM: arm64: read initial LPI pending table
  2016-07-11 16:50   ` Marc Zyngier
  2016-07-11 17:38     ` Andre Przywara
@ 2016-07-12 11:33     ` Andre Przywara
  2016-07-12 12:39       ` Marc Zyngier
  1 sibling, 1 reply; 49+ messages in thread
From: Andre Przywara @ 2016-07-12 11:33 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 11/07/16 17:50, Marc Zyngier wrote:
> On 05/07/16 12:23, Andre Przywara wrote:
>> The LPI pending status for a GICv3 redistributor is held in a table
>> in (guest) memory. To achieve reasonable performance, we cache this
>> data in our struct vgic_irq. The initial pending state must be read
>> from guest memory upon enabling LPIs for this redistributor.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>  virt/kvm/arm/vgic/vgic-its.c | 81 ++++++++++++++++++++++++++++++++++++++++++++
>>  virt/kvm/arm/vgic/vgic.h     |  6 ++++
>>  2 files changed, 87 insertions(+)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>> index 1e2e649..29bb4fe 100644
>> --- a/virt/kvm/arm/vgic/vgic-its.c
>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>> @@ -93,6 +93,81 @@ struct its_itte {
>>  		list_for_each_entry(itte, &(dev)->itt_head, itte_list)
>>  
>>  #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
>> +#define PENDBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 16))
> 
> 52 bits again. Pick a side!
> 
>> +
>> +static int vgic_its_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
>> +{
>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>> +	struct vgic_irq *irq;
>> +	u32 *intids;
>> +	int irq_count = dist->lpi_list_count, i = 0;
>> +
>> +	/*
>> +	 * We use the current value of the list length, which may change
>> +	 * after the kmalloc. We don't care, because the guest shouldn't
>> +	 * change anything while the command handling is still running,
>> +	 * and in the worst case we would miss a new IRQ, which one wouldn't
>> +	 * expect to be covered by this command anyway.
>> +	 */
>> +	intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL);
>> +	if (!intids)
>> +		return -ENOMEM;
>> +
>> +	spin_lock(&dist->lpi_list_lock);
>> +	list_for_each_entry(irq, &dist->lpi_list_head, lpi_entry) {
>> +		if (kref_get_unless_zero(&irq->refcount)) {
>> +			intids[i] = irq->intid;
>> +			vgic_put_irq_locked(kvm, irq);
> 
> This is ugly. You know you're not going to free the irq, since it was at
> least one when you did kref_get_unless_zero(). Why not doing a simple
> kref_put (possibly in a macro so that you can hide the dummy release
> function)?

I think I don't need the get and put at all, which would allow us to
totally drop the vgic_put_irq_locked version:
1) We have the lpi_list_lock, so if we find the IRQ in the list, it's
still valid (we free it only after having it removed).
2) It can't be removed without dropping the lock.
3) We just store the number, not the pointer.
4) An LPI can be unmapped anyway after we dropped the lock and before we
actually use the copy. We take care of that already by calling get again
and coping with a NULL return.

So is it feasible to remove the get and put here completely or is that
dodgy since we technically use the reference?
Shall I document that one doesn't need get and put if holding the
lpi_list_lock?

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v8 13/17] KVM: arm64: read initial LPI pending table
  2016-07-12 11:33     ` Andre Przywara
@ 2016-07-12 12:39       ` Marc Zyngier
  0 siblings, 0 replies; 49+ messages in thread
From: Marc Zyngier @ 2016-07-12 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/07/16 12:33, Andre Przywara wrote:
> Hi,
> 
> On 11/07/16 17:50, Marc Zyngier wrote:
>> On 05/07/16 12:23, Andre Przywara wrote:
>>> The LPI pending status for a GICv3 redistributor is held in a table
>>> in (guest) memory. To achieve reasonable performance, we cache this
>>> data in our struct vgic_irq. The initial pending state must be read
>>> from guest memory upon enabling LPIs for this redistributor.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> ---
>>>  virt/kvm/arm/vgic/vgic-its.c | 81 ++++++++++++++++++++++++++++++++++++++++++++
>>>  virt/kvm/arm/vgic/vgic.h     |  6 ++++
>>>  2 files changed, 87 insertions(+)
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>>> index 1e2e649..29bb4fe 100644
>>> --- a/virt/kvm/arm/vgic/vgic-its.c
>>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>>> @@ -93,6 +93,81 @@ struct its_itte {
>>>  		list_for_each_entry(itte, &(dev)->itt_head, itte_list)
>>>  
>>>  #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
>>> +#define PENDBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 16))
>>
>> 52 bits again. Pick a side!
>>
>>> +
>>> +static int vgic_its_copy_lpi_list(struct kvm *kvm, u32 **intid_ptr)
>>> +{
>>> +	struct vgic_dist *dist = &kvm->arch.vgic;
>>> +	struct vgic_irq *irq;
>>> +	u32 *intids;
>>> +	int irq_count = dist->lpi_list_count, i = 0;
>>> +
>>> +	/*
>>> +	 * We use the current value of the list length, which may change
>>> +	 * after the kmalloc. We don't care, because the guest shouldn't
>>> +	 * change anything while the command handling is still running,
>>> +	 * and in the worst case we would miss a new IRQ, which one wouldn't
>>> +	 * expect to be covered by this command anyway.
>>> +	 */
>>> +	intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL);
>>> +	if (!intids)
>>> +		return -ENOMEM;
>>> +
>>> +	spin_lock(&dist->lpi_list_lock);
>>> +	list_for_each_entry(irq, &dist->lpi_list_head, lpi_entry) {
>>> +		if (kref_get_unless_zero(&irq->refcount)) {
>>> +			intids[i] = irq->intid;
>>> +			vgic_put_irq_locked(kvm, irq);
>>
>> This is ugly. You know you're not going to free the irq, since it was at
>> least one when you did kref_get_unless_zero(). Why not doing a simple
>> kref_put (possibly in a macro so that you can hide the dummy release
>> function)?
> 
> I think I don't need the get and put at all, which would allow us to
> totally drop the vgic_put_irq_locked version:
> 1) We have the lpi_list_lock, so if we find the IRQ in the list, it's
> still valid (we free it only after having it removed).
> 2) It can't be removed without dropping the lock.
> 3) We just store the number, not the pointer.
> 4) An LPI can be unmapped anyway after we dropped the lock and before we
> actually use the copy. We take care of that already by calling get again
> and coping with a NULL return.
> 
> So is it feasible to remove the get and put here completely or is that
> dodgy since we technically use the reference?
> Shall I document that one doesn't need get and put if holding the
> lpi_list_lock?

Yes please. Also put a comment in here, as this is the only place (I
think) where we are iterating the list without using refcounts.
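The approach agreed here — copying only the INTIDs while lpi_list_lock is held, with no get/put, since entries cannot be freed under the lock — can be sketched in userspace C. As before, the list, the lock, and the error codes are simplified stand-ins for the kernel versions:

```c
#include <pthread.h>
#include <stdlib.h>

struct vgic_irq {
	struct vgic_irq *next;
	unsigned int intid;
};

static struct vgic_irq *lpi_list;
static unsigned int lpi_list_count;
static pthread_mutex_t lpi_list_lock = PTHREAD_MUTEX_INITIALIZER;

static int copy_lpi_list(unsigned int **intid_ptr)
{
	struct vgic_irq *irq;
	unsigned int *intids;
	/* Snapshot of the count, taken outside the lock as in the patch. */
	int i = 0, irq_count = lpi_list_count;

	intids = malloc(irq_count * sizeof(*intids));
	if (!intids)
		return -1;	/* -ENOMEM in the real code */

	/*
	 * While lpi_list_lock is held no entry can be removed or freed,
	 * so no refcounting is needed: we copy the number, not the pointer.
	 */
	pthread_mutex_lock(&lpi_list_lock);
	for (irq = lpi_list; irq && i < irq_count; irq = irq->next)
		intids[i++] = irq->intid;
	pthread_mutex_unlock(&lpi_list_lock);

	*intid_ptr = intids;
	return i;
}
```

The caller still has to call get on each INTID later and cope with a NULL return, since an LPI may be unmapped after the lock is dropped.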

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2016-07-12 12:39 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-05 11:22 [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Andre Przywara
2016-07-05 11:22 ` [PATCH v8 01/17] KVM: arm/arm64: move redistributor kvm_io_devices Andre Przywara
2016-07-05 11:22 ` [PATCH v8 02/17] KVM: arm/arm64: check return value for kvm_register_vgic_device Andre Przywara
2016-07-05 11:22 ` [PATCH v8 03/17] KVM: extend struct kvm_msi to hold a 32-bit device ID Andre Przywara
2016-07-06 21:06   ` Christoffer Dall
2016-07-06 21:54     ` André Przywara
2016-07-07  9:37       ` Christoffer Dall
2016-07-05 11:22 ` [PATCH v8 04/17] KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities Andre Przywara
2016-07-05 11:22 ` [PATCH v8 05/17] KVM: kvm_io_bus: add kvm_io_bus_get_dev() call Andre Przywara
2016-07-06 21:15   ` Christoffer Dall
2016-07-06 21:36     ` André Przywara
2016-07-05 11:22 ` [PATCH v8 06/17] KVM: arm/arm64: VGIC: add refcounting for IRQs Andre Przywara
2016-07-07 13:13   ` Christoffer Dall
2016-07-07 15:00   ` Marc Zyngier
2016-07-08 10:28     ` Andre Przywara
2016-07-08 10:50       ` Marc Zyngier
2016-07-08 12:54         ` André Przywara
2016-07-08 13:09           ` Marc Zyngier
2016-07-08 13:14             ` André Przywara
2016-07-05 11:22 ` [PATCH v8 07/17] irqchip: refactor and add GICv3 definitions Andre Przywara
2016-07-05 11:23 ` [PATCH v8 08/17] KVM: arm64: handle ITS related GICv3 redistributor registers Andre Przywara
2016-07-08 15:40   ` Christoffer Dall
2016-07-11  7:45     ` André Przywara
2016-07-05 11:23 ` [PATCH v8 09/17] KVM: arm64: introduce ITS emulation file with MMIO framework Andre Przywara
2016-07-08 13:34   ` Marc Zyngier
2016-07-08 13:55     ` Marc Zyngier
2016-07-08 14:04     ` André Przywara
2016-07-05 11:23 ` [PATCH v8 10/17] KVM: arm64: introduce new KVM ITS device Andre Przywara
2016-07-05 11:23 ` [PATCH v8 11/17] KVM: arm64: implement basic ITS register handlers Andre Przywara
2016-07-08 14:58   ` Marc Zyngier
2016-07-11  9:00     ` Andre Przywara
2016-07-11 14:21       ` Marc Zyngier
2016-07-05 11:23 ` [PATCH v8 12/17] KVM: arm64: connect LPIs to the VGIC emulation Andre Przywara
2016-07-11 16:20   ` Marc Zyngier
2016-07-05 11:23 ` [PATCH v8 13/17] KVM: arm64: read initial LPI pending table Andre Przywara
2016-07-11 16:50   ` Marc Zyngier
2016-07-11 17:38     ` Andre Przywara
2016-07-12 11:33     ` Andre Przywara
2016-07-12 12:39       ` Marc Zyngier
2016-07-05 11:23 ` [PATCH v8 14/17] KVM: arm64: allow updates of LPI configuration table Andre Przywara
2016-07-11 16:59   ` Marc Zyngier
2016-07-05 11:23 ` [PATCH v8 15/17] KVM: arm64: implement ITS command queue command handlers Andre Przywara
2016-07-11 17:17   ` Marc Zyngier
2016-07-11 17:47     ` Andre Przywara
2016-07-11 17:52       ` Marc Zyngier
2016-07-05 11:23 ` [PATCH v8 16/17] KVM: arm64: implement MSI injection in ITS emulation Andre Przywara
2016-07-05 11:23 ` [PATCH v8 17/17] KVM: arm64: enable ITS emulation as a virtual MSI controller Andre Przywara
2016-07-06  8:52 ` [PATCH v8 00/17] KVM: arm64: GICv3 ITS emulation Auger Eric
2016-07-11 17:36 ` Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).